Suppose that the data for analysis includes the attribute age. The age values for the data tuples
are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) Use smoothing by bin means to smooth these data, using a bin depth of 3.
Illustrate your steps. Comment on the effect of this technique for the given data.
(b) How might you determine outliers in the data?
(c) What other methods are there for data smoothing?
Answers
a) Use smoothing by bin means to smooth these data, using a bin depth of 3.
Step 1 : The bin depth is 3, i.e., every bin holds 3 values. There are 27 values in total, so there will be 27 / 3 = 9 bins. The data are already sorted, so they are partitioned in order into equal-depth bins:
BIN 1 : 13,15,16
BIN 2 : 16,19,20
BIN 3 : 20,21,22
BIN 4 : 22,25,25
BIN 5 : 25,25,30
BIN 6 : 33,33,35
BIN 7 : 35,35,35
BIN 8 : 36,40,45
BIN 9 : 46,52,70
Step 2 : Every value in a bin is now replaced by the mean of that bin (rounded to two decimal places):
BIN 1 : 14.67,14.67,14.67
BIN 2 : 18.33,18.33,18.33
BIN 3 : 21,21,21
BIN 4 : 24,24,24
BIN 5 : 26.67,26.67,26.67
BIN 6 : 33.67,33.67,33.67
BIN 7 : 35,35,35
BIN 8 : 40.33,40.33,40.33
BIN 9 : 56,56,56
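In smoothing by bin means, each value in a bin is replaced by the mean value of the bin. In general, the larger the bin, the greater the effect of the smoothing. For the given data, the 27 values are reduced to 9 distinct values, which removes small local fluctuations. In the last bin, the extreme value 70 is pulled down to 56, but it also pulls 46 and 52 up to 56, because the mean is sensitive to extreme values.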
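The two steps above can be reproduced with a short Python sketch. The function name `smooth_by_bin_means` is only an illustrative choice, and the sketch assumes the number of values is a multiple of the bin depth, as it is here (27 values, depth 3).

```python
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_means(values, depth):
    """Equal-depth binning: replace each value by the mean of its bin."""
    ordered = sorted(values)  # Step 1 requires sorted data
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]
        mean = sum(bin_values) / len(bin_values)
        # Step 2: every value in the bin becomes the (rounded) bin mean
        smoothed.extend([round(mean, 2)] * len(bin_values))
    return smoothed


smoothed_ages = smooth_by_bin_means(ages, 3)
bin_means = smoothed_ages[::3]  # one representative value per bin
print(bin_means)
# [14.67, 18.33, 21.0, 24.0, 26.67, 33.67, 35.0, 40.33, 56.0]
```

The printed means match BIN 1 to BIN 9 in Step 2.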
b) How might you determine outliers in the data?
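Ans : 1) Through boxplot analysis : A point is plotted individually as an outlier if it lies more than 1.5*IQR below the first quartile or more than 1.5*IQR above the third quartile.
2) Through scatter plots : Points that lie far away from the bulk of the data are clearly visible in this plot.
3) Through clustering : Similar values are organised into groups or clusters, and values that fall outside of the clusters may be considered outliers.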
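As a rough illustration of the 1.5*IQR rule, the sketch below computes the quartiles of the age data with Python's standard `statistics.quantiles`. The choice of `method="inclusive"` is an assumption: quartile conventions differ between textbooks and tools, so the exact fences may vary slightly, although the flagged value is the same here under the default method as well.

```python
from statistics import quantiles

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Quartiles (assumed convention: linear interpolation, "inclusive" method)
q1, _, q3 = quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

# Boxplot fences: values outside [lower_fence, upper_fence] are flagged
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [a for a in ages if a < lower_fence or a > upper_fence]
print(outliers)  # [70]
```

Only the age 70 is flagged, which agrees with the observation that it is the value that distorts the mean of the last bin.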
c) What other methods are there for data smoothing?
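Ans : Data smoothing can also be done through :
1) Other binning variants (smoothing by bin medians or by bin boundaries)
2) Regression (fitting the data to a function, such as a line)
3) Outlier analysis (for example clustering, where values that fall outside of the clusters are treated as noise)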
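Binning itself has variants besides bin means. The sketch below applies smoothing by bin medians and smoothing by bin boundaries to the same equal-depth bins. The helper names are illustrative, and sending ties to the lower boundary is an assumption made here, since the tie-breaking rule is not fixed by the method.

```python
from statistics import median

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def equal_depth_bins(values, depth):
    """Sort the values and split them into consecutive bins of `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_bin_medians(bins):
    """Replace every value in a bin by the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_bin_boundaries(bins):
    """Replace every value by the closest bin boundary (min or max).

    Assumption: a value equally close to both goes to the lower boundary.
    """
    result = []
    for b in bins:
        low, high = b[0], b[-1]
        result.append([low if v - low <= high - v else high for v in b])
    return result


bins = equal_depth_bins(ages, 3)
by_median = smooth_by_bin_medians(bins)
by_boundary = smooth_by_bin_boundaries(bins)

print(by_median[-1])    # [52, 52, 52]
print(by_boundary[-1])  # [46, 46, 70]
```

For the last bin (46, 52, 70), the median gives 52 rather than the mean of 56, so the median is less affected by the extreme value 70.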