Department of Computer Science, 582634 Data Mining, Exercises 1, 18 March 2009
1. Present an example where data mining is crucial to the success of a business. What data mining functions does this business need? Can they be performed alternatively by data query processing or simple statistical analysis?
A suitable example can be found in practically any business that sells items or services. Such a business would require both cross-market analysis (finding associations between product sales) and customer profiling (what types of customers buy what products). Based on the acquired profiles, predictions can be made about what kind of marketing strategies would be most effective.
In theory this knowledge can be acquired with data query processing or simple statistical analysis, but it would require a considerable amount of manual work by expert market analysts, both because someone has to decide which queries to use and how to interpret the statistics, and because of the huge amount of data.
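As an illustration of the cross-market analysis mentioned above, the following minimal sketch counts how often pairs of products are bought together. The transactions, the item names, the helper pair_support and the 0.4 support threshold are all hypothetical and chosen only for the example; a real analysis would use a proper association rule mining algorithm on far more data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data: each set is one customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]


def pair_support(baskets):
    """Return, for each item pair, the fraction of baskets containing it."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


support = pair_support(transactions)

# Keep only pairs bought together in at least 40% of the transactions
# (the threshold is an arbitrary assumption for this sketch).
frequent = {pair: s for pair, s in support.items() if s >= 0.4}
```

Doing the same by hand with queries would mean writing one query per candidate pair, which is the manual effort the answer refers to.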
2. What is the difference between discrimination and classification? Between characterization and clustering? Between classification and prediction? For each of these pairs of tasks, how are they similar?
Discrimination vs. classification
Data discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The model is based on the analysis of a set of training data (data objects whose class label is known). (Han & Kamber)
The difference between discrimination and classification is that discrimination compares the general features of the target class data to those of contrasting classes, whereas in classification the goal is to build models that describe and distinguish data classes from each other. As for similarity, both methods are interested in things that are different between some classes of objects.
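The distinction can be sketched in a few lines of Python. The customer data, the feature choice and the nearest-centroid rule below are assumptions made for illustration only; they are not taken from Han & Kamber. The point is the different output: discrimination yields a comparison of features, classification yields a model that labels new objects.

```python
from statistics import mean

# Hypothetical labelled customers: (age, purchases_per_year, label).
customers = [
    (25, 40, "frequent"),
    (30, 36, "frequent"),
    (35, 44, "frequent"),
    (50, 6, "occasional"),
    (55, 10, "occasional"),
    (60, 8, "occasional"),
]


def class_profile(rows, label):
    """Mean of each feature over the objects carrying the given label."""
    members = [r for r in rows if r[2] == label]
    return (mean(r[0] for r in members), mean(r[1] for r in members))


# Discrimination: compare general features of the target class with
# those of a contrasting class. The result is a description.
target = class_profile(customers, "frequent")
contrast = class_profile(customers, "occasional")
difference = (target[0] - contrast[0], target[1] - contrast[1])


# Classification: build a model from the training data and use it to
# predict the label of an object whose class is unknown.
def classify(age, purchases):
    """Nearest-centroid classifier based on the two class profiles."""
    def dist2(profile):
        return (age - profile[0]) ** 2 + (purchases - profile[1]) ** 2
    return "frequent" if dist2(target) <= dist2(contrast) else "occasional"
```

Here difference describes how the two classes differ on average, while classify(28, 35) assigns a label to a customer that was not in the training data.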
Characterization vs. clustering
Data characterization is a summarization of the general characteristics or features of a target class of data. In clustering the objects are grouped together based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity, e.g. for the purpose of generating training data for classification. (Han & Kamber)
So the difference between characterization and clustering is that in characterization the general features of a target class are deduced, whereas in clustering similar objects are simply grouped together without any interest in their features at this point (rules can later be derived from the formed clusters). You could also say that the output of the process is different: in characterization it is a set of general features, whereas in clustering it is a set of object classes. As for similarity, both methods are interested in things that are common to some class of objects.
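A small sketch may make the difference in output concrete. All numbers are invented, and the two-cluster, one-dimensional k-means below is only one simple way to group similar objects, assumed here for illustration; it is not the method prescribed by the course material.

```python
from statistics import mean

# Hypothetical yearly spend (euros) of customers already known to be students.
student_spend = [120, 150, 135, 160]

# Characterization: the class is given; the output is a summary of its features.
characterization = {
    "min": min(student_spend),
    "max": max(student_spend),
    "mean": mean(student_spend),
}


# Clustering: no classes are given; the output is a grouping of the objects.
def two_means(values, iterations=10):
    """A tiny 1-D k-means with k=2, seeded with the smallest and largest value."""
    centers = [min(values), max(values)]
    clusters = ([], [])
    for _ in range(iterations):
        clusters = ([], [])
        for v in values:
            # Assign each value to the nearer of the two centers.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Recompute centers; keep the old center if a cluster is empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


unlabeled_spend = [100, 110, 105, 900, 950, 1000]
low, high = two_means(unlabeled_spend)
```

Once the clusters exist, each of them could in turn be characterized with the same kind of summary, which matches the remark that rules can be derived from the formed clusters afterwards.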
Classification vs. prediction
Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. In prediction, rather than predicting class labels, the main interest (usually) is missing or unavailable data values. (Han & Kamber)
So, although classification is actually the step of finding the models, the goal of both methods is to predict something about unknown data objects. The difference is that in classification that "something" is the class of the objects, whereas in prediction it is the missing data values.
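The contrast can be shown with one hypothetical training set used in two ways. The data, the nearest-neighbour rule and the least-squares line are assumptions chosen to keep the sketch short; other models would serve equally well. What matters is the type of the result: a class label in one case, a missing numeric value in the other.

```python
# Hypothetical training data: (income_k_eur, yearly_spend_eur, segment).
training = [
    (20, 400, "budget"),
    (30, 600, "budget"),
    (60, 1200, "premium"),
    (80, 1600, "premium"),
]


def classify_segment(income):
    """Classification: predict a class label via the nearest neighbour on income."""
    nearest = min(training, key=lambda row: abs(row[0] - income))
    return nearest[2]


def predict_spend(income):
    """Prediction: estimate a missing numeric value with a least-squares line."""
    xs = [row[0] for row in training]
    ys = [row[1] for row in training]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Slope of the ordinary least-squares fit of spend on income.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar + slope * (income - x_bar)
```

For a new customer with an income of 50, classify_segment returns a segment name, while predict_spend returns an estimated yearly spend.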