What is classification? Explain the various steps involved in the construction of tables.
Answers
Classification means putting objects into groups (classes) based on a property they share. Given a collection of things, such as triangles or people, you pick a property and group the items by its value. For example, you might classify people by eye color.
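To make the eye-color example concrete, here is a small Python sketch of grouping items by one property. The function name `classify_by` and the sample people are made up for illustration; they are not from the question.

```python
from collections import defaultdict


def classify_by(items, key):
    """Group items into classes keyed by the value of one property."""
    classes = defaultdict(list)
    for item in items:
        classes[key(item)].append(item)
    return dict(classes)


# Hypothetical sample data, invented for this illustration.
people = [
    {"name": "Asha", "eye_color": "brown"},
    {"name": "Ben", "eye_color": "blue"},
    {"name": "Chen", "eye_color": "brown"},
]

by_eye_color = classify_by(people, key=lambda p: p["eye_color"])
for color, members in by_eye_color.items():
    print(color, [p["name"] for p in members])
# prints:
# brown ['Asha', 'Chen']
# blue ['Ben']
```

Changing the `key` function (for example to age band or height) gives a different classification of the same people.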
Answer:
A bank is interested in knowing which customers are likely to default on loan payments. The bank is also interested in knowing what characteristics of customers may explain their loan payment behavior. An advertiser is interested in choosing the set of customers or prospects who are most likely to respond to a direct mail campaign. The advertiser is also interested in knowing what characteristics of consumers are most likely to explain responsiveness to the campaign. A procurement manager is interested in knowing which orders will most likely be delayed, based on recent behavior of the suppliers. An investor is interested in knowing which assets are most likely to increase in value.
Classification (or categorization) techniques help answer such questions. They predict the group membership (or class, hence the name classification techniques) of individuals (data) for predefined groups (e.g. “success” vs “failure” for binary classification, the focus of this note), and they also describe which characteristics of individuals predict their group membership. Examples of group memberships/classes could be: (1) loyal customers versus customers who will churn; (2) highly price-sensitive versus less price-sensitive customers; (3) satisfied versus dissatisfied customers; (4) purchasers versus non-purchasers; (5) assets that increase in value versus those that do not; (6) products that may be good recommendations to a customer versus those that are not, etc. Characteristics that are useful in classifying individuals/data into predefined groups/classes could include, for example: (1) demographics; (2) psychographics; (3) past behavior; (4) attitudes towards specific products; (5) social network data, etc.
There are many techniques for solving classification problems: classification trees, logistic regression, discriminant analysis, neural networks, boosted trees, random forests, deep learning methods, nearest neighbors, support vector machines, etc. (e.g. see the R package “e1071” for more example methods). There are also many R packages for methods developed in the past, including the “fashionable” deep learning methods. Microsoft also has a large collection of methods that they develop. In this report, for simplicity we focus on the first two (classification trees and logistic regression), although one can always use some of the other methods instead of the ones discussed here. The focus of this note is not to explain any specific (“black box, math”) classification method, but to describe a process for classification that is independent of the method used (e.g. independent of the method selected in one of the steps in the process outlined below).
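As a rough sketch of what "a process independent of the method" can look like, the Python below runs the same fit, predict and score steps on two interchangeable models. Everything here is an assumption made for illustration: the class names, the tiny loan data set (late payments and debt ratio per customer), and the simplified models. The "tree" is only a one-split stump and the logistic regression is fitted by plain gradient descent, so this is not how an R package such as “e1071” does it.

```python
import math


class DecisionStump:
    """A depth-1 classification tree: one threshold on one feature."""

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for above in (0, 1):  # class predicted when feature > t
                    errors = sum(
                        (above if row[j] > t else 1 - above) != label
                        for row, label in zip(X, y)
                    )
                    if best is None or errors < best[0]:
                        best = (errors, j, t, above)
        _, self.feature, self.threshold, self.above = best
        return self

    def predict(self, X):
        return [
            self.above if row[self.feature] > self.threshold else 1 - self.above
            for row in X
        ]


class LogisticRegression:
    """Logistic regression fitted by batch gradient descent."""

    def __init__(self, lr=0.1, epochs=2000):
        self.lr, self.epochs = lr, epochs

    def _prob(self, row):
        z = self.bias + sum(w * x for w, x in zip(self.weights, row))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.weights, self.bias = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for row, label in zip(X, y):
                err = self._prob(row) - label
                grad_b += err
                for j in range(d):
                    grad_w[j] += err * row[j]
            self.bias -= self.lr * grad_b / n
            self.weights = [
                w - self.lr * g / n for w, g in zip(self.weights, grad_w)
            ]
        return self

    def predict(self, X):
        return [1 if self._prob(row) >= 0.5 else 0 for row in X]


def holdout_accuracy(model, train_X, train_y, test_X, test_y):
    """The method-independent part: fit on one set, score on another."""
    model.fit(train_X, train_y)
    predictions = model.predict(test_X)
    return sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)


# Hypothetical data: [late payments, debt ratio]; label 1 = defaulted.
train_X = [[0, 0.1], [1, 0.2], [0, 0.3], [1, 0.1],
           [3, 0.6], [4, 0.5], [5, 0.8], [3, 0.7]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_X = [[0, 0.2], [4, 0.7], [1, 0.15], [5, 0.6]]
test_y = [0, 1, 0, 1]

candidates = {
    "classification tree (stump)": DecisionStump(),
    "logistic regression": LogisticRegression(),
}
results = {
    name: holdout_accuracy(model, train_X, train_y, test_X, test_y)
    for name, model in candidates.items()
}
for name, accuracy in results.items():
    print(f"{name}: accuracy = {accuracy:.2f}")
```

Because `holdout_accuracy` only needs `fit` and `predict`, any other method from the list could be dropped into `candidates` without changing the rest of the process.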
An important question when using classification methods is how to assess the relative performance of all available methods/models, so that we can use the best one according to our criteria. For this purpose there are standard classification performance metrics, which we discuss below. This is a key focus of this note.
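The answer does not say which metrics it has in mind, so as an assumption the Python sketch below uses three common ones for binary classification: accuracy, precision and recall, all computed from the confusion matrix counts. The labels and the two sets of model predictions are invented for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (true pos, false pos, true neg, false neg) counts."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(actual, predicted):
    """Accuracy, precision and recall from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical outcomes: 1 = customer defaulted, 0 = did not.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 1]  # finds 2 of 3 defaulters, 1 false alarm
model_b = [1, 0, 0, 0, 0, 0, 0, 0]  # finds 1 of 3 defaulters, no false alarm

for name, predicted in (("model A", model_a), ("model B", model_b)):
    print(name, summarize(actual, predicted))
```

Both models get 6 of 8 cases right, so they tie on accuracy, but model A has higher recall and model B has higher precision. Which one is "best" therefore depends on the criteria, for example whether a missed defaulter costs the bank more than a false alarm.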
Step-by-step explanation: