Computer Science, asked by Uday1088, 10 months ago

Clustering algorithm for categorical as well as real data

Answers

Answered by Anonymous

Answer:

That's a good idea. It's basically the Hamming distance between the categorical variables. You could compute that, and also an Euclidean distance between the numerical variables, and give a weight to each, depending on the number of categorical and non-categorical variables, the range of the numerical features, etc.

Answered by hinaguptagracy

Explanation:

In my opinion, the actual clustering algorithm that creates a partition of your dataset is a secondary choice. The success of clustering depends mainly on the choice of dissimilarity measure. Only if you formalize your prior knowledge / your expectations about what makes things different into a measure of dissimilarity, your clustering result can help you to understand your data. Clustering is a little tricky because the objective function to optimize is often unclear and might even be a different one for two different researchers, even if they use the same dataset. In case you have a dataset that has different labels or some pairwise dissimilarities defined, you can try to infer a proper dissimilarity function. Otherwise, you have to make assumptions about your data (e.g., distributional assumptions) to obtain an interpretable result.

Previous Question

Next Question