Find the index of the cluster center in the data with scikit-learn
Answers
There is no "original representation" of the cluster centers in k-means; they are not actually points (vectorized documents) from the input set, but means of multiple points. Such means cannot be transformed back into documents since the bag-of-words representation destroys the order of terms.
One possible approximation is to take a centroid vector and pass it to TfidfVectorizer.inverse_transform, which returns the terms that have a non-zero tf-idf value in that centroid.
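A minimal sketch of that approximation, assuming a small made-up corpus and a KMeans fit on its tf-idf matrix (the documents, the number of clusters and the variable names are illustrative, not from the question). Note that inverse_transform only reports which terms are present in each centroid, not their weights or order:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, used only for illustration.
docs = [
    "the cat sat on the mat",
    "the cat chased the kitten",
    "stock markets fell sharply",
    "markets and stock prices",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# cluster_centers_ has one row per centroid in the same feature space as X.
# inverse_transform returns, for each row, the terms with non-zero entries.
terms_per_centroid = vectorizer.inverse_transform(km.cluster_centers_)

for cluster_id, terms in enumerate(terms_per_centroid):
    print(cluster_id, sorted(str(t) for t in terms))
```

Because a centroid averages all documents in its cluster, the term list tends to be the union of the members' vocabularies, so it describes the cluster rather than any single document.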
You could achieve what you want with the k-medoids algorithm, which does assign actual input points as cluster centers, but that is not implemented in scikit-learn itself. The separate scikit-learn-extra package provides a KMedoids estimator.
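If staying with k-means, one workaround is to look up the input point nearest to each centroid and use its row index as a stand-in for the "center". This is not k-medoids, only a nearest-point approximation. The sketch below assumes dense synthetic data in place of the vectorized documents, and the variable names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Synthetic stand-in for the vectorized documents: two well-separated blobs.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 2)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# For each centroid, find the row of X that is closest to it (Euclidean).
# closest_idx[k] is the index in X of the point nearest to centroid k.
closest_idx, distances = pairwise_distances_argmin_min(km.cluster_centers_, X)

for k, (idx, dist) in enumerate(zip(closest_idx, distances)):
    print(f"cluster {k}: nearest input row {idx}, distance {dist:.3f}")
```

The returned indices refer to rows of X, so they can be used to fetch the corresponding original documents from the input list.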