Short text clustering is an essential preprocessing step in social network analysis, and k-means is one of the best-known clustering algorithms because of its simplicity and efficiency. However, k-means is unstable: it is sensitive to the choice of initial cluster centers and can become trapped in local optima. Moreover, the number of clusters k is hard to determine accurately in advance. In this paper, we propose an improved k-means algorithm, MAKM (MAFIA-based k-means), together with a new feature extraction method, TT (Term Transition), to overcome these shortcomings. In MAKM, the initial centers and the number of clusters k are determined by an improved algorithm for mining maximal frequent itemsets. In TT, we posit that the co-occurrence of two words in a short text indicates a stronger correlation between them, and that each word spreads to other words with a certain probability. Experiments on real datasets show that our approach achieves better results. © Springer-Verlag 2013.
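The idea of letting maximal frequent itemsets fix both k and the initial centers can be illustrated with a minimal sketch. This is not the authors' MAKM implementation: it swaps MAFIA for a brute-force enumeration that is only feasible on toy data, uses plain binary bag-of-words vectors instead of TT features, and every name in it (`maximal_frequent_itemsets`, `seeded_kmeans`, `min_support`, `max_size`) is hypothetical.

```python
from itertools import combinations

def maximal_frequent_itemsets(docs, min_support, max_size=3):
    """Brute-force stand-in for MAFIA (assumption: toy-sized vocabulary).
    Returns frequent itemsets that have no frequent proper superset,
    considering only itemsets of up to max_size words."""
    transactions = [set(d) for d in docs]
    vocab = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, max_size + 1):
        for cand in combinations(vocab, size):
            items = frozenset(cand)
            if sum(items <= t for t in transactions) >= min_support:
                frequent.append(items)
    return [f for f in frequent if not any(f < g for g in frequent)]

def vectorize(docs):
    """Binary bag-of-words vectors (a placeholder for TT features)."""
    vocab = sorted({w for d in docs for w in d})
    return [[1.0 if w in d else 0.0 for w in vocab] for d in map(set, docs)]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_kmeans(docs, min_support, iters=10):
    """k-means where k is the number of maximal frequent itemsets and each
    initial center is the mean of the documents containing one itemset."""
    itemsets = maximal_frequent_itemsets(docs, min_support)
    if not itemsets:
        raise ValueError("no frequent itemsets; lower min_support")
    X = vectorize(docs)
    centers = [mean([x for x, d in zip(X, docs) if s <= set(d)])
               for s in itemsets]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
                  for x in X]
        # Update step: recompute each center; keep the old one if it is empty.
        for j in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return itemsets, labels

# Toy short texts, already tokenized (made-up data for illustration only).
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "deal", "paris"],
    ["cheap", "flight", "deal", "now"],
    ["football", "match", "goal"],
    ["match", "goal", "tonight"],
    ["football", "match", "goal", "live"],
]
itemsets, labels = seeded_kmeans(docs, min_support=3)
print(len(itemsets), labels)
```

On this toy input the two maximal frequent itemsets are {deal, flight} and {goal, match}, so k is set to 2 without being supplied by the user, and the seeds already sit inside the two topical groups. A real MAFIA-style miner would be needed for any realistic vocabulary, since the enumeration here grows combinatorially.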
CITATION STYLE
Ma, P., & Zhang, Y. (2013). MAKM: A MAFIA-based k-means algorithm for short text in social networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7826 LNCS, pp. 210–218). https://doi.org/10.1007/978-3-642-37450-0_15