Fast and simple deterministic seeding of KMeans for text document clustering

Ehsan Sherkat; Julien Velcin; Evangelos E. Milios

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11018 LNCS, pp. 76-88 (2018)

DOI: 10.1007/978-3-319-98932-7_7

Abstract

KMeans is one of the most popular document clustering algorithms. It is usually initialized with random seeds, which can drastically affect the final clustering performance. Many random or order-sensitive methods exist that try to initialize KMeans properly, but their results are non-deterministic and unrepeatable. KMeans therefore has to be initialized several times to obtain a better result, which is time-consuming. In this paper, we introduce a novel deterministic seeding method for KMeans that is specifically designed for text document clustering. Because it is simple, it is fast and scales to large datasets. Experimental results on several real-world datasets demonstrate that the proposed method performs better overall than several deterministic, random, and order-sensitive methods in terms of clustering quality and runtime.
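To make the repeatability point concrete, the following is a minimal sketch of what deterministic seeding means in practice. It is not the method proposed in the paper, which the abstract does not describe. As a stand-in it assumes a generic farthest-first rule that starts from the point nearest the global mean, and it uses squared Euclidean distance on a tiny hypothetical dataset. All function names and the `docs` vectors are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def deterministic_seeds(points, k):
    """Farthest-first seeding: a generic stand-in, NOT the paper's method."""
    center = mean(points)
    # First seed: the point nearest the global mean (ties broken by index).
    first = min(range(len(points)),
                key=lambda i: (sq_dist(points[i], center), i))
    chosen = [first]
    while len(chosen) < k:
        # Next seed: the point farthest from its nearest chosen seed
        # (ties broken in favour of the lower index).
        nxt = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: (min(sq_dist(points[i], points[j]) for j in chosen), -i),
        )
        chosen.append(nxt)
    return [list(points[i]) for i in chosen]


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd iterations starting from the given seeds."""
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: (sq_dist(p, centroids[c]), c))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels, centroids


# Hypothetical toy "documents": three pairs of similar term vectors.
docs = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.1, 0.9],
]

labels_a, _ = kmeans(docs, deterministic_seeds(docs, 3))
labels_b, _ = kmeans(docs, deterministic_seeds(docs, 3))
print(labels_a == labels_b)  # the same seeds give the same clustering
```

Because the seeds depend only on the data and not on a random draw or on the order of processing, a single run is enough to reproduce the result, which is the property the abstract contrasts with random and order-sensitive initialization.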

Cite (APA)

Sherkat, E., Velcin, J., & Milios, E. E. (2018). Fast and simple deterministic seeding of KMeans for text document clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11018 LNCS, pp. 76–88). Springer Verlag. https://doi.org/10.1007/978-3-319-98932-7_7
