Clustering narrow-domain short texts using k-means, linguistic patterns and LSI

Svetlana Popova; Vera Danilova; Artem Egorov

Journal Article

Communications in Computer and Information Science (2014), vol. 436, pp. 66–77

DOI: 10.1007/978-3-319-12580-0_18

Abstract

In this work we consider the problem of clustering narrow-domain short texts, such as academic abstracts. Our main objective is to check whether the quality of the k-means algorithm can be improved by expanding the feature space with a dictionary of word groups selected from the texts on the basis of a fixed set of patterns. We also check whether clustering quality can be increased by mapping the feature spaces to a lower-dimensional semantic space using Latent Semantic Indexing (LSI). The results suggest that both modifications are feasible in practice when compared with k-means run in the feature space defined only by the main dictionary of the corpus.
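The feature-space expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the patterns, so adjacent word pairs are used here as a hypothetical stand-in for pattern-selected word groups, and the names `features`, `normalize`, `cosine` and `kmeans` are assumptions made for illustration. The LSI mapping is left out; only the unigram-plus-word-group space and a simple cosine-based k-means are shown.

```python
import math
from collections import Counter


def features(text, use_word_groups=True):
    """Bag of unigrams, optionally expanded with word groups."""
    words = text.lower().split()
    feats = Counter(words)
    if use_word_groups:
        # Assumption: adjacent word pairs stand in for the paper's
        # pattern-based word groups, which the abstract does not specify.
        feats.update(" ".join(pair) for pair in zip(words, words[1:]))
    return feats


def normalize(vec):
    """Scale a sparse vector to unit length (unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Dot product of two sparse unit-length vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def kmeans(vectors, k, iterations=20):
    """Cosine-based k-means; the first k vectors seed the centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the normalized sum of its members.
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)
            if total:
                centroids[c] = normalize(total)
    return labels


# Toy corpus, invented for illustration only.
docs = [
    "neural network training",
    "protein folding simulation",
    "deep neural network",
    "protein structure folding",
]
vectors = [normalize(features(d)) for d in docs]
labels = kmeans(vectors, k=2)
print(labels)
```

Seeding the centroids with the first `k` documents keeps the run deterministic; a practical setup would more likely use several random restarts and weight terms (for example with tf-idf) before clustering.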

Citation (APA)

Popova, S., Danilova, V., & Egorov, A. (2014). Clustering narrow-domain short texts using k-means, linguistic patterns and LSI. Communications in Computer and Information Science, 436, 66–77. https://doi.org/10.1007/978-3-319-12580-0_18
