Clustering narrow-domain short texts using k-means, linguistic patterns and LSI


Abstract

In this work we consider the problem of clustering narrow-domain short texts, such as academic abstracts. Our main objective is to check whether the quality of the k-means algorithm can be improved by expanding the feature space with a dictionary of word groups selected from the texts on the basis of a fixed set of patterns. We also check whether clustering quality can be increased by mapping the feature spaces to a lower-dimensional semantic space using Latent Semantic Indexing (LSI). The results suggest that these modifications are feasible in practice, compared with running k-means in the feature space defined only by the main dictionary of the corpus.
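
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the linguistic patterns, so adjacent word pairs stand in for the pattern-selected word groups, and the helper names (tokenize, features, build_matrix, lsi, kmeans) are hypothetical. The sketch assumes NumPy, raw counts as term weights, a truncated SVD as the LSI step, and a plain Lloyd-style k-means.

```python
import re
import numpy as np

def tokenize(text):
    # Lowercase alphabetic tokens only; a deliberately simple tokenizer.
    return re.findall(r"[a-z]+", text.lower())

def features(text):
    # Single words (the "main dictionary") plus adjacent word pairs.
    # ASSUMPTION: pairs stand in for the pattern-selected word groups,
    # because the abstract does not specify the patterns.
    words = tokenize(text)
    pairs = [a + "_" + b for a, b in zip(words, words[1:])]
    return words + pairs

def build_matrix(texts):
    # Document-by-feature count matrix over the expanded feature space.
    vocab = sorted({f for t in texts for f in features(t)})
    index = {f: i for i, f in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for row, t in enumerate(texts):
        for f in features(t):
            X[row, index[f]] += 1.0
    return X, vocab

def lsi(X, rank):
    # LSI step as a truncated SVD: keep the top `rank` singular directions
    # and return the document coordinates in that lower-dimensional space.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank]

def kmeans(X, k, iters=50, seed=0):
    # Lloyd-style k-means with squared Euclidean distance.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

if __name__ == "__main__":
    docs = [
        "latent semantic indexing of abstracts",
        "semantic indexing for short abstracts",
        "protein folding simulation methods",
        "simulation of protein folding",
    ]
    X, vocab = build_matrix(docs)
    print(kmeans(lsi(X, rank=2), k=2))
```

Dropping the pairs from features, or skipping the lsi call, gives the baseline that the abstract compares against: k-means in the feature space defined by single words only.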

Cite

Popova, S., Danilova, V., & Egorov, A. (2014). Clustering narrow-domain short texts using k-means, linguistic patterns and LSI. Communications in Computer and Information Science, 436, 66–77. https://doi.org/10.1007/978-3-319-12580-0_18
