In this work we consider the problem of clustering narrow-domain short texts, such as academic abstracts. Our main objective is to check whether the quality of the k-means algorithm can be improved by expanding the feature space with a dictionary of word groups selected from the texts on the basis of a fixed set of patterns. We also check whether clustering quality can be increased by mapping the feature spaces to a lower-dimensional semantic space using Latent Semantic Indexing (LSI). The results suggest that both modifications are feasible in practice compared with running k-means in the feature space defined only by the main dictionary of the corpus.
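The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn, uses plain word bigrams as a stand-in for the pattern-selected word groups (the abstract does not list the actual patterns), and computes LSI with a truncated SVD of the TF-IDF matrix. The function name `cluster_abstracts`, its parameters, and the toy corpus are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus: two narrow sub-topics, three short texts each.
toy_abstracts = [
    "k-means clustering of short text documents",
    "document clustering with k-means and term weighting",
    "short text clustering using term weighting",
    "neural network training with gradient descent",
    "gradient descent optimization for deep neural network",
    "deep neural network training and optimization",
]


def cluster_abstracts(texts, n_clusters=2, n_topics=2,
                      use_word_groups=True, seed=0):
    """Cluster texts with TF-IDF features, LSI reduction and k-means.

    Assumption: bigrams approximate the "word groups" of the paper;
    the real method selects groups with a fixed set of patterns.
    """
    # Main dictionary only: unigrams. Expanded feature space: unigrams
    # plus bigrams.
    ngram_range = (1, 2) if use_word_groups else (1, 1)
    vectorizer = TfidfVectorizer(ngram_range=ngram_range,
                                 stop_words="english")
    features = vectorizer.fit_transform(texts)

    # LSI: project the TF-IDF vectors onto the top singular directions,
    # then L2-normalize each row so that k-means' Euclidean distance
    # reflects cosine similarity.
    lsi = make_pipeline(
        TruncatedSVD(n_components=n_topics, random_state=seed),
        Normalizer(copy=False),
    )
    reduced = lsi.fit_transform(features)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(reduced)
    return labels, features.shape[1], reduced.shape
```

Calling `cluster_abstracts(toy_abstracts)` returns the cluster labels, the size of the expanded feature space, and the shape of the LSI-reduced matrix. Passing `use_word_groups=False` gives the baseline that uses only the main dictionary, which allows the two feature spaces to be compared on the same corpus.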
CITATION STYLE
Popova, S., Danilova, V., & Egorov, A. (2014). Clustering narrow-domain short texts using k-means, linguistic patterns and LSI. Communications in Computer and Information Science, 436, 66–77. https://doi.org/10.1007/978-3-319-12580-0_18