In this paper, we present a clustering approach that combines a continuous vector space representation of sentences with the k-means algorithm. The principal motivation of this proposal is to split a large heterogeneous corpus into clusters of similar sentences. We use the word2vec toolkit to obtain the representation of each word as a vector in a continuous space. We provide empirical evidence that our technique can lead to better clusters, in terms of intra-cluster perplexity and F1 score.
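The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how word vectors are composed into a sentence vector, so averaging is assumed here, and the 2-dimensional vectors in `WORD_VECTORS` are invented toy values standing in for vectors that would normally be trained with word2vec. The names `sentence_vector` and `kmeans` are hypothetical helpers.

```python
import random

# Toy 2-d word vectors, invented for illustration only; a real setup
# would load vectors trained with the word2vec toolkit.
WORD_VECTORS = {
    "cat": [0.9, 0.1], "dog": [0.8, 0.2], "pet": [0.85, 0.15],
    "stock": [0.1, 0.9], "market": [0.2, 0.8], "price": [0.15, 0.85],
}


def sentence_vector(sentence):
    """Average the vectors of the known words (an assumed composition rule)."""
    vecs = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0]  # no known words: fall back to the origin
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


SENTENCES = [
    "the cat and the dog",
    "a pet dog",
    "stock market price",
    "the market price",
]
LABELS = kmeans([sentence_vector(s) for s in SENTENCES], k=2)
```

With these toy vectors the two animal sentences share one label and the two finance sentences share the other. Evaluating cluster quality with intra-cluster perplexity or F1 score, as the paper does, is outside the scope of this sketch.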
CITATION STYLE
Chinea-Rios, M., Sanchis-Trilles, G., & Casacuberta, F. (2015). Sentence clustering using continuous vector space representation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9117, pp. 432–440). Springer Verlag. https://doi.org/10.1007/978-3-319-19390-8_49