Sentence clustering using continuous vector space representation

Mara Chinea-Rios; Germán Sanchis-Trilles; Francisco Casacuberta

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9117 432-440

DOI: 10.1007/978-3-319-19390-8_49

Abstract

In this paper, we present a clustering approach that combines a continuous vector space representation of sentences with the k-means algorithm. The main motivation is to split a large, heterogeneous corpus into clusters of similar sentences. We use the word2vec toolkit to obtain a continuous vector space representation of each word. We provide empirical evidence that our technique can lead to better clusters in terms of intra-cluster perplexity and F1 score.
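
The pipeline described in the abstract (word vectors, then sentence vectors, then k-means) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say how word vectors are combined into a sentence vector, so averaging is an assumption here. The `TOY_VECTORS` table is a hypothetical stand-in for vectors that word2vec would learn from a corpus, and the helper names `sentence_vector` and `kmeans` are made up for this example.

```python
import math
import random

# Hypothetical 2-dimensional word vectors. In practice these would come from
# a trained word2vec model; the values below are invented for illustration.
TOY_VECTORS = {
    "bank": (0.90, 0.10), "loan": (0.80, 0.20), "interest": (0.85, 0.15),
    "match": (0.10, 0.90), "goal": (0.20, 0.80), "team": (0.15, 0.85),
}


def sentence_vector(sentence, vectors):
    """Average the vectors of the known words (an assumed composition rule)."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        raise ValueError("no known words in sentence: %r" % sentence)
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) returning one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


sentences = [
    "bank loan interest",
    "loan interest",
    "team goal",
    "match team goal",
]
points = [sentence_vector(s, TOY_VECTORS) for s in sentences]
labels = kmeans(points, k=2)

for sentence, label in zip(sentences, labels):
    print(label, sentence)
```

On this toy data the two finance sentences end up in one cluster and the two sports sentences in the other. Which cluster receives which label depends on the random initialisation. Evaluating clusters by intra-cluster perplexity or F1 score, as the abstract mentions, is outside the scope of this sketch.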

Cite (APA)

Chinea-Rios, M., Sanchis-Trilles, G., & Casacuberta, F. (2015). Sentence clustering using continuous vector space representation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9117, pp. 432–440). Springer Verlag. https://doi.org/10.1007/978-3-319-19390-8_49
