Evaluation of internal validity measures in short-text corpora

Diego Ingaramo; David Pinto; Paolo Rosso; Marcelo Errecalde

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 4919 LNCS 555-567

DOI: 10.1007/978-3-540-78135-6_48

Abstract

Short-text clustering is one of the most difficult tasks in natural language processing because of the low frequencies of document terms. We are interested in analysing corpora of this kind in order to develop novel techniques that may improve the results obtained by classical clustering algorithms. In this paper we present an evaluation of different internal clustering validity measures in order to determine the possible correlation between these measures and the F-Measure, a well-known external clustering measure used to evaluate the performance of clustering algorithms. We used several short-text corpora in the experiments. The correlation obtained with a particular set of internal validity measures leads us to conclude that some of these measures may be used to improve the performance of text clustering algorithms. © 2008 Springer-Verlag Berlin Heidelberg.
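The abstract does not give the formulas or name the internal measures that were evaluated. As a minimal sketch under stated assumptions, the comparison it describes can be illustrated by scoring several candidate clusterings with an external measure and an internal measure, then correlating the two. The code assumes the class-weighted F-Measure commonly used for clustering evaluation, uses the mean silhouette coefficient as one example of an internal validity measure (with Euclidean distance rather than a text-specific similarity), and uses Pearson correlation. The function names, toy points and candidate clusterings are hypothetical illustrations, not the paper's method or data.

```python
from collections import Counter
from math import dist, sqrt


def clustering_f_measure(classes, clusters):
    """Class-weighted F-Measure (assumed common definition): for each gold
    class take the best F over clusters, weighted by the class size."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(classes, clusters))  # n_ij: docs of class i in cluster j
    total = 0.0
    for c, n_i in class_sizes.items():
        best = 0.0
        for k, n_j in cluster_sizes.items():
            n_ij = joint[(c, k)]
            if n_ij == 0:
                continue
            precision, recall = n_ij / n_j, n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


def silhouette(points, clusters):
    """Mean silhouette coefficient with Euclidean distance; needs >= 2 clusters.
    Uses only the clustering itself, so it is an internal measure."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if j != i and clusters[j] == clusters[i]]
        if not same:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, c in zip(points, clusters) if c == k)
            / clusters.count(k)
            for k in set(clusters) if k != clusters[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: 2-D points standing in for document vectors.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
gold = [0, 0, 0, 1, 1, 1]
candidates = [
    [0, 0, 0, 1, 1, 1],  # matches the gold classes
    [0, 0, 1, 1, 1, 1],  # one point misplaced
    [0, 1, 0, 1, 0, 1],  # classes mixed
]

internal = [silhouette(points, c) for c in candidates]
external = [clustering_f_measure(gold, c) for c in candidates]
r = pearson(internal, external)
print(internal, external, r)  # for this toy data, r is strongly positive
```

A high correlation on real corpora would suggest the internal measure can stand in for the F-Measure when gold classes are unavailable, which is the motivation the abstract gives for the study.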

Cite (APA)

Ingaramo, D., Pinto, D., Rosso, P., & Errecalde, M. (2008). Evaluation of internal validity measures in short-text corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4919 LNCS, pp. 555–567). https://doi.org/10.1007/978-3-540-78135-6_48