SuDoC: Semi-unsupervised classification of text document opinions using a few labeled examples and clustering

František Dařena; Jan Žižka

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 8132 LNAI 625-636

DOI: 10.1007/978-3-642-40769-7_54


Abstract

The presented novel procedure, named SuDoC (Semi-unsupervised Document Classification), provides an alternative to standard clustering techniques when a very large set of textual instances must be separated into groups that represent the semantics of the text documents. Unlike conventional clustering, SuDoC proceeds from a small initial set of typical specimens, which can be created manually and which provides the bias necessary for generating appropriate classes. SuDoC starts with a higher number of generated clusters and, to avoid over-fitting, iteratively decreases that number, which increases the generality of the resulting classification. The unlabeled instances are automatically labeled according to their similarity to the defined labeled samples, thus enabling higher classification accuracy in the future. The results of the presented strengthened clustering procedure are demonstrated on a real-world data set of hotel guests' unstructured reviews written in natural language. © 2013 Springer-Verlag Berlin Heidelberg.
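
The abstract gives only the outline of the procedure, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a bag-of-words representation, cosine similarity, and a tiny k-means seeded with the labeled examples; it also assumes that a cluster passes its label to unlabeled members only when all labeled examples inside it agree. The function names (`vectorize`, `cosine`, `kmeans`, `label_by_shrinking_clusters`) and the sample reviews are hypothetical and chosen for illustration.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term-frequency vector (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, init_indices, iters=10):
    """Tiny cosine k-means; centroids start at the given documents."""
    centroids = [Counter(vectors[i]) for i in init_indices]
    k = len(centroids)
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each document to its most similar centroid.
        assign = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute centroids as summed member vectors; cosine ignores scale.
        for c in range(k):
            total = Counter()
            for v, a in zip(vectors, assign):
                if a == c:
                    total.update(v)
            if total:
                centroids[c] = total
    return assign


def label_by_shrinking_clusters(docs, seeds, k_start):
    """docs: list of texts; seeds: {doc index: label}.

    Starts with k_start clusters and decreases k one step at a time.
    A cluster labels its unlabeled members only if every labeled example
    it contains carries the same label (an assumed rule).
    """
    vectors = [vectorize(d) for d in docs]
    # Labeled examples come first so that they bias the initial centroids.
    order = list(seeds) + [i for i in range(len(docs)) if i not in seeds]
    labels = dict(seeds)
    for k in range(min(k_start, len(docs)), 0, -1):
        assign = kmeans(vectors, order[:k])
        for c in set(assign):
            members = [i for i, a in enumerate(assign) if a == c]
            seed_labels = {seeds[i] for i in members if i in seeds}
            if len(seed_labels) == 1:
                label = seed_labels.pop()
                for i in members:
                    labels.setdefault(i, label)  # keep earlier decisions
        if len(labels) == len(docs):
            break
    return labels


docs = [
    "great hotel friendly staff",      # labeled example: positive
    "dirty room rude staff",           # labeled example: negative
    "great location friendly people",  # unlabeled
    "dirty bathroom rude reception",   # unlabeled
]
seeds = {0: "pos", 1: "neg"}
result = label_by_shrinking_clusters(docs, seeds, k_start=3)
print(sorted(result.items()))
```

In this sketch, documents that never share a cluster with consistently labeled examples stay unlabeled; how the published procedure handles such cases is not stated in the abstract.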

Cite

Dařena, F., & Žižka, J. (2013). SuDoC: Semi-unsupervised classification of text document opinions using a few labeled examples and clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8132 LNAI, pp. 625–636). https://doi.org/10.1007/978-3-642-40769-7_54
