SuDoC: Semi-unsupervised classification of text document opinions using a few labeled examples and clustering

Abstract

The presented novel procedure, SuDoC (Semi-unsupervised Document Classification), offers an alternative to standard clustering techniques when a very large set of textual instances must be separated into groups that represent the semantics of the text documents. Unlike conventional clustering, SuDoC starts from a small initial set of typical specimens, which can be created manually and which provide the bias needed to generate appropriate classes. SuDoC begins with a larger number of generated clusters and, to avoid overfitting, iteratively reduces their number, which increases the generality of the resulting classification. The unlabeled instances are automatically labeled according to their similarity to the defined labeled samples, which is intended to enable higher classification accuracy later on. The results of the presented strengthened clustering procedure are demonstrated on a real-world data set of unstructured hotel-guest reviews written in natural language. © 2013 Springer-Verlag Berlin Heidelberg.
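
The abstract does not say which clustering algorithm, similarity measure, document representation, or stopping rule SuDoC actually uses, so the following is only a minimal, hypothetical sketch of the general idea under stated assumptions. It assumes bag-of-words term counts, cosine similarity, and greedy centroid-linkage agglomerative merging (a swapped-in technique, not necessarily the authors') that starts from one cluster per document and reduces the count to a chosen `target_k`. Each final cluster is then labeled with the seed class whose centroid is most similar. All function names (`bow`, `cosine`, `centroid`, `merge_clusters`, `sudoc_sketch`) and the toy review strings are illustrative inventions.

```python
from collections import Counter
import math


def bow(text):
    """Bag-of-words term counts (assumed representation, not from the paper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Summed term counts; proportional to the mean, and cosine ignores scale."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts
    return total


def merge_clusters(docs, target_k):
    """Start with one cluster per document (the largest possible number) and
    greedily merge the most similar pair of cluster centroids until target_k
    clusters remain. Hypothetical stand-in for SuDoC's cluster reduction."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > target_k:
        cents = [centroid(docs[i] for i in c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(cents[i], cents[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def sudoc_sketch(labeled, unlabeled, target_k):
    """Label unlabeled texts via their cluster's similarity to seed classes."""
    seeds = {}
    for text, label in labeled:
        seeds.setdefault(label, []).append(bow(text))
    seed_cents = {lab: centroid(vs) for lab, vs in seeds.items()}

    docs = [bow(t) for t in unlabeled]
    labels = [None] * len(docs)
    for cluster in merge_clusters(docs, target_k):
        c = centroid(docs[i] for i in cluster)
        # Assign the whole cluster to the most similar seed class.
        best = max(seed_cents, key=lambda lab: cosine(c, seed_cents[lab]))
        for i in cluster:
            labels[i] = best
    return labels


if __name__ == "__main__":
    # Toy, made-up hotel-review snippets; not data from the paper.
    labeled = [("great clean room friendly staff", "positive"),
               ("dirty room rude staff", "negative")]
    unlabeled = ["clean room and friendly staff",
                 "friendly staff great breakfast",
                 "rude staff dirty room",
                 "dirty bathroom rude manager"]
    print(sudoc_sketch(labeled, unlabeled, target_k=2))
    # -> ['positive', 'positive', 'negative', 'negative']
```

Labeling whole clusters rather than individual documents is one way to let a few labeled seeds propagate to many unlabeled texts; in this sketch `target_k` is fixed by hand, whereas the paper's criterion for how far to reduce the number of clusters is not given in the abstract.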

Citation (APA)

Dařena, F., & Žižka, J. (2013). SuDoC: Semi-unsupervised classification of text document opinions using a few labeled examples and clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8132 LNAI, pp. 625–636). https://doi.org/10.1007/978-3-642-40769-7_54