We study unsupervised classification of text documents into a taxonomy of concepts, each annotated by only a few keywords. Our central claim is that the structure of the taxonomy encapsulates background knowledge that can be exploited to improve classification accuracy. Under our hierarchical Dirichlet generative model for the document corpus, we show that the unsupervised classification algorithm provides robust estimates of the classification parameters by performing regularization, and that the algorithm can be interpreted as a regularized EM algorithm. We also propose a technique for automatically choosing the regularization parameter. In addition, we propose a regularization scheme for K-means on hierarchies. We experimentally demonstrate that both of our regularized clustering algorithms achieve higher classification accuracy than simple models such as minimum distance, Naïve Bayes, EM, and K-means. © Springer-Verlag Berlin Heidelberg 2006.
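The abstract does not spell out the regularization scheme for K-means on hierarchies. As a rough illustration of the general idea only, the sketch below shrinks each node's centroid toward its parent's centroid in the taxonomy. The function name `regularized_kmeans`, the shrinkage weight `lam`, the squared Euclidean distance, and the update rule itself are assumptions made for this example, not the authors' method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def regularized_kmeans(docs, seeds, parent, lam=1.0, iters=10):
    """Hypothetical hierarchy-regularized K-means (illustrative assumption).

    docs   : list of document vectors (e.g. term frequencies)
    seeds  : dict node -> initial centroid built from the node's keywords
    parent : dict node -> parent node, or None for the root
    lam    : shrinkage weight pulling a centroid toward its parent's centroid
    """
    centroids = {k: list(v) for k, v in seeds.items()}
    labels = []
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(centroids, key=lambda k: sq_dist(d, centroids[k]))
                  for d in docs]
        # Update step: mean of the members, shrunk toward the parent centroid.
        updated = {}
        for k, old in centroids.items():
            members = [d for d, lab in zip(docs, labels) if lab == k]
            total = [sum(m[i] for m in members) for i in range(len(old))]
            p = parent[k]
            weight = lam if p is not None else 0.0
            anchor = centroids[p] if p is not None else old
            denom = len(members) + weight
            if denom == 0:
                updated[k] = old  # no members and no parent pull: keep as is
            else:
                updated[k] = [(t + weight * a) / denom
                              for t, a in zip(total, anchor)]
        centroids = updated
    return labels, centroids


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    seeds = {"root": [0.5, 0.5], "a": [1.0, 0.0], "b": [0.0, 1.0]}
    parent = {"root": None, "a": "root", "b": "root"}
    print(regularized_kmeans(docs, seeds, parent, lam=1.0))
```

In this sketch a node with few assigned documents stays close to its parent, while a node with many documents is dominated by its own data; setting `lam` to 0 recovers the ordinary K-means update.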
CITATION STYLE
Sona, D., Veeramachaneni, S., Polettini, N., & Avesani, P. (2006). Regularization for unsupervised classification on taxonomies. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4203 LNAI, pp. 691–696). Springer Verlag. https://doi.org/10.1007/11875604_76