Regularization for unsupervised classification on taxonomies

Diego Sona; Sriharsha Veeramachaneni; Nicola Polettini; Paolo Avesani

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4203 LNAI 691-696

DOI: 10.1007/11875604_76

Abstract

We study unsupervised classification of text documents into a taxonomy of concepts, each annotated by only a few keywords. Our central claim is that the structure of the taxonomy encapsulates background knowledge that can be exploited to improve classification accuracy. Under our hierarchical Dirichlet generative model for the document corpus, we show that the unsupervised classification algorithm provides robust estimates of the classification parameters by performing regularization, and that our algorithm can be interpreted as a regularized EM algorithm. We also propose a technique for the automatic choice of the regularization parameter. In addition, we propose a regularization scheme for K-means on hierarchies. We experimentally demonstrate that both of our regularized clustering algorithms achieve higher classification accuracy than simple models such as minimum distance, Naïve Bayes, EM, and K-means. © Springer-Verlag Berlin Heidelberg 2006.
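The abstract does not spell out the regularization scheme, so the following is only a rough sketch of one way taxonomy structure could regularize K-means: after each update, every node's centroid is shrunk toward its parent's centroid. The shrinkage rule, the function name `regularized_kmeans`, the weight `lam`, and the toy taxonomy are all assumptions made for illustration and should not be read as the authors' algorithm. Centroids are seeded from the node keywords, mirroring the setting where each concept is annotated by a few keywords.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def dot(u, v):
    """Dot product of two sparse vectors (cosine similarity for unit vectors)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def blend(child, parent_vec, lam):
    """Return (1 - lam) * child + lam * parent_vec."""
    terms = set(child) | set(parent_vec)
    return {t: (1 - lam) * child.get(t, 0.0) + lam * parent_vec.get(t, 0.0)
            for t in terms}


def regularized_kmeans(docs, keywords, parent, lam=0.3, iters=10):
    """Hypothetical hierarchy-regularized K-means (illustrative assumption).

    docs:     list of token lists
    keywords: node -> list of keywords (used to seed the centroids)
    parent:   node -> parent node, or None for the root
    lam:      shrinkage weight toward the parent centroid (assumed parameter)
    """
    vecs = [normalize(Counter(d)) for d in docs]
    centroids = {n: normalize(Counter(k)) for n, k in keywords.items()}
    labels = []
    for _ in range(iters):
        # Assignment step: each document goes to the most similar centroid.
        labels = [max(centroids, key=lambda n: dot(v, centroids[n])) for v in vecs]
        # Update step: plain K-means mean; empty clusters keep their centroid.
        raw = {}
        for n in centroids:
            members = [v for v, lab in zip(vecs, labels) if lab == n]
            raw[n] = mean(members) if members else centroids[n]
        # Regularization step: shrink each non-root centroid toward its parent.
        centroids = {
            n: normalize(raw[n] if parent[n] is None
                         else blend(raw[n], raw[parent[n]], lam))
            for n in raw
        }
    return labels


# Toy taxonomy: a root concept with two children.
keywords = {"sports": ["sport"],
            "football": ["football", "goal"],
            "tennis": ["tennis", "racket"]}
parent = {"sports": None, "football": "sports", "tennis": "sports"}
docs = [["football", "goal", "match"],
        ["tennis", "racket", "serve"],
        ["goal", "match", "striker"],
        ["serve", "racket", "ace"],
        ["sport", "news"]]
labels = regularized_kmeans(docs, keywords, parent)
print(labels)
```

Setting `lam` to 0 reduces the sketch to ordinary keyword-seeded K-means, which makes it easy to compare the regularized and unregularized variants on the same data.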

Cite

Sona, D., Veeramachaneni, S., Polettini, N., & Avesani, P. (2006). Regularization for unsupervised classification on taxonomies. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4203 LNAI, pp. 691–696). Springer Verlag. https://doi.org/10.1007/11875604_76
