This paper addresses the fundamental problem of document classification, focusing on problems in which the classes are mutually exclusive. We advocate an approximate sampling distribution for word counts in documents and demonstrate the model's capacity to outperform both the simple multinomial and more recently proposed extensions on the classification task. We also compare the classifiers to a linear SVM and show that, provided certain conditions are met, the new model achieves performance that exceeds that of the SVM and is among the very best published results on the Newsgroups classification task. © 2008 Licensed under the Creative Commons.
CITATION STYLE
Ben, A. (2008). An improved hierarchical Bayesian model of language for document classification. In Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 25–32). https://doi.org/10.3115/1599081.1599085