An improved hierarchical Bayesian model of language for document classification

Allison Ben

Conference Proceedings, Open Access

Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference (2008), Vol. 1, pp. 25–32

DOI: 10.3115/1599081.1599085

Abstract

This paper addresses the fundamental problem of document classification, focusing on problems where the classes are mutually exclusive. We advocate an approximate sampling distribution for word counts in documents and demonstrate the model's capacity to outperform both the simple multinomial and more recently proposed extensions on the classification task. We also compare the classifiers to a linear SVM and show that, provided certain conditions are met, the new model exceeds the performance of the SVM and ranks among the very best published results on the Newsgroups classification task. © 2008 Licensed under the Creative Commons.

Citation (APA)

Ben, A. (2008). An improved hierarchical Bayesian model of language for document classification. In Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 25–32). https://doi.org/10.3115/1599081.1599085
