Modeling documents by combining semantic concepts with unsupervised statistical learning

Chaitanya Chemudugunta; America Holloway; Padhraic Smyth; Mark Steyvers

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 5318 LNCS 229-244

DOI: 10.1007/978-3-540-88564-1_15

Abstract

Human-defined concepts are fundamental building blocks in constructing knowledge bases such as ontologies. Statistical learning techniques provide an alternative automated approach to concept definition, driven by data rather than prior knowledge. In this paper we propose a probabilistic modeling framework that combines both human-defined concepts and data-driven topics in a principled manner. The methodology we propose is based on applications of statistical topic models (also known as latent Dirichlet allocation models). We demonstrate the utility of this general framework in two ways. We first illustrate how the methodology can be used to automatically tag Web pages with concepts from a known set of concepts without any need for labeled documents. We then perform a series of experiments that quantify how combining human-defined semantic knowledge with data-driven techniques leads to better language models than can be obtained with either alone. © 2008 Springer Berlin Heidelberg.

Citation (APA)

Chemudugunta, C., Holloway, A., Smyth, P., & Steyvers, M. (2008). Modeling documents by combining semantic concepts with unsupervised statistical learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5318 LNCS, pp. 229–244). Springer Verlag. https://doi.org/10.1007/978-3-540-88564-1_15
