Using LDA to detect semantically incoherent documents

Hemant Misra; Olivier Cappé; François Yvon

Conference Proceedings

CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning (2008) 41-48

DOI: 10.3115/1596324.1596332

40 Citations

155 Readers

Abstract

Detecting the semantic coherence of a document is a challenging task with several applications, such as text segmentation and categorization. This paper attempts to distinguish between a 'semantically coherent' true document and a 'randomly generated' false document through topic detection in the framework of latent Dirichlet allocation (LDA). Based on the premise that a true document contains only a few topics whereas a false document is made up of many topics, it is asserted that the entropy of the topic distribution will be lower for a true document than for a false document. This hypothesis is tested on several false document sets generated by various methods and is found to be useful for fake content detection applications.
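The entropy criterion described in the abstract can be illustrated with a minimal sketch. It assumes that per-document topic proportions have already been inferred by some LDA implementation; the function name `topic_entropy` and the two example distributions are hypothetical and are not taken from the paper.

```python
import math


def topic_entropy(theta):
    """Shannon entropy (in nats) of a document's topic distribution.

    `theta` is a sequence of non-negative topic weights; it is
    normalized here so that it sums to 1 before the entropy is computed.
    """
    total = sum(theta)
    if total <= 0:
        raise ValueError("topic weights must sum to a positive value")
    probs = [w / total for w in theta]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical topic proportions over four topics (illustrative values only).
coherent = [0.85, 0.10, 0.03, 0.02]    # mass concentrated on one topic
incoherent = [0.25, 0.25, 0.25, 0.25]  # mass spread evenly over all topics

h_true = topic_entropy(coherent)
h_false = topic_entropy(incoherent)

# Under the abstract's premise, the coherent document has lower entropy.
print(h_true < h_false)
```

A uniform distribution over K topics attains the maximum entropy, log K, so a document whose entropy approaches that bound would be a candidate for the 'false' class. Any decision threshold would have to be chosen empirically; none is implied here.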

Cite

APA

Misra, H., Cappé, O., & Yvon, F. (2008). Using LDA to detect semantically incoherent documents. In CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning (pp. 41–48). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1596324.1596332
