Using LDA to detect semantically incoherent documents

Abstract

Detecting the semantic coherence of a document is a challenging task with several applications, such as text segmentation and categorization. This paper attempts to distinguish between a 'semantically coherent' true document and a 'randomly generated' false document through topic detection in the framework of latent Dirichlet allocation (LDA). Based on the premise that a true document contains only a few topics whereas a false document is made up of many topics, it is hypothesized that the entropy of the topic distribution will be lower for a true document than for a false document. This hypothesis is tested on several false document sets generated by various methods and is found to be useful for fake content detection applications. © 2008.
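The entropy hypothesis can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the three-topic model `PHI`, the vocabulary, and the two toy documents are hypothetical, and the helper `infer_theta` estimates topic proportions with a simple EM fold-in (topics held fixed, no Dirichlet prior). It is a stand-in, not the full LDA inference used in the paper. The sketch then computes the topic entropy H(θ) = -Σ_k θ_k log θ_k for a single-topic "true" document and for an evenly mixed "false" one.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary and topic-word matrix PHI[k][w] for 3 topics.
# Each topic puts 0.3 on its own three words and 0.1/6 on each of the other six,
# so every row sums to 1. A real system would take PHI from a trained LDA model.
VOCAB = ["goal", "match", "team",        # topic 0: sports
         "vote", "party", "election",    # topic 1: politics
         "oven", "recipe", "flour"]      # topic 2: cooking
K = 3
PHI = [[0.3 if i // 3 == k else 0.1 / 6 for i in range(len(VOCAB))]
       for k in range(K)]


def infer_theta(doc, phi=PHI, vocab=VOCAB, iters=100):
    """Estimate document-topic weights theta with the topics held fixed.

    Simplification (assumption): plain EM on the mixture weights, i.e. a
    maximum-likelihood fold-in without LDA's Dirichlet prior.
    """
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter(w for w in doc if w in index)  # out-of-vocabulary words are ignored
    n = sum(counts.values())
    if n == 0:
        raise ValueError("document has no in-vocabulary words")
    k = len(phi)
    theta = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(iters):
        expected = [0.0] * k
        for word, c in counts.items():
            j = index[word]
            # E-step: responsibility of each topic for this word type
            weights = [theta[t] * phi[t][j] for t in range(k)]
            total = sum(weights)
            for t in range(k):
                expected[t] += c * weights[t] / total
        # M-step: new weights are the normalized expected topic counts
        theta = [e / n for e in expected]
    return theta


def topic_entropy(theta):
    """Shannon entropy H(theta) = -sum_k theta_k * log(theta_k), in nats."""
    return -sum(p * math.log(p) for p in theta if p > 0)


# Hypothetical toy documents: one drawn from a single topic, one evenly mixed.
true_doc = ["goal", "match", "team", "team", "goal", "match"]
false_doc = ["goal", "team", "vote", "party", "oven", "recipe"]

h_true = topic_entropy(infer_theta(true_doc))
h_false = topic_entropy(infer_theta(false_doc))
print(f"H(true) = {h_true:.4f} nats, H(false) = {h_false:.4f} nats")
```

In this toy setting the single-topic document ends up with entropy close to 0 nats, while the evenly mixed one stays at the uniform distribution and reaches ln 3 ≈ 1.0986 nats, the maximum for three topics. Turning the score into a detector would additionally need a decision threshold on H(θ), which would have to be tuned on held-out data; no such value is given here.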

Citation (APA)

Misra, H., Cappé, O., & Yvon, F. (2008). Using LDA to detect semantically incoherent documents. In CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning (pp. 41–48). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1596324.1596332