Abstract
Detecting the semantic coherence of a document is a challenging task with several applications, such as text segmentation and categorization. This paper attempts to distinguish a 'semantically coherent' true document from a 'randomly generated' false document through topic detection in the framework of latent Dirichlet allocation (LDA). Based on the premise that a true document contains only a few topics whereas a false document is made up of many topics, it is asserted that the entropy of the topic distribution will be lower for a true document than for a false document. This hypothesis is tested on several false document sets generated by various methods and is found to be useful for fake content detection applications.
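The entropy criterion described in the abstract can be illustrated with a minimal sketch. It assumes a per-document topic distribution has already been inferred by some LDA implementation; the function names, the example distributions, and the idea of a fixed threshold are illustrative assumptions and are not taken from the paper.

```python
import math


def topic_entropy(theta):
    """Shannon entropy (in nats) of a topic distribution.

    `theta` is a sequence of non-negative topic weights for one document;
    it is normalized here so that it sums to 1.
    """
    total = sum(theta)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    probs = [w / total for w in theta]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


def looks_incoherent(theta, threshold):
    """Flag a document whose topic entropy exceeds a chosen threshold.

    The threshold is a hypothetical tuning parameter; in practice it
    would have to be selected on held-out data.
    """
    return topic_entropy(theta) > threshold


# Made-up distributions over 5 topics, for illustration only.
coherent = [0.85, 0.10, 0.05, 0.0, 0.0]    # mass concentrated on few topics
incoherent = [0.2, 0.2, 0.2, 0.2, 0.2]     # mass spread over many topics

print(topic_entropy(coherent), topic_entropy(incoherent))
```

Under the premise stated in the abstract, a document dominated by a few topics yields a lower entropy than one whose mass is spread evenly, where the entropy reaches its maximum of log K for K topics.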
Cite
Misra, H., Cappé, O., & Yvon, F. (2008). Using LDA to detect semantically incoherent documents. In CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning (pp. 41–48). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1596324.1596332