Using Topic Modeling and Similarity Thresholds to Detect Events

Abstract

This paper presents a Retrospective Event Detection algorithm, called Eventy-Topic Detection (ETD), which automatically generates topics that describe events in a large, temporal text corpus. Our approach leverages the structure of the topic modeling framework, specifically Latent Dirichlet Allocation (LDA), to generate topics that are subsequently labeled as Eventy-Topics or non-Eventy-Topics. The system first runs a separate LDA topic model for each day, then calculates the cosine similarity between the topics of the daily models, and finally runs our novel Bump-Detection algorithm. Similar topics labeled as Eventy-Topics are then grouped together. The algorithm is demonstrated on two terabyte-sized corpora: a Reuters News corpus and a Twitter corpus. Our method is evaluated on a human-annotated test set, and the evaluation shows that the algorithm can accurately describe and label events in a temporal text corpus.
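
As a rough illustration of the similarity step only, the sketch below computes the cosine similarity between topic-word distributions from two consecutive days and links the pairs that meet a threshold. The toy topic vectors, the `link_topics` helper, and the 0.8 threshold are assumptions made for illustration; the abstract does not specify them, and the Bump-Detection step is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def link_topics(day_a, day_b, threshold=0.8):
    """Return (i, j, sim) for topic pairs whose similarity meets the threshold.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    links = []
    for i, topic_a in enumerate(day_a):
        for j, topic_b in enumerate(day_b):
            sim = cosine_similarity(topic_a, topic_b)
            if sim >= threshold:
                links.append((i, j, sim))
    return links


# Made-up topic-word distributions over a shared 4-word vocabulary.
day1_topics = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
day2_topics = [[0.6, 0.3, 0.1, 0.0], [0.25, 0.25, 0.25, 0.25]]

links = link_topics(day1_topics, day2_topics)
for i, j, sim in links:
    print(f"day 1 topic {i} ~ day 2 topic {j} (similarity {sim:.3f})")
```

In a full pipeline the vectors would come from the daily LDA models, aligned to a common vocabulary so that each position refers to the same word on both days.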

Keane, N., Yee, C., & Zhou, L. (2015). Using Topic Modeling and Similarity Thresholds to Detect Events. In NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, EVENTS 2015 (pp. 34–42). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w15-0805
