This paper presents a Retrospective Event Detection algorithm, called Eventy-Topic Detection (ETD), that automatically generates topics describing events in a large, temporal text corpus. Our approach leverages the structure of the topic modeling framework, specifically Latent Dirichlet Allocation (LDA), to generate topics that are later labeled as Eventy-Topics or non-Eventy-Topics. The system first runs a separate LDA topic model for each day, then calculates the cosine similarity between the topics of the daily topic models, and finally runs our novel Bump-Detection algorithm. Similar topics labeled as Eventy-Topics are then grouped together. The algorithm is demonstrated on two terabyte-sized corpora: a Reuters News corpus and a Twitter corpus. Our method is evaluated on a human-annotated test set, and the results demonstrate its ability to accurately describe and label events in a temporal text corpus.
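The similarity and grouping steps of the pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each topic is assumed to be a word-to-probability mapping taken from one day's LDA model, and only topics from consecutive days are compared. The names `cosine_similarity`, `link_daily_topics`, and `group_linked_topics`, the `threshold` value, and the `keep` parameter are all hypothetical. The Bump-Detection step that labels Eventy-Topics is not reproduced here; `keep` simply stands in for its output.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Assumed representation: one topic = word -> probability (a row of an
# LDA topic-word distribution). A node is (day_index, topic_index).
Topic = Dict[str, float]
Node = Tuple[int, int]
Link = Tuple[Node, Node, float]


def cosine_similarity(a: Topic, b: Topic) -> float:
    """Cosine similarity between two sparse topic-word vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_daily_topics(day_topics: List[List[Topic]], threshold: float) -> List[Link]:
    """Link topics on consecutive days whose similarity is >= threshold.

    Comparing only consecutive days is an assumption of this sketch.
    """
    links: List[Link] = []
    for day in range(len(day_topics) - 1):
        for i, t1 in enumerate(day_topics[day]):
            for j, t2 in enumerate(day_topics[day + 1]):
                sim = cosine_similarity(t1, t2)
                if sim >= threshold:
                    links.append(((day, i), (day + 1, j), sim))
    return links


def group_linked_topics(links: List[Link], keep: Optional[Set[Node]] = None) -> List[List[Node]]:
    """Group linked topics into connected components with union-find.

    If `keep` is given (a hypothetical stand-in for the topics that
    Bump-Detection labels as Eventy-Topics), a link is used only when
    both of its endpoints are in `keep`.
    """
    parent: Dict[Node, Node] = {}

    def find(x: Node) -> Node:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in links:
        if keep is not None and (a not in keep or b not in keep):
            continue
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups: Dict[Node, List[Node]] = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return [sorted(g) for g in groups.values()]


if __name__ == "__main__":
    # Toy data: two days, two topics each (invented for illustration).
    days = [
        [{"quake": 0.6, "tsunami": 0.4}, {"vote": 0.7, "senate": 0.3}],
        [{"quake": 0.5, "tsunami": 0.5}, {"game": 0.8, "score": 0.2}],
    ]
    links = link_daily_topics(days, threshold=0.9)
    print(group_linked_topics(links))  # [[(0, 0), (1, 0)]]
```

In the toy run, only the two earthquake topics exceed the threshold (similarity of about 0.98), so they form a single group spanning both days. Union-find is used because topics linked pairwise across many days should end up in one event group even when the first and last days' topics are not directly similar.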
Keane, N., Yee, C., & Zhou, L. (2015). Using Topic Modeling and Similarity Thresholds to Detect Events. In NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, EVENTS 2015 (pp. 34–42). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w15-0805