Event extraction from Twitter using Non-Parametric Bayesian Mixture Model with Word Embeddings

Deyu Zhou; Xuan Zhang; Yulan He

Conference Proceedings, Open Access

15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference (2017), Vol. 1, pp. 808-817

DOI: 10.18653/v1/e17-1076

Abstract

To extract structured representations of newsworthy events from Twitter, unsupervised models typically assume that tweets involving the same named entities and expressed using similar words are likely to belong to the same event. Hence, they group tweets into clusters based on the co-occurrence patterns of named entities and topical keywords. However, these models have two main limitations. First, they require the number of events to be known beforehand, which is not realistic in practical applications. Second, they do not recognise that the same named entity might be referred to by multiple mentions, so tweets using different mentions would be wrongly assigned to different events. To overcome these limitations, we propose a nonparametric Bayesian mixture model with word embeddings for event extraction, in which the number of events can be inferred automatically and the issue of lexical variations for the same named entity can be dealt with properly. Our model has been evaluated on three datasets with sizes ranging between 2,499 and over 60 million tweets. Experimental results show that our model outperforms the baseline approach on all datasets by 5-8% in F-measure.
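The abstract does not say which nonparametric prior the model uses, so the following is only a minimal sketch of the general idea that the number of clusters need not be fixed in advance. It assumes a Chinese restaurant process, a common prior for nonparametric mixtures. The function name `crp_assignments` and the concentration parameter `alpha` are illustrative and do not come from the paper.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Assumption: this is a generic nonparametric prior, not the authors' model.
    alpha > 0 controls how readily a new cluster is opened.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


labels = crp_assignments(n_items=200, alpha=1.0)
print(len(set(labels)), "clusters for 200 items")
```

The number of distinct labels is an outcome of sampling rather than an input. In a full mixture model, each weight would also be multiplied by the likelihood of the tweet under that cluster.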

Citation (APA)

Zhou, D., Zhang, X., & He, Y. (2017). Event extraction from Twitter using Non-Parametric Bayesian Mixture Model with Word Embeddings. In 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference (Vol. 1, pp. 808–817). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/e17-1076
