Twitter at the Grammys: A Social Media Corpus for Entity Linking and Disambiguation

Mark Dredze; Nicholas Andrews; Jay DeYoung

Conference Proceedings (Open Access)

EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the 4th International Workshop on Natural Language Processing for Social Media, SocialNLP 2016 (2016) 20-25

DOI: 10.18653/v1/w16-6204

Abstract

Work on cross-document coreference resolution (CDCR) has primarily focused on news articles, with little to no work on social media. Yet social media may be particularly challenging, since short messages provide little context and informal names are pervasive. We introduce a new Twitter corpus that contains entity-cluster annotations supporting CDCR. Our corpus draws from Twitter data surrounding the 2013 Grammy music awards ceremony, providing a large set of annotated tweets focusing on a single event. To establish a baseline, we evaluate two CDCR systems and consider the performance impact of each system component. Furthermore, we augment one system to include temporal information, which can be helpful when documents (such as tweets) arrive in a specific order. Finally, we include annotations linking the entities to a knowledge base to support entity linking. Our corpus is available at https://bitbucket.org/mdredze/tgx.

Cite

Dredze, M., Andrews, N., & DeYoung, J. (2016). Twitter at the Grammys: A Social Media Corpus for Entity Linking and Disambiguation. In EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the 4th International Workshop on Natural Language Processing for Social Media, SocialNLP 2016 (pp. 20–25). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-6204
