Twitter at the Grammys: A Social Media Corpus for Entity Linking and Disambiguation

Abstract

Work on cross-document coreference resolution (CDCR) has primarily focused on news articles, with little to no work on social media. Yet social media may be particularly challenging, since short messages provide little context and informal names are pervasive. We introduce a new Twitter corpus with entity annotations for entity clusters that supports CDCR. Our corpus draws from Twitter data surrounding the 2013 Grammy music awards ceremony, providing a large set of annotated tweets focusing on a single event. To establish a baseline, we evaluate two CDCR systems and consider the performance impact of each system component. Furthermore, we augment one system to include temporal information, which can be helpful when documents (such as tweets) arrive in a specific order. Finally, we include annotations linking the entities to a knowledge base to support entity linking. Our corpus is available at https://bitbucket.org/mdredze/tgx.

Citation (APA)

Dredze, M., Andrews, N., & DeYoung, J. (2016). Twitter at the Grammys: A Social Media Corpus for Entity Linking and Disambiguation. In EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the 4th International Workshop on Natural Language Processing for Social Media, SocialNLP 2016 (pp. 20–25). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-6204