More features are not always better: Evaluating generalizing models in incident type classification of tweets


Abstract

Social media represents a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity for further processing. This learning problem is often concerned with regionally restricted datasets such as data from only one city. Because social media data such as tweets varies considerably across different cities, the training of efficient models requires labeling data from each city of interest, which is costly and time consuming. In this study, we investigate which features are most suitable for training generalizable models, i.e., models that show good performance across different datasets. We reimplemented the most popular features from the state of the art in addition to other novel approaches, and evaluated them on data from ten different cities. We show that many sophisticated features are not necessarily valuable for training a generalized model and are outperformed by classic features such as plain word-n-grams and character-n-grams.
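To make the "classic features" named in the abstract concrete, the following is a minimal sketch of word n-gram and character n-gram extraction in plain Python. It is not the authors' implementation: the whitespace tokenization, the lowercasing, the chosen n values, and the function names `word_ngrams`, `char_ngrams`, and `featurize` are all assumptions made for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return contiguous word n-grams (assumes lowercasing + whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous character n-grams over the lowercased text, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def featurize(text, word_n=(1, 2), char_n=(3,)):
    """Build a sparse bag-of-n-grams count vector (hypothetical feature layout).

    Keys are prefixed with 'w<n>:' or 'c<n>:' so word and character
    n-grams of different orders never collide.
    """
    feats = Counter()
    for n in word_n:
        feats.update(f"w{n}:{g}" for g in word_ngrams(text, n))
    for n in char_n:
        feats.update(f"c{n}:{g}" for g in char_ngrams(text, n))
    return feats


if __name__ == "__main__":
    # Invented example text, not taken from the paper's datasets.
    print(word_ngrams("Fire on Main Street", 2))
    print(char_ngrams("fire", 3))
```

Count vectors like the one returned by `featurize` could be fed to any standard classifier; to probe generalization in the spirit of the study, one would train on tweets from some cities and test on a city held out from training.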

Cite

Schulz, A., Guckelsberger, C., & Schmidt, B. (2015). More features are not always better: Evaluating generalizing models in incident type classification of tweets. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 421–430). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1048
