More features are not always better: Evaluating generalizing models in incident type classification of tweets

Axel Schulz; Christian Guckelsberger; Benedikt Schmidt

Conference Proceedings, Open Access

Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (2015) 421-430

DOI: 10.18653/v1/d15-1048

Abstract

Social media represents a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity for further processing. This learning problem is often concerned with regionally restricted datasets such as data from only one city. Because social media data such as tweets varies considerably across different cities, the training of efficient models requires labeling data from each city of interest, which is costly and time-consuming. In this study, we investigate which features are most suitable for training generalizable models, i.e., models that show good performance across different datasets. We reimplemented the most popular features from the state of the art in addition to other novel approaches, and evaluated them on data from ten different cities. We show that many sophisticated features are not necessarily valuable for training a generalized model and are outperformed by classic features such as plain word n-grams and character n-grams.
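For readers unfamiliar with the classic features named in the abstract, the following is a minimal illustrative sketch of word and character n-gram extraction. It is not the authors' implementation: the function names, the lowercasing, the whitespace tokenization, and the `w:`/`c:` feature prefixes are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text: str, n: int) -> list[str]:
    """Return contiguous word n-grams (assumes lowercasing and whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> list[str]:
    """Return contiguous character n-grams of the lowercased text, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_features(text: str, word_n: int = 2, char_n: int = 3) -> Counter:
    """Count word and character n-grams in one bag-of-features (hypothetical helper)."""
    feats = Counter()
    # Prefixes keep word-level and character-level features from colliding.
    feats.update("w:" + g for g in word_ngrams(text, word_n))
    feats.update("c:" + g for g in char_ngrams(text, char_n))
    return feats
```

A call such as `ngram_features("Fire on Main Street")` yields a sparse count vector that a standard classifier could consume. Because the features depend only on the surface text, nothing in this sketch is tied to a particular city.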

Citation (APA)

Schulz, A., Guckelsberger, C., & Schmidt, B. (2015). More features are not always better: Evaluating generalizing models in incident type classification of tweets. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 421–430). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1048
