Finding the needle in a haystack: Extraction of Informative COVID-19 Danish Tweets

Abstract

Finding informative COVID-19 posts in a stream of tweets is useful for monitoring health-related updates. Prior work focused on a balanced data setup and on English, but informative tweets are rare, and English is only one of the many languages spoken in the world. In this work, we introduce a new dataset of 5,000 tweets for finding informative COVID-19 tweets in Danish. In contrast to prior work, which balances the label distribution, we model the problem with its natural label distribution. We examine how well a simple probabilistic model and a convolutional neural network (CNN) perform on this task. We find that a weighted CNN works well but is sensitive to embedding and hyperparameter choices. We hope the contributed dataset serves as a starting point for further work in this direction.
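
The abstract does not say how the CNN is weighted. As an illustration only, the sketch below shows one common way to weight a loss under a skewed label distribution: inverse-frequency class weights (the same heuristic scikit-learn calls "balanced") applied to a binary cross-entropy. The function names, the toy labels, and the choice of weighting scheme are assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: this 'balanced' heuristic is illustrative, not the paper's scheme.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood for binary labels.

    probs[i] is the predicted probability that example i has label 1.
    """
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        w = weights[y]
        total += -w * math.log(p_true)
        weight_sum += w
    return total / weight_sum


# Hypothetical toy data: 1 = informative (rare), 0 = uninformative (common).
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)

# An uncertain classifier that outputs 0.5 for every tweet.
loss = weighted_cross_entropy([0.5] * len(labels), labels, weights)
```

With these weights, a mistake on the single rare example contributes as much to the loss as mistakes on all nine common examples combined, which is the usual motivation for weighting when the natural distribution is kept.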

Cite

Olsen, B., & Plank, B. (2021). Finding the needle in a haystack: Extraction of Informative COVID-19 Danish Tweets. In W-NUT 2021 - 7th Workshop on Noisy User-Generated Text, Proceedings of the Conference (pp. 11–19). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.wnut-1.2
