This paper aims to aid the ongoing research efforts for combating the Infodemic related to COVID-19. We provide an automatically annotated, bilingual (Arabic/English) COVID-19 Twitter dataset (COVID-19-FAKES). This dataset has been continuously collected from February 04, 2020, to March 10, 2020. For annotating the collected dataset, we utilized the shared information on the official websites and the official Twitter accounts of the WHO, UNICEF, and UN as a source of reliable information, and the collected COVID-19 pre-checked facts from different fact-checking websites to build a ground-truth database. Then, the Tweets in the COVID-19-FAKES dataset are annotated using 13 different machine learning algorithms and employing 7 different feature extraction techniques. We are making our dataset publicly available to the research community (https://github.com/mohaddad/COVID-FAKES). This work will help researchers in understanding the dynamics behind the COVID-19 outbreak on Twitter. Furthermore, it could help in studies related to sentiment analysis, the analysis of the propagation of misleading information related to this outbreak, the analysis of users’ behavior during the crisis, the detection of botnets, the analysis of the performance of different classification algorithms with various feature extraction techniques that are used in text mining. It is worth noting that, in this paper, we use the terms of misleading information, misinformation, and fake news interchangeably.
CITATION STYLE
Elhadad, M. K., Li, K. F., & Gebali, F. (2021). COVID-19-FAKES: A Twitter (Arabic/English) dataset for detecting misleading information on COVID-19. In Advances in Intelligent Systems and Computing (Vol. 1263 AISC, pp. 256–268). Springer. https://doi.org/10.1007/978-3-030-57796-4_25
Mendeley helps you to discover research relevant for your work.