A Novel Dataset for Fake News Detection in Tamil Regional Language

T. T. Mirnalinee; Bhuvana Jayaraman; A. Anirudh; R. Jagadish; A. Karthik Raja

Conference Proceedings

Communications in Computer and Information Science (2023) 1802 CCIS 311-323

DOI: 10.1007/978-3-031-33231-9_22

4 Citations

4 Readers

Abstract

Tamil is one of the very few ancient languages that have survived the passage of time. Yet although a large body of Tamil literature is available, very little of it is labeled. With the Internet boom and the digitization of all media, it is important to build classifiers for data analysis and prediction, but little to no labeled data is available in any domain. Because the Internet is used in all walks of life, news spreads quickly through this medium, and misleading or distorted information affects not only individuals but also the public at large. This work describes the creation of a corpus for fake news detection. News snippets were scraped from news media and annotated as fake or real. The news items in both classes were further labeled manually into five categories: Sports, Politics, Science, Entertainment, and Miscellaneous. The corpus contains 2949 fake news samples; 2324 samples of genuine news were added to provide a balanced dataset. One of the main observations was that a major share of the fake news was political. To benchmark the dataset, we built five baseline models on the corpus, four machine learning models and one deep learning model; each model showed improvement in different areas.
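
The class balance described in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the two counts come from the abstract, while the function name `class_shares` and the label strings are assumptions introduced here. With the stated counts, fake news makes up roughly 56% of the corpus and genuine news roughly 44%.

```python
# Class counts as stated in the abstract.
FAKE_COUNT = 2949
REAL_COUNT = 2324


def class_shares(counts):
    """Return each class's share of the total as a fraction in [0, 1].

    `class_shares` is a hypothetical helper for illustration; it is not
    part of the paper or of any released dataset tooling.
    """
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Label names "fake" and "real" are assumed for this example.
counts = {"fake": FAKE_COUNT, "real": REAL_COUNT}
total = sum(counts.values())
shares = class_shares(counts)

for label, share in shares.items():
    print(f"{label}: {counts[label]} of {total} ({share:.1%})")
```

A split of this kind is close enough to even that accuracy remains a meaningful headline metric, though per-class precision and recall would still be worth reporting alongside it.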

Citation (APA)

Mirnalinee, T. T., Jayaraman, B., Anirudh, A., Jagadish, R., & Karthik Raja, A. (2023). A Novel Dataset for Fake News Detection in Tamil Regional Language. In Communications in Computer and Information Science (Vol. 1802 CCIS, pp. 311–323). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-33231-9_22
