A Novel Dataset for Fake News Detection in Tamil Regional Language

Abstract

Tamil is one of the very few ancient languages that have survived the passage of time. Yet although a large body of literature is available in the language, very little of that data is labeled. With the Internet boom and the digitization of all media, it is important to build classifiers for data analysis and prediction, but little to no labeled data is available in each domain. Because the Internet is used in all walks of life, news spreads quickly through this medium. Misleading and distorted information not only affects individuals but also has an impact on the public. This research work describes the creation of one such corpus, meant for fake news detection. News snippets were scraped from the news media and annotated as fake or real news. The news items in both classes were further labelled manually into 5 categories, namely Sports, Politics, Science, Entertainment and Miscellaneous. The corpus contains 2949 fake news samples, and 2324 samples of genuine news were added to provide a balanced dataset. One of the main observations was that a major chunk of the fake news data was political. To benchmark this dataset, we built 5 baseline models on our corpus, and each model showed improvement in different areas. Four machine learning models and one deep learning model were trained on this new corpus.
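
The annotation scheme and class counts stated in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record type `NewsSnippet`, its field names, and the helper functions are hypothetical and are not taken from the paper or its released data. Only the two veracity labels, the five topic names, and the counts 2949 and 2324 come from the abstract. The majority-class accuracy computed at the end is a generic reference point for reading benchmark results, not a figure reported by the authors.

```python
from dataclasses import dataclass

# Label sets as named in the abstract.
VERACITY_LABELS = ("fake", "real")
TOPIC_LABELS = ("Sports", "Politics", "Science", "Entertainment", "Miscellaneous")

# Sample counts as stated in the abstract.
CLASS_COUNTS = {"fake": 2949, "real": 2324}


@dataclass(frozen=True)
class NewsSnippet:
    """Hypothetical record layout: one snippet with its two manual labels."""
    text: str
    veracity: str  # one of VERACITY_LABELS
    topic: str     # one of TOPIC_LABELS

    def __post_init__(self):
        # Reject labels outside the annotation scheme.
        if self.veracity not in VERACITY_LABELS:
            raise ValueError(f"unknown veracity label: {self.veracity!r}")
        if self.topic not in TOPIC_LABELS:
            raise ValueError(f"unknown topic label: {self.topic!r}")


def class_shares(counts):
    """Return each class's fraction of the whole corpus."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline_accuracy(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


if __name__ == "__main__":
    print("total samples:", sum(CLASS_COUNTS.values()))
    for label, share in class_shares(CLASS_COUNTS).items():
        print(f"{label}: {share:.1%}")
    print(f"majority baseline: {majority_baseline_accuracy(CLASS_COUNTS):.1%}")
```

A classifier evaluated on data with this class distribution would need to exceed the majority-class accuracy to show that it has learned anything beyond the class prior.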

Cite

CITATION STYLE

APA

Mirnalinee, T. T., Jayaraman, B., Anirudh, A., Jagadish, R., & Karthik Raja, A. (2023). A Novel Dataset for Fake News Detection in Tamil Regional Language. In Communications in Computer and Information Science (Vol. 1802 CCIS, pp. 311–323). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-33231-9_22
