TamilEmo: Fine-grained Emotion Detection Dataset for Tamil

Abstract

Emotion analysis of text is both a challenging and an interesting task in Natural Language Processing. However, the lack of datasets in low-resource languages such as Tamil makes it difficult to conduct high-quality research in this area. We therefore introduce a large, manually annotated dataset of more than 42k Tamil YouTube comments, labeled with 31 emotions for emotion recognition. The goal of this dataset is to improve emotion detection in multiple downstream tasks in Tamil. We also created three groupings of the emotions, namely 3-class, 7-class, and 31-class, and evaluated model performance on each grouping. We ran several baseline models: MuRIL achieved the highest macro F1 score, 0.67, on the 3-class grouping. On the 7-class and 31-class groupings, MuRIL and Random Forest performed well, with macro F1 scores of 0.52 and 0.29, respectively.
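All results above are reported as macro F1, which averages the per-class F1 scores without weighting by class frequency, so a rare emotion counts as much as a common one. This matters when moving from 3 to 31 classes. The following is a minimal, illustrative pure-Python sketch of the metric, not the authors' evaluation code. It assumes single-label predictions and, like scikit-learn's default for `f1_score(average="macro")`, takes the class set to be every label seen in either the gold or the predicted list. The function name `macro_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted this class, but it was wrong
            fn[gold] += 1  # missed the gold class

    f1_scores = []
    for c in labels:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        precision = tp[c] / p_den if p_den else 0.0
        recall = tp[c] / r_den if r_den else 0.0
        # Define F1 as 0 when both precision and recall are 0.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)

    # Macro average: every class contributes equally.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels, not taken from TamilEmo.
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
print(round(macro_f1(gold, pred), 4))  # per-class F1: 2/3, 2/3, 1 -> 0.7778
```

Because each class is weighted equally, a model that ignores a rare emotion loses a full 1/K share of the score (for K classes), even if that emotion appears only a handful of times. This is one reason macro F1 tends to fall as the number of classes grows.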

Vasantharajan, C., Priyadharshini, R., Kumarasen, P. K., Ponnusamy, R., Thangasamy, S., Benhur, S., … Chakravarthi, B. R. (2023). TamilEmo: Fine-grained Emotion Detection Dataset for Tamil. In Communications in Computer and Information Science (Vol. 1802 CCIS, pp. 35–50). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-33231-9_3