Emotion analysis of text is a challenging and interesting task in Natural Language Processing. However, the lack of datasets for low-resource languages such as Tamil makes it difficult to conduct high-quality research in this area. We therefore introduce a large, manually annotated dataset of more than 42k Tamil YouTube comments, labeled with 31 emotions for emotion recognition. The goal of this dataset is to improve emotion detection in multiple downstream tasks in Tamil. We also created three groupings of the emotion labels (3-class, 7-class, and 31-class) and evaluated the models’ performance on each grouping. Among the several baseline models we ran, MuRIL achieved the highest macro F1 score, 0.67, on the 3-class grouping. On the 7-class and 31-class groupings, the MuRIL and Random Forest models performed well, with macro F1 scores of 0.52 and 0.29, respectively.
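Since every reported result is a macro F1 score over a particular label grouping, a minimal sketch of that metric may help. It assumes the standard definition (the unweighted mean of per-class F1 scores) and uses a hypothetical five-label mapping invented purely for illustration. The label names, the mapping, and the toy predictions are assumptions, not the dataset's actual taxonomy or results.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# HYPOTHETICAL mapping from fine-grained to coarse labels (illustration only,
# not the grouping used in the dataset).
FINE_TO_COARSE = {
    "joy": "positive",
    "gratitude": "positive",
    "anger": "negative",
    "sadness": "negative",
    "neutral": "neutral",
}


def regroup(labels, mapping):
    """Map each fine-grained label to its coarse group."""
    return [mapping[label] for label in labels]


# Toy gold labels and predictions (made up for this example).
y_true = ["joy", "anger", "sadness", "neutral"]
y_pred = ["gratitude", "anger", "anger", "neutral"]

fine_score = macro_f1(y_true, y_pred)
coarse_score = macro_f1(regroup(y_true, FINE_TO_COARSE),
                        regroup(y_pred, FINE_TO_COARSE))
print(f"fine-grained macro F1: {fine_score:.2f}")
print(f"coarse macro F1:       {coarse_score:.2f}")
```

In this toy case, confusions between labels that fall into the same coarse group stop counting as errors after regrouping. That is one plausible reason macro F1 can be higher on coarser groupings, though it does not by itself explain the scores reported here.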
Vasantharajan, C., Priyadharshini, R., Kumarasen, P. K., Ponnusamy, R., Thangasamy, S., Benhur, S., … Chakravarthi, B. R. (2023). TamilEmo: Fine-grained Emotion Detection Dataset for Tamil. In Communications in Computer and Information Science (Vol. 1802 CCIS, pp. 35–50). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-33231-9_3