Abstract
For Romanian, some resources exist for automatic text comprehension, but for Emotion Detection there are none beyond lexicon-based ones. To fill this gap, we extracted data from Twitter and created the first dataset of tweets annotated with five emotion classes: joy, fear, sadness, anger and neutral, intended for use in opinion mining and analysis tasks. In this article we present some features of our novel dataset and establish a benchmark toward the first supervised machine learning model for automatic Emotion Detection in Romanian short texts. We investigate the performance of four classical machine learning models: Multinomial Naive Bayes, Logistic Regression, Support Vector Classification and Linear Support Vector Classification. We also investigate more modern approaches such as fastText, which makes use of subword information. Lastly, we fine-tune the Romanian BERT for text classification; our experiments show that the BERT-based model performs best on the task of Emotion Detection from Romanian tweets.
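To make the simplest of the four classical baselines concrete, the following is a minimal from-scratch sketch of a Multinomial Naive Bayes classifier over whitespace tokens with Laplace smoothing. It is an illustration only, not the authors' implementation: the class name `MultinomialNB`, the helper `tokenize`, the smoothing value `alpha=1.0`, and the English toy sentences are all assumptions made for this example, and none of the texts come from the dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


class MultinomialNB:
    """Bag-of-words Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.n_docs = len(labels)
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum over tokens of log P(token | label)
        total = sum(self.token_counts[label].values())
        denom = total + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_docs)
        for tok in tokens:
            if tok in self.vocab:  # tokens unseen in training are skipped
                score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda label: self._log_score(tokens, label))


# Invented placeholder examples, one per class; NOT taken from the dataset.
TRAIN = [
    ("what a wonderful happy day", "joy"),
    ("i am scared of the storm", "fear"),
    ("i feel sad and lonely", "sadness"),
    ("this makes me furious and angry", "anger"),
    ("the meeting starts at noon", "neutral"),
]

texts, labels = zip(*TRAIN)
model = MultinomialNB().fit(texts, labels)
prediction = model.predict("such a happy wonderful morning")
print(prediction)
```

In practice a library implementation with TF-IDF or count features would normally replace this hand-written version; the sketch only shows what the model computes for each of the five classes.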
Cite
Ciobotaru, A., & Dinu, L. P. (2021). RED: A Novel Dataset for Romanian Emotion Detection from Tweets. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 291–300). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_034