Abstract
The analysis of consumer sentiment, as expressed through reviews, can provide a wealth of insight regarding the quality of a product. While sentiment analysis has been widely explored in many popular languages, comparatively little attention has been given to Bangla, mostly due to a lack of relevant data and cross-domain adaptability. To address this limitation, we present BANGLABOOK, a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral. We provide a detailed statistical analysis of the dataset and employ a range of machine learning models, including SVM, LSTM, and Bangla-BERT, to establish baselines. Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features, emphasizing the necessity for additional training resources in this domain. Additionally, we conduct an in-depth error analysis by examining sentiment unigrams, which may provide insight into common classification errors in under-resourced languages like Bangla. Our code and data are publicly available at https://github.com/mohsinulkabir14/BanglaBook.
Cite
Kabir, M., Mahfuz, O. B., Raiyan, S. R., Mahmud, H., & Hasan, M. K. (2023). BANGLABOOK: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1237–1247). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.80