BANGLABOOK: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews

Abstract

The analysis of consumer sentiment, as expressed through reviews, can provide a wealth of insight regarding the quality of a product. While sentiment analysis has been widely explored in many popular languages, relatively little attention has been given to the Bangla language, mostly due to a lack of relevant data and cross-domain adaptability. To address this limitation, we present BANGLABOOK, a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral. We provide a detailed statistical analysis of the dataset and employ a range of machine learning models to establish baselines, including SVM, LSTM, and Bangla-BERT. Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features, emphasizing the necessity for additional training resources in this domain. Additionally, we conduct an in-depth error analysis by examining sentiment unigrams, which may provide insight into common classification errors in under-resourced languages like Bangla. Our code and data are publicly available at https://github.com/mohsinulkabir14/BanglaBook.

Cite

Kabir, M., Mahfuz, O. B., Raiyan, S. R., Mahmud, H., & Hasan, M. K. (2023). BANGLABOOK: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1237–1247). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.80
