Fine-Tuning BERT on Twitter and Reddit Data in Luganda and English


Abstract

Deep learning techniques, driven by the Transformer architecture and models like BERT, are now widely used. While sentiment analysis is well established for high-resource languages, it remains largely unexplored for low-resource ones. Our focus is on Luganda, a widely spoken Ugandan language with over 21 million speakers. We utilized three social media datasets to train classical machine learning models as baselines and fine-tuned BERT as the deep learning model. Our findings improve sentiment analysis in both Luganda and English, and our data extraction approach aids the construction of domain-specific datasets. This research advances NLP and aligns with global deep-learning initiatives.
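The classical baselines mentioned in the abstract can be illustrated with a minimal sketch: a TF-IDF plus logistic regression sentiment classifier, a common choice for such baselines. The example texts and labels below are purely illustrative and are not drawn from the paper's Twitter or Reddit datasets.

```python
# Hypothetical baseline sketch: TF-IDF features + logistic regression,
# standing in for the paper's classical machine learning baselines.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy illustrative posts (not from the paper's datasets).
texts = [
    "I love this new music video",
    "This service is terrible and slow",
    "What a wonderful day in Kampala",
    "I am so disappointed with the update",
]
labels = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

# Word and bigram TF-IDF features feed a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["I love the wonderful music"])[0])
```

For the deep learning side, the same pipeline shape applies: the TF-IDF vectorizer and linear model are replaced by a pretrained BERT encoder with a classification head fine-tuned on the labeled posts.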


CITATION STYLE

APA

Kimera, R., Rim, D. N., & Choi, H. (2023). Fine-Tuning BERT on Twitter and Reddit Data in Luganda and English. In ACM International Conference Proceeding Series (pp. 63–70). Association for Computing Machinery. https://doi.org/10.1145/3639233.3639344
