IdSarcasm: Benchmarking and Evaluating Language Models for Indonesian Sarcasm Detection

Derwin Suhartono; Wilson Wongso; Alif Tri Handoyo

Journal Article, Open Access

IEEE Access (2024), vol. 12, pp. 87323–87332

DOI: 10.1109/ACCESS.2024.3416955

Abstract

Sarcasm detection in the Indonesian language poses a unique set of challenges due to the linguistic nuances and cultural specificities of the Indonesian social media landscape. Understanding the dynamics of sarcasm in this context requires a deep dive into language patterns and the socio-cultural background that shapes the use of sarcasm as a form of criticism and expression. In this study, we developed the first publicly available Indonesian sarcasm detection benchmark datasets from social media texts. We extensively investigated the results of classical machine learning algorithms, pre-trained language models, and recent large language models (LLMs). Our findings show that fine-tuning pre-trained language models is still superior to other techniques, achieving F1 scores of 62.74% and 76.92% on the Reddit and Twitter subsets, respectively. Further, we show that recent LLMs fail to perform zero-shot classification for sarcasm detection and that tackling data imbalance requires a more sophisticated data augmentation approach than our basic methods.

Citation (APA)

Suhartono, D., Wongso, W., & Tri Handoyo, A. (2024). IdSarcasm: Benchmarking and Evaluating Language Models for Indonesian Sarcasm Detection. IEEE Access, 12, 87323–87332. https://doi.org/10.1109/ACCESS.2024.3416955
