Improving Turkish Text Sentiment Classification Through Task-Specific and Universal Transformations: An Ensemble Data Augmentation Approach

Abstract

The exponential growth of digital data in recent years has spurred a significant interest in natural language processing (NLP) and sentiment analysis. However, the effectiveness of NLP models heavily relies on the availability of large, annotated datasets, which are often scarce or entirely absent for numerous languages, including Turkish. This scarcity of annotated data for Turkish presents a formidable obstacle in developing NLP models for the language. To overcome this challenge, various techniques have been proposed to augment the size of annotated datasets, with text data augmentation emerging as a promising solution. Text data augmentation involves the generation of synthetic data by transforming existing data, thus expanding the diversity and volume of the annotated dataset. While this technique has shown remarkable success in bolstering the performance of NLP models, its exploration in the context of Turkish and other low-resource languages has been limited. This paper introduces a novel ensemble approach to text data augmentation tailored for Turkish text sentiment classification. Our approach integrates both task-specific and universal transformations, capitalizing on the strengths of each to enrich the training dataset. We evaluate our proposed approach on the TRSAv1 dataset and compare it with established data augmentation techniques. The experimental results demonstrate that our ensemble method achieves superior accuracy in sentiment classification compared to conventional techniques. Additionally, we conduct an in-depth analysis to assess the impact of individual transformation functions on classification performance. Our contribution lies in bridging the gap in research on data augmentation techniques tailored to Turkish NLP, emphasizing the need for more advanced ensemble methods, and offering benchmarking results that pave the way for the development of precise NLP models not only for Turkish but also for other low-resource languages.
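
The abstract does not list the individual transformation functions, so the following is only a minimal sketch of the general idea: several label-preserving transformations are applied to each training example and the results are pooled with the original data. The choice of random swap and random deletion as "universal" transformations, the sentiment-synonym replacement as a "task-specific" one, the tiny `SENTIMENT_SYNONYMS` table, and all function names are assumptions made for illustration, not the authors' method.

```python
import random

# Hypothetical same-polarity synonym table (illustrative only, not from the paper).
SENTIMENT_SYNONYMS = {
    "güzel": ["harika", "hoş"],
    "kötü": ["berbat", "fena"],
}


def random_swap(tokens, rng):
    """Assumed universal transformation: swap two randomly chosen positions."""
    if len(tokens) < 2:
        return list(tokens)
    i, j = rng.sample(range(len(tokens)), 2)
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, rng, p=0.2):
    """Assumed universal transformation: drop each token with probability p."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Keep at least one token so the example never becomes empty.
    return kept if kept else [rng.choice(tokens)]


def sentiment_synonym_replace(tokens, rng):
    """Assumed task-specific transformation: swap sentiment words for
    same-polarity synonyms so the label stays valid."""
    return [
        rng.choice(SENTIMENT_SYNONYMS[t]) if t in SENTIMENT_SYNONYMS else t
        for t in tokens
    ]


def augment(dataset, transforms, seed=0):
    """Return the original (text, label) pairs followed by one synthetic
    variant per transformation, skipping variants identical to the source."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for text, label in dataset:
        tokens = text.split()
        for transform in transforms:
            new_text = " ".join(transform(tokens, rng))
            if new_text != text:
                augmented.append((new_text, label))
    return augmented


if __name__ == "__main__":
    data = [("film çok güzel", "pos"), ("yemek kötü", "neg")]
    pool = [random_swap, random_deletion, sentiment_synonym_replace]
    for text, label in augment(data, pool):
        print(label, text)
```

In this sketch every synthetic example inherits the label of its source sentence, which is why each transformation is meant to preserve polarity. How the paper actually combines or weights its transformations is not stated in the abstract.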

Cite

Onan, A., & Balbal, K. F. (2024). Improving Turkish Text Sentiment Classification Through Task-Specific and Universal Transformations: An Ensemble Data Augmentation Approach. IEEE Access, 12, 4413–4458. https://doi.org/10.1109/ACCESS.2024.3349971
