Solving Data Imbalance in Text Classification with Constructing Contrastive Samples


This article is free to access.

Abstract

Contrastive learning (CL) has been successfully applied in Natural Language Processing (NLP) as a powerful representation learning method and has shown promising results in various downstream tasks. Recent research has highlighted the importance of constructing effective contrastive samples through data augmentation. However, current data augmentation methods primarily rely on random word deletion, substitution, and cropping, which may introduce noisy samples and hinder representation learning. In this article, we propose a novel approach to address data imbalance in text classification by constructing contrastive samples. Our method involves the use of a Label-indicative Component to generate high-quality positive samples for the minority class, along with the introduction of a Hard Negative Mixing strategy to synthesize challenging negative samples at the feature level. By applying supervised contrastive learning to these samples, we are able to obtain superior text representations, which significantly benefit text classification tasks with imbalanced data. Our approach effectively mitigates distributional biases and promotes noise-resistant representation learning. To validate the effectiveness of our method, we conducted experiments on benchmark datasets (THUCNews, AG's News, 20NG) as well as the imbalanced FDCNews dataset. The code for our method is publicly available at the following GitHub repository: https://github.com/hanggun/CLDMTC.
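The two feature-level ideas named in the abstract, supervised contrastive learning and hard negative mixing, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes a standard supervised contrastive loss over L2-normalized features and a simple convex-combination mixing of the negatives most similar to an anchor. The function names, the temperature value, and the parameters `n_hard` and `n_synth` are hypothetical choices made for illustration only.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length (eps guards against division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def mix_hard_negatives(anchor, negatives, n_hard=2, n_synth=2, rng=None):
    """Synthesize extra negatives by mixing the negatives closest to the anchor.

    Assumption: "hard" means highest cosine similarity to the anchor, and
    mixing is a random convex combination followed by re-normalization.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    anchor = l2_normalize(np.asarray(anchor, dtype=float))
    negatives = l2_normalize(np.asarray(negatives, dtype=float))
    sims = negatives @ anchor                      # cosine similarity per negative
    hard = negatives[np.argsort(-sims)[:n_hard]]   # keep the n_hard most similar
    i = rng.integers(0, len(hard), size=n_synth)   # random pairs of hard negatives
    j = rng.integers(0, len(hard), size=n_synth)
    alpha = rng.uniform(0.0, 1.0, size=(n_synth, 1))
    mixed = alpha * hard[i] + (1.0 - alpha) * hard[j]
    return l2_normalize(mixed)


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: samples sharing a label act as positives."""
    z = l2_normalize(np.asarray(features, dtype=float))
    labels = np.asarray(labels)
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    # Log-softmax over every other sample in the batch (self is masked out).
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Average log-probability of the positives, for anchors that have any.
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos_count[has_pos]).mean())


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(supervised_contrastive_loss(feats, [0, 0, 1, 1]))
    print(mix_hard_negatives([1.0, 0.0], [[0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]))
```

In this sketch the loss is lower when same-label features sit close together, and the synthesized negatives lie between existing hard negatives on the unit sphere. How the paper selects, weights, and mixes negatives, and how the Label-indicative Component builds positives, is not specified in the abstract and is not modeled here.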

Cite

Chen, X., Zhang, W., Pan, S., & Chen, J. (2023). Solving Data Imbalance in Text Classification with Constructing Contrastive Samples. IEEE Access, 11, 90554–90562. https://doi.org/10.1109/ACCESS.2023.3306805
