Classifier learning from imbalanced corpus by autoencoded over-sampling


Abstract

Class imbalance is a common problem in classifier learning, but it is difficult to solve. Textual data are ubiquitous, and their analytics have great potential in many applications. In this paper, we propose a solution for building accurate sentiment classifiers from imbalanced textual data. We first establish topic vectors to capture local and global patterns from a corpus. The synthetic minority over-sampling technique (SMOTE) is then used to balance the data while avoiding overfitting. However, we found that residual overfitting is still prominent. To address this problem, we propose an autoencoded over-sampling framework to reconstruct balanced datasets. Our extensive experiments on different datasets, with various imbalance ratios and numbers of classes, show that our approach is sound and effective.
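The balancing step mentioned in the abstract, the synthetic minority over-sampling technique, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy two-dimensional "topic vectors" are all assumptions made for illustration. The sketch covers only the interpolation idea, in which each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. It does not attempt the topic-vector construction or the autoencoded framework.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Assumed toy data: 6 majority and 3 minority vectors in two dimensions.
majority = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
            [0.7, 0.3], [0.95, 0.05], [0.75, 0.25]]
minority = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]

# Generate enough synthetic minority points to match the majority count.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already covered by the minority class. That property is one plausible reason some overfitting can remain after over-sampling, which is the issue the abstract says the autoencoded framework is meant to address.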

Cite

Park, E., Wong, R. K., & Chu, V. W. (2019). Classifier learning from imbalanced corpus by autoencoded over-sampling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11670 LNAI, pp. 16–29). Springer Verlag. https://doi.org/10.1007/978-3-030-29908-8_2
