Classifier learning from imbalanced corpus by autoencoded over-sampling

Eunkyung Park; Raymond K. Wong; Victor W. Chu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11670 LNAI, pp. 16–29 (2019)

DOI: 10.1007/978-3-030-29908-8_2

Abstract

Class imbalance is a common problem in classifier learning, yet it remains difficult to solve. Textual data are ubiquitous, and their analysis has great potential in many applications. In this paper, we propose a solution for building accurate sentiment classifiers from imbalanced textual data. We first establish topic vectors to capture local and global patterns in a corpus. The synthetic minority over-sampling technique (SMOTE) is then used to balance the data while avoiding overfitting. However, we found that residual overfitting is still prominent. To address this problem, we propose an autoencoded over-sampling framework to reconstruct balanced datasets. Our extensive experiments on different datasets with various imbalance ratios and numbers of classes show that our approach is sound and effective.
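
The over-sampling step mentioned in the abstract can be illustrated with a minimal sketch of the general synthetic minority over-sampling idea: each synthetic sample is placed on the line segment between a minority-class point and one of its nearest minority-class neighbours. This is not the authors' implementation, and it does not cover their topic-vector construction or the autoencoder stage. The function name `smote_sketch`, its parameters, and the toy two-dimensional "topic vectors" are assumptions made purely for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy "topic vectors" (made-up values) for a minority class of 4 documents.
minority_vectors = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7], [0.3, 0.85]]
new_points = smote_sketch(minority_vectors, n_new=6)
balanced_minority = minority_vectors + new_points
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already covered by the minority class. That closeness to the original samples is one plausible reading of why some overfitting can persist after this step, which is the issue the abstract says the autoencoded framework targets.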

Cite

Park, E., Wong, R. K., & Chu, V. W. (2019). Classifier learning from imbalanced corpus by autoencoded over-sampling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11670 LNAI, pp. 16–29). Springer Verlag. https://doi.org/10.1007/978-3-030-29908-8_2
