Oversampling for imbalanced data via optimal transport

Abstract

The issue of data imbalance arises in many real-world applications, especially medical diagnosis, where normal cases usually far outnumber abnormal ones. One of the most important approaches to alleviating this issue is oversampling, which synthesizes minority-class samples to balance the number of samples across classes. However, existing methods rarely consider the global geometric information in the distribution of minority-class samples, and may therefore produce a mismatch between the distributions of real and synthetic samples. In this paper, relying on optimal transport (Villani 2008), we propose an oversampling method that exploits the global geometric information of the data so that synthetic samples follow a distribution similar to that of the minority-class samples. Moreover, we introduce a novel regularization based on synthetic samples and shift the distribution of minority-class samples according to loss information. Experiments on toy and real-world data sets demonstrate the efficacy of the proposed method in terms of multiple metrics.
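The abstract does not spell out the algorithm. The following is a minimal NumPy sketch of the general idea only: synthetic points are pushed toward the minority-class distribution through an optimal transport plan. It is not the authors' method. Its choices are assumptions: Gaussian candidate points, entropy-regularized OT solved with Sinkhorn iterations instead of exact OT, and barycentric projection. It also omits the paper's regularization and its loss-based distribution shift. The names `sinkhorn_plan`, `ot_oversample`, and `reg` are hypothetical.

```python
import numpy as np


def sinkhorn_plan(a, b, cost, reg=0.1, n_iter=500):
    """Entropy-regularized OT plan between weight vectors a and b (Sinkhorn iterations)."""
    K = np.exp(-cost / reg)  # Gibbs kernel, strictly positive
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns to match marginal b exactly
    return u[:, None] * K * v[None, :]


def ot_oversample(X_min, n_new, reg=0.1, seed=0):
    """Hypothetical OT-based oversampler (illustrative; not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    # Assumption: candidates come from an axis-aligned Gaussian fitted to the minority class.
    mu = X_min.mean(axis=0)
    sigma = X_min.std(axis=0) + 1e-8
    Z = rng.normal(mu, sigma, size=(n_new, X_min.shape[1]))
    # Squared Euclidean cost between each candidate and each minority sample.
    cost = ((Z[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)  # scale to [0, 1] so exp(-cost/reg) does not underflow
    a = np.full(n_new, 1.0 / n_new)            # uniform weights on candidates
    b = np.full(len(X_min), 1.0 / len(X_min))  # uniform weights on minority samples
    P = sinkhorn_plan(a, b, cost, reg=reg)
    # Barycentric projection: each candidate moves to the P-weighted mean of the
    # minority samples it is transported to, i.e. a convex combination of them.
    return (P @ X_min) / P.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_min = rng.normal(loc=[2.0, -1.0], scale=0.3, size=(20, 2))  # toy minority class
    X_syn = ot_oversample(X_min, n_new=50)
    print(X_syn.shape)  # (50, 2)
```

Because barycentric projection produces convex combinations of real minority samples, every synthetic point stays inside their convex hull. A larger `reg` gives a smoother plan and pulls synthetic points toward the minority mean. A smaller `reg` keeps them closer to individual samples.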

Citation (APA)

Yan, Y., Tan, M., Xu, Y., Cao, J., Ng, M., Min, H., & Wu, Q. (2019). Oversampling for imbalanced data via optimal transport. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (pp. 5605–5612). AAAI Press. https://doi.org/10.1609/aaai.v33i01.33015605
