A Cost-sensitive Active Learning for Imbalance Data with Uncertainty and Diversity Combination


Abstract

Real-world classification datasets usually have imbalanced class distributions because many applications, such as network intrusion detection, tumor classification, and financial risk identification, are inherently imbalanced: positive examples are rare. When such data are labeled to create training sets for supervised learning, too many majority-class examples are labeled, which dramatically increases the labeling cost and is usually unnecessary, because balanced datasets are more suitable for inducing good learners. To deal with this problem, this paper proposes a novel cost-sensitive active learning algorithm that combines uncertainty and diversity measures to select training examples from an unlabeled sample pool. We use the proportions of majority and minority examples in the training dataset as the weights of the majority class and the minority class, respectively. With these class weights, minority examples receive more emphasis when learning models are built. Experimental results show that the proposed method can significantly reduce the labeling cost while improving the performance of the learning models.
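The abstract does not give the scoring formula or the exact mapping from class proportions to weights, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes uncertainty is measured by normalized prediction entropy, diversity by the distance to the nearest already-selected example, and that the two are mixed linearly by a hypothetical parameter `alpha`. The helper `minority_emphasis_weights` assumes each class is weighted by one minus its own proportion, which is one way the minority class could receive the larger weight. All function names are illustrative.

```python
import math


def class_proportions(labels):
    """Fraction of the labeled training set that belongs to each class."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return {c: k / len(labels) for c, k in counts.items()}


def minority_emphasis_weights(labels):
    """ASSUMPTION: weight each class by (1 - its proportion), so rarer classes weigh more."""
    return {c: 1.0 - p for c, p in class_proportions(labels).items()}


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_batch(pool, probs, batch_size, alpha=0.5):
    """Greedily pick pool indices by alpha * uncertainty + (1 - alpha) * diversity.

    pool  : list of feature vectors for the unlabeled examples
    probs : list of predicted class distributions, one per pool example
    alpha : hypothetical mixing weight (ASSUMPTION, not taken from the paper)
    """
    max_h = math.log(len(probs[0]))  # entropy of a uniform prediction
    uncertainty = [entropy(p) / max_h for p in probs]
    # Normalize distances by the largest pairwise distance in the pool.
    max_d = max((euclidean(a, b) for a in pool for b in pool), default=0.0) or 1.0

    chosen = []
    while len(chosen) < min(batch_size, len(pool)):
        best, best_score = None, -1.0
        for i, x in enumerate(pool):
            if i in chosen:
                continue
            if chosen:
                # Diversity: distance to the closest example already in the batch.
                diversity = min(euclidean(x, pool[j]) for j in chosen) / max_d
            else:
                diversity = 1.0
            score = alpha * uncertainty[i] + (1 - alpha) * diversity
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen
```

In this sketch, `alpha=1.0` reduces to plain uncertainty sampling, while smaller values push the batch toward examples that lie far from those already chosen. The resulting weights could then be passed to any learner that accepts per-class weights.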

Citation (APA)

Dong, H., Zhu, B., & Zhang, J. (2020). A Cost-sensitive Active Learning for Imbalance Data with Uncertainty and Diversity Combination. In ACM International Conference Proceeding Series (pp. 218–224). Association for Computing Machinery. https://doi.org/10.1145/3383972.3384002
