D-Confidence: An active learning strategy to reduce label disclosure complexity in the presence of imbalanced class distributions

This article is free to access.

Abstract

In some classification tasks, such as those related to the automatic building and maintenance of text corpora, it is expensive to obtain labeled instances to train a classifier. In such circumstances it is common to have massive corpora in which only a few instances (typically a minority) are labeled. Semi-supervised learning techniques try to leverage the intrinsic information in unlabeled instances to improve classification models. However, these techniques assume that the labeled instances cover all the classes to be learned, which might not be the case. Moreover, when the class distribution is imbalanced, getting labeled instances from minority classes might be very costly, requiring extensive labeling if queries are selected at random. Active learning allows asking an oracle to label new instances, which are selected by criteria that aim to reduce the labeling effort. D-Confidence is an active learning approach that is effective in the presence of imbalanced training sets. In this paper we evaluate the performance of d-Confidence in comparison to its baseline criteria over tabular and text datasets. We provide empirical evidence that d-Confidence reduces label disclosure complexity, which we have defined as the number of queries required to identify instances from all classes to be learned, in the presence of imbalanced data. © 2012 The Brazilian Computer Society.
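The definition of label disclosure complexity given above can be illustrated with a minimal Python sketch. It only counts how many oracle queries are issued before every class has been seen at least once; it does not implement d-Confidence or any other selection criterion. The function name, its parameters, and the two example query orders are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from typing import Hashable, Iterable, Optional, Set


def label_disclosure_complexity(
    queried_labels: Iterable[Hashable],
    all_classes: Set[Hashable],
    initially_known: Optional[Set[Hashable]] = None,
) -> Optional[int]:
    """Count the queries needed until every class in `all_classes` is seen.

    `queried_labels` is the sequence of labels returned by the oracle, in
    query order. Returns None if the sequence ends before all classes appear.
    """
    seen = set(initially_known or ())
    if seen >= all_classes:
        return 0  # every class is already covered by the initial labels
    for n_queries, label in enumerate(queried_labels, start=1):
        seen.add(label)
        if seen >= all_classes:
            return n_queries
    return None


# Hypothetical oracle answers for two query orders over the same imbalanced pool.
classes = {"majority", "rare_a", "rare_b"}
order_a = ["majority", "majority", "majority", "rare_a", "majority", "rare_b"]
order_b = ["majority", "rare_a", "rare_b", "majority", "majority", "majority"]

print(label_disclosure_complexity(order_a, classes))  # 6
print(label_disclosure_complexity(order_b, classes))  # 3
```

Under this reading, a selection criterion that reaches the minority classes earlier (as in the second, made-up order) has a lower label disclosure complexity on that pool.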

Cite

Escudeiro, N. F., & Jorge, A. M. (2012). D-Confidence: An active learning strategy to reduce label disclosure complexity in the presence of imbalanced class distributions. Journal of the Brazilian Computer Society, 18(4), 311–330. https://doi.org/10.1007/s13173-012-0069-3
