Many real-world classification applications fall into the class of positive and unlabeled (PU) learning problems. In many such applications, not only are negative training examples missing, but the number of positive examples available for learning may also be fairly limited, because hand-labeling a large number of training examples is impractical. Current PU learning techniques have focused mostly on identifying reliable negative instances from the unlabeled set U. In this paper, we address the oft-overlooked PU learning problem in which the number of training examples in the positive set P is small. We propose a novel technique, LPLP (Learning from Probabilistically Labeled Positive examples), and apply it to classify product pages from commercial websites. The experimental results demonstrate that our approach significantly outperforms existing methods, even in the challenging cases where the positive examples in P and the hidden positive examples in U are not drawn from the same distribution. © Springer-Verlag Berlin Heidelberg 2007.
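The abstract's core setup, scoring unlabeled documents against a small positive set so that reliable negatives can be identified, can be illustrated with a minimal toy sketch. This is not the paper's LPLP algorithm; it is a hypothetical bag-of-words similarity heuristic (all function names, documents, and thresholds are illustrative assumptions) showing how documents in U that are least similar to the centroid of P might be treated as candidate reliable negatives.

```python
# Toy sketch of the first step of PU learning: score each unlabeled
# document by cosine similarity to the centroid of the positive set P.
# Low-scoring documents are candidates for "reliable negatives".
# All names and data here are illustrative, not from the paper.
import math
from collections import Counter

def vectorize(doc):
    """Bag-of-words count vector for a whitespace-tokenized document."""
    return Counter(doc.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def pu_scores(positive_docs, unlabeled_docs):
    """Similarity of each unlabeled doc to the positive-set centroid."""
    centroid = Counter()
    for d in positive_docs:
        centroid.update(vectorize(d))
    return [cosine(vectorize(d), centroid) for d in unlabeled_docs]

# Hypothetical product-page snippets (P) and unlabeled pages (U).
P = ["cheap laptop deal", "laptop sale discount"]
U = ["laptop discount today", "cat photos and videos"]
scores = pu_scores(P, U)
# The product-like page scores higher; the off-topic page scores 0,
# making it a candidate reliable negative.
```

In a full two-step PU learner, a classifier would then be trained on P against these reliable negatives and iteratively refined; the sketch only covers the initial scoring.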
Li, X., Liu, B., & Ng, S. K. (2007). Learning to classify documents with only a small positive training set. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4701 LNAI, pp. 201–213). Springer Verlag. https://doi.org/10.1007/978-3-540-74958-5_21