Learning from positive and unlabeled examples with different data distributions

Xiao Li Li; Bing Liu

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3720 LNAI 218-229

DOI: 10.1007/11564096_24

Abstract

We study the problem of learning from positive and unlabeled examples. Although several techniques exist for dealing with this problem, they all assume that the positive examples in the positive set P and the positive examples in the unlabeled set U are generated from the same distribution. This assumption may be violated in practice. For example, suppose one wants to collect all printer pages from the Web. One can use the printer pages from one site as the set P of positive pages and use product pages from another site as U. The goal is then to classify the pages in U into printer pages and non-printer pages. Although printer pages from the two sites have many similarities, they can also be quite different because different sites often present similar products in different styles and have different focuses. In such cases, existing methods perform poorly. This paper proposes a novel technique, A-EM, to deal with the problem. Experimental results on product page classification demonstrate the effectiveness of the proposed technique. © Springer-Verlag Berlin Heidelberg 2005.

Citation (APA)

Li, X. L., & Liu, B. (2005). Learning from positive and unlabeled examples with different data distributions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3720 LNAI, pp. 218–229). Springer Verlag. https://doi.org/10.1007/11564096_24
