When does co-training work in real data?


Abstract

Co-training, a semi-supervised learning paradigm, can effectively alleviate the data-scarcity problem (i.e., the lack of labeled examples) in supervised learning. Standard two-view co-training requires that the dataset be described by two views of attributes, and previous theoretical studies proved that if the two views satisfy the sufficiency and independence assumptions, co-training is guaranteed to work well. However, little work has been done on how these assumptions can be verified empirically on given datasets. In this paper, we first propose novel approaches to verify empirically the two assumptions of co-training on datasets. We then propose a simple heuristic to split a single view of attributes into two views, and discover regularities in the sufficiency and independence thresholds for standard two-view co-training to work well. Our empirical results not only coincide well with the previous theoretical findings, but also provide a practical guideline for deciding, based on the dataset, when co-training should work well. © Springer-Verlag Berlin Heidelberg 2009.
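The standard two-view co-training loop the abstract refers to can be sketched as follows. This is a minimal, illustrative Python version: the per-class-centroid learner, the margin-based confidence score, and the toy data are assumptions made here for illustration, not the paper's actual classifiers or verification method.

```python
def train_centroid(X, y):
    # Toy base learner (an assumption, not the paper's): one centroid per class.
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: tuple(sum(col) / len(rows) for col in zip(*rows))
            for c, rows in groups.items()}

def predict_with_conf(centroids, x):
    # Predict the nearest centroid's class; confidence is the margin between
    # the squared distances to the nearest and second-nearest centroids.
    ranked = sorted((sum((a - b) ** 2 for a, b in zip(cent, x)), c)
                    for c, cent in centroids.items())
    label = ranked[0][1]
    conf = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return label, conf

def co_train(view1, view2, y, labeled_idx, unlabeled_idx, rounds=5, k=1):
    # Standard two-view co-training: each classifier sees only its own view,
    # and in each round confidently pseudo-labels k examples for the pool.
    labeled = list(labeled_idx)
    pool = list(unlabeled_idx)
    pseudo = {i: y[i] for i in labeled}  # true labels known only for the seed set
    for _ in range(rounds):
        if not pool:
            break
        for view in (view1, view2):
            model = train_centroid([view[i] for i in labeled],
                                   [pseudo[i] for i in labeled])
            # Rank the remaining pool by this view's confidence, most confident first.
            ranked = sorted(pool,
                            key=lambda i: -predict_with_conf(model, view[i])[1])
            for i in ranked[:k]:
                pseudo[i] = predict_with_conf(model, view[i])[0]
                labeled.append(i)
                pool.remove(i)
    return pseudo

if __name__ == "__main__":
    # Two well-separated classes; both views are the raw coordinates here,
    # so the sufficiency assumption trivially holds for this toy example.
    view = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
            (9.0, 9.0), (10.0, 9.0), (9.0, 10.0), (10.0, 10.0)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    pseudo = co_train(view, view, y, labeled_idx=[0, 4],
                      unlabeled_idx=[1, 2, 3, 5, 6, 7])
    print(pseudo)
```

The paper's point is that this loop is only guaranteed to help when the two views are each sufficient to predict the label and conditionally independent given it; the sketch above simply shows the mechanics that those assumptions are meant to justify.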

Citation (APA)

Ling, C. X., Du, J. D., & Zhou, Z. H. (2009). When does co-training work in real data? In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5476 LNAI, pp. 596–603). https://doi.org/10.1007/978-3-642-01307-2_58
