In this paper we consider the potential role of unlabeled data in supervised learning. We present an algorithm, together with experimental results demonstrating that unlabeled data can significantly improve learning accuracy on certain practical problems. We then identify the abstract problem structure that enables the algorithm to make successful use of this unlabeled data, and prove that unlabeled data will boost learning accuracy for problems in this class. The problem class we identify includes problems where the features describing the examples are redundantly sufficient for classifying each example, a notion we make precise in the paper. This class includes many natural learning problems faced by humans, such as learning a semantic lexicon over noun phrases in natural language and learning to recognize objects from multiple sensor inputs. We argue that models of human and animal learning should give greater weight to the potential role of unlabeled data, and that many natural learning problems fit the class we identify.
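The abstract does not spell out its algorithm, so the following is only an illustrative sketch. It assumes a co-training-style loop, a technique chosen here for illustration and not necessarily the paper's method. Each example is assumed to have two views, each of which alone is sufficient to predict a 0/1 label. This is one simple reading of "redundantly sufficient" features. A classifier trained on one view pseudo-labels the unlabeled example it is most confident about, and that label then becomes training data for the classifier on the other view. The names NearestCentroid, co_train and confidence, the distance-gap confidence score, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each example has two views, each a single
# number, and either view alone is assumed sufficient to predict the label.
Example = Tuple[float, float]
Labeled = Tuple[Example, int]


class NearestCentroid:
    """Binary (0/1) nearest-centroid classifier over one scalar view."""

    def fit(self, xs: List[float], ys: List[int]) -> "NearestCentroid":
        # Assumes both labels occur at least once in ys.
        self.centroids: Dict[int, float] = {}
        for label in (0, 1):
            vals = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = sum(vals) / len(vals)
        return self

    def predict(self, x: float) -> int:
        d0 = abs(x - self.centroids[0])
        d1 = abs(x - self.centroids[1])
        return 0 if d0 <= d1 else 1

    def confidence(self, x: float) -> float:
        # Gap between the distances to the two centroids: larger means surer.
        return abs(abs(x - self.centroids[0]) - abs(x - self.centroids[1]))


def co_train(labeled: List[Labeled], unlabeled: List[Example],
             rounds: int) -> List[Labeled]:
    """Grow the labeled set by letting each view label examples for the other."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled
            # Train a classifier on this view using all labels gathered so far.
            xs = [ex[view] for ex, _ in labeled]
            ys = [y for _, y in labeled]
            clf = NearestCentroid().fit(xs, ys)
            # Pseudo-label the unlabeled example this view is most confident
            # about; the other view then trains on it in the next step.
            best = max(pool, key=lambda ex: clf.confidence(ex[view]))
            pool.remove(best)
            labeled.append((best, clf.predict(best[view])))
    return labeled


# Toy usage: two labeled seeds and six unlabeled examples.
seed = [((0.0, 1.0), 0), ((10.0, 9.0), 1)]
unlabeled = [(1.0, 0.5), (2.0, 2.5), (8.0, 7.5),
             (9.5, 10.5), (3.0, 1.5), (7.0, 8.0)]
for example, label in co_train(seed, unlabeled, rounds=3):
    print(example, label)
```

Because each view alone separates the classes in this toy data, the labels proposed by one view are correct and give the other view new training examples. If neither view were individually sufficient, such pseudo-labels could instead reinforce errors.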
Mitchell, T. M. (2004). The Role of Unlabeled Data in Supervised Learning. In Language, Knowledge, and Representation (pp. 103–111). Springer Netherlands. https://doi.org/10.1007/978-1-4020-2783-3_7