Extracting information from the web for concept learning and collaborative filtering

Abstract

Previous work on extracting information from the web generally makes few assumptions about how the extracted information will be used. As a consequence, the goal of web-based extraction systems is usually taken to be the creation of high-quality, noise-free data with clear semantics. This is a difficult problem which cannot be completely automated. Here we consider instead the problem of extracting web data for certain machine learning systems: specifically, collaborative filtering (CF) and concept learning (CL) systems. CF and CL systems are highly tolerant of noisy input, and hence much simpler extraction systems can be used in this context. For CL, we will describe a simple method that uses a given set of web pages to construct new features, which reduce the error rate of learned classifiers in a wide variety of situations. For CF, we will describe a simple method that automatically collects useful information from the web without any human intervention. The collected information, represented as “pseudo-users”, can be used to “jumpstart” a CF system when the user base is small (or even absent).
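The "pseudo-user" idea for CF can be illustrated with a minimal sketch (not the paper's actual system; all names and data below are hypothetical): each web page that mentions several items is treated as a binary pseudo-user who "likes" exactly those items, and recommendations for a real user are then scored by co-occurrence with those noisy profiles.

```python
# Hypothetical sketch of "pseudo-users" for jumpstarting collaborative
# filtering. Each scraped page's item mentions become one binary profile;
# the extraction can be noisy because CF tolerates noisy input.

from collections import defaultdict

def build_pseudo_users(pages):
    """Turn each page's item mentions into a binary pseudo-user profile."""
    return [set(items) for items in pages if len(items) >= 2]

def recommend(target_items, profiles, top_n=2):
    """Score unseen items by weighted co-occurrence with the target's items."""
    scores = defaultdict(int)
    for profile in profiles:
        overlap = len(profile & target_items)
        if overlap == 0:
            continue
        for item in profile - target_items:
            scores[item] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Illustrative "web pages": lists of items each page mentions.
pages = [
    ["Dune", "Foundation", "Hyperion"],
    ["Dune", "Hyperion", "Neuromancer"],
    ["Foundation", "Neuromancer"],
]
pseudo_users = build_pseudo_users(pages)
print(recommend({"Dune"}, pseudo_users))  # → ['Hyperion', 'Foundation']
```

Even with no real user base, the pseudo-user profiles give the recommender something to rank against, which is the "jumpstart" effect the abstract describes.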

Citation (APA)

Cohen, W. W. (2000). Extracting information from the web for concept learning and collaborative filtering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1968, pp. 1–12). Springer Verlag. https://doi.org/10.1007/3-540-40992-0_1
