Abstract
We address the problem of integrating documents from different sources into a master catalog. This problem is pervasive in web marketplaces and portals. Current technology for automating this process consists of building a classifier that uses the categorization of documents in the master catalog to construct a model for predicting the category of unknown documents. Our key insight is that many of the data sources have their own categorization, and classification accuracy can be improved by factoring in the implicit information in these source categorizations. We show how a Naive Bayes classifier can be enhanced to incorporate the similarity information present in source catalogs. Our analysis and empirical evaluation show substantial improvement in the accuracy of catalog integration.
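The idea of letting a source categorization inform a Naive Bayes classifier can be illustrated with a small sketch. The abstract does not give the formulation, so everything below is an assumption made for illustration: a multinomial Naive Bayes model with Laplace smoothing is trained on the master catalog, the documents of one source category are classified once with the plain model, and the category prior is then reweighted in proportion to `size * (votes + 1) ** weight` before a second classification pass. The function names, the `weight` parameter, the `+ 1` smoothing on the votes, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def train(master):
    """Train multinomial Naive Bayes. master: dict category -> list of token lists."""
    vocab = {t for docs in master.values() for d in docs for t in d}
    total_docs = sum(len(docs) for docs in master.values())
    model = {}
    for cat, docs in master.items():
        counts = Counter(t for d in docs for t in d)
        n = sum(counts.values())
        model[cat] = {
            "size": len(docs),
            "log_prior": math.log(len(docs) / total_docs),
            # Laplace-smoothed per-token log likelihoods
            "log_lik": {t: math.log((counts[t] + 1) / (n + len(vocab))) for t in vocab},
        }
    return model


def log_likelihood(model, cat, doc):
    log_lik = model[cat]["log_lik"]
    # Tokens never seen in the master catalog are skipped.
    return sum(log_lik[t] for t in doc if t in log_lik)


def classify(model, doc, log_prior=None):
    """Return the highest-scoring master category for one document."""
    if log_prior is None:
        log_prior = {c: m["log_prior"] for c, m in model.items()}
    return max(model, key=lambda c: log_prior[c] + log_likelihood(model, c, doc))


def source_aware_log_prior(model, source_docs, weight):
    """Assumed reweighting: prior proportional to size * (votes + 1) ** weight."""
    # First pass: plain Naive Bayes votes for the documents of one source category.
    votes = Counter(classify(model, d) for d in source_docs)
    raw = {
        c: math.log(m["size"]) + weight * math.log(votes[c] + 1)
        for c, m in model.items()
    }
    norm = math.log(sum(math.exp(v) for v in raw.values()))
    return {c: v - norm for c, v in raw.items()}


def integrate(model, source_docs, weight=1.0):
    """Classify all documents that share one source category."""
    prior = source_aware_log_prior(model, source_docs, weight)
    return [classify(model, d, prior) for d in source_docs]


# Hypothetical toy data: a two-category master catalog and one source category.
master = {
    "cameras": [["camera", "lens", "zoom"], ["camera", "flash", "battery"]],
    "phones": [["phone", "screen", "battery"], ["phone", "battery", "sim"]],
}
source_category = [["camera", "lens"], ["zoom", "flash"], ["battery"]]

model = train(master)
plain = [classify(model, d) for d in source_category]
enhanced = integrate(model, source_category, weight=2.0)
```

In this toy setup the ambiguous document `["battery"]` goes to `phones` under the plain model, while the source-aware pass with `weight=2.0` places it in `cameras` alongside the other documents from the same source category. Setting `weight=0.0` reduces the sketch to the plain classifier, which suggests how such a weight could be tuned on held-out data.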
Cite
Agrawal, R., & Srikant, R. (2001). On integrating catalogs. In Proceedings of the 10th International Conference on World Wide Web, WWW 2001 (pp. 603–612). Association for Computing Machinery, Inc. https://doi.org/10.1145/371920.372163