Information integration of partially labeled data

Steffen Rendle; Lars Schmidt-Thieme

Conference Proceedings

Studies in Classification, Data Analysis, and Knowledge Organization (2008) 171-179

DOI: 10.1007/978-3-540-78246-9_21

Abstract

A central task when integrating data from different sources is detecting identical items. For example, price comparison websites have to identify offers for identical products. This task goes by several names, including record linkage, object identification, and duplicate detection. In this work, we examine problem settings where some relations between items are given in advance, for example by EAN article codes in an e-commerce scenario or by manually labeled parts. To represent and solve these problems, we draw on ideas from semi-supervised and constrained clustering, expressed as pairwise must-link and cannot-link constraints. We show that extending object identification with pairwise constraints yields an expressive framework that subsumes many variants of the integration problem, such as traditional object identification, matching, iterative problems, and an active learning setting. To solve these integration tasks, we propose an extension to current object identification models that ensures consistent solutions to problems with constraints. Our evaluation shows that additionally taking the labeled data into account dramatically increases the quality of state-of-the-art object identification systems.
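The abstract does not spell out the proposed model, so the following is only an illustrative sketch of how pairwise must-link and cannot-link constraints can keep a clustering consistent. It is not the authors' method: it swaps in a simple greedy single-linkage merge over a union-find structure. The function name `constrained_clusters`, the `similarity` callback, and the `threshold` parameter are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def constrained_clusters(items, similarity, must_link, cannot_link, threshold):
    """Greedy single-linkage grouping that respects pairwise constraints.

    Illustrative assumption only: this is NOT the model from the paper.
    `similarity(a, b)` is a hypothetical pairwise score in [0, 1].
    """
    parent = {x: x for x in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def violates(a, b):
        # Merging the clusters of a and b is forbidden if some
        # cannot-link pair has one item in each of the two clusters.
        roots = {find(a), find(b)}
        return any({find(u), find(v)} == roots for u, v in cannot_link)

    def union(a, b):
        parent[find(a)] = find(b)

    # 1. Must-link pairs are merged first; union-find gives transitivity.
    for a, b in must_link:
        if find(a) == find(b):
            continue
        if violates(a, b):
            raise ValueError(f"inconsistent constraints at pair {(a, b)!r}")
        union(a, b)

    # 2. Remaining pairs are merged by descending similarity, skipping
    #    any merge that would put a cannot-link pair in one cluster.
    candidates = sorted(
        combinations(items, 2), key=lambda p: similarity(*p), reverse=True
    )
    for a, b in candidates:
        if similarity(a, b) < threshold:
            break
        if find(a) != find(b) and not violates(a, b):
            union(a, b)

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), []).append(x)
    return sorted(sorted(c) for c in clusters.values())
```

In an e-commerce setting, offers sharing an EAN code would be supplied as `must_link` pairs and offers with different EAN codes as `cannot_link` pairs. Because every merge is checked against the cannot-link pairs, the result never groups two items that were labeled as different, which is the sense of "consistent" this sketch aims for.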

Cite

Rendle, S., & Schmidt-Thieme, L. (2008). Information integration of partially labeled data. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 171–179). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-540-78246-9_21
