A Framework for Reconciling Attribute Values from Multiple Data Sources

by Z Jiang, S Sarkar, P De, D Dey
Management Science (2007)

Abstract

Because of the heterogeneous nature of different data sources, data integration is often one of the most challenging tasks in managing modern information systems. While the existing literature has focused on problems such as schema integration and entity identification, it has largely overlooked a basic question: When an attribute value for a real-world entity is recorded differently in different databases, how should the "best" value be chosen from the set of possible values? This paper provides an answer to this question. We first show how a probability distribution over a set of possible values can be derived. We then demonstrate how these probabilities can be used to solve a given decision problem by minimizing the total cost of type I, type II, and misrepresentation errors. Finally, we propose a framework for integrating multiple data sources when a single "best" value has to be chosen and stored for every attribute of an entity.
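As a rough illustration of the decision step described in the abstract, the sketch below picks the value to store by minimizing expected cost under a given probability distribution over candidate values. It is not the paper's procedure: the function names, the candidate values, the probabilities, and the simple 0/1 misrepresentation cost are all assumptions made for the example, and the abstract's type I and type II error costs are not modeled.

```python
from typing import Callable, Dict

# Hypothetical helper: expected cost of storing `stored` when the true value
# follows the distribution `probs` (value -> probability).
def expected_cost(stored: str,
                  probs: Dict[str, float],
                  cost: Callable[[str, str], float]) -> float:
    return sum(p * cost(stored, true) for true, p in probs.items())

# Hypothetical helper: the candidate with the lowest expected cost.
def best_value(probs: Dict[str, float],
               cost: Callable[[str, str], float]) -> str:
    return min(probs, key=lambda v: expected_cost(v, probs, cost))

# Assumed cost function: 0 if the stored value is correct, 1 otherwise.
def zero_one_cost(stored: str, true: str) -> float:
    return 0.0 if stored == true else 1.0

# Made-up distribution over three conflicting recordings of one attribute.
probs = {"Smith": 0.5, "Smyth": 0.3, "Smithe": 0.2}
choice = best_value(probs, zero_one_cost)
print(choice)
```

Under a 0/1 cost the rule reduces to choosing the most probable value; an asymmetric cost function passed as `cost` could lead to a different choice.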
