Probabilistic Data Integration

  • Keulen M

Abstract

Synonyms: Uncertain data integration

Definitions: Probabilistic data integration (PDI) is a specific kind of data integration where integration problems such as inconsistency and uncertainty are handled by means of a probabilistic data representation. The approach is based on the view that data quality problems (as they occur in an integration process) can be modeled as uncertainty (van Keulen 2012), and this uncertainty is considered an important result of the integration process (Magnani and Montesi 2010). The PDI process contains two phases (see Fig. 2): (i) a quick partial integration where certain data quality problems are not solved immediately, but explicitly represented as uncertainty in the resulting integrated data stored in a probabilistic database; (ii) continuous improvement by using the data (a probabilistic database can be queried directly, resulting in possible or approximate answers; Dalvi et al. 2009) and gathering evidence (e.g., user feedback) for improving the data quality. A probabilistic database is a specific kind of DBMS that allows storage, querying, and manipulation of uncertain data. It keeps track of alternatives and the dependencies among them.
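The two phases can be illustrated with a toy possible-worlds model. The following Python sketch is not taken from the article: the records, the random variable `m`, its probabilities, and the helper names (`possible_worlds`, `query_probabilities`, `apply_feedback`) are all hypothetical, and real probabilistic databases avoid enumerating worlds explicitly. It only shows how an unresolved duplicate can be stored as alternatives tied to one variable, queried for answers with probabilities, and later refined by feedback.

```python
from itertools import product

# Phase (i): an unresolved question ("do two source records describe the
# same person?") is kept as a random variable instead of being decided.
# The variable name "m" and its probabilities are illustrative assumptions.
variables = {
    "m": {"same": 0.8, "different": 0.2},
}

# Each tuple carries the condition (variable, value) under which it exists.
# Tuples sharing a variable are dependent; None marks a certain tuple.
tuples = [
    ({"name": "John Smith", "city": "Enschede"}, ("m", "same")),
    ({"name": "J. Smith", "city": "Enschede"}, ("m", "different")),
    ({"name": "John Smith", "city": "Hengelo"}, ("m", "different")),
    ({"name": "A. Jones", "city": "Delft"}, None),
]


def possible_worlds(variables):
    """Yield (assignment, probability) for every combination of values."""
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        assignment = dict(zip(names, values))
        prob = 1.0
        for name, value in assignment.items():
            prob *= variables[name][value]
        yield assignment, prob


def query_probabilities(tuples, variables, predicate, project):
    """Return each distinct answer with the total probability of the
    worlds in which it appears (naive enumeration, for illustration)."""
    answers = {}
    for assignment, prob in possible_worlds(variables):
        rows = [row for row, cond in tuples
                if cond is None or assignment[cond[0]] == cond[1]]
        for answer in {project(r) for r in rows if predicate(r)}:
            answers[answer] = answers.get(answer, 0.0) + prob
    return answers


def apply_feedback(variables, var, ruled_out):
    """Phase (ii): evidence rules out one alternative; the remaining
    probabilities are renormalized. Returns a new variable table."""
    remaining = {v: p for v, p in variables[var].items() if v != ruled_out}
    total = sum(remaining.values())
    updated = dict(variables)
    updated[var] = {v: p / total for v, p in remaining.items()}
    return updated


if __name__ == "__main__":
    in_enschede = lambda r: r["city"] == "Enschede"
    print(query_probabilities(tuples, variables, in_enschede,
                              lambda r: r["name"]))
    # A user states that the two records are NOT the same person.
    after = apply_feedback(variables, "m", "same")
    print(query_probabilities(tuples, after, in_enschede,
                              lambda r: r["name"]))
```

Attaching both "different" tuples to the same variable value is what records the dependency among alternatives: they appear together or not at all, so no possible world contains the merged record alongside either unmerged one.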

Cite

Keulen, M. V. (2018). Probabilistic Data Integration. In Encyclopedia of Big Data Technologies (pp. 1–9). Springer International Publishing. https://doi.org/10.1007/978-3-319-63962-8_18-1
