Ontology-based data cleaning

Zoubida Kedad; Elisabeth Métais

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2002) 2553 137-149

DOI: 10.1007/3-540-36271-1_12

26 Citations

19 Readers

Abstract

Multi-source information systems, such as data warehouses, are composed of a set of heterogeneous and distributed data sources. The relevant information is extracted from these sources, cleaned, transformed and then integrated. Comparing two data sources may reveal two kinds of heterogeneity: at the intensional level, conflicts concern the structure of the data; at the extensional level, conflicts concern the instances of the data. The process of detecting and resolving conflicts at the extensional level is known as data cleaning. In this paper, we focus on the problem of differences in terminology and propose a solution based on linguistic knowledge provided by a domain ontology. This approach is well suited to application domains that classify data intensively, such as medicine or pharmacology. The main idea is to automatically generate correspondence assertions between instances of objects. The user can parameterize this generation process by defining a level of accuracy expressed using the domain ontology.
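
As a rough illustration of the idea described in the abstract, the sketch below pairs instance values from two sources whenever they generalize to the same concept at a chosen depth of a small taxonomy. It is not the authors' algorithm: the taxonomy `PARENT`, the functions `path_from_root`, `generalize` and `correspondence_assertions`, and the reading of "level of accuracy" as a depth in an is-a hierarchy are all assumptions made for this example.

```python
# Hypothetical toy is-a taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "drug": None,
    "analgesic": "drug",
    "antibiotic": "drug",
    "aspirin": "analgesic",
    "paracetamol": "analgesic",
    "penicillin": "antibiotic",
}


def path_from_root(term):
    """Return the list of concepts from the root down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]  # raises KeyError for terms missing from the taxonomy
    return path[::-1]


def generalize(term, level):
    """Return the ancestor of `term` at depth `level` (0 = root).

    If `term` sits above that depth, it is returned unchanged.
    """
    path = path_from_root(term)
    return path[min(level, len(path) - 1)]


def correspondence_assertions(values_a, values_b, level):
    """Pair values from two sources that share the same concept at `level`.

    `level` plays the role of a user-chosen accuracy setting (an assumption):
    a deeper level demands a closer terminological match.
    """
    return [
        (a, b)
        for a in values_a
        for b in values_b
        if generalize(a, level) == generalize(b, level)
    ]


source_a = ["aspirin", "penicillin"]
source_b = ["paracetamol", "aspirin"]

strict = correspondence_assertions(source_a, source_b, level=2)
loose = correspondence_assertions(source_a, source_b, level=1)
```

With these inputs, the strict setting only pairs identical terms, while the looser setting also pairs "aspirin" with "paracetamol" because both fall under "analgesic". A real domain ontology would carry richer linguistic relations (synonymy, for instance) than this single parent map.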

Cite

Kedad, Z., & Métais, E. (2002). Ontology-based data cleaning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2553, pp. 137–149). Springer Verlag. https://doi.org/10.1007/3-540-36271-1_12
