SCHEMA MATCHING AND MAPPING-BASED DATA INTEGRATION

Do Hong Hai

Thesis

(2005) 222

Abstract

Schema matching aims at identifying semantic correspondences between elements of two schemas, e.g., database schemas, ontologies, and XML message formats. It is needed in many database applications, such as integration of web data sources, data warehouse loading, and XML message mapping. In today's systems, schema matching is performed manually, a time-consuming, tedious, and error-prone process that becomes increasingly impractical as the number of schemas and data sources grows. To reduce the manual effort as much as possible, approaches that determine element correspondences semi-automatically are required. We start by surveying the existing approaches and prototypes for schema matching and explain their common features and applicability using a previously proposed taxonomy. We further identify the major criteria that influence the effectiveness of a match approach. We use these criteria to compare the evaluations of various recent prototypes and discuss the issues that need to be addressed in future evaluations. Besides helping us to develop and test our own system, the surveys of match approaches and of evaluations aim at guiding future implementations, so that they are better documented, their results are more reproducible, and comparisons between different systems and approaches are easier. Based on these insights into the state of the art, we have developed Coma (Combining Matchers) and further extended it to Coma++, both of which are generic and customizable systems for semi-automatic schema matching. In particular, Coma++ offers a platform for flexible combination of different match algorithms. It provides a large spectrum of individual matchers, including a novel approach that reuses results from previous match operations, and various mechanisms to combine and refine matcher results. Based on this flexible infrastructure, match processing is supported as a workflow, allowing a match task to be divided and solved successively in multiple stages. 
In particular, we implement specific workflows (i.e., strategies) for context-dependent matching of schemas with shared elements and fragment-based matching of very large schemas. With the flexibility to customize matchers and match strategies, Coma++ also represents a platform for comparative evaluation of match approaches. In fact, we performed comprehensive evaluations using real-world schemas found on the web and ontologies from a published ontology alignment contest. In particular, the E-busi…
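
The idea of combining several individual matchers, as described in the abstract, can be illustrated with a minimal sketch. This is not the Coma++ implementation: the two matchers (`edit_similarity`, `token_similarity`), the averaging step, the best-candidate selection, and the `threshold` parameter are all assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher
from statistics import mean


def edit_similarity(a: str, b: str) -> float:
    """Hypothetical matcher: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a: str, b: str) -> float:
    """Hypothetical matcher: Jaccard overlap of underscore-separated tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


# The set of matchers to combine (an assumption, not the Coma++ library).
MATCHERS = [edit_similarity, token_similarity]


def match(source, target, threshold=0.5):
    """Average all matcher scores for each element pair, then keep the
    best-scoring target per source element if it reaches the threshold."""
    if not target:
        return {}
    result = {}
    for s in source:
        scored = [(mean(m(s, t) for m in MATCHERS), t) for t in target]
        score, best = max(scored)
        if score >= threshold:
            result[s] = (best, round(score, 2))
    return result
```

A call such as `match(["cust_name", "city"], ["customer_name", "city", "zip"])` returns a dictionary that maps each matched source element to its best target and combined score. Swapping the averaging step for a maximum or a weighted sum, or adding further matchers to `MATCHERS`, changes the combination strategy without touching the individual matchers.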

Cite

Hai, D. H. (2005). SCHEMA MATCHING AND MAPPING-BASED DATA INTEGRATION. Department of Computer Science.
