A Content-Driven ETL Processes for Open Data

Abstract

Emerging statistical Open Data (OD) appears promising for generating diverse analysis scenarios for decision-making systems. However, OD has problematic characteristics: semantic and structural heterogeneity, lack of schemas, autonomy, and dispersion. These characteristics challenge traditional Extract-Transform-Load (ETL) processes, which generally deal with well-structured schemas. In this paper, we propose content-driven ETL processes that automate, as far as possible, the extraction phase based only on the content of flat Open Data sources. Our processes rely on data annotations and data mining techniques to discover hierarchical relationships. The processed data are then transformed into instance-schema graphs to facilitate structural data integration and the definition of the multidimensional schemas of the data warehouse. © Springer International Publishing Switzerland 2015.
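The abstract does not say how hierarchical relationships are discovered, so the following is only an illustrative sketch of one content-based approach, not the authors' method. It assumes a flat source already parsed into rows, and it treats column A as a child of column B when every value of A co-occurs with exactly one value of B and B has fewer distinct values. The function names (`find_hierarchies`, `instance_edges`) and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def find_hierarchies(rows, columns):
    """Return (child, parent) column pairs inferred from content alone.

    Assumption (not from the paper): child -> parent holds when each child
    value maps to exactly one parent value and the parent is strictly coarser.
    """
    pairs = []
    for child, parent in permutations(columns, 2):
        mapping = defaultdict(set)
        for row in rows:
            mapping[row[child]].add(row[parent])
        parent_values = {p for ps in mapping.values() for p in ps}
        is_functional = all(len(ps) == 1 for ps in mapping.values())
        if is_functional and len(parent_values) < len(mapping):
            pairs.append((child, parent))
    return pairs


def instance_edges(rows, child, parent):
    """Instance-level edges (child value, parent value) for one column pair."""
    return sorted({(row[child], row[parent]) for row in rows})


# Hypothetical flat table with no declared schema.
rows = [
    {"city": "Toulouse", "region": "Occitanie", "country": "France"},
    {"city": "Montpellier", "region": "Occitanie", "country": "France"},
    {"city": "Lyon", "region": "Auvergne-Rhône-Alpes", "country": "France"},
    {"city": "Milan", "region": "Lombardy", "country": "Italy"},
]
columns = ["city", "region", "country"]

hierarchies = find_hierarchies(rows, columns)
region_edges = instance_edges(rows, "region", "country")
```

Under these assumptions, the column pairs play the role of schema-level edges and the value pairs play the role of instance-level edges, which is one loose way to picture an instance-schema graph. Real Open Data sources would also need the annotation step and some tolerance for noisy or missing values, which this sketch omits.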

Cite

Berro, A., Megdiche, I., & Teste, O. (2015). A Content-Driven ETL Processes for Open Data. In Advances in Intelligent Systems and Computing (Vol. 312, pp. 29–40). Springer Verlag. https://doi.org/10.1007/978-3-319-10518-5_3
