A Content-Driven ETL Processes for Open Data

Alain Berro; Imen Megdiche; Olivier Teste

Conference Proceedings

Advances in Intelligent Systems and Computing (2015) 312 29-40

DOI: 10.1007/978-3-319-10518-5_3

Abstract

Emerging statistical Open Data (OD) is promising for generating varied analysis scenarios for decision-making systems. Nevertheless, OD has problematic characteristics such as semantic and structural heterogeneity, lack of schemas, autonomy, and dispersion. These characteristics challenge traditional Extract-Transform-Load (ETL) processes, since the latter generally deal with well-structured schemas. In this paper we propose content-driven ETL processes that automate, "as far as possible", the extraction phase based only on the content of flat Open Data sources. Our processes rely on data annotations and data mining techniques to discover hierarchical relationships. The processed data are then transformed into instance-schema graphs to facilitate structural data integration and the definition of the multidimensional schemas of the data warehouse. © Springer International Publishing Switzerland 2015.

Cite

Berro, A., Megdiche, I., & Teste, O. (2015). A Content-Driven ETL Processes for Open Data. In Advances in Intelligent Systems and Computing (Vol. 312, pp. 29–40). Springer Verlag. https://doi.org/10.1007/978-3-319-10518-5_3
