Emerging statistical Open Data (OD) seems very promising for generating various analysis scenarios for decision-making systems. Nevertheless, OD has problematic characteristics such as semantic and structural heterogeneity, lack of schemas, autonomy and dispersion. These characteristics challenge traditional Extract-Transform-Load (ETL) processes, since the latter generally deal with well-structured schemas. In this paper, we propose content-driven ETL processes that automate, "as far as possible", the extraction phase based only on the content of flat Open Data sources. Our processes rely on data annotations and data mining techniques to discover hierarchical relationships. The processed data are then transformed into instance-schema graphs to facilitate structural data integration and the definition of the data warehouse's multidimensional schemas. © Springer International Publishing Switzerland 2015.
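The abstract does not detail how hierarchical relationships are discovered from a flat source. As a minimal sketch, assuming the flat Open Data table has already been split into a header and rows, one common heuristic (not necessarily the authors' method) is to annotate columns whose values are all numeric as measures, then treat a dimension column A as rolling up to a column B when every value of A maps to exactly one value of B and B has fewer distinct values. The function names (`is_measure`, `rolls_up`, `detect_hierarchies`), the numeric-only annotation rule and the toy rows are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def is_measure(values):
    """Hypothetical annotation rule: a column whose values all parse as numbers is a measure."""
    try:
        for v in values:
            float(v)
    except ValueError:
        return False
    return True


def rolls_up(child_vals, parent_vals):
    """True if child -> parent is a many-to-one functional dependency."""
    mapping = defaultdict(set)
    for c, p in zip(child_vals, parent_vals):
        mapping[c].add(p)
    is_function = all(len(parents) == 1 for parents in mapping.values())
    is_coarser = len(set(parent_vals)) < len(set(child_vals))
    return is_function and is_coarser


def detect_hierarchies(header, rows):
    """Return direct (child, parent) edges between dimension columns of a flat table."""
    columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}
    # Only non-measure (dimension) columns can form hierarchy levels.
    dims = [name for name in header if not is_measure(columns[name])]
    edges = {(a, b) for a, b in permutations(dims, 2)
             if rolls_up(columns[a], columns[b])}
    # Transitive reduction: drop a -> c when a -> b -> c already exists.
    direct = {(a, c) for a, c in edges
              if not any((a, b) in edges and (b, c) in edges for b in dims)}
    return sorted(direct, key=lambda e: (dims.index(e[0]), dims.index(e[1])))


# Toy flat source; the "Value" column holds made-up numbers.
header = ["City", "Region", "Country", "Value"]
rows = [
    ["Toulouse", "Occitanie", "France", "12.5"],
    ["Montpellier", "Occitanie", "France", "8.0"],
    ["Strasbourg", "Grand Est", "France", "3.1"],
    ["Bordeaux", "Nouvelle-Aquitaine", "France", "7.7"],
]
print(detect_hierarchies(header, rows))  # [('City', 'Region'), ('Region', 'Country')]
```

On this toy table the sketch recovers the hierarchy City → Region → Country and ignores the numeric measure. The rule is deliberately naive: a numeric dimension such as a year would be misclassified as a measure, and a functional dependency found in a small sample may not hold on the full dataset.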
Berro, A., Megdiche, I., & Teste, O. (2015). A Content-Driven ETL Processes for Open Data. In Advances in Intelligent Systems and Computing (Vol. 312, pp. 29–40). Springer Verlag. https://doi.org/10.1007/978-3-319-10518-5_3