Complementing data in the ETL process

Lívia De S. Ribeiro; Ronaldo R. Goldschmidt; Maria Cláudia Cavalcanti

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6862 LNCS 112-123

DOI: 10.1007/978-3-642-23544-3_9

5 Citations

18 Readers

Abstract

Data quality is critical in a typical Data Warehouse (DW) environment. The process of transferring data from different sources into the DW environment, known as ETL (Extraction, Transformation, and Load), usually takes care of improving data quality. However, it is not unusual to find null values in a DW fact table during the ETL process, and these may negatively affect the accuracy of data analysis results. Data imputation techniques are commonly used to deal with the missing value problem. Some of them inspect the values in the table to generate a replacement for the missing one. This paper proposes a new strategy for addressing the missing data problem in the ETL process. The idea is to enrich the DW fact table with dimension attributes in order to achieve better imputation results. The strategy uses the k-NN algorithm as the imputation approach. Tests performed on an implemented prototype showed promising results with respect to imputation quality. © 2011 Springer-Verlag.
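The general idea in the abstract (join dimension attributes onto the fact table, then impute nulls with k-NN) can be illustrated with a minimal sketch. This is not the authors' prototype: the table layout, the column names (`product_id`, `quantity`, `amount`, `unit_price`), the Euclidean distance, and the choice of averaging the k nearest values are all assumptions made for illustration.

```python
import math

def enrich(fact_rows, dimension, key):
    """Join each fact row with the attributes of its dimension row (hypothetical helper)."""
    return [{**row, **dimension[row[key]]} for row in fact_rows]

def knn_impute(rows, target, features, k=3):
    """Fill None values of `target` with the mean of the k nearest complete rows."""
    complete = [r for r in rows if r[target] is not None]
    if not complete:
        raise ValueError("no complete rows to impute from")
    imputed = []
    for row in rows:
        if row[target] is not None:
            imputed.append(dict(row))
            continue
        point = [row[f] for f in features]
        # Rank complete rows by Euclidean distance over the chosen features.
        nearest = sorted(
            complete,
            key=lambda c: math.dist(point, [c[f] for f in features]),
        )[:k]
        value = sum(n[target] for n in nearest) / len(nearest)
        imputed.append({**row, target: value})
    return imputed

# Illustrative data only: a tiny product dimension and a sales fact table.
product_dim = {
    1: {"unit_price": 10.0},
    2: {"unit_price": 11.0},
    3: {"unit_price": 50.0},
}
fact = [
    {"product_id": 1, "quantity": 2, "amount": 20.0},
    {"product_id": 2, "quantity": 2, "amount": 22.0},
    {"product_id": 3, "quantity": 2, "amount": 100.0},
    {"product_id": 1, "quantity": 2, "amount": None},  # missing measure
]

enriched = enrich(fact, product_dim, "product_id")
result = knn_impute(enriched, "amount", ["quantity", "unit_price"], k=2)
print(result[3]["amount"])
```

In this toy data every fact row has the same `quantity`, so the fact table alone gives k-NN nothing to separate the rows by; the `unit_price` attribute added from the dimension is what makes the two cheaper products the nearest neighbours of the incomplete row.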

Cite

De S. Ribeiro, L., Goldschmidt, R. R., & Cavalcanti, M. C. (2011). Complementing data in the ETL process. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6862 LNCS, pp. 112–123). https://doi.org/10.1007/978-3-642-23544-3_9
