Abstract
To address the problems of incremental data extraction and task scheduling in ETL processing, an optimization strategy based on data quality control is proposed. We discuss the common mechanisms of incremental data extraction and quality control in the ETL subsystem of a data warehouse, and propose an incremental data extraction method based on key attribute comparison. The method builds on the key functions of ETL data extraction and adopts database incremental extraction. During data processing, data loss is handled by means of auxiliary tables, and instance-level data quality problems are handled by data cleaning. Finally, experimental results on real data sets show that the ETL design with quality control optimizes the overall workflow and ensures that data can be accurately extracted into the data warehouse.
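The abstract does not spell out how the key attribute comparison is carried out, so the following is only a minimal sketch of one common way such a comparison can work, not the authors' implementation. It assumes that rows are plain dictionaries, that a previously extracted snapshot is available for comparison, and that a change is detected when any of a chosen set of attributes differs for the same key. The function name `incremental_extract`, the parameters `key` and `compare_attrs`, and the sample rows are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def incremental_extract(
    source: Iterable[Row],
    snapshot: Iterable[Row],
    key: str,
    compare_attrs: Sequence[str],
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split the current source rows into inserts, updates, and deletes.

    Assumption: `snapshot` holds the rows seen at the previous extraction,
    and `key` uniquely identifies a row in both inputs.
    """
    # Index the previous snapshot by key for constant-time lookups.
    snapshot_by_key: Dict[Hashable, Row] = {row[key]: row for row in snapshot}

    seen_keys = set()
    inserts: List[Row] = []
    updates: List[Row] = []

    for row in source:
        k = row[key]
        seen_keys.add(k)
        previous = snapshot_by_key.get(k)
        if previous is None:
            # Key is absent from the snapshot: a new row.
            inserts.append(row)
        elif any(row[attr] != previous[attr] for attr in compare_attrs):
            # Key exists but at least one compared attribute changed.
            updates.append(row)

    # Keys present in the snapshot but missing from the source were deleted.
    deletes = [row for k, row in snapshot_by_key.items() if k not in seen_keys]
    return inserts, updates, deletes


# Hypothetical sample data for illustration only.
snapshot_rows = [
    {"id": 1, "name": "a", "amount": 10},
    {"id": 2, "name": "b", "amount": 20},
    {"id": 3, "name": "c", "amount": 30},
]
source_rows = [
    {"id": 1, "name": "a", "amount": 10},
    {"id": 2, "name": "b", "amount": 25},
    {"id": 4, "name": "d", "amount": 40},
]

inserts, updates, deletes = incremental_extract(
    source_rows, snapshot_rows, key="id", compare_attrs=["name", "amount"]
)
```

In this sketch only the changed, new, and removed rows would be passed on to the transform and load steps, which is the general motivation for incremental extraction over a full reload. Comparing a small set of attributes rather than whole rows keeps the check cheap, at the cost of missing changes in attributes that are not compared.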
Wang, X. P., & Li, J. Y. (2023). Design of Data Quality Control System Based on ETL. In Journal of Physics: Conference Series (Vol. 2476). Institute of Physics. https://doi.org/10.1088/1742-6596/2476/1/012083