Extract-Transform-Load (ETL) processes extract data, transform it, and load it into data warehouses (DWs). The dominant ETL tools use graphical user interfaces (GUIs) in which the developer “draws” the ETL flow by connecting steps/transformations with lines. This gives an easy overview, but it can also be rather tedious and require much trivial work for simple things. We therefore challenge this approach and propose to do ETL programming by writing code. To make the programming easy, we present the Python-based framework pygrametl, which offers commonly used functionality for ETL development. By using the framework, the developer can efficiently create effective ETL solutions that exploit the full power of programming. In this chapter, we present our work on pygrametl and related activities. Further, we consider some of the lessons learned during the development of pygrametl as an open-source framework.
Thomsen, C., Andersen, O., Jensen, S. K., & Pedersen, T. B. (2018). Programmatic ETL. In Lecture Notes in Business Information Processing (Vol. 324, pp. 21–50). Springer Verlag. https://doi.org/10.1007/978-3-319-96655-7_2