In our implementation research, we apply a workflow approach to the modeling and development of a Big Data processing pipeline using open source technologies. The data processing workflow is a set of interrelated steps, each of which launches a particular job such as a Spark job, a shell job, or a PostgreSQL command. All workflow steps are chained to form an integrated process that moves data from the staging storage area to the datamart storage area. An experimental workflow-based implementation of the data processing pipeline was carried out, staging data through the different storage areas and using an actual industrial KPI dataset of some 30 million records. Evaluation of the implementation results demonstrates the applicability of the proposed workflow to other application domains and datasets, provided they satisfy the data format required at the input stage of the workflow.
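For illustration, such a chain of steps could be expressed in a workflow orchestrator as sketched below. This is a minimal sketch assuming an Apache Airflow-style scheduler; the concrete orchestration tool, task names, file paths, SQL, and connection identifiers are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of a three-step staging-to-datamart workflow.
# Assumes an Apache Airflow-style orchestrator; all names and paths are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id="staging_to_datamart",        # hypothetical pipeline name
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Shell job: land raw KPI files into the staging storage area
    ingest_to_staging = BashOperator(
        task_id="ingest_to_staging",
        bash_command="hdfs dfs -put /landing/kpi/*.csv /staging/kpi/",  # hypothetical paths
    )

    # Spark job: clean and aggregate the staged records
    transform_with_spark = SparkSubmitOperator(
        task_id="transform_with_spark",
        application="/jobs/kpi_transform.py",  # hypothetical Spark application
        conn_id="spark_default",
    )

    # PostgreSQL command: publish the result into the datamart storage area
    load_to_datamart = PostgresOperator(
        task_id="load_to_datamart",
        postgres_conn_id="datamart_db",        # hypothetical connection ID
        sql="INSERT INTO kpi_datamart SELECT * FROM kpi_transformed;",
    )

    # Chain the steps into one integrated process
    ingest_to_staging >> transform_with_spark >> load_to_datamart
```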
Suleykin, A., & Panfilov, P. (2019). Implementing big data processing workflows using open source technologies. In Annals of DAAAM and Proceedings of the International DAAAM Symposium (Vol. 30, pp. 394–404). Danube Adria Association for Automation and Manufacturing, DAAAM. https://doi.org/10.2507/30th.daaam.proceedings.054