Big data generation

Abstract

Big data challenges are end-to-end problems. Big data typically has to be preprocessed, moved, loaded, processed, and stored many times, which has led to the creation of big data pipelines. Current big data benchmarks only focus on isolated stages of this pipeline, usually processing, storage, and loading. To date, no benchmark has been presented that covers the end-to-end behavior of big data systems. In this paper, we discuss the necessity of ETL-like tasks in big data benchmarking and propose the Parallel Data Generation Framework (PDGF) for generating the data for such tasks. PDGF is a generic data generator that was implemented at the University of Passau and is currently adopted in TPC benchmarks. © 2014 Springer-Verlag Berlin Heidelberg.
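The abstract only names PDGF; as a rough illustration of the deterministic, seed-based generation strategy such parallel generators rely on, the sketch below shows how mixing a (table, column, row) coordinate into a per-cell seed lets any worker compute any value independently, with no shared generator state. All names and the seed-mixing scheme here are hypothetical and are not PDGF's actual API.

```java
import java.util.Random;
import java.util.stream.IntStream;

// Illustrative sketch only: PDGF's real generator is XML-configured
// and far richer than this minimal example.
public class SeededGeneratorSketch {

    // Hypothetical hierarchical seed: every (table, column, row) triple
    // maps to its own deterministic seed, so any cell can be (re)computed
    // in isolation -- the property that makes parallel generation trivial.
    static long cellSeed(long masterSeed, int table, int column, long row) {
        long h = masterSeed;
        h = h * 31 + table;
        h = h * 31 + column;
        h = h * 31 + row;
        return h;
    }

    // Generate one cell value without touching any other row or column.
    static int cellValue(long masterSeed, int table, int column, long row, int bound) {
        return new Random(cellSeed(masterSeed, table, column, row)).nextInt(bound);
    }

    public static void main(String[] args) {
        final long masterSeed = 42L;

        // Workers can split the row range arbitrarily; every value is
        // identical to a sequential run because no state is shared
        // (only the print order varies under parallel execution).
        IntStream.range(0, 8).parallel().forEach(row ->
            System.out.printf("row %d -> %d%n",
                row, cellValue(masterSeed, /*table=*/0, /*column=*/0, row, 100)));
    }
}
```

Because each value is a pure function of the master seed and its coordinate, the same scheme also makes generated datasets repeatable across runs and cluster sizes, which is what a benchmark data generator needs.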

Citation (APA)

Rabl, T., & Jacobsen, H.-A. (2014). Big data generation. In Lecture Notes in Computer Science (Vol. 8163, pp. 20–27). Springer. https://doi.org/10.1007/978-3-642-53974-9_3
