Big data normalization for massively parallel processing databases

Nikolay Golov; Lars Rönnbäck

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9382 154-163

DOI: 10.1007/978-3-319-25747-1_16

Abstract

High-performance querying and ad-hoc querying are commonly viewed as mutually exclusive goals in massively parallel processing databases. At one extreme, a database can be set up to provide the results of a single known query so that the use of available resources is maximized and response time is minimized, but at the cost of all other queries being executed suboptimally. At the other extreme, when no query is known in advance, the database must provide the information without such optimization, normally resulting in inefficient execution of all queries. This paper introduces a novel technique, highly normalized Big Data using Anchor modeling, that provides a very efficient way to store information and utilize resources, thereby providing ad-hoc querying with high performance for the first time in massively parallel processing databases. A case study of how this approach is used for a Data Warehouse at Avito over two years, with estimates for and results of real data experiments carried out in HP Vertica, an MPP RDBMS, is also presented.

Cite

Golov, N., & Rönnbäck, L. (2015). Big data normalization for massively parallel processing databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9382, pp. 154–163). Springer Verlag. https://doi.org/10.1007/978-3-319-25747-1_16
