Optimizing database load and extract for big data era

Abstract

With growing and pervasive interest in Big Data, SQL relational databases need to compete with data management by Hadoop, NoSQL and NoDB. Database research has mainly focused on result generation by query processing, but SQL databases require data to be in place before queries can be processed. DB loading has been a bottleneck, leading to external ETL/ELT techniques for loading large data sets. This paper focuses on DB engine level techniques for optimizing both data loads and extracts in an MPP, shared-nothing SQL database, dbX, available on in-house commodity hardware and cloud systems. The agile data loading of dbX exploits parallelism at multiple levels to achieve TBs of data load per hour, making it suitable for cloud and continuous actionable knowledge applications. Implementation techniques at DB engine level, extensions to load/extract syntax and performance results are presented. The load optimization techniques also help to speed up data extracts to flat files and CTAS-type SQL queries. We show linear scale-up with cluster scale-out for load/extract in public cloud and commodity hardware systems without recourse to database tuning or use of expensive database appliances. © 2014 Springer International Publishing Switzerland.
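
The abstract does not spell out how dbX parallelizes a load, so the following is only a generic sketch of one common idea in shared-nothing systems: route each input row to a node by hashing a distribution key, then let every node parse its own share concurrently. All names here (`node_for`, `partition`, `load_on_node`, `parallel_load`), the `|` delimiter, the CRC32 hash and the use of threads as stand-ins for nodes are assumptions made for illustration, not the paper's method or the dbX load syntax.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def node_for(key: str, n_nodes: int) -> int:
    """Pick a target node by hashing the distribution key (CRC32 is an arbitrary choice)."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def partition(lines, n_nodes, key_col=0, delim="|"):
    """Route each raw input line to the bucket of the node that will own it."""
    buckets = [[] for _ in range(n_nodes)]
    for line in lines:
        key = line.split(delim)[key_col]
        buckets[node_for(key, n_nodes)].append(line)
    return buckets


def load_on_node(bucket, delim="|"):
    """Hypothetical stand-in for a node-local load: parse delimited lines into tuples."""
    return [tuple(line.split(delim)) for line in bucket]


def parallel_load(lines, n_nodes=4):
    """Partition the input, then let every simulated node parse its share concurrently."""
    buckets = partition(lines, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return list(pool.map(load_on_node, buckets))


# Synthetic flat-file rows: id|name|value
rows = [f"{i}|item-{i}|{i * 10}" for i in range(1000)]
tables = parallel_load(rows, n_nodes=4)
print(sum(len(t) for t in tables))  # every input row lands on exactly one node
```

In a real MPP engine each bucket would be streamed to a separate machine and written to local storage; the threads here only mimic that division of work on a single host.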

Sridhar, K. T., & Sakkeer, M. A. (2014). Optimizing database load and extract for big data era. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8422 LNCS, pp. 503–512). Springer Verlag. https://doi.org/10.1007/978-3-319-05813-9_34
