Partitioning in Apache Spark

Abstract

Apache Spark performs in-memory computation, and its core data structure is the Resilient Distributed Dataset (RDD). RDDs are partitioned using the built-in hash and range partitioning. We propose a partitioning scheme that applies modular division, with numbers from 2 to 10, to the keys of elements. The scheme targets smaller datasets, with the aim of improving execution time.
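
The idea of partitioning by modular division on keys can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names mod_partition and partition_pairs are hypothetical, the keys are assumed to be integers, and the numbers 2 to 10 are read here as the divisor (and therefore the number of partitions).

```python
from collections import defaultdict


def mod_partition(key, divisor):
    """Return a partition index for an integer key: key mod divisor.

    Assumption: the divisor is one of the numbers 2..10 named in the abstract.
    """
    if divisor < 2 or divisor > 10:
        raise ValueError("divisor is assumed to lie in 2..10")
    return key % divisor


def partition_pairs(pairs, divisor):
    """Group (key, value) pairs into `divisor` buckets by key mod divisor."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[mod_partition(key, divisor)].append((key, value))
    # Return the buckets in index order, including any empty ones.
    return [buckets[i] for i in range(divisor)]


# Hypothetical small dataset: integer keys 1..12 with placeholder values.
pairs = [(k, f"v{k}") for k in range(1, 13)]
parts = partition_pairs(pairs, 3)
sizes = [len(p) for p in parts]
```

With consecutive integer keys the buckets come out evenly sized, which is the property a partitioner needs to balance work across tasks. In PySpark, a function of this kind could presumably be supplied as the partitionFunc argument of RDD.partitionBy(numPartitions, partitionFunc) in place of the default hash function; whether the paper does this is not stated in the abstract.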

Cite

Sreeyuktha, H. S., & Geetha Reddy, J. (2019). Partitioning in Apache Spark. In Lecture Notes in Networks and Systems (Vol. 74, pp. 493–498). Springer. https://doi.org/10.1007/978-981-13-7082-3_56
