Apache Spark performs in-memory computation on Resilient Distributed Datasets (RDDs), its core data structure. RDDs are partitioned by Spark's built-in hash and range partitioners. We propose a partitioning scheme that assigns elements to partitions by modular division of their keys, using divisors from 2 to 10. The scheme targets smaller datasets, where it improves execution time.
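A minimal Scala sketch of such a modulo-based partitioner, built on Spark's custom Partitioner API; the class name ModuloPartitioner, the divisor parameter, and the hashCode fallback for non-integer keys are illustrative assumptions, not the authors' implementation.

```scala
import org.apache.spark.Partitioner

// Sketch of a modulo partitioner: each element is routed to the
// partition given by (key mod divisor). Names are hypothetical.
class ModuloPartitioner(divisor: Int) extends Partitioner {
  // The paper considers divisors from 2 to 10 (assumption: enforced here).
  require(divisor >= 2 && divisor <= 10, "divisor assumed to lie in [2, 10]")

  override def numPartitions: Int = divisor

  override def getPartition(key: Any): Int = key match {
    // Normalize so negative keys still map to a valid partition index.
    case k: Int => ((k % divisor) + divisor) % divisor
    // Assumption: non-integer keys fall back to their hash code.
    case other  => ((other.hashCode % divisor) + divisor) % divisor
  }
}
```

A pair RDD could then be repartitioned with, for example, pairs.partitionBy(new ModuloPartitioner(4)), where pairs is assumed to be an RDD of key-value tuples.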