Apache Spark performs in-memory computation on Resilient Distributed Datasets (RDDs), its core data structure. RDDs are partitioned by Spark's built-in hash and range partitioners. We propose a partitioning scheme that assigns elements to partitions by modular division of their keys, using divisors from 2 to 10. The scheme targets smaller datasets, where it improves execution time.
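A minimal Scala sketch of such a modulo-based partitioner, built on Spark's custom Partitioner API; the class name ModuloPartitioner, the divisor parameter, and the hashCode fallback for non-integer keys are illustrative assumptions, not the authors' implementation.

```scala
import org.apache.spark.Partitioner

// Sketch of a modulo partitioner: each element is routed to the
// partition given by (key mod divisor). Names are hypothetical.
class ModuloPartitioner(divisor: Int) extends Partitioner {
  // The paper considers divisors from 2 to 10 (assumption: enforced here).
  require(divisor >= 2 && divisor <= 10, "divisor assumed to lie in [2, 10]")

  override def numPartitions: Int = divisor

  override def getPartition(key: Any): Int = key match {
    // Normalize so negative keys still map to a valid partition index.
    case k: Int => ((k % divisor) + divisor) % divisor
    // Assumption: non-integer keys fall back to their hash code.
    case other  => ((other.hashCode % divisor) + divisor) % divisor
  }
}
```

A pair RDD could then be repartitioned with, for example, pairs.partitionBy(new ModuloPartitioner(4)), where pairs is assumed to be an RDD of key-value tuples.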