Cost-effective data partition for distributed stream processing system

Xiaotong Wang; Junhua Fang; Yuming Li; Rong Zhang; Aoying Zhou

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10178 LNCS 623-635

DOI: 10.1007/978-3-319-55699-4_39

Abstract

Data skew and workload dynamics greatly affect the throughput of stream processing systems. An efficient partitioning method is therefore required to distribute the workload evenly in a distributed and parallel manner. Previous research mainly focuses on load balancing adjustment based on key-as-granularity or tuple-as-granularity, each of which has its own limitations, such as clumsy balancing actions or expensive network cost. In this paper, we present a comprehensive cost model for partitioning methods, which jointly estimates memory, CPU, and network resource utilization. Based on this cost model, we propose a novel load balancing adjustment algorithm, which adopts the idea of “Split keys on demand and Merge keys as far as possible” and adapts to differently skewed workloads. Our evaluation demonstrates that our method outperforms state-of-the-art partitioning schemes while maintaining high throughput and resource utilization.

Cite

Wang, X., Fang, J., Li, Y., Zhang, R., & Zhou, A. (2017). Cost-effective data partition for distributed stream processing system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10178 LNCS, pp. 623–635). Springer Verlag. https://doi.org/10.1007/978-3-319-55699-4_39
