To process large-scale real-time data streams, existing distributed stream processing systems (DSPSs) leverage different stream partitioning strategies. The one-to-many data partitioning strategy plays an important role in various applications. With one-to-many data partitioning, an upstream processing instance sends a generated tuple to a potentially large number of downstream processing instances. Existing DSPSs leverage an instance-oriented communication mechanism, where an upstream instance transmits a tuple to different downstream instances separately. However, in one-to-many data partitioning, multiple downstream instances typically run on the same machine to exploit multi-core resources. Therefore, a DSPS actually sends a data item to a machine multiple times, incurring significant unnecessary costs for serialization and communication. We show that such a mechanism can lead to a serious performance bottleneck due to CPU overload. To address the problem, we design and implement Whale, an efficient RDMA (Remote Direct Memory Access) assisted distributed stream processing system. Two factors contribute to the efficiency of this design. First, we propose a novel RDMA-assisted stream multicast scheme with a self-adjusting non-blocking tree structure to alleviate the CPU workload of an upstream instance during one-to-many data partitioning. Second, we redesign the communication mechanism in existing DSPSs by replacing the instance-oriented communication with a new worker-oriented communication scheme, which saves significant costs for redundant serialization and communication. We implement Whale on top of Apache Storm and conduct comprehensive experiments to evaluate its performance with large-scale real-world datasets. The results show that Whale achieves a 56.6× improvement in system throughput and a 97% reduction in processing latency compared to existing designs.
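The difference between instance-oriented and worker-oriented communication can be illustrated with a minimal sketch. It is not Whale's or Apache Storm's implementation: the `placement` map, the function names, and the use of `pickle` as a stand-in serializer are all assumptions made for illustration. It only counts how many times a tuple is serialized and how many messages leave the upstream instance under each scheme.

```python
import pickle
from collections import defaultdict


def instance_oriented_send(tuple_, placement):
    """Serialize and emit the tuple once per downstream instance.

    `placement` (hypothetical) maps downstream instance id -> worker name.
    Returns a list of (worker, [instance_ids], payload) messages.
    """
    messages = []
    for instance_id, worker in placement.items():
        payload = pickle.dumps(tuple_)  # one serialization per instance
        messages.append((worker, [instance_id], payload))
    return messages


def worker_oriented_send(tuple_, placement):
    """Serialize once and emit one message per worker.

    Each message carries the ids of all target instances on that worker,
    so the receiving worker can dispatch the tuple locally.
    """
    by_worker = defaultdict(list)
    for instance_id, worker in placement.items():
        by_worker[worker].append(instance_id)
    payload = pickle.dumps(tuple_)  # single serialization, reused for all workers
    return [(worker, ids, payload) for worker, ids in by_worker.items()]


# Assumed layout: six downstream instances spread over two workers.
placement = {0: "worker-a", 1: "worker-a", 2: "worker-a",
             3: "worker-b", 4: "worker-b", 5: "worker-b"}
data = ("sensor-17", 42.0)

per_instance = instance_oriented_send(data, placement)
per_worker = worker_oriented_send(data, placement)
print(len(per_instance), "messages with instance-oriented sending")
print(len(per_worker), "messages with worker-oriented sending")
```

In this assumed layout the upstream instance emits six messages under the instance-oriented scheme and two under the worker-oriented one, while the same six downstream instances are still reached.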
Tan, J., Chen, H., Wang, Y., & Jin, H. (2021). Whale: Efficient one-to-many data partitioning in RDMA-assisted distributed stream processing systems. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. IEEE Computer Society. https://doi.org/10.1145/3458817.3476192