Stragglers are widely believed to have a large impact on the performance of big data analytics systems, owing to causes such as poorly performing compute nodes and data skew. Accordingly, stragglers are regarded as a major bottleneck in MapReduce-style processing frameworks. However, existing studies on straggler identification target coarse-grained detection, scheduler-level optimization, and offline log-based cause analysis. Accurately identifying stragglers in time for each job is extremely hard because (1) stragglers in data analytics frameworks have many root causes; (2) many key parameters affect straggler identification; and (3) cluster configurations, and their impact on straggler detection, vary across job types and sizes. Existing solutions either adopt a "tweak-and-pray" manual tuning approach, which is complex, time-consuming, and error-prone, or focus only on coarse-grained straggler detection. In this paper, we systematically explore the fundamental problem of automatic, adaptive straggler identification on big data analytics platforms. Inspired by the recent success of Reinforcement Learning (RL) techniques in solving complex online optimization problems, we investigate whether RL can be used to adaptively select the optimal parameters for identifying stragglers without human intervention. Specifically, we propose Hawkeye, a general adaptive speculative execution system that identifies stragglers using reinforcement learning and launches speculative tasks on heterogeneous clusters at runtime. Experimental results show that Hawkeye reduces job completion time across different types of applications. For example, it achieves a reduction of up to nearly 37% in average job completion time, together with a 23% improvement in accuracy over existing solutions on a heterogeneous cluster.
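To make the idea of learning a straggler-detection parameter online more concrete, the following is a minimal sketch and not Hawkeye's actual algorithm, which the abstract does not specify. It assumes a single tunable parameter (a slowdown threshold relative to the median task duration) and uses a simple epsilon-greedy bandit as a stand-in for the RL component. The names `flag_stragglers` and `EpsilonGreedyThreshold`, the candidate thresholds, and the reward signal are all hypothetical.

```python
import random
from statistics import median


def flag_stragglers(durations, threshold):
    """Return indices of tasks whose duration exceeds threshold * median.

    Hypothetical detection rule, assumed for illustration only.
    """
    med = median(durations)
    return [i for i, d in enumerate(durations) if d > threshold * med]


class EpsilonGreedyThreshold:
    """Epsilon-greedy bandit over candidate thresholds (illustrative stand-in
    for an RL agent; not the method described in the paper)."""

    def __init__(self, thresholds, epsilon=0.1, seed=0):
        self.thresholds = list(thresholds)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.thresholds)
        self.values = [0.0] * len(self.thresholds)  # running mean reward per arm

    def select(self):
        """Pick an arm index: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.thresholds))
        return max(range(len(self.thresholds)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        """Incrementally update the mean reward estimate of the chosen arm."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Example: one detection step with a fixed threshold.
durations = [10, 11, 9, 30, 10]          # made-up task durations in seconds
stragglers = flag_stragglers(durations, 1.5)
```

In such a setup, the reward passed to `update` could be, for instance, the negative job completion time observed after speculating on the flagged tasks; that choice is likewise an assumption rather than something stated in the source.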
CITATION STYLE
Du, H., & Zhang, S. (2020). Hawkeye: Adaptive straggler identification on heterogeneous spark cluster with reinforcement learning. IEEE Access, 8, 57822–57832. https://doi.org/10.1109/ACCESS.2020.2982320