Rapidly generated data and the sheer volume of data-analytic jobs place great pressure on the underlying computing facilities. A distributed multi-cluster computing environment such as a hybrid cloud consequently becomes necessary, owing to its ability to incorporate geographically distributed and potentially cloud-based computing resources. The clusters forming such an environment can be heterogeneous and may be resource-elastic as well. From the analytics perspective, driven by the growing demand for streaming applications and timely results, many data-analytic jobs today are time-critical in the sense of having temporal urgency, and the overall workload of the computing environment can be hybrid, containing both time-critical and general applications. All of this calls for an efficient resource management approach capable of capturing the features of both the computing environment and the applications. However, the compounded complexity and high dynamics of such a system greatly hinder the performance of traditional rule-based approaches. In this work, we propose to use deep reinforcement learning to develop elasticity-compatible resource management for a heterogeneous distributed computing environment, aiming for fewer missed temporal deadlines while maintaining a low average execution time ratio. Alongside reinforcement learning, we design a deep model that employs a Long Short-Term Memory (LSTM) structure and partial model sharing as a multi-target learning mechanism. The experimental results show that the proposed approach greatly outperforms the baselines and serves as a robust resource manager for varying workloads.
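The abstract does not give implementation details, but the general pattern it names, an LSTM trunk whose parameters are partially shared across multiple learning targets, can be illustrated with a minimal PyTorch sketch. Everything here (module names, dimensions, the one-head-per-target split) is an assumption for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch of an LSTM-based policy with partial model sharing:
# a shared LSTM encodes the sequence of observed system states, and each
# learning target (e.g., each cluster or job class) has its own output head.
# All sizes and the two-head split below are illustrative assumptions.
import torch
import torch.nn as nn

class SharedLSTMPolicy(nn.Module):
    def __init__(self, state_dim, hidden_dim, num_actions, num_targets):
        super().__init__()
        # Shared trunk: encodes the history of environment states.
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        # Partial model sharing: per-target heads reuse the shared trunk.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, num_actions) for _ in range(num_targets)
        )

    def forward(self, states, target_idx):
        # states: (batch, seq_len, state_dim) sequence of system observations
        out, _ = self.lstm(states)
        last = out[:, -1, :]                   # latest state representation
        logits = self.heads[target_idx](last)  # target-specific head
        return torch.softmax(logits, dim=-1)   # action probabilities

# Usage with made-up dimensions:
policy = SharedLSTMPolicy(state_dim=16, hidden_dim=64, num_actions=8, num_targets=2)
batch = torch.randn(4, 10, 16)       # 4 trajectories, 10 timesteps each
probs = policy(batch, target_idx=0)  # scheduling-action distribution for target 0
```

The design intuition is that the shared LSTM learns temporal dynamics common to all targets, while the lightweight heads specialize per target, which is one common way to realize multi-target learning with partial parameter sharing.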
Liu, Z., Wang, L., & Quan, G. (2020). Deep Reinforcement Learning based Elasticity-compatible Heterogeneous Resource Management for Time-critical Computing. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3404397.3404475