To share limited, large-capacity resources, high-performance computing centers provide services by allocating available resources to jobs through batch job schedulers. When resources are insufficient, jobs therefore wait in a queue until resources become available. Predicting this queue waiting time is very useful for improving overall resource utilization. However, queue waiting time is difficult to predict because it is strongly affected by many factors, such as the scheduling algorithm in use and the characteristics of the submitted jobs. In this study, a method of predicting queue waiting time using only the historical log data produced by the batch job scheduler is examined. Specifically, a prediction method based on a hidden Markov model is proposed, consisting of three stages. First, outliers are removed with an outlier detection algorithm that uses a statistics-based parametric method. Second, the hidden-state parameters are estimated from the observed queue waiting time sequence in the historical job log. Third, the queue waiting interval at time t+1 is predicted using the parameters estimated at time t. Comparing prediction accuracy with that of other prediction methods, experimental results show that the proposed algorithm improves prediction accuracy by up to 60%.
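The three-stage pipeline described above can be sketched in code. This is a simplified illustration only, not the paper's method: a z-score filter stands in for the statistics-based parametric outlier detector, and a plain first-order Markov transition matrix over discretized waiting-time intervals stands in for the full hidden Markov model. All waiting times, thresholds, and interval edges below are hypothetical.

```python
import numpy as np

def remove_outliers(waits, z_thresh=2.0):
    # Stage 1 (sketch): drop waits more than z_thresh standard
    # deviations from the mean. Threshold is hypothetical.
    mu, sigma = waits.mean(), waits.std()
    return waits[np.abs(waits - mu) <= z_thresh * sigma]

def fit_transitions(states, n_states):
    # Stage 2 (sketch): estimate transition probabilities between
    # waiting-time intervals from the observed sequence,
    # with add-one smoothing so every transition has mass.
    counts = np.ones((n_states, n_states))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def predict_next_interval(trans, current_state):
    # Stage 3 (sketch): most likely waiting interval at time t+1
    # given the interval observed at time t.
    return int(np.argmax(trans[current_state]))

# Hypothetical historical queue waits (seconds) from a scheduler log;
# the 3000 s entry plays the role of an outlier.
waits = np.array([120, 150, 90, 3000, 110, 130, 95, 140, 100, 125], float)
clean = remove_outliers(waits)
bins = np.array([0, 100, 200, 400])      # hypothetical interval edges
states = np.digitize(clean, bins) - 1    # map waits to interval indices
trans = fit_transitions(states, n_states=len(bins))
nxt = predict_next_interval(trans, states[-1])
```

A full implementation would replace the transition matrix with an HMM whose hidden states are estimated from the observed waiting-time sequence (e.g. via Baum-Welch), but the pipeline shape — filter, estimate, predict the next interval — is the same.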
Park, J. W., Kwon, M. W., & Hong, T. (2022). Queue congestion prediction for large-scale high performance computing systems using a hidden Markov model. Journal of Supercomputing, 78(10), 12202–12223. https://doi.org/10.1007/s11227-022-04356-z