HSM: A hybrid slowdown model for multitasking GPUs

Xia Zhao; Magnus Jahre; Lieven Eeckhout

Conference Proceedings, Open Access

International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (2020) 1371-1385

DOI: 10.1145/3373376.3378457

30 Citations

42 Readers

Abstract

Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways, leading to suboptimal resource utilization when a single GPU is used to run a single application. One solution is to use the GPU in a multitasking fashion to improve utilization. Unfortunately, multitasking leads to destructive interference between co-running applications, which causes fairness issues and Quality-of-Service (QoS) violations. We propose the Hybrid Slowdown Model (HSM) to dynamically and accurately predict application slowdown due to interference. With a hybrid approach, HSM overcomes both the low accuracy of prior white-box models and the training and implementation overheads of pure black-box models. More specifically, the white-box component of HSM builds upon the fundamental insight that effective bandwidth utilization is proportional to DRAM row buffer hit rate, and the black-box component of HSM uses linear regression to relate row buffer hit rate to performance. HSM accurately predicts application slowdown with an average error of 6.8%, a significant improvement over the current state-of-the-art. In addition, we use HSM to guide various resource management schemes in multitasking GPUs: HSM-Fair significantly improves fairness (by 1.59× on average) compared to even partitioning, whereas HSM-QoS improves system throughput (by 18.9% on average) compared to proportional SM partitioning while maintaining the QoS target for the high-priority application in challenging mixed memory/compute-bound multi-program workloads.
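
To make the black-box idea in the abstract concrete, the following is a minimal sketch of fitting a linear regression from row buffer hit rate to performance and then using it to estimate slowdown. It is not the authors' implementation. The sample values, the function names (`fit_line`, `predict_performance`, `estimate_slowdown`), and the definition of slowdown as predicted isolated performance divided by measured shared performance are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_performance(slope, intercept, hit_rate):
    """Predict performance from a row buffer hit rate using the fitted line."""
    return slope * hit_rate + intercept


def estimate_slowdown(predicted_isolated_perf, measured_shared_perf):
    """Assumed definition: isolated performance over shared performance."""
    return predicted_isolated_perf / measured_shared_perf


# Hypothetical training samples (made up, not from the paper):
# row buffer hit rate and an arbitrary performance metric.
hit_rates = [0.2, 0.4, 0.6, 0.8]
performance = [0.9, 1.3, 1.7, 2.1]

slope, intercept = fit_line(hit_rates, performance)

# Hypothetical co-run scenario: predict the performance the application
# would reach in isolation, then compare with what it achieves when sharing.
isolated = predict_performance(slope, intercept, hit_rate=0.7)
slowdown = estimate_slowdown(isolated, measured_shared_perf=1.0)
print(f"slope={slope:.3f} intercept={intercept:.3f} slowdown={slowdown:.3f}")
```

A single-variable linear fit is cheap to train and evaluate, which is consistent with the abstract's stated goal of avoiding the overheads of pure black-box models; how HSM actually obtains its inputs and combines the white-box and black-box components is described in the full paper, not here.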

Cite (APA)

Zhao, X., Jahre, M., & Eeckhout, L. (2020). HSM: A hybrid slowdown model for multitasking GPUs. In International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (pp. 1371–1385). Association for Computing Machinery. https://doi.org/10.1145/3373376.3378457
