Improving address translation in multi-GPUs via sharing and spilling aware TLB design

Abstract

In recent years, ever-growing application complexity and input dataset sizes have driven the popularity of multi-GPU systems as a computing platform for many application domains. While employing multiple GPUs intuitively exposes substantial parallelism for application acceleration, the delivered performance rarely scales with the number of GPUs. One of the major obstacles is address translation efficiency. Many prior works focus on CPUs or single-GPU execution scenarios, while address translation in multi-GPU systems has received little attention. In this paper, we conduct a comprehensive investigation of address translation efficiency in both the "single-application-multi-GPU" and "multi-application-multi-GPU" execution paradigms. Based on our observations, we propose a new TLB hierarchy design, called least-TLB, that is tailored for multi-GPU systems and effectively improves TLB performance with minimal hardware overhead. Experimental results on 9 single-application workloads and 10 multi-application workloads indicate that the proposed least-TLB improves performance, on average, by 23.5% and 16.3%, respectively.

Li, B., Yin, J., Zhang, Y., & Tang, X. (2021). Improving address translation in multi-GPUs via sharing and spilling aware TLB design. In Proceedings of the Annual International Symposium on Microarchitecture, MICRO (pp. 1154–1168). IEEE Computer Society. https://doi.org/10.1145/3466752.3480083
