Abstract
Efficient memory sharing among multiple compute engines plays an important role in shaping the overall application performance on CPU-GPU heterogeneous platforms. Unified Virtual Memory (UVM) is a promising feature that allows globally-visible data structures and pointers such that the GPU can access the physical memory space on the CPU side, and take advantage of the host OS paging mechanism without explicit programmer effort. However, a key requirement for the guaranteed performance is effective hardware support of address translation. Particularly, we observe that GPU execution suffers from high TLB miss rates in a UVM environment, especially for irregular and/or memory-intensive applications. In this paper, we propose simple yet effective compression mechanisms for address translations to improve GPU TLB hit rates. Specifically, we explore and leverage the TLB compressibility during the execution of GPU applications to design efficient address translation compression with minimal runtime overhead. Experimental results across 22 applications indicate that our proposed approach significantly improves GPU TLB hit rates, which translate to 12% average performance improvement. Particularly, for 16 irregular and/or memory-intensive applications, the performance improvements achieved reach up to 69.2%, with an average of 16.3%.
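The abstract does not spell out the compression mechanisms themselves. As a rough illustration of what "TLB compressibility" can mean, the sketch below assumes one simple scheme: translations that fall in the same aligned group of virtual pages and share the same virtual-to-physical offset are packed into a single entry. The function names, the `group_size` parameter, and the example page table are hypothetical and chosen for illustration only; this is not the authors' design.

```python
def compress_translations(page_table, group_size=4):
    """Pack translations into entries keyed by (aligned VPN group, PPN - VPN offset).

    Assumption for illustration: pages in the same aligned group that share
    one offset can live in one compressed entry. This is not the paper's scheme.
    """
    entries = {}
    for vpn in sorted(page_table):
        ppn = page_table[vpn]
        key = (vpn // group_size, ppn - vpn)  # group tag plus shared offset
        entries.setdefault(key, set()).add(vpn)  # set of VPNs valid in this entry
    return entries


def lookup(entries, vpn, group_size=4):
    """Return the physical page number for vpn, or None on a miss."""
    for (group, offset), valid_vpns in entries.items():
        if group == vpn // group_size and vpn in valid_vpns:
            return vpn + offset
    return None


# Hypothetical page table: VPNs 0-3 map to contiguous physical pages,
# while VPNs 4 and 5 map to scattered physical pages.
example_table = {0: 100, 1: 101, 2: 102, 3: 103, 4: 200, 5: 300}
compressed = compress_translations(example_table)

print(len(example_table), "translations stored in", len(compressed), "entries")
```

In this toy example the four contiguous mappings collapse into one entry, while the two scattered mappings each still need their own, which is the intuition behind compressibility raising the effective reach of a fixed-size TLB.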
Cite
Tang, X., Zhang, Z., Xu, W., Kandemir, M. T., Melhem, R., & Yang, J. (2020). Enhancing Address translations in throughput processors via compression. In Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT (pp. 191–204). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3410463.3414633