Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems


Abstract

We propose a new data structure called CachedEmbeddings for efficiently training large-scale deep learning recommendation models (DLRMs) on heterogeneous (DRAM + non-volatile) memory platforms. CachedEmbeddings implements an implicit software-managed cache and data-movement optimization, integrated with the Julia programming framework, to optimize large-scale DLRM implementations with multiple sparse embedding table operations. In particular, we show an implementation that is 1.4X to 2X faster than the best known Intel CPU based implementations on state-of-the-art DLRM benchmarks on a real heterogeneous memory platform from Intel, and 1.32X to 1.45X faster than Intel's 2LM mode, which treats DRAM as a hardware-managed cache.
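To make the core idea concrete, here is a minimal illustrative sketch (not the authors' Julia implementation; all names and the round-robin eviction policy are assumptions) of a software-managed cache for embedding rows: every row lives in a large slow backing store standing in for non-volatile memory, while recently used rows are copied into a small fast tier standing in for DRAM, with dirty victims written back on eviction.

```python
import random

class CachedEmbedding:
    """Toy software-managed cache for embedding rows: hot rows live in a
    small 'fast' tier (modeling DRAM); all rows live in a large 'slow'
    backing store (modeling non-volatile memory). Illustrative only."""

    def __init__(self, num_rows, dim, cache_rows):
        self.dim = dim
        self.backing = [[random.random() for _ in range(dim)]
                        for _ in range(num_rows)]        # slow tier (all rows)
        self.cache = [None] * cache_rows                 # fast tier (hot rows)
        self.slot_of = {}                                # row index -> cache slot
        self.row_in = [None] * cache_rows                # cache slot -> row index
        self.clock = 0                                   # round-robin eviction pointer

    def lookup(self, row):
        slot = self.slot_of.get(row)
        if slot is None:                                 # cache miss
            slot, self.clock = self.clock, (self.clock + 1) % len(self.cache)
            victim = self.row_in[slot]
            if victim is not None:                       # write back the evicted row
                self.backing[victim] = self.cache[slot]
                del self.slot_of[victim]
            self.cache[slot] = list(self.backing[row])   # fetch into the fast tier
            self.slot_of[row] = slot
            self.row_in[slot] = row
        return self.cache[slot]

    def embed_bag(self, indices):
        # Sparse embedding-bag reduction: elementwise sum of the looked-up rows,
        # the dominant access pattern in DLRM sparse features.
        out = [0.0] * self.dim
        for i in indices:
            row = self.lookup(i)
            for d in range(self.dim):
                out[d] += row[d]
        return out
```

The point of the sketch is that cache management is explicit in software, so the training framework can batch fetches and schedule write-backs around the embedding-table access pattern instead of relying on a hardware-managed DRAM cache (as Intel's 2LM mode does).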

Citation (APA)

Hildebrand, M., Lowe-Power, J., & Akella, V. (2023). Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13948 LNCS, pp. 42–61). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-32041-5_3
