Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems


Abstract

We propose a new data structure called CachedEmbeddings for efficiently training large-scale deep learning recommendation models (DLRMs) on heterogeneous (DRAM + non-volatile) memory platforms. CachedEmbeddings implements an implicit software-managed cache and data-movement optimization, integrated with the Julia programming framework, to optimize large-scale DLRM implementations with multiple sparse embedding table operations. In particular, we show an implementation that is 1.4X to 2X faster than the best known Intel CPU based implementations on state-of-the-art DLRM benchmarks on a real heterogeneous memory platform from Intel, and 1.32X to 1.45X faster than Intel's 2LM mode, which treats DRAM as a hardware-managed cache.
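To make the core idea concrete, here is a minimal illustrative sketch (not the authors' Julia implementation; all names and the round-robin eviction policy are assumptions) of a software-managed cache for embedding rows: every row lives in a large slow backing store standing in for non-volatile memory, while recently used rows are copied into a small fast tier standing in for DRAM, with dirty victims written back on eviction.

```python
import random

class CachedEmbedding:
    """Toy software-managed cache for embedding rows: hot rows live in a
    small 'fast' tier (modeling DRAM); all rows live in a large 'slow'
    backing store (modeling non-volatile memory). Illustrative only."""

    def __init__(self, num_rows, dim, cache_rows):
        self.dim = dim
        self.backing = [[random.random() for _ in range(dim)]
                        for _ in range(num_rows)]        # slow tier (all rows)
        self.cache = [None] * cache_rows                 # fast tier (hot rows)
        self.slot_of = {}                                # row index -> cache slot
        self.row_in = [None] * cache_rows                # cache slot -> row index
        self.clock = 0                                   # round-robin eviction pointer

    def lookup(self, row):
        slot = self.slot_of.get(row)
        if slot is None:                                 # cache miss
            slot, self.clock = self.clock, (self.clock + 1) % len(self.cache)
            victim = self.row_in[slot]
            if victim is not None:                       # write back the evicted row
                self.backing[victim] = self.cache[slot]
                del self.slot_of[victim]
            self.cache[slot] = list(self.backing[row])   # fetch into the fast tier
            self.slot_of[row] = slot
            self.row_in[slot] = row
        return self.cache[slot]

    def embed_bag(self, indices):
        # Sparse embedding-bag reduction: elementwise sum of the looked-up rows,
        # the dominant access pattern in DLRM sparse features.
        out = [0.0] * self.dim
        for i in indices:
            row = self.lookup(i)
            for d in range(self.dim):
                out[d] += row[d]
        return out
```

The point of the sketch is that cache management is explicit in software, so the training framework can batch fetches and schedule write-backs around the embedding-table access pattern instead of relying on a hardware-managed DRAM cache (as Intel's 2LM mode does).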

Citation (APA)

Hildebrand, M., Lowe-Power, J., & Akella, V. (2023). Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13948 LNCS, pp. 42–61). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-32041-5_3
