Abstract
On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing it to remote servers. However, a purely on-device solution to private ML inference is impractical for many applications that rely on embedding tables too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables, each on the order of 1-10 GB, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. Because off-the-shelf PIR algorithms are usually too computationally intensive to use directly for latency-sensitive inference tasks, we (1) propose novel GPU-based acceleration of PIR, and (2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20× over an optimized CPU PIR implementation, and our PIR-ML co-design provides an additional over 5× throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second, a more than 100× throughput improvement over a CPU-based baseline, while maintaining model accuracy.
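To give a concrete picture of the data flow described above, the sketch below is a plaintext analogue of a PIR-based embedding lookup. It is not the paper's implementation; the function names (`build_query`, `answer_query`) and table sizes are illustrative. In many single-server, lattice-based PIR schemes the client's query behaves like an encrypted one-hot selector, and the server answers with a matrix-vector product over the entire table. A real PIR system would encrypt the selector and evaluate that product homomorphically over the full multi-GB table, which is why off-the-shelf PIR is computationally heavy for latency-sensitive inference.

```python
# Conceptual sketch (assumed, not the paper's scheme): a plaintext analogue of
# PIR-based embedding retrieval. In real PIR, `selector` would be encrypted and
# the server's matrix-vector product would be evaluated homomorphically, so the
# server learns nothing about which row was requested.
import numpy as np

def build_query(num_rows: int, index: int) -> np.ndarray:
    """Client side: a one-hot selector over table rows (encrypted in real PIR)."""
    selector = np.zeros(num_rows, dtype=np.float32)
    selector[index] = 1.0
    return selector

def answer_query(embedding_table: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Server side: one matrix-vector product that touches every row of the table."""
    return embedding_table.T @ query

# Example with a small stand-in for a multi-GB recommendation embedding table.
table = np.random.randn(1_000, 64).astype(np.float32)   # rows x embedding_dim
query = build_query(table.shape[0], index=42)
embedding = answer_query(table, query)
assert np.allclose(embedding, table[42])
```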
Citation
Lam, M., Johnson, J., Xiong, W., Maeng, K., Gupta, U., Li, Y., … Suh, E. (2024). GPU-based Private Information Retrieval for On-Device Machine Learning Inference. In International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (Vol. 1, pp. 197–214). Association for Computing Machinery. https://doi.org/10.1145/3617232.3624855