Exploiting Parallelism Opportunities with Deep Learning Frameworks

Abstract

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This article takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights are distilled into a simple set of guidelines that one can use to achieve much higher training and inference speedup. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel- and TensorFlow-recommended settings by 1.30× and 1.38×, respectively.
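
The tuning guidelines concern how the framework's parallelism knobs are set relative to the hardware. As a rough illustration only (the specific thread counts below are placeholder assumptions, not the values the paper derives), the following Python sketch shows where these knobs live in TensorFlow: the intra-op and inter-op thread pools, plus the Intel OpenMP environment variables that the Intel-recommended settings also target.

    # Minimal sketch of the parallelism knobs discussed above; the values
    # used here are assumptions, not the paper's recommended settings.
    import os

    physical_cores = (os.cpu_count() or 2) // 2 or 1  # assumes 2-way SMT; adjust per machine

    # Intel OpenMP / oneDNN settings are read when the runtime initializes,
    # so set them before TensorFlow is imported.
    os.environ["OMP_NUM_THREADS"] = str(physical_cores)
    os.environ["KMP_BLOCKTIME"] = "1"  # ms a thread spins before sleeping
    os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

    import tensorflow as tf

    # Intra-op pool: threads that parallelize a single operator (e.g., one matmul).
    tf.config.threading.set_intra_op_parallelism_threads(physical_cores)

    # Inter-op pool: threads that run independent operators concurrently.
    tf.config.threading.set_inter_op_parallelism_threads(2)

These calls must run before any operator executes; the right pool sizes depend on the model's operator mix and the machine, which is exactly the profiling burden the article's guidelines aim to remove.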

Cite (APA)

Wang, Y. E., Wu, C. J., Wang, X., Hazelwood, K., & Brooks, D. (2021). Exploiting Parallelism Opportunities with Deep Learning Frameworks. ACM Transactions on Architecture and Code Optimization, 18(1). https://doi.org/10.1145/3431388
