Exploiting Parallelism Opportunities with Deep Learning Frameworks

Abstract

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This article takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights are distilled into a simple set of guidelines that one can use to achieve much higher training and inference speedup. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel- and TensorFlow-recommended settings by 1.30× and 1.38×, respectively.
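
The tuning guidelines concern how the framework's parallelism knobs are set relative to the hardware. As a rough illustration only (the specific thread counts below are placeholder assumptions, not the values the paper derives), the following Python sketch shows where these knobs live in TensorFlow: the intra-op and inter-op thread pools, plus the Intel OpenMP environment variables that the Intel-recommended settings also target.

    # Minimal sketch of the parallelism knobs discussed above; the values
    # used here are assumptions, not the paper's recommended settings.
    import os

    physical_cores = (os.cpu_count() or 2) // 2 or 1  # assumes 2-way SMT; adjust per machine

    # Intel OpenMP / oneDNN settings are read when the runtime initializes,
    # so set them before TensorFlow is imported.
    os.environ["OMP_NUM_THREADS"] = str(physical_cores)
    os.environ["KMP_BLOCKTIME"] = "1"  # ms a thread spins before sleeping
    os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

    import tensorflow as tf

    # Intra-op pool: threads that parallelize a single operator (e.g., one matmul).
    tf.config.threading.set_intra_op_parallelism_threads(physical_cores)

    # Inter-op pool: threads that run independent operators concurrently.
    tf.config.threading.set_inter_op_parallelism_threads(2)

These calls must run before any operator executes; the right pool sizes depend on the model's operator mix and the machine, which is exactly the profiling burden the article's guidelines aim to remove.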

Cite (APA)

Wang, Y. E., Wu, C. J., Wang, X., Hazelwood, K., & Brooks, D. (2021). Exploiting Parallelism Opportunities with Deep Learning Frameworks. ACM Transactions on Architecture and Code Optimization, 18(1). https://doi.org/10.1145/3431388
