Empirical performance analysis of collective communication for distributed deep learning in a many-core CPU environment

Abstract

To accommodate large volumes of training data and complex training models, "distributed" deep learning training has become increasingly common. However, communication bottlenecks between distributed systems lead to poor performance of distributed deep learning training. In this study, we propose a new collective communication method for a Python environment that utilizes Multi-Channel Dynamic Random Access Memory (MCDRAM) in Intel Xeon Phi Knights Landing processors. Major deep learning software platforms, such as TensorFlow and PyTorch, offer Python as their main development language, so we developed an efficient communication library by adapting the Memkind library, a C-based library for utilizing the high-performance MCDRAM memory. For performance evaluation, we tested the collective communication methods popular in distributed deep learning, such as Broadcast, Gather, and AllReduce. We conducted experiments to analyze the effect of high-performance memory and processor location on communication performance. In addition, we analyzed performance in a Docker environment, which is relevant given the recent major trend toward cloud computing. Through extensive experiments on our testbed, we confirmed that our proposed communication method improves performance by up to 487%.
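The paper's library adapts the C-based Memkind library for use from Python; as a rough illustration only (not the authors' code), the following C sketch shows how memkind can place an MPI AllReduce buffer in Knights Landing MCDRAM. The choice of the MEMKIND_HBW_PREFERRED kind, the buffer size, and the build command are assumptions made for this example.

    /*
     * Minimal sketch (assumed, not the authors' library): allocate MPI
     * AllReduce buffers in MCDRAM via the memkind C API, falling back
     * to ordinary DDR if high-bandwidth memory is unavailable.
     *
     * Example build (assumed flags): mpicc allreduce_hbw.c -lmemkind -o allreduce_hbw
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    #include <memkind.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const size_t count = 1 << 20;              /* 1M doubles per rank (illustrative size) */
        const size_t bytes = count * sizeof(double);

        /* MEMKIND_HBW_PREFERRED places the buffer in MCDRAM when possible
           and silently falls back to DDR otherwise. */
        double *sendbuf = memkind_malloc(MEMKIND_HBW_PREFERRED, bytes);
        double *recvbuf = memkind_malloc(MEMKIND_HBW_PREFERRED, bytes);
        if (!sendbuf || !recvbuf) {
            fprintf(stderr, "rank %d: allocation failed\n", rank);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        for (size_t i = 0; i < count; i++)
            sendbuf[i] = (double)rank;             /* dummy per-rank "gradient" data */

        /* Sum the per-rank buffers, as a gradient AllReduce would. */
        MPI_Allreduce(sendbuf, recvbuf, (int)count, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("recvbuf[0] = %f\n", recvbuf[0]);

        memkind_free(MEMKIND_HBW_PREFERRED, sendbuf);
        memkind_free(MEMKIND_HBW_PREFERRED, recvbuf);
        MPI_Finalize();
        return 0;
    }

The same allocation idea can be exposed to Python (as the paper does for TensorFlow and PyTorch workloads) by wrapping memkind calls behind a buffer-allocation interface; the sketch above only shows the underlying C-level mechanism.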

Citation (APA)
Woo, J., Choi, H., & Lee, J. (2020). Empirical performance analysis of collective communication for distributed deep learning in a many-core CPU environment. Applied Sciences (Switzerland), 10(19). https://doi.org/10.3390/APP10196717
