DISTRIBUTED HIGH-PERFORMANCE COMPUTING METHODS FOR ACCELERATING DEEP LEARNING TRAINING

  • Wang S
  • Zheng H
  • Wen X
  • et al.

Abstract

This paper comprehensively analyzes distributed high-performance computing methods for accelerating deep learning training. We trace the evolution of distributed computing architectures, including data parallelism, model parallelism, pipeline parallelism, and their hybrid implementations. The study examines optimization techniques crucial for large-scale training, such as distributed optimization algorithms, gradient compression, and adaptive learning rate methods. We investigate communication-efficient algorithms, including Ring All-Reduce variants and decentralized training approaches, which address scalability challenges in distributed systems. We also examine hardware acceleration and specialized systems, focusing on GPU clusters, custom AI accelerators, high-performance interconnects, and distributed storage systems optimized for deep learning workloads. Finally, we discuss the field's challenges and future directions, including scalability-efficiency trade-offs, fault tolerance, energy efficiency in large-scale training, and emerging trends such as federated learning and neuromorphic computing. Our findings highlight the synergy between advanced algorithms, specialized hardware, and optimized system designs in advancing large-scale deep learning.

Cite

Wang, S., Zheng, H., Wen, X., & Fu, S. (2024). DISTRIBUTED HIGH-PERFORMANCE COMPUTING METHODS FOR ACCELERATING DEEP LEARNING TRAINING. Journal of Knowledge Learning and Science Technology ISSN: 2959-6386 (Online), 3(3), 108–126. https://doi.org/10.60087/jklst.v3.n4.p22
