Abstract
Deep learning training drives a wide range of machine learning applications on high performance computing systems, including object detection and activity detection in computer vision, natural language processing, anomaly detection, speech recognition, super-resolution, and realistic image creation from text. Deep learning models are usually developed and trained inside a machine learning framework such as PyTorch or TensorFlow, and training may require tens of thousands of epochs running across multiple devices for hundreds of hours before the model is adequate for use in inference. This chapter explores deep learning in the context of high performance computing and introduces the different modalities for using high performance computing systems to train a machine learning model concurrently.
Cite
Sterling, T., Anderson, M., & Brodowicz, M. (2024). Machine Learning. In High Performance Computing: Modern Systems and Practices (pp. 383–393). Elsevier. https://doi.org/10.1016/B978-0-12-823035-0.00019-5