Angel: A new large-scale machine learning system


Abstract

Machine learning (ML) techniques are now ubiquitous tools for extracting structural information from data collections. As data volumes grow, large-scale ML applications require efficient implementations to accelerate performance. Existing systems parallelize algorithms through either data parallelism or model parallelism, but data parallelism suffers poor statistical efficiency due to conflicting parameter updates, while global barriers damage the performance of model-parallel methods. In this paper, we propose a new system, named Angel, to facilitate the development of large-scale ML applications in production environments. By allowing concurrent model updates across different groups and scheduling the updates within each group, Angel achieves a good balance between hardware efficiency and statistical efficiency. In addition, Angel reduces network latency by overlapping parameter pulling with update computation, and it exploits the sparseness of data to avoid pulling unnecessary parameters. We also enhance Angel's usability by providing a set of efficient tools to integrate with application pipelines and by provisioning efficient fault-tolerance mechanisms. We conduct extensive experiments to demonstrate the superiority of Angel.
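The sparsity-aware pulling mentioned above can be illustrated with a minimal sketch. This is not Angel's actual API; the function and data layout here are hypothetical, chosen only to show the idea: a worker requests from the parameter server just the model entries whose feature indices occur in its sparse mini-batch, rather than the full model.

```python
# Illustrative sketch (hypothetical names, not Angel's real interface):
# exploit data sparsity by pulling only the parameters that the
# current mini-batch actually touches.

def sparse_pull(parameter_server: dict, batch_rows: list) -> dict:
    """Pull only the parameters for nonzero features in the batch."""
    needed = set()
    for row in batch_rows:            # each row: list of (feature_index, value)
        needed.update(idx for idx, _ in row)
    return {idx: parameter_server[idx] for idx in needed}

# Toy example: a 6-parameter model, a sparse mini-batch touching 3 features.
server = {i: 0.1 * i for i in range(6)}
batch = [[(0, 1.0), (3, 2.0)], [(3, 0.5), (5, 1.5)]]
pulled = sparse_pull(server, batch)
print(sorted(pulled))  # only indices 0, 3 and 5 are transferred
```

With highly sparse inputs (common in production ML workloads), this reduces network traffic roughly in proportion to the fraction of features absent from the batch.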

Citation (APA)

Jiang, J., Yu, L., Jiang, J., Liu, Y., & Cui, B. (2018). Angel: A new large-scale machine learning system. National Science Review, 5(2), 216–236. https://doi.org/10.1093/nsr/nwx018
