Deep learning (DL) has become an integral part of solutions to various important problems, which is why ensuring the quality of DL systems is essential. One of the challenges in achieving reliable and robust DL software is ensuring that algorithm implementations are numerically stable. DL algorithms require a large number and a wide variety of numerical computations. A naive implementation of a numerical computation can introduce errors that lead to incorrect or inaccurate learning and results. A numerical algorithm or a mathematical formula can have several implementations that are mathematically equivalent but have different numerical stability properties. Designing numerically stable algorithm implementations is challenging because it requires interdisciplinary knowledge of software engineering, DL, and numerical analysis. In this paper, we study two mature DL libraries, PyTorch and TensorFlow, with the goal of identifying unstable numerical methods and their solutions. Specifically, we investigate which DL algorithms are numerically unstable and conduct an in-depth analysis of the root causes, manifestations, and patches of numerical instabilities. Based on these findings, we launch DeepStability, the first database of numerical stability issues and solutions in DL. Our findings and DeepStability provide a reference for developers and tool builders to prevent, detect, localize, and fix numerically unstable algorithm implementations. To demonstrate this, we used DeepStability to locate numerical stability issues in TensorFlow and submitted a fix, which has been accepted and merged.
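As an illustration of the claim that mathematically equivalent implementations can differ in numerical stability, the following minimal sketch compares two ways of computing log-sum-exp, a quantity that appears in softmax and cross-entropy computations. This is a standard textbook case chosen for illustration; it is not taken from the paper or from the DeepStability database, and the function names `logsumexp_naive` and `logsumexp_stable` are hypothetical.

```python
import math


def logsumexp_naive(xs):
    # Direct translation of log(sum(exp(x))); math.exp overflows for large x.
    return math.log(sum(math.exp(x) for x in xs))


def logsumexp_stable(xs):
    # Mathematically equivalent form: subtract the maximum first so that the
    # largest exponent evaluated is exp(0) = 1. Assumes xs is non-empty and
    # contains only finite values.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


xs = [1000.0, 1000.0]

try:
    naive = logsumexp_naive(xs)
except OverflowError:
    # math.exp(1000.0) exceeds the range of a double-precision float.
    naive = None

stable = logsumexp_stable(xs)  # equals 1000 + log(2) up to rounding
```

For moderate inputs such as `[1.0, 2.0]` both functions agree to within rounding error; they diverge only when the intermediate `exp` leaves the representable range, which is why this kind of defect can go unnoticed in ordinary testing.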
CITATION
Kloberdanz, E., Kloberdanz, K. G., & Le, W. (2022). DeepStability: A Study of Unstable Numerical Methods and Their Solutions in Deep Learning. In Proceedings - International Conference on Software Engineering (Vol. 2022-May, pp. 586–597). IEEE Computer Society. https://doi.org/10.1145/3510003.3510095