Algorithms for Efficient Reproducible Floating Point Summation

10Citations
Citations of this article
12Readers
Mendeley users who have this article in their library.

Abstract

We define "reproducibility"as getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should not affect the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociative floating point addition, makes reproducibility challenging even for summation, or operations like the BLAS. We describe a "reproducible accumulator"data structure (the "binned number") and associated algorithms to reproducibly sum binary floating point numbers, independent of summation order. We use a subset of the IEEE Floating Point Standard 754-2008 and bitwise operations on the standard representations in memory. Our approach requires only one read-only pass over the data, and one reduction in parallel, using a 6-word reproducible accumulator (more words can be used for higher accuracy), enabling standard tiling optimization techniques. Summing n words with a 6-word reproducible accumulator requires approximately 9n floating point operations (arithmetic, comparison, and absolute value) and approximately 3n bitwise operations. The final error bound with a 6-word reproducible accumulator and our default settings can be up to 229 times smaller than the error bound for conventional (recursive) summation on ill-conditioned double-precision inputs.

Cite

CITATION STYLE

APA

Ahrens, P., Demmel, J., & Nguyen, H. D. (2020). Algorithms for Efficient Reproducible Floating Point Summation. ACM Transactions on Mathematical Software, 46(3). https://doi.org/10.1145/3389360

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free