On Accurate Floating-Point Summation

Abstract

The accumulation of floating-point sums is considered on a computer which performs t-digit base-β floating-point addition with exponents in the range −m to M. An algorithm is given for accurately summing n t-digit floating-point numbers. Each of these n numbers is split into q parts, forming q·n t-digit floating-point numbers. Each of these is then added to the appropriate one of η auxiliary t-digit accumulators. Finally, the accumulators are added together to yield the computed sum. In all, q·n + η − 1 t-digit floating-point additions are performed. Let ν = ⌈(M + m + 1)/(η + 1)⌉. If n ≤ (1/q)·β^{⌈((q−1)/q)t⌉ − ν + 1} (*), then the relative error in the computed sum is at most ⌈(t + 1)/ν⌉·β^{1−t}. Further, with an additional q + η − 1 t-digit additions, the computed sum can be corrected to full t-digit accuracy. For example, for the IBM/360 (β = 16, t = 14, M = 63, m = 64), typical values for q and η are q = 2 and η = 32. In this case, (*) becomes n ≤ 1/2 × 16^4 = 32,768, and we have ⌈(t + 1)/ν⌉·β^{1−t} = 4 × 16^{−13}. © 1971, ACM. All rights reserved.
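The IBM/360 figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only a parameter calculator, not the summation algorithm itself. The names `ceil_div`, `malcolm_bounds`, and `addition_count` are hypothetical helpers introduced here for illustration. The rounding bracket in the exponent of (*) is assumed to be a ceiling; for the IBM/360 example the bracketed quantity is already an integer, so that assumption does not affect the numbers shown.

```python
from fractions import Fraction


def ceil_div(a, b):
    """Ceiling of a / b for positive integers, using exact integer arithmetic."""
    return -(-a // b)


def malcolm_bounds(beta, t, M, m, q, eta):
    """Evaluate the quantities named in the abstract (hypothetical helper).

    Returns nu, the largest n permitted by condition (*), and the integer
    factor multiplying beta**(1 - t) in the relative error bound.
    """
    # nu = ceil((M + m + 1) / (eta + 1))
    nu = ceil_div(M + m + 1, eta + 1)
    # Exponent in (*): ceil(((q - 1) / q) * t) - nu + 1.
    # Assumption: the bracket is a ceiling.
    exponent = ceil_div((q - 1) * t, q) - nu + 1
    # Fraction keeps the bound exact even if the exponent is negative.
    n_max = Fraction(beta) ** exponent / q
    # Relative error is at most error_factor * beta**(1 - t).
    error_factor = ceil_div(t + 1, nu)
    return {"nu": nu, "n_max": n_max, "error_factor": error_factor}


def addition_count(n, q, eta):
    """Number of t-digit additions stated in the abstract: q*n + eta - 1."""
    return q * n + eta - 1


# IBM/360 parameters quoted in the abstract.
ibm360 = malcolm_bounds(beta=16, t=14, M=63, m=64, q=2, eta=32)
print(ibm360["nu"])            # 4
print(ibm360["n_max"])         # 32768
print(ibm360["error_factor"])  # 4, i.e. the bound is 4 * 16**(-13)
print(addition_count(32768, q=2, eta=32))  # 65567
```

With these parameters ν = ⌈128/33⌉ = 4, which is the step that turns (*) into n ≤ 1/2 × 16^4 and gives the error factor ⌈15/4⌉ = 4.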

Cite

Malcolm, M. A. (1971). On Accurate Floating-Point Summation. Communications of the ACM, 14(11), 731–736. https://doi.org/10.1145/362854.362889
