The accumulation of floating-point sums is considered on a computer which performs $t$-digit base $\beta$ floating-point addition with exponents in the range $-m$ to $M$. An algorithm is given for accurately summing $n$ $t$-digit floating-point numbers. Each of these $n$ numbers is split into $q$ parts, forming $q \cdot n$ $t$-digit floating-point numbers. Each of these is then added to the appropriate one of $\eta$ auxiliary $t$-digit accumulators. Finally, the accumulators are added together to yield the computed sum. In all, $q \cdot n + \eta - 1$ $t$-digit floating-point additions are performed. Let $\nu = \lceil (M + m + 1)/(\eta + 1) \rceil$. If $n \le (1/q)\,\beta^{\lceil ((q-1)/q)\,t \rceil - \nu + 1}$ (*), then the relative error in the computed sum is at most $\lceil (t + 1)/\nu \rceil\,\beta^{1-t}$. Further, with an additional $q + \eta - 1$ $t$-digit additions, the computed sum can be corrected to full $t$-digit accuracy. For example, for the IBM/360 ($\beta = 16$, $t = 14$, $M = 63$, $m = 64$), typical values for $q$ and $\eta$ are $q = 2$ and $\eta = 32$. In this case, (*) becomes $n \le \tfrac{1}{2} \times 16^{4} = 32{,}768$, and we have $\lceil (t + 1)/\nu \rceil\,\beta^{1-t} = 4 \times 16^{-13}$. © 1971, ACM. All rights reserved.
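The IBM/360 figures quoted in the abstract can be checked with a few lines of exact arithmetic. The following is a minimal sketch, not code from the paper: the function name `malcolm_bounds` and the helper `ceil_div` are hypothetical, and the bracket around ((q-1)/q)·t in condition (*) is read as a ceiling, which is an assumption (for the parameters below the bracketed quantity is exactly 7, so the choice of rounding does not affect the result).

```python
from fractions import Fraction


def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive integers, using integer arithmetic only."""
    return -(-a // b)


def malcolm_bounds(beta: int, t: int, M: int, m: int, q: int, eta: int):
    """Evaluate the quantities stated in the abstract (hypothetical helper).

    Returns (nu, n_max, err_coeff, rel_err), where
      nu        = ceil((M + m + 1) / (eta + 1)),
      n_max     = (1/q) * beta**(ceil(((q-1)/q) * t) - nu + 1)   [condition (*)],
      err_coeff = ceil((t + 1) / nu),
      rel_err   = err_coeff * beta**(1 - t).
    The ceiling in the exponent of n_max is an assumed reading of the source.
    """
    nu = ceil_div(M + m + 1, eta + 1)
    exponent = ceil_div((q - 1) * t, q) - nu + 1
    n_max = Fraction(beta) ** exponent / q      # exact, also for negative exponents
    err_coeff = ceil_div(t + 1, nu)
    rel_err = err_coeff * Fraction(beta) ** (1 - t)
    return nu, n_max, err_coeff, rel_err


# IBM/360 parameters from the abstract: beta = 16, t = 14, M = 63, m = 64, q = 2, eta = 32.
nu, n_max, err_coeff, rel_err = malcolm_bounds(16, 14, 63, 64, 2, 32)
print("nu =", nu)                 # nu = 4
print("n_max =", n_max)           # n_max = 32768
print("error bound =", err_coeff, "* 16**-13")  # error bound = 4 * 16**-13
```

With these inputs the sketch reproduces the values in the abstract: ν = 4, n ≤ 32,768, and an error bound of 4 × 16^(-13). Exact `Fraction` arithmetic is used so that the check itself introduces no rounding.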
CITATION
Malcolm, M. A. (1971). On Accurate Floating-Point Summation. Communications of the ACM, 14(11), 731–736. https://doi.org/10.1145/362854.362889