This paper evaluates the performance of DSP, image-processing and 3D applications on recent general-purpose microprocessors that provide streaming SIMD ISA extensions (integer and floating point). The nine benchmarks used in this evaluation were optimized for data-level parallelism (DLP) and cache use by means of SIMD extensions and data prefetching. These cumulative optimizations yield speedups ranging from 1.9 to 7.1. All the benchmarks were originally computation bound, and seven of them become memory-bandwidth bound once SIMD and data prefetching are added. Quadrupling the memory bandwidth has no effect on the original kernels but improves the performance of the SIMD kernels by 15-55%.
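To make the two optimizations and the bandwidth argument concrete, here is a minimal sketch in C. It is not the authors' benchmark code. It assumes an x86 target with SSE (one family of streaming SIMD extensions) and uses a simple SAXPY loop as a stand-in kernel. The names `saxpy_scalar`, `saxpy_sse_prefetch`, `kernel_time` and the prefetch distance `PREFETCH_DISTANCE` are hypothetical. `kernel_time` is an assumed, deliberately simplified model in which run time is the larger of compute time and memory-transfer time. It illustrates why extra bandwidth only pays off once a kernel is memory bound.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_prefetch */

/* Hypothetical prefetch distance in floats; a tuning assumption, not from the paper. */
#define PREFETCH_DISTANCE 64

/* Scalar baseline: y[i] = a * x[i] + y[i], one element per iteration. */
void saxpy_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* SIMD version: processes 4 floats per instruction and issues software
   prefetches PREFETCH_DISTANCE elements ahead so that data is likely to be
   in cache by the time the loop reaches it. */
void saxpy_sse_prefetch(size_t n, float a, const float *x, float *y) {
    __m128 va = _mm_set1_ps(a); /* broadcast a into all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Only form prefetch addresses that stay inside the arrays. */
        if (i + PREFETCH_DISTANCE < n) {
            _mm_prefetch((const char *)(x + i + PREFETCH_DISTANCE), _MM_HINT_T0);
            _mm_prefetch((const char *)(y + i + PREFETCH_DISTANCE), _MM_HINT_T0);
        }
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i) /* scalar tail for the last n % 4 elements */
        y[i] = a * x[i] + y[i];
}

/* Assumed bound model: a kernel runs no faster than its compute time
   (flops / flop_rate) or its memory time (bytes / bandwidth), whichever
   is larger. */
double kernel_time(double flops, double flop_rate, double bytes, double bandwidth) {
    double t_compute = flops / flop_rate;
    double t_memory = bytes / bandwidth;
    return t_compute > t_memory ? t_compute : t_memory;
}
```

Under this model, a scalar kernel whose compute term dominates sees no change when `bandwidth` is multiplied by four. Once SIMD shrinks the compute term below the memory term, the same increase does shorten run time. This mirrors the qualitative finding above. It does not reproduce the paper's measured figures.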
Sebot, J., & Drach-Temam, N. (2001). Memory bandwidth: The true bottleneck of SIMD multimedia performance on a superscalar processor. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2150, pp. 439–447). Springer Verlag. https://doi.org/10.1007/3-540-44681-8_63