Parallel reduction is a common and important data-parallel primitive. It is easy to implement in CUDA but harder to get right, which makes it a great optimization example. We'll walk step by step through 7 different versions. The walkthrough demonstrates …
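As a rough illustration of the kind of kernel being optimized, here is a minimal sketch of a shared-memory tree reduction that sums integers. It uses "sequential addressing", one common variant, and is not presented as any specific version from the walkthrough. The names `reduceSum`, `gpuSum`, and `kBlockSize` are hypothetical. The sketch assumes a block size of 256 (a power of two), a non-empty input, and that the per-block partial sums are small enough to finish on the host. Error checking of CUDA calls is omitted for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;  // assumption: a power of two

// Each block sums up to kBlockSize elements of `in` and writes one partial sum.
__global__ void reduceSum(const int* in, int* partial, int n) {
    __shared__ int sdata[kBlockSize];
    unsigned tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = (i < n) ? in[i] : 0;  // pad out-of-range lanes with the identity (0)
    __syncthreads();
    // Sequential addressing: halve the active range each step and keep
    // active threads contiguous (thread tid adds element tid + s).
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();  // every thread must reach this barrier
    }
    if (tid == 0) partial[blockIdx.x] = sdata[0];
}

// Hypothetical host wrapper: one kernel launch, then a host-side final sum.
int gpuSum(const std::vector<int>& h) {
    int n = static_cast<int>(h.size());
    int blocks = (n + kBlockSize - 1) / kBlockSize;
    int* dIn = nullptr;
    int* dPartial = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dPartial, blocks * sizeof(int));
    cudaMemcpy(dIn, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    reduceSum<<<blocks, kBlockSize>>>(dIn, dPartial, n);
    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), dPartial, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);
    int total = 0;
    for (int p : partial) total += p;  // second level done on the host
    return total;
}

int main() {
    std::vector<int> data(1000);
    for (int k = 0; k < 1000; ++k) data[k] = k + 1;  // 1 + 2 + ... + 1000
    int sum = gpuSum(data);
    assert(sum == 500500);
    printf("%d\n", sum);
    return 0;
}
```

A design note: a naive version that indexes with `2 * s * tid` (interleaved addressing) leaves active threads scattered across warps, while sequential addressing keeps them packed at the low thread IDs. Further gains usually come from doing more work per thread and unrolling the final steps, which is the kind of progression a step-by-step optimization walkthrough explores.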
Harris, M., Blelloch, G. E., Maggs, B. M., Govindaraju, N. K., Lloyd, B., Wang, W., … Margolin, L. G. (2007). Optimizing parallel reduction in CUDA. Proc. of ACM SIGMOD, 21, 13, 104–110.