In this paper, we review several approaches to reducing the computational complexity of the Non-Local Means image filter and present a CUDA-based GPU implementation of it, comparing its performance across different GPUs and against reference CPU implementations. Starting from a naive CUDA implementation, we describe aspects of both CUDA and the algorithm itself that can be exploited to reduce execution time. Our GPU implementation achieved speedups of up to 35.8x over our reduced-complexity reference implementation on the CPU, and more than 700x over a plain CPU implementation. © Springer-Verlag 2013.
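For orientation, a plain Non-Local Means filter can be sketched as below. This is only an illustrative, unoptimized sketch of the textbook algorithm, not the authors' CPU or GPU code: the function name `nlm_naive`, its parameters (`search_radius`, `patch_radius`, `h`), the replicated-border policy, and the normalization of the patch distance by patch size are all assumptions made for the example. Each output pixel is a weighted average of the pixels in a search window, with weights that decay exponentially with the squared distance between the patches around the two pixels. The nested loops make the cost per pixel proportional to (search window size) x (patch size), which is the complexity that reduced-complexity variants and GPU parallelization aim to tame.

```python
import math


def nlm_naive(img, search_radius=2, patch_radius=1, h=10.0):
    """Unoptimized Non-Local Means on a 2D list of floats (illustrative only)."""
    rows, cols = len(img), len(img[0])

    def px(r, c):
        # Assumed boundary policy: replicate the nearest border pixel.
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    n_patch = (2 * patch_radius + 1) ** 2
    out = [[0.0] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            weight_sum = 0.0
            value_sum = 0.0
            # Visit every candidate pixel q in the search window around p = (r, c).
            for dr in range(-search_radius, search_radius + 1):
                for dc in range(-search_radius, search_radius + 1):
                    qr, qc = r + dr, c + dc
                    if not (0 <= qr < rows and 0 <= qc < cols):
                        continue  # skip candidates outside the image
                    # Squared distance between the patches centred at p and q.
                    dist = 0.0
                    for pr in range(-patch_radius, patch_radius + 1):
                        for pc in range(-patch_radius, patch_radius + 1):
                            d = px(r + pr, c + pc) - px(qr + pr, qc + pc)
                            dist += d * d
                    # Similar patches get weights near 1, dissimilar ones near 0.
                    w = math.exp(-(dist / n_patch) / (h * h))
                    weight_sum += w
                    value_sum += w * img[qr][qc]
            # weight_sum >= 1 because q = p always contributes weight 1.
            out[r][c] = value_sum / weight_sum
    return out
```

In a GPU port, the two outer loops over `r` and `c` are the natural candidates to map to one thread per output pixel, since each output value is computed independently of the others.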
CITATION STYLE
Márques, A., & Pardo, A. (2013). Implementation of non local means filter in GPUs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8258 LNCS, pp. 407–414). https://doi.org/10.1007/978-3-642-41822-8_51