Implementation of non local means filter in GPUs

14Citations
Citations of this article
6Readers
Mendeley users who have this article in their library.

Abstract

In this paper, we review some alternatives to reduce the computational complexity of the Non-Local Means image filter and present a CUDA-based implementation of it for GPUs, comparing its performance on different GPUs and with respect to reference CPU implementations. Starting from a naive CUDA implementation, we describe different aspects of CUDA and the algorithm itself that can be leveraged to decrease the execution time. Our GPU implementation achieved speedups of up to 35.8x with respect to our reduced-complexity reference implementation on the CPU, and more than 700x over a plain CPU implementation. © Springer-Verlag 2013.

Cite

CITATION STYLE

APA

Márques, A., & Pardo, A. (2013). Implementation of non local means filter in GPUs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8258 LNCS, pp. 407–414). https://doi.org/10.1007/978-3-642-41822-8_51

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free