Efficient GPU implementation of the integral histogram

Abstract

The integral histogram for images is an efficient preprocessing method for speeding up diverse computer vision algorithms, including object detection, appearance-based tracking, recognition, and segmentation. Our proposed Graphics Processing Unit (GPU) implementation uses parallel prefix sums on row and column histograms in a cross-weave scan, with high GPU utilization and communication-aware data transfer between CPU and GPU memories. Two data structures and communication models were evaluated: a 3-D array that stores binned histograms for each pixel, and an equivalent linearized 1-D array, each with distinctive data movement patterns. Using the 3-D array with many kernel invocations and low workload per kernel was inefficient, highlighting the need for careful mapping of sequential algorithms onto the GPU. The reorganized 1-D array, with a single data transfer to the GPU and high GPU utilization, was 60 times faster than the CPU version for a 1K x 1K image, reaching 49 frames/s, and 21 times faster for 512 x 512 images, reaching 194 frames/s. The integral histogram module is applied as part of the Likelihood of Features Tracking (LOFT) system for video object tracking using fusion of multiple cues. © 2013 Springer-Verlag.
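To make the cross-weave idea concrete, here is a minimal sequential sketch in pure Python. It is not the authors' GPU code. It assumes a grayscale image given as a list of rows, uniform bins over `[0, max_value)`, and a zero-padded layout; the function names `integral_histogram` and `region_histogram` are illustrative only. For each bin, a horizontal prefix sum along every row is followed by a vertical prefix sum down every column, after which the histogram of any rectangle needs only four lookups per bin.

```python
from itertools import accumulate


def integral_histogram(image, bins, max_value=256):
    """Return H where H[b][y][x] counts bin-b pixels in rows < y and columns < x.

    Assumption: pixel values are integers in [0, max_value) and bins are uniform.
    """
    h = len(image)
    w = len(image[0])
    H = []
    for b in range(bins):
        # Binned plane: 1 where the pixel falls in bin b, else 0.
        plane = [[1 if (v * bins) // max_value == b else 0 for v in row]
                 for row in image]
        # Horizontal pass: inclusive prefix sum along each row.
        plane = [list(accumulate(row)) for row in plane]
        # Vertical pass: add the already-summed row above to each row.
        for y in range(1, h):
            plane[y] = [above + cur for above, cur in zip(plane[y - 1], plane[y])]
        # Pad with a zero row and column so region queries need no edge cases.
        H.append([[0] * (w + 1)] + [[0] + row for row in plane])
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1 (four lookups per bin)."""
    return [p[bottom][right] - p[top][right] - p[bottom][left] + p[top][left]
            for p in H]


img = [[0, 64, 128, 255],
       [10, 70, 200, 130],
       [65, 66, 5, 250]]
H = integral_histogram(img, bins=4)
print(region_histogram(H, 1, 1, 3, 3))  # pixels 70, 200, 66, 5 -> [1, 2, 0, 1]
```

On a GPU, the independent row scans (and then column scans) within each bin plane are what would be executed in parallel; the nested lists here stand in for whatever array layout an implementation chooses.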

Cite

Poostchi, M., Palaniappan, K., Bunyak, F., Becchi, M., & Seetharaman, G. (2013). Efficient GPU implementation of the integral histogram. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7728 LNCS, pp. 266–278). https://doi.org/10.1007/978-3-642-37410-4_23
