Motion estimation in an H.264/AVC encoder is computationally intensive. This paper presents a parallel block-matching algorithm implemented on a general-purpose graphics processing unit (GPU) to speed up UAV video coding. Traditional parallel block-matching algorithms primarily leverage the large number of computational cores in a GPU, assigning the block-matching operation at each candidate position in the search range to an independent kernel thread. In realistic scenarios, however, the time spent transferring pixel values among the various memory modules of a GPU system is much longer than the time the kernel threads spend computing each block-matching operation. Data transfer therefore becomes the bottleneck that limits further performance gains in GPU algorithm design. The proposed algorithm exploits the differing data-transfer speeds of the GPU's distinct memory modules and provides a feasible mechanism for reducing the data-transfer bandwidth required by parallel block-matching algorithms on a GPU system. In experiments on GPU systems, the proposed parallel block-matching algorithm reduces the execution time of motion estimation by up to 99% compared with motion estimation performed on the host processor alone.
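As an illustration of the block-matching operation the abstract refers to, the following is a minimal sequential sketch of full-search block matching using the sum of absolute differences (SAD) as the matching cost. It is not the authors' GPU implementation: the function names `sad` and `full_search`, the choice of SAD as the cost, and the list-of-lists frame layout are assumptions made for illustration. Each iteration of the inner search loop evaluates one candidate position independently of the others, which is the work a traditional GPU version would assign to one kernel thread.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by)
    in the current frame and the block displaced by (dx, dy) in the
    reference frame. Frames are lists of rows of integer pixel values."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Evaluate every candidate displacement within +/- search_range and
    return (best_cost, dx, dy), or None if no candidate fits in the frame.
    Candidates are independent of each other, so a parallel version could
    compute them concurrently and then reduce to the minimum cost."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates whose block would fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every candidate reads an overlapping window of the same reference pixels, so the number of pixel reads far exceeds the number of distinct pixels involved. That redundancy is one plausible reason why memory traffic, rather than arithmetic, dominates the running time in the setting the abstract describes.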
Lin, Y. C., & Wu, S. C. (2017). An accelerated H.264/AVC encoder on graphic processing unit for UAV videos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10149 LNCS, pp. 251–258). Springer Verlag. https://doi.org/10.1007/978-3-319-54609-4_19