Abstract
Video super-resolution (VSR) aims to recover high-resolution frames from their low-resolution counterparts. Over the past few years, deep neural networks have dominated the video super-resolution task because of their strong non-linear representational ability. To exploit temporal correlations, most deep neural networks face two challenges: (1) how to align consecutive frames containing motion, occlusion and blur, and establish accurate temporal correspondences; (2) how to effectively fuse the aligned frames and balance their contributions. In this work, a novel video super-resolution network, named NLVSR, is proposed to solve the above problems in an efficient and effective manner. For alignment, a temporal-spatial non-local operation is employed to align each frame to the reference frame. Compared with existing alignment approaches, the proposed temporal-spatial non-local operation integrates the global information of each frame through a weighted sum, leading to better alignment performance. For fusion, an attention-based progressive fusion framework is designed to integrate the aligned frames gradually. To penalize low-quality points in the aligned features, an attention mechanism is employed for robust reconstruction. Experimental results demonstrate the superiority of the proposed network in both quantitative and qualitative evaluation; it surpasses other state-of-the-art methods by at least 0.33 dB.
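The core idea of the non-local alignment described above can be sketched as a softmax-weighted sum over all positions of a neighbouring frame's features. This is a minimal illustrative sketch only: the function name, shapes, and dot-product similarity are assumptions, and the paper's actual operation additionally spans the temporal axis and uses learned embeddings.

```python
import numpy as np

def nonlocal_align(ref, neighbor):
    """Align `neighbor` features to `ref` via a non-local weighted sum.

    ref, neighbor: (N, C) arrays of N spatial positions with C channels.
    Each output position is a softmax-weighted sum over ALL neighbor
    positions, so global information is integrated -- unlike purely
    local flow- or kernel-based alignment. (Illustrative sketch, not
    the paper's exact operator.)
    """
    # Scaled dot-product similarity between every ref/neighbor position pair.
    sim = ref @ neighbor.T / np.sqrt(ref.shape[1])  # (N, N)
    sim -= sim.max(axis=1, keepdims=True)           # numerical stability
    w = np.exp(sim)
    w /= w.sum(axis=1, keepdims=True)               # row-wise softmax weights
    return w @ neighbor                             # weighted sum of neighbor features

# Toy example: 4 spatial positions, 8 feature channels.
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 8))
nbr = rng.standard_normal((4, 8))
aligned = nonlocal_align(ref, nbr)
print(aligned.shape)  # (4, 8)
```

Because every output position attends to all neighbour positions, the weighted sum can recover correspondences under large motion that a fixed local search window would miss.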
Zhou, C., Chen, C., Ding, F., & Zhang, D. (2021). Video super-resolution with non-local alignment network. IET Image Processing, 15(8), 1655–1667. https://doi.org/10.1049/ipr2.12134