Hierarchical feature fusion attention network for image super-resolution reconstruction

Pengcheng Lei; Cong Liu; Jiangang Tang; Dunlu Peng

Journal Article, Open Access

Journal of Image and Graphics (2020) 25(9) 1773-1786

DOI: 10.11834/jig.190607

Abstract

Objective: Single-image super-resolution (SISR) techniques aim to reconstruct a high-resolution image from a single low-resolution image. Because high-resolution images contain substantial useful information, SISR technology has been widely used in medical imaging, face authentication, public relations, security monitoring, and other tasks. With the rapid development of deep learning, convolutional neural network (CNN)-based methods have achieved remarkable success in SISR. From the super-resolution CNN (SRCNN) to the residual channel attention network (RCAN), both the depth and the performance of these networks have improved considerably. However, several problems remain. 1) Increasing the depth of a network can effectively improve reconstruction performance, but it also increases the computational complexity of the network and degrades real-time performance. 2) An image contains a large amount of high- and low-frequency information. Areas with high-frequency information should be treated as more important than areas with low-frequency information; however, most recent CNN-based methods treat the two equally and thus lack flexibility. 3) Feature maps at different depths carry receptive field information at different scales. Integrating these feature maps can enhance the information flow between convolution layers, yet most current CNN-based methods consider only feature maps at a single scale. To solve these problems, we propose a lightweight hierarchical feature fusion spatial attention network that learns additional useful high-frequency information.

Method: The proposed network is mainly composed of four parts: shallow feature extraction, hierarchical feature fusion, up-sampling, and reconstruction. In the shallow feature extraction part, a convolution layer is used to extract shallow features and expand the number of channels.
The hierarchical feature fusion part comprises nine residual attention blocks, evenly divided into three residual attention groups of three blocks each. Feature maps at different depths are fused by local and global feature fusion strategies. The local feature fusion strategy fuses the feature maps produced by the three residual attention blocks within each residual attention group, whereas the global feature fusion strategy fuses the feature maps produced by the three residual attention groups. Together, the two strategies integrate feature maps at different scales to enhance the information flow across different depths of the network. This study focuses on the residual attention block, which is composed of a residual block module and a spatial attention module. In each residual attention block, two 3 × 3 convolution layers first extract several feature maps, and a spatial attention module then assigns different weights to different spatial positions for different feature maps. The core problem is how to obtain an appropriate set of weights. According to our analysis, pooling along the channel axis can effectively highlight the importance of areas with high-frequency information. Hence, we first apply average and maximum pooling along the channel axis to generate two representative feature descriptors. Afterward, a 5 × 5 and a 1 × 1 convolution layer fuse the information at each position with that of its neighboring positions. The spatial attention value of each position is finally obtained with a sigmoid function. The third part is the up-sampling part, which uses subpixel convolution to upsample the low-resolution (LR) feature maps and obtain a large-scale feature map.
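As a rough illustration of the spatial attention step described above, the following NumPy sketch pools along the channel axis, applies a 5 × 5 and then a 1 × 1 convolution, and passes the result through a sigmoid. It is not the authors' implementation. The abstract does not specify how the two descriptors are combined, how many intermediate channels are used, whether a nonlinearity sits between the two convolutions, or how the attention values are applied, so the concatenation, the 4 hidden channels, the single attention map shared by all channels, the random weights, and the names `conv2d_same` and `spatial_attention` are all assumptions made for illustration.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive zero-padded 'same' convolution (cross-correlation, no bias).

    x: (C_in, H, W) feature maps; w: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            window = xp[:, i:i + h, j:j + wd]          # (C_in, H, W)
            # Contract over input channels for this kernel offset.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], window)
    return out


def spatial_attention(feat, w5, w1):
    """Hypothetical spatial attention over feat of shape (C, H, W).

    w5: (hidden, 2, 5, 5) weights; w1: (1, hidden, 1, 1) weights.
    Returns the reweighted features and the (1, H, W) attention map.
    """
    avg_desc = feat.mean(axis=0, keepdims=True)        # average pool over channels
    max_desc = feat.max(axis=0, keepdims=True)         # max pool over channels
    # Assumption: the two descriptors are stacked as a 2-channel input.
    desc = np.concatenate([avg_desc, max_desc], axis=0)
    fused = conv2d_same(desc, w5)                      # 5x5: mix with neighbors
    logits = conv2d_same(fused, w1)                    # 1x1: collapse to one map
    attn = 1.0 / (1.0 + np.exp(-logits))               # sigmoid -> values in (0, 1)
    # Assumption: one map is broadcast across all channels.
    return feat * attn, attn


rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 48, 48))               # 64 channels is an assumption
w5 = 0.1 * rng.standard_normal((4, 2, 5, 5))           # random stand-in weights
w1 = 0.1 * rng.standard_normal((1, 4, 1, 1))
out, attn = spatial_attention(feat, w5, w1)
print(out.shape, attn.shape)
```

In a trained network the two weight tensors would be learned; with random weights the sketch only demonstrates the data flow and the shapes involved.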
Lastly, in the reconstruction part, the number of channels is compressed to the target number by a 3 × 3 convolution layer, which yields the reconstructed high-resolution image. During the training stage, the DIVerse 2K (DIV2K) dataset is used to train the proposed network, and 32,000 image patches of 48 × 48 pixels are obtained as LR images by random cropping. L1 loss is used as the loss function and is minimized with the Adam algorithm.

Result: We compare our network with several existing methods, namely bicubic interpolation, SRCNN, very deep super-resolution convolutional networks (VDSR), deep recursive residual networks (DRRN), residual dense networks (RDN), and RCAN. Five datasets (Set5, Set14, Berkeley segmentation dataset 100 (BSD100), Urban100, and Manga109) are used as test sets to show the performance of the proposed method. Two indices, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), are used to evaluate the reconstruction results of the proposed method and of the methods used for comparison. The average PSNR and SSIM values of the different methods are reported on the five test datasets for different scale factors. Four test images at different scales illustrate the reconstruction results of the different methods. In addition, the proposed method is compared with enhanced deep residual networks (EDSR) in terms of convergence curves. Experiments show that the proposed method recovers more detailed information and clearer edges than most of the compared methods.

Conclusion: We propose a hierarchical feature fusion attention network in this study. With the help of the spatial attention module and the hierarchical feature fusion structure, this network can quickly recover high-frequency details and thus produces reconstructed results with more detailed texture.
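For readers unfamiliar with the training loss and the PSNR index mentioned above, both can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' evaluation code: the abstract does not state the pixel range, the colour channel, or any border cropping used for evaluation, so the 8-bit peak value of 255 and the function names `l1_loss` and `psnr` are assumptions.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error between two images of the same shape."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(np.abs(diff)))


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; max_val=255 assumes 8-bit images."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")                  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


# Toy example: a flat "ground truth" and a reconstruction that is off by 10.
hr = np.full((8, 8), 100, dtype=np.uint8)
sr = np.full((8, 8), 110, dtype=np.uint8)
print(l1_loss(sr, hr))
print(psnr(sr, hr))
```

Both functions cast to floating point before subtracting, which avoids wrap-around when the inputs are unsigned 8-bit arrays.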

Citation (APA)

Lei, P., Liu, C., Tang, J., & Peng, D. (2020). Hierarchical feature fusion attention network for image super-resolution reconstruction. Journal of Image and Graphics, 25(9), 1773–1786. https://doi.org/10.11834/jig.190607
