Recently, Transformer-based architectures have been introduced into the face super-resolution task because of their advantage in capturing long-range dependencies. However, these approaches tend to integrate global information over a large search region, which neglects the most relevant information and induces blurry effects from irrelevant textures. Some improved methods simply constrain self-attention to a local window to suppress useless information, but this also limits the capability of recovering high-frequency details when flat areas dominate the local search window. To address these issues, we propose a novel self-refinement mechanism that adaptively achieves texture-aware reconstruction in a coarse-to-fine procedure. Specifically, primary self-attention is first conducted to reconstruct coarse-grained textures and to detect the fine-grained regions that require further compensation. Then, region selection attention is performed to refine the textures in these key regions. Since self-attention treats the channel information of tokens equally, we employ a dual-branch feature integration module to emphasize the important channels during feature extraction. Furthermore, we design a wavelet fusion module that integrates shallow-layer structural features and deep-layer detailed features to recover realistic face images in the frequency domain. Extensive experiments on a variety of datasets demonstrate the effectiveness of our method. The code is released at https://github.com/Guanxin-Li/LAA-Transformer.
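To make the coarse-to-fine pipeline concrete, the sketch below illustrates one way the self-refinement mechanism could be organized in PyTorch: a primary self-attention pass produces a coarse reconstruction, a per-token score derived from the primary attention map flags regions that still need compensation, and a second, region-selection attention pass refines only those tokens. The module name, the entropy-based scoring rule, and the refine_ratio parameter are illustrative assumptions rather than the authors' exact design (see the released code for the reference implementation); the dual-branch feature integration and wavelet fusion modules are omitted.

```python
# A minimal sketch of the coarse-to-fine self-refinement idea, assuming an
# entropy-based token-selection rule; not the authors' exact implementation.
import torch
import torch.nn as nn


class SelfRefinementAttention(nn.Module):
    def __init__(self, dim, num_heads=4, refine_ratio=0.25):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.refine_ratio = refine_ratio           # fraction of tokens to refine (assumed)
        self.qkv_coarse = nn.Linear(dim, dim * 3)  # primary (coarse) attention
        self.qkv_fine = nn.Linear(dim, dim * 3)    # region-selection attention
        self.proj = nn.Linear(dim, dim)

    def _attend(self, x, qkv_layer):
        B, N, C = x.shape
        qkv = qkv_layer(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)        # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return out, attn

    def forward(self, x):
        B, N, C = x.shape
        # 1) Primary self-attention: coarse reconstruction over all tokens.
        coarse, attn = self._attend(x, self.qkv_coarse)
        # 2) Detect tokens that still need compensation. Here we assume tokens
        #    with a flat (high-entropy) attention distribution are poorly
        #    focused and should be sent to the refinement stage.
        entropy = -(attn * attn.clamp_min(1e-8).log()).sum(-1).mean(1)   # (B, N)
        k = max(1, int(N * self.refine_ratio))
        idx = entropy.topk(k, dim=-1).indices                            # (B, k)
        # 3) Region-selection attention: refine only the selected tokens and
        #    write them back into the coarse reconstruction.
        gather_idx = idx.unsqueeze(-1).expand(-1, -1, C)
        selected = torch.gather(coarse, 1, gather_idx)                   # (B, k, C)
        refined, _ = self._attend(selected, self.qkv_fine)
        out = coarse.scatter(1, gather_idx, refined)
        return self.proj(out)
```

As a usage example, for a 64x64 feature map flattened to tokens, SelfRefinementAttention(dim=64)(torch.randn(1, 4096, 64)) returns a tensor of the same shape, with the quarter of tokens that received the flattest primary attention refined by the second pass.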
CITATION STYLE
Li, G., Shi, J., Zong, Y., Wang, F., Wang, T., & Gong, Y. (2023). Learning Attention from Attention: Efficient Self-Refinement Transformer for Face Super-Resolution. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2023-August, pp. 1035–1043). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2023/115