Conv-Former: A Novel Network Combining Convolution and Self-Attention for Image Quality Assessment

Abstract

To address the challenge of no-reference image quality assessment (NR-IQA) for both authentically and synthetically distorted images, we propose Conv-Former, a novel network that combines convolution and self-attention for image quality assessment. Our model adopts a multi-stage transformer architecture, similar in layout to ResNet-50, to reflect the perceptual mechanisms relevant to image quality assessment (IQA) and thereby build an accurate IQA model. We employ an adaptive, learnable position embedding so the network can handle images of arbitrary resolution. We propose a new transformer block (TB) that exploits self-attention to capture long-range dependencies and local information perception (LIP) to model local features, yielding enhanced representation learning; this module deepens the model's understanding of image content. Dual path pooling (DPP) retains more contextual quality information during feature downsampling. Experimental results verify that Conv-Former not only outperforms state-of-the-art methods on authentic image databases but also achieves competitive performance on synthetic image databases, demonstrating the strong fitting ability and generalization capability of the proposed model.
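The abstract names three architectural ideas: an adaptive learnable position embedding, a transformer block (TB) pairing self-attention with local information perception (LIP), and dual path pooling (DPP). The paper's actual implementation is not reproduced here; the PyTorch sketch below is only one plausible reading of those ideas. Every name (resize_pos_embed, LocalInformationPerception, TransformerBlock, DualPathPooling), as well as the fusion-by-addition and pool-concat-fuse choices, is an assumption, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def resize_pos_embed(pos, h, w):
    """Hypothetical helper: bilinearly resample a learned positional grid
    so the model accepts inputs of arbitrary resolution."""
    b, n, c = pos.shape
    side = int(n ** 0.5)  # assumes the embedding was learned on a square grid
    grid = pos.transpose(1, 2).reshape(b, c, side, side)
    grid = F.interpolate(grid, size=(h, w), mode="bilinear", align_corners=False)
    return grid.flatten(2).transpose(1, 2)


class LocalInformationPerception(nn.Module):
    """Assumed LIP branch: a depthwise 3x3 convolution that models the
    neighborhood structure global attention tends to miss."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x, h, w):
        b, n, c = x.shape
        feat = x.transpose(1, 2).reshape(b, c, h, w)          # tokens -> feature map
        return self.dwconv(feat).flatten(2).transpose(1, 2)   # back to tokens


class TransformerBlock(nn.Module):
    """Sketch of a TB: self-attention captures long-range dependencies,
    the LIP branch adds local features, and the two paths are summed."""
    def __init__(self, dim, num_heads=8, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.lip = LocalInformationPerception(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x, h, w):
        y = self.norm1(x)
        attn_out, _ = self.attn(y, y, y, need_weights=False)
        x = x + attn_out + self.lip(y, h, w)   # fuse global and local paths
        return x + self.mlp(self.norm2(x))


class DualPathPooling(nn.Module):
    """Sketch of DPP: average and max pooling run in parallel and are
    fused, retaining more context than a single pooling path."""
    def __init__(self, dim):
        super().__init__()
        self.avg = nn.AvgPool2d(kernel_size=2)
        self.max = nn.MaxPool2d(kernel_size=2)
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x):  # x: (B, C, H, W)
        return self.fuse(torch.cat([self.avg(x), self.max(x)], dim=1))


# Toy forward pass: a 14x14 feature map with 64 channels as tokens.
tokens = torch.randn(1, 14 * 14, 64)
pos = torch.zeros(1, 7 * 7, 64)                          # grid learned at 7x7
tokens = tokens + resize_pos_embed(pos, 14, 14)          # adapted to 14x14
tokens = TransformerBlock(dim=64)(tokens, h=14, w=14)    # (1, 196, 64)
fmap = tokens.transpose(1, 2).reshape(1, 64, 14, 14)
fmap = DualPathPooling(dim=64)(fmap)                     # (1, 64, 7, 7)
```

The depthwise convolution in the local branch and the 1x1 fusion convolution in DPP are common, inexpensive ways to realize such designs; the actual Conv-Former blocks may differ in normalization placement, attention windowing, and fusion strategy.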

Cite (APA)

Han, L., Lv, H., Zhao, Y., Liu, H., Bi, G., Yin, Z., & Fang, Y. (2023). Conv-Former: A novel network combining convolution and self-attention for image quality assessment. Sensors, 23(1), 427. https://doi.org/10.3390/s23010427
