Hybrid Architecture Based on CNN and Transformer for Strip Steel Surface Defect Classification

68Citations
Citations of this article
46Readers
Mendeley users who have this article in their library.

Abstract

Strip steel surface defects occur frequently during the manufacturing process, and these defects cause hidden risks in the use of subsequent strip products. Therefore, it is crucial to classify the strip steel’s surface defects accurately and efficiently. Most classification models of strip steel surface defects are generally based on convolutional neural networks (CNNs). However, CNNs, with local receptive fields, do not have admirable global representation ability, resulting in poor classification performance. To this end, we proposed a hybrid network architecture (CNN-T), which merges CNN and Transformer encoder. The CNN-T network has both strong inductive biases (e.g., translation invariance, locality) and global modeling capability. Specifically, CNN first extracts low-level and local features from the images. The Transformer encoder then globally models these features, extracting abstract and high-level semantic information and finally sending them to the multilayer perceptron classifier for classification. Extensive experiments show that the classification performance of CNN-T outperforms pure Transformer networks and CNNs (e.g., GoogLeNet, MobileNet v2, ResNet18) on the NEU-CLS dataset (training ratio is 80%) with a 0.28–2.23% improvement in classification accuracy, with fewer parameters (0.45 M) and floating-point operations (0.12 G).

Cite

CITATION STYLE

APA

Li, S., Wu, C., & Xiong, N. (2022). Hybrid Architecture Based on CNN and Transformer for Strip Steel Surface Defect Classification. Electronics (Switzerland), 11(8). https://doi.org/10.3390/electronics11081200

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free