Images Classification Integrating Transformer with Convolutional Neural Network

Yulin Peng

Journal Article (Open Access)

Advances in Engineering Technology Research (2023) 6(1) 621

DOI: 10.56028/aetr.6.1.621.2023

Abstract

Convolutional neural networks (CNNs) are among the most widely used deep learning methods in computer vision. They extract local spatial information from images effectively, but they lack a global understanding of image features and do not model long-range dependencies between them. As a result, the network cannot fully exploit contextual information. For example, in coordinate-modelling tasks such as object detection and image generation, a CNN may fail to accurately locate or reconstruct the position and shape of objects. In contrast to traditional CNN models such as ResNet, Transformers rely on a global attention mechanism to capture long-range dependencies between image patches. This paper presents an enhanced lightweight method that integrates a Transformer with five convolutional layers. The combined CNN-Transformer model is evaluated on two benchmark datasets, MNIST and CIFAR-10. It converges within a few epochs and reaches accuracies of 99.34% on MNIST and 92.04% on CIFAR-10. The model outperforms a CNN alone and some state-of-the-art models on both datasets, especially in distinguishing visually similar classes such as ‘6’ and ‘9’ or ‘bird’ and ‘plane’. These results indicate that the model is robust and generalizes well.
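The abstract does not give the layer configuration, so the following is only a minimal NumPy sketch under stated assumptions, not the paper's model. It assumes five "same"-padded 3x3 convolutions with ReLU (channel widths chosen arbitrarily), followed by one single-head self-attention layer over 4x4 patches of the resulting feature map, mean pooling and a linear 10-class head, all with random, untrained weights. It omits the positional embeddings, multi-head attention, LayerNorm and MLP sublayer of a standard Transformer block. All function names (`receptive_field`, `conv3x3_relu`, `patchify`, `self_attention`, `hybrid_forward`) are hypothetical. The point is to make the contrast in the abstract concrete: five stacked 3x3 stride-1 convolutions see only an 11x11 pixel window, whereas every row of the attention matrix assigns a nonzero weight to every patch.

```python
import numpy as np


def receptive_field(num_layers: int, kernel: int = 3, stride: int = 1) -> int:
    """Side length (pixels) of the input window seen by one output unit
    after `num_layers` stacked convolutions with equal kernel and stride."""
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def conv3x3_relu(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Same'-padded 3x3 convolution (cross-correlation) followed by ReLU.
    x: (H, W, C_in), w: (3, 3, C_in, C_out) -> (H, W, C_out)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            # Each kernel tap only sees a shifted copy of the input: local mixing.
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return np.maximum(out, 0.0)


def patchify(x: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping patch tokens (N, patch*patch*C)."""
    h, w, c = x.shape
    assert h % patch == 0 and w % patch == 0, "size must be divisible by patch"
    x = x.reshape(h // patch, patch, w // patch, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention; every token attends to all tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def hybrid_forward(img, conv_ws, patch, wq, wk, wv, w_cls):
    """Hypothetical CNN -> attention -> classifier forward pass (untrained)."""
    x = img
    for w in conv_ws:                       # CNN stage: local features
        x = conv3x3_relu(x, w)
    tokens = patchify(x, patch)             # feature map -> patch tokens
    attended, attn = self_attention(tokens, wq, wk, wv)  # global mixing
    logits = attended.mean(axis=0) @ w_cls  # mean-pool tokens, linear head
    return logits, attn


# Hypothetical configuration: random weights, arbitrary channel widths.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))               # stand-in for one 32x32 RGB CIFAR-10 image
chans = [3, 8, 8, 16, 16, 16]               # five conv layers
conv_ws = [rng.normal(0, 0.1, (3, 3, cin, cout)) for cin, cout in zip(chans[:-1], chans[1:])]
d_tok, d_att = 4 * 4 * chans[-1], 32
wq, wk, wv = (rng.normal(0, d_tok ** -0.5, (d_tok, d_att)) for _ in range(3))
w_cls = rng.normal(0, 0.1, (d_att, 10))     # 10 classes

logits, attn = hybrid_forward(img, conv_ws, 4, wq, wk, wv, w_cls)
print(receptive_field(5))                   # → 11
print(logits.shape, attn.shape)             # → (10,) (64, 64)
```

Because softmax weights are strictly positive, each attended token mixes information from all 64 patches in a single layer, while the convolutional stack would need many more layers (or striding and pooling) to cover the whole 32x32 image. Training such a model would additionally require a loss and gradients, for example through an autodiff framework.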

Citation (APA)

Peng, Y. (2023). Images Classification Integrating Transformer with Convolutional Neural Network. Advances in Engineering Technology Research, 6(1), 621. https://doi.org/10.56028/aetr.6.1.621.2023