Images Classification Integrating Transformer with Convolutional Neural Network

  • Peng Y

Abstract

Convolutional neural networks (CNNs) are among the most widely used deep learning methods in computer vision. They extract local spatial information from images effectively but lack global understanding and dependency modelling of image features, so the network cannot fully exploit contextual information. For example, on coordinate modelling tasks such as object detection and image generation, a CNN may fail to accurately locate or reconstruct the position and shape of objects. In contrast to traditional CNN models such as ResNet, Transformers rely on a global attention mechanism to capture long-distance dependencies between patches. This work presents an enhanced lightweight method that integrates a Transformer with five convolutional layers. The combined CNN-Transformer model is tested on two benchmark datasets, MNIST and CIFAR-10. After a few epochs, the model converges and reaches an accuracy of 99.34% on MNIST and 92.04% on CIFAR-10. It outperforms the standalone CNN and some state-of-the-art models on both datasets, especially in distinguishing similar classes such as ‘6’ and ‘9’ or ‘bird’ and ‘plane’. These results indicate that the model is robust and generalizes well.
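The contrast between local convolution and global attention described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: it assumes a 1-D sequence of toy patch vectors, identity query/key/value projections, and a simple windowed average standing in for a convolution. The names `self_attention` and `local_average` are hypothetical helpers introduced only for this illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(patches):
    """Scaled dot-product self-attention over patch vectors.

    Assumption: queries, keys and values are the patches themselves
    (identity projections), which keeps the sketch short.
    Returns (outputs, attention_weights).
    """
    d = len(patches[0])
    outputs, weights = [], []
    for q in patches:
        # Every patch is scored against every other patch: global context.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in patches]
        w = softmax(scores)
        out = [sum(w[j] * patches[j][c] for j in range(len(patches))) for c in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


def local_average(patches, radius=1):
    """Convolution-like local operator: each output only sees
    neighbours within `radius` positions (a stand-in for a conv kernel)."""
    n, d = len(patches), len(patches[0])
    outputs = []
    for i in range(n):
        window = patches[max(0, i - radius):min(n, i + radius + 1)]
        outputs.append([sum(p[c] for p in window) / len(window) for c in range(d)])
    return outputs


# Four toy patches; patches 0 and 3 share content but are far apart.
patches = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
attended, attn_weights = self_attention(patches)
local = local_average(patches)
```

In this toy setting, patch 0 gives more attention weight to the distant but similar patch 3 than to its immediate neighbour, while the local operator at position 0 never sees patch 3 at all. A hybrid design of the kind the abstract describes would presumably use convolutional layers for local features and attention for such long-range relations.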

Cite

Peng, Y. (2023). Images Classification Integrating Transformer with Convolutional Neural Network. Advances in Engineering Technology Research, 6(1), 621. https://doi.org/10.56028/aetr.6.1.621.2023
