We present a review of the methods behind the 40 highest accuracies achieved on the ILSVRC 2012 ImageNet validation set, as ranked on Papers with Code. A significant proportion of these methods use transformer-based architectures, but none of them are naïve self-attention transformers, which would be unmanageably large if the tokens were derived on a per-pixel basis. Rather, the works we review all grapple with different ways of combining the global nature of self-attention with the local nature of fine-grained image features, which has historically been the strength of convolutional neural networks. Notably, 9 of the 22 works reviewed did not use transformers.
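To make the per-pixel scaling concern concrete, the following minimal sketch counts the entries in a single self-attention matrix, which grows with the square of the token count. The 224×224 input resolution and the 16×16 patch size are illustrative assumptions, not figures taken from the review, and the helper name `attention_matrix_entries` is hypothetical.

```python
def attention_matrix_entries(height, width, patch_size=1):
    """Return (token_count, attention_entries) for one attention head.

    Assumes one token per non-overlapping patch_size x patch_size patch;
    patch_size=1 corresponds to per-pixel tokens.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    tokens = (height // patch_size) * (width // patch_size)
    # Self-attention compares every token with every other token.
    return tokens, tokens * tokens


# Assumed 224x224 input: per-pixel tokens versus assumed 16x16 patches.
pixel_tokens, pixel_entries = attention_matrix_entries(224, 224)
patch_tokens, patch_entries = attention_matrix_entries(224, 224, patch_size=16)

print(f"per-pixel: {pixel_tokens} tokens, {pixel_entries} attention entries")
print(f"16x16 patches: {patch_tokens} tokens, {patch_entries} attention entries")
print(f"ratio: {pixel_entries // patch_entries}x")
```

Under these assumptions, per-pixel tokenization yields 50,176 tokens and about 2.5 billion attention entries per head per layer, roughly 65,536 times more than the 196-token patch-based case.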
Byerly, A., Kalganova, T., & Ott, R. (2022). The Current State of the Art in Deep Learning for Image Classification: A Review. In Lecture Notes in Networks and Systems (Vol. 507 LNNS, pp. 88–105). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-10464-0_7