Abstract
In this work, we explore the You Only Look Once (YOLO) single-stage object detection architecture and compare it to the simultaneous classification of 10647 fixed region proposals. Using two different approaches, we demonstrate that each of YOLO's grid cells is attentive to a specific sub-region of the previous layers, which makes YOLO's method comparable to local region proposals. This insight narrows the conceptual gap between YOLO-like single-stage object detection models, R-CNN-like two-stage region-proposal-based models, and ResNet-like image classification models. For this work, we created interactive exploration tools for a better visual understanding of YOLO's information processing streams: https://limchr.github.io/yolo_visu
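The count of 10647 fixed proposals can be made concrete with a short sketch. Assuming the standard YOLOv3 configuration for a 416×416 input (a common setup, not stated in the abstract itself): detection happens at three scales with grid sizes 13×13, 26×26, and 52×52, and each grid cell predicts boxes for 3 anchors per scale.

```python
# Sketch: where YOLOv3's 10647 fixed region proposals come from,
# assuming a 416x416 input and 3 anchors per detection scale.
grid_sizes = [13, 26, 52]   # strides 32, 16, 8 on a 416x416 image
anchors_per_scale = 3

proposals = sum(anchors_per_scale * g * g for g in grid_sizes)
print(proposals)  # 3 * (169 + 676 + 2704) = 10647
```

Each of these proposals corresponds to one grid cell and anchor, i.e. one fixed sub-region of the image that the network classifies in a single forward pass.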
Limberg, C., Melnik, A., Ritter, H., & Prendinger, H. (2023). YOLO: You Only Look 10647 Times. In Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 5, pp. 153–160). Science and Technology Publications, Lda. https://doi.org/10.5220/0011677300003417