Abstract
Crowding and occlusion pose significant challenges for pedestrian detection and readily cause missed and false detections of small-scale and occluded pedestrians in dense scenes. To improve detection accuracy in dense pedestrian scenes, we propose the Residual Transformer YOLO (RT-YOLO) algorithm. Building on YOLOv7, RT-YOLO enhances the multi-scale fusion strategy and introduces a dedicated detection layer for small-scale occluded targets. It also integrates ResNet and Transformer structures to improve the small-scale feature layer and detection head, strengthening feature extraction. Additionally, RT-YOLO incorporates the Normalization-based Attention Module (NAM) into the backbone and neck networks to identify regions of interest. Experiments on the CrowdHuman and WiderPerson datasets show that, at IoU (Intersection over Union) = 0.5, the overall improvement in mAP@0.5 is 3.8% and 3.4%, respectively. Over the IoU range from 0.5 to 0.95, the improvement in mAP@0.5:0.95 is 5.1% and 4%, respectively. RT-YOLO runs at 67 FPS, maintaining real-time performance. On the VOC2007 dataset, mAP improves by 5.1%, indicating higher effectiveness and robustness.
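The reported metrics depend on the IoU threshold used to decide whether a predicted box matches a ground-truth pedestrian. The following is a minimal sketch of that matching criterion, not the authors' evaluation code. The function names `iou` and `is_match` and the `(x1, y1, x2, y2)` corner format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: a prediction counts as a match at this threshold."""
    return iou(pred_box, gt_box) >= threshold
```

In heavily occluded scenes, neighbouring pedestrians overlap strongly, so a box can exceed the 0.5 threshold against the wrong person. This is one reason stricter thresholds are also reported.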
Cite
Ye, H., & Wang, Y. (2023). Residual Transformer YOLO for Detecting Multi-Scale Crowded Pedestrian. Applied Sciences (Switzerland), 13(21). https://doi.org/10.3390/app132112032