TPH-YOLOv5++: Boosting Object Detection on Drone-Captured Scenarios with Cross-Layer Asymmetric Transformer

47Citations
Citations of this article
12Readers
Mendeley users who have this article in their library.

Abstract

Object detection in drone-captured images is a popular task in recent years. As drones always navigate at different altitudes, the object scale varies considerably, which burdens the optimization of models. Moreover, high-speed and low-altitude flight cause motion blur on densely packed objects, which leads to great challenges. To solve the two issues mentioned above, based on YOLOv5, we add an additional prediction head to detect tiny-scale objects and replace CNN-based prediction heads with transformer prediction heads (TPH), constructing the TPH-YOLOv5 model. TPH-YOLOv5++ is proposed to significantly reduce the computational cost and improve the detection speed of TPH-YOLOv5. In TPH-YOLOv5++, cross-layer asymmetric transformer (CA-Trans) is designed to replace the additional prediction head while maintain the knowledge of this head. By using a sparse local attention (SLA) module, the asymmetric information between the additional head and other heads can be captured efficiently, enriching the features of other heads. In the VisDrone Challenge 2021, TPH-YOLOv5 won 4th place and achieved well-matched results with the 1st place model (AP 39.43%). Based on the TPH-YOLOv5 and CA-Trans module, TPH-YOLOv5++ can further increase efficiency while achieving comparable and better results.

Cite

CITATION STYLE

APA

Zhao, Q., Liu, B., Lyu, S., Wang, C., & Zhang, H. (2023). TPH-YOLOv5++: Boosting Object Detection on Drone-Captured Scenarios with Cross-Layer Asymmetric Transformer. Remote Sensing, 15(6). https://doi.org/10.3390/rs15061687

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free