SRDD: a lightweight end-to-end object detection with transformer

Abstract

Computer vision now plays a vital role in modern UAV (Unmanned Aerial Vehicle) systems. However, real-time on-board detection of small objects for UAVs remains challenging. This paper presents an end-to-end ViT (Vision Transformer) detector, named Sparse ROI-based Deformable DETR (SRDD), that makes ViT models usable on UAV on-board systems. We embed a scoring network in the Transformer encoder to selectively prune redundant tokens. We also introduce an ROI-based detection refinement module in the decoder, which improves detection performance while keeping the detection pipeline end-to-end. With the scoring networks, we compress the Transformer encoder/decoder to a 1/3-layer structure, which is far slimmer than DETR. By combining the lightweight ResT backbone with dynamic anchor boxes, we ease the memory shortage of the on-board SoC. Experiments on the UAVDT dataset show that the proposed SRDD method achieves 50.2% mAP, outperforming Deformable DETR by at least 7%. In addition, the lightweight version of SRDD achieves 51.08% mAP with a 44% reduction in parameters.
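The token-pruning idea above (a scoring network that rates each encoder token so that only the most salient ones are kept) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the scoring network's architecture or the keep ratio, so this example makes two assumptions. First, a single linear layer with hand-set weights stands in for a learned scoring network. Second, a hypothetical `keep_ratio` parameter sets the fraction of tokens retained. The function names `score_tokens` and `prune_tokens` are likewise illustrative.

```python
from typing import List, Sequence, Tuple

Token = List[float]  # one feature vector per spatial location (flattened feature map)


def score_tokens(tokens: Sequence[Token], weights: Sequence[float], bias: float = 0.0) -> List[float]:
    """Hypothetical scoring network: one linear layer mapping each token to a scalar saliency score.

    A real scoring network would be learned; here the weights are supplied by hand (assumption).
    """
    return [sum(w * x for w, x in zip(weights, tok)) + bias for tok in tokens]


def prune_tokens(
    tokens: Sequence[Token], scores: Sequence[float], keep_ratio: float
) -> Tuple[List[int], List[Token]]:
    """Keep the top `keep_ratio` fraction of tokens by score.

    Returns the original indices of the kept tokens (in ascending, i.e. spatial, order)
    together with their feature vectors. `keep_ratio` is a hypothetical parameter.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = max(1, round(len(tokens) * keep_ratio))
    # Rank token indices from highest to lowest score; sorted() is stable on ties.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore spatial order so positional information stays aligned
    return kept, [list(tokens[i]) for i in kept]


if __name__ == "__main__":
    tokens = [[0.1, 0.2], [0.9, 0.8], [0.0, 0.0], [0.5, 0.4]]
    scores = score_tokens(tokens, weights=[1.0, 1.0])
    kept, kept_feats = prune_tokens(tokens, scores, keep_ratio=0.5)
    print(kept)  # [1, 3]
```

Only the kept tokens would then be processed by the (slimmed-down) encoder, which reduces attention cost roughly in proportion to the fraction of tokens that are pruned. How SRDD handles the pruned tokens afterwards (dropping them or passing them through unchanged) is not stated in the abstract.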

Cite

Zhu, Y., Xia, Q., & Jin, W. (2022). SRDD: a lightweight end-to-end object detection with transformer. Connection Science, 34(1), 2448–2465. https://doi.org/10.1080/09540091.2022.2125499