We present MCF3D, a multi-stage complementary fusion network for three-dimensional (3D) object detection in autonomous driving, robot navigation, and virtual reality. The architecture is end-to-end learnable: it takes both LiDAR point clouds and RGB images as input and uses a 3D region proposal subnet followed by a second-stage detector subnet to predict high-precision oriented 3D bounding boxes. To fully exploit the strengths of multimodal information, we design a series of fine-grained, targeted fusion methods based on attention mechanisms and prior knowledge: 'pre-fusion,' 'anchor-fusion,' and 'proposal-fusion.' Our proposed RGB-Intensity form encodes LiDAR reflection intensity onto the input image to strengthen its representational power. Our proposal-element attention module guides the network to focus on critical, informative features at negligible overhead. In addition, we propose a cascade-enhanced detector for small object classes that is more selective against close false positives. Experiments on the challenging KITTI benchmark show that MCF3D produces state-of-the-art results while running in near real time with a low memory footprint.
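The abstract only names the RGB-Intensity input form without detailing it. The sketch below is a minimal reconstruction of one plausible encoding, assuming reflection intensity is painted into a fourth image channel by projecting LiDAR points through a KITTI-style camera matrix; the function name `rgb_intensity` and the projection convention are illustrative assumptions, not the paper's confirmed implementation.

```python
import numpy as np

def rgb_intensity(image, points, intensities, P):
    """Build a 4-channel RGB-Intensity input by projecting LiDAR
    reflectance onto the image plane (hypothetical reconstruction;
    the paper's exact encoding may differ).

    image       : (H, W, 3) uint8 RGB image
    points      : (N, 3) LiDAR points in the camera frame
    intensities : (N,) reflectance values in [0, 1]
    P           : (3, 4) camera projection matrix (KITTI-style)
    """
    H, W, _ = image.shape
    # Project 3D points to homogeneous pixel coordinates.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    uvw = pts_h @ P.T                                           # (N, 3)
    z = uvw[:, 2]
    keep = z > 0                                  # points in front of the camera
    u = (uvw[keep, 0] / z[keep]).astype(int)
    v = (uvw[keep, 1] / z[keep]).astype(int)
    inb = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Sparse intensity channel; pixels with no LiDAR return stay zero.
    intensity = np.zeros((H, W), dtype=np.float32)
    intensity[v[inb], u[inb]] = intensities[keep][inb]
    # Stack intensity as a fourth channel alongside normalized RGB.
    rgb = image.astype(np.float32) / 255.0
    return np.dstack([rgb, intensity])            # (H, W, 4)
```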
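Likewise, the proposal-element attention module is not specified here. As a generic illustration of attention-based feature reweighting with negligible overhead, the PyTorch sketch below applies squeeze-and-excitation-style gating to pooled per-proposal features; the class name, layer sizes, and gating design are assumptions, not the paper's confirmed module.

```python
import torch
import torch.nn as nn

class ProposalAttention(nn.Module):
    """Illustrative squeeze-and-excitation-style gating over per-proposal
    features. A generic stand-in, not the paper's confirmed
    proposal-element attention design."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_proposals, channels) pooled proposal features.
        weights = self.gate(x)   # per-channel importance in (0, 1)
        return x * weights       # cheap elementwise reweighting


# Usage: gate 256-d features for 100 proposals.
attn = ProposalAttention(channels=256)
feats = torch.randn(100, 256)
out = attn(feats)                # same shape, attention-reweighted
```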
Wang, J., Zhu, M., Sun, D., Wang, B., Gao, W., & Wei, H. (2019). MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection. IEEE Access, 7, 90801–90814. https://doi.org/10.1109/ACCESS.2019.2927012