Abstract
Artificial intelligence plays an increasingly essential role in autonomous driving, notably in 3D object detection. Many state-of-the-art 3D detection frameworks fuse point cloud and image data to perceive the environment around the vehicle. However, these approaches focus mainly on vehicle detection; for objects with sparse point cloud sampling, such as pedestrians and cyclists, performance is only moderate. In this paper, we propose a multi-fusion framework with two kinds of attention mechanisms to address this problem and improve 3D object detection accuracy. The framework employs a proposed 3D attention mechanism that exploits voxel sparsity information, and it contains two key modules: point fusion with 2D attention and voxel fusion with 3D attention. These modules first obtain image features by projecting each LiDAR point, or the 8 vertices of each voxel, onto the image feature maps; they then perform attentive fusion of the voxelized image features, point-wise image features, and LiDAR data. Our evaluation on the challenging KITTI dataset, using both 3D and bird's-eye-view metrics, demonstrates substantial improvements, especially for objects with sparse point cloud sampling.
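The projection-and-fusion step described above can be sketched in miniature. This is a hedged illustration, not the authors' actual architecture: the pinhole projection with a 3x3 intrinsic matrix is a standard assumption, and the attention score here is a toy softmax over per-modality feature norms standing in for the paper's learned attention modules.

```python
import numpy as np

def project_points(points, intrinsics):
    """Project 3D points (N, 3) in camera coordinates onto the image plane
    using a 3x3 pinhole intrinsic matrix; returns pixel coords (N, 2)."""
    cam = points @ intrinsics.T          # (N, 3) homogeneous image coords
    return cam[:, :2] / cam[:, 2:3]      # perspective divide

def attentive_fusion(point_feats, image_feats):
    """Fuse per-point LiDAR features with image features gathered at their
    projected pixels. Each modality gets a softmax weight derived from its
    feature norm (a stand-in for a learned attention score)."""
    scores = np.stack([np.linalg.norm(point_feats, axis=1),
                       np.linalg.norm(image_feats, axis=1)], axis=1)  # (N, 2)
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = exp / exp.sum(axis=1, keepdims=True)                    # (N, 2)
    return weights[:, :1] * point_feats + weights[:, 1:] * image_feats
```

In the paper's voxel-fusion module, the same projection would be applied to the 8 vertices of each voxel rather than to individual points, and the gathered image features would be pooled per voxel before fusion.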
Wang, N., & Sun, P. (2021). Multi-fusion with attention mechanism for 3D object detection. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (Vol. 2021-July, pp. 475–480). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2021-115