Water segmentation is a critical task for ensuring the safety of unmanned surface vehicles (USVs). Most existing image-based water segmentation methods can be inaccurate because of light reflection on the water surface. Fusion-based methods take paired 2D camera images and 3D LiDAR point clouds as inputs, which incurs a high computational load and considerable time consumption and limits their practical application. In this study, we therefore propose a multimodal fusion water segmentation method that uses a transformer and knowledge distillation to leverage 3D LiDAR point clouds in support of segmentation from 2D images. A transformer-based local and non-local cross-modality fusion module is first used to fuse 2D image and 3D point cloud information during the training phase. A multi-to-single-modality knowledge distillation module is then applied to distill the fused information into a pure 2D network for water segmentation. Extensive experiments were conducted on a dataset containing various scenes collected by USVs on the water. The results demonstrate that the proposed method achieves an improvement of approximately 1.5% in both accuracy and MaxF over classical image-based methods, and that it is much faster than the fusion-based method, achieving speeds ranging from 15 fps to 110 fps.
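The multi-to-single-modality distillation step described above can be illustrated with a generic pixel-wise knowledge distillation loss. This is a minimal sketch, not the loss used in the paper: the abstract does not specify the loss, so the temperature-softened KL term, the cross-entropy term, and the names `distillation_loss`, `temperature`, and `alpha` are all assumptions chosen for illustration. The teacher stands in for the image and LiDAR fusion network used during training, and the student for the pure 2D network.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hypothetical pixel-wise distillation loss (illustrative only).

    student_logits, teacher_logits: arrays of shape (H, W, C)
    labels: integer array of shape (H, W), e.g. 0 = non-water, 1 = water
    """
    eps = 1e-12

    # Soft targets: the teacher (fusion network) guides the student (2D network).
    p_teacher = softmax(teacher_logits / temperature)
    p_student = softmax(student_logits / temperature)
    kd = np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
        axis=-1,
    ).mean()

    # Hard targets: ordinary cross-entropy against the ground-truth mask.
    probs = softmax(student_logits)
    h, w = labels.shape
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    ce = -np.log(picked + eps).mean()

    # The temperature**2 factor keeps the soft-target term on a comparable scale.
    return alpha * temperature ** 2 * kd + (1.0 - alpha) * ce
```

At inference time only the student would run, which is consistent with the abstract's point that the distilled 2D network avoids the cost of processing point clouds.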
Zhang, J., Gao, J., Liang, J., Wu, Y., Li, B., Zhai, Y., & Li, X. (2023). Efficient Water Segmentation with Transformer and Knowledge Distillation for USVs. Journal of Marine Science and Engineering, 11(5). https://doi.org/10.3390/jmse11050901