We propose a 152-layer Fully Convolutional Residual Network (ResNet-FCN) for non-motion-based semantic segmentation of fish objects in underwater videos that is robust to varying backgrounds and changes in illumination. For supervised training, we use weakly-labelled ground truth derived from motion-based adaptive Mixture-of-Gaussians background subtraction. On videos taken from six different sites at a benthic depth of around 10 m, ResNet-FCN achieves an average precision of 65.91% and an average recall of 83.99% for fish objects. The network correctly segments fish from color-based input features alone, without motion cues, and detects fish even in frames with strong illumination changes caused by wave motion at the sea surface. It can also segment fish located far from the camera despite varying benthic background appearance and differences in aquatic hues.
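The weak-label generation step described above can be sketched as follows. This is a minimal single-Gaussian simplification of the adaptive Mixture-of-Gaussians background subtraction the paper uses (one Gaussian per pixel rather than a mixture); the class and parameter names are illustrative, not the authors' implementation.

```python
import numpy as np

class AdaptiveBackground:
    """Per-pixel adaptive Gaussian background model (simplified MoG).

    Pixels deviating from the background model by more than k standard
    deviations are marked foreground; those masks serve as weak
    per-pixel labels for supervised segmentation training.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full_like(self.mean, 25.0)  # initial variance guess
        self.alpha = alpha  # learning rate for background adaptation
        self.k = k          # foreground threshold in standard deviations

    def apply(self, frame):
        frame = frame.astype(np.float64)
        diff = frame - self.mean
        # Foreground where squared deviation exceeds k^2 * variance
        fg = (diff ** 2) > (self.k ** 2) * self.var
        # Adapt the background model only at background pixels,
        # so moving fish do not get absorbed into the background
        bg = ~fg
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2
        return fg.astype(np.uint8)  # binary weak-label mask
```

In practice such motion-derived masks are noisy (hence "weakly-labelled"), but over many frames they provide enough supervision for the ResNet-FCN to learn appearance-based segmentation that no longer depends on motion cues.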
CITATION STYLE
Labao, A. B., & Naval, P. C. (2017). Weakly-Labelled semantic segmentation of fish objects in underwater videos using a deep residual network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10192 LNAI, pp. 255–265). Springer Verlag. https://doi.org/10.1007/978-3-319-54430-4_25