Strategies for Tackling the Class Imbalance Problem in Marine Image Classification

Abstract

Research on deep learning algorithms, especially convolutional neural networks (CNNs), has shown significant progress. The application of CNNs to image analysis and pattern recognition has attracted considerable attention, yet so far only a few applications that classify a small number of common taxa in marine image collections have been reported. In this paper, we address the problem of class imbalance in marine image data, i.e. the common observation that 80%–90% of the data belong to a small subset of $L'$ classes among the total number of $L$ observed classes, with $L' \ll L$. A small number of methods that compensate for class imbalance in the training step have been proposed for the common computer vision benchmark datasets. Marine image collections (showing, for instance, megafauna as considered in this study) pose a greater challenge: the observed imbalance is more extreme because habitats can feature high biodiversity but low species density. We investigate the potential of various over-/undersampling methods to compensate for the class imbalance problem in marine imaging. In addition, we propose and analyze five balancing rules that determine the extent to which sampling should be used, i.e. how many samples should be created or removed to gain the most from the sampling algorithms. We evaluate these methods with AlexNet trained to classify benthic image data recorded at the Porcupine Abyssal Plain (PAP) and use a Support Vector Machine as the baseline classifier. The best of our proposed strategies, combined with data augmentation and applied to AlexNet, results in an increase of thirteen basis points compared to AlexNet without sampling. Furthermore, we present examples showing that the combination of oversampling and augmentation generalizes better than augmentation alone.
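As a rough illustration of the two ideas in the abstract, the following Python sketch measures how much of a label set falls into the $L'$ most frequent classes and then randomly over- or undersamples every class to a common target size. The function names `head_class_share` and `resample_to_target`, the toy labels, and the choice of the median class count as the target are assumptions made for this example only; the abstract does not spell out the five balancing rules or the specific sampling algorithms, so this should not be read as the authors' method.

```python
import random
import statistics
from collections import Counter, defaultdict


def head_class_share(labels, n_head):
    """Fraction of samples that belong to the n_head most frequent classes."""
    counts = Counter(labels)
    head = sum(count for _, count in counts.most_common(n_head))
    return head / len(labels)


def resample_to_target(samples, labels, target, seed=0):
    """Randomly over-/undersample each class to exactly `target` samples.

    Hypothetical helper: plain random resampling, not one of the
    balancing rules from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target:
            # Undersample: draw without replacement.
            chosen = rng.sample(items, target)
        else:
            # Oversample: keep all originals, add random duplicates.
            chosen = items + rng.choices(items, k=target - len(items))
        out_samples.extend(chosen)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy label set: one class holds 85% of the data (assumed numbers).
labels = ["a"] * 85 + ["b"] * 10 + ["c"] * 5
samples = list(range(len(labels)))

share = head_class_share(labels, n_head=1)

# Assumed balancing target: the median class count.
target = int(statistics.median(Counter(labels).values()))
balanced_samples, balanced_labels = resample_to_target(samples, labels, target)
```

In practice the duplicated minority samples would be passed through data augmentation so that the network does not see exact copies, which is the combination the abstract reports as most effective.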

Cite

Langenkämper, D., van Kevelaer, R., & Nattkemper, T. W. (2019). Strategies for Tackling the Class Imbalance Problem in Marine Image Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11188 LNCS, pp. 26–36). Springer Verlag. https://doi.org/10.1007/978-3-030-05792-3_3
