Deep learning-based action recognition in videos has attracted considerable attention owing to its remarkable performance in diverse applications. However, heterogeneous backgrounds and noisy spatio-temporal cues still make extracting highly discriminative features challenging. To address this problem, numerous methods based on attention mechanisms and the skeleton modality have been published. Instead of focusing on data pre-processing, we shed light on the feature map and concentrate on extracting highly discriminative features. First, we introduce a Batch-wise Entropy Supervised Stream (BESS) that extends feature discrimination in proportion to the uncertainty of the corresponding batch. Second, to obtain a more generalized model, we propose a Stream that Harmonizes feature discrimination by Augmenting the Features (HAFS) of both ResNext101 and BESS. These two streams are effectively hallucinated into HAFS through distillation and feature fusion. We also introduce a new metric to assess the characteristics of the feature map; it depicts the relationship between feature discrimination and recognition accuracy. Finally, we comprehensively evaluate our approach on two benchmark datasets, HMDB51 and UCF101. Experimental results demonstrate that extending and then harmonizing feature discrimination is an effective way of generating highly discriminative features, and that our proposed technique outperforms existing state-of-the-art methods.
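To make the batch-uncertainty idea concrete, the following is a minimal, hypothetical sketch of one way a batch-wise entropy signal could be computed from a network's class logits. The function name and the exact formulation (mean Shannon entropy of per-sample softmax predictions) are assumptions for illustration; the paper's actual BESS supervision may differ.

```python
import numpy as np

def batch_entropy(logits):
    """Mean Shannon entropy of softmax predictions over a batch.

    Hypothetical illustration of a batch-uncertainty measure: higher
    values mean the model is less certain about the batch on average.
    """
    # Numerically stable softmax over the class axis
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Per-sample entropy, averaged over the batch
    ent = -(p * np.log(p + 1e-12)).sum(axis=1)
    return ent.mean()

# Uniform logits give maximum uncertainty, log(num_classes)
uniform = batch_entropy(np.zeros((4, 10)))   # ~ log(10) ≈ 2.3026
# A confidently peaked prediction yields lower entropy
peaked = batch_entropy(np.array([[10.0, 0.0, 0.0]]))
```

A scalar like this could serve as an auxiliary supervision signal that scales how strongly feature discrimination is pushed for each batch.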
Hossain, M. I., Siddique, A., Hossain, M. A., Hossain, M. D., & Huh, E. N. (2020). Batch Entropy Supervised Convolutional Neural Networks for Feature Extraction and Harmonizing for Action Recognition. IEEE Access, 8, 206427–206444. https://doi.org/10.1109/ACCESS.2020.3037529