Distinct two-stream convolutional networks for human action recognition in videos using segment-based temporal modeling

Ashok Sarabu; Ajit Kumar Santra

Journal Article (Open Access)

Data (2020) 5(4) 1-12

DOI: 10.3390/data5040104

Abstract

The two-stream convolutional neural network (CNN) has proven highly successful for action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately and then combine their two scores to obtain the final score. In the literature, we observed that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture that uses different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to capture long-range temporal features and to differentiate similar sub-actions in videos. Data augmentation techniques are employed to prevent overfitting. Advanced cross-modal pre-training is discussed and incorporated into the proposed architecture to improve action recognition accuracy. The proposed two-stream model is evaluated on two challenging action recognition datasets, HMDB-51 and UCF-101. The results show a significant performance increase, and the proposed architecture outperforms existing methods.
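
The following Python sketch illustrates, under stated assumptions, the generic TSN-style pipeline the abstract refers to: sparse sampling of one snippet per segment, an average segmental consensus over snippet scores, and weighted late fusion of the spatial and temporal stream scores. It also shows one common form of cross-modal pre-training (as described for the original TSN work), in which first-layer RGB filters are averaged over the colour channels and replicated across the optical-flow input channels. The stub streams `spatial_net` and `temporal_net` (which in this paper's setting would be two different CNN architectures), the fusion weights 1.0 and 1.5, the three segments, and the ten flow channels (five stacked x/y flow frames) are hypothetical choices, not details taken from the paper.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical stand-in for one CNN stream: maps a sampled frame index to class scores.
Stream = Callable[[int], List[float]]


def sample_segment_indices(num_frames: int, num_segments: int,
                           rng: random.Random) -> List[int]:
    """TSN-style sparse sampling: split the video into equal-length
    segments and draw one random frame index from each segment."""
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments  # exclusive bound, always > start
        indices.append(rng.randrange(start, end))
    return indices


def segmental_consensus(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average consensus: mean class score over all sampled segments."""
    n = len(segment_scores)
    return [sum(col) / n for col in zip(*segment_scores)]


def fuse_two_streams(spatial: Sequence[float], temporal: Sequence[float],
                     w_spatial: float = 1.0, w_temporal: float = 1.5) -> List[float]:
    """Late fusion: weighted sum of the two streams' video-level scores
    (the 1.0 / 1.5 weights are an assumed setting)."""
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]


def predict(num_frames: int, spatial_net: Stream, temporal_net: Stream,
            num_segments: int = 3, seed: int = 0) -> int:
    """Return the index of the highest fused class score for one video."""
    rng = random.Random(seed)
    idx = sample_segment_indices(num_frames, num_segments, rng)
    # The same snippet positions feed both streams; a real temporal stream
    # would read a stack of optical-flow frames starting at each index.
    spatial = segmental_consensus([spatial_net(i) for i in idx])
    temporal = segmental_consensus([temporal_net(i) for i in idx])
    fused = fuse_two_streams(spatial, temporal)
    return max(range(len(fused)), key=fused.__getitem__)


def cross_modal_init(rgb_kernels: List[List[List[float]]],
                     num_flow_channels: int = 10) -> List[List[List[float]]]:
    """Cross-modal initialisation of a first conv layer: for each output
    filter, average its flattened kernels over the 3 RGB input channels
    and copy that mean to every optical-flow input channel."""
    flow_kernels = []
    for per_channel in rgb_kernels:  # per_channel: [in_channel][kernel_weight]
        mean = [sum(vals) / len(per_channel) for vals in zip(*per_channel)]
        flow_kernels.append([list(mean) for _ in range(num_flow_channels)])
    return flow_kernels


if __name__ == "__main__":
    # Toy streams with constant scores for a two-class problem.
    def spatial_net(i: int) -> List[float]:
        return [0.2, 0.8]

    def temporal_net(i: int) -> List[float]:
        return [0.9, 0.1]

    print(predict(90, spatial_net, temporal_net))  # prints 0
```

Average consensus is used here only for simplicity; other consensus functions, such as max pooling over segments, are also possible, and the fusion weights would normally be tuned on validation data.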

Citation (APA)

Sarabu, A., & Santra, A. K. (2020). Distinct two-stream convolutional networks for human action recognition in videos using segment-based temporal modeling. Data, 5(4), 1–12. https://doi.org/10.3390/data5040104