We propose a deep-learning technique for action classification based on Long Short-Term Memory (LSTM) networks. The proposed scheme first learns spatio-temporal features from the video using a 3D extension of Convolutional Neural Networks (CNNs). A Recurrent Neural Network (RNN) is then trained to classify each sequence, considering the temporal evolution of the learned features at each time step. Experimental results on the CMU MoCap, UCF-101, and Hollywood-2 datasets show the efficacy of the proposed approach. We extend the framework with an efficient motion feature to handle significant camera motion. The proposed approach outperforms existing deep models on each dataset.
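The second stage described above (an LSTM that classifies a sequence from the temporal evolution of per-timestep features) can be sketched roughly as follows. This is a minimal NumPy illustration with hypothetical dimensions and random weights, not the paper's implementation; in the actual method the per-frame feature vectors come from a learned 3D CNN and the weights are trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_classify(feats, params):
    """Run a single-layer LSTM over per-frame feature vectors and
    classify the whole sequence from the final hidden state."""
    Wx, Wh, b, Wy, by = params
    H = Wh.shape[1]                       # hidden size
    h = np.zeros(H)
    c = np.zeros(H)
    for x in feats:                       # one feature vector per time step
        z = Wx @ x + Wh @ h + b           # all four gates in one affine map
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)        # cell state carries temporal memory
        h = o * np.tanh(c)
    logits = Wy @ h + by                  # map final state to class scores
    e = np.exp(logits - logits.max())
    return e / e.sum()                    # softmax class probabilities

# Hypothetical sizes: D-dim per-frame features, H hidden units,
# K action classes, T time steps.
D, H, K, T = 16, 8, 5, 10
params = (rng.standard_normal((4 * H, D)) * 0.1,
          rng.standard_normal((4 * H, H)) * 0.1,
          np.zeros(4 * H),
          rng.standard_normal((K, H)) * 0.1,
          np.zeros(K))
feats = rng.standard_normal((T, D))       # stand-in for learned 3D-CNN features
probs = lstm_classify(feats, params)
print(probs.shape)                        # (5,): one probability per class
```

In practice the recurrence and the classifier would be trained jointly by backpropagation through time; the sketch only shows the forward pass that turns a feature sequence into a class distribution.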
Singh, K. K., & Mukherjee, S. (2018). Recognizing human activities in videos using improved dense trajectories over LSTM. In Communications in Computer and Information Science (Vol. 841, pp. 78–88). Springer Verlag. https://doi.org/10.1007/978-981-13-0020-2_8