Action recognition in videos has attracted increasing attention in recent years. The main difficulty lies in extracting the temporal information that characterizes the target actions. In this paper, we propose a conceptually simple network for short-term action recognition. The proposed architecture extends a standard neural network to an autoencoder that estimates pixel-wise evidence in each frame; these evidences are then integrated by a simple classifier to recognize the actions. In the proposed architecture, the standard 2D convolutional layers used for image classification are extended to 3D convolutional layers in the autoencoder so as to extract the temporal information of the target actions. In the training phase, auxiliary classifiers are introduced at the middle layers so that the intermediate features become well discriminated, and an additional classifier is introduced at the final layer to improve the performance of the standard classifier. We performed experiments on the UCF101 dataset to evaluate the effectiveness of the proposed architecture. The results show that our method achieves efficient performance in short-term action recognition.
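As a rough illustration of the 2D-to-3D extension described in the abstract (a sketch, not the authors' implementation), a 3D convolution slides its kernel along the temporal axis as well as the two spatial axes, so each output value mixes information from neighbouring frames. The clip and kernel shapes below are hypothetical:

```python
import numpy as np

def conv3d(video, kernel):
    """Naive valid-mode 3D convolution over a (T, H, W) volume.

    Unlike a 2D convolution applied frame by frame, the kernel
    also spans the temporal axis, so the output captures motion
    across neighbouring frames.
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # Element-wise product of the kernel with a
                # spatio-temporal patch, summed to one scalar.
                out[t, i, j] = np.sum(
                    video[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical clip: 8 frames of 16x16 pixels, with a 3x3x3 kernel.
clip = np.random.rand(8, 16, 16)
kernel = np.random.rand(3, 3, 3)
feat = conv3d(clip, kernel)
print(feat.shape)  # (6, 14, 14): the temporal axis is convolved too
```

In a full network such as the one the paper describes, many such kernels would be stacked into 3D convolutional layers inside the autoencoder; this sketch only demonstrates the dimensional behaviour that distinguishes 3D from 2D convolution.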
Wang, X. H., Miyao, J., & Kurita, T. (2020). Short-Term Action Recognition by 3D Convolutional Neural Network with Pixel-Wise Evidences. In Communications in Computer and Information Science (Vol. 1212 CCIS, pp. 69–82). Springer. https://doi.org/10.1007/978-981-15-4818-5_6