In this paper, we develop a novel method for designing a robust feature representation for human action recognition, based on deep convolutional features and the Latent Dirichlet Allocation (LDA) model. In contrast to traditional CNN features, which use the outputs of the fully connected layers, we show that a low-dimensional representation built on the deep convolutional layers is more discriminative. In addition, we apply a multi-scale pooling strategy to the convolutional feature maps to better handle objects at different scales and with different deformations. Moreover, we adopt LDA to explore semantic relationships in video sequences and generate a topic histogram to represent each video, since LDA emphasizes content coherence rather than mere spatial contiguity. Extensive experiments on two challenging datasets show that the proposed approach outperforms or is competitive with state-of-the-art methods for human action recognition.
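The abstract's multi-scale pooling over convolutional feature maps can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the grid scales (1, 2, 4), the use of max pooling per cell, and the 14x14x512 feature-map shape are all assumptions chosen for the example.

```python
import numpy as np

def multi_scale_pool(feature_map, scales=(1, 2, 4)):
    """Pool a conv feature map of shape (H, W, C) over grids at several
    scales and concatenate the per-cell vectors into one fixed-length
    descriptor. Scales and max pooling are illustrative assumptions."""
    H, W, C = feature_map.shape
    parts = []
    for s in scales:
        # Bin boundaries for an s x s grid over the spatial dimensions.
        hs = np.linspace(0, H, s + 1, dtype=int)
        ws = np.linspace(0, W, s + 1, dtype=int)
        for i in range(s):
            for j in range(s):
                cell = feature_map[hs[i]:hs[i + 1], ws[j]:ws[j + 1], :]
                parts.append(cell.max(axis=(0, 1)))  # one (C,) vector per cell
    # Descriptor length: C * sum(s * s for s in scales), independent of H, W.
    return np.concatenate(parts)

# Hypothetical conv5-style feature map (e.g. 14 x 14 spatial, 512 channels).
fmap = np.random.rand(14, 14, 512).astype(np.float32)
desc = multi_scale_pool(fmap)
print(desc.shape)  # (10752,) = 512 * (1 + 4 + 16)
```

Because every scale contributes a fixed number of grid cells, the descriptor length does not depend on the input's spatial size, which is what lets pooling at multiple scales cope with objects of varying size and deformation.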
Zhou, Y., Pu, N., Qian, L., Wu, S., & Xiao, G. (2018). Human action recognition in videos of realistic scenes based on multi-scale CNN feature. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10736 LNCS, pp. 316–326). Springer Verlag. https://doi.org/10.1007/978-3-319-77383-4_31