Human action recognition in videos of realistic scenes based on multi-scale CNN feature


Abstract

In this paper, we develop a novel method for designing a robust feature representation for human action recognition, based on deep convolutional features and the Latent Dirichlet Allocation (LDA) model. Compared to traditional CNN features, which use the outputs of the fully connected layers, we show that a low-dimensional feature representation built on the deep convolutional layers is more discriminative. In addition, we apply a multi-scale pooling strategy to the convolutional feature maps to better handle objects at different scales and under deformation. Moreover, we adopt LDA to explore the semantic relationships in video sequences and to generate a topic histogram representing each video, since LDA emphasizes content coherence rather than mere spatial contiguity. Extensive experiments on two challenging datasets show that the proposed approach outperforms or is competitive with state-of-the-art methods for human action recognition.
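The multi-scale pooling over convolutional feature maps mentioned in the abstract can be illustrated with a short sketch. This is a minimal NumPy version assuming spatial-pyramid-style max pooling over an l × l grid at each level; the pyramid levels, feature-map size, and channel count below are illustrative only, and the LDA topic-modeling stage is omitted:

```python
import numpy as np

def multi_scale_pool(feature_map, levels=(1, 2, 4)):
    """Spatial-pyramid-style max pooling over a conv feature map.

    feature_map: (H, W, C) activations from a convolutional layer.
    Returns a fixed-length vector of sum(l*l for l in levels) * C values,
    independent of the input's spatial size.
    """
    H, W, C = feature_map.shape
    pooled = []
    for l in levels:
        # Split the map into an l x l grid and max-pool each cell.
        h_edges = np.linspace(0, H, l + 1).astype(int)
        w_edges = np.linspace(0, W, l + 1).astype(int)
        for i in range(l):
            for j in range(l):
                cell = feature_map[h_edges[i]:h_edges[i + 1],
                                   w_edges[j]:w_edges[j + 1], :]
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)

# Illustrative input: a 14x14x512 map, e.g. the size of a VGG conv5 output.
fmap = np.random.rand(14, 14, 512)
vec = multi_scale_pool(fmap)
print(vec.shape)  # (1 + 4 + 16) * 512 = (10752,)
```

Because each pyramid level pools into a fixed l × l grid, the output length depends only on the levels and channel count, so inputs of different spatial sizes map to vectors of the same dimension.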

Citation (APA)

Zhou, Y., Pu, N., Qian, L., Wu, S., & Xiao, G. (2018). Human action recognition in videos of realistic scenes based on multi-scale CNN feature. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10736 LNCS, pp. 316–326). Springer Verlag. https://doi.org/10.1007/978-3-319-77383-4_31
