Hierarchical dynamic parsing and encoding for action recognition

Abstract

A video action generally exhibits quite complex rhythms and non-stationary dynamics. To model such non-uniform dynamics, this paper describes a novel hierarchical dynamic encoding method that captures both locally smooth dynamics and globally drastic dynamic changes, providing a multi-layer joint representation of temporal dynamics for action recognition. At the first layer, the action sequence is parsed in an unsupervised manner into several smoothly changing stages corresponding to different key poses or temporal structures. The dynamics within each stage are encoded by mean-pooling or learning-to-rank-based encoding. At the second layer, the temporal information of the ordered dynamics extracted from the previous layer is encoded again to form the overall representation. Extensive experiments on a gesture action dataset (ChaLearn) and several generic action datasets (Olympic Sports and Hollywood2) demonstrate the effectiveness of the proposed method.
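
To make the two-layer idea concrete, below is a minimal sketch, not the authors' implementation. It assumes a video is given as a (T, D) array of per-frame features, uses a placeholder that splits the sequence into contiguous chunks in place of the paper's unsupervised temporal parsing, approximates the learning-to-rank encoding with a simple least-squares regression of the frame index onto time-averaged features, and then re-encodes the ordered stage vectors at the second layer.

```python
import numpy as np

def parse_stages(frames, n_stages=4):
    """Placeholder parser: split frames into n_stages contiguous chunks.
    The paper instead parses stages in an unsupervised, data-driven way."""
    return np.array_split(frames, n_stages)

def rank_pool(segment):
    """Least-squares stand-in for learning-to-rank encoding: regress the
    frame index onto running-mean features and keep the weight vector."""
    T = len(segment)
    X = np.cumsum(segment, axis=0) / np.arange(1, T + 1)[:, None]
    t = np.arange(1, T + 1, dtype=float)
    w, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w

def encode(segment, method="rank"):
    """Encode one segment either by mean-pooling or by rank pooling."""
    return segment.mean(axis=0) if method == "mean" else rank_pool(segment)

def hierarchical_encode(frames, n_stages=4, stage_method="mean"):
    """Layer 1: encode each parsed stage; Layer 2: encode the ordered
    sequence of stage vectors to capture global dynamic changes."""
    stage_vecs = np.stack(
        [encode(s, stage_method) for s in parse_stages(frames, n_stages)]
    )
    return encode(stage_vecs, "rank")

# Example: 200 frames of 16-dim features -> one 16-dim video descriptor.
video = np.random.randn(200, 16)
descriptor = hierarchical_encode(video)
print(descriptor.shape)  # (16,)
```

The resulting per-video descriptor could then be fed to any standard classifier; the sketch only illustrates how stage-level and sequence-level encodings are stacked.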

Citation (APA)
Su, B., Zhou, J., Ding, X., Wang, H., & Wu, Y. (2016). Hierarchical dynamic parsing and encoding for action recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9908 LNCS, pp. 202–217). Springer Verlag. https://doi.org/10.1007/978-3-319-46493-0_13
