Middle-level representation for human activities recognition: The role of spatio-temporal relationships

5Citations
Citations of this article
36Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

We tackle the challenging problem of human activity recognition in realistic video sequences. Unlike local features-based methods or global template-based methods, we propose to represent a video sequence by a set of middle-level parts. A part, or component, has consistent spatial structure and consistent motion. We first segment the visual motion patterns and generate a set of middle-level components by clustering keypoints-based trajectories extracted from the video. To further exploit the interdependencies of the moving parts, we then define spatio-temporal relationships between pairwise components. The resulting descriptive middle-level components and pairwise-components thereby catch the essential motion characteristics of human activities. They also give a very compact representation of the video. We apply our framework on popular and challenging video datasets: Weizmann dataset and UT-Interaction dataset. We demonstrate experimentally that our middle-level representation combined with a χ 2-SVM classifier equals to or outperforms the state-of-the-art results on these dataset. © 2012 Springer-Verlag.

Cite

CITATION STYLE

APA

Yuan, F., Prinet, V., & Yuan, J. (2012). Middle-level representation for human activities recognition: The role of spatio-temporal relationships. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6553 LNCS, pp. 168–180). https://doi.org/10.1007/978-3-642-35749-7_13

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free