Visual code-sentences: A new video representation based on image descriptor sequences

Abstract

We present a new descriptor-sequence model for action recognition that enhances discriminative power in the spatio-temporal context while maintaining robustness against background clutter and against inter-/intra-person behavioral variability. We extend the dense-trajectories framework for activity recognition (Wang et al., 2011) and introduce a pool of dynamic Bayesian networks (e.g., multiple HMMs), with histogram descriptors serving as codebooks of composite action categories at the respective key points. The full set of codebooks, bound to spatio-temporal interest points, constitutes an intermediate feature representation that serves as a basis for generic action categories. This representation scheme is intended to act as visual code-sentences that subsume a rich vocabulary of basis action categories. Through extensive experiments on the KTH, UCF Sports, and Hollywood2 datasets, we demonstrate improvements over state-of-the-art methods. © 2012 Springer-Verlag.
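
To make the pipeline concrete, below is a minimal, hypothetical sketch of the descriptor-sequence idea: local trajectory descriptors are vector-quantized into a codebook, one discrete-emission HMM is trained per basis action category, and the vector of per-category log-likelihoods plays the role of the intermediate "code-sentence" feature. This is not the authors' implementation; it assumes scikit-learn's KMeans and hmmlearn's CategoricalHMM, and every function name and constant (build_codebook, N_CODEWORDS, etc.) is illustrative.

```python
# Hypothetical sketch of a code-sentence pipeline (not the paper's code).
# Descriptors -> k-means codewords -> per-category HMMs -> log-likelihood features.
import numpy as np
from sklearn.cluster import KMeans
from hmmlearn.hmm import CategoricalHMM

N_CODEWORDS = 64   # size of the descriptor codebook (assumed value)
N_STATES = 4       # hidden states per HMM (assumed value)

def build_codebook(descriptors, n_codewords=N_CODEWORDS, seed=0):
    """Quantize raw descriptors, shape (n_samples, dim), into a k-means codebook."""
    return KMeans(n_clusters=n_codewords, n_init=10, random_state=seed).fit(descriptors)

def to_symbol_sequence(codebook, track_descriptors):
    """Map one trajectory's descriptors to a column vector of codeword indices."""
    return codebook.predict(track_descriptors).reshape(-1, 1)

def train_hmm_pool(sequences_per_category, n_states=N_STATES, seed=0):
    """Fit one discrete-emission HMM per basis action category."""
    pool = {}
    for category, seqs in sequences_per_category.items():
        X = np.concatenate(seqs)          # stacked symbol sequences
        lengths = [len(s) for s in seqs]  # boundaries of the individual sequences
        hmm = CategoricalHMM(n_components=n_states, n_iter=50, random_state=seed)
        hmm.fit(X, lengths)
        pool[category] = hmm
    return pool

def code_sentence(pool, symbol_sequence):
    """Intermediate feature: per-category HMM log-likelihoods of one sequence."""
    return np.array([hmm.score(symbol_sequence) for hmm in pool.values()])
```

In this sketch the resulting feature vector would then be fed to a conventional classifier (e.g., an SVM) over the generic action categories; how the paper actually combines the HMM pool with the final classifier is described in the full text.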

Citation (APA)

Mitarai, Y., & Matsugu, M. (2012). Visual code-sentences: A new video representation based on image descriptor sequences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7583 LNCS, pp. 321–331). Springer Verlag. https://doi.org/10.1007/978-3-642-33863-2_32
