We present a new descriptor-sequence model for action recognition that enhances discriminative power in the spatio-temporal domain while maintaining robustness against background clutter and inter-/intra-person behavioral variability. We extend the Dense Trajectories framework for activity recognition (Wang et al., 2011) and introduce a pool of dynamic Bayesian networks (e.g., multiple HMMs) with histogram descriptors as codebooks of composite action categories represented at their respective key points. These codebooks, bound to spatio-temporal interest points, constitute an intermediate feature representation that serves as a basis for generic action categories. The scheme yields visual code-sentences that subsume a rich vocabulary of basis action categories. Through extensive experiments on the KTH, UCF Sports, and Hollywood2 datasets, we demonstrate some improvements over state-of-the-art methods. © 2012 Springer-Verlag.
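The core idea above, a pool of HMMs each scoring a sequence of quantized histogram descriptors, with the resulting per-category likelihoods forming an intermediate "code-sentence" feature, can be illustrated with a minimal sketch. This is a toy illustration under assumed parameters (discrete-emission HMMs with random Dirichlet-sampled parameters, a hypothetical 4-word descriptor codebook), not the authors' implementation:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm and per-step scaling."""
    alpha = pi * B[:, obs[0]]          # initial forward probabilities
    logp = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # transition, then emission
        s = alpha.sum()
        logp += np.log(s)
        alpha /= s                      # rescale to avoid underflow
    return logp

def code_sentence(obs, hmm_pool):
    """Map one codeword sequence to a vector of per-category
    log-likelihoods -- the intermediate feature for the sequence."""
    return np.array([forward_loglik(obs, *h) for h in hmm_pool])

# Toy pool: 3 hypothetical basis-action HMMs (2 states, 4 codewords each).
rng = np.random.default_rng(0)
def random_hmm(n_states=2, n_codes=4):
    pi = rng.dirichlet(np.ones(n_states))                 # initial dist.
    A = rng.dirichlet(np.ones(n_states), size=n_states)   # transitions
    B = rng.dirichlet(np.ones(n_codes), size=n_states)    # emissions
    return pi, A, B

pool = [random_hmm() for _ in range(3)]
seq = np.array([0, 2, 1, 3, 2])   # quantized histogram-descriptor codewords
feat = code_sentence(seq, pool)   # one log-likelihood per basis category
print(feat.shape)                 # (3,)
```

In the full method the pool would be trained per composite action category and the resulting likelihood vectors, anchored at spatio-temporal interest points, would feed the final classifier; here the vector `feat` simply stands in for that intermediate representation.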
CITATION STYLE
Mitarai, Y., & Matsugu, M. (2012). Visual code-sentences: A new video representation based on image descriptor sequences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7583 LNCS, pp. 321–331). Springer Verlag. https://doi.org/10.1007/978-3-642-33863-2_32