Hands-on: deformable pose and motion models for spatiotemporal localization of fine-grained dyadic interactions

Abstract

We introduce a novel spatiotemporal deformable part model for the localization of fine-grained human interactions between two people in unsegmented videos. Our approach is the first to classify interactions and, in addition, provide their temporal and spatial extent in the video. To this end, our models contain part detectors that support different scales as well as different types of feature descriptors, combined in a single graph. This allows us to model the detailed coordination between people in terms of body pose and motion, which we demonstrate helps to avoid confusion between visually similar interactions. We show that robust results can be obtained when training on a small number of sequences (5–15) per interaction class. We achieve AUC scores of 0.82 at an IoU of 0.3 on the publicly available ShakeFive2 dataset, which contains interactions that differ slightly in their coordination. To further test the generalization of our models, we perform cross-dataset experiments on two other publicly available datasets, UT-Interaction and SBU Kinect. These experiments show that our models generalize well to different environments.
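To make the reported evaluation concrete, the sketch below shows one common way a spatiotemporal localization can be scored against a ground-truth interaction: a detection is counted as correct when the 3D (x, y, t) overlap of the predicted and annotated interaction tubes reaches the IoU threshold (0.3 in the abstract). This is an illustrative assumption about the matching criterion, not the authors' exact protocol; the tube representation and helper names are hypothetical.

```python
# Minimal sketch (not the paper's code): score a spatiotemporal detection
# against a ground-truth interaction tube using a 3D IoU threshold.

def interval_overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two 1D intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def spatiotemporal_iou(det, gt):
    """IoU of two axis-aligned spatiotemporal tubes.

    Each tube is a dict with keys x1, y1, x2, y2 (spatial extent) and
    t1, t2 (first and last frame of the interaction) -- an assumed format.
    """
    # Overlap volume: spatial overlap area times temporal overlap length.
    inter = (
        interval_overlap(det["x1"], det["x2"], gt["x1"], gt["x2"])
        * interval_overlap(det["y1"], det["y2"], gt["y1"], gt["y2"])
        * interval_overlap(det["t1"], det["t2"], gt["t1"], gt["t2"])
    )
    vol = lambda b: (b["x2"] - b["x1"]) * (b["y2"] - b["y1"]) * (b["t2"] - b["t1"])
    union = vol(det) + vol(gt) - inter
    return inter / union if union > 0 else 0.0

if __name__ == "__main__":
    detection = {"x1": 10, "y1": 20, "x2": 210, "y2": 220, "t1": 30, "t2": 90}
    ground_truth = {"x1": 0, "y1": 10, "x2": 200, "y2": 210, "t1": 40, "t2": 100}
    iou = spatiotemporal_iou(detection, ground_truth)
    print(f"spatiotemporal IoU = {iou:.2f}, correct at threshold 0.3: {iou >= 0.3}")
```

Sweeping a confidence threshold over such matched detections and plotting true-positive against false-positive rates is what would yield an AUC figure like the 0.82 reported above.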

Cite (APA)
van Gemeren, C., Poppe, R., & Veltkamp, R. C. (2018). Hands-on: deformable pose and motion models for spatiotemporal localization of fine-grained dyadic interactions. EURASIP Journal on Image and Video Processing, 2018(1). https://doi.org/10.1186/s13640-018-0255-0
