Many human action recognition tasks involve data that can be factorized into multiple views, such as body postures and hand shapes. These views often interact with each other over time, providing important cues for understanding the action. We present multi-view latent variable discriminative models that jointly learn both view-shared and view-specific sub-structures to capture the interaction between views. Knowledge about the underlying structure of the data is formulated as a multi-chain structured latent conditional model that explicitly learns the interaction between multiple views using disjoint sets of hidden variables in a discriminative manner. The chains are tied using a predetermined topology that repeats over time. We present three topologies (linked, coupled, and linked-coupled) that differ in the type of interaction between views that they model. We evaluate our approach on both segmented and unsegmented human action recognition tasks using the ArmGesture, NATOPS, and ArmGesture-Continuous datasets. Experimental results show that our approach outperforms previous state-of-the-art action recognition models.
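For concreteness, below is a minimal sketch of the general form a multi-chain latent conditional model of this kind can take, following the standard hidden conditional random field (HCRF) formulation; the per-view hidden chains h^v, the feature decomposition, and the edge set E are illustrative assumptions based on the abstract's wording, not the paper's exact notation.

% Hedged sketch: latent conditional model with one disjoint hidden chain per view
% (assumed notation: V views, T time steps, topology edge set \mathcal{E}).
\[
P(y \mid x;\, \theta)
  = \frac{\sum_{h^{1},\dots,h^{V}} \exp\!\big(\theta^{\top} \Phi(y, h^{1},\dots,h^{V}, x)\big)}
         {\sum_{y'} \sum_{h^{1},\dots,h^{V}} \exp\!\big(\theta^{\top} \Phi(y', h^{1},\dots,h^{V}, x)\big)}
\]
% The joint feature map splits into within-chain terms and cross-chain linking
% terms; the cross-chain edges \mathcal{E} are fixed by the chosen topology.
\[
\Phi(y, h^{1},\dots,h^{V}, x)
  = \sum_{v=1}^{V} \sum_{t=1}^{T} \phi\big(y, h^{v}_{t}, x_{t}\big)
  + \sum_{((v,t),\,(w,t')) \in \mathcal{E}} \psi\big(y, h^{v}_{t}, h^{w}_{t'}\big)
\]

Under this reading, the linked, coupled, and linked-coupled topologies would correspond to different choices of E (e.g., cross-view edges at the same time step, at adjacent time steps, or both); consult the paper itself for the exact potentials and training objective.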
Song, Y., Morency, L. P., & Davis, R. (2012). Multi-view latent variable discriminative models for action recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2120–2127). https://doi.org/10.1109/CVPR.2012.6247918