We present a method for gesture detection and localization based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as the motion of the upper body or a hand), and the whole system operates at two temporal scales. Key to our technique is a training strategy that exploits (i) careful initialization of individual modalities and (ii) gradual fusion of modalities, from strongest to weakest cross-modality structure. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams.
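The gradual-fusion training strategy can be sketched as follows. This is a minimal illustrative outline, not the paper's implementation: the modality names, the numeric "cross-modality structure" scores, and the ordering function are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of a gradual-fusion schedule: each modality is
# first trained in isolation, then modalities are fused one at a time,
# ordered from strongest to weakest cross-modality structure.
# The strength scores below are illustrative, not from the paper.

def fusion_order(modalities, strength):
    """Order modalities by descending cross-modality structure."""
    return sorted(modalities, key=lambda m: strength[m], reverse=True)

def gradual_fusion(modalities, strength):
    """Yield the growing set of jointly trained modalities, one per step."""
    fused = []
    for m in fusion_order(modalities, strength):
        # Step 1 (per modality): assume weights were carefully
        # initialized by training this modality on its own.
        fused.append(m)
        # Step 2: jointly fine-tune all modalities fused so far.
        yield list(fused)

# Toy example with three assumed input channels and assumed scores.
strength = {"depth_hands": 0.9, "skeleton": 0.7, "audio": 0.2}
steps = list(gradual_fusion(["skeleton", "depth_hands", "audio"], strength))
# Each element of `steps` is the modality set trained jointly at that stage.
```

The key design point is that fusion order is data-driven: modalities whose representations correlate most strongly across channels are merged first, so the joint network starts from the easiest fusion problem.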
Neverova, N., Wolf, C., Taylor, G. W., & Nebout, F. (2015). Multi-scale deep learning for gesture detection and localization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8925, pp. 474–490). Springer Verlag. https://doi.org/10.1007/978-3-319-16178-5_33