Deep convolutional neural networks for efficient pose estimation in gesture videos

Tomas Pfister; Karen Simonyan; James Charles; Andrew Zisserman

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9003 538-552

DOI: 10.1007/978-3-319-16865-4_35

78 Citations

156 Readers

Abstract

Our objective is to efficiently and accurately estimate the upper body pose of humans in gesture videos. To this end, we build on the recent successful applications of deep convolutional neural networks (ConvNets). Our novelties are: (i) our method is the first, to our knowledge, to use ConvNets for estimating human pose in videos; (ii) a new network that exploits temporal information from multiple frames, leading to better performance; (iii) showing that pre-segmenting the foreground of the video improves performance; and (iv) demonstrating that even without foreground segmentations, the network learns to abstract away from the background and can estimate the pose even in the presence of a complex, varying background. We evaluate our method on the BBC TV Signing dataset and show that our pose predictions are significantly better, and an order of magnitude faster to compute, than the state of the art [3].

Citation (APA)

Pfister, T., Simonyan, K., Charles, J., & Zisserman, A. (2015). Deep convolutional neural networks for efficient pose estimation in gesture videos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9003, pp. 538–552). Springer Verlag. https://doi.org/10.1007/978-3-319-16865-4_35
