Abstract
Our objective is to efficiently and accurately estimate human upper body pose in gesture videos. To this end, we build on the recent successful applications of random forest (RF) classifiers and regressors, and develop a pose estimation model with the following novelties: (i) the joints are estimated sequentially, taking account of the human kinematic chain, so we do not have to make the simplifying assumption of most previous RF methods that the joints are estimated independently; (ii) by combining both classifiers (as a mixture of experts) and regressors, we show that the learning problem is tractable and that more context can be taken into account; and (iii) dense optical flow is used to align multiple expert joint position proposals from nearby frames, and thereby improve the robustness of the estimates. The resulting method is computationally efficient and can overcome a number of the errors (e.g. confusing left/right hands) made by RF pose estimators that infer joint locations independently. We show that we improve over the state of the art on upper body pose estimation for two public datasets: the BBC TV Signing dataset and the ChaLearn Gesture Recognition dataset.
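As a rough illustration of point (iii), the sketch below warps joint position proposals from nearby frames into a target frame using dense flow fields and then fuses them. It is not the authors' implementation: the function names `warp_proposal` and `fuse_proposals`, the nearest-pixel flow lookup, and the use of a median as the fusion rule are all assumptions made for this example, and the flow fields are assumed to be precomputed arrays of shape (H, W, 2) holding per-pixel (dx, dy) displacements towards the target frame.

```python
import numpy as np


def warp_proposal(xy, flow):
    """Move one joint proposal (x, y) along a dense flow field.

    flow has shape (H, W, 2) and stores (dx, dy) per pixel (assumed layout).
    """
    h, w = flow.shape[:2]
    # Look up the flow at the nearest pixel, clamped to the image bounds.
    col = int(np.clip(np.rint(xy[0]), 0, w - 1))
    row = int(np.clip(np.rint(xy[1]), 0, h - 1))
    dx, dy = flow[row, col]
    return np.array([xy[0] + dx, xy[1] + dy], dtype=float)


def fuse_proposals(proposals, flows_to_target):
    """Warp per-frame proposals into the target frame and fuse them.

    proposals: list of (x, y) joint positions, one per nearby frame.
    flows_to_target: list of (H, W, 2) flow fields, each mapping its
        source frame to the target frame (all zeros for the target itself).
    The median is an assumed fusion rule, chosen here for outlier robustness.
    """
    warped = [
        warp_proposal(np.asarray(p, dtype=float), f)
        for p, f in zip(proposals, flows_to_target)
    ]
    return np.median(np.stack(warped), axis=0)
```

With a robust fusion rule such as the median, a single bad proposal (for example, a hand detected on the wrong side) has limited influence once the other proposals agree after alignment.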
Cite
Charles, J., Pfister, T., Magee, D., Hogg, D., & Zisserman, A. (2014). Upper body pose estimation with temporal sequential forests. In BMVC 2014 - Proceedings of the British Machine Vision Conference 2014. British Machine Vision Association, BMVA. https://doi.org/10.5244/c.28.54