One-shot learning gesture recognition from RGB-D data using bag of features

Abstract

For one-shot learning gesture recognition, two important challenges are: how to extract distinctive features, and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform (3D EMoSIFT) is proposed, which fuses RGB-D data. Compared with other features, the new feature set is invariant to scale and rotation and gives a more compact and richer visual representation. For learning a discriminative model, all features extracted from the training samples are clustered with the k-means algorithm to learn a visual codebook. Then, unlike traditional bag-of-features (BoF) models, which use vector quantization (VQ) to map each feature to a single visual codeword, a sparse coding method named simulation orthogonal matching pursuit (SOMP) is applied, so that each feature can be represented by a linear combination of a small number of codewords. Compared with VQ, SOMP leads to a much lower reconstruction error and achieves better performance. The proposed approach has been evaluated on the ChaLearn gesture database, where it ranked among the best performing techniques in round 2 of the ChaLearn gesture challenge. © 2013 Jun Wan, Qiuqi Ruan, Shuang Deng and Wei Li.
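
To make the encoding pipeline concrete, here is a minimal sketch (not the authors' code): random vectors stand in for 3D EMoSIFT descriptors, scikit-learn's KMeans learns the visual codebook, and per-descriptor orthogonal matching pursuit stands in for the paper's SOMP encoder; the codebook size, descriptor dimension, and sparsity level are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
# Stand-in for 3D EMoSIFT descriptors from one video; real descriptors
# would come from the RGB-D spatio-temporal feature extraction step.
descriptors = rng.standard_normal((500, 64))

# Step 1: learn a visual codebook by k-means clustering of training features.
k = 32
kmeans = KMeans(n_clusters=k, n_init=5, random_state=0).fit(descriptors)
codebook = kmeans.cluster_centers_                      # shape (k, 64)

def vq_histogram(feats):
    """VQ baseline: hard-assign each descriptor to its nearest codeword."""
    counts = np.bincount(kmeans.predict(feats), minlength=k).astype(float)
    return counts / counts.sum()

def sparse_histogram(feats, n_nonzero=4):
    """Sparse coding: represent each descriptor as a linear combination of a
    few codewords (plain OMP here, standing in for the paper's SOMP)."""
    D = codebook.T                                      # columns are codewords
    D = D / np.linalg.norm(D, axis=0, keepdims=True)    # OMP assumes unit-norm atoms
    coefs = orthogonal_mp(D, feats.T, n_nonzero_coefs=n_nonzero)  # (k, n_feats)
    pooled = np.abs(coefs).sum(axis=1)                  # sum-pool codeword activations
    return pooled / pooled.sum()

h_vq = vq_histogram(descriptors)      # one BoF histogram per gesture clip
h_sc = sparse_histogram(descriptors)
```

The sum-pooled sparse histogram spreads each descriptor's weight over a few codewords rather than forcing a single hard assignment, which is the mechanism behind the abstract's claim that SOMP attains a much lower reconstruction error than VQ.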

Citation (APA)

Wan, J., Ruan, Q., Li, W., & Deng, S. (2013). One-shot learning gesture recognition from RGB-D data using bag of features. Journal of Machine Learning Research, 14, 2549–2582.
