A 3DCNN-LSTM Multi-Class Temporal Segmentation for Hand Gesture Recognition

Abstract

This paper introduces a multi-class hand gesture recognition model that identifies a set of hand gesture sequences from two-dimensional RGB video recordings, using both the appearance and the spatiotemporal features of consecutive frames. The classifier combines a 3D convolutional network with a long short-term memory (LSTM) unit. To avoid the need for a large-scale dataset, the model is first trained on a public dataset and then fine-tuned on the hand gestures of interest via transfer learning. Validation performed with a batch size of 64 indicates an accuracy of 93.95% (±0.37) and a mean Jaccard index of 0.812 (±0.105) across 22 participants. The fine-tuned architecture demonstrates that a model can be refined with a small dataset (113,410 fully labelled image frames) to cover previously unseen hand gestures. The main contribution of this work is a custom hand gesture recognition network, driven by monocular RGB video sequences, that outperforms previous temporal segmentation models while keeping the architecture small enough to facilitate wide adoption.
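
The abstract describes the architecture only at a high level. As a rough illustration of the pattern it names, the PyTorch sketch below feeds 3D-convolutional spatiotemporal features into an LSTM and classifies every time step, yielding a frame-wise labelling (temporal segmentation) rather than a single clip label; freezing the convolutional backbone before fine-tuning mirrors the transfer-learning step. All layer sizes, the class count, and the clip shape are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn

class CNN3DLSTM(nn.Module):
    """Hypothetical 3D-CNN + LSTM temporal segmentation sketch.

    Hyperparameters (channel widths, hidden_dim, num_classes) are
    assumptions for illustration only.
    """

    def __init__(self, num_classes: int = 10, hidden_dim: int = 256):
        super().__init__()
        # 3D convolutions extract spatiotemporal features from
        # short stacks of consecutive RGB frames.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),   # collapse space, keep time
        )
        # The LSTM models longer-range temporal context across frames.
        self.lstm = nn.LSTM(64, hidden_dim, batch_first=True)
        # A per-time-step head produces frame-wise class logits,
        # i.e. a temporal segmentation of the gesture sequence.
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels=3, time, height, width)
        feats = self.backbone(x)                  # (B, 64, T, 1, 1)
        feats = feats.flatten(2).transpose(1, 2)  # (B, T, 64)
        out, _ = self.lstm(feats)                 # (B, T, hidden_dim)
        return self.head(out)                     # (B, T, num_classes)


model = CNN3DLSTM()
# Transfer learning as sketched here: after pretraining on a public
# dataset, freeze the convolutional backbone and fine-tune only the
# recurrent and classification layers on the target gestures.
for p in model.backbone.parameters():
    p.requires_grad = False
clip = torch.randn(2, 3, 16, 112, 112)  # 2 clips of 16 RGB frames
logits = model(clip)                    # (2, 16, 10) frame-wise logits
```

For temporal segmentation, the reported Jaccard index is typically the overlap between predicted and ground-truth gesture intervals, |A ∩ B| / |A ∪ B|, averaged over gestures; a mean of 0.812 thus indicates substantial agreement between predicted and annotated gesture boundaries.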

Citation (APA)

Gionfrida, L., Rusli, W. M. R., Kedgley, A. E., & Bharath, A. A. (2022). A 3DCNN-LSTM Multi-Class Temporal Segmentation for Hand Gesture Recognition. Electronics (Switzerland), 11(15). https://doi.org/10.3390/electronics11152427
