In this article, we propose a Hand Gesture Recognition (HGR) system based on a novel deep transformer (DT) neural network for media player control. The extracted hand skeleton features are processed by separate transformers for each finger in isolation to better identify the finger characteristics to drive the following classification. The achieved HGR accuracy (0.853) outperforms state-of-the-art HGR approaches when tested on the popular NVIDIA dataset. Moreover, we conducted a subjective assessment involving 30 people to evaluate the Quality of Experience (QoE) provided by the proposed DT-HGR for controlling a media player application compared with two traditional input devices, i.e., mouse and keyboard. The assessment participants were asked to evaluate objective (accuracy) and subjective (physical fatigue, usability, pragmatic quality, and hedonic quality) measurements. We found that (i) the accuracy of DT-HGR is very high (91.67%), only slightly lower than that of traditional alternative interaction modalities; and that (ii) the perceived quality for DT-HGR in terms of satisfaction, comfort, and interactivity is very high, with an average Mean Opinion Score (MOS) value as high as 4.4, whereas the alternative approaches did not reach 3.8, which encourages a more pervasive adoption of the natural gesture interaction.
CITATION STYLE
Floris, A., Porcu, S., & Atzori, L. (2024). Controlling Media Player with Hands: A Transformer Approach and a Quality of Experience Assessment. ACM Transactions on Multimedia Computing, Communications and Applications, 20(5). https://doi.org/10.1145/3638560
Mendeley helps you to discover research relevant for your work.