Sign Language Recognition models have steadily improved in performance in recent years, fueled by Neural Network models. Moreover, generic Neural Network architectures have taken precedence over models designed specifically for Sign Language. Despite this, the completeness and complexity of available datasets have not scaled accordingly. This deficiency presents a significant challenge for deploying Sign Language Recognition models, especially given that Sign Languages are specific to countries or even regions. Following this trend, we experiment with three models built on standard recurrent and convolutional neural network layers. We evaluate the models on LSA64, the only available Argentinian Sign Language dataset. Coupled with simple but carefully chosen hyperparameters and preprocessing techniques, all three models achieve near-perfect accuracy on LSA64, surpassing all previous models, many of them designed specifically for this task. Furthermore, we perform ablation studies indicating that temporal data augmentation can provide a significant boost to accuracy, unlike traditional spatial data augmentation techniques. Finally, we analyze the activation values of the three models to understand the types of features learned, and find that they develop hand-specific filters to classify signs.
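The abstract highlights temporal data augmentation as the key ingredient in the ablation studies. The paper's exact augmentation scheme is not described here, but one common form of temporal augmentation for video clips is random speed perturbation: resampling the frame sequence along the time axis by a random factor. The sketch below is a minimal illustration under that assumption; the function name, scale range, and nearest-neighbor resampling strategy are all hypothetical choices, not the authors' method.

```python
import numpy as np

def temporal_augment(frames, min_scale=0.8, max_scale=1.2, rng=None):
    """Resample a clip of shape (T, H, W, C) along time by a random
    speed factor, duplicating or dropping frames via nearest-neighbor
    index selection. Illustrative sketch, not the paper's pipeline."""
    if rng is None:
        rng = np.random.default_rng()
    t = frames.shape[0]
    scale = rng.uniform(min_scale, max_scale)
    new_t = max(1, int(round(t * scale)))
    # Map the new (stretched or compressed) timeline back onto the
    # original frame indices.
    idx = np.round(np.linspace(0, t - 1, new_t)).astype(int)
    return frames[idx]
```

Spatial augmentations (flips, crops, color jitter) leave the sign's dynamics untouched, whereas a speed perturbation like this directly varies the temporal pattern a recurrent or convolutional model must learn, which is one plausible reason the temporal variant helps more on a signing task.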
Mindlin, I., Quiroga, F., Ronchetti, F., Bianco, P. D., Ríos, G., Lanzarini, L., & Hasperué, W. (2021). A Comparison of Neural Networks for Sign Language Recognition with LSA64. In Communications in Computer and Information Science (Vol. 1444 CCIS, pp. 104–117). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-84825-5_8