Deep learning-based bimodal speech and facial expression recognition of miners’ unsafe emotions

This article is free to access.

Abstract

Under the influence of unsafe emotions, miners’ ability to perceive risks is hindered, which can easily lead to decision-making errors and safety accidents. To recognize unsafe emotions exhibited by miners during operations, this study proposes a deep learning-based bimodal framework that integrates speech and facial expression features. A convolutional neural network (CNN) combined with a bidirectional long short-term memory (Bi-LSTM) network is employed to model local spectral patterns and temporal dependencies in speech signals, and ShuffleNet-V2 is used to capture deep facial features. In addition, three feature enhancement strategies are proposed to improve the generalization ability of the model. By constructing a dataset containing five categories of miners’ unsafe emotions for network training, the model achieves a mean recognition accuracy of 85.56%. Furthermore, we conducted a preliminary field test of the bimodal model in a real mining environment. The results provide preliminary evidence of its potential applicability in real-world mining conditions.
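
The abstract does not say how the speech and facial branches are combined, so the following is only a minimal sketch of one common option, decision-level (late) fusion, in which each branch outputs five class scores and the softmax probabilities are averaged with a weight. The function names, the `speech_weight` parameter, and the placeholder labels in `EMOTIONS` are illustrative assumptions and are not taken from the paper. The scores passed in stand in for the outputs of the CNN + Bi-LSTM speech branch and the ShuffleNet-V2 face branch.

```python
import math

# Placeholder labels (assumption): the abstract states there are five
# categories of unsafe emotions but does not name them.
EMOTIONS = ["class_0", "class_1", "class_2", "class_3", "class_4"]


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(speech_logits, face_logits, speech_weight=0.5):
    """Weighted average of the two branches' class probabilities.

    speech_weight is a hypothetical tuning knob: 0.5 treats both
    modalities equally, values near 1.0 favour the speech branch.
    """
    if len(speech_logits) != len(face_logits):
        raise ValueError("both branches must score the same classes")
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    p_speech = softmax(speech_logits)
    p_face = softmax(face_logits)
    return [
        speech_weight * ps + (1.0 - speech_weight) * pf
        for ps, pf in zip(p_speech, p_face)
    ]


def predict(speech_logits, face_logits, speech_weight=0.5):
    """Return the most probable label and the fused distribution."""
    probs = fuse(speech_logits, face_logits, speech_weight)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs
```

When the two branches disagree, the fused decision follows whichever one is more confident or more heavily weighted, which is one reason a second modality can help if one channel is degraded. Feature-level fusion (concatenating branch embeddings before a shared classifier) is an equally plausible reading of the abstract and is not shown here.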

Cite

Lu, Y., Zhao, Z., Yan, L., & Shi, X. (2026). Deep learning-based bimodal speech and facial expression recognition of miners’ unsafe emotions. PLOS ONE, 21(5). https://doi.org/10.1371/journal.pone.0348906
