Deep learning-based bimodal speech and facial expression recognition of miners’ unsafe emotions

Ying Lu; Zihao Zhao; Liang Yan; Xinyv Shi

Journal article (open access)

PLOS ONE (2026) 21(5 May)

DOI: 10.1371/journal.pone.0348906

Abstract

Unsafe emotions impair miners’ ability to perceive risks, which can lead to decision-making errors and safety accidents. To recognize unsafe emotions exhibited by miners during operations, this study proposes a deep learning-based bimodal framework that integrates speech and facial expression features. A convolutional neural network (CNN) combined with a bidirectional long short-term memory (Bi-LSTM) network models local spectral patterns and temporal dependencies in speech signals, and ShuffleNet-V2 extracts deep facial features. In addition, three feature enhancement strategies are proposed to improve the model’s generalization ability. Trained on a dataset constructed for this study covering five categories of miners’ unsafe emotions, the model achieves a mean recognition accuracy of 85.56%. A preliminary field test of the bimodal model in a real mining environment provides initial evidence of its potential applicability under real-world mining conditions.

Citation (APA)

Lu, Y., Zhao, Z., Yan, L., & Shi, X. (2026). Deep learning-based bimodal speech and facial expression recognition of miners’ unsafe emotions. PLOS ONE, 21(5 May). https://doi.org/10.1371/journal.pone.0348906
