Abstract
Unsafe emotions impair miners’ ability to perceive risk, which can lead to decision-making errors and safety accidents. To recognize the unsafe emotions that miners exhibit during operations, this study proposes a deep learning-based bimodal framework that integrates speech and facial expression features. A convolutional neural network (CNN) combined with a bidirectional long short-term memory (Bi-LSTM) network models local spectral patterns and temporal dependencies in speech signals, and ShuffleNet-V2 captures deep facial features. In addition, three feature enhancement strategies are proposed to improve the model’s generalization ability. Trained on a dataset constructed for this study that contains five categories of miners’ unsafe emotions, the model achieves a mean recognition accuracy of 85.56%. A preliminary field test of the bimodal model in a real mining environment provides initial evidence of its potential applicability under real-world mining conditions.
Cite
Lu, Y., Zhao, Z., Yan, L., & Shi, X. (2026). Deep learning-based bimodal speech and facial expression recognition of miners’ unsafe emotions. PLOS ONE, 21(5). https://doi.org/10.1371/journal.pone.0348906