On the Usage of Pre-Trained Speech Recognition Deep Layers to Detect Emotions


Abstract

One of the landmarks of Industry 4.0 concerns the optimization of manufacturing processes by increasing the operator's productivity. Productivity, however, is strongly affected by the operator's emotions. Positive emotions (e.g., happiness) are positively related to productivity, whereas negative emotions (e.g., frustration) are negatively related to productivity and positively related to misconduct and misbehavior in the workplace. Automatic recommendation systems could therefore suggest actions or instructions to eliminate or attenuate undesired negative emotions in the workplace. Such systems would rely on reliable emotion detectors. In this paper, emotions are detected through a speech-based system. Our solution is built on deep speech recognition layers, namely the first two convolutional layers of Baidu's pre-trained 2015 speech recognition model. By re-using these first two convolutional layers, robust meta-features are expected to be extracted. Our deep learning model attempts to predict the seven primary emotions on the MELD test set. Furthermore, our solution does not use any contextual data and yet achieves robust results. The proposed weighted TrBaidu algorithm achieved state-of-the-art results on the detection of the joy and surprise emotions, with an F1-score of 23% for both emotions.
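The abstract describes a transfer-learning setup: the first two convolutional layers of a pre-trained speech recognition network are reused as a feature extractor, and a classifier head is trained on top to predict the seven MELD emotion classes. The sketch below illustrates that general idea in PyTorch; the class name `EmotionFromSpeechFeatures`, the layer shapes, the pooling, and the classifier head are illustrative assumptions, not the exact configuration of Baidu's 2015 model or of the paper's TrBaidu algorithm.

```python
import torch
import torch.nn as nn

# Minimal sketch of the transfer-learning idea described in the abstract:
# reuse the first two convolutional layers of a pre-trained speech
# recognition network as a frozen feature extractor, then train a small
# head to classify the seven MELD emotions. All shapes are assumptions.

NUM_EMOTIONS = 7  # anger, disgust, fear, joy, neutral, sadness, surprise


class EmotionFromSpeechFeatures(nn.Module):
    def __init__(self, pretrained_conv1: nn.Conv2d, pretrained_conv2: nn.Conv2d):
        super().__init__()
        # Convolutional front-end; weights are assumed to be loaded from a
        # speech recognition checkpoint elsewhere.
        self.features = nn.Sequential(
            pretrained_conv1, nn.ReLU(),
            pretrained_conv2, nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Freeze the re-used layers so only the emotion head is trained.
        for p in self.features.parameters():
            p.requires_grad = False
        out_channels = pretrained_conv2.out_channels
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(out_channels * 4 * 4, 128),
            nn.ReLU(),
            nn.Linear(128, NUM_EMOTIONS),
        )

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, 1, freq_bins, time_frames)
        return self.classifier(self.features(spectrogram))


# Placeholder conv layers standing in for the pre-trained ones.
conv1 = nn.Conv2d(1, 32, kernel_size=(11, 41), stride=(2, 2), padding=(5, 20))
conv2 = nn.Conv2d(32, 32, kernel_size=(11, 21), stride=(1, 2), padding=(5, 10))
model = EmotionFromSpeechFeatures(conv1, conv2)
logits = model(torch.randn(8, 1, 161, 300))  # dummy batch of spectrograms
print(logits.shape)  # torch.Size([8, 7])
```

Freezing the borrowed layers and training only the head is one common way to exploit such pre-trained front-ends; the paper's actual weighting scheme in TrBaidu is not reproduced here.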

Citation (APA)

Oliveira, J., & Praca, I. (2021). On the Usage of Pre-Trained Speech Recognition Deep Layers to Detect Emotions. IEEE Access, 9, 9699–9705. https://doi.org/10.1109/ACCESS.2021.3051083
