Abstract
This study presents a comprehensive analysis of affective vocal expression classification systems. The Hilbert–Huang–Hurst coefficient (HHHC) vector is proposed as a non-linear vocal source feature that represents emotional states according to their effects on the speech production mechanism. Affective states are highlighted by an empirical mode decomposition-based method, which exploits the non-stationarity of the acoustic variations. Hurst coefficients are then estimated from the decomposition modes to form the feature vector. Additionally, a vector of the index of non-stationarity (INS) is introduced as dynamic information to complement the HHHC. The proposed feature vector is evaluated in speech emotion classification experiments with three databases in the German and English languages. Three state-of-the-art acoustic feature vectors are adopted as baselines. The α-integrated Gaussian mixture model (α-GMM) is also introduced for emotion representation and classification. Its performance is compared to that of competing stochastic and machine learning classifiers. Results demonstrate that the HHHC leads to significant classification improvement when compared to the baseline acoustic feature vectors. Moreover, results show that the α-GMM outperforms the competing classification methods. Finally, the complementarity of the HHHC and INS is also evaluated for the GeMAPS and eGeMAPS feature sets.
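The step in which Hurst coefficients are estimated from the decomposition modes can be illustrated with a minimal sketch. It assumes the modes have already been obtained by an empirical mode decomposition and are passed in as plain lists of samples. The abstract does not state which Hurst estimator the authors use, so the aggregated-variance estimator below is a stand-in chosen for brevity, and the function names `hurst_aggregated_variance` and `hurst_feature_vector` are hypothetical.

```python
import math
import random


def hurst_aggregated_variance(x, block_sizes=(4, 8, 16, 32, 64)):
    """Estimate the Hurst exponent H with the aggregated-variance method.

    Assumption: this estimator is used for illustration only; it is not
    necessarily the one used in the paper. For a self-similar sequence the
    variance of block means of size m scales as m**(2H - 2), so H follows
    from the slope of a log-log least-squares fit.
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        # Mean of each non-overlapping block of m samples.
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((v - mu) ** 2 for v in means) / (n_blocks - 1)
        if var > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("sequence too short for the chosen block sizes")
    # Ordinary least-squares slope of log(variance) against log(block size).
    mx = sum(log_m) / len(log_m)
    my = sum(log_var) / len(log_var)
    slope = (sum((a - mx) * (b - my) for a, b in zip(log_m, log_var))
             / sum((a - mx) ** 2 for a in log_m))
    return 1.0 + slope / 2.0


def hurst_feature_vector(modes):
    """Return one Hurst estimate per decomposition mode (hypothetical helper)."""
    return [hurst_aggregated_variance(mode) for mode in modes]


if __name__ == "__main__":
    # Synthetic stand-ins for decomposition modes, not real speech data.
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(8192)]
    walk, total = [], 0.0
    for v in noise:
        total += v
        walk.append(total)
    print(hurst_feature_vector([noise, walk]))
```

On uncorrelated noise the estimate should lie near 0.5, while a strongly persistent sequence such as a random walk pushes it toward 1, which is the kind of contrast a per-mode Hurst vector is meant to capture.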
Author supplied keywords
- α-GMM
- α-integrated Gaussian mixture model
- English language
- Gaussian mixture model
- Gaussian processes
- GeMAPS feature set
- German language
- HHHC
- Hilbert transforms
- Hilbert–Huang–Hurst coefficient vector
- Hilbert–Huang–Hurst-based nonlinear acoustic feature vector
- acoustic feature vectors
- acoustic signal processing
- affective computing
- affective vocal expression classification systems
- eGeMAPS feature set
- emotion recognition
- emotion representation
- empirical mode decomposition
- index of nonstationarity
- learning (artificial intelligence)
- learning systems
- machine learning classifiers
- mixture models
- nonlinear vocal source feature
- signal classification
- signal representation
- speech emotion classification experiments
- speech enhancement
- speech production mechanism
- speech recognition
- stochastic classifiers
- stochastic models
- stochastic processes
Cite
Vieira, V., Coelho, R., & de Assis, F. M. (2020). Hilbert–Huang–Hurst-based non-linear acoustic feature vector for emotion classification with stochastic models and learning systems. IET Signal Processing, 14(8), 522–532. https://doi.org/10.1049/iet-spr.2019.0383