A novel hybrid model integrating MFCC and acoustic parameters for voice disorder detection

44Citations
Citations of this article
50Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Voice is an essential component of human communication, serving as a fundamental medium for expressing thoughts, emotions, and ideas. Disruptions in vocal fold vibratory patterns can lead to voice disorders, which can have a profound impact on interpersonal interactions. Early detection of voice disorders is crucial for improving voice health and quality of life. This research proposes a novel methodology called VDDMFS [voice disorder detection using MFCC (Mel-frequency cepstral coefficients), fundamental frequency and spectral centroid] which combines an artificial neural network (ANN) trained on acoustic attributes and a long short-term memory (LSTM) model trained on MFCC attributes. Subsequently, the probabilities generated by both the ANN and LSTM models are stacked and used as input for XGBoost, which detects whether a voice is disordered or not, resulting in more accurate voice disorder detection. This approach achieved promising results, with an accuracy of 95.67%, sensitivity of 95.36%, specificity of 96.49% and f1 score of 96.9%, outperforming existing techniques.

Cite

CITATION STYLE

APA

Verma, V., Benjwal, A., Chhabra, A., Singh, S. K., Kumar, S., Gupta, B. B., … Chui, K. T. (2023). A novel hybrid model integrating MFCC and acoustic parameters for voice disorder detection. Scientific Reports, 13(1). https://doi.org/10.1038/s41598-023-49869-6

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free