Abstract
A person’s speech can be altered by various changes in the autonomic nervous system, and effective technologies can process this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes loud and fast, with a higher and wider pitch range, whereas emotions such as sadness or tiredness produce slow, low-pitched speech. Detecting human emotions through voice- and speech-pattern analysis has many applications, such as improving human-machine interaction. This paper aims to detect emotions from audio. Several machine learning algorithms, including K-nearest neighbours (KNN) and decision trees, were implemented using acoustic features such as Mel-frequency cepstral coefficients (MFCCs). Our evaluation shows that the proposed approach yields accuracies of 98%, 92%, and 99% with KNN, decision tree, and Extra Trees classifiers, respectively, for seven emotions on the Toronto Emotional Speech Set (TESS) dataset.
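The pipeline the abstract describes (MFCC features fed to KNN, decision tree, and Extra Trees classifiers) can be sketched as follows. This is a minimal illustration assuming librosa for feature extraction and scikit-learn with default hyperparameters; the directory path and filename-based label parsing are hypothetical, and this is not necessarily the authors' exact configuration.

# A minimal sketch of the described pipeline, assuming librosa for MFCC
# extraction and scikit-learn classifiers with default hyperparameters.
# The directory layout and label parsing are hypothetical; TESS filenames
# such as OAF_back_angry.wav end with the emotion label.
import glob

import librosa
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def extract_mfcc(path, n_mfcc=40):
    # Load the clip at its native sampling rate and average the MFCCs
    # over time into a single fixed-length feature vector.
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

paths = glob.glob("tess/*.wav")  # hypothetical dataset location
X = np.array([extract_mfcc(p) for p in paths])
y = np.array([p.split("_")[-1].split(".")[0] for p in paths])  # e.g. "angry"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

for clf in (KNeighborsClassifier(), DecisionTreeClassifier(),
            ExtraTreesClassifier()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))

Averaging the MFCCs over time discards temporal detail but gives every clip a feature vector of equal length, which distance-based models such as KNN require.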