With the rapid penetration of the Internet across the globe and increasing bandwidth, the volume of audio data is growing significantly. A recent Internet survey projects that video and audio streaming will consume more than 50% of Internet traffic. Moreover, with the recent rise of voice assistants, the significance of audio data, especially voice data, is at its zenith. Against this background, there is a need to analyze audio data to gather significant insights with broad implications in the domains of health, marketing, and media. In this project, an open-source approach is proposed that analyzes audio data directly using acoustic features such as Mel-frequency cepstral coefficients (MFCCs), rather than converting the audio to text and performing the analysis on the transcribed text. In this work, a convolutional neural network (CNN) model is developed to predict emotions from the given audio data.
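The MFCC pipeline the abstract refers to can be sketched as follows. This is a minimal, self-contained illustration of how MFCC features are typically computed (framing, windowing, power spectrum, mel filterbank, log compression, DCT); it is not the authors' implementation, and all frame/hop/filter parameters below are common defaults assumed for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr, n_mfcc=13, frame_len=400, hop=160, n_fft=512, n_filters=26):
    # Slice the signal into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, then log compression
    log_e = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II over the filter axis; keep the first n_mfcc coefficients
    n = n_filters
    basis = np.cos(np.pi / n * (np.arange(n)[:, None] + 0.5) * np.arange(n_mfcc)[None, :])
    return log_e @ basis  # shape: (n_frames, n_mfcc)

# Example: MFCCs for one second of a synthetic 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
print(feats.shape)  # (98, 13)
```

The resulting (frames, coefficients) matrix is the kind of 2-D representation that a CNN can then classify into emotion labels, treating it much like an image.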
CITATION STYLE
Potluri, A., Guguloth, R., & Muppala, C. (2020). Emotion-based extraction, classification and prediction of the audio data. In Advances in Intelligent Systems and Computing (Vol. 1090, pp. 301–309). Springer. https://doi.org/10.1007/978-981-15-1480-7_26