Speech emotion recognition using segmental level prosodic analysis

  • Shashidhar G. Koolagudi
  • Nitin Kumar
  • K. Sreenivasa Rao

In this paper, prosodic analysis of speech segments is performed to recognise emotions. The speech signal is segmented into words and syllables. Energy and pitch parameters are extracted separately from utterances, words and syllables to develop emotion recognition models. Eight emotions (anger, disgust, fear, happy, neutral, sad, sarcastic and surprise) from the simulated emotion speech corpus IITKGP-SESC \cite{koolagudi2009} are used in this work. Word boundaries are manually marked for 15 utterances of IITKGP-SESC. Syllable boundaries are detected using vowel onset points (VOPs) as anchor locations. Support vector machines (SVM) and Gaussian mixture models (GMM) are used to develop the emotion models for the different speech segments. Recognition performance using segmental level prosodic features alone is not found to be appreciable, but combining spectral features with the prosodic features improves emotion recognition performance considerably.
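
The abstract does not specify how the energy and pitch parameters are computed, so the following is only a minimal sketch of segment-level prosodic feature extraction under stated assumptions: pitch is estimated with a plain autocorrelation peak search (a stand-in, not necessarily the authors' method), the frame and hop sizes and the 75 to 500 Hz search range are illustrative choices, and the function names and the returned statistics (`mean_energy`, `mean_f0`, `min_f0`, `max_f0`) are hypothetical. The segment boundaries `start_s` and `end_s` stand for a word or syllable boundary obtained elsewhere, for example by manual marking or VOP detection.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def autocorr_pitch(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate F0 in Hz as the lag with the largest autocorrelation.

    Assumption: the search range [fmin, fmax] is an illustrative choice.
    Returns 0.0 when no positive autocorrelation peak is found.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def segment_features(samples, sample_rate, start_s, end_s,
                     frame_ms=40, hop_ms=10):
    """Energy and pitch statistics for one segment (utterance, word or syllable).

    Assumption: the statistics returned here are illustrative, not the
    feature set used in the paper.
    """
    seg = samples[int(start_s * sample_rate):int(end_s * sample_rate)]
    if not seg:
        raise ValueError("empty segment")
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = [seg[i:i + frame_len]
              for i in range(0, len(seg) - frame_len + 1, hop)]
    if not frames:
        frames = [seg]  # segment shorter than one frame: use it whole
    energies = [short_time_energy(f) for f in frames]
    # Frames with no detectable pitch (0.0) are left out of the F0 statistics.
    f0s = [p for p in (autocorr_pitch(f, sample_rate) for f in frames) if p > 0]
    return {
        "mean_energy": sum(energies) / len(energies),
        "mean_f0": sum(f0s) / len(f0s) if f0s else 0.0,
        "min_f0": min(f0s) if f0s else 0.0,
        "max_f0": max(f0s) if f0s else 0.0,
    }


# Usage on a synthetic 200 Hz tone standing in for a voiced segment.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
features = segment_features(tone, rate, start_s=0.2, end_s=0.5)
```

Feature vectors of this kind, computed per utterance, word or syllable, could then be passed to an SVM or GMM classifier; how the paper assembles its feature vectors is not stated in the abstract.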

Author-supplied keywords

  • Emotion recognition
  • Emotion verification
  • Energy
  • Pitch
  • SVM
  • Segmental level prosodic features
  • VOP
