Pitch prediction from MFCC vectors for speech reconstruction

  • Xu Shao
  • B. Milner


This paper proposes a technique for reconstructing an acoustic speech signal solely from a stream of Mel-frequency cepstral coefficients (MFCCs). Previous speech reconstruction methods have required an additional pitch element; this work instead proposes two maximum a posteriori (MAP) methods for predicting pitch from the MFCC vectors themselves. The first method is based on a Gaussian mixture model (GMM), while the second exploits the temporal correlation available from a hidden Markov model (HMM) framework. Formal measurements of frame classification accuracy and RMS pitch error show that an HMM-based scheme with 5 clusters per state correctly classifies over 94% of frames and has an RMS pitch error of 3.1 Hz relative to a reference pitch. Informal listening tests and analysis of spectrograms reveal that speech reconstructed solely from the MFCC vectors is almost indistinguishable from speech reconstructed using the reference pitch.
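
The abstract does not spell out the GMM-based predictor, so the following is only a minimal sketch of one common way such a scheme can be set up, not the authors' implementation. It assumes a GMM trained on joint vectors [MFCC, pitch], picks the mixture component with the highest posterior given the MFCC part alone, and returns that component's conditional mean pitch (which is also its conditional mode). The function names `gaussian_logpdf` and `map_pitch`, and the layout of `means` and `covs`, are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)


def map_pitch(x, weights, means, covs):
    """Predict pitch (Hz) from one MFCC vector x with a joint GMM.

    Assumed layout (hypothetical): each means[k] has shape (D + 1,) and each
    covs[k] has shape (D + 1, D + 1), where the first D dimensions are the
    MFCCs and the last dimension is the pitch.
    """
    x = np.asarray(x, dtype=float)
    D = len(x)

    # Log posterior (up to a constant) of each component given the MFCCs only.
    log_post = np.array([
        np.log(w) + gaussian_logpdf(x, m[:D], c[:D, :D])
        for w, m, c in zip(weights, means, covs)
    ])
    k = int(np.argmax(log_post))  # most probable component

    # Conditional mean of pitch given x under component k:
    #   mu_f + Sigma_fx Sigma_xx^{-1} (x - mu_x)
    m, c = means[k], covs[k]
    return float(m[D] + c[D, :D] @ np.linalg.solve(c[:D, :D], x - m[:D]))
```

In practice the weights, means, and covariances would come from training on frames with a known reference pitch. An HMM-based variant would replace the per-frame component choice with a decoded state sequence, so that neighbouring frames inform each other.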
