Improving Speech Recognition Rate through Analysis Parameters

  • Eringis D
  • Tamulevičius G

Abstract

The speech signal is redundant and non-stationary by nature. Because of the inertia of the vocal tract, its variations are not very rapid, and the signal can be considered stationary over short segments. The short-time magnitude spectrum is presumed to contain the most distinctive information of speech. This is the main reason for analyzing the speech signal frame by frame: the signal is split into overlapping segments (so-called frames). Segments of 15-25 ms with an overlap of 10-15 ms are typically used. In this paper we present the results of our investigation of the influence of analysis window length and frame shift on the speech recognition rate. We analyzed three cepstral analysis approaches for this purpose: mel frequency cepstral analysis (MFCC), linear prediction cepstral analysis (LPCC) and perceptual linear prediction cepstral analysis (PLPC). The highest speech recognition rate was obtained using a 10 ms analysis window with the frame shift varying from 7.5 to 10 ms (regardless of analysis type). The highest increase in recognition rate was 2.5 %.
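
The frame-by-frame segmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `frame_signal`, the 16 kHz sampling rate, and the Hamming weighting are assumptions made for illustration only. The window length and frame shift are the two analysis parameters the paper varies.

```python
import math


def frame_signal(samples, sample_rate, window_ms, shift_ms):
    """Split a signal into overlapping frames (hypothetical helper).

    window_ms is the analysis window length and shift_ms the frame shift,
    both in milliseconds. Each frame is weighted with a Hamming window,
    which is an assumed (though common) choice, not taken from the paper.
    """
    win_len = int(round(sample_rate * window_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if win_len <= 0 or shift <= 0:
        raise ValueError("window and shift must cover at least one sample")

    # Hamming window coefficients for one frame.
    if win_len == 1:
        window = [1.0]
    else:
        window = [
            0.54 - 0.46 * math.cos(2.0 * math.pi * n / (win_len - 1))
            for n in range(win_len)
        ]

    # Slide the window over the signal; incomplete trailing frames are dropped.
    frames = []
    start = 0
    while start + win_len <= len(samples):
        frames.append([samples[start + n] * window[n] for n in range(win_len)])
        start += shift
    return frames


# One second of a constant dummy signal at an assumed 16 kHz sampling rate,
# framed with a 10 ms window and a 7.5 ms shift (i.e. a 2.5 ms overlap).
frames = frame_signal([1.0] * 16000, 16000, window_ms=10, shift_ms=7.5)
print(len(frames), len(frames[0]))
```

Shortening the window or the shift increases the number of frames per second of speech, which is the trade-off behind the parameter choices reported above.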

Cite

Eringis, D., & Tamulevičius, G. (2014). Improving Speech Recognition Rate through Analysis Parameters. Electrical, Control and Communication Engineering, 5(1), 61–66. https://doi.org/10.2478/ecce-2014-0009
