Selecting good speech features for recognition

Youngjik Lee; Kyu Woong Hwang

Journal article, open access

ETRI Journal (1996) 18(1) 29-40

DOI: 10.4218/etrij.96.0196.0013

Abstract

This paper describes a method for selecting a suitable feature for speech recognition using an information-theoretic measure. Conventional speech recognition systems heuristically choose a subset of frequency components, cepstrum, mel-cepstrum, energy, and their time differences of the speech waveform as their speech features. However, these systems cannot perform well if the selected features are not suitable for speech recognition. Because the recognition rate is the only performance measure of a speech recognition system, it is hard to judge how suitable a selected feature is. To solve this problem, it is essential to analyze the feature itself and measure how good it is. Good speech features should contain all of the class-related information and as little class-irrelevant variation as possible. In this paper, we suggest a method to measure the class-related information and the amount of class-irrelevant variation based on Shannon's information theory. Using this method, we compare the mel-scaled FFT, cepstrum, mel-cepstrum, and wavelet features of the TIMIT speech data. The results show that, among these features, the mel-scaled FFT is the best feature for speech recognition according to the proposed measure.
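The abstract does not give the paper's exact estimator, so the following is only a minimal sketch of one standard way to express the two quantities it names. It assumes the feature has already been quantized into discrete symbols X and that each frame carries a class label C. Class-related information is taken to be the mutual information I(X; C) = H(X) - H(X | C), and class-irrelevant variation is taken to be the conditional entropy H(X | C). The function names `entropy` and `feature_scores` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution in a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def feature_scores(features, labels):
    """Score a quantized feature against class labels (illustrative sketch).

    Returns (class_info, irrelevant_var), where
      class_info     = I(X; C) = H(X) - H(X | C)   (class-related information)
      irrelevant_var = H(X | C)                    (class-irrelevant variation)
    Assumption: `features` are discrete symbols, e.g. vector-quantized frames.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    n = len(features)
    h_x = entropy(Counter(features))

    # Group feature symbols by class to estimate p(x | c) for each class c.
    by_class = {}
    for x, c in zip(features, labels):
        by_class.setdefault(c, Counter())[x] += 1

    # H(X | C) = sum over classes of p(c) * H(X | C = c)
    h_x_given_c = sum(
        (sum(cnt.values()) / n) * entropy(cnt) for cnt in by_class.values()
    )
    return h_x - h_x_given_c, h_x_given_c
```

For example, a feature whose symbols `[0, 0, 1, 1]` track the labels `["a", "a", "b", "b"]` exactly scores 1 bit of class-related information and 0 bits of class-irrelevant variation. A feature `[0, 1, 2, 3]` on the same labels also carries 1 bit of class information but adds 1 bit of class-irrelevant variation, so under this criterion the first feature is preferable.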

Citation (APA):

Lee, Y., & Hwang, K. W. (1996). Selecting good speech features for recognition. ETRI Journal, 18(1), 29–40. https://doi.org/10.4218/etrij.96.0196.0013