Do we need STRFs for cocktail parties? On the relevance of physiologically motivated features for human speech perception derived from automatic speech recognition

Abstract

Complex auditory features such as spectro-temporal receptive fields (STRFs) derived from cortical auditory neurons appear to be advantageous in sound processing. However, their physiological and functional relevance is still unclear. To assess the utility of such feature processing for speech reception in noise, automatic speech recognition (ASR) performance using feature sets obtained from physiological and/or psychoacoustical data and models is compared to human performance. Time-frequency representations with nonlinear compression are compared with standard features such as mel-scaled spectrograms. Both alternatives serve as input to model estimators that infer spectro-temporal filters (and a subsequent nonlinearity) from physiological measurements in auditory brain areas of zebra finches. Alternatively, a filter bank of two-dimensional Gabor functions is employed, which covers a wide range of modulation frequencies in the time and frequency domains. The results indicate a clear increase in ASR robustness using complex features (modeled by Gabor functions), while the benefit from physiologically derived STRFs is limited. In all cases, the use of power-normalized spectral representations increases performance, indicating that substantial dynamic compression is advantageous for level-independent pattern recognition. The methods employed may help physiologists to look for more relevant STRFs and to better understand specific differences in estimated STRFs. © Springer Science+Business Media New York 2013.
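
To make the Gabor filter bank idea concrete, the following is a minimal sketch of how a small bank of two-dimensional Gabor filters might be applied to a compressed spectrogram. It is not the authors' implementation: the function names (gabor_2d, apply_filter, gabor_features), the Hann envelope, the filter sizes, the modulation frequencies, and the use of logarithmic compression are all assumptions chosen for illustration, and the input is random data standing in for a real mel spectrogram.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_2d(omega_t, omega_f, size_t=25, size_f=15):
    """Real 2D Gabor-like kernel: cosine carrier under a Hann envelope.

    omega_t and omega_f are temporal and spectral modulation frequencies
    in radians per frame and radians per channel (illustrative units).
    """
    t = np.arange(size_t) - size_t // 2
    f = np.arange(size_f) - size_f // 2
    envelope = np.outer(np.hanning(size_f), np.hanning(size_t))
    carrier = np.cos(omega_f * f[:, None] + omega_t * t[None, :])
    kernel = envelope * carrier
    # Subtract the mean so the kernel sums to zero and ignores constant offsets.
    return kernel - kernel.mean()


def apply_filter(spec, kernel):
    """Valid-mode 2D cross-correlation of a (channels, frames) array."""
    windows = sliding_window_view(spec, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def gabor_features(spec, omegas_t, omegas_f):
    """Stack the responses of every (omega_t, omega_f) filter in the bank."""
    # Log compression is an assumed stand-in for the compression stage.
    log_spec = np.log(spec + 1e-10)
    return np.stack([
        apply_filter(log_spec, gabor_2d(wt, wf))
        for wt in omegas_t
        for wf in omegas_f
    ])


# Random positive values standing in for a 23-channel, 100-frame spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((23, 100)) + 0.1
feats = gabor_features(spec, omegas_t=[0.0, 0.25, 0.5], omegas_f=[0.0, 0.3])
louder = gabor_features(10 * spec, omegas_t=[0.0, 0.25, 0.5], omegas_f=[0.0, 0.3])
```

In this sketch, scaling the input by a constant gain only adds an offset after log compression, and the zero-mean kernels remove that offset, so feats and louder agree to within numerical precision. This is one simple way compression can support level-independent features; the power normalization referred to in the abstract may work differently.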

Cite

Kollmeier, B., Schädler, M. R., Meyer, A., Anemüller, J., & Meyer, B. T. (2013). Do we need STRFs for cocktail parties? On the relevance of physiologically motivated features for human speech perception derived from automatic speech recognition. In Advances in Experimental Medicine and Biology (Vol. 787, pp. 333–341). Springer Science and Business Media, LLC. https://doi.org/10.1007/978-1-4614-1590-9_37
