Relationship between contributions of temporal amplitude envelope of speech and modulation transfer function in room acoustics to perception of noise-vocoded speech

Abstract

Speech signals can be represented as a sum of amplitude-modulated frequency bands. This sum can also be regarded as a temporal amplitude envelope (TAE) with temporal fine structure. Our previous studies using noise-vocoded speech (NVS) showed that the TAE of speech plays an important role in the perception of linguistic information (speech intelligibility) as well as non-linguistic information (e.g., vocal-emotion recognition). It was found that an upper limit of the modulation frequency of the TAE between 4 and 8 Hz is important for speech intelligibility, while an upper limit between 8 and 16 Hz is important for vocal-emotion recognition. However, reverberation generally degrades speech intelligibility dramatically. The concept of the modulation transfer function (MTF) relates the transfer function of an enclosure, expressed in terms of input and output TAEs, to the characteristics of the enclosure under reverberant conditions. This concept was introduced as a measure in room acoustics for assessing the effect of an enclosure on speech intelligibility. In this study, we conducted two experiments, word intelligibility tests and vocal-emotion recognition tests, with NVS under reverberant conditions to investigate the relationship between the contributions of the TAE of speech and the MTF of reverberation to modulation perception of NVS. We also pointed out that the straightforward scheme, i.e., the relationship between the contributions of the static features (peak/slope) in the modulation spectrum (MS) of speech and the MTF of reverberation, cannot consistently account for the auditory perception of both linguistic and non-linguistic information observed in these perceptual data of NVS under reverberant conditions. We then developed a scheme in which the relationship between the contributions of the temporal MS features and the MTF of reverberation to modulation perception can consistently account for these perceptual data of NVS.
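To illustrate why the 4 to 16 Hz modulation range is vulnerable to reverberation, the following is a minimal sketch of the classical room-acoustics MTF. It assumes an ideal exponentially decaying room impulse response with no background noise, in which case the MTF reduces to m(f_m) = 1 / sqrt(1 + (2π f_m T60 / 13.8)^2). This closed form and the function name `reverberation_mtf` are assumptions made for illustration; the abstract does not state which MTF model or reverberation times were used in the experiments.

```python
import math


def reverberation_mtf(mod_freq_hz: float, t60_s: float) -> float:
    """Modulation reduction factor for an idealized reverberant room.

    Assumes an exponentially decaying impulse response and no noise
    (an illustrative model, not taken from the abstract).
    """
    if mod_freq_hz < 0 or t60_s < 0:
        raise ValueError("modulation frequency and T60 must be non-negative")
    # 13.8 = 6 * ln(10): the exponent of a 60 dB energy decay over T60.
    x = 2.0 * math.pi * mod_freq_hz * t60_s / 13.8
    return 1.0 / math.sqrt(1.0 + x * x)


if __name__ == "__main__":
    # Hypothetical reverberation times, chosen only for illustration.
    for t60 in (0.5, 1.0, 2.0):
        values = ", ".join(
            f"{fm:>2} Hz: {reverberation_mtf(fm, t60):.2f}"
            for fm in (2, 4, 8, 16)
        )
        print(f"T60 = {t60:.1f} s -> {values}")
```

Under this model the MTF acts as a low-pass filter on the envelope: the longer the reverberation time, the more strongly the higher modulation frequencies are attenuated.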

Cite (APA)

Unoki, M., & Zhu, Z. (2020). Relationship between contributions of temporal amplitude envelope of speech and modulation transfer function in room acoustics to perception of noise-vocoded speech. Acoustical Science and Technology, 41(1), 233–244. https://doi.org/10.1250/ast.41.233
