Acoustic event mixing to multichannel AMI data for distant speech recognition and acoustic event classification benchmarking


Abstract

Currently, the quality of Distant Speech Recognition (DSR) systems cannot match the quality of speech recognition on clean speech acquired by close-talking microphones. The main problems of DSR stem from the far-field nature of the data; one of them is the unpredictable occurrence of acoustic events and scenes, which distort the speech component of the signal. Applying acoustic event detection and classification (AEC) in conjunction with DSR can benefit speech enhancement and improve DSR accuracy. However, no publicly available corpus for conjunctive AEC and DSR currently exists. This paper proposes a procedure for realistically mixing acoustic events and scenes with the far-field multi-channel recordings of the AMI meeting corpus, accounting for spatial reverberation and the distinct placement of sources of different kinds. We evaluate the derived corpus on both the DSR and AEC tasks and present reproducible results that can serve as a baseline for the corpus. The code for the proposed mixing procedure is made available online.
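The abstract does not detail the mixing procedure itself. As a rough illustration of the general idea only (spatializing an event through room impulse responses and adding it to a far-field array recording at a controlled level), the sketch below uses hypothetical inputs `meeting`, `event`, and `rirs`; it is an assumption-based example, not the authors' released code.

```python
import numpy as np
from scipy.signal import fftconvolve

def mix_event_into_meeting(meeting, event, rirs, snr_db, rng=None):
    """Mix a mono acoustic event into a multichannel far-field recording.

    meeting : (n_samples, n_channels) float array recording (e.g. an AMI session)
    event   : (n_event,) mono acoustic event waveform
    rirs    : (n_rir, n_channels) impulse responses from an assumed event
              position to each array microphone (simulated or measured)
    snr_db  : target speech-to-event energy ratio in dB
    """
    if rng is None:
        rng = np.random.default_rng()
    n_samples, n_channels = meeting.shape

    # Spatialize the event by convolving it with each channel's RIR,
    # then truncate to the length of the meeting recording.
    spatial = np.stack(
        [fftconvolve(event, rirs[:, ch]) for ch in range(n_channels)], axis=1
    )[:n_samples]

    # Scale the event so the mixture reaches the requested SNR.
    p_speech = np.mean(meeting ** 2)
    p_event = np.mean(spatial ** 2) + 1e-12
    gain = np.sqrt(p_speech / (p_event * 10.0 ** (snr_db / 10.0)))

    # Place the event at a random offset inside the meeting.
    start = rng.integers(0, n_samples - spatial.shape[0] + 1)
    mixture = meeting.astype(np.float64, copy=True)
    mixture[start:start + spatial.shape[0]] += gain * spatial
    return mixture
```

In practice, per the abstract, the RIRs and mixing levels would be chosen per event class so that different kinds of sources receive distinct, realistic positions relative to the microphone array.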

Citation (APA)

Astapov, S., Svirskiy, G., Lavrentyev, A., Prisyach, T., Popov, D., Ubskiy, D., & Kabarov, V. (2019). Acoustic event mixing to multichannel AMI data for distant speech recognition and acoustic event classification benchmarking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11658 LNAI, pp. 31–42). Springer Verlag. https://doi.org/10.1007/978-3-030-26061-3_4
