A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI

Angus Addlesee; Yanchao Yu; Arash Eshghi

Conference Proceedings, Open Access

COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (2020) 3492-3503

DOI: 10.18653/v1/2020.coling-main.312

Abstract

Automatic Speech Recognition (ASR) systems are increasingly powerful and accurate, and also more numerous, with several now available as a service (e.g., Google, IBM, and Microsoft). Currently, the most stringent standards for such systems are set by their use in Conversational AI technology. These systems are expected to operate incrementally in real time and to be responsive, stable, and robust to the pervasive yet peculiar characteristics of conversational speech, such as disfluencies and overlaps. In this paper, we evaluate the most popular of these systems with metrics and experiments designed with these standards in mind. We also evaluate the speaker diarization (SD) capabilities of the same systems, which will be particularly important for dialogue systems designed to handle multi-party interaction. We found that Microsoft has the leading incremental ASR system which preserves disfluent material, and that IBM has the leading incremental SD system as well as the ASR that is most robust to speech overlaps. Google strikes a balance between the two, but none of these systems is yet suitable for reliably handling natural spontaneous conversations in real time.

Cite

Addlesee, A., Yu, Y., & Eshghi, A. (2020). A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 3492–3503). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.312
