A Real-Time End-to-End Multilingual Speech Recognition Architecture


This article is free to access.

Abstract

Automatic speech recognition (ASR) systems are used daily by millions of people worldwide to dictate messages, control devices, initiate searches, or enter data on small devices. The user experience in these scenarios depends on the quality of the speech transcriptions and on the responsiveness of the system. For multilingual users, a further obstacle to natural interaction is the monolingual character of many ASR systems, which constrain users to a single preset language. In this work, we present an end-to-end multi-language ASR architecture, developed and deployed at Google, that allows users to select arbitrary combinations of spoken languages. We leverage recent advances in language identification and a novel method of real-time language selection to achieve recognition accuracy similar to, and latency characteristics nearly identical to, those of a monolingual system.

Cite

Gonzalez-Dominguez, J., Eustis, D., Lopez-Moreno, I., Senior, A., Beaufays, F., & Moreno, P. J. (2015). A real-time end-to-end multilingual speech recognition architecture. IEEE Journal of Selected Topics in Signal Processing, 9(4), 749–759. https://doi.org/10.1109/JSTSP.2014.2364559
