Semi-Supervised Acoustic Model Retraining for Medical ASR

Abstract

Training models for speech recognition usually requires accurate word-level transcriptions of the available speech data. In the domain of medical dictations, however, it is common to have only “semi-literal” transcripts: large numbers of speech files, each paired with a formatted episode report whose content only partially overlaps with what was actually dictated. We present a semi-supervised method for generating acoustic training data by decoding dictations with an existing recognizer, confirming which sections are correct against the associated report, and repurposing those audio sections to train a new acoustic model. The effectiveness of this method is demonstrated in two applications: first, adapting a model to new speakers, yielding a 19.7% relative reduction in word errors for these speakers; and second, supplementing an already diverse and robust acoustic model with a large quantity of additional data (from already known voices), yielding a 5.0% relative error reduction on a large test set of over one thousand speakers.
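
The confirmation step can be illustrated with a minimal sketch. The abstract does not specify the alignment procedure, so the following assumes word-level timestamps from the existing recognizer and uses difflib.SequenceMatcher as a stand-in for whatever matching the authors actually employ; the names Word, confirmed_segments, and min_words are illustrative, not from the paper.

    import difflib
    from dataclasses import dataclass

    @dataclass
    class Word:
        text: str      # recognized token (normalized, lowercase)
        start: float   # start time in seconds
        end: float     # end time in seconds

    def confirmed_segments(hypothesis, report_text, min_words=5):
        """Return (start, end, transcript) spans of the decoded hypothesis
        whose word sequence also appears in the associated episode report.

        hypothesis: list[Word] from the existing recognizer, with timestamps.
        report_text: the formatted episode report as a plain string.
        min_words: discard matches shorter than this, since short runs
                   (e.g. "the patient") can match by chance.
        """
        hyp_tokens = [w.text for w in hypothesis]
        rep_tokens = report_text.lower().split()

        matcher = difflib.SequenceMatcher(a=hyp_tokens, b=rep_tokens,
                                          autojunk=False)
        segments = []
        for block in matcher.get_matching_blocks():
            if block.size < min_words:
                continue
            words = hypothesis[block.a : block.a + block.size]
            segments.append((words[0].start, words[-1].end,
                             " ".join(w.text for w in words)))
        return segments

Each returned span is an audio region whose decoded text is corroborated by the report, and can therefore be cut out and added to the acoustic training pool with its hypothesis text serving as the transcript.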

Citation (APA)

Finley, G. P., Edwards, E., Salloum, W., Robinson, A., Sadoughi, N., Axtmann, N., … Suendermann-Oeft, D. (2018). Semi-Supervised Acoustic Model Retraining for Medical ASR. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11096 LNAI, pp. 177–187). Springer Verlag. https://doi.org/10.1007/978-3-319-99579-3_19
