An audio-based sequential punctuation model for ASR and its effect on human readability

Abstract

Inserting punctuation marks into the word chain hypothesis produced by automatic speech recognition (ASR) has long been a neglected task. In several application domains of ASR, however, real-time punctuation is vital for human readability. This paper proposes and evaluates a prosody-inspired approach and a phrase sequence model, implemented as a recurrent neural network, to predict punctuation marks from the audio. Within a very basic and lightweight modeling framework, we show that punctuation with state-of-the-art performance is possible based solely on the audio signal, for speech close to read quality. We also test the approach on more spontaneous speaking styles and on ASR transcripts, which may contain word errors. A subjective evaluation is carried out to quantify the benefits of punctuation for human readability. It also shows that once a critical punctuation accuracy is reached, humans cannot distinguish automatic from human-produced punctuation, even though the former may contain punctuation errors.

Cite

Szaszák, G. (2019). An audio-based sequential punctuation model for ASR and its effect on human readability. Acta Polytechnica Hungarica, 16(2), 93–108. https://doi.org/10.12700/APH.16.2.2019.2.6
