An audio-based sequential punctuation model for ASR and its effect on human readability

György Szaszák

Journal Article, Open Access

Acta Polytechnica Hungarica (2019) 16(2) 93-108

DOI: 10.12700/APH.16.2.2019.2.6

6 Citations

10 Readers

Abstract

Inserting punctuation marks into the word chain hypothesis produced by automatic speech recognition (ASR) has long been a neglected task. In several application domains of ASR, however, real-time punctuation is vital for human readability. The paper proposes and evaluates a prosody-inspired approach and a phrase sequence model, implemented as a recurrent neural network, to predict punctuation marks from the audio. Within a very basic and lightweight modeling framework, we show that punctuation at state-of-the-art performance is possible based solely on the audio signal for speech close to read quality. We also test the approach on more spontaneous speaking styles and on ASR transcripts, which may contain word errors. Finally, a subjective evaluation quantifies the benefits of punctuation for human readability and shows that, once a critical punctuation accuracy is reached, humans cannot distinguish automatic from human-produced punctuation, even if the former contains punctuation errors.

Cite

Szaszák, G. (2019). An audio-based sequential punctuation model for ASR and its effect on human readability. Acta Polytechnica Hungarica, 16(2), 93–108. https://doi.org/10.12700/APH.16.2.2019.2.6
