Achieving Timestamp Prediction While Recognizing with Non-autoregressive End-to-End ASR Model

Xian Shi; Yanni Chen; Shiliang Zhang; Zhijie Yan

Conference Proceedings


Communications in Computer and Information Science (2023) 1765 CCIS 89-100

DOI: 10.1007/978-981-99-2401-1_8



Abstract

Conventional ASR systems use frame-level phoneme posteriors to perform forced alignment (FA) and provide timestamps, whereas end-to-end ASR systems, especially AED-based ones, lack this ability. This paper proposes performing timestamp prediction (TP) during recognition by exploiting the continuous integrate-and-fire (CIF) mechanism in the non-autoregressive ASR model Paraformer. Focusing on the fire-place bias of CIF, we apply post-processing strategies including fire-delay and silence insertion. In addition, we propose scaled-CIF to smooth the weights of the CIF output, which proves beneficial for both the ASR and TP tasks. Accumulated averaging shift (AAS) and diarization error rate (DER) are adopted to measure timestamp quality, and we compare these metrics between the proposed system and a conventional hybrid forced-alignment system. Experimental results on a test set with manually marked timestamps show that the proposed optimization methods significantly improve the accuracy of CIF timestamps, reducing AAS and DER by 66.7% and 82.1%, respectively. Compared with Kaldi forced alignment trained on the same data, the optimized CIF timestamps achieve a 12.3% relative AAS reduction.
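To make the idea of reading timestamps off CIF concrete, the following is a minimal sketch of the generic integrate-and-fire rule, not the paper's implementation. It assumes per-frame weights in [0, 1), a firing threshold of 1.0, and a hypothetical frame shift; the function names `cif_fire_frames` and `fires_to_spans` are illustrative only. The post-processing steps named in the abstract (fire-delay, silence insertion) and scaled-CIF are not modeled here.

```python
from typing import List, Tuple


def cif_fire_frames(alphas: List[float], threshold: float = 1.0) -> List[int]:
    """Return the frame index at which each token fires.

    Assumption: every weight is below the threshold, so a frame
    can trigger at most one fire.
    """
    fires = []
    acc = 0.0
    for t, alpha in enumerate(alphas):
        acc += alpha              # integrate the weight of frame t
        if acc >= threshold:      # fire: one token boundary is emitted
            fires.append(t)
            acc -= threshold      # carry the remainder to the next token
    return fires


def fires_to_spans(fires: List[int], frame_shift_s: float = 0.06) -> List[Tuple[float, float]]:
    """Turn fire frames into (start, end) times in seconds.

    Assumption: token i spans from the previous fire to its own fire;
    frame_shift_s is a hypothetical value, not taken from the paper.
    """
    spans = []
    start = 0.0
    for f in fires:
        end = (f + 1) * frame_shift_s
        spans.append((start, end))
        start = end
    return spans


if __name__ == "__main__":
    weights = [0.5, 0.5, 0.25, 0.25, 0.5, 0.75, 0.25]
    frames = cif_fire_frames(weights)
    print(frames)
    print(fires_to_spans(frames, frame_shift_s=0.5))
```

In this simplified view, each token's end time is tied directly to the frame where the accumulated weight crosses the threshold, which is the quantity the abstract's fire-place bias refers to.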


Cite


Shi, X., Chen, Y., Zhang, S., & Yan, Z. (2023). Achieving Timestamp Prediction While Recognizing with Non-autoregressive End-to-End ASR Model. In Communications in Computer and Information Science (Vol. 1765 CCIS, pp. 89–100). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-2401-1_8
