Abstract
This paper describes the offline and simultaneous speech translation (ST) systems developed at AppTek for IWSLT 2021. Our offline ST submission includes a direct end-to-end system and the so-called posterior tight integrated model, which is akin to a cascade system but is trained end to end, and in which every cascaded module is itself an end-to-end model. For simultaneous ST, we combine hybrid automatic speech recognition (ASR) with a machine translation (MT) approach whose translation policy decisions are learned from statistical word alignments. Compared to last year, we improve general quality and provide a wider range of quality/latency trade-offs, both owing to a data augmentation method that makes the MT model robust to varying chunk sizes. Finally, we present a method for segmenting ASR output into sentences that introduces minimal additional delay.
Cite
Bahar, P., Wilken, P., di Gangi, M., & Matusov, E. (2021). Without Further Ado: Direct and Simultaneous Speech Translation by AppTek in 2021. In IWSLT 2021 - 18th International Conference on Spoken Language Translation, Proceedings (pp. 52–63). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.iwslt-1.5