This system paper describes the Xiaomi Translation System for the IWSLT 2022 Simultaneous Speech Translation (SST) shared task. We participate in the English-to-Mandarin Chinese Text-to-Text (T2T) track. Our system is built on the Transformer model with techniques from our recent research. For data filtering, language-model-based and rule-based methods are used to obtain high-quality bilingual parallel corpora. We further strengthen our system with established data augmentation techniques such as knowledge distillation, tagged back-translation, and iterative back-translation, and we incorporate training techniques such as R-Drop, deeper models, and large-batch training, which have been shown to benefit the vanilla Transformer. In the SST scenario, several variants of the wait-k strategy are explored. To improve robustness, both data-based and model-based methods are used to reduce the sensitivity of our system to Automatic Speech Recognition (ASR) outputs. Finally, we design inference algorithms and apply an adaptive-ensemble method over multiple model variants to further improve performance. Compared with strong baselines, fusing all of these techniques improves our system by 2–3 BLEU under different latency regimes.
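The wait-k strategy mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a hypothetical `step(src_prefix, tgt_prefix)` decoding function that returns the next target token given the source prefix read so far, or `None` at end of sentence.

```python
def wait_k_decode(source_tokens, step, k=3, max_len=100):
    """Sketch of wait-k simultaneous decoding.

    The policy first waits for k source tokens, then alternates:
    one WRITE (emit a target token) per READ (consume a source token).
    Once the source is exhausted, decoding continues on the full
    source until `step` signals end of sentence.

    `step` is an assumed (hypothetical) decoding function:
    step(src_prefix, tgt_prefix) -> next target token or None.
    """
    target = []
    for _ in range(max_len):
        # READ: the decoder may see at most len(target) + k source tokens,
        # capped at the full source length (the "tail" phase).
        read = min(len(target) + k, len(source_tokens))
        # WRITE: emit one target token conditioned on the visible prefix.
        tok = step(source_tokens[:read], target)
        if tok is None:
            break
        target.append(tok)
    return target
```

With k=1 this degenerates toward word-by-word translation (low latency), while a large k approaches full-sentence offline translation (high quality); the paper's variants explore this latency–quality trade-off.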
Guo, B., Liu, M., Zhang, W., Chen, H., Mu, C., Li, X., … Guo, Y. (2022). The Xiaomi Text-to-Text Simultaneous Speech Translation System for IWSLT 2022. In IWSLT 2022 - 19th International Conference on Spoken Language Translation, Proceedings of the Conference (pp. 216–224). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.iwslt-1.17