Automatic correction of i/y spelling in Czech ASR output

Jan Švec; Jan Lehečka; Luboš Šmídl; Pavel Ircing

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12284 LNAI 321-330

DOI: 10.1007/978-3-030-58323-1_35

Abstract

This paper presents the design and evaluation of a method that automatically corrects the spelling of i/y in Czech words at the output of an ASR decoder. After analyzing both the Czech grammar rules and the data, we decided to handle only endings consisting of one of the consonants b/f/l/m/p/s/v/z followed by i/y, in both short and long forms. The correction is framed as a classification task in which each word belongs to the “i” class, the “y” class, or the “empty” class. Using the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture, we were able to substantially improve the correctness of i/y spelling on both simulated and real ASR output. Since the majority of native Czech speakers perceive a misspelled i/y in Czech text as a blatant error, the corrected output greatly improves the perceived quality of the ASR system.
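
The three-class framing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes lowercase input, assumes that an "ending" means the final two characters of a word, and uses hypothetical helper names (`label`, `apply_label`). The BERT classifier itself is not shown; the sketch only covers how a word maps to a class and how a predicted class could be written back into the word.

```python
import re

# Assumption: an "ending" is one of b/f/l/m/p/s/v/z followed by a short or
# long i/y as the last two characters of a lowercase word.
ENDING = re.compile(r"[bflmpsvz]([iyíý])$")

# Vowel substitutions that preserve length (short stays short, long stays long).
TO_I = {"i": "i", "í": "í", "y": "i", "ý": "í"}
TO_Y = {"i": "y", "í": "ý", "y": "y", "ý": "ý"}


def label(word):
    """Return the class of a word: "i", "y", or "empty" (no handled ending)."""
    match = ENDING.search(word)
    if match is None:
        return "empty"
    return "i" if match.group(1) in "ií" else "y"


def apply_label(word, cls):
    """Rewrite the final vowel of a word according to a predicted class."""
    if cls == "empty" or ENDING.search(word) is None:
        return word  # nothing to correct
    table = TO_I if cls == "i" else TO_Y
    return word[:-1] + table[word[-1]]
```

In this sketch, a word such as "ženy" falls into the "empty" class because "n" is not among the handled consonants, while `apply_label("byli", "y")` produces "byly". Which class is correct for a given word depends on sentence context, which is the part a contextual model such as BERT would have to supply.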

Cite

Švec, J., Lehečka, J., Šmídl, L., & Ircing, P. (2020). Automatic correction of i/y spelling in Czech ASR output. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12284 LNAI, pp. 321–330). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58323-1_35
