Robust recognition of conversational telephone speech via multi-condition training and data augmentation

Jiří Málek; Jindřich Ždánský; Petr Červa

Conference Proceedings

Lecture Notes in Computer Science (2018) 11107 LNAI 324-333

DOI: 10.1007/978-3-030-00794-2_35

Abstract

In this paper, we focus on automatic recognition of conversational telephone speech in a scenario where no genuine telephone recordings are available for training. The training set contains only data from a significantly different domain, such as recordings of broadcast news. A significant mismatch arises between training and test conditions, which degrades the performance of the resulting recognition system. We aim to reduce this mismatch using data augmentation. Speech compression and a narrow-band spectrum are characteristic features of telephone speech. We apply these effects to the training dataset artificially to make it more similar to the target test conditions. We then train an acoustic model on the augmented dataset. Our experiments show that the augmented models achieve accuracy close to that of a model trained on genuine telephone data. Moreover, when the augmentation is applied to real-world telephone data, further accuracy gains are achieved.
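
The two effects named in the abstract (a narrow-band spectrum and speech compression) can be illustrated with a minimal sketch. The abstract does not say which codecs, filters, or sampling rates the authors used, so everything below is an assumption made for illustration: an 8 kHz target rate with a 3.4 kHz low-pass cutoff, which is typical of narrowband telephony, and 8-bit mu-law companding in its continuous form as a stand-in for a telephone codec. The function names are hypothetical and the code uses only the Python standard library.

```python
import math

MU = 255.0  # mu-law parameter (assumed; the abstract names no codec)


def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Windowed-sinc (Hamming) low-pass FIR taps, normalized to unit DC gain."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def convolve_same(signal, taps):
    """Zero-padded convolution that keeps the input length (no added delay)."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out


def to_narrowband(signal, sample_rate=16000, target_rate=8000, cutoff_hz=3400.0):
    """Low-pass filter, then decimate by an integer factor."""
    factor = sample_rate // target_rate
    filtered = convolve_same(signal, lowpass_fir(cutoff_hz, sample_rate))
    return filtered[::factor]


def mu_law_roundtrip(signal):
    """Compress samples in [-1, 1] to 8-bit mu-law codes and expand them back."""
    log_mu = math.log1p(MU)
    out = []
    for x in signal:
        x = max(-1.0, min(1.0, x))
        y = math.copysign(math.log1p(MU * abs(x)) / log_mu, x)
        code = round((y + 1) / 2 * 255)  # integer code in 0..255
        y_hat = code / 255 * 2 - 1
        out.append(math.copysign(math.expm1(abs(y_hat) * log_mu) / MU, y_hat))
    return out


def simulate_telephone(signal, sample_rate=16000):
    """Apply both assumed telephone effects to one wide-band utterance."""
    return mu_law_roundtrip(to_narrowband(signal, sample_rate))
```

In a multi-condition setup, a function like `simulate_telephone` would be applied to copies of the wide-band training utterances before feature extraction. A real codec implementation would follow its standard's exact quantization tables rather than the continuous formula used here.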

Citation (APA)

Málek, J., Ždánský, J., & Červa, P. (2018). Robust recognition of conversational telephone speech via multi-condition training and data augmentation. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 324–333). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_35
