This paper describes in detail the acoustic modeling part of the keyword search system developed at the Speech Technology Center (STC) for the OpenKWS 2016 evaluation. The key idea was to exploit diversity in both sound representations and acoustic model architectures. For the former, we extended the speaker-dependent bottleneck (SDBN) approach to the multilingual case, which is the main contribution of the paper. Two types of multilingual SDBN features were applied in addition to conventional spectral and cepstral features. The acoustic model architectures employed in the final system are based on deep feedforward and recurrent neural networks. We also applied speaker adaptation of acoustic models using multilingual i-vectors, data augmentation based on speed perturbation, and semi-supervised training. The final STC system comprised 9 acoustic models, which allowed it to achieve strong performance and to rank among the top three systems in the evaluation.
CITATION STYLE
Medennikov, I., Romanenko, A., Prudnikov, A., Mendelev, V., Khokhlov, Y., Korenevsky, M., … Zatvornitskiy, A. (2017). Acoustic modeling in the STC keyword search system for OpenKWS 2016 evaluation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10458 LNAI, pp. 76–86). Springer Verlag. https://doi.org/10.1007/978-3-319-66429-3_7