Speaker-clustered acoustic models evaluated on GPU for on-line subtitling of parliament meetings

Abstract

This paper describes the work on building speaker-clustered acoustic models as part of the real-time LVCSR system that Czech TV has used for more than a year for automatic subtitling of parliament meetings broadcast on the channel ČT24. Speaker-clustered acoustic models are more acoustically homogeneous and therefore give better recognition performance than a single gender-independent model or even gender-dependent models. Frequent changes of speakers and a direct connection of the LVCSR system to the audio channel require automatic switching/fusion of models that is as fast as possible. An important part of the solution is the real-time likelihood evaluation of all clustered acoustic models, which takes advantage of a fast GPU (graphics processing unit). The proposed method achieved a relative WER reduction of over 2.34% compared with the baseline gender-independent model, with more than 2M Gaussian mixtures evaluated in real time. © 2011 Springer-Verlag.
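
The abstract does not give the details of the likelihood evaluation or of the switching/fusion rule, so the following is only a minimal CPU sketch of the general idea: score a short window of feature frames against every cluster model and pick the best-scoring one. The diagonal-covariance GMMs, the hard selection by summed log-likelihood, the toy 2-dimensional features, and all names (`log_gauss_diag`, `log_gmm`, `pick_cluster`, `cluster_a`, `cluster_b`) are assumptions made for illustration, not the authors' method or their GPU implementation.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # log-sum-exp keeps the sum stable when the terms are very negative
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def pick_cluster(frames, cluster_models):
    """Return the best-scoring cluster name and all summed log-likelihoods."""
    scores = {
        name: sum(log_gmm(x, gmm) for x in frames)
        for name, gmm in cluster_models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy models (assumption): two clusters, two components each.
cluster_models = {
    "cluster_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "cluster_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# A short window of frames that lies close to the second cluster.
frames = [[5.2, 4.9], [5.8, 6.1], [5.5, 5.4]]
best, scores = pick_cluster(frames, cluster_models)
```

Every Gaussian is evaluated independently for every frame, so the work parallelizes naturally, which is presumably why a GPU suits the evaluation of millions of mixtures in real time. A fusion variant could weight the models by their normalized scores instead of selecting a single winner.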

Cite

Psutka, J. V., Vaněk, J., & Psutka, J. (2011). Speaker-clustered acoustic models evaluated on GPU for on-line subtitling of parliament meetings. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6836 LNAI, pp. 284–290). https://doi.org/10.1007/978-3-642-23538-2_36
