Since the performance of automatic speech recognition (ASR) still degrades under adverse acoustic conditions, recognition robustness can be improved by incorporating further modalities. The resulting question of information fusion shows interesting parallels to problems in digital communications, where the turbo principle revolutionized reliable communication. In this paper, we examine whether the immense gains obtained in communications can also be achieved in ASR, since the decoding algorithms used in both fields are often essentially the same: the Viterbi algorithm or the forward-backward algorithm (FBA). First, we show that an ASR turbo recognition scheme can be implemented within the classical FBA framework by modifying only the observation likelihoods. Second, we extend our solution to a generalized turbo ASR approach that is fully applicable to multimodal ASR. Applied to an audio-visual speech recognition task, our proposed method clearly outperforms both a conventional coupled hidden Markov model approach and an iterative state-of-the-art approach, achieving up to 32.3% relative reduction in word error rate.
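The abstract states that turbo recognition can be realized inside the classical FBA by modifying only the observation likelihoods. The following Python sketch illustrates that general idea under stated assumptions; it is not the authors' algorithm. It assumes two HMM streams (audio and video) that share one state space, initial distribution, and transition matrix. Extrinsic information is taken as the posterior divided by the likelihood each decoder was fed, renormalized per frame. The other stream's extrinsic information enters as a multiplicative factor with a single weighting exponent `lam`. The function names `forward_backward`, `extrinsic`, and `turbo_fusion` and the parameters `lam` and `n_iter` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def forward_backward(prior: List[float], trans: Matrix, lik: Matrix) -> Matrix:
    """Scaled forward-backward algorithm (FBA) for a discrete-state HMM.

    prior[s]    : initial state probability P(s_1 = s)
    trans[i][j] : transition probability P(s_t = j | s_{t-1} = i)
    lik[t][s]   : observation likelihood p(o_t | s_t = s), assumed > 0
    Returns state posteriors gamma[t][s] = P(s_t = s | o_1..o_T).
    """
    T, S = len(lik), len(prior)
    alpha: Matrix = []
    scale: List[float] = []
    for t in range(T):
        if t == 0:
            a = [prior[s] * lik[0][s] for s in range(S)]
        else:
            a = [lik[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(S))
                 for j in range(S)]
        c = sum(a)  # per-frame scaling avoids numerical underflow
        scale.append(c)
        alpha.append([x / c for x in a])
    beta: Matrix = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * lik[t + 1][j] * beta[t + 1][j] for j in range(S))
                   / scale[t + 1] for i in range(S)]
    gamma: Matrix = []
    for t in range(T):
        g = [alpha[t][s] * beta[t][s] for s in range(S)]
        z = sum(g)
        gamma.append([x / z for x in g])
    return gamma


def extrinsic(gamma: Matrix, lik_used: Matrix, eps: float = 1e-12) -> Matrix:
    """ASSUMPTION: extrinsic information = posterior / likelihood the decoder
    was fed, renormalized per frame. This removes frame t's own (modified)
    observation term, keeping what the other frames contribute via the HMM."""
    ext: Matrix = []
    for g_row, l_row in zip(gamma, lik_used):
        e = [g / max(l, eps) for g, l in zip(g_row, l_row)]
        z = sum(e)
        ext.append([x / z for x in e])
    return ext


def turbo_fusion(prior: List[float], trans: Matrix, lik_a: Matrix, lik_v: Matrix,
                 lam: float = 0.5, n_iter: int = 4) -> Matrix:
    """Hypothetical turbo-style iteration between an audio and a video decoder.
    Only the likelihoods passed to the unchanged FBA are modified.
    Requires n_iter >= 1; returns the final audio-decoder posteriors."""
    T, S = len(lik_a), len(prior)
    ext_v: Matrix = [[1.0 / S] * S for _ in range(T)]  # no video information yet
    gamma_a: Matrix = []
    for _ in range(n_iter):
        # Audio decoder: weight its likelihoods by the video extrinsic term.
        mod_a = [[lik_a[t][s] * ext_v[t][s] ** lam for s in range(S)] for t in range(T)]
        gamma_a = forward_backward(prior, trans, mod_a)
        ext_a = extrinsic(gamma_a, mod_a)
        # Video decoder: same procedure with the roles swapped.
        mod_v = [[lik_v[t][s] * ext_a[t][s] ** lam for s in range(S)] for t in range(T)]
        gamma_v = forward_backward(prior, trans, mod_v)
        ext_v = extrinsic(gamma_v, mod_v)
    return gamma_a


if __name__ == "__main__":
    prior = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    lik_a = [[0.55, 0.45]] * 3  # weak (e.g. noisy) audio evidence for state 0
    lik_v = [[0.9, 0.1]] * 3    # stronger video evidence for state 0
    audio_only = forward_backward(prior, trans, lik_a)
    fused = turbo_fusion(prior, trans, lik_a, lik_v)
    print([round(r[0], 3) for r in audio_only])
    print([round(r[0], 3) for r in fused])
```

With these toy numbers, the fused posteriors for state 0 should exceed the audio-only ones, because the video stream's extrinsic information favors state 0 at every frame. Changing only the likelihood matrix handed to an unmodified `forward_backward` mirrors the abstract's point that the classical FBA itself need not change. How the paper actually computes and weights the extrinsic terms is not specified in the abstract.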
Receveur, S., Scheler, D., & Fingscheidt, T. (2016). A Turbo-Decoding Weighted Forward-Backward Algorithm for Multimodal Speech Recognition (pp. 179–192). https://doi.org/10.1007/978-3-319-21834-2_16