Abstract
We use a machine learner trained on a combination of acoustic and contextual features to predict the accuracy of incoming n-best automatic speech recognition (ASR) hypotheses in a spoken dialogue system (SDS). Our novel approach is to use a simple statistical User Simulation (US) for this task, which measures the likelihood that the user would say each hypothesis in the current context. Such US models are now common in machine learning approaches to SDS, are trained on real dialogue data, and are related to theories of "alignment" in psycholinguistics. We use a US to predict the user's next dialogue move and thereby re-rank the n-best hypotheses of a speech recognizer for a corpus of 2564 user utterances. The method achieved a significant 5% relative reduction in Word Error Rate (WER), which is 44% of the possible WER improvement on this data, and 62% of the possible semantic improvement (Dialogue Move Accuracy), compared to the baseline policy of selecting the topmost ASR hypothesis. The majority of the improvement is attributable to the User Simulation feature, as shown by Information Gain analysis.
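As a rough illustration of the re-ranking idea the abstract describes, the following is a minimal sketch, not the paper's method: it assumes a simple multiplicative combination of ASR confidence and US likelihood, whereas the paper trains a machine learner over acoustic and contextual features. All names here (Hypothesis, us_likelihood, rerank, move_probs) are hypothetical.

```python
# Minimal sketch of user-simulation-based n-best re-ranking.
# Assumption: we combine ASR confidence and US likelihood by simple
# multiplication; the paper instead trains a machine learner over
# acoustic and contextual features, including the US score.

from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str               # recognized word string
    asr_confidence: float   # recognizer's confidence score
    dialogue_move: str      # semantic parse, e.g. "provide(food=italian)"


def us_likelihood(move: str, context: dict) -> float:
    """Stub user simulation: estimates P(user move | dialogue context).
    A real model would be trained on real dialogue data, as in the paper;
    here we look up a probability table and fall back to a small floor."""
    return context.get("move_probs", {}).get(move, 1e-6)


def rerank(nbest: list[Hypothesis], context: dict) -> Hypothesis:
    """Choose the hypothesis that maximizes ASR confidence weighted by
    the user simulation's likelihood of its dialogue move in context."""
    return max(
        nbest,
        key=lambda h: h.asr_confidence * us_likelihood(h.dialogue_move, context),
    )


# Usage: the US prior can promote a lower-ranked hypothesis whose
# dialogue move is more likely in the current dialogue context.
context = {"move_probs": {"provide(food=italian)": 0.6,
                          "provide(food=indian)": 0.1}}
nbest = [Hypothesis("Indian food", 0.55, "provide(food=indian)"),
         Hypothesis("Italian food", 0.50, "provide(food=italian)")]
best = rerank(nbest, context)  # selects the Italian-food hypothesis
```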
Citation
Lemon, O., & Konstas, I. (2009). User Simulations for context-sensitive speech recognition in Spoken Dialogue Systems. In EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 505–513). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1609067.1609123