We describe a machine learning approach that allows an open-world spoken dialog system to learn to predict engagement intentions in situ, from interaction. The proposed approach does not require any developer supervision, and leverages spatiotemporal and attentional features automatically extracted from a visual analysis of people coming into the proximity of the system to produce models that are attuned to the characteristics of the environment the system is placed in. Experimental results indicate that a system using the proposed approach can learn to recognize engagement intentions at low false positive rates (e.g. 2-4%) up to 3-4 seconds prior to the actual moment of engagement. © 2009 Association for Computational Linguistics.
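The abstract does not spell out the model, features, or labeling scheme, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (hypothetically) that each tracked person yields a trace of per-frame feature vectors. It also assumes that the person's eventual outcome (engaged or walked by) serves as the automatically obtained label for every frame before engagement, which is what removes the need for developer supervision. A plain logistic-regression classifier stands in for whatever learner the system actually uses. The sketch then picks a decision threshold that keeps the false-positive rate on non-engaging frames at or below a target such as 3%, and measures how many seconds before engagement the detector first fires. The names `Frame`, `Trace`, `auto_label`, `train_logreg`, `pick_threshold` and `lead_time`, and the two toy features, are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Example = Tuple[List[float], int]
Model = Tuple[List[float], float]


@dataclass
class Frame:
    t: float               # seconds since the person was first tracked
    features: List[float]  # hypothetical per-frame spatiotemporal/attentional features


@dataclass
class Trace:
    frames: List[Frame]
    engaged_at: Optional[float]  # time of engagement, or None if the person walked by


def auto_label(traces: List[Trace]) -> List[Example]:
    """Label each pre-engagement frame with the trace's eventual outcome.

    No developer annotation is used: the label is simply whether the person
    later engaged with the system.
    """
    data: List[Example] = []
    for tr in traces:
        y = 0 if tr.engaged_at is None else 1
        for f in tr.frames:
            if tr.engaged_at is None or f.t < tr.engaged_at:
                data.append((f.features, y))
    return data


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(model: Model, x: List[float]) -> float:
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(data: List[Example], epochs: int = 500, lr: float = 0.1) -> Model:
    """Batch gradient-descent logistic regression (a stand-in classifier)."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for x, y in data:
            err = score((w, b), x) - y  # gradient of the log-loss w.r.t. the logit
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b


def pick_threshold(model: Model, data: List[Example], max_fpr: float = 0.03) -> float:
    """Threshold t such that predicting 'will engage' when score > t keeps the
    false-positive rate on negative frames at or below max_fpr."""
    neg = sorted((score(model, x) for x, y in data if y == 0), reverse=True)
    k = int(max_fpr * len(neg))  # at most k negative frames may score above t
    return neg[k] if k < len(neg) else 0.0


def lead_time(model: Model, trace: Trace, threshold: float) -> float:
    """Seconds before engagement at which the detector first fires (0 if never)."""
    if trace.engaged_at is None:
        return 0.0
    for f in trace.frames:
        if f.t < trace.engaged_at and score(model, f.features) > threshold:
            return trace.engaged_at - f.t
    return 0.0


# Toy usage with two hypothetical features: [facing_system, closeness].
engager = Trace([Frame(float(t), [1.0, 0.1 * t]) for t in range(5)], engaged_at=5.0)
passer = Trace([Frame(float(t), [0.0, 0.0]) for t in range(5)], engaged_at=None)
data = auto_label([engager, passer])
model = train_logreg(data)
threshold = pick_threshold(model, data, max_fpr=0.03)
print(lead_time(model, engager, threshold))  # prints 5.0 on this toy data
```

On this trivially separable toy data the detector fires on the very first frame, so the lead time is the whole 5-second trace; the figure says nothing about the results reported above. Choosing the threshold from the negative frames' score distribution, rather than fixing it at 0.5, is one simple way to trade earlier detection against a bounded false-positive rate.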
Bohus, D., & Horvitz, E. (2009). Learning to predict engagement with a spoken dialog system in open-world settings. In Proceedings of the SIGDIAL 2009 Conference: 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 244–252). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1708376.1708411