The goal in user interface design is natural interactivity unencumbered by sensor and display technology. In this paper, we propose that a multi-modal approach, using inverse modeling techniques from computer vision, speech recognition, and acoustics, can yield such interfaces. In particular, we demonstrate an audiovisual tracking system and show that it is more robust, more accurate, and more compact, and yields more information, than a system that tracks with a single modality. We also demonstrate how such a system can be used to locate the talker in a group of individuals and to render 3D scenes to the user.
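The abstract does not spell out how the audio and visual estimates are combined. As a minimal illustration of why fusing modalities can improve accuracy, the Python sketch below merges a visual and an acoustic 3D position estimate by inverse-covariance weighting, a standard fusion rule and not necessarily the authors' method; all names, covariances, and numbers are hypothetical.

import numpy as np

def fuse_estimates(p_vision, cov_vision, p_audio, cov_audio):
    # Inverse-covariance (precision) weighting: the minimum-variance
    # linear combination of two independent Gaussian estimates.
    w_v = np.linalg.inv(cov_vision)
    w_a = np.linalg.inv(cov_audio)
    cov_fused = np.linalg.inv(w_v + w_a)
    p_fused = cov_fused @ (w_v @ p_vision + w_a @ p_audio)
    return p_fused, cov_fused

# Hypothetical example: a camera constrains the image-plane axes (x, y)
# tightly but depth poorly, while a microphone array gives a better
# range estimate with a coarser bearing.
p_v = np.array([1.00, 0.50, 3.20])
c_v = np.diag([0.01, 0.01, 0.50])   # vision: sharp x/y, weak depth
p_a = np.array([1.10, 0.45, 2.90])
c_a = np.diag([0.20, 0.20, 0.05])   # audio: weak x/y, sharp depth
p, c = fuse_estimates(p_v, c_v, p_a, c_a)
print("fused position:", p)

The fused estimate inherits the low x/y variance from vision and the low depth variance from audio, which is one concrete sense in which a multi-modal tracker can be more accurate than either modality alone.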
Pingali, G., Tunali, G., & Carlbom, I. (1999). Audio-Visual Tracking for Natural Interactivity. In MULTIMEDIA 1999 - Proceedings of the 7th ACM International Conference on Multimedia (Part 1) (Vol. 1, pp. 373–382). Association for Computing Machinery, Inc. https://doi.org/10.1145/319463.319652