In this paper, we present visual tracking techniques for multimodal human-computer interaction. First, we discuss techniques for tracking human faces, using human skin color as the primary feature. An adaptive stochastic model has been developed to characterize the skin-color distribution; based on the maximum-likelihood method, the model parameters can be adapted to different people and different lighting conditions. The feasibility of the model has been demonstrated by the development of a real-time face tracker, which achieves 30+ frames per second on a low-end workstation equipped with a framegrabber and a camera. We also present a top-down approach for tracking facial features such as eyes, nostrils, and lip corners. These real-time visual tracking techniques have been applied successfully to many applications, such as gaze tracking and lip-reading. The face tracker has been combined with a microphone array to extract the speech signal of a specific speaker, and the gaze tracker has been combined with a speech recognizer in a multimodal interface for controlling a panoramic image viewer.
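The abstract does not spell out the form of the adaptive stochastic skin-color model, but a common formulation for this kind of tracker is a 2D Gaussian fitted by maximum likelihood in normalized (r, g) chromaticity space, which discounts overall brightness. The sketch below illustrates that idea under those assumptions; all function names, the regularization term, and the blending weight `alpha` in the adaptation step are illustrative, not taken from the paper.

```python
import numpy as np

def to_chromaticity(rgb):
    """Map RGB pixels to normalized (r, g) chromaticity.

    Dividing by the channel sum removes brightness, so skin colors
    from different lighting conditions cluster more tightly.
    """
    rgb = np.asarray(rgb, dtype=float)
    s = rgb.sum(axis=-1, keepdims=True)
    s[s == 0] = 1.0  # avoid division by zero on black pixels
    return rgb[..., :2] / s

def fit_skin_model(skin_pixels):
    """Maximum-likelihood fit of a 2D Gaussian to labeled skin pixels."""
    rg = to_chromaticity(skin_pixels)
    mean = rg.mean(axis=0)
    # Small diagonal term keeps the covariance invertible (assumption).
    cov = np.cov(rg, rowvar=False) + 1e-8 * np.eye(2)
    return mean, cov

def skin_likelihood(image, mean, cov):
    """Per-pixel Gaussian likelihood of being skin, as an H x W map."""
    d = to_chromaticity(image.reshape(-1, 3)) - mean
    inv = np.linalg.inv(cov)
    m = np.einsum('ni,ij,nj->n', d, inv, d)  # squared Mahalanobis distance
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return (norm * np.exp(-0.5 * m)).reshape(image.shape[:2])

def adapt_mean(prev_mean, new_mean, alpha=0.5):
    """Blend the previous and newly estimated means; a simple stand-in
    for adapting the model to a new person or lighting condition."""
    return (1.0 - alpha) * prev_mean + alpha * new_mean
```

A tracker would threshold the likelihood map to segment skin regions, locate the face as the dominant region, and periodically refit the Gaussian on tracked pixels, feeding the new estimate through an update such as `adapt_mean`.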
Yang, J., Stiefelhagen, R., Meier, U., & Waibel, A. (1998). Visual tracking for multimodal human computer interaction. In Conference on Human Factors in Computing Systems - Proceedings (pp. 140–147). ACM. https://doi.org/10.1145/274644.274666