Abstract
We address the problem of 3D audio-visual mouth tracking on a compact platform with co-located audio-visual sensors and no depth camera. In particular, we propose a multi-modal particle filter that combines a face detector with the mapping of 3D hypotheses onto the image plane. The audio likelihood, which relies on a GCC-PHAT based acoustic map, is computed with the assistance of video. By combining audio and video inputs, the proposed approach copes with reverberant and noisy environments and handles situations in which the person is occluded, outside the Field of View (FoV), or not facing the sensors. Experimental results show that the proposed tracker is accurate both in 3D and on the image plane.
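For readers unfamiliar with GCC-PHAT, the following is a minimal sketch of the generalized cross-correlation with phase transform between two microphone signals, which is the building block of an acoustic map. It is not the authors' implementation: the function name `gcc_phat`, its parameters, and the small regularization constant are assumptions made for illustration, and the abstract does not describe how the map is assembled from microphone pairs.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (hypothetical helper).

    Returns the estimated delay in seconds and the PHAT-weighted
    cross-correlation, re-centered so that zero lag is in the middle.
    """
    # Zero-pad to avoid circular wrap-around of the correlation.
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, whitened by its magnitude (the PHAT weighting).
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12  # small constant is an assumption to avoid 0/0

    cc = np.fft.irfft(R, n=n)

    # Keep only lags within +/- max_shift samples.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    # The peak position gives the time difference of arrival in samples.
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc
```

A positive delay means `sig` lags `ref`. In a tracker of the kind described above, correlation values of this type could be evaluated at the delays implied by each 3D hypothesis, but that step is left out here because the abstract gives no details.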
Cite
Qian, X., Xompero, A., Cavallaro, A., Brutti, A., Lanz, O., & Omologo, M. (2018). 3D Mouth Tracking from a Compact Microphone Array Co-Located with a Camera. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2018-April, pp. 3071–3075). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8461323