Although multi-modal (e.g., voice and face) biometric verification systems are in active development and show impressive performance, they need to be protected against spoofing attacks. In this paper we present methods for verifying face liveness based on estimating the synchrony between the audio stream and the lip-movement track during the pronunciation of a passphrase. The passphrase consists of a random set of predetermined English words generated dynamically for each verification attempt. Lip movements are extracted using a so-called Constrained Local Model of face shape. The audio stream is used to determine the time intervals of the pronounced words by means of automatic segmentation. Synchrony is estimated by analyzing the lip movements for each word with a feedforward neural network and a Gaussian naive Bayes classifier. Finally, the liveness score is computed by averaging the individual word predictions over the utterance of the verification phrase. On the GRID corpus dataset an average EER of 4.38% was achieved.
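The final fusion step described above (per-word classification followed by averaging over the passphrase) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature extraction stages (CLM lip tracking, audio segmentation) are out of scope, the per-word synchrony features are synthetic, and only the Gaussian naive Bayes branch of the classification is shown.

```python
import math
import random

def fit_gnb(X, y):
    """Fit per-class feature means, variances, and priors for Gaussian naive Bayes."""
    params = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        varis = [sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-6
                 for j in range(d)]
        params[c] = (means, varis, n / len(y))
    return params

def log_likelihood(x, means, varis, prior):
    """Log joint probability of a feature vector under one class."""
    ll = math.log(prior)
    for xi, m, v in zip(x, means, varis):
        ll += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
    return ll

def liveness_score(word_features, params):
    """Average the live-class posterior over the words of the passphrase."""
    posteriors = []
    for x in word_features:
        lls = {c: log_likelihood(x, *p) for c, p in params.items()}
        mx = max(lls.values())
        exps = {c: math.exp(v - mx) for c, v in lls.items()}
        z = sum(exps.values())
        posteriors.append(exps["live"] / z)
    return sum(posteriors) / len(posteriors)

random.seed(0)
# Synthetic 2-D synchrony features: "live" words cluster high, "spoof" words low.
train_X = ([[random.gauss(1.0, 0.2), random.gauss(1.0, 0.2)] for _ in range(50)]
           + [[random.gauss(0.0, 0.2), random.gauss(0.0, 0.2)] for _ in range(50)])
train_y = ["live"] * 50 + ["spoof"] * 50
model = fit_gnb(train_X, train_y)

# A five-word verification phrase with live-like synchrony features.
phrase = [[random.gauss(1.0, 0.2), random.gauss(1.0, 0.2)] for _ in range(5)]
score = liveness_score(phrase, model)
print(round(score, 3))
```

Averaging per-word posteriors is a simple late-fusion rule: a single poorly tracked word cannot reject an otherwise synchronous utterance, at the cost of diluting strong per-word evidence.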
Melnikov, A., Akhunzyanov, R., Kudashev, O., & Luckyanets, E. (2015). Audiovisual liveness detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9280, pp. 643–652). Springer Verlag. https://doi.org/10.1007/978-3-319-23234-8_59