ST-VLAD: Video Face Recognition Based on Aggregated Local Spatial-Temporal Descriptors


Abstract

How to integrate temporal and spatial continuity information when designing a video texture descriptor is crucial for video face recognition, and more broadly for video analysis and understanding, yet it has not been properly addressed. In this paper, a novel video face recognition algorithm is proposed based on an aggregated local spatial-temporal descriptor (ST-VLAD), together with a novel Fisher-criterion-based weight-learning method that captures the local information of the video more accurately and thereby substantially improves the representation ability of the description vectors. The proposed descriptor was tested on two representative databases, Honda/UCSD and the YouTube Faces database, achieving accuracies of 89.7% and 87.3%, respectively. The proposed method outperformed existing state-of-the-art methods, suggesting broad potential utility in video face recognition.
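The abstract does not detail the ST-VLAD pipeline itself, but the VLAD aggregation step it builds on can be sketched generically: local descriptors are assigned to their nearest codebook centers, per-center residuals are summed, and the concatenated result is power- and L2-normalized. The function name and normalization choices below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Generic VLAD aggregation of local descriptors (illustrative sketch,
    not the paper's exact ST-VLAD): assign each descriptor to its nearest
    codebook center, sum the residuals per center, then apply power and
    L2 normalization to the concatenated vector."""
    k, d = centers.shape
    # Nearest-center assignment by Euclidean distance.
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)
    # Accumulate residuals (descriptor minus its assigned center).
    vlad = np.zeros((k, d))
    for i in range(k):
        mask = assign == i
        if np.any(mask):
            vlad[i] = (descriptors[mask] - centers[i]).sum(axis=0)
    vlad = vlad.ravel()
    # Power normalization, then L2 normalization.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

In a spatial-temporal variant, the local descriptors would be extracted from small space-time volumes of the face video rather than from single frames, and per-center weights (here uniform) could be learned with a Fisher-criterion objective as the abstract describes.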

Citation (APA):
Wang, Y., Huang, Y. P., & Shen, X. J. (2021). ST-VLAD: Video Face Recognition Based on Aggregated Local Spatial-Temporal Descriptors. IEEE Access, 9, 31170–31178. https://doi.org/10.1109/ACCESS.2021.3060180
