ST-VLAD: Video Face Recognition Based on Aggregated Local Spatial-Temporal Descriptors

Yu Wang; Yong Ping Huang; Xuan Jing Shen

Journal Article, Open Access

IEEE Access (2021) 9 31170-31178

DOI: 10.1109/ACCESS.2021.3060180

Abstract

Integrating temporal and spatial continuity information into the design of a video texture description operator is crucial for video face recognition and, more broadly, for video analysis and understanding, yet this problem has not been properly addressed. In this paper, we propose a novel video face recognition algorithm based on an aggregated local spatial-temporal descriptor (ST-VLAD), together with a novel weight-learning method based on the Fisher criterion. The method portrays the local information of a video more accurately and therefore substantially improves the representational ability of the description vectors. The proposed descriptor was tested on two representative databases, Honda/UCSD and the YouTube Face database, achieving accuracies of 89.7% and 87.3%, respectively. The proposed method greatly outperformed other existing state-of-the-art methods, suggesting broad potential utility in the field of video face recognition.
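The abstract does not say how ST-VLAD builds its local spatial-temporal descriptors or how its Fisher-criterion weights are learned. As rough orientation only, the sketch below shows a generic VLAD aggregation step (residuals of local descriptors to their nearest codeword, followed by the commonly used signed square-root and L2 normalization) and a generic per-dimension Fisher score (between-class over within-class scatter) used to weight the aggregated vector. Both are assumptions standing in for the paper's method, not the authors' implementation. The codebook `centers` is assumed to be learned beforehand (for example with k-means), the input descriptors are placeholders for per-video local spatial-temporal descriptors, and all function names are hypothetical.

```python
# Hypothetical sketch: generic VLAD encoding plus Fisher-score weighting.
# This is NOT the ST-VLAD authors' code; descriptors and codebook are assumed inputs.
import math
from typing import List, Sequence

Vector = List[float]


def nearest_center(x: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to x under squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])),
    )


def vlad_encode(descriptors: Sequence[Sequence[float]],
                centers: Sequence[Sequence[float]]) -> Vector:
    """Aggregate local descriptors into one K*D VLAD vector.

    Each descriptor adds its residual (x - c_k) to the block of its nearest
    codeword c_k; the result is power- (signed sqrt) and L2-normalized.
    """
    k, d = len(centers), len(centers[0])
    v = [0.0] * (k * d)
    for x in descriptors:
        c = nearest_center(x, centers)
        for j in range(d):
            v[c * d + j] += x[j] - centers[c][j]
    v = [math.copysign(math.sqrt(abs(a)), a) for a in v]  # power normalization
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else v      # L2 normalization


def fisher_weights(vectors: Sequence[Sequence[float]],
                   labels: Sequence[str], eps: float = 1e-12) -> Vector:
    """Per-dimension Fisher score: between-class over within-class scatter.

    Assumed stand-in for the paper's Fisher-criterion weight learning.
    """
    n, dims = len(vectors), len(vectors[0])
    overall = [sum(v[j] for v in vectors) / n for j in range(dims)]
    between, within = [0.0] * dims, [0.0] * dims
    for cls in sorted(set(labels)):
        members = [v for v, y in zip(vectors, labels) if y == cls]
        mean_c = [sum(v[j] for v in members) / len(members) for j in range(dims)]
        for j in range(dims):
            between[j] += len(members) * (mean_c[j] - overall[j]) ** 2
            within[j] += sum((v[j] - mean_c[j]) ** 2 for v in members)
    return [b / (w + eps) for b, w in zip(between, within)]


def weighted(v: Sequence[float], w: Sequence[float]) -> Vector:
    """Scale each dimension of a VLAD vector by its learned weight."""
    return [a * b for a, b in zip(v, w)]
```

For example, with the toy codebook `[[0, 0], [10, 10]]` and descriptors `[[1, 0], [0, 1], [11, 10]]`, the first two descriptors fall in the first cell and the third in the second, so the unnormalized encoding is `[1, 1, 1, 0]`; after normalization it becomes a unit-length vector whose first three entries equal 1/√3 and whose last entry is 0. Dimensions whose values separate identities well receive large Fisher weights, while dimensions that vary only within an identity receive weights near zero.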

Citation (APA)

Wang, Y., Huang, Y. P., & Shen, X. J. (2021). ST-VLAD: Video Face Recognition Based on Aggregated Local Spatial-Temporal Descriptors. IEEE Access, 9, 31170–31178. https://doi.org/10.1109/ACCESS.2021.3060180