A comparison of deep learning models for end-to-end face-based video retrieval in unconstrained videos

This article is free to access.

Abstract

Face-based video retrieval (FBVR) is the task of retrieving videos that contain the same face shown in the query image. In this article, we present the first end-to-end FBVR pipeline that is able to operate on large datasets of unconstrained, multi-shot, multi-person videos. We adapt an existing audiovisual recognition dataset to the task of FBVR and use it to evaluate our proposed pipeline. We compare a number of deep learning models for shot detection, face detection, and face feature extraction as part of our pipeline on a validation dataset of more than 4000 videos. We obtain 97.25% mean average precision on an independent test set composed of more than 1000 videos. The pipeline extracts features from videos at approximately 7 times real-time speed and can perform a query on thousands of videos in less than 0.5 s.
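
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean average precision is commonly computed for a retrieval task. It is not taken from the article: the function names `average_precision` and `mean_average_precision` are hypothetical, and the sketch assumes that each query returns a ranking over the whole video collection, so that every relevant video appears somewhere in the list. The article's exact evaluation protocol may differ.

```python
from typing import Iterable, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query (hypothetical helper).

    ranked_relevance[i] is True when the video at rank i + 1 contains
    the queried face. Assumes the ranking covers the whole collection.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant video.
            precision_sum += hits / rank
    # A query with no relevant videos contributes 0 by convention here.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Iterable[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    scores = [average_precision(r) for r in rankings]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Two toy queries: relevant videos at ranks 1 and 3, then at rank 2.
    toy_rankings = [[True, False, True], [False, True]]
    print(f"mAP = {mean_average_precision(toy_rankings):.4f}")
```

A score of 1.0 means that, for every query, all videos containing the queried face are ranked ahead of all other videos.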

Cite

Ciaparrone, G., Chiariglione, L., & Tagliaferri, R. (2022). A comparison of deep learning models for end-to-end face-based video retrieval in unconstrained videos. Neural Computing and Applications, 34(10), 7489–7506. https://doi.org/10.1007/s00521-021-06875-x
