Deep metric learning with improved triplet loss for face clustering in videos

Abstract

Face clustering in videos partitions a large number of faces into a given number of clusters, such that some measure of distance is minimized within clusters and maximized between clusters. In real-world videos, head pose, facial expression, scale, illumination, occlusion and other uncontrolled factors can dramatically change the appearance of faces. In this paper, we tackle this problem by learning a non-linear metric function: a deep convolutional neural network maps the input image to a low-dimensional feature embedding, trained using the visual constraints among face tracks. Our network directly optimizes the embedding space so that Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function, which pushes the negative face away from the positive pair and requires the distance of the positive pair to be less than a margin. We extensively evaluate the proposed algorithm on a set of challenging videos and demonstrate significant performance improvement over existing techniques.
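
The abstract describes the improved triplet loss only in words, so the following is a minimal sketch of one way such a loss could look, not the authors' exact formulation. It assumes squared Euclidean distances, a hinge term that pushes the negative away from both faces of the positive pair, and a second hinge term that penalizes a positive pair whose distance exceeds a margin. The function names and the parameters `margin`, `pos_margin` and `weight` are hypothetical and do not come from the paper.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def improved_triplet_loss(anchor, positive, negative,
                          margin=1.0, pos_margin=0.5, weight=1.0):
    """Sketch of a triplet loss with an extra positive-pair constraint.

    All parameter names and default values are illustrative assumptions.
    """
    d_ap = squared_distance(anchor, positive)    # positive pair
    d_an = squared_distance(anchor, negative)    # anchor vs. negative
    d_pn = squared_distance(positive, negative)  # positive vs. negative

    # Inter-class term: the negative should be farther from BOTH faces of
    # the positive pair than those two faces are from each other.
    inter = 0.5 * (max(0.0, d_ap - d_an + margin)
                   + max(0.0, d_ap - d_pn + margin))

    # Intra-class term: the positive pair itself should be closer than
    # pos_margin, regardless of where the negative lies.
    intra = max(0.0, d_ap - pos_margin)

    return inter + weight * intra
```

With a standard triplet loss, a triplet can reach zero loss while the positive pair is still far apart, as long as the negative is farther. The intra-class term in this sketch removes that slack, which is the behavior the abstract attributes to the improved loss.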

Cite

Zhang, S., Gong, Y., & Wang, J. (2016). Deep metric learning with improved triplet loss for face clustering in videos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9916 LNCS, pp. 497–508). Springer Verlag. https://doi.org/10.1007/978-3-319-48890-5_49
