Learnable PINs: Cross-modal embeddings for person identity

Arsha Nagrani; Samuel Albanie; Andrew Zisserman

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11217 LNCS 73-89

DOI: 10.1007/978-3-030-01261-8_5

10 Citations

191 Readers

Abstract

We propose and investigate an identity-sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task that is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of using the joint embedding for automatically retrieving and labelling characters in TV dramas.
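As a rough illustration of two ideas named in the abstract, the sketch below shows (a) cross-modal retrieval as nearest-neighbour search in a shared embedding space and (b) a negative-selection step whose difficulty can be raised over training, as a curriculum might do. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `retrieve` and `pick_negatives`, the `hardness` parameter, the use of cosine similarity and Euclidean distance, and the assumption that row i of the face and voice arrays comes from the same video clip are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def retrieve(query, gallery, k=1):
    """Return indices of the k gallery embeddings most similar to each query.

    query and gallery may come from different modalities (e.g. voice and
    face) as long as both live in the same joint embedding space.
    """
    sims = l2_normalize(query) @ l2_normalize(gallery).T
    return np.argsort(-sims, axis=1)[:, :k]

def pick_negatives(faces, voices, hardness):
    """For each face i, pick the index of a non-matching voice (j != i).

    Assumption: faces[i] and voices[i] form the positive pair (same clip).
    hardness in [0, 1]: 0 picks the easiest (farthest) negative,
    1 picks the hardest (closest). A curriculum would raise it over time.
    """
    f, v = l2_normalize(faces), l2_normalize(voices)
    dists = np.linalg.norm(f[:, None, :] - v[None, :, :], axis=2)
    n = len(faces)
    picks = np.empty(n, dtype=int)
    for i in range(n):
        candidates = np.array([j for j in range(n) if j != i])
        # Sort candidates from easiest (largest distance) to hardest.
        ordered = candidates[np.argsort(-dists[i, candidates])]
        picks[i] = ordered[int(round(hardness * (len(ordered) - 1)))]
    return picks
```

Calling `retrieve(voice_embeddings, face_embeddings)` would give voice-to-face retrieval, and swapping the arguments gives face-to-voice. In training, a schedule could start with `hardness` near 0 and increase it gradually; the specific schedule is not given in the abstract and is left open here.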

Cite (APA)

Nagrani, A., Albanie, S., & Zisserman, A. (2018). Learnable PINs: Cross-modal embeddings for person identity. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11217 LNCS, pp. 73–89). Springer Verlag. https://doi.org/10.1007/978-3-030-01261-8_5
