Unsupervised naming of speakers in broadcast TV: Using written names, pronounced names or both?

  • Poignant J
  • Besacier L
  • Le V
 et al. 


Identifying persons in TV broadcast videos is a valuable tool for indexing them. However, using biometric models is not a very sustainable option without a priori knowledge of the people present in the videos. Names pronounced in the speech (PN) or written on the screen (WN) can provide hypothesis names for speakers. We propose an experimental comparison of the potential of these two modalities (pronounced or written names) to extract the true names of the speakers. Pronounced names offer many citation instances, but transcription and named-entity detection errors halve the potential of this modality. Conversely, written-name detection benefits from improvements in video quality and is now fairly robust and effective for naming speakers. Oracle experiments on the mapping between written names and speakers also show that the PN and WN modalities are complementary.
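The abstract does not say how written names are mapped to speakers. As a rough illustration only, the sketch below assumes diarization output (speech turns labelled by speaker cluster) and OCR output (time-stamped written names), and names each cluster after the written name it co-occurs with for the longest total duration. This overlap heuristic is an assumption, not necessarily the paper's mapping, and `Segment`, `map_written_names` and `oracle_coverage` are hypothetical names. `oracle_coverage` shows one way a PN/WN complementarity check could be computed: the share of true speaker names found among pronounced, written, or combined candidate names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical time interval [start, end) in seconds with a label."""
    start: float
    end: float
    label: str  # speaker cluster id (speech turn) or written name (OCR)


def overlap(a: Segment, b: Segment) -> float:
    """Duration in seconds shared by two segments (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def map_written_names(speech_turns, written_names):
    """Name each speaker cluster after the written name it overlaps longest.

    Assumed co-occurrence heuristic for illustration, not the authors' method.
    Clusters that never co-occur with a written name stay unnamed.
    """
    # scores[cluster][name] = total co-occurrence time in seconds
    scores = defaultdict(lambda: defaultdict(float))
    for turn in speech_turns:
        for name in written_names:
            shared = overlap(turn, name)
            if shared > 0:
                scores[turn.label][name.label] += shared
    return {cluster: max(cands, key=cands.get) for cluster, cands in scores.items()}


def oracle_coverage(true_names, pn_names, wn_names):
    """Fraction of true speaker names present among PN, WN and PN+WN candidates."""
    true_set = set(true_names)
    if not true_set:
        raise ValueError("true_names must not be empty")

    def frac(candidates):
        return len(true_set & set(candidates)) / len(true_set)

    return {
        "PN": frac(pn_names),
        "WN": frac(wn_names),
        "PN+WN": frac(set(pn_names) | set(wn_names)),
    }


if __name__ == "__main__":
    # Toy, made-up data: two speaker clusters and three on-screen name overlays.
    turns = [Segment(0, 10, "spk1"), Segment(10, 20, "spk2"), Segment(20, 25, "spk1")]
    overlays = [Segment(1, 6, "Person A"), Segment(12, 18, "Person B"),
                Segment(21, 23, "Person B")]
    print(map_written_names(turns, overlays))
    print(oracle_coverage({"A", "B", "C", "D"}, {"A", "B", "E"}, {"B", "C"}))
```

On the toy data, `spk1` overlaps "Person A" for 5 s and "Person B" for 2 s, so it is named "Person A"; `spk2` is named "Person B". In the coverage call, each modality alone finds half of the true names while their union finds three quarters, which is the kind of complementarity an oracle experiment can reveal.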

Author-supplied keywords

  • ASR
  • OCR
  • Speaker identification


Find this document

  • PUI: 373776582
  • ISSN: 1990-9772
  • SCOPUS: 2-s2.0-84900561584
  • SGR: 84900561584


  • Johann Poignant

  • Laurent Besacier

  • Viet Bac Le

  • Sophie Rosset

  • Georges Quénot
