Unsupervised naming of speakers in broadcast TV: Using written names, pronounced names or both ?

Abstract

Identifying persons in TV broadcast video is a valuable tool for indexing that video. However, using biometric models is not a very sustainable option without a priori knowledge of the people present in the videos. Names pronounced in the speech (PN) or written on the screen (WN) can provide hypothesis names for speakers. We propose an experimental comparison of the potential of these two modalities (pronounced or written names) to extract the true names of the speakers. Pronounced names offer many instances of citation, but transcription and named-entity detection errors halve the potential of this modality. In contrast, written-name detection benefits from improvements in video quality and is now rather robust and efficient for naming speakers. Oracle experiments on the mapping between written names and speakers also show the complementarity of the PN and WN modalities. Copyright © 2013 ISCA.

Cite

Poignant, J., Besacier, L., Le, V. B., Rosset, S., & Quénot, G. (2013). Unsupervised naming of speakers in broadcast TV: Using written names, pronounced names or both ? In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1462–1466). International Speech Communication Association. https://doi.org/10.21437/interspeech.2013-380
