This paper proposes a method to automatically detect and localise the dominant speaker in a conversation using audio and video information. The idea is that gesturing accompanies speaking, so we look for hand or head movements to infer that a person is talking. In a typical conversational setting with two or more people, we extract Mel-frequency cepstral coefficients (MFCCs) from the audio and measure how they correlate with the optical flow of moving pixel regions via canonical correlation analysis (CCA). In complex scenes, this operation can associate pixel regions with sounds to which they are not actually related. Therefore, we also triangulate the information from the microphones to estimate the position of the actual audio source, narrowing the visual search space and hence reducing the probability of a wrong voice-to-pixel-region association. We compare our work with a state-of-the-art algorithm and show on real data the improvement in dominant-speaker localisation.
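The core audio-visual association step can be illustrated with a minimal sketch: compute the canonical correlations between a block of audio feature frames (standing in for MFCCs) and the motion features of each candidate pixel region, then pick the region with the highest correlation. This is not the authors' implementation; the toy data, the `cca_correlations` helper, and all parameter choices below are illustrative assumptions, and CCA is implemented directly with NumPy (whitening followed by an SVD) rather than with a library routine.

```python
import numpy as np

def cca_correlations(X, Y, reg=1e-6):
    """Canonical correlations between feature sets X (n x p) and Y (n x q).

    Implemented by whitening both covariances and taking the SVD of the
    cross-covariance; `reg` is a small ridge term for numerical stability.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)

# Toy demo: a shared latent "speech activity" signal drives both the audio
# features and the speaker's motion features, but not the other region's.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))                      # shared latent activity
mfcc = np.hstack([t, rng.normal(size=(200, 3))])   # stand-in for MFCC frames
flow_speaker = np.hstack([0.8 * t + 0.2 * rng.normal(size=(200, 1)),
                          rng.normal(size=(200, 2))])
flow_other = rng.normal(size=(200, 3))             # uncorrelated region

r_speaker = cca_correlations(mfcc, flow_speaker)[0]
r_other = cca_correlations(mfcc, flow_other)[0]
```

In this toy setup the speaking region yields a first canonical correlation close to 1, while the uncorrelated region stays low, which is exactly the ambiguity the paper's microphone triangulation is meant to guard against when several regions happen to correlate by chance.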
CITATION STYLE
D’Arca, E., Robertson, N. M., & Hopgood, J. (2013). Look who’s talking. In IET Conference Publications (Vol. 2013). https://doi.org/10.1049/cp.2013.2075