In this chapter we consider the separation of multiple sound sources of different types, including multiple speakers and transients, that are recorded by a single microphone and by a video camera. We address the problem of separating a particular sound source from all other sources, focusing specifically on obtaining an underlying representation of it while attenuating all other sources. By pointing the video camera only at the desired sound source, the problem becomes equivalent to extracting the source common to the audio and video modalities while ignoring the other sources. We use a kernel-based method, alternating diffusion maps, which is designed specifically for this task and provides an underlying representation of the common source. We demonstrate the usefulness of the obtained representation for detecting the activity of the common source and discuss how it may be further used for source separation.
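The idea of extracting what is common to two modalities can be illustrated with a small alternating-diffusion sketch. It assumes that per-frame audio and video feature vectors have already been extracted and time-aligned (feature extraction is not shown). It also assumes Gaussian kernels whose scale is the median squared pairwise distance, and a single audio-then-video diffusion step. The function names `gaussian_markov_kernel` and `alternating_diffusion`, their parameters, and the synthetic data are hypothetical illustrations, not the chapter's implementation.

```python
from typing import Optional, Tuple

import numpy as np


def gaussian_markov_kernel(features: np.ndarray, eps: Optional[float] = None) -> np.ndarray:
    """Row-stochastic Gaussian affinity matrix for one modality.

    features: (n_frames, n_dims) array; row i holds the features of frame i.
    eps: kernel scale. If None, the median squared pairwise distance is used,
    a common heuristic assumed here, not a value taken from the chapter.
    """
    sq_norms = np.sum(features ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative values caused by round-off
    if eps is None:
        eps = float(np.median(sq_dists))
    affinity = np.exp(-sq_dists / eps)
    # Normalize each row to sum to 1, giving a Markov (diffusion) matrix.
    return affinity / affinity.sum(axis=1, keepdims=True)


def alternating_diffusion(
    audio: np.ndarray,
    video: np.ndarray,
    n_components: int = 2,
    eps_audio: Optional[float] = None,
    eps_video: Optional[float] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    """Embed time-aligned audio/video frames with an alternating-diffusion operator.

    Returns the leading non-trivial eigenvalues and the corresponding
    eigenvalue-scaled eigenvectors (one row per frame).
    """
    if audio.shape[0] != video.shape[0]:
        raise ValueError("audio and video must contain the same number of aligned frames")
    k_audio = gaussian_markov_kernel(audio, eps_audio)
    k_video = gaussian_markov_kernel(video, eps_video)
    # One alternating step: diffuse according to the audio kernel, then the video kernel.
    # Variation that is not shared by both modalities tends to be smoothed out.
    k_alt = k_video @ k_audio
    eigvals, eigvecs = np.linalg.eig(k_alt)
    # The product is not symmetric, so eig may return complex values; this sketch
    # keeps only the real parts, which is a simplification.
    order = np.argsort(-eigvals.real)
    eigvals = eigvals.real[order]
    eigvecs = eigvecs.real[:, order]
    # Index 0 is the trivial eigenvalue 1 with a constant eigenvector; skip it.
    vals = eigvals[1 : n_components + 1]
    return vals, eigvecs[:, 1 : n_components + 1] * vals


# Hypothetical synthetic example: a shared angle drives both modalities, while
# each modality also sees its own independent nuisance angle.
rng = np.random.default_rng(0)
n_frames = 300
common = rng.uniform(0.0, 2.0 * np.pi, n_frames)
audio_nuisance = rng.uniform(0.0, 2.0 * np.pi, n_frames)
video_nuisance = rng.uniform(0.0, 2.0 * np.pi, n_frames)
audio = np.column_stack([np.cos(common), np.sin(common),
                         np.cos(audio_nuisance), np.sin(audio_nuisance)])
video = np.column_stack([np.cos(common), np.sin(common),
                         np.cos(video_nuisance), np.sin(video_nuisance)])
vals, embedding = alternating_diffusion(audio, video, n_components=2)
print(vals, embedding.shape)
```

Because both kernels have strictly positive entries, their product is a positive row-stochastic matrix. Its top eigenvalue is therefore exactly 1 with a constant eigenvector, and the sketch discards it. The remaining leading eigenvectors serve as the low-dimensional representation. In a setting like the chapter's, a per-frame representation of this kind is what an activity detector for the common source would operate on.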
Dov, D., Talmon, R., & Cohen, I. (2018). Audio-visual source separation with alternating diffusion maps. In Signals and Communication Technology (pp. 365–382). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-73031-8_14