A major aim of sound source separation is to separate the sound of interest out of a mixture, such as the sounds of objects visible on screen. In this paper we put forward a method that incorporates sound-indicated object detection and uses the detection result to separate on-screen sounds from off-screen ones. After training, the object detection network can recognize which object is producing sound, much as humans learn which objects make which sounds. Then, using the temporal information of the sounds in a video segment, we separate out the sound of objects that are not shown in the video. Finally, experiments on data from AudioSet demonstrate that the method works well in the given scenarios.
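The abstract gives no implementation details. As a hedged illustration only, the sketch below shows one common, generic way to split a mixture into on-screen and off-screen parts: a soft time-frequency mask in [0, 1] applied to a magnitude spectrogram, with the complementary mask giving the off-screen estimate. It is not claimed to be the authors' method. The per-frame on-screen scores (standing in for the output of a sound-indicated object detector) and the helper names `split_on_off_screen` and `frame_mask_from_detections` are hypothetical.

```python
from typing import List, Tuple

# Magnitude spectrogram indexed as [freq_bin][time_frame]; values are >= 0.
Spectrogram = List[List[float]]


def split_on_off_screen(
    mixture: Spectrogram, on_mask: Spectrogram
) -> Tuple[Spectrogram, Spectrogram]:
    """Split a mixture with a soft mask (generic masking, not the paper's model).

    The on-screen estimate is mask * mixture; the off-screen estimate uses the
    complementary mask (1 - mask) * mixture, so the two always sum to the mixture.
    """
    if len(mixture) != len(on_mask) or any(
        len(a) != len(b) for a, b in zip(mixture, on_mask)
    ):
        raise ValueError("mixture and mask must have the same shape")
    on: Spectrogram = []
    off: Spectrogram = []
    for mix_row, mask_row in zip(mixture, on_mask):
        if any(m < 0.0 or m > 1.0 for m in mask_row):
            raise ValueError("mask values must lie in [0, 1]")
        on.append([m * x for m, x in zip(mask_row, mix_row)])
        off.append([(1.0 - m) * x for m, x in zip(mask_row, mix_row)])
    return on, off


def frame_mask_from_detections(
    sounding_on_screen: List[float], n_freq: int
) -> Spectrogram:
    """Hypothetical helper: broadcast a per-frame on-screen score to all frequency bins.

    This uses only temporal information (one score per frame), an assumed
    simplification; a real system might predict a full time-frequency mask.
    """
    return [list(sounding_on_screen) for _ in range(n_freq)]


if __name__ == "__main__":
    mixture = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0]]
    # Assumed detector scores: object clearly sounding, uncertain, silent.
    mask = frame_mask_from_detections([1.0, 0.5, 0.0], n_freq=2)
    on, off = split_on_off_screen(mixture, mask)
    print(on)   # [[1.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
    print(off)  # [[0.0, 1.0, 4.0], [0.0, 0.5, 2.0]]
```

Using complementary masks is a deliberate design choice in this sketch: no energy is lost or created, so frames in which no on-screen object is detected as sounding are attributed entirely to the off-screen estimate.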
Zhou, J., Wang, F., Guo, D., Liu, H., & Sun, F. (2019). Video-guided sound source separation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11740 LNAI, pp. 415–426). Springer Verlag. https://doi.org/10.1007/978-3-030-27526-6_36