Human beings can form a visual image of their surroundings from the sounds they hear. Can we give computers a similar capability? In this article, we introduce our recent work on cross-media scene analysis, which estimates the type, location, and visual shape of objects in a scene using only sound recorded with multiple microphones.
Irie, G., Kameoka, H., Kimura, A., Hiramatsu, K., & Kashino, K. (2018). Cross-media scene analysis: Estimating objects’ visuals only from audio. NTT Technical Review, 16(11), 35–40. https://doi.org/10.53829/ntr201811fa5