Cross-media scene analysis: Estimating objects' visuals only from audio

Abstract

Human beings can form a visual image of their surroundings from the sounds they hear. Can we give computers a similar capability? In this article, we introduce our recent efforts in cross-media scene analysis, which we apply to estimating the type, location, and visual shape of objects in a scene based only on sound recorded with multiple microphones.

Cite

Irie, G., Kameoka, H., Kimura, A., Hiramatsu, K., & Kashino, K. (2018). Cross-media scene analysis: Estimating objects’ visuals only from audio. NTT Technical Review, 16(11), 35–40. https://doi.org/10.53829/ntr201811fa5
