With the advent of new generations of personal assistants integrated with voice-controlled devices (e.g., Apple Siri, Google Assistant, Amazon Alexa), the demand for efficient mechanisms to detect, localize and recognize the source of sound events is rising. As such, microphone-array-based devices using improved algorithms are of interest to the research community. In this context, the recent success of deep learning algorithms in various domains (e.g., computer vision, speech recognition) opens the door to their application to the Sound Event Localization and Detection (SELD) problem. Here, the challenge lies in effectively combining deep neural networks (DNNs) with embedded devices driving specific configurations of microphone arrays. In this work, we propose the QuadCOIN system, an embedded system that executes the algorithms needed to detect and localize a sound event in the surrounding space, and that exploits a specific arrangement of microphones to improve the precision of the estimated sound source position. Specifically, our system is composed of an embedded computing device coupled with four groups of microphones (i.e., four microphone arrays), each arranged as a small grid of four sensing elements. The embedded computing device collects the event localization estimates from the four groups of sensors and then determines the exact position of the sound source. To this end, each group of microphones runs a cutting-edge Convolutional Neural Network (CNN) that detects events of interest. The CNN has been trained using datasets generated through a framework developed in-house. As proof of the feasibility of the proposed system, we implemented it on low-cost hardware composed of a single-board computer (SBC) and four ST-BlueCOIN microphone arrays. Experiments carried out on the QuadCOIN system demonstrate its precision and accuracy in detecting sound events and localizing the corresponding sound sources.
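The abstract does not state how the estimates from the four arrays are combined into a single position. As a minimal sketch only, assuming each array reports a planar direction-of-arrival azimuth and that the array positions are known, the source could be estimated as the least-squares intersection of the four bearing lines. The function name `fuse_bearings`, the coordinate layout and the 2D simplification are illustrative assumptions, not the authors' method.

```python
import math


def fuse_bearings(positions, azimuths_deg):
    """Least-squares intersection of 2D bearing lines (hypothetical helper).

    positions    -- (x, y) of each microphone array, in metres (assumed known)
    azimuths_deg -- azimuth reported by each array, in degrees from the x axis
    Returns the point minimizing the summed squared distance to all lines.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), az in zip(positions, azimuths_deg):
        dx = math.cos(math.radians(az))
        dy = math.sin(math.radians(az))
        # Projector onto the line normal: M = I - d d^T
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        # Accumulate the normal equations (sum M) p = sum (M a)
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * x + m12 * y
        b2 += m12 * x + m22 * y
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel; position is not observable")
    # Solve the symmetric 2x2 system with Cramer's rule
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Illustrative layout: four arrays at the corners of a 4 m square
arrays = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
source = (1.0, 2.0)
bearings = [math.degrees(math.atan2(source[1] - y, source[0] - x))
            for x, y in arrays]
estimate = fuse_bearings(arrays, bearings)
```

With noise-free bearings the estimate recovers the source up to floating-point rounding. With noisy bearings the same call returns the point closest, in the least-squares sense, to all four lines. The sketch treats each bearing as an infinite line, so it does not resolve front/back ambiguity.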
CITATION STYLE
Ciccia, S., Scionti, A., Vitali, G., & Terzo, O. (2021). Quadcoins-network: a deep learning approach to sound source localization. In Advances in Intelligent Systems and Computing (Vol. 1194 AISC, pp. 130–141). Springer. https://doi.org/10.1007/978-3-030-50454-0_13