This paper addresses reduction of computational cost in training of a Deep Neural Network (DNN), in particular, for sound identification using highly noise-contaminated sound recorded with a microphone array embedded in an Unmanned Aerial Vehicle (UAV), aiming at people’s voice detection quickly and widely in a disastrous situation. It is known that a DNN training method called end-to-end training shows high performance, since it uses a huge neural network with high nonlinearity which is trained with a large amount of raw input signals without preprocessing. Its computational cost is, however, expensive due to the high complexity of the neural network. Therefore, we propose twostage DNN training using two separately-trained networks; denoising of sound sources and sound source identification. Since the huge network is divided into two smaller networks, the complexity of the networks is expected to decrease and each of them can consider a specific model of denoising and identification. This results in faster convergence and computational cost reduction in DNN training. Preliminary results showed that only 71% of training time was necessary with the proposed two staged network, while maintaining the accuracy of sound source identification, compared to end-to-end training using noisy acoustic signals recorded with an 8 ch circular microphone array embedded in a UAV.
CITATION STYLE
Morito, T., Sugiyama, O., Uemura, S., Kojima, R., & Nakadai, K. (2016). Reduction of computational cost using two-stage deep neural network for training for denoising and sound source identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9799, pp. 562–573). Springer Verlag. https://doi.org/10.1007/978-3-319-42007-3_49
Mendeley helps you to discover research relevant for your work.