Abstract
Removing background noise from acoustic observations to obtain clean signals is an important research topic for many real-world acoustic applications. Owing to their strong capacity for function mapping, deep neural network-based algorithms have been successfully applied to target signal enhancement in acoustic applications. Most target signals carry semantic information encoded in a hierarchical structure across short- and long-term contexts, and noise may distort such structures nonuniformly. Most deep neural network-based algorithms do not explicitly account for these local and global effects in the model architecture used for signal enhancement. In this article, we propose a temporal attentive pooling (TAP) mechanism combined with a conventional convolutional recurrent neural network (CRNN) model, called TAP-CRNN, which explicitly considers both global and local information for acoustic signal enhancement (ASE). In the TAP-CRNN model, we first use a convolution layer to extract local information from acoustic signals and a recurrent neural network (RNN) architecture to characterize temporal contextual information. Second, we exploit a novel attention mechanism to contextually process salient regions of noisy signals. We evaluate the proposed ASE system using an infant cry dataset. The experimental results confirm the effectiveness of the proposed TAP-CRNN compared with related deep neural network models, and demonstrate that it more effectively reduces noise components in infant cry signals with unseen background noises at different signal-to-noise levels. We further tested the TAP-CRNN ASE system on a downstream infant cry detection (ICD) system, which determines whether a sound segment is part of an infant cry event. Experimental results show that TAP-CRNN ASE can effectively reduce the noise components, thereby improving the performance of ICD under noisy conditions.
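To make the idea of pooling over time with attention more concrete, the following is a minimal, dependency-free sketch of a generic attentive pooling step. The abstract does not specify how TAP scores frames or how the pooled result is combined with local features, so everything here is an assumption for illustration: the dot-product scoring, the `query` vector, the function names, and the concatenation of the pooled context with each frame are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_pool(frames, query):
    """Pool frame features over time with softmax attention weights.

    frames: list of T feature vectors (each a list of D floats), standing in
            for per-frame outputs of a recurrent layer (assumption).
    query:  hypothetical learned vector of length D used to score each frame.
    Returns (weights, context); context is the attention-weighted sum of frames.
    """
    scores = [sum(q * x for q, x in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    context = [sum(w * frame[d] for w, frame in zip(weights, frames))
               for d in range(dim)]
    return weights, context

def combine_local_global(frames, context):
    """Concatenate each local frame vector with the global context vector."""
    return [frame + context for frame in frames]

# Toy input: three frames with two features each; the middle frame is salient.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
weights, context = attentive_pool(frames, query)
enriched = combine_local_global(frames, context)
```

In this toy setup the frame with the largest score receives the largest weight, so the pooled context is dominated by the salient region, and each frame is then paired with that global summary. A trained model would learn the scoring parameters rather than fix them by hand.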
Citation
Hussain, T., Wang, W. C., Gogate, M., Dashtipour, K., Tsao, Y., Lu, X., … Hussain, A. (2022). A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement. IEEE Transactions on Artificial Intelligence, 3(5), 833–842. https://doi.org/10.1109/TAI.2022.3169995