Abstract
This contribution addresses speech-model-based speech enhancement by exploiting the source-filter model of human speech production. The proposed method enhances the excitation signal in the cepstral domain using a deep neural network (DNN). We investigate two types of target representations along with the significant effects of their normalization. The new approach exceeds the performance of a previously introduced classical signal-processing-based cepstral excitation manipulation (CEM) method in terms of noise attenuation by about 1.5 dB. We show that this gain also holds when comparing serial combinations of envelope and excitation enhancement. In the important low-SNR conditions, substantially higher noise attenuation is achieved without inducing a significant trade-off in speech component quality or speech intelligibility. In total, a traditional purely statistical state-of-the-art speech enhancement system is outperformed by more than 3 dB in noise attenuation.
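As a rough illustration of the source-filter split that the abstract relies on, the sketch below separates the real cepstrum of a single frame into a low-quefrency part (spectral envelope, the "filter") and a high-quefrency part (excitation, the "source"). This is classical cepstral liftering only, not the DNN-based method of the paper; the function name `split_cepstrum`, the cutoff `n_envelope`, and the small floor added before the logarithm are assumptions made for this example.

```python
import numpy as np

def split_cepstrum(frame, n_envelope=20):
    """Split the real cepstrum of one frame into envelope and excitation parts.

    n_envelope is an assumed quefrency cutoff (in bins), not a value from the paper.
    """
    n = len(frame)
    # Log-magnitude spectrum; the small floor avoids log(0).
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
    # Real cepstrum: inverse transform of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n)

    # Symmetric low-quefrency lifter: keep bins 0..n_envelope-1 and their mirror.
    lifter = np.zeros(n)
    lifter[:n_envelope] = 1.0
    if n_envelope > 1:
        lifter[-(n_envelope - 1):] = 1.0

    envelope = cepstrum * lifter            # slowly varying spectral shape
    excitation = cepstrum * (1.0 - lifter)  # fine structure, e.g. pitch harmonics
    return envelope, excitation
```

An excitation-enhancement scheme in this spirit would modify only the `excitation` coefficients and then add them back to the envelope before returning to the spectral domain; since the two parts sum to the full cepstrum, the unmodified split reconstructs the original log-magnitude spectrum.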
Cite
Elshamy, S., & Fingscheidt, T. (2019). DNN-based cepstral excitation manipulation for speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(11), 1803–1814. https://doi.org/10.1109/TASLP.2019.2933698