In recent single-channel speech enhancement, deep neural networks (DNNs) have played an important role in achieving high performance. One standard use of a DNN is to construct a mask-generating function for time-frequency (T-F) masking. To apply a mask in the T-F domain, the short-time Fourier transform (STFT) is usually utilized because of its well-understood and invertible nature. While the mask-generating regression function has been studied for a long time, there is less research on the T-F transform from the viewpoint of speech enhancement. Since the performance of speech enhancement depends on both the T-F mask estimator and the T-F transform, investigating the T-F transform should be beneficial for designing a better enhancement system. In this paper, as a step toward an optimal T-F transform in terms of speech enhancement, we experimentally investigated the effect of the parameter settings of the STFT on a DNN-based mask estimator. We conducted experiments using three types of DNN architectures with three types of loss functions, and the results suggested that U-Net is robust to the parameter settings, whereas fully connected and BLSTM networks are not.
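To make the T-F masking pipeline described above concrete, the following is a minimal sketch of STFT-domain masking using SciPy. The window length and hop size correspond to the STFT parameter settings studied in the paper; the `mask_estimator` argument is a hypothetical placeholder standing in for the DNN, which is not specified here.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs, win_len=512, hop=256, mask_estimator=None):
    """Apply T-F masking in the STFT domain.

    win_len and hop are the STFT parameters whose effect on the DNN-based
    mask estimator is investigated in the paper; mask_estimator is a
    hypothetical stand-in for the DNN mask-generating function.
    """
    # Analysis: complex spectrogram of the noisy signal.
    _, _, X = stft(noisy, fs=fs, nperseg=win_len, noverlap=win_len - hop)

    # Mask estimation: a DNN would map the noisy magnitude to a mask in [0, 1].
    # An all-ones mask is used as a placeholder when no estimator is given.
    mag = np.abs(X)
    mask = mask_estimator(mag) if mask_estimator is not None else np.ones_like(mag)

    # Masking and synthesis: apply the mask and invert back to the time domain.
    _, enhanced = istft(mask * X, fs=fs, nperseg=win_len, noverlap=win_len - hop)
    return enhanced

# Usage example: a 1-second noisy signal at 16 kHz (random stand-in data).
fs = 16000
noisy = np.random.randn(fs)
out = enhance(noisy, fs, win_len=512, hop=256)
```

Changing `win_len` and `hop` alters the spectrogram resolution seen by the mask estimator, which is exactly the design choice whose effect is evaluated experimentally in the paper.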
Takeuchi, D., Yatabe, K., Koizumi, Y., Oikawa, Y., & Harada, N. (2020). Effect of spectrogram resolution on deep-neural-network-based speech enhancement. Acoustical Science and Technology, 41(5), 769–775. https://doi.org/10.1250/ast.41.769