Abstract
Unsupervised audio segmentation remains a challenging research problem that significantly affects Automatic Speech Recognition (ASR) and Spoken Document Retrieval (SDR) performance. This paper presents advances in audio segmentation for unsupervised multi-speaker change detection. First, we investigate new features intended to be better suited to segmentation: PMVDR (Perceptual Minimum Variance Distortionless Response), SZCR (Smoothed Zero Crossing Rate), and FBLC (FilterBank Log Coefficients). Next, we consider a new distance metric, T²-mean, which is intended to improve segmentation of short segments (< 5 s). A novel false alarm compensation procedure is also developed and applied after the segmentation phase. We establish a more effective evaluation procedure for segmentation than the more traditional HER and Frame Accuracy approaches. Employing these advances within our new scheme yields more than a 30% improvement in segmentation performance on the 3-hour Hub4 Broadcast News 1997 evaluation data. Evaluations are also presented for audio from the NGSW corpus.
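The abstract does not define the T²-mean metric, so the following is only a minimal sketch of the standard two-sample Hotelling T² statistic, which a mean-based distance of this kind is assumed here to build on. The function name `hotelling_t2`, the pooled-covariance form, and the small ridge term are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hotelling_t2(x, y, ridge=1e-6):
    """Two-sample Hotelling T^2 statistic between two blocks of feature frames.

    x, y: arrays of shape (n_frames, n_dims), e.g. the frames on either
    side of a candidate speaker change point (hypothetical usage).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = len(x), len(y)

    # Difference of the per-segment mean feature vectors.
    mean_diff = x.mean(axis=0) - y.mean(axis=0)

    # Pooled covariance, assuming both segments share one covariance matrix.
    pooled = ((a - 1) * np.cov(x, rowvar=False)
              + (b - 1) * np.cov(y, rowvar=False)) / (a + b - 2)
    pooled = np.atleast_2d(pooled)

    # Small ridge (an assumption) to keep the solve stable on short segments.
    pooled = pooled + ridge * np.eye(pooled.shape[0])

    # T^2 = (a*b / (a+b)) * d^T * S^{-1} * d
    return float(a * b / (a + b) * mean_diff @ np.linalg.solve(pooled, mean_diff))
```

A larger value suggests that the two segments have different mean feature vectors. How the paper thresholds or combines this value with other criteria is not stated in the abstract.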
Cite
Huang, R., & Hansen, J. H. L. (2004). Advances in unsupervised audio segmentation for the broadcast news and NGSW corpora. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 1). https://doi.org/10.1109/icassp.2004.1326092