Advances in unsupervised audio segmentation for the broadcast news and NGSW corpora


Abstract

Unsupervised audio segmentation remains a challenging research problem that significantly affects Automatic Speech Recognition (ASR) and Spoken Document Retrieval (SDR) performance. This paper presents advances in audio segmentation for unsupervised multi-speaker change detection. First, we investigate new features intended to be better suited to segmentation: PMVDR (Perceptual Minimum Variance Distortionless Response), SZCR (Smoothed Zero Crossing Rate), and FBLC (FilterBank Log Coefficients). Next, we consider a new distance metric, T²-mean, intended to improve segmentation of short segments (<5 s). A novel false alarm compensation procedure is also developed and applied after the segmentation phase. We establish a more effective evaluation procedure for segmentation than the more traditional HER and Frame Accuracy approaches. Employing these advances within our new scheme yields more than a 30% improvement in segmentation performance on the 3-hour Hub4 1997 Broadcast News evaluation data. Evaluations are also presented for audio from the NGSW corpus.
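The abstract names the T²-mean metric but does not define it. The sketch below is a rough illustration, not the authors' implementation. It computes a Hotelling-style T² distance between the mean feature vectors of two adjacent segments, T² = (ab/(a+b)) (μ_a − μ_b)ᵀ Σ⁻¹ (μ_a − μ_b), and uses it to choose the most likely change point within one analysis window. Several details are assumptions made for illustration. Σ is estimated from all frames in the window rather than from the pooled within-segment covariance of the textbook two-sample test. The ridge term, the `min_len` parameter and all helper names are hypothetical. The frame vectors stand in for whatever features (e.g. PMVDR or FBLC) are used, and the false alarm compensation step is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Frames = Sequence[Sequence[float]]


def mean_vector(frames: Frames) -> List[float]:
    """Per-dimension mean of a list of feature frames."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def covariance(frames: Frames, mu: Sequence[float], ridge: float = 1e-6) -> List[List[float]]:
    """Sample covariance (divides by N-1) plus a small ridge on the diagonal
    (an assumption) so the matrix stays invertible for short windows."""
    n, dim = len(frames), len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for f in frames:
        diff = [f[d] - mu[d] for d in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += diff[i] * diff[j]
    for i in range(dim):
        for j in range(dim):
            cov[i][j] /= max(n - 1, 1)
        cov[i][i] += ridge
    return cov


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def t2_distance(seg_a: Frames, seg_b: Frames) -> float:
    """Hotelling-style T^2 between the means of two segments. Sigma is
    estimated from the concatenated frames of both segments (assumption)."""
    a, b = len(seg_a), len(seg_b)
    mu_a, mu_b = mean_vector(seg_a), mean_vector(seg_b)
    window = list(seg_a) + list(seg_b)
    cov = covariance(window, mean_vector(window))
    diff = [x - y for x, y in zip(mu_a, mu_b)]
    w = solve(cov, diff)  # w = Sigma^-1 (mu_a - mu_b)
    return (a * b / (a + b)) * sum(d * wi for d, wi in zip(diff, w))


def best_split(frames: Frames, min_len: int = 5) -> Tuple[Optional[int], float]:
    """Scan every split leaving at least min_len frames on each side and
    return (split index, T^2 score) of the largest distance."""
    best: Tuple[Optional[int], float] = (None, float("-inf"))
    for t in range(min_len, len(frames) - min_len + 1):
        score = t2_distance(frames[:t], frames[t:])
        if score > best[1]:
            best = (t, score)
    return best
```

In this sketch Σ is computed from the whole window, so it is the same for every candidate split and only the two mean vectors change as the split moves. That is one plausible reason a mean-based distance can stay usable when one side of the split holds only a few frames. In a full segmenter the peak score would still have to be compared against a threshold before a change is declared.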

Citation (APA)

Huang, R., & Hansen, J. H. L. (2004). Advances in unsupervised audio segmentation for the broadcast news and NGSW corpora. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 1). https://doi.org/10.1109/icassp.2004.1326092
