Speech/music discrimination in audio podcast using structural segmentation and timbre recognition

Mathieu Barthet; Steven Hargreaves; Mark Sandler

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6684 LNCS 138-162

DOI: 10.1007/978-3-642-23126-1_10

Abstract

We propose two speech/music discrimination methods based on timbre models and measure their performance on a 3-hour database of radio podcasts from the BBC. In the first method, the machine-estimated classifications obtained with an automatic timbre recognition (ATR) model are post-processed using median filtering. The classification system (LSF/K-means) was trained at two different taxonomic levels: a high-level one (speech, music) and a lower-level one (male and female speech, classical, jazz, rock & pop). The second method combines automatic structural segmentation and timbre recognition (ASS/ATR). The ASS evaluates the similarity between feature distributions (MFCC, RMS) using HMM and soft K-means algorithms. Both methods were evaluated at a semantic level (relative correct overlap, RCO) and at a temporal level (boundary retrieval F-measure). The ASS/ATR method obtained the best results (average RCO of 94.5% and boundary F-measure of 50.1%). These results compare favourably with those obtained by an SVM-based technique that provides a good benchmark of the state of the art. © 2011 Springer-Verlag Berlin Heidelberg.
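
As an illustration of the post-processing step in the first method, the sketch below median-filters a frame-level sequence of two-class labels and scores it against a reference. It is not the authors' implementation: the 0/1 label encoding, the window length, the edge padding, and both function names are assumptions made for this example, and `frame_overlap` is only a frame-level stand-in for the relative correct overlap (RCO) named in the abstract.

```python
def median_filter_labels(labels, window=5):
    """Median-filter a sequence of 0/1 frame labels (assumed: 0 = speech, 1 = music).

    The sequence is padded by repeating its edge values so every window
    has the same odd length, which makes the median well defined.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not labels:
        return []
    half = window // 2
    padded = [labels[0]] * half + list(labels) + [labels[-1]] * half
    # The median of an odd-length window is its middle element once sorted.
    return [sorted(padded[i:i + window])[half] for i in range(len(labels))]


def frame_overlap(predicted, reference):
    """Fraction of frames whose predicted label matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical data: two isolated misclassified frames around one true boundary.
raw = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]
reference = [0] * 6 + [1] * 6
smoothed = median_filter_labels(raw, window=3)
print(frame_overlap(raw, reference), frame_overlap(smoothed, reference))
```

In this toy case the filter removes both isolated errors while leaving the speech-to-music boundary in place. A longer window smooths more aggressively but can erase short genuine segments, which is one reason boundary retrieval is scored separately from overlap.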

Cite

Barthet, M., Hargreaves, S., & Sandler, M. (2011). Speech/music discrimination in audio podcast using structural segmentation and timbre recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6684 LNCS, pp. 138–162). https://doi.org/10.1007/978-3-642-23126-1_10
