In this paper, we investigate the effect of temporal context on speech/non-speech detection (SND). We show that even a simple feature such as full-band energy, when used with a sufficiently large context, is promising enough to merit further investigation. Evaluated on the test data set using the F-measure, the proposed method achieves absolute gains of 4.4% over a state-of-the-art multi-layer perceptron (MLP) based SND system and 5.4% over a simple energy-threshold based SND method. The optimal context length was found to be 1000 ms. Further numerical optimization yields an additional improvement of 3.37% absolute, for total absolute gains of 7.77% and 8.77% over the MLP based and energy based methods, respectively. ROC based evaluation also shows promising performance for the proposed method, particularly in low SNR conditions. © 2008 Springer-Verlag Berlin Heidelberg.
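As a rough illustration of the idea in the abstract, the sketch below computes full-band log energy per frame, pools it over a temporal context, applies a threshold, and scores the result with the F-measure. It is not the authors' method: the abstract does not say how context is incorporated, so the centered moving average here is a stand-in, and the 25 ms frame, 10 ms hop, threshold value, and all function names are assumptions chosen for the example. Only the 1000 ms context length and the use of the F-measure come from the abstract.

```python
import math


def frame_log_energy(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Full-band log energy (dB) per frame. Frame and hop sizes are assumed."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(energy + 1e-12))
    return energies


def add_context(values, context_frames):
    """Average each value over a centered window, truncated at the edges.

    Assumption: a moving average stands in for whatever context
    mechanism the paper actually uses.
    """
    half = context_frames // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def detect_speech(samples, sample_rate, threshold_db, context_ms=1000, hop_ms=10):
    """Label each frame True (speech) or False (non-speech)."""
    energies = frame_log_energy(samples, sample_rate, hop_ms=hop_ms)
    smoothed = add_context(energies, context_ms // hop_ms)
    return [e > threshold_db for e in smoothed]


def f_measure(predicted, reference):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, `detect_speech(samples, 8000, threshold_db=-40.0)` returns one boolean per 10 ms hop, and `f_measure` compares those labels against frame-level reference labels. A long context smooths out short energy dips inside speech, at the cost of blurring segment boundaries, which is one plausible reason a context length has to be tuned.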
Krishnan Parthasarathi, S. H., Motlíček, P., & Hermansky, H. (2008). Exploiting contextual information for speech/non-speech detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5246 LNAI, pp. 451–459). https://doi.org/10.1007/978-3-540-87391-4_58