A geometric interpretation of non-target-normalized maximum cross-channel correlation for vocal activity detection in meetings

Kornel Laskowski; Tanja Schultz

Conference Proceedings (Open Access)

NAACL-HLT 2007 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers (2007) 89-92

DOI: 10.3115/1614108.1614131

Abstract

Vocal activity detection is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, standard vocal activity detection algorithms have been shown to be ineffective, because participants typically vocalize for only a fraction of the recorded time and because, while they are not vocalizing, their channels are frequently dominated by crosstalk from other participants. In the present work, we review a particular type of normalization of maximum cross-channel correlation, a feature recently introduced to address the crosstalk problem. We derive a plausible geometric interpretation and show how the frame size affects performance.
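The abstract does not spell out how the feature is computed, so what follows is only a minimal Python sketch under stated assumptions. It takes "non-target-normalized maximum cross-channel correlation" to mean the peak, over a small range of lags, of the cross-correlation between a frame of the target channel and the same frame of one non-target channel, divided by the non-target frame's zero-lag autocorrelation (its energy). The helper names `cross_corr`, `nmxc` and `delayed`, the 400-sample frame, the 20-sample lag range and the white-noise stand-in for speech are all hypothetical illustrations, not taken from the paper.

```python
import math
import random


def cross_corr(x, y, lag):
    """Unnormalized cross-correlation phi_xy(lag) = sum_n x[n] * y[n + lag].

    Only indices where both signals are defined contribute; x and y are
    assumed to be equal-length frames.
    """
    n = len(x)
    if lag >= 0:
        return sum(x[i] * y[i + lag] for i in range(n - lag))
    return sum(x[i - lag] * y[i] for i in range(n + lag))


def nmxc(target, nontarget, max_lag):
    """Hypothetical non-target-normalized maximum cross-channel correlation.

    ASSUMPTION: the peak of phi_{target,nontarget} over lags in
    [-max_lag, max_lag] is divided by phi_{nontarget,nontarget}(0), i.e. the
    non-target frame's energy. This is an illustrative reading of the term,
    not code from the paper.
    """
    energy = cross_corr(nontarget, nontarget, 0)
    if energy == 0.0:
        # A silent non-target channel cannot be the source of crosstalk.
        return math.inf
    peak = max(cross_corr(target, nontarget, lag)
               for lag in range(-max_lag, max_lag + 1))
    return peak / energy


def delayed(signal, gain, delay):
    """Attenuated, delayed copy of a signal: a crude crosstalk model."""
    return [0.0] * delay + [gain * v for v in signal[:len(signal) - delay]]


rng = random.Random(0)
speech = [rng.gauss(0.0, 1.0) for _ in range(400)]  # stand-in for one speech frame

# Target participant speaks; the non-target microphone hears weak crosstalk.
target_speaking = nmxc(speech, delayed(speech, 0.3, 5), max_lag=20)

# Non-target participant speaks; the target microphone hears weak crosstalk.
target_silent = nmxc(delayed(speech, 0.3, 5), speech, max_lag=20)

print(f"target speaking: {target_speaking:.3f}")  # close to 1 / 0.3
print(f"target silent:   {target_silent:.3f}")    # a little below 0.3
```

Under this assumed definition the feature is large (about 1/0.3 here) when the target microphone carries the direct, louder signal, and small (a little below 0.3) when the target channel carries only crosstalk. At a fixed lag, dividing the inner product of the two frames by the squared length of the non-target frame gives the length of the target frame's projection onto the non-target frame, measured in units of the non-target frame's length. This is offered only as intuition and is not necessarily the geometric interpretation derived in the paper.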

Cite

Laskowski, K., & Schultz, T. (2007). A geometric interpretation of non-target-normalized maximum cross-channel correlation for vocal activity detection in meetings. In NAACL-HLT 2007 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers (pp. 89–92). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1614108.1614131