On ideal binary mask as the computational goal of auditory scene analysis

Abstract

In his famous treatise on computational vision, Marr (1982) makes a compelling argument for separating different levels of analysis in order to understand complex information processing. In particular, the computational theory level, concerned with the goal of computation and the general processing strategy, must be separated from the algorithm level; that is, what is computed must be separated from how it is computed. This chapter attempts a computational-theory analysis of auditory scene analysis, where the main task is to understand the character of the computational auditory scene analysis (CASA) problem. My analysis results in the proposal of the ideal binary mask as a main goal of CASA. This goal is consistent with characteristics of human auditory scene analysis. It is also consistent with more specific objectives such as enhancing automatic speech recognition (ASR) and speech intelligibility. The resulting evaluation metric is simple and general, and is easy to apply when the premixing target is available. The goal of the ideal binary mask has led to effective speech separation algorithms that attempt to explicitly estimate such masks. © 2005 Springer Science + Business Media, Inc.
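The abstract does not spell out how an ideal binary mask is built. The sketch below assumes the commonly cited definition: a time-frequency unit is kept (1) when the premixing target energy exceeds the interference energy by more than a local criterion in dB, and discarded (0) otherwise. The function name, the list-of-lists input layout, the 0 dB default, and the handling of zero-energy units are illustrative assumptions, not details taken from the chapter.

```python
import math


def ideal_binary_mask(target_energy, interference_energy, lc_db=0.0):
    """Return a 0/1 mask over time-frequency units (hypothetical helper).

    target_energy and interference_energy are equally shaped lists of
    lists of non-negative energies, computed from the premixing signals.
    A unit is set to 1 when its local SNR in dB is strictly greater than
    lc_db (the local criterion, assumed to default to 0 dB).
    """
    mask = []
    for t_row, n_row in zip(target_energy, interference_energy):
        row = []
        for t, n in zip(t_row, n_row):
            if t <= 0.0:
                # No target energy: the unit cannot be target-dominated.
                row.append(0)
            elif n <= 0.0:
                # Target present, no interference: treat SNR as infinite.
                row.append(1)
            else:
                snr_db = 10.0 * math.log10(t / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Toy 2x2 example with made-up energies.
target = [[4.0, 1.0], [0.0, 9.0]]
interference = [[1.0, 2.0], [1.0, 0.0]]
mask = ideal_binary_mask(target, interference)
```

Because the mask needs the target and interference separately, it can only be computed when the premixing signals are available, which matches the abstract's remark about when the evaluation metric is easy to apply.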

Cite

Wang, D. (2005). On ideal binary mask as the computational goal of auditory scene analysis. In Speech Separation by Humans and Machines (pp. 181–197). Springer US. https://doi.org/10.1007/0-387-22794-6_12
