On ideal binary mask as the computational goal of auditory scene analysis

Deliang Wang

Book Chapter

Springer US (2005), pp. 181–197

DOI: 10.1007/0-387-22794-6_12

Abstract

In his famous treatise on computational vision, Marr (1982) makes a compelling argument for separating different levels of analysis in order to understand complex information processing. In particular, the computational theory level, concerned with the goal of computation and the general processing strategy, must be separated from the algorithm level; that is, what must be separated from how. This chapter is an attempt at a computational-theory analysis of auditory scene analysis, where the main task is to understand the character of the computational auditory scene analysis (CASA) problem. My analysis results in the proposal of the ideal binary mask as a main goal of CASA. This goal is consistent with characteristics of human auditory scene analysis. It is also consistent with more specific objectives such as improving automatic speech recognition (ASR) and speech intelligibility. The resulting evaluation metric has the properties of simplicity and generality, and is easy to apply when the pre-mixing target is available. The goal of the ideal binary mask has led to effective speech separation algorithms that attempt to explicitly estimate such masks. © 2005 Springer Science + Business Media, Inc.
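The abstract names the ideal binary mask without defining it. In the usual definition from this line of work, a time-frequency (T-F) unit of the mask is 1 when the local SNR in that unit (target energy over interference energy, in dB) exceeds a local criterion, commonly 0 dB, and 0 otherwise. The Python sketch below assumes the pre-mixing target and interference energies are already available per T-F unit as [frame][channel] lists. The names `ideal_binary_mask` and `mask_agreement` are hypothetical, and the unit-wise agreement score is only an illustrative way to compare an estimated mask against the ideal one; it is not necessarily the evaluation metric the chapter proposes.

```python
from typing import List

Matrix = List[List[float]]
Mask = List[List[int]]


def ideal_binary_mask(target: Matrix, interference: Matrix, lc_db: float = 0.0) -> Mask:
    """Label each T-F unit 1 if its local SNR exceeds lc_db, else 0.

    `target` and `interference` hold pre-mixing energies indexed [frame][channel].
    """
    # Compare energies directly instead of taking log10, so silent units are safe:
    # 10*log10(t / n) > lc_db  <=>  t > n * 10**(lc_db / 10)  for n > 0.
    # If n == 0 and t > 0 the unit is labeled 1; if both are 0 it is labeled 0.
    ratio = 10.0 ** (lc_db / 10.0)
    return [
        [1 if t > n * ratio else 0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target, interference)
    ]


def mask_agreement(estimated: Mask, ideal: Mask) -> float:
    """Fraction of T-F units where an estimated mask matches the ideal mask.

    Illustrative assumption only; not the chapter's own metric.
    """
    total = 0
    matches = 0
    for e_row, i_row in zip(estimated, ideal):
        for e, i in zip(e_row, i_row):
            total += 1
            matches += int(e == i)
    return matches / total if total else 0.0


if __name__ == "__main__":
    target = [[4.0, 0.5], [1.0, 0.0]]
    interference = [[1.0, 2.0], [1.0, 3.0]]
    ibm = ideal_binary_mask(target, interference)
    print(ibm)  # [[1, 0], [0, 0]]
    estimated = [[1, 1], [0, 0]]
    print(mask_agreement(estimated, ibm))  # 0.75
```

The strict comparison means a unit whose local SNR equals the criterion exactly (the 1.0 versus 1.0 unit above, at 0 dB) is labeled 0; lowering `lc_db` to, say, -6 dB would flip that unit to 1. Because the mask needs the premixed target and interference separately, this kind of ground truth is only available in controlled mixtures, which matches the abstract's caveat.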

Citation (APA)

Wang, D. (2005). On ideal binary mask as the computational goal of auditory scene analysis. In Speech Separation by Humans and Machines (pp. 181–197). Springer US. https://doi.org/10.1007/0-387-22794-6_12
