We propose a robust method to estimate the number of audio sources and the mixing matrix in a linear instantaneous mixture, even with more sources than sensors. Our method is based on a multiscale Short Time Fourier Transform (STFT), and relies on the assumption that in the neighborhood of some (unknown) scales and time-frequency points, only one source contributes to the mixture. Such time-frequency regions provide local estimates of the corresponding columns of the mixing matrix. Our main contribution is a new clustering algorithm called DEMIX to estimate the number of sources and the mixing matrix based on such local estimates. In contrast to DUET or other similar sparsity-based algorithms, which rely on a global scatter plot, our algorithm exploits a local confidence measure to weight the influence of each time-frequency point in the estimated matrix. Inspired by the work of Deville, the confidence measure relies on the time-frequency local persistence of the activity/inactivity of each source. Experiments are provided with stereophonic mixtures and show the improved performance of DEMIX compared to K-means or ELBG clustering algorithms. © Springer-Verlag Berlin Heidelberg 2006.
CITATION STYLE
Arberet, S., Gribonval, R., & Bimbot, F. (2006). A robust method to count and locate audio sources in a stereophonic linear instantaneous mixture. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3889 LNCS, pp. 536–543). https://doi.org/10.1007/11679363_67
Mendeley helps you to discover research relevant for your work.