Speech and Music Classification and Separation: A Review

Abdullah I. Al-Shoshan

Journal Article, Open Access

Journal of King Saud University - Engineering Sciences (2006) 19(1) 95-132

DOI: 10.1016/S1018-3639(18)30850-X

Abstract

The classification and separation of speech and music signals have attracted the attention of many researchers. Classification is needed to build two different libraries, a speech library and a music library, from a stream of sounds. Separation, in contrast, is needed in a cocktail-party problem to separate speech from music and remove the undesired one. In this paper, existing classification and separation algorithms are reviewed and discussed. The classification algorithms are divided into three categories: time-domain, frequency-domain, and time-frequency-domain approaches. The time-domain approaches used in the literature are the zero-crossing rate (ZCR), the short-time energy (STE), the ZCR and the STE with positive derivative, along with some of their modified versions, the variance of the roll-off, and neural networks. The frequency-domain approaches are mainly based on the spectral centroid, variance of the spectral centroid, spectral flux, variance of the spectral flux, roll-off of the spectrum, cepstral residual, and delta pitch. The time-frequency-domain approaches have not yet been tested thoroughly in the literature, so the spectrogram and the evolutionary spectrum are introduced. Some new algorithms dealing with music and speech separation and segregation are also presented.
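To make three of the features named in the abstract concrete, the following is a minimal sketch of frame-wise ZCR, STE, and spectral centroid using their common textbook definitions. It is not taken from the paper: the function names, the frame length, the hop size, the Hann window, and the 440 Hz test tone are all assumptions chosen for illustration, and the paper's own definitions (including its modified versions) may differ.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def spectral_centroid(frames, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of each Hann-windowed frame."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    # Guard against division by zero on silent frames.
    return (mag @ freqs) / np.maximum(mag.sum(axis=1), 1e-12)

# Illustrative input (assumed, not from the paper): one second of a 440 Hz tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

frames = frame_signal(tone, frame_len=256, hop=128)
zcr = zero_crossing_rate(frames)
ste = short_time_energy(frames)
centroid = spectral_centroid(frames, sample_rate)
print(zcr.mean(), ste.mean(), centroid.mean())
```

A classifier in the spirit of the time-domain and frequency-domain approaches would then compare statistics of such per-frame features (for example their variance over a longer window) between speech and music. The decision rule itself is left out here because the abstract does not specify one.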

Cite

Al-Shoshan, A. I. (2006). Speech and Music Classification and Separation: A Review. Journal of King Saud University - Engineering Sciences, 19(1), 95–132. https://doi.org/10.1016/S1018-3639(18)30850-X
