Speech signal modeling using multivariate distributions

Ali Aroudi; Hadi Veisi; Hossein Sameti; Zahra Mafakheri

Journal Article, Open Access

EURASIP Journal on Audio, Speech, and Music Processing (2015), 2015(1), 1–14

DOI: 10.1186/s13636-015-0078-1

Abstract

Choosing an appropriate distribution function for the speech signal, or for its representations, is crucial in statistical speech processing algorithms. Although the Gaussian is the most commonly used probability density function (pdf) for speech signals, recent studies have shown that super-Gaussian pdfs fit better. Much research has focused on the univariate distribution of the speech signal; in this paper, we instead study the multivariate distributions of the speech signal and its representations. The candidates are conventional distribution functions, e.g., multivariate Gaussian and multivariate Laplace, and copula-based multivariate distributions. The copula-based technique is a powerful method for modeling non-Gaussian multivariate distributions with non-linear inter-dimensional dependency. The similarity between the candidate pdfs and the real speech pdf in different domains is evaluated using the energy goodness-of-fit test. In our evaluations, we determine the best-fitting distributions for speech signal vectors of different lengths in various domains. A similar experiment is performed for different classes of English phonemes (fricatives, nasals, stops, vowels, and semivowels/glides). The results demonstrate that the multivariate distribution of speech signals in different domains is mostly super-Gaussian, except for Mel-frequency cepstral coefficients. The results also confirm that the distribution of the different phoneme classes is better modeled statistically by a mixture of Gaussian and Laplace pdfs. The copula-based distributions provide better statistical modeling of vectors representing the discrete Fourier transform (DFT) amplitude of speech vectors with a length shorter than 500 ms.
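To make the two techniques named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a Gaussian copula with unit-variance Laplace marginals (the abstract does not say which copula family was used) and uses a plain two-sample energy distance as a stand-in for the energy goodness-of-fit test. The function names `sample_gaussian_copula_laplace` and `energy_distance`, the correlation value 0.6, and the synthetic "observed" data are all illustrative assumptions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function without SciPy


def sample_gaussian_copula_laplace(n, corr, rng, scale=1.0 / math.sqrt(2.0)):
    """Draw n vectors with Laplace marginals coupled by a Gaussian copula.

    corr  : (d, d) correlation matrix of the latent Gaussian (assumed positive definite)
    scale : Laplace scale b; b = 1/sqrt(2) gives unit-variance marginals
    """
    corr = np.asarray(corr, dtype=float)
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((n, corr.shape[0])) @ chol.T  # correlated latent Gaussian
    u = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))             # normal CDF maps to uniforms
    u = np.clip(u, 1e-12, 1.0 - 1e-12)                     # keep the logarithm finite
    centered = u - 0.5
    # Inverse Laplace CDF applied to each uniform marginal
    return -scale * np.sign(centered) * np.log1p(-2.0 * np.abs(centered))


def energy_distance(x, y):
    """Two-sample energy distance between samples x (n, d) and y (m, d)."""
    def mean_pairwise(a, b):
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)


# Illustrative comparison on synthetic data (a stand-in for real speech vectors)
rng = np.random.default_rng(0)
corr = np.array([[1.0, 0.6], [0.6, 1.0]])
observed = sample_gaussian_copula_laplace(400, corr, rng)
gauss_candidate = rng.multivariate_normal(np.zeros(2), corr, size=400)
copula_candidate = sample_gaussian_copula_laplace(400, corr, rng)
print("Gaussian candidate:", energy_distance(observed, gauss_candidate))
print("Copula candidate:  ", energy_distance(observed, copula_candidate))
```

A smaller energy distance suggests a closer fit between a candidate and the observed sample, although with small samples the ranking can be noisy. The copula construction separates the choice of marginal pdf (here Laplace) from the dependency structure (here a correlation matrix), which is what allows non-Gaussian marginals to be combined with a chosen inter-dimensional dependency.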

Cite

Aroudi, A., Veisi, H., Sameti, H., & Mafakheri, Z. (2015). Speech signal modeling using multivariate distributions. EURASIP Journal on Audio, Speech, and Music Processing, 2015(1), 1–14. https://doi.org/10.1186/s13636-015-0078-1
