An analysis of shallow and deep representations of speech based on unsupervised classification of isolated words

Giampiero Salvi

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2016), 151-157

DOI: 10.1007/978-3-319-28109-4_15

Abstract

We analyse the properties of shallow and deep representations of speech. Mel-frequency cepstral coefficients (MFCC) are compared to representations learned by a four-layer Deep Belief Network (DBN) in terms of discriminative power and invariance to irrelevant factors such as speaker identity or gender. To avoid the influence of supervised statistical modelling, an unsupervised isolated-word classification task is used for the comparison. The deep representations are also obtained with unsupervised training (no back-propagation pass is performed). The results show that DBN features provide a more concise clustering and a higher match between clusters and word categories in terms of adjusted Rand score. Some of the confusions present with the MFCC features are, however, retained even with the DBN features.
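
The adjusted Rand score mentioned in the abstract measures how well a clustering agrees with reference categories, corrected for chance agreement. The following is a minimal sketch of the standard pair-counting formula, not the chapter's own code; the function name `adjusted_rand_index` and the toy word labels and cluster assignments are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    Standard pair-counting definition; illustrative sketch only.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivial agreement

    # Contingency counts: items sharing a (category, cluster) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)   # category sizes
    cols = Counter(labels_pred)   # cluster sizes

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: six isolated-word tokens and two clusterings.
words = ["one", "one", "two", "two", "three", "three"]
clusters_a = [0, 0, 1, 1, 2, 2]  # clusters coincide with word categories
clusters_b = [0, 0, 1, 1, 1, 2]  # one "three" token merged with "two"

print(adjusted_rand_index(words, clusters_a))
print(adjusted_rand_index(words, clusters_b))
```

Because the score depends only on which items are grouped together, relabeling the clusters leaves it unchanged, which is what makes it suitable for comparing an unsupervised clustering against word categories.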

Cite (APA)

Salvi, G. (2016). An analysis of shallow and deep representations of speech based on unsupervised classification of isolated words. In Smart Innovation, Systems and Technologies (Vol. 48, pp. 151–157). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-28109-4_15
