Convolutive Bottleneck Network with Dropout for Dysarthric Speech Recognition

Toru Nakashika; Toshiya Yoshioka; Tetsuya Takiguchi; Yasuo Ariki; Stefan Duffner; Christophe Garcia

Journal Article (Open Access)

Transactions on Machine Learning and Artificial Intelligence (2014) 2(2) 48-62

DOI: 10.14738/tmlai.22.150

Abstract

In this paper, we investigate the recognition of speech produced by a person with an articulation disorder resulting from athetoid cerebral palsy. The articulation of the first spoken words tends to become unstable due to strain on the speech muscles, which degrades speech recognition. Therefore, we propose a robust feature extraction method that uses a convolutive bottleneck network (CBN) in place of the widely used mel-frequency cepstral coefficients (MFCCs). The CBN stacks several types of layers, such as a convolution layer, a subsampling layer, and a bottleneck layer, to form a deep network. By applying the CBN to feature extraction for dysarthric speech, we expect to reduce the influence of the unstable speaking style caused by the athetoid symptoms. Furthermore, we adopt dropout in the output layer, since labels automatically assigned to dysarthric speech are usually unreliable due to the ambiguous phonemes uttered by a person with a speech disorder. We confirmed the effectiveness of the method through word-recognition experiments, in which the CBN-based feature extraction method outperformed the conventional feature extraction method.
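The layer stack named in the abstract (convolution, subsampling, bottleneck, and an output layer trained with dropout) can be illustrated with a minimal forward-pass sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the 8x8 input patch, the single 3x3 kernel, max pooling as the subsampling step, the sigmoid and tanh activations, the layer widths, and the reading of "dropout in the output layer" as dropping the units that feed the output layer. All function names are hypothetical.

```python
import math
import random


def conv2d_valid(x, k):
    """Valid 2-D cross-correlation of matrix x with kernel k (lists of lists)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def max_pool(x, s):
    """Non-overlapping s-by-s max pooling (assumed form of the subsampling layer)."""
    return [[max(x[i + a][j + b] for a in range(s) for b in range(s))
             for j in range(0, len(x[0]) - s + 1, s)]
            for i in range(0, len(x) - s + 1, s)]


def dense(v, w, b):
    """Fully connected sigmoid layer; w holds one weight row per output unit."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * vi for wi, vi in zip(row, v)) + bi)))
            for row, bi in zip(w, b)]


def dropout(v, p, rng, train=True):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if not train or p == 0.0:
        return list(v)
    return [0.0 if rng.random() < p else vi / (1.0 - p) for vi in v]


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_params(rng, n_flat=9, n_hidden=6, n_bn=3, n_out=4):
    """Hypothetical layer sizes; n_flat matches an 8x8 patch, 3x3 kernel, 2x2 pool."""
    def layer(n_in, n_units):
        return rand_matrix(n_units, n_in, rng), [0.0] * n_units
    w1, b1 = layer(n_flat, n_hidden)
    w_bn, b_bn = layer(n_hidden, n_bn)
    w2, b2 = layer(n_bn, n_hidden)
    w_out, b_out = layer(n_hidden, n_out)
    return {"kernel": rand_matrix(3, 3, rng), "w1": w1, "b1": b1,
            "w_bn": w_bn, "b_bn": b_bn, "w2": w2, "b2": b2,
            "w_out": w_out, "b_out": b_out}


def cbn_forward(patch, params, rng, train=False, p_drop=0.5):
    """Return (bottleneck activations, output activations) for one input patch."""
    fmap = conv2d_valid(patch, params["kernel"])           # convolution layer
    fmap = [[math.tanh(v) for v in row] for row in fmap]
    pooled = max_pool(fmap, 2)                             # subsampling layer
    flat = [v for row in pooled for v in row]
    hidden = dense(flat, params["w1"], params["b1"])
    bottleneck = dense(hidden, params["w_bn"], params["b_bn"])  # narrow layer
    hidden2 = dense(bottleneck, params["w2"], params["b2"])
    hidden2 = dropout(hidden2, p_drop, rng, train)         # assumed dropout site
    output = dense(hidden2, params["w_out"], params["b_out"])
    return bottleneck, output


if __name__ == "__main__":
    rng = random.Random(0)
    params = build_params(rng)
    patch = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(8)]
    bottleneck, output = cbn_forward(patch, params, rng, train=False)
    print(len(bottleneck), len(output))
```

In a bottleneck-feature setup of this kind, the output layer is typically used only during training against the (possibly noisy) phoneme labels, while the bottleneck activations are what a downstream recognizer would consume in place of MFCCs. The sketch returns both so that either use can be inspected.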

Citation (APA)

Nakashika, T., Yoshioka, T., Takiguchi, T., Ariki, Y., Duffner, S., & Garcia, C. (2014). Convolutive Bottleneck Network with Dropout for Dysarthric Speech Recognition. Transactions on Machine Learning and Artificial Intelligence, 2(2), 48–62. https://doi.org/10.14738/tmlai.22.150
