Convolutive Bottleneck Network with Dropout for Dysarthric Speech Recognition

  • Nakashika T
  • Yoshioka T
  • Takiguchi T
  • Ariki Y
  • Duffner S
  • Garcia C

Abstract

In this paper, we investigate the recognition of speech produced by a person with an articulation disorder resulting from athetoid cerebral palsy. The articulation of the first spoken words tends to become unstable because of strain on the speech muscles, which degrades speech recognition accuracy. We therefore propose a robust feature extraction method based on a convolutive bottleneck network (CBN) in place of the well-known MFCC features. The CBN stacks several types of layers, such as convolution layers, subsampling layers, and a bottleneck layer, to form a deep network. By applying the CBN to feature extraction for dysarthric speech, we expect it to reduce the influence of the unstable speaking style caused by the athetoid symptoms. Furthermore, we adopt dropout in the output layer, since labels automatically assigned to dysarthric speech are usually unreliable due to ambiguous phonemes uttered by speakers with speech disorders. We confirmed the effectiveness of the approach through word-recognition experiments, in which the CBN-based feature extraction method outperformed the conventional feature extraction method.
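To illustrate the pipeline the abstract describes (convolution, subsampling, bottleneck, and an output layer with dropout), here is a minimal NumPy forward-pass sketch. All sizes (a 24x16 input map, 8 feature maps with 5x5 kernels, 2x2 subsampling, a 30-unit bottleneck, 40 output classes, keep probability 0.5) and the ReLU activation are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(x, k):
    """Single-channel 'valid'-mode 2-D cross-correlation."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, p=2):
    """p x p max subsampling."""
    H2, W2 = x.shape[0] // p, x.shape[1] // p
    return x[:H2 * p, :W2 * p].reshape(H2, p, W2, p).max(axis=(1, 3))

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical input: a 24-bin x 16-frame spectro-temporal patch.
x = rng.standard_normal((24, 16))
kernels = rng.standard_normal((8, 5, 5)) * 0.1  # 8 assumed feature maps

# Convolution -> subsampling for each feature map, then flatten.
maps = np.stack([max_pool(relu(conv2d_valid(x, k))) for k in kernels])
h = maps.ravel()  # 8 maps of 10x6 -> 480 values

# Low-dimensional bottleneck layer: these activations would be used
# as robust features in place of MFCCs.
W_bn = rng.standard_normal((30, h.size)) * 0.05
bottleneck = relu(W_bn @ h)

# Output layer with dropout at training time: randomly zeroed units
# keep unreliable, automatically assigned labels from dominating updates.
W_out = rng.standard_normal((40, 30)) * 0.05
logits = W_out @ bottleneck
keep = 0.5
mask = rng.random(40) < keep
logits_train = np.where(mask, logits / keep, 0.0)  # inverted dropout

print(bottleneck.shape)    # 30-dim bottleneck feature vector
print(logits_train.shape)  # 40-class training-time output
```

At test time the dropout mask is simply omitted; the inverted scaling by `1/keep` during training keeps the expected activations matched between the two phases.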

Citation (APA)
Nakashika, T., Yoshioka, T., Takiguchi, T., Ariki, Y., Duffner, S., & Garcia, C. (2014). Convolutive Bottleneck Network with Dropout for Dysarthric Speech Recognition. Transactions on Machine Learning and Artificial Intelligence, 2(2), 48–62. https://doi.org/10.14738/tmlai.22.150
