Combining continuous word representation and prosodic features for ASR error prediction

Abstract

Recent advances in continuous word representation have been successfully applied to several natural language processing tasks. This paper focuses on error prediction in Automatic Speech Recognition (ASR) outputs and investigates the use of continuous word representations (word embeddings) within a neural network architecture. The main contribution of this paper concerns the combination of word embeddings: several combination approaches are proposed in order to take advantage of their complementarity. The use of prosodic features, in addition to classical syntactic ones, is also evaluated. Experiments are carried out on automatic transcriptions generated by the LIUM ASR system applied to the ETAPE corpus. They show that the proposed neural architecture, using an effective combination of continuous word representations together with prosodic features as additional features, significantly outperforms a state-of-the-art approach based on Conditional Random Fields. Finally, the proposed system produces a well-calibrated confidence measure, evaluated in terms of Normalized Cross Entropy.
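
The abstract names Normalized Cross Entropy (NCE) as the calibration metric but does not spell it out. The sketch below illustrates the commonly used definition, NCE = (H_max + sum of log2(c) over correct words + sum of log2(1 - c) over incorrect words) / H_max, where H_max is the entropy obtained by assigning every word the average probability of being correct. The function name, the input format, and the clipping constant are assumptions made for illustration and are not taken from the paper.

```python
import math


def normalized_cross_entropy(confidences, is_correct, eps=1e-12):
    """Return the NCE of per-word confidence scores.

    confidences: probabilities in [0, 1] that each word is correct.
    is_correct:  booleans, True where the recognized word is correct.
    eps:         clipping constant (an assumption) to avoid log2(0).
    """
    if len(confidences) != len(is_correct):
        raise ValueError("confidences and is_correct must have the same length")

    n_total = len(is_correct)
    n_correct = sum(1 for ok in is_correct if ok)
    if n_correct == 0 or n_correct == n_total:
        raise ValueError("NCE is undefined when every word has the same label")

    # Baseline entropy: every word gets the average correctness probability.
    p_c = n_correct / n_total
    h_max = -(n_correct * math.log2(p_c)
              + (n_total - n_correct) * math.log2(1.0 - p_c))

    # Log-likelihood of the observed labels under the confidence scores.
    log_lik = 0.0
    for conf, ok in zip(confidences, is_correct):
        conf = min(max(conf, eps), 1.0 - eps)
        log_lik += math.log2(conf if ok else 1.0 - conf)

    return (h_max + log_lik) / h_max
```

Under this definition, a score near 1 indicates highly informative confidences, 0 corresponds to always predicting the average correctness rate, and negative values indicate confidences that are worse than that constant baseline.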

Cite

Ghannay, S., Estève, Y., Camelin, N., Dutrey, C., Santiago, F., & Adda-Decker, M. (2015). Combining continuous word representation and prosodic features for ASR error prediction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9449, pp. 84–95). Springer Verlag. https://doi.org/10.1007/978-3-319-25789-1_9
