Adapting translation models for transcript disfluency detection

42 citations · 34 Mendeley readers

Abstract

Transcript disfluency detection (TDD) is an important component of real-time speech translation systems and has attracted increasing interest in recent years. This paper presents our study on adapting neural machine translation (NMT) models for TDD. We propose a general training framework for rapidly adapting NMT models to the TDD task. In this framework, the main structure of the model mirrors that of the NMT model. In addition, several extended modules and training techniques that are independent of the NMT model are proposed to improve performance, such as constrained decoding, denoising autoencoder initialization and a TDD-specific training objective. With the proposed training framework, we achieve significant improvements; however, decoding is too slow to be practical. To build a feasible, production-ready solution for TDD, we propose a fast non-autoregressive TDD model, following the recently emerged non-autoregressive NMT models. Although we do not assume a specific NMT architecture, we build our TDD model on top of the Transformer, the state-of-the-art NMT model. We conduct extensive experiments on the publicly available Switchboard corpus and an in-house Chinese dataset. Experimental results show that the proposed model significantly outperforms previous state-of-the-art models.
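The abstract's key practical point is that scoring all positions in parallel, rather than generating the fluent output token by token, is what makes the non-autoregressive variant fast enough for production. The snippet below is a minimal, hypothetical PyTorch sketch of that idea, framing disfluency detection as per-token keep/delete tagging over a Transformer encoder; the class name, hyperparameters, and the binary-tagging formulation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NonAutoregressiveTDD(nn.Module):
    """Hypothetical non-autoregressive disfluency tagger (illustration only)."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=3):
        super().__init__()
        # Positional encodings are omitted here for brevity.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, 2)  # 0 = fluent/keep, 1 = disfluent/delete

    def forward(self, tokens, pad_mask=None):
        # tokens: (batch, seq_len) integer ids; all positions scored in one parallel pass
        h = self.encoder(self.embed(tokens), src_key_padding_mask=pad_mask)
        return self.classifier(h)  # (batch, seq_len, 2) per-token logits

# Toy usage with random token ids; in practice, tokens labeled 1 would be
# removed from the transcript to produce the fluent output.
model = NonAutoregressiveTDD(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 7))
labels = model(tokens).argmax(dim=-1)  # (2, 7) binary keep/delete decisions
```

Because every label comes from a single encoder pass, decoding cost does not grow with output length, which is the rough intuition behind the speed advantage over an autoregressive translation-style model.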

Citation (APA)

Dong, Q., Wang, F., Yang, Z., Chen, W., Xu, S., & Xu, B. (2019). Adapting translation models for transcript disfluency detection. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (pp. 6351–6358). AAAI Press. https://doi.org/10.1609/aaai.v33i01.33016351

Readers over time: yearly Mendeley reader counts, 2019–2024 (chart not reproduced).

Readers' Seniority

PhD / Post grad / Masters / Doc: 11 (73%)
Researcher: 2 (13%)
Professor / Associate Prof.: 1 (7%)
Lecturer / Post doc: 1 (7%)

Readers' Discipline

Computer Science: 14 (78%)
Engineering: 2 (11%)
Mathematics: 1 (6%)
Linguistics: 1 (6%)
