Improving disfluency detection by self-training a self-attentive model

25 citations · 104 Mendeley readers

Abstract

Self-attentive neural syntactic parsers using contextualized word embeddings (e.g. ELMo or BERT) currently produce state-of-the-art results in joint parsing and disfluency detection in speech transcripts. Since the contextualized word embeddings are pre-trained on a large amount of unlabeled data, using additional unlabeled data to train a neural model might seem redundant. However, we show that self-training, a semi-supervised technique for incorporating unlabeled data, sets a new state-of-the-art for the self-attentive parser on disfluency detection, demonstrating that self-training provides benefits orthogonal to the pre-trained contextualized word representations. We also show that ensembling self-trained parsers provides further gains for disfluency detection.
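The self-training recipe described in the abstract is simple: a model trained on gold-labeled data labels unlabeled data, its confident predictions are kept as pseudo-labels, and the model is retrained on the combined set. The sketch below is a minimal, generic illustration of that loop using a placeholder scikit-learn classifier on synthetic data; the classifier, features, and confidence threshold are assumptions for illustration only, not the paper's self-attentive parser pipeline.

```python
# Minimal self-training sketch (illustrative; the paper self-trains a
# self-attentive constituency parser on unlabeled speech transcripts).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for gold-annotated and unlabeled transcripts.
X_labeled = rng.normal(size=(200, 16))
y_labeled = (X_labeled[:, 0] > 0).astype(int)   # 1 = "disfluent", 0 = "fluent"
X_unlabeled = rng.normal(size=(2000, 16))

# Base model trained on the labeled data only.
model = LogisticRegression().fit(X_labeled, y_labeled)

for _ in range(3):  # a few self-training rounds
    # Pseudo-label the unlabeled pool and keep only confident predictions.
    probs = model.predict_proba(X_unlabeled)
    confident = probs.max(axis=1) > 0.9          # threshold is an assumption
    X_pseudo = X_unlabeled[confident]
    y_pseudo = model.classes_[probs[confident].argmax(axis=1)]

    # Retrain on gold plus pseudo-labeled examples.
    X_train = np.vstack([X_labeled, X_pseudo])
    y_train = np.concatenate([y_labeled, y_pseudo])
    model = LogisticRegression().fit(X_train, y_train)
```

The ensembling mentioned in the abstract would, in this simplified picture, amount to running the loop several times with different random seeds or unlabeled subsets and combining the resulting models' predictions.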

Citation (APA)

Lou, P. J., & Johnson, M. (2020). Improving disfluency detection by self-training a self-attentive model. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 3754–3763). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.346
