How does BERT process disfluency?

Abstract

Natural conversations are filled with disfluencies. This study investigates whether and how BERT understands disfluency through three experiments: (1) a behavioural study using a downstream task, (2) an analysis of sentence embeddings, and (3) an analysis of the attention mechanism on disfluency. The behavioural study shows that, without fine-tuning on disfluent data, BERT suffers no significant performance loss on disfluent inputs compared to fluent ones (exp. 1). Analysis of sentence embeddings of disfluent and fluent sentence pairs reveals that the deeper the layer, the more similar their representations (exp. 2), indicating that the deep layers of BERT become relatively invariant to disfluency. We pinpoint attention as a potential mechanism that could explain this phenomenon (exp. 3). Overall, the study suggests that BERT has knowledge of disfluency structure. We emphasise the potential of using BERT to understand natural utterances without disfluency removal.
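For readers who want to probe the layer-wise convergence described in experiment 2, the following is a minimal sketch of the core measurement, assuming the HuggingFace transformers library. The sentence pair and the mean-pooling strategy are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch (not the authors' code): compare per-layer BERT sentence
# embeddings for a fluent/disfluent pair via cosine similarity.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Illustrative sentence pair (an assumption, not from the paper's data).
fluent = "I want a flight to Boston."
disfluent = "I want a flight to Denver uh I mean to Boston."

def layer_embeddings(sentence):
    """Mean-pool token states at every layer (one illustrative pooling choice)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: embedding layer + 12 transformer layers for bert-base
    return [h.mean(dim=1).squeeze(0) for h in outputs.hidden_states]

for layer, (f, d) in enumerate(zip(layer_embeddings(fluent),
                                   layer_embeddings(disfluent))):
    sim = torch.cosine_similarity(f, d, dim=0).item()
    print(f"layer {layer:2d}: cosine similarity = {sim:.3f}")
```

If the paper's finding holds, similarity should tend to rise toward the deeper layers, though exact values will depend on the pooling choice and sentence pair.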

Cite (APA)

Tian, Y., Nieradzik, T., Jalali, S., & Shiu, D. S. (2021). How does BERT process disfluency? In SIGDIAL 2021 - 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference (pp. 208–217). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.sigdial-1.22
