PSST! Prosodic Speech Segmentation with Transformers

Nathan Roll; Calbert Graham; Simon Todd

Conference Proceedings (Open Access)

CoNLL 2023 - 27th Conference on Computational Natural Language Learning, Proceedings (2023) 476-487

DOI: 10.18653/v1/2023.conll-1.31

Abstract

We develop and probe a model for detecting the boundaries of prosodic chunks in untranscribed conversational English speech. The model is obtained by fine-tuning a Transformer-based speech-to-text (STT) model to integrate the identification of Intonation Unit (IU) boundaries with the STT task. The model shows robust performance, both on held-out data and on out-of-distribution data representing different dialects and transcription protocols. By evaluating the model on degraded speech data, and comparing it with alternatives, we establish that it relies heavily on lexico-syntactic information inferred from audio, and not solely on acoustic information typically understood to cue prosodic structure. We release our model as both a transcription tool and a baseline for further improvements in prosodic segmentation.
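One way to picture "integrating IU boundary identification with the STT task" is to treat each boundary as an extra token in the transcript the model learns to emit. The sketch below is a minimal illustration under that assumption only: the `|` marker and the helper names `to_target`, `boundary_positions`, and `boundary_prf` are hypothetical and are not taken from the released model. It builds a training target from a list of IUs, recovers boundary positions from a marked transcript, and scores predicted boundaries with precision, recall, and F1. The scorer assumes the reference and hypothesis contain the same word sequence; scoring real STT output would first require aligning the two transcripts.

```python
from typing import List, Set, Tuple

# Hypothetical boundary marker; the released model's actual token may differ.
IU_BOUNDARY = "|"


def to_target(ius: List[str], marker: str = IU_BOUNDARY) -> str:
    """Join intonation units into one STT target string, separated by a marker token."""
    return f" {marker} ".join(iu.strip() for iu in ius if iu.strip())


def boundary_positions(text: str, marker: str = IU_BOUNDARY) -> Tuple[List[str], Set[int]]:
    """Split a marked transcript into its words and the set of word indices
    immediately followed by an IU boundary (the utterance end is not counted)."""
    words: List[str] = []
    boundaries: Set[int] = set()
    for tok in text.split():
        if tok == marker:
            if words:  # ignore a marker that precedes every word
                boundaries.add(len(words) - 1)
        else:
            words.append(tok)
    return words, boundaries


def boundary_prf(ref: str, hyp: str) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over boundary word indices.

    Assumes ref and hyp share an identical word sequence (no alignment step)."""
    _, ref_b = boundary_positions(ref)
    _, hyp_b = boundary_positions(hyp)
    tp = len(ref_b & hyp_b)
    precision = tp / len(hyp_b) if hyp_b else 0.0
    recall = tp / len(ref_b) if ref_b else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    ref = to_target(["so i went to the store", "and uh", "they were closed"])
    hyp = "so i went to the store and uh | they were closed"  # one boundary missed
    print(ref)
    print(boundary_prf(ref, hyp))
```

Here the hypothetical hypothesis finds the boundary after "uh" but misses the one after "store", so precision is 1.0 and recall is 0.5. Encoding boundaries as ordinary tokens keeps the segmentation inside a single decoding pass rather than requiring a separate acoustic classifier.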

Cite

Roll, N., Graham, C., & Todd, S. (2023). PSST! Prosodic Speech Segmentation with Transformers. In CoNLL 2023 - 27th Conference on Computational Natural Language Learning, Proceedings (pp. 476–487). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.conll-1.31