Abstract
Speech data is an extremely rich and important source of information. However, we lack suitable methods for the semantic annotation of speech data. For instance, semantic role labeling (SRL) of speech that has been transcribed by an automated speech recognition (ASR) system is still an unsolved problem. SRL of ASR data is difficult because the transcripts lack sentence boundaries and punctuation, and because they contain grammar errors, wrongly transcribed words, and word deletions and insertions. In this paper we propose a novel approach to SRL of ASR data based on the following ideas: (1) train the SRL system on data segmented into frames, where each frame consists of a predicate and its semantic roles, without considering sentence boundaries; (2) label the data with the semantics of PropBank roles; and, to assist the above, (3) train a part-of-speech (POS) tagger to work on noisy and error-prone ASR data. Experiments with the OntoNotes corpus show improvements over state-of-the-art SRL applied to ASR data.
Cite
Shrestha, N., & Moens, M. F. (2018). Semantic role labeling of speech transcripts without sentence boundaries. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 379–387). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_41