Abstract
In this paper, we propose a speaker change detection system based on lexical information from transcribed speech. A recurrent neural network decides, at the end of each spoken word, whether an utterance ends there. Our motivation is to use the transcription of the conversation as an additional feature for a speaker diarization system, refining the segmentation step and thereby improving the accuracy of the whole diarization system. We compare the proposed transcription-based (text) speaker change detection system with our previous system based on spectrogram information (audio), and we combine the two modalities to improve the diarization results. The conversation is cut into segments at the detected changes, and each segment is represented by an i-vector. We conducted experiments on the English part of the CallHome corpus. The results indicate a relative improvement of 0.5% in speaker change detection and of 1% in speaker diarization when both modalities are used.
Zajíc, Z., Soutner, D., Hrúz, M., Müller, L., & Radová, V. (2018). Recurrent neural network based speaker change detection from text transcription applied in telephone speaker diarization system. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 342–350). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_37