Detecting overlapping speech with long short-term memory recurrent neural networks

  • Geiger J
  • Eyben F
  • Schuller B
 et al. 

Detecting segments of overlapping speech (when two or more speakers are active at the same time) is a challenging problem. Previous overlap detection systems have mostly been HMM-based and have employed a variety of audio features. In this work, we propose a novel overlap detection system using Long Short-Term Memory (LSTM) recurrent neural networks. The LSTMs generate framewise overlap predictions, which are then used for overlap detection. Furthermore, a tandem HMM-LSTM system is obtained by adding the LSTM predictions to the HMM feature set. Experiments with the AMI corpus show that the overlap detection performance of LSTMs is comparable to that of HMMs. Combining HMMs and LSTMs improves overlap detection by achieving higher recall. Copyright © 2013 ISCA.
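
The two ideas named in the abstract, framewise overlap predictions and a tandem feature set, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 threshold, and the plain thresholding step are assumptions made for illustration, and the scores are taken as already produced by some trained network.

```python
from typing import List, Tuple


def tandem_features(
    hmm_features: List[List[float]], lstm_scores: List[float]
) -> List[List[float]]:
    """Append each frame's LSTM overlap score to that frame's feature vector.

    Hypothetical helper: it only illustrates "adding LSTM predictions to the
    HMM feature set" as a per-frame concatenation.
    """
    if len(hmm_features) != len(lstm_scores):
        raise ValueError("expected exactly one LSTM score per frame")
    return [feats + [score] for feats, score in zip(hmm_features, lstm_scores)]


def overlap_segments(
    scores: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Group consecutive frames scoring at or above the threshold.

    Returns (start_frame, end_frame) pairs with an exclusive end index.
    The threshold value is an assumption, not a figure from the paper.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new overlap segment opens at this frame
        elif score < threshold and start is not None:
            segments.append((start, i))  # the segment closes before frame i
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # segment runs to the last frame
    return segments


if __name__ == "__main__":
    scores = [0.1, 0.8, 0.9, 0.2, 0.7]
    print(overlap_segments(scores))
    print(tandem_features([[1.0, 2.0], [3.0, 4.0]], [0.2, 0.9]))
```

Multiplying the frame indices by the frame shift of the feature extraction would convert the segments to seconds.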

Author-supplied keywords

  • Long short-term memory
  • Neural networks
  • Speaker diarization
  • Speech overlap detection

Find this document

  • SCOPUS: 2-s2.0-84906242216
  • SGR: 84906242216
  • PUI: 373776593
  • ISSN: 19909772


Authors

  • Jürgen T. Geiger
  • Florian Eyben
  • Björn Schuller
  • Gerhard Rigoll
