Deep Scattering Spectra with Deep Neural Networks for LVCSR Tasks


Abstract

Log-mel filterbank features, which are commonly used as input features for CNNs, can remove higher-resolution information from the speech signal. A novel technique, known as Deep Scattering Spectrum (DSS), addresses this issue by preserving this information. DSS features have shown promise on TIMIT, both for classification and recognition. In this paper, we extend the use of DSS features to LVCSR tasks. First, we explore the optimal multi-resolution time and frequency scattering operations for LVCSR tasks. Next, we explore techniques to reduce the dimension of the DSS features. We also incorporate speaker adaptation techniques into the DSS features. Results on 50- and 430-hour English Broadcast News tasks show that the DSS features provide a 4-7% relative improvement in WER over log-mel features, within a state-of-the-art CNN framework that incorporates speaker adaptation and sequence training. Finally, we show that DSS features are similar to multi-resolution log-mel + MFCC features, and that similar improvements can be obtained with this representation.
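The scattering features described in the abstract are cascades of wavelet-modulus operations followed by time averaging. The sketch below (not the authors' implementation) illustrates first- and second-order time-scattering coefficients in Python/NumPy; the Gaussian band-pass filters, Q factor, filter counts, and frame sizes are illustrative assumptions, not the settings used in the paper.

    # Minimal sketch of first- and second-order time scattering.
    # Assumptions (not from the paper): Gaussian band-pass filters standing in
    # for Morlet wavelets, Q ~ 4, 32 first-order and 8 second-order filters,
    # 25 ms frames with a 10 ms hop, log compression of the coefficients.
    import numpy as np

    def gabor_filterbank(n_fft, sr, n_filters=32, f_min=50.0, f_max=None):
        """Band-pass filters with geometrically spaced centre frequencies,
        defined directly in the frequency domain."""
        f_max = f_max or sr / 2
        centres = np.geomspace(f_min, f_max, n_filters)
        freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
        filters = []
        for fc in centres:
            bw = fc / 4.0  # constant-Q style bandwidth (assumed Q ~ 4)
            filters.append(np.exp(-0.5 * ((freqs - fc) / bw) ** 2))
        return np.array(filters)  # shape: (n_filters, n_fft // 2 + 1)

    def scattering(x, sr, frame=25e-3, hop=10e-3):
        """First order: |x * psi1| averaged over the frame.
        Second order: ||x * psi1| * psi2| averaged over the frame.
        A full implementation would restrict second-order filters to
        modulation frequencies below the first-order bandwidth and prune
        uninformative paths; this sketch keeps every path."""
        n_fft = int(frame * sr)
        hop_n = int(hop * sr)
        psi1 = gabor_filterbank(n_fft, sr)
        psi2 = gabor_filterbank(n_fft, sr, n_filters=8)  # coarser second-order bank
        S1, S2 = [], []
        for start in range(0, len(x) - n_fft + 1, hop_n):
            seg = x[start:start + n_fft] * np.hanning(n_fft)
            X = np.fft.rfft(seg)
            # first order: modulus of each band-pass output, time-averaged
            u1 = np.abs(np.fft.irfft(psi1 * X, n=n_fft))          # (32, n_fft)
            S1.append(u1.mean(axis=1))
            # second order: scatter each first-order envelope again
            U1 = np.fft.rfft(u1, axis=1)
            u2 = np.abs(np.fft.irfft(U1[:, None, :] * psi2[None, :, :],
                                     n=n_fft, axis=2))            # (32, 8, n_fft)
            S2.append(u2.mean(axis=2).ravel())
        return np.log(np.array(S1) + 1e-8), np.log(np.array(S2) + 1e-8)

    if __name__ == "__main__":
        sr = 16000
        t = np.arange(sr) / sr
        # amplitude-modulated tone: second-order coefficients capture the modulation
        x = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
        s1, s2 = scattering(x, sr)
        print(s1.shape, s2.shape)  # (frames, 32) and (frames, 32 * 8)

Per frame, the first-order coefficients behave like a constant-Q filterbank energy (comparable to log-mel), while the second-order coefficients retain the amplitude-modulation detail that mel averaging discards, which is the higher-resolution information the abstract refers to.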

Find this document

  • PUI: 600413172
  • ISSN: 1990-9772
  • SCOPUS: 2-s2.0-84910098075
  • SGR: 84910098075

Authors

  • Tara N. Sainath

  • Vijayaditya Peddinti

  • Brian Kingsbury

  • Petr Fousek

  • Bhuvana Ramabhadran

  • David Nahamoo
