Distributed learning of multilingual DNN feature extractors using GPUs

  • Yajie Miao
  • Hao Zhang
  • Florian Metze

Multilingual deep neural networks (DNNs) can act as deep feature extractors and have been applied successfully to cross-language acoustic modeling. Learning these feature extractors becomes expensive because of the enlarged multilingual training data and the sequential nature of stochastic gradient descent (SGD). This paper investigates strategies to accelerate the learning process over multiple GPU cards. We propose the DistModel and DistLang frameworks, which distribute feature extractor learning by models and by languages, respectively. The time-synchronous DistModel has the useful property of tolerating infrequent model averaging. With 3 GPUs, DistModel achieves a 2.6× speed-up with no loss in word error rate. DistLang gives better acceleration but worse recognition performance. Further evaluations are conducted to scale DistModel to more languages and GPU cards.
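The abstract does not give DistModel's update rule, so the following is only a minimal sketch of the general technique it names: synchronous data-parallel SGD with periodic model averaging. It assumes each worker (standing in for one GPU) trains a copy of the model on its own data shard and that all copies are averaged every `avg_interval` steps. A toy linear model replaces the DNN, the workers run sequentially rather than on GPUs, and all function names and parameters (`sgd_shard`, `average`, `train_with_model_averaging`, `avg_interval`) are hypothetical.

```python
import random


def sgd_shard(params, shard, lr, steps, rng):
    """Run `steps` plain SGD updates on one worker's data shard.

    Toy model (hypothetical stand-in for a DNN): y_hat = w * x + b,
    trained with squared-error loss 0.5 * (y_hat - y) ** 2.
    """
    w, b = params
    for _ in range(steps):
        x, y = rng.choice(shard)
        err = (w * x + b) - y
        # Gradient of 0.5 * err**2 with respect to w is err * x, w.r.t. b is err.
        w -= lr * err * x
        b -= lr * err
    return (w, b)


def average(models):
    """Element-wise mean of the workers' parameters (the model-averaging step)."""
    k = len(models)
    return tuple(sum(m[i] for m in models) / k for i in range(len(models[0])))


def train_with_model_averaging(data, num_workers=3, rounds=50,
                               avg_interval=20, lr=0.05, seed=0):
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # One disjoint data shard per worker (assumed: one worker per GPU).
    shards = [shuffled[i::num_workers] for i in range(num_workers)]
    global_params = (0.0, 0.0)
    for _ in range(rounds):
        # Time-synchronous round: every worker starts from the same averaged
        # model and runs avg_interval local SGD steps before synchronizing.
        # A larger avg_interval means less frequent (cheaper) communication.
        local = [sgd_shard(global_params, shard, lr, avg_interval, rng)
                 for shard in shards]
        global_params = average(local)
    return global_params


if __name__ == "__main__":
    # Noise-free data from y = 2x + 1 on x in [-1, 1].
    data = [(x, 2.0 * x + 1.0) for x in (i / 50.0 - 1.0 for i in range(101))]
    print(train_with_model_averaging(data))                            # averages every 20 steps
    print(train_with_model_averaging(data, rounds=10, avg_interval=100))  # averages every 100 steps
```

Both calls give the same number of local steps per worker, so comparing them shows, on this toy problem, that averaging far less often can still reach the same solution, in the spirit of the "tolerating infrequent model averaging" claim. For reference, the reported 2.6× speed-up on 3 GPUs corresponds to a parallel efficiency of 2.6 / 3 ≈ 0.87.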

Author-supplied keywords

  • Automatic speech recognition
  • Deep neural networks
  • Distributed learning

Identifiers

  • SCOPUS: 2-s2.0-84910068044
  • ISSN: 1990-9772
  • SGR: 84910068044
  • PUI: 600412698


  • Yajie Miao

  • Hao Zhang

  • Florian Metze
