For speech recognition research, it is often necessary to start with a competent baseline acoustic model. However, training and tuning such a model using research recognizers such as Cambridge’s HTK and CMU’s Sphinx can be time-consuming. To reduce wasted effort, I have created recipes for HTK and Sphinx that use the standard Wall Street Journal (WSJ) training corpus. This paper describes these recipes. The word error rate (WER) and real-time performance of the resulting models are evaluated for differing HMM topologies, numbers of tied states, numbers of Gaussians, and test sets. My goal is to provide practical advice and results to researchers who are considering HTK or Sphinx for real-time recognition on dictation-like tasks.
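The results above are reported as word error rate (WER). As a reminder of how that metric is conventionally defined, this sketch computes WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. It is a minimal illustration, not the scoring tool used in the paper. It assumes plain whitespace tokenization with no case or punctuation normalization, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration. Assumption: whitespace tokenization,
    no case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis contains many more words than the reference.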
Vertanen, K. (2006). Baseline WSJ acoustic models for HTK and Sphinx: Training recipes and recognition experiments. Cavendish Laboratory, University of Cambridge. Retrieved from http://medcontent.metapress.com/index/A65RM03P4874243N.pdf