In this paper, we propose a lexicon optimization method based on a confusability measure (CM) for building a large vocabulary continuous speech recognition (LVCSR) system that must handle unseen words. When a lexicon is built or expanded for unseen words using grapheme-to-phoneme (G2P) conversion, its size grows because G2P is generally realized as a 1-to-N-best mapping, in which each word receives up to N candidate pronunciations. The proposed method therefore prunes confusable words from the lexicon using a CM defined as the acoustic model distance between two phonemic sequences. LVCSR experiments on a Wall Street Journal task show that the proposed method achieves a relative word error rate (WER) reduction of 14.72% compared with a lexicon built by 1-to-4-best G2P conversion.
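The pruning idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract defines the CM as an acoustic model distance, whereas this sketch substitutes a plain phoneme-level edit distance as a stand-in. The pruning rule (always keep each word's top-ranked pronunciation, and drop a lower-ranked variant when it lies too close to another word's top-ranked pronunciation), the names `phoneme_edit_distance` and `prune_lexicon`, the `min_distance` parameter, and the toy lexicon are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Pron = Tuple[str, ...]  # one pronunciation as a sequence of phoneme symbols


def phoneme_edit_distance(a: Pron, b: Pron) -> int:
    """Levenshtein distance over phoneme symbols.

    Stand-in for the acoustic model distance named in the abstract.
    """
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def prune_lexicon(lexicon: Dict[str, List[Pron]],
                  min_distance: int) -> Dict[str, List[Pron]]:
    """Drop lower-ranked G2P variants that are confusable with other words.

    Assumed rule: a variant is confusable when its distance to the
    top-ranked pronunciation of any *other* word is below min_distance.
    """
    pruned: Dict[str, List[Pron]] = {}
    for word, prons in lexicon.items():
        if not prons:
            pruned[word] = []
            continue
        kept = [prons[0]]  # the 1-best pronunciation is always kept
        for cand in prons[1:]:
            confusable = any(
                phoneme_edit_distance(cand, other_prons[0]) < min_distance
                for other_word, other_prons in lexicon.items()
                if other_word != word and other_prons
            )
            if not confusable:
                kept.append(cand)
        pruned[word] = kept
    return pruned


# Toy 1-to-2-best lexicon (hypothetical entries, ranked best first).
example_lexicon = {
    "read": [("r", "iy", "d"), ("r", "eh", "d")],
    "red": [("r", "eh", "d"), ("r", "iy", "d")],
    "lead": [("l", "iy", "d"), ("l", "eh", "d")],
}
pruned_lexicon = prune_lexicon(example_lexicon, min_distance=1)
for word, prons in pruned_lexicon.items():
    print(word, [" ".join(p) for p in prons])
```

With `min_distance=1`, only variants identical to another word's top-ranked pronunciation are removed, so "read" and "red" each lose their second variant while "lead" keeps both. A real system would replace the edit distance with a distance computed from the acoustic models and tune the threshold on held-out data.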
Kim, N. K., Seong, W. K., & Kim, H. K. (2015). Lexicon optimization for WFST-based speech recognition using acoustic distance based confusability measure and G2P conversion. In Natural Language Dialog Systems and Intelligent Assistants (pp. 119–217). Springer International Publishing. https://doi.org/10.1007/978-3-319-19291-8_12