Latent Semantic Information in Maximum Entropy Language Models for Conversational Speech Recognition

Citations: 8 · Mendeley readers: 71

Abstract

Latent semantic analysis (LSA), first exploited in indexing documents for information retrieval, has since been used by several researchers to demonstrate impressive reductions in the perplexity of statistical language models on text corpora such as the Wall Street Journal. In this paper we present an investigation into the use of LSA in language modeling for conversational speech recognition. We find that previously proposed methods of combining an LSA-based unigram model with an N-gram model yield much smaller reductions in perplexity on speech transcriptions than have been reported on written text. We next present a family of exponential models in which LSA similarity is a feature of a word-history pair. The maximum entropy model in this family yields a greater reduction in perplexity, and statistically significant improvements in recognition accuracy over a trigram model, on the Switchboard corpus. We conclude with a comparison of this LSA-featured model with a previously proposed topic-dependent maximum entropy model.
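
To make the approach concrete, the following is a minimal sketch, not the authors' implementation, of how an LSA similarity score between a word and its history can be computed via truncated SVD and then used as a feature that rescales a trigram probability in an exponential model. The random count matrix, the latent dimensionality k, the cosine-of-centroid history representation, and the feature weight lam are all illustrative assumptions; in the paper the weight would be estimated by maximum entropy training.

```python
import numpy as np

# Hypothetical term-document count matrix C (V words x D documents);
# in the paper's setting it would be built from training transcriptions.
rng = np.random.default_rng(0)
C = rng.poisson(0.5, size=(50, 20)).astype(float)

# LSA: truncated SVD of the count matrix (tf-idf weighting omitted here).
U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 10                    # latent dimensionality (assumed)
W = U[:, :k] * s[:k]      # word vectors in the k-dimensional latent space

def lsa_similarity(word_id, history_ids):
    """Cosine similarity between a word's latent vector and the centroid
    of the history's word vectors -- one common way to score a
    word-history pair with LSA; the paper's exact feature may differ."""
    h = W[history_ids].mean(axis=0)
    w = W[word_id]
    denom = np.linalg.norm(w) * np.linalg.norm(h)
    return float(w @ h / denom) if denom > 0 else 0.0

def rescored_score(word_id, history_ids, p_trigram, lam=1.0):
    """Exponential model with LSA similarity as a (word, history) feature:
        p(w | h)  proportional to  p_trigram(w | h) * exp(lam * sim(w, h)).
    A full model would renormalize over the vocabulary, and lam would be
    fit by maximum entropy (e.g. GIS/IIS) training, not set by hand."""
    return p_trigram * np.exp(lam * lsa_similarity(word_id, history_ids))

# Example: rescore word 3 given a short history (IDs are arbitrary here).
print(rescored_score(word_id=3, history_ids=[7, 12, 4], p_trigram=0.01))
```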

References

SWITCHBOARD: Telephone speech corpus for research and development
A maximum entropy approach to adaptive statistical language modelling
Exploiting latent semantic information in statistical language modeling

Cited by

Multilingual Speech Processing
Language models based on semantic composition
Large-scale latent semantic analysis

Citation (APA)

Deng, Y., & Khudanpur, S. (2003). Latent Semantic Information in Maximum Entropy Language Models for Conversational Speech Recognition. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003. Association for Computational Linguistics (ACL). https://doi.org/10.3115/1073445.1073453

Readers' Seniority

PhD / Post grad / Masters / Doc: 19 (58%)
Researcher: 8 (24%)
Professor / Associate Prof.: 4 (12%)
Lecturer / Post doc: 2 (6%)

Readers' Discipline

Computer Science: 29 (83%)
Linguistics: 4 (11%)
Neuroscience: 1 (3%)
Social Sciences: 1 (3%)
