A Dirichlet-smoothed bigram model for retrieving spontaneous speech

Matthew Lease; Eugene Charniak

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 5152 LNCS 687-694

DOI: 10.1007/978-3-540-85760-0_87

Abstract

We present two simple but effective smoothing techniques for the standard language model (LM) approach to information retrieval [12]. First, we extend the unigram Dirichlet smoothing technique popular in IR [17] to bigram modeling [16]. Second, we propose a method of collection expansion for more robust estimation of the LM prior, particularly intended for sparse collections. Retrieval experiments on the MALACH archive [9] of automatically transcribed and manually summarized spontaneous speech interviews demonstrate strong overall system performance and the relative contribution of our extensions. © 2008 Springer-Verlag Berlin Heidelberg.
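As a rough illustration of the first technique, the sketch below shows the widely used unigram Dirichlet estimate, P(w|D) = (c(w;D) + mu * P(w|C)) / (|D| + mu), and one plausible way to carry the same idea over to bigrams. The bigram form here, which uses the smoothed unigram estimate as its prior, is an assumption made for illustration and may differ from the formulation in the paper. The function names, the toy data, and the mu values are likewise hypothetical.

```python
from collections import Counter


def unigram_prob(word, doc_tokens, coll_tokens, mu=1000.0):
    """Dirichlet-smoothed unigram estimate P(word | document)."""
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(coll_tokens)
    # Collection (prior) probability of the word, by maximum likelihood.
    p_coll = coll_counts[word] / len(coll_tokens)
    return (doc_counts[word] + mu * p_coll) / (len(doc_tokens) + mu)


def bigram_prob(prev, word, doc_tokens, coll_tokens, mu1=1000.0, mu2=1000.0):
    """Illustrative Dirichlet-style bigram estimate P(word | prev, document).

    Assumption: the prior is the smoothed unigram estimate above. This is
    one possible bigram analogue, not necessarily the paper's model.
    """
    bigram_counts = Counter(zip(doc_tokens, doc_tokens[1:]))
    # Count prev only where it is followed by another token, so that the
    # bigram counts for a given history sum to this value.
    history_count = Counter(doc_tokens[:-1])[prev]
    prior = unigram_prob(word, doc_tokens, coll_tokens, mu1)
    return (bigram_counts[(prev, word)] + mu2 * prior) / (history_count + mu2)


# Toy data: one document and a two-document collection that contains it.
doc = "the cat sat on the mat".split()
coll = doc + "the dog sat on the log".split()

for candidate in ("cat", "dog"):
    print(candidate, bigram_prob("the", candidate, doc, coll, mu2=1.0))
```

With a small mu2 the document's own bigram counts dominate, so a word seen after "the" in the document scores higher than an unseen one. As mu2 grows, the estimate moves toward the unigram prior.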

Cite

Lease, M., & Charniak, E. (2008). A Dirichlet-smoothed bigram model for retrieving spontaneous speech. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5152 LNCS, pp. 687–694). Springer Verlag. https://doi.org/10.1007/978-3-540-85760-0_87
