We present two simple but effective smoothing techniques for the standard language model (LM) approach to information retrieval [12]. First, we extend the unigram Dirichlet smoothing technique popular in IR [17] to bigram modeling [16]. Second, we propose a method of collection expansion for more robust estimation of the LM prior, particularly intended for sparse collections. Retrieval experiments on the MALACH archive [9] of automatically transcribed and manually summarized spontaneous speech interviews demonstrate strong overall system performance and the relative contribution of our extensions. © 2008 Springer-Verlag Berlin Heidelberg.
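As a rough illustration of the first technique, the sketch below implements the standard Dirichlet-smoothed unigram estimate and one plausible way of carrying it over to bigrams, in which the smoothed unigram estimate serves as the prior for the bigram estimate. The function names, the hyperparameters `mu1` and `mu2`, the toy data, and the specific bigram form are assumptions made for illustration; the abstract does not give the paper's exact formulation.

```python
from collections import Counter


def unigram_dirichlet(word, doc_unigrams, doc_len, coll_prob, mu1):
    """Standard Dirichlet-smoothed unigram estimate P(word | document)."""
    return (doc_unigrams[word] + mu1 * coll_prob(word)) / (doc_len + mu1)


def bigram_dirichlet(prev, word, doc_bigrams, doc_contexts, doc_unigrams,
                     doc_len, coll_prob, mu1, mu2):
    """Assumed bigram extension: the smoothed unigram estimate acts as
    the Dirichlet prior for P(word | prev, document)."""
    prior = unigram_dirichlet(word, doc_unigrams, doc_len, coll_prob, mu1)
    return (doc_bigrams[(prev, word)] + mu2 * prior) / (doc_contexts[prev] + mu2)


# Toy data (hypothetical, not from the MALACH archive).
collection = "the cat sat on the mat while the dog sat on the rug".split()
document = "the dog sat on the mat".split()

coll_counts = Counter(collection)


def coll_prob(word):
    """Maximum-likelihood collection model P(word | collection)."""
    return coll_counts[word] / len(collection)


doc_unigrams = Counter(document)
doc_bigrams = Counter(zip(document, document[1:]))
# Number of times each word occurs as the first element of a bigram.
doc_contexts = Counter(document[:-1])

MU1, MU2 = 10.0, 5.0  # arbitrary values chosen for the example

p_seen = bigram_dirichlet("the", "dog", doc_bigrams, doc_contexts,
                          doc_unigrams, len(document), coll_prob, MU1, MU2)
p_unseen = bigram_dirichlet("the", "cat", doc_bigrams, doc_contexts,
                            doc_unigrams, len(document), coll_prob, MU1, MU2)
print(p_seen, p_unseen)
```

In this form, a bigram that never occurs in the document still receives nonzero probability through the unigram prior, and the estimates for a fixed context sum to one over the collection vocabulary.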
Lease, M., & Charniak, E. (2008). A Dirichlet-smoothed bigram model for retrieving spontaneous speech. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5152 LNCS, pp. 687–694). Springer Verlag. https://doi.org/10.1007/978-3-540-85760-0_87