We report on a first attempt at cross-language spoken document retrieval. Lacking prior experience with monolingual speech retrieval, we applied the same general approach we use for bilingual retrieval, which is characterized by overlapping character n-grams for tokenization and a statistical language model for retrieval. To cope with out-of-vocabulary words and with misspelled or mistranscribed words, we adopted an innovative approach: direct translation of individual n-grams was the sole mechanism for translating source-language queries into target-language terms. Though this approach shows promise, especially for non-speech retrieval, our performance appears to lag behind that of other teams participating in this novel evaluation. © Springer-Verlag 2004.
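The two techniques named in the abstract, overlapping character n-gram tokenization and direct n-gram translation, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration rather than a detail from the paper: the n-gram length of 4, the lowercasing and whitespace normalization, the choice to let n-grams span word boundaries, the function names `char_ngrams` and `translate_ngrams`, and the toy translation table.

```python
def char_ngrams(text, n=4):
    """Return the overlapping character n-grams of a string.

    The text is lowercased and runs of whitespace are collapsed to one
    space (an assumed normalization). Adjacent n-grams share n-1 characters.
    """
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Strings shorter than n are kept whole rather than dropped.
        return [normalized] if normalized else []
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def translate_ngrams(ngrams, table):
    """Map each source n-gram to a target n-gram through a lookup table.

    N-grams without a table entry are skipped.
    """
    return [table[g] for g in ngrams if g in table]


# Hypothetical toy table for illustration only; not data from the paper.
toy_table = {"retr": "rech", "etri": "eche"}

query_grams = char_ngrams("Retrieval", n=4)
target_grams = translate_ngrams(query_grams, toy_table)
```

Because each n-gram is looked up independently, a word that is missing from a dictionary or slightly mistranscribed can still contribute the n-grams it shares with known words, which is the motivation the abstract gives for translating at the n-gram level.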
CITATION
McNamee, P., & Mayfield, J. (2004). N-grams for translation and retrieval in CL-SDR. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3237, 658–663. https://doi.org/10.1007/978-3-540-30222-3_63