N-grams for translation and retrieval in CL-SDR

Abstract

We report on a first attempt to perform cross-language spoken document retrieval (CL-SDR). Having no prior experience with monolingual speech retrieval, we applied the same general approach we use for bilingual retrieval, which is typified by overlapping character n-grams for tokenization and a statistical language model for retrieval. To cope with out-of-vocabulary, misspelled, or mistranscribed words, we adopted an innovative approach: direct translation of individual n-grams was the sole mechanism used to translate source-language queries into target-language terms. Though this approach shows promise, especially for non-speech retrieval, our performance appears to lag behind that of other teams participating in this novel evaluation. © Springer-Verlag 2004.
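
As a rough illustration of the tokenization idea mentioned in the abstract, the sketch below splits text into overlapping character n-grams and shows how a misspelled word still shares some n-grams with the correct form. It is not the authors' implementation: the function names `char_ngrams` and `shared_ngrams`, the default n-gram length, and the lowercasing and whitespace normalization are all assumptions made for this example.

```python
def char_ngrams(text, n=4):
    """Return the overlapping character n-grams of `text`.

    Assumptions for this sketch: text is lowercased, runs of whitespace
    collapse to one space, and n-grams may span word boundaries.
    """
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Too short to form a full n-gram: keep the text as a single token.
        return [normalized] if normalized else []
    # Slide a window of width n one character at a time.
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def shared_ngrams(a, b, n=4):
    """Return the set of n-grams that two strings have in common."""
    return set(char_ngrams(a, n)) & set(char_ngrams(b, n))


if __name__ == "__main__":
    print(char_ngrams("retrieval", 4))
    # A misspelling still overlaps with the correct spelling on some n-grams.
    print(sorted(shared_ngrams("retrieval", "retreival", 3)))
```

Because matching happens on fragments rather than whole words, a misspelled or mistranscribed word can still contribute partial evidence, which is the property the abstract relies on for out-of-vocabulary terms.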

Cite

McNamee, P., & Mayfield, J. (2004). N-grams for translation and retrieval in CL-SDR. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3237, 658–663. https://doi.org/10.1007/978-3-540-30222-3_63