Robust Dictionary Lookup in Multiple Noisy Orthographies

Abstract

We present the MultiScript Phonetic Search algorithm, which addresses the problem of language learners looking up unfamiliar words they have heard. We apply it to Arabic dictionary lookup with noisy queries written in both the Arabic and Roman scripts. The algorithm is based on a computational phonetic distance metric that can optionally be machine-learned. To benchmark its performance, we created the ArabScribe dataset, which contains 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate's "did you mean" feature as well as the Yamli smart Arabic keyboard.
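
The abstract does not spell out the distance metric, so the following is only a minimal sketch of the general idea of ranking dictionary entries by a phonetic distance. It is not the authors' MultiScript Phonetic Search implementation. It assumes a weighted edit distance in which substituting similar-sounding letters is cheap. The letter groups, the cost values, and the names SIMILAR_GROUPS, sub_cost, phonetic_distance, and lookup are all hypothetical, and the toy dictionary uses Roman-script transliterations only.

```python
# Illustrative sketch only: a weighted edit distance used to rank dictionary
# entries against a noisy query. Groups and costs are assumptions, not values
# taken from the paper.

# Hypothetical groups of Roman letters that a learner might confuse when
# transcribing a word by ear.
SIMILAR_GROUPS = [set("sz"), set("td"), set("kq"), set("hx"), set("aeiou")]


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: free if equal, cheap if in the same group."""
    if a == b:
        return 0.0
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 0.3  # assumed cost for similar-sounding letters
    return 1.0


def phonetic_distance(query: str, word: str, indel: float = 1.0) -> float:
    """Weighted Levenshtein distance computed row by row."""
    # prev[j] holds the distance between the query prefix processed so far
    # and the first j characters of word.
    prev = [j * indel for j in range(len(word) + 1)]
    for i, qc in enumerate(query, start=1):
        curr = [i * indel]
        for j, wc in enumerate(word, start=1):
            curr.append(
                min(
                    prev[j] + indel,               # delete qc from the query
                    curr[j - 1] + indel,           # insert wc into the query
                    prev[j - 1] + sub_cost(qc, wc) # substitute qc with wc
                )
            )
        prev = curr
    return prev[-1]


def lookup(query: str, dictionary: list[str], k: int = 3) -> list[str]:
    """Return the k dictionary entries closest to the query."""
    return sorted(dictionary, key=lambda w: phonetic_distance(query, w))[:k]


if __name__ == "__main__":
    toy_dictionary = ["kitab", "kalb", "bayt"]
    print(lookup("qalb", toy_dictionary, k=1))
```

In this sketch the costs are fixed by hand. The abstract says the metric can optionally be machine-learned, which would correspond here to estimating the substitution costs from transcription data instead of hard-coding them.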

Cite

Zhang, L., Habash, N., & Toussaint, G. (2017). Robust Dictionary Lookup in Multiple Noisy Orthographies. In WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop (pp. 119–129). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-1315
