Everyday the newswire introduce events from all over the world, highlighting new names of persons, locations and organizations with different origins. These names appear as Out of Vocabulary (OOV) words for Machine translation, cross lingual information retrieval, and many other NLP applications. One way to deal with OOV words is to transliterate the unknown words, that is, to render them in the orthography of the second language. We introduce a statistical approach for transliteration only using the bilingual resources released in the shared task and without any previous knowledge of the target languages. Mapping the Transliteration problem to the Machine Translation problem, we make use of the phrase based SMT approach and apply it on substrings of names. In the English to Russian task, we report ACC (Accuracy in top-1) of 0.545, Mean F-score of 0.917, and MRR (Mean Reciprocal Rank) of 0.596. Due to time constraints, we made a single experiment in the English to Chinese task, reporting ACC, Mean F-score, and MRR of 0.411, 0.737, and 0.464 respectively. Finally, it is worth mentioning that the system is language independent since the author is not aware of either languages used in the experiments.
CITATION STYLE
Noeman, S. (2009). Language independent transliteration system using phrase based SMT approach on substrings. In NEWS 2009 - 2009 Named Entities Workshop: Shared Task on Transliteration at the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 (pp. 112–115). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699705.1699734
Mendeley helps you to discover research relevant for your work.