Direct combination of spelling and pronunciation information for robust back-transliteration

Slaven Bilac; Hozumi Tanaka

Conference Proceedings

Lecture Notes in Computer Science (2005) 3406 413-424

DOI: 10.1007/978-3-540-30586-6_44

Abstract

Transliterating words and names from one language to another is a frequent and highly productive phenomenon. For example, the English word "cache" is transliterated in Japanese as "kyasshu". Transliteration is lossy, since important distinctions are not always preserved in the process. Hence, automatically converting transliterated words back into their original form is a real challenge. Nonetheless, due to its wide applicability in machine translation (MT) and cross-language information retrieval (CLIR), it is an interesting problem from a practical point of view. In this paper, we demonstrate that back-transliteration accuracy can be improved by directly combining grapheme-based (i.e. spelling) and phoneme-based (i.e. pronunciation) information. Rather than producing back-transliterations from the grapheme and phoneme models independently and then interpolating the results, we propose a method of first combining the sets of allowed rewrites (i.e. edits) and then calculating the back-transliterations using the combined set. Evaluation on both Japanese and Chinese transliterations shows that direct combination increases robustness and positively affects back-transliteration accuracy. © Springer-Verlag Berlin Heidelberg 2005.
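
The idea of merging the allowed rewrites before decoding, rather than interpolating two finished result lists, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' model: the rewrite tables, their probabilities, the function names `combine_rewrites` and `back_transliterate`, the 0.5 mixing weight, and the per-source normalization are all hypothetical. The real phoneme-based model works through pronunciations, which is flattened here into a simple string-to-string table.

```python
from collections import defaultdict

def combine_rewrites(grapheme, phoneme, weight=0.5):
    """Merge two rewrite tables into one table before any decoding happens.

    Each table maps a source segment to {target segment: probability}.
    The mixing weight and the renormalization are illustrative assumptions.
    """
    merged = defaultdict(dict)
    for table, w in ((grapheme, weight), (phoneme, 1.0 - weight)):
        for src, targets in table.items():
            for tgt, p in targets.items():
                merged[src][tgt] = merged[src].get(tgt, 0.0) + w * p
    # Renormalize so the targets of each source segment sum to 1.
    combined = {}
    for src, targets in merged.items():
        total = sum(targets.values())
        combined[src] = {tgt: p / total for tgt, p in targets.items()}
    return combined

def back_transliterate(word, rewrites):
    """Return {candidate: best score} over all segmentations of `word`."""
    # best[i] maps each partial output covering word[:i] to its best score.
    best = [dict() for _ in range(len(word) + 1)]
    best[0][""] = 1.0
    for i in range(len(word)):
        for prefix, score in best[i].items():
            for j in range(i + 1, len(word) + 1):
                for tgt, p in rewrites.get(word[i:j], {}).items():
                    candidate = prefix + tgt
                    new_score = score * p
                    if new_score > best[j].get(candidate, 0.0):
                        best[j][candidate] = new_score
    return best[len(word)]

# Hypothetical toy tables for the romanized input "kyasshu".
grapheme = {
    "ky": {"c": 0.6, "k": 0.4},
    "a": {"a": 1.0},
    "sshu": {"sh": 0.7, "ssh": 0.3},
}
phoneme = {
    "kya": {"ca": 0.8, "ka": 0.2},
    "sshu": {"che": 0.6, "sh": 0.4},
}

grapheme_only = back_transliterate("kyasshu", grapheme)
combined = back_transliterate("kyasshu", combine_rewrites(grapheme, phoneme))

for name, result in (("grapheme only", grapheme_only), ("combined", combined)):
    ranked = sorted(result.items(), key=lambda kv: kv[1], reverse=True)
    print(name, ranked)
```

In this toy setup the grapheme table alone cannot produce "cache" at all, while the combined table can, because a single decoding pass is free to mix rewrites that originate from either source. That is the kind of robustness gain the abstract attributes to direct combination; the numbers themselves carry no meaning.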

Cite

Bilac, S., & Tanaka, H. (2005). Direct combination of spelling and pronunciation information for robust back-transliteration. In Lecture Notes in Computer Science (Vol. 3406, pp. 413–424). Springer Verlag. https://doi.org/10.1007/978-3-540-30586-6_44
