Extraction of name and transliteration in monolingual and parallel corpora

Tracy Lin; Jian Cheng Wu; Jason S. Chang

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3265 177-186

DOI: 10.1007/978-3-540-30194-3_20

Abstract

Named entities in free text pose a challenge to text analysis in Machine Translation and Cross-Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system, and named entities found in free text are often not listed in bilingual dictionaries. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list of existing transliterations will certainly ensure a high precision rate. We use a seed list of proper names and transliterations to train a Machine Transliteration Model. With the model, it is possible to extract proper names and their transliterations from monolingual or parallel corpora with high precision and recall rates. © Springer-Verlag 2004.
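
The abstract only outlines the approach: train a transliteration model on seed pairs of names and transliterations, then use the model to pick out names and their transliterations in text. As a minimal sketch of that idea, and not the authors' model, the code below swaps in a character-level IBM Model 1 trained with EM on a seed list and uses it to score candidate target strings for a given name. The function names `train_model1`, `score` and `best_transliteration`, the `min_score` threshold of -5.0, the `FLOOR` probability and the toy Pinyin data are all hypothetical; generating candidate spans from a real monolingual or parallel corpus is assumed to happen elsewhere.

```python
import math
from collections import defaultdict

NULL = "<null>"  # empty source unit: lets a target letter align to nothing
FLOOR = 1e-6     # hypothetical probability for letter pairs unseen in the seeds


def train_model1(seed_pairs, iterations=10):
    """Estimate t[(target_letter, source_letter)] with IBM Model 1 EM."""
    tgt_vocab = {f for _, tgt in seed_pairs for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Start uniform over every (target, source) letter pair that co-occurs.
    t = {(f, e): uniform
         for src, tgt in seed_pairs for e in [NULL, *src] for f in tgt}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in seed_pairs:
            units = [NULL, *src]
            for f in tgt:
                # E-step: split this target letter among all source units
                # in proportion to the current translation probabilities.
                z = sum(t[(f, e)] for e in units)
                for e in units:
                    share = t[(f, e)] / z
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize so each source unit's distribution sums to 1.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def score(t, src, tgt):
    """Average per-letter log P(tgt | src) under Model 1; higher is better."""
    if not tgt:
        return float("-inf")
    units = [NULL, *src]
    logp = sum(
        math.log(sum(t.get((f, e), FLOOR) for e in units) / len(units))
        for f in tgt
    )
    return logp / len(tgt)


def best_transliteration(t, name, candidates, min_score=-5.0):
    """Return the best-scoring candidate, or None if none clears min_score."""
    best = max(candidates, key=lambda c: score(t, name, c), default=None)
    if best is None or score(t, name, best) < min_score:
        return None
    return best


if __name__ == "__main__":
    # Toy seed list: English names with toneless Pinyin of their Chinese
    # transliterations (illustrative data only; the abstract does not
    # name a language pair).
    seed = [("clinton", "kelindun"), ("lincoln", "linken"),
            ("kennedy", "kennidi"), ("nixon", "nikesong")]
    model = train_model1(seed)
    print(best_transliteration(model, "clinton", ["nikesong", "kelindun"]))
```

Raising `min_score` trades recall for precision in this sketch: weak candidate sets are rejected outright instead of yielding their least-bad member.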

Citation (APA)

Lin, T., Wu, J. C., & Chang, J. S. (2004). Extraction of name and transliteration in monolingual and parallel corpora. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3265, 177–186. https://doi.org/10.1007/978-3-540-30194-3_20
