Named entities in free text pose a challenge for text analysis in machine translation and cross-language information retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system, and they are often not listed in bilingual dictionaries. Although named entities can be identified and translated on the fly without a list of proper names and transliterations, an extensive list of existing transliterations will ensure a high precision rate. We use a seed list of proper names and their transliterations to train a machine transliteration model. With the model, proper names and their transliterations can be extracted from monolingual or parallel corpora with high precision and recall. © Springer-Verlag 2004.
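As an illustration of the general idea of training a transliteration model from a seed list and then using it to pick out a transliteration, here is a minimal sketch. It is not the authors' model: it swaps in a simple character-level IBM Model 1 style EM estimate, and the seed pairs, the function names `train_char_model` and `score`, and the `floor` smoothing value are all assumptions invented for this example.

```python
import math
from collections import defaultdict

def train_char_model(seed_pairs, iterations=10):
    """Estimate t(target_char | source_char) with IBM Model 1 style EM.

    This is a stand-in technique for illustration, not the paper's model.
    """
    src_chars = {c for src, _ in seed_pairs for c in src}
    tgt_chars = {c for _, tgt in seed_pairs for c in tgt}
    # Start from a uniform distribution over target characters.
    t = {s: {g: 1.0 / len(tgt_chars) for g in tgt_chars} for s in src_chars}
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: distribute each target character over the source characters.
        for src, tgt in seed_pairs:
            for g in tgt:
                norm = sum(t[s].get(g, 0.0) for s in src)
                for s in src:
                    frac = t[s].get(g, 0.0) / norm
                    count[s][g] += frac
                    total[s] += frac
        # M-step: renormalise the expected counts into probabilities.
        t = {s: {g: c / total[s] for g, c in count[s].items()} for s in count}
    return t

def score(model, src, tgt, floor=1e-6):
    """Length-normalised log-likelihood of tgt given src (floor is an assumed smoothing value)."""
    logp = 0.0
    for g in tgt:
        p = sum(model.get(s, {}).get(g, 0.0) for s in src) / len(src)
        logp += math.log(max(p, floor))
    return logp / len(tgt)

# Toy, invented seed list of (name, romanised transliteration) pairs.
seed_pairs = [
    ("tom", "tomu"),
    ("tim", "timu"),
    ("sam", "samu"),
    ("mia", "mia"),
    ("amos", "amosu"),
]
model = train_char_model(seed_pairs)

# Pick the candidate string that the model considers the most likely transliteration.
candidates = ["tomu", "kira", "lena"]
best = max(candidates, key=lambda c: score(model, "tom", c))
print(best)
```

In a corpus setting, the candidates would be word sequences drawn from the same sentence (monolingual case) or from the aligned sentence (parallel case); how the original system generates and filters those candidates is not stated in the abstract.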
CITATION STYLE
Lin, T., Wu, J. C., & Chang, J. S. (2004). Extraction of name and transliteration in monolingual and parallel corpora. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3265, 177–186. https://doi.org/10.1007/978-3-540-30194-3_20