Named Entity Recognition (NER) is an important part of many natural language processing tasks. Most current approaches employ machine learning techniques and require supervised data. However, many languages lack such resources. This paper presents an algorithm to automatically discover Named Entities (NEs) in a resource-free language, given a bilingual corpus in which it is weakly temporally aligned with a resource-rich language. We observe that NEs have similar time distributions across such corpora and that they are often transliterated, and we develop an algorithm that exploits both observations iteratively. The algorithm makes use of a new, frequency-based metric for time distributions and a resource-free discriminative approach to transliteration. We evaluate the algorithm on an English-Russian corpus and show a high level of NE discovery in Russian. © 2006 Association for Computational Linguistics.
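The abstract does not spell out its time-distribution metric, so the following is only an illustrative sketch of the temporal half of the idea. It assumes that a term's time distribution is its relative frequency in each time bin (e.g., each day of the corpus) and that two distributions are compared with cosine similarity. Cosine similarity is a stand-in chosen for simplicity and is not necessarily the paper's frequency-based metric. The function names and toy data are hypothetical, and the transliteration model that the full algorithm combines with this score is not shown.

```python
from collections import Counter
from math import sqrt


def time_distribution(occurrence_days, num_days):
    """Hypothetical helper: relative frequency of a term in each day bin."""
    counts = Counter(occurrence_days)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_days
    return [counts.get(day, 0) / total for day in range(num_days)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_days, candidates, num_days):
    """Rank target-language candidates by how closely their time distribution
    matches that of the source-language NE (assumed stand-in metric)."""
    source = time_distribution(source_days, num_days)
    scored = [
        (word, cosine(source, time_distribution(days, num_days)))
        for word, days in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: days on which each word occurs in a 10-day comparable corpus.
    english_ne_days = [0, 0, 1, 5, 5, 6]  # e.g. "Putin"
    russian_candidates = {
        "Путин": [0, 1, 5, 5, 6],
        "погода": [2, 3, 3, 4, 7, 8, 9],
    }
    for word, score in rank_candidates(english_ne_days, russian_candidates, 10):
        print(word, round(score, 3))
    # "Путин" ranks first; "погода" scores 0.0 because it shares no days with the NE.
```

Normalizing counts into relative frequencies makes the comparison insensitive to how often each word occurs overall, so a rare transliteration can still match a frequent English NE if it spikes on the same days.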
Klementiev, A., & Roth, D. (2006). Named Entity transliteration and discovery from multilingual comparable corpora. In HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings of the Main Conference (pp. 82–88). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220835.1220846