Named Entity transliteration and discovery from multilingual comparable corpora

Alexandre Klementiev; Dan Roth

Conference Proceedings (Open Access)

HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference (2006) 82-88

DOI: 10.3115/1220835.1220846

39 citations · 14 readers

Abstract

Named Entity recognition (NER) is an important part of many natural language processing tasks. Most current approaches employ machine learning techniques and require supervised data. However, many languages lack such resources. This paper presents an algorithm to automatically discover Named Entities (NEs) in a resource-free language, given a bilingual corpus in which it is weakly temporally aligned with a resource-rich language. We observe that NEs have similar time distributions across such corpora and that they are often transliterated, and we develop an algorithm that exploits both observations iteratively. The algorithm makes use of a new, frequency-based metric for time distributions and a resource-free discriminative approach to transliteration. We evaluate the algorithm on an English-Russian corpus and show a high level of NE discovery in Russian. © 2006 Association for Computational Linguistics.
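The abstract's central observation, that an NE and its transliteration tend to have similar time distributions in weakly temporally aligned corpora, can be illustrated with a minimal Python sketch. It assumes each word's occurrences are available as day indices and compares normalized per-day frequency vectors with cosine similarity. Cosine similarity is a stand-in chosen for illustration, not the frequency-based metric the paper introduces, and every function name below is hypothetical. The transliteration component is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def time_distribution(days: Sequence[int], num_days: int) -> List[float]:
    """Normalized per-day frequency vector of a word (hypothetical helper).

    `days` lists the day index of every occurrence; out-of-range days are ignored.
    """
    counts = Counter(d for d in days if 0 <= d < num_days)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_days
    return [counts[d] / total for d in range(num_days)]


def cosine(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_candidates(
    source_days: Sequence[int],
    candidates: Dict[str, Sequence[int]],
    num_days: int,
) -> List[Tuple[str, float]]:
    """Rank target-language candidates by temporal similarity to a source NE.

    Assumption: cosine similarity over daily frequencies stands in for the
    paper's own frequency-based metric, which is not reproduced here.
    """
    target = time_distribution(source_days, num_days)
    scored = [
        (word, cosine(target, time_distribution(days, num_days)))
        for word, days in candidates.items()
    ]
    # Highest temporal similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data (invented for illustration): English "London" on days 0, 1, 5, 6.
    london_days = [0, 0, 1, 5, 5, 6]
    russian = {
        "Лондон": [0, 1, 1, 5, 6],  # occurs on the same days
        "погода": [2, 3, 4, 7, 8],  # no overlap with the English occurrences
    }
    for word, score in rank_candidates(london_days, russian, num_days=10):
        print(f"{word}\t{score:.3f}")
```

On this toy data, the candidate that occurs on the same days as the English name ranks first (about 0.837), while the word with no overlapping days scores 0. In the approach the abstract describes, temporal evidence of this kind is exploited iteratively together with transliteration; that combination is not sketched here.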

Citation (APA)

Klementiev, A., & Roth, D. (2006). Named Entity transliteration and discovery from multilingual comparable corpora. In HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference (pp. 82–88). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220835.1220846
