Named Entity transliteration and discovery from multilingual comparable corpora


Abstract

Named Entity recognition (NER) is an important part of many natural language processing tasks. Most current approaches employ machine learning techniques and require supervised data. However, many languages lack such resources. This paper presents an algorithm to automatically discover Named Entities (NEs) in a resource-free language, given a bilingual corpus in which it is weakly temporally aligned with a resource-rich language. We observe that NEs have similar time distributions across such corpora and that they are often transliterated, and we develop an algorithm that exploits both observations iteratively. The algorithm makes use of a new, frequency-based metric for time distributions and a resource-free discriminative approach to transliteration. We evaluate the algorithm on an English-Russian corpus and show a high level of NE discovery in Russian. © 2006 Association for Computational Linguistics.
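The temporal cue described in the abstract (an NE and its counterpart in the other language have similar time distributions) can be illustrated with a minimal sketch. It assumes the two comparable corpora are already tokenized and bucketed into aligned days, and it scores target-language candidates by cosine similarity of normalized per-day frequency vectors. This is a stand-in, not the frequency-based metric introduced in the paper, and the transliteration model and the iterative loop are omitted. The function names `time_distribution` and `rank_candidates` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def time_distribution(days: Sequence[Sequence[str]], word: str) -> List[float]:
    """Relative frequency of `word` in each aligned day bucket (sums to 1 if seen)."""
    counts = [list(tokens).count(word) for tokens in days]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    source_days: Sequence[Sequence[str]],
    target_days: Sequence[Sequence[str]],
    source_ne: str,
    candidates: Sequence[str],
) -> List[Tuple[str, float]]:
    """Rank target-language candidates by temporal similarity to a known source NE.

    Assumes source_days[i] and target_days[i] cover the same time period.
    """
    reference = time_distribution(source_days, source_ne)
    scored = [
        (cand, cosine(reference, time_distribution(target_days, cand)))
        for cand in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy usage: three aligned "days" of English and Russian tokens (invented data).
en_days = [["putin", "visit"], ["putin", "putin"], ["weather"]]
ru_days = [["путин", "визит"], ["путин", "путин"], ["погода"]]
result = rank_candidates(en_days, ru_days, "putin", ["погода", "путин"])
print(result[0][0])  # путин
```

In the approach the abstract outlines, a temporal score of this kind would be combined with a transliteration score, and the two signals refined iteratively; the sketch covers only the temporal half.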

Citation (APA)

Klementiev, A., & Roth, D. (2006). Named Entity transliteration and discovery from multilingual comparable corpora. In HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings of the Main Conference (pp. 82–88). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220835.1220846
