Transliteration equivalence using canonical correlation analysis

Raghavendra Udupa; Mitesh M. Khapra

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 5993 LNCS 75-86

DOI: 10.1007/978-3-642-12275-0_10

7 Citations

9 Readers

Abstract

We address the problem of Transliteration Equivalence, i.e. determining whether a pair of words in two different languages (e.g. Auden,) is a pair of name transliterations. This problem is at the heart of Mining Name Transliterations (MINT) from various sources of multilingual text data, including parallel, comparable, and non-comparable corpora and multilingual news streams. MINT is useful in several cross-language tasks, including Cross-Language Information Retrieval (CLIR), Machine Translation (MT), and Cross-Language Named Entity Retrieval. We propose a novel approach to Transliteration Equivalence using language-neutral representations of names. The key idea is to treat name transliterations in two languages as two views of the same semantic object and to compute a low-dimensional common feature space using Canonical Correlation Analysis (CCA). The similarity of two names in this common feature space forms the basis for classifying the pair as transliterations. We show that our approach outperforms state-of-the-art baselines in the CLIR task for Hindi-English (3 collections) and Tamil-English (2 collections). © 2010 Springer-Verlag Berlin Heidelberg.
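The abstract describes the method only at a high level. As a rough illustration of the "two views, one common space" idea, here is a minimal regularized linear CCA sketch in Python with NumPy. It is not the authors' implementation: the character-bigram features with ^/$ boundary markers, the number of dimensions k, the regularization constant reg, the 0.9 decision threshold in the hypothetical is_transliteration helper, and the toy second "script" (a letter-substitution cipher standing in for Hindi or Tamil) are all assumptions made for the example.

```python
import numpy as np


def bigrams(name: str) -> list[str]:
    """Character bigrams with ^/$ boundary markers (hypothetical feature choice)."""
    padded = "^" + name + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def featurize(names: list[str]) -> np.ndarray:
    """Bag-of-bigrams count matrix, one row per name, vocabulary built from `names`."""
    vocab = sorted({b for n in names for b in bigrams(n)})
    index = {b: i for i, b in enumerate(vocab)}
    X = np.zeros((len(names), len(vocab)))
    for row, n in enumerate(names):
        for b in bigrams(n):
            X[row, index[b]] += 1.0
    return X


def inv_sqrt(c: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return (vecs / np.sqrt(vals)) @ vecs.T


def fit_cca(X: np.ndarray, Y: np.ndarray, k: int, reg: float = 1e-3):
    """Regularized linear CCA; returns the means and projections for both views."""
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    cxy = Xc.T @ Yc / n
    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular vectors of the whitened cross-covariance give the canonical
    # directions; the singular values are the (regularized) canonical correlations.
    u, _s, vt = np.linalg.svd(kx @ cxy @ ky)
    return mx, kx @ u[:, :k], my, ky @ vt[:k].T


def similarity(model, x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity of two names after projecting both into the common space."""
    mx, wx, my, wy = model
    zx, zy = wx.T @ (x - mx), wy.T @ (y - my)
    return float(zx @ zy / (np.linalg.norm(zx) * np.linalg.norm(zy)))


def is_transliteration(sim: float, threshold: float = 0.9) -> bool:
    """Hypothetical decision rule: threshold the common-space similarity."""
    return sim >= threshold


# Hypothetical second "script": a letter-substitution cipher, so the two views
# correspond exactly. Real transliteration pairs are far noisier than this.
TOY_SCRIPT = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                           "DEFGHIJKLMNOPQRSTUVWXYZABC")

names = ["auden", "smith", "kumar", "priya", "rahul",
         "anita", "vijay", "meena", "suresh", "lakshmi"]
X = featurize(names)
Y = featurize([n.translate(TOY_SCRIPT) for n in names])
model = fit_cca(X, Y, k=5)

# Matched pairs (i, i) versus mismatched pairs (i, j), i != j.
true_sims = [similarity(model, X[i], Y[i]) for i in range(len(names))]
false_sims = [similarity(model, X[i], Y[j])
              for i in range(len(names)) for j in range(len(names)) if i != j]
print(min(true_sims), min(false_sims))
```

In this toy setting the matched pairs project to essentially the same point, so their similarity is close to 1, while some mismatched pairs point in opposite directions. With only ten training names the canonical correlations are near 1 and the model would overfit real data. In practice one would presumably train on many known transliteration pairs and tune k, the regularizer, and the threshold on held-out pairs.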

Citation (APA)

Udupa, R., & Khapra, M. M. (2010). Transliteration equivalence using canonical correlation analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5993 LNCS, pp. 75–86). Springer Verlag. https://doi.org/10.1007/978-3-642-12275-0_10
