Transliteration equivalence using canonical correlation analysis


Abstract

We address the problem of Transliteration Equivalence, i.e. determining whether a pair of words in two different languages (e.g. Auden,) are name transliterations or not. This problem is at the heart of Mining Name Transliterations (MINT) from various sources of multilingual text data including parallel, comparable, and non-comparable corpora and multilingual news streams. MINT is useful in several cross-language tasks including Cross-Language Information Retrieval (CLIR), Machine Translation (MT), and Cross-Language Named Entity Retrieval. We propose a novel approach to Transliteration Equivalence using language-neutral representations of names. The key idea is to consider name transliterations in two languages as two views of the same semantic object and compute a low-dimensional common feature space using Canonical Correlation Analysis (CCA). Similarity of the names in the common feature space forms the basis for classifying a pair of names as transliterations. We show that our approach outperforms state-of-the-art baselines in the CLIR task for Hindi-English (3 collections) and Tamil-English (2 collections). © 2010 Springer-Verlag Berlin Heidelberg.
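The idea of treating a name and its transliteration as two views projected into a common space can be illustrated with a minimal sketch. This is not the authors' implementation: the character-bigram features, the regularization constant `reg`, the cosine scoring, and all function names are assumptions made for illustration. The "target script" below is a toy letter-substitution cipher (`fake_script`), standing in for real Hindi or Tamil data, which the abstract does not provide.

```python
import numpy as np

def ngram_features(names, vocab=None, n=2):
    """Bag of character n-grams (assumed feature choice, not from the paper)."""
    padded = ["^" + w.lower() + "$" for w in names]
    grams = [[w[i:i + n] for i in range(len(w) - n + 1)] for w in padded]
    if vocab is None:
        vocab = {g: j for j, g in enumerate(sorted({g for gs in grams for g in gs}))}
    X = np.zeros((len(names), len(vocab)))
    for i, gs in enumerate(grams):
        for g in gs:
            if g in vocab:  # n-grams unseen at training time are ignored
                X[i, vocab[g]] += 1.0
    return X, vocab

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(w ** -0.5) @ V.T

def fit_cca(X, Y, k, reg=1e-2):
    """Regularized CCA: whiten each view, then take the SVD of the cross-covariance."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]      # projection for the first view
    B = Wy @ Vt.T[:, :k]   # projection for the second view
    return (mx, my, A, B), s[:k]

def similarity(x, y, model):
    """Cosine similarity of the two names in the common k-dimensional space."""
    mx, my, A, B = model
    u, v = (x - mx) @ A, (y - my) @ B
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def fake_script(word):
    """Toy stand-in for a second script: shift a-z into the Cyrillic block."""
    return "".join(chr(ord(c) - ord("a") + 0x0430) for c in word)

# Hypothetical training pairs: each name and its toy "transliteration".
src = ["auden", "london", "delhi", "madras", "nadal", "lenin"]
tgt = [fake_script(w) for w in src]

X, vocab_x = ngram_features(src)
Y, vocab_y = ngram_features(tgt)
model, corrs = fit_cca(X, Y, k=3)

matched = similarity(X[0], Y[0], model)     # "auden" vs. its own counterpart
mismatched = similarity(X[0], Y[1], model)  # "auden" vs. counterpart of "london"
```

In a setup like this, a pair would be classified as transliterations when its score exceeds a threshold tuned on held-out pairs; the threshold and the number of dimensions `k` are free parameters of the sketch, not values reported in the paper.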

Cite

Udupa, R., & Khapra, M. M. (2010). Transliteration equivalence using canonical correlation analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5993 LNCS, pp. 75–86). Springer Verlag. https://doi.org/10.1007/978-3-642-12275-0_10
