Similarity of names across scripts: Edit distance using learned costs of N-grams

Bruno Pouliquen

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008), Vol. 5221 LNAI, pp. 405–416

DOI: 10.1007/978-3-540-85287-2_39

Abstract

Any cross-language processing application must first tackle the problem of transliteration when it faces a language written in another script. A first solution is to use existing transliteration tools, but these are usually not suitable for all purposes, and for some script pairs they do not exist at all. Our aim is to discriminate transliterations across different scripts in a unified way, using a learning method that builds a transliteration model from a set of transliterated proper names. We compare two strings using an algorithm that computes a Levenshtein edit distance with n-gram costs. The evaluations carried out show that our similarity measure is accurate. © 2008 Springer-Verlag Berlin Heidelberg.
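To make the idea of an edit distance with n-gram costs concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical cost table `costs` that maps pairs of (source n-gram, target n-gram) to a rewrite cost. The abstract does not say how such costs are learned, so the table here is filled by hand purely for illustration. The function name `ngram_edit_distance` and the parameters `max_n` and `default_cost` are likewise assumptions.

```python
from typing import Dict, Tuple

# Hypothetical cost table type: (source n-gram, target n-gram) -> cost.
CostTable = Dict[Tuple[str, str], float]


def ngram_edit_distance(src: str, tgt: str, costs: CostTable,
                        max_n: int = 2, default_cost: float = 1.0) -> float:
    """Edit distance in which one step rewrites a source n-gram
    (0..max_n characters) into a target n-gram (0..max_n characters).

    Pairs found in `costs` use that cost. Identical n-grams cost 0.
    Any other pair falls back to default_cost per character, so an
    empty table reproduces the classic Levenshtein distance.
    """
    inf = float("inf")
    # d[i][j] = cheapest cost of rewriting src[:i] into tgt[:j]
    d = [[inf] * (len(tgt) + 1) for _ in range(len(src) + 1)]
    d[0][0] = 0.0
    for i in range(len(src) + 1):
        for j in range(len(tgt) + 1):
            for a in range(max_n + 1):          # source characters consumed
                if i + a > len(src):
                    break
                for b in range(max_n + 1):      # target characters produced
                    if j + b > len(tgt):
                        break
                    if a == 0 and b == 0:
                        continue                # a step must consume something
                    s, t = src[i:i + a], tgt[j:j + b]
                    if (s, t) in costs:
                        step = costs[(s, t)]
                    elif s == t:
                        step = 0.0
                    else:
                        step = default_cost * max(a, b)
                    if d[i][j] + step < d[i + a][j + b]:
                        d[i + a][j + b] = d[i][j] + step
    return d[len(src)][len(tgt)]


# Hand-written toy costs for Greek-to-Latin pairs (illustrative only).
toy_costs: CostTable = {("μπ", "b"): 0.0, ("ου", "u"): 0.0, ("ς", "s"): 0.0}

print(ngram_edit_distance("kitten", "sitting", {}))      # 3.0
print(ngram_edit_distance("μπους", "bus", {}))           # 5.0
print(ngram_edit_distance("μπους", "bus", toy_costs))    # 0.0
```

With an empty table the function behaves like plain Levenshtein distance, so two strings in different scripts look maximally different. Once n-gram pairs such as ("μπ", "b") carry a low cost, the same pair of names scores as close, which is the effect a learned cost table is meant to achieve.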

Cite

Pouliquen, B. (2008). Similarity of names across scripts: Edit distance using learned costs of N-grams. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5221 LNAI, pp. 405–416). https://doi.org/10.1007/978-3-540-85287-2_39
