A comparison of string metrics for matching names and records

William W Cohen; Pradeep Ravikumar; Stephen E Fienberg

Journal Article

KDD Workshop on Data Cleaning and Object Consolidation (2003) 3 73-78

DOI: citeulike-article-id:964346

Abstract

Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme, which was developed in the probabilistic record linkage community.
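To make the Jaro-Winkler component mentioned in the abstract more concrete, here is a minimal Python sketch of the standard textbook formulation of the metric. It is not taken from the authors' Java toolkit. The function names `jaro` and `jaro_winkler`, the prefix scale of 0.1, and the prefix cap of 4 characters are conventional choices assumed for illustration. The abstract does not specify how the hybrid scheme combines this score with TFIDF weights, so that step is not shown.

```python
def jaro(s: str, t: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0

    # Characters count as matching only if they are equal and no farther
    # apart than this window.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0

    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = True
                t_matched[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as
    # half-transpositions.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2

    return (
        matches / len(s)
        + matches / len(t)
        + (matches - transpositions) / matches
    ) / 3.0


def jaro_winkler(s: str, t: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix.

    prefix_scale and max_prefix are the conventional defaults, assumed here.
    """
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    # Classic worked example: one transposed pair, shared prefix "MAR".
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The prefix boost rewards names that agree at the start, which suits typed or transcribed personal names where errors tend to occur later in the string.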

Citation (APA)

Cohen, W. W., Ravikumar, P., & Fienberg, S. E. (2003). A comparison of string metrics for matching names and records. KDD Workshop on Data Cleaning and Object Consolidation, 3, 73–78. https://doi.org/citeulike-article-id:964346
