A comparison of string metrics for matching names and records

  • Cohen W
  • Ravikumar P
  • Fienberg S

Abstract

Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid scheme that combines a TFIDF weighting scheme, widely used in information retrieval, with the Jaro-Winkler string-distance scheme, developed in the probabilistic record linkage community.
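
To make the Jaro-Winkler component named in the abstract more concrete, here is a minimal Python sketch of the standard textbook formulation. It is not taken from the authors' Java toolkit, and it covers only the character-level comparator, not the TFIDF-weighted hybrid. The function names and the parameters prefix_scale (0.1) and max_prefix (4) are illustrative assumptions based on the commonly cited defaults.

```python
def jaro(s: str, t: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters match only if they are equal and at most `window` positions apart.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = True
                t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order,
    # counted in pairs (hence the integer division by 2).
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (
        matches / len(s)
        + matches / len(t)
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s: str, t: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix.

    prefix_scale and max_prefix are assumed defaults, not values from the paper.
    """
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    # Classic illustration pair: one adjacent transposition, shared prefix "MAR".
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

In a hybrid scheme of the kind the abstract describes, a comparator like this would presumably be applied to pairs of tokens, with TFIDF weights deciding how much each token pair contributes to the overall score. That combination step is not shown here.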

Cite

Cohen, W. W., Ravikumar, P., & Fienberg, S. E. (2003). A comparison of string metrics for matching names and records. KDD Workshop on Data Cleaning and Object Consolidation, 3, 73–78. https://doi.org/citeulike-article-id:964346
