The stringdist package for approximate string matching

294Citations
Citations of this article
1.1kReaders
Mendeley users who have this article in their library.

Abstract

Comparing text strings in terms of distance functions is a common and fundamental task in many statistical text-processing applications. Thus far, string distance functionality has been somewhat scattered around R and its extension packages, leaving users with inconistent interfaces and encoding handling. The stringdist package was designed to offer a low-level interface to several popular string distance algorithms which have been re-implemented in C for this purpose. The package offers distances based on counting q-grams, edit-based distances, and some lesser known heuristic distance functions. Based on this functionality, the package also offers inexact matching equivalents of R's native exact matching functions match and %in%.

Cite

CITATION STYLE

APA

van der Loo, M. P. J. (2014). The stringdist package for approximate string matching. R Journal, 6(1), 111–122. https://doi.org/10.32614/rj-2014-011

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free