Comparing text strings by means of distance functions is a common and fundamental task in many statistical text-processing applications. Thus far, string distance functionality has been scattered across R and its extension packages, leaving users with inconsistent interfaces and inconsistent encoding handling. The stringdist package was designed to offer a low-level interface to several popular string distance algorithms, which have been re-implemented in C for this purpose. The package offers distances based on counting q-grams, edit-based distances, and some lesser-known heuristic distance functions. Building on this functionality, the package also offers inexact-matching equivalents of R's native exact-matching functions match and %in%.
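The three ideas named in the abstract (an edit-based distance, a q-gram distance, and inexact matching against a lookup table) can be illustrated with a minimal Python sketch. This is not the package's C implementation or its R interface: the function names levenshtein, qgram_distance, and approx_match, and the max_dist parameter, are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit-based distance: minimum number of insertions,
    deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def qgram_distance(a: str, b: str, q: int = 2) -> int:
    """q-gram distance: sum of absolute differences between the
    counts of each length-q substring in a and in b."""
    grams_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    grams_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum(abs(grams_a[g] - grams_b[g])
               for g in grams_a.keys() | grams_b.keys())


def approx_match(x: str, table: list, max_dist: int = 1):
    """Inexact analogue of an exact lookup: index (0-based) of the
    closest entry in table within max_dist edits, or None."""
    best_idx, best_dist = None, max_dist + 1
    for i, candidate in enumerate(table):
        d = levenshtein(x, candidate)
        if d < best_dist:  # strict comparison keeps the first of tied entries
            best_idx, best_dist = i, d
    return best_idx


print(levenshtein("kitten", "sitting"))                      # 3
print(qgram_distance("leia", "leela", q=2))                  # 5
print(approx_match("leia", ["uhura", "leela"], max_dist=2))  # 1
```

In this sketch an exact lookup of "leia" in the table would fail, while the inexact lookup returns the position of "leela" because it lies within two edits. Note that the sketch uses 0-based indices, whereas R's match uses 1-based positions.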
Citation
van der Loo, M. P. J. (2014). The stringdist package for approximate string matching. R Journal, 6(1), 111–122. https://doi.org/10.32614/rj-2014-011