fuzzylink: Probabilistic Record Linkage Using Pretrained Text Embeddings

Links datasets through fuzzy string matching using pretrained text embeddings. Produces more accurate record linkage when lexical string distance metrics are a poor guide to match quality (e.g., "Patricia" is more lexically similar to "Patrick" than it is to "Trish"). Capable of performing multilingual record linkage. Methods are described in Ornstein (2025) <https://joeornstein.github.io/publications/fuzzylink.pdf>.
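
The lexical-distance claim above can be checked with a minimal sketch. It is not part of fuzzylink and does not use its API; the helper `levenshtein` and the lowercased example names are assumptions introduced only for illustration. It computes plain Levenshtein edit distance, one common lexical metric, and shows why such a metric would rank "Patrick" ahead of "Trish" as a match for "Patricia".

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Hypothetical example inputs, lowercased so case does not affect the count.
name = "patricia"
candidates = ["patrick", "trish"]
distances = {c: levenshtein(name, c) for c in candidates}
print(distances)  # the smaller distance is the "closer" lexical match
```

A purely lexical matcher would therefore prefer the wrong candidate here, which is the situation the embedding-based approach described above is meant to address.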

Version: 0.2.1
Depends: R (≥ 4.1.0)
Imports: stats, utils, dplyr, Rfast, reshape2, stringdist, stringr, httr, jsonlite, httr2, ranger
Published: 2025-06-14
DOI: 10.32614/CRAN.package.fuzzylink
Author: Joe Ornstein [aut, cre, cph]
Maintainer: Joe Ornstein <jornstein at uga.edu>
BugReports: https://github.com/joeornstein/fuzzylink/issues
License: MIT + file LICENSE
URL: https://github.com/joeornstein/fuzzylink
NeedsCompilation: no
Materials: README NEWS
CRAN checks: fuzzylink results

Documentation:

Reference manual: fuzzylink.pdf

Downloads:

Package source: fuzzylink_0.2.1.tar.gz
Windows binaries: r-devel: not available, r-release: not available, r-oldrel: not available
macOS binaries: r-release (arm64): fuzzylink_0.2.1.tgz, r-oldrel (arm64): fuzzylink_0.2.1.tgz, r-release (x86_64): not available, r-oldrel (x86_64): not available

Linking:

Please use the canonical form https://CRAN.R-project.org/package=fuzzylink to link to this page.