Febrl – A parallel open source data linkage system

Abstract

In many data mining projects, information from multiple data sources needs to be integrated, combined or linked to allow more detailed analysis. The aim of such linkages is to merge all records relating to the same entity, such as a patient or a customer. The linkage process is usually complicated by the lack of a common unique entity identifier, which makes it non-trivial. Linking today's large data collections is becoming increasingly difficult with traditional linkage techniques. In this paper we present an innovative data linkage system called Febrl, which includes a new probabilistic approach for improved data cleaning and standardisation, innovative indexing methods, a parallelisation approach that is implemented transparently to the user, and a data set generator that allows the random creation of records containing names and addresses. Implemented as open source software, Febrl is an ideal experimental platform for new linkage algorithms and techniques.
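To illustrate the core problem (linking records that lack a common unique identifier) and why indexing matters, here is a minimal Python sketch. It is not Febrl's implementation: the record fields (`id`, `given_name`, `surname`, `postcode`), the choice of postcode as the blocking key, the `difflib` string similarity and the fixed 0.8 threshold are all illustrative assumptions. It uses a simple similarity threshold rather than the probabilistic techniques described in the paper.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def blocking_key(record):
    # Assumed blocking key: the postcode. Only records sharing a key are compared.
    return record["postcode"]


def similarity(a, b):
    # Approximate string similarity in [0, 1] using Python's standard difflib.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(records_a, records_b, threshold=0.8):
    # Index data set B by blocking key, so each record in A is compared only
    # against candidates in the same block rather than against all of B.
    index = defaultdict(list)
    for rec in records_b:
        index[blocking_key(rec)].append(rec)

    matches = []
    for rec_a in records_a:
        for rec_b in index.get(blocking_key(rec_a), []):
            # Average the name similarities; no shared identifier is available.
            score = (similarity(rec_a["given_name"], rec_b["given_name"])
                     + similarity(rec_a["surname"], rec_b["surname"])) / 2
            if score >= threshold:
                matches.append((rec_a["id"], rec_b["id"], round(score, 3)))
    return matches


records_a = [{"id": "a1", "given_name": "John", "surname": "Smith", "postcode": "2600"}]
records_b = [
    {"id": "b1", "given_name": "Jon", "surname": "Smyth", "postcode": "2600"},
    {"id": "b2", "given_name": "Mary", "surname": "Jones", "postcode": "2600"},
    {"id": "b3", "given_name": "John", "surname": "Smith", "postcode": "3000"},
]
print(link(records_a, records_b))
```

Here `a1` links to the misspelled `b1` despite the typographical variations. The exact-name record `b3` is never compared because its postcode places it in a different block. This shows the trade-off behind indexing: far fewer comparisons, at the risk of missing true matches whose blocking-key values differ.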

Citation

Christen, P., Churches, T., & Hegland, M. (2004). Febrl – A parallel open source data linkage system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3056, pp. 638–647). Springer Verlag. https://doi.org/10.1007/978-3-540-24775-3_75