On evaluation and training-set construction for duplicate detection

  • Bilenko M
  • Mooney R

Abstract

A variety of experimental methodologies have been used to evaluate the accuracy of duplicate-detection systems. We advocate presenting precision-recall curves as the most informative evaluation methodology. We also discuss a number of issues that arise when evaluating and assembling training data for adaptive systems that use machine learning to tune themselves to specific applications. We consider several different application scenarios and experimentally examine the effectiveness of alternative methods of collecting training data under each scenario. We propose two new approaches to collecting training data called static-active learning and weakly-labeled non-duplicates, and present experimental results on their effectiveness.
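
As a rough illustration of the precision-recall evaluation the abstract advocates, the sketch below ranks candidate record pairs by a similarity score and reports precision at each recall level. It is not the authors' code: the function name, the `(score, is_duplicate)` input format, and the example scores are all assumptions made for illustration.

```python
from typing import List, Tuple


def precision_recall_points(
    scored_pairs: List[Tuple[float, bool]],
) -> List[Tuple[float, float]]:
    """Return (recall, precision) points for ranked candidate pairs.

    Each input item is (similarity_score, is_true_duplicate). Pairs are
    sorted by descending score, and one point is emitted each time a true
    duplicate is reached. Tied scores are not grouped in this sketch.
    """
    total_duplicates = sum(1 for _, is_dup in scored_pairs if is_dup)
    if total_duplicates == 0:
        return []  # recall is undefined when there are no true duplicates

    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    points = []
    true_positives = 0
    for rank, (_, is_dup) in enumerate(ranked, start=1):
        if is_dup:
            true_positives += 1
            recall = true_positives / total_duplicates
            precision = true_positives / rank
            points.append((recall, precision))
    return points


# Hypothetical scores for five candidate pairs, three of them true duplicates.
example_pairs = [
    (0.95, True),
    (0.90, True),
    (0.70, False),
    (0.60, True),
    (0.20, False),
]
curve = precision_recall_points(example_pairs)
for recall, precision in curve:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Plotting these points, rather than reporting a single accuracy figure at one threshold, shows how precision degrades as the threshold is lowered to recover more duplicates.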

Cite

Bilenko, M., & Mooney, R. J. (2003). On evaluation and training-set construction for duplicate detection. Proceedings of the KDD Workshop on Data Cleaning, Record Linkage, and Object Consolidation, (June), 7–12. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.5.5309&rep=rep1&type=pdf
