Automatic Selection of Training Examples for a Record Deduplication Method Based on Genetic Programming

  • Gonçalves G
  • Carvalho M
  • Laender A
  • et al.

Abstract

Recently, machine learning techniques have been used to solve the record deduplication problem. However, these techniques require examples, manually generated in most cases, for training purposes. This hinders the use of such techniques because of the cost required to create the set of examples. In this article, we propose an approach based on a deterministic technique to automatically suggest training examples for a deduplication method based on genetic programming. Our experiments with synthetic datasets show that, by using only 15 percent of the examples suggested by our approach, it is possible to achieve results in terms of F1 that are equivalent to those obtained when using all the examples, leading to savings in training time of up to 85 percent.

Cite

Gonçalves, G. S., Carvalho, M. G. D., Laender, A. H. F., & Gonçalves, M. A. (2010). Automatic Selection of Training Examples for a Record Deduplication Method Based on Genetic Programming. Group, 1(2), 213–228.
