Abstract
Recently, machine learning techniques have been used to solve the record deduplication problem. However, these techniques require examples, manually generated in most cases, for training purposes. This hinders the use of such techniques because of the cost required to create the set of examples. In this article, we propose an approach based on a deterministic technique to automatically suggest training examples for a deduplication method based on genetic programming. Our experiments with synthetic datasets show that, by using only 15 percent of the examples suggested by our approach, it is possible to achieve results in terms of F1 that are equivalent to those obtained when using all the examples, leading to savings in training time of up to 85 percent.
Cite
Gonçalves, G. S., Carvalho, M. G. D., Laender, A. H. F., & Gonçalves, M. A. (2010). Automatic Selection of Training Examples for a Record Deduplication Method Based on Genetic Programming. Group, 1(2), 213–228.