Informativeness-Based Active Learning for Entity Resolution

Victor Christen; Peter Christen; Erhard Rahm

Conference Proceedings, Open Access

Communications in Computer and Information Science (2020) 1168 CCIS 125-141

DOI: 10.1007/978-3-030-43887-6_11

7 Citations

6 Readers

Abstract

Entity resolution is a crucial task in integrating data from different sources: it identifies records that represent the same entity. It commonly employs supervised learning techniques trained on matching and non-matching record pairs, whose attribute similarities are represented as similarity vectors. To reduce the manual labelling needed to generate suitable training data, we propose a novel active learning approach that requires no prior knowledge about true matches and is independent of the learning method used. Our approach successively identifies new training examples using an informativeness measure for similarity vectors, which considers their relationship to already classified vectors and the uncertainty in the similarity vector space covered by the current training set. Experiments on several data sets show that, even with a small labelling effort, our approach achieves results comparable to fully supervised approaches and can outperform previous active learning approaches for entity resolution.
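The selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' informativeness measure, which the abstract does not define. The scoring function `informativeness`, the helpers `distance`, `select_batch`, and `active_learning`, and the `oracle` callback standing in for a human labeller are all hypothetical names chosen for illustration. The sketch assumes similarity values lie in [0, 1], and it scores a vector highly when its nearest labelled match and nearest labelled non-match are about equally far away (uncertainty) and when it lies far from every labelled vector (uncovered space). It starts with an empty training set, so no prior knowledge about true matches is needed.

```python
import math


def distance(u, v):
    """Euclidean distance between two similarity vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def informativeness(vec, labelled):
    """Hypothetical score in [0, 1]; an assumption, not the paper's measure.

    `labelled` is a list of (vector, is_match) pairs.
    """
    # Largest possible distance when every similarity lies in [0, 1].
    max_dist = math.sqrt(len(vec))
    d_match = min(
        (distance(vec, v) for v, is_match in labelled if is_match),
        default=max_dist,
    )
    d_non = min(
        (distance(vec, v) for v, is_match in labelled if not is_match),
        default=max_dist,
    )
    # High when the vector is equally close to both classes.
    uncertainty = 1.0 - abs(d_match - d_non) / max_dist
    # High when the vector is far from all labelled vectors.
    coverage_gap = min(d_match, d_non) / max_dist
    return uncertainty * coverage_gap


def select_batch(unlabelled, labelled, k):
    """Return the k unlabelled vectors with the highest score."""
    ranked = sorted(
        unlabelled,
        key=lambda v: informativeness(v, labelled),
        reverse=True,
    )
    return ranked[:k]


def active_learning(vectors, oracle, budget, batch_size=1):
    """Label up to `budget` vectors, most informative first."""
    labelled = []  # starts empty: no known matches are required
    unlabelled = list(vectors)
    while unlabelled and len(labelled) < budget:
        k = min(batch_size, budget - len(labelled))
        for vec in select_batch(unlabelled, labelled, k):
            unlabelled.remove(vec)
            labelled.append((vec, oracle(vec)))
    return labelled
```

The returned list of labelled pairs could then be used to train any classifier, which mirrors the abstract's point that the selection step is independent of the learning method.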

Cite

Christen, V., Christen, P., & Rahm, E. (2020). Informativeness-Based Active Learning for Entity Resolution. In Communications in Computer and Information Science (Vol. 1168 CCIS, pp. 125–141). Springer. https://doi.org/10.1007/978-3-030-43887-6_11
