Efficient MapReduce-based method for massive entity matching

Pingfu Chao; Zhu Gao; Yuming Li; Junhua Fang; Rong Zhang; Aoying Zhou

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015), vol. 9098, pp. 494-497

DOI: 10.1007/978-3-319-21042-1_48

Abstract

Most state-of-the-art MapReduce-based entity matching methods inherit traditional Entity Resolution techniques from centralized systems and focus on data blocking strategies to solve the load balancing problem that arises in distributed environments. In this paper, we propose a MapReduce-based entity matching framework for processing semi-structured and unstructured data. We use a Locality Sensitive Hash (LSH) function to generate low-dimensional signatures for high-dimensional entities, and we introduce a series of randomized algorithms to ensure that similar signatures are matched in the reduce phase with high probability. Moreover, our framework contains a solution for reducing redundant similarity computation. Experiments show that our approach has a substantial advantage in processing speed while maintaining high accuracy.
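The pipeline the abstract outlines (signature generation, grouping of similar signatures in the reduce phase, and avoidance of repeated comparisons) can be illustrated with a small single-process sketch. The abstract does not say which LSH family, randomized algorithms, or deduplication rule the authors use, so everything below is an assumption made for illustration: MinHash over whitespace tokens stands in for the LSH function, banding stands in for the randomized matching step, and a "smallest shared band" rule stands in for the redundancy-reduction solution. All function names and parameter values are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32              # signature length (assumed value)
BANDS = 8                    # number of LSH bands (assumed value)
ROWS = NUM_HASHES // BANDS   # signature rows per band


def seeded_hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens):
    """Low-dimensional signature: one minimum hash value per seed."""
    return tuple(min(seeded_hash(s, t) for t in tokens) for s in range(NUM_HASHES))


def band_of(signature, b):
    """Slice of the signature that belongs to band b."""
    return signature[b * ROWS:(b + 1) * ROWS]


def map_phase(entity_id, text):
    """Emit one (band key, entity) pair per band of the signature."""
    tokens = frozenset(text.lower().split())
    if not tokens:
        return  # nothing to hash for an empty record
    sig = minhash_signature(tokens)
    for b in range(BANDS):
        yield (b, band_of(sig, b)), (entity_id, tokens, sig)


def reduce_phase(key, values, threshold):
    """Compare entities sharing a band, skipping pairs an earlier band covers."""
    b = key[0]
    for (id1, t1, s1), (id2, t2, s2) in combinations(values, 2):
        if any(band_of(s1, e) == band_of(s2, e) for e in range(b)):
            continue  # the reducer for the earlier band handles this pair
        if len(t1 & t2) / len(t1 | t2) >= threshold:  # Jaccard similarity
            yield tuple(sorted((id1, id2)))


def find_matches(entities, threshold=0.8):
    """Single-process stand-in for the shuffle between map and reduce."""
    groups = defaultdict(list)
    for entity_id, text in entities.items():
        for key, value in map_phase(entity_id, text):
            groups[key].append(value)
    matches = []
    for key, values in groups.items():
        matches.extend(reduce_phase(key, values, threshold))
    return sorted(matches)
```

Calling `find_matches` on a dictionary that maps entity ids to text records returns the sorted list of id pairs whose token sets meet the Jaccard threshold. Because each reducer decides locally whether an earlier band already covers a pair, no shared state between reducers is needed to avoid comparing the same pair twice, which is one plausible way to cut redundant similarity computation in a distributed setting.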

Cite

Chao, P., Gao, Z., Li, Y., Fang, J., Zhang, R., & Zhou, A. (2015). Efficient MapReduce-based method for massive entity matching. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9098, pp. 494–497). Springer Verlag. https://doi.org/10.1007/978-3-319-21042-1_48
