Large-scale similarity join with edit-distance constraints

Chen Lin; Haiyang Yu; Wei Weng; Xianmang He

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8422 LNCS(PART 2) 328-342

DOI: 10.1007/978-3-319-05813-9_22

7 Citations

9 Readers

Abstract

In the age of big data, the data quality problem is more severe than ever. As an essential step in data cleaning, similarity join has attracted considerable attention from the database community. In this work, to address the similarity join problem with edit-distance constraints, we first improve the partition-based join algorithm for small-scale data. We then extend the algorithm to large-scale data using the MapReduce framework. Extensive experiments on both real and simulated datasets demonstrate the efficiency of our algorithms. © 2014 Springer International Publishing Switzerland.
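
For readers unfamiliar with partition-based joins, the general idea can be sketched as follows. This is not the authors' improved algorithm or their MapReduce extension, neither of which is described in the abstract. It is a minimal single-machine illustration of the standard pigeonhole argument: if two strings are within edit distance tau, and one of them is cut into tau + 1 disjoint segments, at least one segment must appear unchanged as a substring of the other. All function names and the even-split segmentation scheme here are assumptions made for illustration.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def segments(s, tau):
    """Split s into tau + 1 contiguous, disjoint segments (assumed even split)."""
    k = tau + 1
    base, extra = divmod(len(s), k)
    out, pos = [], 0
    for i in range(k):
        length = base + (1 if i >= k - extra else 0)
        out.append(s[pos:pos + length])
        pos += length
    return out


def similarity_self_join(strings, tau):
    """Return sorted index pairs (i, j), i < j, with edit distance <= tau."""
    index = defaultdict(set)   # segment text -> ids of strings containing it as a segment
    seg_lengths = set()
    for sid, s in enumerate(strings):
        for seg in segments(s, tau):
            index[seg].add(sid)
            seg_lengths.add(len(seg))

    results = set()
    for rid, r in enumerate(strings):
        # Filter step: any string sharing a segment with a substring of r is a candidate.
        candidates = set()
        for length in seg_lengths:
            for start in range(len(r) - length + 1):
                candidates |= index.get(r[start:start + length], set())
        # Verification step: length filter first, then the exact edit distance.
        for sid in candidates:
            s = strings[sid]
            if sid > rid and abs(len(r) - len(s)) <= tau and edit_distance(r, s) <= tau:
                results.add((rid, sid))
    return sorted(results)


if __name__ == "__main__":
    words = ["kitten", "sitten", "sitting", "mitten", "hello"]
    print(similarity_self_join(words, 1))
```

The filter step only prunes pairs that cannot match, so the verified output is the same as a brute-force comparison of all pairs; the saving comes from running the quadratic edit-distance computation on far fewer candidates.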

Cite

Lin, C., Yu, H., Weng, W., & He, X. (2014). Large-scale similarity join with edit-distance constraints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8422 LNCS, pp. 328–342). Springer Verlag. https://doi.org/10.1007/978-3-319-05813-9_22
