Similarity search on massive data based on FPGA

Abstract

Data quality is a crucial issue in massive data processing. When we want to distill valuable knowledge from a massive dataset, the key question is whether the dataset is clean, so before extracting useful information we should first perform data cleaning. Similarity search is an important method for data cleaning, and our data cleaning system uses MapReduce to perform it. However, the efficiency is very low: when processing massive data stored in HDFS with the MapReduce programming model, every part of the dataset is scanned, which is very time-consuming, especially for large-scale datasets. In this paper, we perform a filter operation on the original data in hardware before applying similarity search for data cleaning.
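The abstract does not detail the FPGA filter itself, but the general idea of filtering before an expensive similarity search can be sketched in software. The sketch below (a plain-Python illustration, not the paper's implementation) prunes candidate record pairs with a cheap length filter before running a costly edit-distance check, analogous to filtering raw data before the MapReduce similarity-search phase; the function names `edit_distance` and `similar_pairs` are illustrative.

```python
def edit_distance(a, b):
    """Standard dynamic-programming Levenshtein distance."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] holds the distance for the previous row
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

def similar_pairs(records, tau):
    """Return record pairs within edit distance tau, filtering first."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            # Cheap filter: strings whose lengths differ by more than tau
            # can never be within edit distance tau, so skip the costly check.
            if abs(len(a) - len(b)) > tau:
                continue
            if edit_distance(a, b) <= tau:
                pairs.append((a, b))
    return pairs
```

In the paper's setting, the role of the length filter is played by dedicated FPGA hardware that discards irrelevant data before it ever reaches the MapReduce similarity-search job, avoiding a full scan of the dataset in software.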

Citation (APA)

Wang, Y., Gao, H., Shi, S., & Wang, H. (2016). Similarity search on massive data based on FPGA. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9645, pp. 343–352). Springer Verlag. https://doi.org/10.1007/978-3-319-32055-7_28
