Similarity search on massive data based on FPGA

Abstract

Data quality is a crucial issue in massive data processing. When we want to distill valuable knowledge from a massive dataset, the key question is whether the dataset is clean, so before extracting useful information we should first perform data cleaning. Similarity search is an important method for data cleaning, and our data cleaning system uses MapReduce to perform it. However, the efficiency is very low: when processing massive data stored in HDFS with the MapReduce programming model, every part of the dataset is scanned, which is very time-consuming, especially for large-scale datasets. In this paper, we perform a filter operation on the original data in hardware before applying similarity search for data cleaning.
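The abstract does not detail the FPGA filter itself, but the general idea of filtering before an expensive similarity search can be sketched in software. The sketch below (a plain-Python illustration, not the paper's implementation) prunes candidate record pairs with a cheap length filter before running a costly edit-distance check, analogous to filtering raw data before the MapReduce similarity-search phase; the function names `edit_distance` and `similar_pairs` are illustrative.

```python
def edit_distance(a, b):
    """Standard dynamic-programming Levenshtein distance."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] holds the distance for the previous row
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

def similar_pairs(records, tau):
    """Return record pairs within edit distance tau, filtering first."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            # Cheap filter: strings whose lengths differ by more than tau
            # can never be within edit distance tau, so skip the costly check.
            if abs(len(a) - len(b)) > tau:
                continue
            if edit_distance(a, b) <= tau:
                pairs.append((a, b))
    return pairs
```

In the paper's setting, the role of the length filter is played by dedicated FPGA hardware that discards irrelevant data before it ever reaches the MapReduce similarity-search job, avoiding a full scan of the dataset in software.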

Citation (APA)

Wang, Y., Gao, H., Shi, S., & Wang, H. (2016). Similarity search on massive data based on FPGA. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9645, pp. 343–352). Springer Verlag. https://doi.org/10.1007/978-3-319-32055-7_28
