An adaptive similarity search in massive datasets

Trong Nhan Phan; Josef Küng; Tran Khanh Dang

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9480 45-74

DOI: 10.1007/978-3-662-49175-1_3

Abstract

Similarity search is an important task in many fields of study and application domains. The era of big data, however, poses challenges to existing information systems in general and to similarity search in particular. Aiming at large-scale data processing, we propose an adaptive similarity search scheme for massive datasets built on MapReduce. The scheme is applicable and adaptable to popular similarity search cases such as pairwise similarity, search-by-example, range queries, and k-Nearest Neighbour queries. We also embed collaborative refinements that effectively minimize irrelevant data objects and unnecessary computations. We evaluate the proposed methods with two document models, shingles and terms. Finally, we conduct extensive empirical experiments on real datasets, both to verify the methods themselves and to compare them with a previous related work. The results confirm the effectiveness of the proposed methods and show that they outperform the previous work in terms of query processing.
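
To make the query types named in the abstract concrete, the following is a minimal single-machine sketch in Python. It is not the authors' MapReduce scheme and omits their refinements. It assumes Jaccard similarity over sets, which the abstract does not specify, and all function names (shingles, terms, jaccard, range_query, knn_query) are hypothetical.

```python
import heapq


def shingles(text, k=3):
    """Shingle model: set of contiguous k-word sequences (k is an assumed parameter)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def terms(text):
    """Term model: set of distinct lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed measure, not taken from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def range_query(query, docs, model, threshold):
    """Return (doc_id, score) for every document whose similarity is >= threshold."""
    q = model(query)
    hits = []
    for doc_id, text in docs.items():
        score = jaccard(q, model(text))
        if score >= threshold:
            hits.append((doc_id, score))
    return hits


def knn_query(query, docs, model, k):
    """Return the k most similar documents as (doc_id, score), best first."""
    q = model(query)
    scored = ((jaccard(q, model(text)), doc_id) for doc_id, text in docs.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


# Toy corpus, invented for illustration only.
DOCS = {
    "d1": "big data similarity search",
    "d2": "similarity search with mapreduce",
    "d3": "cooking pasta at home",
}
QUERY = "similarity search in big data"
```

Calling range_query(QUERY, DOCS, terms, 0.5) or knn_query(QUERY, DOCS, terms, 2) runs a search-by-example under the term model; passing shingles instead of terms switches the document model without changing the query code.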

Cite

Phan, T. N., Küng, J., & Dang, T. K. (2016). An adaptive similarity search in massive datasets. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9480, 45–74. https://doi.org/10.1007/978-3-662-49175-1_3
