Parallel top-k query processing on uncertain strings using mapreduce

Hui Xu; Xiaofeng Ding; Hai Jin; Wenbin Jiang

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9050 89-103

DOI: 10.1007/978-3-319-18123-3_6

Abstract

The top-k query is an important and essential operator for data analysis over string collections. However, when uncertainty enters big data, new parallel algorithms are needed for efficient query processing over large-scale uncertain strings. In this paper, we propose a MapReduce-based parallel algorithm, called MUSK, for answering top-k queries over large-scale uncertain strings. We use probabilistic n-grams to generate key-value pairs. To improve performance, we derive a novel lower bound on the expected edit distance, based on a newly defined function called the gram mapping distance, and use it to prune strings. By integrating this bound with the Threshold Algorithm (TA), we effectively increase the filtering power of the Map stage and thereby reduce the transmission cost. Comprehensive experiments on both real-world and synthetic datasets show that MUSK outperforms the baseline approach, with speedups of up to 6 times in the best case, and indicate good scalability over large datasets.
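To make the query semantics concrete, here is a minimal, brute-force sketch of a top-k query by expected edit distance. It assumes a character-level uncertain-string model, in which each position holds a list of (character, probability) alternatives that sum to 1. The model and the names `possible_worlds`, `expected_edit_distance` and `top_k` are hypothetical illustrations, not taken from the paper. The code enumerates every possible world, which is exponential in string length. It does not reproduce MUSK's probabilistic n-gram key-value pairs, the gram-mapping-distance lower bound, TA, or any MapReduce stage. It only shows the quantity that such pruning is meant to avoid computing exhaustively.

```python
import heapq
from itertools import product

# Assumed character-level model: one list of (char, probability) alternatives per position.
UncertainString = list[list[tuple[str, float]]]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def possible_worlds(u: UncertainString):
    """Yield (deterministic string, probability) for every possible world of u."""
    for choice in product(*u):
        s = "".join(c for c, _ in choice)
        p = 1.0
        for _, pr in choice:
            p *= pr  # positions are assumed independent
        yield s, p


def expected_edit_distance(query: str, u: UncertainString) -> float:
    """Sum of P(world) * ed(query, world) over all possible worlds."""
    return sum(p * edit_distance(query, s) for s, p in possible_worlds(u))


def top_k(query: str, collection: dict[str, UncertainString], k: int) -> list[str]:
    """Hypothetical brute-force baseline: the k ids with smallest expected edit distance."""
    scored = ((expected_edit_distance(query, u), sid) for sid, u in collection.items())
    return [sid for _, sid in heapq.nsmallest(k, scored)]
```

For example, the uncertain string `[[("a", 1.0)], [("b", 0.5), ("c", 0.5)]]` has the worlds "ab" and "ac", each with probability 0.5, so its expected edit distance to the query "ab" is 0.5. Every candidate in this baseline has to be fully scored. A lower bound on the expected edit distance, of the kind the abstract describes, allows candidates whose bound already exceeds the current k-th best score to be discarded without enumerating their worlds.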

Citation (APA style)

Xu, H., Ding, X., Jin, H., & Jiang, W. (2015). Parallel top-k query processing on uncertain strings using mapreduce. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9050, pp. 89–103). Springer Verlag. https://doi.org/10.1007/978-3-319-18123-3_6