Approximate substring matching over uncertain strings

Abstract

Text data is prevalent in life. Some of this data is uncertain and is best modeled by probability distributions. Examples include biological sequence data and automatic ECG annotations, among others. Approximate substring matching over uncertain texts is a largely unexplored problem in data management. In this paper, we study this intriguing question. We propose a semantics called (k, t)-matching queries and argue that it is more suitable in this context than a related semantics that has been proposed previously. Since uncertainty incurs considerable overhead on both indexing and the final verification of a match, we devise techniques for each. For indexing, we propose a multilevel filtering technique based on measuring signature distance; for verification, we design two algorithms that give upper and lower bounds and significantly reduce the cost. We validate our algorithms with a systematic evaluation on two real-world datasets and some synthetic datasets. © 2011 VLDB Endowment.

Cite

Ge, T., & Li, Z. (2011). Approximate substring matching over uncertain strings. In Proceedings of the VLDB Endowment (Vol. 4, pp. 772–782). VLDB Endowment. https://doi.org/10.14778/3402707.3402717
