Approximate substring matching over uncertain strings

Tingjian Ge; Zheng Li

Conference Proceedings, Open Access

Proceedings of the VLDB Endowment (2011) 4(11) 772-782

DOI: 10.14778/3402707.3402717

Abstract

Text data is prevalent in life. Some of this data is uncertain and is best modeled by probability distributions. Examples include biological sequence data and automatic ECG annotations, among others. Approximate substring matching over uncertain texts is largely an unexplored problem in data management. In this paper, we study this intriguing question. We propose a semantics called (k, t)-matching queries and argue that it is more suitable in this context than a related semantics that has been proposed previously. Since uncertainty incurs considerable overhead on indexing as well as the final verification for a match, we devise techniques for both. For indexing, we propose a multilevel filtering technique based on measuring signature distance; for verification, we design two algorithms that give upper and lower bounds and significantly reduce the costs. We validate our algorithms with a systematic evaluation on two real-world datasets and some synthetic datasets. © 2011 VLDB Endowment.

Cite

Ge, T., & Li, Z. (2011). Approximate substring matching over uncertain strings. In Proceedings of the VLDB Endowment (Vol. 4, pp. 772–782). VLDB Endowment. https://doi.org/10.14778/3402707.3402717
