Optimizing monitoring queries over distributed data

Frank Neven; Dieter Van De Craen

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 3896 LNCS 829-846

DOI: 10.1007/11687238_49


Abstract

Scientific data in the life sciences is distributed over various independent multi-format databases and is constantly expanding. We discuss a scenario in which a life science research lab monitors, over time, the results of queries to remote databases beyond its control. Queries are registered at a local system and executed daily in batch mode. The goal of the paper is to study evaluation strategies that minimize the total number of accesses to databases when all queries are evaluated in bulk. We use an abstraction based on the relational model with fan-out constraints and conjunctive queries. We show that the problem remains NP-hard in two restricted settings: queries of bounded depth, and a fixed schema. We further show that both restrictions taken together yield a tractable problem. As the constant in the latter algorithm is too high to be feasible in practice, we present four heuristic methods that are experimentally compared on randomly generated and biologically motivated schemas. Our algorithms are based on a greedy method and on approximations for the shortest common supersequence problem. © Springer-Verlag Berlin Heidelberg 2006.
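
The abstract names the shortest common supersequence problem as one building block but does not spell out the reduction. The following is a minimal sketch under a loudly stated assumption: each registered query is abstracted as the ordered sequence of remote databases it must visit, and a single access in a shared bulk plan serves every query that needs that database next. Under that assumption, a bulk plan is a common supersequence of the query sequences, and its length is the number of accesses. The code uses the textbook greedy "majority merge" heuristic for the shortest common supersequence; it is not the authors' algorithm, and the names `majority_merge`, `is_supersequence` and the databases A, B, C are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_merge(seqs: Sequence[Sequence[str]]) -> List[str]:
    """Greedy 'majority merge' heuristic for shortest common supersequence.

    Repeatedly emits the symbol that appears most often at the front of the
    remaining sequences, then consumes it from every sequence it heads.
    Ties go to the symbol encountered first (Counter.most_common order).
    """
    remaining = [list(s) for s in seqs if s]
    plan: List[str] = []
    while remaining:
        # Count which database each unfinished (hypothetical) query needs next.
        counts = Counter(s[0] for s in remaining)
        best, _ = counts.most_common(1)[0]
        plan.append(best)
        # Assumption: one access to `best` serves every query waiting on it.
        remaining = [s[1:] if s[0] == best else s for s in remaining]
        remaining = [s for s in remaining if s]
    return plan


def is_supersequence(sup: Sequence[str], seq: Sequence[str]) -> bool:
    """True if `seq` is a (not necessarily contiguous) subsequence of `sup`."""
    it = iter(sup)
    # `x in it` advances the iterator, so order is respected.
    return all(x in it for x in seq)


if __name__ == "__main__":
    # Hypothetical queries, each abstracted as the ordered databases it visits.
    queries = [["A", "B", "C"], ["A", "C"], ["B", "C"]]
    plan = majority_merge(queries)
    print(plan)                          # ['A', 'B', 'C']
    print(sum(len(q) for q in queries))  # 7 accesses if evaluated one by one
    assert all(is_supersequence(plan, q) for q in queries)
```

In this toy instance the shared plan needs 3 accesses instead of 7. Majority merge is only a heuristic: it carries no constant-factor approximation guarantee in general, which is consistent with the abstract's point that practical evaluation relies on heuristics rather than the exact tractable algorithm.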

Citation (APA)

Neven, F., & Van De Craen, D. (2006). Optimizing monitoring queries over distributed data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3896 LNCS, pp. 829–846). https://doi.org/10.1007/11687238_49
