To date, work on caching for OLAP workloads has focused on using the cached result of a previous query as the answer to another query. This strategy is effective when the query stream exhibits a high degree of locality. Unfortunately, it misses the dramatic performance improvements obtainable when the answer to a query, while not immediately available in the cache, can be computed from data in the cache. In this paper, we consider the common subcase of answering queries by aggregating data in the cache. To use aggregation in the cache, one must solve two subproblems: (1) determining when it is possible to answer a query by aggregating data in the cache, and (2) determining the fastest path for this aggregation, since there can be many. We present two strategies: a naive one and a virtual-count-based one. The virtual-count-based method determines almost instantaneously whether a query is computable from the cache, at the cost of a small overhead for maintaining summary state about the cache. The algorithm also maintains cost-based information that can be used to choose the best option for computing a query result from the cache. Experiments with our implementation show that aggregation in the cache leads to substantial performance improvement. The virtual-count-based methods further improve performance over the naive approaches in terms of cache lookup and aggregation times.
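The two subproblems named in the abstract can be illustrated with a small Python sketch. It is a deliberately simplified stand-in for a naive strategy, not the paper's algorithm and not the virtual count method: it assumes the cache stores whole group-by results keyed by their set of dimensions, that the measure is a sum named `sales`, and that row count is an adequate cost estimate. The names `find_cheapest_source`, `answer_from_cache`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def find_cheapest_source(query_dims, cache):
    """Subproblem 1 and 2: is the query computable, and from which entry?

    A cached group-by can answer the query if it contains every
    dimension the query groups on (it is at least as fine-grained).
    Among such entries, pick the one with the fewest rows, an assumed
    proxy for aggregation cost. This scans the whole cache on each lookup.
    """
    candidates = [dims for dims in cache if query_dims <= dims]
    if not candidates:
        return None
    return min(candidates, key=lambda dims: (len(cache[dims]), sorted(dims)))


def answer_from_cache(query_dims, cache):
    """Aggregate the cheapest usable cached entry down to query_dims."""
    source = find_cheapest_source(query_dims, cache)
    if source is None:
        return None  # not computable from the cache; go to the backend
    key_dims = sorted(query_dims)
    totals = defaultdict(int)
    for row in cache[source]:
        totals[tuple(row[d] for d in key_dims)] += row["sales"]
    return dict(totals)


# Hypothetical cache contents: two group-bys over the same base data.
cache = {
    frozenset({"product", "store", "month"}): [
        {"product": "p1", "store": "s1", "month": "jan", "sales": 10},
        {"product": "p1", "store": "s1", "month": "feb", "sales": 3},
        {"product": "p1", "store": "s2", "month": "jan", "sales": 5},
        {"product": "p2", "store": "s1", "month": "feb", "sales": 7},
    ],
    frozenset({"product", "store"}): [
        {"product": "p1", "store": "s1", "sales": 13},
        {"product": "p1", "store": "s2", "sales": 5},
        {"product": "p2", "store": "s1", "sales": 7},
    ],
}

by_product = answer_from_cache(frozenset({"product"}), cache)
by_month = answer_from_cache(frozenset({"month"}), cache)
by_region = answer_from_cache(frozenset({"region"}), cache)
```

Here a query grouped by `product` can be answered from either cached entry, and the sketch picks the smaller `(product, store)` result; a query grouped by `month` can only use the finer entry; a query on `region` is not computable from the cache. The full scan in `find_cheapest_source` is the kind of lookup cost that maintaining summary state about the cache, as the abstract describes, is meant to avoid.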
CITATION STYLE
Deshpande, P. M., & Naughton, J. F. (2000). Aggregate aware caching for multi-dimensional queries. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1777, pp. 167–182). Springer Verlag. https://doi.org/10.1007/3-540-46439-5_11