Our aim is to provide efficient partitioning and replication of data. We seek to accommodate a variety of transaction types (both short and long-running, read-oriented and write-oriented) to support workloads in cloud environments. We do so by introducing an approach that partitions data into small units, which we call micropartitions, and allocates them to multiple database nodes. Only the data that the workload needs is made available, in the form of micropartitions, and transactions are routed directly to the appropriate micropartitions. First, we use an agglomerative hierarchical clustering technique to group the workload queries based on their data requirements. We then represent each cluster with an abstract query definition: a query statement that represents the minimal data requirements that would satisfy all the queries belonging to that cluster. A micropartition is realized by executing the abstract query. We show that our abstract query definition is complete and minimal. Intuitively, completeness means that all queries of the corresponding cluster can be correctly answered using the micropartition generated from the abstract query. Minimality means that no smaller partition of the data can satisfy all of the queries in the cluster. Our empirical results show that our approach improves data access efficiency over standard partitioning of data.
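The clustering and abstract-query steps described above can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes each query's data requirements can be modeled as a set of (table, column) pairs, uses Jaccard distance with single linkage, and stops merging at a hypothetical distance threshold. The names `cluster_queries`, `abstract_requirement`, and the sample queries are illustrative only. The abstract requirement of a cluster is taken to be the union of its members' requirements, which covers every query in the cluster (completeness) and contains nothing that some query does not need (minimality).

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two requirement sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_queries(requirements, threshold):
    """Single-linkage agglomerative clustering of query names.

    requirements: dict mapping a query name to a frozenset of data items.
    threshold: assumed cutoff; merging stops once the closest pair of
               clusters is farther apart than this value.
    """
    clusters = [{name} for name in requirements]

    def linkage(c1, c2):
        # Single linkage: distance between the two closest members.
        return min(
            jaccard_distance(requirements[q1], requirements[q2])
            for q1 in c1
            for q2 in c2
        )

    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j always holds).
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def abstract_requirement(cluster, requirements):
    """Union of the requirements of all queries in the cluster."""
    return frozenset().union(*(requirements[q] for q in cluster))


# Hypothetical workload: each query lists the (table, column) pairs it reads.
requirements = {
    "q1": frozenset({("orders", "id"), ("orders", "total")}),
    "q2": frozenset({("orders", "id"), ("orders", "total"), ("orders", "date")}),
    "q3": frozenset({("customers", "id"), ("customers", "name")}),
    "q4": frozenset({("customers", "id"), ("customers", "email")}),
}

clusters = cluster_queries(requirements, threshold=0.7)
for cluster in clusters:
    print(sorted(cluster), sorted(abstract_requirement(cluster, requirements)))
```

With this sample workload, the two queries on `orders` end up in one cluster and the two on `customers` in another, so each cluster would map to one micropartition. In a real system the abstract requirement would be a query statement over rows and predicates as well as columns, which this set-based sketch does not attempt to model.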
CITATION STYLE
Kish, A. V., Rose, J. R., & Farkas, C. (2015). Efficient partitioning and allocation of data for workload queries. Lecture Notes in Electrical Engineering, 313, 549–555. https://doi.org/10.1007/978-3-319-06773-5_73