Data placement in Bubba

George Copeland; William Alexander; Ellen Boughter; Tom Keller

Proceedings of the ACM SIGMOD International Conference on Management of Data (1988), Vol. 1988-June, pp. 99–108

DOI: 10.1145/50202.50213

Abstract

This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. "Highly-parallel" implies that load balancing is a critical performance issue. "Data-intensive" means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue. In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup and termination overhead. We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.

Cite

Copeland, G., Alexander, W., Boughter, E., & Keller, T. (1988). Data placement in Bubba. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Vol. 1988-June, pp. 99–108). Association for Computing Machinery. https://doi.org/10.1145/50202.50213
