Online aggregation for large MapReduce jobs

Niketan Pansare; Vinayak Borkar; Chris Jermaine; Tyson Condie

Conference Proceedings (Open Access)

Proceedings of the VLDB Endowment (2011) 4(11) 1135-1145

DOI: 10.14778/3402707.3402748

139 citations

96 readers

Abstract

In online aggregation, a database system processes a user's aggregation query in an online fashion. At all times during processing, the system gives the user an estimate of the final query result, with confidence bounds that become tighter over time. In this paper, we consider how online aggregation can be built into a MapReduce system for large-scale data processing. Given the MapReduce paradigm's close relationship with cloud computing (in that one might expect a large fraction of MapReduce jobs to be run in the cloud), online aggregation is a very attractive technology. Since large-scale cloud computations are typically pay-as-you-go, a user can monitor the accuracy obtained in an online fashion, and then save money by killing the computation early once sufficient accuracy has been obtained. © 2011 VLDB Endowment.
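To make the "estimate with tightening confidence bounds, then stop early" idea concrete, here is a minimal sketch of classic sampling-based online aggregation for a SUM. It is not the estimator proposed in this paper. It assumes that input blocks (for example, per-split partial sums) are processed in a uniformly random order that is independent of their values, and it uses a CLT-based interval with a finite-population correction. The names `online_sum`, `rel_tol`, `min_blocks`, and `z` are hypothetical, chosen for illustration only.

```python
import math
import random
from typing import List, Tuple


def online_sum(block_totals: List[float], rel_tol: float = 0.05, z: float = 1.96,
               min_blocks: int = 10, seed: int = 0) -> Tuple[float, float, int]:
    """Hypothetical sketch: estimate sum(block_totals) by scanning blocks in random order.

    Returns (estimate, half_width, blocks_processed). Assumes blocks arrive in a
    random order independent of their values, and stops as soon as the confidence
    half-width is within rel_tol of the current estimate (the "kill early" step).
    """
    n_total = len(block_totals)
    order = list(range(n_total))
    random.Random(seed).shuffle(order)  # simulate random block processing order

    count = 0
    mean = 0.0
    m2 = 0.0  # Welford's running sum of squared deviations from the mean
    estimate = 0.0
    half_width = 0.0 if n_total == 0 else float("inf")

    for idx in order:
        x = block_totals[idx]
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        estimate = n_total * mean  # scale the sample mean up to the full input

        if count == n_total:
            half_width = 0.0  # every block has been seen, so the answer is exact
            break
        if count < 2:
            continue  # sample variance needs at least two observations

        var = m2 / (count - 1)
        fpc = (n_total - count) / (n_total - 1)  # finite-population correction
        half_width = z * n_total * math.sqrt(var / count * fpc)

        # User-facing stopping rule: accuracy is "good enough", stop paying for compute.
        if count >= min_blocks and half_width <= rel_tol * abs(estimate):
            break

    return estimate, half_width, count
```

For example, `online_sum([10.0] * 1000)` stops after `min_blocks` blocks with an estimate of 10000.0 and a zero-width interval, because the blocks have no variance. The assumption of value-independent processing order is the fragile part: on a real cluster, block processing order may be correlated with block contents, which would bias a naive estimator like this one.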

Citation (APA)

Pansare, N., Borkar, V., Jermaine, C., & Condie, T. (2011). Online aggregation for large MapReduce jobs. In Proceedings of the VLDB Endowment (Vol. 4, pp. 1135–1145). VLDB Endowment. https://doi.org/10.14778/3402707.3402748
