Online aggregation for large MapReduce jobs

Abstract

In online aggregation, a database system processes a user's aggregation query in an online fashion. At all times during processing, the system gives the user an estimate of the final query result, along with confidence bounds that become tighter over time. In this paper, we consider how online aggregation can be built into a MapReduce system for large-scale data processing. Given the MapReduce paradigm's close relationship with cloud computing (one might expect a large fraction of MapReduce jobs to be run in the cloud), online aggregation is a very attractive technology. Since large-scale cloud computations are typically pay-as-you-go, a user can monitor the accuracy of the estimate as the job runs, and then save money by killing the computation early once sufficient accuracy has been obtained. © 2011 VLDB Endowment.
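The idea of a running estimate whose confidence bounds tighten over time, with early termination once the bounds are tight enough, can be illustrated with a minimal sketch. This is not the estimator from the paper: it is a textbook running mean with a normal-approximation confidence interval, and it assumes records arrive in uniformly random order and ignores finite-population effects. The names `online_mean`, `run_demo`, `target_half_width`, and `min_samples` are hypothetical and chosen only for this illustration.

```python
import math
import random


def online_mean(stream, z=1.96, target_half_width=None, min_samples=30):
    """Yield (n, estimate, half_width) after each record from the second on.

    Assumes records arrive in random order, so the prefix seen so far
    behaves like a random sample. Stops early once the interval is
    narrow enough and at least min_samples records have been seen.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < 2:
            continue  # sample variance is undefined for a single record
        variance = m2 / (n - 1)
        half_width = z * math.sqrt(variance / n)
        yield n, mean, half_width
        if (target_half_width is not None
                and n >= min_samples
                and half_width <= target_half_width):
            return  # "kill the computation" once accuracy suffices


def run_demo(seed=0, size=100_000, target_half_width=0.5):
    """Run the estimator on synthetic data and return the last report."""
    rng = random.Random(seed)
    data = [rng.gauss(50.0, 10.0) for _ in range(size)]
    rng.shuffle(data)  # illustrative stand-in for random-order processing
    last = None
    for last in online_mean(data, target_half_width=target_half_width):
        pass
    return last


if __name__ == "__main__":
    n, estimate, half_width = run_demo()
    print(f"stopped after {n} records: {estimate:.2f} +/- {half_width:.2f}")
```

In this sketch the loop stops after reading only a fraction of the synthetic data, which is the cost-saving behavior the abstract describes for pay-as-you-go settings. A real MapReduce deployment would have to justify the random-order assumption, which this toy example simply takes for granted.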

Cite

Pansare, N., Borkar, V., Jermaine, C., & Condie, T. (2011). Online aggregation for large MapReduce jobs. In Proceedings of the VLDB Endowment (Vol. 4, pp. 1135–1145). VLDB Endowment. https://doi.org/10.14778/3402707.3402748
