Big Data Storage and Processing on Azure Clouds: Experiments at Scale and Lessons Learned

  • Tudoran R
  • Costan A
  • Antoniu G
  • et al.

Abstract

Data-intensive computing is increasingly regarded as the basis for a new, fourth paradigm for science. Two factors encourage this trend. First, vast amounts of data are becoming available in more and more application areas. Second, infrastructures that can persistently store these data for sharing and processing are becoming a reality. This makes it possible to unify the knowledge acquired through the previous three paradigms of scientific research (theory, experiments and simulations) with vast amounts of multidisciplinary data. The technical and scientific issues related to this context have been designated as the "Big Data" challenges. In this landscape, building a functional infrastructure that meets the requirements of Big Data applications is critical and remains a challenge. An important step has been made thanks to the emergence of cloud infrastructures, which provide the first building blocks for coping with the challenging scale of the Big Data vision. Clouds bring to life the illusion of a (more-or-less) infinitely scalable infrastructure managed through a fully outsourced ICT service. Instead of having to buy and manage hardware, users "rent" outsourced resources as needed. However, cloud technologies have not yet reached their full potential. In particular, the capabilities currently available for data storage and processing are still far from meeting application requirements. In this work we investigate several pressing challenges related to Big Data management on clouds. We discuss current state-of-the-art solutions, their limitations and some ways to overcome them. We illustrate our study with a concrete application from the area of joint genetic and neuroimaging data analysis. The goal of this chapter is to present the conclusions of this study, which was performed through a large-scale experiment carried out across three data centers of Microsoft's Azure cloud platform over 2 weeks and consumed approximately 200,000 compute hours.

Cite

Tudoran, R., Costan, A., Antoniu, G., & Goetz, B. (2014). Big Data Storage and Processing on Azure Clouds: Experiments at Scale and Lessons Learned. In Cloud Computing for Data-Intensive Applications (pp. 331–355). Springer New York. https://doi.org/10.1007/978-1-4939-1905-5_14