A comparison of systems to large-scale data access

Abstract

With the amount of data produced in several application domains, it is increasingly difficult to manage and query the resulting large data repositories (https://www.lsstcorp.org/sciencewiki/images/DC- Handbook-v1.1.pdf). Within the PetaSky project, we focus on the problem of managing scientific data in the field of cosmology. The data we consider are those of the LSST project, whose database is expected to exceed 60 PB overall. This paper presents preliminary results of experiments conducted on the PT1.1 data set (90 GB; http://lsst1.ncsa.uiuc.edu/schema/index.php?sVer=PT1-1) and the PT1.2 data set (145 GB; http://lsst1.ncsa.uiuc.edu/schema/index.php?sVer=PT1-1) in order to compare the performance of centralized and distributed database management systems. For centralized systems, we deployed three DBMSs: MySQL, PostgreSQL and DBMS-X (a commercial relational database). For distributed systems, we deployed HadoopDB and Hive. The goal of these experiments is to report on the ability of these systems to support large-scale declarative queries. We mainly investigate the impact of data partitioning, indexing and compression on query execution performance. © 2014 Springer-Verlag Berlin Heidelberg.

Cite

Mesmoudi, A., & Hacid, M. S. (2014). A comparison of systems to large-scale data access. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8505 LNCS, pp. 161–175). Springer Verlag. https://doi.org/10.1007/978-3-662-43984-5_12
