A Comparison of HDFS Compact Data Formats: Avro Versus Parquet

  • Plase D
  • Niedrite L
  • Taranovs R

Abstract

This paper compares the compact file formats Avro and Parquet with plain text formats to evaluate data query performance. Several data query patterns are evaluated. The experiments use CDH 5.4, Cloudera’s open-source Apache Hadoop distribution. The results show that the compact data formats (Avro and Parquet) take up less storage space than plain text data formats because they use binary encoding and compression. Furthermore, queries against the column-based Parquet format are faster than queries against text data formats and Avro.
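
One likely reason for the query-speed result is the difference in data layout: a row-based format such as Avro stores all fields of a record together, while a column-based format such as Parquet stores all values of a field together. The following is a minimal sketch of that idea in pure Python, not the paper's benchmark and not real Avro or Parquet I/O. The sample records, field names, and function names are hypothetical, and "values read" is only a rough stand-in for I/O cost.

```python
# Toy model of row-oriented vs. column-oriented storage.
# This does NOT read or write real Avro/Parquet files; the data and
# all names below are hypothetical and chosen only for illustration.

records = [
    {"id": 1, "city": "Riga", "amount": 10},
    {"id": 2, "city": "Vilnius", "amount": 20},
    {"id": 3, "city": "Tallinn", "amount": 30},
]

# Row layout: the fields of each record are stored together (Avro-like).
row_store = [tuple(r.values()) for r in records]

# Column layout: the values of each field are stored together (Parquet-like).
column_store = {key: [r[key] for r in records] for key in records[0]}


def sum_amount_rows(rows):
    """Sum the 'amount' field; every field of every row is touched."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole record is read
        total += row[2]          # 'amount' is the third field
    return total, values_read


def sum_amount_columns(columns):
    """Sum the 'amount' field; only that one column is touched."""
    column = columns["amount"]
    return sum(column), len(column)


row_total, row_reads = sum_amount_rows(row_store)
col_total, col_reads = sum_amount_columns(column_store)
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts give the same total, but the column layout touches one value per record instead of all three. Real formats add encoding, compression, and block structure on top of this, so the sketch only indicates the direction of the effect, not its size.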

Cite

Plase, D., Niedrite, L., & Taranovs, R. (2017). A Comparison of HDFS Compact Data Formats: Avro Versus Parquet. Mokslas - Lietuvos Ateitis, 9(3), 267–276. https://doi.org/10.3846/mla.2017.1033
