File system performance tuning for standard big data benchmarks

Abstract

Modern file systems manage very large data sets to support data-intensive, cost-effective analytical processing. The performance of a file system depends on the storage hardware, the software stack, the workload characteristics, and the configuration. Complex analysis techniques are required because the data are often a hybrid mix of formats and differently structured datasets. Performance studies help optimize these factors and improve the design of a file system to meet the requirements of a specific application. A promising approach is to place the diverse data of various applications on different file systems according to their individual properties, so that each particular application receives the best possible performance. This paper presents criteria that capture the characteristics and scenarios of each step of the data execution procedure. Based on workload characteristic analysis, administrators can then apply tuning methods within the large, high-dimensional space of configuration parameters provided by the platform. Preliminary results are obtained by running the standard benchmarks TPCx-HS, TPCx-BB, TPC-H, and HiBench K-means on the Ext4 and Btrfs file systems, and the impact of workload characteristics on benchmark performance is analysed.
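To make the idea of tuning within the configuration parameter space concrete, the sketch below is a minimal, hypothetical illustration (not the paper's actual tooling) of sweeping a few Ext4 and Btrfs mount-option combinations and timing a workload under each. The device path, mount point, and benchmark command are placeholders and would need to be adapted to a real test environment; the script must also run with root privileges, and it reformats the test device on every iteration.

"""
Hedged sketch: time a workload under a few Ext4/Btrfs configurations.
DEVICE, MOUNT_POINT, and BENCH_CMD are hypothetical placeholders.
WARNING: the device is reformatted for every run, destroying its contents.
"""
import subprocess
import time

DEVICE = "/dev/sdb1"          # hypothetical dedicated test device
MOUNT_POINT = "/mnt/bench"    # hypothetical mount point
BENCH_CMD = ["sleep", "1"]    # stand-in for a real benchmark driver

# Illustrative subsets of the large configuration space discussed in the abstract.
OPTION_SPACE = {
    "ext4":  [["noatime"], ["noatime", "data=writeback"]],
    "btrfs": [["noatime"], ["noatime", "compress=zstd"], ["nodatacow"]],
}

# Commands used to (re)create each file system on the test device.
MKFS = {
    "ext4":  ["mkfs.ext4", "-F", DEVICE],
    "btrfs": ["mkfs.btrfs", "-f", DEVICE],
}

def run(cmd):
    """Run a command and return its exit status without raising."""
    return subprocess.run(cmd, check=False).returncode

def benchmark(fs_type, options):
    """Format DEVICE as fs_type, mount it with options, run and time the workload."""
    opt_string = ",".join(options)
    run(["umount", MOUNT_POINT])                     # ignore failure if not mounted
    if run(MKFS[fs_type]) != 0:
        print(f"skip {fs_type}: mkfs failed")
        return
    if run(["mount", "-t", fs_type, "-o", opt_string, DEVICE, MOUNT_POINT]) != 0:
        print(f"skip {fs_type} -o {opt_string}: mount failed")
        return
    start = time.monotonic()
    run(BENCH_CMD)
    elapsed = time.monotonic() - start
    print(f"{fs_type:6s} -o {opt_string:30s} {elapsed:8.2f} s")

if __name__ == "__main__":
    for fs_type, option_sets in OPTION_SPACE.items():
        for options in option_sets:
            benchmark(fs_type, options)

In practice the benchmark driver would invoke the TPCx-HS, TPCx-BB, TPC-H, or HiBench K-means harness rather than the placeholder command, and the option sets would be chosen from the workload characteristic analysis described in the paper.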

Cite

APA

Ren, D. Q., & Xia, B. (2018). File system performance tuning for standard big data benchmarks. In ACM International Conference Proceeding Series (Vol. Part F137704, pp. 22–26). Association for Computing Machinery. https://doi.org/10.1145/3219788.3219809
