Maximizing the performance of scientific data transfer by optimizing the interface between parallel file systems and advanced research networks

Abstract

The large amount of time spent transferring experimental data in fields such as genomics is hampering the ability of scientists to generate new knowledge. Often, computer hardware is capable of faster transfers but sub-optimal transfer software and configurations are limiting performance. This work seeks to serve as a guide to identifying the optimal configuration for performing genomics data transfers. A wide variety of tests narrow in on the optimal data transfer parameters for parallel data streaming across Internet2 and between two CloudLab clusters loading real genomics data onto a parallel file system. The best throughput was found to occur with a configuration using GridFTP with at least 5 parallel TCP streams with a 16 MiB TCP socket buffer size to transfer to/from 4–8 BeeGFS parallel file system nodes connected by InfiniBand.
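As a rough illustration of the configuration the abstract reports, the sketch below assembles a `globus-url-copy` command line (the standard GridFTP client) with 5 parallel TCP streams and a 16 MiB TCP socket buffer. The helper name `build_gridftp_command`, the host names, and the file paths are hypothetical and not taken from the paper. The script only builds and prints the command; it does not run a transfer.

```python
import shlex

MIB = 1024 * 1024  # bytes per mebibyte


def build_gridftp_command(src_url, dst_url, streams=5, tcp_buffer_mib=16):
    """Return an argv list for a globus-url-copy transfer.

    Defaults mirror the values reported in the abstract:
    5 parallel TCP streams and a 16 MiB TCP socket buffer.
    """
    if streams < 1:
        raise ValueError("streams must be at least 1")
    if tcp_buffer_mib < 1:
        raise ValueError("tcp_buffer_mib must be at least 1")
    return [
        "globus-url-copy",
        "-p", str(streams),                    # number of parallel TCP streams
        "-tcp-bs", str(tcp_buffer_mib * MIB),  # TCP buffer size, in bytes
        src_url,
        dst_url,
    ]


# Hypothetical endpoints and paths, for illustration only.
cmd = build_gridftp_command(
    "gsiftp://source.example.org/data/sample.fastq",
    "gsiftp://dest.example.org/mnt/beegfs/sample.fastq",
)
print(shlex.join(cmd))
```

The abstract says "at least 5" streams, so `streams` is best treated as a lower bound to tune per network path rather than a fixed value.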

Cite

Mills, N., Feltus, F. A., & Ligon, W. B. (2018). Maximizing the performance of scientific data transfer by optimizing the interface between parallel file systems and advanced research networks. Future Generation Computer Systems, 79, 190–198. https://doi.org/10.1016/j.future.2017.04.030
