Abstract
The fat-tree topology is one of the most commonly used network topologies in HPC systems. Vendors support several options that can be configured when deploying fat-tree networks on production systems, such as link bandwidth, number of rails, number of planes, and tapering. This paper showcases the use of simulations to compare the impact of these design options on representative production HPC applications, libraries, and multi-job workloads. We present advances in the TraceR-CODES simulation framework that enable this analysis and evaluate its prediction accuracy against experiments on a production fat-tree network. In order to understand the impact of different network configurations on various anticipated scenarios, we study workloads with different communication patterns, computation-to-communication ratios, and scaling characteristics. Using multi-job workloads, we also study the impact of inter-job interference on performance and compare the cost-performance tradeoffs.
CCS Concepts: • Networks → Network performance modeling; Network simulations; Network performance analysis
Cite
Jain, N., Bhatele, A., Howell, L. H., Bohme, D., Karlin, I., Leon, E. A., … Leininger, M. L. (2017). Predicting the Performance Impact of Different Fat-Tree Configurations. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC (Vol. 2017-November). IEEE Computer Society. https://doi.org/10.1145/3126908.3126967