In this paper we explore network topologies suitable for future exascale systems that need to support over fifty thousand endpoints. With the increased necessity to use optics at higher link speeds, some of the more traditional topologies, such as Tori and Fat-Trees, become prohibitively expensive at such large scale. We identify two cost efficient hierarchical topologies, one a canonical Dragonfly, and one a variant of the Dragonfly topology that we call Megafly. Megafly is an indirect hierarchical topology with high path diversity, flexible tapering options and an abundance of possible system design points. We describe and analyze the Megafly topology to understand its key features and advantages, when compared to the Dragonfly. Additionally, we define a Megafly tapering scheme that enables a good balance of system performance versus cost. Our evaluation shows that the Megafly topology achieves equal or better throughput than the Dragonfly on a variety of traffic patterns, while requiring only half of the virtual channels for deadlock-free routing. Megafly also provides better fairness, which is shown in the evaluation of synchronizing traffic patterns, such as neighbor exchanges. We also showcase the design flexibility and cost vs. performance trade-offs of Megafly in a mini case study that illustrates the challenges of building a high performance fabric topology.
CITATION STYLE
Flajslik, M., Borch, E., & Parker, M. A. (2018). Megafly: A topology for exascale systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10876 LNCS, pp. 289–310). Springer Verlag. https://doi.org/10.1007/978-3-319-92040-5_15
Mendeley helps you to discover research relevant for your work.