Flexer: Out-of-Order Scheduling for Multi-NPUs

7Citations
Citations of this article
4Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Recent neural accelerators often comprise multiple neural processing units (NPUs) with shared cache and memory. The regular schedules of state-of-the-art scheduling techniques miss important opportunities for memory reuse. This paper presents Flexer, an out-of-order (OoO) scheduler that maximizes instruction-level parallelism and data reuse on such multi-NPU systems. Flexer employs a list scheduling algorithm to dynamically schedule the tiled workload to all NPUs. To cope with the irregular data access patterns of OoO schedules, several heuristics help maximize data reuse by considering the availability of data tiles at different levels in the memory hierarchy. Evaluated with several neural networks on 2 to 4-core multi-NPUs, Flexer achieves a speedup of up to 2.2x and a 1.2-fold reduction in data transfers for individual layers compared to the best static execution order.

Cite

CITATION STYLE

APA

Min, H., Kwon, J., & Egger, B. (2023). Flexer: Out-of-Order Scheduling for Multi-NPUs. In CGO 2023 - Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (pp. 212–223). Association for Computing Machinery, Inc. https://doi.org/10.1145/3579990.3580025

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free