Abstract
Recent neural accelerators often comprise multiple neural processing units (NPUs) that share a cache and memory. The regular schedules produced by state-of-the-art scheduling techniques miss important opportunities for memory reuse. This paper presents Flexer, an out-of-order (OoO) scheduler that maximizes instruction-level parallelism and data reuse on such multi-NPU systems. Flexer employs a list scheduling algorithm to dynamically schedule the tiled workload across all NPUs. To cope with the irregular data access patterns of OoO schedules, several heuristics maximize data reuse by considering the availability of data tiles at the different levels of the memory hierarchy. Evaluated with several neural networks on 2- to 4-core multi-NPU systems, Flexer achieves a speedup of up to 2.2x and a 1.2-fold reduction in data transfers for individual layers compared to the best static execution order.
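To illustrate the kind of list scheduling with reuse heuristics the abstract describes, the sketch below shows a greedy scheduler that dispatches ready tile operations to whichever NPU becomes free and prefers operations whose input tiles are already resident in the shared cache. This is not the paper's implementation: the names (TileOp, schedule, shared_cache), the single reuse heuristic, and the eviction-free cache model are all simplifying assumptions standing in for the several heuristics and memory levels Flexer considers.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set

@dataclass
class TileOp:
    op_id: int
    cycles: int                                   # estimated execution latency
    inputs: FrozenSet[int]                        # ids of data tiles this op reads
    deps: Set[int] = field(default_factory=set)   # predecessor op ids

def schedule(ops: List[TileOp], npu_count: int = 4):
    """Greedy list scheduling: whenever an NPU becomes free, dispatch the
    ready op whose input tiles are best covered by the shared cache."""
    by_id = {op.op_id: op for op in ops}
    remaining_deps = {op.op_id: set(op.deps) for op in ops}
    successors = {op.op_id: [] for op in ops}
    for op in ops:
        for d in op.deps:
            successors[d].append(op.op_id)

    shared_cache: Set[int] = set()        # tile ids currently resident (no eviction modeled)
    finish = {}                           # op_id -> finish cycle
    npu_free_at = [0] * npu_count         # next free cycle of each NPU
    ready = [op.op_id for op in ops if not op.deps]
    order = []                            # (op_id, npu, start cycle)

    while ready:
        npu = min(range(npu_count), key=npu_free_at.__getitem__)
        # Reuse heuristic: prefer the ready op with the most cached input tiles.
        ready.sort(key=lambda i: len(by_id[i].inputs & shared_cache), reverse=True)
        oid = ready.pop(0)
        op = by_id[oid]
        start = max(npu_free_at[npu], max((finish[d] for d in op.deps), default=0))
        finish[oid] = start + op.cycles
        npu_free_at[npu] = finish[oid]
        shared_cache |= op.inputs          # inputs are now resident for later consumers
        order.append((oid, npu, start))
        for s in successors[oid]:          # release dependents that become ready
            remaining_deps[s].discard(oid)
            if not remaining_deps[s]:
                ready.append(s)
    return order
```

In this toy model, the reuse-aware ordering is the only deviation from plain list scheduling; the paper's dynamic, out-of-order scheduler additionally weighs tile availability across multiple memory levels when ranking ready operations.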
Citation
Min, H., Kwon, J., & Egger, B. (2023). Flexer: Out-of-Order Scheduling for Multi-NPUs. In CGO 2023 - Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (pp. 212–223). Association for Computing Machinery, Inc. https://doi.org/10.1145/3579990.3580025