Convolution is one of the most computationally intensive operations in machine-learning model inference. The traditional approach to computing convolutions is the Im2Col + BLAS method. This article proposes SConv: a direct-convolution algorithm, based on an MLIR/LLVM code-generation toolchain, that can be integrated into machine-learning compilers. The algorithm introduces: (a) Convolution Slicing Analysis (CSA), a convolution-specific 3D cache-blocking analysis pass that focuses on tile reuse over the cache hierarchy; (b) Convolution Slicing Optimization, a code-generation pass that uses CSA to generate a tiled direct-convolution macro-kernel; and (c) Vector-Based Packing, an architecture-specific optimized input-tensor packing solution based on vector-register shift instructions for convolutions with unitary stride. Experiments conducted on 393 convolutions from full ONNX-MLIR machine-learning models indicate that eliminating the Im2Col transformation and using fast packing routines reduce total packing time, on full model inference, by 2.3×-4.0× on Intel x86 and 3.3×-5.9× on IBM POWER10. The speedup over an Im2Col + BLAS method based on current BLAS implementations for end-to-end machine-learning model inference is in the range of 11%-27% on Intel x86 and 11%-34% on IBM POWER10. The total convolution speedup for model inference is 13%-28% on Intel x86 and 23%-39% on IBM POWER10. SConv also outperforms BLAS GEMM when computing pointwise convolutions in more than 82% of the 219 tested instances.
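For context, the Im2Col + BLAS baseline that SConv replaces lowers a convolution into a single GEMM by first unrolling input patches into the columns of a matrix. The sketch below is a minimal, illustrative NumPy version (not the paper's implementation); the function names `im2col` and `conv_via_gemm`, the CHW tensor layout, and the absence of padding are assumptions made for brevity.

```python
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Unroll (C, H, W) input patches into a (C*kh*kw, out_h*out_w) matrix."""
    C, H, W = x.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    col = 0
    for i in range(out_h):
        for j in range(out_w):
            # Each output position contributes one flattened input patch.
            patch = x[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
            cols[:, col] = patch.ravel()
            col += 1
    return cols

def conv_via_gemm(x, w, stride=1):
    """Convolution as one GEMM: filters (M, C, kh, kw) times the im2col matrix."""
    M, C, kh, kw = w.shape
    cols = im2col(x, kh, kw, stride)       # (C*kh*kw, out_h*out_w)
    out = w.reshape(M, -1) @ cols          # the single large GEMM
    out_h = (x.shape[1] - kh) // stride + 1
    out_w = (x.shape[2] - kw) // stride + 1
    return out.reshape(M, out_h, out_w)
```

The packing cost the abstract refers to is visible here: `im2col` materializes every input patch, duplicating data by a factor of up to `kh*kw`, which is the memory traffic SConv's direct macro-kernel and vector-based packing avoid.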
Ferrari, V., Sousa, R., Pereira, M., De Carvalho, J. P. L., Amaral, J. N., Moreira, J., & Araujo, G. (2023). Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions. ACM Transactions on Architecture and Code Optimization, 20(4). https://doi.org/10.1145/3625004