A compiler approach for exploiting partial SIMD parallelism

Hao Zhou; Jingling Xue

Journal Article · Open Access

ACM Transactions on Architecture and Code Optimization (2016) 13(1)

DOI: 10.1145/2886101

Abstract

Existing vectorization techniques are ineffective for loops that exhibit little loop-level parallelism but some limited superword-level parallelism (SLP). We show that effectively vectorizing such loops requires partial vector operations to be executed correctly and efficiently, where the degree of partial SIMD parallelism is smaller than the SIMD datapath width. We present a simple yet effective SLP compiler technique called PAVER (PArtial VEctorizeR), formulated and implemented in LLVM as a generalization of the traditional SLP algorithm, to optimize such partially vectorizable loops. The key idea is to maximize SIMD utilization by widening the vector instructions used while minimizing the overheads caused by memory access, packing/unpacking, and/or masking operations, without introducing new memory errors or new numeric exceptions. For a set of 9 C/C++/Fortran applications with partial SIMD parallelism, PAVER achieves significantly better kernel and whole-program speedups than LLVM on both Intel's AVX and ARM's NEON.
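
As a rough illustration of what "partial SIMD parallelism" can look like, the sketch below shows a loop whose iterations are chained by a recurrence (so there is little loop-level parallelism) but whose body contains three isomorphic statements, one fewer than an assumed 4-lane datapath. This example is not taken from the paper and does not reproduce PAVER's algorithm; the function names, the `vec4` type, and the choice of three live lanes out of four are all hypothetical, and the "vector" is emulated in plain C rather than with real AVX or NEON instructions.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumed datapath width in floats (hypothetical) */
#define LIVE  3 /* isomorphic statements per iteration (hypothetical) */

/* Emulated SIMD register; a real compiler would use a hardware vector. */
typedef struct { float lane[LANES]; } vec4;

/* Scalar reference. Iteration i+1 reads what iteration i wrote, so the
 * loop cannot be vectorized across iterations, but the three statements
 * in the body are independent of each other. */
void step_scalar(float *p, const float *v, float a, size_t n) {
    assert(p != NULL && v != NULL);
    for (size_t i = 0; i < n; i++) {
        p[3 * (i + 1) + 0] = a * p[3 * i + 0] + v[3 * i + 0];
        p[3 * (i + 1) + 1] = a * p[3 * i + 1] + v[3 * i + 1];
        p[3 * (i + 1) + 2] = a * p[3 * i + 2] + v[3 * i + 2];
    }
}

/* Masked load: only the LIVE lanes touch memory, so no element beyond
 * src[LIVE - 1] is read. The padding lane is filled with 0. */
static vec4 load_partial(const float *src) {
    vec4 r;
    for (int k = 0; k < LANES; k++)
        r.lane[k] = (k < LIVE) ? src[k] : 0.0f;
    return r;
}

/* Masked store: the padding lane is never written back. */
static void store_partial(float *dst, vec4 x) {
    for (int k = 0; k < LIVE; k++)
        dst[k] = x.lane[k];
}

/* Partial-vector version: one 4-lane operation per iteration, of which
 * 3 lanes carry useful work. */
void step_partial(float *p, const float *v, float a, size_t n) {
    assert(p != NULL && v != NULL);
    vec4 cur = load_partial(p);
    for (size_t i = 0; i < n; i++) {
        vec4 vel = load_partial(v + 3 * i);
        for (int k = 0; k < LANES; k++) /* stands in for one SIMD op */
            cur.lane[k] = a * cur.lane[k] + vel.lane[k];
        store_partial(p + 3 * (i + 1), cur);
    }
}
```

The masked load and store in this sketch mirror the abstract's concern about not introducing new memory errors: a full 4-wide access at the end of the arrays would touch an element the scalar loop never reads or writes.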

Cite

Zhou, H., & Xue, J. (2016). A compiler approach for exploiting partial SIMD parallelism. ACM Transactions on Architecture and Code Optimization, 13(1). https://doi.org/10.1145/2886101
