A compiler approach for exploiting partial SIMD parallelism

37Citations
Citations of this article
20Readers
Mendeley users who have this article in their library.

Abstract

Existing vectorization techniques are ineffective for loops that exhibit little loop-level parallelism but some limited superword-level parallelism (SLP). We show that effectively vectorizing such loops requires partial vector operations to be executed correctly and efficiently, where the degree of partial SIMD parallelism is smaller than the SIMD datapath width. We present a simple yet effective SLP compiler technique called PAVER (PArtial VEctorizeR), formulated and implemented in LLVM as a generalization of the traditional SLP algorithm, to optimize such partially vectorizable loops. The key idea is to maximize SIMD utilization by widening vector instructions used while minimizing the overheads caused by memory access, packing/ unpacking, and/or masking operations, without introducing new memory errors or new numeric exceptions. For a set of 9 C/C++/Fortran applications with partial SIMD parallelism, PAVER achieves significantly better kernel and whole-program speedups than LLVM on both Intel's AVX and ARM's NEON.

Cite

CITATION STYLE

APA

Zhou, H., & Xue, J. (2016). A compiler approach for exploiting partial SIMD parallelism. ACM Transactions on Architecture and Code Optimization, 13(1). https://doi.org/10.1145/2886101

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free