Efficient High-Order Discontinuous Galerkin Finite Elements with Matrix-Free Implementations

  • Kronbichler M
  • Allalen M

Abstract

This work presents high-order discontinuous Galerkin finite element kernels optimized for node-level performance on a series of Intel architectures ranging from Sandy Bridge to Skylake. The kernels implement matrix-free evaluation of integrals with sum factorization techniques. In order to increase performance and thus help achieve higher energy efficiency, this work proposes an element-based shared-memory parallelization option and compares it to a well-established shared-memory parallelization with global face data. The new algorithm is supported by the relevant metrics in terms of arithmetic and memory transfer. On a single node with $2\times 24$ cores of Intel Skylake Scalable, we report more than 1,200 GFLOP/s in double precision for the full operator evaluation and up to 175 GB/s of memory throughput. Finally, we also show that merging the more arithmetically heavy operator evaluation with vector operations in application code more than doubles efficiency on the latest hardware, with respect to both energy and time to solution.
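As an illustration of the sum factorization idea mentioned in the abstract, the following minimal sketch evaluates a two-dimensional tensor-product interpolation in two ways: by forming the full Kronecker-product matrix, and by applying the one-dimensional matrix along each direction in turn. It is not the authors' implementation, which targets vectorized kernels on Intel hardware. All function names here are hypothetical, and the 1D matrix `s` stands in for a table of shape function values at quadrature points.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Kronecker product of a and b: entry ((i,k),(j,l)) is a[i][j] * b[k][l]."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def apply_full(s, u):
    """Reference path: build kron(s, s) and multiply the flattened coefficients."""
    nq = len(s)
    flat = [[v] for row in u for v in row]  # row-major column vector
    out = matmul(kron(s, s), flat)
    return [[out[i * nq + k][0] for k in range(nq)] for i in range(nq)]


def apply_sum_factorized(s, u):
    """Sum factorization in 2D: apply s along one direction, then the other."""
    return matmul(matmul(s, u), transpose(s))


def multiplication_counts(nq, n_dofs_1d):
    """Scalar multiplications for the full and the factorized 2D evaluation."""
    full = (nq * nq) * (n_dofs_1d * n_dofs_1d)
    factorized = nq * n_dofs_1d * n_dofs_1d + nq * nq * n_dofs_1d
    return full, factorized


# Hypothetical data: 3 quadrature points and 2 basis functions per direction.
s = [[1, 2], [3, 4], [5, 6]]
u = [[1, 0], [2, -1]]
values_full = apply_full(s, u)
values_fact = apply_sum_factorized(s, u)
```

Both paths return the same values at the quadrature points, but the factorized path never forms the large matrix. In this sketch the multiplication count for `nq` quadrature points and `n_dofs_1d` basis functions per direction drops from `nq**2 * n_dofs_1d**2` to `nq * n_dofs_1d**2 + nq**2 * n_dofs_1d`, and the gap widens in three dimensions and at higher polynomial degree.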

Cite

Kronbichler, M., & Allalen, M. (2018). Efficient High-Order Discontinuous Galerkin Finite Elements with Matrix-Free Implementations (pp. 89–110). https://doi.org/10.1007/978-3-319-99654-7_7
