Efficient High-Order Discontinuous Galerkin Finite Elements with Matrix-Free Implementations

Martin Kronbichler; Momme Allalen

Book Chapter

DOI: 10.1007/978-3-319-99654-7_7

Abstract

This work presents high-order discontinuous Galerkin finite element kernels optimized for node-level performance on a series of Intel architectures ranging from Sandy Bridge to Skylake. The kernels implement matrix-free evaluation of integrals with sum factorization techniques. To increase performance and thereby help achieve higher energy efficiency, this work proposes an element-based shared-memory parallelization option and compares it to a well-established shared-memory parallelization with global face data. The advantages of the new algorithm are substantiated with the relevant metrics for arithmetic work and memory transfer. On a single node with 2 × 24 cores of Intel Skylake Scalable, we report more than 1,200 GFLOP/s in double precision for the full operator evaluation and up to 175 GB/s of memory throughput. Finally, we show that merging the arithmetically heavier operator evaluation with the vector operations in the application code more than doubles efficiency on the latest hardware, both in terms of energy consumption and of time to solution.
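The sum factorization idea mentioned in the abstract can be illustrated with a minimal sketch, assuming a 2D tensor-product cell with n nodal values per direction, q quadrature points per direction, and a 1D interpolation matrix S of size q × n. All names (S, u, apply_1d, sum_factorized, full_tensor, mult_counts) are hypothetical and not taken from the chapter. Instead of multiplying by the explicit q^d × n^d tensor-product matrix, the 1D matrix is applied one direction at a time, which lowers the multiplications per cell from (q·n)^d to d·n^(d+1) when q = n. This is a plain-Python illustration of the general technique, not the vectorized kernels evaluated in this work.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def apply_1d(S: Matrix, u: Matrix, axis: int) -> Matrix:
    """Contract the 1D matrix S (q x n) with the 2D array u along one axis."""
    q, n = len(S), len(S[0])
    if axis == 0:
        # u has shape (n, m); result has shape (q, m)
        m = len(u[0])
        return [[sum(S[a][i] * u[i][j] for i in range(n)) for j in range(m)]
                for a in range(q)]
    # axis == 1: u has shape (m, n); result has shape (m, q)
    m = len(u)
    return [[sum(S[b][j] * u[i][j] for j in range(n)) for b in range(q)]
            for i in range(m)]


def sum_factorized(S: Matrix, u: Matrix) -> Matrix:
    """Interpolate n x n cell values to q x q points, one direction at a time."""
    return apply_1d(S, apply_1d(S, u, axis=1), axis=0)


def full_tensor(S: Matrix, u: Matrix) -> Matrix:
    """Reference: v[a][b] = sum_{i,j} S[a][i] * S[b][j] * u[i][j] (no factorization)."""
    q, n = len(S), len(S[0])
    return [[sum(S[a][i] * S[b][j] * u[i][j] for i in range(n) for j in range(n))
             for b in range(q)] for a in range(q)]


def mult_counts(n: int, q: int, dim: int) -> Tuple[int, int]:
    """Multiplications per cell: (explicit tensor matrix, sum factorization)."""
    full = (q * n) ** dim
    # sweep k maps one more direction from n nodes to q points
    sf = sum(q ** k * n ** (dim - k + 1) for k in range(1, dim + 1))
    return full, sf


S = [[1.0, 0.5], [0.25, 2.0], [0.0, 1.0]]  # q = 3 points, n = 2 nodes
u = [[1.0, 2.0], [3.0, 4.0]]
print(sum_factorized(S, u))
print(mult_counts(8, 8, 3))
```

Both routines produce the same values; only the operation count differs. For example, mult_counts(8, 8, 3) returns (262144, 12288), so for degree-7 elements in 3D the direction-by-direction evaluation needs about 21 times fewer multiplications than the explicit tensor-product matrix.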

Kronbichler, M., & Allalen, M. (2018). Efficient High-Order Discontinuous Galerkin Finite Elements with Matrix-Free Implementations (pp. 89–110). https://doi.org/10.1007/978-3-319-99654-7_7