Ćwirydowicz, K.; Chalmers, N.; Karakus, A.; Warburton, T.
2018-01-19
2018-01-19
2017-09
http://hdl.handle.net/10919/81866
This paper is devoted to GPU kernel optimization and performance analysis of three tensor-product operators arising in finite element methods. We provide a mathematical background to these operations and implementation details. Achieving close-to-the-peak performance for these operators requires extensive optimization because of the operators' properties: low arithmetic intensity, tiered structure, and the need to store intermediate results inside the kernel. We give a guided overview of optimization strategies and we present a performance model that allows us to compare the efficacy of these optimizations against an empirically calibrated roofline.
en
In Copyright
cs.MS; cs.DC; cs.NA; cs.PF; math.NA
Acceleration of tensor-product operations for high-order finite element methods
Article - Refereed
Warburton, T [0000-0002-3202-1151]