Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture

Mustafa Abdulmajeed AbdulJabbar, Mohammed Al Farhan, Rio Yokota, David E. Keyes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

Manycore optimizations are essential for achieving performance worthy of anticipated exascale systems. Utilization of manycore chips is inevitable to attain the desired floating point performance of these energy-austere systems. In this work, we revisit ExaFMM, the open source Fast Multipole Method (FMM) library, in light of highly tuned shared-memory parallelization and detailed performance analysis on the new highly parallel Intel manycore architecture, Knights Landing (KNL). We assess scalability and performance gain using task-based parallelism of the FMM tree traversal. We also provide an in-depth analysis of the most computationally intensive part of the traversal kernel (i.e., the particle-to-particle (P2P) kernel), by comparing its performance across KNL and Broadwell architectures. We quantify different configurations that exploit the on-chip 512-bit vector units within different task-based threading paradigms. MPI communication-reducing and NUMA-aware approaches for the FMM’s global tree data exchange are examined with different cluster modes of KNL. By applying several algorithm- and architecture-aware optimizations for FMM, we show that the N-Body kernel on 256 threads of KNL achieves on average 2.8× speedup compared to the non-vectorized version, whereas on 56 threads of Broadwell, it achieves on average 2.9× speedup. In addition, the tree traversal kernel on KNL scales monotonically up to 256 threads with task-based programming models. The MPI-based communication-reducing algorithms show expected improvements of the data locality across the KNL on-chip network.
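For readers unfamiliar with the P2P kernel named in the abstract, the following is a minimal scalar sketch of a direct particle-to-particle summation. It assumes a Laplace (1/r) potential, which the abstract does not specify, and the names `p2p` and `eps2` are hypothetical. It is not ExaFMM's implementation and does not reflect the vectorized or task-parallel variants the paper evaluates; it only illustrates the kind of all-pairs inner loop that such optimizations target.

```python
import math

def p2p(targets, sources, charges, eps2=0.0):
    """Direct all-pairs summation of an assumed Laplace potential.

    targets, sources: lists of (x, y, z) tuples.
    charges: one charge per source.
    eps2: optional softening term added to the squared distance (hypothetical).
    """
    potentials = []
    for (xi, yi, zi) in targets:
        phi = 0.0
        for (xj, yj, zj), qj in zip(sources, charges):
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            r2 = dx * dx + dy * dy + dz * dz + eps2
            # Skip coincident points (e.g. a particle acting on itself).
            if r2 > 0.0:
                phi += qj / math.sqrt(r2)
        potentials.append(phi)
    return potentials
```

The cost is proportional to the number of targets times the number of sources, which is why this inner loop is the natural place to apply vector units.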
Original language: English (US)
Title of host publication: Euro-Par 2017: Parallel Processing
Publisher: Springer Nature
Pages: 553-564
Number of pages: 12
ISBN (Print): 9783319642024
State: Published - Aug 1 2017
