An FMM based on dual tree traversal for many-core architectures

Rio Yokota

Research output: Contribution to journalArticlepeer-review

26 Scopus citations

Abstract

The present work attempts to integrate the independent efforts in the fast N-body community to create the fastest N-body library for many-core and heterogenous architectures. Focus is placed on low accuracy optimizations, in response to the recent interest to use FMM as a preconditioner for sparse linear solvers. A direct comparison with other state-of-the-art fast N-body codes demonstrates that orders of magnitude increase in performance can be achieved by careful selection of the optimal algorithm and low-level optimization of the code. The current N-body solver uses a fast multipole method with an efficient strategy for finding the list of cell-cell interactions by a dual tree traversal. A task-based threading model is used to maximize thread-level parallelism and intra-node load-balancing. In order to extract the full potential of the SIMD units on the latest CPUs, the inner kernels are optimized using AVX instructions.
Original languageEnglish (US)
Pages (from-to)301-324
Number of pages24
JournalJournal of Algorithms & Computational Technology
Volume7
Issue number3
DOIs
StatePublished - Sep 2013

ASJC Scopus subject areas

  • Numerical Analysis
  • Computational Mathematics
  • Applied Mathematics

Fingerprint Dive into the research topics of 'An FMM based on dual tree traversal for many-core architectures'. Together they form a unique fingerprint.

Cite this