We use a particle-based method to simulate incompressible flows, with the Fast Multipole Method (FMM) accelerating the calculation of particle interactions. The most time-consuming kernels (the Biot-Savart equation and the stretching term of the vorticity equation) are mathematically reformulated so that only two Laplace scalar potentials are needed instead of six, while the far-field computation is automatically divergence-free. Building on this formulation and on our previous work on a scalar heterogeneous FMM algorithm, we develop a new FMM-based vortex method capable of simulating general flows, including turbulence, on heterogeneous architectures. The method distributes the work between multi-core CPUs and GPUs to make the best use of the hardware resources and to achieve excellent scalability. The algorithm also uses new data structures that dynamically manage inter-node communication and load balance efficiently, with only a small parallel construction overhead. It scales to large clusters and shows both strong and weak scalability. We also carefully analyze the error and timing trade-offs of the cutoff functions introduced by the vortex particle method. Our implementation performs one time step of the velocity and stretching computation for one billion particles on 32 nodes in 55.9 seconds, which corresponds to 49.12 Tflop/s. © 2012 IEEE.
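For orientation, the velocity kernel that the FMM accelerates can be written as a direct sum over all particle pairs. The sketch below is a minimal reference version, not the implementation described in the abstract. It assumes the singular (unsmoothed) Biot-Savart kernel, whereas the abstract refers to cutoff functions. It omits the stretching term and the two-potential reformulation. The function name `biot_savart_direct` and its argument layout are hypothetical, chosen only for illustration.

```python
import math


def biot_savart_direct(targets, sources, strengths):
    """O(N*M) reference sum (assumed singular kernel, no cutoff function):

        u(x) = 1/(4*pi) * sum_j alpha_j x (x - x_j) / |x - x_j|^3

    targets, sources: sequences of (x, y, z) tuples.
    strengths: sequence of vector strengths alpha_j, one per source.
    """
    velocities = []
    for x in targets:
        u = [0.0, 0.0, 0.0]
        for xj, a in zip(sources, strengths):
            # Separation vector from source particle to evaluation point.
            r = (x[0] - xj[0], x[1] - xj[1], x[2] - xj[2])
            r2 = r[0] * r[0] + r[1] * r[1] + r[2] * r[2]
            if r2 == 0.0:
                continue  # skip the singular self-interaction
            inv = 1.0 / (4.0 * math.pi * r2 * math.sqrt(r2))
            # Accumulate the cross product alpha_j x r, scaled by 1/(4*pi*|r|^3).
            u[0] += (a[1] * r[2] - a[2] * r[1]) * inv
            u[1] += (a[2] * r[0] - a[0] * r[2]) * inv
            u[2] += (a[0] * r[1] - a[1] * r[0]) * inv
        velocities.append(tuple(u))
    return velocities
```

The cost of this direct sum grows with the product of the number of targets and sources, which is why a fast summation method such as the FMM is needed at the particle counts quoted above.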
|Original language|English (US)|
|Title of host publication|2012 SC Companion: High Performance Computing, Networking Storage and Analysis|
|Publisher|Institute of Electrical and Electronics Engineers (IEEE)|
|Number of pages|2|
|State|Published - Nov 2012|