A scalable parallelization algorithm for porting an explicit marching-on-in-time (MOT)-based time-domain volume integral equation (TDVIE) solver to multiple GPUs is described. The algorithm is implemented using MPI and OpenACC. One MPI task is assigned to each GPU, and the MPI processes synchronize and communicate the distributed compute kernels of the MOT-TDVIE solver between the GPUs. The OpenACC compiler directives handle the data transfer and the offloading of the kernels from the CPU to the GPU, as well as their execution on the GPU. The speedups achieved relative to the MPI/OpenMP code running on multiple CPUs, together with the parallel efficiencies, are presented.

Index Terms ─ Explicit marching-on-in-time scheme, GPU, MPI, OpenACC, time-domain volume integral equation.
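The division of labor described in the abstract (one MPI task per GPU, with OpenACC directives moving data and offloading loops) can be illustrated with a minimal C sketch. It is not the authors' implementation: `rank_block`, `local_matvec`, and the dense matrix-vector product standing in for a solver kernel are all hypothetical names and choices made for illustration. To keep the sketch self-contained, the rank is passed in as a plain integer; in an actual MPI/OpenACC code it would presumably come from `MPI_Comm_rank`, and each rank would bind to its GPU with `acc_set_device_num`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the contiguous block of unknowns owned by one rank.
   With one MPI task per GPU, this is also the slice that GPU works on. */
typedef struct {
    size_t begin; /* first owned index (inclusive) */
    size_t end;   /* one past the last owned index */
} block_range;

block_range rank_block(size_t n, int rank, int nranks)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    size_t base = n / (size_t)nranks; /* minimum block size */
    size_t rem = n % (size_t)nranks;  /* first `rem` ranks get one extra */
    size_t r = (size_t)rank;
    block_range b;
    b.begin = r * base + (r < rem ? r : rem);
    b.end = b.begin + base + (r < rem ? 1 : 0);
    return b;
}

/* Illustrative kernel (an assumption, not the MOT-TDVIE kernel): y = A x for
   the locally owned rows. `a` holds those rows in row-major order. The
   OpenACC directives copy the inputs to the device, offload the loops, and
   copy the result back. A compiler without OpenACC support ignores the
   pragmas and runs the same loops on the CPU. */
void local_matvec(const double *a, const double *x, double *y,
                  size_t rows, size_t n)
{
    #pragma acc parallel loop copyin(a[0:rows * n], x[0:n]) copyout(y[0:rows])
    for (size_t i = 0; i < rows; ++i) {
        double sum = 0.0;
        #pragma acc loop reduction(+:sum)
        for (size_t j = 0; j < n; ++j) {
            sum += a[i * n + j] * x[j];
        }
        y[i] = sum;
    }
}
```

In a multi-GPU run, each rank would call `rank_block` with its own rank, apply `local_matvec` to its rows, and then exchange the partial results with the other ranks through an MPI collective such as `MPI_Allgatherv`. That communication step is omitted here so the sketch compiles without an MPI installation.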
|Original language||English (US)|
|Number of pages||4|
|Journal||APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL|
|State||Published - 2018|