Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

Khalid Hasanov, Jean-Noël Quintin, Alexey Lastovetsky

Research output: Contribution to journal › Article › peer-review



© 2014, Springer Science+Business Media New York.

Many state-of-the-art parallel algorithms, widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to the optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy, and hence more parallelism, into the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the resulting Hierarchical SUMMA significantly reduces the communication cost and improves the overall performance on large-scale platforms.
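To build intuition for why introducing hierarchy can reduce communication cost, consider a deliberately simple toy model (not the paper's actual analysis): a flat broadcast in which the root sends to each of the other p − 1 processes one at a time, versus a two-level scheme in which the root first sends to g − 1 group leaders, who then relay in parallel to their own group members. The group count g and the cost model here are illustrative assumptions only.

```python
import math

def flat_cost(p):
    """Sequential one-level broadcast: root sends to each of the
    other p - 1 processes in turn, so cost grows linearly in p."""
    return p - 1

def hier_cost(p, g):
    """Two-level broadcast: root sends to g - 1 group leaders,
    then all leaders relay in parallel to the p//g - 1 members
    of their own group (assumes g divides p)."""
    return (g - 1) + (p // g - 1)

p = 1024
g = math.isqrt(p)  # g = sqrt(p) minimizes (g - 1) + (p/g - 1)
print(flat_cost(p))     # 1023 sequential send steps
print(hier_cost(p, g))  # 62 steps: 31 to leaders + 31 within each group
```

Under this toy model the optimal group count is g = √p, bringing the cost from p − 1 down to roughly 2√p − 2; the paper's contribution is an analogous (but far more detailed) restructuring of SUMMA's row and column broadcasts.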
Original language: English (US)
Pages (from-to): 3991-4014
Number of pages: 24
Journal: The Journal of Supercomputing
Issue number: 11
State: Published - Mar 4 2014
Externally published: Yes

