High Performance Polar Decomposition on Distributed Memory Systems

Dalal E. Sukkari, Hatem Ltaief, David E. Keyes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations


The polar decomposition of a dense matrix is an important operation in linear algebra. It can be calculated directly through the singular value decomposition (SVD) or iteratively using the QR-based dynamically weighted Halley algorithm (QDWH). The former is difficult to parallelize because of the large number of memory-bound operations during the bidiagonal reduction. We investigate the latter, which performs more floating-point operations but at the same time exposes more parallelism and therefore runs closer to the theoretical peak performance of the system, thanks to more compute-bound matrix operations. Profiling results show the performance scalability of QDWH for calculating the polar decomposition using around 9200 MPI processes on well- and ill-conditioned matrices of size 100K × 100K. We then study the performance impact of the QDWH-based polar decomposition as a pre-processing step toward calculating the SVD itself. The new distributed-memory implementation of the QDWH-SVD solver achieves up to a five-fold speedup over current state-of-the-art vendor SVD implementations. © Springer International Publishing Switzerland 2016.
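To make the approach described in the abstract concrete, the following is a minimal single-node sketch of a QR-based QDWH iteration in NumPy, followed by its use as a pre-processing step for an SVD. It is not the distributed-memory implementation of the paper. The function names `qdwh_polar` and `svd_from_polar`, the stopping tolerance, and the Frobenius-norm bounds used for scaling are assumptions chosen for illustration, and the sketch assumes a small, square, nonsingular, real matrix (it even forms an explicit inverse to obtain a lower bound, which a production code would replace with a cheap estimate).

```python
import numpy as np


def qdwh_polar(A, tol=1e-10, max_iter=50):
    """Illustrative QDWH: returns (U, H) with A ~= U @ H, U orthogonal, H symmetric."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Scale so every singular value of X lies in (0, 1].
    # Assumption: the Frobenius norm is used as an upper bound on the 2-norm.
    alpha = np.linalg.norm(A, "fro")
    X = A / alpha
    # Lower bound on sigma_min(X): 1/||X^-1||_F <= 1/||X^-1||_2 = sigma_min(X).
    l = 1.0 / np.linalg.norm(np.linalg.inv(X), "fro")
    I = np.eye(n)
    for _ in range(max_iter):
        l = min(l, 1.0)  # guard against rounding pushing l above 1
        l2 = l * l
        # Dynamically chosen weights (a, b, c) of the Halley iteration.
        d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        s = np.sqrt(1.0 + d)
        a = s + 0.5 * np.sqrt(8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * s))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # Inverse-free update via the QR factorization of [sqrt(c) X; I].
        Q = np.linalg.qr(np.vstack([np.sqrt(c) * X, I]))[0]
        Q1, Q2 = Q[:n], Q[n:]
        X_new = (b / c) * X + ((a - b / c) / np.sqrt(c)) * (Q1 @ Q2.T)
        # Propagate the lower bound on the smallest singular value.
        l = l * (a + b * l2) / (1.0 + c * l2)
        converged = np.linalg.norm(X_new - X, "fro") < tol
        X = X_new
        if converged:
            break
    U = X
    H = U.T @ A
    H = 0.5 * (H + H.T)  # symmetrize to remove rounding noise
    return U, H


def svd_from_polar(A):
    """SVD built on the polar factors: A = U H, H = V diag(s) V^T, so A = (U V) diag(s) V^T."""
    U, H = qdwh_polar(A)
    w, V = np.linalg.eigh(H)       # eigenvalues in ascending order
    order = np.argsort(w)[::-1]    # reorder to the usual descending convention
    s, V = w[order], V[:, order]
    return U @ V, s, V.T
```

Each iteration is dominated by a QR factorization and matrix products, which is the kind of compute-bound work the abstract contrasts with the memory-bound bidiagonal reduction. A call such as `U, H = qdwh_polar(np.random.default_rng(0).standard_normal((6, 6)))` returns the two polar factors of a small random matrix.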
Original language: English (US)
Title of host publication: Euro-Par 2016: Parallel Processing
Publisher: Springer Nature
Number of pages: 12
ISBN (Print): 9783319436586
State: Published - Aug 9 2016

