A scalable high performant Cholesky factorization for multicore with GPU accelerators

Hatem Ltaief*, Stanimire Tomov, Rajib Nath, Peng Du, Jack Dongarra

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

We present a Cholesky factorization for multicore systems with GPU accelerators. The challenges in developing scalable, high-performance algorithms for these emerging systems stem from their heterogeneity, their massive parallelism, and the huge gap between the GPUs' compute power and the CPU-GPU communication speed. We describe an approach that is largely based on software infrastructures already developed for homogeneous multicores and for hybrid GPU-based computing. The result is a scalable hybrid Cholesky factorization of unprecedented performance. In particular, using NVIDIA's Tesla S1070 (4 C1060 GPUs, each with 30 cores @ 1.44 GHz) connected to two dual-core AMD Opteron processors @ 1.8 GHz, we reach up to 1.163 TFlop/s in single precision and up to 275 GFlop/s in double precision arithmetic. Compared with the embarrassingly parallel xGEMM running over four GPUs, where no communication between GPUs is involved, our algorithm still achieves 73% and 84% of that performance in single and double precision arithmetic, respectively.
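The abstract does not spell out the algorithm itself. As a rough, hypothetical illustration of the tile formulation that multicore Cholesky codes commonly use, the sketch below splits a symmetric positive definite matrix into square tiles and factors it with four kernels: a diagonal-tile factorization (POTRF), a triangular solve (TRSM), a symmetric rank-k update (SYRK) and a general update (GEMM). All function names here (`potrf`, `trsm`, `gemm_nt`, `tile_cholesky`, `cholesky_flops`) are illustrative assumptions, not the paper's code. The sketch runs serially on the CPU in pure Python; it does not model the paper's CPU-GPU scheduling, data distribution across the four GPUs, or communication. Converting a run time into a Flop/s rate, as in the figures above, needs an operation count; the helper uses LAPACK's conventional count for xPOTRF, and whether the paper uses exactly this count is not stated in the abstract.

```python
import math


def potrf(a):
    """Lower Cholesky of one SPD tile, in place; the strict upper part is zeroed."""
    n = len(a)
    for j in range(n):
        s = a[j][j] - sum(a[j][k] * a[j][k] for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            a[i][j] = (a[i][j] - sum(a[i][k] * a[j][k] for k in range(j))) / a[j][j]
        for i in range(j):
            a[i][j] = 0.0  # tile now holds exactly L_kk


def trsm(l, b):
    """B := B * L^-T for a lower-triangular tile L (forward substitution per row)."""
    n = len(l)
    for row in b:
        for j in range(n):
            row[j] = (row[j] - sum(l[j][k] * row[k] for k in range(j))) / l[j][j]


def gemm_nt(c, a, b):
    """C := C - A * B^T; with a is b this is the SYRK update C := C - A * A^T."""
    for i in range(len(c)):
        for j in range(len(c[0])):
            c[i][j] -= sum(a[i][k] * b[j][k] for k in range(len(a[0])))


def tile_cholesky(a, nb):
    """Right-looking tile Cholesky A = L L^T; returns L as a dense list of lists."""
    n = len(a)
    if n % nb:
        raise ValueError("this sketch assumes the tile size nb divides n")
    nt = n // nb
    # t[i][j] is a copy of the nb x nb tile in block row i, block column j.
    t = [[[row[j * nb:(j + 1) * nb] for row in a[i * nb:(i + 1) * nb]]
          for j in range(nt)] for i in range(nt)]
    for k in range(nt):
        potrf(t[k][k])                              # factor diagonal tile
        for i in range(k + 1, nt):
            trsm(t[k][k], t[i][k])                  # panel: A_ik := A_ik L_kk^-T
        for i in range(k + 1, nt):
            gemm_nt(t[i][i], t[i][k], t[i][k])      # SYRK on diagonal tiles
            for j in range(k + 1, i):
                gemm_nt(t[i][j], t[i][k], t[j][k])  # GEMM on off-diagonal tiles
    # Reassemble the lower-triangular factor; tiles above the diagonal stay zero.
    l = [[0.0] * n for _ in range(n)]
    for i in range(nt):
        for j in range(i + 1):
            for r in range(nb):
                for c in range(nb):
                    l[i * nb + r][j * nb + c] = t[i][j][r][c]
    return l


def cholesky_flops(n):
    """LAPACK's conventional operation count for xPOTRF on an n x n matrix."""
    return n ** 3 / 3 + n ** 2 / 2 + n / 6


if __name__ == "__main__":
    n, nb = 6, 2
    # A = M M^T + n I is symmetric positive definite for any real M.
    m = [[(i * 7 + j * 3) % 5 - 2.0 for j in range(n)] for i in range(n)]
    a = [[sum(m[i][k] * m[j][k] for k in range(n)) + (n if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    l = tile_cholesky(a, nb)
    err = max(abs(sum(l[i][k] * l[j][k] for k in range(n)) - a[i][j])
              for i in range(n) for j in range(n))
    print(f"max |L L^T - A| = {err:.2e}, flops = {cholesky_flops(n):.0f}")
```

The tile decomposition exposes many independent SYRK and GEMM updates per step, which is what makes this formulation attractive for parallel and heterogeneous hardware; in a hybrid setting those compute-heavy updates would be the natural candidates for accelerator execution, though how the paper actually assigns work is not described here.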

Original language: English (US)
Title of host publication: High Performance Computing for Computational Science, VECPAR 2010 - 9th International Conference, Revised Selected Papers
Pages: 93-101
Number of pages: 9
Volume: 6449 LNCS
State: Published - 2011
Externally published: Yes
Event: 9th International Conference on High Performance Computing for Computational Science, VECPAR 2010 - Berkeley, CA, United States
Duration: Jun 22 2010 to Jun 25 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6449 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th International Conference on High Performance Computing for Computational Science, VECPAR 2010
Country: United States
City: Berkeley, CA
Period: 06/22/10 to 06/25/10

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
