ACCELERATING RECOVERY IN MPI ENVIRONMENTS

Research output: Patent

Abstract

A computer usable program product for accelerating recovery in a Message Passing Interface (MPI) environment is provided in the illustrative embodiments. A first portion of a distributed application executes using a first processor, and a second portion executes using a second processor, in a distributed computing environment. After the first portion fails, it is restored to a checkpoint. A first part of the first portion is distributed to a third processor and a second part to a fourth processor, and the computation of the first portion is performed using the first and second parts in parallel. A first message, originally computed after the time of the checkpoint, is computed again in the first portion and sent to the second portion. A second message is replayed from the second portion without being recomputed in the second portion.

Original language: English (US)
Patent number: US2012226939
IPC: G06F 11/14 A I
Priority date: 04/16/12
State: Published - Sep 6 2012
Externally published: Yes
