A Survey of Rollback-Recovery Protocols in Message-Passing Systems

Elmootazbellah Elnozahy*, Lorenzo Alvisi, Yi Min Wang, David B. Johnson

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1031 Scopus citations

Abstract

This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into checkpoint-based and log-based. Checkpoint-based protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated, uncoordinated, or communication-induced. Log-based protocols combine checkpointing with logging of nondeterministic events, encoded in tuples called determinants. Depending on how determinants are logged, log-based protocols can be pessimistic, optimistic, or causal. Throughout the survey, we highlight the research issues that are at the core of rollback-recovery and present the solutions that currently address them. We also compare the performance of different rollback-recovery protocols with respect to a series of desirable properties and discuss the issues that arise in the practical implementations of these protocols.
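The abstract describes log-based protocols as logging determinants of nondeterministic events, either pessimistically or optimistically. The following is a minimal, illustrative sketch of that idea under several stated assumptions, not the authors' implementation:

- The only nondeterministic event modeled is message delivery.
- A determinant is assumed to be the tuple (source, ssn, dest, rsn).
- The names `Determinant`, `Process`, `flush`, `crash`, and `recover` are hypothetical.
- Senders are assumed able to retransmit payloads.
- Replay starts from the initial state rather than from a checkpoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Determinant:
    # Hypothetical encoding of one message-delivery event.
    source: str  # sender id
    ssn: int     # sender's send sequence number
    dest: str    # receiver id
    rsn: int     # receive sequence number: delivery order at dest


@dataclass
class Process:
    pid: str
    pessimistic: bool = True
    stable_log: List[Determinant] = field(default_factory=list)    # survives a crash
    volatile_log: List[Determinant] = field(default_factory=list)  # lost on a crash
    rsn: int = 0
    state: List[str] = field(default_factory=list)

    def deliver(self, source: str, ssn: int, payload: str) -> None:
        self.rsn += 1
        det = Determinant(source, ssn, self.pid, self.rsn)
        if self.pessimistic:
            # Pessimistic: log the determinant stably before delivery.
            self.stable_log.append(det)
        else:
            # Optimistic: buffer it and write it out later (see flush).
            self.volatile_log.append(det)
        self.state.append(payload)

    def flush(self) -> None:
        # Optimistic mode: move buffered determinants to stable storage.
        self.stable_log.extend(self.volatile_log)
        self.volatile_log.clear()

    def crash(self) -> None:
        # A failure wipes volatile memory; only stable_log remains.
        self.volatile_log.clear()
        self.state.clear()
        self.rsn = 0

    def recover(self, resend: Dict[Tuple[str, int], str]) -> None:
        # Replay logged deliveries in their original order. Payloads are
        # assumed to be retransmittable by senders, keyed by (source, ssn).
        # Simplification: replay starts from the initial state, no checkpoint.
        for det in sorted(self.stable_log, key=lambda d: d.rsn):
            self.state.append(resend[(det.source, det.ssn)])
            self.rsn = det.rsn
```

With `pessimistic=True`, every delivered message can be replayed after a crash. With `pessimistic=False`, any delivery made after the last `flush` is lost. This is the window of exposure that optimistic protocols trade for lower failure-free overhead.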

Original language: English (US)
Pages (from-to): 375-408
Number of pages: 34
Journal: ACM Computing Surveys
Volume: 34
Issue number: 3
State: Published - Sep 1 2002

Keywords

  • Message logging
  • Rollback-recovery

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
