Consistent Checkpointing in Distributed Computations: Theoretical Results and Protocols

Francesco Quaglia

This thesis studies consistent checkpointing in asynchronous distributed computations. The approach investigated is known as communication-induced checkpointing. In this approach, the processes of the distributed computation take checkpoints at their own pace (basic checkpoints), and additional checkpoints (forced checkpoints) are induced by a lazy coordination scheme to guarantee the consistency of global checkpoints. The lazy coordination is realized by piggybacking control information on application messages. Upon receipt of a message, the recipient process evaluates a predicate based on the incoming control information and on its local context; if the predicate evaluates to TRUE, a forced checkpoint is taken. The thesis reports both theoretical results on this issue and protocols derived from those results.
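The receive-side mechanism described above can be illustrated with a minimal sketch. It is not one of the protocols from the thesis: the predicate used here is a simple index-based rule (force a checkpoint when the piggybacked checkpoint index exceeds the local one), chosen only because it is short. The names `Message`, `Process`, `must_force`, and the single integer used as control information are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    index: int  # piggybacked control information (sender's checkpoint index)


@dataclass
class Process:
    name: str
    index: int = 0  # local context: index of the latest local checkpoint
    log: list = field(default_factory=list)  # (kind, index) of checkpoints taken

    def basic_checkpoint(self):
        # Checkpoint taken at the process's own pace.
        self.index += 1
        self.log.append(("basic", self.index))

    def send(self, payload):
        # Piggyback the current index on the application message.
        return Message(payload, self.index)

    def must_force(self, msg):
        # Illustrative predicate (an assumption, not the thesis's):
        # TRUE when the sender is ahead of the recipient.
        return msg.index > self.index

    def receive(self, msg):
        if self.must_force(msg):
            # Forced checkpoint, taken before the message is delivered.
            self.index = msg.index
            self.log.append(("forced", self.index))
        return msg.payload


p, q = Process("p"), Process("q")
p.basic_checkpoint()
q.receive(p.send("hello"))  # q is behind p, so a forced checkpoint is taken
p.receive(q.send("reply"))  # indices are equal, so no checkpoint is forced
print(p.log, q.log)
```

In this sketch the only coordination between `p` and `q` is the integer carried by each message, which is what makes the scheme lazy: no extra control messages are exchanged, and a process that never receives a message with a larger index never takes a forced checkpoint.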

BibTeX Entry:

author = {Quaglia, Francesco},
school = {Sapienza, University of Rome},
title = {Consistent Checkpointing in Distributed Computations: Theoretical Results and Protocols},
year = {1999},
type = {phdthesis},
comment = {Supervisor: B. Ciciani}