The 2007 Edsger
W. Dijkstra Prize in Distributed Computing goes to
the paper |
|
``Consensus in the presence of partial
synchrony'' |
by Cynthia Dwork,
Nancy Lynch, and Larry Stockmeyer, |
|
which appeared in the Journal
of the ACM, Vol. 35, No. 2, April, 1988. pages
288--323. |
A preliminary version
appeared in PODC 1984. |
|
This paper introduces a
number of practically motivated partial synchrony models that lie between the
completely synchronous and the completely asynchronous models, and in which
consensus is solvable. It gave practitioners the right tool for building
fault tolerant systems, and contributed to the understanding that safety can
be maintained at all times, despite the impossibility of consensus and
progress is facilitated during periods of stability. These are the pillars on
which every fault tolerant system has been built for two decades. This
includes academic projects such as Petal, Frangipani, and Boxwood, as well as
real life data centers, such as the Google file system. |
|
In distributed systems,
balancing the pragmatics of building software that works against the need for
rigor is particularly difficult because of impossibility results such as the FLP theorem. The publication by Dwork,
Lynch, and Stockmeyer was in many respects the
first to suggest a path through this thicket, and has been enormously
influential. It presents consensus algorithms for a number of partial
synchrony models with different timing requirements and failure assumptions:
crash, authenticated Byzantine, and Byzantine failures. It also proves tight
lower bounds on the resilience of such algorithms. |
|
The eventual synchrony
approach introduced in this paper is used to model algorithms that provide
safety at all times, even in completely asynchronous runs, and guarantee liveness once the system stabilizes. This has since been
established as the leading approach for circumventing the FLP
impossibility result and solving asynchronous consensus, atomic broadcast,
and state-machine replication. |
|
In particular, the
distributed systems engineering community has been increasingly drawn towards
systems architectures that reflect the basic split between safety and liveness cited above.
Dwork, Lynch, and Stockmeyer
thus planted the seed for a profound rethinking of the ways that we should
build, and reason about, this class of systems. Following this direction are
many foundational solutions. First, these include state machine replication
methods such as Lamport's seminal Paxos algorithm and many group communication methods.
Another important branch of research that directly follows this work is given
by Chandra and Toueg's unreliable failure detector
abstraction, which is realized in the eventual synchrony model of this paper.
As Chandra and Toueg write:``we argue that partial synchrony assumptions can
be encapsulated in the unreliability of failure detectors. For example, in
the models of partial synchrony considered in Dwork
et al. it is easy to implement a failure detector that satisfies the
properties of <>W.'' Finally, the insight by Dwork,
Lynch, and Stockmeyer also led to various
timed-based models of partial synchrony, such as Cristian
and Fetzer's Timed-Asynchronous model and others. |
|
The award committee would
like to acknowledge the sincere efforts by the nominators of this work, as
well as all other (worthy!) nominations which came short of winning. |
|
The committee wishes to pay
a special tribute via this award to Larry Stockmeyer,
who passed away on July 31, 2004. Larry's impact on the field through this
paper and many others will always be remembered. |
|
The Committee of the 2007 Edsger W. Dijkstra Prize in
Distributed Computing |
|
Hagit Attiya |
Dahlia Malkhi |
Keith Marzullo |
Marios Mavronicolas |
Andrzej Pelc |
Roger Wattenhofer
(Chair) |