From CasGroup

People err. That is a fact of life. People are not precision machinery designed for accuracy.
In fact, we humans are a different kind of device entirely. Creativity, adaptability, and
flexibility are our strengths. Continual alertness and precision in action or memory are our
weaknesses. We are amazingly error tolerant, even when physically damaged. We are extremely
flexible, robust, creative, and superb at finding explanations and meanings from partial and
noisy evidence. The same properties that lead to such robustness and creativity also produce
errors. The natural tendency to interpret partial information -- although often our prime virtue
-- can cause operators to misinterpret system behavior in such a plausible way that the
misinterpretation can be difficult to discover. - Donald A. Norman

Robustness in the context of software systems and applications is defined as the degree to which a system or component can still function in the presence of perturbations: faults, failures or adverse conditions. It is associated with the resilience of a system and the ability to maintain function despite adverse (worst-case) conditions and unfavorable changes in internal structure or external environment. A robust system is "perturbation-resistant". A system that can still function in the presence of faults is called fault tolerant. Robustness is the opposite of fragility and brittleness, and is a measure of how little a particular system is affected by changes and disturbances. Reliability, robustness and fault tolerance can be achieved by redundancy and replication. In his paper The Ontology of Complex Systems, William Wimsatt gives the following definition of robustness:

Things are robust if they are accessible (detectable, measurable, derivable, definable, producible, or the like) in a variety of independent ways.

Basically, a system is robust if a certain operation or process can be achieved in many ways, that is, if there are many ways of doing the same thing. For example, a node can be reached in many different ways when more than one path leads to it, and a node type can be accessed in many different ways when there are many redundant instances of it.

  • diversity of links: reach a node on different paths
  • diversity of nodes: access different redundant instances of a node
  • diversity of design/versions: reach a computational goal in different ways
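The first kind of diversity, reaching a node on different paths, can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the function names `reachable` and `survives_any_single_link_failure` and the two example topologies are hypothetical and chosen only for illustration. It tests whether a target node stays reachable when any single link fails.

```python
from collections import deque


def reachable(links, source, target, failed=frozenset()):
    """Breadth-first search over undirected links, ignoring failed ones."""
    adjacency = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue  # this link is down, so it offers no path
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def survives_any_single_link_failure(links, source, target):
    """True if the target stays reachable whichever single link fails."""
    return all(
        reachable(links, source, target, failed={link}) for link in links
    )


# Hypothetical topologies: a chain has one path, a ring has two.
chain = [("A", "B"), ("B", "C")]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

print(survives_any_single_link_failure(chain, "A", "C"))  # False
print(survives_any_single_link_failure(ring, "A", "C"))   # True
```

The ring tolerates the loss of any one link because a second, independent path always remains, while the chain has no alternative route.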

In any case, diversity and redundancy are the key. There are in general two ways to increase robustness in distributed systems: first, redundancy of components, elements and nodes, and second, redundancy of paths, links and channels between them. The former is found in the replication of nodes; the latter can be observed in the Internet, where traffic takes another route if one is blocked. If there is always another way to reach the goal, then the failure of a single node or link does not affect the function of the system. A third method to increase robustness and fault tolerance is "design diversity", which protects the system against design faults: multiple functionally equivalent but diverse program versions, based on the same specification, are used to ensure safety in critical applications and to protect the most vital subsystems of complex systems and networks. This approach is also known as N-Version Programming (NVP) and was first proposed by Algirdas Avizienis in 1977.
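Design diversity can be illustrated with a small voting scheme. The sketch below is an assumption-laden toy, not Avizienis's original formulation: the three versions, the deliberately seeded fault in `version_c_faulty`, and the helper `n_version_execute` are all hypothetical. Each version implements the same specification (sum a list of numbers) in a different way, and a strict majority vote decides the result.

```python
from collections import Counter


def version_a(values):
    """Version A: relies on the built-in sum."""
    return sum(values)


def version_b(values):
    """Version B: explicit accumulation loop."""
    total = 0
    for value in values:
        total += value
    return total


def version_c_faulty(values):
    """Version C: contains a seeded design fault (drops the last element)."""
    return sum(values[:-1])


def n_version_execute(versions, *args):
    """Run every version and return the result backed by a strict majority.

    Assumes results are hashable so that they can be counted.
    """
    results = []
    for version in versions:
        try:
            results.append(version(*args))
        except Exception:
            continue  # a crashing version simply loses its vote

    if not results:
        raise RuntimeError("all versions failed")

    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(versions) // 2:
        raise RuntimeError("no majority among versions")
    return value


print(n_version_execute([version_a, version_b, version_c_faulty], [1, 2, 3]))  # 6
```

The faulty version is outvoted by the two correct ones. The scheme only helps as long as the versions fail independently: if a majority shares the same design fault, the vote returns the wrong answer.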
