A distributed system in computer science is simply a single, unified system of many (at least two) distributed computers, processors or processes which communicate with each other through a common communications medium or network. The distribution can be physical (over a geographical area) or logical (over a virtual space). A distributed system consists of many elements that are separated and connected by networks that take time to transmit their messages. It is often represented as a connected graph: the nodes are the computers or processes, and the edges are bidirectional communication channels or links.
Distributed System is a general term for "Distributed Data Processing System" or "Distributed Computing System" (DCS). One of the first definitions of a "Distributed Data Processing System" was made by Philip H. Enslow in 1978 (IEEE Computer Vol. 11 January (1978) 13-21). He specified five properties: multiplicity, physical distribution, unity of system operation, system transparency and cooperative autonomy. These five characteristics are explained in his paper as follows:
- A multiplicity of general-purpose resource components, including both physical and logical resources, that can be assigned to specific tasks on a dynamic basis.
- A physical distribution of these physical and logical components of the system interacting through a communication network.
- A high-level operating system that unifies and integrates the control of the distributed components.
- System transparency permitting services to be requested by name only. The server does not have to be identified.
- Cooperative Autonomy, characterizing the operation and interaction of both physical and logical resources.
According to Enslow, only a combination of all five criteria uniquely defines a distributed data processing system. This early definition already gives a hint of the inherent complexity of distributed systems. Complexity means unity in diversity, and Enslow's definition emphasizes the unification and integration of distributed components, and the cooperation of autonomous elements.
Leslie Lamport gave simply the following definition: A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable. (This definition is from an e-mail message sent to a DEC SRC bulletin board on 28 May 1987.)
Singhal and Casavant define a "Distributed Computing System" (IEEE Computer Vol. 24 August (1991) 12-14) as a collection of autonomous computers connected by a communication network. They add that these sites typically do not share a common memory, and communicate solely by means of message passing instead.
Andrew S. Tanenbaum and Maarten van Steen (2002) gave the following more recent definition of a distributed system (see their book "Distributed Systems" on page 2): A distributed system is a collection of independent computers that appears to its users as a single coherent system. In computer science, a distributed system is generally an interconnected collection of autonomous computers, processes, or processors.
Coulouris, Dollimore and Kindberg (2005) define it through networks, communication and messages (see their book "Distributed Systems" on page 1): A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages.
Microsoft's TechNet offers this definition of a distributed system: "A non-centralized network consisting of numerous computers that can communicate with one another and that appear to users as parts of a single, large, accessible 'storehouse' of shared hardware, software, and data. A distributed system is conceptually the opposite of a centralized, or monolithic, system in which clients connect to a single central computer, such as a mainframe."
A distributed system is a system which is not centralized on a single machine, although a client-server system running on a single machine can be considered a simple distributed system, since it is logically distributed. The opposite of distributed systems are traditional centralized or localized systems with a central memory, a single processor, and a well-defined state and time. A natural result of distribution is message exchange to connect the distributed components. Distributed systems are not possible without networks and message exchange, although a network alone is not a distributed system. It must offer a single-system view, for instance a single distributed file system for a network of computers. The nodes in a network cooperate in the achievement of a common aim by exchanging messages.
Properties and Features
Advantages of distributed systems are modularity, flexibility, scalability, resource sharing and availability. Drawbacks are the inherent complexity of many distributed systems, as well as difficult management and security problems. Complexity is an inherent property of a distributed system, if you define complexity as unity in diversity. Unity arises from the single-system view, diversity from the collection of independent processes and computers.
A distributed system has no global state which can be detected instantly, and there is no global time which is valid for all computers, nodes or entities. The nodes or processes and links or communication channels are not reliable. Sometimes the network is reliable, sometimes we can assume the processes are reliable, but in general neither the exchange of messages through the network, nor the computation through the individual processes are reliable and can fail. In a distributed system, the nodes or processes can no longer communicate through a central memory (Raynal, 1988). The only way they have of transmitting information between them is to use messages and message transmission on communication channels or network links. These communication channels can be physical or logical, and either synchronous or asynchronous.
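This message-passing constraint can be sketched in Python, with two threads standing in for processes that share no state, and a queue standing in for the communication channel (the message format here is purely illustrative):

```python
import queue
import threading

# Two "processes" (threads here) with no shared variables; the only way
# to transmit information between them is a message on a channel,
# modeled as a FIFO queue.
channel = queue.Queue()

def sender():
    for value in [1, 2, 3]:
        channel.put(("DATA", value))   # send a message on the link
    channel.put(("DONE", None))        # signal end of transmission

received = []

def receiver():
    while True:
        kind, value = channel.get()    # blocks until a message arrives
        if kind == "DONE":
            break
        received.append(value)

t1 = threading.Thread(target=sender)
t2 = threading.Thread(target=receiver)
t1.start(); t2.start()
t1.join(); t2.join()

print(received)  # [1, 2, 3]
```

A real distributed system would replace the in-memory queue with a physical network link, where messages can additionally be delayed, reordered or lost.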
In synchronous systems or network models, the processors communicate with each other through exchange of messages in synchronous rounds. The processors operate in lock-step synchrony, contrary to asynchronous systems or network models, which are much more difficult to reason about. While messages in synchronous systems arrive instantly or within a known, finite timespan, there is no time limit for messages in general asynchronous systems. The uncertainty in asynchronous systems is therefore much higher, because messages can arrive at arbitrary times. A natural problem with asynchronous communication is the ordering of messages, the distinction between lost and merely late messages, and violations of causality (effect is seen before cause).
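One classic way to restore a consistent ordering of events despite arbitrary message delays is Lamport's logical clock. A minimal sketch (the `Process` class and the event sequence are illustrative, not a complete protocol): each process keeps a counter, increments it on every local event, stamps outgoing messages with it, and on receipt advances its own counter past the incoming timestamp.

```python
class Process:
    """A process with a Lamport logical clock."""

    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self):
        self.clock += 1          # every event ticks the clock
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock        # timestamp carried by the message

    def receive(self, msg_time):
        # The receive event must be ordered after the send event,
        # even if the message arrived "late" in real time.
        self.clock = max(self.clock, msg_time) + 1
        return self.clock

p, q = Process("P"), Process("Q")
p.local_event()         # P: clock becomes 1
t = p.send()            # P: clock becomes 2, message stamped 2
q.local_event()         # Q: clock becomes 1
recv = q.receive(t)     # Q: max(1, 2) + 1 = 3
print(t, recv)          # 2 3 -- the receive is ordered after the send
```

Logical clocks yield an ordering consistent with causality, which is precisely what physical time cannot guarantee in an asynchronous system.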
Distributed algorithms are used to solve the computational problems and to exploit the power of distributed systems. Problems which are simple in conventional, centralized systems often become difficult to solve in distributed systems, for example monitoring the state of the system or the termination detection problem. Additional problems such as race conditions and deadlocks arise. Distributed systems offer the potential for robustness and fault tolerance due to redundancy of components and links. Unfortunately, a huge number of nodes also increases the statistical probability of faults and failures. When the number of components in a distributed system increases, the probability of a failure may become very large. This is one reason why distributed systems are complex: more nodes mean higher potential for robustness and fault tolerance due to increased redundancy, but more nodes also mean higher probability of faults and failures. Fault tolerance and the various forms of consensus and consistency that can be achieved in the presence of failures have been an important topic for a long time.
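This trade-off can be made concrete with a back-of-the-envelope calculation, assuming for simplicity that nodes fail independently with a hypothetical per-node failure probability p:

```python
def prob_any_failure(p, n):
    """Probability that at least one of n independent components fails."""
    return 1 - (1 - p) ** n

def prob_service_lost(p, k):
    """Probability that all k replicas of a service fail at once."""
    return p ** k

p = 0.01  # assume each node fails with 1% probability

# With 1000 nodes, *some* failure is almost certain:
print(prob_any_failure(p, 1000))   # about 0.99996

# But a service replicated on 3 nodes survives unless all 3 fail:
print(prob_service_lost(p, 3))     # about 1e-06
```

So scale makes individual failures routine, while the same redundancy makes any well-replicated service very unlikely to be lost; handling the routine failures is what distributed algorithms must do.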
Problems and Challenges
Distributed systems are difficult to understand and hard to engineer, because they are much more complex than non-distributed systems. Steve Vinoski writes about them (in his article IEEE Internet Computing, July-August (2004) 91-94): "Distributed systems, middleware, and integration are just plain hard. The seemingly intractable difficulties permeating these areas drive our industry to continually seek out easier ways to build such systems. During the past 20 years, we’ve produced myriad toolkits aimed at simplifying distributed programming and integration, including many proprietary or homegrown approaches and many others based on well-known approaches such as Distributed Computing Environment (DCE), Corba, J2EE, and Web services. Regardless of the underlying technology, each approach seems to start out being simple (compared to what went before it); as each matures, however, it seems to wind up just as complicated as the approach it was designed to displace, if not more so."
A typical problem in distributed systems is the complexity which results from the wish to be able to write a distributed application just as if it were a non-distributed one. To treat a distributed and diverse system as a unified one leads automatically to complexity (if you define complexity as unity in diversity). Michi Henning writes about the topic in the comp.object.corba newsgroup: "I think that at least part of the reputation for complexity in CORBA stems from the desire to pretend that no network exists and from the wish to be able to write a distributed application just as if it were a non-distributed one. I’m afraid that this will remain a pipe dream for many more years. Distribution adds complexity, no matter how sophisticated a platform you use." The same problem exists for distributed computing and distributed algorithms. The attempt to make distributed computing follow the model of local computing leads to reduced scalability and reliability, and many distributed algorithms try to turn a distributed system into a non-distributed one by looking for a global leader, global state, global time, global consensus, etc.
An example of a distributed system is a cluster. The simplest distributed system is a client-server system with only two computers and request-reply communication, where one acts as a server and delivers data (for instance a web server) and one acts as a client that makes requests (for example a computer with a browser). The client-server system is one of the two main types of architectural models, besides peer-to-peer systems. Other examples of distributed systems are the World Wide Web (WWW), the global Internet and local intranets.
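Such a minimal request-reply pair can be sketched with standard TCP sockets; the upper-casing "service" here is purely illustrative, and both ends run on one machine for demonstration purposes:

```python
import socket
import threading

def serve_one(server_sock):
    """The server: accept one client, answer its request, and stop."""
    conn, _ = server_sock.accept()       # wait for one client connection
    with conn:
        request = conn.recv(1024)        # receive the request message
        conn.sendall(request.upper())    # reply (illustrative "service")

# Bind to port 0 so the OS picks a free port.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
port = server_sock.getsockname()[1]
server = threading.Thread(target=serve_one, args=(server_sock,))
server.start()

# The client: connect, send a request, wait for the reply.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello server")
    reply = client.recv(1024)

server.join()
server_sock.close()
print(reply)  # b'HELLO SERVER'
```

Even in this two-node case the defining properties appear: the client and server share no memory, coordinate only by messages, and each must cope with the other being slow or unreachable.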
- Intel Research Lab at Berkeley
- Microsoft Research
- Microsoft Research - Cambridge Distributed Systems
- HP Research Labs
- IBM Research
- PODC ACM Symposium on Principles of Distributed Computing
- IPDPS IEEE International Parallel & Distributed Processing Symposium
Classic Papers related to definitions and early perspectives
- Philip H. Enslow, What is a 'distributed' data processing system?, IEEE Computer Vol. 11 January (1978) 13-21
- John A. Stankovic, A Perspective on Distributed Computer Systems, IEEE Transactions on Computers, Vol. C-33 No. 12 December (1984) 1102-1115
- Mukesh Singhal, Thomas L. Casavant, Distributed Computing Systems, IEEE Computer Vol. 24 August (1991) 12-14
- Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems, Prentice Hall, 2002, ISBN 0-13-088893-1
- George Coulouris, Jean Dollimore, and Tim Kindberg, Distributed Systems - Concepts and Design, 4th Edition, Addison-Wesley, 2005, ISBN 0-321-26354-5
- Michel Raynal, Networks and Distributed Computation - Concepts, Tools, and Algorithms, The MIT Press, 1988, ISBN 0-262-18130-4