Scalability means how well a solution to some problem will work when the size of the problem increases. The degree of scalability determines how much a system can be expanded without performance degradation or alteration of the internal structure of the system.
Scalability can be defined formally as the capability of a system or application or product to continue to function well without performance loss under increasing load by adding additonal resources or instances of the system. Couloris et al.  define scalability as follows: "a distributed system is scalable if the cost of adding a user is a constant amount in terms of the resources that must be added". The system can serve an increasing demand or a larger number of users by adding additional instances or devices. The IEEE definition says it is "the ability to grow the power or capacity of a system by adding components" . The key scalability technique is just as in fault tolerance replication: a service can be replicated at many nodes to serve a larger demand. Scalability is also related to and sometimes required for load balancing. A large-scale Internet application must be parallelized and replicated to scale well.
Werner Vogels, CTO of Amazon.com, mentions on his personal weblog: "A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added." (see ). Scalability refers to the property of a system architecture which determines the limit of the ability to grow and to scale up. Ken Birman gives in his article "Can Web Services Scale Up ?" the following definition of a scalable system: "In a nutshell, a scalable system is one that can flexibly accommodate growth in its client base. Such systems typically run on a clustered computer or in a large data center and must be able to handle high loads or sudden demand bursts and a vast number of users. They must reliably respond even in the event of failures or reconfiguration. Ideally, they’re self-managed and automate as many routine services such as backups and component upgrades as possible".
Ways and Means
The key to scalablity are the KISS principle and the four "S": to keep it simple, stateless, scriptable and small. Simple components have their advantages. It is well-known in software engineering that simple components and simple processes offer less possibilities to make errors and mistakes. The less code you write, the less can go wrong and the less there is to maintain. Hotmail engineer Phil Smoot said in an ACM interview (ACM queue, Vol. 3 No. 10, December/January 2005-2006): "New hires tend to want to do complex things, but we know that complex things break in complex ways. The veterans want simple designs, with simple interfaces and simple constructs that are easy to understand and debug and easy to put back together after they break." A lot of relatively small and simple commodity machines will not only be cheaper than a few big and expensive machines, they also scale much better and offer a better fault-tolerance.
Statelessness is often the essential key. Stateless nature of the system means high scalability. If servers, objects, components or EJBs are stateless and have no private state, they can be replicated easily. Redundancy and replication become very difficult subjects if the corresponding entities or objects are stateful: the state must be consistent among all replicated instances (consistency), and an access should either affect all or none of the replicated objects (atomicity).
Stateless means simple processes, and simple solutions are always good. Communication with Web Servers over pure HTTP is stateless - probably one reason why the world-wide web is very scalable and successful. Phil Smoot said about the MSN service in the ACM interview "In general, we try to keep no session affinity between a Web server that is performing a given page paint and the middle-tier servers that manage the transactions against the underlying data stores."
In order to reach statelessness, one can store something for instance in the database layer, if it needs to be stateful. This is sometines considered as "push all state down to the database". A typical request in such a logical three tier architecture triggers the following actions: "loading state for a set of objects from the database, operating on them, pushing their state back down into the database (if needed), writing the response, and then getting the hell out of there i.e. releasing all references to objects loaded for this request, leaving them for garbage collection", see .
Scriptable applications are easy to automate. Hotmail engineer Phil Smoot said in an ACM interview: "Our operation group never wants to rely on any sort of user interface [or complex GUI]. Everything has to be scriptable and run from some sort of command line. That's the only way you're going to be able to execute scripts and gather the results over thousands of machines".
In a large system, you have to assume that everything is going to fail, whether software or hardware. Failure of components in any small-scale system is the exception, not the rule, whereas failure of components in any large-scale system is the rule, not the exception. Therefore you have to design "for failure" in any large-scale system, i.e. you have to design the system as if any component could fail at any time. Thus an important means to achieve scalability is to make self-* properties a part of the system (self-configuration, self-management, self-inspection, self-repair), the system has to observe, heal and regenerate itself constantly. There are completely new problems on massive scales (the probability of failures and faults increases, and there is always some node that fails, thus assume that nodes fail and try to keep the system healthy through continuous refresh and recovery oriented computing).
Multi Tier Architecture
A physical three tier architecture alone is not necessarily more scalable than a two tier architecture, because it expands in the wrong direction and can cause "remote object hell". This is illustrated nicely by Ryan Tomayko in his blog, see 
"Many large enterprise web applications tried really hard to implement a Physical Three Tier Architecture, or they did in the beginning. The idea is that you have a physical presentation tier (usually JSP, ASP, or some other *SP) that talks to a physical app tier via some form of remote method invocation (usually EJB/RMI, CORBA, DCOM) that talks to a physical database tier (usually Oracle, DB2, MS-SQL Server). The proposed benefits of this approach is that you can scale out (i.e. add more boxes) to any of the physical tiers as needed.
Great, right? Well, no. It turns out this is a horrible, horrible, horrible way of building large applications and no one has ever actually implemented it successful. If anyone has implemented it successfully, they immediately shat their pants when they realized how much surface area and moving parts they would then be keeping an eye on. The main problem with this architecture is the physical app box in the middle. We call it the remote object circle of hell. This is where the tool vendors solve all kinds of interesting what if type problems using extremely sophisticated techniques, which introduce one thousand actual real world problems, which the tool vendors happily solve, which introduces one thousand more real problems, ad infinitum... It's hard to develop, deploy, test, maintain, evolve; it eats souls, kills kittens, and hates freedom and democracy."
Examples and Applications
Open source software is an important factor in achieving scalability. Yahoo uses open source software as - FreeBSD, Apache, Python, PHP, Perl, and MySQL - to achieve scalability. Like Yahoo, Google is an academic organization and has an open source attitude.
Massively scalable architectures are rare. They can be found in the proprietary systems of the large IT companies, which have literally thousands of servers, millions of users and billions of transactions per day. The question if you get better performance or not if you are adding more and more boxes and computers is of crucial importance to these companies.
Google has a special design of its hardware infrastructure - Google uses a highly redundant cluster architecture of custom made commodity desktop PCs - and an own Google file system (GFS). The principle is to use lots of relatively small and inexpensive machines instead of a few big and expensive machines. Goolge has also functional server pools (for instance web servers, index servers and document servers) like eBay. Google uses Python extensively since the beginning, and they have even hired the Python creator Guido van Rossum. Many components of the Google spider and search engine seem to be written in Python.
Amazon uses mainly its own proprietary custom software in C/C++, Java, SQL, Perl, on Linux servers, and partially Oracle's Real Application Clusters (RAC) software. It uses "a homegrown message queuing architecture and Web services to wire together its collection of internally written applications." (see ) The large-scale system from Amazon is built out of many different custom-made applications or "services", about 100-150. Each of this applications has different requirements in respect of availability and consisteny, for example the order service (taking an order), the customer service (updating customer records), the reviewer service (storing customer reviews), or the catalog service (checking the availability of a particular item). Amazon uses a massive parallel and modular architecture: all small pieces of business functionality is encapsulated by different isolated and separated services, and each service is responsible for the corresponding data and data management. Amazon also uses the autonomous "team method":the same team of developers that has built and developed the service is also responsible for operating it. It can use the tools and methods it wants, as long as the desired functionality is delivered. Amazon finally uses a simple queue service to connect the applications of the auonomous teams that offers a reliable, highly scalable hosted queue for buffering messages between distributed application components. Components of the huge distributed application are decoupled with messages and queues so that they run independently.
Yahoo has it's own page rendering patent. Yahoo uses like Microsoft's Hotmail service the FreeBSD operating system , besides the Apache Web Server and other open source software. Similar to Google, Yahoo uses open source software extensively, for example Python for its groups site. The Yahoo online groups, a comprehensive public archive of Internet mailing lists, was originally implemented in pure Python.
Microsoft MSN's services include e-Mail, Instant Messenger, news, weather, sport, etc. The Hotmail e-Mail is one of MSN's larger services and relies on more than 10,000 servers spread around the globe: they have special mail servers and storage servers. They use of course Microsoft products (what else), "clusters" as a unit which can be build in and out on demand, and scriptable applications which can be automated.
eBay uses since the Version 3 eBay architecture J2EE technologies and IBM WebSphere Application Server. They emphasize stateless design and functional server pools (partitioning of application servers based on use cases). They don't seem to use entity beans very much. "In general, the approach that eBay is alluding to (and Google has confirmed) is that architectures that consist of pools or farms of machines dedicated on a use-case basis will provide better scalability and availability as compared to a few behemoth machines." 
Werner Vogels: A word on scalability
Nuggets of Wisdom from eBay's Architecture
eBay Creates Technology Architecture for the Future
Amazon.com at LinuxWorld: All Linux, All the Time
The magic that makes Google tick (ZDNet 2004)
Web Search For A Planet: The Google Cluster Architecture
Design for Scalability - an Update
- Can Web Services Scale Up ?, Ken Birman