Conceptualizing Scalability
Learning objectives
- Knows what the term scalability refers to.
- Understands the difference between vertical and horizontal scaling.
The term scalability has been defined in the literature multiple times. Here are a few examples from the 1990s.
"By scalability we mean that the proposed protocols for data delivery are cost-effective even when there are a very large number (100's, 1000's, even tens of thousands) of destinations that the data needs to be delivered to." Hall, Robert W., et al. "Corona: a communication service for scalable, reliable group collaboration systems." Proceedings of the 1996 ACM conference on Computer supported cooperative work. 1996.
"We call a system scalable if the system response time for individual requests is kept as small as theoretical possible when the number of simultaneous HTTP requests increases, while maintaining a low request drop rate and achieving a high peak request rate." Andresen, Daniel, et al. "SWEB: Towards a scalable World Wide Web server on multicomputers." Proceedings of International Conference on Parallel Processing. IEEE, 1996.
"By scalability, we mean that when the load offered to the service increases, an incremental and linear increase in hardware can maintain the same per-user level of service." Fox, Armando, et al. "Cluster-based scalable network services." Proceedings of the sixteenth ACM symposium on Operating systems principles. 1997.
In all of the above examples, and in the earlier literature in general, the key concern has been scaling to meet increasing demand. There are, in principle, two strategies for scaling software: vertical scaling and horizontal scaling. Vertical scaling refers to adding more resources to the current server, while horizontal scaling refers to adding more servers to handle the task.
Vertical scaling - scaling up
Vertical scaling means adding more resources to the current server. This includes upgrading hardware resources such as the number of available processors and the amount of available memory, which in turn improves the processing power of the single server (Fig. 1).
In practice, however, there are limits to scaling vertically, as there are limits to the processing power and memory that can be added to a single computer. In addition, web applications spend a considerable share of their time waiting for other resources, such as databases or external services, and adding more processing power does not necessarily reduce the time spent waiting.
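The effect of waiting time can be illustrated with a small calculation. The following is a minimal sketch, not a measurement: the function name `response_time` and the timings (20 ms of computation, 80 ms of waiting) are hypothetical values chosen for illustration, and the model assumes that a faster processor shortens only the computation part of a request.

```python
def response_time(cpu_seconds: float, wait_seconds: float, cpu_speedup: float) -> float:
    """Time to serve one request when only the CPU-bound part gets faster."""
    return cpu_seconds / cpu_speedup + wait_seconds


# Hypothetical request: 20 ms of computation and 80 ms waiting for a database.
baseline = response_time(0.020, 0.080, cpu_speedup=1.0)
upgraded = response_time(0.020, 0.080, cpu_speedup=2.0)

print(f"baseline: {baseline * 1000:.0f} ms, upgraded: {upgraded * 1000:.0f} ms")
```

Under these assumptions, doubling the processing power shortens the response time from 100 ms to 90 ms, because the 80 ms spent waiting is unaffected.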
Horizontal scaling - scaling out
Horizontal scaling means adding more servers to handle the workload (Fig. 2). As a classic example of horizontal scaling, take a look at the NASA Home Page from late 1996. To meet the global demand for updates on the Mars Pathfinder mission, NASA mirrored the contents of their homepage to a number of servers, and linked users to those servers.
In a more modern web application context, distributing the workload over multiple servers also requires a mechanism for distributing the incoming requests to those servers. One approach that we discuss in later chapters is the use of load balancing.
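To give a rough idea of what distributing requests can look like, the sketch below hands out requests to servers in turn, a strategy commonly called round-robin. It is one possible strategy and not necessarily the one used in later chapters; the class name `RoundRobinBalancer` and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers one after another, starting over at the end of the list."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._servers)


# Hypothetical server names; five incoming requests spread over three servers.
balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [balancer.pick() for _ in range(5)]
print(assignments)
```

Each server receives roughly the same number of requests. A real load balancer would typically also account for servers that are slow or unavailable, which this sketch does not do.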
Adjusting to demand
While the earlier definitions of scalability were often related to meeting increased demand, scalability has also come to mean meeting fluctuating demand. In practice, while there may be a need to scale up and out to match increasing demand, there are also times to scale down and in when the workload decreases (Fig. 3).
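The idea of scaling out and in with the workload can be sketched as a simple calculation of how many servers a given load calls for. This is an illustrative sketch under stated assumptions: the function `servers_needed`, the capacity of 250 requests per second per server, and the load figures are all hypothetical, and the model assumes that capacity grows linearly with the number of servers.

```python
import math


def servers_needed(requests_per_second: float, capacity_per_server: float,
                   min_servers: int = 1) -> int:
    """Smallest number of servers that covers the load, never below min_servers."""
    return max(min_servers, math.ceil(requests_per_second / capacity_per_server))


# Hypothetical load (requests per second) measured at five points in time.
load_samples = [120, 450, 980, 300, 40]
plan = [servers_needed(load, capacity_per_server=250) for load in load_samples]
print(plan)
```

With these numbers the server count rises from one to four as the load peaks and drops back to one when the load decreases, which corresponds to scaling out and then scaling in.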