ACID, CAP, and BASE
Learning objectives
- You know the terms ACID, CAP Theorem, and BASE.
- You know how ACID, CAP Theorem, and BASE relate to the scalability of database systems.
Databases provide the means to store, retrieve, and organize data. A database should support concurrent access by multiple applications and multiple users; for scalable applications, ideally also in a distributed manner. Here, we rehearse some of the key properties of databases (often discussed in introductory database courses).
ACID
Transactional database systems isolate the activities of individual users from all other concurrently happening activities, limiting the duration and degree of the isolation. In transactional database systems, queries that belong to a transaction should be either completed fully or not completed at all, and changes occurring during a transaction should not influence other concurrently executed queries. Four properties, summarized by the acronym ACID, characterize the quality of a transactional database system.
- Atomicity -- a database that offers atomicity must guarantee that each transaction completes fully or not at all, and the user must be made aware of whether the transaction was completed (or not).
- Consistency -- a database that offers consistency must guarantee that a transaction that completes fully commits only legal results, so that the state committed (and stored) to the database remains consistent.
- Isolation -- a database that offers isolation must hide events within a transaction from other concurrently running transactions, which allows resetting a failed transaction to the initial state, as required by Atomicity.
- Durability -- a database that offers durability must guarantee that the committed results survive malfunctions.
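To make atomicity concrete, the following is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. The accounts table, the account names, and the transfer helper are hypothetical and exist only for illustration. The sketch shows that when the second statement of a transaction fails, the first statement is rolled back as well, and the caller is told whether the transaction completed.

```python
import sqlite3


def transfer(conn, sender, receiver, amount):
    """Move `amount` from sender to receiver in one transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            # The CHECK constraint rejects a negative balance, which
            # raises sqlite3.IntegrityError and aborts the transaction.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical example data: two accounts that cannot go below zero.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # completes fully
failed = transfer(conn, "alice", "bob", 500)  # not completed at all
print(ok, failed, balances(conn))
```

In the failed transfer, the credit to the receiver has already been executed when the debit is rejected; the rollback removes it again, so no partial result becomes visible after the transaction ends.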
For additional details, look into the Principles of Transaction-Oriented Database Recovery.
CAP
When we consider scalable web applications that eventually will become distributed, maintaining consistency becomes challenging as networks can have partitions -- i.e., breaks in the network that isolate parts of it from the others. Because of this, designing a distributed database application involves tradeoffs between the availability of the application and the correctness of the data in the database.
For additional details, look into Consistency in a Partitioned Network: a Survey.
The CAP Principle (also CAP Theorem) highlights that distributed systems can have only two out of the following three properties:
- Strong Consistency -- a distributed system with strong consistency can guarantee that the results of a transaction that completes fully are committed and stored to the database (in the same vein as Consistency in ACID).
- High Availability -- a distributed system with high availability uses redundancy and data replication, leading to a situation where at least some replica of data can be reached even in the case of network partitions.
- Partition-resilience -- a distributed system with partition resilience can survive even in the case of (network) partitions between the data replicas.
As an example, systems without partition resilience can offer strong consistency and high availability, but can do so only when there are no network partitions that would separate replicas (and, e.g., prevent changes from propagating to other replicas).
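The tradeoff can be illustrated with a toy model. The sketch below is not a real database; the TwoReplicaStore class, its "CP" and "AP" modes, and the partitioned flag are all assumptions made for illustration. During a partition, the "CP" store refuses writes to keep the replicas identical (giving up availability), while the "AP" store accepts the write on the reachable replica (giving up strong consistency).

```python
class Unavailable(Exception):
    """Raised when a store refuses a request to protect consistency."""


class TwoReplicaStore:
    """Toy key-value store with two replicas (illustrative only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False  # True while the replicas cannot communicate

    def write(self, replica_index, key, value):
        if not self.partitioned:
            # No partition: the change reaches both replicas.
            for replica in self.replicas:
                replica[key] = value
            return
        if self.mode == "CP":
            # Consistency over availability: refuse the write.
            raise Unavailable("the other replica cannot be reached")
        # Availability over consistency: update only the reachable replica.
        self.replicas[replica_index][key] = value

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)


# The same scenario for both modes: write, partition, write again.
cp = TwoReplicaStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_write_accepted = True
except Unavailable:
    cp_write_accepted = False
cp_values = (cp.read(0, "x"), cp.read(1, "x"))

ap = TwoReplicaStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
ap_values = (ap.read(0, "x"), ap.read(1, "x"))

print(cp_write_accepted, cp_values, ap_values)
```

After the partition, both replicas of the "CP" store still agree but the write was rejected, whereas the replicas of the "AP" store disagree until the partition heals and the change is propagated.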
For additional details, look into Harvest, Yield, and Scalable Tolerant Systems.
BASE
While the acronym ACID provides a set of principles that are required for a transactional database system, the acronym BASE provides a set of principles for designing distributed database systems. BASE stands for Basically Available, Soft state, Eventually consistent.
- Basically Available -- a distributed system with basic availability distributes data and accepts failures, which means that at least some users will get data.
- Soft state -- a distributed system with soft state can lose state information, e.g., in the case of system crashes or network issues.
- Eventually consistent -- a distributed system with eventual consistency guarantees that data will be consistent at some point in time.
At the core of BASE is the notion that one should accept that reaching consistency may take time, and that partial failures are acceptable. The article linked below also highlights the use of event-driven architecture to inform users when state has become consistent: "Creating an event within the transaction that commits the asset to the receiving user provides a mechanism for performing further processing once a known state has been reached. EDA (event-driven architecture) can provide dramatic improvements in scalability and architectural decoupling."
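Eventual consistency can be sketched with a small in-memory model. The EventuallyConsistentStore class below is hypothetical, and its single shared counter for versions is a simplifying assumption; real systems typically rely on mechanisms such as vector clocks or timestamps. A write is applied to one replica immediately and queued for the others, so reads from different replicas may disagree until the queued updates have been delivered.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy replicated store with asynchronous propagation (illustrative only)."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # queued (target, key, version, value) updates
        self.clock = 0          # assumption: one shared version counter

    def write(self, replica_index, key, value):
        """Apply the write locally and queue it for all other replicas."""
        self.clock += 1
        self.replicas[replica_index][key] = (self.clock, value)
        for target in range(len(self.replicas)):
            if target != replica_index:
                self.pending.append((target, key, self.clock, value))

    def read(self, replica_index, key):
        """Return the value as seen by one replica (possibly stale)."""
        entry = self.replicas[replica_index].get(key)
        return None if entry is None else entry[1]

    def propagate(self):
        """Deliver all queued updates; the newest version wins."""
        while self.pending:
            target, key, version, value = self.pending.popleft()
            current = self.replicas[target].get(key)
            if current is None or current[0] < version:
                self.replicas[target][key] = (version, value)


store = EventuallyConsistentStore(3)
store.write(0, "status", "shipped")

# Immediately after the write, only replica 0 has seen the change.
before = [store.read(i, "status") for i in range(3)]

# Once the queued updates have been delivered, all replicas agree.
store.propagate()
after = [store.read(i, "status") for i in range(3)]

print(before, after)
```

The window between the write and the call to propagate corresponds to the period during which the system is available but not yet consistent.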
For additional details, look into BASE: An Acid Alternative: In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability.
Reading task
Over the years, the key question in distributed database systems has evolved into how to effectively achieve both high availability and consistency. This evolution is explored in this reading task.
Here, your task is to read the article CAP Twelve Years Later: How the "Rules" Have Changed and create two questions based on it. The article is available also at https://sfu-db.github.io/dbsystems/Papers/CAP-12years.pdf as a PDF.
When writing the questions, refer also to the notes on good questions.
Write the questions using the widget shown below.
Once you have created the two questions, answer six or more peer-authored questions below. After each question, you can rate the question -- please rate each question that you answer.