Data & Data Scalability

Approaches to Consistency: ACID and BASE

Learning Objectives

You know the terms consistency and transaction.
You know the ACID and BASE consistency models.
You understand the tradeoffs between ACID and BASE.

Consistency and transactions

A database system often serves multiple users or applications simultaneously, each performing queries, updates, or other operations independently while sharing the same database. A key requirement for databases is ensuring consistency: the data must adhere to predefined rules, such as user-defined checks, constraints, and foreign key relations, even when accessed concurrently.

Consistency means that the database reflects valid states at all times. For example, in a banking system, a user-defined check may state that an account balance must never become negative. Without proper safeguards, simultaneous operations could violate this rule: if two concurrent withdrawals each check the balance before either one updates it, and the account has enough for either withdrawal but not for both, both can succeed and leave the balance negative.
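The withdrawal example can be sketched in a few lines of Python. This is a minimal, single-threaded simulation rather than a real database: the function names and the amounts are made up for illustration, and the interleaving of the two withdrawals is written out by hand so that the outcome is deterministic.

```python
def interleaved_withdrawals(balance, first, second):
    """Both withdrawals check the balance before either one writes."""
    first_ok = balance >= first      # withdrawal 1 reads and checks
    second_ok = balance >= second    # withdrawal 2 checks the same old value
    if first_ok:
        balance -= first             # withdrawal 1 writes
    if second_ok:
        balance -= second            # withdrawal 2 writes, unaware of withdrawal 1
    return balance


def serialized_withdrawals(balance, first, second):
    """Each withdrawal checks and writes as one unit, one after the other."""
    for amount in (first, second):
        if balance >= amount:        # the check sees earlier withdrawals
            balance -= amount
    return balance


print(interleaved_withdrawals(100, 80, 80))  # -60
print(serialized_withdrawals(100, 80, 80))   # 20
```

In the interleaved version both checks pass against the original balance of 100, so the rule is broken. When each check-and-update runs as one unit, the second withdrawal is rejected and the balance stays valid.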

To ensure consistency, one query must not interfere with another query. For example, if one query reads records while another updates them, the read query should reflect either the state before or after the entire update, but never a partial update.

Database systems enforce consistency through transactions. A transaction is a sequence of operations, such as queries and updates, executed as a single, isolated unit. A transaction guarantees that its operations are either all completed or none of them are, and the database shields concurrent transactions from interfering with each other, ensuring data integrity. If something fails partway through, for example an update that would violate the check against negative balances, the database can roll the transaction back and return to a consistent state.
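As a rough illustration of a transaction that is rolled back when a check fails, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name, the account names, and the transfer helper are assumptions made for this example. The receiver is credited first on purpose, so that a failing debit shows the earlier update being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money in one transaction; return True if it committed."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so both updates were undone


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 150))  # False
print(balances(conn))                       # {'alice': 100, 'bob': 50}
print(transfer(conn, "alice", "bob", 30))   # True
print(balances(conn))                       # {'alice': 70, 'bob': 80}
```

The failed transfer leaves both balances untouched even though the credit to the receiver had already been executed inside the transaction.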

There are two main approaches to data consistency used in database systems: ACID and BASE.

ACID

Transactional database systems group queries into transactions, where the queries that belong to a transaction are either all completed or not completed at all, and changes made during a transaction do not influence other concurrently executed queries. Four properties, abbreviated as ACID, describe the guarantees of a transactional database system.

Atomicity — a database that offers atomicity must guarantee that each transaction either completes fully or does not complete at all, and the user must be made aware of whether the transaction completed.
Consistency — a database that offers consistency must guarantee that the database constraint checks (e.g. integrity rules, foreign key checks) pass before and after each transaction.
Isolation — a database that offers isolation must hide the intermediate state of a transaction from other concurrently running transactions. Because no other transaction has seen the uncommitted changes, a failed transaction can be reset to its initial state without affecting the others, which supports atomicity.
Durability — a database that offers durability must guarantee that the committed results survive failures.

While atomicity, consistency, and isolation can be handled at the code level, durability is handled by logging database operations to persistent storage before the changes are written to the database itself. This mechanism is called write-ahead logging (WAL). WAL allows the database to survive malfunctions by comparing the log with the database state: if a database server crashes during an update, the server can, on restart, reapply the logged changes of committed transactions and revert the changes of transactions that never committed.
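The recovery idea behind write-ahead logging can be sketched with a toy log. This is a simplified, redo-only illustration and not how any particular database implements WAL: the JSON record format, the field names, and the recover function are assumptions, and real systems also deal with flushing the log to disk, checkpoints, and undo information.

```python
import json


def recover(log_lines):
    """Rebuild key-value state from a write-ahead log after a crash.

    Only writes from transactions that have a "commit" record are replayed;
    writes from transactions that never committed are ignored.
    """
    records = [json.loads(line) for line in log_lines]
    committed = {r["tx"] for r in records if r["op"] == "commit"}
    state = {}
    for r in records:
        if r["op"] == "set" and r["tx"] in committed:
            state[r["key"]] = r["value"]
    return state


# Hypothetical log contents at the time of a crash: transaction 1 committed,
# transaction 2 was still in progress.
log = [
    '{"tx": 1, "op": "set", "key": "alice", "value": 70}',
    '{"tx": 1, "op": "set", "key": "bob", "value": 80}',
    '{"tx": 1, "op": "commit"}',
    '{"tx": 2, "op": "set", "key": "alice", "value": 0}',
]

print(recover(log))  # {'alice': 70, 'bob': 80}
```

Because every change reaches the log before it is applied, the log alone is enough to tell which transactions must survive the crash and which must leave no trace.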

For the history of ACID, see e.g. Principles of Transaction-Oriented Database Recovery.

BASE

When working with databases at scale, the data is typically divided across multiple servers. Scaling ACID systems can be challenging (but not impossible), as the ACID properties can limit scalability. For example, guaranteeing consistency across multiple servers requires a lot of communication between the servers, which can slow down the system.

The acronym BASE provides a set of principles for distributed database systems that allow sacrificing immediate consistency to achieve higher availability. BASE stands for Basically Available, Soft state, Eventually consistent.

Basically Available — the system aims for high availability but accepts temporary inconsistencies and partial failures. Even when part of the system fails, at least some users should still get data.
Soft state — the state of the system is not guaranteed to be stable at all times. It can change over time even without new input, as updates propagate between servers, and a distributed system with soft state can lose state information, e.g., in the case of system crashes or network issues.
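Eventually consistent — a distributed system with eventual consistency guarantees that, once updates stop arriving, all copies of the data will become consistent at some point in time. Until then, reads from different servers may return different or stale values.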
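Eventual consistency can be illustrated with two toy replicas that accept writes independently and reconcile later. This is a sketch under strong assumptions: the Replica class is hypothetical, conflicts are resolved with a simple last-write-wins rule, and timestamps are assumed to be unique. Real systems use a variety of other conflict resolution strategies.

```python
class Replica:
    """A hypothetical replica storing key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Last-write-wins: keep the value with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Reconciliation step: apply every entry the other replica holds.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("status", "online", timestamp=1)  # accepted by replica a only
b.write("status", "away", timestamp=2)    # accepted by replica b only

before = (a.read("status"), b.read("status"))
print(before)  # ('online', 'away')

a.sync_from(b)
b.sync_from(a)
print(a.read("status"), b.read("status"))  # away away
```

Both replicas stay available for reads and writes the whole time. The price is the window before the sync, during which a reader can get a different answer depending on which replica it asks.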

It’s important to note that in the BASE acronym, consistency refers to data consistency across multiple servers, which differs from the consistency in the ACID model that ensures data integrity through rules and constraints. At the same time, BASE systems such as MongoDB offer transaction-like features with limitations, such as guaranteeing atomicity only at the document level or, if providing multi-document transactions, having throughput or size limitations and performance trade-offs (which is to be expected when adopting ACID-like properties).

For additional details, see BASE: An Acid Alternative (subtitled "In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability").

At its core, BASE acknowledges that while consistency is desired, achieving it may take time. Therefore, partial failures, such as temporarily inconsistent results, are acceptable provided they occur infrequently.

What to choose and when

As a rule of thumb, if the application cannot tolerate stale data or lost writes (banking, ticket reservations, etc.), ACID is the way to go. On the other hand, if the application can tolerate temporary inconsistencies (e.g. social media sites, e-commerce sites), BASE can also be an option. Although BASE is often discussed in the context of distributed database systems, many distributed database systems also offer ACID properties through communication between the servers.

Inherently, BASE systems allow better scalability and availability, at the cost of strong consistency. In practice, many systems use polyglot persistence, where different parts of the system use different databases that match the requirements of that part. For example, a system might use a relational database for financial transactions and a NoSQL database for user profiles.
