Replication
Replication means keeping copies of the same data on multiple machines connected via a network. Why bother?
- Latency: Keep data geographically close to users
- Availability: System continues working even if some machines fail
- Throughput: Scale out the number of machines that can serve read queries
The challenge: keeping data consistent across replicas when it changes.
Note
If data never changes, replication is trivial—just copy it once. All the difficulty comes from handling changes to data.
Leaders and Followers
The most common replication approach: leader-based replication (also called master-slave or primary-replica).
How It Works
- One replica is designated the leader (master, primary)
- Other replicas are followers (slaves, read replicas, secondaries)
- Writes go only to the leader
- The leader sends data changes to all followers as a replication log
- Followers apply changes in the same order
- Reads can go to any replica
Leader-Based Replication
Client writes:
┌─────────────────────────────────┐
│ │
▼ │
┌──────────┐ replication log ┌───┴──────┐
│ LEADER │────────────────────►│ FOLLOWER │
│ │ │ │
└──────────┘ └──────────┘
│ ▲
│ replication log │
└──────────────────────────────►│
┌──────────┐
│ FOLLOWER │
│ │
└──────────┘
Client reads: (can use any replica)

This model is used by PostgreSQL, MySQL, MongoDB, Kafka, and many others.
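The flow above can be sketched as a toy in-memory model (names and the synchronous push are illustrative; real systems stream the log asynchronously):

```python
# Minimal sketch of leader-based replication: writes go to the leader,
# which appends them to a replication log; followers apply entries in
# the same order. Reads can then go to any replica.

class Follower:
    def __init__(self):
        self.data = {}
        self.log_position = 0  # index of the next log entry to apply

    def apply(self, log, upto):
        # Apply all log entries not yet seen, in log order.
        for key, value in log[self.log_position:upto]:
            self.data[key] = value
        self.log_position = upto

class Leader:
    def __init__(self, followers):
        self.data = {}
        self.log = []  # replication log: ordered list of changes
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))
        for f in self.followers:  # asynchronous in real systems
            f.apply(self.log, len(self.log))

followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("user:1", "Alice")
# Reads can use any replica:
assert all(f.data["user:1"] == "Alice" for f in followers)
```

Because every follower applies the same log in the same order, all replicas converge to the same state.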
Synchronous vs Asynchronous Replication
When the leader receives a write, how long does it wait before confirming?
Synchronous
The leader waits until followers confirm they've received the write before responding to the client.
Advantage: Followers are guaranteed to have an up-to-date copy.
Disadvantage: If any follower doesn't respond, writes are blocked.
Asynchronous
The leader responds immediately after its own write, without waiting for followers.
Advantage: The leader can continue even if followers fall behind.
Disadvantage: If the leader fails, some writes may be lost.
Semi-Synchronous
Common compromise: one follower is synchronous, others are asynchronous. If the synchronous follower becomes slow, another takes over.
Warning
Fully synchronous replication is impractical—a single node failure would halt the entire system. Most systems use asynchronous or semi-synchronous replication.
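The three modes differ only in when the leader acknowledges the client. A minimal sketch of that decision (the function and mode names are illustrative):

```python
# When can the leader confirm a write to the client?
# confirmations = number of followers that have acknowledged so far.

def acknowledge(mode, confirmations, n_followers):
    if mode == "synchronous":       # wait for every follower
        return confirmations == n_followers
    if mode == "semi-synchronous":  # wait for one designated follower
        return confirmations >= 1
    if mode == "asynchronous":      # don't wait at all
        return True
    raise ValueError(f"unknown mode: {mode}")

assert acknowledge("asynchronous", 0, 3)        # ack immediately
assert not acknowledge("semi-synchronous", 0, 3)
assert acknowledge("semi-synchronous", 1, 3)    # one sync follower is enough
assert not acknowledge("synchronous", 2, 3)     # blocked on the slow follower
```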
Setting Up New Followers
Adding a new replica isn't as simple as copying files—the data is constantly changing. The typical process:
- Take a consistent snapshot of the leader's database
- Copy the snapshot to the new follower
- Follower connects to leader and requests all changes since the snapshot
- Once caught up, follower can process ongoing changes
Most databases automate this process.
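The steps above can be sketched as snapshot-plus-replay, assuming the snapshot records the log position it corresponds to (the data and positions here are illustrative):

```python
# Bootstrap a new follower: restore a snapshot, then replay every
# log entry written after the snapshot was taken.

log = [("k1", "v1"), ("k2", "v2"), ("k3", "v3"), ("k2", "v2b")]

def take_snapshot(data, log_position):
    return {"data": dict(data), "position": log_position}

def bootstrap_follower(snapshot, log):
    data = dict(snapshot["data"])
    # Replay all changes made since the snapshot.
    for key, value in log[snapshot["position"]:]:
        data[key] = value
    return data

# Snapshot taken after the first two writes:
snap = take_snapshot({"k1": "v1", "k2": "v2"}, 2)
assert bootstrap_follower(snap, log) == {"k1": "v1", "k2": "v2b", "k3": "v3"}
```

The same replay logic handles follower catch-up recovery: a restarted follower replays from its last applied position instead of a snapshot position.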
Handling Node Failures
Follower Failure: Catch-up Recovery
If a follower crashes and restarts, it knows its last processed position in the replication log. It simply requests all changes since that position and catches up.
Leader Failure: Failover
Leader failure is trickier. The system must:
- Detect that the leader has failed (usually via timeout)
- Choose a new leader (often the most up-to-date follower)
- Reconfigure the system so clients send writes to the new leader
Failover Process
1. Leader stops responding to heartbeats
2. Followers start election
└─ Usually choose follower with most recent data
3. New leader is chosen
4. Old leader's unreplicated writes are... lost? conflicting?
└─ This is the hard part
5. Clients are redirected to new leader

Failover Pitfalls
Lost writes: If the old leader had accepted writes not yet replicated, they vanish. What if other systems (caches, external APIs) already acted on those writes?
Split brain: Two nodes both believe they're the leader. Without careful design, both accept writes, causing data divergence.
Timeout tuning: Too short = unnecessary failovers during load spikes. Too long = extended downtime.
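The "lost writes" pitfall can be made concrete with a toy election that picks the most up-to-date follower (node names and log positions are illustrative):

```python
# Elect the follower with the highest replication-log position.
# Any writes the old leader accepted beyond that position are lost.

def elect_leader(follower_positions):
    """follower_positions: mapping of node -> last applied log index."""
    return max(follower_positions, key=follower_positions.get)

followers = {"follower-a": 117, "follower-b": 123}
old_leader_position = 125  # old leader had accepted 2 unreplicated writes

new_leader = elect_leader(followers)
lost_writes = old_leader_position - followers[new_leader]
assert new_leader == "follower-b"
assert lost_writes == 2  # these writes simply vanish after failover
```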
Warning
Failover is surprisingly difficult to get right. Many systems opt for manual failover, with operators triggering it when confident the leader has failed.
Replication Logs
How does the leader send changes to followers?
Statement-Based Replication
Send the SQL statements themselves:
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
UPDATE users SET last_login = NOW() WHERE id = 123;

Problem: Non-deterministic functions (NOW(), RAND()) give different results on each replica. Statements with side effects or auto-incrementing columns also cause issues.
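The non-determinism problem can be demonstrated with a simulated clock standing in for SQL's NOW() (a toy model; `now` and `execute` are illustrative names):

```python
import itertools

# Simulated clock: every call returns a strictly later "time".
clock = itertools.count()
def now():
    return next(clock)

def execute(statement_fn):
    # Statement-based replication: every replica re-runs the statement.
    return statement_fn()

leader_value = execute(now)    # leader evaluates NOW() at t=0
follower_value = execute(now)  # follower replays it later, at t=1
assert leader_value != follower_value  # the replicas have diverged

# Logical (row-based) replication ships the computed value instead:
assert execute(lambda: leader_value) == leader_value
```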
Write-Ahead Log (WAL) Shipping
Send the same low-level log the leader uses for crash recovery. This describes which bytes changed in which disk blocks.
Problem: Tightly coupled to storage engine internals. Followers must run the same database version.
Logical (Row-Based) Log Replication
Send a logical description of changes:
INSERT into users: {id: 1, name: "Alice", email: "alice@example.com"}
UPDATE users WHERE id=123: {last_login: "2024-01-15 10:30:00"}
DELETE from users WHERE id=456

Advantage: Decoupled from storage format. Easier for external systems to consume (change data capture).
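A follower (or a change-data-capture consumer) applies such logical records mechanically. A minimal sketch, with an illustrative record format:

```python
# Apply logical (row-based) log records to an in-memory table.

def apply_logical_record(table, record):
    op = record["op"]
    if op == "insert":
        table[record["row"]["id"]] = dict(record["row"])
    elif op == "update":
        table[record["id"]].update(record["changes"])
    elif op == "delete":
        del table[record["id"]]

users = {}
log = [
    {"op": "insert", "row": {"id": 1, "name": "Alice"}},
    {"op": "update", "id": 1, "changes": {"last_login": "2024-01-15 10:30:00"}},
    {"op": "insert", "row": {"id": 2, "name": "Bob"}},
    {"op": "delete", "id": 2},
]
for record in log:
    apply_logical_record(users, record)

assert users == {1: {"id": 1, "name": "Alice",
                     "last_login": "2024-01-15 10:30:00"}}
```

Because the records describe rows rather than bytes or SQL text, the same stream can feed a different storage engine, a search index, or a cache.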
Trigger-Based Replication
Application-level triggers copy data to another table when changes occur. Maximum flexibility but more overhead and potential for bugs.
Replication Lag
With asynchronous replication, followers may be seconds or even minutes behind the leader. This is replication lag.
For read-heavy workloads reading from followers, lag causes weird effects:
Reading Your Own Writes
You submit a comment, refresh the page, and your comment isn't there yet (because you read from a lagging follower).
Solutions:
- Read your own recent writes from the leader
- Track the timestamp of your last write; don't read from followers behind that point
- Client remembers its last write position; reject reads from followers that haven't caught up
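The last two solutions boil down to a routing check against log positions. A minimal sketch (positions are illustrative log offsets):

```python
# Read-your-writes routing: remember each user's last write position
# and refuse to read from a follower that hasn't caught up to it.

def choose_replica(user_last_write, follower_position):
    """Return which replica may serve this user's read."""
    if follower_position >= user_last_write:
        return "follower"  # follower has seen the user's write
    return "leader"        # fall back to the leader

assert choose_replica(user_last_write=42, follower_position=40) == "leader"
assert choose_replica(user_last_write=42, follower_position=42) == "follower"
```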
Read-Your-Writes Consistency
1. User posts comment → goes to Leader
2. User refreshes page → reads from Follower
3. Follower hasn't caught up → comment missing!
Fix: Route user's reads to Leader for N seconds after their writes

Monotonic Reads
You query twice and see newer data first, then older data (because you hit different followers with different lag).
Solution: Route each user's reads to the same replica (e.g., based on hash of user ID).
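Hash-based routing is a one-liner in practice. A minimal sketch (replica names are illustrative):

```python
import hashlib

# Monotonic reads: hash the user ID so every read from the same user
# deterministically lands on the same replica.

def replica_for(user_id, replicas):
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return replicas[int(digest, 16) % len(replicas)]

replicas = ["replica-0", "replica-1", "replica-2"]
first = replica_for("user-123", replicas)
# Repeated reads always hit the same replica, so a user never sees
# data "go backwards" by switching to a more-lagged follower:
assert all(replica_for("user-123", replicas) == first for _ in range(10))
```

The trade-off: if that replica fails, the user's reads must be rerouted, and monotonicity can briefly break.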
Consistent Prefix Reads
In partitioned databases, you might see an answer before the question if partitions have different lag.
Causality Violation
Partition A (lagging): "What's the score?"
Partition B (current): "It's 3-2!"
An observer might see: "It's 3-2!" then "What's the score?"

Solution: Ensure causally related writes go to the same partition, or use explicit causal ordering.
Multi-Leader Replication
In leader-based replication, the leader is a bottleneck and single point of failure. Multi-leader (multi-master) replication allows writes to multiple nodes.
Use Cases
Multi-datacenter operation: Each datacenter has its own leader. Better latency, better availability when a datacenter goes down.
Clients with offline operation: Mobile apps can work offline, syncing later. Each device is effectively a leader.
Collaborative editing: Google Docs allows multiple users to edit simultaneously.
The Big Problem: Write Conflicts
What happens when two leaders accept conflicting writes?
Write Conflict
Datacenter A: UPDATE users SET name = 'Alice Smith' WHERE id = 1;
Datacenter B: UPDATE users SET name = 'Alice Jones' WHERE id = 1;
Both succeed locally. When they replicate... which one wins?

Conflict avoidance: Route all writes for a given record to the same leader. Simple, but loses the benefits if that datacenter fails.
Converging to a consistent state: All replicas must eventually agree. Strategies include:
- Last write wins (based on timestamp or ID) — simple but loses data
- Merge the values somehow ("Alice Smith-Jones"?)
- Record the conflict for later resolution (shopping cart approach)
- Prompt the user to resolve
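Last write wins is the simplest of these strategies, and also the clearest illustration of why "simple" loses data. A minimal sketch (timestamps are illustrative):

```python
# Last-write-wins (LWW) conflict resolution: keep the version with the
# highest timestamp and silently discard the other.

def last_write_wins(a, b):
    """Each version is a (timestamp, value) pair; the newer one wins."""
    return a if a[0] >= b[0] else b

datacenter_a = (1705312200.0, "Alice Smith")
datacenter_b = (1705312201.5, "Alice Jones")

winner = last_write_wins(datacenter_a, datacenter_b)
assert winner[1] == "Alice Jones"
# "Alice Smith" is gone: LWW resolves conflicts by discarding data,
# and clock skew between datacenters can make the "wrong" write win.
```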
Warning
Multi-leader replication adds significant complexity. Avoid it unless you have a strong use case (multi-datacenter, offline clients, collaborative editing).
Leaderless Replication
Some databases abandon the leader concept entirely. Any replica can accept writes. Dynamo-style databases (Cassandra, Riak, Voldemort) work this way.
Quorums
With n replicas, writes must be confirmed by w nodes, and reads must query r nodes.
As long as w + r > n, you're guaranteed that at least one node has the latest data.
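The overlap argument can be checked directly: with n = 3, w = 2, r = 2, any r replicas you read must include at least one that saw the write. A minimal sketch, assuming each replica stores a (version, value) pair:

```python
import itertools

# Quorum read: take the value with the highest version among r replies.

def quorum_read(responses):
    """responses: (version, value) pairs from r replicas."""
    return max(responses, key=lambda pair: pair[0])[1]

n, w, r = 3, 2, 2
assert w + r > n  # the overlap condition

# A write with w = 2 reached replicas 0 and 1; replica 2 is stale.
replicas = [(2, "new"), (2, "new"), (1, "old")]

# Whichever r = 2 replicas answer, at least one holds the new value:
for subset in itertools.combinations(replicas, r):
    assert quorum_read(subset) == "new"
```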
Quorum Configuration
n = 3 replicas
w = 2 (write to at least 2)
r = 2 (read from at least 2)
Write: Send to all 3, succeed when 2 confirm
Read: Query all 3, use the most recent value
Since w + r = 4 > 3, reads and writes overlap → we see latest data

Handling Stale Data
When you read from r nodes, you might get different values. Two repair mechanisms:
Read repair: When reading detects a stale value, write the newer value back to the stale replica.
Anti-entropy: Background process that compares replicas and copies missing data.
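Read repair falls naturally out of a quorum read: once the reader knows the newest version, it pushes it to the laggards. A minimal sketch, reusing the (version, value) representation (illustrative, not any particular database's API):

```python
# Read repair: when a read sees divergent versions, write the newest
# value back to the stale replicas.

def read_repair(replicas):
    """replicas: mutable list of (version, value) pairs.
    Returns the latest value and repairs stale entries in place."""
    latest = max(replicas, key=lambda pair: pair[0])
    for i, (version, _) in enumerate(replicas):
        if version < latest[0]:
            replicas[i] = latest  # push the newer value back
    return latest[1]

replicas = [(2, "new"), (1, "old"), (2, "new")]
assert read_repair(replicas) == "new"
assert replicas == [(2, "new"), (2, "new"), (2, "new")]  # repaired
```

Read repair only fixes values that are actually read; anti-entropy covers the rest in the background.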
Sloppy Quorums
If the nodes required for a quorum are unavailable, a sloppy quorum lets writes land on other reachable nodes temporarily. When the original nodes recover, the writes are handed off to them (a process called hinted handoff).
This improves availability but means you might read stale data even with w + r > n.
Summary
Replication serves three main purposes:
| Purpose | Approach |
|---|---|
| Reduce latency | Place replicas near users |
| Increase availability | Keep working if nodes fail |
| Increase read throughput | Serve reads from multiple replicas |
The main replication architectures:
| Architecture | Leaders | Complexity | Conflict Handling |
|---|---|---|---|
| Single-leader | 1 | Low | None (leader decides) |
| Multi-leader | 2+ | High | Required |
| Leaderless | 0 | Medium | Required |
Note
Replication lag is inevitable with asynchronous replication. Design your application to handle temporary inconsistency, or pay the performance cost of synchronous replication.
Key concepts:
- Synchronous vs asynchronous determines durability guarantees
- Failover is hard; lost writes and split-brain are real risks
- Replication lag causes read-your-writes, monotonic reads, and causality issues
- Multi-leader and leaderless require conflict resolution
- Quorums (w + r > n) guarantee read/write overlap, though edge cases (sloppy quorums, concurrent writes) can still return stale data