Distributed Systems Interviews: CAP, Consensus & Replication

10 min read · April 24, 2026

Master the core distributed systems concepts — CAP theorem, consensus algorithms, and replication strategies — that senior engineering interviews actually test.

Distributed systems questions are the great filter in senior and staff engineering interviews. Candidates who've spent years writing application code suddenly freeze when asked to explain why their database can't guarantee both consistency and availability during a network partition. If you're targeting Principal Engineer, Tech Lead, or Staff-level roles in 2026, this material isn't optional — it's the baseline. This guide explains CAP theorem, consensus algorithms, and replication strategies in plain language, connects them to real systems you've probably used, and tells you exactly how to talk about them in an interview without sounding like you memorized a Wikipedia article.

CAP Theorem Is Simpler Than You Think — And More Nuanced Than Textbooks Admit

CAP theorem states that a distributed system can guarantee at most two of three properties simultaneously: Consistency, Availability, and Partition Tolerance. Here's the problem: most explanations stop there, which makes the theorem sound like a menu where you pick two items. That framing is wrong, and interviewers at top companies know it.

Partition tolerance isn't optional. Networks fail. Packets drop. A distributed system that cannot tolerate partitions is a system that breaks the moment two nodes stop talking to each other — which is every production system, eventually. So the real choice is: when a partition occurs, do you sacrifice consistency or availability?

  • CP systems (Consistent + Partition Tolerant) refuse to serve stale data during a partition. They return errors or block until consistency is restored. HBase, ZooKeeper, and etcd behave this way. Banking systems and distributed locks lean CP.
  • AP systems (Available + Partition Tolerant) keep serving requests during a partition, but different nodes may return different values. Cassandra, CouchDB, and DynamoDB (in default mode) are AP. Shopping carts and DNS caches lean AP.
  • CA systems don't exist in a distributed setting. A single-node PostgreSQL instance is effectively CA — but the moment you add replication, you're back to the CAP trade-off.

In an interview, don't recite these definitions robotically. Instead, anchor them to decisions: "When we built our order processing service, we chose a CP approach because we couldn't tolerate two nodes accepting conflicting writes on the same inventory item. The cost was that during a partition, some write operations would fail rather than return stale confirmation — which we handled with client-side retry logic." That answer shows you've lived the trade-off.

PACELC Extends CAP for the 99% Case — Use It to Differentiate Yourself

CAP only talks about behavior during partitions. PACELC (pronounced "pass-elk") asks a harder question: what's the latency vs. consistency trade-off even when the network is healthy?

The model says: if there's a Partition (P), choose between Availability (A) and Consistency (C). Else (E), even without a partition, choose between Latency (L) and Consistency (C).

This matters because most of your system's life is spent in the "else" branch — no partition, just normal operation. DynamoDB is PA/EL: it favors availability during partitions and low latency the rest of the time. Spanner is PC/EC: it pays the latency cost of strong consistency always, using TrueTime to provide external consistency across global data centers.

Most candidates can explain CAP. Fewer can explain why a system's behavior during normal operation matters just as much as its behavior during failures. PACELC is your signal that you think at systems scale.

Drop PACELC into your answer when the interviewer asks about database selection for a global system. It reframes the conversation from "pick two" to "these are the continuous trade-offs we're managing."

Replication Is How You Build Durability — Know All Three Models Cold

Replication means copying data across multiple nodes. The why is obvious — fault tolerance and read scalability. The how is where interviews test your depth. There are three main replication strategies:

1. Single-leader (primary-replica) replication. All writes go to one leader. The leader replicates to followers. Reads can go to followers for scalability. MySQL, PostgreSQL, and most traditional RDBMS use this by default. The failure mode is leader unavailability: if your leader dies before replicating a write, you've lost data (or you've accepted a consistency hole, depending on how synchronous your replication is; in PostgreSQL this is the synchronous_commit setting).

2. Multi-leader replication. Multiple nodes accept writes. Used in multi-datacenter deployments (active-active) and offline-capable clients (like Google Docs). The hard problem is write conflicts — if two users edit the same row on different leaders simultaneously, who wins? You need a conflict resolution strategy: last-write-wins (LWW), application-level merging, or CRDTs (more on that shortly).

3. Leaderless replication (Dynamo-style). No single leader. Clients write to multiple nodes directly, using quorums. If you have N replicas, a write quorum of W nodes must acknowledge, and a read quorum of R nodes must be consulted, where W + R > N to guarantee overlap. Cassandra and Riak use this model. The trade-off is tunable consistency: you control the W/R values to slide between AP and CP behavior at query time (the overlap arithmetic is sketched just below).
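
To make the quorum overlap concrete, here is a minimal Python sketch (the Replica class and helper names are illustrative, not any real client library) showing why W + R > N means a read always sees the latest acknowledged write:

```python
# Minimal quorum sketch (illustrative only -- not a real client API).
# With N replicas, a write must be acknowledged by W nodes and a read must
# consult R nodes. If W + R > N, every read quorum overlaps the latest write
# quorum, so the read can always return the newest acknowledged version.

from dataclasses import dataclass

@dataclass
class Replica:
    version: int = 0      # monotonically increasing write version
    value: str = ""

def write(replicas, w, version, value):
    # Write to W replicas (here simply the first W in the list).
    for rep in replicas[:w]:
        rep.version, rep.value = version, value

def read(replicas, r):
    # Consult R replicas and return the value carrying the highest version.
    consulted = replicas[-r:]   # deliberately a *different* subset of nodes
    return max(consulted, key=lambda rep: rep.version).value

replicas = [Replica() for _ in range(3)]            # N = 3
write(replicas, w=2, version=1, value="shipped")    # W = 2: replicas 0 and 1
print(read(replicas, r=2))                          # R = 2: replicas 1 and 2 -> "shipped"
```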

For an interview about a high-throughput system, say a platform processing 10 million transactions a day, you'd articulate this as: "We ran Cassandra with N=3 and quorum reads and writes (W=2, R=2). Because W + R > N, the critical checkout flow always read its own writes, while the analytics pipeline read at a lower consistency level where staleness was acceptable."

Consensus Algorithms Are the Hard Core — Here's What You Actually Need to Know

Consensus is the problem of getting multiple nodes to agree on a single value even when nodes crash or messages are delayed. It's the foundation of distributed locks, leader election, and replicated state machines. You need to know two algorithms: Paxos and Raft.

Paxos is theoretically elegant and practically painful. Leslie Lamport first described it in the late 1980s (the paper wasn't published until 1998), and it remains the intellectual foundation of the field. The protocol has two phases:

  1. Prepare phase: A proposer sends a prepare request with a proposal number to a majority of acceptors. Acceptors promise not to accept lower-numbered proposals and return any value they've already accepted.
  2. Accept phase: If a majority responds, the proposer sends an accept request. If a majority accepts, consensus is reached.

Paxos is notorious for being difficult to implement correctly. In interviews, you don't need to implement it. You need to explain the intuition: a value is chosen once a majority of acceptors agree on it, and because any two majorities must share at least one node, two conflicting values can never both be chosen.
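
You won't be asked to code this, but a toy single-decree sketch (in-memory, no networking or failures; all names are illustrative) can help you internalize the two phases and the majority-overlap argument:

```python
# Toy single-decree Paxos sketch: in-memory, no failures or retries.
# Illustrative only -- real implementations must handle duplicate messages,
# competing proposers, and durable state.

class Acceptor:
    def __init__(self):
        self.promised_n = -1    # highest proposal number promised
        self.accepted_n = -1    # highest proposal number accepted
        self.accepted_v = None

    def prepare(self, n):
        # Phase 1: promise to ignore proposals numbered below n,
        # and report any value already accepted.
        if n > self.promised_n:
            self.promised_n = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, v):
        # Phase 2: accept unless a higher-numbered promise was made.
        if n >= self.promised_n:
            self.promised_n = self.accepted_n = n
            self.accepted_v = v
            return True
        return False

def propose(acceptors, n, value):
    majority = len(acceptors) // 2 + 1

    # Phase 1: gather promises.
    promises = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in promises if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, we must re-propose the one
    # with the highest proposal number instead of our own.
    already = [(an, av) for an, av in granted if an >= 0]
    if already:
        value = max(already)[1]

    # Phase 2: ask for acceptance.
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, n=1, value="leader=node-3"))   # chosen: leader=node-3
print(propose(acceptors, n=2, value="leader=node-7"))   # still leader=node-3
```

The second proposal is the point: once a value has been chosen by a majority, any later proposer is forced to re-propose that same value, which is exactly the safety property you want to be able to explain.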

Raft was designed specifically to be understandable. It separates consensus into three sub-problems:

  • Leader election: Nodes use randomized election timeouts to avoid split votes. The first node to time out requests votes; it wins if it gets a majority (see the sketch after this list).
  • Log replication: The leader receives client requests, appends them to its log, and replicates to followers. An entry is committed once a majority acknowledges it.
  • Safety: Raft guarantees that committed entries are never lost. A leader can only be elected if its log is at least as up-to-date as the majority.
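
To make leader election concrete, here is a minimal sketch of the randomized-timeout idea (illustrative only; real Raft also checks log freshness before granting a vote, sends heartbeats, and persists state):

```python
# Minimal sketch of Raft-style randomized election timeouts.
# Illustrative only -- not a complete Raft implementation.

import random

ELECTION_TIMEOUT_MS = (150, 300)   # range suggested in the Raft paper

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        # Randomization makes it unlikely that two followers time out
        # together and split the vote.
        self.timeout_ms = random.uniform(*ELECTION_TIMEOUT_MS)

    def start_election(self, cluster):
        self.current_term += 1
        votes = 1  # vote for self
        for peer in cluster:
            if peer is not self and peer.grant_vote(self.current_term):
                votes += 1
        return votes > len(cluster) // 2   # majority wins

    def grant_vote(self, term):
        # At most one vote per term; real Raft also requires the candidate's
        # log to be at least as up-to-date as the voter's.
        if term > self.current_term:
            self.current_term = term
            return True
        return False

cluster = [Node(i) for i in range(5)]
# The node whose randomized timer fires first becomes the candidate.
candidate = min(cluster, key=lambda n: n.timeout_ms)
print(f"node {candidate.node_id} elected:", candidate.start_election(cluster))
```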

Raft is what etcd (and therefore Kubernetes) uses. If you're working on infrastructure or platform engineering, knowing Raft at implementation depth is legitimate differentiation.

For interviews, here's the framing that lands: "Raft and Paxos solve the same problem but make different engineering trade-offs. Paxos is more flexible but historically harder to implement without bugs. Raft makes stronger structural choices — single leader, sequential log — to make the correctness argument easier to reason about. For most production systems I'd choose a library like etcd over implementing either from scratch."

Consistency Models Are a Spectrum — Stop Thinking Binary

Interviewers love asking about "strong consistency" as if it's a single thing. It isn't. Consistency is a spectrum, and knowing the vocabulary makes you look like you've read Designing Data-Intensive Applications (you should have).

From strongest to weakest:

  1. Linearizability (strong consistency): Every read sees the most recent write, globally. Operations appear instantaneous and in real-time order. Google Spanner achieves this. Expensive.
  2. Sequential consistency: All nodes see operations in the same order, but that order doesn't have to match real-time. Weaker than linearizability but easier to implement.
  3. Causal consistency: Operations that are causally related appear in order. Independent operations can appear in any order. Used in systems like COPS and some MongoDB configurations.
  4. Eventual consistency: Given enough time and no new writes, all replicas will converge to the same value. Cassandra, DynamoDB defaults. The weakest meaningful guarantee.
  5. Read-your-writes consistency: A session guarantee rather than a point on the strongest-to-weakest spectrum: a specific client always sees its own writes, even if other clients temporarily see stale data. A practical middle ground for user-facing applications.

In a system design interview for a social media feed, you'd say: "We don't need linearizability for the feed. If a user posts and their friend's feed is 200ms stale, that's fine. We need read-your-writes consistency for the poster themselves — they should see their own post immediately. Eventual consistency with a session guarantee covers both requirements without the cost of strong consistency."
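
One minimal way to implement that session guarantee is to track the replication offset of the user's last write and route their reads accordingly. The class and method names below are assumptions for illustration, not any specific product API:

```python
# Read-your-writes via a session token (illustrative sketch only).
# The session remembers the offset of the user's last write; their reads
# are routed only to replicas that have applied at least that offset.

class Replica:
    def __init__(self):
        self.applied_offset = 0
        self.posts = []

class Leader(Replica):
    def append(self, post):
        self.posts.append(post)
        self.applied_offset += 1
        return self.applied_offset   # monotonically increasing offset

class Session:
    def __init__(self):
        self.min_offset = 0          # highest write offset this user has seen

def write_post(leader, session, post):
    # Record the write's offset in the session token (e.g. a cookie).
    session.min_offset = leader.append(post)

def read_own_feed(replicas, leader, session):
    # Any replica that has caught up to the session offset is safe to read;
    # other users' reads can still go to slightly stale replicas.
    for replica in replicas:
        if replica.applied_offset >= session.min_offset:
            return replica.posts
    return leader.posts   # fall back to the leader until replication catches up

leader, replicas, session = Leader(), [Replica(), Replica()], Session()
write_post(leader, session, "just shipped the feature!")
print(read_own_feed(replicas, leader, session))   # sees the new post via the leader
```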

CRDTs and Vector Clocks: The Advanced Material That Gets You Hired at Staff Level

If you're targeting Staff, Principal, or Architect roles in 2026, you need to go one level deeper.

Vector clocks track causality in distributed systems. Each node maintains a vector of counters, one per node. When a node processes an event, it increments its own counter. When it sends a message, it includes its current vector. The receiver takes the element-wise maximum. This lets you determine whether two events are causally related or concurrent — concurrent events are the source of conflicts.
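
A vector clock fits in a few lines. This sketch (node IDs and function names are illustrative) shows the increment rule, the element-wise merge, and the happens-before versus concurrent comparison:

```python
# Minimal vector clock sketch (node IDs and helper names are illustrative).

def increment(clock, node):
    # Local event: the node bumps its own counter.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def merge(local, received):
    # On message receipt: element-wise maximum of the two vectors.
    nodes = set(local) | set(received)
    return {n: max(local.get(n, 0), received.get(n, 0)) for n in nodes}

def happened_before(a, b):
    # a -> b iff no counter in a exceeds b's, and at least one is strictly smaller.
    nodes = set(a) | set(b)
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))

def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)

a = increment({}, "node-1")              # {'node-1': 1}
b = increment(merge({}, a), "node-2")    # node-2 saw a's update first
c = increment({}, "node-3")              # independent update on node-3

print(happened_before(a, b))   # True  -- a causally precedes b
print(concurrent(b, c))        # True  -- conflicting, needs resolution
```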

CRDTs (Conflict-free Replicated Data Types) are data structures mathematically designed so that concurrent updates always merge without conflicts. Examples:

  • G-Counter: A grow-only counter. Each node has its own counter; the total is the sum. No merge conflict possible (sketched after this list).
  • OR-Set: An add/remove set where adds always win over concurrent removes (using unique tags).
  • LWW-Register: Last-write-wins register using timestamps.
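
The G-Counter is the simplest CRDT to sketch, and it makes the merge requirements tangible (names below are illustrative):

```python
# G-Counter sketch: each node only increments its own slot, and merge is an
# element-wise max, so replicas converge no matter the merge order.

class GCounter:
    def __init__(self):
        self.counts = {}

    def increment(self, node_id, amount=1):
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Commutative, associative, idempotent -- the CRDT merge requirements.
        for node_id, count in other.counts.items():
            self.counts[node_id] = max(self.counts.get(node_id, 0), count)

# Two replicas accept increments concurrently during a partition...
a, b = GCounter(), GCounter()
a.increment("node-a", 3)
b.increment("node-b", 5)

# ...and converge to the same total regardless of merge order.
a.merge(b)
b.merge(a)
print(a.value(), b.value())   # 8 8
```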

Collaborative products like Figma, Notion, and Apple's Notes app build on CRDTs or CRDT-inspired merge logic for collaborative editing. If you mention CRDTs in the context of a collaborative document or shopping cart design, you signal that you know what production systems at scale actually look like.

Failure Modes Are What Interviewers Actually Want to Probe

Every distributed systems concept exists because of failure. When you explain any of the above, connect it to a concrete failure mode:

  • Split-brain: Two nodes in a primary-replica setup both think they're the leader. Causes: network partition, slow heartbeat. Solution: fencing tokens, STONITH (Shoot The Other Node In The Head), or Raft's term-based leadership.
  • Cascading failures: One slow node causes timeouts that back up queues that overwhelm the next node. Solution: circuit breakers, bulkheads, backpressure.
  • Thundering herd: A cache expires and thousands of requests simultaneously hit the database. Solution: cache stampede protection (probabilistic early expiration or mutex-based regeneration).
  • Clock skew: Two nodes disagree on the current time, causing incorrect LWW decisions. Solution: logical clocks (Lamport timestamps) or bounded uncertainty (TrueTime).
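
For the clock-skew case, Lamport timestamps are short enough to sketch from memory in an interview (illustrative names; this is the logical-clock idea, not TrueTime):

```python
# Lamport clock sketch (illustrative only): a logical counter that never goes
# backwards, so ordering decisions don't depend on skewed wall clocks.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self):
        # Stamp outgoing messages with the current logical time.
        return self.tick()

    def receive(self, message_time):
        # Take the max of local and received time, then advance.
        self.time = max(self.time, message_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t1 = a.send()        # a: 1
t2 = b.receive(t1)   # b: 2 -- ordered after the send,
print(t1 < t2)       # True, even if b's wall clock is hours behind a's
```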

Talking about failure modes — not just happy-path architectures — is the single biggest differentiator between Senior and Staff-level interview performance.

Next Steps

Here's what to do in the next seven days to turn this guide into interview-ready knowledge:

  1. Read chapters 5, 7, 8, and 9 of Designing Data-Intensive Applications by Martin Kleppmann. This is the single best technical resource on distributed systems for practitioners. Those four chapters cover replication, transactions, the trouble with distributed systems, and consistency/consensus directly. Budget 6-8 hours.
  2. Run through the Raft visualization at raft.github.io. Spend 20 minutes interacting with the animated simulator. Kill nodes. Watch elections happen. See how log replication works under failure. You'll understand Raft 10x better than from reading alone.
  3. Pick one production system you've worked with (DynamoDB, Cassandra, PostgreSQL, Kafka) and write a 500-word document explaining its consistency model, replication strategy, and failure behavior. Articulating this in writing forces you to find the gaps in your knowledge before an interviewer does.
  4. Practice the CAP/PACELC framing out loud for two system design scenarios — one that clearly needs CP (distributed lock, payment processing) and one that clearly needs AP (social feed, DNS, shopping cart). Time yourself to two minutes per answer. If you can't do it cleanly in two minutes, you need more reps.
  5. Review one real-world distributed systems paper or post-mortem. The AWS DynamoDB 2022 paper, the Kafka replication deep-dive on the Confluent blog, or any public incident report from Cloudflare or GitHub gives you real-world language that makes your answers sound like experience rather than study.