
Consistent Hashing Interview Guide: Virtual Nodes, Ring Layout, and Rebalancing

9 min read · April 25, 2026

Consistent hashing is the single most-asked data-partitioning technique in system design loops. Here's how to draw the ring, name virtual nodes correctly, and answer the rebalancing follow-up.


Consistent hashing shows up in every system design loop at the staff-plus level. Cassandra, DynamoDB, Riak, memcached clients (Ketama), Akamai's CDN routing, and Discord's message ingest all use variants of it. If you're being asked to shard anything, you will be asked about consistent hashing. Most candidates can draw the ring; far fewer can explain why virtual nodes matter, how rebalancing actually works, or when a naive hash is better.

This guide is the version of the consistent hashing conversation I wish every candidate walked into. The goal is to be able to draw the ring, name the tradeoffs, and answer the follow-ups on hot shards and failure domains.

The problem it solves

Imagine you have 4 cache servers and you shard keys by hash(key) mod N. You add a fifth server. Suddenly hash(key) mod 5 != hash(key) mod 4 for ~80% of keys, and your cache hit rate craters while everything reloads from the origin.
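That ~80% figure is easy to verify yourself. A quick count of how many keys change servers when a fifth server joins a mod-N scheme (using the key itself as the "hash" — any uniform hash gives the same ratio):

```python
# How many keys move when a 5th server joins under hash(key) mod N?
old_n, new_n = 4, 5
keys = range(100_000)
moved = sum(1 for k in keys if k % old_n != k % new_n)
print(f"{moved / len(keys):.0%} of keys change servers")  # -> 80%
```

A key keeps its server only when `k mod 4 == k mod 5`, which happens for just 4 residues out of every 20 — so exactly 80% of keys remap.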

Karger et al. (1997, "Consistent Hashing and Random Trees") proposed mapping both servers and keys onto a circular hash space (typically a 2^32 or 2^64 ring) and assigning each key to the next server clockwise. Adding or removing a server only moves keys between that server and its clockwise neighbor — on average K/N keys, not K * (N-1)/N.

The 1997 paper was about web caching; the 2007 Amazon Dynamo paper made it the default for distributed databases. Every modern cloud-native database either uses it directly (Cassandra, Riak) or uses a closely related range-based variant (DynamoDB, CockroachDB).

Drawing the ring — what interviewers want

                  hash space 0 .. 2^32-1
                           0
           key K1 --->  o  |  o  <--- Node A (hash 0x10000000)
                        \  |  /
                         \ | /
    Node D (0xC000...) o---+---o Node B (0x40000000)
                         / | \
                        /  |  \
           key K2 --->  o  |  o  <--- Node C (hash 0x80000000)
                           |

The rules you must state out loud:

  • Both keys and nodes are hashed into the same space, conventionally 0..2^32 or 0..2^64. Common choices are Murmur3 (Cassandra) and MD5 (Ketama); the specific hash doesn't matter as long as it's uniform and fast.
  • A key is owned by the first node encountered going clockwise from the key's hash.
  • Adding a node only affects the arc between the new node and its counter-clockwise predecessor — roughly K/N keys move from the clockwise successor to the new node.
  • Removing a node transfers its range to the next clockwise node.

Drawing this once and walking through the add/remove operations takes 90 seconds and is table stakes for any sharding question.
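Those rules can be exercised on a toy ring in a few lines. This sketch uses an 8-bit hash space with made-up positions and node names, purely to show that adding a node moves exactly one arc:

```python
import bisect

# Toy ring on an 8-bit hash space; positions and node names are illustrative.
ring = {0x10: "A", 0x40: "B", 0x80: "C", 0xC0: "D"}  # position -> node

def owner(ring, h):
    points = sorted(ring)
    # Rule: a key belongs to the first node clockwise from its hash,
    # wrapping past the top of the ring back to the lowest position.
    idx = bisect.bisect_left(points, h) % len(points)
    return ring[points[idx]]

assert owner(ring, 0x25) == "B"   # next node clockwise from 0x25 is 0x40
assert owner(ring, 0xD0) == "A"   # wraps around the top of the ring

ring[0x30] = "E"                  # add node E at position 0x30
assert owner(ring, 0x25) == "E"   # only the (0x10, 0x30] arc changed hands
assert owner(ring, 0x90) == "D"   # every other key keeps its owner
```

The two asserts after the insert are the whole argument: one arc moves to E, everything else is untouched.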

Why virtual nodes exist

The naive ring has two problems interviewers will probe:

  1. Imbalance. With 10 physical nodes hashed onto a ring, the distribution of arc lengths is not uniform — some nodes own 15% of the ring, others 5%. Your load is skewed before a single request arrives.
  2. Cascading failure on node loss. When a node dies, its entire range moves to one neighbor, which then gets double load — often triggering a cascade.

Virtual nodes (vnodes) solve both. Each physical node gets mapped to the ring at K distinct positions (typically 100 to 256). Now each physical node owns a few hundred small arcs scattered around the ring.

With vnodes:

  • Load distribution approaches uniform as K grows. K=128 typically keeps each node's share within a few percent of the mean.
  • When a node fails, its ~128 small arcs get redistributed to ~128 different neighbors, not one. No cascade.
  • Adding heterogeneous hardware is trivial: a 2x-larger node just gets 2x vnodes.

Cassandra uses 256 vnodes per physical node by default. Riak uses a fixed ring of 64/128/256 partitions, each of which is assigned to a physical node — effectively virtual nodes with a fixed count. DynamoDB abstracts this entirely; users don't see partitions until they exceed the per-partition throughput limit.
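The balancing effect is easy to demonstrate with a small simulation. This sketch (MD5 is an illustrative hash choice; node names are made up) measures the busiest node's share of the ring relative to its fair share, with and without vnodes:

```python
import hashlib

SPACE = 2**64

def h64(s):
    # Stable 64-bit ring positions derived from MD5 (illustrative choice)
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def worst_share(n_nodes, vnodes):
    # Place n_nodes * vnodes points on the ring, credit each arc to the
    # node whose point closes it (clockwise ownership), and report the
    # busiest node's load relative to the fair share 1/n_nodes.
    points = sorted((h64(f"node{n}#{v}"), n)
                    for n in range(n_nodes) for v in range(vnodes))
    owned = [0] * n_nodes
    prev = points[-1][0] - SPACE  # the wrap-around arc
    for pos, node in points:
        owned[node] += pos - prev
        prev = pos
    return max(owned) / SPACE * n_nodes

print(worst_share(10, 1))    # no vnodes: busiest node far above fair share
print(worst_share(10, 128))  # 128 vnodes: close to 1.0
```

With one point per node, the busiest node typically owns two to three times its fair share; at K=128 the worst case sits close to 1.0.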

If you draw the ring in an interview without mentioning virtual nodes, you will be corrected.

Pseudocode

A minimal implementation that tends to land well on the whiteboard:

from sortedcontainers import SortedDict  # pip install sortedcontainers
import hashlib

def stable_hash(s):
    # Python's built-in hash() is salted per process; use a stable digest
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, vnodes_per_node=128):
        self.vnodes = vnodes_per_node
        self.ring = SortedDict()  # hash -> physical_node_id

    def add_node(self, node_id):
        for i in range(self.vnodes):
            self.ring[stable_hash(f"{node_id}#{i}")] = node_id

    def remove_node(self, node_id):
        for i in range(self.vnodes):
            del self.ring[stable_hash(f"{node_id}#{i}")]

    def get_node(self, key):
        if not self.ring:
            return None
        h = stable_hash(key)
        # Find the smallest ring hash >= h, wrapping around to the start
        idx = self.ring.bisect_left(h) % len(self.ring)
        return self.ring.peekitem(idx)[1]

Real implementations use a sorted structure (tree or a sorted array with binary search) so lookup is O(log N). With N=1000 physical nodes and 128 vnodes each, that's 128,000 entries and log2(128000) ≈ 17 comparisons per lookup — trivially fast.

Rebalancing — the hard part

The follow-up every interviewer asks: what happens during rebalancing?

Three modes:

  • Planned add. Compute the new node's vnode positions, determine the arcs moving, stream data from current owners to the new node, then flip ownership atomically. Cassandra calls this bootstrap; it's intentionally slow to avoid thrashing.
  • Planned remove. Stream data from the leaving node to its clockwise successors, then update the ring membership. Cassandra calls this decommission.
  • Unplanned failure. The node's data is still on the replicas (assuming replication factor > 1). Read requests are served from replicas until either the node recovers (cheap) or a replacement node boots and rebuilds (expensive).

The bytes-moved math is the question you need to nail. For N nodes and a dataset D:

  • Naive mod N: adding one node remaps ≈ D * N/(N+1) bytes — nearly the entire dataset.
  • Consistent hashing without vnodes: adding one node moves ≈ D/(N+1) bytes, all streamed from the single clockwise neighbor.
  • Consistent hashing with K vnodes: adding one node still moves ≈ D/(N+1) bytes total, but spread across ~K different source nodes. Much lower per-source load.

The last bullet is why Cassandra can add a node to a 100-node cluster without the operator noticing.
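Plugging illustrative numbers into that math (a 100-node cluster holding 100 TB; the figures are hypothetical) makes the gap concrete:

```python
# Bytes moved when adding one node to an N-node cluster holding D_tb terabytes
D_tb, N = 100, 100  # illustrative numbers

naive = D_tb * N / (N + 1)   # hash mod N: nearly every key is remapped
one_arc = D_tb / (N + 1)     # consistent hashing: one node's fair share
print(f"mod-N moves ~{naive:.0f} TB; consistent hashing moves ~{one_arc:.1f} TB")
```

Roughly 99 TB of resharding traffic versus 1 TB — and with vnodes, that 1 TB arrives in small slices from ~K sources instead of hammering one neighbor.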

Replication on the ring

Production systems don't store one copy per key. They store R copies (replication factor). On a hashed ring, the convention is:

  • The primary is the first node clockwise from the key.
  • Replicas are the next R-1 nodes clockwise.

With vnodes, "next R-1 nodes" must be distinct physical nodes, not vnodes of the same physical node. Cassandra's rack-aware and datacenter-aware replication strategies enforce this: the replicas must span different racks/AZs/regions per the configured snitch.

This is where the follow-up "what's your failure domain" lives. If you put all three replicas on nodes in the same rack, you lose a rack and lose data. Cassandra's NetworkTopologyStrategy is the standard answer; name it.
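The clockwise walk with rack awareness can be sketched in a few lines. This is a simplified illustration with a made-up ring and rack map — real strategies such as Cassandra's NetworkTopologyStrategy handle datacenters and per-DC replication counts on top of this:

```python
import bisect

def preference_list(ring, key_hash, r, rack_of):
    # Walk clockwise from key_hash collecting r replicas, skipping vnodes
    # of already-chosen physical nodes and already-used racks.
    points = sorted(ring)
    start = bisect.bisect_left(points, key_hash)
    replicas, racks = [], set()
    for i in range(len(points)):
        node = ring[points[(start + i) % len(points)]]
        if node in replicas or rack_of[node] in racks:
            continue
        replicas.append(node)
        racks.add(rack_of[node])
        if len(replicas) == r:
            break
    return replicas

# Hypothetical vnode ring: A and B share rack r1, C sits alone in r2
ring = {5: "A", 20: "B", 35: "A", 50: "C", 70: "B", 90: "C"}
rack_of = {"A": "r1", "B": "r1", "C": "r2"}
print(preference_list(ring, 0, 2, rack_of))  # -> ['A', 'C']; B skipped, same rack as A
```

Note that B, despite being next on the ring, is skipped in favor of C — rack diversity beats ring adjacency.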

When NOT to use consistent hashing

Senior candidates refuse to add complexity that isn't needed. The bad cases:

  • Fixed, small node count. If you have 4 nodes that will never change, hash mod N is fine. Consistent hashing is for elasticity.
  • Range queries. Consistent hashing shatters range locality — the keys user:1000 and user:1001 hash to unrelated ring positions. If your workload is range-scan-heavy, use range partitioning (HBase, CockroachDB, BigTable). You can still get elasticity via range splits.
  • Strong cross-key transactions. Consistent hashing spreads related data across nodes. Multi-key transactions become cross-shard 2PC. If your workload needs tight transactions, consider colocating by a partition key (Vitess, Citus).
  • Small data on a single machine. If it fits on one box, you don't need a ring.
  • Systems where operators expect predictable placement. Range partitioning is operationally easier to reason about; "user X's data is on shard 3" is comprehensible. "User X's data is on the node that won the consistent hash race" is not.

Real-world implementations

Name-drop these and you sound like you've shipped:

  • Amazon Dynamo (2007 paper). The canonical reference. Introduced virtual nodes at scale, combined with vector clocks and sloppy quorums. DynamoDB descended from this paper but hides the ring from users.
  • Cassandra. 256 vnodes per node by default (configurable), token-based ring, Murmur3 hash. Jepsen-tested and battle-hardened.
  • Riak. 64/128/256 fixed-partition ring, each partition assigned to a physical node. Simpler conceptually than Cassandra's unbounded vnodes.
  • Memcached with Ketama. Client-side consistent hashing library (Last.fm, 2007). Every major memcached client (libmemcached, twemproxy, python-memcached) supports Ketama-compatible hashing for cache cluster membership.
  • Akamai. Uses consistent hashing to route content requests to caches; mentioned in the 1997 Karger paper as the original production use case.
  • Discord. Wrote about their move to consistent hashing for session routing on their engineering blog — worth reading before a chat/messaging interview.
  • Envoy and Nginx. Support consistent hashing for upstream selection (ring_hash and maglev load balancers). Maglev (Google, 2016) is a consistent-hash variant with better worst-case lookup and resilience.

Common candidate mistakes

  • Forgetting virtual nodes. Always mention them. Naive rings are pedagogical only.
  • Claiming consistent hashing is "the" answer to sharding. It's one answer. Range partitioning, hash-range hybrid (CockroachDB), and explicit directory-based partitioning (Vitess) are all valid and often better.
  • Ignoring failure domain placement. Vnodes help distribute load; they do not automatically put replicas in different racks. That's a separate configuration.
  • Not naming the hash function. MurmurHash3 for Cassandra, MD5 for Ketama. "A good hash" is vague; candidates who name the function are signaling real experience.
  • Forgetting the hot-key problem. Even with a perfectly balanced ring, a single celebrity key saturates one replica set. Consistent hashing does not fix hot keys. Name a secondary mitigation (key splitting, client-side caching).
  • Claiming O(1) lookup. Lookup is O(log V) where V is the total vnode count. Usually fine, but say it correctly.

Advanced follow-ups

  • "What about Rendezvous hashing (HRW)?" Answer: alternative to ring-based. For each key, compute hash(key, node) for every node; pick the highest. No ring, no vnodes needed for balance. Used in some load balancers and CDNs. Downside: O(N) per lookup instead of O(log N).
  • "What about Jump hash?" Answer: Google's 2014 paper. O(1) memory with no ring state, and roughly O(log N) expected lookup. The catch: buckets must be numbered 0..N-1 and only the highest-numbered bucket can be added or removed, so removing an arbitrary node forces a renumbering. Great for systems where the node set is relatively stable and grows at the end.
  • "How do you handle hot keys on the ring?" Answer: key salting (key#0, key#1, ..., key#N) and read all shards, or client-side caching to reduce load on the hot replica.
  • "How do you bound replica copies to specific AZs?" Answer: Cassandra's NetworkTopologyStrategy or equivalent — walk clockwise but skip nodes until you get the required AZ diversity.
  • "What about Maglev hashing?" Answer: Google 2016 paper, used in Google Cloud Load Balancer and Envoy. Builds a lookup table of prime size via a permutation; gives minimal disruption on node changes and O(1) lookup.
  • "What's the rebuild cost when a node dies and you replace it?" Answer: streaming the lost replica's data from surviving replicas. With replication factor 3 and 1TB per node, that's 1TB of cross-AZ traffic. Your cost model must account for it.
  • "How does DynamoDB hide the ring?" Answer: internally it's hash-partitioned; users see logical partitions bounded by per-partition throughput limits (adaptive capacity since 2018). The ring is a service internal, not user-visible.
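For the rendezvous-hashing follow-up above, the entire algorithm fits in one function. A sketch (MD5 scoring and node names are illustrative choices):

```python
import hashlib

def hrw_owner(nodes, key):
    # Rendezvous (highest-random-weight) hashing: score every node against
    # the key and take the max. No ring, no vnodes; removing a node only
    # remaps the keys it owned. The cost is O(N) hashes per lookup.
    return max(nodes, key=lambda n: hashlib.md5(f"{n}:{key}".encode()).digest())

nodes = ["cache-a", "cache-b", "cache-c"]
owner = hrw_owner(nodes, "user:42")
# Removing a non-owner never changes the winner for this key
survivors = [n for n in nodes if n != owner][:1] + [owner]
assert hrw_owner(survivors, "user:42") == owner
```

The final assert is the minimal-disruption property in miniature: the owner had the highest score among all nodes, so it still wins in any subset that contains it.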

The candidates who ace consistent hashing in an interview are not the ones who memorize the Karger paper. They are the ones who can draw the ring, name the hash function, add vnodes before being asked, walk through the rebalancing math, and refuse to use consistent hashing when range partitioning is the correct answer.

If you can draw the ring, explain why vnodes exist, compute the bytes moved on add/remove, and name two real production systems using the technique, you have covered 90% of what this question actually tests. Consistent hashing is the default sharding algorithm of the modern cloud era and a reliable source of interview points if you articulate it with precision.