Database Sharding Interview Guide: Strategies, Hot Keys, and Resharding
Sharding is the system design topic candidates most confidently get wrong. Here is how to pick a shard key, survive hot partitions, and answer the resharding question that trips up staff candidates.
Sharding is the section of the system design interview where candidates sound most confident and most wrong. They say "we'll shard by user ID" as if that settles the question, when in reality they've picked one of five strategies and ignored the three hard problems: hot keys, cross-shard joins, and online resharding.
This guide is the sharding conversation as it actually plays out at Google, Meta, Uber, Stripe, and Discord — all of which have public postmortems or engineering blogs on their sharding decisions. The interviewer wants evidence you've thought about the failure modes, not a confident hand-wave.
What interviewers actually want to hear
The first signal is whether you can name the strategies and pick one on purpose:
- Range-based sharding. Rows split by a contiguous range of the shard key: `users` with `id` 0-1M on shard A, 1M-2M on shard B. Simple, preserves range scans, but notorious for hot tails — new auto-increment IDs all land on the latest shard (the "hot last shard" problem that killed many early MongoDB deployments).
- Hash-based sharding. Apply a hash function to the shard key and modulo by the number of shards. Even distribution, no range scans, catastrophic to reshard naively (every row moves when you change N).
- Consistent hashing. Hash keys onto a ring; each shard owns an arc. Adding a shard only moves 1/N of keys. Widely used — DynamoDB, Cassandra, Riak, Discord's message store. Understand virtual nodes (vnodes) and why they solve the load-imbalance problem of naive consistent hashing (a sketch follows this list).
- Directory-based sharding. A lookup service maps keys to shards. Flexible (you can move individual tenants without rehashing), expensive (lookup is an extra hop, lookup service becomes a SPOF). Used by Vitess, Figma's shard-per-team architecture, and most multi-tenant SaaS at scale.
- Geo-based sharding. Rows live in the region closest to the user. Stripe, Airbnb, and many others run this for latency and regulatory reasons. Watch for users who travel or tenants that span regions.
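
To make the consistent-hashing bullet concrete, here is a minimal Python sketch of a ring with virtual nodes. The shard names, vnode count, and MD5 choice are illustrative assumptions, not any particular system's implementation; real systems add replication and tuned placement.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # A stable hash; Python's built-in hash() is randomized per process.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Toy ring with virtual nodes: each shard appears VNODES times, so load
    evens out and a new shard claims small arcs all over the ring."""
    VNODES = 128  # illustrative; real systems tune this

    def __init__(self, shards):
        self._ring = []  # sorted list of (point, shard)
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self.VNODES):
            bisect.insort(self._ring, (_hash(f"{shard}#vnode{i}"), shard))

    def lookup(self, key: str) -> str:
        # Walk clockwise from the key's point to the next vnode.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

keys = [f"user:{n}" for n in range(10_000)]
ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
before = {k: ring.lookup(k) for k in keys}
ring.add_shard("shard-e")
moved = sum(1 for k in keys if ring.lookup(k) != before[k])
print(f"{moved / len(keys):.1%} moved")  # ~20%; hash-mod-N would move ~80%
```

The final print is the resharding argument in one number: growing from four shards to five moves roughly a fifth of keys here, where naive hash mod N would relocate about four fifths of them.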
You should also name the axis of sharding. Horizontal sharding splits rows across nodes. Vertical sharding splits columns (user auth on one DB, user preferences on another). Real systems do both — called composite sharding or multi-dimensional sharding. An interviewer who hears you name both levels is checking one box.
Then name the shard key criteria: high cardinality, even distribution, request locality (queries should hit one shard), and immutability (changing shard keys means row migration). A bad shard key is usually the hidden root cause when someone says "our database doesn't scale."
The tradeoffs you need to name
Interviewers are listening for these specific tradeoffs:
- Consistency vs. availability. Sharded systems are still subject to CAP. If you shard a relational DB and need cross-shard transactions, you're now doing distributed transactions with two-phase commit (Spanner, CockroachDB) or accepting eventual consistency. Name 2PC's downsides: blocking participants, the coordinator as a single point of failure, and added latency.
- Read amplification on fan-out. A query that doesn't include the shard key becomes a scatter-gather across all shards. p99 latency becomes the max of N shards. Scatter-gather is fine at small N and terrible at large N (the simulation after this list shows why). If a query must fan out, denormalize or precompute.
- Cross-shard joins. Don't. If you need them, you've picked the wrong shard key or you need a secondary materialized view. Most sharded systems either denormalize aggressively or pipe data into a separate analytics store (Snowflake, BigQuery, ClickHouse) for cross-shard queries.
- Referential integrity. Foreign keys across shards don't work. You enforce integrity at the application layer or you colocate related rows on the same shard (e.g., shard `orders` by the same `user_id` as `users`).
- Resharding cost. This is the question interviewers love. Naive hash mod N requires moving every row if N changes. Consistent hashing moves 1/N. Directory-based lets you move tenants individually. Plan for resharding in your original design or you'll be rewriting the data layer in two years.
- Hot keys and skew. A celebrity user, a large tenant, or a popular product will saturate one shard. Name this before the interviewer does.
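
A quick simulation makes the fan-out point concrete. The latency values and stall probability below are invented for illustration; the shape of the result is what matters, because scatter-gather latency is the max over all shards queried.

```python
import random

def fanout_p99(num_shards: int, trials: int = 100_000) -> float:
    """Scatter-gather latency is the MAX of per-shard latencies. Assume each
    shard answers in 10ms, with a 0.5% chance of a 100ms stall (GC pause,
    compaction, noisy neighbor); the numbers are invented for illustration."""
    samples = []
    for _ in range(trials):
        per_shard = (100.0 if random.random() < 0.005 else 10.0
                     for _ in range(num_shards))
        samples.append(max(per_shard))
    samples.sort()
    return samples[int(trials * 0.99)]

for n in (1, 4, 16, 64):
    print(f"{n:>3} shards -> p99 ~ {fanout_p99(n):.0f}ms")
# 1 shard: p99 stays near 10ms. By 4 shards the chance that at least one
# shard stalls already exceeds 1%, so the p99 is pinned at 100ms.
```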
Cite the CAP theorem with discipline — the common "CP vs AP" framing is a cartoon, not a proof. What you actually want to say is "under a network partition, this system prefers X," and name the bounded staleness window.
Hot keys: the question that breaks candidates
Hot keys (hot partitions, hot shards) are the single most common real-world problem with sharded systems and the question that reliably catches mid-level candidates.
The problem: one shard key value gets disproportionate traffic. A celebrity tweet, a popular product on Black Friday, a tenant that is 1000x larger than average. One shard sees 80% of traffic, the others idle, the system falls over.
Mitigations you should name:
- Salted keys. Append a suffix to hot keys (`product:iphone17:{0..9}`) and distribute writes across ten virtual keys. Reads fan out and aggregate. Standard pattern for Cassandra and DynamoDB write-heavy hot keys (see the sketch after this list).
- Per-tenant dedicated shards. Large tenants get their own shard (Figma does this explicitly — "logical shards" where a giant org can be promoted to a single physical shard).
- Read replicas for read-hot keys. If the hot key is read-hot, not write-hot, throw replicas at it. Easier than write-side sharding.
- Caching layer. A Redis or in-process L1 cache absorbs reads on hot keys. This is the first line of defense before you touch shard design.
- Adaptive capacity. DynamoDB now automatically splits a partition when it detects a hot key. Other systems (Bigtable, Spanner) do automatic splits on range-sharded tables. Know which systems do this and which require manual intervention.
- Write sharding with a time suffix. For time-series hot writes (metrics, logs), shard by `(metric, time_bucket)` so recent writes spread across shards but historical queries can still range-scan.
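
Here is a minimal sketch of the salted-key pattern, assuming a Redis-style counter workload; the key names and bucket count are illustrative.

```python
import random

SALT_BUCKETS = 10  # how many virtual keys one hot key is split across

def salted_write_key(hot_key: str) -> str:
    """Writes pick a random salt bucket, spreading one hot key's write
    load across up to SALT_BUCKETS shards."""
    return f"{hot_key}:{random.randrange(SALT_BUCKETS)}"

def salted_read_keys(hot_key: str) -> list[str]:
    """Reads must fan out to every salt bucket and aggregate the results."""
    return [f"{hot_key}:{i}" for i in range(SALT_BUCKETS)]

# Write path: INCR(salted_write_key("product:iphone17"))
# Read path:  sum(GET(k) for k in salted_read_keys("product:iphone17"))
```

Note the tradeoff it encodes: you buy write throughput on the hot key by paying a 10x fan-out on every read of it.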
When the interviewer asks "what if a tenant is 100x the average size," the answer is not "we'll split them." The answer is "we'll move that tenant to a dedicated shard via directory lookup, write a one-time migration job with dual-write and backfill, and verify with checksum compare." Walk through the migration steps. That's the staff signal.
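
Here is that migration as a sketch. The `directory`, `source`, and `target` clients are hypothetical stand-ins for your routing layer and shard connections, and each phase is days of real work, not one call.

```python
def migrate_tenant(tenant_id, source, target, directory):
    """Phases of moving one hot tenant to a dedicated shard (sketch only;
    all client objects here are hypothetical)."""
    # Phase 1: dual-write. New writes land on both shards; the old shard
    # remains the source of truth for reads.
    directory.enable_dual_write(tenant_id, source, target)

    # Phase 2: backfill. Copy historical rows in batches, oldest first,
    # so dual-written recent rows are never clobbered with stale data.
    for batch in source.scan(tenant_id, batch_size=1000):
        target.upsert(batch)

    # Phase 3: verify. Compare row counts and per-chunk checksums; any
    # mismatch means re-backfilling that chunk before proceeding.
    assert source.checksum(tenant_id) == target.checksum(tenant_id)

    # Phase 4: cut over reads, then writes, then drop the dual-write.
    directory.route_reads(tenant_id, target)
    directory.route_writes(tenant_id, target)
    directory.disable_dual_write(tenant_id)

    # Phase 5: after a soak period, delete the tenant's rows from source.
```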
When you should NOT shard
A staff candidate refuses to shard when it's premature. The bad cases:
- When your single-node database isn't saturated. Modern Postgres on a big instance (r7g.16xlarge, 512 GB RAM, NVMe) handles tens of thousands of writes per second and tens of terabytes of data. If you're not near that ceiling, don't shard.
- When read replicas solve your problem. Read scaling is trivial. Write scaling is hard. Make sure you actually have write-scaling problems before sharding.
- When you can vertically partition instead. Splitting `user_auth` and `user_profile` into separate DBs is cheaper than horizontal sharding and keeps each DB simple.
- When operational maturity is low. Sharding adds a monster of operational complexity. If your team doesn't have on-call muscle for distributed systems, a managed solution (Spanner, Cockroach, Aurora Limitless, Yugabyte) absorbs the complexity at the cost of dollars.
- When the workload is analytical. OLAP workloads belong in columnar stores (ClickHouse, BigQuery, Snowflake) that handle sharding automatically. Don't manually shard a row store for analytics queries.
Real-world example: Discord's trillion-message migration
Discord's 2023 migration from Cassandra to ScyllaDB is required reading for interviews.
Discord stored trillions of messages, originally sharded in Cassandra by (channel_id, bucket) where bucket was a time slice. This worked for years but hit operational limits — GC pauses, compaction storms, hot partitions from giant channels. They migrated to ScyllaDB, keeping the same data model but gaining C++ performance and a shard-per-core architecture.
Key lessons interviewers test on:
- The shard key choice `(channel_id, bucket)` was correct in principle but produced hot partitions when a channel was massive. They had to add careful bucket sizing and rate limiting at the application layer (a bucket sketch follows this list).
- Live migration with dual-write, backfill, and verification — a textbook pattern you should be able to describe.
- Why they didn't reshard in place: the data model was fine, the engine was the problem.
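
Discord's public write-ups describe deriving the bucket from the millisecond timestamp embedded in each message's snowflake ID; the sketch below follows that description, with the ~10-day bucket width taken from their blog but best treated as illustrative.

```python
BUCKET_MS = 10 * 24 * 60 * 60 * 1000  # ~10-day windows, per their blog

def make_bucket(message_id: int) -> int:
    """Discord message IDs are snowflakes: the high bits are a millisecond
    timestamp relative to their 2015 epoch. Quantizing it into ~10-day
    windows yields the bucket half of the partition key."""
    ms_since_epoch = message_id >> 22  # strip the worker/sequence low bits
    return ms_since_epoch // BUCKET_MS

def partition_key(channel_id: int, message_id: int) -> tuple[int, int]:
    """One channel's messages in one time window share a partition: recent
    reads hit a single partition, and no partition grows without bound."""
    return (channel_id, make_bucket(message_id))
```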
Other canonical examples:
- Figma's database sharding. Their 2023 blog post explains horizontal sharding of Postgres with a custom routing layer. Shows how to get from "one Postgres" to "sharded Postgres" without a total rewrite.
- Instagram's ID generation. Twitter Snowflake-style IDs embedding shard info and timestamp. Shows how the shard can be inferred from the primary key without a lookup (sketched after this list).
- Uber's Schemaless and Docstore. Layered sharding on top of MySQL. The Docstore blog is an excellent treatment of sharding as an application-layer abstraction.
- DynamoDB partition keys. Know how composite keys (`partition_key`, `sort_key`) work and why hot partitions are DynamoDB's famous failure mode before adaptive capacity.
- Vitess. The MySQL sharding layer behind YouTube, Slack, Square, and GitHub. VSchema and vindexes are how you pick shard keys in Vitess.
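
A sketch of the Instagram-style ID layout. Their published scheme uses 41 timestamp bits, 13 shard bits, and 10 sequence bits; the epoch below is an invented placeholder, not their actual value.

```python
import time

# Bit layout modeled on Instagram's published scheme: 41 bits of ms since a
# custom epoch, 13 bits of shard ID, 10 bits of per-shard sequence.
EPOCH_MS = 1_600_000_000_000  # placeholder epoch, not Instagram's
SHARD_BITS, SEQ_BITS = 13, 10

_seq = 0  # a real generator also guards against clock skew and seq overflow

def next_id(shard_id: int) -> int:
    """Pack timestamp, shard, and sequence into one 64-bit integer."""
    global _seq
    _seq = (_seq + 1) % (1 << SEQ_BITS)
    ms = int(time.time() * 1000) - EPOCH_MS
    return (ms << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | _seq

def shard_of(item_id: int) -> int:
    """Recover the owning shard from the ID itself: no directory hop."""
    return (item_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)

uid = next_id(shard_id=42)
assert shard_of(uid) == 42
```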
Common candidate mistakes
- "We'll shard by user ID" with no justification. Why user ID? What about queries that don't include it? What happens for admin/reporting?
- Ignoring the time axis. If most queries are "recent data," time-sharding is often more useful than identity-sharding.
- Forgetting cross-shard queries exist. Admin panels, reporting, full-text search — these always cross shards. Plan the secondary index or analytics pipeline.
- Proposing 2PC like it's free. Distributed transactions are expensive and rare in modern designs. Say "we'll use a saga" or "we'll accept eventual consistency" unless the use case genuinely needs linearizability.
- Not explaining resharding. If you can't answer "what if you go from 8 to 16 shards," the rest of your design is theoretical.
- Confusing partitioning with sharding. Postgres table partitioning (same DB) is not sharding (different DBs). Mixing these up is a junior tell.
- Picking a bad shard key by assuming uniform load. Real traffic is always skewed. Power law, Zipf, Pareto. Name it.
Advanced follow-ups interviewers will ask
- "How do you do online resharding without downtime?" Dual-write to old and new shards, backfill historical data, verify with checksums, cut over reads, drain old shard. Weeks of work. Name the phases.
- "How do you handle secondary indexes across shards?" Two options: local index (per-shard, fast writes, slow cross-shard reads) or global index (separate sharded structure, slower writes, fast reads). DynamoDB calls these LSI and GSI.
- "How do you maintain consistency when moving a tenant?" Lock writes briefly, copy, verify, flip routing. Or more likely, CDC-based migration with eventual cutover and final sync.
- "What happens when a shard fails?" Replication within each shard (leader + followers), automatic failover, bounded read staleness on followers.
- "Why not just use Spanner/Cockroach and let the DB handle sharding?" Perfectly valid answer. You're trading dollars and vendor lock-in for operational simplicity. Call it out as a conscious choice.
- "How does this change if you need strong global consistency?" Spanner-style TrueTime, synchronous cross-region replication, or accept the latency hit. If the interviewer pushes, discuss linearizability vs. serializable snapshot isolation.
Sharding is the domain where specificity separates the staff signal from the senior signal. Pick a strategy on purpose. Name the failure modes. Walk through resharding. Cite real systems. If you can narrate the migration plan for moving a hot tenant off a shared shard onto a dedicated one, without hand-waving any phase, you are operating at the top of the candidate pool.
Related guides
- Database Schema Design Interview Guide — Normalization, Indexing, and Access Patterns — A tactical database schema design interview guide for modeling entities, choosing normalization boundaries, designing indexes, and explaining access-pattern tradeoffs.
- A/B Testing Interview Questions in 2026 — Power Analysis, Peeking, and SRM — A tactical guide to A/B testing interview questions in 2026, with answer frameworks for power analysis, peeking, sample-ratio mismatch, guardrails, metrics, and experiment trade-offs. Built for product analysts, data scientists, PMs, and growth roles.
- API Design Interview Cheatsheet in 2026 — Patterns, Examples, Practice Plan, and Common Traps — A practical API design interview cheatsheet for 2026: how to scope the problem, choose REST/GraphQL/gRPC patterns, model resources, handle auth, versioning, rate limits, and avoid the traps that cost senior candidates offers.
- API Design Interview Guide — REST vs GraphQL vs gRPC, Versioning, and Pagination — A practical API design interview guide covering REST, GraphQL, gRPC, versioning, pagination, idempotency, errors, auth, rate limits, and the tradeoffs interviewers expect.
- AWS Interview Cheatsheet in 2026 — Patterns, Examples, Practice Plan, and Common Traps — A high-signal AWS interview cheatsheet for 2026 covering architecture patterns, IAM, networking, reliability, cost, debugging, and the answers that show real cloud judgment.
