
Caching Strategies for System Design Interviews: Write-Through, Write-Back, and TTL Patterns

9 min read · April 25, 2026

The caching section of a FAANG system design loop is where mediocre candidates blur together. Here's how to name tradeoffs, pick a pattern on purpose, and survive the hot-key follow-up.


Every senior and staff system design loop at Google, Meta, Stripe, Datadog, and the rest of the FAANG-tier shops contains at least one caching decision. Most candidates fumble it because they treat caching as a single decision ("we'll add Redis in front of the database") instead of a set of tradeoffs about consistency, invalidation, and failure modes.

This guide is the caching conversation I wish every staff-plus candidate walked in ready to have. It assumes you already know what a cache is. What interviewers actually test is whether you can pick a write policy on purpose, name what breaks, and recover when they push on hot keys or stampedes.

What interviewers actually want to hear

The senior signal is the ability to name the write pattern, the read pattern, the eviction policy, and the invalidation strategy as four separate decisions.

Most candidates merge them into one. That is the tell.

Write patterns come in five common flavors:

  • Cache-aside (lazy loading). The app reads the cache, falls back to the DB on a miss, and on writes updates the DB then invalidates or updates the cache. The default choice: cache-aside with Redis in front of a relational store describes the vast majority of production systems.
  • Write-through. App writes to cache, cache writes to DB synchronously before acknowledging. Strongly consistent but every write pays the cache latency plus the DB latency.
  • Write-back (write-behind). App writes to cache, cache acknowledges immediately, and flushes to DB asynchronously. Fast writes, big durability risk on cache failure.
  • Write-around. Writes skip the cache and go directly to the DB; the cache only populates on read. Good when the write-heavy working set is different from the read-heavy working set.
  • Refresh-ahead. Cache proactively reloads entries before TTL expiry based on access patterns. Used in CDNs and in Netflix's EVCache.

Read patterns are simpler: read-through (cache fetches from DB on miss, app never talks to DB) versus cache-aside (app fetches from DB on miss). Read-through pushes policy into the cache layer; cache-aside keeps it in the app. Most teams I've worked with prefer cache-aside because it's debuggable.
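Sketched in Python, cache-aside on both the read and the write path looks like this (the `cache` and `db` clients are hypothetical stand-ins for whatever your stack actually uses):

```python
def get_product(cache, db, product_id, ttl=300):
    # Cache-aside read: try the cache, fall back to the DB on a miss,
    # then populate the cache so the next reader hits.
    key = f"product:{product_id}"
    value = cache.get(key)
    if value is None:
        value = db.fetch_product(product_id)
        cache.set(key, value, ttl)
    return value

def update_product(cache, db, product_id, fields):
    # Cache-aside write: update the DB first, then invalidate (not update)
    # the cache entry. Deleting is safer than writing the new value, because
    # two racing writers updating the cache can leave it permanently stale.
    db.update_product(product_id, fields)
    cache.delete(f"product:{product_id}")
```

Note the write path deletes rather than updates the entry; the next read repopulates it from the DB, so the cache can never hold a value older than the last committed write for longer than one read cycle.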

Eviction policies you should be able to name and defend: LRU (Redis's allkeys-lru — note the out-of-the-box Redis default is actually noeviction), LFU (allkeys-lfu, better for skewed workloads like celebrity tweets), FIFO, TinyLFU (Caffeine, excellent for JVM app caches), and SLRU. If you say "we'll use LRU" without being able to explain when LFU or TinyLFU wins, you're not at the staff bar.
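In Redis this is a two-line configuration decision; the policy names below are the real `maxmemory-policy` values (allkeys-lfu requires Redis 4.0+):

```
# allkeys-lfu evicts the least-frequently-used keys across the whole keyspace;
# alternatives include allkeys-lru, volatile-ttl, and noeviction (the default).
maxmemory 4gb
maxmemory-policy allkeys-lfu
```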

Invalidation is where most candidates die. Name the three hard cases: stale writes from the DB bypassing cache, multi-region replication lag, and partial failures where the DB commits but the invalidation message drops. Then name your mitigation: TTL as a safety net, versioned keys, or dual writes via CDC (Debezium or AWS DMS streaming to a Kafka topic the cache consumers tail).
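Versioned keys are worth being able to sketch: bump a per-entity version counter on every write and embed it in the read key, so a dropped invalidation message can't leave readers on stale data — the old entry simply becomes unreachable and ages out via TTL. A minimal sketch (the `cache` and `db` clients are hypothetical):

```python
def read_profile(cache, db, user_id, ttl=300):
    # The per-user version counter acts as a logical invalidation token.
    version = cache.get(f"user:{user_id}:ver") or 0
    key = f"user:{user_id}:v{version}"
    value = cache.get(key)
    if value is None:
        value = db.fetch_profile(user_id)
        cache.set(key, value, ttl)
    return value

def write_profile(cache, db, user_id, profile):
    db.save_profile(user_id, profile)
    # Bumping the version orphans every old cache entry at once. The orphans
    # expire via TTL, so there is no explicit delete message that can be lost.
    cache.incr(f"user:{user_id}:ver")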

The tradeoffs you need to name out loud

Interviewers are listening for specific tradeoff vocabulary. You should explicitly name:

  • Consistency model. Strong consistency (write-through, synchronous invalidation) vs. eventual consistency (write-back, TTL-based). Be explicit. "We accept stale reads up to 60 seconds" is an answer. "It'll be pretty consistent" is not.
  • Durability risk. Write-back has a window where acknowledged writes are only in the cache. A Redis node failure means lost writes. You can mitigate with AOF persistence (appendfsync everysec) or by using Redis on ElastiCache with multi-AZ, but you still have a window.
  • Thundering herd on miss. When a popular key expires, every request races to the DB. Mitigate with request coalescing (singleflight in Go, dataloader in Node), probabilistic early expiration, or lock-on-miss.
  • Hot keys. A single celebrity key saturates a single Redis shard. This is a real problem. Mitigate with client-side caching (Redis 6 client-side caching, Netflix's EVCache with L1 in-process), key splitting (celebrity:lebron:{0..N}), or a CDN layer in front.
  • Cold-start problem. A cache that starts empty is useless. Mitigate with warm-up scripts, replaying recent traffic, or refresh-ahead for known hot entries.
  • Memory cost. Redis at 100GB+ is expensive. Working set sizing matters. If your working set is larger than RAM, you're shopping for a different architecture, not a bigger cache.
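The key-splitting mitigation for hot keys is quick to sketch. Write the value under N suffixed copies (each lands on a different shard under consistent hashing), then have each reader pick a random copy — one hot key becomes N warm ones. The replica count and key scheme below are illustrative assumptions:

```python
import random

N_REPLICAS = 8  # assumed fan-out; size it to spread load across your shards

def hot_set(cache, key, value, ttl=60):
    # Write every replica so any of them is a valid read target.
    for i in range(N_REPLICAS):
        cache.set(f"{key}:{i}", value, ttl)

def hot_get(cache, key):
    # Reading a random replica divides per-shard load by N_REPLICAS.
    return cache.get(f"{key}:{random.randrange(N_REPLICAS)}")
```

The cost is N× the writes and N× the memory for that key, which is exactly the trade you want to say out loud: you're spending write amplification to buy read fan-out.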

Name the RFC when relevant. HTTP caching (Cache-Control, ETag, Vary, RFC 9111) is fair game for any system that hits a browser or CDN. If the interviewer asks about browser caching and you don't know stale-while-revalidate and stale-if-error, you're losing a layer.
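A concrete header set worth being able to write out (the Cache-Control directives are from RFC 9111; stale-while-revalidate and stale-if-error are the RFC 5861 extensions):

```
Cache-Control: public, max-age=300, stale-while-revalidate=60, stale-if-error=86400
ETag: "a1b9c3"
Vary: Accept-Encoding
```

max-age bounds freshness, stale-while-revalidate lets a CDN serve the old copy while it refetches in the background, and stale-if-error keeps serving it when the origin is down.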

When you should NOT suggest a cache

Senior candidates distinguish themselves by refusing to add a cache when it's wrong. The bad cases:

  • Write-heavy workloads with low read amplification. If every write is read once or zero times, the cache is overhead.
  • Strong consistency requirements on tiny data. If you need linearizability on an account balance read immediately after a write, don't stick Redis in front. Serve from the primary or a read-your-writes replica.
  • Highly dynamic personalized data with no key locality. A recommendation feed where every user sees a unique page has a cache hit rate near zero. Cache the components, not the composed page.
  • Systems where cache failure is unacceptable and you can't afford a fallback. If the cache is a hard dependency, it's not a cache, it's a database. Treat it accordingly with replication, persistence, and backup.

When in doubt, measure before adding one. "What's the read:write ratio and hit rate we expect?" is a question that scores points.

Real-world example: Twitter's timeline cache

Twitter's home timeline is the textbook fan-out-on-write example and every interviewer expects you to know it.

Each user has a precomputed timeline in Redis, keyed by user ID, containing a list of tweet IDs. When a user tweets, a fan-out worker pushes the tweet ID into the Redis list of every follower. Reads are O(1): pull the list, hydrate the tweet objects from a separate cache or Manhattan (Twitter's KV store). This works because writes are rare relative to reads and because Twitter precomputes at write time.

It breaks on celebrity accounts. @elonmusk has 200M+ followers. Fanning out a single tweet to 200M Redis lists is a write amplification nightmare. Twitter solved this with a hybrid model: for high-follower accounts, do fan-out-on-read instead. The celebrity's tweets are pulled in at read time and merged with the precomputed list.

The lesson for interviews: pure strategies rarely survive contact with real skew. Name the hybrid.
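That hybrid read path can be sketched in a few lines of Python (the cache, social-graph, and tweet-store clients here are hypothetical interfaces, not Twitter's actual ones):

```python
def home_timeline(cache, graph, tweets, user_id, limit=50):
    # Precomputed part: tweet IDs fanned out to this list at write time
    # for ordinary accounts the user follows.
    ids = list(cache.get_list(f"timeline:{user_id}"))
    # Pull part: celebrity accounts skip fan-out entirely; their recent
    # tweets are fetched and merged at read time instead.
    for celeb in graph.followed_celebrities(user_id):
        ids.extend(tweets.recent_ids(celeb))
    # Merge newest-first; assumes IDs are time-ordered (snowflake-style).
    return sorted(ids, reverse=True)[:limit]
```

The read pays a small merge cost on every request, but only for the handful of celebrity follows — the common case stays an O(1) list pull.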

Other canonical examples worth memorizing:

  • Facebook TAO. Write-through graph cache in front of MySQL, with async invalidation across regions. Famous for the "thundering herd on leader failover" failure mode described in the 2013 paper.
  • Netflix EVCache. Memcached with client-side replication across AZs. Designed for the fact that cache misses to the origin are catastrophic during peak viewing.
  • DNS. The most widely deployed cache in the world. Pure TTL. RFC 1034/1035. Understand why negative caching (caching NXDOMAIN, RFC 2308) matters.
  • Stripe's idempotency cache. Stores request fingerprints in Redis for 24 hours to prevent duplicate charges on retries. Shows up in payment system design questions constantly.
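The idempotency-cache pattern can be sketched in a few lines (hypothetical handler and `cache` client; the 24-hour TTL matches the example above):

```python
def charge_once(cache, process, idem_key, request, ttl=24 * 3600):
    key = f"idem:{idem_key}"
    stored = cache.get(key)
    if stored is not None:
        # A retry of a request we already processed: replay the recorded
        # response instead of charging the customer a second time.
        return stored
    response = process(request)
    # Record the response so retries within the TTL window become replays.
    # (Production code would also reserve the key with SET NX *before*
    # processing, to close the race between concurrent duplicates.)
    cache.set(key, response, ttl)
    return response
```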

Common candidate mistakes

The mistakes that reliably drop a candidate from a hire to a lean no:

  • Saying "we'll add Redis" without specifying the pattern. This is the number-one failure mode. Always state: cache-aside or read-through, write policy, TTL, eviction.
  • Forgetting invalidation entirely. Phil Karlton's old line about two hard problems exists for a reason. If you don't address how stale data gets removed, you haven't designed a cache.
  • Ignoring cache failure modes. "What happens if Redis is down" is asked 100% of the time. Your answer should include graceful degradation (serve from DB with rate limits), circuit breakers, and connection pool behavior.
  • Over-caching. Caching the composed final response when you should be caching components, or caching data that's already in a CDN. Respect the layer cake: cache once, at the layer that actually absorbs the load.
  • Miscalculating memory. A billion 1KB entries is a terabyte. Redis on a single node tops out well before that. Know when you need a cluster.
  • Using TTL as the only invalidation. TTL is a safety net, not a strategy. If fresh data is important, you need explicit invalidation plus TTL as backup.
  • Not naming the hit rate. A cache with a 30% hit rate may not be worth the complexity. 95%+ is what you want to see in production. Mention the number.
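The hit-rate point is worth quantifying out loud. With illustrative numbers (say ~1 ms for a cache hit and ~20 ms for a DB read), expected read latency is a one-line formula:

```python
def effective_read_latency_ms(hit_rate, cache_ms=1.0, db_ms=20.0):
    # Expected per-read latency for a given hit rate; the 1 ms / 20 ms
    # defaults are illustrative, not measurements.
    return hit_rate * cache_ms + (1 - hit_rate) * db_ms
```

A 30% hit rate gives 0.3·1 + 0.7·20 = 14.3 ms — barely better than the 20 ms no-cache baseline once you've paid for the operational complexity — while 95% gives 1.95 ms. Saying that arithmetic in the interview is exactly "mentioning the number."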

Advanced follow-ups interviewers will ask

Expect any of these as the second-layer probe:

  • "How do you handle cache stampede on a popular key expiring?" Answer: request coalescing (only one fetcher per key), probabilistic early expiration (XFetch algorithm), or a lock with a short TTL. Reference Facebook's 2013 memcache paper.
  • "How do you keep two regions consistent?" Answer: CDC from the DB into a Kafka topic consumed by invalidators in each region. Or use a globally replicated cache like DynamoDB DAX for specific workloads. Accept bounded staleness explicitly.
  • "What if you have a hot key?" Answer: key splitting, local L1 cache in the app process, or front it with a CDN. If the key is truly hot and needs real-time writes, you may need to shard by hash of (key, suffix) and read all shards.
  • "How do you warm a cold cache?" Answer: shadow traffic from production, a replay of the last N hours of read logs, or a precompute job that seeds the top 1% of keys.
  • "Why not use in-process cache?" Answer: it's great for ultra-low latency (Caffeine, Guava in Java; lru_cache in Python) but has no cross-instance consistency. Combine with a distributed cache for a two-tier setup.
  • "How do you measure the cache?" Answer: hit rate, p99 latency, eviction rate, memory utilization, and origin fallback rate. Alert on sudden hit rate drops — they usually indicate a bad deploy that invalidated a key schema.
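The "probabilistic early expiration" answer (XFetch) is a one-liner worth knowing cold: each reader independently decides to recompute with a probability that rises as expiry approaches, so with high probability a single worker refreshes the entry before the TTL hits and the herd never forms. A sketch — the function shape and parameter names are my own; the inequality is the standard XFetch test from the cache-stampede literature:

```python
import math
import random

def should_refresh_early(now, expiry_ts, recompute_cost, beta=1.0, rand=None):
    # XFetch: refresh when now - cost * beta * ln(rand) >= expiry.
    # ln(rand) is negative, so the subtracted term adds forward jitter
    # proportional to how expensive the entry is to recompute.
    r = rand if rand is not None else random.random()
    return now - recompute_cost * beta * math.log(r) >= expiry_ts
```

`recompute_cost` is the observed time to rebuild the entry; `beta` > 1 refreshes more eagerly. Far from expiry the test almost never fires; near expiry it fires almost always.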

A snippet that lands well in interviews — sketched here in runnable Python, with the cache, DB, and lock clients passed in (in Redis, the lock would be a SET with NX and a short expiry):

import time

CACHE_TTL = 300   # seconds a cached user entry stays fresh
LOCK_TTL = 2      # short lock expiry so a crashed fetcher can't wedge the key

def get_user(cache, db, lock, user_id):
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:
        return value
    # Miss: let exactly one caller hit the DB; everyone else coalesces.
    if lock.acquire(key, ttl=LOCK_TTL):
        try:
            value = cache.get(key)  # double-check after winning the lock
            if value is not None:
                return value
            value = db.query(user_id)
            cache.set(key, value, CACHE_TTL)
            return value
        finally:
            lock.release(key)
    # Lost the race: wait briefly for the winner to populate the cache.
    time.sleep(0.05)
    return cache.get(key) or db.query(user_id)

Drawing that out with an ASCII box-and-arrow between client, cache, lock, and DB scores more points than any list of buzzwords.

The candidates who clear a staff-plus system design loop are not the ones who memorize the largest list of caching products. They are the ones who can look at a workload, name the read:write ratio, pick a write policy on purpose, and walk through what breaks on failure. Practice narrating the decision, not reciting the options.

If you can do that out loud, confidently, without hand-waving invalidation, you will outperform 80% of the candidates in the interview pool. Caching is the most common system design topic and the one where small improvements in articulation translate directly into higher leveling outcomes.