Database Schema Design Interview Guide — Normalization, Indexing, and Access Patterns

9 min read · April 25, 2026

A tactical database schema design interview guide for modeling entities, choosing normalization boundaries, designing indexes, and explaining access-pattern tradeoffs.

A database schema design interview guide should help you do three things under pressure: model the domain, protect data integrity, and serve the most important queries efficiently. Interviewers are not looking for a perfect ER diagram on the first try. They want to see whether you clarify requirements, choose reasonable normalization, design indexes around access patterns, and explain tradeoffs like a person who has maintained a real database.

Database schema design interview guide: start with access patterns

The biggest mistake is beginning with tables before you know how the product will use the data. Start by asking:

  • What are the main entities and relationships?
  • What are the top read queries by frequency and latency requirement?
  • What writes happen, and do they need transactions?
  • What data must be unique, immutable, auditable, or soft-deleted?
  • What scale are we designing for now: thousands, millions, or billions of rows?
  • Are there time-based retention, reporting, or compliance needs?

A strong opening line is: "I'll first identify the core entities and access patterns, then normalize the source of truth and add indexes or denormalized read models only where the queries require them." That tells the interviewer you are balancing correctness and performance rather than drawing tables randomly.

Normalize the source of truth

Normalization reduces duplication and update anomalies. You do not need to recite every normal form, but you should know the practical meaning.

| Principle | Practical meaning | Example |
|---|---|---|
| 1NF | Store atomic values, not repeating lists in one column | Use order_items, not item_ids = '1,2,3' |
| 2NF | Non-key attributes depend on the whole key | In a join table, do not store product name if product ID determines it |
| 3NF | Non-key attributes should not depend on other non-key attributes | Store customer address in customers, not repeated on every order unless snapshotting |

In interviews, normalize the write model first. For example, an ecommerce schema usually has users, orders, order_items, products, and payments. The price charged for an item may be copied into order_items because it is a historical snapshot, not a normalization mistake. The current product price belongs in products, but the price at purchase time belongs with the order item.

That nuance is important: good schema design is not "never duplicate." It is "duplicate intentionally when the duplicate has different meaning, performance purpose, or historical value."
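
As a minimal sketch of those two points in generic SQL, assuming the surrounding orders and products tables exist (names mirror the examples above):

```sql
-- Anti-pattern (violates 1NF): orders.item_ids TEXT holding '1,2,3'.
-- Normalized shape: one row per item, so joins, constraints, and indexes work.
CREATE TABLE order_items (
    id               BIGINT PRIMARY KEY,
    order_id         BIGINT NOT NULL REFERENCES orders(id),
    product_id       BIGINT NOT NULL REFERENCES products(id),
    quantity         INT    NOT NULL CHECK (quantity > 0),
    unit_price_cents BIGINT NOT NULL  -- intentional historical snapshot
);
```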

Model relationships clearly

Most schema interviews involve one-to-many and many-to-many relationships.

  • One-to-many: Put a foreign key on the many side. orders.user_id references users.id.
  • Many-to-many: Use a join table. users and teams become team_members(user_id, team_id, role, joined_at).
  • Optional relationship: Use nullable foreign keys carefully, or model state separately if null has multiple meanings.
  • Polymorphic relationship: Avoid vague entity_type/entity_id when integrity matters; consider separate tables or a clear parent abstraction.

Always name primary keys, foreign keys, uniqueness constraints, and deletion behavior. Saying "I would add a foreign key" is better than drawing loose boxes. Saying "I would use ON DELETE RESTRICT for paid orders but perhaps soft-delete users for compliance workflows" is stronger.
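
A sketch of the team_members join table from above, with explicit keys and deletion behavior; the ON DELETE choices here are illustrative and depend on the business rules:

```sql
-- Composite primary key enforces one membership per user/team pair.
CREATE TABLE team_members (
    user_id   BIGINT    NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    team_id   BIGINT    NOT NULL REFERENCES teams(id) ON DELETE CASCADE,
    role      TEXT      NOT NULL DEFAULT 'member',
    joined_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, team_id)
);
```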

Design indexes from queries, not vibes

Indexes speed reads by maintaining sorted or searchable data structures, commonly B-trees in relational databases. They also cost storage and slow writes because the index must be updated. The interview move is to tie every index to a query.

Example access patterns for an orders service:

| Query | Index |
|---|---|
| Find a user by email during login | UNIQUE INDEX users_email_idx ON users(email) |
| List recent orders for a user | INDEX orders_user_created_idx ON orders(user_id, created_at DESC) |
| Fetch order items by order | INDEX order_items_order_idx ON order_items(order_id) |
| Find unpaid orders by status and date | INDEX orders_status_due_idx ON orders(status, due_at) |
| Prevent duplicate payment provider IDs | UNIQUE INDEX payments_provider_ref_idx ON payments(provider, provider_payment_id) |

Composite index order matters. An index on (user_id, created_at) helps WHERE user_id = ? ORDER BY created_at DESC, but it is less useful for a query that filters only on created_at. A good shorthand: put equality filters first, then range/order columns, then optionally include columns for covering reads if your database supports it.
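
A short illustration of that ordering rule, using the orders index from the table above (values are placeholders):

```sql
-- Equality column first, then the range/order column.
CREATE INDEX orders_user_created_idx ON orders (user_id, created_at DESC);

-- Served well: equality on the leading column, then the sort column.
SELECT id, total_cents, created_at
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- Served poorly: the leading column is absent, so the index is far less useful.
SELECT id FROM orders WHERE created_at > '2026-01-01';
```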

Example schema: marketplace orders

A practical interview schema might look like this:

```
users(id, email, name, created_at)
sellers(id, user_id, display_name, created_at)
products(id, seller_id, name, status, current_price_cents, created_at)
orders(id, buyer_id, status, total_cents, currency, created_at)
order_items(id, order_id, product_id, seller_id, quantity, unit_price_cents)
payments(id, order_id, provider, provider_payment_id, status, amount_cents, created_at)
shipments(id, order_id, carrier, tracking_number, status, shipped_at)
```

Then explain choices:

  • order_items.unit_price_cents snapshots the purchase price because product prices can change.
  • orders.total_cents may be stored for fast display and audit, but it should be derived or verified from items at write time.
  • payments.provider_payment_id needs a uniqueness constraint with provider to prevent duplicate processing (see the DDL sketch after this list).
  • seller_id on order_items can be denormalized from product for seller dashboards and historical ownership.
  • Money should use integer cents or decimal, not floating point.
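
As a hedged DDL sketch of several of these choices, here is one way the payments table could look; the positive-amount CHECK is an assumption and would change if refunds are stored as negative rows:

```sql
CREATE TABLE payments (
    id                  BIGINT    PRIMARY KEY,
    order_id            BIGINT    NOT NULL REFERENCES orders(id),
    provider            TEXT      NOT NULL,
    provider_payment_id TEXT      NOT NULL,
    status              TEXT      NOT NULL,
    amount_cents        BIGINT    NOT NULL CHECK (amount_cents > 0),  -- integer cents, never float
    created_at          TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    -- Idempotency against duplicate provider callbacks and retries.
    UNIQUE (provider, provider_payment_id)
);
```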

This is the level of detail that makes the schema feel real.

Access-pattern walkthrough

After drawing tables, walk through the most important operations.

Create order: transactionally insert the order row, its order_items, and a payment intent. Verify product status and price. Store price snapshots. Consider inventory reservation if relevant.
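
A minimal transaction sketch for this flow; IDs, the provider name, and provider_payment_id are placeholders, and a real implementation would derive the total from the inserted items:

```sql
BEGIN;

INSERT INTO orders (id, buyer_id, status, total_cents, currency)
VALUES (1001, 42, 'pending', 2500, 'USD');

-- Snapshot the current product price into the order item at purchase time.
INSERT INTO order_items (id, order_id, product_id, seller_id, quantity, unit_price_cents)
SELECT 5001, 1001, p.id, p.seller_id, 1, p.current_price_cents
FROM products p
WHERE p.id = 7 AND p.status = 'active';

-- provider and provider_payment_id values are hypothetical placeholders.
INSERT INTO payments (id, order_id, provider, provider_payment_id, status, amount_cents)
VALUES (9001, 1001, 'examplepay', 'ref_123', 'pending', 2500);

COMMIT;
```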

Show order detail: fetch the order by ID, ensure the requesting user owns it or is the seller/admin, and join items and payment/shipment state. The primary key index on orders(id) covers the lookup; the order_items(order_id) index matters for the item join.

List a user's orders: query orders WHERE buyer_id = ? ORDER BY created_at DESC LIMIT 20. Composite index (buyer_id, created_at DESC) matters.

Seller dashboard: query items or orders by seller_id and date. If this becomes expensive, add an index or create a read model/table specifically for seller order summaries.

This walkthrough proves your design is not just normalized; it supports the product.

When to denormalize

Denormalization is acceptable when you can explain why and how consistency is managed. Common cases:

  • Historical snapshots: price, tax rate, address at order time.
  • Read-heavy aggregates: comment counts, unread counts, rating summaries.
  • Search/display optimization: copying display names into event logs or read models.
  • Distributed systems: avoiding cross-service joins in a microservice architecture.

Add the maintenance strategy: update in the same transaction, rebuild asynchronously, tolerate eventual consistency, or compute on read until scale requires storage. Interviewers like hearing the failure mode: "If the count is eventually consistent, the product can tolerate being off by a few seconds. Payment status cannot."
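
For example, a same-transaction update of a cached aggregate might look like this; the posts and comments tables are illustrative, not part of the schema above:

```sql
BEGIN;

INSERT INTO comments (id, post_id, author_id, body)
VALUES (301, 17, 42, 'Looks good to me.');

-- The cached count moves in the same transaction, so readers never see a skew.
UPDATE posts
SET comment_count = comment_count + 1
WHERE id = 17;

COMMIT;
```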

Constraints, integrity, and migrations

Good schema design uses the database to enforce invariants:

  • NOT NULL for required fields.
  • UNIQUE for emails, slugs, idempotency keys, external IDs.
  • CHECK constraints for positive amounts or allowed ranges.
  • Foreign keys where cross-table integrity is within the same database boundary.
  • Created/updated timestamps for debugging and audit.

For migrations, mention backward-compatible rollout: add nullable column, backfill in batches, deploy code that writes both old and new shapes if needed, verify, then enforce NOT NULL or remove old columns. That is a senior signal because real schema changes happen while traffic is live.
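
A PostgreSQL-flavored sketch of that rollout, using a hypothetical phone_verified column:

```sql
-- Step 1: add the column as nullable so old rows and old code keep working.
ALTER TABLE users ADD COLUMN phone_verified BOOLEAN;

-- Step 2: backfill in small batches to avoid long locks; repeat until done.
UPDATE users
SET phone_verified = FALSE
WHERE id IN (
    SELECT id FROM users WHERE phone_verified IS NULL LIMIT 1000
);

-- Step 3: once backfilled and new code writes the column, enforce the invariant.
ALTER TABLE users ALTER COLUMN phone_verified SET NOT NULL;
```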

Scaling and partitioning discussion

Do not jump to sharding in the first minute. Start with good indexes, query plans, and batching. Then discuss scale options:

  • Read replicas for read-heavy workloads with tolerance for replica lag.
  • Partitioning by time for large event/order tables where queries are time-bounded (sketched after this list).
  • Archival for old data that is rarely accessed.
  • Materialized views or summary tables for analytics-style dashboards.
  • Sharding by user, tenant, or account only when a single database cannot handle storage/write load.
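
The time-partitioning option above might look like this PostgreSQL-flavored sketch; partition granularity and names are illustrative:

```sql
-- In PostgreSQL, the partition key must be part of the primary key.
CREATE TABLE orders (
    id         BIGINT    NOT NULL,
    buyer_id   BIGINT    NOT NULL,
    status     TEXT      NOT NULL,
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2026_04 PARTITION OF orders
    FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

-- Time-bounded queries then prune to the relevant partitions.
```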

If you choose a shard key, talk about hot spots. A tenant ID works for isolation, but a huge tenant can dominate one shard. A random ID spreads writes but makes tenant queries harder. Neither choice is free.

Common traps in schema interviews

Trap: no uniqueness constraints. If email, slug, provider payment ID, or membership pair must be unique, say so.

Trap: indexes on every column. Indexes are not free. Tie them to specific queries and write costs.

Trap: missing join tables. Many-to-many relationships need their own table if you want metadata like role, created_at, or status.

Trap: storing arrays of IDs in a text column. It breaks constraints, querying, and indexing.

Trap: ignoring deletes. Decide soft delete, hard delete, restrict delete, or cascade based on business meaning.

Trap: choosing random UUIDs or sequential IDs without naming the tradeoff. UUIDs help distributed creation and safe external exposure; sequential IDs are smaller and index-friendly but guessable.

How to talk about schema design on a resume

Good bullets are specific:

  • "Redesigned order schema with normalized payment, shipment, and item tables plus composite indexes for buyer and seller access patterns."
  • "Added uniqueness and idempotency constraints that prevented duplicate payment processing under retry traffic."
  • "Migrated high-volume event table to time partitioning and summary tables, reducing dashboard query latency without sacrificing write integrity."

Avoid "designed database schema" alone. Say what constraints, indexes, or access patterns you handled.

Prep checklist

Before a database design interview, be ready to:

  • Turn requirements into entities, relationships, and cardinalities.
  • Explain 1NF/2NF/3NF in practical language.
  • Choose when denormalization is justified.
  • Design composite indexes from actual WHERE, ORDER BY, and LIMIT patterns.
  • Discuss transactions, constraints, and idempotency for writes.
  • Handle soft deletes, audit fields, and historical snapshots.
  • Explain migration rollout without downtime.
  • Know when to escalate from indexing to partitioning, replicas, or sharding.

A strong database schema answer is not the most complicated one. It is the one that protects the source of truth, serves the top access patterns, and leaves room to evolve without corrupting production data.

A 60-second schema answer template

When the interviewer gives a vague prompt, use a repeatable script before drawing:

"I will assume we are designing the transactional source of truth, not the analytics warehouse. I will start with normalized tables for the core entities, define primary keys and foreign keys, then add indexes for the highest-volume reads. For any denormalized fields, I will call out whether they are historical snapshots, cached aggregates, or read-model optimizations. I will also identify which operations need transactions and which identifiers need uniqueness or idempotency."

Then draw the first version and immediately test it with operations. For example, if designing a booking system, run through create booking, cancel booking, list bookings for a user, list bookings for a host, and prevent double-booking. This often reveals missing constraints such as UNIQUE(resource_id, start_time) or a more complex exclusion constraint for overlapping time ranges. If the database supports exclusion constraints, mention them; otherwise say the application may need transactional checks with appropriate locking.
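
If the database is PostgreSQL, the overlap rule can be enforced declaratively; this is a sketch with illustrative names, and the btree_gist extension is required to mix equality with range overlap:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE bookings (
    id          BIGINT  PRIMARY KEY,
    resource_id BIGINT  NOT NULL,
    period      TSRANGE NOT NULL,
    -- No two rows may share a resource_id with overlapping periods.
    EXCLUDE USING gist (resource_id WITH =, period WITH &&)
);

INSERT INTO bookings VALUES (1, 10, '[2026-05-01 10:00, 2026-05-01 11:00)');
-- This second insert fails: same resource, overlapping period.
INSERT INTO bookings VALUES (2, 10, '[2026-05-01 10:30, 2026-05-01 11:30)');
```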

Also mention observability fields when relevant: created_at, updated_at, created_by, status_changed_at, and sometimes an append-only event table. These fields are not glamorous, but they make production debugging and audit possible. Interview schemas that include lifecycle and operations details tend to stand out.