SQL Interview Patterns Cheatsheet — Joins, Subqueries, and the 10 Templates That Recur

9 min read · April 25, 2026

A SQL interview patterns cheatsheet built around the recurring templates: joins, anti-joins, aggregation, top-N per group, window functions, subqueries, deduplication, retention, funnels, and data-quality checks. Use it to recognize the prompt before you write the query.

This SQL interview patterns cheatsheet is built around a simple fact: most SQL interviews are not novel. They are joins, subqueries, aggregation, deduplication, top-N per group, running totals, funnels, retention, and data-quality checks in different clothing. If you can recognize the pattern quickly, you can spend the interview explaining assumptions instead of panicking over syntax.

The best SQL candidates start by naming the grain: one row per user, order, event, day, account, or product. Then they choose the smallest query shape that preserves that grain. That habit prevents most bad answers.

SQL interview patterns cheatsheet: first principles

Before writing SQL, ask three questions:

  1. What is the output grain? One row per what?
  2. What is the metric definition? Count users, events, orders, dollars, days, or accounts?
  3. What rows are eligible? Time window, status, geography, plan type, deleted records, bot traffic, refunds?

A candidate who asks these questions sounds senior even before the query is complete.

Template 1: inner join for matching records

Use an inner join when you only want rows that exist in both tables.

Prompt: “List orders with customer emails.”

SELECT
  o.order_id,
  o.order_date,
  c.email,
  o.revenue
FROM orders o
JOIN customers c
  ON c.customer_id = o.customer_id;

Trap: joining at the wrong grain. If customers has one row per customer, this join is safe. If the email instead lives in a customer_emails table with multiple rows per customer, joining it duplicates every order. Always check cardinality before joining.

Interview line: “I am assuming customers.customer_id is unique. If not, I would deduplicate or pick the current email before joining.”
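The cardinality trap is easy to demonstrate. A minimal sketch in Python with SQLite (the customer_emails table and its data are hypothetical): two orders joined to an email table holding two emails for the same customer come back as four rows.

```python
import sqlite3

# In-memory database: one customer with two orders, and an email
# table with TWO rows for that customer (the wrong grain to join at).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO orders VALUES (1, 100), (2, 100);
CREATE TABLE customer_emails (customer_id INTEGER, email TEXT);
INSERT INTO customer_emails VALUES (100, 'a@x.com'), (100, 'b@x.com');
""")

# Each order matches both email rows, so the fact table is duplicated.
rows = conn.execute("""
SELECT o.order_id, e.email
FROM orders o
JOIN customer_emails e ON e.customer_id = o.customer_id
""").fetchall()
print(len(rows))  # 4 rows from 2 orders
```

Deduplicating customer_emails to one current email per customer before the join restores the one-row-per-order grain.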

Template 2: left join to keep the base table

Use a left join when every row from the left table should remain even if the right side is missing.

Prompt: “Show all customers and their order count, including customers with zero orders.”

SELECT
  c.customer_id,
  COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id;

Trap: putting right-table filters in WHERE turns the left join into an inner join.

Wrong:

WHERE o.order_date >= DATE '2026-01-01'

Better:

LEFT JOIN orders o
  ON o.customer_id = c.customer_id
 AND o.order_date >= DATE '2026-01-01'

That preserves zero-order customers.
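The WHERE-versus-ON difference is observable on tiny data. A minimal SQLite sketch (hypothetical rows): customer 2 has no 2026 orders, and the WHERE version silently drops them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER);
INSERT INTO customers VALUES (1), (2);  -- customer 2 has no orders
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES (10, 1, '2026-01-15');
""")

# Filter in WHERE: customer 2's NULL order_date fails the predicate, row lost.
in_where = conn.execute("""
SELECT c.customer_id, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date >= '2026-01-01'
GROUP BY c.customer_id
ORDER BY c.customer_id
""").fetchall()

# Filter in ON: unmatched customers survive with a zero count.
in_on = conn.execute("""
SELECT c.customer_id, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
                  AND o.order_date >= '2026-01-01'
GROUP BY c.customer_id
ORDER BY c.customer_id
""").fetchall()

print(in_where)  # [(1, 1)]          -- customer 2 vanished
print(in_on)     # [(1, 1), (2, 0)]  -- zero-order customer preserved
```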

Template 3: anti-join for “did not do X”

Prompt: “Find users who signed up but never placed an order.”

SELECT u.user_id
FROM users u
LEFT JOIN orders o
  ON o.user_id = u.user_id
WHERE o.user_id IS NULL;

Alternative:

SELECT u.user_id
FROM users u
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.user_id = u.user_id
);

NOT EXISTS is often clearer and sidesteps the null trap in NOT IN: if the subquery returns even one NULL, NOT IN can never evaluate to true, only false or unknown, so the query returns zero rows.

Interview line: “For anti-joins, I prefer NOT EXISTS when null behavior matters.”
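The null surprise is worth seeing once. A minimal SQLite sketch (hypothetical data, with one NULL user_id in orders, e.g. a guest checkout): NOT IN returns nothing, while NOT EXISTS still finds the user who never ordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER);
INSERT INTO users VALUES (1), (2);
CREATE TABLE orders (user_id INTEGER);
INSERT INTO orders VALUES (1), (NULL);  -- one null user_id in the fact table
""")

# NOT IN: user_id NOT IN (1, NULL) is UNKNOWN, never TRUE, so no rows return.
not_in = conn.execute(
    "SELECT user_id FROM users WHERE user_id NOT IN (SELECT user_id FROM orders)"
).fetchall()

# NOT EXISTS: the NULL order row simply fails the correlation; user 2 is found.
not_exists = conn.execute("""
SELECT user_id FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id)
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,)]
```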

Template 4: aggregation at the correct grain

Prompt: “Calculate monthly revenue.”

SELECT
  DATE_TRUNC(order_date, MONTH) AS order_month,
  SUM(revenue) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY 1
ORDER BY 1;

The pattern is filter first, group to the reporting grain, aggregate the metric. The trap is counting rows when the metric should count distinct users or orders. Say what you are counting.

Prompt variant: “Monthly active users.”

SELECT
  DATE_TRUNC(event_time, MONTH) AS activity_month,
  COUNT(DISTINCT user_id) AS mau
FROM events
WHERE event_name = 'active'
GROUP BY 1;

Template 5: conditional aggregation

Prompt: “By month, show total orders, paid orders, refunded orders, and paid revenue.”

SELECT
  DATE_TRUNC(order_date, MONTH) AS order_month,
  COUNT(*) AS total_orders,
  SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) AS paid_orders,
  SUM(CASE WHEN status = 'refunded' THEN 1 ELSE 0 END) AS refunded_orders,
  SUM(CASE WHEN status = 'paid' THEN revenue ELSE 0 END) AS paid_revenue
FROM orders
GROUP BY 1;

Conditional aggregation is a workhorse for dashboards. In Postgres-style dialects, COUNT(*) FILTER (WHERE status = 'paid') can be cleaner. In cross-dialect interviews, SUM(CASE WHEN...) is usually safe.

Template 6: top N per group

Prompt: “For each category, return the top three products by revenue.”

WITH product_revenue AS (
  SELECT category, product_id, SUM(revenue) AS revenue
  FROM order_items
  GROUP BY 1, 2
), ranked AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY category
      ORDER BY revenue DESC, product_id
    ) AS rn
  FROM product_revenue
)
SELECT category, product_id, revenue
FROM ranked
WHERE rn <= 3;

Use ROW_NUMBER for exactly three rows. Use RANK to include ties. Use DENSE_RANK when ranks should not have gaps. Explain the tie requirement before choosing.
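The three ranking functions differ only on ties. A minimal SQLite sketch (hypothetical revenue figures, products b and c tied at 50; SQLite 3.25+ for window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_revenue (product_id TEXT, revenue INTEGER);
INSERT INTO product_revenue VALUES ('a', 90), ('b', 50), ('c', 50), ('d', 10);
""")

rows = conn.execute("""
SELECT product_id,
       ROW_NUMBER() OVER (ORDER BY revenue DESC, product_id) AS rn,
       RANK()       OVER (ORDER BY revenue DESC)             AS rnk,
       DENSE_RANK() OVER (ORDER BY revenue DESC)             AS drnk
FROM product_revenue
ORDER BY rn
""").fetchall()

for r in rows:
    print(r)
# ('a', 1, 1, 1)
# ('b', 2, 2, 2)  -- tied with c; RANK and DENSE_RANK repeat 2
# ('c', 3, 2, 2)  -- ROW_NUMBER breaks the tie deterministically
# ('d', 4, 4, 3)  -- RANK skips to 4 after the tie; DENSE_RANK does not gap
```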

Template 7: deduplication with ROW_NUMBER

Prompt: “Keep the latest record per user.”

WITH ranked AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY updated_at DESC, record_id DESC
    ) AS rn
  FROM user_profiles
)
SELECT *
FROM ranked
WHERE rn = 1;

The tie-breaker is not optional. If two rows have the same updated_at, output should still be deterministic. If the table has soft deletes, include that in the order or filter depending on the requirement.

Interview line: “I am using a deterministic tie-breaker so the same source data produces the same winner.”

Template 8: previous or next event with LAG and LEAD

Prompt: “For each order, calculate days since the previous order by the same customer.”

WITH sequenced AS (
  SELECT
    customer_id,
    order_id,
    order_date,
    LAG(order_date) OVER (
      PARTITION BY customer_id
      ORDER BY order_date, order_id
    ) AS previous_order_date
  FROM orders
)
SELECT
  customer_id,
  order_id,
  order_date,
  previous_order_date,
  DATE_DIFF(order_date, previous_order_date, DAY) AS days_since_previous_order
FROM sequenced;

Use this for reorder intervals, session gaps, reactivation, first-to-second action timing, and churn-warning logic. State how the null previous_order_date on each customer's first order should be handled.

Template 9: running total and rolling average

Prompt: “Daily revenue and month-to-date revenue.”

WITH daily AS (
  SELECT
    DATE_TRUNC(order_date, DAY) AS day,
    DATE_TRUNC(order_date, MONTH) AS month,
    SUM(revenue) AS daily_revenue
  FROM orders
  GROUP BY 1, 2
)
SELECT
  day,
  daily_revenue,
  SUM(daily_revenue) OVER (
    PARTITION BY month
    ORDER BY day
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS month_to_date_revenue
FROM daily;

Prompt variant: “Trailing seven-day average.”

AVG(daily_active_users) OVER (
  ORDER BY day
  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)

Trap: rolling “seven rows” is only rolling “seven days” if every calendar day exists. Use a date spine when missing days are possible.
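A date spine is just a generated calendar that daily metrics are left-joined onto, so missing days become explicit zeros before the window function runs. A minimal SQLite sketch using a recursive CTE (table and dates are hypothetical; January 2 is missing from the metric table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily (day TEXT, daily_active_users INTEGER);
-- Jan 2 is missing: without a spine, '7 rows' would silently span extra days.
INSERT INTO daily VALUES ('2026-01-01', 10), ('2026-01-03', 30);
""")

rows = conn.execute("""
WITH RECURSIVE spine(day) AS (
  SELECT '2026-01-01'
  UNION ALL
  SELECT DATE(day, '+1 day') FROM spine WHERE day < '2026-01-03'
)
SELECT s.day, COALESCE(d.daily_active_users, 0) AS dau
FROM spine s
LEFT JOIN daily d ON d.day = s.day
ORDER BY s.day
""").fetchall()

print(rows)  # [('2026-01-01', 10), ('2026-01-02', 0), ('2026-01-03', 30)]
```

With the gap filled, ROWS BETWEEN 6 PRECEDING AND CURRENT ROW really does mean seven calendar days.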

Template 10: funnel conversion

Prompt: “Calculate signup-to-purchase conversion.”

WITH user_steps AS (
  SELECT
    user_id,
    MIN(CASE WHEN event_name = 'signup' THEN event_time END) AS signup_time,
    MIN(CASE WHEN event_name = 'purchase' THEN event_time END) AS purchase_time
  FROM events
  GROUP BY user_id
)
SELECT
  COUNT(*) AS signed_up_users,
  COUNT(CASE WHEN purchase_time > signup_time THEN 1 END) AS purchased_users,
  COUNT(CASE WHEN purchase_time > signup_time THEN 1 END) * 1.0 / COUNT(*) AS conversion_rate
FROM user_steps
WHERE signup_time IS NOT NULL;

This template forces event order. A common bad answer counts users who ever purchased, even if the purchase came before signup due to imported historical records or identity stitching. For multi-step funnels, build one timestamp per user per step, then enforce ordering.
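The ordering requirement can be checked on toy data. A minimal sketch running the funnel query above in SQLite (hypothetical events; user 2's purchase predates signup, as with imported history):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, event_name TEXT, event_time TEXT);
INSERT INTO events VALUES
  (1, 'signup',   '2026-01-01'), (1, 'purchase', '2026-01-05'),  -- real convert
  (2, 'purchase', '2025-12-20'), (2, 'signup',   '2026-01-02');  -- imported history
""")

signed, purchased = conn.execute("""
WITH user_steps AS (
  SELECT user_id,
         MIN(CASE WHEN event_name = 'signup'   THEN event_time END) AS signup_time,
         MIN(CASE WHEN event_name = 'purchase' THEN event_time END) AS purchase_time
  FROM events
  GROUP BY user_id
)
SELECT COUNT(*),
       COUNT(CASE WHEN purchase_time > signup_time THEN 1 END)
FROM user_steps
WHERE signup_time IS NOT NULL
""").fetchone()

print(signed, purchased)  # 2 1 -> user 2's pre-signup purchase is excluded
```

A query that merely checked "ever purchased" would report 2 of 2 and overstate conversion.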

Bonus template: retention cohorts

Prompt: “What percentage of users return in week one after signup?”

WITH signups AS (
  SELECT user_id, DATE_TRUNC(signup_date, WEEK) AS signup_week, signup_date
  FROM users
), activity AS (
  SELECT DISTINCT user_id, DATE_TRUNC(event_time, DAY) AS activity_day
  FROM events
)
SELECT
  s.signup_week,
  COUNT(DISTINCT s.user_id) AS cohort_users,
  COUNT(DISTINCT CASE
    WHEN a.activity_day >= s.signup_date + INTERVAL '7' DAY
     AND a.activity_day <  s.signup_date + INTERVAL '14' DAY
    THEN s.user_id END) AS retained_week_1_users
FROM signups s
LEFT JOIN activity a
  ON a.user_id = s.user_id
GROUP BY s.signup_week;

Retention questions are mostly about clear windows. Is week one days 1-7 or days 7-13? Does signup day count? Are inactive users included in the denominator? Ask.

Subqueries vs CTEs vs joins

Use CTEs to make multi-step logic readable. Use subqueries when the logic is small and local. Use joins when combining row sets. Performance depends on database engine, indexes, optimizer behavior, and materialization rules, so do not make universal claims like “CTEs are always slower.” Say: “I would check the execution plan if performance matters.”

Correlated subqueries are useful for existence checks but can be expensive if executed naively. Many optimizers rewrite them, but you should understand the row-by-row mental model.

Common SQL interview mistakes

  • Not clarifying whether to count rows or distinct users.
  • Filtering a left-joined table in WHERE.
  • Joining fact tables directly and multiplying rows.
  • Forgetting time zones when grouping by day.
  • Using NOT IN with nullable subqueries.
  • Ranking without a deterministic tie-breaker.
  • Calculating averages of averages when weighted averages are needed.
  • Ignoring duplicates in event tables.
  • Confusing event time with ingestion time.
  • Writing dialect-specific syntax without naming the dialect.

How to practice this cheatsheet

Take any prompt and classify it before coding: join, anti-join, aggregate, conditional aggregate, top-N, dedupe, previous event, running total, funnel, or retention. Then state the grain and eligible rows. Only then write SQL.

A final interview script:

“I will make one row per user first, with the earliest signup and earliest purchase after signup. Then I will aggregate across users to calculate conversion. I am doing it this way to avoid counting multiple events per user and to enforce event order.”

That kind of explanation is what hiring teams remember. SQL interviews reward candidates who make data assumptions visible. The syntax matters, but the pattern recognition and metric discipline matter more.

Template 11: weighted averages and ratio metrics

Prompt: “Calculate average order value by month.” Easy. Prompt: “Calculate average conversion rate across campaigns.” Dangerous. Ratio metrics should usually be recomputed from numerator and denominator, not averaged from pre-aggregated percentages.

Bad pattern:

SELECT AVG(conversion_rate) AS avg_conversion_rate
FROM campaign_daily_stats;

Better pattern:

SELECT
  campaign_month,
  SUM(conversions) * 1.0 / NULLIF(SUM(visits), 0) AS conversion_rate
FROM campaign_daily_stats
GROUP BY campaign_month;

Interview line: “I would aggregate the numerator and denominator separately, then divide. Averaging daily rates gives each day equal weight even if traffic is very different.”
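The gap shows up immediately with uneven traffic. A minimal sketch with hypothetical numbers: a 10-visit day at 50% and a 1,000-visit day at 1% average to 25.5% per day, while the pooled rate is about 1.5%.

```python
# Hypothetical campaign_daily_stats rows: (visits, conversions) per day.
days = [(10, 5), (1000, 10)]  # 50% on a tiny day, 1% on a big day

# Average of daily rates: each day gets equal weight regardless of traffic.
avg_of_rates = sum(c / v for v, c in days) / len(days)

# Pooled rate: aggregate numerator and denominator, then divide once.
pooled = sum(c for _, c in days) / sum(v for v, _ in days)

print(round(avg_of_rates, 4))  # 0.255
print(round(pooled, 4))        # 0.0149
```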

How to debug when your query looks wrong

When a SQL answer returns surprising numbers, do not randomly rewrite it. Inspect row counts after each step. Count rows before and after joins. Check duplicate keys. Compare COUNT(*) to COUNT(DISTINCT user_id). Pull five example users through the funnel manually. Add a temporary CTE that shows intermediate grain.

This debugging habit is valuable in interviews because it shows production judgment. A perfect query written silently is good; a query with visible checks is often better. Hiring teams want people who can find the source of metric drift when a dashboard is off by 12%, not just people who can pass a syntax quiz.
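The COUNT(*) versus COUNT(DISTINCT) check can be mechanical. A minimal SQLite sketch (hypothetical events table where the ingestion pipeline delivered one purchase twice, a common real defect):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, event_name TEXT, event_time TEXT);
-- The same purchase event was ingested twice for user 1.
INSERT INTO events VALUES
  (1, 'purchase', '2026-01-01 10:00'),
  (1, 'purchase', '2026-01-01 10:00'),
  (2, 'purchase', '2026-01-02 09:00');
""")

row_count, user_count = conn.execute("""
SELECT COUNT(*), COUNT(DISTINCT user_id)
FROM events
WHERE event_name = 'purchase'
""").fetchone()

print(row_count, user_count)  # 3 2 -> the gap flags duplicates to investigate
```

A gap between the two counts does not prove a bug, since one user can legitimately purchase twice, but it tells you exactly where to look next.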