SQL Interview Patterns Cheatsheet — Joins, Subqueries, and the 10 Templates That Recur
A SQL interview patterns cheatsheet built around the recurring templates: joins, anti-joins, aggregation, top-N per group, window functions, subqueries, deduplication, retention, funnels, and data-quality checks. Use it to recognize the prompt before you write the query.
This SQL interview patterns cheatsheet is built around a simple fact: most SQL interviews are not novel. They are joins, subqueries, aggregation, deduplication, top-N per group, running totals, funnels, retention, and data-quality checks in different clothing. If you can recognize the pattern quickly, you can spend the interview explaining assumptions instead of panicking over syntax.
The best SQL candidates start by naming the grain: one row per user, order, event, day, account, or product. Then they choose the smallest query shape that preserves that grain. That habit prevents most bad answers.
SQL interview patterns cheatsheet: first principles
Before writing SQL, ask three questions:
- What is the output grain? One row per what?
- What is the metric definition? Count users, events, orders, dollars, days, or accounts?
- What rows are eligible? Time window, status, geography, plan type, deleted records, bot traffic, refunds?
A candidate who asks these questions sounds senior even before the query is complete.
Template 1: inner join for matching records
Use an inner join when you only want rows that exist in both tables.
Prompt: “List orders with customer emails.”
SELECT
o.order_id,
o.order_date,
c.email,
o.revenue
FROM orders o
JOIN customers c
ON c.customer_id = o.customer_id;
Trap: joining at the wrong grain. If customers has one row per customer, this is safe. If customer_emails has multiple rows per customer, the join duplicates orders. Always check cardinality.
Interview line: “I am assuming customers.customer_id is unique. If not, I would deduplicate or pick the current email before joining.”
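One quick way to verify that uniqueness assumption before joining (same customers table as in the example above):

```sql
-- Returns zero rows when customers.customer_id is unique;
-- any row that comes back is a duplicated key.
SELECT customer_id, COUNT(*) AS row_count
FROM customers
GROUP BY customer_id
HAVING COUNT(*) > 1;
```

Running this kind of check out loud in an interview costs thirty seconds and signals that you think about grain before joining.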
Template 2: left join to keep the base table
Use a left join when every row from the left table should remain even if the right side is missing.
Prompt: “Show all customers and their order count, including customers with zero orders.”
SELECT
c.customer_id,
COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
Trap: putting right-table filters in WHERE turns the left join into an inner join.
Wrong:
WHERE o.order_date >= DATE '2026-01-01'
Better:
LEFT JOIN orders o
ON o.customer_id = c.customer_id
AND o.order_date >= DATE '2026-01-01'
That preserves zero-order customers.
Template 3: anti-join for “did not do X”
Prompt: “Find users who signed up but never placed an order.”
SELECT u.user_id
FROM users u
LEFT JOIN orders o
ON o.user_id = u.user_id
WHERE o.user_id IS NULL;
Alternative:
SELECT u.user_id
FROM users u
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.user_id = u.user_id
);
NOT EXISTS is often clearer and avoids the null surprise with NOT IN: if the subquery returns even one NULL, NOT IN yields no rows at all, because the comparison can never evaluate to true.
Interview line: “For anti-joins, I prefer NOT EXISTS when null behavior matters.”
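If you do reach for NOT IN, guard the subquery explicitly (same users and orders tables as above):

```sql
-- Defensive NOT IN: without the IS NOT NULL filter, a single
-- NULL user_id in orders makes this query return zero rows.
SELECT u.user_id
FROM users u
WHERE u.user_id NOT IN (
  SELECT o.user_id
  FROM orders o
  WHERE o.user_id IS NOT NULL
);
```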
Template 4: aggregation at the correct grain
Prompt: “Calculate monthly revenue.”
SELECT
DATE_TRUNC(order_date, MONTH) AS order_month,
SUM(revenue) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY 1
ORDER BY 1;
The pattern is filter first, group to the reporting grain, aggregate the metric. The trap is counting rows when the metric should count distinct users or orders. Say what you are counting.
Prompt variant: “Monthly active users.”
SELECT
DATE_TRUNC(event_time, MONTH) AS activity_month,
COUNT(DISTINCT user_id) AS mau
FROM events
WHERE event_name = 'active'
GROUP BY 1;
Template 5: conditional aggregation
Prompt: “By month, show total orders, paid orders, refunded orders, and paid revenue.”
SELECT
DATE_TRUNC(order_date, MONTH) AS order_month,
COUNT(*) AS total_orders,
SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) AS paid_orders,
SUM(CASE WHEN status = 'refunded' THEN 1 ELSE 0 END) AS refunded_orders,
SUM(CASE WHEN status = 'paid' THEN revenue ELSE 0 END) AS paid_revenue
FROM orders
GROUP BY 1;
Conditional aggregation is a workhorse for dashboards. In Postgres-style dialects, COUNT(*) FILTER (WHERE status = 'paid') can be cleaner. In cross-dialect interviews, SUM(CASE WHEN...) is usually safe.
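For comparison, a FILTER-clause sketch of the same rollup. For brevity it assumes order_month is already computed; FILTER is Postgres-style syntax, also supported by DuckDB and SQLite 3.30+.

```sql
-- FILTER version of the conditional aggregation above.
-- Assumes an order_month column rather than DATE_TRUNC, for portability.
SELECT
  order_month,
  COUNT(*) AS total_orders,
  COUNT(*) FILTER (WHERE status = 'paid') AS paid_orders,
  COUNT(*) FILTER (WHERE status = 'refunded') AS refunded_orders,
  COALESCE(SUM(revenue) FILTER (WHERE status = 'paid'), 0) AS paid_revenue
FROM orders
GROUP BY order_month;
```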
Template 6: top N per group
Prompt: “For each category, return the top three products by revenue.”
WITH product_revenue AS (
SELECT category, product_id, SUM(revenue) AS revenue
FROM order_items
GROUP BY 1, 2
), ranked AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY revenue DESC, product_id
) AS rn
FROM product_revenue
)
SELECT category, product_id, revenue
FROM ranked
WHERE rn <= 3;
Use ROW_NUMBER for exactly three rows. Use RANK to include ties. Use DENSE_RANK when ranks should not have gaps. Explain the tie requirement before choosing.
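A small illustration of the difference on tied revenue (the values 500, 500, 300 are made up):

```sql
-- For revenues 500, 500, 300:
--   ROW_NUMBER -> 1, 2, 3  (ties broken arbitrarily without a tie-breaker)
--   RANK       -> 1, 1, 3  (ties share a rank; the next rank skips)
--   DENSE_RANK -> 1, 1, 2  (ties share a rank; no gaps)
SELECT
  product_id,
  revenue,
  ROW_NUMBER() OVER (ORDER BY revenue DESC) AS row_num,
  RANK()       OVER (ORDER BY revenue DESC) AS rnk,
  DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rnk
FROM product_revenue
ORDER BY revenue DESC, product_id;
```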
Template 7: deduplication with ROW_NUMBER
Prompt: “Keep the latest record per user.”
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY updated_at DESC, record_id DESC
) AS rn
FROM user_profiles
)
SELECT *
FROM ranked
WHERE rn = 1;
The tie-breaker is not optional: if two rows share the same updated_at, the output should still be deterministic. If the table has soft deletes, either filter deleted rows out or rank them last, depending on the requirement.
Interview line: “I am using a deterministic tie-breaker so the same source data produces the same winner.”
Template 8: previous or next event with LAG and LEAD
Prompt: “For each order, calculate days since the previous order by the same customer.”
WITH sequenced AS (
SELECT
customer_id,
order_id,
order_date,
LAG(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date, order_id
) AS previous_order_date
FROM orders
)
SELECT
customer_id,
order_id,
order_date,
previous_order_date,
DATE_DIFF(order_date, previous_order_date, DAY) AS days_since_previous_order
FROM sequenced;
Use this for reorder intervals, session gaps, reactivation, first-to-second action timing, and churn warning logic. State how the NULL previous_order_date on each customer's first row should be handled.
Template 9: running total and rolling average
Prompt: “Daily revenue and month-to-date revenue.”
WITH daily AS (
SELECT
DATE_TRUNC(order_date, DAY) AS day,
DATE_TRUNC(order_date, MONTH) AS month,
SUM(revenue) AS daily_revenue
FROM orders
GROUP BY 1, 2
)
SELECT
day,
daily_revenue,
SUM(daily_revenue) OVER (
PARTITION BY month
ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS month_to_date_revenue
FROM daily;
Prompt variant: “Trailing seven-day average.”
AVG(daily_active_users) OVER (
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)
Trap: rolling “seven rows” is only rolling “seven days” if every calendar day exists. Use a date spine when missing days are possible.
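One portable way to build a date spine is a recursive CTE. This sketch uses SQLite-style date arithmetic, and the window bounds are placeholders; most warehouses have a built-in generator (GENERATE_DATE_ARRAY, generate_series) that is cleaner.

```sql
-- Recursive-CTE date spine. Left-joining daily metrics onto the spine
-- keeps missing calendar days as explicit zero rows, so a 7-row window
-- really covers 7 days.
WITH RECURSIVE date_spine(day) AS (
  SELECT DATE('2026-01-01')
  UNION ALL
  SELECT DATE(day, '+1 day')
  FROM date_spine
  WHERE day < DATE('2026-01-07')
)
SELECT
  ds.day,
  COALESCE(d.daily_active_users, 0) AS daily_active_users
FROM date_spine ds
LEFT JOIN daily d
  ON d.day = ds.day;
```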
Template 10: funnel conversion
Prompt: “Calculate signup-to-purchase conversion.”
WITH user_steps AS (
SELECT
user_id,
MIN(CASE WHEN event_name = 'signup' THEN event_time END) AS signup_time,
MIN(CASE WHEN event_name = 'purchase' THEN event_time END) AS purchase_time
FROM events
GROUP BY user_id
)
SELECT
COUNT(*) AS signed_up_users,
COUNT(CASE WHEN purchase_time > signup_time THEN 1 END) AS purchased_users,
COUNT(CASE WHEN purchase_time > signup_time THEN 1 END) * 1.0 / COUNT(*) AS conversion_rate
FROM user_steps
WHERE signup_time IS NOT NULL;
This template forces event order. A common bad answer counts users who ever purchased, even if the purchase came before signup due to imported historical records or identity stitching. For multi-step funnels, build one timestamp per user per step, then enforce ordering.
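A three-step sketch of the same idea: one timestamp per user per step, with ordering enforced at each step. The add_to_cart step name is hypothetical.

```sql
WITH user_steps AS (
  SELECT
    user_id,
    MIN(CASE WHEN event_name = 'signup' THEN event_time END) AS signup_time,
    MIN(CASE WHEN event_name = 'add_to_cart' THEN event_time END) AS cart_time,
    MIN(CASE WHEN event_name = 'purchase' THEN event_time END) AS purchase_time
  FROM events
  GROUP BY user_id
)
SELECT
  COUNT(*) AS signed_up_users,
  COUNT(CASE WHEN cart_time > signup_time THEN 1 END) AS carted_users,
  COUNT(CASE WHEN cart_time > signup_time
              AND purchase_time > cart_time THEN 1 END) AS purchased_users
FROM user_steps
WHERE signup_time IS NOT NULL;
```

Because each comparison with a NULL timestamp is simply not counted, users who skipped a step fall out of later steps without extra handling.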
Bonus template: retention cohorts
Prompt: “What percentage of users return in week one after signup?”
WITH signups AS (
SELECT user_id, DATE_TRUNC(signup_date, WEEK) AS signup_week, signup_date
FROM users
), activity AS (
SELECT DISTINCT user_id, DATE_TRUNC(event_time, DAY) AS activity_day
FROM events
)
SELECT
s.signup_week,
COUNT(DISTINCT s.user_id) AS cohort_users,
COUNT(DISTINCT CASE
WHEN a.activity_day >= s.signup_date + INTERVAL '7' DAY
AND a.activity_day < s.signup_date + INTERVAL '14' DAY
THEN s.user_id END) AS retained_week_1_users
FROM signups s
LEFT JOIN activity a
ON a.user_id = s.user_id
GROUP BY s.signup_week;
Retention questions are mostly about clear windows. Is week one days 1-7 or days 7-13? Does signup day count? Are inactive users included in the denominator? Ask.
Subqueries vs CTEs vs joins
Use CTEs to make multi-step logic readable. Use subqueries when the logic is small and local. Use joins when combining row sets. Performance depends on database engine, indexes, optimizer behavior, and materialization rules, so do not make universal claims like “CTEs are always slower.” Say: “I would check the execution plan if performance matters.”
Correlated subqueries are useful for existence checks but can be expensive if executed naively. Many optimizers rewrite them, but you should understand the row-by-row mental model.
Common SQL interview mistakes
- Not clarifying whether to count rows or distinct users.
- Filtering a left-joined table in WHERE.
- Joining fact tables directly and multiplying rows.
- Forgetting time zones when grouping by day.
- Using NOT IN with nullable subqueries.
- Ranking without a deterministic tie-breaker.
- Calculating averages of averages when weighted averages are needed.
- Ignoring duplicates in event tables.
- Confusing event time with ingestion time.
- Writing dialect-specific syntax without naming the dialect.
How to practice this cheatsheet
Take any prompt and classify it before coding: join, anti-join, aggregate, conditional aggregate, top-N, dedupe, previous event, running total, funnel, or retention. Then state the grain and eligible rows. Only then write SQL.
A final interview script:
“I will make one row per user first, with the earliest signup and earliest purchase after signup. Then I will aggregate across users to calculate conversion. I am doing it this way to avoid counting multiple events per user and to enforce event order.”
That kind of explanation is what hiring teams remember. SQL interviews reward candidates who make data assumptions visible. The syntax matters, but the pattern recognition and metric discipline matter more.
Template 11: weighted averages and ratio metrics
Prompt: “Calculate average order value by month.” Easy. Prompt: “Calculate average conversion rate across campaigns.” Dangerous. Ratio metrics should usually be recomputed from numerator and denominator, not averaged from pre-aggregated percentages.
Bad pattern:
SELECT AVG(conversion_rate) AS avg_conversion_rate
FROM campaign_daily_stats;
Better pattern:
SELECT
campaign_month,
SUM(conversions) * 1.0 / NULLIF(SUM(visits), 0) AS conversion_rate
FROM campaign_daily_stats
GROUP BY campaign_month;
Interview line: “I would aggregate the numerator and denominator separately, then divide. Averaging daily rates gives each day equal weight even if traffic is very different.”
How to debug when your query looks wrong
When a SQL answer returns surprising numbers, do not randomly rewrite it. Inspect row counts after each step. Count rows before and after joins. Check duplicate keys. Compare COUNT(*) to COUNT(DISTINCT user_id). Pull five example users through the funnel manually. Add a temporary CTE that shows intermediate grain.
This debugging habit is valuable in interviews because it shows production judgment. A perfect query written silently is good; a query with visible checks is often better. Hiring teams want people who can find the source of metric drift when a dashboard is off by 12%, not just people who can pass a syntax quiz.
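The before-and-after join check can be as simple as the following, using the orders and customers tables from the earlier examples:

```sql
-- If joined_rows exceeds base_rows, the join is fanning out:
-- the right side has duplicate join keys.
SELECT
  (SELECT COUNT(*) FROM orders) AS base_rows,
  (SELECT COUNT(*)
   FROM orders o
   JOIN customers c
     ON c.customer_id = o.customer_id) AS joined_rows;
```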
Related guides
- API Design Interview Cheatsheet in 2026 — Patterns, Examples, Practice Plan, and Common Traps — A practical API design interview cheatsheet for 2026: how to scope the problem, choose REST/GraphQL/gRPC patterns, model resources, handle auth, versioning, rate limits, and avoid the traps that cost senior candidates offers.
- AWS Interview Cheatsheet in 2026 — Patterns, Examples, Practice Plan, and Common Traps — A high-signal AWS interview cheatsheet for 2026 covering architecture patterns, IAM, networking, reliability, cost, debugging, and the answers that show real cloud judgment.
- Backend System Design Interview Cheatsheet in 2026 — Patterns, Examples, Practice Plan, and Common Traps — A backend System Design interview cheatsheet for 2026 with the core flow, architecture patterns, capacity heuristics, reliability tradeoffs, and traps that separate senior answers from vague box drawing.
- Coding Interview Patterns Cheatsheet: 15 Templates That Solve LeetCode — Skip the grind. These 15 reusable patterns cover 90% of LeetCode problems and will get you through any FAANG-level coding interview.
- Data Modeling Interview Cheatsheet in 2026 — Patterns, Examples, Practice Plan, and Common Traps — A practical Data Modeling interview cheatsheet for 2026 covering entities, relationships, relational and NoSQL patterns, analytics models, index choices, examples, and the traps that make otherwise strong candidates look shallow.
