
SQL Interview Questions Cheatsheet: Window Functions, CTEs & Optimization

10 min read · April 24, 2026

Master the SQL concepts that actually show up in technical interviews: window functions, CTEs, and query optimization with real examples.


SQL interviews are predictable in the best possible way. The same core concepts appear across FAANG, mid-stage startups, and enterprise shops: window functions, CTEs, indexing strategy, and query optimization. If you've been memorizing syntax without understanding the underlying mechanics, you'll get exposed the moment an interviewer asks a follow-up. This guide covers the concepts that actually matter, with real examples and the kind of opinionated framing you need to give confident answers under pressure.

This isn't a 500-question dump. It's a targeted deep-dive into the three areas that separate candidates who pass SQL rounds from candidates who don't. Master these, and you'll be in the top 20% of applicants for any role that requires SQL proficiency — which in 2026 is nearly every data-adjacent engineering, analytics, and backend role.

Window Functions Are the Single Highest-ROI Topic You Can Study

If you spend zero time on anything else, spend it here. Window functions come up in virtually every SQL interview at companies that care about data, and most candidates either don't know them or can only recite the syntax without understanding when and why to use them.

A window function performs a calculation across a set of rows related to the current row, without collapsing the result set the way GROUP BY does. That distinction is everything.

The core syntax:

SELECT
  employee_id,
  department,
  salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

The OVER() clause defines the window. PARTITION BY divides the data into groups (like GROUP BY but without collapsing rows). ORDER BY inside the window defines the ordering for functions like RANK(), ROW_NUMBER(), and running totals.

The functions you must know cold:

  • ROW_NUMBER() — unique sequential integer per row within the partition, no ties
  • RANK() — allows ties, skips subsequent ranks (1, 1, 3)
  • DENSE_RANK() — allows ties, does not skip ranks (1, 1, 2)
  • LAG(col, n) / LEAD(col, n) — access a value from n rows behind or ahead
  • SUM() OVER (...) / AVG() OVER (...) — running totals and moving averages
  • NTILE(n) — divide rows into n buckets (useful for percentiles)
  • FIRST_VALUE() / LAST_VALUE() — grab boundary values in a window

A classic interview problem: Find the top 3 highest-paid employees in each department.

WITH ranked AS (
  SELECT
    employee_id,
    department,
    salary,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
  FROM employees
)
SELECT * FROM ranked WHERE rnk <= 3;

Why DENSE_RANK() and not ROW_NUMBER()? Because if two employees share the third-highest salary, ROW_NUMBER() would arbitrarily exclude one. Always justify your choice of ranking function — interviewers will ask.
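To see the three ranking functions diverge, here is a quick check you can run in a SQLite sandbox from Python. The table and salaries are invented for illustration; any partition with a tied `ORDER BY` key shows the same behavior:

```python
import sqlite3

# Invented sample data: ben and cai tie at 110 within the same department
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("ana", "eng", 120), ("ben", "eng", 110), ("cai", "eng", 110), ("dee", "eng", 100),
])

rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn,
           RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS drnk
    FROM employees
""").fetchall()
for row in rows:
    print(row)
# ana gets 1/1/1; ben and cai both get RANK 2 and DENSE_RANK 2, while ROW_NUMBER
# arbitrarily hands out 2 and 3; dee gets RANK 4 (skipped) but DENSE_RANK 3 (dense)
```

Note that which of the tied rows gets `ROW_NUMBER()` 2 versus 3 is nondeterministic without a tiebreaker column, which is exactly why it is the wrong tool for "top N including ties".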

CTEs Are About Readability and Reasoning, Not Just Syntax

Common Table Expressions (CTEs) let you define named subqueries at the top of your statement and reference them multiple times. The syntax is simple:

WITH cte_name AS (
  SELECT ...
)
SELECT * FROM cte_name;

But the real skill is knowing when CTEs make your query better and when they're just noise.

A CTE that makes a query easier to read and debug is always worth the overhead. A CTE that exists just to avoid writing the same subquery twice has earned its place. A CTE that breaks a simple join into five steps because someone thought it looked clever is a maintenance liability.

When to use a CTE:

  1. You need to reference the same subquery more than once in the main query
  2. You're building a multi-step transformation that would become unreadable as nested subqueries
  3. You want to isolate a filtering or aggregation step before joining it to another table
  4. You're writing recursive queries (hierarchical data, org charts, graph traversal)

Recursive CTEs are their own interview category. If you're interviewing for a role with hierarchical data (org structures, category trees, bill of materials), you'll likely see this:

WITH RECURSIVE org_chart AS (
  -- Anchor: start with the CEO
  SELECT employee_id, manager_id, name, 0 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive: join to find direct reports
  SELECT e.employee_id, e.manager_id, e.name, oc.depth + 1
  FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT * FROM org_chart ORDER BY depth, name;

The anchor member initializes the recursion. The recursive member references the CTE itself. The database keeps joining until no new rows are added. Know this pattern — it demonstrates genuine understanding of how recursive structures work in SQL.

CTE vs. subquery vs. temp table: CTEs are scoped to a single statement, and in most databases the optimizer decides whether to inline or materialize them (PostgreSQL 12+ inlines single-reference CTEs by default; you can force materialization with WITH cte AS MATERIALIZED (...)). Temp tables persist for the session and can be indexed. If you're hitting the same expensive CTE multiple times in a complex query and performance degrades, materialize it as a temp table instead. This is the kind of operational nuance that separates engineers from analysts.
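The temp-table escape hatch can be sketched in a SQLite sandbox from Python. The table, data, and index name are invented; the point is the pattern of materializing an intermediate result once and then indexing it for reuse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, department TEXT, salary INT)")
conn.executemany("INSERT INTO employees (department, salary) VALUES (?, ?)",
                 [("eng", 100), ("eng", 90), ("sales", 80)])

# Materialize the aggregation once as a session-scoped temp table, then index it,
# instead of recomputing the same CTE body on every reference
conn.executescript("""
    CREATE TEMP TABLE dept_stats AS
        SELECT department, AVG(salary) AS avg_salary
        FROM employees GROUP BY department;
    CREATE INDEX idx_dept_stats ON dept_stats (department);
""")

rows = conn.execute("SELECT * FROM dept_stats ORDER BY department").fetchall()
print(rows)  # [('eng', 95.0), ('sales', 80.0)]
```

In PostgreSQL the equivalent would be `CREATE TEMP TABLE ... AS SELECT ...` followed by `CREATE INDEX`, with the same tradeoff: you pay a write once to avoid recomputing the intermediate many times.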

Query Optimization: What Interviewers Actually Want to Hear

When an interviewer asks "how would you optimize this query," they're not looking for you to say "add an index." They want to see a systematic diagnostic process. Here's the one you should internalize:

  1. Run EXPLAIN / EXPLAIN ANALYZE — understand the execution plan before touching anything
  2. Identify full table scans — look for Seq Scan (PostgreSQL) or ALL in the type column (MySQL EXPLAIN)
  3. Check cardinality — is the database estimating row counts accurately? Poor statistics cause bad plans
  4. Look at join order and type — nested loop vs. hash join vs. merge join have very different performance characteristics
  5. Identify predicate pushdown opportunities — filter early, not late
  6. Check for unnecessary sorts — ORDER BY can be expensive; window functions with ORDER BY inside the frame are often unavoidable but should be justified
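Step 1 and 2 of this process can be rehearsed locally. Here is a sketch using SQLite from Python (the table and index names are invented; the plan wording differs by engine — SQLite prints SCAN/SEARCH where PostgreSQL prints Seq Scan/Index Scan):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, department TEXT, salary INT)")

def plan(sql):
    # EXPLAIN QUERY PLAN returns one row per step; the last column describes it
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT * FROM employees WHERE department = 'eng'"
before = plan(q)  # no index on department yet, so the plan is a full table scan
conn.execute("CREATE INDEX idx_emp_dept ON employees (department)")
after = plan(q)   # same query now resolves to an index search

print(before)  # contains "SCAN" (full scan)
print(after)   # contains "SEARCH ... USING INDEX idx_emp_dept"
```

The habit to build is reading the plan before and after every change — the diagnostic process above is worthless if you skip step 1.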

The indexing fundamentals you need to know:

  • A B-tree index on (department, salary) supports queries filtering by department first, then salary. It does not efficiently support filtering by salary alone — the leftmost prefix rule.
  • Covering indexes include all columns needed by the query, eliminating the need to hit the main table. This is called an index-only scan.
  • Partial indexes index only a subset of rows (WHERE status = 'active'), reducing index size and improving write performance.
  • Index scans are not always faster than sequential scans. For small tables or queries returning a large percentage of rows, the optimizer may correctly choose a full scan.

Common anti-patterns that kill performance:

  • Wrapping an indexed column in a function: WHERE YEAR(created_at) = 2025 prevents index use. Rewrite as WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'.
  • Using SELECT * in subqueries — pull only what you need
  • OR across different columns often can't use a single index efficiently; consider UNION ALL instead
  • Implicit type casting — comparing a VARCHAR column to an integer causes a full scan
  • OFFSET pagination on large tables — scanning and discarding 100,000 rows to return rows 100,001–100,010 is brutally inefficient; use keyset pagination instead
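The first anti-pattern is easy to demonstrate with an execution plan. A SQLite sketch (invented table; SQLite's `strftime` stands in for MySQL's `YEAR()`, and the SCAN/SEARCH wording is SQLite-specific):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_orders_created ON orders (created_at)")

def plan(sql):
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Wrapping the indexed column in a function hides it from the index
non_sargable = plan(
    "SELECT * FROM orders WHERE strftime('%Y', created_at) = '2025'"
)
# A range predicate on the bare column can use the index
sargable = plan(
    "SELECT * FROM orders WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'"
)

print(non_sargable)  # a SCAN: the function call defeats the index
print(sargable)      # a SEARCH using idx_orders_created
```

The same rewrite applies verbatim in PostgreSQL and MySQL: keep the indexed column bare on one side of the comparison.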

The Aggregation Questions That Trip People Up

Aggregation is "basic SQL" until it isn't. Interviewers layer in edge cases specifically to test whether you actually understand what the database is doing.

Know the difference between WHERE and HAVING. WHERE filters rows before aggregation. HAVING filters groups after aggregation. You cannot reference an aggregate function in a WHERE clause.

-- Wrong: can't use aggregate in WHERE
SELECT department, COUNT(*) FROM employees
WHERE COUNT(*) > 10
GROUP BY department;

-- Correct
SELECT department, COUNT(*) FROM employees
GROUP BY department
HAVING COUNT(*) > 10;

Understand NULL behavior in aggregates. COUNT(*) counts all rows. COUNT(column) counts non-NULL values. AVG() ignores NULLs — which can silently distort your results if NULLs represent zeroes in your domain. Know this and say it out loud when it's relevant.
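Those three behaviors are easy to verify in a sandbox. A SQLite-from-Python sketch with an invented payments table where one amount is NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (amount INT)")
conn.executemany("INSERT INTO payments VALUES (?)", [(10,), (None,), (20,)])

row = conn.execute(
    "SELECT COUNT(*), COUNT(amount), AVG(amount) FROM payments"
).fetchone()
print(row)  # (3, 2, 15.0): COUNT(*) sees every row; COUNT(amount) and AVG skip the NULL
# If that NULL really meant "zero", the true average is 10, not 15; AVG silently inflated it
```

The one-line fix when NULL means zero in your domain is `AVG(COALESCE(amount, 0))`.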

Conditional aggregation is an underused pattern that appears in data transformation interviews:

SELECT
  user_id,
  COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchase_count,
  COUNT(*) FILTER (WHERE event_type = 'refund') AS refund_count,
  SUM(amount) FILTER (WHERE event_type = 'purchase') AS total_revenue
FROM events
GROUP BY user_id;

This pattern (FILTER in PostgreSQL, CASE WHEN inside SUM() in MySQL/SQL Server) pivots row-level data into columns without a separate self-join. It's cleaner and faster than the alternative.
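Both forms can be checked side by side in SQLite (which, from version 3.30, happens to support FILTER too). Invented events data; the point is that the two queries return identical results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INT, event_type TEXT, amount INT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "purchase", 30), (1, "purchase", 20), (1, "refund", 10), (2, "purchase", 5),
])

# FILTER form (PostgreSQL, SQLite 3.30+)
filter_rows = conn.execute("""
    SELECT user_id,
           COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases,
           SUM(amount) FILTER (WHERE event_type = 'purchase') AS revenue
    FROM events GROUP BY user_id ORDER BY user_id
""").fetchall()

# Portable CASE WHEN form (MySQL, SQL Server, and everywhere else)
case_rows = conn.execute("""
    SELECT user_id,
           SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchases,
           SUM(CASE WHEN event_type = 'purchase' THEN amount END) AS revenue
    FROM events GROUP BY user_id ORDER BY user_id
""").fetchall()

print(filter_rows)  # [(1, 2, 50), (2, 1, 5)]
print(case_rows)    # identical: the CASE with no ELSE yields NULL, which SUM ignores
```

Knowing both spellings lets you answer the question in whatever dialect the interviewer names.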

Joins: The Concepts That Still Catch Experienced Engineers

You know INNER JOIN. Every interviewer knows you know INNER JOIN. What they're testing is whether you know the subtleties.

LEFT JOIN + NULL check vs. NOT EXISTS vs. NOT IN: These three patterns all find "rows in A with no match in B" — but they behave differently when NULLs are involved. NOT IN with a subquery that can return NULLs will return zero rows due to three-valued logic. NOT EXISTS handles NULLs correctly. This is a common trap question.
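The trap is worth seeing once with real output. A SQLite sketch from Python, with invented users/banned tables where the subquery column contains a NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INT);
    CREATE TABLE banned (user_id INT);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO banned VALUES (1), (NULL);   -- this NULL is the trap
""")

not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM banned) ORDER BY id"
).fetchall()
not_exists = conn.execute("""
    SELECT id FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM banned b WHERE b.user_id = u.id)
    ORDER BY id
""").fetchall()

print(not_in)      # []: "id <> NULL" evaluates to UNKNOWN, so no row ever qualifies
print(not_exists)  # [(2,), (3,)]: NOT EXISTS is unaffected by the NULL
```

Saying "I default to NOT EXISTS for anti-joins because NOT IN breaks on NULLs" is exactly the kind of one-liner interviewers are fishing for here.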

Self joins appear in problems involving comparisons within the same table: find employees who earn more than their manager, find pairs of users who made purchases on the same day, find consecutive date gaps.

Cross joins generate Cartesian products. They're not a mistake — they're the right tool for generating combinations, date spine tables, or test data. Know when to use them intentionally.
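A minimal combination-generation example, run in SQLite from Python (sizes and colors are invented placeholder dimensions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sizes (size TEXT);
    CREATE TABLE colors (color TEXT);
    INSERT INTO sizes VALUES ('S'), ('M');
    INSERT INTO colors VALUES ('red'), ('blue'), ('green');
""")

# Every size paired with every color: the Cartesian product, on purpose
combos = conn.execute(
    "SELECT size, color FROM sizes CROSS JOIN colors ORDER BY size, color"
).fetchall()
print(len(combos))  # 6, i.e. 2 sizes x 3 colors
```

The same idea underlies a date spine: cross join a generated date series against your dimension table so that days with no activity still appear as rows (typically with zero counts after a LEFT JOIN to the fact table).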

Lateral joins (PostgreSQL LATERAL, SQL Server CROSS APPLY) let you reference columns from a preceding table inside a subquery — think of it as a correlated subquery that returns multiple rows. These come up in advanced interviews and are worth knowing if you're targeting senior or staff-level roles.

What Strong SQL Candidates Do That Average Candidates Don't

Here's the behavioral and communication layer that determines whether technical correctness turns into an offer:

  • They talk through the execution plan, not just the query. "I'd EXPLAIN this first to confirm the index is being used" signals operational experience.
  • They flag data quality assumptions. "I'm assuming no NULLs in the join key — in production I'd verify that" shows you've shipped real systems.
  • They offer tradeoffs, not single answers. "A CTE is cleaner here, but if this runs millions of times a day, I'd benchmark a materialized temp table."
  • They ask about scale before optimizing. A query optimization that's perfect for 10,000 rows is irrelevant if the table has 10 billion. Asking "what's the table size and expected query frequency" is exactly what a senior engineer should do.
  • They know their database. PostgreSQL, MySQL, BigQuery, and Snowflake have meaningfully different optimizer behaviors, index types, and SQL dialects. Saying "in Postgres I'd use a partial index here, but in BigQuery indexes don't exist so I'd think about clustering instead" demonstrates real depth.

The difference between a candidate who writes correct SQL and one who gets the offer is usually one thing: the second person explains why they made each decision, not just what they wrote.

Next Steps

You have one week before your interview sprint begins. Here's how to use it:

  1. Spend two hours on window functions exclusively. Go to a SQL sandbox (db-fiddle.com, Mode, or a local Postgres instance), create a fake orders or employees table, and write at least 10 queries using ROW_NUMBER, RANK, LAG, LEAD, and running SUM. Don't just read — execute and read the output.
  2. Pull three "hard" SQL problems from LeetCode or StrataScratch and solve them using CTEs. Specifically target problems involving consecutive events, ranking within groups, or multi-step aggregations. Refactor any nested subquery solution into a CTE version and compare readability.
  3. Install PostgreSQL locally, load a medium-sized dataset (the pgbench schema or any public dataset), and run EXPLAIN ANALYZE on at least five queries. Read the output. Find one query where adding an index changes the plan. Document what changed and why.
  4. Write out your "optimization framework" in your own words — the six-step diagnostic process from this guide, rephrased in your voice. Practice saying it out loud as if you're talking to an interviewer. Fluency under pressure requires verbal rehearsal, not just reading.
  5. Review your target companies' tech stacks and identify which SQL dialect they use. If they're on Snowflake, read about clustering keys and query pruning. If they're on BigQuery, understand partitioned tables and slot-based billing. Dialect-specific knowledge in an interview signals genuine experience — it's one of the clearest signals you've actually shipped production queries and not just practiced on LeetCode.