Designing a Search System for a System Design Interview — Inverted Index, Ranking, and Recall

9 min read · April 25, 2026

A practical system design guide for search interviews, covering inverted indexes, crawling and ingestion, query execution, ranking, recall, freshness, personalization, scaling, and evaluation trade-offs.

Designing a search system in a system design interview is a high-signal exercise because it combines data pipelines, indexing, distributed systems, ranking, latency, and product judgment. Interviewers want to see whether you understand the core search loop: ingest documents, build an inverted index, retrieve candidates, rank results, and measure whether users actually find what they need. The best answers balance recall, precision, freshness, and cost.

This guide gives you a structured way to design search for products, marketplaces, documents, help centers, or web-like systems.

Start with scope and requirements

Before architecture, clarify the search domain:

  • What are we searching: web pages, products, users, messages, files, tickets, jobs, places?
  • How many documents and how quickly do they change?
  • What query types matter: keyword, filters, autocomplete, semantic search, faceting, geo, permissions?
  • What is the latency target?
  • How fresh do results need to be?
  • Is ranking personalized?
  • Are there access controls?
  • How do we measure success?

Functional requirements might include indexing documents, keyword search, filters, sorting, ranking, highlighting, autocomplete, and pagination. Non-functional requirements include low latency, high availability, freshness, relevance quality, observability, and scalable ingestion.

State assumptions. For example: "I will design a product search system with tens of millions of items, frequent inventory updates, keyword and filter search, and a target of sub-second response time."

High-level architecture

A clear search architecture has two paths.

Indexing path:

Source of truth -> Change events / crawler -> Ingestion pipeline -> Text processing -> Index builder -> Search index shards

Query path:

Client -> Search API -> Query parser -> Candidate retrieval -> Ranking -> Results / facets / logging

Separate the source of truth from the search index. The index is optimized for retrieval, not transaction ownership. If the product database says a listing is deleted or private, the search system must eventually reflect that, but the database remains the authority.

The inverted index

The inverted index is the core concept. Instead of storing documents and scanning them for each query, search engines map terms to documents.

Example documents:

  • Doc 1: "red running shoes"
  • Doc 2: "blue trail running shoes"
  • Doc 3: "red hiking backpack"

Inverted index:

| Term | Posting list |
|---|---|
| red | Doc 1, Doc 3 |
| running | Doc 1, Doc 2 |
| shoes | Doc 1, Doc 2 |
| trail | Doc 2 |
| backpack | Doc 3 |

For query "red shoes," the system retrieves posting lists for red and shoes, intersects or scores candidates, then ranks them. Real indexes store far more: term frequency, document frequency, positions for phrase queries, field information, payloads, and compression metadata.

A strong interview answer explains that the inverted index enables fast candidate retrieval, while ranking determines order.

Text processing and analysis

Indexing is not just tokenizing on spaces. Text analysis may include:

  • Lowercasing and Unicode normalization.
  • Tokenization by language.
  • Stemming or lemmatization.
  • Stop-word handling.
  • Synonyms and abbreviations.
  • Spelling correction support.
  • Field weighting for title, body, tags, category, brand.
  • Handling punctuation, emojis, code, or product model numbers.

Be careful with aggressive normalization. In product search, queries like apple watch and apple need context to tell a product line from the brand. In code search, punctuation and casing can matter. In medical, legal, or compliance domains, stemming can create dangerous matches.

The mature answer: analysis should match the domain and be evaluated with real queries.
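
As a concrete sketch, a minimal analysis chain might look like the following. The stop-word list and synonym map are illustrative assumptions, and skipping stemming is a deliberate simplification, not a recommendation for every domain.

```python
import re
import unicodedata

STOPWORDS = {"the", "a", "an", "of"}   # illustrative; tune per domain
SYNONYMS = {"sneakers": "shoes"}       # illustrative synonym map

def analyze(text: str) -> list[str]:
    """Normalize, tokenize, map synonyms, then drop stop words."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Keep alphanumeric runs so model numbers like 'x100' survive tokenization.
    tokens = re.findall(r"[a-z0-9]+", text)
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]

print(analyze("The Red Sneakers, model X100"))
# ['red', 'shoes', 'model', 'x100']
```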

Query execution path

A strong query path answer:

  1. Client sends query, filters, user context, and pagination token to Search API.
  2. Query parser normalizes terms, detects phrases, applies synonyms, and validates filters.
  3. Broker sends the query to relevant index shards.
  4. Each shard retrieves candidate documents using the inverted index.
  5. Shards score candidates using lexical scoring and business features.
  6. Broker merges top results from shards.
  7. Ranking layer applies final scoring, personalization, diversity, or policy rules.
  8. Results are hydrated with display fields or fetched from a document store.
  9. Query, impressions, clicks, reformulations, and latency are logged for evaluation.
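
A hedged sketch of the broker's merge step (step 6), assuming each shard returns its top-k as (score, doc_id) pairs sorted by descending score:

```python
import heapq
import itertools

def merge_shard_results(shard_results: list[list[tuple[float, int]]],
                        k: int) -> list[tuple[float, int]]:
    """Merge per-shard top-k lists (each sorted by descending score) into a
    global top-k without concatenating and re-sorting everything."""
    merged = heapq.merge(*shard_results, key=lambda pair: -pair[0])
    return list(itertools.islice(merged, k))

shard_a = [(9.1, 101), (4.2, 102)]
shard_b = [(8.7, 205), (3.9, 207)]
print(merge_shard_results([shard_a, shard_b], 3))
# [(9.1, 101), (8.7, 205), (4.2, 102)]
```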

Mention that hydration can be a latency trap. Storing enough display fields in the index avoids a database lookup for every result, but sensitive or rapidly changing fields may need validation against the source of truth.

Ranking: from lexical to learned

Ranking is where search becomes product-specific. Basic lexical ranking might use term frequency, inverse document frequency, field boosts, phrase matches, freshness, and popularity. More advanced ranking may use machine-learned models, embeddings, personalization, or rerankers.
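
BM25 is the usual lexical baseline here. A simplified single-field version, with k1 and b at commonly used defaults, is sketched below:

```python
import math

def bm25(query_terms: list[str], doc_terms: list[str],
         doc_freq: dict[str, int], n_docs: int, avg_len: float,
         k1: float = 1.2, b: float = 0.75) -> float:
    """Simplified single-field BM25: each matching term contributes an
    IDF-weighted, length-normalized term-frequency score."""
    score, dl = 0.0, len(doc_terms)
    for term in query_terms:
        tf, df = doc_terms.count(term), doc_freq.get(term, 0)
        if tf == 0 or df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avg_len))
    return score
```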

A practical ranking stack:

  • Candidate retrieval: fast lexical or vector retrieval to get hundreds or thousands of candidates.
  • First-pass ranking: cheap scoring using text relevance, filters, freshness, availability, popularity.
  • Reranking: more expensive model on a smaller candidate set.
  • Business rules: safety, policy, diversity, deduplication, sponsored content labeling.

A good interview phrase: "I would keep retrieval broad enough to preserve recall, then use ranking to improve precision. If retrieval misses the right document, the ranker cannot recover it."

Recall versus precision

Recall measures whether all relevant documents are returned; precision measures whether the returned results are relevant. Most search systems need both, but the balance depends on the product.

  • Legal discovery may favor recall: missing a relevant document is costly.
  • Shopping search often balances recall with precision and availability.
  • Help center search should prioritize fast answer quality.
  • People search may require strict permissions and identity disambiguation.

Ways to improve recall:

  • Synonyms and spelling correction.
  • Query expansion.
  • Better tokenization.
  • Multiple fields and aliases.
  • Semantic/vector retrieval as a complement.
  • Relaxing filters carefully when no results appear.
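
As one concrete example from the list above, synonym-based query expansion can be sketched as follows; the synonym table is hypothetical and would normally be mined from query logs or curated:

```python
# Hypothetical synonym table; in practice mined from logs or curated.
SYNONYMS = {"couch": ["sofa"], "tv": ["television"]}

def expand_query(terms: list[str]) -> list[list[str]]:
    """Turn each term into an OR-group of itself plus its synonyms,
    e.g. (couch OR sofa) AND (cover)."""
    return [[t] + SYNONYMS.get(t, []) for t in terms]

print(expand_query(["couch", "cover"]))
# [['couch', 'sofa'], ['cover']]
```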

Ways to improve precision:

  • Field boosts.
  • Phrase matching.
  • Better ranking features.
  • Demoting stale, unavailable, duplicate, or low-quality documents.
  • Personalization and context.
  • Result diversity when one entity dominates.

Interviewers like candidates who say how they would measure the trade-off, not just claim one is better.

Freshness and indexing updates

Search freshness can be batch, near-real-time, or real-time depending on need.

  • Batch indexing works for static catalogs or low urgency.
  • Near-real-time indexing uses change events and refresh intervals, often enough for products, jobs, or articles.
  • Real-time constraints are harder and may be needed for chat search, monitoring, or inventory where stale results cause failed actions.

A typical update path uses source database change events, a queue or stream, ingestion workers, index updates, and retry/dead-letter handling. Deletes and permission changes deserve special attention. A stale private document in search is worse than a stale title.

A mature answer includes backfills. You need a way to rebuild the entire index from source of truth, not only process incremental updates forever.
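
A hedged sketch of an ingestion worker, assuming events carry an op field and an index client with upsert/delete methods (both names are illustrative):

```python
def apply_change_event(index_client, event: dict) -> None:
    """Apply one change event to the search index. Deletes and permission
    flips are hard removals so private data never lingers in results."""
    op = event["op"]
    if op in ("delete", "make_private"):
        index_client.delete(event["doc_id"])                 # illustrative API
    elif op in ("create", "update"):
        index_client.upsert(event["doc_id"], event["fields"])
    else:
        # Unknown events go to a dead-letter queue rather than being dropped.
        raise ValueError(f"unknown op: {op}")
```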

Sharding and scaling

Search indexes are commonly partitioned into shards. The query broker sends requests to shards and merges results.

Sharding choices:

  • By document ID for even distribution.
  • By tenant or organization for isolation and permission efficiency.
  • By geography or category if queries are naturally scoped.
  • By time for logs or events.
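
For ID-based sharding, routing must be stable so the same document always lands on, and can later be deleted from, the same shard; a minimal sketch:

```python
import hashlib

def shard_for(doc_id: str, n_shards: int) -> int:
    """Route a document to a shard via a stable hash of its ID.
    Python's built-in hash() is randomized per process, so use a digest."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

print(shard_for("listing-12345", 16))
```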

Replicas improve availability and read throughput. Hot shards can happen when one tenant, category, or time range dominates. Mitigation includes better partitioning, routing, caching, and capacity isolation.

Caching can help, but search queries often have a long tail. Cache popular queries, filters, autocomplete prefixes, and expensive facet computations. Do not rely on cache to hide poor index design.

Autocomplete, facets, and filters

Autocomplete is usually a separate path optimized for prefix latency and typo tolerance. It may use prefix indexes, tries, edge n-grams, or specialized suggesters. Ranking autocomplete suggestions should consider popularity, personalization, and freshness.
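
One common implementation trick is indexing edge n-grams at write time so that prefix queries become exact term lookups; a minimal sketch:

```python
def edge_ngrams(term: str, min_len: int = 1, max_len: int = 10) -> list[str]:
    """Emit every prefix of the term between min_len and max_len characters,
    so the query 'sho' matches 'shoes' via an exact term lookup."""
    return [term[:i] for i in range(min_len, min(len(term), max_len) + 1)]

print(edge_ngrams("shoes"))
# ['s', 'sh', 'sho', 'shoe', 'shoes']
```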

Facets require counts by category, brand, price range, status, or other fields. Facet counts can be expensive at scale. Options include precomputed counts, approximate counts, shard-level aggregation, or limiting facets to common dimensions.

Filters should be applied efficiently during retrieval or scoring. Permission filters are not optional; they must be enforced before results are shown. If using post-filtering, be careful not to return empty pages because top candidates were filtered out too late.

Vector search and hybrid retrieval

Modern search interviews may include embeddings. A safe answer: vector search is useful for semantic similarity and vocabulary mismatch, but it does not replace lexical search in every domain.

Hybrid search is common:

  • Use lexical retrieval for exact terms, identifiers, names, and filters.
  • Use vector retrieval for conceptual matches and natural language queries.
  • Merge candidates and rerank with a model or weighted score.
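
One widely used merge strategy is reciprocal rank fusion, which combines ranked lists without calibrating scores across systems; a sketch (k=60 is the commonly cited default):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Each document scores sum(1 / (k + rank)) across the lists it appears in;
    documents that both retrievers rank highly float to the top."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_a", "doc_b", "doc_c"]
vector = ["doc_b", "doc_d", "doc_a"]
print(reciprocal_rank_fusion([lexical, vector]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```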

Pitfalls:

  • Vector results may be plausible but wrong.
  • Exact identifiers, SKUs, usernames, and legal terms often need lexical precision.
  • Embeddings add cost, latency, update complexity, and evaluation needs.
  • Permissions and freshness still apply.

Evaluation and metrics

Search quality must be measured online and offline.

Offline metrics:

  • Precision@k.
  • Recall@k.
  • NDCG or ranking quality against judged queries.
  • Zero-result rate.
  • Coverage by query class.
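
A sketch of computing two of these against judged queries, assuming binary relevance labels:

```python
import math

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: discounted gain normalized by the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

ranked = ["d1", "d2", "d3", "d4"]
judged = {"d1", "d3"}
print(precision_at_k(ranked, judged, 3))  # 0.666...
print(ndcg_at_k(ranked, judged, 3))       # ~0.92
```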

Online metrics:

  • Click-through rate, but interpreted carefully.
  • Long click or successful engagement.
  • Conversion after search.
  • Reformulation rate.
  • Time to first useful result.
  • Query abandonment.
  • Latency percentiles.

Avoid optimizing clicks alone. A misleading result can get clicks and still fail users. Pair engagement metrics with downstream success and guardrails.

Common design traps

  • Scanning the database for every query.
  • Treating the search index as the source of truth.
  • Ignoring deletes, permissions, and stale private data.
  • Promising perfect real-time freshness without cost discussion.
  • Overusing semantic search where exact matching is required.
  • Forgetting query logs and evaluation.
  • Ranking only by popularity, causing rich-get-richer results.
  • Fetching every result from the primary database before responding.

Prep checklist

Be ready to explain:

  • Requirements and domain assumptions.
  • Inverted index and posting lists.
  • Text analysis and field weighting.
  • Query broker, shards, replicas, and merge step.
  • Candidate retrieval versus ranking.
  • Recall versus precision trade-offs.
  • Freshness model and full reindexing.
  • Filters, facets, autocomplete, and permissions.
  • Hybrid lexical/vector search.
  • Search quality metrics and logging.

How to talk about search design in interviews

Use concise boundary statements:

  • "The database is the source of truth; the search index is a retrieval structure."
  • "Retrieval protects recall; ranking improves precision."
  • "Deletes and permission changes need stricter freshness than low-risk metadata updates."
  • "Hybrid search is useful, but exact identifiers still need lexical handling."
  • "I would evaluate with judged queries and online behavior, not click-through alone."

A search system design interview is not about naming a search product and stopping there. It is about showing the data flow from source to index to ranked results, then defending the trade-offs around recall, freshness, latency, and relevance. If you can explain the inverted index, ranking stages, and evaluation plan clearly, you can handle most search design prompts.

How to adapt the answer by scale

For a small internal document search, you can keep the design simple: one managed search cluster, batch indexing, basic permissions, and query logs. For a marketplace or consumer product, you need stronger ranking, near-real-time updates, facets, personalization, and experimentation. For web-scale search, crawling, deduplication, spam, distributed indexing, and massive query serving become first-order problems.

Say this explicitly in the interview. A strong answer does not over-engineer every prompt. It matches the architecture to the product's scale, freshness, and correctness needs. If the interviewer pushes for more scale, then introduce sharding, replicas, event streams, rerankers, and backfill pipelines as the next evolution rather than the default starting point.