
Designing a Chat System Design Interview — WebSockets, Presence, and Message Storage

9 min read · April 25, 2026

A system design interview guide for chat applications, covering WebSockets, fanout, message ordering, presence, storage, delivery receipts, media, search, scaling, and common trade-offs.

Designing a chat system is a classic interview prompt because it forces you to balance real-time delivery, durable message storage, ordering, presence, offline users, fanout, and abuse controls. The interviewer is not expecting you to rebuild every feature in Slack, WhatsApp, or Discord. They want to see whether you can define requirements, choose the right communication model, and reason through the hard parts without hand-waving.

This guide gives you a practical structure for answering chat system design prompts with WebSockets, presence, and message storage at the center.

Start with requirements

Do not jump directly to WebSockets. Begin by narrowing scope.

Functional requirements:

  • One-to-one and group conversations.
  • Send, receive, edit, and delete messages.
  • Online/offline delivery.
  • Message history.
  • Delivery and read receipts, if required.
  • Presence, typing indicators, and last seen.
  • Media attachments.
  • Search, if in scope.
  • Push notifications for offline users.

Non-functional requirements:

  • Low latency for online users.
  • Durable storage for messages.
  • Reasonable ordering within a conversation.
  • Horizontal scalability.
  • High availability across failures.
  • Abuse, spam, and rate limiting.
  • Privacy and access control.

Then state assumptions. For example: "I will design for consumer-scale group chat with millions of users, but I will keep end-to-end encryption out of scope unless you want to go there."

High-level architecture

A clear architecture:

  1. Clients connect to a real-time gateway over WebSocket.
  2. API service handles authentication, conversation metadata, and message send requests.
  3. Message service validates membership, assigns message IDs, and writes messages to durable storage.
  4. A fanout layer publishes new messages to connected recipients through their gateway nodes.
  5. Offline notification service sends push notifications.
  6. Presence service tracks online status and connection state.
  7. Message storage keeps conversation history and supports pagination.
  8. Search index consumes message events asynchronously if search is required.

A simple text diagram:

Client -> WebSocket Gateway -> Message Service -> Message Store
                |                      |
                |                      -> Event Bus -> Fanout / Push / Search
                -> Presence Service

The main interview move is separating connection management from durable message processing. WebSocket gateways should not be the source of truth for messages.

WebSockets and connection management

Why WebSockets? They provide a persistent bidirectional connection, which is useful for low-latency messages, typing indicators, receipts, and presence updates. Alternatives include long polling, server-sent events, mobile push, or polling. For chat, WebSockets are the default online path, with push notifications as the offline path.

Gateway responsibilities:

  • Authenticate the connection using a token.
  • Maintain connection state: user ID, device ID, subscribed conversations.
  • Send heartbeats or pings to detect dead connections.
  • Enforce rate limits and message size limits.
  • Forward client events to backend services.
  • Deliver server events to connected clients.

Do not make a single gateway sticky forever if you can avoid it. Gateways should be horizontally scalable. Use a routing layer or connection registry so backend fanout knows which gateway currently holds a user's connection.
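A connection registry like the one described above can be sketched as follows. This is an in-memory illustration under assumed names (`register`, `gateways_for`, and the `gw-*` node IDs are all hypothetical); a production registry would live in a shared store such as Redis so every fanout worker sees the same routing table.

```python
from collections import defaultdict

class ConnectionRegistry:
    """Maps each user's live devices to the gateway node holding the socket.

    In-memory sketch for illustration; a real deployment would back this
    with a shared store (e.g. Redis) keyed by user and device.
    """

    def __init__(self):
        # user_id -> {device_id: gateway_node}
        self._routes = defaultdict(dict)

    def register(self, user_id, device_id, gateway_node):
        self._routes[user_id][device_id] = gateway_node

    def unregister(self, user_id, device_id):
        devices = self._routes.get(user_id)
        if devices:
            devices.pop(device_id, None)
            if not devices:
                del self._routes[user_id]

    def gateways_for(self, user_id):
        """Distinct gateway nodes to notify when fanning out to this user."""
        return set(self._routes.get(user_id, {}).values())
```

Fanout workers then ask the registry which gateways hold a recipient's sockets instead of assuming a sticky connection.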

Message send flow

A strong answer traces the write path:

  1. Client sends message over WebSocket or HTTPS with conversation ID, client-generated idempotency key, body, metadata, and auth token.
  2. Gateway forwards it to the message service.
  3. Message service checks authentication, membership, rate limits, block lists, and conversation state.
  4. Message service assigns a server message ID and timestamp or sequence number.
  5. Message is written to durable storage.
  6. An event is published to the event bus.
  7. Fanout delivers to connected participants.
  8. Offline users receive push notifications.
  9. Sender receives an acknowledgement with canonical ID and status.

Idempotency matters. Mobile clients retry when networks flap. A client-generated request ID prevents duplicate messages if the first write succeeded but the acknowledgement was lost.
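The idempotent send path can be sketched like this. Names and storage are illustrative (an in-memory list stands in for the durable store); the key point is checking the client-generated key before writing, and returning the canonical ID on a retry.

```python
import itertools

class MessageService:
    """Idempotent send-path sketch; field names are assumptions, not a real API."""

    def __init__(self):
        self._seq = itertools.count(1)
        self._store = []   # stand-in for durable message storage
        self._seen = {}    # (conversation_id, idempotency_key) -> message_id

    def send(self, conversation_id, sender_id, body, idempotency_key):
        dedupe_key = (conversation_id, idempotency_key)
        if dedupe_key in self._seen:
            # Retry after a lost ack: return the canonical ID, write nothing.
            return self._seen[dedupe_key]
        message_id = f"{conversation_id}:{next(self._seq)}"
        self._store.append({"id": message_id, "sender": sender_id, "body": body})
        self._seen[dedupe_key] = message_id  # record before acking the client
        return message_id
```

A retry with the same key returns the same ID and leaves exactly one stored copy.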

Message ordering

Ordering is one of the most important chat interview topics. You can usually promise ordering within a conversation, not global ordering across all conversations.

Common approaches:

  • Use a monotonically increasing sequence number per conversation.
  • Use time-sortable IDs plus server timestamps, with conflict handling.
  • Partition the message log by conversation ID so writes for a conversation are serialized through one partition.

For small and medium group chats, per-conversation sequence is straightforward. For very large channels, strict total ordering may limit throughput. You can shard large channels and accept approximate ordering for some events, or use a dedicated log partition per hot channel with backpressure.

A good interview phrase: "I would define the guarantee as stable ordering within a conversation as seen by clients after server acknowledgement. I would not promise global ordering across conversations."
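The partition-per-conversation approach can be sketched in a few lines. The partition count and hashing scheme are assumptions; the property that matters is that every message in a conversation lands in the same partition, so one consumer drains it in order and per-conversation ordering falls out without any global lock.

```python
import hashlib

def partition_for(conversation_id: str, num_partitions: int = 16) -> int:
    """Route all messages of a conversation to one log partition.

    Stable hashing means a conversation's writes are serialized through a
    single partition consumer, which yields per-conversation ordering.
    """
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```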

Message storage design

Message storage needs high write throughput, efficient reads by conversation, pagination, and retention controls.

A practical schema:

| Field | Purpose |
|---|---|
| conversation_id | Partition or lookup key |
| message_seq or message_id | Sort key within conversation |
| sender_id | Access and display |
| created_at | Display and pagination |
| body | Message content or pointer to encrypted payload |
| type | text, image, file, system |
| edit_state | Edited/deleted metadata |
| idempotency_key | Duplicate prevention |

Datastore choices depend on scale and product needs. A relational database can work for moderate scale and strong querying. A wide-column store or log-structured NoSQL store can work for high-volume conversation timelines. Object storage is better for media blobs, not the message index itself.

Reads usually page backward from the latest message. Keep conversation metadata separately: participants, last message pointer, unread counts, mute state, and membership version.
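Backward pagination over the per-conversation sort key can be sketched as a cursor loop. This operates on an in-memory list for illustration; in a real store the filter and limit would be a range query on the sort key.

```python
def page_backward(timeline, before_seq=None, limit=50):
    """Page a conversation newest-first.

    `timeline` is assumed sorted ascending by `seq`; `before_seq` is the
    cursor returned by the previous page (None for the first page).
    Returns (messages newest-first, cursor for the next older page).
    """
    if before_seq is not None:
        timeline = [m for m in timeline if m["seq"] < before_seq]
    page = timeline[-limit:]
    # A short page means we hit the start of history: no further cursor.
    next_cursor = page[0]["seq"] if len(page) == limit else None
    return list(reversed(page)), next_cursor
```

The client opens a conversation with no cursor, then passes each returned cursor to load older history on scroll.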

Fanout: online and offline

Fanout means delivering a message event to recipients.

For one-to-one chat, fanout is easy: send to the sender's other devices and the recipient's devices.

For group chat, choose between:

  • Fanout on write: create per-user inbox entries when the message is sent. Fast reads, expensive writes for huge groups.
  • Fanout on read: store once in conversation log and have users read from it. Cheaper writes, more read-time filtering.
  • Hybrid: fanout on write for small groups, fanout on read for large channels.

Mention this trade-off. Interviewers like hearing that group size changes the architecture.
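The hybrid option above can be sketched as a single delivery function. The group-size threshold is an assumption to tune from measured read/write costs, and the dict-based inboxes and log are stand-ins for real storage.

```python
SMALL_GROUP_LIMIT = 256  # assumed threshold; tune from real read/write costs

def deliver(message, members, inboxes, conversation_log):
    """Hybrid fanout sketch: inbox writes for small groups, log-only for big ones."""
    conversation_log.append(message)  # the conversation log is always the durable copy
    if len(members) <= SMALL_GROUP_LIMIT:
        # Fanout on write: one inbox entry per member, cheap reads later.
        for user_id in members:
            inboxes.setdefault(user_id, []).append(message["id"])
        return "fanout_on_write"
    # Fanout on read: members pull from the shared conversation log instead.
    return "fanout_on_read"
```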

Offline users should not require the message service to block on push. Publish an event, then a notification worker decides whether to send APNs/FCM based on mute settings, device tokens, privacy preview settings, and rate limits.

Presence and typing indicators

Presence is deceptively hard because it is ephemeral, noisy, and user-visible. Do not store presence in the durable message database.

Presence service responsibilities:

  • Track active connections by user and device.
  • Update status on connect, heartbeat, disconnect, and timeout.
  • Store ephemeral state in a fast store such as Redis or a dedicated in-memory system.
  • Publish presence changes to interested users or conversations.
  • Apply privacy rules: invisible mode, blocked users, enterprise policy.

Presence should tolerate stale state. A laptop closing without clean disconnect should expire after heartbeat timeout. Typing indicators should be even more ephemeral: short TTL, rate-limited, and not durable.

A good answer: "I would treat presence as best-effort, not transactional. Messages must be durable; typing indicators can be dropped."
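The TTL-based, best-effort presence model can be sketched like this. A production system would rely on Redis key expiry rather than an in-process dict; the clock is injectable here only to make the expiry behavior testable.

```python
import time

class PresenceStore:
    """Best-effort presence sketch with heartbeat TTLs (in-memory stand-in)."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._beats = {}  # (user_id, device_id) -> last heartbeat time

    def heartbeat(self, user_id, device_id):
        self._beats[(user_id, device_id)] = self._clock()

    def disconnect(self, user_id, device_id):
        self._beats.pop((user_id, device_id), None)

    def is_online(self, user_id):
        # A laptop that vanished without a clean disconnect simply ages out.
        now = self._clock()
        return any(now - ts <= self._ttl
                   for (uid, _), ts in self._beats.items() if uid == user_id)
```

Note that nothing here is durable or transactional, which is exactly the guarantee level presence should have.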

Receipts and unread counts

Delivery receipts and read receipts can become expensive if every message creates per-user writes in large groups.

Options:

  • Store per-user per-conversation last_read_seq instead of one row per message read.
  • Store delivery receipt per device only if the product truly needs it.
  • For large channels, show approximate read counts rather than every reader.
  • Batch receipt updates to reduce write amplification.

Unread count can be computed from latest conversation sequence minus user's last read sequence, adjusted for membership and deletes. For complex products, maintain a per-user conversation state table with last read, last delivered, mute, pinned, archive, and notification settings.
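The last-read cursor model can be sketched as a tiny state table. One row per (user, conversation) instead of one per message read is what keeps receipts cheap in large groups; the class and method names are illustrative.

```python
class ConversationState:
    """Per-user per-conversation read cursor sketch (last_read_seq model)."""

    def __init__(self):
        self._last_read = {}  # (user_id, conversation_id) -> seq

    def mark_read(self, user_id, conversation_id, seq):
        key = (user_id, conversation_id)
        # Cursors only move forward; a late or replayed receipt must not regress them.
        self._last_read[key] = max(self._last_read.get(key, 0), seq)

    def unread(self, user_id, conversation_id, latest_seq):
        return max(0, latest_seq - self._last_read.get((user_id, conversation_id), 0))
```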

Media attachments

Do not push large media through the WebSocket path. Use object storage.

Flow:

  1. Client requests an upload URL.
  2. Client uploads media directly to object storage.
  3. Media service scans or processes the file.
  4. Message references the media object ID and metadata.
  5. Clients load thumbnails or signed URLs according to permissions.

Mention validation, size limits, virus scanning where appropriate, content moderation, thumbnail generation, and lifecycle retention.
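Step 1 of the flow above, handing the client a direct-upload URL, can be sketched as follows. The `presign` callable is a stand-in for the object store's presigned-URL API (for example an S3 PUT presign), not a real SDK call, and the size limit is an assumed product choice.

```python
import uuid

MAX_ATTACHMENT_BYTES = 25 * 1024 * 1024  # assumed product limit

def request_upload(presign, user_id, content_type, size_bytes):
    """Media upload handshake sketch: validate, mint an ID, presign a URL."""
    if size_bytes > MAX_ATTACHMENT_BYTES:
        raise ValueError("attachment too large")
    media_id = str(uuid.uuid4())
    # The client uploads straight to object storage; the message record
    # later references media_id, never the raw bytes.
    return {"media_id": media_id, "upload_url": presign(media_id, content_type)}
```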

Scaling and reliability

Key scaling decisions:

  • Partition conversations by conversation ID.
  • Keep WebSocket gateways stateless except for active connection memory.
  • Use an event bus for decoupling fanout, push, search, and analytics.
  • Replicate storage across zones.
  • Backpressure hot conversations rather than letting them overload every service.
  • Rate-limit sends, joins, typing events, and media uploads.

Reliability guarantee: a message should not be acknowledged as sent until it is durably written. Delivery to online users can happen after the write. If fanout fails, consumers can replay from the message event log or clients can fetch missed messages by sequence number.
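The client-side catch-up path mentioned above reduces to a range fetch over the sequence number. A minimal sketch over an in-memory log, assuming the client tracks the last sequence it acknowledged:

```python
def fetch_missed(conversation_log, last_seen_seq):
    """Catch-up sketch: after a reconnect (or a failed fanout), the client
    pulls everything newer than the last sequence number it acknowledged."""
    return [m for m in conversation_log if m["seq"] > last_seen_seq]
```

Because message IDs are idempotent and the sequence is per conversation, the client can merge this result with anything it already received over the socket without duplicates.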

Common interview traps

  • Treating WebSocket delivery as durable storage.
  • Ignoring mobile reconnects and duplicate sends.
  • Promising global ordering across all users.
  • Updating presence in the primary SQL message table on every heartbeat.
  • Sending media through the message service.
  • Forgetting blocked users, membership checks, and deleted conversations.
  • Making push notifications synchronous in the send path.
  • Not defining which features are in scope.

Prep checklist

Be ready to whiteboard:

  • Requirements and non-requirements.
  • WebSocket gateway responsibilities.
  • Message send flow with durable write before ack.
  • Per-conversation ordering model.
  • Message storage schema and pagination.
  • Fanout on write versus fanout on read.
  • Presence as ephemeral best-effort state.
  • Offline push notification path.
  • Read receipts and unread count strategy.
  • Hot group and abuse mitigation.

How to talk about chat system design in interviews

Use crisp language:

  • "WebSockets are the online transport, not the source of truth."
  • "I would acknowledge after durable write, then fan out asynchronously."
  • "Ordering is per conversation, not global."
  • "Presence is ephemeral and TTL-based; message history is durable."
  • "Large groups push me toward hybrid fanout."

If you can keep those boundaries clear, the chat system design interview becomes manageable. The goal is not to name every possible service. It is to show that you know which data must be correct, which signals can be best-effort, and where scale changes the trade-off.

Privacy, moderation, and abuse edge cases

Chat systems also need product safety boundaries. Even if the interviewer does not make abuse the core problem, mention the basics when time allows. Membership checks must happen on every send and read path, not only when a user opens the conversation. Blocked users should not be able to infer presence or read receipts. Deleted or edited messages need a clear product policy: are they tombstoned, removed for everyone, retained for compliance, or hidden only for one user?

Rate limiting should apply to message sends, group invites, typing indicators, media uploads, and account creation. Spam controls may include reputation, device signals, link scanning, and reporting flows. Moderation events should be asynchronous where possible, but high-risk media or public channels may require pre-delivery checks.
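One common shape for the per-user rate limits above is a token bucket. This sketch uses an injectable clock for testability; the rate and burst values are assumptions to set per action type (sends, invites, typing events, uploads).

```python
import time

class TokenBucket:
    """Per-user rate limiter sketch (token bucket)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._rate = rate_per_sec
        self._burst = burst
        self._tokens = float(burst)  # start full: allow an initial burst
        self._clock = clock
        self._last = clock()

    def allow(self):
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self._burst, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

In practice the bucket state lives in a shared store keyed by user and action, so limits hold across gateways.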

The senior framing is: real-time delivery is only one part of chat. Trust, privacy, and abuse controls must be designed into the same flows, otherwise the system scales harm as efficiently as it scales messages.