Kubernetes Interview Questions in 2026 — Controllers, Networking, and Operators
A practical guide to Kubernetes interview questions in 2026, focused on the controller model, service networking, CRDs, operators, and the debugging scenarios senior candidates actually get asked.
Kubernetes interview questions in 2026 are less about memorizing kubectl commands and more about explaining why the platform behaves the way it does. Strong candidates can walk from an API object to a controller reconciliation loop, from a Pod IP to a Service VIP, and from a CRD to an operator that safely automates production work. The bar is not perfection; it is being able to reason through failure with the mental model Kubernetes uses internally.
This guide focuses on the questions that separate someone who has only deployed YAML from someone who can operate, debug, and extend a cluster.
What interviewers are really testing
Most Kubernetes interviews contain four layers:
| Layer | What they ask | What a strong answer proves |
|---|---|---|
| Workload basics | Pods, Deployments, StatefulSets, Jobs | You know which abstraction owns which lifecycle |
| Control plane | API server, scheduler, controllers, etcd | You understand desired state and reconciliation |
| Networking | Services, DNS, CNI, kube-proxy, Ingress, NetworkPolicy | You can trace traffic and isolate failures |
| Extension model | CRDs, operators, admission, custom controllers | You can automate without fighting Kubernetes |
A good interview answer usually starts with the object model, then explains the control loop, then names the operational trade-off. For example, "A Deployment does not directly run containers; it owns ReplicaSets, and the ReplicaSet maintains the desired number of Pods. That indirection is what makes rolling updates and rollbacks possible."
Controller questions: desired state, reconciliation, and ownership
Expect some version of: "What happens after I apply a Deployment?"
A crisp answer:
- `kubectl apply` sends the object to the API server.
- The API server authenticates, authorizes, validates, runs admission, and persists the desired state to etcd.
- The Deployment controller observes the Deployment and creates or updates a ReplicaSet.
- The ReplicaSet controller creates Pods until actual replicas match desired replicas.
- The scheduler assigns unscheduled Pods to nodes based on constraints, resources, affinity, taints, and topology.
- The kubelet on the chosen node creates containers through the container runtime and reports status back.
The important phrase is reconciliation loop. Kubernetes components continuously compare desired state to observed state and take small corrective actions. They are not one-shot scripts.
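This whole chain starts from an ordinary manifest. A minimal sketch of a Deployment that would kick it off (the name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # illustrative name
spec:
  replicas: 3               # desired state; the ReplicaSet controller converges to it
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web            # must match the selector above
    spec:
      containers:
        - name: web
          image: nginx:1.27 # example image
          ports:
            - containerPort: 80
```

After `kubectl apply -f deployment.yaml`, `kubectl get replicasets` shows the ReplicaSet the Deployment controller created on your behalf.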
Common follow-up questions:
- Why does Kubernetes use controllers instead of imperative commands? Because controllers make recovery automatic. If a node dies, the system can converge again without a human re-running a launch command.
- What is an owner reference? Metadata that tells Kubernetes one object is owned by another, enabling garbage collection and lifecycle linkage.
- Why might a Pod not be deleted when its owner is deleted? Finalizers, propagation policy, orphaning, or a controller that cannot complete cleanup.
- What is the difference between `status` and `spec`? `spec` is the desired state submitted by users or controllers; `status` is observed state reported by controllers or kubelets.
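Both the spec/status split and owner references are visible on live objects. An abbreviated sketch of what `kubectl get -o yaml` might show (all field values illustrative):

```yaml
# Deployment: spec is written by the user, status by the controller.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:                        # desired state
  replicas: 3
status:                      # observed state
  readyReplicas: 3
  availableReplicas: 3
  observedGeneration: 2
---
# A Pod created by a ReplicaSet carries an owner reference,
# which is what enables garbage collection up the chain.
apiVersion: v1
kind: Pod
metadata:
  name: web-5d4f8c9b6-abcde  # illustrative hash suffixes
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet
      name: web-5d4f8c9b6
      controller: true
      blockOwnerDeletion: true
```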
A senior-level answer includes idempotency. A controller should be able to run the same reconciliation many times, tolerate partial failure, and converge without assuming it is the only actor in the cluster.
Deployment, StatefulSet, DaemonSet, and Job questions
Interviewers often ask you to choose the right workload object.
Use a Deployment for stateless replicated services where Pods are interchangeable. It supports rolling updates, rollback, surge, and availability controls.
Use a StatefulSet when each replica needs stable identity, ordered rollout, stable network names, or persistent volume association. Databases, queues, and consensus systems are common examples, although running databases on Kubernetes still requires careful operational maturity.
Use a DaemonSet when one Pod should run on every node, or on every node matching a selector. Log collectors, node agents, CNI components, and monitoring agents fit here.
Use a Job for finite work that should complete successfully. Use a CronJob for scheduled finite work.
A strong answer explains the failure behavior. A Deployment can replace any Pod. A StatefulSet replacement must preserve identity and storage. A DaemonSet responds to node membership. A Job tracks completions and retries.
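The identity guarantees of a StatefulSet come from a few specific fields. A minimal sketch (names, image, and sizes are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless    # headless Service gives stable DNS names: db-0.db-headless, ...
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16  # example image
  volumeClaimTemplates:       # each replica gets its own PVC, reattached when the Pod is replaced
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

The contrast with a Deployment is exactly these fields: a Deployment has no `serviceName` or `volumeClaimTemplates` because its Pods are interchangeable.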
Networking questions: trace the packet
Kubernetes networking questions in 2026 usually ask you to trace a request. The classic prompt: "A user hits api.example.com; what happens before the request reaches a Pod?"
A solid path:
- DNS resolves `api.example.com` to a load balancer or ingress endpoint.
- The load balancer forwards traffic to an Ingress controller or Gateway implementation.
- The Ingress or Gateway routes based on host/path/TLS rules to a Service.
- The Service selects Pods using labels and exposes a stable virtual IP or endpoint set.
- kube-proxy, eBPF, or the CNI data plane forwards traffic to a Pod IP.
- The container receives the request on its declared port.
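The last few hops map onto two objects. A sketch of the Ingress-to-Service link (host, names, and ports are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api       # routes to the Service below
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api            # endpoints come from ready Pods with this label
  ports:
    - port: 80          # the Service's stable virtual port
      targetPort: 8080  # the container's declared port
```

Remember that the Ingress object does nothing until an Ingress controller in the cluster implements it.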
Know the distinction between these objects:
| Object | Purpose | Interview trap |
|---|---|---|
| Pod IP | Address for a specific Pod | It is ephemeral; do not depend on it directly |
| ClusterIP Service | Stable internal virtual IP | It load-balances only to selected ready endpoints |
| NodePort | Exposes a port on every node | Usually not the clean production edge by itself |
| LoadBalancer | Requests an external LB from cloud provider | Behavior depends on provider integration |
| Ingress | HTTP routing abstraction | It requires a controller; the object alone does nothing |
| Gateway API | Newer, more expressive routing model | Not every cluster has the same implementation |
DNS, Services, and endpoints
A common question: "Why does a Service have no traffic?"
Debug in this order:
- Does the Service selector match the Pod labels?
- Are endpoints or endpoint slices populated?
- Are Pods ready, or are readiness probes failing?
- Is the Service `targetPort` correct for the container?
- Is NetworkPolicy denying traffic?
- Is kube-proxy, CNI, or node routing unhealthy?
The best answers avoid guessing. Say: "I would first compare `kubectl get svc`, `kubectl get endpointslices`, and Pod labels, because the most common cause is selector mismatch or readiness removing endpoints. Then I would test DNS resolution inside the cluster and curl the Service from a debug Pod."
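Selector mismatch, the most common cause, is easy to show side by side (all labels illustrative):

```yaml
# Service selecting app: payments
apiVersion: v1
kind: Service
metadata:
  name: payments
spec:
  selector:
    app: payments        # must match the Pod labels exactly
  ports:
    - port: 80
      targetPort: 8080
---
# Pod labeled app: payments-api — the Service will have zero endpoints
apiVersion: v1
kind: Pod
metadata:
  name: payments-api-0
  labels:
    app: payments-api    # mismatch: the Service selects app: payments
spec:
  containers:
    - name: app
      image: example/payments:1.0   # illustrative image
```

With this mismatch, `kubectl get endpointslices` for the Service shows no addresses even though the Pod is running and ready.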
DNS questions usually center on service discovery. A Service named `payments` in namespace `prod` is reachable as `payments.prod.svc.cluster.local`. Short names work inside the same namespace, but explicit names avoid ambiguity.
CNI, kube-proxy, and NetworkPolicy
You do not need to implement a CNI in an interview, but you should know the responsibility split.
The CNI plugin gives Pods network interfaces and makes Pod-to-Pod communication possible across nodes. Different CNIs implement routing, overlays, encryption, eBPF acceleration, and policy differently.
kube-proxy historically programs iptables or IPVS rules for Services. Some modern clusters replace or supplement it with eBPF data planes. The interview-safe phrasing is: "The Service abstraction is stable, but the implementation can be kube-proxy iptables, IPVS, or CNI/eBPF depending on the cluster."
NetworkPolicy controls allowed traffic between Pods and peers, but only if the CNI enforces it. The default is usually allow-all unless policies select a Pod. Once a Pod is selected by an ingress or egress policy, only allowed traffic in that direction is permitted.
A good NetworkPolicy answer names both labels and direction. "I would select the backend Pods, allow ingress only from frontend Pods on port 443, and add egress rules only if the cluster enforces default-deny egress."
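That answer translates directly into a manifest. A sketch of the policy described above (labels are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend          # the Pods this policy selects
  policyTypes:
    - Ingress               # once selected, only listed ingress is allowed
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend # only frontend Pods may connect
      ports:
        - protocol: TCP
          port: 443
```

Note that this only restricts ingress; egress from the backend Pods is untouched unless a policy with `policyTypes: [Egress]` also selects them, and none of it takes effect unless the CNI enforces NetworkPolicy.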
Operators and CRDs
Kubernetes operator interview questions test whether you understand extending the API without inventing a separate control plane.
A CRD, or Custom Resource Definition, adds a new resource type to the Kubernetes API. A custom controller watches those resources and reconciles real infrastructure to match their spec. An operator is a custom controller plus domain knowledge: it encodes operations such as backup, failover, upgrade, scaling, certificate rotation, or shard rebalancing.
Example: a PostgresCluster custom resource might declare version, replicas, storage, backup policy, and failover settings. The operator watches PostgresCluster objects and creates StatefulSets, Services, Secrets, PersistentVolumeClaims, and backup Jobs. It also updates status so users can see health and phase.
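As a sketch, such a custom resource might look like this. The API group and every field here are hypothetical, not a real operator's schema:

```yaml
apiVersion: example.com/v1alpha1   # hypothetical group/version
kind: PostgresCluster
metadata:
  name: orders-db
spec:
  version: "16"
  replicas: 3
  storage: 100Gi
  backup:
    schedule: "0 2 * * *"          # nightly backup Job, illustrative
status:
  phase: Healthy                   # written by the operator, never by the user
  readyReplicas: 3
```

The user edits only `spec`; the operator reconciles the underlying StatefulSets, Services, and Jobs, then reports back through `status`.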
Strong operator answers mention:
- Idempotent reconciliation.
- Finalizers for cleanup before deletion.
- Status conditions for observable progress.
- Versioned CRDs and schema validation.
- Safe rollouts and backoff.
- Least-privilege RBAC for the controller.
- Avoiding infinite reconcile loops caused by writing noisy status or mutating watched objects unnecessarily.
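Versioning, schema validation, and the status subresource all live on the CRD itself. A trimmed sketch (group and fields illustrative, matching the hypothetical PostgresCluster example):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.example.com
spec:
  group: example.com
  names:
    kind: PostgresCluster
    plural: postgresclusters
    singular: postgrescluster
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1   # invalid objects are rejected at admission
                version:
                  type: string
      subresources:
        status: {}             # status updates go through a separate subresource,
                               # which helps avoid spec/status write conflicts
```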
Scenario questions and answer outlines
Question: Pods are in CrashLoopBackOff. What do you do?
Check recent logs, previous container logs, exit code, events, config, secrets, probes, and resource limits. Distinguish application crash from kubelet killing the container. If liveness is too aggressive, Kubernetes may be restarting a healthy-but-slow process.
Question: A rollout is stuck. What do you inspect?
Deployment status, ReplicaSets, Pod events, image pull errors, readiness probes, quotas, PodDisruptionBudgets, and scheduling constraints. Explain that a rolling update waits for new Pods to become ready before scaling down old ones, depending on max surge and max unavailable.
Question: A Pod is Pending. Why?
Scheduler cannot place it. Common causes: insufficient CPU or memory, PVC not bound, node selector mismatch, taints without tolerations, topology spread constraints, affinity rules, quota, or missing runtime class.
Question: How do requests and limits affect scheduling?
Requests drive scheduling and resource guarantees. Limits cap usage. CPU limits can throttle; memory limits can cause OOM kills. A mature answer warns against copying arbitrary limits into latency-sensitive services without observing throttling.
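In a container spec this distinction is a few lines. A sketch with illustrative values:

```yaml
# Requests drive scheduling; limits cap runtime usage.
resources:
  requests:
    cpu: 250m          # the scheduler reserves this much CPU on the node
    memory: 256Mi      # and this much memory
  limits:
    memory: 512Mi      # exceeding this gets the container OOM-killed
    # deliberately no CPU limit here: a CPU limit throttles,
    # which can hurt latency-sensitive services
```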
Common traps
- Saying "a Deployment creates Pods" without mentioning ReplicaSets. That is acceptable shorthand in casual conversation, but interviews often expect the owner chain.
- Treating Ingress as a built-in load balancer. It is only a spec until a controller implements it.
- Assuming NetworkPolicy works in every cluster. It depends on CNI support.
- Confusing liveness and readiness probes. Readiness controls traffic; liveness restarts containers.
- Treating Kubernetes as a magic database platform. Stateful workloads need backups, restore tests, storage classes, disruption planning, and operator maturity.
- Debugging from the outside only. Many service and DNS failures are fastest to test from a temporary Pod inside the namespace.
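The liveness/readiness trap in particular is worth being able to write out. A sketch with illustrative paths and ports:

```yaml
# Readiness gates traffic; liveness restarts the container.
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5         # failing removes the Pod from Service endpoints
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30  # a generous delay avoids restarting a healthy-but-slow start
  periodSeconds: 10
  failureThreshold: 3      # restart only after repeated failures
```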
Prep checklist for Kubernetes interviews
Before the interview, be ready to whiteboard:
- The lifecycle from `kubectl apply` to a running container.
- Deployment versus StatefulSet versus DaemonSet versus Job.
- Service routing, endpoints, DNS, and ingress.
- A failed rollout debugging flow.
- A Pending Pod debugging flow.
- Requests, limits, probes, affinity, taints, and disruption budgets.
- CRDs, operators, finalizers, and status conditions.
- One real production incident where Kubernetes helped or complicated recovery.
How to talk about Kubernetes on your resume
Avoid vague bullets like "used Kubernetes for deployments." Show scope and operational ownership:
- "Reduced failed deploys by adding readiness gates, progressive rollout checks, and rollback runbooks for 40 Kubernetes services."
- "Built a custom controller to reconcile tenant environments from CRDs, replacing manual namespace provisioning."
- "Debugged cluster networking incidents across Ingress, Service endpoint selection, and CNI policy enforcement."
The interview win is not reciting every Kubernetes object. It is showing that you understand Kubernetes as an API-driven reconciliation system. If you can explain controllers, trace networking, and describe why an operator is just domain-specific reconciliation, you can handle most Kubernetes interview questions in 2026 with confidence.
Related guides
- A/B Testing Interview Questions in 2026 — Power Analysis, Peeking, and SRM — A tactical guide to A/B testing interview questions in 2026, with answer frameworks for power analysis, peeking, sample-ratio mismatch, guardrails, metrics, and experiment trade-offs. Built for product analysts, data scientists, PMs, and growth roles.
- AWS Interview Questions in 2026 — VPC, IAM, and the Services That Always Come Up — A focused AWS interview prep guide for 2026 covering VPC design, IAM reasoning, core services, common architecture prompts, debugging flows, and the mistakes that weaken senior answers.
- Deep Learning Interview Questions in 2026 — Backprop, Optimizers, and Regularization — A 2026-ready deep learning interview guide covering backpropagation, optimizers, regularization, debugging, transformers, evaluation, and sample answers that show practical judgment.
- Docker Interview Questions in 2026 — Layers, Multi-Stage Builds, and Runtime — A practical Docker interview guide for 2026 covering image layers, Dockerfile design, multi-stage builds, runtime isolation, Compose, security, and the debugging questions candidates keep seeing.
- GraphQL Interview Questions in 2026 — Schemas, Resolvers, and N+1 Prevention — A focused GraphQL interview guide for 2026 covering schema design, resolvers, N+1 prevention, DataLoader, pagination, auth, caching, federation, mutations, observability, and production trade-offs. Built for frontend, backend, and platform candidates.
