AWS Interview Questions in 2026 — VPC, IAM, and the Services That Always Come Up

9 min read · April 25, 2026

A focused AWS interview prep guide for 2026 covering VPC design, IAM reasoning, core services, common architecture prompts, debugging flows, and the mistakes that weaken senior answers.

AWS interview questions in 2026 still come back to three themes: can you design a network that fails safely, can you explain IAM without hand-waving, and do you know the services that appear in almost every architecture conversation? You do not need to memorize every AWS product. You do need a dependable mental model for VPCs, identity, compute, storage, databases, queues, observability, and cost-aware trade-offs.

This guide is built for backend, DevOps, platform, SRE, data, and engineering manager interviews where AWS is part of the expected operating environment.

The interview map

Most AWS interviews contain some version of this table:

| Topic | What they ask | What a strong answer shows |
|---|---|---|
| VPC | Public/private subnets, routing, NAT, security groups | You understand traffic paths and blast radius |
| IAM | Policies, roles, STS, least privilege | You can reason about authorization, not just attach admin |
| Compute | EC2, ECS, EKS, Lambda | You pick the right runtime model |
| Storage | S3, EBS, EFS | You know durability, access, and lifecycle trade-offs |
| Data | RDS, DynamoDB, ElastiCache | You choose consistency, scaling, and operations consciously |
| Messaging | SQS, SNS, EventBridge, Kinesis | You decouple systems and handle retries |
| Operations | CloudWatch, CloudTrail, Config, alarms | You can debug and audit production |

The best AWS answers use specifics while avoiding fake precision. Say what you would measure, which failure mode you are protecting against, and how permissions are constrained.

VPC questions: public, private, and routing

The classic prompt: "Design a VPC for a web application."

A strong baseline design:

  • One VPC with a CIDR block sized for expected growth that does not overlap networks you may later peer or connect with.
  • At least two Availability Zones.
  • Public subnets for load balancers, NAT Gateways, and other edge resources that need internet-routable paths.
  • Private application subnets for ECS, EKS nodes, EC2 instances, or internal services.
  • Private data subnets for RDS, ElastiCache, or internal databases.
  • Internet Gateway attached to the VPC.
  • NAT Gateway or controlled egress path for private subnets that need outbound internet.
  • Route tables separated by subnet purpose.
  • Security groups scoped by application role.
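To make "sized for expected growth" concrete, here is a small planning sketch using Python's standard `ipaddress` module. It assumes a hypothetical 10.0.0.0/16 VPC, two placeholder AZ names, and one /20 subnet per tier per AZ; real sizing depends on your workloads and on the ranges you must not overlap.

```python
import ipaddress

# Hypothetical inputs: a /16 VPC, two placeholder AZ names, three tiers.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
AZS = ["az-a", "az-b"]
TIERS = ["public", "app", "data"]
AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet


def plan_subnets(vpc, azs, tiers, new_prefix=20):
    """Assign consecutive, non-overlapping blocks to each (tier, AZ) pair."""
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator, lowest addresses first
    return {(tier, az): next(blocks) for tier in tiers for az in azs}


plan = plan_subnets(VPC_CIDR, AZS, TIERS)
for (tier, az), net in plan.items():
    usable = net.num_addresses - AWS_RESERVED_PER_SUBNET
    print(f"{tier:<6} {az}  {net}  usable={usable}")
```

Six /20 blocks use 6 of the 16 available in the /16, leaving 10 for new tiers or a third AZ without renumbering.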

The key distinction: a subnet is not public because of its name. It is public when its route table sends internet-bound traffic (typically 0.0.0.0/0) to an Internet Gateway; resources in it are reachable from the internet only if they also have a public or Elastic IP address or sit behind a public entry point such as a load balancer. A private subnet may still reach the internet outbound through NAT, but connections from the internet cannot be initiated directly to instances in it.
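The route-table rule can be expressed as a tiny classifier. This is a simplified, hypothetical model rather than AWS API output: each subnet maps to a list of `(destination, target)` pairs, target IDs use the familiar `igw-` and `nat-` prefixes, and only the IPv4 default route is inspected.

```python
# Hypothetical, simplified route tables: subnet name -> [(destination CIDR, target)].
ROUTE_TABLES = {
    "public-a": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-0example")],
    "app-a": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-0example")],
    "data-a": [("10.0.0.0/16", "local")],
}


def classify(routes):
    """Classify a subnet by where its IPv4 default route points, not by its name."""
    default_target = next((target for dest, target in routes if dest == "0.0.0.0/0"), None)
    if default_target is None:
        return "isolated"  # no default route in this simplified model
    if default_target.startswith("igw-"):
        return "public"
    if default_target.startswith("nat-"):
        return "private with NAT egress"
    return "private (other default target)"


for subnet, routes in ROUTE_TABLES.items():
    print(f"{subnet}: {classify(routes)}")
```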

Security groups and NACLs

Expect: "What is the difference between a security group and a network ACL?"

Security groups are stateful virtual firewalls attached to network interfaces or resources. If inbound traffic is allowed, return traffic is automatically allowed. They are usually the primary control for application access.

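Network ACLs are stateless subnet-level controls: return traffic is not automatically allowed, so you need matching inbound and outbound rules, including the ephemeral port range that responses use. Rules are evaluated in rule-number order and the first match applies. NACLs are useful for coarse subnet guardrails but awkward for detailed application policy.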
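The practical difference from a stateful security group shows up with return traffic. This sketch models NACL evaluation in simplified form (protocols and CIDRs are ignored): lowest rule number first, first match decides, unmatched traffic is denied. The rule numbers and the 1024-65535 ephemeral range are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    port_from: int
    port_to: int
    allow: bool


def nacl_allows(rules, port):
    """Return the decision of the first matching rule, checked by ascending rule number."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # implicit final deny when nothing matches


inbound = [NaclRule(100, 443, 443, True)]
outbound_missing = [NaclRule(100, 443, 443, True)]
outbound_fixed = [NaclRule(100, 443, 443, True), NaclRule(110, 1024, 65535, True)]

client_port = 49152  # the client's ephemeral source port; the response is sent back to it
print(nacl_allows(inbound, 443), nacl_allows(outbound_missing, client_port))  # True False
print(nacl_allows(inbound, 443), nacl_allows(outbound_fixed, client_port))    # True True
```

With a security group, the response would be allowed automatically; with a NACL, the request gets in but the response is dropped until the outbound ephemeral rule exists.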

A practical answer: use security groups for most service-to-service permissions, with rules like "ALB security group can reach app security group on 443" rather than CIDR-wide access. Use NACLs sparingly for broad deny or compliance boundaries.

Common trap: opening a database to 0.0.0.0/0 because it is in a private subnet. Private placement reduces exposure, but security groups should still restrict access to specific app tiers or security groups.
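A lightweight check for this trap can be sketched against rule dictionaries that loosely mirror the EC2 `IpPermissions` shape (`FromPort`, `ToPort`, `IpRanges`, `UserIdGroupPairs`). The port list and group IDs are hypothetical, and a real audit would also need to cover IPv6 ranges (`Ipv6Ranges`).

```python
# Illustrative data ports: MySQL, PostgreSQL, Redis.
DATA_PORTS = {3306, 5432, 6379}


def world_open_data_ports(group_name, ip_permissions):
    """Flag ingress rules that expose a data port to 0.0.0.0/0."""
    findings = []
    for perm in ip_permissions:
        # This sketch treats missing port bounds as the full port range.
        low, high = perm.get("FromPort", 0), perm.get("ToPort", 65535)
        exposed = sorted(p for p in DATA_PORTS if low <= p <= high)
        world = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        if world and exposed:
            findings.append(f"{group_name}: ports {exposed} open to 0.0.0.0/0")
    return findings


risky = [{"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
          "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]
# Preferred: reference the app tier's security group instead of a CIDR.
scoped = [{"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
           "UserIdGroupPairs": [{"GroupId": "sg-0app-example"}]}]

print(world_open_data_ports("db-sg", risky))   # ['db-sg: ports [5432] open to 0.0.0.0/0']
print(world_open_data_ports("db-sg", scoped))  # []
```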

NAT, endpoints, and egress control

Private workloads often need to fetch patches, call APIs, or reach AWS services. The default answer is a NAT Gateway, but interviewers may ask about cost and control.

Alternatives and complements:

  • VPC endpoints for private access to AWS services without traversing the public internet: gateway endpoints for S3 and DynamoDB, and interface endpoints for services such as ECR, Secrets Manager, or CloudWatch.
  • Egress proxies for inspection, allowlists, or audit.
  • No outbound internet for highly restricted workloads, with dependencies mirrored or accessed through endpoints.

A senior answer says: "I would use VPC endpoints for high-volume AWS service traffic, NAT for controlled general egress, and VPC Flow Logs or proxy logs to detect unexpected destinations."
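Why a gateway endpoint takes traffic away from NAT comes down to route selection: the VPC router picks the most specific matching route. This sketch models a hypothetical private-subnet route table, using the documentation range 198.51.100.0/24 as a stand-in for the S3 prefix list a gateway endpoint adds. Interface endpoints work differently: they resolve to private IPs inside the VPC, so their traffic matches the `local` route.

```python
import ipaddress

# Hypothetical route table for a private app subnet.
ROUTES = [
    ("10.0.0.0/16", "local"),
    ("198.51.100.0/24", "vpce-s3-example"),  # stand-in for the S3 prefix list
    ("0.0.0.0/0", "nat-0example"),
]


def next_hop(dest_ip, routes=ROUTES):
    """Pick the most specific (longest-prefix) route that contains dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [(ipaddress.ip_network(cidr), target)
               for cidr, target in routes
               if addr in ipaddress.ip_network(cidr)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(next_hop("10.0.5.20"))     # local
print(next_hop("198.51.100.7"))  # vpce-s3-example
print(next_hop("203.0.113.9"))   # nat-0example
```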

IAM questions: policies, roles, and trust

IAM questions are where many candidates get vague. Keep three concepts separate:

  • Principal: who or what is making the request, such as a user, role, service principal, federated identity, or workload identity.
  • Action: what API operation is requested, such as s3:GetObject.
  • Resource: what object or ARN the action applies to.

Authorization is the combined result of identity policies, resource policies, permission boundaries, service control policies, session policies, and explicit denies. You do not need to recite every evaluation step, but you should know the core rules: requests are denied by default, an explicit deny anywhere wins, and an allow must exist in an applicable identity or resource policy without being excluded by an SCP, permission boundary, or session policy.
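The core of those rules fits in a few lines. This is a deliberately simplified, single-account sketch: statements from identity and resource policies are pooled, `Action` and `Resource` are matched with `*` and `?` wildcards (actions case-insensitively), and principals, conditions, `NotAction`/`NotResource`, SCPs, permission boundaries, and session policies are ignored. The bucket name and statements are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(patterns, value, case_sensitive=True):
    """True if value matches any IAM-style wildcard pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    if not case_sensitive:
        patterns = [p.lower() for p in patterns]
        value = value.lower()
    return any(fnmatchcase(value, p) for p in patterns)


def evaluate(statements, action, resource):
    """Simplified evaluation: explicit Deny beats Allow; no Allow means implicit deny."""
    applicable = [s for s in statements
                  if matches(s["Action"], action, case_sensitive=False)
                  and matches(s["Resource"], resource)]
    if any(s["Effect"] == "Deny" for s in applicable):
        return "explicit deny"
    if any(s["Effect"] == "Allow" for s in applicable):
        return "allowed"
    return "implicit deny"


STATEMENTS = [
    # Hypothetical identity policy on the app role.
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::app-bucket/reports/*"},
    # Hypothetical bucket policy statement carving out a sensitive prefix.
    {"Effect": "Deny", "Action": "s3:*",
     "Resource": "arn:aws:s3:::app-bucket/reports/secret/*"},
]

print(evaluate(STATEMENTS, "s3:GetObject", "arn:aws:s3:::app-bucket/reports/q1.csv"))
print(evaluate(STATEMENTS, "s3:GetObject", "arn:aws:s3:::app-bucket/reports/secret/k.txt"))
print(evaluate(STATEMENTS, "s3:PutObject", "arn:aws:s3:::app-bucket/reports/q1.csv"))
```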

Roles are assumed. A role has a permissions policy and a trust policy. The trust policy says who can assume it. STS issues temporary credentials. This is why roles are preferred for applications: short-lived credentials and clear trust boundaries.

Strong answer to "How do you give an EC2 instance access to S3?"

"Attach an instance profile with an IAM role. The role policy allows only the needed S3 actions on the specific bucket or prefix, and the trust policy allows EC2 to assume the role. The app uses the default AWS credential chain to get temporary credentials. I would avoid long-lived access keys on disk."

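The two policies behind that answer might look like the following, built as Python dicts so they can be serialized to JSON. The bucket name and prefix are hypothetical. Note that `s3:ListBucket` applies to the bucket ARN and is narrowed with the `s3:prefix` condition key, while object actions apply to object ARNs.

```python
import json

BUCKET = "example-app-bucket"  # hypothetical bucket name
PREFIX = "uploads/"            # hypothetical prefix the app needs

# Trust policy: who may assume the role (here, the EC2 service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: what the role may do, scoped to one bucket prefix.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}*",
        },
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
            "Condition": {"StringLike": {"s3:prefix": [f"{PREFIX}*"]}},
        },
    ],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(permissions_policy, indent=2))
```

The instance profile wraps the role; SDKs such as boto3 then obtain temporary credentials from instance metadata through the default credential chain, so no keys appear in code or on disk.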
IAM traps that interviewers notice

  • Using AdministratorAccess as a design shortcut.
  • Confusing authentication with authorization.
  • Forgetting resource policies on S3, KMS, SQS, SNS, and cross-account access.
  • Granting s3:* on all buckets for a workload that needs one prefix.
  • Ignoring KMS permissions when debugging encrypted resource access.
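  • Not knowing that Lambda, ECS tasks, EKS pods, and EC2 instances each have their own role patterns: Lambda execution roles, ECS task roles, IAM roles for service accounts or EKS Pod Identity, and EC2 instance profiles.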

For cross-account access, describe a role in Account B with a trust policy allowing a principal from Account A to assume it, then least-privilege permissions in Account B. Mention external IDs for third-party access when appropriate.
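A sketch of both halves of that setup, again as Python dicts. The account IDs are AWS's documentation placeholders, and the role name and external ID are hypothetical. Trusting `arn:aws:iam::<account>:root` delegates the decision to Account A's own IAM policies, which is why the caller in Account A also needs an identity policy allowing `sts:AssumeRole` on the role ARN.

```python
import json

ACCOUNT_A = "111122223333"           # caller account (documentation placeholder)
ACCOUNT_B = "444455556666"           # account that owns the role (placeholder)
ROLE_NAME = "ReportReader"           # hypothetical role name
EXTERNAL_ID = "example-external-id"  # hypothetical value agreed with the third party

# Trust policy on the role in Account B: who may assume it, and with which external ID.
trust_policy_b = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_A}:root"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
    }],
}

# Identity policy in Account A: the calling principal must also be allowed to assume it.
caller_policy_a = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": f"arn:aws:iam::{ACCOUNT_B}:role/{ROLE_NAME}",
    }],
}

print(json.dumps(trust_policy_b, indent=2))
print(json.dumps(caller_policy_a, indent=2))
```

The caller then invokes STS `AssumeRole` with the role ARN and the external ID (in boto3, `sts.assume_role(RoleArn=..., RoleSessionName=..., ExternalId=...)`), and the role's own permissions policy in Account B still needs to be least-privilege.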

Services that always come up

You can answer most AWS service questions if you know the core trade-offs.

| Service | Interview use | Key trade-off |
|---|---|---|
| EC2 | Full control over instances | You own patching, scaling, and host operations |
| ECS/Fargate | Containers with less cluster management | AWS-managed runtime with container task model |
| EKS | Kubernetes workloads | Kubernetes flexibility with operational complexity |
| Lambda | Event-driven functions | Simpler operations, limits around runtime, cold starts, duration, and packaging |
| S3 | Object storage | Durable object store, not a filesystem |
| RDS/Aurora | Managed relational database | Less ops than self-managed, still needs schema and capacity planning |
| DynamoDB | Managed key-value/document at scale | Access pattern design matters upfront |
| SQS | Durable queue | At-least-once delivery; consumers must be idempotent |
| SNS | Pub/sub fanout | Push notification pattern, not durable work queue by itself |
| EventBridge | Event routing | Great for integration; schemas and ownership matter |
| CloudWatch | Metrics/logs/alarms | Good baseline; design useful signals, not alarm noise |
| CloudTrail | API audit logs | Essential for security investigation |

Architecture prompt: resilient web API

If asked to design a basic production web API on AWS, outline this:

  • Route 53 DNS to an Application Load Balancer.
  • ALB in public subnets across at least two AZs.
  • App runs on ECS Fargate, EKS, or EC2 Auto Scaling in private subnets.
  • RDS/Aurora in private data subnets with Multi-AZ where needed.
  • S3 for object storage.
  • SQS for async work and retry buffering.
  • Secrets Manager or SSM Parameter Store for secrets.
  • CloudWatch metrics/logs, alarms, dashboards, and structured application logs.
  • IAM roles per workload with least privilege.
  • Security groups that allow only required paths.
  • CI/CD deploying immutable images or artifacts.

Then explain trade-offs. ECS is simpler if the team does not need Kubernetes. EKS makes sense if the company already standardizes on Kubernetes or needs the portability and ecosystem Kubernetes provides. Lambda works well for event-driven or spiky workloads but may complicate long-running jobs and local parity.

Debugging questions

Question: An EC2 instance in a private subnet cannot reach the internet. What do you check?

Check that the subnet's route table sends 0.0.0.0/0 to a NAT Gateway, the NAT Gateway sits in a public subnet (with an Elastic IP) whose route table points to an Internet Gateway, security group egress allows the traffic, NACLs on both subnets allow the request and the ephemeral-port return traffic, VPC DNS resolution is enabled, and no host firewall on the instance conflicts. Also verify that the destination itself is reachable and does not require a proxy.
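One way to keep this answer structured is to walk the packet path in order and stop at the first broken hop. The sketch below uses a hypothetical, hand-filled snapshot of settings (the field names are invented for illustration, not an AWS API) and reports the first failing check.

```python
# Hypothetical snapshot of settings gathered while debugging.
env = {
    "app_subnet_default_route": "nat-0example",
    "nat_subnet_default_route": "igw-0example",
    "sg_egress_allows_traffic": True,
    "nacl_allows_ephemeral_return": False,
    "vpc_dns_resolution": True,
}

# Ordered along the outbound path: instance subnet -> NAT -> IGW, then return path and DNS.
CHECKS = [
    ("app subnet 0.0.0.0/0 -> NAT Gateway",
     lambda e: e["app_subnet_default_route"].startswith("nat-")),
    ("NAT subnet 0.0.0.0/0 -> Internet Gateway",
     lambda e: e["nat_subnet_default_route"].startswith("igw-")),
    ("security group egress allows the traffic",
     lambda e: e["sg_egress_allows_traffic"]),
    ("NACLs allow ephemeral return traffic",
     lambda e: e["nacl_allows_ephemeral_return"]),
    ("VPC DNS resolution enabled",
     lambda e: e["vpc_dns_resolution"]),
]


def first_failure(e):
    """Return the name of the first failing check, or None if every check passes."""
    for name, check in CHECKS:
        if not check(e):
            return name
    return None


print(first_failure(env))  # NACLs allow ephemeral return traffic
```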

Question: The app gets AccessDenied from S3. What do you inspect?

The role actually used by the app, identity policy, bucket policy, object ownership, KMS key policy if encrypted, prefix ARN correctness, VPC endpoint policy, SCPs, permission boundaries, and whether the request is cross-account. AccessDenied is often the combination of S3 and KMS, not just S3.

Question: Lambda is slow. What do you check?

Cold starts, memory/CPU allocation, package size, VPC attachment, downstream latency, connection reuse, concurrency limits, retries, and whether the function is doing work better handled by a queue or container.
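Connection reuse is often the cheapest fix: create expensive clients once per execution environment, outside the per-request work, so warm invocations reuse them. In this sketch `make_client` is a hypothetical stand-in for building an SDK client or connection pool; the `handler(event, context)` signature follows the Python Lambda convention.

```python
import time

_client = None      # module scope survives across warm invocations in one environment
CLIENT_INITS = 0    # counts how many times the expensive setup ran


def make_client():
    """Hypothetical stand-in for creating an SDK client or DB connection pool."""
    global CLIENT_INITS
    CLIENT_INITS += 1
    time.sleep(0.01)  # simulate connection setup cost
    return object()


def get_client():
    """Create the client lazily on the first (cold) invocation, then reuse it."""
    global _client
    if _client is None:
        _client = make_client()
    return _client


def handler(event, context):
    client = get_client()  # warm invocations skip the setup cost
    return {"ok": True, "client_id": id(client)}


r1 = handler({}, None)
r2 = handler({}, None)
print(CLIENT_INITS, r1["client_id"] == r2["client_id"])  # 1 True
```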

Question: Messages are processed twice from SQS. Is AWS broken?

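No. SQS standard queues provide at-least-once delivery, so a message can occasionally arrive more than once and consumers must be idempotent. Set the visibility timeout longer than typical processing time, route repeatedly failing messages to a dead-letter queue, add deduplication logic in the consumer, and use FIFO queues only when ordering or deduplication requirements justify their throughput limits.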
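A minimal idempotent consumer, assuming each message carries a business-level idempotency key (the field name `idempotency_key` is hypothetical). A business key is safer than the SQS message ID, because a producer retry creates a new message with a new ID. The in-memory set stands in for a durable store; in production the check and the record should be one atomic step, such as a conditional write or a unique constraint, otherwise two concurrent consumers can both pass the check.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    processed_keys: set = field(default_factory=set)  # stand-in for a durable store
    side_effects: list = field(default_factory=list)  # records work actually performed

    def handle(self, message):
        key = message["idempotency_key"]
        if key in self.processed_keys:
            return "duplicate-skipped"  # redelivery: acknowledge without redoing work
        self.side_effects.append(f"charged {message['amount']} for {key}")
        self.processed_keys.add(key)
        return "processed"


consumer = IdempotentConsumer()
deliveries = [
    {"idempotency_key": "order-42", "amount": 100},
    {"idempotency_key": "order-42", "amount": 100},  # at-least-once redelivery
    {"idempotency_key": "order-43", "amount": 250},
]
results = [consumer.handle(m) for m in deliveries]
print(results)                     # ['processed', 'duplicate-skipped', 'processed']
print(len(consumer.side_effects))  # 2
```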

Cost and reliability answers

Senior AWS interviews often add cost. Good answers mention:

  • Right-sizing compute and using autoscaling.
  • Choosing managed services to reduce operational cost, not just bill cost.
  • Using S3 lifecycle policies to transition objects to cheaper storage classes or expire them.
  • Watching NAT Gateway data processing costs.
  • Avoiding surprise cross-AZ data transfer charges in chatty architectures.
  • Using queues to smooth spikes instead of overprovisioning every downstream.
  • Setting budgets, alarms, and tags for ownership.

Reliability answers should be specific. "Multi-AZ" is not magic. You need health checks, failover behavior, backups, restore tests, runbooks, and clear RTO/RPO goals. If using RDS, know whether read replicas, Multi-AZ standby, or Aurora replicas solve your actual problem.

Prep checklist for AWS interviews

Be ready to explain:

  • Public versus private subnet based on route tables.
  • Security group versus NACL.
  • NAT Gateway, Internet Gateway, and VPC endpoints.
  • IAM role, trust policy, permissions policy, and STS.
  • How an app running on EC2/ECS/Lambda gets credentials.
  • When to use S3, EBS, EFS, RDS, DynamoDB, SQS, SNS, EventBridge.
  • One architecture you have built or would build, including failure modes.
  • One AWS outage or incident you debugged and what signal found it.
  • Cost controls you would put in from day one.

How to talk about AWS in interviews and resumes

Avoid vague bullets like "worked with AWS services." Show architecture and constraint:

  • "Designed a two-AZ VPC layout with private application subnets, scoped security groups, and VPC endpoints for S3/ECR traffic."
  • "Replaced long-lived AWS keys with task roles and least-privilege IAM policies for ECS workloads."
  • "Introduced SQS buffering and idempotent consumers to protect payment workflows from downstream outages."
  • "Reduced NAT traffic by routing high-volume AWS service calls through VPC endpoints."

AWS interview questions reward candidates who can reason from first principles: where traffic flows, who is allowed to do what, what fails, and what the bill or runbook looks like afterward. If you can explain VPC paths, IAM trust, and the core services without hiding behind acronyms, you are prepared for most AWS interviews in 2026.