AWS

This guide walks through deploying NKP on AWS with nic (the Nebari Infrastructure Core CLI), from an empty AWS account to a running cluster you can manage and tear down.

What your team gets

When nic deploy finishes, your team will have all standard NKP services plus:

A managed Kubernetes cluster ready for workloads (AWS EKS, multi-AZ by default).
Optional shared storage (EFS) that pods on any node can mount when enabled in your config.

Prerequisites

AWS account

You'll need access to an AWS account you can deploy into. If you don't have one, sign up for AWS.

Region: pick one where EKS is available and that has at least two availability zones. The example config uses us-west-2 with zones us-west-2a and us-west-2b.
Quotas: confirm your region has enough EKS service quota for the cluster you're about to create. New accounts often need an increase.
Cost: NKP does not fit within the free tier; EKS, NAT gateways, and EFS all bill from day one.
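One way to confirm that a region offers at least two zones is to ask the AWS CLI. The sketch below is not part of nic: the function names are hypothetical, and it assumes the AWS CLI is installed and already has working credentials.

```shell
# Hypothetical helper (not part of nic): count the zones a region offers.
count_available_azs() {
  region="$1"
  aws ec2 describe-availability-zones \
    --region "$region" \
    --filters Name=state,Values=available Name=zone-type,Values=availability-zone \
    --query 'AvailabilityZones[].ZoneName' \
    --output text | wc -w | tr -d ' '
}

# Succeeds only when the region has at least two usable zones.
region_has_two_azs() {
  [ "$(count_available_azs "$1")" -ge 2 ]
}
```

For example, `region_has_two_azs us-west-2 && echo "ok to deploy"` prints the message only when two or more zones are available.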

IAM permissions

Running nic deploy (and nic destroy) needs permissions across S3, EC2, EKS, IAM, EFS, SSM, CloudWatch Logs, and Elastic Load Balancing.

For a first deployment, admin credentials are the fastest path. For production, create a customer-managed policy from this least-privilege IAM policy and attach it to your deploy user or role.

Note: The policy grants the minimum for a complete cluster with all optional features (EFS, brownfield VPC adoption). If you don't use those, you can omit the corresponding permissions.

Install nic

Follow the Install NIC guide to download and install the nic CLI for your platform.

GitOps repository

See GitOps repository in the Prepare to deploy guide.

Secrets and credentials

From inside your GitOps repo clone, download the template:

cd /path/to/your-gitops-repo
curl -o .env https://raw.githubusercontent.com/nebari-dev/nebari-infrastructure-core/main/.env.example

Then uncomment and fill in the AWS block and the GitOps tokens. For .gitignore setup and GitOps token configuration, see Secrets and credentials in the Prepare to deploy guide.

AWS credentials

NIC uses the AWS Go SDK's standard credential chain, so any of these work:

AWS_PROFILE=<name>: points to an SSO or static profile in ~/.aws/config. For SSO, run aws sso login --profile <name> first so the session is valid.
AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY: static IAM user keys. Use only if you've created an IAM user with the required permissions.
AWS_SESSION_TOKEN (in addition to the above): required if your keys are temporary (e.g., copied from the AWS access portal's "Access keys" popup).

Pick one pattern for .env:

AWS_PROFILE=my-sso-profile  # SSO / IAM Identity Center
# OR static keys:
# AWS_ACCESS_KEY_ID=AKIA...
# AWS_SECRET_ACCESS_KEY=...
# AWS_SESSION_TOKEN=...     # only if keys are temporary

Verify the credentials work before deploying:

set -a          # export every variable that .env defines so the aws CLI can see it
source .env
set +a
aws sts get-caller-identity

You should see your AWS account ID and the role/user the credentials resolve to.
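Before calling AWS at all, a small pre-flight check can confirm that the environment selects exactly one of the credential patterns listed above. This is a sketch, not part of nic; `aws_credential_pattern` is a hypothetical name, and treating "profile plus keys" as a problem is this sketch's own choice, following the "pick one pattern" advice.

```shell
# Hypothetical pre-flight check (not part of nic): report which credential
# pattern the current environment selects.
aws_credential_pattern() {
  has_profile=0
  has_keys=0
  if [ -n "${AWS_PROFILE:-}" ]; then
    has_profile=1
  fi
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    has_keys=1
  fi
  case "$has_profile$has_keys" in
    10) echo "profile" ;;
    01)
      # A session token alongside the keys means they are temporary.
      if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
        echo "temporary-keys"
      else
        echo "static-keys"
      fi
      ;;
    11) echo "ambiguous"; return 1 ;;   # both patterns set; pick one
    *)  echo "none"; return 1 ;;        # nothing usable set
  esac
}
```

Run it after loading .env; any result other than profile, static-keys, or temporary-keys means the file needs another look.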

Cost considerations

An NKP deployment provisions several AWS services that bill from day one. Check the AWS pricing pages for current rates in your region.

EKS control plane: flat per-cluster hourly rate.
EC2 instances in your node groups: instance hours per running node.
EBS root volumes on each node: per-GB-month for node disks.
NAT gateways: one per AZ (minimum two, since EKS requires two AZs). Hourly rate per gateway plus per-GB data processing.
NLBs: the AWS Load Balancer Controller creates one for ingress. Hourly rate plus Network Load Balancer Capacity Units (NLCU).
EFS: per-GB-month for shared cluster storage. Bursting throughput is the default and is cheap at low utilization.
VPC interface endpoints: around nine created by default. Hourly rate per endpoint per AZ, plus per-GB data processing.
KMS: one customer-managed key for EKS secrets encryption. Per-key-month plus per-request.
CloudWatch Logs: EKS control-plane log ingestion and storage.
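To see how the per-AZ items in the list above multiply, the following sketch counts the resources that accrue an hourly charge. It deliberately contains no prices; multiply each count by the current rate from the AWS pricing pages. The function name is hypothetical, and the default of nine endpoints is only the approximate figure quoted above.

```shell
# Hypothetical sketch: count the resources that accrue an hourly charge.
# Arguments: number of AZs, number of nodes, number of interface endpoints.
hourly_billed_units() {
  azs="$1"
  nodes="$2"
  endpoints="${3:-9}"   # approximate default from the list above
  echo "eks_control_planes=1"
  echo "ec2_nodes=$nodes"
  echo "nat_gateways=$azs"                        # one per AZ
  echo "load_balancers=1"                         # one NLB for ingress
  echo "endpoint_az_pairs=$((endpoints * azs))"   # billed per endpoint per AZ
}
```

With two AZs and three nodes, `hourly_billed_units 2 3` reports two NAT gateways and eighteen endpoint-AZ pairs, which is why the endpoints can matter more than they first appear.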

Configuration

Pick the starter config that matches your DNS setup:

aws-config.yaml for any DNS provider: you'll create A/CNAME records manually at deploy time using values nic prints.
aws-config-with-dns.yaml for Cloudflare-hosted domains: nic creates the records automatically.

Download the one you want into the directory you'll deploy from:

# Pick one:
curl -O https://raw.githubusercontent.com/nebari-dev/nebari-infrastructure-core/main/examples/aws-config.yaml
curl -O https://raw.githubusercontent.com/nebari-dev/nebari-infrastructure-core/main/examples/aws-config-with-dns.yaml

Note: In later steps, <config-file> refers to the local copy you just created (e.g., aws-config.yaml).

At minimum, edit these fields:

project_name: my-cluster  # lowercase letters, digits, and hyphens
domain: nebari.example.com  # a hostname you own

certificate:
  acme:
    email: you@example.com  # required for Let's Encrypt; renewal notices go here

git_repository:
  url: "https://github.com/<your-org>/<your-gitops-repo>.git"
  path: clusters/my-cluster  # subdirectory in the repo; conventionally matches project_name
  auth:
    token_env: GIT_TOKEN  # matches the GIT_TOKEN set in .env

cluster:
  aws:
    region: us-west-2  # an EKS-supported region
    availability_zones:
      - us-west-2a  # at least two AZs
      - us-west-2b

# Only if you used aws-config-with-dns.yaml:
dns:
  cloudflare:
    zone_name: example.com  # your Cloudflare zone (parent of `domain`)

Cloudflare DNS

To use aws-config-with-dns.yaml, generate a Cloudflare API token with Zone:Read and DNS:Edit permissions on the zone in dns.cloudflare.zone_name. See Cloudflare DNS for how to add it to .env.
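Because dns.cloudflare.zone_name has to be the parent of domain, a quick string check can catch a mismatch before deploying. This is a minimal sketch, not something nic provides; `domain_in_zone` is a hypothetical name.

```shell
# Hypothetical check (not part of nic): succeed when the domain equals the
# zone or is a subdomain of it.
domain_in_zone() {
  domain="$1"
  zone="$2"
  case "$domain" in
    "$zone" | *".$zone") return 0 ;;   # exact match or ends with ".<zone>"
    *) return 1 ;;
  esac
}
```

For example, `domain_in_zone nebari.example.com example.com` succeeds, while a domain under a different zone fails.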

For the full schema (brownfield VPC adoption, custom KMS keys, advanced node-group options, Longhorn, log types), see the NIC configuration reference.

Deploy and verify

Follow Deploy a cluster to deploy and verify. The first deployment takes at least 30 minutes (about 10 for the EKS control plane, then ArgoCD syncing).

After deploy, DNS handling depends on which config you chose: aws-config.yaml requires a manual A/CNAME record; aws-config-with-dns.yaml creates DNS records automatically. See Cloudflare DNS for details.

Note: Verifying on AWS also needs the AWS CLI, because the generated kubeconfig calls aws eks get-token to authenticate. If kubectl fails with "Token has expired and refresh failed", your AWS SSO session timed out (default 8 hours); run aws sso login --profile <aws-profile> and retry.
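The SSO recovery step can be wrapped so that a login is triggered only when the session is actually stale. The helper below is a sketch under that assumption; `ensure_sso_session` is a hypothetical name and is not part of nic.

```shell
# Hypothetical helper (not part of nic): refresh the SSO session only when
# the current credentials no longer resolve.
ensure_sso_session() {
  profile="$1"
  if aws sts get-caller-identity --profile "$profile" >/dev/null 2>&1; then
    echo "session valid"
  else
    # Opens the browser-based SSO flow for the given profile.
    aws sso login --profile "$profile" && echo "session refreshed"
  fi
}
```

Call it with your profile name before running kubectl, for example `ensure_sso_session my-sso-profile`.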

IAM roles `nic` creates in your account

nic deploy creates a few IAM roles in your account so the cluster, the nodes, and the in-cluster controllers can each do their AWS work safely:

Role	Purpose
EKS cluster role	Lets EKS create the network interfaces, security groups, and log groups the cluster needs.
EKS node-group role	Grants nodes the standard EKS worker, ECR-read-only, and CNI policies.
AWS Load Balancer Controller role	Lets the in-cluster controller create and manage load balancers for ingress.
EBS CSI driver role	Lets the in-cluster driver mount EBS volumes into pods.
EFS CSI driver role (if EFS enabled)	Lets the in-cluster driver mount EFS into pods.

First sign-in

See Keycloak authentication for first sign-in.

Update an existing deployment

To change something about a running cluster (scale a node group, add a GPU group, change tags, switch EFS throughput), edit your config and re-run the deploy commands as described in Update a cluster.

Warning: Changing region, project_name, or vpc_cidr_block triggers destructive resource recreation. Treat these as one-way decisions.
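A simple guard against changing a one-way field by accident is to compare those fields between a saved copy of the last deployed config and the edited one. The sketch below assumes each key sits on its own line in the YAML, as in the starter configs; it uses plain text matching rather than a YAML parser, and both function names are hypothetical.

```shell
# Hypothetical guard (not part of nic): print the one-way fields from a config,
# with indentation and trailing comments stripped.
one_way_fields() {
  grep -E '^[[:space:]]*(region|project_name|vpc_cidr_block):' "$1" \
    | sed -E 's/[[:space:]]+#.*$//; s/^[[:space:]]+//'
}

# Succeeds when the one-way fields are identical in both files.
one_way_fields_unchanged() {
  [ "$(one_way_fields "$1")" = "$(one_way_fields "$2")" ]
}
```

For example, `one_way_fields_unchanged deployed.yaml aws-config.yaml || echo "one-way field changed"` flags the edit before you re-run the deploy; deployed.yaml here stands for whatever copy of the previous config you keep.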

Upgrade Kubernetes version

EKS requires one-minor-version increments — you cannot skip versions or downgrade. To go from 1.32 to 1.34 you must upgrade twice:

1.32 → 1.33 → 1.34

Skipping a version (for example, 1.32 → 1.34 directly) is rejected by EKS during deploy.

Set kubernetes_version under cluster.aws in your config:

cluster:
  aws:
    kubernetes_version: "1.33"   # was "1.32"

EKS accepts bare minor versions ("1.33") — patch versions are not required.
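The one-minor-version rule can be sketched as a helper that lists every intermediate value kubernetes_version has to take. This is an illustration only; `upgrade_path` is a hypothetical name, and it handles bare major.minor strings within a single major version.

```shell
# Hypothetical helper: print each minor version to step through, in order.
upgrade_path() {
  from="$1"
  to="$2"
  major="${from%%.*}"    # text before the first dot
  minor="${from#*.}"     # text after the first dot
  target="${to#*.}"
  if [ "${to%%.*}" != "$major" ] || [ "$target" -lt "$minor" ]; then
    echo "unsupported: downgrades and major-version changes are not handled" >&2
    return 1
  fi
  path="$from"
  while [ "$minor" -lt "$target" ]; do
    minor=$((minor + 1))
    path="$path -> $major.$minor"
  done
  echo "$path"
}
```

Running `upgrade_path 1.32 1.34` prints `1.32 -> 1.33 -> 1.34`; each arrow is one config edit followed by one deploy.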

EKS upgrades in two phases:

Control plane upgrades first. The Kubernetes API server and control-plane components are updated to the new version. This takes around 10 minutes and is handled by AWS.
Node groups roll one at a time. EKS launches a replacement node, waits for it to become Ready, then drains and terminates the old one — one node at a time per node group. Node groups upgrade sequentially after the control plane finishes. Plan for 20–40 minutes total depending on the number of node groups.

After upgrading, verify your EKS managed add-ons (kube-proxy, CoreDNS, VPC CNI) are on versions compatible with the new cluster version — check the Add-ons tab in the AWS EKS console and update any that are flagged.

See Upgrade Kubernetes version for the upgrade commands and post-upgrade verification steps.

Destroy

Run the destroy commands as described in Destroy a cluster.

nic destroy removes EKS, node groups, EFS, VPC components, and the state bucket. If a resource fails to delete (commonly, a leftover load balancer from the cluster's ingress), remove it manually in the AWS console before retrying with --force.

Always confirm in the AWS console that no orphan resources remain. NAT gateways, load balancers, and EBS volumes can keep billing if they're not cleaned up.
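The same check can be done from the command line. The sketch below only lists resources and deletes nothing; it is not part of nic, the function name is hypothetical, and it reports everything of each type in the region, so expect entries that belong to other workloads in a shared account.

```shell
# Hypothetical audit (not part of nic): list resources that commonly keep
# billing after a teardown. Review the output; nothing is deleted.
list_possible_orphans() {
  region="$1"
  echo "# NAT gateways still available"
  aws ec2 describe-nat-gateways --region "$region" \
    --filter Name=state,Values=available \
    --query 'NatGateways[].NatGatewayId' --output text
  echo "# Load balancers"
  aws elbv2 describe-load-balancers --region "$region" \
    --query 'LoadBalancers[].LoadBalancerArn' --output text
  echo "# Unattached EBS volumes"
  aws ec2 describe-volumes --region "$region" \
    --filters Name=status,Values=available \
    --query 'Volumes[].VolumeId' --output text
}
```

Run `list_possible_orphans us-west-2` after the destroy finishes; an empty line under each heading means nothing of that type is left in the region.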
