Boost your Kubernetes cluster’s Autoscaler on AWS EKS with Karpenter

Nicolas Vogt
Jul 16, 2022 · 6 min read


Photo by Randy Fath on Unsplash

Introduction

When you manage a Kubernetes cluster in production, it can sometimes be challenging to size it correctly, especially when you have to deal with fast-changing workloads.

The safest approach is to oversize your cluster with more nodes than needed, but then you end up paying for resources you do not consume. The cheap approach is to run with the minimum required, but you will most likely hit resource contention before you can scale out during an unexpected surge in workload.

Cloud providers offer autoscaling solutions, but they are not reactive enough for certain use cases. It is also challenging to achieve a true pay-as-you-go model with AWS EKS, and this is where Karpenter can help.

What is Karpenter?

Karpenter is a just-in-time capacity operator deployed inside an AWS EKS cluster. It watches for pods that cannot be scheduled on the existing nodes and takes action when demand does not match the available resources. In other words, the operator listens to what is being deployed in the cluster: when the current nodes cannot serve the demand, for example when someone starts a new deployment of a Java application but the cluster does not have enough spare capacity to run it, Karpenter provisions a new node before the pods start. The opposite is also true: when a deployment is scaled down, Karpenter scales the cluster down accordingly.

diagram from stackovercloud

Okay, but how is this different from existing autoscaling solutions? The usual approach is to set up monitoring at the node level and take action when CPU or RAM crosses a certain threshold. In most cases this is enough, but reacting to system metrics is risky: metric-collector processes are among the first to be de-prioritized when CPU becomes scarce on a machine, so you may be notified very late that your node is experiencing resource contention. Others choose to react to business metrics such as the number of concurrent users, transactions, and so on, which requires an accurate estimate of the resources needed per unit of whatever metric you picked. And it does not cover you when a job gets stuck or runs wild.

Karpenter scales your Kubernetes cluster up and down only when it is really needed, and it does so proactively, which avoids the problems mentioned above. Still, it does not protect you from bugs and other unwanted behavior, and it is not a replacement for proper monitoring.

Before we start

First of all, there are some requirements to fulfill if you intend to follow this guide. You will need at least the following tools installed and configured:

  • git
  • the AWS CLI, with credentials for your AWS account
  • Terraform
  • kubectl

There are multiple ways to deploy Karpenter, but I chose Terraform to make it really easy for you to reproduce. In my repository you will find Terraform scripts to deploy an up-and-running EKS cluster with Karpenter in about 10 minutes. So let’s get started.

You can start by cloning my repository:

~ » git clone https://github.com/nivogt/karpenter-terraform.git

You will find the following files:

  • eks.tf — terraform resources for AWS EKS
  • karpenter.tf — terraform resources for karpenter
  • providers.tf — terraform providers (aws, helm, kubectl)
  • requirements.tf — terraform requirements
  • variables.tf — terraform variables
  • manifests/inflate.yaml — manifest to deploy with kubectl to test your cluster

Deploy

First, make sure the service-linked role needed to launch EC2 Spot instances already exists. If the following command returns the same error as shown below, the service-linked role is already present:

~ » aws iam create-service-linked-role --aws-service-name spot.amazonaws.com
An error occurred (InvalidInput) when calling the CreateServiceLinkedRole operation: Service role name AWSServiceRoleForEC2Spot has been taken in this account, please try a different suffix.

Then, review the variables defined in variables.tf, because the default values may not be compatible with your infrastructure.

If the defaults are not suitable, you can override them either with environment variables or with a terraform.tfvars file. The Terraform documentation describes how to proceed.
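For example, assuming the repository exposes variables such as cluster_name and region (these names are placeholders, check variables.tf for the real ones), a terraform.tfvars file could look like this:

# terraform.tfvars - hypothetical variable names, verify them in variables.tf
cluster_name = "karpenter-demo"
region       = "eu-west-1"

# Alternatively, export them as environment variables before running terraform:
# export TF_VAR_cluster_name="karpenter-demo"
# export TF_VAR_region="eu-west-1"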

Then, download the Terraform providers and modules with the init command:

~ » terraform init
Initializing modules...
Initializing the backend...
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/helm from the dependency lock file
- Reusing previous version of gavinbunney/kubectl from the dependency lock file
- Reusing previous version of hashicorp/tls from the dependency lock file
- Reusing previous version of hashicorp/cloudinit from the dependency lock file
- Using previously-installed hashicorp/aws v4.22.0
- Using previously-installed hashicorp/helm v2.5.1
- Using previously-installed gavinbunney/kubectl v1.14.0
- Using previously-installed hashicorp/tls v3.4.0
- Using previously-installed hashicorp/cloudinit v2.2.0
Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Once Terraform has been successfully initialized, you can plan your deployment and save it to a .tfplan file:

~ » terraform plan -out .tfplan

You can safely ignore the warnings; what matters at this step is that the plan shows:

Plan: 40 to add, 0 to change, 0 to destroy.

The plan has been saved to the .tfplan file. To apply it, type the following command:

~ » terraform apply .tfplan
module.vpc.aws_subnet.public[2]: Creating...
module.vpc.aws_subnet.public[0]: Creating...
module.vpc.aws_subnet.private[0]: Creating...
module.vpc.aws_subnet.public[1]: Creating...
module.vpc.aws_eip.nat[0]: Modifying... [id=eipalloc-023fbbe3a73f172d0]
module.vpc.aws_subnet.private[1]: Creating...
module.vpc.aws_subnet.private[2]: Creating...

After roughly 10 minutes, you should see the following message:

Apply complete! Resources: 40 added, 0 changed, 0 destroyed.

Finally, update your kubectl credentials:

aws eks update-kubeconfig --name karpenter-demo

Test and Validate

Check that there is only one node in your cluster:

~ » kubectl get nodes
NAME STATUS ROLES AGE VERSION
xxx.compute.internal Ready <none> 103s v1.21.12-eks-5308cf7

Then, deploy the manifest provided in my git repository. It is a Deployment with 0 replicas of the pause image.
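For reference, the manifest should look roughly like the inflate example from the Karpenter getting-started guide (a sketch; the exact image tag and resource requests in my repository may differ slightly):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0          # start with no pods, we will scale it up manually
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          # the pause image does nothing, but the CPU request forces Karpenter
          # to provision capacity once the deployment is scaled up
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1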

~ » kubectl apply -f manifest/inflate.yaml

Scale the deployment to 5 replicas:

~ » kubectl scale deployment inflate --replicas 5

You can check that Karpenter has detected the change and requested a new node:

~ » kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller

You should see something like this:

2022-07-16T16:11:14.854Z INFO controller.provisioning Created node with 5 pods requesting {"cpu":"5125m","pods":"7"} from types t3a.2xlarge, c5a.2xlarge, t3.2xlarge, c6i.2xlarge, c5.2xlarge and 145 other(s) {"commit": "062a029", "provisioner": "default"}

You can also check that there are now two nodes in your cluster:

~ » kubectl get nodes
NAME STATUS ROLES AGE VERSION
xxxxxx.compute.internal Ready <none> 5m13s v1.21.12-eks-5308cf7
yyyyyy.compute.internal Ready <none> 109s v1.21.12-eks-5308cf7

Destroy

To undo all the changes, start by removing the inflate deployment:

~ » kubectl delete -f manifest/inflate.yaml
deployment.apps "inflate" deleted

You can check that Karpenter has detected the change and removed the now-empty node:

~ » kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller

You should see something like this:

2022-07-16T16:14:25.042Z INFO controller.termination Cordoned node {"commit": "062a029", "node": "yyyyyy.compute.internal"}
2022-07-16T16:14:25.307Z INFO controller.termination Deleted node {"commit": "062a029", "node": "yyyyyy.compute.internal"}

You can also check that there is only one node remaining:

~ » kubectl get nodes
NAME STATUS ROLES AGE VERSION
xxx.compute.internal Ready <none> 103s v1.21.12-eks-5308cf7

Then, run the terraform destroy command:

~ » terraform destroy

Type yes when prompted and wait for the following message to appear:

Destroy complete! Resources: 40 destroyed.

What next

Before you use it in production, I strongly advise you to read the documentation so you understand how to configure it for your environment.

You can restrict which instance types are used, define different rules for different regions, add taints to your nodes, and more. There is a lot that can be configured around provisioning, resource limits, and scheduling, and you will have to experiment with it yourself, because covering it all here would take too long.
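To give you an idea, here is a hedged sketch of a Provisioner manifest using the karpenter.sh/v1alpha5 API that was current when this article was written. The values are examples only, and the provisioner deployed by the Terraform code in my repository may be configured differently:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # restrict the capacity types and instance types Karpenter may launch (example values)
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["t3.large", "t3.xlarge", "c5.xlarge"]
  # taint the provisioned nodes so that only workloads tolerating it land there
  taints:
    - key: dedicated
      value: example
      effect: NoSchedule
  # cap the total amount of resources this provisioner is allowed to create
  limits:
    resources:
      cpu: "100"
  # terminate nodes that have been empty for 30 seconds
  ttlSecondsAfterEmpty: 30
  # AWS-specific settings (subnets, security groups, AMI) live in an AWSNodeTemplate
  providerRef:
    name: default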

Now that you know how to scale a Kubernetes cluster with Karpenter, you can check out the article I wrote about KEDA. Both technologies can be combined to automate scaling on a Kubernetes cluster.

That’s it! Please feel free to send me your questions or feedback on this topic. Stay tuned, I’ll be back with a new article very soon.
