What is Cluster Auto-Scaler (Pre-Karpenter)?
Cluster Auto-Scaler (CA) is a Kubernetes component that works with AWS Auto Scaling Groups (ASGs) to automatically adjust the size of the cluster's worker node fleet (adding or removing nodes) based on resource demands. It scales nodes up or down within predefined node groups, ensuring efficient resource utilisation.
- Scaling Up: Adding new nodes when pending pods cannot be scheduled due to resource shortages (e.g., insufficient CPU or memory).
- Scaling Down: Removing underutilised nodes when workloads decrease, and the nodes are idle.
- Node Groups: CA is tied to fixed node groups or auto-scaling groups defined in the cloud provider.
- Granularity: Nodes are scaled up or down at the group level, which means the shape (instance type and size) of each node group is fixed upfront.
- Scheduling: It relies on static resource templates (like instance types & size) defined in the node group configuration.
- Time: Scaling up and down with Cluster Auto-Scaler takes time: when demand changes, Kubernetes notifies CA, which in turn calls the AWS Auto Scaling Group APIs to add or remove servers.
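For contrast, here is a minimal, hypothetical sketch of the node group model that CA depends on, expressed as a Terraform managed node group (the names, instance type and sizes are assumptions, not from this post); CA can only add or remove nodes of this exact shape, within the min/max bounds defined upfront:
# Hypothetical managed node group backed by an Auto Scaling Group.
# Cluster Auto-Scaler can only add or remove nodes of this exact shape,
# between min_size and max_size.
resource "aws_eks_node_group" "general" {
  cluster_name    = var.cluster_name
  node_group_name = "general-purpose"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.private_subnet_ids

  instance_types = ["m5.large"] # instance type is fixed upfront

  scaling_config {
    desired_size = 3
    min_size     = 1
    max_size     = 10 # CA scales within these bounds only
  }
}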
What is Karpenter?
Karpenter aims to address the gaps in Cluster Auto-Scaler. Karpenter is an open-source, flexible, high-performance cluster auto-scaler designed to optimise Kubernetes cluster resource management. It dynamically provisions and de-provisions nodes in response to changing workloads in real time. Unlike the traditional Kubernetes Cluster Auto-Scaler (CA), which works on static scaling logic, Karpenter operates with a more dynamic and flexible approach by considering workload requirements and underlying infrastructure constraints. Karpenter automatically provisions new nodes in response to unschedulable pods: it observes events within the Kubernetes cluster and then sends direct API calls to the underlying cloud provider.
- Designed to handle the full flexibility of the cloud: Karpenter can efficiently address the full range of instance types available through AWS, with the flexibility to handle hundreds of instance types, zones, and purchase options.
- Quick node provisioning: Karpenter manages each instance directly, without additional orchestration mechanisms like node groups. This enables it to retry in milliseconds instead of minutes when capacity is unavailable, and to leverage diverse instance types, availability zones, and purchase options without creating hundreds of node groups.
- Intelligent Node Optimisation: Karpenter consolidates workloads by automatically identifying underutilised nodes and rescheduling their pods onto other nodes. It terminates unnecessary nodes to reduce costs and improve resource efficiency, within the limits allowed by the NodePool Disruption Budget.
- Real-Time Consolidation: Unlike traditional auto-scalers, which rely on scheduled scale-down operations, Karpenter performs real-time consolidation by continuously evaluating the cluster state to ensure optimal resource usage (again within the limits of the NodePool Disruption Budget settings).
- Cost Savings: By consolidating workloads and terminating underutilised nodes, Karpenter reduces unnecessary infrastructure spending, especially in environments with varying workloads.
- Pod-Aware Decisions: Karpenter considers pod-specific requirements, such as affinity rules, taints and tolerations, during consolidation, ensuring workloads are efficiently and seamlessly rescheduled without disruption.
- Spot Instances: Karpenter’s interruption queue (“KarpenterInterruptionQueue”, an SQS queue) allows the Karpenter controller to receive notification messages from other AWS services about the health and status of instances. The interruption queue also makes Karpenter aware of spot interruption notices, which are sent two minutes before a spot instance is reclaimed by EC2. This allows Karpenter to spin up replacement capacity and move pods off the to-be-removed spot instance seamlessly. Hence, Karpenter with spot instances can be used for development environments to reduce infrastructure cost by up to 90%, while production workloads that cannot tolerate frequent interruptions can use reserved or on-demand instances.
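As an illustration of the provisioning flow described above, a minimal sketch of a workload whose resource requests exceed the cluster's current capacity is shown below (the deployment name, image and sizes are illustrative, not from this post); the resulting Pending pods are what Karpenter reacts to by launching right-sized nodes:
# Hypothetical workload whose combined resource requests cannot fit on the current nodes.
# The unschedulable (Pending) pods trigger Karpenter to provision new capacity.
resource "kubernetes_deployment" "demo" {
  metadata {
    name = "karpenter-demo-app"
  }
  spec {
    replicas = 20
    selector {
      match_labels = { app = "karpenter-demo-app" }
    }
    template {
      metadata {
        labels = { app = "karpenter-demo-app" }
      }
      spec {
        container {
          name  = "app"
          image = "nginx:latest"
          resources {
            requests = {
              cpu    = "1"
              memory = "1Gi"
            }
          }
        }
      }
    }
  }
}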
How does Karpenter work?
NodePool:
A NodePool in Karpenter represents a logical grouping of nodes that share similar characteristics. It allows you to define a set of workloads that require specific node attributes, such as instance types, zones, or labels.
Key Features:
- Workload-Aware Provisioning: NodePools enable defining constraints and preferences for specific workloads, such as instance types, regions, and zones. This allows Karpenter to provision nodes that precisely match the needs of the associated workloads.
- Labels and Taints: You can define labels and taints in a NodePool to influence pod scheduling and ensure workload isolation.
- For example, a NodePool can be created for GPU workloads with specific instance types like p3 for graphics-intensive workloads, while another NodePool handles general-purpose workloads using t3 and m5 instances (see the sketch below).
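A rough sketch of that two-NodePool setup follows, assuming both pools reference a shared EC2NodeClass named defaultnodeclass (as in the full example later in this post); the pool names, taint key and requirement values are illustrative:
# Hypothetical GPU NodePool: restricted to the p3 instance family and tainted
# so that only workloads with a matching toleration land on these nodes.
resource "kubernetes_manifest" "gpu_node_pool" {
  manifest = {
    apiVersion = "karpenter.sh/v1beta1"
    kind       = "NodePool"
    metadata   = { name = "gpu-pool" }
    spec = {
      template = {
        spec = {
          nodeClassRef = { name = "defaultnodeclass" }
          taints = [
            { key = "workload-type", value = "gpu", effect = "NoSchedule" }
          ]
          requirements = [
            {
              key      = "karpenter.k8s.aws/instance-family"
              operator = "In"
              values   = ["p3"]
            }
          ]
        }
      }
    }
  }
}

# Hypothetical general-purpose NodePool reusing the same EC2NodeClass.
resource "kubernetes_manifest" "general_node_pool" {
  manifest = {
    apiVersion = "karpenter.sh/v1beta1"
    kind       = "NodePool"
    metadata   = { name = "general-pool" }
    spec = {
      template = {
        spec = {
          nodeClassRef = { name = "defaultnodeclass" }
          requirements = [
            {
              key      = "karpenter.k8s.aws/instance-family"
              operator = "In"
              values   = ["t3", "m5"]
            }
          ]
        }
      }
    }
  }
}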
NodeClass:
A NodeClass, also known as an EC2NodeClass, is a Karpenter resource that defines the server-level details for provisioning nodes. It abstracts away cloud-provider-specific configurations, enabling consistency and flexibility.
Key Features:
- Server Configuration: Specifies details like the AMI (Amazon Machine Image), security groups, subnets, and instance profile. This is where the cloud provider's specifics are managed (instance types and spot/on-demand choices are handled by the NodePool's requirements).
- Reusability: A NodeClass is reusable across multiple NodePools, allowing administrators to centralise infrastructure definitions.
- Decoupling Workloads and Infrastructure: By separating infrastructure details (NodeClass) from workload definitions (NodePool), Karpenter enables modular and maintainable configurations.
How to deploy Karpenter?
Karpenter can be installed into an EKS cluster using, among others, the following options:
- Using Helm [Reference : Karpenter installation using Helm]
- Using Addons
In the below example, Karpenter is added to the EKS cluster using a Terraform Kubernetes add-on module. [Reference: Karpenter installation using Terraform module addon]
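A minimal sketch of such an installation is shown below; the module source (aws-ia/eks-blueprints-addons), version pin and cluster inputs are assumptions and should be adapted to your environment, though the module name kubernetes_addons matches the output reference used in the NodeClass example later in this post:
# Hypothetical Karpenter installation via a Terraform add-ons module
# (assumption: the EKS Blueprints add-ons module is the "Terraform module addon"
# referenced above). enable_karpenter deploys the Karpenter Helm chart and the
# supporting AWS resources.
module "kubernetes_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~> 1.0"

  cluster_name      = var.cluster_name
  cluster_endpoint  = var.cluster_endpoint
  cluster_version   = var.cluster_version
  oidc_provider_arn = var.oidc_provider_arn

  enable_karpenter = true
}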
How to setup NodeClass?
NodeClass or EC2NodeClass can be deployed using the below config.
- Name: Refers to the EC2NodeClass object name created in EKS. (For example, kubectl describe ec2nodeclass NODECLASS_NAME -n kube-system will display the NodeClass object information in Kubernetes.)
- spec.amiFamily: The AMI family dictates the default bootstrapping logic for nodes provisioned through this EC2NodeClass. Available options are AL2, AL2023, Bottlerocket, Windows2019, and Windows2022.
- spec.amiSelectorTerms: Specifies the AMIs to be used. AMIs are discovered through alias, id, owner, name, and tags.
- spec.metadataOptions: Refers to the EC2 instance metadata service configuration, which can be queried via the link-local address 169.254.169.254.
- httpPutResponseHopLimit: A value of 1 ensures the metadata can only be queried from the instance itself.
- spec.blockDeviceMappings: Refers to the EBS volume configurations that will be attached to nodes during creation.
- spec.userData: Refers to custom commands to be executed during instance creation.
- spec.role: Refers to the instance profile role. The Karpenter EKS add-on creates this role as part of installing Karpenter; it can be referenced via the module output (module.kubernetes_addons.karpenter.node_iam_role_name).
- spec.securityGroupSelectorTerms & spec.subnetSelectorTerms: Refers to selecting one or many security groups and subnets (an AND/OR filter combination applied to the queried results).
- spec.associatePublicIPAddress: Set to false when the instances are hosted in private subnets and do not need a public IP.
- spec.tags: Refers to all tags that need to be added to the nodes created by Karpenter.
Example config :
resource "kubernetes_manifest" "default_node_class" {
manifest = {
apiVersion = "karpenter.k8s.aws/v1beta1"
kind = "EC2NodeClass"
metadata = {
name = "defaultnodeclass"
labels = {
env = "karpenter-demo"
}
}
spec = {
amiFamily = "Bottlerocket"
amiSelectorTerms = [
{
id = "ami-xxxxxxxxxxxxxx"
}
]
metadataOptions = {
httpEndpoint = "enabled"
httpProtocolIPv6 = "disabled"
httpPutResponseHopLimit = 1
httpTokens = "required"
}
blockDeviceMappings = [
{
deviceName = "/dev/xvda"
ebs = {
volumeSize = "50Gi"
volumeType = "gp3"
encrypted = true
kmsKeyID = aws_kms_key.karpenter.key_id
deleteOnTermination = true
}
},
{
deviceName = "/dev/xvdb"
ebs = {
volumeSize = "50Gi"
volumeType = "gp3"
encrypted = true
kmsKeyID = aws_kms_key.karpenter.key_id
deleteOnTermination = true
}
}
]
userData = <<-EOF
[settings.kernel]
lockdown = "integrity"
[settings.host-containers.admin]
enabled = true
# The control host container provides out-of-band access via SSM.
# It is enabled by default, and can be disabled if you do not expect to use SSM.
# This could leave you with no way to access the API and change settings on an existing node!
[settings.host-containers.control]
enabled = true
[settings.bootstrap-containers.cis-bootstrap]
mode = "always"
source = "${data.aws_caller_identity.current.account_id}.dkr.ecr.ap-southeast-2.amazonaws.com/bottlerocket-cis-bootstrap-image:latest"
[settings.kernel.modules.sctp]
allowed = false
[settings.kernel.modules.udf]
allowed = false
[settings.kernel.sysctl]
"net.ipv4.conf.all.accept_redirects" = "0"
"net.ipv4.conf.all.log_martians" = "1"
"net.ipv4.conf.all.secure_redirects" = "0"
"net.ipv4.conf.all.send_redirects" = "0"
"net.ipv4.conf.default.accept_redirects" = "0"
"net.ipv4.conf.default.log_martians" = "1"
"net.ipv4.conf.default.secure_redirects" = "0"
"net.ipv4.conf.default.send_redirects" = "0"
"net.ipv6.conf.all.accept_redirects" = "0"
"net.ipv6.conf.default.accept_redirects" = "0"
EOF
role = module.kubernetes_addons.karpenter.node_iam_role_name
securityGroupSelectorTerms = [
{
tags = {
"karpenter.sh/discovery" = var.cluster_name
}
}
]
subnetSelectorTerms = [
{
tags = {
"Name" = "private-2a"
}
},
{
tags = {
"Name" = "private-2b"
}
},
{
tags = {
"Name" = "private-2c"
}
}
]
tags = merge(var.common_tags,
var.additional_ec2_tags,
{
Name = "karpenter"
})
associatePublicIPAddress = false
}
}
}
How to setup NodePool?
Like NodeClass, a NodePool can be configured by deploying the below config.
- Name: Refers to the NodePool object name created in EKS (for example: kubectl get nodepool NODEPOOL_NAME -n kube-system).
- spec.template.metadata.labels: Refers to the list of labels added to nodes during the creation process (pods can target these using a node selector).
- spec.template.spec.nodeClassRef: Refers to the NodeClass used by the NodePool.
- spec.template.spec.taints: Refers to the list of taints added to nodes (pod tolerations can be configured accordingly to assign pods to the respective nodes; see the workload sketch after the example config below).
- spec.template.spec.requirements: This section allows you to constrain the provisioned capacity with inputs such as:
- instance family
- instance generation
- instance CPU
- instance capacity type (spot / on-demand)
- instance zone
- etc. (refer documentation for complete list)
- spec.disruption: The NodePool Disruption Budget allows administrators to configure maintenance and business-hours windows. Pod consolidation and optimisation can only happen during the maintenance window.
- nodes: "10%" means Karpenter can alter only 10% of the NodePool's capacity at any given time. If capacity must be reduced to 50%, multiple iterations of 10% will be performed to achieve the result.
- In the below example config, the business hours window (during which Karpenter will not disrupt nodes) is Monday to Friday, 6 AM AEST until 12 AM AEST the following day.
- spec.limits: The maximum combined CPU / memory across all nodes that Karpenter can provision for this NodePool.
Example config :
resource "kubernetes_manifest" "default_node_pool" {
manifest = {
apiVersion = "karpenter.sh/v1beta1"
kind = "NodePool"
metadata = {
name = "default-pool"
labels = {
env = "Karpenter-demo"
}
}
spec = {
template = {
spec = {
metadata = {
labels = {
"karpenter/zone.type" = "internal"
}
}
spec = {
nodeClassRef = {
name = "defaultnodeclass"
}
taints = [
{
key = "karpenter/zone.type",
value = "internal",
effect = "NoSchedule"
}
],
requirements = [
{
key = "karpenter.k8s.aws/instance-family"
operator = "In"
values = ["r6a"]
},
{
key = "karpenter.k8s.aws/instance-generation"
operator = "Gt"
values = ["2"]
},
{
key = "karpenter.k8s.aws/instance-cpu"
operator = "In"
values = ["2", "8"]
},
{
key = "karpenter.k8s.aws/instance-cpu"
operator = "Lt"
values = ["33"]
},
{
key = "karpenter.sh/capacity-type"
operator = "In"
values = ["spot"]
}
]
}
}
disruption = {
consolidationPolicy = "WhenUnderutilized"
expireAfter = "10m" # Previous value 720h0m0
budgets = [
{
"nodes" : "10%"
},
{
"schedule" : "0 20 * * sun-thu",
"duration" : "18h",
"nodes" : "0"
}
]
}
limits = {
cpu = "64"
memory = "512Gi"
}
weight = 1
}
}
}
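As mentioned in the taints description above, pods that should land on this NodePool need a matching toleration and node selector. Below is a minimal, hypothetical workload sketch (the app name, image and resource requests are illustrative) targeting the taint and label defined in the example config:
# Hypothetical workload that tolerates the NodePool's taint and selects its label,
# so Karpenter schedules it onto (and provisions) "default-pool" nodes.
resource "kubernetes_deployment" "internal_app" {
  metadata {
    name = "internal-app"
  }
  spec {
    replicas = 3
    selector {
      match_labels = { app = "internal-app" }
    }
    template {
      metadata {
        labels = { app = "internal-app" }
      }
      spec {
        node_selector = {
          "karpenter/zone.type" = "internal"
        }
        toleration {
          key      = "karpenter/zone.type"
          operator = "Equal"
          value    = "internal"
          effect   = "NoSchedule"
        }
        container {
          name  = "app"
          image = "nginx:latest"
          resources {
            requests = {
              cpu    = "500m"
              memory = "512Mi"
            }
          }
        }
      }
    }
  }
}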
How to migrate from CA to Karpenter?
It’s simple. Deploy Karpenter into an existing cluster, create Karpenter NodePools with taints and labels similar to the existing node groups, and gradually reduce the existing CA node group count to zero so that pods move onto Karpenter-provisioned nodes.
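For the scale-down step, a hedged sketch of what this can look like in Terraform, assuming the existing CA-managed node group is defined as an aws_eks_node_group resource (resource and variable names are hypothetical):
# Hypothetical existing node group previously scaled by Cluster Auto-Scaler.
# Stepping desired/min size down to zero terminates its nodes, forcing pods
# to be rescheduled onto Karpenter-provisioned capacity.
resource "aws_eks_node_group" "legacy" {
  cluster_name    = var.cluster_name
  node_group_name = "legacy-ca-managed"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.private_subnet_ids

  scaling_config {
    desired_size = 0 # gradually stepped down from the original size
    min_size     = 0
    max_size     = 1 # max_size must remain at least 1
  }
}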
Conclusion:
In conclusion,
- Deploying Karpenter on new or existing Kubernetes clusters simplifies administration by automating tasks and reducing operational overhead.
- It enhances resource utilisation through pod consolidation and the removal of underutilised servers.
- With support for diverse instance types, integration with AWS Spot termination notifications, and efficient pod migration, Karpenter is a cost-effective solution for development environments.
Reach out to Cevo if you want to swiftly deploy Karpenter to your existing or new Kubernetes cluster.
In the next blog, let’s deep dive into EKS Auto Mode, a new feature released by AWS at re:Invent 2024. EKS Auto Mode uses Karpenter behind the scenes. As the name suggests, with EKS Auto Mode, AWS handles most of the heavy lifting, further reducing operational overhead for administrators and allowing them to focus on what truly matters for their customers.