EKS Anywhere, targeted scale-down of machines on bare-metal instances

Ambar Hassani
7 min read · Aug 6, 2024


This article is part of my EKS Anywhere series: EKS Anywhere, extending the Hybrid cloud momentum | by Ambar Hassani

Recently, I was tasked with designing a bare-metal deployment of EKS Anywhere in which one of the use cases was to ensure that scale-in operations can target specific worker nodes. The reasons are obvious: decommissioning a faulty node, carrying out hardware maintenance on it, and so on.

At the outset, I felt pretty comfortable, given that Tinkerbell, the underlying infrastructure provider for Cluster API/EKS Anywhere on bare metal, associates the cluster with a hardware.csv file. That file lists the machines, with their MAC and IP addresses, that back the nodes in the cluster.
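
Before changing anything, it helps to see how the rows in hardware.csv surface on the management cluster as Tinkerbell Hardware objects. A quick check, assuming the default eksa-system namespace and the management cluster kubeconfig, looks roughly like this:

kubectl get hardware -n eksa-system --kubeconfig <management-cluster-kubeconfig>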

The assumption was that if we remove the corresponding entry from hardware.csv and upgrade the cluster, EKS Anywhere and Tinkerbell will automatically target that particular node for deletion.

The Problem

When I tested this out, I was surprised to find that the assumption did not hold. Instead, a different, seemingly random node was deleted, which prompted me to investigate whether there were any known bugs. In that stride, I landed upon EKSA bare metal cluster scale-in doesn’t honor new hardware.csv file · Issue #8190 · aws/eks-anywhere (github.com)

I breathed a sigh of relief, expecting the above to be the fix for my woes. I was running EKS Anywhere version 0.19.5, and per the GitHub issue, the problem was fixed in 0.19.7. So I decided to upgrade to 0.20.x, confident that the targeted scale-in issue would now be history. With that confidence, I re-ran the use-case test and landed in a rather embarrassing situation: the same issue still persisted.

The logs below (also posted on the above issue via my user id thecloudgarage) illustrate the nature of the issue, even after upgrading EKS Anywhere to 0.20.x.

EKS Anywhere version:

eksctl anywhere version
Version: v0.20.1
Release Manifest URL: https://anywhere-assets.eks.amazonaws.com/releases/eks-a/manifest.yaml
Bundle Manifest URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/69/manifest.yaml

Before starting the scale-in, I have two worker nodes (instance-529 and instance-530):

kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
instance-528 Ready control-plane 4h24m v1.29.5-eks-1109419 10.103.15.163 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-529 Ready <none> 31m v1.29.5-eks-1109419 10.103.15.165 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-530 Ready <none> 3h34m v1.29.5-eks-1109419 10.103.15.182 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-531 Ready control-plane 4h7m v1.29.5-eks-1109419 10.103.15.184 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-532 Ready control-plane 3h47m v1.29.5-eks-1109419 10.103.15.186 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4

I then edited my hardware.csv file to remove the instance-530 worker node entry:

cat hardware-targeted-scale-down.csv
hostname,bmc_ip,bmc_username,bmc_password,mac,ip_address,netmask,gateway,nameservers,labels,disk
instance-531,10.204.196.126,root,xxxxxx,XX:XX:XX:XX:XX:XX,10.103.15.184,255.255.252.0,10.103.12.1,10.103.8.12|10.103.12.12,type=cp,/dev/sda
instance-532,10.204.196.127,root,xxxxxx,XX:XX:XX:XX:XX:XX,10.103.15.186,255.255.252.0,10.103.12.1,10.103.8.12|10.103.12.12,type=cp,/dev/sda
instance-528,10.204.196.125,root,xxxxxx,XX:XX:XX:XX:XX:XX,10.103.15.163,255.255.252.0,10.103.12.1,10.103.8.12|10.103.12.12,type=cp,/dev/sda
instance-529,10.204.196.129,root,xxxxxx,XX:XX:XX:XX:XX:XX,10.103.15.165,255.255.252.0,10.103.12.1,10.103.8.12|10.103.12.12,type=worker,/dev/nvme0n1

I also adjusted my cluster config file to scale the worker node count down to 1 (a sketch of that change follows) and then ran the upgrade command:
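
For context, the scale-down itself is just a change to the worker node group count in the EKS Anywhere Cluster spec. A minimal sketch of the relevant section, with hypothetical node-group and machine-config names, would look like this:

spec:
  workerNodeGroupConfigurations:
  - count: 1                              # scaled down from 2
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: eksa-xxxx-cluster2n-worker    # hypothetical machine config name
    name: md-0                            # hypothetical worker node group name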

eksctl anywhere upgrade cluster -f cluster-config-upgrade-20240730094824-scale-to-1.yaml --hardware-csv hardware-targeted-scale-down.csv --kubeconfig /home/ubuntu/eksanywhere/eksa-xxxx-cluster2n/eksa-xxxx-cluster2n-eks-a-cluster.kubeconfig --skip-validations=pod-disruption
Performing setup and validations
✅ Tinkerbell provider validation
✅ SSH Keys present
✅ Validate OS is compatible with registry mirror configuration
✅ Validate certificate for registry mirror
✅ Control plane ready
✅ Worker nodes ready
✅ Nodes ready
✅ Cluster CRDs ready
✅ Cluster object present on workload cluster
✅ Upgrade cluster kubernetes version increment
✅ Upgrade cluster worker node group kubernetes version increment
✅ Validate authentication for git provider
✅ Validate immutable fields
✅ Validate cluster's eksaVersion matches EKS-Anywhere Version
✅ Validate eksa controller is not paused
✅ Validate eksaVersion skew is one minor version
Ensuring etcd CAPI providers exist on management cluster before upgrade
Pausing GitOps cluster resources reconcile
Upgrading core components
Backing up management cluster's resources before upgrading
Upgrading management cluster
Updating Git Repo with new EKS-A cluster spec
Finalized commit and committed to local repository {"hash": "2d209dbf9ebd2a0f45ff88c8fe1a793f4d11348a"}
Forcing reconcile Git repo with latest commit
Resuming GitOps cluster resources kustomization
Writing cluster config file
🎉 Cluster upgraded!
Cleaning up backup resources

However, EKS Anywhere still did not delete the node that I had removed from hardware.csv. Instead, it started cordoning and deleting the other worker node (instance-529):

kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
instance-528 Ready control-plane 4h26m v1.29.5-eks-1109419 10.103.15.163 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-529 Ready,SchedulingDisabled <none> 33m v1.29.5-eks-1109419 10.103.15.165 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-530 Ready <none> 3h36m v1.29.5-eks-1109419 10.103.15.182 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-531 Ready control-plane 4h9m v1.29.5-eks-1109419 10.103.15.184 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-532 Ready control-plane 3h49m v1.29.5-eks-1109419 10.103.15.186 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4

Per my understanding of the fix, instance-530 should have been deleted, as it was the node removed from hardware.csv. However, after the scale-in upgrade, the other worker node (instance-529) was deleted instead:

kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
instance-528 Ready control-plane 4h32m v1.29.5-eks-1109419 10.103.15.163 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-530 Ready <none> 3h43m v1.29.5-eks-1109419 10.103.15.182 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-531 Ready control-plane 4h15m v1.29.5-eks-1109419 10.103.15.184 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
instance-532 Ready control-plane 3h55m v1.29.5-eks-1109419 10.103.15.186 <none> Ubuntu 22.04.4 LTS 5.15.0-113-generic containerd://1.7.18-0-gae71819c4
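
Incidentally, the node listing only tells half the story. Watching the Cluster API Machine objects during the scale-in (again assuming they live in the eksa-system namespace of the management cluster) makes it easier to see which Machine the MachineSet controller actually picked for deletion; for a specific one, use a hypothetical machine name as a placeholder:

kubectl get machines.cluster.x-k8s.io -n eksa-system
kubectl describe machines.cluster.x-k8s.io <machine-name> -n eksa-system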

Now, the fix!

After discussing with various community members, it seemed as if the behavior originated in Cluster API itself. That led me to investigate a CRD named machinesets.cluster.x-k8s.io.

As I studied the CRD specification, I was intrigued to find a deletion policy in the MachineSet spec that held the key to remediating the issue.

spec:
  clusterName: eksa-xxxx-cluster2q
  deletePolicy: Random
  replicas: 2
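
To confirm what your own MachineSets are set to, something along these lines should work (assuming the Cluster API objects sit in the eksa-system namespace of the management cluster):

kubectl get machinesets.cluster.x-k8s.io -n eksa-system -o custom-columns=NAME:.metadata.name,POLICY:.spec.deletePolicy,REPLICAS:.spec.replicas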

The relevant snippet of the Cluster API specification for this field is given below and can be seen in full detail at MachineSet Controller · The Cluster API Book (k8s.io):

// DeletePolicy defines the policy used to identify nodes to delete when downscaling.
// Defaults to "Random". Valid values are "Random", "Newest", "Oldest"

None of these options satisfied my criterion of targeting a specific node for scale-in.

Finally, as luck would have it, I came across another GitHub issue for Cluster API: add docs about deleting specific machines when downscaling · Issue #10306 · kubernetes-sigs/cluster-api (github.com)

From there I found a workaround that finally steered things in the right direction. The fix was to label the targeted node with cluster.x-k8s.io/delete-machine set to an empty value.

And so, with the commands below, I labeled my targeted node, instance-578 (this was a fresh test run, hence the different instance names):

kubectl get nodes -A
NAME STATUS ROLES AGE VERSION
instance-574 Ready control-plane 81m v1.29.5-eks-1109419
instance-575 Ready <none> 24m v1.29.5-eks-1109419
instance-576 Ready control-plane 48m v1.29.5-eks-1109419
instance-577 Ready control-plane 63m v1.29.5-eks-1109419
instance-578 Ready <none> 6m27s v1.29.5-eks-1109419

kubectl label node instance-578 cluster.x-k8s.io/delete-machine=
node/instance-578 labeled
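
A quick sanity check that the label actually landed on the node:

kubectl get node instance-578 --show-labels | grep delete-machine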

eksctl anywhere upgrade cluster -f cluster-config-upgrade-20240730094824-scale-to-1.yaml --hardware-csv hardware-targeted-scale-down.csv --kubeconfig /home/ubuntu/eksanywhere/eksa-xxxx-cluster2n/eksa-xxxx-cluster2n-eks-a-cluster.kubeconfig --skip-validations=pod-disruption
Performing setup and validations
✅ Tinkerbell provider validation
✅ SSH Keys present
✅ Validate OS is compatible with registry mirror configuration
✅ Validate certificate for registry mirror
✅ Control plane ready
✅ Worker nodes ready
✅ Nodes ready
✅ Cluster CRDs ready
✅ Cluster object present on workload cluster
✅ Upgrade cluster kubernetes version increment
✅ Upgrade cluster worker node group kubernetes version increment
✅ Validate authentication for git provider
✅ Validate immutable fields
✅ Validate cluster's eksaVersion matches EKS-Anywhere Version
✅ Validate eksa controller is not paused
✅ Validate eksaVersion skew is one minor version
Ensuring etcd CAPI providers exist on management cluster before upgrade
Pausing GitOps cluster resources reconcile
Upgrading core components
Backing up management cluster's resources before upgrading
Upgrading management cluster
Updating Git Repo with new EKS-A cluster spec
Finalized commit and committed to local repository
Forcing reconcile Git repo with latest commit
Resuming GitOps cluster resources kustomization
Writing cluster config file
🎉 Cluster upgraded!
Cleaning up backup resources

kubectl get nodes
NAME STATUS ROLES AGE VERSION
instance-574 Ready control-plane 86m v1.29.5-eks-1109419
instance-575 Ready <none> 28m v1.29.5-eks-1109419
instance-576 Ready control-plane 52m v1.29.5-eks-1109419
instance-577 Ready control-plane 67m v1.29.5-eks-1109419

I then ran the EKS Anywhere upgrade with the worker count changed from 2 to 1, as shown above. Thanks to the label, the scale-in correctly targeted instance-578 and removed it from the cluster!
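
As an aside, if my reading of the Cluster API book and issue #10306 is correct, the same marker can also be applied as an annotation directly on the corresponding Machine object instead of labeling the node, for example (hypothetical machine name, eksa-system namespace assumed):

kubectl annotate machines.cluster.x-k8s.io <machine-name> -n eksa-system cluster.x-k8s.io/delete-machine=yes

Either way, the idea is the same: the marked machine is prioritized for deletion on the next scale-in.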

Well, at the end of it all, it feels easy-peasy, but it clocked me almost a week of head-scratching amidst the daily frenzy.

I hope this helps the wider community looking to solve a similar use case, which I feel is one of the most important ones when it comes to operating bare-metal clusters.

cheers,

Ambar@thecloudgarage

#iwork4dell
