EKS Anywhere, deploying persistent and stateful workloads with vSphere CSI
This article is part of the series EKS Anywhere, extending the Hybrid cloud momentum | by Ambar Hassani.
To give a brief context, there are multiple scenarios where one would like to deploy persistent workloads on EKS Anywhere backed by vSphere datastores. For example, on Hyperconverged Infrastructure systems like VxRail with vSphere or Dell APEX Cloud Platform with VMware, one would like to use the underlying datastores associated with vSphere (vSAN/VMFS) as the persistence layer for stateful applications on EKS Anywhere.
While not mandatory, some use cases benefit from keeping the persistence layer on the HCI system itself. This blog is a detailed, practical guide to deploying the vSphere CSI driver for such stateful applications on EKS Anywhere running atop vSphere.
Note that EKS Anywhere versions prior to v0.16.0 included built-in installation and management of the vSphere CSI driver in EKS Anywhere clusters. The vSphere CSI driver components in EKS Anywhere included a Kubernetes CSI controller Deployment, a node-driver-registrar DaemonSet, a default Storage Class, and a number of related Secrets and RBAC entities. Post v0.16.0, EKS Anywhere has moved to a newer code base and there is no longer a default CSI deployment for vSphere.
CAUTION: I had written about this default deployment of the vSphere CSI driver and Storage Class in an earlier article, which should now be treated as OBSOLETE and should no longer be referenced: EKS Anywhere & the default storage class (VMware CSI/CNS) | by Ambar Hassani | Medium
Without a default CSI deployment, we need to deploy our own integration of the vSphere CSI driver for such use cases. If we read the EKS Anywhere documentation, there is no end-to-end example showcasing how to do this, while the Broadcom/VMware site has simply too much information. And I will tell you exactly why one might get totally confused by the information overload!
Practically speaking, the persistence layer deployment in Kubernetes for vSphere-based systems comprises two parts: the CPI (Cloud Provider Interface) and the CSI (Container Storage Interface).
Fortunately for EKS Anywhere users, only the CSI deployment has been removed from the code base. The CPI deployment, which is fundamental to EKS Anywhere components communicating with vSphere, is still present. This is not adequately documented anywhere, and one could easily end up scratching their head about what should be deployed and how.
Like I said, we need not worry about the CPI deployment; the focus now is on providing a validated set of instructions to deploy and demonstrate the CSI driver.
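If you want to see the CPI components for yourself on a freshly created cluster, a minimal sketch follows, assuming the vSphere cloud controller manager pods run in kube-system and carry "cloud-controller-manager" in their name (adjust the grep pattern if your cluster names them differently):
# Filter kube-system pods for the vSphere cloud controller manager (CPI)
kubectl get pods -n kube-system | grep -i cloud-controller-manager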
Let’s do this!
Let’s roll in a fresh EKS Anywhere cluster
eksctl anywhere create cluster -f eksa-1.yaml
Warning: VSphereDatacenterConfig configured in insecure mode
Using the new workflow using the controller for management cluster create
Performing setup and validations
Warning: VSphereDatacenterConfig configured in insecure mode
✅ Connected to server
✅ Authenticated to vSphere
✅ Datacenter validated
✅ Network validated
Warning: VSphereMachineConfig DiskGiB cannot be less than 20. Defaulting to 20.
Warning: VSphereMachineConfig DiskGiB cannot be less than 20. Defaulting to 20.
✅ Datastore validated
✅ Folder validated
✅ Resource pool validated
✅ Datastore validated
✅ Folder validated
✅ Resource pool validated
✅ Machine config tags validated
✅ Control plane and Workload templates validated
Provided sshAuthorizedKey is not set or is empty, auto-generating new key pair... {"vSphereMachineConfig": "eksa-1-md-0"}
Private key saved to eksa-1/eks-a-id_rsa. Use 'ssh -i eksa-1/eks-a-id_rsa <username>@<Node-IP-Address>' to login to your cluster node
✅ administrator@vsphere.local user vSphere privileges validated
✅ Vsphere Provider setup is valid
✅ Validate OS is compatible with registry mirror configuration
✅ Validate certificate for registry mirror
✅ Validate authentication for git provider
✅ Validate cluster's eksaVersion matches EKS-A version
Creating new bootstrap cluster
Provider specific pre-capi-install-setup on bootstrap cluster
Installing cluster-api providers on bootstrap cluster
Provider specific post-setup
Installing EKS-A custom components on bootstrap cluster
Installing EKS-D components
Installing EKS-A custom components (CRD and controller)
Creating new workload cluster
Creating EKS-A namespace
Installing cluster-api providers on workload cluster
Installing EKS-A secrets on workload cluster
Moving cluster management from bootstrap to workload cluster
Installing EKS-A custom components on workload cluster
Installing EKS-D components
Installing EKS-A custom components (CRD and controller)
Applying cluster spec to workload cluster
Installing GitOps Toolkit on workload cluster
GitOps field not specified, bootstrap flux skipped
Writing cluster config file
Deleting bootstrap cluster
🎉 Cluster created!
--------------------------------------------------------------------------------------
The Amazon EKS Anywhere Curated Packages are only available to customers with the
Amazon EKS Anywhere Enterprise Subscription
--------------------------------------------------------------------------------------
Enabling curated packages on the cluster
Installing helm chart on cluster {"chart": "eks-anywhere-packages", "version": "0.3.13-eks-a-62"}
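Before moving on to the CSI pieces, point kubectl at the new cluster. A minimal sketch, assuming the default EKS Anywhere behavior of writing the kubeconfig to <cluster-name>/<cluster-name>-eks-a-cluster.kubeconfig in the working directory:
# Use the kubeconfig generated by eksctl anywhere for the new cluster
export KUBECONFIG=$PWD/eksa-1/eksa-1-eks-a-cluster.kubeconfig
kubectl get nodes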
Deploying vSphere CSI on EKS Anywhere cluster
Create the namespace for vSphere CSI drivers
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: vmware-system-csi
EOF
The Broadcom/VMware docs describe a set of procedures to ensure disk.EnableUUID is set on the cluster virtual machines. There is no need to do so in EKS Anywhere deployments, as this is performed by default during cluster creation. The command below shows that the providerID is already set on every node, which is a prerequisite for the CSI driver and confirms that none of those manual steps are required.
kubectl get nodes -o json | grep providerID
"providerID": "vsphere://421902ca-fb57-0d36-effa-1446349cb1c4",
"providerID": "vsphere://4219a18c-f37c-6c27-ca05-cdf8cdccfd6a",
"providerID": "vsphere://4219d62a-5f12-4faa-7dc4-af4953494e30"
"providerID": "vsphere://4219a1dd-d599-a656-cb0f-9c30fba87c15"
"providerID": "vsphere://4219bebf-df09-5223-85c2-362792e53a1e",
Find out the cluster's unique ID. This value is used in the secret created in the next step.
eksaClusterId=$(kubectl get ns kube-system -o jsonpath='{.metadata.uid}')
Create a secret for vSphere CSI drivers to communicate with vSphere
cat <<EOF > csi-vsphere.conf
[Global]
cluster-id = "$eksaClusterId"
[VirtualCenter "apex-mg-vcsa.edub.csc"]
insecure-flag = "true"
user = "administrator@vsphere.local"
password = "XXXXXXXX"
port = "443"
datacenters = "VxRail-Datacenter"
EOF
kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=vmware-system-csi
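Optionally, confirm the secret landed in the vmware-system-csi namespace with the expected key name; describe shows the key and its size without printing the credentials:
# The secret should list one data key named csi-vsphere.conf
kubectl describe secret vsphere-config-secret -n vmware-system-csi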
There is no need to taint the control plane nodes as described in the Broadcom/VMware documentation, because EKS Anywhere applies those taints by default while the cluster is created. This can be observed below.
kubectl describe nodes | egrep "Taints:|Name:"
Name: eksa-1-78dlx
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: eksa-1-gzhr4
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: eksa-1-md-0-52bn5-2qmhq
Taints: <none>
Name: eksa-1-md-0-52bn5-khrlm
Taints: <none>
Name: eksa-1-vqhh9
Taints: node-role.kubernetes.io/control-plane:NoSchedule
I have applied a label of group=md-0 on my worker nodes, the intent being that the CSI node driver pods are hosted only on the worker nodes, as shown below.
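For reference, one way to apply the label is with a standard kubectl label command, using the worker node names from the output above:
# Label only the md-0 worker nodes so the CSI node DaemonSet can target them
kubectl label node eksa-1-md-0-52bn5-2qmhq eksa-1-md-0-52bn5-khrlm group=md-0
# Verify the label is in place
kubectl get nodes -l group=md-0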
Next we will download the YAML file that deploys all necessary resources for vSphere CSI. The YAML file is hosted on the vSphere CSI GitHub repository https://github.com/kubernetes-sigs/vsphere-csi-driver.git
As of this writing, release 3.4 is the latest, so we will switch to that branch, navigate to manifests > vanilla, and download the raw YAML for the file named vsphere-csi-driver.yaml.
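As an alternative to downloading through the GitHub UI, the raw file can be fetched directly; a minimal sketch, assuming the release branch in the vsphere-csi-driver repository is named release-3.4:
# Download the vanilla manifest from the (assumed) release-3.4 branch
curl -L -o vsphere-csi-driver.yaml \
  https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/release-3.4/manifests/vanilla/vsphere-csi-driver.yaml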
Once downloaded, we will edit the file to include the nodeSelector label of group: md-0 for the daemonset resources, such that the node driver pods are resident only on the EKS Anywhere cluster’s worker nodes.
A snippet of how the DaemonSet configuration looks at my end:
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: vsphere-csi-node
  namespace: vmware-system-csi
spec:
  selector:
    matchLabels:
      app: vsphere-csi-node
  updateStrategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: vsphere-csi-node
        role: vsphere-csi
    spec:
      priorityClassName: system-node-critical
      nodeSelector:
        kubernetes.io/os: linux
        group: md-0
In addition to that, we can OPTIONALLY remove the DaemonSet for the Windows node driver pods from the manifest.
Apply the vsphere-csi-driver.yaml file to the cluster
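If you prefer not to edit it out of the manifest, the Windows DaemonSet can also be deleted after the file is applied; a minimal sketch, assuming the DaemonSet in the vanilla manifest is named vsphere-csi-node-windows:
# Remove the Windows node driver DaemonSet (not needed with Linux-only worker nodes)
kubectl delete daemonset vsphere-csi-node-windows -n vmware-system-csi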
kubectl apply -f vsphere-csi-driver.yaml
Observe the creation of controller and node pods for vSphere CSI
kubectl get pods -n vmware-system-csi
NAME READY STATUS RESTARTS AGE
vsphere-csi-controller-6b856bfcff-krwvm 7/7 Running 0 4m9s
vsphere-csi-controller-6b856bfcff-ps5tf 7/7 Running 0 4m9s
vsphere-csi-controller-6b856bfcff-w6mfg 7/7 Running 0 4m9s
vsphere-csi-node-9z7jz 3/3 Running 0 4m9s
vsphere-csi-node-bqd4g 3/3 Running 0 4m9s
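Once the pods are up, it is also worth confirming that the driver has registered with the cluster; both objects below are standard Kubernetes storage APIs:
# csi.vsphere.vmware.com should appear in the CSIDriver list
kubectl get csidrivers
# Each labeled worker node should report a non-zero driver count
kubectl get csinodes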
Create a storage class that will be used for persistent volumes
cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsphere-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
EOF
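A quick check that the class exists and is marked as the default:
# The new class should show "(default)" next to its name
kubectl get storageclass vsphere-sc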
Create a Persistent volume claim for a block volume that will be used by the MySQL pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  labels:
    name: mysql-pv-claim
    csi: vsphere
spec:
  storageClassName: vsphere-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
EOF
Ensure the persistent volume is created and the claim is bound
kubectl describe pvc
Name: mysql-pv-claim
Namespace: default
StorageClass: vsphere-sc
Status: Bound
Volume: pvc-b4aef04b-bfd5-4504-a7ed-cbc84ad0573d
Labels: csi=vsphere
name=mysql-pv-claim
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
volume.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 8Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: mysql-795bf95547-8rql8
Events: <none>
kubectl describe pv
Name: pvc-b4aef04b-bfd5-4504-a7ed-cbc84ad0573d
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
volume.kubernetes.io/provisioner-deletion-secret-name:
volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers: [kubernetes.io/pv-protection external-attacher/csi-vsphere-vmware-com]
StorageClass: vsphere-sc
Status: Bound
Claim: default/mysql-pv-claim
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 8Gi
Node Affinity: <none>
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: csi.vsphere.vmware.com
FSType: ext4
VolumeHandle: 00e7921f-7f99-40d6-9ba1-1d72dfe1fdfe
ReadOnly: false
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1737703907700-856-csi.vsphere.vmware.com
type=vSphere CNS Block Volume
Events: <none>
One can also note that this volume appears as a container volume in the vSphere web UI. Navigate to the respective datastore > Monitor > Cloud Native Storage > Container Volumes.
Create the MySQL Deployment and Service
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  ports:
    - port: 3306
  selector:
    app: mysql
  clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - image: mysql:5.6
          name: mysql
          env:
            # Use secret in real usage
            - name: MYSQL_ROOT_PASSWORD
              value: password
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pv-claim
EOF
Verify that the MySQL pod is running, which confirms that it was able to mount the persistent volume via the claim.
kubectl describe pod
Name: mysql-795bf95547-8rql8
Namespace: default
Priority: 0
Service Account: default
Node: eksa-1-md-0-52bn5-khrlm/10.204.110.72
Start Time: Fri, 24 Jan 2025 12:38:26 +0000
Labels: app=mysql
pod-template-hash=795bf95547
Annotations: <none>
Status: Running
IP: 192.168.4.223
IPs:
IP: 192.168.4.223
Controlled By: ReplicaSet/mysql-795bf95547
Containers:
mysql:
Container ID: containerd://60ab0517a3ccaa05e9c19cd5b866edf6eae4a8153f10680d3a43cd2858e1386b
Image: mysql:5.6
Image ID: docker.io/library/mysql@sha256:20575ecebe6216036d25dab5903808211f1e9ba63dc7825ac20cb975e34cfcae
Port: 3306/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 24 Jan 2025 12:38:35 +0000
Ready: True
Restart Count: 0
Environment:
MYSQL_ROOT_PASSWORD: password
Mounts:
/var/lib/mysql from mysql-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vznkx (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
mysql-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mysql-pv-claim
ReadOnly: false
kube-api-access-vznkx:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
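To prove the point of persistence, one can write some data into MySQL, delete the pod, and confirm the data is still there once the replacement pod mounts the same volume. A minimal sketch, using the root password from the Deployment manifest above (the database name persistence_test is just an example):
# Create a sample database inside the running MySQL pod
kubectl exec deploy/mysql -- mysql -uroot -ppassword -e "CREATE DATABASE persistence_test;"
# Delete the pod; the Deployment recreates it and re-attaches the same PVC
kubectl delete pod -l app=mysql
# Wait for the replacement pod to become Ready (re-run if it races with pod recreation)
kubectl wait --for=condition=Ready pod -l app=mysql --timeout=180s
# The database created earlier should still be listed
kubectl exec deploy/mysql -- mysql -uroot -ppassword -e "SHOW DATABASES;"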
That’s it!
Hopefully, this article clears up any ambiguities when it comes to deploying and leveraging vSphere CSI for stateful workloads on EKS Anywhere.
cheers,
Ambar@thecloudgarage
#iwork4dell