EKS Anywhere, exploring Kubeflow and running Jupyter notebooks with NVIDIA GPU support on bare-metal clusters

Ambar Hassani
4 min read · Jan 22, 2025


This article is part of the EKS Anywhere series: EKS Anywhere, extending the Hybrid cloud momentum | by Ambar Hassani

Needless to say, Kubeflow is a very broad subject in its own right and a tad difficult to cover in full within this blog. We will continue to unfold various use cases for Kubeflow in later blogs.

Image credit: Architecture | Kubeflow

In this specific article, however, we will explore a bare-minimum end-to-end use case for running Jupyter notebooks on Kubeflow with NVIDIA GPUs.

The Kubeflow deployment is done on EKS Anywhere bare metal clusters and coupled with the NVIDIA GPU Operator. We will also leverage Dell PowerScale as the persistent storage layer for Kubeflow via its CSI integration.

The entirety of this deployment is covered in the two videos below.

The first video demonstrates an end-to-end workflow, including deployment of an EKS Anywhere bare metal cluster, Dell PowerScale CSI drivers, the NVIDIA GPU Operator, the MetalLB cloud-native load balancer, Kubeflow, and finally a sample Jupyter notebook with GPU support.

The second video demonstrates a Kubeflow deployment that enables SSL support via the Istio ingress gateway.

Configurations used in the demonstration

Dell PowerScale CSI driver deployment via Helm charts

#POWERSCALE CLUSTER CONNECTION DETAILS
export passwordOfPowerScaleCluster=XXXXX
export csiReleaseNumber=2.10.0
export powerScaleClusterName=F900-AI
export userNameOfPowerScaleCluster=root
export ipOrFqdnOfPowerScaleCluster=172.29.208.91
#NAME OF THE EKS ANYWHERE CLUSTER; USED BELOW AS THE VOLUME/SNAPSHOT NAME PREFIX
export clusterName=XXXXX

#CAPTURE THE EKS DISTRO KUBERNETES VERSION REPORTED BY THE API SERVER
eksdistroversion=$(kubectl version -o json | jq -r '.serverVersion.gitVersion')
export eksdistroversion

#CLONE THE POWERSCALE CSI REPO
rm -rf csi-powerscale
mkdir -p csi-powerscale
cd csi-powerscale
git clone --quiet -c advice.detachedHead=false -b csi-isilon-$csiReleaseNumber https://github.com/dell/helm-charts

#MODIFY VOLUME PREFIXES
sed -i "s/^volumeNamePrefix:.*/volumeNamePrefix:\ $clusterName/g" helm-charts/charts/csi-isilon/values.yaml
sed -i "s/snapNamePrefix: snapshot/snapNamePrefix: $clusterName-snap/g" helm-charts/charts/csi-isilon/values.yaml
sed -i 's/isiAuthType: 0/isiAuthType: 1/g' helm-charts/charts/csi-isilon/values.yaml

#MODIFY K8S VERSION IN THE HELM CHART TO CUSTOM VALUE USED BY EKS DISTRO
sed -i "s/^kubeVersion.*/kubeVersion: \"${eksdistroversion}\"/g" helm-charts/charts/csi-isilon/Chart.yaml

#PREPARE FOR POWERSCALE CSI INSTALLATION
kubectl create namespace csi-powerscale
wget https://raw.githubusercontent.com/thecloudgarage/eks-anywhere/main/powerscale/powerscale-creds.yaml
wget https://raw.githubusercontent.com/thecloudgarage/eks-anywhere/main/powerscale/emptysecret.yaml

#BUILD CREDS FILE FOR POWERSCALE CSI
sed -i "s/powerscale_cluster_name/$powerScaleClusterName/g" powerscale-creds.yaml
sed -i "s/powerscale_username/$userNameOfPowerScaleCluster/g" powerscale-creds.yaml
sed -i "s/powerscale_password/$passwordOfPowerScaleCluster/g" powerscale-creds.yaml
sed -i "s/powerscale_endpoint/$ipOrFqdnOfPowerScaleCluster/g" powerscale-creds.yaml

#CREATE SECRETS FOR POWERSCALE CSI
kubectl create secret generic isilon-creds -n csi-powerscale --from-file=config=powerscale-creds.yaml -o yaml --dry-run=client | kubectl apply -f -
kubectl create -f emptysecret.yaml

#INSTALL POWERSCALE CSI
cd helm-charts/charts
helm install isilon -n csi-powerscale csi-isilon/ --values csi-isilon/values.yaml

#CREATE STORAGE CLASS FOR POWERSCALE CSI
wget https://raw.githubusercontent.com/thecloudgarage/eks-anywhere/main/powerscale/powerscale-storageclass.yaml
kubectl create -f powerscale-storageclass.yaml

#PATCH STORAGE CLASS AS DEFAULT
kubectl patch storageclass powerscale -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
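
To confirm that dynamic provisioning works end to end, a quick sanity check is to create a small claim against the new storage class and watch it bind. This is a minimal sketch: the storage class name powerscale matches the one created above, while the claim name and size are arbitrary.

#SANITY CHECK: CREATE AND DELETE A TEST PVC AGAINST THE POWERSCALE STORAGE CLASS
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: powerscale-test-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: powerscale
  resources:
    requests:
      storage: 1Gi
EOF
#THE CLAIM SHOULD REPORT BOUND ONCE THE CSI DRIVER PROVISIONS THE VOLUME
kubectl get pvc powerscale-test-pvc
kubectl delete pvc powerscale-test-pvc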

Homebrew installation (prerequisite for kustomize)

wget https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh
chmod +x install.sh
#NOTE: A NEWLINE IS PIPED INTO THE SCRIPT TO AUTO-CONFIRM ITS INTERACTIVE INSTALL PROMPT
echo -ne '\n' | ./install.sh
echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"' >> $HOME/.profile
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"

Kustomize

brew install kustomize

NVIDIA GPU Operator

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator
kubectl get pods -n gpu-operator
kubectl get node -o json | jq '.items[].metadata.labels'
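
As an optional smoke test before moving on, you can schedule a pod that requests one GPU and runs nvidia-smi. This is a hedged sketch: the CUDA image tag below is only an example, so substitute any CUDA base image that matches your node's driver version.

#OPTIONAL SMOKE TEST: RUN nvidia-smi IN A POD REQUESTING ONE GPU
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
#WAIT FOR COMPLETION, PRINT THE OUTPUT AND CLEAN UP
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/gpu-smoke-test --timeout=180s
kubectl logs gpu-smoke-test
kubectl delete pod gpu-smoke-test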

MetalLB

helm repo add metallb https://metallb.github.io/metallb
helm install metallb metallb/metallb --wait --timeout 15m --namespace metallb-system --create-namespace
cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.29.198.75-172.29.198.76
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb-system
EOF
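
To verify that MetalLB is handing out addresses from the pool, expose a throwaway deployment as a LoadBalancer service and check that it receives an external IP from the 172.29.198.75-76 range. The deployment and service names below are arbitrary.

#VERIFY METALLB BY EXPOSING A THROWAWAY DEPLOYMENT
kubectl create deployment lb-test --image=nginx
kubectl expose deployment lb-test --type=LoadBalancer --port=80
#EXTERNAL-IP SHOULD BE POPULATED FROM THE METALLB ADDRESS POOL
kubectl get svc lb-test
kubectl delete svc lb-test && kubectl delete deployment lb-test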

Kubeflow

git clone https://github.com/kubeflow/manifests.git
cd manifests
while ! kustomize build example | kubectl apply -f - --server-side --force-conflicts; do echo "Retrying to apply resources"; sleep 10; done
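
The apply loop can take a while to converge. As a rough readiness check, confirm that pods across the Kubeflow namespaces reach the Running state; the namespace list below follows the upstream example manifests and may vary slightly between releases.

#ROUGH READINESS CHECK ACROSS THE NAMESPACES CREATED BY THE EXAMPLE MANIFESTS
for ns in cert-manager istio-system auth knative-serving kubeflow kubeflow-user-example-com; do
  kubectl get pods -n $ns
done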

Accessing Kubeflow via port-forward

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
http://127.0.0.1:8080
Default credentials: user@example.com / 12341234

Accessing Kubeflow via Istio Ingress over SSL

# PATCH THE SERVICE TYPE FOR ISTIO INGRESS GATEWAY
kubectl patch svc istio-ingressgateway -n istio-system -p '{"spec": {"type": "LoadBalancer"}}'
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
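
If MetalLB is working, the gateway service should now carry one of the pool addresses; a quick check before creating the certificate:

#CONFIRM THE GATEWAY RECEIVED AN EXTERNAL IP FROM METALLB
echo $INGRESS_HOST
kubectl get svc istio-ingressgateway -n istio-system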

# CREATE A CERTIFICATE
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: istio-ingressgateway-certs
  namespace: istio-system
spec:
  commonName: istio-ingressgateway.istio-system.svc
  # Use ipAddresses if your LoadBalancer issues an IP
  ipAddresses:
  - $INGRESS_HOST
  # Use dnsNames instead if your LoadBalancer issues a hostname
  isCA: true
  issuerRef:
    kind: ClusterIssuer
    name: kubeflow-self-signing-issuer
  secretName: istio-ingressgateway-certs
EOF
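
Before wiring the gateway to the certificate, it helps to confirm that cert-manager has issued it and created the backing secret:

#THE CERTIFICATE SHOULD REPORT READY AND THE SECRET SHOULD EXIST
kubectl get certificate -n istio-system istio-ingressgateway-certs
kubectl get secret -n istio-system istio-ingressgateway-certs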

# EDIT THE KUBEFLOW-GATEWAY OBJECT AND REPLACE THE VALUES STARTING FROM THE SERVERS BLOCK WITH THE SNIPPET BELOW

KUBE_EDITOR="nano" kubectl edit -n kubeflow gateways.networking.istio.io kubeflow-gateway

servers:
- hosts:
  - '*'
  port:
    name: https
    number: 443
    protocol: HTTPS
  tls:
    mode: SIMPLE
    privateKey: /etc/istio/ingressgateway-certs/tls.key
    serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
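
If you prefer a non-interactive alternative to hand-editing the object, the same change can be applied as a merge patch. Note this is a sketch that replaces the entire servers list with a single HTTPS server; adapt it if you also want to keep the plain HTTP listener.

#NON-INTERACTIVE ALTERNATIVE: MERGE-PATCH THE GATEWAY (REPLACES THE WHOLE SERVERS LIST)
kubectl patch -n kubeflow gateways.networking.istio.io kubeflow-gateway --type=merge -p '
spec:
  servers:
  - hosts:
    - "*"
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
'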

# Access Kubeflow via HTTPS
https://<external-ip-of-istio-ingress-gateway>
Default credentials: user@example.com / 12341234

Basic notebook to test GPU

import tensorflow as tf
# List the GPUs visible to TensorFlow (list_physical_devices is the non-deprecated API)
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))

Sample notebook runs for Keras jobs

# Launch a terminal inside of the Jupyter notebook instance
# Install the below prerequisites

pip install tensorflow-datasets
pip install tfds-nightly

Run the sample notebooks from the sites below:

https://www.tensorflow.org/datasets/keras_example
https://github.com/tensorflow/datasets/blob/master/docs/keras_example.ipynb

Hope the article was useful and provides insights in case you decide to hop onto this journey of running Kubeflow on EKS Anywhere bare metal clusters with NVIDIA GPU support.

cheers,

Ambar@thecloudgarage

#iwork4dell
