SHASHI KANT SHAH: Auto Scaling (HPA) , (VPA) for Kubernetes.

Auto Scaling in Kubernetes (K8s).

Auto scaling in Kubernetes ensures that applications can handle changing workloads by dynamically adjusting the number of pods or nodes.

There are three main types of auto scaling in Kubernetes:

1️.Horizontal Pod Autoscaler (HPA) – Scales pods based on CPU, memory, or custom metrics.
2️.Vertical Pod Autoscaler (VPA) – Adjusts pod resource requests & limits.
3️.Cluster Autoscaler (CA) – Adds or removes worker nodes based on demand.

1️.Horizontal Pod Autoscaler (HPA) –

1.Enable Metrics Server:

Check if the Metrics Server is running:

# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml

# kubectl get pods --all-namespaces

# kubectl get deployment -n kube-system

# kubectl -n kube-system edit deployments.apps metrics-server

command:

- /metrics-server

- --kubelet-insecure-tls

- --kubelet-preferred-address-types=InternalIP

# kubectl delete pod -n kube-system -l k8s-app=metrics-server

# kubectl get pods -n kube-system

# kubectl logs -n kube-system -l k8s-app=metrics-server

# kubectl top nodes

# kubectl top pods -A

# wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# vim components.yaml

- --kubelet-insecure-tls

- --kubelet-preferred-address-types=InternalIP

# kubectl apply -f components.yaml

# kubectl get pods -n kube-system | grep metrics-server

# kubectl logs -n kube-system -l k8s-app=metrics-server

# kubectl top nodes

# kubectl top pods -n <namespace>

2. Autoscale CPU-based HPA a Deployment.

# vim nginx-deploy.yaml

apiVersion: apps/v1

kind: Deployment

metadata:

spec:

replicas: 2

selector:

matchLabels:

app: nginx

template:

metadata:

labels:

app: nginx

spec:

containers:

- name: nginx

image: nginx

resources:

requests:

cpu: "100m"

limits:

cpu: "200m"

ports:

- containerPort: 80

requests: The guaranteed amount of CPU/Memory for each pod.

limits: The maximum resources a pod can use before being throttled.

100m (millicores) = 0.1 CPU core

200m (millicores) = 0.2 CPU core

1000m = 1 full CPU core

3.Create the Service (nginx-service.yaml)

apiVersion: v1

kind: Service

metadata:

spec:

selector:

app: nginx

ports:

- protocol: TCP

port: 80

targetPort: 80

type: ClusterIP

# kubectl apply -f nginx-deploy.yaml

# kubectl apply -f nginx-service.yaml

# kubectl get deployments

# kubectl get pods

# kubectl get svc

scaling between 2 and 5 pods when CPU utilization exceeds 10% of the requested CPU.

# kubectl autoscale deployment <deloyment_name> --cpu-percent=10 --min=2 --max=5

# kubectl autoscale deployment nginx-deployment --cpu-percent=10 --min=2 --max=5

#kubectl get hpa

4.Generate Load on Nginx.

kubectl run -it --rm load-generator --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://nginx-service; done"

# kubectl describe hpa nginx-deployment

5.How to delete HPA AutoScaling.

# kubectl delete vpa nginx-deployment

# kubectl delete -f components.yaml

# kubectl delete -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml

# kubectl delete -f nginx-deploy.yaml

# kubectl delete -f nginx-service.yaml

HPA for Memory Autoscaling.

Scale nginx-deployment between 2 and 5 pods.

Trigger scaling if average memory utilization exceeds 30%.

1.Create an HPA for Memory Autoscaling.

# vim nginx-deploy.yaml

apiVersion: apps/v1

kind: Deployment

metadata:

spec:

replicas: 2

selector:

matchLabels:

app: nginx

template:

metadata:

labels:

app: nginx

spec:

containers:

- name: nginx

image: nginx

resources:

requests:

cpu: "100m"

memory: "20Mi" # Set memory request

limits:

cpu: "200m"

memory: "30Mi" # Set memory limit

ports:

- containerPort: 80

requests: The guaranteed amount of CPU/Memory for each pod.
limits: The maximum resources a pod can use before being throttled.

2.Create an HPA YAML file.

i)Scale nginx-deployment between 2 and 5 pods.

ii)Trigger scaling if average memory utilization exceeds 25%.

# vim nginx-hpa.yaml

apiVersion: autoscaling/v2

kind: HorizontalPodAutoscaler

metadata:

spec:

scaleTargetRef:

apiVersion: apps/v1

kind: Deployment

minReplicas: 2

maxReplicas: 5

metrics:

- type: Resource

resource:

target:

type: Utilization

averageUtilization: 25 # Adjust this threshold as needed

# kubectl apply -f nginx-deploy.yaml

# kubectl top pods

# kubectl get hpa

4. Generate Memory Load to Trigger Scaling

Run a memory-intensive process in the Nginx pods:

# kubectl run -it --rm load-generator --image=busybox -- /bin/sh

# dd if=/dev/zero of=/dev/null &

This will increase memory usage and trigger autoscaling.

# kubectl exec -it nginx-deployment-5d58ccd545-q8tdd -- /bin/bash

# kubectl exec -it nginx-deployment-5d58ccd545-88fvs -- /bin/bash

# kubectl get hpa

# kubectl get pods

# kubectl describe hpa nginx-hpa

Vertical Pod Autoscaler (VPA) –

Unlike Horizontal Pod Autoscaler (HPA), which scales the number of pods, VPA automatically adjusts CPU and memory requests/limits for your pods based on usage.

How VPA Works:

VPA Recommender:

Analyzes historical resource usage and provides recommendations for CPU and memory requests.

VPA Updater:

Automatically applies the recommended resource requests by evicting and restarting pods.

VPA Admission Controller:

Modifies pod resource requests during pod creation to ensure they align with VPA recommendations.

Modes of VPA:

Auto:

VPA automatically applies resource recommendations and restarts pods as needed.

Initial:

VPA only applies recommendations when the pod is first created.

Off:

VPA provides recommendations but does not apply them.

VPA for Autoscaling.

git clone https://github.com/kubernetes/autoscaler.git

cd autoscaler/vertical-pod-autoscaler

./hack/vpa-up.sh

kubectl get pods -n kube-system | grep vpa

for delete

./hack/vpa-down.sh

# kubectl delete -f vertical-pod-autoscaler/deploy/

# kubectl create namespace vpa-demo

Deploy an Application (Nginx).

# vim nginx-deployment.yaml

apiVersion: apps/v1

kind: Deployment

metadata:

namespace: vpa-demo

spec:

replicas: 2

selector:

matchLabels:

app: nginx

template:

metadata:

labels:

app: nginx

spec:

containers:

- name: nginx

image: nginx

resources:

requests:

cpu: "100m"

memory: "128Mi"

limits:

cpu: "200m"

memory: "256Mi"

# kubectl apply -f nginx-deployment.yaml

# kubectl get pods -n vpa-demo

Deploy VPA for Nginx

# vim nginx-vpa.yaml

apiVersion: autoscaling.k8s.io/v1

kind: VerticalPodAutoscaler

metadata:

namespace: vpa-demo

spec:

targetRef:

apiVersion: apps/v1

kind: Deployment

updatePolicy:

updateMode: "Auto" # VPA automatically applies changes

resourcePolicy:

containerPolicies:

- containerName: nginx

minAllowed:

cpu: "50m"

memory: "64Mi"

maxAllowed:

cpu: "500m"

memory: "512Mi"

controlledResources: ["cpu", "memory"]

# kubectl apply -f nginx-vpa.yaml

# kubectl get vpa -n vpa-demo

# kubectl describe vpa nginx-vpa -n vpa-demo

Target → The recommended CPU & memory for optimal performance.

Lower Bound / Upper Bound → The minimum and maximum safe values.

Simulate High CPU Load (Trigger VPA).

# kubectl exec -it -n vpa-demo nginx-deployment-57c44dfdb7-7jmgl -- /bin/bash

dd if=/dev/zero of=/dev/null &

# kubectl exec -it -n vpa-demo nginx-deployment-57c44dfdb7-mfdv7 -- /bin/bash

dd if=/dev/zero of=/dev/null

What is the Difference Between VPA and HPA?

Feature	VPA (Vertical Scaling)	HPA (Horizontal Scaling)
Adjusts	CPU & Memory Requests	Number of Pods
Method	Restarts pod with new resources	Adds/Removes pods dynamically
Best for	Apps with unpredictable CPU/memory needs	Apps with fluctuating traffic

What is the Difference Between VPA and ResourceQuota?

Feature	VPA (Vertical Pod Autoscaler)	ResourceQuota
Scope	Per Pod/Container	Per Namespace
Purpose	Adjusts pod resource requests and limits dynamically	Enforces fixed limits on total namespace resources
Scaling Type	Vertical Scaling (adjusts CPU/memory per pod)	Quota Management (restricts overall usage in a namespace)
Pod Restart?	Yes (Pods restart with new requests/limits)	No, only applies to new resource allocations
Use Case	Optimize resource allocation for running workloads	Prevent excessive resource usage across multiple pods

What is Cluster Autoscaler (CA)?

The Cluster Autoscaler (CA) is a Kubernetes component that automatically scales worker nodes in a cluster up or down based on workload demands. It ensures efficient resource utilization while optimizing costs.

i)Scales Up: Adds new nodes when there aren’t enough resources for pending pods.
ii)Scales Down: Removes underutilized nodes to save costs.
iii)Works with Auto Scaling Groups (ASG): On cloud platforms (AWS, GCP, Azure, OpenShift).

✅ Cluster Autoscaler automatically scales nodes up/down
✅ Works with cloud providers & can be integrated with bare metal provisioning
✅ Prevents pending pods & removes underutilized nodes to optimize costs
✅ Works best with Auto Scaling Groups (AWS, GCP, Azure) or Metal³ (Bare Metal).

Autoscaler	What It Scales	Trigger Condition	Primary Use Case
Horizontal Pod Autoscaler (HPA)	Number of Pods	CPU, Memory, or Custom Metrics	Scale pods up/down based on workload
Vertical Pod Autoscaler (VPA)	CPU & Memory of Pods	Usage history & recommendations	Adjust resource requests/limits for pods
Cluster Autoscaler (CA)	Number of Nodes	Pending Pods & Node Utilization	Add/remove worker nodes dynamically

SHASHI KANT SHAH

Shashikant shah

Monday, 3 March 2025

Auto Scaling (HPA) , (VPA) for Kubernetes.

No comments:

Post a Comment

Followers

Total Pageviews

DevOps Engineer