Shashikant shah

Monday, 3 March 2025

Auto Scaling (HPA, VPA) for Kubernetes.

Auto Scaling in Kubernetes (K8s).

Auto scaling in Kubernetes lets applications handle changing workloads by dynamically adjusting the number of pods, the resources assigned to each pod, or the number of nodes.

There are three main types of auto scaling in Kubernetes:

1. Horizontal Pod Autoscaler (HPA) – Scales the number of pods based on CPU, memory, or custom metrics.
2. Vertical Pod Autoscaler (VPA) – Adjusts pod resource requests & limits.
3. Cluster Autoscaler (CA) – Adds or removes worker nodes based on demand.


1. Horizontal Pod Autoscaler (HPA)


1. Enable Metrics Server:

Install the Metrics Server, then check that it is running:

# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml

# kubectl get pods --all-namespaces

# kubectl get deployment -n kube-system

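# kubectl -n kube-system edit deployments.apps metrics-server

In the metrics-server container, add the two --kubelet-* flags shown here. --kubelet-insecure-tls skips verification of the kubelet serving certificates, so use it only on test or lab clusters.

command:

        - /metrics-server

        - --kubelet-insecure-tls

        - --kubelet-preferred-address-types=InternalIP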



 



# kubectl delete pod -n kube-system -l k8s-app=metrics-server

# kubectl get pods -n kube-system

# kubectl logs -n kube-system -l k8s-app=metrics-server

# kubectl top nodes




# kubectl top pods -A






Or, download the latest manifest and edit it before applying:

# wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# vim components.yaml

Add these flags to the args list of the metrics-server container:

- --kubelet-insecure-tls

- --kubelet-preferred-address-types=InternalIP

# kubectl apply -f components.yaml

# kubectl get pods -n kube-system | grep metrics-server

# kubectl logs -n kube-system -l k8s-app=metrics-server

# kubectl top nodes

# kubectl top pods -n <namespace>
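Before creating an HPA it can help to confirm that the metrics API itself is registered and healthy, not just that the pod is running. The following is a minimal sketch, assuming the standard Metrics Server manifest, which registers an APIService named v1beta1.metrics.k8s.io. The function name metrics_api_available is hypothetical and not part of kubectl.

```shell
# Hypothetical helper (not a kubectl feature): succeeds only when the
# resource metrics APIService reports the condition Available=True.
metrics_api_available() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  [ "$status" = "True" ]
}
```

Usage: run `metrics_api_available && echo "metrics API ready"`. If it does not succeed, `kubectl top` and the HPA will not receive metrics yet.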


2. Create a Deployment for CPU-based HPA.

# vim nginx-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "100m"
          limits:
            cpu: "200m"
        ports:
        - containerPort: 80


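requests: The amount of CPU/memory reserved for each container. The scheduler uses it to place the pod, and HPA utilization percentages are calculated against it.

limits: The maximum a container can use. CPU usage above the limit is throttled.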

100m (millicores) = 0.1 CPU core

200m (millicores) = 0.2 CPU core

1000m = 1 full CPU core
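The unit arithmetic above can be sketched as two small shell helpers. This is only an illustration: cpu_to_millicores and cpu_percent_of_request are hypothetical names, not kubectl features, and the sketch assumes simple quantities such as 100m, 0.2 or 1.

```shell
# Hypothetical helper: convert a CPU quantity ("100m", "0.2", "1") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;   # already expressed in millicores
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;   # cores -> millicores
  esac
}

# Hypothetical helper: usage as a whole percentage of the request.
# Both arguments are plain millicore numbers, e.g. 15 and 100.
cpu_percent_of_request() {
  echo $(( $1 * 100 / $2 ))
}
```

For example, `cpu_to_millicores 0.2` prints 200, and `cpu_percent_of_request 15 100` prints 15: a pod using 15m against the 100m request above is at 15% utilization.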


3. Create the Service (nginx-service.yaml).

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP

 

# kubectl apply -f nginx-deploy.yaml

# kubectl apply -f nginx-service.yaml

# kubectl get deployments




# kubectl get pods




# kubectl get svc




Create an HPA that keeps the Deployment between 2 and 5 pods and adds pods when average CPU utilization exceeds 10% of the requested CPU.

# kubectl autoscale deployment <deployment_name> --cpu-percent=10 --min=2 --max=5

# kubectl autoscale deployment nginx-deployment --cpu-percent=10 --min=2 --max=5

# kubectl get hpa
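To make the 10% target concrete: the Kubernetes documentation describes the HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with the result kept between the configured minimum and maximum. The following is a rough sketch of that arithmetic only. The helper name hpa_desired_replicas is hypothetical, and the sketch deliberately ignores the controller's tolerance band, stabilization window and handling of missing metrics.

```shell
# Hypothetical helper, not part of kubectl.
# Args: current_replicas current_util_pct target_util_pct min_replicas max_replicas
# The body runs in a subshell ( ... ) so its variables do not leak.
hpa_desired_replicas() (
  current=$1 usage=$2 target=$3 min=$4 max=$5
  # Integer ceiling of current * usage / target
  desired=$(( (current * usage + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
)
```

With the settings above, `hpa_desired_replicas 2 15 10 2 5` prints 3 (two pods at 15% against a 10% target), and `hpa_desired_replicas 2 50 10 2 5` prints 5, because the raw result of 10 is capped by --max=5.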



 

4. Generate Load on Nginx.

# kubectl run -it --rm load-generator --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://nginx-service; done"

 




# kubectl describe hpa nginx-deployment
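While the load generator runs, the replica counts can also be read directly from the HPA status instead of scanning the describe output. A small sketch follows, assuming the HPA created by kubectl autoscale carries the Deployment's name (nginx-deployment). The wrapper name hpa_replicas is hypothetical.

```shell
# Hypothetical wrapper: print "current/desired" replica counts for an HPA,
# read from the currentReplicas and desiredReplicas fields of its status.
hpa_replicas() {
  kubectl get hpa "$1" \
    -o jsonpath='{.status.currentReplicas}/{.status.desiredReplicas}'
}
```

Usage: `hpa_replicas nginx-deployment`. For a continuously updating view, `kubectl get hpa nginx-deployment --watch` is an alternative.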



 


5. How to delete the HPA, the Metrics Server and the demo resources.

# kubectl delete hpa nginx-deployment

# kubectl delete -f components.yaml

Or

# kubectl delete -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml

# kubectl delete -f nginx-deploy.yaml

# kubectl delete -f nginx-service.yaml


HPA for Memory Autoscaling.

Goal: scale nginx-deployment between 2 and 5 pods, and trigger scaling when average memory utilization exceeds 25% of the requested memory.

1. Create the Deployment with memory requests and limits.

# vim nginx-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "100m"
            memory: "20Mi"  # Set memory request
          limits:
            cpu: "200m"
            memory: "30Mi"  # Set memory limit
        ports:
        - containerPort: 80

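  • requests: The amount of CPU/memory reserved for each container. HPA utilization is measured against it.
  • limits: The maximum a container can use. CPU usage above the limit is throttled; a container that exceeds its memory limit is terminated (OOMKilled).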

 

2. Create an HPA YAML file.

i) Scale nginx-deployment between 2 and 5 pods.

ii) Trigger scaling if average memory utilization exceeds 25%.

# vim nginx-hpa.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 25  # Adjust this threshold as needed

 

# kubectl apply -f nginx-deploy.yaml

# kubectl apply -f nginx-hpa.yaml

# kubectl top pods





# kubectl get hpa
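The percentage shown by the HPA can be cross-checked by hand from the kubectl top output. Below is a minimal sketch, assuming every pod has the same memory request and that the memory column is reported in Mi, as in the Deployment above. The helper name avg_memory_utilization is hypothetical.

```shell
# Hypothetical helper. Reads "NAME CPU MEMORY" lines (the layout printed by
# `kubectl top pods --no-headers`) on stdin and prints the average memory
# usage as a whole percentage of the per-pod request, given in Mi as $1.
avg_memory_utilization() {
  awk -v req="$1" '
    { mem = $3; sub(/Mi$/, "", mem); total += mem; n++ }
    END { if (n > 0) printf "%d\n", (total / n) * 100 / req }
  '
}
```

Usage: `kubectl top pods -l app=nginx --no-headers | avg_memory_utilization 20`, where 20 is the 20Mi request. Two pods using 6Mi and 4Mi would average 5Mi, which is 25% of the request and right at the threshold configured in nginx-hpa.yaml.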



 

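3. Generate Load to Trigger Scaling.

The HPA only measures the pods of nginx-deployment, so the load has to run inside those pods; a process started in a separate busybox pod does not count towards their utilization. Open a shell in each Nginx pod (the pod names will differ in your cluster) and start a background process:

# kubectl exec -it nginx-deployment-5d58ccd545-q8tdd -- /bin/bash

# dd if=/dev/zero of=/dev/null &

# kubectl exec -it nginx-deployment-5d58ccd545-88fvs -- /bin/bash

# dd if=/dev/zero of=/dev/null &

Note that this dd command mainly consumes CPU and allocates very little memory, so memory utilization may rise only slightly. If the HPA does not react, lower averageUtilization or run a process that actually allocates memory.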



 






# kubectl  get hpa





# kubectl  get pods




# kubectl describe hpa nginx-hpa


2. Vertical Pod Autoscaler (VPA)

Unlike Horizontal Pod Autoscaler (HPA), which scales the number of pods, VPA automatically adjusts CPU and memory requests/limits for your pods based on usage.


How VPA Works:

  1. VPA Recommender:
    • Analyzes historical resource usage and provides recommendations for CPU and memory requests.
  2. VPA Updater:
    • Evicts pods whose current requests differ from the recommendation so that they are recreated with the new values.
  3. VPA Admission Controller:
    • Sets the recommended resource requests on pods at creation time, including pods recreated after an eviction.

Modes of VPA:

  1. Auto:
    • VPA automatically applies resource recommendations and restarts pods as needed.
  2. Initial:
    • VPA only applies recommendations when the pod is first created.
  3. Off:
    • VPA provides recommendations but does not apply them.

Install VPA for Autoscaling.

# git clone https://github.com/kubernetes/autoscaler.git

# cd autoscaler/vertical-pod-autoscaler

# ./hack/vpa-up.sh

# kubectl get pods -n kube-system | grep vpa

To remove VPA later:

# ./hack/vpa-down.sh

Or, from the root of the cloned autoscaler repository:

# kubectl delete -f vertical-pod-autoscaler/deploy/

 

# kubectl create namespace vpa-demo

Deploy an Application (Nginx).

# vim nginx-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: vpa-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"

# kubectl apply -f nginx-deployment.yaml

# kubectl get pods -n vpa-demo


Deploy VPA for Nginx

# vim nginx-vpa.yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
  namespace: vpa-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  updatePolicy:
    updateMode: "Auto"  # VPA automatically applies changes
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "500m"
        memory: "512Mi"
      controlledResources: ["cpu", "memory"]
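The minAllowed and maxAllowed values act as a floor and a ceiling on whatever the recommender computes. The effect can be sketched with a small clamp function. This is an illustration of the capping idea only, not VPA's actual code, and vpa_cap is a hypothetical name.

```shell
# Hypothetical helper: cap a raw recommendation (plain number, e.g. millicores
# or Mi) to the minAllowed/maxAllowed range of the container policy.
# Args: raw_value min_allowed max_allowed
vpa_cap() (
  value=$1 min=$2 max=$3
  if [ "$value" -lt "$min" ]; then value=$min; fi
  if [ "$value" -gt "$max" ]; then value=$max; fi
  echo "$value"
)
```

With the CPU policy above (50m to 500m), `vpa_cap 25 50 500` prints 50, `vpa_cap 800 50 500` prints 500, and `vpa_cap 120 50 500` prints 120 unchanged.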

# kubectl apply -f nginx-vpa.yaml

# kubectl get vpa -n vpa-demo

# kubectl describe vpa nginx-vpa -n vpa-demo

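Target → The recommended CPU and memory requests for the container.

Lower Bound / Upper Bound → The range of requests that VPA considers acceptable. In Auto mode, pods whose requests fall outside this range are candidates for eviction.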
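The same values can be read in machine-friendly form from the VPA object's status, which is handy in scripts. The sketch below assumes the recommendation is stored under status.recommendation.containerRecommendations and that the nginx container is the first entry. The wrapper name vpa_target is hypothetical.

```shell
# Hypothetical wrapper: print the Target recommendation of the first container.
# Args: vpa_name namespace
vpa_target() {
  kubectl get vpa "$1" -n "$2" \
    -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
}
```

Usage: `vpa_target nginx-vpa vpa-demo`. Replacing target with lowerBound or upperBound in the jsonpath expression reads the other two fields. The output stays empty until the recommender has collected enough usage data.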

Simulate High CPU Load (Trigger VPA).

# kubectl exec -it -n vpa-demo nginx-deployment-57c44dfdb7-7jmgl -- /bin/bash

# dd if=/dev/zero of=/dev/null &

# kubectl exec -it -n vpa-demo nginx-deployment-57c44dfdb7-mfdv7 -- /bin/bash

# dd if=/dev/zero of=/dev/null &

What is the Difference Between VPA and HPA?

Feature  | VPA (Vertical Scaling)                   | HPA (Horizontal Scaling)
---------|------------------------------------------|------------------------------
Adjusts  | CPU & Memory Requests                    | Number of Pods
Method   | Restarts pod with new resources          | Adds/Removes pods dynamically
Best for | Apps with unpredictable CPU/memory needs | Apps with fluctuating traffic


What is the Difference Between VPA and ResourceQuota?

Feature      | VPA (Vertical Pod Autoscaler)                        | ResourceQuota
-------------|------------------------------------------------------|----------------------------------------------------------
Scope        | Per Pod/Container                                    | Per Namespace
Purpose      | Adjusts pod resource requests and limits dynamically | Enforces fixed limits on total namespace resources
Scaling Type | Vertical Scaling (adjusts CPU/memory per pod)        | Quota Management (restricts overall usage in a namespace)
Pod Restart? | Yes (Pods restart with new requests/limits)          | No, only applies to new resource allocations
Use Case     | Optimize resource allocation for running workloads   | Prevent excessive resource usage across multiple pods


What is Cluster Autoscaler (CA)?


The Cluster Autoscaler (CA) is a Kubernetes component that automatically scales worker nodes in a cluster up or down based on workload demands. It ensures efficient resource utilization while optimizing costs.

  • Scales up: adds new nodes when pending pods cannot be scheduled because no existing node has enough free resources.
  • Scales down: removes underutilized nodes to save costs.
  • Works with node groups: on cloud platforms (AWS, GCP, Azure, OpenShift) it resizes the provider's node groups, for example AWS Auto Scaling Groups (ASG).
  • Can also be integrated with bare-metal provisioning, for example Metal³.
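A quick way to see whether there is anything for the Cluster Autoscaler to react to is to count pods stuck in the Pending phase. The sketch below is a rough indicator only: Pending also covers pods that are waiting for other reasons (for example image pulls), whereas the autoscaler acts on pods the scheduler could not place. The helper name count_pending_pods is hypothetical.

```shell
# Hypothetical helper: count pods in the Pending phase across all namespaces.
# awk prints the number of lines read, which is 0 when nothing is pending.
count_pending_pods() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase=Pending --no-headers 2>/dev/null |
    awk 'END { print NR }'
}
```

Usage: `count_pending_pods`. A value that stays above zero while the nodes are full suggests that a scale-up is needed; `kubectl describe pod <pod_name>` shows whether the scheduler reported the pod as unschedulable.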

Autoscaler                      | What It Scales       | Trigger Condition               | Primary Use Case
--------------------------------|----------------------|---------------------------------|-----------------------------------------
Horizontal Pod Autoscaler (HPA) | Number of Pods       | CPU, Memory, or Custom Metrics  | Scale pods up/down based on workload
Vertical Pod Autoscaler (VPA)   | CPU & Memory of Pods | Usage history & recommendations | Adjust resource requests/limits for pods
Cluster Autoscaler (CA)         | Number of Nodes      | Pending Pods & Node Utilization | Add/remove worker nodes dynamically

 

 














