Understanding Horizontal Pod Autoscaler (HPA) in Kubernetes

Understanding Horizontal Pod Autoscaler (HPA) in Kubernetes — A Real-Life Example

November 08, 2025

By Raees Qazi | DevOps Engineer | Learner | Mentor | Creator

Today, I’m going to talk about a very important and practical concept in Kubernetes: HPA, which stands for Horizontal Pod Autoscaler.

As a DevOps engineer, understanding HPA is crucial for managing applications at scale — especially when dealing with sudden traffic spikes. Let’s understand this with a real-life example.

Real-World Scenario: Blessed Friday Sale on Daraz

Imagine it’s Blessed Friday Sale on Daraz — Pakistan’s biggest online shopping event. Suddenly, thousands of users start hitting the website, and the traffic increases rapidly.

Now, this is where HPA comes in.

Instead of manually increasing the number of pods, HPA automatically scales your application pods based on observed load metrics such as CPU or memory usage. As traffic increases, new pods are created automatically, and once the traffic drops after the sale, extra pods are removed. This helps your application stay available during traffic spikes while keeping resource usage efficient the rest of the time.

How HPA Works Behind the Scenes

To understand HPA better, let’s break it down with some key concepts:

🔧 Pods and Resources

Your application runs inside pods.
These pods live on worker nodes.
Every pod consumes system resources like CPU, RAM, and disk.

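We control how much CPU and memory each container in a pod reserves and may consume using resource requests and limits in the manifest file. A request is the amount the scheduler reserves for the container; a limit is the maximum it is allowed to use.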

📊 Example: Resource Requests and Limits

Let’s say your application pod:

Requests: 50m (50 millicores of CPU)
Limit: 100m (maximum allowed usage)
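
Here is a minimal sketch of how those two values could look in a Deployment manifest. The name my-app matches the scaleTargetRef example later in this post; the label and the image are placeholders I made up for illustration, so swap in your own.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app                        # assumed label, must match the template below
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: example.com/my-app:1.0  # hypothetical image, replace with yours
          resources:
            requests:
              cpu: 50m                   # reserved by the scheduler for this container
            limits:
              cpu: 100m                  # the container is throttled above this
```

Requests and limits are set per container, so a pod with several containers adds them up.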

Now when traffic increases:

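The pod's CPU usage climbs: 30m → 40m → 60m → 90m
HPA measures CPU utilization as a percentage of the pod's request (50m here), not its limit, averaged across all pods of the target
Once average utilization rises above the target (for example, 80%, which is 40m per pod in this case), HPA adds pods automatically
This way, the load gets distributed, and your app stays responsive.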
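
To see the scaling math in numbers, here is a small worked example. It uses the replica formula described in the Kubernetes HPA documentation and assumes a single running pod, the 50m request from above, a measured usage of 90m, and an 80% target. Treat it as a rough sketch: the real controller also applies a tolerance and other safeguards before it acts.

```latex
\begin{align*}
\text{utilization} &= \frac{\text{usage}}{\text{request}}
  = \frac{90\text{m}}{50\text{m}} = 180\% \\
\text{desiredReplicas} &= \left\lceil \text{currentReplicas} \times
  \frac{\text{currentUtilization}}{\text{targetUtilization}} \right\rceil
  = \left\lceil 1 \times \frac{180}{80} \right\rceil
  = \lceil 2.25 \rceil = 3
\end{align*}
```

So under these assumptions HPA would aim for 3 pods, as long as that stays within the configured maximum.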

Why Setting Limits and Replicas is Important

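The HPA spec requires a maxReplicas value, so scaling always has a ceiling. But if you set that ceiling too high, HPA can keep adding pods as traffic increases until the cluster runs out of capacity: pods whose requests no longer fit on any node stay Pending, and pods without sensible requests and limits can starve other workloads on the same node.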

So we do two things:

Set resource limits and requests to manage how much each pod can consume.
Define max and min replicas in the HPA configuration to control scaling boundaries.

This protects your nodes from overload and keeps resource usage efficient.
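
As a rough illustration, the scaling boundaries could be written like this inside the HPA spec. The numbers 2 and 10 are assumptions picked for the example, not recommendations; size them against what your nodes can actually hold.

```yaml
# Fragment of an HPA spec (not a complete manifest).
# The values 2 and 10 are illustrative assumptions.
spec:
  minReplicas: 2    # never scale below 2 pods, even when traffic is quiet
  maxReplicas: 10   # never scale above 10 pods, however high the load goes
  # Worst case with the example pod (request 50m, limit 100m):
  #   10 pods x 50m request = 500m CPU reserved by the scheduler
  #   10 pods x 100m limit  = 1000m CPU (1 core) usable at most
```

A quick worst-case calculation like the one in the comments is an easy way to check that the maximum still fits on your nodes.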

🧾 What’s in the HPA Manifest?

In your HPA YAML file, you’ll find this important section:

scaleTargetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: my-app

This tells Kubernetes which object to scale. In most real-world use cases (like this one), we target a Deployment, so that the Pods inside that deployment get auto-scaled.
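
For context, here is a minimal sketch of a complete HPA manifest showing where scaleTargetRef sits. The HPA name my-app-hpa, the replica bounds, and the 80% CPU target are assumptions for illustration. The apiVersion autoscaling/v2 is the stable HPA API in current Kubernetes releases.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa              # hypothetical name
spec:
  scaleTargetRef:               # the object whose replica count HPA changes
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2                # assumed lower bound
  maxReplicas: 10               # assumed upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # percent of the CPU request, averaged over pods
```

If you save this as hpa.yaml (a file name chosen here), you can apply it with kubectl apply -f hpa.yaml and watch it with kubectl get hpa. Note that CPU-based scaling needs a metrics source such as metrics-server running in the cluster.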

✅ Summary

HPA = Horizontal Pod Autoscaler
Automatically adjusts the number of pod replicas based on CPU/memory usage
Keeps your app running smoothly during traffic spikes (like Daraz sales)
Relies on resource requests to calculate utilization, and on limits and min/max replica counts to keep nodes healthy
Target is usually a Deployment using scaleTargetRef

🙌 Final Words

I hope this blog helped you understand HPA in an easy and practical way. If you found it useful, share this knowledge with your team and fellow DevOps professionals. Sharing is caring, especially in tech!

Stay blessed and keep scaling smartly! 🚀
