Understanding Horizontal Pod Autoscaler (HPA) in Kubernetes — A Real-Life Example
By Raees Qazi | DevOps Engineer | Learner | Mentor | Creator
Today, I’m going to talk about a very important and practical concept in Kubernetes: HPA, which stands for Horizontal Pod Autoscaler.
As a DevOps engineer, understanding HPA is crucial for managing applications at scale — especially when dealing with sudden traffic spikes. Let’s understand this with a real-life example.

Real-World Scenario: Blessed Friday Sale on Daraz
Imagine it’s Blessed Friday Sale on Daraz — Pakistan’s biggest online shopping event. Suddenly, thousands of users start hitting the website, and the traffic increases rapidly.
Now, this is where HPA comes in.
Instead of someone manually increasing the number of pods, HPA automatically scales your application pods based on observed load (CPU or memory usage, or other metrics). As traffic increases, new pods are created, and once the traffic drops after the sale, the extra pods are removed. Scale-down is deliberately slower than scale-up: by default HPA waits about five minutes before removing pods, so a brief dip in traffic doesn’t cause flapping. This helps keep your application available through the spike while avoiding idle pods the rest of the time.
How HPA Works Behind the Scenes
To understand HPA better, let’s break it down with some key concepts:
🔧 Pods and Resources
- Your application runs inside pods.
- These pods live on worker nodes.
- Every pod consumes system resources like CPU, RAM, and disk.
We control how much CPU and memory a pod is guaranteed (requests) and how much it may consume at most (limits) by setting resource requests and limits on each container in the manifest file.
📊 Example: Resource Requests and Limits
Let’s say your application pod:
- Requests: 50m (50 millicores of CPU)
- Limit: 100m (maximum allowed usage)
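As a rough sketch, the requests and limits above could appear in a Deployment manifest like the one below. The name my-app matches the target used later in this post; the image and labels are placeholders I made up for illustration.

```yaml
# Minimal Deployment sketch with the example's CPU request and limit.
# The image and the app label are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  # spec.replicas is left out on purpose: it defaults to 1, and once an
  # HPA manages this Deployment the HPA owns the replica count.
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: example.com/my-app:1.0  # hypothetical image
          resources:
            requests:
              cpu: 50m   # reserved by the scheduler; HPA utilization is measured against this
            limits:
              cpu: 100m  # ceiling; the container is throttled if it tries to use more
```

Note that requests and limits are set per container, and HPA can only compute CPU utilization for pods whose containers declare a CPU request.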
Now when traffic increases:
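- The pod’s CPU usage climbs: 50m → 60m → 70m → 90m
- HPA compares the average usage across all pods against a target, and that target is a percentage of the CPU request, not the limit. With a 50m request and an 80% target, the threshold is 40m per pod, so a pod using 50m is already at 100% utilization and HPA adds pods automatically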
- This way, the load gets distributed, and your app stays responsive.
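To make the arithmetic concrete, here is the core calculation as described in the Kubernetes documentation, applied to the numbers above. I am assuming a single pod using 90m against a 50m request and an 80% target, and ignoring the tolerance and stabilization settings that also influence real scaling decisions.

```latex
% Core HPA calculation: scale the current replica count by the ratio of
% observed utilization to target utilization, then round up.
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
    \frac{\text{currentUtilization}}{\text{targetUtilization}} \right\rceil
\]
% One pod using 90m against a 50m request is at 180% utilization.
\[
\text{desiredReplicas}
  = \left\lceil 1 \times \frac{180\%}{80\%} \right\rceil
  = \lceil 2.25 \rceil
  = 3
\]
```

So in this sketch HPA would go from 1 pod to 3 (capped by the configured maximum), not simply add one pod each time the threshold is crossed.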
Why Setting Limits and Replicas is Important
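HPA always needs an upper bound (maxReplicas is a required field), but if you set it too high, HPA can keep creating pods as long as traffic increases. Pods without sensible limits can then starve their neighbours on the same node, and new pods that no longer fit anywhere get stuck in Pending.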
So we do two things:
- Set resource limits and requests to manage how much each pod can consume.
- Define max and min replicas in the HPA configuration to control scaling boundaries.
This protects your worker nodes from overload and keeps resource usage efficient.
🧾 What’s in the HPA Manifest?
In your HPA YAML file, you’ll find this important line:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
This tells Kubernetes which object to scale. In most real-world use cases (like this one), we target a Deployment, so that the Pods inside that deployment get auto-scaled.
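Putting the pieces together, a complete HPA manifest might look like the sketch below. It uses the autoscaling/v2 API and the 80% CPU target from the earlier example; the minReplicas and maxReplicas values are illustrative assumptions, not recommendations, and should be sized for your own nodes.

```yaml
# Sketch of an HPA for the my-app Deployment.
# minReplicas and maxReplicas are illustrative values only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:          # the object whose replica count HPA adjusts
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2           # never scale below this, even when idle
  maxReplicas: 10          # hard cap that keeps the cluster from being flooded with pods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80  # percent of each pod's CPU request, averaged across pods
```

CPU and memory metrics come from the cluster’s resource metrics API, which is usually provided by metrics-server, so that needs to be running for this HPA to work. After applying the manifest, kubectl get hpa my-app shows the current utilization against the target and the current replica count.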
✅ Summary
- HPA = Horizontal Pod Autoscaler
- Automatically adjusts the number of pod replicas based on CPU/memory usage, measured against each pod’s requests
- Keeps your app running smoothly during traffic spikes (like Daraz sales)
- Uses limits, requests, and replica counts to keep nodes healthy
- Target is usually a Deployment, set using scaleTargetRef
🙌 Final Words
I hope this blog helped you understand HPA in an easy and practical way. If you found it useful, share this knowledge with your team and fellow DevOps professionals. Sharing is caring, especially in tech!
Stay blessed and keep scaling smartly! 🚀