Understanding Prometheus & Grafana — My DevOps Journey
Raees Qazi | DevOps Engineer | Learner | Mentor | Creator | CEO-Briller Technologies
Today, I want to talk about Prometheus and Grafana. I’m personally very excited about these tools because, Alhamdulillah, within one year I have completed the entire DevOps stack. Now, let’s move to the main topic.
Most people think DevOps means deploying a website or application, using Ansible, building CI/CD pipelines, and nothing more.
But the truth is: real DevOps work actually begins after deployment.
Once an application goes live, the work of DevOps/SRE starts. We begin observing:
- CPU usage
- Memory consumption
- Incoming and outgoing traffic
- Application health
- System behavior
This is the responsibility of an SRE or Senior DevOps Engineer.
And in this phase, three important concepts come in: SLA, SLO, and SLI.

✅ What is SLA? (Service Level Agreement)
An SLA is an agreement between two parties — usually the product owner and the customer.
The product owner commits that the application will be:
- Running 24/7
- Secure and safe
- Scalable
- Reliable and available
- Performing well
In short: Your application should run smoothly without issues, and the owner is responsible for that promise.
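To make the availability promise concrete, here is a small sketch of how much downtime a given availability percentage allows. The percentages and the 30-day period are my own illustrative assumptions, not values from any real SLA.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period for a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Illustrative targets only; a real SLA states its own number and period.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% over 30 days allows about {minutes:.1f} minutes of downtime")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why the promised number matters so much.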
✅ Observability Tools After Deployment
Once an app is deployed, we use different tools to observe, monitor, and track everything.
1. Monitoring
Monitoring tools such as Prometheus collect metrics from the node, the application, and the cluster so we can confirm everything is working properly.
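Prometheus collects metrics by scraping a plain-text page that each target exposes. The sketch below builds that text by hand so the format is visible. The metric name `demo_cpu_usage_percent` and the node names are hypothetical, label values are not escaped, and in practice an official client library would produce this output for you.

```python
from typing import Dict, List, Optional, Tuple


def render_sample(name: str, value: float, labels: Optional[Dict[str, str]] = None) -> str:
    """Render one sample as a line in the Prometheus text exposition format."""
    if labels:
        # Labels are written as key="value" pairs inside curly braces.
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


def render_gauge(name: str, help_text: str,
                 samples: List[Tuple[Optional[Dict[str, str]], float]]) -> str:
    """Render a gauge metric with its HELP and TYPE comment lines."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        lines.append(render_sample(name, value, labels))
    return "\n".join(lines) + "\n"


# Hypothetical metric and nodes, for illustration only.
print(render_gauge(
    "demo_cpu_usage_percent",
    "CPU usage of the demo node.",
    [({"node": "web-1"}, 42.5), ({"node": "web-2"}, 17.0)],
))
```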
2. Logging
For logs, we use:
- Grafana (to query and view the logs)
- Loki (to store and index the logs)
- Promtail (the agent that ships logs to Loki)
Logs help us see which user came, what actions they performed, and what errors occurred.
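Logs are easiest to search when every line has the same structure. Here is a minimal sketch of writing one JSON object per line with the user, the action, and any error. The field names and the `log_event` helper are my own assumptions, not something a particular logging tool requires.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def log_event(level: str, user: str, action: str, error: Optional[str] = None) -> str:
    """Print one structured log line as JSON and return it."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "user": user,
        "action": action,
    }
    if error is not None:
        record["error"] = error
    line = json.dumps(record)
    print(line)
    return line


# Hypothetical events: who came, what they did, and what went wrong.
log_event("info", "alice", "login")
log_event("error", "bob", "checkout", error="payment gateway timeout")
```

A log agent can then ship these lines as they are, and we can filter later by user, action, or level.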
3. Tracing
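For tracing, we use eBPF-based tools. eBPF runs sandboxed programs inside the operating system kernel, so it can observe system calls and network activity without changing the application code.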
This helps in identifying deep system issues and troubleshooting efficiently.
4. Alerting
For alerting, we use:
- Prometheus
- Grafana
These tools notify us when something goes wrong, for example when a metric crosses a threshold we have defined.
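A good alert usually fires only when a problem lasts for a while, not on a single spike. Prometheus alerting rules express this with a `for` duration. The sketch below imitates that idea in plain Python; it is not how Prometheus itself evaluates rules, and the threshold, duration, and sample values are assumptions.

```python
from typing import List, Tuple


def should_fire(samples: List[Tuple[int, float]], threshold: float, for_seconds: int) -> bool:
    """Return True if the value has stayed above the threshold for at least
    for_seconds, counted up to the most recent sample.

    samples is a list of (timestamp_seconds, value) pairs sorted by time.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # the breach begins here
        else:
            breach_start = None  # any recovery resets the timer
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= for_seconds


# Hypothetical CPU readings, one per minute.
sustained = [(0, 50), (60, 92), (120, 95), (180, 97), (240, 96), (300, 93), (360, 91)]
spike = [(0, 50), (60, 92), (120, 60), (180, 55), (240, 58), (300, 52), (360, 51)]

print(should_fire(sustained, threshold=90, for_seconds=300))
print(should_fire(spike, threshold=90, for_seconds=300))
```

The first series stays above 90 for five minutes and should alert; the second is a short spike that recovers, so it should not.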
✅ What is SLO? (Service Level Objective)
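An SLO is an internal, measurable target that we set stricter than the SLA, so we notice problems before the SLA is broken.
Examples of SLOs (the numbers are only illustrative):
- 99.9% of requests succeed over a 30-day window.
- 95% of requests complete in under 300 ms.
Monitoring, logging, tracing, and alerting must all be enabled, because without them we cannot tell whether an SLO is being met.
In short, SLOs are the targets we hold ourselves to so the SLA remains intact.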
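An SLO is often written as a numeric target, such as a success-rate percentage, and whatever the target leaves over is the error budget. A minimal sketch, assuming a 99.9% target and made-up request counts:

```python
from typing import Tuple


def error_budget(slo_percent: float, total_requests: int, failed_requests: int) -> Tuple[float, float]:
    """Return (allowed_failures, remaining_budget) for a success-rate SLO."""
    allowed_failures = total_requests * (1 - slo_percent / 100)
    remaining = allowed_failures - failed_requests
    return allowed_failures, remaining


# Assumed numbers for illustration: 1,000,000 requests, 400 of them failed.
allowed, remaining = error_budget(99.9, 1_000_000, 400)
print(f"Allowed failures: {allowed:.0f}")
print(f"Remaining budget: {remaining:.0f}")
if remaining < 0:
    print("SLO missed: slow down releases and fix reliability first.")
```

While the remaining budget is positive the target is being met; once it goes negative the SLO is broken and the SLA is at risk.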
✅ What is SLI? (Service Level Indicator)
SLIs are the metrics we actually measure to judge how the system is performing, such as the percentage of successful requests or the request latency.
They also show when CPU or memory usage goes up or down, how traffic behaves, and so on.
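To show what an SLI looks like as a number, here is a small sketch that computes two common ones from raw request data: the share of requests that did not fail on the server side, and the share that finished within a latency limit. The status codes, latencies, and the 300 ms limit are assumed sample values.

```python
from typing import List


def availability_sli(status_codes: List[int]) -> float:
    """Percentage of requests that did not end in a server error (5xx)."""
    if not status_codes:
        raise ValueError("no requests to measure")
    good = sum(1 for code in status_codes if code < 500)
    return 100 * good / len(status_codes)


def latency_sli(latencies_ms: List[float], limit_ms: float) -> float:
    """Percentage of requests that completed within limit_ms."""
    if not latencies_ms:
        raise ValueError("no requests to measure")
    fast = sum(1 for ms in latencies_ms if ms <= limit_ms)
    return 100 * fast / len(latencies_ms)


# Assumed sample data: 100 requests, two of them server errors.
codes = [200] * 97 + [500, 503, 404]
latencies = [120.0, 80.0, 310.0, 95.0, 250.0, 900.0, 60.0, 140.0, 200.0, 110.0]

print(f"Availability SLI: {availability_sli(codes):.1f}%")
print(f"Latency SLI (<= 300 ms): {latency_sli(latencies, 300):.1f}%")
```

Note that the 404 counts as a good request here, because it is a client error rather than a failure of the service. That choice is part of defining the SLI.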
Tools used to measure SLIs:
- Prometheus (a Time Series Database — TSDB)
- Grafana
- EFK stack
- ELK stack
- Datadog
✅ Observability — Why This Matters
Observability helps us answer two important questions:
- Why is the app running fine?
- If it went down, why did it go down?
To visualize observability, we use tools like:
- Kibana
- CloudWatch
- Grafana
These give us a clear picture of what is happening inside the system.
✅ Final Thoughts
I hope this blog helped you clearly understand SLA, SLO, SLI, and how tools like Prometheus and Grafana support observability in a DevOps environment.
Please share, subscribe, and support.
More DevOps content is on the way!
🌐 Online References
- LinkedIn: https://www.linkedin.com/in/raees-yaqoob-qazi-ryqs/
- Medium Blog: https://medium.com/@raeesyaqubqazi
- Blogspot: https://brillertechnologies.blogspot.com/
- Facebook Page: https://web.facebook.com/profile.php?id=61553548371216
- YouTube Channel: https://www.youtube.com/@RaeesQ.
- TikTok: https://www.tiktok.com/@mrryqs?_t=ZS-8y7t0fQfJKu&_r=1