SHASHI KANT SHAH

Prometheus is an open-source monitoring and alerting toolkit that records real-time metrics in a time-series database (TSDB). It is built especially for cloud-native, container-based environments such as Kubernetes.

It collects and stores metrics as time series data (i.e., values with timestamps), supports powerful querying via PromQL, and integrates with Grafana and Alertmanager.

Key Features of Prometheus:

Feature	Description
Time Series Storage	Stores metrics as time series with labels
Pull-based model	Prometheus scrapes metrics from targets, unlike push-based systems
PromQL	Built-in query language for filtering, calculations, and alert conditions
Service Discovery	Automatically finds services via Kubernetes, Consul, EC2, etc.
Visualization	Built-in graph UI; best used with Grafana dashboards
Alerting	Define alerts based on thresholds; send notifications via Alertmanager

Why Use Prometheus?

Open source and widely used
Lightweight and easy to install
Perfect for microservices and containerized apps
Strong community and support ecosystem
Compatible with exporters for system, database, and application monitoring

Prometheus Components

Prometheus has several core components that work together to collect, store, query, and alert based on metrics.

Component	Description
Prometheus Server	Core component that collects (scrapes), stores, and queries metrics
Exporters	Services or agents that expose metrics in Prometheus format
Alerting Rules	PromQL-based rules to define alert conditions
Alertmanager	Manages alerts – grouping, silencing, routing to email, Slack, etc.
Pushgateway	Lets short-lived jobs push metrics to an intermediary that Prometheus then scrapes
Service Discovery	Auto-discovers scrape targets (e.g., Kubernetes pods, EC2 instances)
Prometheus UI	Built-in web interface to run queries, check targets, and view alerts
Grafana	External tool for beautiful dashboards and visualizations using Prometheus data

Prometheus vs. Other Monitoring Tools

Feature	Prometheus	Graphite	Elastic Stack	Datadog
Architecture	Pull-based	Push-based	Push-based (agents ship data)	Agent-based
Primary Data Model	Time series	Time series	Logs	Custom metrics
Query Language	PromQL	Custom DSL	KQL / Lucene	Proprietary query syntax
Horizontal Scaling	Via federation or sharding	Limited	Supported	Fully managed service
Open source	Yes	Yes	Yes	No

Basic Terminologies in Prometheus

The basic terms used in Prometheus, explained simply and with practical context:

Term	Meaning
Time Series	A stream of timestamped values for one metric, identified by the metric name and its labels
Metric	A named measurement collected from a system, such as CPU, memory, or request count (e.g., http_requests_total)
Label	Key-value pair to differentiate time series
Job	Logical group of scrape targets
Instance	A single scrape target (usually host:port)
Target	Actual endpoint Prometheus scrapes metrics from
Scraping	Prometheus pulling data from targets at regular intervals
PromQL	Prometheus Query Language to analyze and fetch data
Exporter	Service or agent that exposes metrics in Prometheus format
Recording Rule	Precomputed PromQL query result stored as a new time series
Alerting Rule	PromQL expression that triggers alerts when conditions are met
Alertmanager	Handles alerts: grouping, deduping, routing to email, Slack, etc.
Pushgateway	Lets short-lived jobs push metrics to an intermediary that Prometheus then scrapes
Retention	Duration Prometheus keeps time-series data (e.g., 15 days)
TSDB	Time Series Database used internally by Prometheus
Service Discovery	Automatically finds targets (Kubernetes, Consul, EC2, etc.)
Histogram	Metric type that counts observations in configurable buckets
Summary	Metric type similar to a histogram, but quantiles are calculated on the client side and exposed directly
Gauge	Metric that can go up or down (e.g., memory usage, temperature)
Counter	Metric that only increases and resets to zero when the process restarts (e.g., number of requests)
Label Set	Complete set of labels attached to a metric time series
Expression Browser	Prometheus web UI for querying and visualizing time series data

Architecture of Prometheus

✅ 1. Time Series

What: A series of metric values tracked over time.
Example: CPU usage of a server every 10 seconds.

✅ 2. Metric

What: The actual measurement name.
Example: http_requests_total (total HTTP requests)

✅ 3. Label

What: Key-value pair to give more information about a metric.
Example:
http_requests_total{method="GET", status="200"}
method and status are labels.
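
To make the role of labels concrete, here is a minimal Python sketch of how a metric name plus its labels can identify one time series. The `series_key` helper and the in-memory `store` are hypothetical names used only for illustration; this is not how Prometheus stores data internally.

```python
from collections import defaultdict

def series_key(name, labels):
    """Build a hashable identity from a metric name and its labels (hypothetical helper)."""
    # Sorting makes the key independent of the order the labels were written in.
    return (name, tuple(sorted(labels.items())))

# Hypothetical in-memory store: series identity -> list of (timestamp, value) samples.
store = defaultdict(list)

def append_sample(name, labels, timestamp, value):
    store[series_key(name, labels)].append((timestamp, value))

# The first two samples share the same labels, so they belong to one series.
append_sample("http_requests_total", {"method": "GET", "status": "200"}, 1000.0, 10)
append_sample("http_requests_total", {"status": "200", "method": "GET"}, 1010.0, 12)
# A different label value creates a separate series.
append_sample("http_requests_total", {"method": "POST", "status": "500"}, 1010.0, 1)
```

After these three calls the store holds two series, because any change in a label value produces a new time series.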

✅ 4. Job

What: A group of similar targets.
Example: All Node Exporters can be under job node.

✅ 5. Instance

What: A single target with address (host:port).
Example: 10.1.2.3:9100 for one Node Exporter.

✅ 6. Target

What: The actual endpoint Prometheus collects data from.
Includes: job, instance, labels.

✅ 7. Scraping

What: The process where Prometheus collects data from a target.
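
As a rough illustration of what happens to a scraped response, the sketch below parses a small metrics body and attaches the `job` and `instance` labels of the target to every sample. The `parse_scrape` function is a hypothetical helper, and the parser is deliberately simplified: it assumes no escaped quotes in label values and no trailing timestamps.

```python
import re

# Matches simple sample lines such as: name{a="b",c="d"} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

def parse_scrape(body, job, instance):
    """Turn a scraped text body into (name, labels, value) samples."""
    samples = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        # Every sample from a target carries that target's job and instance.
        labels.update({"job": job, "instance": instance})
        samples.append((name, labels, float(raw_value)))
    return samples

body = """# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1027
node_load1 0.42
"""
samples = parse_scrape(body, job="node", instance="10.1.2.3:9100")
```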

✅ 8. PromQL

What: Prometheus Query Language.
Use: To filter, calculate, and display data.
Example:
rate(http_requests_total[5m])
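
To give a feel for what `rate()` calculates, here is a simplified Python sketch. It is an approximation under stated assumptions, not the real implementation: `simple_rate` is a hypothetical function, it handles counter resets by treating a drop as a restart from zero, and it does not extrapolate to the edges of the window the way PromQL does.

```python
def simple_rate(samples, window_seconds, now):
    """Approximate per-second increase of a counter over the trailing window.

    samples: list of (timestamp, value) tuples sorted by timestamp.
    """
    in_window = [(t, v) for t, v in samples if now - window_seconds <= t <= now]
    if len(in_window) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            increase += cur  # counter reset: count from zero again
    elapsed = in_window[-1][0] - in_window[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed

# One sample per minute; the counter resets between t=120 and t=180.
requests = [(0, 100), (60, 160), (120, 220), (180, 10), (240, 70)]
per_second = simple_rate(requests, window_seconds=300, now=240)  # about 0.79
```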

✅ 9. Exporter

What: A tool that exposes metrics in a Prometheus-readable format.
Example:

node_exporter for Linux server metrics
mysqld_exporter for MySQL metrics
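
The "Prometheus-readable format" is a plain-text format with `# HELP` and `# TYPE` comment lines followed by one line per sample. The sketch below renders it in Python. The `render_metric` function is a hypothetical helper, and it assumes label values that need no escaping (no quotes, backslashes, or newlines).

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples: list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

text = render_metric(
    "http_requests_total",
    "counter",
    "Total HTTP requests.",
    [({"method": "GET", "status": "200"}, 1027)],
)
```

A real exporter serves text like this over HTTP, usually at the `/metrics` path, so that Prometheus can scrape it.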

✅ 10. Recording Rule

What: Saves the result of a query as a new metric.
Why: Reduces query load and speeds up dashboards.

✅ 11. Alerting Rule

What: A rule that defines when an alert should fire.
Example:
Alert when CPU usage > 90% for 5 minutes.
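
The "for 5 minutes" part means the condition must stay true for that long before the alert fires. Until then the alert is pending. A minimal Python sketch of that behaviour follows; the `AlertRule` class is a hypothetical illustration, not Prometheus code.

```python
class AlertRule:
    """Tracks inactive -> pending -> firing for one alert condition."""

    def __init__(self, threshold, for_seconds):
        self.threshold = threshold
        self.for_seconds = for_seconds
        self.active_since = None  # time the condition first became true

    def evaluate(self, value, now):
        if value <= self.threshold:
            self.active_since = None  # condition cleared, start over
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"

# CPU usage > 90% for 5 minutes (300 seconds), evaluated at the given times.
rule = AlertRule(threshold=90, for_seconds=300)
readings = [(0, 95), (60, 96), (300, 97), (360, 50), (420, 99)]
states = [rule.evaluate(value, now) for now, value in readings]
```

Here the alert is pending at 0 and 60 seconds, fires at 300 seconds, becomes inactive when usage drops to 50%, and goes back to pending on the next spike.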

✅ 12. Alertmanager

What: Manages alerts – sends them via Email, Slack, etc.
Also handles:

Grouping
Silencing
Routing
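
Grouping and silencing can be pictured with a small Python sketch. Both functions below are hypothetical helpers, and they assume exact label matching only, which is simpler than what Alertmanager itself supports.

```python
from collections import defaultdict

def group_alerts(alerts, group_by):
    """Group alert label dicts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)

def is_silenced(alert, silence_matchers):
    """True when every matcher label equals the alert's label value."""
    return all(alert.get(k) == v for k, v in silence_matchers.items())

alerts = [
    {"alertname": "HighCPU", "instance": "10.1.2.3:9100"},
    {"alertname": "HighCPU", "instance": "10.1.2.4:9100"},
    {"alertname": "DiskFull", "instance": "10.1.2.3:9100"},
]
# Both HighCPU alerts end up in one group, so they can go out as one notification.
groups = group_alerts(alerts, ["alertname"])
```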

✅ 13. Pushgateway

What: Accepts metrics pushed by short-lived jobs and holds them so Prometheus can scrape them.
Why needed: Those jobs finish before Prometheus can scrape them.
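
A batch job pushes its metrics with a plain HTTP request to the Pushgateway path `/metrics/job/<job_name>`. The Python sketch below only builds that request and does not send it. The gateway address, job name, and metric name are made-up examples, and `build_push_request` is a hypothetical helper that assumes the job name needs no URL encoding.

```python
import urllib.request

def build_push_request(gateway_url, job, body):
    """Build (but do not send) a PUT request that replaces the metrics of one job group."""
    url = f"{gateway_url.rstrip('/')}/metrics/job/{job}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )

request = build_push_request(
    "http://pushgateway.example.internal:9091",  # hypothetical address
    "nightly_backup",                            # hypothetical job name
    "backup_last_success_timestamp_seconds 1712200000\n",
)
# urllib.request.urlopen(request) would send it to a running Pushgateway.
```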

✅ 14. Retention

What: How long Prometheus stores data.
Default: 15 days (can be customized)

✅ 15. TSDB (Time Series DB)

What: Internal database where Prometheus stores its data.
Supports: Fast reads, writes, and compression.

✅ 16. Service Discovery

What: Automatically detects targets (like Kubernetes pods, EC2).
Benefit: No need to manually add each new server.

✅ 17. Histogram

What: Metric type that counts values in buckets.
Used for: Request duration, response sizes.
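
The buckets of a histogram are cumulative: each bucket counts every observation less than or equal to its upper bound, and a final `+Inf` bucket counts everything. The Python sketch below shows this idea. The `Histogram` class is a simplified illustration, not a client library.

```python
import math

class Histogram:
    """Cumulative-bucket histogram, a simplified sketch."""

    def __init__(self, upper_bounds):
        # Always include +Inf so every observation lands in some bucket.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Buckets are cumulative: each one counts observations <= its bound.
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.bucket_counts[i] += 1

# Request durations in seconds, with buckets at 0.1s, 0.5s and 1s.
hist = Histogram([0.1, 0.5, 1.0])
for duration in [0.05, 0.3, 0.3, 0.7, 2.0]:
    hist.observe(duration)
```

After these five observations the bucket counts are 1, 3, 4 and 5, the last one being the `+Inf` bucket that always equals the total count.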

✅ 18. Summary

What: Like a histogram, but the application calculates percentiles itself (e.g., 95th percentile) and exposes them directly.
Used for: Latency measurement.
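
To show what a percentile means here, the Python sketch below calculates a nearest-rank quantile over a fixed list of latencies. This is only an illustration: `quantile` is a hypothetical helper, and real client libraries typically use streaming approximations over a sliding time window instead of sorting every observation.

```python
import math

def quantile(observations, q):
    """Nearest-rank quantile of a list of observations, with 0 < q <= 1."""
    if not observations:
        return None
    ordered = sorted(observations)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Twenty request latencies in milliseconds: 10, 20, ..., 200.
latencies_ms = list(range(10, 210, 10))
median = quantile(latencies_ms, 0.5)
p95 = quantile(latencies_ms, 0.95)
```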

✅ 19. Gauge

What: Metric that goes up and down.
Examples: Memory usage, temperature.

✅ 20. Counter

What: Only increases (it resets to zero when the process restarts).
Examples: Total HTTP requests, total errors.
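
The difference between a gauge and a counter is easy to see in code. The two Python classes below are a minimal sketch with hypothetical names, not a real client library.

```python
class Counter:
    """A value that only goes up while the process is running."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """A value that can go up and down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


http_requests_total = Counter()
http_requests_total.inc()
http_requests_total.inc(4)

memory_usage_bytes = Gauge()
memory_usage_bytes.set(512.0)
memory_usage_bytes.dec(128.0)
```

Because a counter never decreases on its own, a drop in its value tells Prometheus that the process restarted.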

✅ 21. Label Set

What: All labels assigned to a time series.
Helps to: Identify and group metrics.

✅ 22. Expression Browser

What: Built-in UI in Prometheus to test and run queries.
Use: For debugging or checking real-time metrics.

Shashikant shah

Saturday, 4 April 2026

Introduction to Prometheus
