Shashikant shah

Tuesday 12 January 2021

Prometheus and Grafana

 

 

Two servers

1. Server node – install Prometheus, Grafana, Alertmanager and Pushgateway.

2. Worker node – install node_exporter, nginx_exporter, nginxlog exporter and blackbox exporter.











Server Node :-

exporter --> prometheus(promQL) --> grafana

Prometheus :-
Prometheus is a monitoring tool designed for recording real-time metrics in a time-series database. It is an open-source software project, written in Go. The Prometheus metrics are collected using HTTP pulls, allowing for higher performance and scalability.
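To see exactly what Prometheus pulls, any /metrics endpoint can be fetched by hand. A quick illustration (assuming Prometheus itself is running on its default port, as set up below):
# curl -s http://localhost:9090/metrics | head -n 5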
 
Other tools which make Prometheus a complete monitoring tool are:

Exporters :- libraries and agents that expose metrics from third-party systems in the Prometheus format.
 
1. Node Exporter :- the 'official' exporter that collects system-level metrics from Linux nodes, such as CPU, disk and memory statistics.
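As a rough idea of what Node Exporter exposes, a couple of its standard metrics can be grepped out of its endpoint (illustrative only; 9100 is the exporter's default port):
# curl -s http://localhost:9100/metrics | grep -E 'node_cpu_seconds_total|node_memory_MemFree_bytes' | head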
 
 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Pushgateway :- we will push some custom metrics to Pushgateway and configure Prometheus to scrape those metrics from it (useful for short-lived jobs that cannot be scraped directly).
 

 








Alertmanager :- we usually want to alert based on certain metric conditions, and that is where Alertmanager fits in. We can set up targets and rules, and once a rule for our targets fires, we can send alerts to destinations such as Slack, email, etc.
 

Blackbox exporter :- used to monitor websites and endpoints with Prometheus. The Blackbox exporter allows probing endpoints over HTTP, HTTPS, ICMP, TCP and DNS.
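Once the exporter is running (default port 9115), a single probe can be triggered by hand; the target URL here is only an illustration:
# curl 'http://localhost:9115/probe?target=https://prometheus.io&module=http_2xx'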


 
 
 
 
 
 
 
Metrics:
i) Targets (Linux, Windows, applications) → CPU status, memory/disk usage, request count → each such unit is called a metric, and metrics are saved in the Prometheus DB.
ii) Metrics format – human-readable, text-based (see the sample exposition below).
HELP :- description of what the metric is.
 
TYPE :- there are 4 metric types.
1) counter :- how many times X happened (the value only ever increases, it never decreases).
              i) number of requests served.
              ii) tasks completed or errors.
2) gauge :- what is the current value of X now? (the value can both increase and decrease, e.g. CPU load now, disk space now.)
3) summary :- how long something took or how big something was.
              i) count shows the number of times the event was observed.
              ii) sum shows the total of the observed values for that event.
4) histogram :- how long or how big, with observations counted in configurable buckets.
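A small sample of the exposition format (the metric names and values below are made up for illustration):
# HELP http_requests_total The total number of HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
# HELP node_memory_MemFree_bytes Free memory in bytes.
# TYPE node_memory_MemFree_bytes gauge
node_memory_MemFree_bytes 520372224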
 
 
PromQL :- the Prometheus query language, which lets you select, filter and aggregate multi-dimensional time-series data.
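A few illustrative PromQL queries (the instance label assumes the node exporter target configured later in this guide):
node_memory_MemFree_bytes{instance="172.31.39.204:9501"}
rate(node_cpu_seconds_total{mode!="idle"}[5m])
100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}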
 
 
Grafana is a tool commonly used to visualize the data collected by Prometheus, for monitoring and analysis. It is used to create dashboards with panels representing specific metrics over a set period of time.
1.Create Prometheus system group
sudo groupadd --system prometheus
sudo useradd -s /sbin/nologin --system -g prometheus prometheus
 
2.Prometheus needs a directory to store its data.
sudo mkdir /var/lib/prometheus
for i in rules rules.d files_sd; do sudo mkdir -p /etc/prometheus/${i}; done
sudo apt update
sudo apt -y install wget curl vim
 
3.Download Prometheus
mkdir -p /tmp/prometheus && cd /tmp/prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.23.0/prometheus-2.23.0.linux-amd64.tar.gz
tar xvf prometheus*.tar.gz
cd prometheus*/
sudo mv prometheus promtool /usr/local/bin/
 
prometheus --version
promtool --version
 
sudo mv prometheus.yml /etc/prometheus/prometheus.yml
sudo mv consoles/ console_libraries/ /etc/prometheus/
 
 
4.Configure Prometheus
sudo vim /etc/prometheus/prometheus.yml
- job_name: 'prometheus'
 
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
 
    static_configs:
    - targets: ['localhost:9090']
 
 
How to verify the Prometheus configuration file :-
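A minimal check with promtool, assuming the config lives at the path used in this guide:
# promtool check config /etc/prometheus/prometheus.yml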
  
 
 
5.Create a Prometheus systemd Service unit file
sudo vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target
 
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090
# (bind to the server's private IP instead of 0.0.0.0 to restrict access)

SyslogIdentifier=prometheus
Restart=always
 
[Install]
WantedBy=multi-user.target
 
OR
##########################
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.enable-admin-api \
  --web.enable-lifecycle

SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target
######################
 
6.Change directory permissions.
for i in rules rules.d files_sd; do sudo chown -R prometheus:prometheus /etc/prometheus/${i}; done
for i in rules rules.d files_sd; do sudo chmod -R 775 /etc/prometheus/${i}; done
sudo chown -R prometheus:prometheus /var/lib/prometheus/
 
7.Reload systemd daemon and start the service:
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
sudo systemctl status prometheus
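A quick sanity check that Prometheus came up on its default port:
# curl -s http://localhost:9090/-/healthy
# ss -tlnp | grep 9090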
 
OR (put Prometheus behind an nginx reverse proxy with basic authentication) :-

#htpasswd -c /etc/nginx/.htpasswd admin
 
#vim /etc/nginx/sites-enabled/prometheus.conf
server {
    listen 80 default_server;
 
    location / {
            auth_basic "Prometheus Auth";
            auth_basic_user_file /etc/nginx/.htpasswd;
            proxy_pass http://localhost:9090;
        }
}
 
http://13.127.100.171/
 
Grafana side :-
1. Add the data source URL.
2. Enable Basic auth.
3. Add the username and password.
 
http://13.127.100.171:9090/












Note :-

To reload Prometheus from the client side (this works only when Prometheus is started with --web.enable-lifecycle):
#curl -X POST http://localhost:9090/-/reload

Install Grafana on Ubuntu 20.04

wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -

echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install grafana
 

sudo systemctl start grafana-server

sudo systemctl enable grafana-server

sudo systemctl status grafana-server

Default logins are:

Username: admin
Password: admin


Grafana Package details:

Installs binary to /usr/sbin/grafana-server

Installs Init.d script to /etc/init.d/grafana-server

Creates default file (environment vars) to /etc/default/grafana-server

Installs configuration file to /etc/grafana/grafana.ini

Installs a systemd service (if systemd is available) named grafana-server.service

The default configuration sets the log file at /var/log/grafana/grafana.log

The default configuration specifies a sqlite3 db at /var/lib/grafana/grafana.db

Installs HTML/JS/CSS and other Grafana files at /usr/share/grafana

Install a plugin with the Grafana CLI:
# grafana-cli plugins install grafana-image-renderer
 
http://13.127.100.171:3000/login










Go to “data source” – add data source – select Prometheus








Add Prometheus URL http://13.127.100.171:9090









Worker Node :-
Node exporter
 
# wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
# tar -xf node_exporter-0.17.0.linux-amd64.tar.gz
# cp node_exporter-0.17.0.linux-amd64/node_exporter /usr/local/bin
# chown root:root /usr/local/bin/node_exporter
# rm -rf node_exporter-0.17.0.linux-amd64*
 
The node exporter's default port is 9100; here we change it to 9501.
 
$ vim /etc/systemd/system/node_exporter.service
 
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
 
[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9501
 
[Install]
WantedBy=multi-user.target
 
$ systemctl daemon-reload
$ systemctl start node_exporter
$ systemctl enable node_exporter
$ systemctl status node_exporter
 
http://clientIP:9501/metrics












Server node :-
Add the node exporter target in prometheus.yml
 
# vim /etc/prometheus/prometheus.yml
  - job_name: 'prometheus'
 
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
 
    static_configs:
    - targets: ['localhost:9090']
 
  - job_name: 'node_example_com'
    scrape_interval: 5s
    static_configs:
    - targets: ['172.31.39.204:9501']
 
# systemctl restart prometheus
# systemctl status prometheus
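One way to confirm the new target is being scraped: run the query below in the Prometheus web UI (or check Status → Targets); a value of 1 means the scrape is succeeding. The job name matches the scrape config above.
up{job="node_example_com"}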












Grafana :-






Nginx connection 
Enable NGINX Status Page
# nginx -V 2>&1 | grep -o with-http_stub_status_module
 
server {
 
  listen 80 default_server;
  # remove the escape char if you are going to use this config
  server_name \_;
 
  root /var/www/html;
  index index.html index.htm index.nginx-debian.html;
 
  location /nginx_status {
        stub_status;
       # allow 127.0.0.1;       # only allow requests from localhost
       # deny all;              # deny all other hosts
  }
 
  location / {
    try_files $uri $uri/ =404;
  }
 
}
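After reloading nginx, the status page can be checked by hand; the numbers below only illustrate the stub_status output format:
# curl http://127.0.0.1/nginx_status
Active connections: 2
server accepts handled requests
 16 16 28
Reading: 0 Writing: 1 Waiting: 1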
 
#cd /tmp
 
#wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.7.0/nginx-prometheus-exporter-0.7.0-linux-amd64.tar.gz
#tar -xf nginx-prometheus-exporter-0.7.0-linux-amd64.tar.gz
#mv nginx-prometheus-exporter /usr/local/bin
#useradd -r nginx_exporter
# Create Systemd Service File
 
#vim /etc/systemd/system/nginx_prometheus_exporter.service
[Unit]
Description=NGINX Prometheus Exporter
After=network.target
 
[Service]
Type=simple
User=nginx_exporter
Group=nginx_exporter
ExecStart=/usr/local/bin/nginx-prometheus-exporter -web.listen-address=":9113" -nginx.scrape-uri http://127.0.0.1/nginx_status
 
SyslogIdentifier=nginx_prometheus_exporter
Restart=always
 
[Install]
WantedBy=multi-user.target
 
#systemctl daemon-reload
#service nginx_prometheus_exporter start
#service nginx_prometheus_exporter status
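A quick check that the exporter is reading the stub_status page; the metric names are the standard ones exposed by nginx-prometheus-exporter, the values shown are illustrative:
# curl -s http://localhost:9113/metrics | grep '^nginx_'
nginx_up 1
nginx_connections_active 2
nginx_http_requests_total 28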











Prometheus side :-
 
# vim /etc/prometheus/prometheus.yml
  - job_name: 'nginx'
    scrape_interval: 7s
    static_configs:
    - targets: ['172.31.39.204:9113']
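Typical queries to graph from this job in the Grafana panels that follow (a sketch; adjust the rate window to taste):
nginx_connections_active
rate(nginx_http_requests_total[1m])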






















Add Query and save

 











Change Visualization :-

























Import a ready-made Grafana dashboard for the nginx service :-

Dashboard ID :- 12708

https://grafana.com/grafana/dashboards/12708












When nginx is stopped on the worker node :-
















Monitoring nginx status-code counts (200, 300, 404) from different log files.

1)/var/log/nginx/access_shashi.log

2) /var/log/nginx/access.log

Worker node :-

# vim /etc/nginx/nginx.conf










# logging config
          log_format custom   '$remote_addr - $remote_user [$time_local] '
                              '"$request" $status $body_bytes_sent '
                              '"$http_referer" "$http_user_agent" "$http_x_forwarded_for"';
 
# rm -rf /etc/nginx/sites-enabled/default
 
# cat /etc/nginx/conf.d/myapp.conf
 
server {
 
  listen 80 default_server;
  # remove the escape char if you are going to use this config
  server_name \_;
 
  root /var/www/html;
  index index.html index.htm index.nginx-debian.html;
 
  location / {
    try_files $uri $uri/ =404;
  }
 
}
 
# cat /etc/nginx/conf.d/shashi.conf
server {
 
  listen 81 default_server;
  # remove the escape char if you are going to use this config
  server_name \_;
 
  root /var/www/html;
  index index.html index.htm index.nginx-debian.html;
 
   access_log /var/log/nginx/access_shashi.log custom;
   error_log /var/log/nginx/error_shashi.log;
  location / {
    try_files $uri $uri/ =404;
  }
 
}
 
# systemctl status nginx
# systemctl restart nginx
 
Download Nginx Log Exporter
 
# wget https://github.com/martin-helmich/prometheus-nginxlog-exporter/releases/download/v1.4.0/prometheus-nginxlog-exporter
 
# chmod +x prometheus-nginxlog-exporter
# mv prometheus-nginxlog-exporter /usr/bin/prometheus-nginxlog-exporter
 
# mkdir /etc/prometheus
 
# vim /etc/prometheus/nginxlog_exporter.yml
 
listen:
  port: 4040
  address: "0.0.0.0"
 
consul:
  enable: false
 
namespaces:
  - name: shashi_log
    format: "$remote_addr - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\" \"$http_x_forwarded_for\""
    source:
      files:
        - /var/log/nginx/access_shashi.log
 
    labels:
      service: "shashi_log"
      environment: "production"
      hostname: "shashi_log.example.com"
    histogram_buckets: [.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]
 
  - name: myapp_log
    format: "$remote_addr - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\" \"$http_x_forwarded_for\""
    source:
      files:
        - /var/log/nginx/access.log
 
    labels:
      service: "myapp"
      environment: "production"
      hostname: "myapp.example.com"
    histogram_buckets: [.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]

 

















# vim /etc/systemd/system/nginxlog_exporter.service

[Unit]

Description=Prometheus Log Exporter

Wants=network-online.target

After=network-online.target

 

[Service]

User=root

Group=root

Type=simple

ExecStart=/usr/bin/prometheus-nginxlog-exporter -config-file /etc/prometheus/nginxlog_exporter.yml

[Install]

WantedBy=multi-user.target

# systemctl daemon-reload

# systemctl enable nginxlog_exporter

# systemctl restart nginxlog_exporter

# systemctl status nginxlog_exporter

curl http://localhost:4040/metrics
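To look only at the per-status counters, grep for the <namespace>_http_response_count_total metrics (the namespace names come from the config above):
# curl -s http://localhost:4040/metrics | grep http_response_count_total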



















Server side :-

# vim /etc/prometheus/prometheus.yml

  - job_name: 'log_nginx'

    scrape_interval: 10s

    static_configs:

    - targets: ['172.31.39.204:4040']

# systemctl restart prometheus

# systemctl status prometheus

eg :- <namespace>_http_response_count_total

Execute :- shashi_log_http_response_count_total

Execute :- myapp_log_http_response_count_total
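For graphing, a query like the sketch below groups the request rate by status code (this assumes the exporter attaches a status label to the counter, which prometheus-nginxlog-exporter does):
sum by (status) (rate(shashi_log_http_response_count_total[5m]))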



























Grafana :-









Configuring Grafana and the Prometheus Alertmanager

Custom rules

1. How much memory is free (in percent) on a node.

1. Create the rule file.
# vim /etc/prometheus/rules/prometheus_rules.yml
groups:
  - name: custom_rules
    rules:
      - record: node_memory_MemFree_percent
        expr: 100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes
 
2. Check the rule file.
# promtool check rules prometheus_rules.yml









3. Add prometheus_rules.yml to /etc/prometheus/prometheus.yml

# vim /etc/prometheus/prometheus.yml

rule_files:

  - rules/prometheus_rules.yml

# systemctl  daemon-reload

# systemctl restart prometheus

# systemctl status prometheus

4. Go to Prometheus URL

# select Status → Configuration








# select Status → Rules







# execute query – node_memory_MemFree_percent










Example 2 :-
 
Free disk space in percent
 
# vim /etc/prometheus/rules/prometheus_rules.yml
 
      - record: node_filesystem_free_percent
        expr: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}










# promtool check rules prometheus_rules.yml
# systemctl restart prometheus
# systemctl status prometheus










Alert rules :-
1. Rule for an instance being down.
2. Rule for free disk space at 10 percent or less.
 
# vim /etc/prometheus/rules/prometheus_alert_rules.yml
groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance [{{ $labels.instance }}] down"
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 1 minute."
 
      - alert: DiskSpaceFree10Percent
        expr: node_filesystem_free_percent <= 10
        labels:
          severity: warning
        annotations:
          summary: "Instance [{{ $labels.instance }}] has 10% or less Free disk space"
          description: "[{{ $labels.instance }}] has only {{ $value }}% or less free."
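Once these rules are loaded, currently firing alerts can also be inspected with a query against the built-in ALERTS metric:
ALERTS{alertstate="firing"}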














# promtool check rules prometheus_alert_rules.yml

# vim /etc/prometheus/prometheus.yml

rule_files:

  - rules/prometheus_rules.yml

  - rules/prometheus_alert_rules.yml

# systemctl  daemon-reload

# systemctl restart prometheus

# systemctl status prometheus
















Select Status → Rules























Alert Manager Setup

 

# wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz

# tar xvf alertmanager-0.21.0.linux-amd64.tar.gz

# cd alertmanager-0.21.0.linux-amd64

# cp -rvf alertmanager /usr/local/bin/

# cp -rvf amtool /usr/local/bin/

# cp -rvf alertmanager.yml /etc/prometheus/

 

#  vim /etc/systemd/system/alertmanager.service

[Unit]

Description=Prometheus Alert Manager Service

After=network.target

 

[Service]

Type=simple

ExecStart=/usr/local/bin/alertmanager \

        --config.file=/etc/prometheus/alertmanager.yml

[Install]

WantedBy=multi-user.target


Change alertmanager.yml

global:
  resolve_timeout: 5m
 
route:
  group_by: ['alertname']
  receiver: 'email-me'
receivers:
- name: 'email-me'
  email_configs:
  - send_resolved: true
    to: devopstest11@gmail.com
    from: devopstest11@gmail.com
    smarthost: smtp.gmail.com:587
    auth_username: "devopstest11@gmail.com"
    auth_identity: "devopstest11@gmail.com"
    auth_password: "pass@123"
 
# amtool check-config /etc/prometheus/alertmanager.yml
# service alertmanager start
# service alertmanager status
 
#vim /etc/prometheus/prometheus.yml
 
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - localhost:9093
# systemctl restart prometheus
# systemctl status prometheus
http://13.127.100.171:9090/status
select Status → Runtime & Build Information.






























Worker node  -
# systemctl stop node_exporter.service
 
Server node :-
Logs :
# tail -f  /var/log/syslog
 
Go to Settings → Security
NOTE :- turn ON "Less secure app access" for the Gmail account.


























Worker node  -

# systemctl start node_exporter.service

(a mail should be received saying the alert is resolved).












1. Use the Inspect option to see the data pulled from Prometheus, and rename the panel title from the JSON.
   Inspect – (Data, Stats, JSON, Query)
2. How to restore an old dashboard:
   Settings – Versions
3. Manually add metrics:
   Add panel → (panel name) Edit → Metrics











Pushgateway :-

In this section we will set up Pushgateway on a Linux machine, push some custom metrics to it, and configure Prometheus to scrape those metrics from Pushgateway.

1. Install Pushgateway.

# wget https://github.com/prometheus/pushgateway/releases/download/v0.8.0/pushgateway-0.8.0.linux-amd64.tar.gz

# tar -xvf pushgateway-0.8.0.linux-amd64.tar.gz

# cp pushgateway-0.8.0.linux-amd64/pushgateway /usr/local/bin/pushgateway

# chown root:root /usr/local/bin/pushgateway

 

# vim /etc/systemd/system/pushgateway.service

[Unit]

Description=Pushgateway

Wants=network-online.target

After=network-online.target

[Service]

User=root

Group=root

Type=simple

ExecStart=/usr/local/bin/pushgateway

[Install]

WantedBy=multi-user.target

 

# systemctl daemon-reload

# systemctl restart pushgateway

# systemctl status pushgateway

 

# vim /etc/prometheus/prometheus.yml

  - job_name: 'pushgateway'

    honor_labels: true

    static_configs:

      - targets: ['localhost:9091']

# systemctl restart prometheus

Run the command below from the client side :-

# echo "cpu_utilization 20.25" | curl --data-binary @- http://localhost:9091/metrics/job/my_custom_metrics/instance/client_host/cpu/load

Take a look at the metrics endpoint of the pushgateway:

# curl -L  http://172.31.5.171:9091/metrics/  2>&1| grep "cpu_utilization"
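Several metrics can be pushed in a single request as well; the metric names below are only examples for illustration:
cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/my_custom_metrics/instance/client_host
# TYPE disk_used_percent gauge
disk_used_percent 63.5
# TYPE queue_depth gauge
queue_depth 7
EOF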




## Pushgateway URL

 





## Go to Prometheus URL


 

 

 

 

 

 

BlackBox Exporter :-

Client-side configuration of the Blackbox exporter.

# cd /opt

# wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.14.0/blackbox_exporter-0.14.0.linux-amd64.tar.gz

# tar -xvf blackbox_exporter-0.14.0.linux-amd64.tar.gz

# cp blackbox_exporter-0.14.0.linux-amd64/blackbox_exporter /usr/local/bin/blackbox_exporter

# rm -rf blackbox_exporter-0.14.0.linux-amd64*

# mkdir /etc/blackbox_exporter

# vim /etc/blackbox_exporter/blackbox.yml

modules:

  http_2xx:

    prober: http

    timeout: 5s

    http:

      valid_status_codes: []

      method: GET

#  vim /etc/systemd/system/blackbox_exporter.service

[Unit]

Description=Blackbox Exporter

Wants=network-online.target

After=network-online.target

 

[Service]

User=root

Group=root

Type=simple

ExecStart=/usr/local/bin/blackbox_exporter --config.file /etc/blackbox_exporter/blackbox.yml

[Install]

WantedBy=multi-user.target

# systemctl daemon-reload

# systemctl start blackbox_exporter

# systemctl status blackbox_exporter

# systemctl enable blackbox_exporter

Note :- on the client side nginx is listening on port 8281 but not on 8282, so one probe will succeed and the other will fail.

Prometheus server side :-

# vim /etc/prometheus/prometheus.yml

  - job_name: 'blackbox'

    metrics_path: /probe

    params:

      module: [http_2xx]

    static_configs:

      - targets:

        - http://172.31.42.127:8281

        - http://172.31.42.127:8282

    relabel_configs:

      - source_labels: [__address__]

        target_label: __param_target

      - source_labels: [__param_target]

        target_label: instance

      - target_label: __address__

        replacement: 172.31.42.127:9115

# systemctl restart prometheus

# systemctl status prometheus

# Verify the Blackbox exporter:

# http://52.66.196.119:9115/metrics
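From the Prometheus side the probe results land in the standard blackbox metrics; for example (the job label matches the scrape config above):
probe_success{job="blackbox"}
probe_http_status_code{job="blackbox"}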

 











 

# Verify the blackbox probe status from Prometheus.

 








