Promtail (push)
Promtail helps monitor applications by shipping container logs to Loki or Grafana Cloud. It primarily discovers targets, attaches labels to log streams from both log files and the systemd journal, and ships them to Loki. Promtail's service discovery is based on Prometheus' service discovery mechanism.
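Under the hood, Promtail delivers each batch of labelled log lines to Loki's push API (the same /loki/api/v1/push endpoint that appears in the client configuration later in this post). As a rough sketch of that shipping step, a single log line can be pushed by hand with curl; this assumes a Loki instance already listening on 127.0.0.1:3100, and the job="manual-test" label is purely illustrative:
curl -s -X POST "http://127.0.0.1:3100/loki/api/v1/push" \
  -H "Content-Type: application/json" \
  -d '{"streams":[{"stream":{"job":"manual-test"},"values":[["'"$(date +%s%N)"'","hello from curl"]]}]}'
The timestamp is the current time in nanoseconds (date +%s%N), which is the format the push API expects.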
Loki
As its creators put it, Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. Loki uses the same service discovery mechanism as Prometheus and attaches labels to the log stream instead of building a full-text index. As a result, logs received from Promtail carry the same set of labels as the application metrics, which makes switching between logs and metrics straightforward and avoids the cost of full-index logging.
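The practical consequence is that only labels are indexed, while the log text itself is filtered at query time. Two queries against the varlogs job configured later in this post illustrate the difference:
{job="varlogs"}
{job="varlogs"} |= "error"
The first selects a stream purely by its indexed labels; the second adds a line filter that is evaluated when the query runs rather than looked up in a full-text index.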
Grafana
Grafana is an open-source platform for monitoring and observability. It operates on time-series data coming from sources like Prometheus and Loki, and it lets you query, visualize, and alert on metrics regardless of where they are stored. It helps you create, explore, and share dashboards and encourages a data-driven culture.
Promtail --> Loki (LogQL) --> Grafana
Install Loki
# cd /usr/local/bin
# curl -fSL -o loki.gz "https://github.com/grafana/loki/releases/download/v1.6.1/loki-linux-amd64.zip"
# gunzip loki.gz
# chmod a+x loki
# mkdir -p /etc/loki
# cd /etc/loki
# vim config-loki.yml
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1   # private IP of the Loki server
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2018-04-15
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
# useradd --system loki
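Optionally, pre-create the storage directories referenced in config-loki.yml and hand them to the loki user, so the service never has to create them itself:
# mkdir -p /tmp/loki/index /tmp/loki/chunks
# chown -R loki:loki /tmp/loki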
# vim /etc/systemd/system/loki.service
[Unit]
Description=Loki service
After=network.target
[Service]
Type=simple
User=loki
ExecStart=/usr/local/bin/loki -config.file /etc/loki/config-loki.yml
[Install]
WantedBy=multi-user.target
# systemctl daemon-reload
# systemctl start loki
# systemctl enable loki
# systemctl status loki
curl "127.0.0.1:3100/metrics"
Install Promtail (worker node)
# cd /usr/local/bin
# curl -fSL -o promtail.gz "https://github.com/grafana/loki/releases/download/v1.6.1/promtail-linux-amd64.zip"
# gunzip promtail.gz
# chmod a+x promtail
# mkdir -p /etc/promtail
# cd /etc/promtail
# vim config-promtail.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki_private_IP:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - node.example.com
        labels:
          job: varlogs
          __path__: /var/log/*log
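Promtail can also read the systemd journal, as mentioned at the top of this post. The entry below is only a sketch of what such a job could look like when appended under scrape_configs; it assumes the promtail binary was built with journal support, and the job and label names (journal, systemd-journal, unit) are purely illustrative:
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'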
# vim /etc/systemd/system/promtail.service
[Unit]
Description=Promtail service
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/bin/promtail -config.file /etc/promtail/config-promtail.yml
[Install]
WantedBy=multi-user.target
# systemctl daemon-reload
# systemctl start promtail.service
# systemctl enable promtail.service
# systemctl status promtail.service
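Promtail itself serves HTTP on the port configured above (9080); its metrics and targets pages give a quick indication that the log files are actually being tailed:
# curl "127.0.0.1:9080/metrics" | grep promtail_
# curl "127.0.0.1:9080/targets"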
Configure Loki Data Source
1. Log in to the Grafana web interface and select 'Explore'. You will be prompted to create a data source.
2. Click on 'Add data source', then select Loki from the available options.
3. Set the URL to the address of the Loki server, for example http://loki_private_IP:3100, then click 'Save & Test'.
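Alternatively, if Grafana is managed through configuration files rather than the UI, the data source can be provisioned from a file. A minimal sketch, assuming Grafana's default provisioning directory and the loki_private_IP placeholder used earlier:
# vim /etc/grafana/provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki_private_IP:3100
Restart Grafana (systemctl restart grafana-server) so the provisioned data source is picked up.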
Visualize Logs on Grafana with Loki
Click on 'Explore', then select Loki as the data source. Type a term such as 'root' into the search to find matching lines in the logs. Alternatively, you can write a stream selector into the query field:
{job="default/prometheus"}
Here are some example streams from your logs:
{job="varlogs"}
Combine stream selectors
{app="cassandra",namespace="prod"}
Filtering for search terms.
{app="cassandra"} |~ "(duration|latency)s*(=|is|of)s*[d.]+"
{app="cassandra"} |= "exact match"
{app="cassandra"} != "do not match"
Count over time
count_over_time({job="mysql"}[5m])
Rate
rate(({job="mysql"} |= "error" != "timeout")[10s])
This query gets the per-second rate of all non-timeout errors within the last ten seconds for the MySQL job.
Aggregate, count, and group
sum(count_over_time({job="mysql"}[5m])) by (level)
Get the count of logs during the last five minutes, grouping by level.
Some queries for log counts
count_over_time({filename="/var/log/syslog"} != "ERROR" [5m])
count_over_time({job="varlogs"} != "ERROR" [5m])
count_over_time({job="varlogs"}[2h])
Create a Loki Dashboard
Create dashboard → select Loki as the data source → add query
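For a panel that graphs log volume rather than showing raw lines, a metric-style LogQL query works well; for example, using the varlogs job configured above:
sum(rate({job="varlogs"}[1m]))
This plots the per-second rate of log lines for the varlogs job, averaged over one-minute windows.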
{job="default/prometheus"}
Here are some example streams from your logs:
{job="varlogs"}
{app="cassandra",namespace="prod"}
{app="cassandra"} |~ "(duration|latency)s*(=|is|of)s*[d.]+"
{app="cassandra"} |= "exact match"
{app="cassandra"} != "do not match"
count_over_time({job="mysql"}[5m])
rate(({job="mysql"} |= "error" != "timeout")[10s])
This query gets the per-second rate of all non-timeout errors within the last ten seconds for the MySQL job.
sum(count_over_time({job="mysql"}[5m])) by (level)
Get the count of logs during the last five minutes, grouping by level.
count_over_time({filename="/var/log/syslog"} !="ERROR"[5m])
count_over_time({job="varlogs"} !="ERROR"[5m])
count_over_time({job="varlogs"} [2h])
Create dashboard à loki select à add query