Monitoring Servers and Services with the Prometheus Stack and Sending Alerts

Posted By : Avinash Singh | 30-Nov-2021



Let's start with an introduction to the Prometheus stack, which comprises Prometheus, Grafana, Alertmanager, cAdvisor, and the Node Exporter.

Prometheus => Prometheus is an open-source monitoring solution backed by the Linux Foundation. It collects metrics from registered targets, stores them in its time-series database, and provides the PromQL query language for querying the collected metrics. Targets and rules are set in the Prometheus configuration files; when a rule matches, Prometheus sends the alert to Alertmanager, which publishes it. Prometheus also integrates easily with Grafana to visualize metrics graphically.

Alertmanager => Alertmanager is part of the Prometheus open-source distribution. It handles alerts sent by the Prometheus server, and can group them and route them to the correct receiver, such as email, Slack, or PagerDuty.

cAdvisor => An open-source product by Google. It monitors container health and resource metrics and can be easily integrated with Prometheus.

Node Exporter => A metrics collector that needs to be set up on every node that is to be monitored. It collects system metrics such as disk usage and memory information and exposes them on the '/metrics' endpoint, from which Prometheus pulls them.

Grafana => An open-source visualization and analytics tool. Grafana lets you query, visualize, and alert on data received from data sources like Prometheus, and offers dashboards that can be easily imported to present the data as charts.

Now it is time for a demo to see the whole setup working.

Note:- For demonstration purposes, I am also monitoring a Django service, a PostgreSQL database, and an OptaPlanner service. You can skip these services to avoid confusion; for learning purposes, simply focus on the node exporter service.

Step 1:- Create 2 servers, or use 2 machines in your local setup, such that both can connect to each other.

Configuration       Value
Operating System    Amazon Linux 2
Instance Type       t3.small
Network             Public Network
Storage             16 GB EBS Storage

Step 2:- Allow these ports in the security group or in your firewall settings

Type               Protocol   Port Range   Description
Custom TCP Rule    TCP        3000         Grafana
Custom TCP Rule    TCP        9100         Node Exporter
Custom TCP Rule    TCP        9090         Prometheus
Custom TCP Rule    TCP        9093         Alertmanager
Custom TCP Rule    TCP        8080         cAdvisor

Step 3:- Install Docker and docker-compose on the Amazon Linux 2 server

sudo yum update -y

sudo amazon-linux-extras install docker -y

sudo service docker start

sudo usermod -a -G docker ec2-user

sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

sudo chmod +x /usr/local/bin/docker-compose

sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

Step 4:- Create Directory for Prometheus and modify file permissions

sudo mkdir -p /opt/docker/monitoring

sudo chown ec2-user:ec2-user /opt/docker/monitoring -R

Step 5:- Now we will create the docker-compose service definitions one by one, and then combine them into a single file

  • Prometheus

prometheus:
  image: prom/prometheus
  container_name: prometheus
  ports:
    - '9090:9090'
  command:
    - '--config.file=/etc/prometheus/prometheus.yml' # this file contains the target information: which endpoints to monitor
  volumes:
    - './prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro'
    - './prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml' # this file contains the rules checked against the received metrics
  networks:
    - monitoring

We will store the target information (which service to monitor, at which endpoint, and at what interval) in prometheus.yml

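global:
  scrape_interval: 10s
  scrape_timeout: 5s
  evaluation_interval: 15s
# load the alerting rules mounted into the container; without this, alert_rules.yml is never evaluated
rule_files:
  - /etc/prometheus/alert_rules.yml
# where firing alerts are sent; 'alertmanager' is the compose service name on the shared network
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
scrape_configs:
  - job_name: 'prometheus-docker-service'
    honor_timestamps: true
    static_configs:
      - targets: ['localhost:9090'] # prometheus monitoring itself
  - job_name: 'prometheus-monitoring-server'
    static_configs:
      - targets: ['node-exporter:9100'] # monitoring node exporter of prometheus node
  - job_name: app-optaplanner-container
    metrics_path: /actuator/prometheus
    scheme: http
    static_configs:
      - targets:
          - 10.12.1.203:8888 # here our target is the optaplanner java service, exposing its metrics on port 8888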
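One constraint on the global settings above is worth checking before starting the stack: Prometheus rejects a configuration whose scrape_timeout is greater than its scrape_interval. Below is a minimal Python sketch of that check. The helper names to_seconds and check_scrape_settings are hypothetical, and the parser is a simplification that only understands single-unit durations such as 10s or 2m.

```python
import re

# Seconds per unit for simple, single-unit durations such as "10s" or "2m".
UNIT_SECONDS = {'ms': 0.001, 's': 1, 'm': 60, 'h': 3600, 'd': 86400}


def to_seconds(duration):
    """Convert a simple duration string like '10s' or '2m' to seconds."""
    match = re.fullmatch(r'(\d+)(ms|s|m|h|d)', duration)
    if match is None:
        raise ValueError(f'unsupported duration: {duration!r}')
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def check_scrape_settings(scrape_interval, scrape_timeout):
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    if to_seconds(scrape_timeout) > to_seconds(scrape_interval):
        problems.append('scrape_timeout must not be greater than scrape_interval')
    return problems


# Values taken from the global section of prometheus.yml above.
print(check_scrape_settings('10s', '5s'))
```

With the values used in this article the list of problems is empty; swapping the two values would report the timeout problem.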

Prometheus regularly checks whether all services are running as expected, based on predefined conditions. These conditions are defined in alert_rules.yml

groups:
  - name: alert.rules
    rules:
      - alert: InstanceDown
        # Condition for alerting
        expr: up == 0
        for: 2m
        annotations:
          title: 'Instance {{ $labels.instance }} down'
          description: '{{ $labels.instance }} of job name {{ $labels.job }} has been down for more than 2 minutes.'
        labels:
          severity: 'critical'
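The effect of "for: 2m" in the rule above can be illustrated with a small simulation. This is a simplified model, not Prometheus code: it assumes one rule evaluation every 15 seconds (the evaluation_interval from prometheus.yml) and treats the alert as pending from the first evaluation where up is 0, and as firing once it has stayed 0 for the full 2 minutes. The function name alert_states and the sample values are made up for illustration.

```python
def alert_states(up_values, evaluation_interval, for_seconds):
    """Return the alert state ('inactive', 'pending', 'firing') at each evaluation.

    up_values holds the value of the `up` metric seen at each evaluation,
    spaced evaluation_interval seconds apart.
    """
    states = []
    active_since = None  # time at which `up == 0` first became true
    for index, up in enumerate(up_values):
        now = index * evaluation_interval
        if up != 0:
            # Target is reachable again: the pending/firing alert is cleared.
            active_since = None
            states.append('inactive')
            continue
        if active_since is None:
            active_since = now
        held_long_enough = now - active_since >= for_seconds
        states.append('firing' if held_long_enough else 'pending')
    return states


# Hypothetical history: a short blip that recovers, then a real outage.
UP_VALUES = [1, 0, 0, 1] + [0] * 9
states = alert_states(UP_VALUES, evaluation_interval=15, for_seconds=120)
for index, state in enumerate(states):
    print(f"t={index * 15:>3}s up={UP_VALUES[index]} -> {state}")
```

The short blip never leaves the pending state, which is the point of the "for" clause: only an outage that lasts the whole window reaches Alertmanager.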

  • alertmanager

alertmanager:
  image: prom/alertmanager
  ports:
    - '9093:9093'
  container_name: alertmanager
  restart: always
  volumes:
    - './alertmanager/:/etc/alertmanager/'
  command:
    - '--config.file=/etc/alertmanager/config.yml'
  networks:
    - monitoring

If any service goes down or cannot respond to Prometheus, an alert needs to be sent. The config.yml file contains all the information needed to send alerts by email

global:
  resolve_timeout: 2m
route:
  receiver: 'email-notifications'
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: '[email protected],[email protected]'
        from: '[email protected]'
        smarthost: 'smtp.gmail.com:587'
        auth_username: '[email protected]'
        auth_identity: '[email protected]'
        auth_password: 'enter your password'
        send_resolved: true
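To try the email route without actually taking a server down, a test alert can be pushed to Alertmanager through its v2 HTTP API (POST /api/v2/alerts). The Python sketch below builds such a payload. The function names build_test_alert and post_alerts are hypothetical, the label values simply mirror the InstanceDown rule, and the instance name and Alertmanager URL are placeholders to adapt.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(instance, now=None):
    """Build a one-element alert list shaped like the InstanceDown rule."""
    now = now or datetime.now(timezone.utc)
    return [{
        'labels': {
            'alertname': 'InstanceDown',
            'instance': instance,
            'severity': 'critical',
        },
        'annotations': {
            'title': f'Instance {instance} down',
            'description': 'Manual test alert, not a real outage.',
        },
        'startsAt': now.isoformat(),
    }]


def post_alerts(alertmanager_url, alerts, timeout=5):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        f"{alertmanager_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Only print the payload here; call post_alerts(...) once Alertmanager is up.
print(json.dumps(build_test_alert('test-instance:9100'), indent=2))
```

Once the stack is running, calling post_alerts('http://ip_address_of_server:9093', build_test_alert('test-instance:9100')) should result in an email if the SMTP settings above are correct.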

  • cadvisor

cadvisor:
  image: google/cadvisor
  container_name: cadvisor
  ports:
    - '8080:8080'
  volumes:
    - '/:/rootfs:ro'
    - '/var/run:/var/run:rw'
    - '/sys:/sys:ro'
    - '/var/lib/docker/:/var/lib/docker:ro'
  networks:
    - monitoring

  • Node Exporter

node-exporter:
  image: prom/node-exporter
  container_name: node-exporter
  volumes:
    - '/proc:/host/proc:ro'
    - '/sys:/host/sys:ro'
    - '/:/rootfs:ro'
  command:
    - '--path.procfs=/host/proc'
    - '--path.sysfs=/host/sys'
    - '--collector.filesystem.ignored-mount-points'
    - '^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)'
  ports:
    - '9100:9100'
  networks:
    - monitoring
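Once the node-exporter container is running, its /metrics endpoint on port 9100 returns plain text with one sample per line, which is what Prometheus scrapes. The Python sketch below parses such a payload and derives memory usage from it. It is a simplified reader rather than a full parser of the exposition format (for example, it does not handle label values that contain a closing brace), the function names are hypothetical, and the numbers in EXAMPLE are invented for illustration.

```python
import re

# Matches sample lines such as: metric_name{label="value"} 123.4
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Return {(metric_name, labels_text): value} for each sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and the HELP / TYPE comment lines
        match = SAMPLE_LINE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[(name, labels or '')] = float(value)
    return samples


def memory_used_percent(samples):
    """Share of memory in use, from the MemTotal and MemAvailable gauges."""
    total = samples[('node_memory_MemTotal_bytes', '')]
    available = samples[('node_memory_MemAvailable_bytes', '')]
    return 100.0 * (total - available) / total


# Invented payload in the shape node exporter serves at /metrics.
EXAMPLE = """\
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 2.0e+09
node_memory_MemAvailable_bytes 5.0e+08
node_filesystem_avail_bytes{device="/dev/xvda1",mountpoint="/"} 1.2e+10
"""

samples = parse_metrics(EXAMPLE)
print(f"memory used: {memory_used_percent(samples):.1f}%")
```

In practice Prometheus does this collection for us; the sketch is only meant to show what the exporter hands over on each scrape.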

  • grafana

grafana:
  image: grafana/grafana
  depends_on:
    - prometheus
  ports:
    - '3000:3000'
  volumes:
    - 'grafana_data:/var/lib/grafana'
    - ./grafana:/etc/grafana/provisioning/datasources
  container_name: grafana
  environment:
    - [email protected]
    - GF_SECURITY_ADMIN_PASSWORD=erpsolutions.oodles.io
  networks:
    - monitoring

Now we need to put all the services into a single compose file and bring them up

docker-compose.yml

version: '3'
services:
  prometheus:
    # ... take code from above prometheus block
  cadvisor:
    # ... take code from above cadvisor block
  alertmanager:
    # ... take code from above alertmanager block
  node-exporter:
    # ... take code from above node-exporter block
  grafana:
    # ... take code from above grafana block
networks:
  monitoring:
    external: true # an external network must already exist; create it once with: docker network create monitoring
volumes:
  grafana_data: {}

Our next task is to log in to every server that needs to be monitored and run the node-exporter container on it.

Finally, on the Prometheus server, we bring the docker-compose setup up, which also pulls all the Docker images defined in it

docker-compose -f /opt/docker/monitoring/docker-compose.yml up -d

Prometheus Service HomePage

http://ip_address_of_server:9090

Targets page: Prometheus main page -> Status -> Targets

Rules page: Prometheus main page -> Status -> Rules
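The same target status shown on the Targets page can also be read from Prometheus's HTTP API by running the PromQL expression up against /api/v1/query. The Python sketch below lists the targets that are currently down. The function names are hypothetical, and SAMPLE is an invented response that only imitates the shape of the API's instant-vector result, using job names from prometheus.yml.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({'query': promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def fetch(base_url, promql, timeout=5):
    """Run the query against a live Prometheus server and decode the JSON."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as response:
        return json.load(response)


def down_targets(payload):
    """Return (job, instance) pairs whose `up` value is 0."""
    if payload.get('status') != 'success':
        raise RuntimeError(f'query failed: {payload}')
    down = []
    for series in payload['data']['result']:
        _timestamp, value = series['value']
        if float(value) == 0:
            metric = series['metric']
            down.append((metric.get('job'), metric.get('instance')))
    return down


# Invented response for the expression `up`, for illustration only.
SAMPLE = {
    'status': 'success',
    'data': {
        'resultType': 'vector',
        'result': [
            {'metric': {'__name__': 'up', 'job': 'prometheus-docker-service',
                        'instance': 'localhost:9090'},
             'value': [1638273600, '1']},
            {'metric': {'__name__': 'up', 'job': 'prometheus-monitoring-server',
                        'instance': 'node-exporter:9100'},
             'value': [1638273600, '0']},
        ],
    },
}

print(query_url('http://ip_address_of_server:9090', 'up'))
print(down_targets(SAMPLE))
```

Against the running server, down_targets(fetch('http://ip_address_of_server:9090', 'up')) would return the live list instead of the sample.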

Grafana Service Homepage

http://ip_address_of_server:3000

To import a new dashboard, go to this page:

Grafana Home -> Dashboard -> Manage -> Import -> enter the dashboard ID, or paste the dashboard JSON data

Here are some beautiful Dashboards

PostgreSQL Database Metrics

Node Exporter Dashboard

Django Dashboard

If any of our servers that has the node exporter configured and connected to Prometheus goes down, we will receive an email alert

Conclusion => We have set up the Prometheus stack and can now monitor the status of our services and servers, and we will get alerts if any service becomes unavailable.
