I started updating dashboards in Grafana and ran into two interesting things.

First: what exactly does CloudWatch show for the network in the NetworkIn/Out (Bytes) graphs, how should that data be interpreted, and how does the CloudWatch data correlate with the data from node_exporter itself?

Second: why must node_exporter run in host network mode?

Let's start with what CloudWatch actually shows: launch a test EC2 instance, run node_exporter in Docker on it, connect it to Prometheus, load the network, and compare the graphs from node_exporter and CloudWatch.

`node_exporter` metrics and AWS CloudWatch

Launch an instance (a t2.small here) and install Docker and Docker Compose:

root@ip-172-31-17-58:/home/ubuntu# curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
root@ip-172-31-17-58:/home/ubuntu# echo \
> "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
>   $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
root@ip-172-31-17-58:/home/ubuntu# apt-get update && apt-get -y install docker-ce docker-ce-cli containerd.io
root@ip-172-31-17-58:/home/ubuntu# curl -s -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
root@ip-172-31-17-58:/home/ubuntu# chmod +x /usr/local/bin/docker-compose

Running `node_exporter`

Prepare a Docker Compose file for node_exporter:

---
version: '3.8'

services:
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

Start it:

root@ip-172-31-17-58:/home/ubuntu# docker-compose -f node-exporter-compose.yaml up
Pulling node_exporter (quay.io/prometheus/node-exporter:latest)…
latest: Pulling from prometheus/node-exporter
…

Check that data is available:

root@ip-172-31-17-58:/home/ubuntu# curl -s localhost:9100/metrics | head -5
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
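
The full /metrics output is long, so it can help to narrow it down to a single metric. The sketch below is a hypothetical helper (the name metric_lines is made up here and is not part of node_exporter) that drops the comment lines and keeps only the samples whose name starts with a given prefix. It runs on sample lines based on the output above.

```shell
#!/bin/sh
# Hypothetical helper: keep only the sample lines for one metric name prefix.
# Reads Prometheus text exposition format on stdin.
metric_lines() {
    # Drop "# HELP" / "# TYPE" comments, then match the metric name at line start.
    grep -v '^#' | grep "^$1"
}

# Sample input based on the node_exporter output shown above.
sample='# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0'

printf '%s\n' "$sample" | metric_lines go_gc_duration_seconds
```

On the test instance the same helper could be fed from the exporter directly, for example: curl -s localhost:9100/metrics | metric_lines node_network_receive_bytes_total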

Connecting Prometheus and Grafana

Add it to Prometheus; I already have an instance running on our Dev monitoring:

...
  - job_name: 'node-exporter'
    metrics_path: '/metrics'
    static_configs:
      - targets:
        - '18.117.88.151:9100'   # test node
...
    metric_relabel_configs:

        # test node
      - source_labels: [instance]
        regex: '18.117.88.151:9100'
        target_label: host
        replacement: 'test-node-exporter'
...

Restart it and check the new target:

root@monitoring-dev:/home/admin# systemctl restart prometheus

Add graphs in Grafana using these queries:

rate(node_network_receive_bytes_total{host="test-node-exporter", device="eth0"}[5m])
rate(node_network_transmit_bytes_total{host="test-node-exporter", device="eth0"}[5m])

rate(node_network_receive_packets_total{host="test-node-exporter", device="eth0"}[5m])
rate(node_network_transmit_packets_total{host="test-node-exporter", device="eth0"}[5m])
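
As a rough mental model, rate() over a counter is the increase of that counter divided by the elapsed seconds. The sketch below is a simplification, not how Prometheus implements it: the real rate() uses all samples in the window and extrapolates to the window boundaries. The function name and the numbers are illustrative assumptions.

```shell
#!/bin/sh
# Simplified per-second rate between two samples of a counter.
# Arguments: first value, last value, seconds between the two samples.
# Assumption: a drop in value means the counter restarted from zero.
counter_rate() {
    awk -v first="$1" -v last="$2" -v secs="$3" 'BEGIN {
        increase = (last >= first) ? last - first : last
        printf "%.2f\n", increase / secs
    }'
}

# Illustrative numbers, not measurements from the test instance:
# the counter grows from 1000000 to 4000000 bytes over 300 seconds.
counter_rate 1000000 4000000 300
```

With a [5m] window, the result is therefore an average over the last five minutes, which is why short spikes look flatter on the graph than in the raw iperf output.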

Check the graph:

Testing the network with `iperf`

Install iperf on the test machine and on the monitoring server:

root@monitoring-dev:/home/admin# apt -y install iperf

On the test machine, run iperf in server mode (-s); it will receive the traffic:

root@ip-172-31-17-58:/home/ubuntu# iperf -s -w 2m
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------

On the monitoring server, run it in client mode to send traffic to the test machine for 1800 seconds:

root@monitoring-dev:/home/admin# iperf -c 18.117.88.151 -w 2 -i 1s -t 1800
WARNING: TCP window size set to 2 bytes. A small window size
will give poor performance. See the Iperf documentation.
------------------------------------------------------------
Client connecting to 18.117.88.151, TCP port 5001
TCP window size: 4.50 KByte (WARNING: requested 2.00 Byte)
------------------------------------------------------------
[  3] local 10.0.0.8 port 50174 connected with 18.117.88.151 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  49.8 MBytes   417 Mbits/sec
[  3]  1.0- 2.0 sec  50.9 MBytes   427 Mbits/sec
[  3]  2.0- 3.0 sec  51.6 MBytes   433 Mbits/sec
[  3]  3.0- 4.0 sec  51.9 MBytes   435 Mbits/sec
[  3]  4.0- 5.0 sec  51.8 MBytes   434 Mbits/sec
[  3]  5.0- 6.0 sec  52.5 MBytes   440 Mbits/sec
[  3]  6.0- 7.0 sec  48.5 MBytes   407 Mbits/sec
[  3]  7.0- 8.0 sec  46.4 MBytes   389 Mbits/sec
...

There was a spike at first, then the rate stabilized at about 15 megabytes per second:

And the iperf client on the monitoring host reports:

[  3] 1718.0-1719.0 sec  14.6 MBytes   123 Mbits/sec
[  3] 1719.0-1720.0 sec  14.5 MBytes   122 Mbits/sec
[  3] 1720.0-1721.0 sec  14.6 MBytes   123 Mbits/sec
[  3] 1721.0-1722.0 sec  14.5 MBytes   122 Mbits/sec
[  3] 1722.0-1723.0 sec  14.6 MBytes   123 Mbits/sec
[  3] 1723.0-1724.0 sec  14.5 MBytes   122 Mbits/sec
[  3] 1724.0-1725.0 sec  14.5 MBytes   122 Mbits/sec

Convert bits to bytes and get the same 15 MB/s:

echo 122/8 | bc
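
Note that bc does integer division by default, so 122/8 is printed as 15 rather than 15.25. Judging by the numbers above, iperf also mixes units: its Mbits/sec column is decimal (10^6 bits), while its MBytes column is 1024-based, which is why 122 Mbits/sec sits next to 14.5 MBytes. A small sketch of both conversions, with a helper name made up for this example:

```shell
#!/bin/sh
# Convert a rate in Mbit/s (10^6 bits per second) to bytes-based units.
# Prints decimal MB/s first, then MiB/s (1024 * 1024 bytes).
mbits_to_bytes_per_sec() {
    awk -v mbits="$1" 'BEGIN {
        bytes = mbits * 1000000 / 8
        printf "%.2f MB/s, %.2f MiB/s\n", bytes / 1000000, bytes / 1048576
    }'
}

# 122 Mbit/s is the steady-state value reported by the iperf client.
mbits_to_bytes_per_sec 122
```

Either way the result is close to the ~15 MB/s seen on the node_exporter graph.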

CloudWatch vs `node_exporter`

Now look at CloudWatch:

We have:

4,862,000,000 bytes (Statistic: Sum)
transferred over 5 minutes (Period: 5 Minutes)

To convert this to megabytes per second:

divide 4,862,000,000 by 300, the number of seconds in 5 minutes
then divide the result by 1024 twice: bytes to kilobytes, then to megabytes

To get megabits per second, multiply the result by 8.

Calculate:

4862000000/300/1024/1024

That gives the same ~15 MByte/sec (15.46 before rounding down), or roughly 120 Mbit/sec.
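
The conversion steps above can be sketched as a small shell function. The function name is an assumption made for this example; the arithmetic follows the steps in the text (divide by the period, divide by 1024 twice, multiply by 8) and keeps the decimals that integer arithmetic would drop.

```shell
#!/bin/sh
# Convert a CloudWatch Sum (bytes per period) into per-second rates.
# Arguments: sum in bytes, period in seconds.
cw_sum_to_rate() {
    awk -v sum="$1" -v period="$2" 'BEGIN {
        bps = sum / period          # bytes per second
        mib = bps / 1024 / 1024     # the two divisions by 1024 from the text
        printf "%.0f B/s\n", bps
        printf "%.2f MByte/s\n", mib
        printf "%.2f Mbit/s\n", mib * 8
    }'
}

# Values from the CloudWatch graph: Sum = 4862000000 bytes, Period = 300 s.
cw_sum_to_rate 4862000000 300
```

The same function works for other periods, for example 60 for one-minute datapoints when detailed monitoring is enabled.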

`node_exporter` and Docker Host Mode

The second interesting thing: why must node_exporter run in host network mode?

To check, restart node_exporter with network_mode: host removed:

---
version: '3.8'

services:

  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

Repeat the iperf test and see... nothing:

72 bytes per second, although iperf reports the same result of around 120 Mbit/sec.

`node_exporter` and `netstat`

First, let's look at the node_exporter documentation: how exactly does it collect network data?

netstat Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. Linux

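It reads the contents of /proc/net/netstat. (The per-interface node_network_* counters used in the graphs above come from the netdev collector; those counters are scoped to a network namespace as well, so the same reasoning applies.)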
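
/proc/net/netstat stores each section as two lines: a row of column names followed by a row of values in the same order. The sketch below pairs them up and prints one counter by name. The helper name is made up, and the header row is an assumption: it is what recent Linux kernels typically print and is not shown in this article. The value row is the host IpExt line from the output later in the article.

```shell
#!/bin/sh
# Print one named counter from /proc/net/netstat-style input on stdin.
# Arguments: section (e.g. IpExt), field name (e.g. InOctets).
netstat_field() {
    awk -v section="$1:" -v field="$2" '
        # The first line of a section holds the column names.
        $1 == section && !have_names {
            for (i = 2; i <= NF; i++) name[i] = $i
            have_names = 1
            next
        }
        # The second line holds the values, in the same order.
        $1 == section && have_names {
            for (i = 2; i <= NF; i++) if (name[i] == field) print $i
            have_names = 0
        }
    '
}

# ASSUMPTION: typical IpExt header row of a recent kernel (not from the article).
# The value row is copied from the host output shown later in the article.
sample='IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 0 4 0 0 72718075935 510071965 0 160 0 0 0 49856242 0 22 0 0'

printf '%s\n' "$sample" | netstat_field IpExt InOctets
```

On a real host the input would come from the file itself: netstat_field IpExt InOctets < /proc/net/netstat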

And how does Docker host mode differ from bridge mode? From the Docker documentation:

container’s network stack is not isolated from the Docker host (the container shares the host’s networking namespace)

Let's verify this: run two node_exporter instances in parallel, one in host mode and the other in bridge mode:

---
version: '3.8'

services:

  node_exporter_1:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter_host
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

  node_exporter_2:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter_bridge
    command:
      - '--path.rootfs=/host'
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

Find the Container ID:

root@ip-172-31-17-58:/home/ubuntu# docker ps
CONTAINER ID   IMAGE                                     COMMAND                  CREATED          STATUS         PORTS      NAMES
47b7fd130812   quay.io/prometheus/node-exporter:latest   "/bin/node_exporter …"   9 seconds ago    Up 8 seconds   9100/tcp   node_exporter_bridge
daff8458e7bc   quay.io/prometheus/node-exporter:latest   "/bin/node_exporter …"   54 seconds ago   Up 8 seconds              node_exporter_host

Using CID 47b7fd130812 (the node_exporter container in bridge mode), find the PID it runs under on the host:

root@ip-172-31-17-58:/home/ubuntu# docker inspect -f '{{.State.Pid}}' 47b7fd130812

4561

Using nsenter, enter that process's network namespace and check the contents of /proc/net/netstat:

root@ip-172-31-17-58:/home/ubuntu# nsenter --net=/proc/4561/ns/net cat /proc/net/netstat
...
TcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
IpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Empty.

Repeat the same for the container in host network mode. Find its PID:

root@ip-172-31-17-58:/home/ubuntu# docker inspect -f '{{.State.Pid}}' daff8458e7bc

4505

And check the data in its namespace:

root@ip-172-31-17-58:/home/ubuntu# nsenter --net=/proc/4505/ns/net cat /proc/net/netstat
...
TcpExt: 0 0 0 8 23 0 0 0 0 0 100 0 0 0 0 95 1 9 0 0 17105774 3547 3110 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 36 1 0 0 0 3758 0 9 0 0 0 2 3 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 62 217266 0 0 1 1 0 0 0 0 0 0 0 0 0 2261 0 0 81 17 11692 2 32 0 0 0 0 0 0 0 0 0 0 0 0 11823 0 126847 0 0 0 0 21 0 0 0
...
IpExt: 0 0 0 4 0 0 72718074409 510047067 0 160 0 0 0 49856224 0 22 0 0

Repeat directly on the host machine:

root@ip-172-31-17-58:/home/ubuntu# cat /proc/net/netstat
...
TcpExt: 0 0 0 8 23 0 0 0 0 0 100 0 0 0 0 95 1 9 0 0 17105774 3555 3116 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 36 1 0 0 0 3758 0 9 0 0 0 2 3 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 62 217266 0 0 1 1 0 0 0 0 0 0 0 0 0 2269 0 0 81 17 11717 2 32 0 0 0 0 0 0 0 0 0 0 0 0 11848 0 126847 0 0 0 0 21 0 0 0
...
IpExt: 0 0 0 4 0 0 72718075935 510071965 0 160 0 0 0 49856242 0 22 0 0

Everything matches: the container in host mode uses the host's data.
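
A quicker way to reach the same conclusion is to compare namespace identifiers instead of counter values: two processes share a network namespace when their /proc/<pid>/ns/net links point to the same target. The sketch below uses a hypothetical helper name and, so that it runs anywhere on Linux, only compares the current shell with itself.

```shell
#!/bin/sh
# Report whether two processes share a network namespace by comparing
# the namespace identifiers behind /proc/<pid>/ns/net.
# Returns 0 if shared, 1 if different, 2 if a link cannot be read.
same_netns() {
    ns_a=$(readlink "/proc/$1/ns/net") || return 2
    ns_b=$(readlink "/proc/$2/ns/net") || return 2
    [ "$ns_a" = "$ns_b" ]
}

# Self-check: a process always shares a namespace with itself.
if same_netns "$$" "$$"; then
    echo "same network namespace"
else
    echo "different network namespaces"
fi
```

On the test instance, run as root, same_netns 1 4505 (the host-mode container against the host's PID 1) would be expected to succeed, and same_netns 1 4561 (the bridge-mode container) to fail.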

Prometheus: monitoring the network with node_exporter, CloudWatch network metrics, and Docker --net=host

https://github.com/midnight47/
