VictoriaMetrics部署及vmalert集成钉钉告警

1、部署VictoriaMetrics

复制代码
cd /usr/local
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.65.0/victoria-metrics-amd64-v1.65.0.tar.gz
mkdir victoria-metrics && tar -xvzf victoria-metrics-amd64-v1.65.0.tar.gz && \
mv victoria-metrics-prod victoria-metrics/victoria-metrics && cd victoria-metrics
nohup ./victoria-metrics -retentionPeriod=30d -storageDataPath=data &

2、配置Prometheus

复制代码
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
        datacenter: "victoria-1"
# 远程写入victoria
remote_write:
   - url: "http://127.0.0.1:8428/api/v1/write"
     queue_config:
         max_samples_per_send: 10000
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
   - "/usr/local/prometheus/rules/rule_node_down.yml"
   - "/usr/local/prometheus/rules/rule_disk_over.yml"
   - "/usr/local/prometheus/rules/rule_cpu_over.yml"
   - "/usr/local/prometheus/rules/rule_memory_over.yml"

  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label  to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
#  用于配置victoria
  - job_name: 'victoria'
    static_configs:
    - targets: ['47.105.38.224:8480']
    - targets: ['47.105.38.224:8481']
    - targets: ['47.105.38.224:8482']
    # metrics_path defaults to '/metrics'
    #     # scheme defaults to 'http'.
  - job_name: 'consul-prometheus'
   # metrics_path: "/v1/agent/metrics"
    scrape_interval: 60s
    scrape_timeout: 10s
    scheme: http
    params:
      format: ['prometheus']
    #static_configs:
    #    - targets:
    #       - 47.105.38.224:8500
    consul_sd_configs:
      - server: '47.105.38.224:8500'
        services: []
    relabel_configs:
      - source_labels: [__metrics_path__]
        separator: ;
        regex: /metrics
        target_label: __metrics_path__
        replacement: /actuator/prometheus
        action: replace
      - source_labels: ['__meta_consul_tags']
        regex: '^.*,metrics=true,.*$'
        action: keep
  - job_name: "node20_exporter"
    static_configs:
    - targets: ["localhost:9100"]

  - job_name: "node21_exporter"
    static_configs:
    - targets: ["172.16.17.21:9100"]    #监控主机

  - job_name: "node22_exporter"
    static_configs:
    - targets: ["172.16.17.22:9100"]

  - job_name: "node23_exporter"
    static_configs:
    - targets: ["172.16.17.23:9100"]    #监控主机


  - job_name: "alertmanager"
    static_configs:
    - targets: ["localhost:9093"]

3、配置Grafana数据源

4、构建vmalert

从源代码构建vmalert:

复制代码
git clone https://github.com/VictoriaMetrics/VictoriaMetrics
cd VictoriaMetrics
make vmalert

构建二进制文件将放置在VictoriaMetrics/bin文件夹中。

5、添加alert.rules

复制代码
vim alert.rules

#rule示例
groups:
    - name: test-rule
      rules:
      - alert: 主机状态
        expr: up == 0
        for: 2m
        labels:
          status: warning
        annotations:
          summary: "{{$labels.instance}}:服务器关闭"
          description: "{{$labels.instance}}:服务器关闭"

6、修改钉钉prometheus-webhook-dingtalk配置文件

复制代码
vim /usr/local/prometheus-webhook-dingtalk/config.example.yml

## Request timeout
# timeout: 5s

## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true

## Customizable templates path
templates:
  - '/usr/local/alertmanager/template/default.tmpl'

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
#default_message:
#  title: '{{ template "legacy.title" . }}'
#  text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=XXXXXXXX
    # secret for signature
    secret: SEC000000000000000000000
#  webhook2:
#    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
#  webhook_legacy:
 #   url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
     # title: '{{ template "legacy.title" . }}'
      text: '{{ template "wechat.default.message" . }}'
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']

7、修改alertmanager配置文件

复制代码
vim /usr/local/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m # 处理超时时间,默认为5min

templates:    # 指定邮件模板的路径,可以使用相对路径,template/*.tmpl的方式
  - '/usr/local/alertmanager/template/default.tmpl'
# 定义路由树信息
route:
  group_by: [alertname]  # 报警分组依据
  receiver: ops_notify   # 设置默认接收人
  group_wait: 30s        # 最初即第一次等待多久时间发送一组警报的通知
  group_interval: 60s    # 在发送新警报前的等待时间
  repeat_interval: 1h    # 重复发送告警时间。默认1h
  routes:

  - receiver: ops_notify  # 基础告警通知人
    group_wait: 10s
    match_re:
      alertname: 实例存活告警|磁盘使用率告警   # 匹配告警规则中的名称发送

  - receiver: info_notify  # 消息告警通知人
    group_wait: 10s
    match_re:
      alertname: 内存使用率告警|CPU使用率告警|目录大小告警

# 定义基础告警接收者
receivers:
- name: ops_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/webhook1/send
    send_resolved: true  # 警报被解决之后是否通知
#    message: '{{ template "wechat.default.message" . }}'

# 定义消息告警接收者
- name: info_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/webhook1/send
    send_resolved: true
 #   message: '{{ template "wechat.default.message" . }}'

# 一个inhibition规则是在与另一组匹配器匹配的警报存在的条件下,使匹配一组匹配器的#警报失效的规则。两个警报必须具有一组相同的标签。
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

8、启动vmalert

复制代码
./bin/vmalert -rule=alert.rules \
  -datasource.url=http://localhost:8428 \
  -notifier.url=http://localhost:9093 &

9、查看钉钉告警

相关推荐
XIAOHEZIcode9 小时前
Linux系统鼠标偏移常见原因以及修复方案
linux·运维·游戏
用户0328472220701 天前
如何搭建本地yum源(上)
运维
大树884 天前
金刚石散热越强,管路越先见顶
大数据·运维·服务器·人工智能·ai
摇滚侠4 天前
Linux CentOS7 rpm 安装 MySQL 5.7
linux·运维·mysql
霸道流氓气质4 天前
领域驱动设计(DDD)在 Spring Boot 微服务中的实践指南
运维·spring boot·微服务
Inhand陈工4 天前
基于台达PLC与映翰通IG502的智慧水产养殖精准投喂与远程运维解决方案
运维·人工智能·物联网·阿里云·信息与通信
酣大智4 天前
ARP代理--工作原理
运维·网络·arp·arp代理
shushangyun_4 天前
2026年快消品B2B系统推荐:支持终端门店订货、促销政策自动化的工具?
java·运维·网络·数据库·人工智能·spring·自动化
施努卡机器视觉4 天前
SNK施努卡侧滑门锁上滑轮总成自动化装配线,从零件到组件,全流程精密制造方案
运维·自动化·制造
AC赳赳老秦4 天前
用 OpenClaw 搭建服务器故障应急响应系统,自动处理 80% 常见运维故障
android·运维·服务器·python·rxjava·deepseek·openclaw