[Kubernetes] Deploying Alertmanager on K8s

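How do you deploy the Alertmanager alerting system on Kubernetes? This article walks through it step by step: creating a namespace, deploying Alertmanager, wiring it into Prometheus, and triggering a test alert.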

1. Create a namespace

### Create a dedicated namespace for the monitoring components: monitoring

# monitoring-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
 
# Apply the manifest
kubectl apply -f monitoring-namespace.yaml

2. Deploy Alertmanager

2.1 Deploy the application

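### The container listens on port 9093; after deployment, open the Alertmanager UI in a browser through the NodePort exposed by the Service
### Decide whether the storage-volume data directory needs persistence; the emptyDir used here is discarded when the Pod is deleted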

# alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - port: 9093
    targetPort: http
  selector:
    app: alertmanager
---
# alertmanager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager:v0.28.1
        args:
        - "--config.file=/etc/alertmanager/alertmanager.yml"
        - "--storage.path=/alertmanager"
        ports:
        - containerPort: 9093
          name: http
        volumeMounts:
        - name: config-volume
          mountPath: /etc/alertmanager
        - name: storage-volume
          mountPath: /alertmanager
      volumes:
      - name: config-volume
        configMap:
          name: alertmanager-config
      - name: storage-volume
        emptyDir: {}
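
Because the Service is of type NodePort, the port to open in the browser is assigned by the cluster. A minimal sketch for looking it up follows; the function name `alertmanager_url` is hypothetical, and it assumes the first node's InternalIP is reachable from your machine, which may not hold in every environment.

```shell
# Print the URL of the Alertmanager UI exposed through the NodePort Service.
# Assumption: the first node's InternalIP is reachable from where you browse.
alertmanager_url() {
  local ns="${1:-monitoring}"
  local svc="${2:-alertmanager}"
  local node_port
  local node_ip
  # NodePort assigned to the first (and only) port of the Service
  node_port="$(kubectl -n "$ns" get svc "$svc" -o jsonpath='{.spec.ports[0].nodePort}')"
  # InternalIP of the first node in the cluster
  node_ip="$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')"
  echo "http://${node_ip}:${node_port}"
}
```

Calling `alertmanager_url` with no arguments uses the `monitoring` namespace and the `alertmanager` Service from the manifests above.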

2.2 Configure alert notifications

# alertmanager-config.yaml (apply it before the Deployment; the Pod cannot start until this ConfigMap exists)
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitoring
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    
    receivers:
    - name: 'web.hook'                   # point this at your real alert-receiving endpoint
      webhook_configs:
      - url: 'http://example.com/webhook'
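
The Deployment mounts this ConfigMap, so the order in which the manifests are applied matters. The sketch below applies them in dependency order, waits for the rollout, and then validates the mounted file with `amtool`, which ships in the `prom/alertmanager` image. The function name `deploy_alertmanager` is hypothetical, and the file names are taken from the comments in the listings (assuming the Service and the Deployment were saved as separate files).

```shell
# Apply the manifests in dependency order, then validate the mounted config.
# File names follow the comments in the listings; adjust them if yours differ.
deploy_alertmanager() {
  local ns="${1:-monitoring}"
  # ConfigMap first: the Pod cannot mount config-volume without it
  kubectl apply -f alertmanager-config.yaml &&
  kubectl apply -f alertmanager-service.yaml -f alertmanager-deployment.yaml &&
  # Block until the Deployment is available (or the timeout expires)
  kubectl -n "$ns" rollout status deployment/alertmanager --timeout=120s &&
  # Syntax-check the configuration file inside the running container
  kubectl -n "$ns" exec deploy/alertmanager -- \
    amtool check-config /etc/alertmanager/alertmanager.yml
}
```

Alertmanager does not pick up later edits to the ConfigMap on its own; one simple option is `kubectl -n monitoring rollout restart deployment/alertmanager` after changing it.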

3. Configure Prometheus

3.1 Add the alert rule configuration

# Deployment -> prometheus
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus                 # this ServiceAccount must already exist in the monitoring namespace
      containers:
      - name: prometheus
        image: prom/prometheus:v3.5.0
        args:
        - '--config.file=/etc/prometheus/prometheus.yml'
        - '--web.enable-lifecycle'
        - '--no-storage.tsdb.wal-compression'
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus
        - name: rule-volume
          mountPath: /etc/prometheus/rules
        - name: data-volume
          mountPath: /prometheus
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: data-volume
        emptyDir: {}
      - name: rule-volume                            # volume that holds the alert rules
        configMap:
          name: prometheus-rule

---
# ConfigMap - Prometheus
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval:     15s 
      evaluation_interval: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'coredns'
      metrics_path: '/metrics'
      static_configs:
      - targets: ['kube-dns.kube-system.svc.cluster.local:9153']
    rule_files:                                     # rule files (mounted from the prometheus-rule ConfigMap)
    - "/etc/prometheus/rules/rules.yml"
    alerting:
      alertmanagers:                                # Alertmanager address (here the alertmanager Service, port 9093)
      - static_configs:
        - targets:
          - "alertmanager:9093"

3.2 Configure alert rules

# prometheus-rules.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-rule
  namespace: monitoring
data:
  rules.yml: |
    ---
    groups:
    - name: "coredns"               # this rule exists only to make an alert easy to trigger by hand; adapt it to your environment
      rules:
      - alert: "coredns_status"
        annotations:
          summary: "coredns:{{ $labels.instance }} DNS request rate over the last 10 minutes > 0"
        expr: "sum(rate(coredns_dns_requests_total{job=\"coredns\"}[10m])) by (instance) > 0"
        for: "30s"

# Once this is applied, the alert rule is visible in the Prometheus UI (see the screenshot below)
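
Besides checking the UI, the rule file can be validated from the command line. The sketch below runs `promtool`, which ships in the `prom/prometheus` image, against the mounted file and then queries the `/api/v1/rules` HTTP endpoint to see what Prometheus actually loaded. The function names are hypothetical, and the default base URL assumes port 9090 has been forwarded locally.

```shell
# Validate the mounted rule file with promtool inside the Prometheus container.
check_rules() {
  local ns="${1:-monitoring}"
  kubectl -n "$ns" exec deploy/prometheus -- \
    promtool check rules /etc/prometheus/rules/rules.yml
}

# Show the rule groups Prometheus has loaded (JSON from the HTTP API).
# Assumption: Prometheus is reachable at the given base URL.
loaded_rules() {
  local base_url="${1:-http://localhost:9090}"
  curl -fsS "${base_url}/api/v1/rules"
}
```

If `check_rules` reports an error, fix `rules.yml` in the ConfigMap before reloading Prometheus, since a failed reload keeps the previous configuration.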

4. Trigger an alert

  • After you trigger the alert manually, it shows up in the Alertmanager UI
  • Screenshots: Prometheus UI (top), Alertmanager UI (bottom)
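
One way to trigger and inspect an alert from the command line is sketched below. Since the sample rule fires whenever CoreDNS serves requests, a few DNS lookups from a throwaway Pod are enough; alternatively, a synthetic alert can be posted straight to Alertmanager's v2 API to exercise the webhook route. All function names, the `busybox:1.36` image, the `manual_test` alert name, and the default base URL are assumptions for illustration.

```shell
# Generate DNS traffic so the coredns_status rule has something to fire on.
generate_dns_traffic() {
  kubectl run dns-test --rm -i --restart=Never --image=busybox:1.36 -- \
    nslookup kubernetes.default.svc.cluster.local
}

# Build a minimal Alertmanager v2 payload: a JSON array with one alert.
build_test_alert() {
  local name="${1:-manual_test}"
  printf '[{"labels":{"alertname":"%s","severity":"info"},"annotations":{"summary":"manual test alert"}}]\n' "$name"
}

# POST the synthetic alert directly to Alertmanager.
# Assumption: Alertmanager is reachable at the given base URL.
send_test_alert() {
  local base_url="${1:-http://localhost:9093}"
  build_test_alert "${2:-manual_test}" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data @- "${base_url}/api/v2/alerts"
}

# List the alerts Alertmanager currently holds.
list_alerts() {
  local base_url="${1:-http://localhost:9093}"
  curl -fsS "${base_url}/api/v2/alerts"
}
```

With the NodePort URL of the Service as the base URL, `send_test_alert` followed by `list_alerts` should show the synthetic alert, and after the 10s `group_wait` the configured webhook receives the notification.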