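I. Prometheus-Operator
1. What is Prometheus-Operator?
A Kubernetes operator that manages the Prometheus monitoring stack. It runs Prometheus and Alertmanager and configures them from ServiceMonitor, PodMonitor and PrometheusRule objects.
2. How does Prometheus-Operator manage these components?
Through CRDs. Each component is declared as a custom resource in YAML. The operator watches these resources and creates or updates the actual workloads (for example the Prometheus and Alertmanager StatefulSets) and their generated configuration to match.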
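II. CRD
1. What is a CRD?
CRD stands for CustomResourceDefinition. It extends the Kubernetes API with a new resource type (kind), so you can create objects of that kind just like built-in ones such as Pod or Service.
Custom: user-defined, not built into Kubernetes
Resource: an object type in the Kubernetes API
Definition: the declaration of that type (its name, API group, versions and schema)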
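To make the idea concrete, here is a simplified sketch of a CRD object, using the ServiceMonitor type as the example. It is not the CRD the operator actually ships (that one carries a full OpenAPI schema); x-kubernetes-preserve-unknown-fields is used only to keep the sketch short, so do not apply it over a real installation. Once a CRD like this is registered, the API server accepts objects with apiVersion: monitoring.coreos.com/v1 and kind: ServiceMonitor.

```yaml
# Simplified sketch: registers a "ServiceMonitor" kind in the monitoring.coreos.com group.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: servicemonitors.monitoring.coreos.com   # must be <plural>.<group>
spec:
  group: monitoring.coreos.com
  scope: Namespaced                # ServiceMonitor objects live inside a namespace
  names:
    kind: ServiceMonitor
    listKind: ServiceMonitorList
    plural: servicemonitors
    singular: servicemonitor
  versions:
    - name: v1
      served: true                 # this version is exposed by the API server
      storage: true                # and is the version persisted in etcd
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true   # sketch only: no field validation
```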
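2. What are the 5 core CRDs, and what does each do?
Prometheus: creates and manages Prometheus servers
ServiceMonitor: declares how to scrape the Pods behind a Service
PodMonitor: declares how to scrape Pods directly
Alertmanager: runs Alertmanager, which receives alerts from Prometheus and sends notifications (email, DingTalk, etc.)
PrometheusRule: defines the alerting (and recording) rules Prometheus evaluates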
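III. The 5 core CRDs in detail
1. Prometheus CRD
(1). What does the Prometheus CRD do?
It declares a desired Prometheus deployment: replicas, resources, which monitors and rules to load, and where to send alerts. The operator turns it into a Prometheus StatefulSet and generates the Prometheus configuration file from the selected ServiceMonitors, PodMonitors and PrometheusRules.
(2). How are Prometheus and ServiceMonitor linked?
Prometheus uses serviceMonitorSelector to match the labels of ServiceMonitor objects; the selected ServiceMonitors in turn determine which Services are scraped.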
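The selection is two-level: the Prometheus CR picks ServiceMonitors by their labels, and each ServiceMonitor picks Services by theirs. A minimal sketch of the first level, reusing the names from this note. serviceMonitorNamespaceSelector is added here to show where the ServiceMonitors may live: per the operator's API, an empty selector ({}) matches all namespaces, while leaving the field out limits discovery to the Prometheus object's own namespace.

```yaml
# Prometheus CR (fragment): which ServiceMonitors to load
spec:
  serviceMonitorSelector:
    matchLabels:
      team: dev                        # select ServiceMonitors labeled team: dev
  serviceMonitorNamespaceSelector: {}  # {} = look in all namespaces
---
# ServiceMonitor (fragment): the label that makes it selectable
metadata:
  labels:
    team: dev                          # must equal the selector above
```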
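(3). Prometheus example
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: my-prometheus
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus   # needs RBAC permissions to discover targets
  serviceMonitorSelector:          # load ServiceMonitors labeled team: dev
    matchLabels:
      team: dev
  podMonitorSelector:              # load PodMonitors labeled team: dev
    matchLabels:
      team: dev
  alerting:
    alertmanagers:                 # where to send alerts (see section 5)
      - namespace: monitoring
        name: alertmanager-main
        port: web
  resources:
    requests:
      memory: 400Mi
      cpu: 200m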
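The example sets serviceAccountName: prometheus, so that ServiceAccount has to exist and be allowed to read the objects Prometheus discovers targets from. A sketch modeled on the RBAC in the Prometheus-Operator getting-started guide, assuming everything runs in the monitoring namespace; trim or extend the rules for your cluster.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]   # target discovery
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring             # assumed: same namespace as the Prometheus CR
```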
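2. ServiceMonitor CRD
(1). What does the ServiceMonitor CRD do?
It tells Prometheus to scrape the Pods behind a Service.
(2). How does a ServiceMonitor monitor a Service?
1. The ServiceMonitor selects a Service by its labels.
2. The Service's endpoints point to its Pods, which expose metrics (usually at /metrics).
3. Prometheus, configured from the ServiceMonitor, scrapes each of those Pod endpoints.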
(3). How are a ServiceMonitor and a Service linked?
Via labels: the selector in the ServiceMonitor matches the Service's labels.
spec:
  selector:
    matchLabels:
      app: myapp
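The selector is a standard Kubernetes label selector, so besides matchLabels it also accepts matchExpressions (operators In, NotIn, Exists, DoesNotExist); when several conditions are given, all of them must hold. A sketch selecting two Services at once; myapp-canary is a hypothetical second app label.

```yaml
spec:
  selector:
    matchExpressions:
      - key: app
        operator: In
        values: ["myapp", "myapp-canary"]   # myapp-canary is hypothetical
```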
(4). Complete ServiceMonitor example
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-servicemonitor   # name of this ServiceMonitor
  namespace: monitoring        # namespace it lives in
  labels:
    team: dev                  # must match serviceMonitorSelector in the Prometheus CR
spec:
  selector:                    # which Services to monitor
    matchLabels:
      app: myapp
  namespaceSelector:           # namespaces in which to look for those Services
    matchNames:
      - default
  endpoints:                   # how to scrape
    - port: http-metrics       # name of the Service port
      interval: 30s            # scrape interval
      path: /metrics           # scrape path
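For the ServiceMonitor above to find anything, a matching Service must exist in default. Note that the ServiceMonitor matches the Service's own metadata.labels, not its spec.selector, and endpoints[].port refers to the Service port by name. A sketch; the image and port 8080 are hypothetical, and the app is assumed to serve metrics at /metrics on that port.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: default          # listed in the ServiceMonitor's namespaceSelector
  labels:
    app: myapp                # matched by the ServiceMonitor's selector
spec:
  selector:
    app: myapp                # Service -> Pods
  ports:
    - name: http-metrics      # referenced by endpoints[].port
      port: 8080
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: example.com/myapp:latest   # hypothetical image serving /metrics
          ports:
            - containerPort: 8080
```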
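3. PodMonitor CRD
(1). What does the PodMonitor CRD do?
It lets Prometheus discover Pods automatically and scrape their metrics directly, without a Service in front of them.
(2). How does Prometheus monitor a Pod?
Prometheus pulls metrics from the endpoint configured in the PodMonitor (a container port plus a path such as /metrics).
(3). How is a PodMonitor linked to Pods?
Via labels: the selector in the PodMonitor matches Pods labeled app=myapp.
spec:
  selector:
    matchLabels:
      app: myapp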
(4). PodMonitor example
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp-pod-monitor
  namespace: monitoring
  labels:
    team: dev
spec:
  # Which Pods to match
  selector:
    matchLabels:
      app: myapp
  # Namespaces in which to look for the Pods
  namespaceSelector:
    matchNames:
      - default
  # Pod metrics endpoint configuration
  podMetricsEndpoints:
    - port: metrics     # name of the container port that exposes metrics
      path: /metrics    # metrics path
      interval: 15s     # scrape interval
      scheme: http      # protocol (http/https)
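Because there is no Service in between, podMetricsEndpoints[].port refers to a named port on the Pod's container. A sketch of a Pod the PodMonitor above would pick up; the image and port number are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: default            # listed in the PodMonitor's namespaceSelector
  labels:
    app: myapp                  # matched by the PodMonitor's selector
spec:
  containers:
    - name: myapp
      image: example.com/myapp:latest   # hypothetical image serving /metrics
      ports:
        - name: metrics         # referenced by podMetricsEndpoints[].port
          containerPort: 8080   # hypothetical port
```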
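4. PrometheusRule CRD
(1). What is the PrometheusRule CRD?
It defines alerting rules, for example "alert when CPU usage exceeds 80%". It can also hold recording rules.
(2). How does a PrometheusRule get the CPU usage?
It does not fetch anything itself. The operator loads the PrometheusRule into Prometheus as a rule file, and Prometheus evaluates the rule's PromQL expression against the time series it has already scraped. Container CPU metrics such as container_cpu_usage_seconds_total come from cAdvisor, which is built into the kubelet; application metrics come from the Pods' own /metrics endpoints. The usage ratio is computed by the PromQL expression at evaluation time, not by the Pod.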
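Container CPU metrics only exist in Prometheus if something scrapes the kubelet's cAdvisor endpoint. A sketch modeled on the kubelet ServiceMonitor in kube-prometheus. It assumes a kube-system/kubelet Service labeled app.kubernetes.io/name: kubelet with a port named https-metrics (Prometheus-Operator can maintain such a Service when started with --kubelet-service=kube-system/kubelet); check the labels and port names in your cluster. insecureSkipVerify is only acceptable in a test setup.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: monitoring
  labels:
    team: dev                          # picked up by serviceMonitorSelector in my-prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet  # assumed label on the kubelet Service
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - port: https-metrics              # kubelet's secure port
      scheme: https
      path: /metrics/cadvisor          # cAdvisor metrics, e.g. container_cpu_usage_seconds_total
      interval: 30s
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true       # test setups only
```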
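(3). How does a PrometheusRule refer to the metrics Prometheus has scraped?
Through a PromQL expression, for example: rate(container_cpu_usage_seconds_total[5m]) > 0.8. This measures CPU seconds used per second, so it means "more than 0.8 CPU cores"; to get a percentage, divide by the CPU limit as in the complete example below.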
(4). Complete PrometheusRule example
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-rules
  namespace: monitoring
spec:
  groups:
    - name: myapp-alerts
      rules:
        # Alerting rule example: CPU usage above 80% of the limit for 2 minutes
        - alert: HighCpuUsage
          expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) / sum(container_spec_cpu_quota / container_spec_cpu_period) by (pod) > 0.8
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} CPU usage is high"
        # Alerting rule example: memory usage above 80% of the limit for 2 minutes
        - alert: HighMemoryUsage
          expr: sum(container_memory_usage_bytes) by (pod) / sum(container_spec_memory_limit_bytes) by (pod) > 0.8
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} memory usage is high"
        # Recording rule example: per-second request rate, aggregated per job
        - record: job:http_requests:rate5m
          expr: sum by (job) (rate(http_requests_total[5m]))
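The Prometheus example in section 1 has no ruleSelector, and Prometheus only loads PrometheusRule objects matched by that selector (in current operator versions a missing ruleSelector matches no rules, while {} matches all). A sketch of the two additions that would connect myapp-rules to my-prometheus, reusing the team: dev label. Both objects are in monitoring, so the default rule namespace (the Prometheus object's own) already covers them.

```yaml
# 1) Added to the Prometheus CR (my-prometheus):
spec:
  ruleSelector:
    matchLabels:
      team: dev        # load PrometheusRules carrying this label
---
# 2) Added to the PrometheusRule (myapp-rules):
metadata:
  labels:
    team: dev          # makes myapp-rules selectable by ruleSelector
```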
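5. Alertmanager CRD
(1). What does Alertmanager do?
1. Prometheus evaluates the alerting rules from PrometheusRule objects.
2. When a rule fires, Prometheus sends the alert to Alertmanager.
3. Alertmanager processes the alerts (grouping, deduplication, etc.).
4. Alertmanager sends the notifications to the configured platforms (e.g. DingTalk, email).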
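(2). How does Prometheus send alerts to Alertmanager?
Not through label selection, but by referencing the Alertmanager Service by namespace, name and port.
These are set in the Prometheus CR:
spec:
  alerting:
    alertmanagers:
      - namespace: monitoring
        name: alertmanager-main
        port: web
Here name is the Kubernetes Service in front of the Alertmanager Pods, and port is that Service's port (here referenced by name).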
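Since the alerting entry points at a Service, something must create alertmanager-main. A sketch, assuming the Alertmanager CR is named main: the operator then runs a StatefulSet named alertmanager-main whose Pods carry the label alertmanager: main and expose a container port named web (9093). The operator also creates a headless Service called alertmanager-operated with a web port, which could be referenced instead; the explicit Service below follows the kube-prometheus convention.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main                # operator creates StatefulSet alertmanager-main
  namespace: monitoring
spec:
  replicas: 1
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-main   # the "name" in the Prometheus CR's alerting section
  namespace: monitoring
spec:
  selector:
    alertmanager: main      # label the operator sets on the Alertmanager Pods
  ports:
    - name: web             # the "port" in the Prometheus CR's alerting section
      port: 9093
      targetPort: web
```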
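(3). How does Alertmanager process alerts?
It is configured in YAML (alertmanager.yaml):
route:
  receiver: 'team-email'                # default receiver
  group_by: ['alertname', 'severity']   # group alerts by these labels
  group_wait: 30s                       # wait before the first notification for a new group
  group_interval: 5m                    # wait before notifying about new alerts added to a group
  repeat_interval: 1h                   # re-send a still-firing group at most this often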
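Besides grouping and deduplication, Alertmanager can mute related alerts with inhibit_rules, a top-level key next to route and receivers. A sketch; the severity values and the equal labels are illustrative, and the *_matchers syntax requires Alertmanager 0.22 or later.

```yaml
# Top-level key in alertmanager.yaml, a sibling of route and receivers.
inhibit_rules:
  - source_matchers: [severity="critical"]   # while a critical alert is firing...
    target_matchers: [severity="warning"]    # ...suppress warning alerts...
    equal: ['alertname', 'pod']              # ...that share the same alertname and pod
```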
(4). How does Alertmanager send notifications to other platforms?
Through the receivers section of the same YAML:
receivers:
  - name: 'team-email'
    email_configs:
      - to: 'devops@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager'
        auth_identity: 'alertmanager'
        auth_password: 'password'
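Route and receivers belong in one alertmanager.yaml. With Prometheus-Operator, the default place for it is a Secret named alertmanager-<Alertmanager name> holding the key alertmanager.yaml (spec.configSecret can point elsewhere). A sketch for an Alertmanager CR named main that combines the route and receivers above and adds a DingTalk route. Alertmanager has no built-in DingTalk receiver, so this assumes a webhook bridge (such as prometheus-webhook-dingtalk) is deployed; its URL here is hypothetical. Keeping the SMTP password in a Secret also avoids committing it in plain text.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-main          # alertmanager-<name of the Alertmanager CR>
  namespace: monitoring
type: Opaque
stringData:
  alertmanager.yaml: |
    route:
      receiver: 'team-email'
      group_by: ['alertname', 'severity']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 1h
      routes:
        - matchers:
            - severity="critical"  # critical alerts also go to DingTalk
          receiver: 'dingtalk'
    receivers:
      - name: 'team-email'
        email_configs:
          - to: 'devops@example.com'
            from: 'alertmanager@example.com'
            smarthost: 'smtp.example.com:587'
            auth_username: 'alertmanager'
            auth_identity: 'alertmanager'
            auth_password: 'password'
      - name: 'dingtalk'
        webhook_configs:
          # Hypothetical in-cluster DingTalk webhook bridge
          - url: 'http://dingtalk-bridge.monitoring.svc:8060/dingtalk/webhook1/send'
            send_resolved: true
```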