【Monitoring the Calico and Ingress-Nginx Components of a K8s Cluster with Prometheus Operator】

Note: this is an original article, practical and to the point, written to be clear enough to follow on the first read.


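Preface

Previously we used Prometheus Operator to monitor a highly available K8s cluster, covering etcd, kube-controller-manager, kube-scheduler, kube-proxy and other components. This article continues with the monitoring configuration for Calico and ingress-nginx. Newcomers can read the two earlier articles below to fill in any gaps.

Monitoring a highly available K8s cluster and the etcd database with Prometheus Operator: https://blog.csdn.net/m0_63756214/article/details/161484786?spm=1001.2014.3001.5501

Monitoring the K8s controller manager, scheduler, and proxy components with Prometheus Operator: https://blog.csdn.net/m0_63756214/article/details/161627526?spm=1001.2014.3001.5501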

I. Overview

1.1 Lab environment

My lab environment:

Hostname      IP               Role
k8s-master1   192.168.13.136   K8s control-plane node
k8s-master2   192.168.13.137   K8s control-plane node
k8s-master3   192.168.13.138   K8s control-plane node
k8s-node1     192.168.13.139   K8s worker node
k8s-node2     192.168.13.140   K8s worker node
NFS           192.168.13.141   NFS server, provides storage

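1.2 End-to-end metrics collection flow in K8s with Prometheus Operator

Prometheus Operator manages Prometheus in a standardized way, and ServiceMonitor resources declare the scrape rules. Pods are discovered automatically through native K8s Service/Endpoints objects, metrics are collected and stored, and Grafana displays them. Together this forms the full cloud-native monitoring pipeline.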

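[Resources written by the user]
      ↓
1. ServiceMonitor (CRD: the "scrape rulebook")
   Configures: label selector rules, metrics port, scrape interval
      ↓ (the Operator continuously watches for CRD changes)
2. Prometheus Operator controller
   ✅ Parses all ServiceMonitors automatically
   ✅ Generates the Prometheus scrape configuration automatically (with built-in kubernetes_sd_configs service discovery and relabel rules)
   ✅ Hot-reloads the Prometheus configuration; no manual restart of the Prometheus Pod is needed
      ↓ (the configuration is delivered to the Prometheus instance)
3. Prometheus Server
   Uses the configuration to call the K8s APIServer and trigger [K8s Endpoints service discovery]
      ↓ (the APIServer resolves how the cluster resources relate to each other)
4. Service (the key intermediate layer)
   · A Service binds to its backend Pods through its selector labels
   · A K8s controller automatically creates the matching Endpoints (Endpoints = list of real Pod IPs + ports)
   · A ServiceMonitor matches Service labels via spec.selector; a matching Service becomes part of the scrape job
      ↓ (the Endpoints list is resolved)
5. Endpoints = the actual scrape targets (each Pod IP:metrics port)
      ↓
6. Workload/component Pods (kube-controller-manager, etcd, node-exporter, etc.)
   Expose the /metrics endpoint
      ↓
7. Prometheus pulls metrics on a schedule and stores them in its time-series database
      ↓
8. Grafana dashboards read from the Prometheus data source and visualize the monitoring data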

Understanding this flow makes the monitoring setup in the following sections much easier to follow.
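
To make the chain concrete, the sketch below walks steps 1, 4 and 5 by hand with kubectl: it reads a ServiceMonitor's matchLabels, then lists the Services and Endpoints that carry those labels. `trace_servicemonitor` is a hypothetical helper name of my own, not part of kubectl or the Operator, and the sketch assumes the selector uses only matchLabels (no matchExpressions).

```shell
# Sketch only: trace_servicemonitor is a hypothetical helper.
# Usage: trace_servicemonitor <servicemonitor> <its-namespace> <target-namespace>
trace_servicemonitor() {
  sm="$1"; sm_ns="$2"; target_ns="$3"

  # Step 1: read the label selector declared by the ServiceMonitor.
  # The go-template prints each matchLabels entry as "key=value,".
  selector=$(kubectl get servicemonitor "$sm" -n "$sm_ns" \
    -o go-template='{{range $k, $v := .spec.selector.matchLabels}}{{$k}}={{$v}},{{end}}')
  selector=${selector%,}   # drop the trailing comma
  echo "selector: $selector"

  # Step 4: Services in the target namespace that carry those labels.
  kubectl get svc -n "$target_ns" -l "$selector" -o name

  # Step 5: Endpoints objects inherit their Service's labels, so the same
  # selector lists the real scrape targets (Pod IP:port).
  kubectl get endpoints -n "$target_ns" -l "$selector"
}
```

Once the ServiceMonitor from section 2.1 exists, it could be called as `trace_servicemonitor calico-node-metrics monitoring calico-system`. An empty Service or Endpoints list usually means a label mismatch somewhere along the chain.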

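II. Monitoring the Calico components

Calico components:

Felix: the "brain" of Calico. It runs on every node and is responsible for implementing all network policy.

Typha: an optional scaling component that optimizes communication between the nodes and the datastore, raising the cluster size limit.

kube-controllers: Calico's control-plane housekeeper, responsible for core management work such as resource cleanup and syncing with the K8s API.

Monitoring: all of the components above can be configured to expose metrics to Prometheus, giving end-to-end monitoring coverage.

2.1 Monitoring Calico Felix

Check the current state of calico-node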

## 1. List the calico-node pods
root@k8s-master1:~# kubectl get pod -n calico-system | grep calico-node
calico-node-7zpmm                          1/1     Running   1 (5h36m ago)    24h
calico-node-8mbk7                          1/1     Running   1 (5h55m ago)    23h
calico-node-gj5pz                          1/1     Running   1 (5h36m ago)    24h
calico-node-k6wpw                          1/1     Running   1 (5h36m ago)    24h
calico-node-k95vj                          1/1     Running   1 (5h36m ago)    24h

## Show the pod labels
root@k8s-master1:~# kubectl get pod calico-node-7zpmm -n calico-system --show-labels
NAME                READY   STATUS    RESTARTS        AGE   LABELS
calico-node-7zpmm   1/1     Running   1 (5h37m ago)   24h   app.kubernetes.io/name=calico-node,controller-revision-hash=5f84f7dcd7,k8s-app=calico-node,pod-template-generation=6

## 2. Check whether calico-node has a Service: it does not
root@k8s-master1:~# kubectl get svc -n calico-system
NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
calico-kube-controllers-metrics   ClusterIP   None            <none>        9094/TCP   58d
calico-typha                      ClusterIP   10.105.252.79   <none>        5473/TCP   58d

## 3. Check whether a ServiceMonitor was created automatically for calico-node: it was not
root@k8s-master1:~# kubectl get servicemonitor -n monitoring | grep calico

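calico-node has neither a Service nor a ServiceMonitor, so next we create both.

Enable the metrics listener

Before that, we need to enable Calico's metrics listener. At startup, Calico does not expose Prometheus metrics by default.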

root@k8s-master1:~# kubectl patch felixconfiguration default --type merge --patch '{"spec":{"prometheusMetricsEnabled": true}}'
root@k8s-master1:~# curl http://192.168.13.136:9091/metrics
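
A quick way to confirm the patch took effect is to read the field back and count Felix's own metrics on port 9091. The sketch below does both; `felix_metrics_check` is a hypothetical helper name, the node IP is whatever node you pass in, and it assumes Felix's metric names start with `felix_`.

```shell
# Sketch only: felix_metrics_check is a hypothetical helper.
# Usage: felix_metrics_check <node-ip>
felix_metrics_check() {
  node_ip="$1"

  # The patch above should have set this field to "true".
  enabled=$(kubectl get felixconfiguration default \
    -o jsonpath='{.spec.prometheusMetricsEnabled}')
  if [ "$enabled" != "true" ]; then
    echo "prometheusMetricsEnabled is '$enabled', expected 'true'"
    return 1
  fi

  # Count the lines that begin with "felix_" in the metrics output.
  count=$(curl -s --max-time 5 "http://$node_ip:9091/metrics" | grep -c '^felix_' || true)
  echo "felix metric samples on $node_ip: $count"
  [ "$count" -gt 0 ]
}
```

In this lab it could be run as `felix_metrics_check 192.168.13.136`. A count of zero suggests the listener is not up yet on that node.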

Create the Service

root@k8s-master1:~# cd /k8s/svc/
root@k8s-master1:/k8s/svc# vim calico.yaml 
apiVersion: v1
kind: Service
metadata:
  name: felix-metrics-svc
  namespace: calico-system
  labels:
    k8s-app: calico-node
spec:
  clusterIP: None
  selector:
    k8s-app: calico-node  # must match the pod label
  ports:
  - name: http-metrics
    port: 9091
    targetPort: 9091
root@k8s-master1:/k8s/svc# 
root@k8s-master1:/k8s/svc# kubectl apply -f calico.yaml 
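
Before moving on, it can help to check that the new headless Service actually selected the calico-node pods. Since calico-node is a DaemonSet, I would expect one endpoint per node. The sketch below compares the two counts; `endpoints_match_nodes` is a hypothetical helper name.

```shell
# Sketch only: endpoints_match_nodes is a hypothetical helper.
# Usage: endpoints_match_nodes <service> <namespace>
endpoints_match_nodes() {
  svc="$1"; ns="$2"

  # One line per Pod IP that the Service selector picked up.
  eps=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{range .subsets[*].addresses[*]}{.ip}{"\n"}{end}' | grep -c . || true)

  # Number of nodes in the cluster.
  nodes=$(kubectl get nodes -o name | grep -c . || true)

  echo "endpoints=$eps nodes=$nodes"
  [ "$eps" -eq "$nodes" ]
}
```

Here the call would be `endpoints_match_nodes felix-metrics-svc calico-system`. If the endpoint count is 0, the Service selector most likely does not match the pod label shown earlier.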

Create the ServiceMonitor

root@k8s-master1:~# cd kube-prometheus/manifests/
root@k8s-master1:~/kube-prometheus/manifests# vim calico-serviceMonitor.yaml 
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: calico-node-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: calico-node   # matches the label on the calico-node Service
  namespaceSelector:
    matchNames:
    - calico-system   # namespace where calico-node runs
  endpoints:
  - port: http-metrics
    interval: 15s
    path: /metrics
root@k8s-master1:~/kube-prometheus/manifests# kubectl apply -f calico-serviceMonitor.yaml

Grant Prometheus cross-namespace permissions

My Calico installation is not in kube-system, so Prometheus needs cross-namespace permissions before it can discover and scrape Calico's metrics.

## Check whether Prometheus is allowed to list endpoints in the calico-system namespace
root@k8s-master1:~# kubectl auth can-i list endpoints  --as=system:serviceaccount:monitoring:prometheus-k8s -n calico-system
no

## Grant Prometheus cross-namespace permissions
root@k8s-master1:~# vim kube-prometheus/manifests/prometheus-rbac.yaml
# Create a cluster-wide role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-cross-namespace
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
---
# Bind the role above to the Prometheus service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-cross-namespace-binding
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: prometheus-cross-namespace
  apiGroup: rbac.authorization.k8s.io
root@k8s-master1:~# kubectl apply -f kube-prometheus/manifests/prometheus-rbac.yaml

## Check again
root@k8s-master1:~# kubectl auth can-i list endpoints --as=system:serviceaccount:monitoring:prometheus-k8s -n calico-system
yes
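
The check above covers only `list endpoints`. Service discovery also needs get and watch, on services and pods as well, so the sketch below loops over every verb and resource granted by the ClusterRole. `check_prometheus_rbac` is a hypothetical helper name; the service account is the one used in the commands above.

```shell
# Sketch only: check_prometheus_rbac is a hypothetical helper.
# Usage: check_prometheus_rbac <namespace>
check_prometheus_rbac() {
  ns="$1"
  sa="system:serviceaccount:monitoring:prometheus-k8s"
  failed=0

  for resource in services endpoints pods; do
    for verb in get list watch; do
      # "kubectl auth can-i" prints yes/no and exits non-zero on "no".
      answer=$(kubectl auth can-i "$verb" "$resource" --as="$sa" -n "$ns" 2>/dev/null || true)
      echo "$verb $resource: $answer"
      [ "$answer" = "yes" ] || failed=1
    done
  done

  return "$failed"
}
```

Running `check_prometheus_rbac calico-system` prints nine lines, one per verb and resource pair, and returns non-zero if any of them is not `yes`.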

Open Prometheus in a browser: the Calico Felix instances are now being monitored.

Add a Grafana dashboard

https://grafana.com/grafana/dashboards/12175-calico-felix/

2.2 Monitoring kube-controllers

Check the current state of kube-controllers

## 1. List the kube-controllers pod
root@k8s-master1:~# kubectl get pod -n calico-system | grep kube-controllers
calico-kube-controllers-65c5875bc8-t8m28   1/1     Running   9 (56m ago)    60d

## 2. Show the pod labels
root@k8s-master1:~# kubectl get pod calico-kube-controllers-65c5875bc8-t8m28 -n calico-system  --show-labels
NAME                                       READY   STATUS    RESTARTS      AGE   LABELS
calico-kube-controllers-65c5875bc8-t8m28   1/1     Running   9 (64m ago)   60d   app.kubernetes.io/name=calico-kube-controllers,k8s-app=calico-kube-controllers,pod-template-hash=65c5875bc8

## 3. Check whether a Service was created automatically for kube-controllers: it was
root@k8s-master1:~# kubectl get svc -n calico-system | grep calico-kube-controllers-metrics
calico-kube-controllers-metrics   ClusterIP   None            <none>        9094/TCP   60d

## 4. Show the Service details
root@k8s-master1:~/kube-prometheus/manifests# kubectl get svc calico-kube-controllers-metrics -n calico-system -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: "9094"
    prometheus.io/scrape: "true"
  creationTimestamp: "2026-04-08T06:59:06Z"
  labels:
    k8s-app: calico-kube-controllers
  name: calico-kube-controllers-metrics
  namespace: calico-system
  ownerReferences:
  - apiVersion: operator.tigera.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Installation
    name: default
    uid: 1541d167-34fd-4d93-99d2-574abf3eae7e
  resourceVersion: "12791"
  uid: cbe84cf1-7504-4f5c-99b4-988bd5e95dc1
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: metrics-port
    port: 9094
    protocol: TCP
    targetPort: 9094
  selector:
    k8s-app: calico-kube-controllers
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

## 5. Check whether a ServiceMonitor was created automatically for kube-controllers: it was not
root@k8s-master1:~# kubectl get servicemonitor -n monitoring | grep calico
calico-node-metrics       2d22h

From the output above, kube-controllers already has a Service but no ServiceMonitor, so next we create the ServiceMonitor.

Create the ServiceMonitor

root@k8s-master1:~# cd kube-prometheus/manifests/
root@k8s-master1:~/kube-prometheus/manifests# vim calico-kube-controllers-serviceMonitor.yaml 
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: calico-kube-controllers
  namespace: monitoring
  labels:
    app: calico-kube-controllers
spec:
  jobLabel: k8s-app            # name of a Service label whose value becomes the job name
  endpoints:
    - interval: 30s
      port: metrics-port       # must exactly match ports.name in the Service
      scheme: http
  selector:
    matchLabels:
      k8s-app: calico-kube-controllers  # must match the label on the metrics Service
  namespaceSelector:
    matchNames:
      - calico-system
root@k8s-master1:~/kube-prometheus/manifests# kubectl apply -f calico-kube-controllers-serviceMonitor.yaml 

Open Prometheus in a browser: the kube-controllers instance is now being monitored.

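## If the instance does not show up yet, wait a moment and check again; the port-forward below gives direct access to the Prometheus UI on local port 9090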
root@k8s-master1:~/kube-prometheus/manifests# kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
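
With the port-forward running, the target can also be checked from the command line through the Prometheus HTTP API instead of the browser. The sketch below counts targets whose `up` value is 1 for a given Service. `targets_up` and the `PROM_URL` variable are hypothetical names of my own. It assumes the Operator-generated relabeling attaches `namespace` and `service` labels to each target, and it parses the JSON with sed to avoid extra tools, which is fragile by design.

```shell
# Sketch only: targets_up and PROM_URL are hypothetical names.
# Usage: targets_up <namespace> <service-name>
targets_up() {
  ns="$1"; svc="$2"
  prom="${PROM_URL:-http://localhost:9090}"

  # Instant query: how many targets behind this Service report up == 1.
  # Prints the count, or nothing at all when no target matches.
  curl -s -G "$prom/api/v1/query" \
    --data-urlencode "query=count(up{namespace=\"$ns\",service=\"$svc\"} == 1)" |
    sed -n 's/.*"value":\[[^,]*,"\([0-9][0-9]*\)"].*/\1/p'
}
```

For kube-controllers this would be `targets_up calico-system calico-kube-controllers-metrics`. Empty output means Prometheus has no healthy target for that Service yet.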

III. Monitoring ingress-nginx

3.1 Enable the metrics listener and map the port in the Service

root@k8s-master1:~# cd /k8s/ingress-nginx/
root@k8s-master1:/k8s/ingress-nginx# ls
alertmanager-ingress.yaml  grafana-ingress.yaml  ingress-nginx.yaml  prometheus-ingress.yaml
root@k8s-master1:/k8s/ingress-nginx# vim ingress-nginx.yaml
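
## In ingress-nginx.yaml, enable the controller's metrics listener (port 10254) and map that port in the Service;
## the ServiceMonitor in 3.2 expects the Service port to be named http-metrics
## Then apply the updated ingress-nginx manifest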
root@k8s-master1:/k8s/ingress-nginx# kubectl replace -f ingress-nginx.yaml
root@k8s-master1:/k8s/ingress-nginx# curl 192.168.13.139:10254/metrics
root@k8s-master1:/k8s/ingress-nginx# kubectl get svc -n ingress-nginx
NAME                                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                      AGE
ingress-nginx-controller             NodePort    10.107.51.27   <none>        80:31303/TCP,443:31299/TCP,10254:31235/TCP   4m12s
ingress-nginx-controller-admission   ClusterIP   10.105.53.5    <none>        443/TCP                                      4m12s
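
The `kubectl get svc` output shows port numbers but not port names, and the ServiceMonitor in the next step selects the port by name. The sketch below checks that a named port exists on a Service; `service_has_port` is a hypothetical helper name.

```shell
# Sketch only: service_has_port is a hypothetical helper.
# Usage: service_has_port <service> <namespace> <port-name>
service_has_port() {
  svc="$1"; ns="$2"; port_name="$3"

  # Print one port name per line, then look for an exact whole-line match.
  kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{range .spec.ports[*]}{.name}{"\n"}{end}' |
    grep -qxF "$port_name"
}
```

For this setup the call would be `service_has_port ingress-nginx-controller ingress-nginx http-metrics && echo ok`. If it prints nothing, the Service port name and the ServiceMonitor's `port` field will not line up.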

3.2 Create the ServiceMonitor

root@k8s-master1:~# cd kube-prometheus/manifests/
root@k8s-master1:~/kube-prometheus/manifests# vim ingress-nginx-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx-controller
  namespace: monitoring
  labels:
    app: ingress-nginx
spec:
  endpoints:
  - interval: 30s
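    port: http-metrics          # must exactly match ports.name in the Service
    path: /metrics
    scheme: http
  selector:  # matches the labels on the ingress-nginx Service
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/component: controller
  namespaceSelector:
    matchNames:
    - ingress-nginx   # namespace of the ingress-nginx Service
root@k8s-master1:~/kube-prometheus/manifests# kubectl apply -f ingress-nginx-servicemonitor.yaml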

Open Prometheus in a browser: the ingress-nginx instance has been added.

3.3 Add a Grafana dashboard

https://grafana.com/grafana/dashboards/14314-kubernetes-nginx-ingress-controller-nextgen-devops-nirvana/

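That completes monitoring the Calico components and ingress-nginx with Prometheus Operator!


Notes:

If you spot any omissions or mistakes in this article, corrections are welcome.

This article is 100% original. If you repost it, please credit the original author.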

