Adding Kubernetes Component Monitoring to kube-prometheus (etcd, kube-controller-manager, kube-scheduler)

Our Kubernetes cluster is deployed from binaries, version 1.20.4, and we are using kube-prometheus release kube-prometheus-0.8.0.

I. Adding custom monitoring and alerting to Prometheus (etcd)

1. Steps and notes (prerequisite: kube-prometheus is already deployed, see the deployment article)

1.1 etcd clusters usually have HTTPS authentication enabled, so accessing etcd requires the corresponding certificates.

1.2 Create a Secret for etcd from those certificates.

1.3 Mount the etcd Secret into Prometheus.

1.4 Create a ServiceMonitor object for etcd (it matches a Service in the monitoring namespace carrying the label k8s-app=etcd1).

1.5 Create a Service, plus a hand-written Endpoints object, that points at the etcd members to be monitored (a command sketch follows this list).
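Concretely, the flow boils down to one Secret plus two manifests; a minimal sketch of the commands, with illustrative file names (the full YAML for each file appears in section 2):

```
# Illustrative file names; the actual manifests are shown below
kubectl create secret generic etcd-certs --from-file=... -n monitoring  # step 1.2
kubectl edit prometheus k8s -n monitoring    # step 1.3: add "secrets: [etcd-certs]"
kubectl apply -f servicemonitor.yaml         # step 1.4
kubectl apply -f service.yaml                # step 1.5
```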

2. Deployment steps

2.1 Create the etcd Secret

The self-signed etcd certificates live under /opt/etcd/ssl, so first `cd /opt/etcd/ssl`:

```
kubectl create secret generic etcd-certs --from-file=server.pem --from-file=server-key.pem --from-file=ca.pem -n monitoring
```
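Optionally, you can confirm the Secret carries all three files before wiring it into Prometheus; these are standard kubectl checks:

```
# Optional sanity check: the Secret should contain ca.pem, server.pem
# and server-key.pem
kubectl describe secret etcd-certs -n monitoring

# Or list just the data keys
kubectl get secret etcd-certs -n monitoring -o jsonpath='{.data}'
```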

You can use the command below to verify that metrics are produced; if there is output, the certificates are fine:

```
curl --cert /opt/etcd/ssl/server.pem --key /opt/etcd/ssl/server-key.pem https://192.168.7.108:2379/metrics -k | more
```
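If the full output is unwieldy, a narrower check is to grep for a single well-known etcd metric; `etcd_server_has_leader` should report 1 on a healthy member. A minimal sketch using the same certificates:

```
# A healthy member that sees a leader reports 1
curl -s -k --cert /opt/etcd/ssl/server.pem --key /opt/etcd/ssl/server-key.pem \
  https://192.168.7.108:2379/metrics | grep etcd_server_has_leader
```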

You can also exec into the Prometheus container to confirm the certificates were mounted:

```
[root@master manifests]# kubectl exec -it -n monitoring prometheus-k8s-0 -- /bin/sh
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.pem          server-key.pem  server.pem
```

![](https://img-blog.csdnimg.cn/direct/358d9d24324f48c8a4bc884a95e8e1be.png)

2.2 Add the Secret to the Prometheus object named k8s (via `kubectl edit prometheus k8s -n monitoring`, or by editing the YAML file and updating the resource):

```
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 2
  secrets:
  - etcd-certs          # mount the etcd certificates into Prometheus
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
```

Alternatively, edit the manifest file directly and replace it:

```
vim prometheus-prometheus.yaml
kubectl replace -f prometheus-prometheus.yaml
```

3. Create the ServiceMonitor object

```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd1               # label of this ServiceMonitor
  name: etcd
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: etcd                   # must match spec.ports.name in the Service
    scheme: https                # scrape scheme
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem        # certs live under /etc/prometheus/secrets, the default mount path
      certFile: /etc/prometheus/secrets/etcd-certs/server.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/server-key.pem
  selector:
    matchLabels:
      k8s-app: etcd1
  namespaceSelector:
    matchNames:
    - monitoring                 # namespace to match
```

4. Create the Service with a hand-written Endpoints object

```
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: etcd1
  name: etcd
  namespace: monitoring
subsets:
- addresses:
  - ip: 192.168.7.108
  - ip: 192.168.7.109
  - ip: 192.168.7.106
  ports:
  - name: etcd        # port name
    port: 2379        # etcd client port
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: etcd1
  name: etcd
  namespace: monitoring
spec:
  ports:
  - name: etcd
    port: 2379
    protocol: TCP
    targetPort: 2379
  sessionAffinity: None
  type: ClusterIP
```

Apply everything:

```
kubectl replace -f prometheus-prometheus.yaml
kubectl apply -f servicemonitor.yaml
kubectl apply -f service.yaml
```

At this point you can see etcd's monitoring data in Prometheus.

![](https://img-blog.csdnimg.cn/direct/ece1e0fc9d3c43488525d16972508db7.png)

Add the alerting rule:

```
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd-exporter.rules
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
      for: 3m
      labels:
        severity: critical
```
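Before loading the rule, it helps to sanity-check the threshold arithmetic, and optionally the syntax. The sketch below assumes promtool is installed locally and that a hypothetical file etcd-rules-check.yaml contains just the contents of `.spec` from the PrometheusRule above (starting at `groups:`):

```
# Validate rule syntax locally (optional; needs promtool)
promtool check rules etcd-rules-check.yaml

# Threshold arithmetic for a 3-member cluster:
#   count(up{job="etcd"}) / 2 - 1  =  3 / 2 - 1  =  0.5
# so the alert fires once more than 0.5 members are down, i.e. as soon
# as one member is down, since losing one more would cost the quorum.
```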
II. Adding custom monitoring and alerting to Prometheus (kube-controller-manager)

kube-prometheus ships a ServiceMonitor for kube-controller-manager by default, but because our cluster is deployed from binaries, there is no matching kube-controller-manager Service or Endpoints, so we have to create them by hand.

```
kubectl get servicemonitor -n monitoring
```

![](https://img-blog.csdnimg.cn/direct/d2e50382f8714a27ac09c6e5debf92f7.png)

Inspect the ServiceMonitor to see which Service labels it needs to match:

```
vim kubernetes-serviceMonitorKubeControllerManager.yaml
```

![](https://img-blog.csdnimg.cn/direct/243db00548874c269b255f3b8b1c7c47.png)

You can see it matches controller-manager through the label app.kubernetes.io/name=kube-controller-manager. When we look, no Service carries this label, so Prometheus cannot find a controller-manager address.

Now let's create the Service and Endpoints. First create an Endpoints object pointing at the host IP on port 10252, then a Service with the same name carrying the label we found above:

```
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app.kubernetes.io/name: kube-controller-manager
  name: kube-controller-manager-monitoring
  namespace: kube-system
subsets:
- addresses:
  - ip: 192.168.7.100
  ports:
  - name: https-metrics
    port: 10252
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-controller-manager
  name: kube-controller-manager-monitoring
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10252
    protocol: TCP
    targetPort: 10252
  sessionAffinity: None
  type: ClusterIP
```

Note: the kube-prometheus ServiceMonitor scrapes over https, while the port we exposed serves plain http, so change https to http:

```
kubectl edit servicemonitor -n monitoring kube-controller-manager
# change "scheme: https" to "scheme: http" (around line 60)
```

Once configured, the metrics show up in the Prometheus UI, which means it worked:

![](https://img-blog.csdnimg.cn/direct/d6645e10e6a34774a29036a62283e57d.png)

III. Adding custom monitoring and alerting to Prometheus (kube-scheduler)

The kube-scheduler setup is much like kube-controller-manager:

```
kubectl edit servicemonitor -n monitoring kube-scheduler
# change "scheme: https" to "scheme: http"
```

Now let's create the Service and Endpoints:

```
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-scheduler
  name: scheduler
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10251
    protocol: TCP
    targetPort: 10251
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app.kubernetes.io/name: kube-scheduler
  name: scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 192.168.7.100
  ports:
  - name: https-metrics
    port: 10251
    protocol: TCP
```

```
kubectl apply -f svc-kube-scheduler.yaml
```

Once configured, the metrics show up in the Prometheus UI, which means it worked:

![](https://img-blog.csdnimg.cn/direct/2312a317d27e42a1925599d8f71b7c48.png)

At this point controller-manager and scheduler monitoring are both up.

Note: quite a few references say you must change the bind address of kube-controller-manager and kube-scheduler from 127.0.0.1 to 0.0.0.0. In my setup, kube-controller-manager was already bound to 0.0.0.0, so I did not change it; kube-scheduler was bound to 127.0.0.1 and I did not change it either, yet scraping still succeeded, so this point remains to be verified.
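As a final check from the command line rather than the UI, you can list the scrape targets through the Prometheus HTTP API. This is a sketch assuming the default prometheus-k8s Service and a locally installed jq; the raw curl checks reuse the node IP and ports from above:

```
# Expose Prometheus locally (default kube-prometheus Service name)
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &

# Every target for the etcd, kube-controller-manager and kube-scheduler
# jobs should report health "up" (requires jq)
curl -s http://localhost:9090/api/v1/targets | \
  jq -r '.data.activeTargets[] | "\(.labels.job) \(.health)"'

# The component endpoints can also be hit directly from a node:
curl -s http://192.168.7.100:10252/metrics | head   # kube-controller-manager
curl -s http://192.168.7.100:10251/metrics | head   # kube-scheduler
```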
