Adding Kubernetes component monitoring to kube-prometheus (etcd, kube-controller-manager, kube-scheduler)

Our Kubernetes cluster is deployed from binaries, version 1.20.4, and the kube-prometheus release we use is kube-prometheus-0.8.0.

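I. Adding custom monitoring and alerting to Prometheus (etcd)

1. Steps and notes (prerequisite: kube-prometheus is already deployed; see the deployment article)

1.1 An etcd cluster usually has HTTPS authentication enabled, so accessing etcd requires the matching certificates

1.2 Create a secret for etcd from the certificates

1.3 Mount the etcd secret into Prometheus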

1.4 Create the ServiceMonitor object for etcd (in the manifests below it matches the Service labeled k8s-app=etcd1 in the monitoring namespace)

1.5 Create a Service that points at the monitored endpoints

2. Deployment

2.1 Create the etcd secret

The self-signed etcd certificates are in /opt/etcd/ssl, so first cd /opt/etcd/ssl

kubectl create secret generic etcd-certs --from-file=server.pem --from-file=server-key.pem --from-file=ca.pem -n monitoring

You can use the command below to check that the metrics endpoint returns data; if there is output, everything is fine

curl --cert /opt/etcd/ssl/server.pem --key /opt/etcd/ssl/server-key.pem https://192.168.7.108:2379/metrics -k |more

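Once the secret has been added to the Prometheus object (step 2.2), you can enter the container and check that the certificates are mounted

root@master manifests# kubectl exec -it -n monitoring prometheus-k8s-0 -- /bin/sh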

/prometheus $ ls /etc/prometheus/secrets/etcd-certs/

ca.pem server-key.pem server.pem

2.2 Add the secret to the Prometheus object named k8s (kubectl edit prometheus k8s -n monitoring, or edit the YAML file and update the resource)

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0

Alternatively, edit the manifest directly (vim prometheus-prometheus.yaml) and update the resource

kubectl replace -f prometheus-prometheus.yaml

3. Create the ServiceMonitor object

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd1 # label on this ServiceMonitor itself
  name: etcd
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: etcd     # port name, i.e. spec.ports.name in the Service
    scheme: https  # scrape over HTTPS
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem # secrets are mounted under /etc/prometheus/secrets/<secret-name> by default
      certFile: /etc/prometheus/secrets/etcd-certs/server.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/server-key.pem
  selector:
    matchLabels:
      k8s-app: etcd1
  namespaceSelector:
    matchNames:
    - monitoring  # namespace in which to look for the Service

4. Create the Service and a custom Endpoints object

---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: etcd1
  name: etcd
  namespace: monitoring
subsets:
- addresses:
  - ip: 192.168.7.108
  - ip: 192.168.7.109
  - ip: 192.168.7.106
  ports:
  - name: etcd  #name
    port: 2379  #port
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: etcd1
  name: etcd
  namespace: monitoring
spec:
  ports:
  - name: etcd
    port: 2379
    protocol: TCP
    targetPort: 2379
  sessionAffinity: None
  type: ClusterIP

kubectl replace -f prometheus-prometheus.yaml

kubectl apply -f servicemonitor.yaml

kubectl apply -f service.yaml

At this point the etcd metrics are visible in Prometheus
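Besides looking at the Targets page, the same check can be scripted against the Prometheus HTTP API. The sketch below is only an illustration: summarize_targets and fetch_targets are hypothetical helper names, the sample payload is made up and trimmed to the fields used here, and the live call assumes Prometheus has been made reachable on localhost:9090 (for example with kubectl -n monitoring port-forward svc/prometheus-k8s 9090).

```python
import json
import urllib.request
from collections import Counter


def fetch_targets(base_url: str) -> dict:
    """Fetch the target list from the Prometheus HTTP API (GET /api/v1/targets)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as response:
        return json.load(response)


def summarize_targets(payload: dict, job: str) -> Counter:
    """Count the active targets of one job by health ("up", "down" or "unknown")."""
    targets = payload["data"]["activeTargets"]
    return Counter(t["health"] for t in targets if t["labels"].get("job") == job)


# Made-up sample payload, trimmed to the fields used here; a real response has more.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "etcd", "instance": "192.168.7.108:2379"}, "health": "up"},
            {"labels": {"job": "etcd", "instance": "192.168.7.109:2379"}, "health": "up"},
            {"labels": {"job": "etcd", "instance": "192.168.7.106:2379"}, "health": "down"},
            {"labels": {"job": "kubelet", "instance": "192.168.7.100:10250"}, "health": "up"},
        ]
    },
}

print(summarize_targets(sample, "etcd"))

# Against a live cluster (assumes Prometheus is reachable on localhost:9090):
# print(summarize_targets(fetch_targets("http://localhost:9090"), "etcd"))
```

The job name etcd is the one the alert rule in this article filters on; if the count for it is empty, the ServiceMonitor has probably not selected the Service.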

Add an alert

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd-exporter.rules
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2-1)
      for: 3m
      labels:
        severity: critical
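To see when this expression fires, it helps to plug in numbers. The sketch below mirrors the comparison in plain Python; alert_fires is a hypothetical helper, and the zero-down case is modeled on the assumption that count() over an empty vector yields no sample, so the rule has nothing to compare.

```python
def alert_fires(total: int, down: int) -> bool:
    """Evaluate count(up == 0) > count(up) / 2 - 1 for the given target counts."""
    if down == 0:
        # With no target down, the left-hand side is an empty vector,
        # so the comparison produces no result and the alert stays silent.
        return False
    return down > total / 2 - 1


if __name__ == "__main__":
    for total in (3, 5):
        for down in range(total + 1):
            print(f"members={total} down={down} fires={alert_fires(total, down)}")
```

With the three members used in this article the threshold is 3 / 2 - 1 = 0.5, so the alert fires as soon as one member is down; with five members the threshold is 1.5, so it fires from two members down. In both cases that is the point where losing one more member would cost the cluster its quorum, which matches the rule's description.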

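II. Adding custom monitoring and alerting to Prometheus (kube-controller-manager)

kube-prometheus ships a ServiceMonitor for kube-controller-manager by default, but because our cluster is deployed from binaries there is no matching kube-controller-manager Service or Endpoints, so we have to create them by hand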

kubectl get servicemonitor -n monitoring

Inspect the ServiceMonitor to see which Service labels it needs to match

vim kubernetes-serviceMonitorKubeControllerManager.yaml

You can see that it selects controller-manager through the label app.kubernetes.io/name=kube-controller-manager. When we look, no Service carries this label, so Prometheus cannot find the controller-manager address.
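The matching rule itself is simple: every key and value under matchLabels must appear in the Service's labels, and annotations are not consulted. A minimal sketch of that rule follows; the matches helper and the Service metadata are hypothetical and only illustrate the selection logic.

```python
def matches(match_labels: dict, labels: dict) -> bool:
    """True when every key/value pair in matchLabels is present in the labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


selector = {"app.kubernetes.io/name": "kube-controller-manager"}

# Hypothetical Service metadata, for illustration only.
services = {
    "kube-dns": {
        "labels": {"k8s-app": "kube-dns"},
        "annotations": {},
    },
    "annotated-only": {
        "labels": {},
        # Same key and value, but as an annotation: selectors ignore it.
        "annotations": {"app.kubernetes.io/name": "kube-controller-manager"},
    },
    "kube-controller-manager-monitoring": {
        "labels": {"app.kubernetes.io/name": "kube-controller-manager"},
        "annotations": {},
    },
}

selected = [name for name, meta in services.items() if matches(selector, meta["labels"])]
print(selected)
```

On a real cluster, kubectl get svc -n kube-system -l app.kubernetes.io/name=kube-controller-manager applies the same rule and returns nothing until a Service with that label exists.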

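Now let's create the Service and Endpoints

Create the Service

First create an Endpoints object that points at the host IP on port 10252, then create a Service with the same name that carries the label found above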

---
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    app.kubernetes.io/name: kube-controller-manager
  name: kube-controller-manager-monitoring
  namespace: kube-system
subsets:
- addresses:
  - ip: 192.168.7.100
  ports:
  - name: https-metrics
    port: 10252
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-controller-manager
  name: kube-controller-manager-monitoring
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10252
    protocol: TCP
    targetPort: 10252
  sessionAffinity: None
  type: ClusterIP

# Note: the kube-prometheus ServiceMonitor scrapes over https, but the component exposes its metrics over http here, so change https to http

kubectl edit servicemonitor -n monitoring kube-controller-manager

scheme: http

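Once this is configured, the metrics show up in the Prometheus UI, which means it worked

III. Adding custom monitoring and alerting to Prometheus (kube-scheduler)

The kube-scheduler setup is similar to the kube-controller-manager setup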

kubectl edit servicemonitor -n monitoring kube-scheduler

# change scheme: https to scheme: http

Now let's create the Service and Endpoints

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-scheduler
  name: scheduler
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10251
    protocol: TCP
    targetPort: 10251
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app.kubernetes.io/name: kube-scheduler
  name: scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 192.168.7.100
  ports:
  - name: https-metrics
    port: 10251
    protocol: TCP

kubectl apply -f svc-kube-scheduler.yaml
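If the target does not show up, a common cause is that the names do not line up: a Service without a selector is tied to its Endpoints object by having the same name and namespace, and the ServiceMonitor refers to the port by name. The sketch below checks those names on plain dictionaries; wiring_problems is a hypothetical helper, and it assumes the ServiceMonitor scrapes the port named https-metrics, as the manifests above suggest.

```python
def wiring_problems(service: dict, endpoints: dict, monitor_port: str) -> list:
    """Return a list of mismatches; an empty list means the names line up."""
    problems = []
    for field in ("name", "namespace"):
        if service["metadata"][field] != endpoints["metadata"][field]:
            problems.append(f"Service and Endpoints have different {field}")
    service_ports = {port["name"] for port in service["spec"]["ports"]}
    endpoint_ports = {
        port["name"] for subset in endpoints["subsets"] for port in subset["ports"]
    }
    if monitor_port not in service_ports:
        problems.append(f"Service has no port named {monitor_port}")
    if monitor_port not in endpoint_ports:
        problems.append(f"Endpoints has no port named {monitor_port}")
    return problems


# The kube-scheduler objects from this section, reduced to the fields that matter.
service = {
    "metadata": {"name": "scheduler", "namespace": "kube-system"},
    "spec": {"ports": [{"name": "https-metrics", "port": 10251}]},
}
endpoints = {
    "metadata": {"name": "scheduler", "namespace": "kube-system"},
    "subsets": [
        {
            "addresses": [{"ip": "192.168.7.100"}],
            "ports": [{"name": "https-metrics", "port": 10251}],
        }
    ],
}

print(wiring_problems(service, endpoints, "https-metrics"))
```

An empty list means the three names agree; it says nothing about whether the port is actually reachable from the Prometheus pods.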

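Once this is configured, the metrics show up in the Prometheus UI, which means it worked

At this point both controller-manager and scheduler are being monitored

Note: many write-ups say that the listen address of kube-controller-manager and kube-scheduler has to be changed from 127.0.0.1 to 0.0.0.0

In my setup the kube-controller-manager configuration already used 0.0.0.0, so I did not change it. The kube-scheduler configuration listens on 127.0.0.1 and I did not change that either, yet scraping still succeeded, so this point still needs to be verified.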
