Our k8s cluster is deployed from binaries, version 1.20.4, and the kube-prometheus version chosen is kube-prometheus-0.8.0.
I. Adding custom monitoring and alerting to Prometheus (etcd)
1. Steps and notes (prerequisite: the cluster is already deployed; see the deployment article)
1.1 An etcd cluster generally has HTTPS authentication enabled, so accessing etcd requires the corresponding certificates
1.2 Create an etcd secret from those certificates
1.3 Mount the etcd secret into Prometheus
1.4 Create the etcd ServiceMonitor object (it matches the Service labeled k8s-app=etcd1 in the monitoring namespace, as defined in the manifests below)
1.5 Create a Service and Endpoints to tie in the monitored targets
2. Hands-on deployment
2.1 Create the etcd secret
The self-built etcd certificates live under /opt/etcd/ssl: cd /opt/etcd/ssl
kubectl create secret generic etcd-certs --from-file=server.pem --from-file=server-key.pem --from-file=ca.pem -n monitoring
You can verify with the command below; if it returns metrics content, the certificates are fine:
curl --cert /opt/etcd/ssl/server.pem --key /opt/etcd/ssl/server-key.pem https://192.168.7.108:2379/metrics -k |more
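If the curl check passes but Prometheus later still fails to scrape, it is worth confirming that the secret itself carries all three files. A quick check, using the secret name created above:

kubectl describe secret etcd-certs -n monitoring
# The Data section should list ca.pem, server-key.pem, and server.pem.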
You can exec into the Prometheus container to confirm the certificates are mounted (they appear once the secret has been added to the Prometheus object in step 2.2 below):
[root@master manifests]# kubectl exec -it -n monitoring prometheus-k8s-0 /bin/sh
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.pem  server-key.pem  server.pem

2.2 Add the secret to the Prometheus object named k8s (kubectl edit prometheus k8s -n monitoring, or modify the YAML file and update the resource)

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0

Alternatively, edit the manifest file directly and update the resource:

vim prometheus-prometheus.yaml
kubectl replace -f prometheus-prometheus.yaml

3. Create the ServiceMonitor object

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd1                # label on this ServiceMonitor
  name: etcd
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: etcd                    # port name, i.e. spec.ports.name in the Service
    scheme: https                 # scrape scheme
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem        # certificate location; /etc/prometheus/secrets is the default mount path
      certFile: /etc/prometheus/secrets/etcd-certs/server.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/server-key.pem
  selector:
    matchLabels:
      k8s-app: etcd1
  namespaceSelector:
    matchNames:
    - monitoring                  # namespace to match

4. Create the Service with a hand-written Endpoints object

---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: etcd1
  name: etcd
  namespace: monitoring
subsets:
- addresses:
  - ip: 192.168.7.108
  - ip: 192.168.7.109
  - ip: 192.168.7.106
  ports:
  - name: etcd                    # name
    port: 2379                    # port
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: etcd1
  name: etcd
  namespace: monitoring
spec:
  ports:
  - name: etcd
    port: 2379
    protocol: TCP
    targetPort: 2379
  sessionAffinity: None
  type: ClusterIP

kubectl replace -f prometheus-prometheus.yaml
kubectl apply -f servicemonitor.yaml
kubectl apply -f service.yaml

At this point you can see etcd's monitoring data in Prometheus.

Add the alert:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd-exporter.rules
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
      for: 3m
      labels:
        severity: critical
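Before relying on the EtcdClusterUnavailable rule, it can help to sanity-check the expression's arithmetic in the Prometheus UI. A worked example for the three-member cluster defined above:

# count(up{job="etcd"})          => 3    (three endpoints above)
# count(up{job="etcd"}) / 2 - 1  => 0.5
# So the alert fires as soon as count(up{job="etcd"} == 0) >= 1:
# with one member down, a 3-member cluster is one more failure
# away from losing quorum.
count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)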
II. Adding custom monitoring and alerting to Prometheus (kube-controller-manager)

kube-prometheus ships a ServiceMonitor for kube-controller-manager by default, but because our cluster is deployed from binaries there is no matching kube-controller-manager Service or Endpoints, so we have to create them by hand.

kubectl get servicemonitor -n monitoring

Inspect the ServiceMonitor to see which Service labels it needs to match:

vim kubernetes-serviceMonitorKubeControllerManager.yaml

You can see it matches controller-manager through the label app.kubernetes.io/name=kube-controller-manager. No svc carries this label yet, which is why Prometheus cannot find a controller-manager address.

Now let's create the Service and Endpoints. First create an Endpoints object pointing at the host IP on port 10252, then create a Service with the same name carrying the label found above:

---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app.kubernetes.io/name: kube-controller-manager
  name: kube-controller-manager-monitoring
  namespace: kube-system
subsets:
- addresses:
  - ip: 192.168.7.100
  ports:
  - name: https-metrics
    port: 10252
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-controller-manager
  name: kube-controller-manager-monitoring
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10252
    protocol: TCP
    targetPort: 10252
  sessionAffinity: None
  type: ClusterIP

Note: kube-prometheus scrapes over https, while the endpoint we exposed speaks http, so change https to http:

kubectl edit servicemonitor -n monitoring kube-controller-manager
# around line 60, change to: scheme: http

Once this is configured you can see the monitoring data in the Prometheus UI, which means it worked.

III. Adding custom monitoring and alerting to Prometheus (kube-scheduler)

The kube-scheduler setup is similar to kube-controller-manager's:

kubectl edit servicemonitor -n monitoring kube-scheduler
# change scheme: https to scheme: http

Now let's create the Service and Endpoints:

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-scheduler
  name: scheduler
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10251
    protocol: TCP
    targetPort: 10251
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app.kubernetes.io/name: kube-scheduler
  name: scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 192.168.7.100
  ports:
  - name: https-metrics
    port: 10251
    protocol: TCP

kubectl apply -f svc-kube-scheduler.yaml

Once this is configured you can see the monitoring data in the Prometheus UI.

At this point controller-manager and scheduler are both up.

Note: quite a few references say you must change the bind address of kube-controller-manager and kube-scheduler from 127.0.0.1 to 0.0.0.0. In my setup kube-controller-manager was already configured with 0.0.0.0, so I changed nothing; kube-scheduler was listening on 127.0.0.1 and I did not change it either, yet scraping still worked, so this point still needs verification.
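As a final sanity check, you can curl both metrics endpoints directly from a node before looking at the Prometheus targets page. A quick sketch, assuming the master IP and ports used in the manifests above:

# kube-controller-manager serves plain HTTP on 10252 in this setup
curl -s http://192.168.7.100:10252/metrics | head
# kube-scheduler serves plain HTTP on 10251
curl -s http://192.168.7.100:10251/metrics | head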