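Kubernetes Probes: Overview and Usage

Probe Overview

Probe Types

In Kubernetes, probes are the mechanism for checking the health and availability of containers. Based on probe results, Kubernetes can automatically judge the state of an application and, as needed, restart or replace containers or route traffic only to healthy instances. This keeps applications healthy and available and automates failure recovery.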

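livenessProbe:

Health check. Periodically checks whether the service is alive. If the check fails, the kubelet "restarts" the container (kills the original container and creates a new one).

If a container does not define a liveness probe, the default state is Success.

readinessProbe:

Availability check. Periodically checks whether the service can serve requests, which determines whether the container is ready.

If the Pod's service is detected as unavailable, the Pod is removed from the Service's Endpoints list.

When the service is detected as available again, the Pod is added back to the Service's Endpoints list.

If a container does not define a readiness probe, the default state is Success.

startupProbe: (supported since Kubernetes 1.16)

If a startup probe is defined, all other probes are disabled until it succeeds.

If the startup probe fails, the kubelet kills the container, and the container is restarted according to its restart policy.

If a container does not define a startup probe, the default state is Success.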

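Probe Mechanisms

HTTP GET probe: sends an HTTP GET request to the container to check whether it can handle requests (only GET is used; other methods such as POST are not available). The probe succeeds if the response status code is 2xx or 3xx, that is, at least 200 and below 400; any other result counts as a failure.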
```
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: host
      value: test-probe
  initialDelaySeconds: 3
  periodSeconds: 5
```
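TCP Socket probe: tries to open a TCP connection to the specified port of the container. This suits applications without an HTTP endpoint, such as databases and message queues. If the connection is established, the probe succeeds; otherwise it fails.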
```
livenessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 3
  periodSeconds: 5
```
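Exec probe: runs a command inside the container to decide whether it is healthy. This suits cases that need a specific operation or a check of internal application state. An exit code of 0 means the container is healthy; any other exit code means it is unhealthy.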
```
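livenessProbe:
  exec:
    command:
    - "sh"
    - "-c"
    # The [f] bracket keeps grep from matching its own (and the shell's)
    # command line, which would otherwise make this probe always succeed.
    - "ps -ef | grep '[f]ilebeat'"
  initialDelaySeconds: 3
  periodSeconds: 5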
```

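Probe Configuration Fields

Field	Description
failureThreshold: 3	After 3 consecutive failures the probe is considered failed and the corresponding action is taken (default 3)
initialDelaySeconds: 10	Start probing 10 seconds after the container starts (default 0)
periodSeconds: 1	Probe every 1 second (default 10)
successThreshold: 1	Number of consecutive successes needed for the probe to pass; here 1 success is enough (default 1)
timeoutSeconds: 1	Each probe attempt times out after 1 second (default 1)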
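
To see how these fields interact, here is a minimal sketch of a liveness probe with every timing field written out. The path /healthz and port 8080 are placeholder assumptions, and the timing in the comments is approximate, since the kubelet adds some jitter and each failed attempt may also take up to timeoutSeconds.

```yaml
livenessProbe:
  httpGet:
    path: /healthz          # assumed health endpoint
    port: 8080              # assumed container port
  initialDelaySeconds: 10   # wait 10 s after the container starts
  periodSeconds: 5          # then probe every 5 s
  timeoutSeconds: 1         # each attempt must answer within 1 s
  successThreshold: 1       # must be 1 for liveness and startup probes
  failureThreshold: 3       # 3 consecutive failures trigger a restart
  # If the application hangs right after start, probes fail at roughly
  # 10 s, 15 s and 20 s, so the restart is triggered at about
  # initialDelaySeconds + (failureThreshold - 1) * periodSeconds = 20 s.
```

A shorter periodSeconds detects failures sooner but increases probe traffic; a larger failureThreshold tolerates brief hiccups at the cost of slower recovery.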

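livenessProbe Examples

A liveness probe detects whether a container is still alive. If it finds the container in an unhealthy state (for example, stuck in an infinite loop or hung), Kubernetes restarts the container.

```
# Liveness probe example:
livenessProbe:
  httpGet:
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 30  # delay before the first probe, default 0
  timeoutSeconds: 35       # per-probe timeout, default 1
  periodSeconds: 30        # interval between probes, default 10s
  successThreshold: 1      # one success marks the probe as passed
  failureThreshold: 3      # three consecutive failures mark it as failed, default 3
```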

Exec probe

The container creates a file, and the probe runs cat on it to check that it exists. The file is deleted 20 seconds after startup. Because initialDelaySeconds is 30, probing starts only after the file is already gone, so the probe fails and, after three consecutive failures, the container is restarted.

```
cat > 20-pods-livenessProbe-exec.yaml <<eof
apiVersion: v1
kind: Pod
metadata:
  name: pod-livenessprobe-exec-001
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
    name: c1
    command:
    - /bin/sh
    - -c
    - touch /tmp/zhiyong18-linux-healthy; sleep 20; rm -f /tmp/zhiyong18-linux-healthy; sleep 600
    livenessProbe:
      # Health check via exec: run cat and use its exit code as the result
      exec:
        command:
        - cat
        - /tmp/zhiyong18-linux-healthy
      periodSeconds: 1
      failureThreshold: 3
      successThreshold: 1
      initialDelaySeconds: 30
      timeoutSeconds: 1
eof
```

HTTP probe

If /usr/share/nginx/html/index.html is deleted, the HTTP request to nginx fails and the container is restarted.

```
cat > 21-pods-livenessProbe-httpGet.yaml <<eof
apiVersion: v1
kind: Pod
metadata:
  name: livenessprobe-httpget-001
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
    name: c1
    # Liveness check: periodically verifies the service is alive; on failure the container is restarted.
    livenessProbe:
      # Health check via httpGet
      httpGet:
        port: 80
        path: /index.html
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 1
      successThreshold: 1
      timeoutSeconds: 1
eof
```

TCP port probe

The probe checks port 80. nginx is stopped 10 seconds after startup, so the TCP probe fails and the container is then restarted.

```
cat > 22-pods-livenessProbe-tcpSocket.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: livenessprobe-tcpsoket-001
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
    name: c1
    command:
    - /bin/sh
    - -c
    - nginx; sleep 10; nginx -s stop ; sleep 600
    # Liveness check: periodically verifies the service is alive; on failure the container is restarted.
    livenessProbe:
      # Health check via tcpSocket
      tcpSocket:
        port: 80
      failureThreshold: 3
      initialDelaySeconds: 20
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 1
EOF
```

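readinessProbe (Readiness Probe)

Scenario: a readiness probe determines whether a container is ready to handle user requests. Kubernetes uses its result to decide whether to add the Pod's IP to the Service's load-balancing endpoints, so traffic is sent only to containers that are ready.

Example: a service that depends on a database may need a few seconds after startup to connect to the database and finish initializing.

```
readinessProbe:
  httpGet:
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 30  # delay before the first probe, default 0
  timeoutSeconds: 35       # per-probe timeout, default 1
  periodSeconds: 30        # interval between probes, default 10s
  successThreshold: 1      # one success marks the probe as passed
  failureThreshold: 3      # three consecutive failures mark it as failed, default 3
```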
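
Readiness and liveness probes are often pointed at different endpoints, so that an instance that is still initializing or temporarily overloaded is taken out of the Service's endpoints without being restarted. The fragment below is only a sketch of that idea: the container name, the image, and the /ready and /healthz paths are placeholders, not taken from the example above.

```yaml
containers:
- name: app                       # hypothetical container name
  image: example.com/app:1.0      # hypothetical image
  readinessProbe:
    httpGet:
      path: /ready                # assumed endpoint: fails while dependencies are unavailable
      port: 8080
    periodSeconds: 5
    failureThreshold: 3           # after 3 failures the Pod stops receiving Service traffic
  livenessProbe:
    httpGet:
      path: /healthz              # assumed endpoint: fails only when the process is stuck
      port: 8080
    periodSeconds: 10
    failureThreshold: 3           # after 3 failures the container is restarted
```

With this split, a failing readiness check costs only traffic, while a restart is reserved for cases where the liveness check also fails.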

Custom check command with exec

A readiness probe periodically checks whether the service is available. It can be used together with a livenessProbe.

```
cat > 23-pods-readinessprobe-livenessProbe-exec.yaml <<eof
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-readinessprobe
spec:
  replicas: 3
  selector:
    matchLabels:
      apps: v1
  template:
    metadata:
      labels:
        apps: v1
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        name: c1
        command:
        - /bin/sh
        - -c
        - touch /tmp/zhiyong18-linux-healthy; sleep 35; rm -f /tmp/zhiyong18-linux-healthy; sleep 600
        # Liveness probe
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/zhiyong18-linux-healthy
          failureThreshold: 3
          initialDelaySeconds: 65
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
        # Readiness check: periodically verifies the service is available to decide whether the container is ready.
        readinessProbe:
          # Health check via exec
          exec:
            # Custom check command
            command:
            - cat
            - /tmp/zhiyong18-linux-healthy
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1

---

apiVersion: v1
kind: Service
metadata:
  name: zhiyong18-readinessprobe-exec
spec:
  selector:
    apps: v1
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
eof
```

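Analysis: after a while, the readiness probe reports the Pod as not ready (READY shows 0/1). The file is removed 35 seconds after the container starts, so the readiness probe begins to fail; the liveness probe starts at 65 seconds, also fails, and the container is restarted.

```
[root@master231~]# kubectl get pods
NAME                                     READY   STATUS    RESTARTS      AGE
deploy-readinessprobe-59996dd447-fbpjs   0/1     Running   2 (82s ago)   4m43

[root@master231~]# kubectl describe pods deploy-readinessprobe-59996dd447-fbpjs
...
Warning  Unhealthy  3m49s (x22 over 4m9s)  kubelet            Readiness probe failed: cat: can't open '/tmp/zhiyong18-linux-healthy': No such file or directory
```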

startupProbe (Startup Probe)

```
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30   # up to 30 * 10 s = 300 s for the application to start
  periodSeconds: 10
```
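Characteristics:

A startup probe checks whether the application in the container has finished starting. If a startupProbe is configured, all other probes are disabled until it succeeds. If the startupProbe fails, the kubelet kills the container, which is then restarted according to its restart policy. If a container does not define a startup probe, the default state is Success.

Use cases:

Suited to applications that take a long time to start or have complex startup dependencies. A startup probe prevents a container from being judged unhealthy, and restarted, because health checks fail while the application is still starting.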
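
As a rough illustration of the slow-start case, the sketch below simulates an application that needs about 60 seconds before it is usable. The Pod name, the busybox image, and the /tmp/started marker file are assumptions made for this example only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startupprobe-slow-start-demo   # hypothetical name
spec:
  containers:
  - name: c1
    image: busybox:1.36                # assumed image; any image with sh and cat works
    command:
    - /bin/sh
    - -c
    - sleep 60; touch /tmp/started; sleep 3600   # "startup" takes about 60 s
    startupProbe:
      exec:
        command:
        - cat
        - /tmp/started
      periodSeconds: 5
      failureThreshold: 24             # startup budget: 24 * 5 s = 120 s
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/started
      periodSeconds: 5
      failureThreshold: 3              # once started, 3 consecutive failures trigger a restart
```

Without the startupProbe, the same livenessProbe would restart the container long before the 60-second startup finished; the alternative of a large initialDelaySeconds would also delay failure detection on every later restart.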

A complete example that combines all three probes, using init containers to create the pages that the probes request:

```
cat > 26-pods-startupProbe-httpGet.yaml <<eof
apiVersion: v1
kind: Pod
metadata:
  name: startupprobe-httpget-01
spec:
  volumes:
  - name: data
    emptyDir: {}
  initContainers:
  - name: init01
    image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
    volumeMounts:
    - name: data
      mountPath: /zhiyong18
    command:
    - /bin/sh
    - -c
    - echo "liveness probe test page" >> /zhiyong18/huozhe.html
  - name: init02
    image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v2
    volumeMounts:
    - name: data
      mountPath: /zhiyong18
    command:
    - /bin/sh
    - -c
    - echo "readiness probe test page" >> /zhiyong18/zhiyong18.html
  - name: init03
    image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v3
    volumeMounts:
    - name: data
      mountPath: /zhiyong18
    command:
    - /bin/sh
    - -c
    - echo "startup probe test page" >> /zhiyong18/start.html
  containers:
  - name: c1
    image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
    # Checks whether the service is healthy; if the check fails, the container is restarted.
    livenessProbe:
      httpGet:
        port: 80
        path: /huozhe.html
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 1
      successThreshold: 1
      timeoutSeconds: 1
    # Checks whether the service is ready; if the check fails, the Pod is marked as not ready.
    readinessProbe:
      httpGet:
        port: 80
        path: /zhiyong18.html
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    # Checked during startup; if the check fails, the container is killed and restarted.
    # readinessProbe and livenessProbe only run after the startupProbe has passed.
    startupProbe:
      httpGet:
        port: 80
        path: /start.html
      failureThreshold: 3
      # Even though the pages for readinessProbe and livenessProbe already exist,
      # those probes must wait until the startupProbe succeeds.
      initialDelaySeconds: 35
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
eof
```

gRPC probe

Reference: https://kubernetes.io/zh-cn/blog/2022/05/13/grpc-probes-now-in-beta/

What is RPC: calling a method on a remote host from local code as if it were a local method.

1. cat 27-pods-livenessProbe-grpc.yaml

```
apiVersion: v1
kind: ReplicationController
metadata:
  name: rc-etcd-grpc
spec:
  replicas: 1
  selector:
    app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      restartPolicy: Always
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/etcd:3.5.10
        name: web
        imagePullPolicy: IfNotPresent
        command:
        - /opt/bitnami/etcd/bin/etcd
        - --data-dir=/tmp/etcd
        - --listen-client-urls=http://0.0.0.0:2379
        - --advertise-client-urls=http://127.0.0.1:2379
        - --log-level=debug
        ports:
        - containerPort: 2379
        livenessProbe:
          # Issues a gRPC health check against the port. gRPC probes were alpha in Kubernetes 1.23
          # and became beta (enabled by default) in 1.24, so prefer 1.24+ for real use.
          # On 1.23.17, a failed check only produces a warning event; the container is not restarted.
          grpc:
            port: 2379
            # service: health
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
```
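
The commented-out service field above corresponds to the service name sent in the gRPC Health Checking Protocol request, and the probe passes only when the server answers with status SERVING. As a sketch, and assuming the application registers separate health services named liveness and readiness (hypothetical names that the server itself must implement), the two probes could query different services on the same port:

```yaml
livenessProbe:
  grpc:
    port: 2379
    service: liveness     # hypothetical health service name
  periodSeconds: 10
readinessProbe:
  grpc:
    port: 2379
    service: readiness    # hypothetical health service name
  periodSeconds: 5
```

If the server does not know the requested service name, the health check does not return SERVING and the probe fails, so these names only make sense when the application defines them.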