Kubernetes from Beginner to Mastery (Advanced) 03

I. Advanced Scheduling

1. Job

A Job (batch/v1) is the controller for one-off, run-to-completion tasks: it guarantees that the task Pod finishes successfully and automatically retries it on failure.

Typical scenarios:

  • Database migrations, data backups
  • Batch computation, report generation
  • Initialization scripts, one-off operations tasks

How a Job works:

  1. Create the Job → it starts a Pod
  2. The Pod exits normally (exit code 0) → the success count increases by 1
  3. The count reaches completions → the Job status becomes Completed
  4. The Pod exits abnormally → the Job retries according to backoffLimit
  5. The retry limit is exceeded → the Job status becomes Failed
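The steps above map directly onto the Job spec. A minimal sketch (the name, image, and command are placeholders, not from the example below):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job            # hypothetical name
spec:
  completions: 1            # how many successful Pods mark the Job Completed
  backoffLimit: 3           # retries allowed before the Job is marked Failed
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "echo done"]   # exit 0 counts as one completion
```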

Example:

Use the mysql image to wait until the MySQL service is ready, then automatically import test data.

  • A standalone MySQL StatefulSet + Service already exists (the business database)

  • Create a separate Job (using the mysql:8.0 image)

  • Inside the Job, loop until the MySQL service is ready, then automatically create the database and table and insert test data

  • A one-off task: once finished, the Job is marked Completed and does not stay resident

Prepare the ConfigMap source file (my.cnf):

cat > my.cnf << EOF
[mysqld]
bind-address = 0.0.0.0
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
max_connections = 200
default-time-zone = '+8:00'
EOF

Create the ConfigMap:

[root@k8s-master01 job]# kubectl create configmap mysql-cfg --from-file=my.cnf -n tcloud

A StatefulSet relies on a Headless Service to give each Pod a stable DNS name.

No Service → no stable DNS names → the Job cannot resolve mysql-0, mysql-1, mysql-2 → initialization fails.
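The stable names follow the StatefulSet DNS pattern `<pod>.<service>.<namespace>.svc.<cluster-domain>`. A tiny sketch that prints the names the Job below relies on (the cluster domain `cluster.local` is the default and an assumption here):

```shell
# Print the stable DNS names a headless Service gives each StatefulSet replica.
svc=mysql-svc
ns=tcloud
for i in 0 1 2; do
  echo "mysql-$i.$svc.$ns.svc.cluster.local"
done
```

Inside the cluster the short form mysql-0.mysql-svc also resolves, because a Pod's DNS search path includes its own namespace.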

# Prepare the Service manifest
[root@k8s-master01 job]# vim service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-svc
  namespace: tcloud
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306


# Prepare the StatefulSet manifest
[root@k8s-master01 job]# vim statefulset.yaml 
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: tcloud
spec:
  serviceName: mysql-svc
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: docker.io/library/mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "Test@123"
        ports:
        - containerPort: 3306
        # Mount the ConfigMap into MySQL's config directory ↓↓↓
        volumeMounts:
        - name: mysql-config
          mountPath: /etc/mysql/conf.d

      # Use the ConfigMap created above ↓↓↓
      volumes:
      - name: mysql-config
        configMap:
          name: mysql-cfg

# Prepare the Job manifest
[root@k8s-master01 job]# vim job.yaml 
apiVersion: batch/v1
kind: Job
metadata:
  name: mysql-init-job   # Job name
  namespace: tcloud      # runs in the tcloud namespace
spec:
  parallelism: 1         # run at most 1 Pod at a time
  completions: 1         # 1 successful completion finishes the Job
  backoffLimit: 10       # maximum number of retries on failure
  activeDeadlineSeconds: 300  # hard timeout; the Job is terminated if it runs longer

  template:
    spec:
      restartPolicy: OnFailure  # restart the container only when it fails
      containers:
      - name: init-data
        image: docker.io/library/mysql:8.0
        command:
        - /bin/sh
        - -c
        - |
          # Variables
          u=root
          p=Test@123

          # Wait 10 seconds so MySQL is fully up (simplest approach)
          sleep 10

          # Insert into mysql-0
          mysql -h mysql-0.mysql-svc -u$u -p$p -e "create database testdb; use testdb; create table t(id int); insert into t values(0);"
          echo "mysql-0 insert OK"

          # Insert into mysql-1
          mysql -h mysql-1.mysql-svc -u$u -p$p -e "create database testdb; use testdb; create table t(id int); insert into t values(1);"
          echo "mysql-1 insert OK"

          # Insert into mysql-2
          mysql -h mysql-2.mysql-svc -u$u -p$p -e "create database testdb; use testdb; create table t(id int); insert into t values(2);"
          echo "mysql-2 insert OK"
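The fixed 10-second sleep is fragile: if MySQL takes longer to start, the inserts fail and the Job burns retries. A hedged sketch of an active wait loop instead (the `mysqladmin ping` call and hostname in the comment are illustrative, matching the manifests above):

```shell
# wait_for retries a command until it succeeds or the attempt limit is reached.
wait_for() {
  max=$1; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    if [ "$i" -ge "$max" ]; then
      return 1   # give up after max failed attempts
    fi
    sleep 1
  done
  return 0
}

# In the Job script you would replace `sleep 10` with something like:
# wait_for 30 mysqladmin ping -h mysql-0.mysql-svc -uroot -pTest@123 --silent
```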

# Create the resources
[root@k8s-master01 job]# kubectl apply -f statefulset.yaml 
[root@k8s-master01 job]# kubectl apply -f service.yaml 

# Verify the created resources
[root@k8s-master01 job]# kubectl get pod -n tcloud -o wide
NAME      READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
mysql-0   1/1     Running   0          2m25s   10.244.58.236    k8s-node02   <none>           <none>
mysql-1   1/1     Running   0          2m23s   10.244.135.172   k8s-node03   <none>           <none>
mysql-2   1/1     Running   0          2m22s   10.244.85.230    k8s-node01   <none>           <none>
[root@k8s-master01 job]# kubectl get endpoints -n tcloud 
Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
NAME        ENDPOINTS                                                   AGE
mysql-svc   10.244.135.172:3306,10.244.58.236:3306,10.244.85.230:3306   96s

# Create the Job to run the task
[root@k8s-master01 job]# kubectl apply -f job.yaml 

# Check the Job's Pod
[root@k8s-master01 job]# kubectl get pod -n tcloud 
NAME                   READY   STATUS      RESTARTS   AGE
mysql-0                1/1     Running     0          67s
mysql-1                1/1     Running     0          65s
mysql-2                1/1     Running     0          64s
mysql-init-job-m89sh   0/1     Completed   0          62s
The Completed status means the task has finished.

# Check the logs
[root@k8s-master01 job]# kubectl logs -n tcloud mysql-init-job-m89sh 
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql-0 insert OK
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql-1 insert OK
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql-2 insert OK


# Exec into the Pods to verify the data
[root@k8s-master01 job]# kubectl exec -n tcloud mysql-0 -- mysql -uroot -pTest@123 -e "use testdb; select * from t;"
mysql: [Warning] Using a password on the command line interface can be insecure.
id
0
[root@k8s-master01 job]# kubectl exec -n tcloud mysql-1 -- mysql -uroot -pTest@123 -e "use testdb; select * from t;"
mysql: [Warning] Using a password on the command line interface can be insecure.
id
1
[root@k8s-master01 job]# kubectl exec -n tcloud mysql-2 -- mysql -uroot -pTest@123 -e "use testdb; select * from t;"
mysql: [Warning] Using a password on the command line interface can be insecure.
id
2

# Clean up the resources used in this exercise
[root@k8s-master01 job]# kubectl delete configmaps -n tcloud mysql-cfg 
[root@k8s-master01 job]# kubectl delete -f service.yaml 
[root@k8s-master01 job]# kubectl delete -f statefulset.yaml 
[root@k8s-master01 job]# kubectl delete -f job.yaml 

2. CronJob

The CronJob controller manages Job resources, which in turn manage Pods. A Job starts running as soon as it is created, but a CronJob controls when and how often Jobs run, much like periodic cron jobs on Linux: it creates Jobs (repeatedly) at specified points in time.

Use cases:

  1. Scheduled database maintenance
  2. Log and garbage cleanup
  3. Scheduled business tasks
  4. Scheduled data synchronization
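The schedule field uses standard five-field cron syntax. A few illustrative values (a sketch; only the first one appears in the example below):

```yaml
# ┌───────── minute (0-59)
# │ ┌─────── hour (0-23)
# │ │ ┌───── day of month (1-31)
# │ │ │ ┌─── month (1-12)
# │ │ │ │ ┌─ day of week (0-6, Sunday = 0)
# * * * * *
schedule: "*/1 * * * *"   # every minute
# schedule: "0 2 * * *"   # 02:00 every day
# schedule: "30 4 * * 0"  # 04:30 every Sunday
```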

Example: scheduled K8s cluster health inspection

The scheduled task needs the kubectl command, so we build our own image:

# Build with Docker on a cloud server
[root@iZatn52hwp5l42Z ~]# mkdir -p /opt/self-kubectl-img
[root@iZatn52hwp5l42Z ~]# cd /opt/self-kubectl-img
[root@iZatn52hwp5l42Z self-kubectl-img]# wget https://dl.k8s.io/release/v1.28.2/bin/linux/amd64/kubectl
[root@iZatn52hwp5l42Z self-kubectl-img]# chmod +x kubectl

# Write the Dockerfile
[root@iZatn52hwp5l42Z self-kubectl-img]# cat > Dockerfile <<EOF
FROM busybox:1.36
COPY kubectl /usr/local/bin/
RUN chmod +x /usr/local/bin/kubectl
EOF

# Build the image
[root@iZatn52hwp5l42Z self-kubectl-img]# docker build -t local/busybox-kubectl:v1.28 .

# Export the image
[root@iZatn52hwp5l42Z self-kubectl-img]# docker save -o busybox-kubectl.tar local/busybox-kubectl:v1.28

# Upload the image archive to every Kubernetes node and import it
[root@k8s-master01 ~]# ctr -n k8s.io image import busybox-kubectl.tar

# Create the ServiceAccount and RBAC permissions
[root@k8s-master01 cronjob]# vim ServiceAccount.yaml 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-health-sa
  namespace: tcloud
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-health-role
rules:
- apiGroups: [""]
  resources: ["nodes", "pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-health-rb
subjects:
- kind: ServiceAccount
  name: cluster-health-sa
  namespace: tcloud
roleRef:
  kind: ClusterRole
  name: cluster-health-role
  apiGroup: rbac.authorization.k8s.io

# Write the CronJob manifest
[root@k8s-master01 cronjob]# vim crontab.ymal 
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cluster-health-check
  namespace: tcloud
spec:
  # Run once per minute
  schedule: "*/1 * * * *"
  # Forbid concurrency: if the previous run hasn't finished, skip this one
  concurrencyPolicy: Forbid
  # If a run is missed by more than 30 seconds, don't make it up
  startingDeadlineSeconds: 30
  # Keep the last 3 successful and 3 failed Jobs
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3

  jobTemplate:
    spec:
      # A run may last at most 60 seconds; it is force-terminated after that
      activeDeadlineSeconds: 60
      # Do not retry failed Pods
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: cluster-health-sa
          restartPolicy: Never
          containers:
          - name: health-check
            image: local/busybox-kubectl:v1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |
              echo "====================================================="
              echo "           K8s Cluster Health Inspection Report"
              echo "====================================================="
              echo "Run time: $(date '+%Y-%m-%d %H:%M:%S')"
              echo "Namespace: tcloud"
              echo "====================================================="

              echo -e "\n===== 1. Cluster node list ====="
              kubectl get nodes -o wide

              echo -e "\n===== 2. Check for not-ready nodes ====="
              # Note: a plain `grep -v Ready` would also drop NotReady lines,
              # since "NotReady" contains "Ready"; match the STATUS column instead.
              UNREADY_NODE=$(kubectl get nodes --no-headers | awk '$2 != "Ready"')
              if [ -n "$UNREADY_NODE" ]; then
                echo "❌ Not-ready nodes detected:"
                echo "$UNREADY_NODE"
                exit 1
              else
                echo "✅ All nodes are Ready"
              fi

              echo -e "\n===== 3. Check for abnormal Pods ====="
              ERROR_POD=$(kubectl get pods -A --no-headers \
              | grep -E "CrashLoopBackOff|Error|Pending|Evicted|ImagePullBackOff|ErrImagePull")
              if [ -n "$ERROR_POD" ]; then
                echo "❌ Abnormal Pods detected:"
                echo "$ERROR_POD"
                exit 1
              else
                echo "✅ All Pods across the cluster are healthy"
              fi

              echo -e "\n====================================================="
              echo "✅ Cluster inspection passed"
              echo "====================================================="
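The STATUS-column check can be exercised locally against sample `kubectl get nodes --no-headers`-style output (the node names here are hypothetical):

```shell
# Sample output: node name, status, roles, age, version.
sample='k8s-node01 Ready <none> 10d v1.34.1
k8s-node02 NotReady <none> 10d v1.34.1'

# Keep only lines whose second column is not exactly "Ready";
# a plain `grep -v Ready` would wrongly drop the NotReady line too.
unready=$(printf '%s\n' "$sample" | awk '$2 != "Ready"')
echo "$unready"
```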

# apply
[root@k8s-master01 cronjob]# kubectl apply -f ServiceAccount.yaml 
[root@k8s-master01 cronjob]# kubectl apply -f crontab.ymal 

# Check the CronJob
[root@k8s-master01 cronjob]# kubectl get cronjob -n tcloud 
NAME                   SCHEDULE      TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cluster-health-check   */1 * * * *   <none>     False     1        3s              69s

# Check the Jobs
[root@k8s-master01 cronjob]# kubectl get job -n tcloud 
NAME                            STATUS     COMPLETIONS   DURATION   AGE
cluster-health-check-29625369   Complete   1/1           3s         87s
cluster-health-check-29625370   Complete   1/1           3s         27s

# Check the Pods
[root@k8s-master01 cronjob]# kubectl get pod -n tcloud 
NAME                                  READY   STATUS      RESTARTS   AGE
cluster-health-check-29625369-kg5pg   0/1     Completed   0          101s
cluster-health-check-29625370-646hl   0/1     Completed   0          41s


# Check the logs
[root@k8s-master01 cronjob]# kubectl logs -n tcloud cluster-health-check-29625369-kg5pg 
=====================================================
           K8s Cluster Health Inspection Report
=====================================================
Run time: 2026-04-30 04:09:00
Namespace: tcloud
=====================================================

===== 1. Cluster node list =====
NAME           STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master01   Ready    <none>   10d   v1.34.1   192.168.1.10   <none>        CentOS Linux 7 (Core)   5.4.226-1.el7.elrepo.x86_64   containerd://1.6.20
k8s-master02   Ready    <none>   10d   v1.34.1   192.168.1.11   <none>        CentOS Linux 7 (Core)   5.4.226-1.el7.elrepo.x86_64   containerd://1.6.20
k8s-master03   Ready    <none>   10d   v1.34.1   192.168.1.12   <none>        CentOS Linux 7 (Core)   5.4.226-1.el7.elrepo.x86_64   containerd://1.6.20
k8s-node01     Ready    <none>   10d   v1.34.1   192.168.1.13   <none>        CentOS Linux 7 (Core)   5.4.226-1.el7.elrepo.x86_64   containerd://1.6.20
k8s-node02     Ready    <none>   10d   v1.34.1   192.168.1.14   <none>        CentOS Linux 7 (Core)   5.4.226-1.el7.elrepo.x86_64   containerd://1.6.20
k8s-node03     Ready    <none>   10d   v1.34.1   192.168.1.15   <none>        CentOS Linux 7 (Core)   5.4.226-1.el7.elrepo.x86_64   containerd://1.6.20

===== 2. Check for not-ready nodes =====
✅ All nodes are Ready

===== 3. Check for abnormal Pods =====
✅ All Pods across the cluster are healthy

=====================================================
✅ Cluster inspection passed
=====================================================

# Delete the resources
[root@k8s-master01 cronjob]# kubectl delete -f ServiceAccount.yaml 
[root@k8s-master01 cronjob]# kubectl delete -f crontab.ymal 

3. Init Containers (initContainer)

Init containers run before a Pod's main containers start, doing preparatory work for them. They have two defining properties:

  • An init container must run to completion; if it fails, Kubernetes restarts it until it succeeds
  • Init containers run strictly in the order they are defined; each one starts only after the previous one has succeeded

Init containers have many uses; the most common are:

  • Creating directories the application needs

  • Initializing configuration files

  • Setting directory permissions

  • Warming things up before the main business container starts

Create a Pod that uses init containers:

[root@k8s-master01 init]# vim init-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vue-login-app
  namespace: tcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vue-login
  template:
    metadata:
      labels:
        app: vue-login
    spec:
      # ==============================================
      # Init containers (run one after another)
      # 1. Check that cluster DNS resolution is ready
      # 2. Node preheat: create directories, set permissions, prepare config
      # ==============================================
      initContainers:
      # -------------------
      # Init 1: DNS resolution check
      # -------------------
      - name: check-dns
        image: local/busybox-kubectl:v1.28
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        - -c
        - |
          echo "===== Checking whether cluster DNS is ready ====="
          until nslookup kubernetes.default.svc; do
            echo "DNS not ready, waiting 3 seconds..."
            sleep 3
          done
          echo "✅ DNS resolution OK!"
      # -------------------
      # Init 2: node preheat
      # -------------------
      - name: node-preheat
        image: local/busybox-kubectl:v1.28
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: data-volume
          mountPath: /data
        command:
        - /bin/sh
        - -c
        - |
          echo "===== Starting node preheat ====="
          mkdir -p /data/logs /data/nginx/cache
          chmod -R 755 /data
          echo "✅ Directories created"
          echo "✅ Permissions initialized"
          echo "✅ Node preheat done!"
      # ==============================================
      # Main container: the vue-login business image
      # ==============================================
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        volumeMounts:
        - name: data-volume
          mountPath: /data
      # Shared volume
      volumes:
      - name: data-volume
        emptyDir: {}

Here emptyDir is a temporary empty directory.

Flow:

  • Init container (node-preheat)

    • mounts the emptyDir at /data
    • creates the directories /data/logs and /data/nginx/cache
    • sets permissions
  • the emptyDir preserves these directories and files

  • Main container (vue-login)

    • mounts the same emptyDir at /data

    • directly uses the directories the init container created

    • can read and write logs and cache

In another terminal, watch the Pod's lifecycle states:

[root@k8s-master01 ~]# kubectl get pod -n tcloud -w

Apply:

[root@k8s-master01 init]# kubectl apply -f init-deployment.yaml

The Pod's lifecycle:

[root@k8s-master01 ~]# kubectl get pod -n tcloud -w
NAME READY STATUS RESTARTS AGE
vue-login-app-5c64cbb49c-qbn6r 0/1 Pending 0 0s
vue-login-app-5c64cbb49c-qbn6r 0/1 Pending 0 0s
vue-login-app-5c64cbb49c-qbn6r 0/1 Init:0/2 0 0s
vue-login-app-5c64cbb49c-qbn6r 0/1 Init:1/2 0 1s
vue-login-app-5c64cbb49c-qbn6r 0/1 PodInitializing 0 2s
vue-login-app-5c64cbb49c-qbn6r 1/1 Running 0 3s
Init:0/2 → the 2 init containers have just started running
Init:1/2 → the 1st init container finished (DNS check)
PodInitializing → the 2nd init container finished (node preheat)
Running → the main container vue-login started successfully!

Check the init containers' logs; use -c to select the init container:

[root@k8s-master01 init]# kubectl logs -n tcloud vue-login-app-5c64cbb49c-qbn6r -c check-dns
===== Checking whether cluster DNS is ready =====
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name: kubernetes.default.svc
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
✅ DNS resolution OK!
[root@k8s-master01 init]# kubectl logs -n tcloud vue-login-app-5c64cbb49c-qbn6r -c node-preheat
===== Starting node preheat =====
✅ Directories created
✅ Permissions initialized
✅ Node preheat done!

Exec into the Pod to verify that the directories were created:

[root@k8s-master01 init]# kubectl exec -it -n tcloud vue-login-app-5c64cbb49c-qbn6r -- bash
Defaulted container "vue-login" out of: vue-login, check-dns (init), node-preheat (init)
root@vue-login-app-5c64cbb49c-qbn6r:/# ls -l /data/
total 0
drwxr-xr-x 2 root root 6 Apr 30 05:26 logs
drwxr-xr-x 3 root root 19 Apr 30 05:26 nginx
Without the emptyDir, these directories would be deleted as soon as the init container exits.

4. Pod Scheduling

(1) Directed Scheduling

Directed scheduling means declaring nodeName or nodeSelector on a Pod to place it on a desired node. Note that this placement is forced: even if the target node does not exist, the Pod is still bound to it and simply fails to run.

NodeName

nodeName binds the Pod directly to the named node, bypassing the scheduler entirely: no filtering, no scoring, the Pod lands exactly where you wrote.

Rules:

  • With nodeName: <node-name>, the Pod is scheduled only onto that node

  • If the node does not exist, is down, or is unschedulable → the Pod stays Pending forever

  • No elasticity, no automatic failover: the placement is hard-coded by hand

[root@k8s-master01 nodename]# vim vue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vue-login
  namespace: tcloud
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vue-login
  template:
    metadata:
      labels:
        app: vue-login
    spec:
      # Directed scheduling: the node to pin the Pods to
      nodeName: k8s-node01
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80

# apply
[root@k8s-master01 nodename]# kubectl apply -f vue-deployment.yaml

# Check which node the Pods are running on
[root@k8s-master01 nodename]# kubectl get pod -n tcloud -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
vue-login-6fc46b59d5-bwqpl 1/1 Running 0 17s 10.244.85.235 k8s-node01 <none> <none>
vue-login-6fc46b59d5-prp9n 1/1 Running 0 17s 10.244.85.236 k8s-node01 <none> <none>

# Delete one of the Pods
[root@k8s-master01 nodename]# kubectl delete pod -n tcloud vue-login-6fc46b59d5-bwqpl

# Check the Pods' node again
[root@k8s-master01 nodename]# kubectl get pod -n tcloud -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
vue-login-6fc46b59d5-prp9n 1/1 Running 0 80s 10.244.85.236 k8s-node01 <none> <none>
vue-login-6fc46b59d5-swpkl 1/1 Running 0 2s 10.244.85.237 k8s-node01 <none> <none>
Still k8s-node01.

# Delete the resources
[root@k8s-master01 nodename]# kubectl delete -f vue-deployment.yaml

NodeSelector

Label the nodes → the Pod selects nodes by label → it is scheduled only onto machines whose labels match.

# Label the k8s-node01 node
[root@k8s-master01 nodename]# kubectl label nodes k8s-node01 app=vue

# View the node labels
[root@k8s-master01 nodename]# kubectl get nodes --show-labels

# Write the Deployment manifest
[root@k8s-master01 nodeselector]# vim nodeselector.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vue-login
  namespace: tcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vue-login
  template:
    metadata:
      labels:
        app: vue-login
    spec:
      nodeSelector:
        app: vue   # only schedule onto nodes labeled app=vue

      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80

# apply
[root@k8s-master01 nodeselector]# kubectl apply -f nodeselector.yaml 


# Check which node the Pod is running on
[root@k8s-master01 nodeselector]# kubectl get pod -n tcloud -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
vue-login-658b98d7c5-2qt9h   1/1     Running   0          25s   10.244.85.238   k8s-node01   <none>           <none>

# Delete the Pod so it is recreated
[root@k8s-master01 nodeselector]# kubectl delete pod -n tcloud vue-login-658b98d7c5-2qt9h 

# Check the new Pod's node again
[root@k8s-master01 nodeselector]# kubectl get pod -n tcloud -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
vue-login-658b98d7c5-6hnbg   1/1     Running   0          13s   10.244.85.239   k8s-node01   <none>           <none>


# Delete the resources
[root@k8s-master01 nodeselector]# kubectl delete -f nodeselector.yaml 
# Remove the label
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node01 app-

2)亲和性调度

两种定向调度的方式,使用起来非常方便,但是也有一定的问题,那就是如果没有满足条件的Node,那么Pod将不会被运行,即使在集群中还有可用Node列表也不行,这就限制了它的使用场景。

基于上面的问题,kubernetes还提供了一种亲和性调度(Affinity)。它在NodeSelector的基础之上的进行了扩展,可以通过配置的形式,实现优先选择满足条件的Node进行调度,如果没有,也可以调度到不满足条件的节点上,使调度更加灵活。

Affinity主要分为三类:

  • nodeAffinity(node亲和性): 以node为目标,解决pod可以调度到哪些node的问题
  • podAffinity(pod亲和性) : 以pod为目标,解决pod可以和哪些已存在的pod部署在同一个拓扑域中的问题
  • podAntiAffinity(pod反亲和性) : 以pod为目标,解决pod不能和哪些已存在pod部署在同一个拓扑域中的问题
NodeAffinity(节点亲和性)

Pod,只想跑在「有某个标签」的节点上;可以设 必须在这(硬策略)、尽量在这(软策略)
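The matchExpressions used throughout the examples below support several operators. A quick reference sketch (the key and values are illustrative):

```yaml
- key: role
  operator: In              # label value is in the values list
  values: ["web", "db"]
# operator: NotIn           # label value is not in the list
# operator: Exists          # the label key exists (omit values)
# operator: DoesNotExist    # the label key does not exist (omit values)
# operator: Gt / Lt         # numeric greater-than / less-than comparison
```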

Hard policy

The key field: requiredDuringSchedulingIgnoredDuringExecution

# First label the nodes
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node01 role=web
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node02 role=web
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node03 role=db


# Write the hard-policy manifest
[root@k8s-master01 nodeselector]# vim NodeAffinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vue-nodeaffinity-hard
  namespace: tcloud
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vue-hard
  template:
    metadata:
      labels:
        app: vue-hard
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent
      # ====================== Node affinity (hard: must be satisfied) ======================
      affinity:
        nodeAffinity:
          # Hard policy: must be satisfied at scheduling time; otherwise the
          # Pod cannot be scheduled (not enforced at runtime)
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # the node must carry the label key "role"
              - key: role
                # operator: the node's label value must be in the list below
                operator: In
                # allowed values: only nodes labeled role=web qualify
                values:
                - web

# apply
[root@k8s-master01 nodeselector]# kubectl apply -f NodeAffinity.yaml 

# Check which nodes the Pods are running on
[root@k8s-master01 nodeselector]# kubectl get pod -n tcloud -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
vue-nodeaffinity-hard-6486578875-67wv9   1/1     Running   0          30s   10.244.58.208   k8s-node02   <none>           <none>
vue-nodeaffinity-hard-6486578875-zp7tw   1/1     Running   0          30s   10.244.85.240   k8s-node01   <none>           <none>

# Hard-policy characteristics
Strictly enforced: no match means Pending
Stronger than nodeSelector: supports multiple conditions and richer operators
Suited to workloads that must run on designated nodes

# Now remove the label from node01
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node01 role-

# Scale replicas up to 5, then check
[root@k8s-master01 nodeselector]# kubectl get pod -n tcloud -o wide
NAME                                     READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
vue-nodeaffinity-hard-6486578875-67wv9   1/1     Running   0          6m48s   10.244.58.208   k8s-node02   <none>           <none>
vue-nodeaffinity-hard-6486578875-ctc6z   1/1     Running   0          11s     10.244.58.210   k8s-node02   <none>           <none>
vue-nodeaffinity-hard-6486578875-n6vpz   1/1     Running   0          11s     10.244.58.209   k8s-node02   <none>           <none>
vue-nodeaffinity-hard-6486578875-r7pb4   1/1     Running   0          11s     10.244.58.211   k8s-node02   <none>           <none>
vue-nodeaffinity-hard-6486578875-zp7tw   1/1     Running   0          6m48s   10.244.85.240   k8s-node01   <none>           <none>

Pods that are already running are not evicted
But newly scaled / recreated Pods can no longer be scheduled onto node01

# Delete the resources
[root@k8s-master01 nodeselector]# kubectl delete -f NodeAffinity.yaml 
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node02 role-
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node03 role-
Soft policy

The Pod prefers to run on matching nodes, but runs elsewhere if none match.

The key field: preferredDuringSchedulingIgnoredDuringExecution

# Label the nodes
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node01 env=test
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node02 env=test

# Write the soft-policy manifest
[root@k8s-master01 nodeselector]# vim NodeAffinity-vue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vue-nodeaffinity-soft
  namespace: tcloud
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vue-soft
  template:
    metadata:
      labels:
        app: vue-soft
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          # Soft policy: preferred at scheduling time; the Pod still schedules
          # and runs normally if unsatisfied (not enforced at runtime)
          preferredDuringSchedulingIgnoredDuringExecution:
          # weight 10; the higher the weight, the stronger the preference
          - weight: 10
            preference:
              matchExpressions:
              # match the node label key "env"
              - key: env
                # operator: the node's label value is in the list below
                operator: In
                # prefer scheduling onto nodes labeled env=test
                values:
                - test

# apply
[root@k8s-master01 nodeselector]# kubectl apply -f NodeAffinity-vue.yaml 

# Check which nodes the Pods are running on
[root@k8s-master01 nodeselector]# kubectl get pod -n tcloud -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
vue-nodeaffinity-soft-677465f496-pnpjj   1/1     Running   0          90s   10.244.58.213   k8s-node02   <none>           <none>
vue-nodeaffinity-soft-677465f496-tnlxq   1/1     Running   0          90s   10.244.85.241   k8s-node01   <none>           <none>


# Remove the labels from node01 and node02
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node01 env-
node/k8s-node01 unlabeled
[root@k8s-master01 nodeselector]# kubectl label nodes k8s-node02 env-
node/k8s-node02 unlabeled

# Delete the Pods so they are recreated
[root@k8s-master01 nodeselector]# kubectl delete pod -n tcloud -l app=vue-soft

# Check which nodes the Pods run on now
[root@k8s-master01 nodeselector]# kubectl get pod -n tcloud -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
vue-nodeaffinity-soft-677465f496-mbszx   1/1     Running   0          4s    10.244.135.132   k8s-node03     <none>           <none>
vue-nodeaffinity-soft-677465f496-xqh7r   1/1     Running   0          4s    10.244.195.50    k8s-master03   <none>           <none>
Pods can run even where the preference is not satisfied.

# Delete the resources
[root@k8s-master01 nodeselector]# kubectl delete -f NodeAffinity-vue.yaml 

PodAffinity (pod affinity)

The new Pod is scheduled into the same topology domain (same node / same availability zone) as existing Pods of a given kind.

Typical scenario:

Frontend with backend, or an application with its Redis, deployed close together to reduce network latency.

Hard affinity

The new Pod must share a topology domain (the same node) with the target Pod; if that is impossible, it goes straight to Pending and is never scheduled elsewhere.

The key field: requiredDuringSchedulingIgnoredDuringExecution

# Create a Pod for others to depend on
[root@k8s-master01 podAffinity]# vim podaffinity.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: tcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent

# Apply and check which node it runs on
[root@k8s-master01 podAffinity]# kubectl apply -f podaffinity.yaml
[root@k8s-master01 podAffinity]# kubectl get pod -n tcloud -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
redis-7b6f9ff8cd-lt792   1/1     Running   0          12s   10.244.58.214   k8s-node02   <none>           <none>

# Create the hard-affinity Pods
[root@k8s-master01 podAffinity]# vim hard-podaffinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-affinity-hard
  namespace: tcloud
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent

      # ==============================
      # Pod hard affinity (core configuration)
      # ==============================
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis   # must be co-located with Pods labeled app=redis
            topologyKey: kubernetes.io/hostname  # "same node" counts as the same domain

topologyKey defines the topology domain: it partitions nodes into zones based on a node label. Nodes sharing the same key and value belong to the same domain; different keys or values mean different domains.
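For example, to co-locate at the availability-zone level rather than the node level, you would swap the topologyKey. A sketch, assuming the nodes carry the well-known zone label:

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: redis
      # Pods land in the same zone as app=redis, not necessarily the same node
      topologyKey: topology.kubernetes.io/zone
```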

# Apply and check which node the Pods run on
[root@k8s-master01 podAffinity]# kubectl apply -f hard-podaffinity.yaml 

[root@k8s-master01 podAffinity]# kubectl get pod -n tcloud -o wide
NAME                                 READY   STATUS    RESTARTS   AGE    IP              NODE         NOMINATED NODE   READINESS GATES
pod-affinity-hard-599cb76bd8-4c7x2   1/1     Running   0          2s     10.244.58.215   k8s-node02   <none>           <none>
pod-affinity-hard-599cb76bd8-7vl4d   1/1     Running   0          2s     10.244.58.217   k8s-node02   <none>           <none>
redis-7b6f9ff8cd-lt792               1/1     Running   0          3m6s   10.244.58.214   k8s-node02   <none>           <none>
The Pods followed the rule we set and now also run on node02.

# Delete the depended-on Pod
[root@k8s-master01 podAffinity]# kubectl delete -f podaffinity.yaml 


# Then delete one of the hard-affinity Pods
[root@k8s-master01 podAffinity]# kubectl delete pod -n tcloud pod-affinity-hard-599cb76bd8-2d6nh 


# Check which node the recreated Pod goes to
[root@k8s-master01 podAffinity]# kubectl get pod -n tcloud -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
pod-affinity-hard-599cb76bd8-4s2wd   0/1     Pending   0          14s   <none>          <none>       <none>           <none>
pod-affinity-hard-599cb76bd8-75cch   1/1     Running   0          76s   10.244.58.219   k8s-node02   <none>           <none>
The status is Pending: no matching target Pod can be found, so it simply stays Pending.

# Delete the resources
[root@k8s-master01 podAffinity]# kubectl delete -f hard-podaffinity.yaml 

Soft affinity

Try to place the Pod in the same topology domain as the target Pod: if there is room, they are grouped together; if not, the Pod is still scheduled onto another node and never goes Pending.

The key field: preferredDuringSchedulingIgnoredDuringExecution

# Create the depended-on Pod
[root@k8s-master01 podAffinity]# kubectl apply -f podaffinity.yaml 

# Write the soft-affinity manifest
[root@k8s-master01 podAffinity]# vim soft-podaffinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-affinity-soft
  namespace: tcloud
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend-soft
  template:
    metadata:
      labels:
        app: backend-soft
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent

      affinity:
        podAffinity:
          # Soft affinity: try to co-locate
          preferredDuringSchedulingIgnoredDuringExecution:
          # weight 1-100; the larger the value, the stronger the preference
          - weight: 100
            podAffinityTerm:
              # look for Pods labeled app=redis
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - redis
              # treat each node as its own topology domain
              topologyKey: kubernetes.io/hostname

# apply
[root@k8s-master01 podAffinity]# kubectl apply -f soft-podaffinity.yaml 

# Check the nodes: all on node02
[root@k8s-master01 podAffinity]# kubectl get pod -n tcloud -o wide
NAME                                 READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
pod-affinity-soft-5cb99988f7-c4vfx   1/1     Running   0          14s     10.244.58.222   k8s-node02   <none>           <none>
pod-affinity-soft-5cb99988f7-dklzw   1/1     Running   0          14s     10.244.58.221   k8s-node02   <none>           <none>
pod-affinity-soft-5cb99988f7-g5mg2   1/1     Running   0          14s     10.244.58.223   k8s-node02   <none>           <none>
redis-7b6f9ff8cd-58jsg               1/1     Running   0          2m45s   10.244.58.220   k8s-node02   <none>           <none>

# Delete the depended-on Pod, then delete one soft-affinity Pod
[root@k8s-master01 podAffinity]# kubectl delete -f podaffinity.yaml 
[root@k8s-master01 podAffinity]# kubectl delete pod -n tcloud pod-affinity-soft-5cb99988f7-c4vfx 


# 查看重建的pod运行在哪个节点
[root@k8s-master01 podAffinity]# kubectl get pod -n tcloud -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod-affinity-soft-5cb99988f7-dklzw   1/1     Running   0          86s   10.244.58.221    k8s-node02   <none>           <none>
pod-affinity-soft-5cb99988f7-dxn47   1/1     Running   0          24s   10.244.135.133   k8s-node03   <none>           <none>
pod-affinity-soft-5cb99988f7-g5mg2   1/1     Running   0          86s   10.244.58.223    k8s-node02   <none>           <none>
重建的pod运行在了node03上

# 删除资源
[root@k8s-master01 podAffinity]# kubectl delete -f soft-podaffinity.yaml 
PodAntiAffinity(pod反亲和性)

PodAntiAffinity主要实现以运行的Pod为参照,让新创建的Pod跟参照pod不在一个区域中的功能。

多副本强制打散到不同节点,避免一台节点挂了,服务全挂,保障高可用。

硬反亲和

同一个拓扑域(如同一节点)内不能存在多个带相同标签的 Pod,必须打散;做不到就 Pending,绝不扎堆。生产用来做高可用:多副本强制分到不同节点,防止单节点挂掉服务全崩。

可以用来反本身的标签,也可以反别的pod的标签
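
比如要"反别的 pod 的标签",可以参考下面的示意片段(假设集群里已有带 app=mysql 标签的 Pod,该标签名仅为举例),让新 Pod 强制避开 mysql 所在的节点:

```yaml
# 示意:硬反亲和"反别人的标签"(app=mysql 为假设的示例标签)
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - mysql          # 避开带 app=mysql 标签的 Pod
      # 以节点为拓扑域:不与 mysql Pod 同节点
      topologyKey: kubernetes.io/hostname
```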

关键参数:requiredDuringSchedulingIgnoredDuringExecution

[root@k8s-master01 podAntiAffinity]# vim hard-podAntiAffinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-antiaff-hard
  namespace: tcloud
spec:
  replicas: 6  # 副本数6
  selector:
    matchLabels:
      app: web-antiaff-hard
  template:
    metadata:
      labels:
        app: web-antiaff-hard  # Pod 标签,反亲和依据这个标签匹配
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent  # 本地有镜像就不拉取,没有才拉取

      # ====================== 硬反亲和配置 ======================
      affinity:
        # Pod 反亲和:不让相同的 Pod 调度到同一类节点
        podAntiAffinity:
          # 硬反亲和(必须满足,不满足则调度失败,Pod 处于 Pending)
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              # 选择带有以下标签的 Pod
              matchExpressions:
              - key: app        # 标签 key
                operator: In    # 操作符:属于以下值列表
                values:
                # 根据自身的标签打散pod运行节点
                - web-antiaff-hard  # 标签 value,和当前 Pod 标签一致
            # 拓扑域:以节点为单位(不同 node 视为不同域)
            # kubernetes.io/hostname 表示:每个节点是一个独立域
            topologyKey: kubernetes.io/hostname

# 我们这里副本数是6,因为集群共有6个节点;master上没有打污点,所以master也可以运行Pod
# apply
[root@k8s-master01 podAntiAffinity]# kubectl apply -f hard-podAntiAffinity.yaml 

# 查看pod的运行节点
[root@k8s-master01 podAntiAffinity]# kubectl get pod -n tcloud -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP               NODE           NOMINATED NODE   READINESS GATES
web-antiaff-hard-7d6c64ff9c-cqm8p   1/1     Running   0          3m31s   10.244.135.134   k8s-node03     <none>           <none>
web-antiaff-hard-7d6c64ff9c-fzszg   1/1     Running   0          3m31s   10.244.195.51    k8s-master03   <none>           <none>
web-antiaff-hard-7d6c64ff9c-h7crk   1/1     Running   0          3m31s   10.244.58.224    k8s-node02     <none>           <none>
web-antiaff-hard-7d6c64ff9c-hcxv7   1/1     Running   0          3m31s   10.244.32.166    k8s-master01   <none>           <none>
web-antiaff-hard-7d6c64ff9c-qxpv9   1/1     Running   0          3m31s   10.244.85.243    k8s-node01     <none>           <none>
web-antiaff-hard-7d6c64ff9c-x5csn   1/1     Running   0          3m31s   10.244.122.150   k8s-master02   <none>           <none>
都运行在不同的节点上

# 现在扩容到7副本
# 查看pod运行节点
[root@k8s-master01 podAntiAffinity]# kubectl get pod -n tcloud -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP               NODE           NOMINATED NODE   READINESS GATES
web-antiaff-hard-7d6c64ff9c-cqm8p   1/1     Running   0          4m36s   10.244.135.134   k8s-node03     <none>           <none>
web-antiaff-hard-7d6c64ff9c-fzszg   1/1     Running   0          4m36s   10.244.195.51    k8s-master03   <none>           <none>
web-antiaff-hard-7d6c64ff9c-h7crk   1/1     Running   0          4m36s   10.244.58.224    k8s-node02     <none>           <none>
web-antiaff-hard-7d6c64ff9c-hcxv7   1/1     Running   0          4m36s   10.244.32.166    k8s-master01   <none>           <none>
web-antiaff-hard-7d6c64ff9c-qxpv9   1/1     Running   0          4m36s   10.244.85.243    k8s-node01     <none>           <none>
web-antiaff-hard-7d6c64ff9c-rr48w   0/1     Pending   0          7s      <none>           <none>         <none>           <none>
web-antiaff-hard-7d6c64ff9c-x5csn   1/1     Running   0          4m36s   10.244.122.150   k8s-master02   <none>           <none>
可以看到直接Pending了

# 删除资源
[root@k8s-master01 podAntiAffinity]# kubectl delete -f hard-podAntiAffinity.yaml 
软反亲和

尽量把同一服务的副本分散,不挤在同一节点;当节点资源紧张、不够分时,允许挤在同一节点,不会出现 Pending,容错性很强

关键参数:preferredDuringSchedulingIgnoredDuringExecution

# 创建软反亲和的pod
[root@k8s-master01 podAntiAffinity]# vim soft-podAntiAffinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-antiaff-soft  # Deployment名称
  namespace: tcloud       # 命名空间
spec:
  replicas: 6             # 副本数:6个Pod
  selector:
    matchLabels:
      app: web-antiaff-soft  # 标签选择器,管理带此标签的Pod
  template:
    metadata:
      labels:
        app: web-antiaff-soft  # Pod自带标签,反亲和依据此标签匹配
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent  # 本地有镜像则不拉取,提高启动速度

      # ====================== Pod 软反亲和配置 ======================
      affinity:
        # Pod反亲和:尽量让相同Pod分散调度
        podAntiAffinity:
          # 软反亲和(重点)
          # 规则:尽量满足,不满足也可以调度,不会导致Pod Pending
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100  # 权重:1-100,数值越高,优先级越强
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app        # 匹配标签的key
                  operator: In    # 操作符:包含在values列表中
                  values:
                  - web-antiaff-soft  # 以自身标签打散;若不想和 app=mysql 的服务同节点,把这里换成 mysql 即可
              # 拓扑域:以节点为单位(每个节点是一个独立域)
              # 作用:尽量不让带相同标签的Pod调度到同一个节点
              topologyKey: kubernetes.io/hostname

# apply 查看pod运行节点
[root@k8s-master01 podAntiAffinity]# kubectl apply -f soft-podAntiAffinity.yaml 
[root@k8s-master01 podAntiAffinity]# kubectl get pod -n tcloud -o wide
[root@k8s-master01 podAntiAffinity]# kubectl get pod -n tcloud -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
web-antiaff-soft-9b5ff75c-5r7sm   1/1     Running   0          10s   10.244.32.167    k8s-master01   <none>           <none>
web-antiaff-soft-9b5ff75c-6lg9j   1/1     Running   0          10s   10.244.122.151   k8s-master02   <none>           <none>
web-antiaff-soft-9b5ff75c-dcqn8   1/1     Running   0          10s   10.244.58.225    k8s-node02     <none>           <none>
web-antiaff-soft-9b5ff75c-fl6q7   1/1     Running   0          10s   10.244.85.244    k8s-node01     <none>           <none>
web-antiaff-soft-9b5ff75c-kdmfz   1/1     Running   0          10s   10.244.195.52    k8s-master03   <none>           <none>
web-antiaff-soft-9b5ff75c-vtk68   1/1     Running   0          10s   10.244.135.135   k8s-node03     <none>           <none>


# 扩容副本为7
[root@k8s-master01 podAntiAffinity]# kubectl get pod -n tcloud -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
web-antiaff-soft-9b5ff75c-5r7sm   1/1     Running   0          55s   10.244.32.167    k8s-master01   <none>           <none>
web-antiaff-soft-9b5ff75c-6lg9j   1/1     Running   0          55s   10.244.122.151   k8s-master02   <none>           <none>
web-antiaff-soft-9b5ff75c-dcqn8   1/1     Running   0          55s   10.244.58.225    k8s-node02     <none>           <none>
web-antiaff-soft-9b5ff75c-fl6q7   1/1     Running   0          55s   10.244.85.244    k8s-node01     <none>           <none>
web-antiaff-soft-9b5ff75c-kdmfz   1/1     Running   0          55s   10.244.195.52    k8s-master03   <none>           <none>
web-antiaff-soft-9b5ff75c-vtk68   1/1     Running   0          55s   10.244.135.135   k8s-node03     <none>           <none>
web-antiaff-soft-9b5ff75c-z5th8   1/1     Running   0          3s    10.244.58.226    k8s-node02     <none>           <none>
新pod运行在node02上了,状态也是running

# 删除资源
[root@k8s-master01 podAntiAffinity]# kubectl delete -f soft-podAntiAffinity.yaml 

3)污点和容忍

污点 Taint:打在节点上,相当于节点设门槛、拒接普通 Pod

容忍 Toleration:打在Pod上,相当于 Pod有通行证,能容忍节点污点、被调度上去

Kubernetes 有一些内置污点(满足触发条件时自动打上,无需手动操作):

| 污点 Key | 触发条件 | 默认 Effect |
| --- | --- | --- |
| node.kubernetes.io/not-ready | 节点未就绪(Ready=False) | NoSchedule、NoExecute |
| node.kubernetes.io/unreachable | 节点失联、kubelet 断连(Ready=Unknown) | NoSchedule、NoExecute |
| node.kubernetes.io/memory-pressure | 节点内存不足(内存压力) | NoSchedule |
| node.kubernetes.io/disk-pressure | 磁盘空间 / IO 压力大 | NoSchedule |
| node.kubernetes.io/pid-pressure | 系统进程 PID 耗尽 | NoSchedule |
| node.kubernetes.io/network-unavailable | 网络插件未就绪 | NoSchedule |
| node.kubernetes.io/unschedulable | 手动执行 kubectl cordon 锁节点 | NoSchedule |

三种策略:

  1. NoSchedule:绝不调度新 Pod 过来,已在上面的不驱逐

  2. PreferNoSchedule:尽量不调度,实在没节点也能调度

  3. NoExecute:绝不调度 + 已在节点上的 Pod 直接驱逐

常用命令:

  • 给节点打污点:kubectl taint nodes 节点名 key=value:NoSchedule

  • 查看节点污点:kubectl describe node 节点名 | grep Taints

  • 删除污点(末尾加 -):kubectl taint nodes 节点名 key=value:NoSchedule-

污点常用场景:

  • master 节点默认有污点,普通 Pod 不上调度,保证控制平面独占
  • 特殊机器隔离:日志节点、监控节点、数据库节点,只允许专属 Pod 部署
  • 节点维护时打 NoExecute,驱逐 Pod 做迁移
  • 区分业务环境:生产 / 测试节点隔离
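
以"特殊机器隔离"场景为例,下面是一个示意配置(节点名、dedicated=monitor 标签、容器名和镜像均为假设):先给监控节点打污点并打标签,再给监控 Pod 同时加容忍和 nodeSelector,实现"只有监控 Pod 能上这台、且只上这台":

```yaml
# 1) 先给监控节点打污点并打标签(命令仅为示意):
#    kubectl taint nodes k8s-node03 dedicated=monitor:NoSchedule
#    kubectl label nodes k8s-node03 dedicated=monitor
# 2) 监控 Pod 模板:容忍污点(能上去)+ nodeSelector(只上这台)
spec:
  nodeSelector:
    dedicated: monitor        # 只调度到打了该标签的节点
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "monitor"
    effect: "NoSchedule"      # 容忍污点,允许被调度上去
  containers:
  - name: monitor-agent       # 容器名为示例
    image: some-monitor:v1    # 镜像仅为占位
```

容忍只解决"能不能上",不解决"只上这台";所以专属节点一般要污点 + nodeSelector(或节点亲和)配合使用。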

容忍有5种标准写法

1、精确匹配(最常用)

tolerations:
- key: "app"
  operator: "Equal"
  value: "test"
  effect: "NoSchedule"

2、匹配同key所有Effect

不写 effect,匹配该 key 下所有污点效果

tolerations:
- key: "app"
  operator: "Equal"
  value: "test"

3、Exists只匹配key(任意value)

只要节点有这个 key 就容忍,不用写 value

tolerations:
- key: "app"
  operator: "Exists"

4、万能容忍(匹配所有污点)

空 key+Exists,无视所有节点污点
tolerations:
- operator: "Exists"

5、NoExecute延迟驱逐

设置 tolerationSeconds 延迟驱逐时间

tolerations:
- key: "node"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
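
补充一点:K8s 的 DefaultTolerationSeconds 准入控制器默认会给每个 Pod 自动注入下面两条容忍(各 300 秒),所以节点失联后,上面的 Pod 通常会等约 5 分钟才被驱逐:

```yaml
# 默认自动注入的容忍(kubectl get pod -o yaml 可以看到)
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300     # 节点 NotReady 后最多再停留 300 秒
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300     # 节点失联后最多再停留 300 秒
```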
NoSchedule

绝不调度新 Pod 过来,已在上面的不驱逐

# 给k8s-node01打NoSchedule污点
[root@k8s-master01 ~]# kubectl taint node k8s-node01 node=lock:NoSchedule

# 查看节点污点
[root@k8s-master01 ~]# kubectl describe node k8s-node01 | grep Taints
Taints:             node=lock:NoSchedule

# 查看node01上之前运行的pod
[root@k8s-master01 Taint]# kubectl get pod -n kube-system -o wide
NAME                                       READY   STATUS    RESTARTS        AGE   IP               NODE           NOMINATED NODE   READINESS GATES
calico-kube-controllers-658896864b-4w6hn   1/1     Running   12 (130m ago)   8d    192.168.1.11     k8s-master02   <none>           <none>
calico-node-8krq6                          1/1     Running   14 (130m ago)   11d   192.168.1.13     k8s-node01     <none>           <none>
calico-node-fs4zn                          1/1     Running   14 (130m ago)   11d   192.168.1.10     k8s-master01   <none>           <none>
calico-node-g6pr2                          1/1     Running   14 (130m ago)   11d   192.168.1.11     k8s-master02   <none>           <none>
calico-node-l9m8k                          1/1     Running   14 (130m ago)   11d   192.168.1.15     k8s-node03     <none>           <none>
calico-node-nj6jh                          1/1     Running   14 (130m ago)   11d   192.168.1.14     k8s-node02     <none>           <none>
calico-node-ps59q                          1/1     Running   14 (130m ago)   11d   192.168.1.12     k8s-master03   <none>           <none>
coredns-6659878fb9-5tvrj                   1/1     Running   14 (130m ago)   11d   10.244.122.149   k8s-master02   <none>           <none>
metrics-server-6f49b5d9d9-xd7wm            1/1     Running   25 (130m ago)   11d   10.244.85.242    k8s-node01     <none>           <none>

可以看到之前已在 node01 上运行的 Pod 没有被驱逐

# 创建deployment
[root@k8s-master01 Taint]# vim taint-noschedule-no-tol.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: taint-no-tol
  namespace: tcloud
spec:
  replicas: 10
  selector:
    matchLabels:
      app: taint-no-tol
  template:
    metadata:
      labels:
        app: taint-no-tol
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent

# apply查看pod运行的节点
[root@k8s-master01 Taint]# kubectl apply -f taint-noschedule-no-tol.yaml
[root@k8s-master01 Taint]# kubectl get pod -n tcloud -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
taint-no-tol-9575648d8-7xm6p   1/1     Running   0          6s    10.244.135.137   k8s-node03     <none>           <none>
taint-no-tol-9575648d8-9l97x   1/1     Running   0          6s    10.244.32.168    k8s-master01   <none>           <none>
taint-no-tol-9575648d8-cmfs9   1/1     Running   0          6s    10.244.122.153   k8s-master02   <none>           <none>
taint-no-tol-9575648d8-hn5wh   1/1     Running   0          6s    10.244.58.227    k8s-node02     <none>           <none>
taint-no-tol-9575648d8-ldzw9   1/1     Running   0          6s    10.244.195.54    k8s-master03   <none>           <none>
taint-no-tol-9575648d8-lrtrh   1/1     Running   0          6s    10.244.32.169    k8s-master01   <none>           <none>
taint-no-tol-9575648d8-n2dtq   1/1     Running   0          6s    10.244.122.152   k8s-master02   <none>           <none>
taint-no-tol-9575648d8-p559w   1/1     Running   0          6s    10.244.58.228    k8s-node02     <none>           <none>
taint-no-tol-9575648d8-sbkth   1/1     Running   0          6s    10.244.135.136   k8s-node03     <none>           <none>
taint-no-tol-9575648d8-w8v24   1/1     Running   0          6s    10.244.195.53    k8s-master03   <none>           <none>
发现node01上没有pod运行

# 删除deployment
[root@k8s-master01 Taint]# kubectl delete deployments.apps -n tcloud taint-no-tol 

# 修改deployment加上容忍
[root@k8s-master01 Taint]# vim taint-noschedule-no-tol.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: taint-has-tol
  namespace: tcloud
spec:
  replicas: 4
  selector:
    matchLabels:
      app: taint-has-tol
  template:
    metadata:
      labels:
        app: taint-has-tol
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent
      # 容忍匹配 node=lock:NoSchedule 污点
      tolerations:
      - key: "node"
        operator: "Equal"
        value: "lock"
        effect: "NoSchedule"

# apply查看运行节点
[root@k8s-master01 Taint]# kubectl apply -f taint-noschedule-no-tol.yaml
[root@k8s-master01 Taint]# kubectl get pod -n tcloud -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
taint-has-tol-db79db4c-mv5hz   1/1     Running   0          7s    10.244.135.139   k8s-node03     <none>           <none>
taint-has-tol-db79db4c-pwzrk   1/1     Running   0          7s    10.244.58.230    k8s-node02     <none>           <none>
taint-has-tol-db79db4c-r45qq   1/1     Running   0          7s    10.244.85.246    k8s-node01     <none>           <none>
taint-has-tol-db79db4c-sfdj9   1/1     Running   0          7s    10.244.195.56    k8s-master03   <none>           <none>

可以发现node01上有pod运行,说明它容忍了这个污点

# 清除污点
[root@k8s-master01 Taint]# kubectl taint nodes k8s-node01 node=lock:NoSchedule-
[root@k8s-master01 Taint]# kubectl delete -f taint-noschedule-no-tol.yaml 
PreferNoSchedule

尽量不把 Pod 调度到该污点节点

能迁就就迁就,没其他节点时也能调度上去

不会把 Pod 卡成 Pending,只是调度偏好

整体环境规划

3 台 master:打 NoSchedule 污点,禁止业务 Pod 调度进来(生产标准做法)

k8s-node01:打 PreferNoSchedule 软污点

k8s-node02、k8s-node03:无污点,正常业务节点

用 Pod 资源请求压满 node02/node03,观察多余 Pod只能被迫跑到 node01,不会跑到 master

# 给所有master节点打NoSchedule污点
[root@k8s-master01 Taint]# kubectl taint nodes k8s-master01 node-role.kubernetes.io/control-plane:NoSchedule
node/k8s-master01 tainted
[root@k8s-master01 Taint]# kubectl taint nodes k8s-master02 node-role.kubernetes.io/control-plane:NoSchedule
node/k8s-master02 tainted
[root@k8s-master01 Taint]# kubectl taint nodes k8s-master03 node-role.kubernetes.io/control-plane:NoSchedule

3 台 master:有 NoSchedule 污点 → 普通业务 Pod 进不来
3 台 node:纯业务调度节点

# 给node01打PreferNoSchedule 软污点
[root@k8s-master01 Taint]# kubectl taint nodes k8s-node01 app=soft:PreferNoSchedule

# 编写压力测试yaml
[root@k8s-master01 Taint]# vim pressure.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefer-pressure
  namespace: tcloud
spec:
  replicas: 3
  selector:
    matchLabels:
      app: prefer-pressure
  template:
    metadata:
      labels:
        app: prefer-pressure
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent
        # 资源请求:1核1500Mi,单node最多跑1个
        resources:
          requests:
            cpu: "1000m"
            memory: "1500Mi"

# apply
[root@k8s-master01 Taint]# kubectl apply -f pressure.yaml

# 查看pod运行节点
[root@k8s-master01 Taint]# kubectl get pod -n tcloud -o wide
NAME                               READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
prefer-pressure-86dd98c4b5-dnqpf   1/1     Running   0          2m23s   10.244.85.247    k8s-node01   <none>           <none>
prefer-pressure-86dd98c4b5-qxh9h   1/1     Running   0          2m23s   10.244.58.231    k8s-node02   <none>           <none>
prefer-pressure-86dd98c4b5-tzwn9   1/1     Running   0          2m23s   10.244.135.140   k8s-node03   <none>           <none>
可以看见node02和node03的资源被占满,多余的副本只能跑到打了软污点的node01上

# 清除污点
[root@k8s-master01 Taint]# kubectl taint nodes k8s-node01 app=soft:PreferNoSchedule-
[root@k8s-master01 Taint]# kubectl taint nodes k8s-master01 node-role.kubernetes.io/control-plane:NoSchedule-
[root@k8s-master01 Taint]# kubectl taint nodes k8s-master02 node-role.kubernetes.io/control-plane:NoSchedule-
[root@k8s-master01 Taint]# kubectl taint nodes k8s-master03 node-role.kubernetes.io/control-plane:NoSchedule-

# 删除deployment
[root@k8s-master01 Taint]# kubectl delete -f pressure.yaml 
NoExecute

禁止新 Pod 调度到该节点(和 NoSchedule 一样)

立刻驱逐已经在节点上运行的旧 Pod

没有对应容忍的 Pod,直接被踢走

# 查看集群的所有pod运行节点
[root@k8s-master01 Taint]# kubectl get pod -A -o wide
NAMESPACE       NAME                                       READY   STATUS    RESTARTS        AGE     IP               NODE           NOMINATED NODE   READINESS GATES
ingress-nginx   ingress-nginx-controller-5zbcx             1/1     Running   8 (154m ago)    5d17h   192.168.1.10     k8s-master01   <none>           <none>
kube-system     calico-kube-controllers-658896864b-4w6hn   1/1     Running   12 (154m ago)   8d      192.168.1.11     k8s-master02   <none>           <none>
kube-system     calico-node-8krq6                          1/1     Running   14 (154m ago)   11d     192.168.1.13     k8s-node01     <none>           <none>
kube-system     calico-node-fs4zn                          1/1     Running   14 (154m ago)   11d     192.168.1.10     k8s-master01   <none>           <none>
kube-system     calico-node-g6pr2                          1/1     Running   14 (154m ago)   11d     192.168.1.11     k8s-master02   <none>           <none>
kube-system     calico-node-l9m8k                          1/1     Running   14 (154m ago)   11d     192.168.1.15     k8s-node03     <none>           <none>
kube-system     calico-node-nj6jh                          1/1     Running   14 (154m ago)   11d     192.168.1.14     k8s-node02     <none>           <none>
kube-system     calico-node-ps59q                          1/1     Running   14 (154m ago)   11d     192.168.1.12     k8s-master03   <none>           <none>
kube-system     coredns-6659878fb9-5tvrj                   1/1     Running   14 (154m ago)   11d     10.244.122.149   k8s-master02   <none>           <none>
kube-system     metrics-server-6f49b5d9d9-xd7wm            1/1     Running   25 (154m ago)   11d     10.244.85.242    k8s-node01     <none>           <none>


node01上运行的pod有:
calico-node-8krq6
metrics-server-6f49b5d9d9-xd7wm

# 给node01上打NoExecute污点
[root@k8s-master01 Taint]# kubectl taint nodes k8s-node01 test=evict:NoExecute

# 查看pod的运行节点变化
[root@k8s-master01 Taint]# kubectl get pod -A -o wide 
NAMESPACE       NAME                                       READY   STATUS    RESTARTS         AGE   IP               NODE           NOMINATED NODE   READINESS GATES
ingress-nginx   ingress-nginx-controller-l59g4             1/1     Running   0                36m   192.168.1.10     k8s-master01   <none>           <none>
kube-system     calico-kube-controllers-658896864b-4w6hn   1/1     Running   12 (3h18m ago)   9d    192.168.1.11     k8s-master02   <none>           <none>
kube-system     calico-node-8krq6                          1/1     Running   14 (3h18m ago)   11d   192.168.1.13     k8s-node01     <none>           <none>
kube-system     calico-node-fs4zn                          1/1     Running   14 (3h18m ago)   11d   192.168.1.10     k8s-master01   <none>           <none>
kube-system     calico-node-g6pr2                          1/1     Running   14 (3h18m ago)   11d   192.168.1.11     k8s-master02   <none>           <none>
kube-system     calico-node-l9m8k                          1/1     Running   14 (3h18m ago)   11d   192.168.1.15     k8s-node03     <none>           <none>
kube-system     calico-node-nj6jh                          1/1     Running   14 (3h18m ago)   11d   192.168.1.14     k8s-node02     <none>           <none>
kube-system     calico-node-ps59q                          1/1     Running   14 (3h18m ago)   11d   192.168.1.12     k8s-master03   <none>           <none>
kube-system     coredns-6659878fb9-5tvrj                   1/1     Running   14 (3h18m ago)   11d   10.244.122.149   k8s-master02   <none>           <none>
kube-system     metrics-server-6f49b5d9d9-45qx9            1/1     Running   0                69s   10.244.58.232    k8s-node02     <none>           <none>


发现只有 metrics-server 被驱逐并重建到 node02 上;calico 没有被驱逐,因为 calico(DaemonSet)带的是针对 NoExecute 的万能容忍:
kubectl describe daemonset -n kube-system calico-node  | grep NoExecute
:NoExecute op=Exists

# 编写无容忍的pod强制调度到node01
[root@k8s-master01 Taint]# vim deploy-evict.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: taint-no-tol
  namespace: tcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: taint-no-tol
  template:
    metadata:
      labels:
        app: taint-no-tol
    spec:
      # 强制运行在 node01
      nodeName: k8s-node01
      
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent

# apply 查看pod运行情况
[root@k8s-master01 Taint]# kubectl get pod -n tcloud -o wide
NAME                            READY   STATUS              RESTARTS   AGE   IP       NODE         NOMINATED NODE   READINESS GATES
taint-no-tol-7859cdbf4d-29mhf   0/1     Terminating         0          17s   <none>   k8s-node01   <none>           <none>
taint-no-tol-7859cdbf4d-2q4z4   0/1     Terminating         0          2s    <none>   k8s-node01   <none>           <none>
taint-no-tol-7859cdbf4d-45hkj   0/1     Terminating         0          16s   <none>   k8s-node01   <none>           <none>
taint-no-tol-7859cdbf4d-4kfgd   0/1     Terminating         0          17s   <none>   k8s-node01   <none>           <none>
taint-no-tol-7859cdbf4d-4w5ld   0/1     Terminating         0          5s    <none>   k8s-node01   <none>           <none>
taint-no-tol-7859cdbf4d-55mrc   0/1     Terminating         0          12s   <none>   k8s-node01   <none>           <none>
taint-no-tol-7859cdbf4d-5gfm2   0/1     Terminating         0          1s    <none>   k8s-node01   <none>           <none>
taint-no-tol-7859cdbf4d-5jt6x   0/1     Pending             0          0s    <none>   k8s-node01   <none>           
...
可以看到 Pod 一直在删除、重建:nodeName 绕过了调度器,Pod 被直接放到 node01,随后又被 NoExecute 污点驱逐,Deployment 再补副本,如此循环
# 删除deployment
[root@k8s-master01 Taint]# kubectl delete -f deploy-evict.yaml 

# 编辑yaml文件,添加 NoExecute 容忍
[root@k8s-master01 Taint]# vim deploy-evict.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: taint-no-tol
  namespace: tcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: taint-no-tol
  template:
    metadata:
      labels:
        app: taint-no-tol
    spec:
      # 强制运行在 node01
      nodeName: k8s-node01
      
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        imagePullPolicy: IfNotPresent
      # 容忍 NoExecute 污点(必须和污点一致)
      tolerations:
      - key: "test"
        operator: "Equal"
        value: "evict"
        effect: "NoExecute"

# apply 查看pod运行情况
[root@k8s-master01 Taint]# kubectl get pod -n tcloud -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
taint-no-tol-56ffb7b844-6v8xz   1/1     Running   0          34s   10.244.85.248   k8s-node01   <none>           <none>
加了容忍就可以运行在node01上了

# 清除环境
[root@k8s-master01 Taint]# kubectl delete -f deploy-evict.yaml 
# 清除污点
[root@k8s-master01 Taint]# kubectl taint nodes k8s-node01 test=evict:NoExecute-

二、资源管控

1、ResourceQuota

ResourceQuota 是生产集群的「资源保险丝」,防止业务 "裸奔",避免一个应用拖垮整个集群。

它是 Kubernetes 的内置准入控制资源,作用于整个 Namespace,用来限制该命名空间下所有资源的总使用量

它可以限制:

  • 计算资源:requests.cpu / requests.memory / limits.cpu / limits.memory
  • 存储资源:PVC 数量、总存储容量
  • 资源对象数量:Pod、Deployment、Service、ConfigMap 等的最大数量

1)为什么生产一定要使用ResourceQuota

1、防止【资源泄漏】导致节点被打满

如果没有 ResourceQuota:

  1. 某个应用出现 Bug,不断创建新 Pod 或占用大量内存 / CPU
  2. 它会耗尽节点资源,导致同节点的其他业务被挤掉、节点宕机
  3. 严重时会引发集群雪崩

有了 ResourceQuota:

  • 命名空间的总资源被锁死,应用再怎么泄漏,也不会超出配额
  • 其他业务、节点、集群整体稳定性不受影响

2、强制用户声明资源,避免"裸奔"Pod

很多用户部署应用时,会忘记写 requests/limits:

  • 没 requests:调度器无法知道该 Pod 实际需要多少资源,可能调度到资源不足的节点,导致 Pod 运行缓慢甚至崩溃
  • 没 limits:Pod 可以无限制占用节点资源,直接拖垮宿主机

配合 LimitRange + ResourceQuota:

  • LimitRange 给没写资源的 Pod 注入默认值
  • ResourceQuota 限制总使用量,避免用户 "裸奔"

3、多租户环境的资源隔离与公平分配

在多团队共用的集群里,每个团队一个 Namespace:

  • 没有 ResourceQuota:资源抢用、互相影响,谁的应用能占资源谁就赢
  • 有 ResourceQuota:每个团队的资源配额是固定的,比如 dev 命名空间 4 核 8G,prod 命名空间 16 核 32G;避免开发环境业务占用生产环境资源,保障核心业务的资源优先级

4、控制成本,避免资源浪费

  • 云环境下,CPU / 内存 / 存储都是按用量计费的
  • 没有 ResourceQuota,用户可能随意创建大量高规格 Pod,导致账单飙升
  • 有了 ResourceQuota,配额上限 = 成本上限,预算可控

5、防止恶意攻击或误操作

  • 误操作:比如 kubectl scale deployment app --replicas=1000,瞬间创建上千个 Pod
  • 恶意行为:攻击者利用集群漏洞,创建大量资源耗尽集群

2)资源配额

ResourceQuota 三大核心层级:metadata / spec / status
1. metadata

包含 ResourceQuota 的名称、所属命名空间、标签、注解等标准元数据。特点:资源配额仅对当前所在 namespace 生效。
2. spec

只包含核心配置项 hard(硬限制),超出直接拒绝创建资源。常用限制项:

requests.cpu:命名空间内所有 Pod CPU 请求总量上限

limits.cpu:命名空间内所有 Pod CPU 限制总量上限

requests.memory:命名空间内所有 Pod 内存请求总量上限

limits.memory:命名空间内所有 Pod 内存限制总量上限

persistentvolumeclaims:允许创建 PVC 的总数量

pods:允许运行 Pod 最大总数

还可限制:configmaps、secrets、services 等对象个数
3. status

系统自动维护,无需手动配置:

展示各类资源当前已使用量

展示各类资源配额硬上限

直观看到 已用 / 总量 占用情况
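
status 部分大致长这样(字段结构真实,数值仅为示意),kubectl get quota 配额名 -o yaml 即可查看:

```yaml
# ResourceQuota 的 status 由系统自动维护(数值仅为示例)
status:
  hard:                # 配额硬上限
    pods: "5"
    requests.cpu: "4"
  used:                # 当前已使用量
    pods: "3"
    requests.cpu: "300m"
```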

# 创建ResourceQuota
[root@k8s-master01 ResourceQuota]# vim ResourceQuota.yaml
apiVersion: v1
kind: ResourceQuota    # 资源类型:资源配额
metadata:
  name: tcloud-quota   # 配额名称
  namespace: tcloud    # 只对 tcloud 这个命名空间生效
spec:
  hard:                # hard = 硬限制,绝对不能超过
    # ========== 1. 计算资源限制(最重要) ==========
    requests.cpu: "4"        # 所有Pod的CPU请求总和 ≤ 4核
    requests.memory: "8Gi"   # 所有Pod的内存请求总和 ≤ 8G
    limits.cpu: "8"          # 所有Pod的CPU限制总和 ≤ 8核
    limits.memory: "16Gi"    # 所有Pod的内存限制总和 ≤ 16G

    # ========== 2. Pod数量限制 ==========
    pods: "5"               # 这个命名空间最多跑 5 个Pod

    # ========== 3. 存储资源限制 ==========
    persistentvolumeclaims: 5  # 最多创建5个PVC
    requests.storage: "50Gi"   # 所有存储请求总和 ≤ 50G

    # ========== 4. K8s对象数量限制(可选) ==========
    services: "2"            # 最多2个Service
    configmaps: "10"         # 最多10个ConfigMap
    secrets: "10"           # 最多10个Secret

# apply
[root@k8s-master01 ResourceQuota]# kubectl apply -f ResourceQuota.yaml 

# 查看配额是否生效
[root@k8s-master01 ResourceQuota]# kubectl get quota -n tcloud
NAME           REQUEST                                                                                                                                                       LIMIT                                    AGE
tcloud-quota   configmaps: 1/10, persistentvolumeclaims: 0/5, pods: 0/5, requests.cpu: 0/4, requests.memory: 0/8Gi, requests.storage: 0/50Gi, secrets: 1/10, services: 0/2   limits.cpu: 0/8, limits.memory: 0/16Gi   48m

# 创建deployment
[root@k8s-master01 ResourceQuota]# vim deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-pod-quota
  namespace: tcloud
spec:
  replicas: 6
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0

        # 必须加这个!否则被 ResourceQuota 拦截
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

# apply查看pod数量
[root@k8s-master01 ResourceQuota]# kubectl get pod -n tcloud 
NAME                              READY   STATUS    RESTARTS   AGE
test-pod-quota-7dbc8f55b4-45w77   1/1     Running   0          <invalid>
test-pod-quota-7dbc8f55b4-82h8c   1/1     Running   0          <invalid>
test-pod-quota-7dbc8f55b4-9qkzp   1/1     Running   0          <invalid>
test-pod-quota-7dbc8f55b4-cfkht   1/1     Running   0          <invalid>
test-pod-quota-7dbc8f55b4-fdxts   1/1     Running   0          <invalid>
只能创建 5 个,第 6 个被配额拦截

# 查看配额使用情况
[root@k8s-master01 ResourceQuota]# kubectl describe quota tcloud-quota -n tcloud
Name:                   tcloud-quota
Namespace:              tcloud
Resource                Used    Hard
--------                ----    ----
configmaps              1       10
limits.cpu              2500m   8
limits.memory           2560Mi  16Gi
persistentvolumeclaims  0       5
pods                    5       5
requests.cpu            500m    4
requests.memory         640Mi   8Gi
requests.storage        0       50Gi
secrets                 1       10
services                0       2

# 删除资源和ResourceQuato
[root@k8s-master01 ResourceQuota]# kubectl delete -f deployment.yaml 
[root@k8s-master01 ResourceQuota]# kubectl delete -f ResourceQuota.yaml 

2、LimitRange

1)什么是LimitRange

只有 ResourceQuota 是不够的。

LimitRange 是 Kubernetes 中的一种资源对象,用于限制命名空间内单个容器或 Pod 的资源使用量。它通过定义资源的最小值、最大值和默认值,确保资源分配的合理性和公平性。

  • 资源限制:限制单个容器或 Pod 的 CPU、内存等资源的使用量。
  • 默认值设置:为未指定资源请求或限制的容器提供默认值。
  • 避免资源浪费:防止过度分配资源,提高集群资源利用率。

ResourceQuota(总量限制) + LimitRange(自动补资源)

没有 LimitRange → 一旦配额限制了计算资源,没写 requests/limits 的 Pod 会被直接拒绝,有配额也创建不了 Pod

有了 LimitRange → 自动补上默认的 requests/limits,Pod 正常创建

核心字段

| 层级 | 字段 | 作用说明 |
| --- | --- | --- |
| metadata | name | LimitRange 资源名称 |
| metadata | namespace | 作用于指定的命名空间,隔离生效 |
| spec.limits[ ] | type | 限制类型,常用 Container,代表针对容器生效 |
| spec.limits[ ] | default | 容器未定义 limits 时,自动补充 CPU / 内存最大限制 |
| spec.limits[ ] | defaultRequest | 容器未定义 requests 时,自动补充 CPU / 内存申请值 |
| spec.limits[ ] | max | 单个容器允许的 CPU / 内存资源上限,超了直接拒绝创建 |
| spec.limits[ ] | min | 单个容器允许的 CPU / 内存资源下限,低于下限拒绝创建 |

2)限制类型

容器、整 Pod、存储 PVC都是可以限制的

| 限制类型 type | 限制对象 | 管控内容 | 作用场景 |
| --- | --- | --- | --- |
| Container | 单个容器 | 单个容器 CPU / 内存的默认值、最大值、最小值 | 规范每个容器资源,防止单容器资源超标 |
| Pod | 整个 Pod(所有容器总和) | Pod 内所有容器合计 CPU / 内存的最大、最小值 | 限制一个 Pod 整体资源占用,避免 Pod 过大拖垮节点 |
| PersistentVolumeClaim | PVC 持久卷声明 | 单个 PVC 存储容量的默认值、最大值、最小值 | 管控单个 PVC 的存储空间,防止乱申请超大存储 |
apiVersion: v1
kind: LimitRange
metadata:
  name: tcloud-lr      # LimitRange 规则名称
  namespace: tcloud    # 仅对 tcloud 命名空间生效
spec:
  limits:
  # 1、Container:限制【单个容器】资源规范
  - type: Container
    default:                  # 容器未配置limits时,自动填充默认资源上限
      cpu: 500m
      memory: 512Mi
    defaultRequest:          # 容器未配置requests时,自动填充默认资源申请值
      cpu: 100m
      memory: 128Mi
    max:                      # 单个容器允许的CPU/内存最大上限,超出拒绝创建
      cpu: 2
      memory: 2Gi
    min:                      # 单个容器允许的CPU/内存最小下限,低于拒绝创建
      cpu: 50m
      memory: 64Mi

  # 2、Pod:限制【整个Pod所有容器总和】资源
  - type: Pod
    max:                      # 单个Pod所有容器合计资源最大上限
      cpu: 4
      memory: 4Gi
    min:                      # 单个Pod所有容器合计资源最小下限
      cpu: 100m
      memory: 256Mi

  # 3、PersistentVolumeClaim:限制【PVC存储】容量
  - type: PersistentVolumeClaim
    max:                      # 单个PVC允许申请的存储最大容量
      storage: 100Gi
    min:                      # 单个PVC允许申请的存储最小容量
      storage: 1Gi
    default:                  # PVC未指定存储大小时,自动填充默认容量
      storage: 10Gi

3)配置默认资源限制

# 创建LimitRange
[root@k8s-master01 limitrange]# vim limitrange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: tcloud-lr      # 规则名字叫 tcloud-lr
  namespace: tcloud    # 只管 tcloud 这个命名空间
spec:
  limits:
  - type: Container    # 【容器】

  # --------------- 规矩 1:自动补默认值 ----------------
    default:           # 如果容器没写 limits
      cpu: 500m        # 自动给 0.5核上限
      memory: 512Mi    # 自动给 512M上限

    defaultRequest:    # 如果容器没写 requests
      cpu: 100m        # 自动给 0.1核申请
      memory: 128Mi    # 自动给 128M申请

  # --------------- 规矩 2:最大能配多少 ----------------
    max:               # 单个容器【最大】不能超过
      cpu: 2           # 2核
      memory: 2Gi      # 2G内存

  # --------------- 规矩 3:最小能配多少 ----------------
    min:               # 单个容器【最小】不能低于
      cpu: 50m         # 0.05核
      memory: 64Mi     # 64M

# apply
[root@k8s-master01 limitrange]# kubectl apply -f limitrange.yaml

# 查看LimitRange
[root@k8s-master01 limitrange]# kubectl get limitrange -n tcloud
NAME        CREATED AT
tcloud-lr   2026-05-01T08:03:04Z
[root@k8s-master01 limitrange]# kubectl describe limitrange tcloud-lr -n tcloud
Name:       tcloud-lr
Namespace:  tcloud
Type        Resource  Min   Max  Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---   ---  ---------------  -------------  -----------------------
Container   cpu       50m   2    100m             500m           -
Container   memory    64Mi  2Gi  128Mi            512Mi          -

# 创建不写资源配置的deployment
[root@k8s-master01 limitrange]# vim deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-lr-deploy
  namespace: tcloud
spec:
  replicas: 3
  selector:
    matchLabels:
      app: vue
  template:
    metadata:
      labels:
        app: vue
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0

# apply 后查看 pod
[root@k8s-master01 limitrange]# kubectl apply -f deployment.yaml
[root@k8s-master01 limitrange]# kubectl get pod -n tcloud 
NAME                              READY   STATUS    RESTARTS   AGE
test-lr-deploy-547c68ccfd-brpf7   1/1     Running   0          43s
test-lr-deploy-547c68ccfd-gf4z4   1/1     Running   0          43s
test-lr-deploy-547c68ccfd-j9wfl   1/1     Running   0          43s


# 查看是否注入了Resources
[root@k8s-master01 limitrange]# kubectl get pod -n tcloud test-lr-deploy-547c68ccfd-brpf7 -o yaml | grep -A 10 resources
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
--
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 128Mi
    restartCount: 0
    started: true
    state:
      running:
Pod 自动拥有了 requests 和 limits!

# 删除deployment
[root@k8s-master01 limitrange]# kubectl delete -f deployment.yaml 

# 创建超出资源的deployment
[root@k8s-master01 limitrange]# vim deploy-over-limit.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-over-limit
  namespace: tcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: over
  template:
    metadata:
      labels:
        app: over
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
        resources:
          limits:
            cpu: 3        # 超过 max: 2
            memory: 3Gi   # 超过 max: 2Gi

# apply 查看pod
[root@k8s-master01 limitrange]# kubectl apply -f deploy-over-limit.yaml 
[root@k8s-master01 limitrange]# kubectl get pod -n tcloud 
No resources found in tcloud namespace.
发现没有pod,直接被准入控制拒绝,创建失败!

# 查看拒绝日志
[root@k8s-master01 limitrange]# kubectl get events -n tcloud --sort-by=.metadata.creationTimestamp
...
3m21s       Normal    ScalingReplicaSet        deployment/deploy-over-limit              Scaled up replica set deploy-over-limit-6b499c4df7 from 0 to 1
3m21s       Warning   FailedCreate             replicaset/deploy-over-limit-6b499c4df7   Error creating: pods "deploy-over-limit-6b499c4df7-tjbv6" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]
3m21s       Warning   FailedCreate             replicaset/deploy-over-limit-6b499c4df7   Error creating: pods "deploy-over-limit-6b499c4df7-bftgs" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]
3m21s       Warning   FailedCreate             replicaset/deploy-over-limit-6b499c4df7   Error creating: pods "deploy-over-limit-6b499c4df7-9fkrv" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]
3m21s       Warning   FailedCreate             replicaset/deploy-over-limit-6b499c4df7   Error creating: pods "deploy-over-limit-6b499c4df7-svbld" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]
3m21s       Warning   FailedCreate             replicaset/deploy-over-limit-6b499c4df7   Error creating: pods "deploy-over-limit-6b499c4df7-x7htx" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]
3m20s       Warning   FailedCreate             replicaset/deploy-over-limit-6b499c4df7   Error creating: pods "deploy-over-limit-6b499c4df7-59kh5" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]
3m20s       Warning   FailedCreate             replicaset/deploy-over-limit-6b499c4df7   Error creating: pods "deploy-over-limit-6b499c4df7-v95xk" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]
3m19s       Warning   FailedCreate             replicaset/deploy-over-limit-6b499c4df7   Error creating: pods "deploy-over-limit-6b499c4df7-dnjf7" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]
37s         Warning   FailedCreate             replicaset/deploy-over-limit-6b499c4df7   (combined from similar events): Error creating: pods "deploy-over-limit-6b499c4df7-fb6vl" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]

说明:
创建失败:Pod 被禁止创建
原因:
1. 单个容器最大允许 CPU 是 2核,却申请了 3核
2. 单个容器最大允许内存是 2Gi,却申请了 3Gi


# 也可以看RS的详细信息,就是创建的deployment管理的RS信息
[root@k8s-master01 limitrange]# kubectl describe replicaset -n tcloud deploy-over-limit-6b499c4df7
......
  Warning  FailedCreate  <invalid>                      replicaset-controller  Error creating: pods "deploy-over-limit-6b499c4df7-dnjf7" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]
  Warning  FailedCreate  <invalid> (x8 over <invalid>)  replicaset-controller  (combined from similar events): Error creating: pods "deploy-over-limit-6b499c4df7-brmmv" is forbidden: [maximum cpu usage per Container is 2, but limit is 3, maximum memory usage per Container is 2Gi, but limit is 3Gi]


# 清除资源
[root@k8s-master01 limitrange]# kubectl delete -f limitrange.yaml 
[root@k8s-master01 limitrange]# kubectl delete -f deploy-over-limit.yaml 

3、Quality of Service(服务质量)

1)什么是QoS

K8s 自动根据 Pod 里容器的 requests /limits 配置,把 Pod 划分成三种服务质量等级。

作用:

  1. 节点资源紧张时,决定先驱逐谁、后驱逐谁
  2. 决定 OOM 谁先被杀死
  3. 保障核心业务优先占用资源,不被普通业务抢占
  4. 生产用来做可用性保障、服务分级

QoS一共分为3种:

  • Guaranteed 保证型(最高优先级)
  • Burstable 突发型(中等优先级)
  • BestEffort 尽力型(最低优先级)

驱逐顺序(资源不够时)
BestEffort → Burstable → Guaranteed

先杀最低的,保留核心服务

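三个等级的判定逻辑可以用一小段脚本直观表达(仅为按规则整理的示意,真实判定由 kubelet 根据 Pod 全部容器的配置完成):

```shell
# 根据单个容器的 requests/limits 粗略判定 QoS 等级(示意逻辑,非 kubelet 源码)
qos_class() {
  local req_cpu=$1 req_mem=$2 lim_cpu=$3 lim_mem=$4
  if [ -z "$req_cpu" ] && [ -z "$req_mem" ] && [ -z "$lim_cpu" ] && [ -z "$lim_mem" ]; then
    echo "BestEffort"          # 什么资源配置都没写
  elif [ -n "$req_cpu" ] && [ -n "$req_mem" ] \
       && [ "$req_cpu" = "$lim_cpu" ] && [ "$req_mem" = "$lim_mem" ]; then
    echo "Guaranteed"          # requests 与 limits 完全相等
  else
    echo "Burstable"           # 其余情况:有保底、可突发
  fi
}

qos_class 500m 512Mi 500m 512Mi   # 输出 Guaranteed
qos_class 100m 128Mi 500m 512Mi   # 输出 Burstable
qos_class "" "" "" ""             # 输出 BestEffort
```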

2)QoS详细规则

Guaranteed 保证型

满足条件

  • Pod 里所有容器都必须同时配置 requests 和 limits
  • CPU、内存 每项:requests = limits

特点:

  • 资源完全独占、不被抢占
  • 节点资源不足最后被驱逐
  • 稳定性最高

生产适用

数据库、订单、支付、核心中间件、不能挂的核心服务。

[root@k8s-master01 qos]# vim qos-guaranteed.yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed
  namespace: tcloud
spec:
  containers:
  - name: app
    image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 512Mi

# 创建并查看qos
[root@k8s-master01 qos]# kubectl apply -f qos-guaranteed.yaml



[root@k8s-master01 qos]# kubectl get pod qos-guaranteed -n tcloud -o yaml | grep qosClass
  qosClass: Guaranteed

Burstable 突发型

满足条件

  • 不满足 Guaranteed
  • Pod 内至少有一个容器配置了 requests
  • 常见配置:requests < limits

特点:

  • 有保底资源,允许临时突发用更高资源
  • 优先级中等
  • 资源紧张时,比 BestEffort 晚被清理

生产适用

绝大多数普通业务、微服务、后端应用。

[root@k8s-master01 qos]# vim qos-burstable.yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable
  namespace: tcloud
spec:
  containers:
  - name: app
    image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 512Mi

# 创建并查看
[root@k8s-master01 qos]# kubectl apply -f qos-burstable.yaml


[root@k8s-master01 qos]# kubectl get pod qos-burstable -n tcloud -o yaml | grep qosClass
  qosClass: Burstable

BestEffort 尽力型

满足条件

  • Pod 内所有容器都不写 requests、limits 任何资源配置

特点

  • 无任何资源保障,只吃节点剩余空闲资源
  • 资源紧张第一个被驱逐、最先被杀

生产适用

  • 仅测试、开发环境、临时 Pod、离线任务;生产业务禁止使用。

[root@k8s-master01 qos]# vim qos-besteffort.yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort
  namespace: tcloud
spec:
  containers:
  - name: app
    image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0

# 创建并查看
[root@k8s-master01 qos]# kubectl apply -f qos-besteffort.yaml

[root@k8s-master01 qos]# kubectl get pod qos-besteffort -n tcloud -o yaml | grep qosClass
  qosClass: BestEffort

三、安全认证

1、访问控制概述

Kubernetes作为一个分布式集群的管理工具,保证集群的安全性是其一个重要的任务。所谓的安全性其实就是保证对Kubernetes的各种客户端进行认证和鉴权操作。

客户端

在Kubernetes集群中,客户端通常有两类:

  • User Account:一般是独立于kubernetes之外的其他服务管理的用户账号。
  • Service Account:kubernetes管理的账号,用于为Pod中的服务进程在访问Kubernetes时提供身份标识。


认证、授权与准入控制

ApiServer是访问及管理资源对象的唯一入口。任何一个请求访问ApiServer,都要经过下面三个流程:

  • Authentication(认证):身份鉴别,只有正确的账号才能够通过认证
  • Authorization(授权): 判断用户是否有权限对访问的资源执行特定的动作
  • Admission Control(准入控制):用于补充授权机制以实现更加精细的访问控制功能。

2、认证管理

在 Kubernetes(简称 K8s)集群中,客户端访问者主要分为两大类——普通用户与服务用户,二者的访问场景、安全需求截然不同,对应的认证方式也各有侧重。Kubernetes 的客户端身份认证,从宏观通信模式上可分为 HTTP Base、HTTP Token、HTTPS 证书三大类,下文将结合「普通用户」「服务用户」的使用场景逐一解析各类认证方式,兼顾入门正统性与生产实用性。

核心区分:普通用户指运维人员、开发人员等手动操作集群的人员;服务用户指集群内Pod、kubelet、controller-manager等自动访问API Server的组件/应用,二者认证方式的选择,核心围绕「易用性」「安全性」「自动化」三大需求。

User Account(运维/开发人员)

普通用户的核心需求是「便捷操作+适度安全」,主要通过手动方式(如kubectl命令、API调用)访问集群,适配的认证方式以宏观三大类为主。

HTTP Base基础认证

采用用户名 + 密码方式,将 用户:密码 做 Base64 编码放在 HTTP 请求头中发给 APIServer,服务端解码校验身份。特点:配置简单,安全性弱,只适合测试环境,不用于生产。
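Base 认证的请求头可以手工构造,直观感受其"几乎等同于明文"的安全水平(admin / Test@123 仅为示例凭据):

```shell
# HTTP Base 认证:把 "用户:密码" 做 Base64 编码放入 Authorization 请求头
# 注意 Base64 只是编码不是加密,服务端(或任何中间人)解码即可还原明文
CRED=$(echo -n 'admin:Test@123' | base64)
echo "Authorization: Basic $CRED"   # 输出 Authorization: Basic YWRtaW46VGVzdEAxMjM=
echo -n "$CRED" | base64 -d          # 解码还原出 admin:Test@123
```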

HTTP Token认证

HTTP Token 认证通过一串唯一且难以伪造的 Token 字符串识别合法用户。每个 Token 对应绑定一个集群用户,客户端发起 API 请求时,在请求头中携带 Token 信息;API Server 收到请求后,将传入 Token 与服务端留存的令牌列表进行比对校验,以此确认普通用户身份合法性。相比 Base 认证安全性有所提升,配置轻量化,但令牌固定、灵活性差,仍只适用于测试和小型非生产场景。

HTTPS证书认证

HTTPS 证书认证基于 CA 根证书签名实现双向数字证书认证,是三种方式中安全性最高的一种。通信双方通过 CA 机构申请并颁发证书,客户端与 API Server 之间完成双向证书校验,确认彼此身份可信;协商加密密钥后全程密文通信,有效防止身份伪造与数据窃听。配置流程相对繁琐,但安全等级最高,是生产环境普通用户访问集群的首选认证方式。

我们在部署集群的时候,采用的就是HTTPS证书方式认证

# 查看 apiserver 启动参数
systemctl cat kube-apiserver
说明:
--basic-auth-file=xxx.csv   # 有这个参数 → 开启了 HTTP Base 认证
--token-auth-file=xxx.csv   # 有这个参数 → 开启了 HTTP Token 认证
--client-ca-file=/etc/kubernetes/pki/ca.pem   # 集群默认用 HTTPS 证书认证

Service Account(服务账号)

不同于普通用户,服务用户(如集群组件、业务 Pod)无人工操作场景,其核心需求是「自动化、无需人工干预、长期安全可靠」,无法依赖手动配置的账号密码或客户端证书,因此 Kubernetes 专门设计了 ServiceAccount(服务账号) 来解决这类程序身份认证问题。

ServiceAccount 是 Kubernetes 集群内置的资源对象,专门为 Pod 和后台组件提供身份标识,是服务用户访问 API Server 的标准方式。它的核心特点是:

  • 自动化身份管理:Kubernetes 会为每个 ServiceAccount 自动生成并签发 JWT 令牌,Pod 启动时默认将令牌挂载到容器内固定路径 /var/run/secrets/kubernetes.io/serviceaccount/,应用无需手动配置即可读取并使用该令牌发起 API 请求。
  • 与 RBAC 深度结合:可以通过 RBAC 规则,为不同 ServiceAccount 配置精细的资源访问权限,实现最小权限原则,避免程序越权访问集群资源。
  • 全生命周期托管:ServiceAccount 的创建、令牌的签发与轮换、资源的清理,均由集群控制器自动完成,无需人工干预,保障身份的安全性与一致性。
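
挂载到容器内的令牌是标准 JWT,由 header.payload.signature 三段 base64url 字符串用点号拼接而成,解码第二段(payload)即可看到身份信息。下面是一个自包含的解码示意,令牌为脚本自行构造的示例(header/signature 用占位符);真实环境把 TOKEN 换成容器内挂载令牌文件的内容即可:

```shell
# 构造一个只含 payload 的示例令牌(仅为演示,签名段是假设的占位符)
claims='{"sub":"system:serviceaccount:tcloud:test-sa"}'
payload=$(printf '%s' "$claims" | base64 -w 0 | tr '+/' '-_' | tr -d '=')
TOKEN="header.${payload}.signature"
# 真实环境:TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)

# 解码:取第二段,把 base64url 字符还原为标准 base64 并补齐 '=' 填充
p=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#p} % 4 )) -ne 0 ]; do p="${p}="; done
printf '%s' "$p" | base64 -d && echo
```

输出即 payload 中的身份声明 JSON,可以看到 `sub` 字段就是 `system:serviceaccount:<命名空间>:<服务账号名>` 格式的用户名。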

支撑 ServiceAccount(服务账号)全自动化 的核心角色,它们分工明确,共同完成「服务账号的创建、令牌的管理、Pod 挂载令牌」这一整套流程。

  • ServiceAccount Admission Controller
  • Token Controller
  • ServiceAccount Controller

ServiceAccount Admission Controller

它是 API Server 的内置准入控制器,在 Pod 创建请求被 API Server 处理前介入,完成两项关键工作:

  1. 自动绑定服务账号:如果 Pod 配置中未显式指定 spec.serviceAccountName,控制器会自动为 Pod 绑定当前命名空间下的 default 服务账号,确保每个 Pod 都有合法身份标识。

  2. 自动挂载令牌配置:为 Pod 注入一个特殊 Volume,将服务账号对应的 JWT 令牌 Secret 挂载到容器内固定路径 /var/run/secrets/kubernetes.io/serviceaccount/,让应用无需手动配置即可读取令牌,向 API Server 发起认证请求。
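
顺带一提,如果某个 Pod 根本不需要访问 API Server,可以显式关闭令牌自动挂载,减少凭证暴露面。`automountServiceAccountToken` 是 K8s 原生字段,可配置在 Pod 级或 ServiceAccount 级,二者取更严格的一方生效(下面的 Pod 名称为假设的示例):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-token-pod        # 示例名称
  namespace: tcloud
spec:
  automountServiceAccountToken: false   # 不挂载 /var/run/secrets/... 下的令牌
  containers:
  - name: app
    image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0
```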

# 查看kube-apiserver配置
[root@k8s-master01 ~]# cat /usr/lib/systemd/system/kube-apiserver.service | grep enable-admission-plugins
  --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota \
# 可以看到有ServiceAccount,说明控制器已经启用

# 创建deployment
[root@k8s-master01 serviceAccount]# vim deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: serviceaccount-pod
  namespace: tcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0

# apply
[root@k8s-master01 serviceAccount]# kubectl apply -f deployment.yaml

# 查看 Pod 的服务账号配置
[root@k8s-master01 serviceAccount]# kubectl get pod -n tcloud serviceaccount-pod-86b9487458-t7xnz -o yaml | grep serviceAccount
  serviceAccount: default
  serviceAccountName: default
  - serviceAccountToken:
# 控制器已自动为 Pod 绑定default服务账号

# 进入pod查看挂载令牌
[root@k8s-master01 serviceAccount]# kubectl exec -it serviceaccount-pod-86b9487458-t7xnz -n tcloud -- ls /var/run/secrets/kubernetes.io/serviceaccount/
ca.crt  namespace  token
# 控制器已将服务账号的令牌、CA 证书等文件自动挂载到容器内,应用可直接读取使用

ServiceAccount Admission Controller 是服务账号自动化的第一道环节,它确保了所有 Pod 默认具备合法身份并自动获取认证凭证,无需人工干预配置。该控制器配合 Token Controller、ServiceAccount Controller,共同实现了服务账号从创建、令牌签发到 Pod 挂载的全流程自动化,为集群内服务用户提供了安全、便捷的认证方式。

Token Controller

Token Controller 运行在 kube-controller-manager 中,专门负责为 ServiceAccount 生成、维护和回收 JWT 令牌,是服务账号认证的 "令牌管理员"。

核心作用

1、自动创建令牌

当一个 ServiceAccount 被创建时,Token Controller 会自动为其生成对应的 Secret 资源,并签发包含服务账号身份信息的 JWT 令牌,存入该 Secret 中。(1.24+ 版本不再自动生成 Secret 资源,改为通过 TokenRequest API 按需签发短期令牌)

2、令牌有效性维护

持续监控令牌状态,当令牌过期、失效或被篡改时,会自动更新并重新签发,确保服务账号的认证凭证始终有效。

3、资源生命周期同步

当 ServiceAccount 被删除时,Token Controller 会自动清理其关联的所有令牌 Secret,避免无效身份凭证残留,保障集群安全。

# 查看 controller-manager 进程参数(systemd 部署)
[root@k8s-master01 serviceAccount]# ps aux | grep kube-controller-manager
root        1033  0.1  3.1 1297344 62392 ?       Ssl  09:05   0:07 /usr/local/bin/kube-controller-manager --v=2 --bind-address=127.0.0.1 --root-ca-file=/etc/kubernetes/pki/ca.pem --cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem --cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem --service-account-private-key-file=/etc/kubernetes/pki/sa.key --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig --leader-elect=true --use-service-account-credentials=true --node-monitor-grace-period=40s --node-monitor-period=5s --node-eviction-rate=0.1 --controllers=*,bootstrapsigner,tokencleaner --allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem --node-cidr-mask-size=24
正常运行即代表 Token Controller 已启用。

# 创建自定义的服务账号
[root@k8s-master01 ~]# kubectl create sa test-sa -n tcloud


# 创建deployment使用创建的ServiceAccount
[root@k8s-master01 serviceAccount]# vim deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: serviceaccount-pod
  namespace: tcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      serviceAccountName: test-sa
      containers:
      - name: vue-login
        image: registry.cn-hangzhou.aliyuncs.com/gwjcloud/vue-login:v1.0


# apply
[root@k8s-master01 serviceAccount]# kubectl apply -f deployment.yaml 


# 查看容器内动态生成的令牌
[root@k8s-master01 serviceAccount]# kubectl exec -it serviceaccount-pod-578f9d45b6-k59s8 -n tcloud -- cat /var/run/secrets/kubernetes.io/serviceaccount/token
eyJhbGciOiJSUzI1NiIsImtpZCI6ImNpVjg4Rmd6VjBEbGZwXzE4RVl0OGhNWTBjWV94a1hhR2V3QmhQR1d4aTAifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxODA5MjI2NDUxLCJpYXQiOjE3Nzc2OTA0NTEsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiNWRiNjExNWItYmFkOC00NjU3LWFiNDgtYjVmNTVjYTNkYTc4Iiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJ0Y2xvdWQiLCJub2RlIjp7Im5hbWUiOiJrOHMtbm9kZTAzIiwidWlkIjoiYjA2YjRjMGMtNTc0Ny00MTgxLWI5MDMtYWZjMjVmNTg2NmE5In0sInBvZCI6eyJuYW1lIjoic2VydmljZWFjY291bnQtcG9kLTU3OGY5ZDQ1YjYtazU5czgiLCJ1aWQiOiJmODg5M2NjZS1lZDc0LTRmYjctODMxYi1hODc1YTZjNDlkNDIifSwic2VydmljZWFjY291bnQiOnsibmFtZSI6InRlc3Qtc2EiLCJ1aWQiOiIwNmFhM2IxMy0yYjU5LTQwZWItYWMyYi04OGE5ZTBhZGM2NWQifSwid2FybmFmdGVyIjoxNzc3Njk0MDU4fSwibmJmIjoxNzc3NjkwNDUxLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6dGNsb3VkOnRlc3Qtc2EifQ.FG_oBmRR7m_UMWm1WsfMkmUmWtaZZCUz6lSZSt31kND_NiOq6w2d89omXmQel-6yg7Pho4Ady9LSkRiLNZNOxX1ZCmc6NDnFvyjZcvSU-tGFwS-QN3Tix3lscAiZ02m02UeRhScWz1jEXUw00MD0k3Zt9oz3iWVYpJlwVrH5ylJLau5BiOm3ppq1nvV8MS8mp08uy5eL0jM7FHk1Lm9vHmSD-h--sjY2NWwFPcbVbX4hu_hNbvW-hvWIEm5Vi0Jc-V8gQn-SYYOUa8SiHRglbz4mu9k3my9kPH2Umo8yFFFhtwBAD6mmIivk5D6mj1rDcmlXpqKux_XzusTPjcSS1g
输出一长串 JWT 令牌,由 Token Controller 动态签发。

# 验证令牌可访问集群API
[root@k8s-master01 serviceAccount]# kubectl exec -it serviceaccount-pod-578f9d45b6-k59s8 -n tcloud -- bash
root@serviceaccount-pod-578f9d45b6-k59s8:/# TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
root@serviceaccount-pod-578f9d45b6-k59s8:/# curl -H "Authorization: Bearer $TOKEN" https://kubernetes.default.svc/api/v1/namespaces/tcloud/pods -k
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "pods is forbidden: User \"system:serviceaccount:tcloud:test-sa\" cannot list resource \"pods\" in API group \"\" in the namespace \"tcloud\"",
  "reason": "Forbidden",
  "details": {
    "kind": "pods"
  },
  "code": 403
}
说明:
认证成功 ✅ 权限不足 ❌

认证成功
令牌有效、Token Controller 正常工作、ServiceAccount 认证通过,API Server 识别出了这个用户:system:serviceaccount:tcloud:test-sa

权限不足
Kubernetes 默认安全机制:服务账号默认没有任何操作权限,必须通过 RBAC 授权后才能操作资源
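
这个 403 正是缺少授权所致;如果想放行该请求,可以为这个服务账号补一条最小 RBAC 规则(这里仅作示意,Role/RoleBinding 的名称为假设值,RBAC 详解见后文"授权管理"):

```yaml
# 允许 test-sa 在 tcloud 命名空间内查看 pods(最小授权示意)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sa-pod-list          # 示例名称
  namespace: tcloud
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sa-pod-list-bind     # 示例名称
  namespace: tcloud
subjects:
- kind: ServiceAccount
  name: test-sa
  namespace: tcloud
roleRef:
  kind: Role
  name: sa-pod-list
  apiGroup: rbac.authorization.k8s.io
```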

ServiceAccount Controller

同样运行在 kube-controller-manager 中,负责服务账号生命周期管理。

  • 新命名空间创建后,自动为其生成default默认服务账号;

  • 维护命名空间与服务账号的关联关系,保障资源匹配;

  • 删除服务账号或命名空间时,自动联动清理关联资源,避免无效身份凭证残留;

  • 为整个服务账号体系提供基础资源支撑。

# 查看命名空间自动生成的默认服务账号
[root@k8s-master01 ~]# kubectl get sa -n tcloud
NAME      SECRETS   AGE
default   0         10d
test-sa   0         15m
# 可以看到命名空间自带 default 服务账号,由 ServiceAccount Controller 自动创建

# 查看当前自定义服务账号
[root@k8s-master01 ~]# kubectl get sa test-sa -n tcloud
NAME      SECRETS   AGE
test-sa   0         15m

# 删除正在被 Pod 使用的服务账号
[root@k8s-master01 ~]# kubectl delete sa test-sa -n tcloud

# 进入原有 Pod 再次测试 API 访问
[root@k8s-master01 ~]# kubectl exec -it serviceaccount-pod-578f9d45b6-k59s8 -n tcloud -- bash
root@serviceaccount-pod-578f9d45b6-k59s8:/# TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
root@serviceaccount-pod-578f9d45b6-k59s8:/# curl -H "Authorization: Bearer $TOKEN" https://kubernetes.default.svc/api/v1/namespaces/tcloud/pods -k
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}

说明:
ServiceAccount Controller 工作完全正常
服务账号删除后,令牌立刻失效
Pod 失去合法身份权限
401 就是证明:服务账号已被系统回收,身份不合法了!

3、授权管理

认证只解决 "你是谁" 的身份识别问题,但识别身份后,并不能默认拥有集群所有资源操作权限;想要控制用户能做什么、不能做什么,就需要引入授权管理。

授权管理的核心作用,是在用户或服务账号完成身份认证之后,进一步校验该主体是否具备对某类资源执行增删改查等操作的权限,实现身份认证与权限授权分离,遵循最小权限安全原则。

API Server目前支持以下几种授权策略:

  • AlwaysDeny:表示拒绝所有请求,一般用于测试
  • AlwaysAllow:允许接收所有请求,相当于集群不需要授权流程(Kubernetes默认的策略)
  • ABAC:基于属性的访问控制,表示使用用户配置的授权规则对用户请求进行匹配和控制
  • Webhook:通过调用外部REST服务对用户进行授权
  • Node:是一种专用模式,用于对kubelet发出的请求进行访问控制
  • RBAC:基于角色的访问控制

RBAC(Role-Based Access Control)基于角色的访问控制,主要是在描述一件事情:给哪些对象授予了哪些权限。

其中涉及到了下面几个概念:

  • 对象:User、Groups、ServiceAccount
  • 角色:代表着一组定义在资源上的可操作动作(权限)的集合
  • 绑定:将定义好的角色跟用户绑定在一起

1、什么是RBAC

RBAC:基于角色的访问控制

核心逻辑: 不给用户直接赋权限,先把权限打包成角色,再给用户绑定角色

  • 先把一堆权限(看 Pod、删 Service、改配置)打包,起个名字叫角色
  • 有普通用户、服务账号、用户组这些主体
  • 直接把角色分给人,人就拥有角色里所有权限;
  • 换人、改权限只需要改角色绑定,不用一个个改权限,好管理、符合最小权限原则。

RBAC引入了4个顶级资源对象:

| 资源 | 级别 | 作用 | 适用场景 |
| --- | --- | --- | --- |
| Role | 命名空间 | 定义命名空间内权限 | 项目权限、团队权限 |
| ClusterRole | 集群 | 定义全局权限 | 集群管理员、节点管理 |
| RoleBinding | 命名空间 | 绑定命名空间角色 | 给用户分配某个 ns 权限 |
| ClusterRoleBinding | 集群 | 绑定集群角色 | 集群管理员、全局只读账号 |

2、RBAC配置解析

所有的RBAC配置都遵循这个逻辑

apiVersion: rbac.authorization.k8s.io/v1
kind: 角色类型
metadata:
  name: 名称
  namespace: 命名空间(只有集群级不需要)
rules: 权限规则(角色才有)
subjects: 被授权的人/账号(绑定才有)
roleRef: 关联哪个角色(绑定才有)

Role(命名空间级角色)

作用:定义某个命名空间下的权限。

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader       # 角色名
  namespace: tcloud     # 只在 tcloud 生效
rules:
- apiGroups: [""]        # 核心组(空字符串 = core 组,如 pod、service)
  resources: ["pods"]    # 要控制的资源
  verbs: ["get", "list", "watch"]  # 允许的操作

apiGroups:API 组,空 = 核心资源(pod、svc、node),apps = deployment、statefulset

resources:资源类型(pods、services、deployments 等)

verbs:权限动作

get         查看单个
list        列表查看
watch       监听变化
create      创建
update      更新
patch       部分更新
delete      删除
deletecollection 批量删除
*           所有权限
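
rules 还支持用 resourceNames 把权限进一步收窄到具体的资源实例。该字段是 RBAC 原生支持的,下面的角色名称与 ConfigMap 名称为假设的示例:

```yaml
# 只允许读取 tcloud 命名空间里名为 app-config 的这一个 ConfigMap
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: single-cm-reader          # 示例名称
  namespace: tcloud
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["app-config"]   # 限定具体资源名;注意 list/watch、create 不受 resourceNames 约束
  verbs: ["get"]
```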

ClusterRole(集群级角色)

作用:定义整个集群、所有命名空间的权限。

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-pod-reader  # 全局唯一名称
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

没有 namespace 字段

作用于整个集群

可用于集群级资源:nodes、namespaces、persistentvolumes
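
ClusterRole 还有一个常见用法:定义一次、多个命名空间复用——RoleBinding 的 roleRef 可以指向 ClusterRole,此时权限只在该 RoleBinding 所在的命名空间内生效。下面是这种复用模式的示意(dev 命名空间、绑定名称为假设值,角色名沿用上文的 cluster-pod-reader):

```yaml
# 复用模式:RoleBinding 引用 ClusterRole,权限被"降级"到 dev 命名空间内生效
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-dev          # 示例名称
  namespace: dev                # 只在 dev 命名空间授予权限
subjects:
- kind: User
  name: tom
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole             # 引用集群角色,而非 Role
  name: cluster-pod-reader      # 上文定义的 ClusterRole
  apiGroup: rbac.authorization.k8s.io
```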

RoleBinding(命名空间级绑定)

作用:把 Role 绑定给 用户 / ServiceAccount / 用户组,让它获得权限。

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: default
subjects:
- kind: User
  name: tom          # 用户名
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader   # 绑定的角色名
  apiGroup: rbac.authorization.k8s.io

  • subjects:被授权的对象
    • User:普通用户
    • Group:用户组
    • ServiceAccount:服务账号(Pod 用)
  • roleRef:关联的角色(必须写全,不能省略)

ClusterRoleBinding(集群级绑定)

作用:把 ClusterRole 绑定给对象,全局生效。

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-pod-reader-binding
subjects:
- kind: User
  name: tom
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-pod-reader
  apiGroup: rbac.authorization.k8s.io

3、RBAC实践

命名空间级Role和RoleBinding

案例:创建服务账号,创建权限,绑定权限到SA,适用于只有查看权限的人员场景,核心目标是实现"业务资源隔离、最小权限管控",避免开发人员误操作其他命名空间资源或删除、修改核心业务组件。

# 创建服务账号和role角色绑定权限
[root@k8s-master01 role]# vim rbac.yaml
# 1. 创建服务账号
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tcloud-dev-sa
  namespace: tcloud
---
# 2. 命名空间角色:定义权限
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tcloud-dev-role
  namespace: tcloud  # 关键:限定只在tcloud生效
rules:
# 核心资源权限
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["get", "list", "watch"]
# apps组资源权限
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets"]
  verbs: ["get", "list", "watch"]
---
# 3. 角色绑定:把权限赋给SA
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tcloud-dev-bind
  namespace: tcloud
subjects:                # 被授权主体
- kind: ServiceAccount
  name: tcloud-dev-sa
  namespace: tcloud
roleRef:                 # 关联哪个角色
  kind: Role
  name: tcloud-dev-role
  apiGroup: rbac.authorization.k8s.io

ServiceAccount 解析
作用:给业务 Pod / 程序使用的内置账号,代替集群超级管理员账号
必须指定 namespace: tcloud,归属当前命名空间


Role 关键字段解析
kind: Role:命名空间级角色,不能跨命名空间
apiGroups:资源分组
"" 空组:核心原生资源(pod、svc、cm)
apps:工作负载资源(deployment、statefulset)
resources:要管控的资源类型
verbs:动作权限
get/list/watch:只读
不配置 create/update/delete:禁止增删改


RoleBinding 解析
subjects:填被授权对象,这里是 tcloud 下的 SA
roleRef:固定关联本命名空间的 Role
生效范围:只在 tcloud 内有效

# apply
[root@k8s-master01 role]# kubectl apply -f rbac.yaml 

# kubectl auth 是用来检查与管理 K8s 权限(RBAC)、确认当前用户身份的命令组,核心就是回答两个问题:我是谁?我能干什么?

# 1. tcloud命名空间:允许查pod
[root@k8s-master01 role]# kubectl auth can-i get pods -n tcloud --as=system:serviceaccount:tcloud:tcloud-dev-sa
yes

# 2. tcloud命名空间:禁止删pod
[root@k8s-master01 role]# kubectl auth can-i delete pods -n tcloud --as=system:serviceaccount:tcloud:tcloud-dev-sa
no


# 3. 跳到default命名空间:完全无权限
[root@k8s-master01 role]# kubectl auth can-i get pods -n default --as=system:serviceaccount:tcloud:tcloud-dev-sa
no


# 4.查deployment:有权限
[root@k8s-master01 role]# kubectl auth can-i list deployments -n tcloud --as=system:serviceaccount:tcloud:tcloud-dev-sa
yes

# 使用服务账号(ServiceAccount)来验证
[root@k8s-master01 role]# vim pod-rbac.yaml
apiVersion: v1
kind: Pod
metadata:
  name: rbac-test-pod
  namespace: tcloud
spec:
  serviceAccountName: tcloud-dev-sa  # 挂载 RBAC 账号
  containers:
  - name: kubectl-box
    image: docker.io/local/busybox-kubectl:v1.28
    command: ["sleep", "36000"]


# apply
[root@k8s-master01 role]# kubectl apply -f pod-rbac.yaml 

# 进入pod测试权限
[root@k8s-master01 role]# kubectl exec -it -n tcloud rbac-test-pod -- sh
/ # kubectl get pods -n tcloud
NAME                                  READY   STATUS    RESTARTS   AGE
rbac-test-pod                         1/1     Running   0          67s
serviceaccount-pod-86b9487458-rdjf5   1/1     Running   0          3m13s
/ # kubectl get deploy -n tcloud
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
serviceaccount-pod   1/1     1            1           3m16s
/ # kubectl delete pod rbac-test-pod -n tcloud
Error from server (Forbidden): pods "rbac-test-pod" is forbidden: User "system:serviceaccount:tcloud:tcloud-dev-sa" cannot delete resource "pods" in API group "" in the namespace "tcloud"

# 也可以模拟SA的账号来验证
[root@k8s-master01 role]# kubectl get pods -n tcloud --as=system:serviceaccount:tcloud:tcloud-dev-sa
NAME                                  READY   STATUS    RESTARTS   AGE
rbac-test-pod                         1/1     Running   0          3m15s
serviceaccount-pod-86b9487458-rdjf5   1/1     Running   0          5m21s


# 查看当前我是谁
[root@k8s-master01 role]# kubectl auth whoami
ATTRIBUTE                                           VALUE
Username                                            admin
Groups                                              [system:masters system:authenticated]
Extra: authentication.kubernetes.io/credential-id   [X509SHA256=977025f007c274df33e14080ec4bd70969aa6be2f94d857cdcd0733f8e100cb2]


# 模拟SA查看身份
[root@k8s-master01 role]# kubectl auth whoami --as=system:serviceaccount:tcloud:tcloud-dev-sa
ATTRIBUTE   VALUE
Username    system:serviceaccount:tcloud:tcloud-dev-sa
Groups      [system:serviceaccounts system:serviceaccounts:tcloud system:authenticated]

当然我们这里只是验证,真实使用时会基于 tcloud-dev-sa 生成一份可以直接交给开发使用的 kubeconfig

作用:开发拿到这份 kubeconfig,本地 kubectl 只能操作 tcloud 命名空间、只读,跨 NS / 删改都无权限

# 创建生成kubeconfig的脚本
[root@k8s-master01 role]# vim gen-dev-kubeconfig.sh
#!/bin/bash
SA_NAME="tcloud-dev-sa"
NS_NAME="tcloud"
KUBECONFIG="./tcloud-dev-kubeconfig"

# 1.24+ 官方生成短期token(1小时有效期)
TOKEN=$(kubectl create token $SA_NAME -n $NS_NAME)
# APISERVER地址
APISERVER=$(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.server}')
# CA证书base64
CA_DATA=$(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')

# 生成独立kubeconfig,不覆盖本机~/.kube/config
cat > $KUBECONFIG <<EOF
apiVersion: v1
kind: Config
clusters:
- name: k8s
  cluster:
    server: ${APISERVER}
    certificate-authority-data: ${CA_DATA}
contexts:
- name: tcloud-dev
  context:
    cluster: k8s
    namespace: tcloud
    user: dev-user
current-context: tcloud-dev
users:
- name: dev-user
  user:
    token: ${TOKEN}
EOF

echo "生成成功:$KUBECONFIG"

# 执行脚本
[root@k8s-master01 role]# chmod +x gen-dev-kubeconfig.sh
[root@k8s-master01 role]# ./gen-dev-kubeconfig.sh
生成成功:./tcloud-dev-kubeconfig


# 这样把这个kubeconfig文件给到开发
# 模拟使用tcloud-dev-kubeconfig文件来操作集群
[root@k8s-master01 role]# export KUBECONFIG=./tcloud-dev-kubeconfig
[root@k8s-master01 role]# kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
rbac-test-pod                         1/1     Running   0          16m
serviceaccount-pod-86b9487458-rdjf5   1/1     Running   0          18m
[root@k8s-master01 role]# kubectl get pod -A
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:tcloud:tcloud-dev-sa" cannot list resource "pods" in API group "" at the cluster scope
可以看到我们定义了什么权限,就有什么权限

# 恢复环境
[root@k8s-master01 role]# unset KUBECONFIG
[root@k8s-master01 role]# kubectl get pod -A
NAMESPACE       NAME                                       READY   STATUS    RESTARTS         AGE
ingress-nginx   ingress-nginx-controller-l59g4             1/1     Running   0                25h
kube-system     calico-kube-controllers-658896864b-4w6hn   1/1     Running   13 (3h57m ago)   10d
kube-system     calico-node-8krq6                          1/1     Running   15 (3h57m ago)   12d
kube-system     calico-node-fs4zn                          1/1     Running   14 (28h ago)     12d
kube-system     calico-node-g6pr2                          1/1     Running   15 (3h57m ago)   12d
kube-system     calico-node-l9m8k                          1/1     Running   15 (3h57m ago)   12d
kube-system     calico-node-nj6jh                          1/1     Running   15 (3h57m ago)   12d
kube-system     calico-node-ps59q                          1/1     Running   15 (3h57m ago)   12d
kube-system     coredns-6659878fb9-5tvrj                   1/1     Running   15 (3h57m ago)   12d
kube-system     metrics-server-6f49b5d9d9-45qx9            1/1     Running   1 (3h57m ago)    25h
tcloud          rbac-test-pod                              1/1     Running   0                19m
tcloud          serviceaccount-pod-86b9487458-rdjf5        1/1     Running   0                21m

# 清除环境
[root@k8s-master01 role]# kubectl delete -f rbac.yaml
[root@k8s-master01 role]# kubectl delete -f pod-rbac.yaml 
[root@k8s-master01 role]# kubectl delete -f deployment.yaml 

当然也可以使用集群级别的授权,让用户查看、操作整个集群的资源

集群级ClusterRole和ClusterRoleBinding

案例:适用于运维人员全局查看集群、监控、问题排查场景,实现全集群只读、禁止高危操作。

生产正式运维人员使用,基于 K8s x509 客户端证书认证,不靠 SA、不靠 Token。

原理

  1. 给运维签发客户端证书,设置 CN=ops-user;
  2. 创建 ClusterRole 定义全局只读权限;
  3. 创建 ClusterRoleBinding,直接绑定 User: ops-user;
  4. 运维用带证书的 kubeconfig 连接集群,K8s 读取证书 CN 匹配用户名,自动授权;
  5. 无任何 ServiceAccount,纯证书 + User 绑定模式,企业标准生产用法。


# 创建集群角色和绑定
[root@k8s-master01 clusterrole]# vim rbac-cluster.yaml
# 1. 集群角色:定义全局只读权限
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-ops-read
rules:
# 集群核心资源 + 普通资源只读
- apiGroups: [""]
  resources: ["nodes", "namespaces", "persistentvolumes", "pods", "services", "configmaps", "endpoints"]
  verbs: ["get", "list", "watch"]
# 应用工作负载资源只读
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "replicasets", "daemonsets"]
  verbs: ["get", "list", "watch"]
---
# 2. 集群角色绑定:绑定证书用户名 ops-user(不使用SA)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-ops-read-binding
subjects:
# 绑定类型:User 用户名,对应客户端证书里的 CN=ops-user
- kind: User
  name: ops-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-ops-read
  apiGroup: rbac.authorization.k8s.io

# apply
[root@k8s-master01 clusterrole]# kubectl apply -f rbac-cluster.yaml

# 查看资源
[root@k8s-master01 clusterrole]# kubectl get clusterrole | grep cluster-ops-read
cluster-ops-read    2026-05-02T06:08:06Z
[root@k8s-master01 clusterrole]# kubectl get clusterrolebinding | grep cluster-ops-read
cluster-ops-read-binding    ClusterRole/cluster-ops-read

    auth验证权限

    有权限:查看集群节点

    kubectl auth can-i get nodes --as=ops-user

    有权限:查看所有命名空间Pod

    kubectl auth can-i get pods -A --as=ops-user

    无权限:删除集群节点

    kubectl auth can-i delete nodes --as=ops-user

    无权限:创建命名空间

    kubectl auth can-i create namespaces --as=ops-user

    给运维人员创建 ops-user 证书 + kubeconfig

    创建目录(存放证书)

    [root@k8s-master01 ~]# mkdir -p /opt/ops-user && cd /opt/ops-user

    生成私钥

    [root@k8s-master01 ops-user]# openssl genrsa -out ops-user.key 2048

    生成CSR

    [root@k8s-master01 ops-user]# openssl req -new -key ops-user.key -out ops-user.csr -subj "/CN=ops-user"

    签发证书

    [root@k8s-master01 ops-user]# openssl x509 -req -in ops-user.csr -CA /etc/kubernetes/pki/ca.pem -CAkey /etc/kubernetes/pki/ca-key.pem -CAcreateserial -out ops-user.crt -days 3650
    现在目录下有:
    ops-user.key ← 运维用户私钥
    ops-user.csr ← 证书签名请求
    ops-user.crt ← 运维用户证书(已签发)
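上面的签发流程可以先在本地完整演练一遍,并校验签出证书的 CN。下面是一个可独立运行的演示脚本:其中的 demo-ca 是演示假设,用来代替集群真实的 `/etc/kubernetes/pki/ca.pem`,生产环境请务必使用集群 CA 签发:

```shell
# 演示脚本:在临时目录用一个演示 CA 复现正文的签发流程,并校验结果
# (demo-ca 为演示假设,代替集群真实 CA;生产环境用 /etc/kubernetes/pki 下的 CA)
set -e
WORK=$(mktemp -d)
cd "$WORK"

# 演示 CA(代替 ca.pem / ca-key.pem)
openssl genrsa -out ca-key.pem 2048
openssl req -x509 -new -key ca-key.pem -days 365 -subj "/CN=demo-ca" -out ca.pem

# 与正文相同的三步:生成私钥 → 生成 CSR(CN=ops-user)→ 用 CA 签发
openssl genrsa -out ops-user.key 2048
openssl req -new -key ops-user.key -subj "/CN=ops-user" -out ops-user.csr
openssl x509 -req -in ops-user.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -out ops-user.crt -days 3650

# 校验:RBAC 匹配的用户名来自证书 Subject 里的 CN,且证书必须由该 CA 签发
openssl x509 -in ops-user.crt -noout -subject
openssl verify -CAfile ca.pem ops-user.crt
```

K8s 把证书 Subject 里的 CN 当作 User 名(O 当作 Group 名),所以 ClusterRoleBinding 里的 `User: ops-user` 必须与这里的 CN 完全一致。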

    生成kubeconfig

    [root@k8s-master01 ops-user]# APISERVER=$(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.server}')
    [root@k8s-master01 ops-user]# CA_DATA=$(base64 -w 0 /etc/kubernetes/pki/ca.pem)
    [root@k8s-master01 ops-user]# CLIENT_CERT=$(base64 -w 0 ops-user.crt)
    [root@k8s-master01 ops-user]# CLIENT_KEY=$(base64 -w 0 ops-user.key)
    [root@k8s-master01 ops-user]# cat > ./ops-kubeconfig << EOF
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority-data: $CA_DATA
        server: $APISERVER
      name: k8s
    contexts:
    - context:
        cluster: k8s
        user: ops-user
      name: ops-context
    current-context: ops-context
    users:
    - name: ops-user
      user:
        client-certificate-data: $CLIENT_CERT
        client-key-data: $CLIENT_KEY
    EOF

    当前目录会生成ops-kubeconfig文件
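这里有个常见坑:heredoc 结束符写成不带引号的 `EOF` 时,`$CA_DATA` 这类变量才会展开;写成 `'EOF'` 则会把 `$CA_DATA` 字面量写进文件,导致 kubeconfig 不可用。下面是一个可独立运行的自检演示(demo-kubeconfig 文件名和各占位值均为演示假设):

```shell
# 演示:用占位数据模拟生成 kubeconfig,然后自检变量是否已正确展开
set -e
APISERVER="https://127.0.0.1:6443"               # 演示值,真实环境取自集群配置
CA_DATA=$(printf 'demo-ca' | base64 -w 0)        # 演示值,真实环境来自 ca.pem
CLIENT_CERT=$(printf 'demo-crt' | base64 -w 0)   # 演示值
CLIENT_KEY=$(printf 'demo-key' | base64 -w 0)    # 演示值

cat > ./demo-kubeconfig << EOF
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: $CA_DATA
    server: $APISERVER
  name: k8s
contexts:
- context:
    cluster: k8s
    user: ops-user
  name: ops-context
current-context: ops-context
users:
- name: ops-user
  user:
    client-certificate-data: $CLIENT_CERT
    client-key-data: $CLIENT_KEY
EOF

# 自检 1:文件里不应残留未展开的 "$" 字面量
! grep -q '\$' ./demo-kubeconfig
# 自检 2:三个 *-data 字段都应有非空内容
grep -cE '(certificate-authority|client-certificate|client-key)-data: .+' ./demo-kubeconfig
```

真实生成 ops-kubeconfig 后,也可以用同样的两条 grep 自检一遍再交给运维人员。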

临时使用这个配置,只在当前终端生效

    [root@k8s-master01 ops-user]# export KUBECONFIG=./ops-kubeconfig

看下当前使用的是哪个用户(ops-user)

    [root@k8s-master01 ops-user]# kubectl auth whoami
    ATTRIBUTE VALUE
    Username ops-user
    Groups [system:authenticated]
    Extra: authentication.kubernetes.io/credential-id [X509SHA256=ca323c0ce7bdebd6989c112fe4af8ac0524e2bac1d115753e207625a290a5f18]

    验证只读操作

    [root@k8s-master01 ops-user]# kubectl get nodes
    NAME STATUS ROLES AGE VERSION
    k8s-master01 Ready <none> 13d v1.34.1
    k8s-master02 Ready <none> 13d v1.34.1
    k8s-master03 Ready <none> 13d v1.34.1
    k8s-node01 Ready <none> 13d v1.34.1
    k8s-node02 Ready <none> 13d v1.34.1
    k8s-node03 Ready <none> 13d v1.34.1
    [root@k8s-master01 ops-user]# kubectl get ns
    NAME STATUS AGE
    default Active 13d
    ingress-nginx Active 6d22h
    kube-node-lease Active 13d
    kube-public Active 13d
    kube-system Active 13d
    tcloud Active 10d
    [root@k8s-master01 ops-user]# kubectl get pods -A
    NAMESPACE NAME READY STATUS RESTARTS AGE
    ingress-nginx ingress-nginx-controller-l59g4 1/1 Running 1 (5h28m ago) 27h
    kube-system calico-kube-controllers-658896864b-4w6hn 1/1 Running 13 (5h28m ago) 10d
    kube-system calico-node-8krq6 1/1 Running 15 (5h28m ago) 12d
    kube-system calico-node-fs4zn 1/1 Running 15 (5h28m ago) 12d
    kube-system calico-node-g6pr2 1/1 Running 15 (5h28m ago) 12d
    kube-system calico-node-l9m8k 1/1 Running 15 (5h28m ago) 12d
    kube-system calico-node-nj6jh 1/1 Running 15 (5h28m ago) 12d
    kube-system calico-node-ps59q 1/1 Running 15 (5h28m ago) 12d
    kube-system coredns-6659878fb9-5tvrj 1/1 Running 15 (5h28m ago) 12d
    kube-system metrics-server-6f49b5d9d9-45qx9 1/1 Running 1 (5h28m ago) 26h
    [root@k8s-master01 ops-user]# kubectl get svc -A
    NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
    default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 13d
    ingress-nginx ingress-nginx-controller ClusterIP 10.96.26.248 <none> 80/TCP,443/TCP 6d22h
    kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 12d
    kube-system metrics-server ClusterIP 10.96.41.20 <none> 443/TCP 12d
    [root@k8s-master01 ops-user]# kubectl get deploy -A
    NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
    kube-system calico-kube-controllers 1/1 1 1 12d
    kube-system coredns 1/1 1 1 12d
    kube-system metrics-server 1/1 1 1 12d

    验证禁止写操作(会报错 Forbidden)

    创建命名空间

    [root@k8s-master01 ops-user]# kubectl create ns test-demo
    Error from server (Forbidden): namespaces is forbidden: User "ops-user" cannot create resource "namespaces" in API group "" at the cluster scope

    删除节点

    [root@k8s-master01 ops-user]# kubectl delete node k8s-master01
    Error from server (Forbidden): nodes "k8s-master01" is forbidden: User "ops-user" cannot delete resource "nodes" in API group "" at the cluster scope

    删除Pod

    [root@k8s-master01 ops-user]# kubectl delete pod -n kube-system xxx
    Error from server (Forbidden): pods "xxx" is forbidden: User "ops-user" cannot delete resource "pods" in API group "" in the namespace "kube-system"

    恢复权限

    [root@k8s-master01 ops-user]# unset KUBECONFIG

这套流程就是 Kubernetes 标准的多租户 RBAC 隔离方案。

多租户:一套 K8s 集群分给多个不同用户/团队使用,彼此权限隔离、互不干扰,只能查看自己被授权的资源。
