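Kubernetes Pod-04: Scheduling

Scheduling

In Kubernetes we rarely create Pods directly. In most cases, controllers such as ReplicationController (RC) and Deployment create the Pod replicas, schedule them, and manage their lifecycle automatically.

Fully automatic scheduling

One of the main functions of a Deployment is to automatically deploy multiple replicas of a containerized application and to keep monitoring the replica count, so that the cluster always maintains the number of replicas the user specified.

yaml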
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
  • View the Deployment
bash
[root@master1 pod]# kubectl get deployment
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3/3     3            3           4m32s

This status shows that the Deployment has created 3 replicas and that all of them are available.

  • View the ReplicaSet and Pods
bash
[root@master1 pod]# kubectl get rs
NAME                          DESIRED   CURRENT   READY   AGE
nginx-deployment-5d59d67564   3         3         3       6m35s
[root@master1 pod]# kubectl get pod
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-5d59d67564-5thcv   1/1     Running   0          6m47s
nginx-deployment-5d59d67564-9pgcf   1/1     Running   0          6m47s
nginx-deployment-5d59d67564-dvfp5   1/1     Running   0          6m47s

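The system schedules the 3 Pods automatically. Which node each Pod lands on is decided entirely by the Scheduler's algorithms.

NodeSelector

In some cases a Pod needs to be scheduled onto a specific Node. This can be done by matching a label on the Node.

bash
# Label node2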
[root@master1 pod]# kubectl label nodes node2 zone=north
node/node2 labeled
yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: redis-master
  labels:
    name: redis-master
spec:
  replicas: 1
  selector:
    name: redis-master
  template:
    metadata:
      labels:
        name: redis-master
    spec:
      containers:
      - name: master
        image: kubeguide/redis-master
        ports:
        - containerPort: 6379
      nodeSelector:
        zone: north

Add a nodeSelector to the Pod template, then create the controller and check the result:

bash
[root@master1 pod]# kubectl apply -f 14.yaml
replicationcontroller/redis-master created

[root@master1 pod]# kubectl get pods -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
redis-master-qnbkm   1/1     Running   0          35s   10.244.104.3   node2   <none>           <none>

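Note: if a Pod specifies a nodeSelector and no Node in the cluster carries the matching label, the Pod cannot be scheduled and stays in Pending, even when other Nodes are available.

Node affinity

Node affinity can be expressed in two ways:

  • requiredDuringSchedulingIgnoredDuringExecution: the rule must be satisfied before the Pod can be scheduled onto a Node. This is a hard constraint.
  • preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to place the Pod on a Node that satisfies the rule but does not insist on it. This is a soft constraint.

IgnoredDuringExecution: if a Node's labels change while the Pod is running so that the Node no longer meets the Pod's node affinity rules, the system ignores the change and the Pod keeps running on that Node.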

yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      # Hard constraint: only amd64 nodes qualify
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: beta.kubernetes.io/arch
            operator: In
            values:
            - amd64
      # Soft constraint: prefer nodes labeled disk-type=ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disk-type
            operator: In
            values:
            - ssd
  containers:
  - name: with-node-affinity
    image: registry.aliyuncs.com/google_containers/pause:3.1

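The configuration above uses the In operator. The nodeAffinity syntax supports the operators In, NotIn, Exists, DoesNotExist, Gt, and Lt. There is no dedicated node anti-affinity feature, but NotIn and DoesNotExist achieve the same exclusion effect.

Notes on setting nodeAffinity rules:

  • If both nodeSelector and nodeAffinity are defined, both conditions must be satisfied before the Pod can run on a given Node.
  • If nodeAffinity specifies multiple nodeSelectorTerms, it is enough for any one of them to match.
  • If a single entry in nodeSelectorTerms has multiple matchExpressions, a Node must satisfy all of them to run the Pod.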
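
The matching rules above can be sketched in a single manifest. This is a minimal illustration rather than part of the original setup: the Pod name, the zone value south, and the disk-type label are assumptions chosen for the example. The two entries under nodeSelectorTerms are alternatives (a node matching either one qualifies), while the two expressions inside the first entry must both hold. NotIn plays the role of node exclusion.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-semantics-demo     # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: both expressions must match (AND within a term)
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
          - key: zone               # assumed label; rejects nodes with zone=south
            operator: NotIn
            values:
            - south
        # Term 2: an alternative to term 1 (OR between terms)
        - matchExpressions:
          - key: disk-type          # assumed label; only its presence is checked
            operator: Exists
  containers:
  - name: pause
    image: registry.aliyuncs.com/google_containers/pause:3.1
```

Note that NotIn also matches nodes that do not carry the zone label at all, so it excludes only the nodes explicitly labeled zone=south.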

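Pod affinity and anti-affinity scheduling

This feature lets users restrict where a Pod can run from another angle: the decision is based on the labels of Pods already running on a node rather than on the node's own labels, so both a node condition and a Pod condition have to match.

  • Reference Pod. Note that it carries the labels security=S1 and app=nginx.

    yaml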
      apiVersion: v1
      kind: Pod
      metadata:
        name: pod-flag
        labels:
          security: "S1"
          app: "nginx"
      spec:
        containers:
        - name: nginx
          image: nginx

    Run it:

    bash
    [root@master1 pod]# kubectl get pod -o wide
      NAME       READY   STATUS    RESTARTS   AGE     IP               NODE    NOMINATED NODE   READINESS GATES
      pod-flag   1/1     Running   0          8m57s   10.244.166.164   node1   <none>           <none>
  • Affinity scheduling

    yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-affinity
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: security
                operator: In
                values:
                - S1
            topologyKey: kubernetes.io/hostname
      containers:
      - name: with-pod-affinity
        image: nginx

Check the result:

bash
[root@master1 pod]# kubectl get pod -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
pod-affinity   1/1     Running   0          4s    10.244.166.166   node1   <none>           <none>
pod-flag       1/1     Running   0          45m   10.244.166.164   node1   <none>           <none>

Both Pods are running on the same node, node1.

Anti-affinity

Next, create a Pod that should not run on the same Node as the target Pod:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: anti-affinity
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
  containers:
  - name: anti-affinity
    image: nginx

Run it and check the result:

bash
[root@master1 pod]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
anti-affinity   1/1     Running   0          20s   10.244.104.6     node2   <none>           <none>
pod-flag        1/1     Running   1          23h   10.244.166.167   node1   <none>           <none>

The anti-affinity Pod and the pod-flag Pod were placed on different nodes.
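
A required anti-affinity rule leaves a Pod in Pending when every node already runs a matching Pod. A softer variant, sketched here as an assumption rather than something from the original setup, uses preferredDuringSchedulingIgnoredDuringExecution so that the scheduler spreads replicas across nodes where it can. The Deployment name, the app=web label, and the weight are illustrative.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-spread                  # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: prefer nodes that do not already run an app=web Pod
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100             # allowed range is 1-100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - web
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: nginx
```

If only one schedulable node is available, both replicas still start on it, which is the trade-off of choosing the soft form.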

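Taints and tolerations

A taint lets a Node refuse to run Pods. Taints are used together with tolerations so that Pods stay away from unsuitable Nodes.

Use the kubectl taint command to set a taint on a Node.

Set a taint on node2:

bash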
[root@master1 pod]# kubectl taint nodes node2 key=value:NoSchedule
node/node2 tainted

# Check whether node2 has the taint

[root@master1 pod]# kubectl describe node node2
Name:               node2
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node2
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=worker
                    zone=north
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.40.183/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.244.104.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 07 Mar 2024 08:46:02 -0500
Taints:             key=value:NoSchedule

Schedule a Pod onto node2:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-taints
spec:
  nodeSelector:
    kubernetes.io/hostname: node2
  containers:
  - name: nginx
    image: nginx

After applying it:

bash
[root@master1 pod]# kubectl describe pod pod-taints
Name:         pod-taints
Namespace:    pod-ns
Priority:     0
Node:         <none>
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  nginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9p2fw (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-9p2fw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9p2fw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/hostname=node2
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  32s   default-scheduler  0/4 nodes are available: 1 node(s) didn't match Pod's node affinity, 1 node(s) had taint {key: value}, that the pod didn't tolerate, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling  32s   default-scheduler  0/4 nodes are available: 1 node(s) didn't match Pod's node affinity, 1 node(s) had taint {key: value}, that the pod didn't tolerate, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

Scheduling fails because the Pod does not tolerate the taint key=value:NoSchedule.

Update the YAML file to add a toleration:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-taints
spec:
  nodeSelector:
    kubernetes.io/hostname: node2
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx

Run it and check the result:

bash
[root@master1 pod]# kubectl get pod
NAME         READY   STATUS    RESTARTS   AGE
pod-taints   1/1     Running   0          20s
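
A toleration matches a taint when their key and effect are the same and one of the following holds:

  • operator is Exists (no value needs to be specified)
  • operator is Equal and the value is the same

If operator is not specified, it defaults to Equal.

The effect values behave as follows:

  • effect=NoSchedule: the scheduler does not place Pods that lack a matching toleration on this node.
  • effect=PreferNoSchedule: the scheduler tries to avoid placing such Pods on this node, but may still do so.
  • effect=NoExecute: Pods already running on the node that do not tolerate the taint are evicted, and no new Pods are scheduled onto it. A toleration can set tolerationSeconds to delay the eviction.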
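
The matching rules and effects listed above can be combined in one Pod spec. This is a sketch under assumptions: the Pod name is hypothetical, and the 60-second value is arbitrary. The first toleration uses Exists, so it matches the taint set on node2 regardless of its value. The second tolerates a NoExecute taint for a limited time through tolerationSeconds.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-tolerations-demo        # hypothetical name
spec:
  tolerations:
  # Matches key=<any value>:NoSchedule, so no value field is given
  - key: "key"
    operator: "Exists"
    effect: "NoSchedule"
  # Stay on an unreachable node for 60 seconds, then get evicted
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 60
  containers:
  - name: nginx
    image: nginx
```

A toleration only permits scheduling onto the tainted node; it does not attract the Pod there. Pinning the Pod to node2 still needs a nodeSelector or node affinity, as in the pod-taints example.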

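Priority scheduling

Some workloads are more important than others, and when cluster resources run short the important ones must keep running normally.

How do we declare that one workload is "more important" than another? It can be defined along the following dimensions:

  • Priority
  1. Eviction: Pod priority is taken into account. Among Pods of equal priority, the one whose resource usage exceeds its resource request by the largest multiple is evicted first.
  2. Preemption: the Scheduler may evict some lower-priority Pods to meet its scheduling goal.
  • QoS (quality of service class)
  • Other system-defined metrics

Create a PriorityClass manifest. A PriorityClass is cluster-scoped and does not belong to any namespace.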

yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: hight-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for XYZ service pods only."

Apply it and check the result:

bash
[root@master1 pod]# kubectl apply -f 20.yaml
priorityclass.scheduling.k8s.io/hight-priority created

# View the PriorityClass
[root@master1 pod]# kubectl get priorityClass
NAME                      VALUE        GLOBAL-DEFAULT   AGE
hight-priority            1000000      false            7s

The priority value is 1000000. The larger the number, the higher the priority.

Using the priority class

yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx-priority
    image: nginx
  priorityClassName: hight-priority

Run it and check the result:

bash
[root@master1 pod]# kubectl apply -f 21.yaml
pod/nginx created

# Check the result
[root@master1 pod]# kubectl describe pod nginx
Name:                 nginx
Namespace:            pod-ns
Priority:             1000000
Priority Class Name:  hight-priority
Node:                 node1/192.168.40.182
Start Time:           Thu, 14 Mar 2024 09:39:25 -0400
Labels:               env=test
Annotations:          cni.projectcalico.org/podIP: 10.244.166.176/32
                      cni.projectcalico.org/podIPs: 10.244.166.176/32
Status:               Running
IP:                   10.244.166.176
IPs:
  IP:  10.244.166.176
Containers:
  nginx-priority:
    Container ID:   docker://fc7465fd346a5f4b50958b152f48b0c368dc8608967db92b22fa1a1c6f31c320
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 14 Mar 2024 09:39:27 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9p2fw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-9p2fw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9p2fw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  2m15s  default-scheduler  Successfully assigned pod-ns/nginx to node1
  Normal  Pulling    2m14s  kubelet            Pulling image "nginx"
  Normal  Pulled     2m13s  kubelet            Successfully pulled image "nginx" in 752.154707ms
  Normal  Created    2m13s  kubelet            Created container nginx-priority
  Normal  Started    2m13s  kubelet            Started container nginx-priority

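If preemption occurs, the high-priority Pod can claim node N, and lower-priority Pods are evicted from node N. The nominatedNodeName field in the high-priority Pod's status records the name of the target node N.

If resources are insufficient, consider scaling out the cluster first and rely on priorities only after that.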
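
Preemption can also be switched off for a given priority class. The sketch below assumes a cluster recent enough to support the preemptionPolicy field on PriorityClass (older releases do not have it); the class name and value are illustrative. Pods using this class are placed ahead of lower-priority Pods in the scheduling queue, but they do not evict Pods that are already running.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting   # hypothetical name
value: 900000
preemptionPolicy: Never               # assumed to be supported by the cluster version
globalDefault: false
description: "High priority for queue ordering, but never preempts running Pods."
```

A Pod opts in the same way as before, by setting priorityClassName to the class name.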

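DaemonSet

A DaemonSet ensures that exactly one replica of a Pod runs on every Node in the cluster.

Typical use cases:

  • Log collection
  • Performance monitoring
yaml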
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-daemonset
spec:
  selector:
    matchLabels:
      app: nginx-daemon
  template:
    metadata:
      labels:
        app: nginx-daemon
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.6
        ports:
        - containerPort: 80
          hostPort: 80
        volumeMounts:
        - mountPath: /var/log/nginx
          name: nginx-logs
      volumes:
      - name: nginx-logs
        emptyDir: {}

Run it and check the result:

bash
[root@master1 pod]# kubectl get daemonset
NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
nginx-daemonset   2         2         0       2            0           <none>          14s

# View the Pods
[root@master1 pod]# kubectl get pod
NAME                    READY   STATUS    RESTARTS   AGE
nginx-daemonset-47gkk   1/1     Running   0          3m28s
nginx-daemonset-4bqcs   1/1     Running   0          3m28s

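In Kubernetes, a DaemonSet's update strategy controls how the DaemonSet's Pods are updated. A DaemonSet ensures that every node in the cluster runs one replica of the specified Pod. When you change the DaemonSet's configuration (such as the container image or Pod parameters), the update strategy determines how the old Pods are replaced by Pods of the new version.

DaemonSet supports the following two update strategies:

  • OnDelete

With OnDelete, Kubernetes does not update a Pod automatically until the user manually deletes the old DaemonSet Pod. As long as an old Pod exists and has not been deleted, no new-version Pod is started in its place.

  • RollingUpdate (the default)

With RollingUpdate, once the DaemonSet's Pod template is updated, the system terminates the old DaemonSet Pods in a controlled fashion and creates the new Pods automatically.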

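yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # replace the Pod on at most one node at a time (the default)
  # selector and template are omitted here; both are required in a real manifest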
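
For comparison, here is a complete manifest that uses the OnDelete strategy. It is a sketch: the DaemonSet name and the app label are hypothetical, and the image is simply reused from the earlier nginx-daemonset example.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-daemonset-ondelete      # hypothetical name
spec:
  updateStrategy:
    type: OnDelete                    # new Pods appear only after old ones are deleted
  selector:
    matchLabels:
      app: nginx-daemon-ondelete
  template:
    metadata:
      labels:
        app: nginx-daemon-ondelete
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.6
```

After the image in the template is changed, each node keeps its old Pod until that Pod is deleted by hand, for example with kubectl delete pod, at which point the controller recreates it from the new template.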

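Batch scheduling

A batch task usually starts multiple compute processes in parallel (or in series) to handle a batch of work items. When processing is complete, the whole batch task ends.

Common batch processing patterns:

  • Job Template Expansion: one Job per work item. This usually suits cases with few work items where each item involves a large amount of data.

Template:

yaml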
apiVersion: batch/v1
kind: Job
metadata:
  name: process-item-$ITEM
  labels:
    jobgroup: jobexample
spec:
  template:
    metadata:
      name: jobexample
      labels:
        jobgroup: jobexample
    spec:
      containers:
      - name: c
        image: busybox
        command: ["sh","-c","echo Processing item $ITEM && sleep 5"]
      restartPolicy: Never

Generate the Job files:

bash
# Create the jobs directory
mkdir -p jobs

# Generate one manifest per item from the template
for i in apple banana cherry; do sed "s/\$ITEM/$i/g" 3.9_6.yaml > "./jobs/job-$i.yaml"; done

# List the generated files
[root@master1 pod]# ls jobs/
job-apple.yaml  job-banana.yaml  job-cherry.yaml

# Create the Jobs
[root@master1 pod]# kubectl apply -f jobs/
job.batch/process-item-apple created
job.batch/process-item-banana created
job.batch/process-item-cherry created

# Check the Jobs
[root@master1 pod]# kubectl get jobs -l jobgroup=jobexample
NAME                  COMPLETIONS   DURATION   AGE
process-item-apple    1/1           23s        5m32s
process-item-banana   1/1           38s        5m32s
process-item-cherry   1/1           55s        5m32s
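  • Queue with Pod Per Work Item: a queue holds the work items, and a Job acts as the consumer that processes them. Each Pod handles one work item.
  • Queue with Variable Pod Count: a queue also holds the work items, but the number of Pods is not tied to the number of items. Each worker must be able to tell whether the queue still holds work items waiting to be processed. If it does, the worker takes one and processes it; otherwise it treats all the work as finished and exits.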
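
In Job terms, the two queue patterns differ mainly in the completions and parallelism fields. The two manifests below are a sketch: the image example.com/queue-worker:latest is a placeholder for a worker that reads from your queue, and the Job names and counts are arbitrary. With a fixed completions count, the Job finishes after that many Pods succeed, one per work item. With completions left unset and parallelism set, the Job counts as done once one Pod has exited successfully and all the others have terminated, which fits workers that exit when the queue is empty.

```yaml
# Queue with Pod Per Work Item: 8 items, processed 2 at a time
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-per-item              # hypothetical name
spec:
  completions: 8                      # one successful Pod per work item
  parallelism: 2
  template:
    spec:
      containers:
      - name: worker
        image: example.com/queue-worker:latest   # placeholder image
      restartPolicy: OnFailure
---
# Queue with Variable Pod Count: workers exit when the queue is empty
apiVersion: batch/v1
kind: Job
metadata:
  name: job-variable-pods             # hypothetical name
spec:
  parallelism: 2                      # completions deliberately left unset
  template:
    spec:
      containers:
      - name: worker
        image: example.com/queue-worker:latest   # placeholder image
      restartPolicy: OnFailure
```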

CronJob

A CronJob runs tasks on a schedule, similar to cron on Linux.

yaml
apiVersion: batch/v1beta1  # on Kubernetes 1.21+ use batch/v1; batch/v1beta1 was removed in 1.25
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

Run it and check the result:

bash
# Watch the CronJob
[root@master1 pod]# kubectl get cronjob hello -w
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        46s             95s
hello   */1 * * * *   False     1        3s              112s
hello   */1 * * * *   False     0        13s             2m2s
hello   */1 * * * *   False     1        3s              2m52s
hello   */1 * * * *   False     0        13s             3m2s
hello   */1 * * * *   False     1        3s              3m52s