[Produced by 星海] The Kubernetes Scheduler Leader

My Kubernetes notes have been growing, so this post covers the K8S-Scheduler-leader on its own.

The scheduler uses the watch mechanism to discover Pods that are newly created and not yet scheduled (unscheduled) onto any node.

Because the containers in a Pod, and the Pod itself, can have different requirements, the scheduler filters out every node that does not satisfy the Pod's specific scheduling needs.

It finds all schedulable nodes for the Pod in the cluster, scores those feasible nodes with a series of functions, and selects the highest-scoring node to run the Pod.

The scheduler then notifies kube-apiserver of this scheduling decision, a step called binding.
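The filter → score → bind flow described above can be sketched in plain Go. This is a simplified, dependency-free model: the `node` struct, the millicore values, and the "most spare CPU wins" scoring rule are illustrative stand-ins for the real scheduling framework's plugin stages, not the actual kube-scheduler API.

```go
package main

import "fmt"

// node is a simplified view of a cluster node: a name plus allocatable CPU in millicores.
type node struct {
	name           string
	allocatableCPU int64
}

// filter drops every node that cannot satisfy the Pod's CPU request
// (the "filter out unsuitable nodes" step).
func filter(nodes []node, requestCPU int64) []node {
	var feasible []node
	for _, n := range nodes {
		if n.allocatableCPU >= requestCPU {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// pickBest scores feasible nodes by spare CPU and returns the highest scorer;
// the real scheduler combines many weighted scoring plugins instead.
func pickBest(feasible []node, requestCPU int64) (string, bool) {
	if len(feasible) == 0 {
		return "", false // unschedulable: the Pod stays Pending
	}
	best := feasible[0]
	for _, n := range feasible[1:] {
		if n.allocatableCPU-requestCPU > best.allocatableCPU-requestCPU {
			best = n
		}
	}
	return best.name, true
}

func main() {
	nodes := []node{{"node-a", 500}, {"node-b", 2000}, {"node-c", 1000}}
	feasible := filter(nodes, 800) // the Pod requests 800m CPU; node-a is filtered out
	if name, ok := pickBest(feasible, 800); ok {
		// the winner would then be reported to kube-apiserver as a Binding
		fmt.Println("bind to", name) // prints "bind to node-b"
	}
}
```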

```bash
# Check that the scheduler is running
kubectl get pods -n kube-system | grep kube-scheduler
```

If scheduling fails, inspect the cause with kubectl describe pod.

The failed Pod's record is already stored in etcd, so it has to be removed with kubectl delete:

```bash
kubectl delete pod <pod-name>
```

Running as a static Pod

kube-scheduler runs as a static Pod from manifests under the /etc/kubernetes/manifests directory, for example kube-scheduler.yaml.

```yaml
# Note: v1beta1 is deprecated; recent clusters use kubescheduler.config.k8s.io/v1.
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/kubeconfig
  qps: 100
  burst: 150
profiles:
  - schedulerName: default-scheduler
    plugins:
      postFilter:
        disabled:
          - name: DefaultPreemption
      preFilter:
        enabled:
          - name: CheckCSIStorageCapacity
      filter:
        enabled:
          # CheckPodCountLimit, CheckPodLimitResources, and LvmVolumeCapacity
          # are custom plugins in this setup, not upstream built-ins.
          - name: CheckPodCountLimit
          - name: CheckPodLimitResources
          - name: CheckCSIStorageCapacity
          - name: LvmVolumeCapacity
    pluginConfig:
    - name: CheckPodCountLimit
      args:
        podCountLimit: 2
    - name: CheckPodLimitResources
      args:
        limitRatio:
          cpu: 0.7
          memory: 0.7
```
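The pluginConfig above caps each node at 2 Pods and at 70% of allocatable resources. Since CheckPodCountLimit and CheckPodLimitResources are custom plugins, here is a plain-Go sketch of the checks those arguments suggest; the struct fields, units, and function signatures are assumptions for illustration, not the plugins' real API.

```go
package main

import "fmt"

// nodeInfo is an assumed, simplified view of a node's occupancy.
type nodeInfo struct {
	podCount         int     // Pods already running on the node
	allocatableCPU   float64 // cores
	allocatableMemGi float64 // GiB
	limitsCPU        float64 // sum of container CPU limits already placed
	limitsMemGi      float64 // sum of container memory limits already placed
}

// checkPodCountLimit models the podCountLimit arg: admit the Pod only while
// the node holds fewer than podCountLimit Pods.
func checkPodCountLimit(n nodeInfo, podCountLimit int) bool {
	return n.podCount < podCountLimit
}

// checkPodLimitResources models the limitRatio arg: the summed limits after
// placement must stay within ratio × allocatable for both CPU and memory.
func checkPodLimitResources(n nodeInfo, podCPU, podMemGi, ratio float64) bool {
	return n.limitsCPU+podCPU <= ratio*n.allocatableCPU &&
		n.limitsMemGi+podMemGi <= ratio*n.allocatableMemGi
}

func main() {
	n := nodeInfo{podCount: 1, allocatableCPU: 4, allocatableMemGi: 8, limitsCPU: 2, limitsMemGi: 4}
	// With limitRatio 0.7, at most 0.7*4 = 2.8 cores of limits fit on this node.
	fmt.Println(checkPodCountLimit(n, 2))               // true: 1 < 2
	fmt.Println(checkPodLimitResources(n, 1, 1, 0.7))   // false: 2+1 cores exceeds 2.8
	fmt.Println(checkPodLimitResources(n, 0.5, 1, 0.7)) // true: 2.5 ≤ 2.8 and 5 ≤ 5.6
}
```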

The scheduler does not enable high availability by default; leader election has to be configured manually:

```bash
# Example startup command
kube-scheduler \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s
```

Writing your own Kubernetes extension scheduler

https://github.com/kubernetes/kubernetes/tree/master/plugin/pkg/scheduler

```go
package main

import (
    "context"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/klog/v2"
)

func main() {
    // 1. Configure the Kubernetes client (in-cluster mode).
    // Out-of-cluster alternative: clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
    config, err := rest.InClusterConfig()
    if err != nil {
        klog.Fatal(err)
    }
    clientset := kubernetes.NewForConfigOrDie(config)

    // 2. Run the scheduling logic (example: schedule by CPU resources).
    scheduler := NewCustomScheduler(clientset)
    scheduler.Schedule()
}

type CustomScheduler struct {
    clientset *kubernetes.Clientset
}

func NewCustomScheduler(clientset *kubernetes.Clientset) *CustomScheduler {
    return &CustomScheduler{clientset: clientset}
}

func (s *CustomScheduler) Schedule() {
    // 3. List unscheduled Pods (a production scheduler would use a watch/informer instead).
    podList, err := s.clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
        FieldSelector: "spec.nodeName=", // Pods not yet bound to any node
    })
    if err != nil {
        klog.Fatal(err)
    }

    for _, pod := range podList.Items {
        // Only handle Pods that explicitly request this scheduler.
        if pod.Spec.SchedulerName != "custom-scheduler" {
            continue
        }

        // 4. Filter the usable nodes.
        nodes, err := s.clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
        if err != nil {
            klog.Fatal(err)
        }

        var suitableNodes []v1.Node
        for _, node := range nodes.Items {
            if s.isNodeSuitable(pod, node) {
                suitableNodes = append(suitableNodes, node)
            }
        }

        // 5. Choose a node (example: take the first match; a real scheduler would score them).
        if len(suitableNodes) > 0 {
            selectedNode := suitableNodes[0]
            s.bindPodToNode(pod, selectedNode)
        }
    }
}

func (s *CustomScheduler) isNodeSuitable(pod v1.Pod, node v1.Node) bool {
    // Example: check whether the node's allocatable CPU covers the Pod's request.
    cpuRequest := pod.Spec.Containers[0].Resources.Requests.Cpu()
    if cpuRequest.IsZero() {
        return true // no CPU request, allow by default
    }

    nodeCPU := node.Status.Allocatable.Cpu()
    return nodeCPU.Cmp(*cpuRequest) >= 0
}

func (s *CustomScheduler) bindPodToNode(pod v1.Pod, node v1.Node) {
    // 6. Bind the Pod to the selected node via the binding subresource.
    binding := &v1.Binding{
        ObjectMeta: metav1.ObjectMeta{
            Name:      pod.Name,
            Namespace: pod.Namespace,
            UID:       pod.UID,
        },
        Target: v1.ObjectReference{
            APIVersion: "v1",
            Kind:       "Node",
            Name:       node.Name,
        },
    }

    err := s.clientset.CoreV1().Pods(pod.Namespace).Bind(context.TODO(), binding, metav1.CreateOptions{})
    if err != nil {
        klog.Errorf("Failed to bind Pod %s to Node %s: %v", pod.Name, node.Name, err)
    } else {
        klog.Infof("Pod %s bound to Node %s", pod.Name, node.Name)
    }
}
```
```dockerfile
# Dockerfile
FROM golang:1.20 AS builder
WORKDIR /app
COPY . .
RUN go mod tidy
RUN CGO_ENABLED=0 GOOS=linux go build -o custom-scheduler

FROM alpine:latest
COPY --from=builder /app/custom-scheduler /usr/local/bin/
ENTRYPOINT ["custom-scheduler"]
```

https://www.qikqiak.com/k8strain/scheduler/overview/

Build the image and push it to your registry:

```bash
docker build -t your-dockerhub-id/custom-scheduler:v1 .
docker push your-dockerhub-id/custom-scheduler:v1
```

Deploy multiple replicas (note that the program itself must implement leader election for the --leader-elect flags to take effect):

```yaml
# custom-scheduler-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-scheduler
  namespace: kube-system # run alongside the other control-plane components
spec:
  replicas: 3 # run 3 instances
  selector:
    matchLabels:
      app: custom-scheduler
  template:
    metadata:
      labels:
        app: custom-scheduler
    spec:
      serviceAccountName: custom-scheduler # the ServiceAccount from the RBAC manifest
      containers:
      - name: scheduler
        image: your-dockerhub-id/custom-scheduler:v1
        args:
        - --leader-elect=true # enable leader election
        - --leader-elect-lease-duration=15s
        - --leader-elect-renew-deadline=10s
        - --leader-elect-retry-period=2s
        - --v=2 # log verbosity
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: "0.5"
            memory: 256Mi
```

Create a ServiceAccount for the scheduler and bind it to a ClusterRole:

```yaml
# custom-scheduler-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-scheduler
  namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin # convenient for a demo; scope this down for production
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system
```

Register the scheduler as a Kubernetes component:

```yaml
# custom-scheduler-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta3
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: true
      resourceLock: leases
      resourceName: custom-scheduler
      resourceNamespace: kube-system
```
```bash
kubectl get pods -n kube-system | grep custom-scheduler
```

Create a Pod that uses the custom scheduler

```yaml
# custom-scheduled-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: custom-scheduler # use the custom scheduler
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "500m"
```

Verify

```bash
kubectl get pod custom-scheduled-pod -o wide
```

Integrating monitoring
