【星海出品】K8s Scheduler Leader

My Kubernetes notes keep growing, so here is a standalone write-up of the K8s scheduler leader.

The scheduler uses the watch mechanism to discover Pods that are newly created and not yet scheduled (unscheduled) onto a node.

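A minimal client-go sketch of this discovery step (in-cluster configuration is assumed, and a production scheduler would use an informer rather than a raw watch):

```go
package main

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/klog/v2"
)

func main() {
    config, err := rest.InClusterConfig()
    if err != nil {
        klog.Fatal(err)
    }
    clientset := kubernetes.NewForConfigOrDie(config)

    // Watch all namespaces for Pods that have no node assigned yet.
    w, err := clientset.CoreV1().Pods(metav1.NamespaceAll).Watch(context.TODO(), metav1.ListOptions{
        FieldSelector: "spec.nodeName=", // empty spec.nodeName means unscheduled
    })
    if err != nil {
        klog.Fatal(err)
    }
    defer w.Stop()

    for event := range w.ResultChan() {
        if pod, ok := event.Object.(*v1.Pod); ok {
            fmt.Printf("%s: %s/%s awaiting scheduling\n", event.Type, pod.Namespace, pod.Name)
        }
    }
}
```
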
Because the containers in a Pod, and the Pod itself, can have different requirements, the scheduler filters out any node that does not satisfy the Pod's specific scheduling needs.

It finds all feasible (schedulable) nodes for the Pod in the cluster, scores those nodes with a series of functions, and picks the highest-scoring node to run the Pod.

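To illustrate the scoring step, here is a toy function that ranks the feasible nodes by allocatable CPU and picks the winner; the real scheduler combines many weighted scoring plugins, so this heuristic is only an assumption for the sketch:

```go
package scheduler

import (
    v1 "k8s.io/api/core/v1"
)

// pickBestNode stands in for the scheduler's scoring phase: every feasible
// node gets a score (here: allocatable CPU in millicores) and the highest
// scorer wins.
func pickBestNode(feasible []v1.Node) *v1.Node {
    var best *v1.Node
    bestScore := int64(-1)
    for i := range feasible {
        score := feasible[i].Status.Allocatable.Cpu().MilliValue()
        if score > bestScore {
            best, bestScore = &feasible[i], score
        }
    }
    return best // nil when filtering left no feasible node
}
```
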
The scheduler then notifies kube-apiserver of this decision; that step is called binding.

```bash
# Check that the scheduler is healthy
kubectl get pods -n kube-system | grep kube-scheduler
```

If scheduling fails, the reason can be inspected with kubectl describe pod.

The Pod's record is already stored in etcd, so it has to be removed explicitly with a delete:

```bash
kubectl delete pod <pod-name>
```

Running as a static Pod

kube-scheduler runs as a static Pod whose manifest lives in the /etc/kubernetes/manifests directory, e.g. kube-scheduler.yaml. Note that the file below is not that manifest but a KubeSchedulerConfiguration passed to the scheduler via its --config flag; the filter plugins it enables (CheckPodCountLimit, CheckPodLimitResources, CheckCSIStorageCapacity, LvmVolumeCapacity) are names of custom out-of-tree plugins, not built-ins.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/kubeconfig
  qps: 100
  burst: 150
profiles:
  - schedulerName: default-scheduler
    plugins:
      postFilter:
        disabled:
          - name: DefaultPreemption
      preFilter:
        enabled:
          - name: CheckCSIStorageCapacity
      filter:
        enabled:
          - name: CheckPodCountLimit
          - name: CheckPodLimitResources
          - name: CheckCSIStorageCapacity
          - name: LvmVolumeCapacity
    pluginConfig:
    - name: CheckPodCountLimit
      args:
        podCountLimit: 2
    - name: CheckPodLimitResources
      args:
        limitRatio:
          cpu: 0.7
          memory: 0.7
```

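Filter plugins like the ones enabled above are normally written against the scheduler framework. Here is a sketch of what a CheckPodCountLimit plugin could look like, assuming the interfaces from k8s.io/kubernetes/pkg/scheduler/framework and a hard-coded limit instead of real plugin-args parsing (registration via app.WithPlugin when building the scheduler binary is not shown):

```go
package plugins

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/kubernetes/pkg/scheduler/framework"
)

const maxPods = 2 // mirrors podCountLimit: 2 from the config above

// CheckPodCountLimit rejects nodes that already run too many Pods.
// This is a plausible sketch, not the actual plugin behind the name.
type CheckPodCountLimit struct{}

var _ framework.FilterPlugin = &CheckPodCountLimit{}

func (p *CheckPodCountLimit) Name() string { return "CheckPodCountLimit" }

func (p *CheckPodCountLimit) Filter(ctx context.Context, state *framework.CycleState,
    pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
    if len(nodeInfo.Pods) >= maxPods {
        return framework.NewStatus(framework.Unschedulable,
            fmt.Sprintf("node %s already runs %d pods", nodeInfo.Node().Name, len(nodeInfo.Pods)))
    }
    return nil // nil status means the node passes this filter
}
```
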
High availability for the scheduler is built on leader election: several replicas run, but only the elected leader actively schedules. Make sure it is enabled and tuned:

```bash
# Example startup flags
kube-scheduler \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s
```

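The same election can be implemented inside a custom scheduler with client-go's leaderelection package. A minimal sketch; the lease name custom-scheduler and the kube-system namespace are assumptions that match the deployment shown later:

```go
package main

import (
    "context"
    "os"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/leaderelection"
    "k8s.io/client-go/tools/leaderelection/resourcelock"
    "k8s.io/klog/v2"
)

func main() {
    config, err := rest.InClusterConfig()
    if err != nil {
        klog.Fatal(err)
    }
    clientset := kubernetes.NewForConfigOrDie(config)

    id, _ := os.Hostname() // each replica needs a unique identity

    lock := &resourcelock.LeaseLock{
        LeaseMeta: metav1.ObjectMeta{
            Name:      "custom-scheduler", // assumed lease name
            Namespace: "kube-system",
        },
        Client:     clientset.CoordinationV1(),
        LockConfig: resourcelock.ResourceLockConfig{Identity: id},
    }

    leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
        Lock:          lock,
        LeaseDuration: 15 * time.Second, // cf. --leader-elect-lease-duration
        RenewDeadline: 10 * time.Second, // cf. --leader-elect-renew-deadline
        RetryPeriod:   2 * time.Second,  // cf. --leader-elect-retry-period
        Callbacks: leaderelection.LeaderCallbacks{
            OnStartedLeading: func(ctx context.Context) {
                klog.Info("became leader, starting the scheduling loop")
                // run the scheduling loop here
            },
            OnStoppedLeading: func() {
                klog.Info("lost leadership, exiting")
                os.Exit(0)
            },
        },
    })
}
```

Only the replica that holds the Lease schedules; the others block in RunOrDie until the leader fails to renew within the renew deadline.
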
Writing your own Kubernetes extension scheduler

https://github.com/kubernetes/kubernetes/tree/master/plugin/pkg/scheduler

```go
package main

import (
    "context"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    // "k8s.io/client-go/tools/clientcmd" // only needed for out-of-cluster mode
    "k8s.io/klog/v2"
)

func main() {
    // 1. Configure the Kubernetes client
    config, err := rest.InClusterConfig() // in-cluster mode
    // config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // out-of-cluster mode (uncomment the clientcmd import)
    if err != nil {
        klog.Fatal(err)
    }
    clientset := kubernetes.NewForConfigOrDie(config)
 
    // 2. Run the scheduling logic (example: schedule by CPU resources)
    scheduler := NewCustomScheduler(clientset)
    scheduler.Schedule()
}
 
type CustomScheduler struct {
    clientset *kubernetes.Clientset
}
 
func NewCustomScheduler(clientset *kubernetes.Clientset) *CustomScheduler {
    return &CustomScheduler{clientset: clientset}
}
 
func (s *CustomScheduler) Schedule() {
    // 3. List Pods awaiting scheduling (a real scheduler would watch instead of polling)
    podList, err := s.clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
        FieldSelector: "spec.nodeName=", // Pods with no node assigned yet
    })
    if err != nil {
        klog.Fatal(err)
    }
 
    for _, pod := range podList.Items {
        // 4. Filter feasible nodes
        nodes, err := s.clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
        if err != nil {
            klog.Fatal(err)
        }
 
        var suitableNodes []v1.Node
        for _, node := range nodes.Items {
            if s.isNodeSuitable(pod, node) {
                suitableNodes = append(suitableNodes, node)
            }
        }
 
        // 5. Pick a node (example: take the first feasible one)
        if len(suitableNodes) > 0 {
            selectedNode := suitableNodes[0] // a real scheduler would score and rank here
            s.bindPodToNode(pod, selectedNode)
        }
    }
}
 
func (s *CustomScheduler) isNodeSuitable(pod v1.Pod, node v1.Node) bool {
    // Example: check whether the node's allocatable CPU covers the Pod's request.
    // For simplicity only the first container is considered.
    cpuRequest := pod.Spec.Containers[0].Resources.Requests.Cpu()
    if cpuRequest.IsZero() {
        return true // no CPU request, allow by default
    }

    nodeCPU := node.Status.Allocatable.Cpu()
    return nodeCPU.Cmp(*cpuRequest) >= 0
}
 
func (s *CustomScheduler) bindPodToNode(pod v1.Pod, node v1.Node) {
    // 6. Bind the Pod to the selected node
    binding := &v1.Binding{
        ObjectMeta: metav1.ObjectMeta{
            Name:      pod.Name,
            Namespace: pod.Namespace,
        },
        Target: v1.ObjectReference{
            APIVersion: "v1",
            Kind:       "Node",
            Name:       node.Name,
        },
    }
 
    err := s.clientset.CoreV1().Pods(pod.Namespace).Bind(context.TODO(), binding, metav1.CreateOptions{})
    if err != nil {
        klog.Errorf("Failed to bind Pod %s to Node %s: %v", pod.Name, node.Name, err)
    } else {
        klog.Infof("Pod %s bound to Node %s", pod.Name, node.Name)
    }
}
```

```bash
# Dockerfile
FROM golang:1.20 AS builder
WORKDIR /app
COPY . .
RUN go mod tidy
RUN CGO_ENABLED=0 GOOS=linux go build -o custom-scheduler
 
FROM alpine:latest
COPY --from=builder /app/custom-scheduler /usr/local/bin/
ENTRYPOINT ["custom-scheduler"]
```

Reference: https://www.qikqiak.com/k8strain/scheduler/overview/

Build the image and push it to a registry:

```bash
docker build -t your-dockerhub-id/custom-scheduler:v1 .
docker push your-dockerhub-id/custom-scheduler:v1
```

Deploying multiple replicas

```yaml
# custom-scheduler-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-scheduler
  namespace: kube-system # same namespace as the ServiceAccount below
spec:
  replicas: 3 # run 3 instances
  selector:
    matchLabels:
      app: custom-scheduler
  template:
    metadata:
      labels:
        app: custom-scheduler
    spec:
      serviceAccountName: custom-scheduler # created in the RBAC step below
      containers:
      - name: scheduler
        image: your-dockerhub-id/custom-scheduler:v1
        args:
        - --leader-elect=true # enable leader election
        - --leader-elect-lease-duration=15s
        - --leader-elect-renew-deadline=10s
        - --leader-elect-retry-period=2s
        - --v=2 # log verbosity
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: "0.5"
            memory: 256Mi
```

Create a ServiceAccount for the scheduler and bind it to a ClusterRole (cluster-admin is used here for brevity; in production prefer a least-privilege role such as the built-in system:kube-scheduler):

```yaml
# custom-scheduler-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-scheduler
  namespace: kube-system
 
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system
```

Register the scheduler's configuration as a Kubernetes component config. Note that for this ConfigMap to take effect, it must be mounted into the scheduler Pod and passed to the binary via a --config flag:

```yaml
# custom-scheduler-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta3
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: true
      resourceLock: leases
      resourceName: custom-scheduler
      resourceNamespace: kube-system
```

Verify that the scheduler replicas are running:

```bash
kubectl get pods -n kube-system | grep custom-scheduler
```

Create a Pod that uses the custom scheduler

```yaml
# custom-scheduled-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: custom-scheduler # use the custom scheduler
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "500m"
```

Check the scheduling result; the NODE column shows the node the Pod was bound to:

```bash
kubectl get pod custom-scheduled-pod -o wide
```

Integrating monitoring

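A minimal sketch of exposing Prometheus metrics from the custom scheduler; the metric name, the result label, and port 10251 are illustrative assumptions, and github.com/prometheus/client_golang is assumed as the metrics library:

```go
package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// scheduleAttempts counts scheduling attempts by outcome; the scheduling loop
// would call scheduleAttempts.WithLabelValues("bound").Inc() after each bind.
// The metric name is a hypothetical example.
var scheduleAttempts = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "custom_scheduler_schedule_attempts_total",
        Help: "Scheduling attempts by result.",
    },
    []string{"result"},
)

func main() {
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":10251", nil))
}
```

A Prometheus scrape config or ServiceMonitor pointed at this port completes the integration.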