The K8s material here keeps growing, so this is a standalone write-up of the K8s scheduler and its leader election (K8S-Scheduler-leader).
The scheduler uses the Watch mechanism to discover Pods in the cluster that are newly created and not yet scheduled (unscheduled) to any node.
Because the containers in a Pod, and the Pod itself, may have different requirements, the scheduler filters out any node that does not satisfy the Pod's specific scheduling needs.
It finds all feasible nodes for a Pod in the cluster, scores those feasible nodes with a series of scoring functions, and selects the highest-scoring node to run the Pod.
The scheduler then notifies kube-apiserver of this scheduling decision; that step is called binding.
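As a small illustration of the Watch step above, here is a minimal client-go sketch (not kube-scheduler's own code) that watches for Pods whose `spec.nodeName` is still empty, i.e. Pods that have not been assigned to a node yet:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/klog/v2"
)

func main() {
	// Assumes the program runs inside the cluster.
	config, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Watch Pods in all namespaces that have no node assigned yet.
	watcher, err := clientset.CoreV1().Pods("").Watch(context.TODO(), metav1.ListOptions{
		FieldSelector: "spec.nodeName=",
	})
	if err != nil {
		klog.Fatal(err)
	}
	defer watcher.Stop()

	for event := range watcher.ResultChan() {
		klog.Infof("unscheduled pod event: %s %v", event.Type, event.Object)
	}
}
```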
```bash
# Check that the scheduler is running
kubectl get pods -n kube-system | grep kube-scheduler
```
If scheduling fails, inspect the Pod with kubectl describe pod.
The Pod's basic information is already stored in etcd at that point, so it has to be removed explicitly with delete:
```bash
kubectl delete pod <pod-name>
```
The scheduler runs as a static Pod: its manifest lives under the /etc/kubernetes/manifests directory, for example kube-scheduler.yaml. Its behaviour is described by a KubeSchedulerConfiguration such as:
```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/kubeconfig
  qps: 100
  burst: 150
profiles:
  - schedulerName: default-scheduler
    plugins:
      postFilter:
        disabled:
          - name: DefaultPreemption
      preFilter:
        enabled:
          - name: CheckCSIStorageCapacity
      filter:
        enabled:
          - name: CheckPodCountLimit
          - name: CheckPodLimitResources
          - name: CheckCSIStorageCapacity
          - name: LvmVolumeCapacity
    pluginConfig:
      - name: CheckPodCountLimit
        args:
          podCountLimit: 2
      - name: CheckPodLimitResources
        args:
          limitRatio:
            cpu: 0.7
            memory: 0.7
```
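The plugins named above such as CheckPodCountLimit are not built into Kubernetes; they would be implemented against the scheduler framework. Below is a rough sketch of what a Filter plugin like CheckPodCountLimit might look like, not its actual source: the interfaces come from k8s.io/kubernetes/pkg/scheduler/framework, their exact signatures differ between Kubernetes releases, and the hard-coded podCountLimit stands in for the pluginConfig args shown above.

```go
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/kubernetes/cmd/kube-scheduler/app"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

const Name = "CheckPodCountLimit"

// Hard-coded for illustration; a real plugin would decode this from pluginConfig.
const podCountLimit = 2

type CheckPodCountLimit struct{}

var _ framework.FilterPlugin = &CheckPodCountLimit{}

func (p *CheckPodCountLimit) Name() string { return Name }

// Filter rejects nodes that already run podCountLimit or more Pods.
func (p *CheckPodCountLimit) Filter(ctx context.Context, state *framework.CycleState,
	pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	if len(nodeInfo.Pods) >= podCountLimit {
		return framework.NewStatus(framework.Unschedulable,
			fmt.Sprintf("node %s already runs %d pods", nodeInfo.Node().Name, len(nodeInfo.Pods)))
	}
	return nil
}

// New is the plugin factory handed to the framework (pre-1.27 factory signature).
func New(_ runtime.Object, _ framework.Handle) (framework.Plugin, error) {
	return &CheckPodCountLimit{}, nil
}

func main() {
	// Reuse the upstream kube-scheduler command and register the plugin,
	// so a KubeSchedulerConfiguration like the one above can enable it by name.
	command := app.NewSchedulerCommand(app.WithPlugin(Name, New))
	if err := command.Execute(); err != nil {
		panic(err)
	}
}
```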
High availability does not come for free: running multiple scheduler instances and having them elect a leader has to be set up explicitly.
```bash
# Example startup command
kube-scheduler \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s
```
Writing your own K8s extension scheduler
https://github.com/kubernetes/kubernetes/tree/master/plugin/pkg/scheduler
```go
package main

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/klog/v2"
)

func main() {
	// 1. Configure the Kubernetes client
	config, err := rest.InClusterConfig() // in-cluster mode
	// config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // out-of-cluster mode (requires importing k8s.io/client-go/tools/clientcmd)
	if err != nil {
		klog.Fatal(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// 2. Define the scheduling logic (example: schedule by CPU resources)
	scheduler := NewCustomScheduler(clientset)
	scheduler.Schedule()
}

type CustomScheduler struct {
	clientset *kubernetes.Clientset
}

func NewCustomScheduler(clientset *kubernetes.Clientset) *CustomScheduler {
	return &CustomScheduler{clientset: clientset}
}

func (s *CustomScheduler) Schedule() {
	// 3. Find Pods that are waiting to be scheduled
	podList, err := s.clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "spec.nodeName=", // Pods not yet assigned to a node
	})
	if err != nil {
		klog.Fatal(err)
	}
	for _, pod := range podList.Items {
		// 4. Filter the candidate nodes
		nodes, err := s.clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			klog.Fatal(err)
		}
		var suitableNodes []v1.Node
		for _, node := range nodes.Items {
			if s.isNodeSuitable(pod, node) {
				suitableNodes = append(suitableNodes, node)
			}
		}
		// 5. Pick the "best" node (example: simply take the first one)
		if len(suitableNodes) > 0 {
			selectedNode := suitableNodes[0] // a real scheduler needs a proper scoring step here
			s.bindPodToNode(pod, selectedNode)
		}
	}
}

func (s *CustomScheduler) isNodeSuitable(pod v1.Pod, node v1.Node) bool {
	// Example: check whether the node's allocatable CPU covers the Pod's CPU request
	cpuRequest := pod.Spec.Containers[0].Resources.Requests.Cpu()
	if cpuRequest.IsZero() {
		return true // no CPU request, allow by default
	}
	nodeCPU := node.Status.Allocatable.Cpu()
	return nodeCPU.Cmp(*cpuRequest) >= 0
}

func (s *CustomScheduler) bindPodToNode(pod v1.Pod, node v1.Node) {
	// 6. Bind the Pod to the selected node
	binding := &v1.Binding{
		ObjectMeta: metav1.ObjectMeta{
			Name:      pod.Name,
			Namespace: pod.Namespace,
		},
		Target: v1.ObjectReference{
			APIVersion: "v1",
			Kind:       "Node",
			Name:       node.Name,
		},
	}
	err := s.clientset.CoreV1().Pods(pod.Namespace).Bind(context.TODO(), binding, metav1.CreateOptions{})
	if err != nil {
		klog.Errorf("Failed to bind Pod %s to Node %s: %v", pod.Name, node.Name, err)
	} else {
		klog.Infof("Pod %s bound to Node %s", pod.Name, node.Name)
	}
}
```
```dockerfile
# Dockerfile
FROM golang:1.20 AS builder
WORKDIR /app
COPY . .
RUN go mod tidy
RUN CGO_ENABLED=0 GOOS=linux go build -o custom-scheduler

FROM alpine:latest
COPY --from=builder /app/custom-scheduler /usr/local/bin/
ENTRYPOINT ["custom-scheduler"]
```
https://www.qikqiak.com/k8strain/scheduler/overview/
Build the image and push it to an image registry:
```bash
docker build -t your-dockerhub-id/custom-scheduler:v1 .
docker push your-dockerhub-id/custom-scheduler:v1
```
Deploying multiple replicas
```yaml
# custom-scheduler-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-scheduler
  namespace: kube-system
spec:
  replicas: 3                           # run 3 instances
  selector:
    matchLabels:
      app: custom-scheduler
  template:
    metadata:
      labels:
        app: custom-scheduler
    spec:
      serviceAccountName: custom-scheduler   # created by the RBAC manifest below
      containers:
        - name: scheduler
          image: your-dockerhub-id/custom-scheduler:v1
          args:
            - --leader-elect=true             # enable leader election
            - --leader-elect-lease-duration=15s
            - --leader-elect-renew-deadline=10s
            - --leader-elect-retry-period=2s
            - --v=2                           # log level
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
            requests:
              cpu: "0.5"
              memory: 256Mi
```
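Note that the minimal scheduler sketched above does not actually parse these --leader-elect flags, so with replicas: 3 every instance would try to schedule Pods. One way to keep a single replica active is client-go's leaderelection package with a Lease lock. The following is an illustrative sketch of how the earlier main() could be reworked; the lock name, namespace, and identity handling are assumptions chosen to match the deployment above, and CustomScheduler/NewCustomScheduler are the types from the earlier example.

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Use the hostname (the Pod name inside a Deployment) as this replica's identity.
	id, _ := os.Hostname()

	// The Lease object all replicas compete for; name/namespace are assumptions.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "custom-scheduler",
			Namespace: "kube-system",
		},
		Client:     clientset.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second, // mirrors --leader-elect-lease-duration
		RenewDeadline:   10 * time.Second, // mirrors --leader-elect-renew-deadline
		RetryPeriod:     2 * time.Second,  // mirrors --leader-elect-retry-period
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the elected leader runs the scheduling loop from the earlier sketch.
				NewCustomScheduler(clientset).Schedule()
			},
			OnStoppedLeading: func() {
				klog.Info("lost leadership, exiting")
				os.Exit(0)
			},
		},
	})
}
```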
Create a ServiceAccount for the scheduler and bind it to a ClusterRole:
```yaml
# custom-scheduler-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: custom-scheduler
    namespace: kube-system
```
Register the scheduler as a K8s component
```yaml
# custom-scheduler-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta3
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: true
      resourceLock: leases
      resourceName: custom-scheduler
      resourceNamespace: kube-system
```
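This KubeSchedulerConfiguration only takes effect for a scheduler built on the kube-scheduler framework and started with --config pointing at the mounted file (for example --config=/etc/kubernetes/scheduler-config.yaml; the mount path here is an assumption). The minimal hand-rolled scheduler above does not read it. Then check that the scheduler Pods are up: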
```bash
kubectl get pods -n kube-system | grep custom-scheduler
```
Create a Pod that uses the custom scheduler
```yaml
# custom-scheduled-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: custom-scheduler   # schedule this Pod with the custom scheduler
  containers:
    - name: nginx
      image: nginx
      resources:
        requests:
          cpu: "500m"
```
Verify:
```bash
kubectl get pod custom-scheduled-pod -o wide
```