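Based on Linux 6.18.26, with a line-by-line walk through the kernel source
Series:
- Sched_ext Callbacks in Depth (1): Overview of the sched_ext Framework (Preface)
- Sched_ext Callbacks in Depth (2): init_task, the First Gate Every Task Passes on Its Way into the Scheduler (6.18.26)
- Sched_ext Callbacks in Depth (3): enable, the Moment the Scheduler Takes Over a Task (6.18.26)
- Sched_ext Callbacks in Depth (4): select_cpu, Choosing a CPU When a Task Wakes Up (6.18.26)
- Sched_ext Callbacks in Depth (5): runnable, the Sentinel of Task State Transitions (6.18.26)
- Sched_ext Callbacks in Depth (6): enqueue, Queuing a Task at the Scheduler's Core Decision Point (6.18.26)
- Sched_ext Callbacks in Depth (7): dispatch, Moving Tasks from the Queues to a CPU (6.18.26)
- Sched_ext Callbacks in Depth (8): running, a Task Starts Executing (6.18.26)
For the framework background and an overview of the callbacks, see the preface; this article focuses on the running callback.
1. What running is
running is one of the hooks defined in struct sched_ext_ops (kernel/sched/ext_internal.h:387):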
c
/**
* @running: A task is starting to run on its associated CPU
* @p: task starting to run
*
* Note that this callback may be called from a CPU other than the
* one the task is going to run on. This can happen when a task
* property is changed (i.e., affinity), since scx_next_task_scx(),
* which triggers this callback, may run on a CPU different from
* the task's assigned CPU.
*
* Therefore, always use scx_bpf_task_cpu(@p) to determine the
* target CPU the task is going to use.
*
* See ->runnable() for explanation on the task state notifiers.
*/
void (*running)(struct task_struct *p);
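Its job is clear-cut: tell the BPF scheduler that a task is about to start executing on a CPU.
Unlike init_task (registration) and enable (going on duty), which were analyzed in earlier installments, running is not a one-shot event: it is called every time a task is scheduled onto a CPU to run. It pairs with stopping: running marks the start of an execution period and stopping marks its end.
For the full definition of ext_sched_class and the places where running/stopping are hooked in, see section 6 of the preface.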
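To make the running/stopping pairing concrete, here is a minimal user-space sketch of the kind of bookkeeping a scheduler could hang on the two notifications. It is not BPF code and not taken from the kernel: struct task_acct, acct_running() and acct_stopping() are hypothetical names, and the timestamps are passed in by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task bookkeeping a scheduler might keep; not a kernel type. */
struct task_acct {
	uint64_t started_at;  /* timestamp of the last running notification */
	uint64_t on_cpu_ns;   /* total time accumulated between running and stopping */
	uint32_t nr_runs;     /* how many times the task started executing */
	bool     on_cpu;      /* true between running and stopping */
};

/* Models what a running handler could do: open a new execution period. */
static void acct_running(struct task_acct *t, uint64_t now)
{
	assert(!t->on_cpu);   /* running must not nest without a stopping in between */
	t->started_at = now;
	t->nr_runs++;
	t->on_cpu = true;
}

/* Models what a stopping handler could do: close the period and account for it. */
static void acct_stopping(struct task_acct *t, uint64_t now)
{
	assert(t->on_cpu);
	t->on_cpu_ns += now - t->started_at;
	t->on_cpu = false;
}
```

Because every running is matched by a later stopping, the accumulated on-CPU time is simply the sum of the closed periods.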
2. The complete call chain of running
The running callback fires on the kernel's core scheduling path. The full call chain from __schedule() down to running is:
__schedule()                                        // kernel/sched/core.c:6789
 └─ pick_next_task(rq, rq->donor, &rf)              // kernel/sched/core.c:6875
     └─ __pick_next_task(rq, prev, rf)              // kernel/sched/core.c:5955
         ├─ class->pick_task(rq)                    // pick_task_scx for ext_sched_class
         └─ put_prev_set_next_task(rq, prev, next)  // kernel/sched/sched.h:2499
             ├─ prev->sched_class->put_prev_task()  // → put_prev_task_scx → stopping callback
             └─ next->sched_class->set_next_task()  // → set_next_task_scx → running callback
The overview below traces the complete path from a scheduling trigger to the running callback:
Triggers: timer interrupt / voluntary yield / wakeup preemption
 └─ __schedule()
     └─ pick_next_task()
         └─ __pick_next_task()
             ├─ prev_balance()
             ├─ for_each_active_class()
             │   └─ pick_task_scx(): picks next from the local DSQ
             └─ put_prev_set_next_task()
                 ├─ put_prev_task_scx() → ops.stopping(prev)
                 └─ set_next_task_scx() → ops.running(next)
Notes:
- balance_scx() is responsible for moving tasks from the global DSQ into the local DSQ.
- pick_task_scx() only picks from the local DSQ; it does not access the global DSQ directly.
The following sections examine the key logic of each layer in turn.
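3. __schedule: the main entry point of scheduling (layer 1)
__schedule() is the kernel's core scheduling function (kernel/sched/core.c:6789); every scheduling switch goes through it. Only the parts relevant to running are shown here: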
c
static void __sched notrace __schedule(int sched_mode)
{
	struct task_struct *prev, *next;
	// ...
	cpu = smp_processor_id();
	rq = cpu_rq(cpu);
	prev = rq->curr;                           // the task currently running
	rq_lock(rq, &rf);
	smp_mb__after_spinlock();
	update_rq_clock(rq);
	// ... handle prev's state (blocking, preemption, etc.) ...
pick_again:
	next = pick_next_task(rq, rq->donor, &rf); // ← pick the next task to run
	rq_set_donor(rq, next);
	// ...
	if (likely(is_switch)) {
		RCU_INIT_POINTER(rq->curr, next);  // ← switch the curr pointer
		// ... the context switch follows ...
	}
}
Note that 6.18.26 passes rq->donor to pick_next_task (earlier versions passed rq->curr); the donor concept was introduced by the proxy execution mechanism.
The core decision path of __schedule, as a flow outline:
__schedule() entry
 ├─ cpu = smp_processor_id(); rq = cpu_rq(cpu); prev = rq->curr
 ├─ rq_lock(rq, &rf); update_rq_clock(rq)
 ├─ check prev's state
 │   ├─ SM_IDLE mode: rq->nr_running == 0 and !scx_enabled()?
 │   │   ├─ yes: next = prev, goto picked
 │   │   └─ no: continue to pick_again
 │   ├─ !preempt and prev_state: try_to_block_task()
 │   └─ preempt, or prev is still running: continue to pick_again
 ├─ pick_again: next = pick_next_task()
 └─ prev != next?
     ├─ yes: RCU_INIT_POINTER(rq->curr, next), then the context switch
     └─ no: return directly
4. __pick_next_task: picking the next task (layer 2)
__pick_next_task() picks the next task to run from among all scheduling classes (kernel/sched/core.c:5955):
c
static inline struct task_struct *
__pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
{
	const struct sched_class *class;
	struct task_struct *p;
	rq->dl_server = NULL;
	if (scx_enabled())
		goto restart;                  // ← with scx enabled, skip the CFS fast path
	// CFS fast-path optimization (skipped when scx is enabled) ...
restart:
	prev_balance(rq, prev, rf);            // ← calls the scheduling classes' balance()
	for_each_active_class(class) {
		if (class->pick_next_task) {
			p = class->pick_next_task(rq, prev);
			if (p)
				return p;
		} else {
			p = class->pick_task(rq);  // ← for ext_sched_class this calls pick_task_scx()
			if (p) {
				put_prev_set_next_task(rq, prev, p);  // ← key step: completes the prev/next switch
				return p;
			}
		}
	}
	BUG(); /* The idle class should always have a runnable task. */
}
When scx_enabled() is true, the CFS fast path is skipped and execution jumps straight to the restart label. This means that even if most tasks in the system belong to the fair class, once an scx scheduler is loaded every pick has to walk the full scheduling-class iteration.
The decision logic of __pick_next_task, as a flow outline:
__pick_next_task()
 └─ scx_enabled()?
     ├─ yes: goto restart
     └─ no: are all runnable tasks fair-class tasks?
         ├─ no: goto restart
         └─ yes: pick_next_task_fair()
             ├─ RETRY_TASK: goto restart
             ├─ task picked: return p
             └─ nothing picked: pick_task_idle(), put_prev_set_next_task(), return p
restart:
 ├─ prev_balance()
 └─ for_each_active_class(), highest priority first: dl_sched_class, rt_sched_class, fair_sched_class, ext_sched_class, idle_sched_class
     ├─ class provides pick_next_task: call it and return p if a task was picked
     └─ class provides only pick_task (ext_sched_class: pick_task_scx()): task picked?
         ├─ yes: put_prev_set_next_task(), return p
         └─ no: move on to the next class
Inside the for_each_active_class loop, ext_sched_class has no pick_next_task method (only pick_task), so it takes the else branch:
- call pick_task_scx() to pick the next task
- call put_prev_set_next_task() to complete the prev/next switch
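The two branches of the loop can be modelled in plain user-space C. This is only a sketch under stated assumptions: struct toy_class, toy_pick() and the pick_* helpers are hypothetical stand-ins, and the class list is supplied by the caller instead of coming from the kernel's linker-ordered class table.

```c
#include <assert.h>
#include <stddef.h>

struct toy_task { const char *name; };

/* Toy stand-in for struct sched_class: only the two pick hooks matter here. */
struct toy_class {
	const char *name;
	struct toy_task *(*pick_next_task)(void);  /* picks and switches in one step */
	struct toy_task *(*pick_task)(void);       /* only picks; the caller switches */
};

static int nr_switches;  /* counts calls to the toy prev/next switch step */

static void toy_switch(struct toy_task *prev, struct toy_task *next)
{
	(void)prev;
	(void)next;
	nr_switches++;
}

/* Walk the classes from highest to lowest priority, mirroring the two branches. */
static struct toy_task *toy_pick(const struct toy_class *classes, size_t n,
				 struct toy_task *prev)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_task *p;

		if (classes[i].pick_next_task) {
			p = classes[i].pick_next_task();
			if (p)
				return p;
		} else {
			assert(classes[i].pick_task);
			p = classes[i].pick_task();
			if (p) {
				toy_switch(prev, p);  /* the caller performs the switch */
				return p;
			}
		}
	}
	return NULL;  /* the kernel calls BUG() here: idle always has a task */
}

static struct toy_task ext_task = { "ext-task" };
static struct toy_task idle_task = { "idle" };
static struct toy_task *pick_none(void) { return NULL; }
static struct toy_task *pick_ext(void)  { return &ext_task; }
static struct toy_task *pick_idle(void) { return &idle_task; }
```

A class that only provides pick_task leaves the switch to the caller, which is why the ext path ends up in put_prev_set_next_task() from inside __pick_next_task().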
5. put_prev_set_next_task: the prev/next switching hub (layer 3)
put_prev_set_next_task() is the hub that ties put_prev and set_next together (kernel/sched/sched.h:2499):
c
static inline void put_prev_set_next_task(struct rq *rq,
					  struct task_struct *prev,
					  struct task_struct *next)
{
	WARN_ON_ONCE(rq->donor != prev);
	__put_prev_set_next_dl_server(rq, prev, next);
	if (next == prev)                                  // ← same task, nothing to switch
		return;
	prev->sched_class->put_prev_task(rq, prev, next);  // ← prev goes "off duty" (triggers stopping)
	next->sched_class->set_next_task(rq, next, true);  // ← next goes "on duty" (triggers running)
}
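Two points are worth noting here:
Detail 1: when next == prev, the function returns immediately. If pick_task picks the task that is already running (for example, its time slice has not run out and it keeps going), no callback fires. On this path, running is therefore not called on every tick, but only when a task switch actually takes place.
Detail 2: put_prev comes before set_next, so stopping always runs before running. On any given CPU the order is always the same: the old task gets stopping first, then the new task gets running.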
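Both details can be checked with a small user-space model. The sketch below is illustrative only: struct hub_task, hub_switch() and the string event log are hypothetical, and the two log entries merely stand in for the put_prev_task and set_next_task calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct hub_task { const char *name; };

static char event_log[128];  /* records the order in which the toy callbacks fire */

static void log_event(const char *what, const struct hub_task *t)
{
	size_t used = strlen(event_log);

	snprintf(event_log + used, sizeof(event_log) - used, "%s(%s) ", what, t->name);
}

/* Toy version of the hub: same task means no callbacks, otherwise stop then run. */
static void hub_switch(struct hub_task *prev, struct hub_task *next)
{
	assert(prev && next);
	if (next == prev)
		return;
	log_event("stopping", prev);  /* stands in for put_prev_task → ops.stopping */
	log_event("running", next);   /* stands in for set_next_task → ops.running */
}
```

Switching A to A leaves the log empty, while switching A to B records stopping(A) before running(B).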
6. set_next_task_scx: where the running callback fires (layer 4)
set_next_task_scx() is ext_sched_class's set_next_task implementation (kernel/sched/ext.c:2268), and it is where the running callback is actually invoked:
c
static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first)
{
	struct scx_sched *sch = scx_root;
	// ① If the task is still queued (e.g. core-sched runs it ahead of dispatch), dequeue it first
	if (p->scx.flags & SCX_TASK_QUEUED) {
		/*
		 * Core-sched might decide to execute @p before it is
		 * dispatched. Call ops_dequeue() to notify the BPF scheduler.
		 */
		ops_dequeue(rq, p, SCX_DEQ_CORE_SCHED_EXEC);
		dispatch_dequeue(rq, p);
	}
	// ② Record the timestamp at which execution starts
	p->se.exec_start = rq_clock_task(rq);
	// ③ Invoke the running callback
	/* see dequeue_task_scx() on why we skip when !QUEUED */
	if (SCX_HAS_OP(sch, running) && (p->scx.flags & SCX_TASK_QUEUED))
		SCX_CALL_OP_TASK(sch, SCX_KF_REST, running, rq, p);
	// ④ Clear the runnable state
	clr_task_runnable(p, true);
	// ⑤ Refresh the tick dependency
	if ((p->scx.slice == SCX_SLICE_INF) !=
	    (bool)(rq->scx.flags & SCX_RQ_CAN_STOP_TICK)) {
		if (p->scx.slice == SCX_SLICE_INF)
			rq->scx.flags |= SCX_RQ_CAN_STOP_TICK;
		else
			rq->scx.flags &= ~SCX_RQ_CAN_STOP_TICK;
		sched_update_tick_dependency(rq);
		update_other_load_avgs(rq);
	}
}
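This function does quite a lot. The parts relevant to running are analyzed below.
6.1 Handling early execution by core-sched
Normally, by the time a task is picked, the BPF scheduler has already dispatched it to the CPU's local DSQ. core-sched (the kernel's core scheduling mechanism for SMT sibling threads), however, may force a task to run early, even though it has not been dispatched to a local DSQ yet.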
core-sched forced path:
 task is still in a DSQ → core-sched forcibly selects the task → set_next_task_scx() → ops_dequeue(SCX_DEQ_CORE_SCHED_EXEC) → dispatch_dequeue()
Normal path:
 ops.dispatch() → task enters the local DSQ → pick_task_scx() picks it → set_next_task_scx() → dispatch_dequeue() takes it off the local DSQ
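At this point p->scx.flags & SCX_TASK_QUEUED is still set, and set_next_task_scx() does two things:
- ops_dequeue(rq, p, SCX_DEQ_CORE_SCHED_EXEC): notifies the BPF scheduler that core-sched is forcibly dequeuing the task. For a task the BPF scheduler still holds, the BPF dequeue callback (if one is implemented) is invoked with deq_flags set to SCX_DEQ_CORE_SCHED_EXEC (1LLU << 32).
- dispatch_dequeue(rq, p): physically removes the task from its DSQ and cleans up fields such as p->scx.dsq and p->scx.holding_cpu.
SCX_DEQ_CORE_SCHED_EXEC is defined as follows (kernel/sched/ext_internal.h:959):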
c
/*
* The generic core-sched layer decided to execute the task even though
* it hasn't been dispatched yet. Dequeue from the BPF side.
*/
SCX_DEQ_CORE_SCHED_EXEC = 1LLU << 32,
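Because the flag sits at bit 32, it is only visible if deq_flags is kept 64 bits wide all the way through. The user-space sketch below illustrates that pitfall; DEMO_DEQ_CORE_SCHED_EXEC is a local copy of the value shown above, and is_core_sched_exec() and narrow_flags() are hypothetical helpers, not kernel or BPF APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value copied from the enum above; the DEMO_ prefix marks it as a local copy. */
#define DEMO_DEQ_CORE_SCHED_EXEC (1ULL << 32)

/* Correct usage: deq_flags stays 64 bits wide, so bit 32 is still visible. */
static bool is_core_sched_exec(uint64_t deq_flags)
{
	return (deq_flags & DEMO_DEQ_CORE_SCHED_EXEC) != 0;
}

/* Narrow deq_flags the way a 32-bit variable would, to show what gets lost. */
static uint32_t narrow_flags(uint64_t deq_flags)
{
	uint32_t low = (uint32_t)deq_flags;

	assert((uint64_t)low == (deq_flags & UINT32_MAX));  /* only the low half survives */
	return low;
}
```

A dequeue handler that stores the flags in a 32-bit variable would therefore never see the core-sched case.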
6.2 Invoking the running callback
c
if (SCX_HAS_OP(sch, running) && (p->scx.flags & SCX_TASK_QUEUED))
SCX_CALL_OP_TASK(sch, SCX_KF_REST, running, rq, p);
Two conditions must both hold here:
Step ③: about to call running
 └─ SCX_HAS_OP(running)?
     ├─ false (BPF did not register it): skip, no call
     └─ true (BPF registered it): SCX_TASK_QUEUED?
         ├─ false (task is not queued): skip, no call
         └─ true (task is queued): SCX_CALL_OP_TASK(running) → the BPF scheduler's running()
Condition 1: SCX_HAS_OP(sch, running)
The BPF scheduler must have registered a running callback. SCX_HAS_OP is a bit-test macro:
c
#define SCX_HAS_OP(sch, op) test_bit(SCX_OP_IDX(op), (sch)->has_op)
如果 BPF 程序没有实现 running,这个检查直接返回 false,跳过调用。
条件 2:p->scx.flags & SCX_TASK_QUEUED
task 必须仍在 scx 的可运行队列中。这个标志定义在 include/linux/sched/ext.h:
c
/* scx_entity.flags */
enum scx_ent_flags {
SCX_TASK_QUEUED = 1 << 0, /* on ext runqueue */
// ...
};
注释 /* see dequeue_task_scx() on why we skip when !QUEUED */ 指向一个重要细节。这里的关键问题是:步骤①中 dispatch_dequeue 清除了 p->scx.dsq,步骤④中 clr_task_runnable 从 runnable_list 中移除了 task------但 SCX_TASK_QUEUED 标志只由 dequeue_task_scx() 清除 ,而 dequeue_task_scx() 在 set_next_task_scx 的执行路径中并未被调用。
因此,无论 core-sched 提前调度还是正常调度场景,步骤③检查 SCX_TASK_QUEUED 时该标志始终为 true。只要 BPF 调度器注册了 running 回调,它都会被调用。
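The two conditions can be modelled in plain user-space C. This is a minimal sketch rather than kernel code: `model_sched`, `model_task`, the op indices and `should_call_running()` are hypothetical names introduced for illustration, and only the bit-test-plus-flag logic mirrors the source quoted in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op indices; the kernel derives the real ones with SCX_OP_IDX(). */
enum model_op { MODEL_OP_RUNNING = 0, MODEL_OP_STOPPING = 1 };

#define MODEL_TASK_QUEUED (1u << 0)  /* mirrors SCX_TASK_QUEUED */

struct model_sched { uint64_t has_op; };  /* one bit per implemented callback */
struct model_task  { uint32_t flags; };

/* Counterpart of SCX_HAS_OP(): is the bit for @op set? */
static bool model_has_op(const struct model_sched *sch, enum model_op op)
{
    assert(sch != NULL);
    return (sch->has_op >> op) & 1u;
}

/* Both conditions must hold before running() would be invoked. */
static bool should_call_running(const struct model_sched *sch,
                                const struct model_task *p)
{
    assert(p != NULL);
    return model_has_op(sch, MODEL_OP_RUNNING) &&
           (p->flags & MODEL_TASK_QUEUED);
}
```

With the running bit set, a queued task passes the gate and a dequeued one does not; with the bit clear, nothing passes regardless of the task's flags.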
6.3 Clearing the runnable state
c
clr_task_runnable(p, true);
The task is about to execute, so it is no longer tracked as "runnable". Note reset_runnable_at = true: the next time the task is enqueued, its runnable_at timestamp is reset.
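The deferred reset can be pictured with a small user-space model. This is a sketch under assumptions: the flag name and bit position, the `model_task` layout and both helpers are illustrative stand-ins, and the model only captures the idea that clearing records a "reset pending" marker which is consumed the next time the task is marked runnable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed marker flag; the bit position here is arbitrary. */
#define MODEL_TASK_RESET_RUNNABLE_AT (1u << 2)

struct model_task {
    uint32_t flags;
    uint64_t runnable_at;      /* when the task last became runnable */
    bool on_runnable_list;
};

/* Leaving the runnable list; optionally request a timestamp reset. */
static void model_clr_task_runnable(struct model_task *p, bool reset_runnable_at)
{
    assert(p != NULL);
    p->on_runnable_list = false;
    if (reset_runnable_at)
        p->flags |= MODEL_TASK_RESET_RUNNABLE_AT;
}

/* Re-entering the runnable list; consume a pending reset if there is one. */
static void model_set_task_runnable(struct model_task *p, uint64_t now)
{
    assert(p != NULL);
    if (p->flags & MODEL_TASK_RESET_RUNNABLE_AT) {
        p->runnable_at = now;
        p->flags &= ~MODEL_TASK_RESET_RUNNABLE_AT;
    }
    p->on_runnable_list = true;
}
```

The timestamp is untouched at clear time and only refreshed on the next set, and a clear with `reset_runnable_at = false` leaves the old timestamp in place.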
7. The counterpart of running: stopping
running and stopping are a symmetric pair of callbacks. running is invoked from set_next_task_scx and stopping from put_prev_task_scx, and the two are called back to back in put_prev_set_next_task:
put_prev_set_next_task(rq, prev, next)
├─ prev->sched_class->put_prev_task(rq, prev, next)
│ └─ put_prev_task_scx() → ops.stopping(prev, runnable=true)
└─ next->sched_class->set_next_task(rq, next, true)
└─ set_next_task_scx() → ops.running(next)
7.1 Definition of the stopping callback
c
/**
* @stopping: A task is stopping execution
* @p: task stopping to run
* @runnable: is task @p still runnable?
*
* Note that this callback may be called from a CPU other than the
* one the task was running on. This can happen when a task
* property is changed (i.e., affinity), since dequeue_task_scx(),
* which triggers this callback, may run on a CPU different from
* the task's assigned CPU.
*
* Therefore, always use scx_bpf_task_cpu(@p) to retrieve the CPU
* the task was running on.
*
* See ->runnable() for explanation on the task state notifiers. If
* !@runnable, ->quiescent() will be invoked after this operation
* returns.
*/
void (*stopping)(struct task_struct *p, bool runnable);
7.2 Source walkthrough of put_prev_task_scx
put_prev_task_scx() is the put_prev_task implementation of ext_sched_class (kernel/sched/ext.c:2364):
c
static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
                              struct task_struct *next)
{
    struct scx_sched *sch = scx_root;
    /* see kick_cpus_irq_workfn() */
    smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
    update_curr_scx(rq); // ← charge the slice consumed so far
    /* see dequeue_task_scx() on why we skip when !QUEUED */
    if (SCX_HAS_OP(sch, stopping) && (p->scx.flags & SCX_TASK_QUEUED))
        SCX_CALL_OP_TASK(sch, SCX_KF_REST, stopping, rq, p, true); // ← the stopping callback
    if (p->scx.flags & SCX_TASK_QUEUED) {
        set_task_runnable(rq, p);
        // Slice left and not in bypass mode: put it back at the head of the local DSQ
        if (p->scx.slice && !scx_rq_bypassing(rq)) {
            dispatch_enqueue(sch, &rq->scx.local_dsq, p, SCX_ENQ_HEAD);
            goto switch_class;
        }
        // Otherwise re-enqueue
        if (sched_class_above(&ext_sched_class, next->sched_class)) {
            WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
            do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
        } else {
            do_enqueue_task(rq, p, 0, -1);
        }
    }
switch_class:
    if (next && next->sched_class != &ext_sched_class)
        switch_class(rq, next); // ← notify that the CPU is being preempted by a higher class
}
The stopping call in put_prev_task_scx is gated on the SCX_TASK_QUEUED flag as well. After that, depending on whether the task has slice left, it is either put back at the head of the local DSQ or re-enqueued.
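The branch structure of that decision can be condensed into a pure function. The sketch below is a user-space model only: the enum and `model_put_prev()` are hypothetical names, and the boolean parameters stand in for the kernel conditions (`SCX_TASK_QUEUED`, `p->scx.slice`, `scx_rq_bypassing()`, and whether next belongs to a class below ext_sched_class).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum put_prev_action {
    PUT_NOTHING,         /* task is not queued: nothing to put back */
    PUT_LOCAL_DSQ_HEAD,  /* slice left: head of the local DSQ */
    PUT_ENQUEUE_LAST,    /* re-enqueue with SCX_ENQ_LAST */
    PUT_ENQUEUE,         /* plain re-enqueue */
};

/* Model of what put_prev_task_scx() does with the outgoing task. */
static enum put_prev_action
model_put_prev(bool queued, uint64_t slice, bool bypassing,
               bool next_is_lower_class)
{
    if (!queued)
        return PUT_NOTHING;
    if (slice && !bypassing)
        return PUT_LOCAL_DSQ_HEAD;
    return next_is_lower_class ? PUT_ENQUEUE_LAST : PUT_ENQUEUE;
}
```

Note that bypass mode overrides a non-zero slice: a queued task with slice left is still re-enqueued when the rq is bypassing.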
update_curr_scx(), called near the top of put_prev_task_scx(), is what decrements the time slice:
c
static void update_curr_scx(struct rq *rq)
{
    struct task_struct *curr = rq->curr;
    s64 delta_exec;
    delta_exec = update_curr_common(rq);
    if (unlikely(delta_exec <= 0))
        return;
    if (curr->scx.slice != SCX_SLICE_INF) {
        curr->scx.slice -= min_t(u64, curr->scx.slice, delta_exec);
        if (!curr->scx.slice)
            touch_core_sched(rq, curr);
    }
}
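The slice arithmetic is easy to check in isolation. The following user-space sketch reproduces only the saturating subtraction; `model_consume_slice()` and `MODEL_SLICE_INF` are illustrative names, with the latter assumed to play the role of SCX_SLICE_INF.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SLICE_INF UINT64_MAX  /* stands in for SCX_SLICE_INF */

/* Returns the slice left after charging @delta_exec ns; never underflows. */
static uint64_t model_consume_slice(uint64_t slice, int64_t delta_exec)
{
    uint64_t used;

    /* Nothing ran, or the slice is infinite: leave it untouched. */
    if (delta_exec <= 0 || slice == MODEL_SLICE_INF)
        return slice;

    /* Same effect as min_t(u64, slice, delta_exec) in the kernel code. */
    used = (uint64_t)delta_exec < slice ? (uint64_t)delta_exec : slice;
    return slice - used;
}
```

For example, a 20 ms slice charged with 5 ms of execution leaves 15 ms, and a charge larger than the remaining slice clamps the result to zero rather than wrapping around.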
7.3 Ordering of running and stopping
[Figure: sequence diagram with participants __schedule, put_prev_task_scx, set_next_task_scx and BPF Scheduler. __schedule calls put_prev_set_next_task(prev, next). put_prev_task_scx runs update_curr_scx() to decrement prev's slice, calls ops.stopping(prev, runnable=true), then puts prev back on a DSQ or re-enqueues it. set_next_task_scx(next) may run ops_dequeue (core-sched), sets the exec_start timestamp, calls ops.running(next), then runs clr_task_runnable() and refreshes the tick dependency.]
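One common way to exploit this pairing is to measure on-CPU time: take a timestamp in running and charge the difference in stopping. The sketch below is a user-space model under stated assumptions: `model_task_ctx` is a hypothetical per-task context (in a BPF scheduler it would typically live in task-local storage), and the timestamps are passed in explicitly where a BPF program would call bpf_ktime_get_ns().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task context; not a kernel structure. */
struct model_task_ctx {
    uint64_t last_start_ns;  /* set when running() fires */
    uint64_t total_ns;       /* accumulated on-CPU time */
    bool on_cpu;
};

/* What a running() callback would record. */
static void model_running(struct model_task_ctx *ctx, uint64_t now_ns)
{
    assert(ctx != NULL);
    ctx->last_start_ns = now_ns;
    ctx->on_cpu = true;
}

/* What a stopping() callback would charge. */
static void model_stopping(struct model_task_ctx *ctx, uint64_t now_ns)
{
    assert(ctx != NULL);
    if (!ctx->on_cpu)        /* unmatched stopping: nothing to charge */
        return;
    ctx->total_ns += now_ns - ctx->last_start_ns;
    ctx->on_cpu = false;
}
```

The `on_cpu` guard is a design choice of the sketch: it makes a stopping event without a preceding running event harmless instead of charging a bogus interval.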
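8. Important questions
8.1 The CPU executing the running callback may differ from the CPU the task will run on
The comment on the running callback in 6.18.26 stresses this point (the 6.15.7 comment does not mention it):
Note that this callback may be called from a CPU other than the one the task is going to run on.
In other words, the CPU that executes the BPF running function is not necessarily the CPU the task is about to run on. The typical case is a change to the task's CPU affinity, which can cause set_next_task_scx() to execute on CPU A while the task will actually run on CPU B.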
[Figure: flowchart. CPU A (the CPU performing the scheduling operation) runs set_next_task_scx(), so the BPF running() executes on CPU A. The task will be migrated, and it actually runs on CPU B (the task's target CPU).]
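This means that inside the running callback you cannot use bpf_get_smp_processor_id() to find the CPU the task is about to run on: it returns CPU A (the CPU currently executing the BPF program), not CPU B (the task's target CPU).
The correct approach is to call scx_bpf_task_cpu(p) to obtain the task's target CPU:
c
void BPF_STRUCT_OPS(my_running, struct task_struct *p)
{
    s32 cpu = scx_bpf_task_cpu(p); // ← correct: the task's target CPU
    // Do not use bpf_get_smp_processor_id() here; it may be a different CPU
}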
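The practical consequence shows up as soon as per-CPU state is involved. The following user-space model is only a sketch: `model_task.cpu` stands for whatever scx_bpf_task_cpu(p) would return, `calling_cpu` stands for the CPU executing the callback, and `model_account_running()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

struct model_task { int cpu; };  /* cpu: what scx_bpf_task_cpu(p) would return */

/*
 * Charge one "started running" event to the task's CPU. @calling_cpu is the
 * CPU executing the callback; it is deliberately not used as the index.
 * Returns the index that was charged, or -1 if the CPU is out of range.
 */
static int model_account_running(uint64_t counts[MODEL_NR_CPUS],
                                 const struct model_task *p, int calling_cpu)
{
    (void)calling_cpu;
    assert(p != NULL);
    if (p->cpu < 0 || p->cpu >= MODEL_NR_CPUS)
        return -1;
    counts[p->cpu]++;
    return p->cpu;
}
```

If the callback executes on CPU 0 for a task bound for CPU 2, the event lands in slot 2; indexing by the calling CPU would have charged the wrong slot.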
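8.2 How often running is called
running is called in the following scenarios:
| Scenario | Description |
|---|---|
| Regular context switch | __schedule() → pick_next_task() → put_prev_set_next_task() → set_next_task_scx() |
| Core-sched forced scheduling | The core-sched mechanism forces a task switch on an SMT sibling |
| Task affinity change | set_next_task_scx() is triggered again for the task; the CPU executing it may not be the task's target CPU (see 8.1) |
running is not called in the following cases:
- next == prev (the same task keeps running)
- the task is not in the SCX_TASK_QUEUED state
- the BPF scheduler has not registered a running callback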
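The first two "not called" cases can be exercised with a small model of put_prev_set_next_task(). This is a user-space sketch that assumes both callbacks are registered; `model_task`, `model_calls` and `model_put_prev_set_next()` are hypothetical names, and the counters stand in for the actual callback invocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task  { bool queued; };            /* mirrors SCX_TASK_QUEUED */
struct model_calls { int stopping; int running; };

/* Model of one switch from @prev to @next, counting callback invocations. */
static void model_put_prev_set_next(struct model_calls *calls,
                                    const struct model_task *prev,
                                    const struct model_task *next)
{
    assert(calls && prev && next);
    if (next == prev)          /* same task keeps running: no callbacks */
        return;
    if (prev->queued)
        calls->stopping++;     /* put_prev_task_scx() -> ops.stopping() */
    if (next->queued)
        calls->running++;      /* set_next_task_scx() -> ops.running() */
}
```

Switching a task to itself leaves both counters untouched, and switching to a task that is not queued counts a stopping without a matching running.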
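8.3 No blocking operations in running
The running callback runs in a context protected by the rq lock (SCX_KF_REST). At that point:
- the runqueue lock (rq->lock) of the current CPU is held
- interrupts are disabled
- the kernel is in atomic context
Under these constraints, any operation that may sleep is forbidden. In practice:
- bpf_ktime_get_ns() is safe to use (it does not block)
- bpf_printk() is safe to use
- anything involving memory allocation, lock contention or I/O may lead to a deadlock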
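Bookkeeping that fits these constraints touches only preallocated memory and uses atomic updates instead of locks. The sketch below illustrates the pattern in user-space C11; the slot array and `model_note_running()` are hypothetical, and in a BPF scheduler the analogous pieces would be a preallocated map and an atomic add such as __sync_fetch_and_add().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_NR_SLOTS 8

/* Preallocated storage: nothing is allocated on the hot path. */
static _Atomic uint64_t model_run_counts[MODEL_NR_SLOTS];

/* Bump the counter for @key without taking a lock; returns the new value. */
static uint64_t model_note_running(uint32_t key)
{
    uint32_t slot = key % MODEL_NR_SLOTS;

    return atomic_fetch_add_explicit(&model_run_counts[slot], 1,
                                     memory_order_relaxed) + 1;
}
```

Relaxed ordering is enough here because the counters are pure statistics; nothing else is synchronised through them.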
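8.4 Can the running callback modify a task's scheduling parameters?
Yes, within limits. Common practice includes:
- updating per-task scheduling metadata (such as p->scx.dsq_vtime)
- updating statistics in BPF maps
- adjusting custom scheduling-policy parameters
Fields that drive core scheduler behaviour need more care: adjusting p->scx.slice (the time slice) in running is safe, whereas p->scx.weight should not be changed there; weight changes should go through ops.set_weight().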
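9. In practice: the running implementation in scx_simple
scx_simple is the example sched_ext scheduler shipped with the Linux kernel (tools/sched_ext/scx_simple.bpf.c); it implements a global weighted vtime scheduler.
Its running callback advances the global vtime clock. When a task starts executing and its vtime is ahead of the current global clock, the global clock is moved forward to the task's vtime. As a result:
- tasks with a smaller vtime (those owed more CPU time) are still scheduled first.
c
static u64 vtime_now; // global vtime clock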
void BPF_STRUCT_OPS(simple_running, struct task_struct *p)
{
if (fifo_sched)
return;
/*
* Global vtime always progresses forward as tasks start executing. The
* test and update can be performed concurrently from multiple CPUs and
* thus racy. Any error should be contained and temporary. Let's just
* live with it.
*/
if (time_before(vtime_now, p->scx.dsq_vtime))
vtime_now = p->scx.dsq_vtime;
}
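The forward-only update can be tried out in user space. This is a sketch rather than the scx_simple source: `model_vtime_now`, `model_time_before()` and `model_running()` are illustrative names, and the comparison is assumed to be the usual wrap-safe signed-difference test that helpers like time_before() rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t model_vtime_now;  /* global vtime clock of the model */

/* Wrap-safe "a is before b": compare the signed difference, not the values. */
static bool model_time_before(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}

/* Counterpart of simple_running(): the clock only ever moves forward. */
static void model_running(uint64_t task_vtime)
{
    if (model_time_before(model_vtime_now, task_vtime))
        model_vtime_now = task_vtime;
}
```

A task whose vtime is behind the clock leaves it unchanged, and because the comparison uses the signed difference, the clock also advances correctly across a u64 wrap-around.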
10. Full ordering of running and the other callbacks
The complete callback sequence for an scx task, from creation to being scheduled and run:
[Figure: sequence diagram with participants fork path, scx framework and BPF scheduler.
New task creation: scx_fork() → ops.init_task(p, fork=true), state NONE → INIT → READY; scx_post_fork() → ops.enable(p) and ops.set_weight(p, weight), state READY → ENABLED.
Task wakeup: ops.select_cpu(p, prev_cpu, wake_flags), ops.runnable(p, enq_flags), ops.enqueue(p, enq_flags).
Scheduling loop (repeats on every switch): ops.dispatch(cpu, prev), ops.stopping(prev, runnable), ops.running(next); next runs on the CPU until its slice expires or it is preempted.
Task exit or block: ops.stopping(p, runnable=false), ops.quiescent(p, deq_flags).]
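The nesting of these notifiers, with running/stopping contained inside runnable/quiescent, can be expressed as a tiny state machine that rejects out-of-order events. This is a user-space sketch; the states, events and `model_step()` are hypothetical names that model only the four task-state callbacks, not the full lifecycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { ST_QUIESCENT, ST_RUNNABLE, ST_RUNNING };
enum model_event { EV_RUNNABLE, EV_RUNNING, EV_STOPPING, EV_QUIESCENT };

/*
 * Apply @ev to *@st. Returns true and updates the state if the transition
 * respects the nesting; returns false and leaves the state unchanged otherwise.
 */
static bool model_step(enum model_state *st, enum model_event ev)
{
    assert(st != NULL);
    switch (ev) {
    case EV_RUNNABLE:
        if (*st != ST_QUIESCENT)
            return false;
        *st = ST_RUNNABLE;
        return true;
    case EV_RUNNING:
        if (*st != ST_RUNNABLE)
            return false;
        *st = ST_RUNNING;
        return true;
    case EV_STOPPING:
        if (*st != ST_RUNNING)
            return false;
        *st = ST_RUNNABLE;
        return true;
    case EV_QUIESCENT:
        if (*st != ST_RUNNABLE)
            return false;
        *st = ST_QUIESCENT;
        return true;
    }
    return false;
}
```

A validator of this shape can be handy in tests or tracing tools: feeding it the observed callback stream for a task flags any running event that arrives while the task is quiescent.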
12. Summary: a checklist for eBPF scheduler developers
[Figure: mind map "Key points of running".
- Trigger timing: fires on every scheduling switch; paired with stopping, not one-shot
- Execution order: stopping for the old task first, then running for the new task
- Cross-CPU pitfall: may not execute on the target CPU; scx_bpf_task_cpu must be used to get the CPU
- Context constraints: rq lock held, no blocking; only SCX_KF_REST may be called
- Typical uses: advancing the global vtime clock; implementing vtime scheduling together with stopping
- core-sched: also called on forced scheduling; triggers SCX_DEQ_CORE_SCHED_EXEC]
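| Concern | Key point |
|---|---|
| Trigger timing | Every time a task is scheduled onto a CPU to run; not one-shot |
| Call frequency | Paired with stopping; called once per scheduling switch |
| Execution order | stopping (old task) first, then running (new task) |
| Cross-CPU invocation | running may be called on a CPU other than the task's target CPU; use scx_bpf_task_cpu(p) |
| Context constraints | The rq lock is held; no blocking operations |
| Typical use | Advancing the global clock |
| core-sched | running is also called on core-sched forced scheduling; the BPF scheduler is first notified through ops_dequeue with the SCX_DEQ_CORE_SCHED_EXEC flag |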
References
- Linux 6.18.26 kernel source, kernel/sched/
- scx_simple scheduler source, tools/sched_ext/scx_simple.bpf.c