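05 | From `SuspenseException` to `retryTimedOutBoundary`: Suspense's Ping and Retry Mechanisms

This column is the "React Source Code Deep Dive" series: using the source as evidence and the architecture as the thread, I explain the key design decisions in React, from the runtime to its core algorithms. Open-source repository: https://github.com/facebook/react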

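Introduction: Why does Suspense need a dedicated "Ping/Retry" mechanism?

In parts 3 and 4 we covered how updates enter the engine and how the Lane priority model works. But as soon as you start reading the concurrent rendering code, you run into a practical problem:

- "Rendering" is fundamentally computation.
- "Data/resources becoming ready" is fundamentally I/O.

What React wants is not to "block the whole thread whenever it hits I/O", but rather:

- The render phase can suspend, and may choose to switch to a fallback.
- Once the data is ready, the render can be woken up (pinged) and attempted again.
- If the fallback has already been committed, a retry is needed to flip the boundary from the fallback back to the primary content.

In this article you will see a very React-style set of engineering trade-offs:

- SuspenseException (an opaque "dummy exception" that user code is not supposed to catch) unwinds the call stack without leaking the thenable into user code.
- pingCache (stored on the root, keyed by wakeable, with the render lanes acting as a thread ID) ensures that the same Promise gets only one ping listener per root and set of lanes.
- The commit-phase retryCache (deduplicated per boundary fiber) ensures retry listeners are attached only after the fallback has been committed.
- RetryLane plus throttling (FALLBACK_THROTTLE_MS) avoids flickering loading states and pointless frequent commits.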

Core files covered in this article (in reading order):

- packages/react-reconciler/src/ReactFiberThenable.js
- packages/react-reconciler/src/ReactFiberWorkLoop.js
- packages/react-reconciler/src/ReactFiberThrow.js
- packages/react-reconciler/src/ReactFiberCompleteWork.js
- packages/react-reconciler/src/ReactFiberCommitWork.js
- packages/react-reconciler/src/ReactFiberLane.js

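Core concepts: pinning down "suspend / wake / retry"

1) Wakeable / Thenable

In the React source, an "object that can be then-ed" is usually called a wakeable (structurally it is just a thenable). It does not have to be a native Promise, but it must satisfy: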

typeof value.then === 'function'

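2) Ping vs Retry: two listeners, two moments

- Ping listener (attached during render):
  - Goal: when the data is ready, mark the root's pinged lanes and reschedule the root.
  - When it is attached: during render, as soon as a suspension is detected (throwException, or `use` throwing SuspenseException).
  - Dedup granularity: root.pingCache, keyed by (wakeable, lanes), where lanes are treated as a "thread ID".
- Retry listener (attached after commit):
  - Goal: the fallback has already been committed, so once the wakeable is ready the boundary itself must schedule an update that flips it back to the primary content.
  - When it is attached: in the commit phase (attachSuspenseRetryListeners).
  - Dedup granularity: the boundary's own retryCache (the wakeable sets of different boundaries do not affect each other).

The two mechanisms look redundant, but they solve two different problems:

- Ping is more like "unblock, and let the WorkLoop decide whether to start over".
- Retry is more like "the fallback is now a fact on screen, so a new update is needed to change the UI back".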

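3) RetryLane: why doesn't a retry reuse the original lane?

A retry is not an update the user explicitly triggered. It is closer to "follow-up work the system schedules in order to restore consistency". React gives it a dedicated lane pool:

- RetryLane1..4 (in ReactFiberLane.js)
- Assigned round-robin via claimNextRetryLane()

This has two benefits:

- It isolates retries from user updates, so they are not conflated with input or transition semantics.
- It lets the WorkLoop apply special policies to a "render that contains only retries" (for example throttling the commit or forcing time-slicing).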
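
To make "a render that contains only retries" concrete, here is a minimal sketch of a bitmask check in the spirit of the `includesOnlyRetries` helper that appears later in this article. The bit positions below are assumptions chosen for readability, not React's real lane values.

```js
// Hypothetical bit positions, for illustration only; React's real values differ.
const SyncLane = 0b0000001;
const TransitionLane1 = 0b0000100;
const RetryLane1 = 0b0001000;
const RetryLane2 = 0b0010000;
const RetryLane3 = 0b0100000;
const RetryLane4 = 0b1000000;
const RetryLanes = RetryLane1 | RetryLane2 | RetryLane3 | RetryLane4;

// True when every bit set in `lanes` belongs to the retry pool.
function includesOnlyRetries(lanes) {
  return (lanes & RetryLanes) === lanes;
}

console.log(includesOnlyRetries(RetryLane2)); // true
console.log(includesOnlyRetries(RetryLane1 | RetryLane3)); // true
console.log(includesOnlyRetries(RetryLane1 | TransitionLane1)); // false
console.log(includesOnlyRetries(SyncLane)); // false
```

Because retries live in their own bit range, a single mask comparison is enough to tell whether a render was started purely to retry.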

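End-to-end overview: from "throwing an exception" to "rendering again"

Let's first walk the whole chain with a sequence diagram, then go through the source piece by piece.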

Sequence diagram (flattened to text). Participants: user component, thenable/wakeable, WorkLoop (render), ReactFiberThrow.throwException, FiberRoot, CommitWork.

1. Run render
alt [use() triggers the suspension]
2. throw SuspenseException
3. handleThrow() ->> getSuspendedThenable()
alt [legacy throw promise]
4. throw wakeable
5. throwException(root, ..., wakeable, lanes)
6. attachPingListener(root, wakeable, lanes)
7. Put the wakeable into boundary.updateQueue (RetryQueue)
Note: render ends, and the fallback may be committed
8. commitMutationEffectsOnFiber
9. attachSuspenseRetryListeners(boundary, retryQueue)
10. wakeable.then(retry, retry)
11. resolve ->> pingSuspendedRoot
12. markRootPinged(root, lanes)
13. ensureRootIsScheduled(root)
14. resolve ->> resolveRetryWakeable(boundary, wakeable)
15. retryTimedOutBoundary(boundary, retryLane)
16. markRootUpdated + ensureRootIsScheduled

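Notice that resolving a single wakeable fires two callbacks (ping and retry), because they serve different phases:

- ping targets the root's "scheduling decision"
- retry targets the boundary's "switch back to the primary UI"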
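
The following toy model shows that shape: one wakeable, two subscribers attached at different moments. `createWakeable` is a hypothetical hand-rolled thenable that resolves synchronously so the order is easy to follow. It is a sketch of the idea, not React's scheduling code.

```js
// Hypothetical synchronous wakeable, used only to make the order observable.
function createWakeable() {
  const listeners = [];
  return {
    then(onFulfilled) {
      listeners.push(onFulfilled);
    },
    resolve() {
      listeners.splice(0).forEach(listener => listener());
    },
  };
}

const log = [];
const wakeable = createWakeable();

// Render phase: the root subscribes (ping).
wakeable.then(() => log.push('ping: mark root pinged, reschedule root'));

// Commit phase, after the fallback is on screen: the boundary subscribes (retry).
wakeable.then(() => log.push('retry: schedule a retry lane on the boundary'));

wakeable.resolve();
console.log(log.length); // 2
console.log(log);
```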

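Source walkthrough: the critical path, step by step

Step 1: Why does `use` throw `SuspenseException`?

File: packages/react-reconciler/src/ReactFiberThenable.js

When `use` (and the internal thenable-tracking logic) actually suspends, it does not throw the thenable directly. Instead:

// ... (omitted: the thenable's status expando and synchronous unwrapping logic)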

suspendedThenable = thenable;
if (__DEV__) {
  needsToResetSuspendedThenableDEV = true;
}
throw SuspenseException;

And the entry point the WorkLoop uses to retrieve the real thenable:

export function getSuspendedThenable(): Thenable<mixed> {
  if (suspendedThenable === null) {
    throw new Error(
      'Expected a suspended thenable. This is a bug in React. Please file ' +
        'an issue.',
    );
  }
  const thenable = suspendedThenable;
  suspendedThenable = null;
  if (__DEV__) {
    needsToResetSuspendedThenableDEV = false;
  }
  return thenable;
}

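Why is it written this way? (Design motivation)

- If the thenable were thrown directly, any user (or framework) try/catch along the way would receive the thenable itself and could swallow it, breaking React's control flow.
- SuspenseException is a "dummy exception":
  - It carries no business information.
  - Its sole purpose is to unwind the call stack.
- The real thenable is stored in the module-level variable suspendedThenable, and only the WorkLoop is meant to retrieve it.
- A sentinel can still be caught by a user try/catch. The DEV-only needsToResetSuspendedThenableDEV flag in the snippets above appears to be the bookkeeping that lets React notice a write that was never paired with a read.

Trade-off: is a module-level variable safe?

This is a very pragmatic engineering choice:
- React rendering advances on a single thread, and the WorkLoop controls the render entry point.
- Passing the thenable through a module variable avoids carrying a reference on the exception object (where user code could get hold of it).
The cost: the code is harder to read, and the "write/clear" pairing has to be extremely rigorous.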
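
The write/clear pairing can be sketched in a few lines. All names here (`SuspenseSentinel`, `suspendOn`, `takeSuspendedThenable`, `renderComponent`) are hypothetical stand-ins, not React's API. The point is only that the thrown value is an empty sentinel and the real thenable travels through a side channel.

```js
// Hypothetical stand-ins for the module-level state described above.
const SuspenseSentinel = new Error('Not a real error: used only to unwind the stack.');
let suspendedThenable = null;

function suspendOn(thenable) {
  suspendedThenable = thenable; // write
  throw SuspenseSentinel; // the thrown value itself carries nothing
}

function takeSuspendedThenable() {
  if (suspendedThenable === null) {
    throw new Error('Expected a suspended thenable.');
  }
  const thenable = suspendedThenable;
  suspendedThenable = null; // clear: each write is consumed by exactly one read
  return thenable;
}

// Plays the role of the work loop: the only place that reads the side channel.
function renderComponent(component) {
  try {
    return { status: 'completed', value: component() };
  } catch (thrown) {
    if (thrown === SuspenseSentinel) {
      return { status: 'suspended', thenable: takeSuspendedThenable() };
    }
    throw thrown;
  }
}

const pending = { then() {} };
const result = renderComponent(() => suspendOn(pending));
console.log(result.status); // 'suspended'
console.log(result.thenable === pending); // true
```

A second call to `takeSuspendedThenable()` at this point would throw, which is the kind of invariant violation the original `getSuspendedThenable` guards against.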

Step 2: How does the WorkLoop turn `SuspenseException` back into a thenable?

File: packages/react-reconciler/src/ReactFiberWorkLoop.js

function handleThrow(root: FiberRoot, thrownValue: any): void {
  // ... (omitted: resetHooksAfterThrow, etc.)

  if (
    thrownValue === SuspenseException ||
    thrownValue === SuspenseActionException
  ) {
    // ... (omitted: comments)
    thrownValue = getSuspendedThenable();
    workInProgressSuspendedReason = SuspendedOnImmediate;
  } else if (thrownValue === SuspenseyCommitException) {
    thrownValue = getSuspendedThenable();
    workInProgressSuspendedReason = SuspendedOnInstance;
  } else {
    const isWakeable =
      thrownValue !== null &&
      typeof thrownValue === 'object' &&
      typeof thrownValue.then === 'function';

    workInProgressSuspendedReason = isWakeable
      ? SuspendedOnDeprecatedThrowPromise
      : SuspendedOnError;
  }

  workInProgressThrownValue = thrownValue;
  // ... (execution later continues into throwException / error boundaries)
}

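The key point here: the WorkLoop translates the "control-flow exception" back into the "business object" (the thenable) and records why the render suspended.

- SuspendedOnImmediate: this is the `use` suspension path (where React has stronger control).
- SuspendedOnDeprecatedThrowPromise: this is the legacy throw-a-promise path, with slightly different semantics.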
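
As a rough illustration of that classification, the sketch below maps a thrown value to a reason string. `classifyThrownValue` and the sentinel object are hypothetical, and the SuspenseyCommitException branch is left out. It only mirrors the branching shown in the excerpt above.

```js
// Hypothetical sentinel standing in for React's SuspenseException.
const SuspenseException = new Error('sentinel: not a real error');

function classifyThrownValue(thrownValue) {
  if (thrownValue === SuspenseException) {
    return 'SuspendedOnImmediate';
  }
  // Same duck-typing check as the excerpt: a non-null object with a then method.
  const isWakeable =
    thrownValue !== null &&
    typeof thrownValue === 'object' &&
    typeof thrownValue.then === 'function';
  return isWakeable ? 'SuspendedOnDeprecatedThrowPromise' : 'SuspendedOnError';
}

console.log(classifyThrownValue(SuspenseException)); // SuspendedOnImmediate
console.log(classifyThrownValue(Promise.resolve())); // SuspendedOnDeprecatedThrowPromise
console.log(classifyThrownValue({ then() {} })); // SuspendedOnDeprecatedThrowPromise
console.log(classifyThrownValue(new TypeError('boom'))); // SuspendedOnError
console.log(classifyThrownValue('just a string')); // SuspendedOnError
```

Note that a plain object with a `then` method counts as a wakeable, which matches the earlier point that a wakeable need not be a native Promise.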

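Step 3: `throwException`: deciding during render to "capture with a fallback" and filling the two kinds of queues

File: packages/react-reconciler/src/ReactFiberThrow.js

When thrownValue is a thenable, throwException does three core things:

- Finds the nearest Suspense handler (the boundary).
- Marks the boundary as needing to capture (switch to the fallback).
- Records the wakeable and decides whether to attach a ping listener.

Key code: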

if (typeof value.then === 'function') {
  const wakeable: Wakeable = (value: any);
  resetSuspendedComponent(sourceFiber, rootRenderLanes);

  const suspenseBoundary = getSuspenseHandler();
  if (suspenseBoundary !== null) {
    switch (suspenseBoundary.tag) {
      case ActivityComponent:
      case SuspenseComponent:
      case SuspenseListComponent: {
        if (disableLegacyMode || sourceFiber.mode & ConcurrentMode) {
          if (getShellBoundary() === null) {
            renderDidSuspendDelayIfPossible();
          } else {
            const current = suspenseBoundary.alternate;
            if (current === null) {
              renderDidSuspend();
            }
          }
        }

        suspenseBoundary.flags &= ~ForceClientRender;
        markSuspenseBoundaryShouldCapture(
          suspenseBoundary,
          returnFiber,
          sourceFiber,
          root,
          rootRenderLanes,
        );

        const isSuspenseyResource =
          wakeable === noopSuspenseyCommitThenable;
        if (isSuspenseyResource) {
          suspenseBoundary.flags |= ScheduleRetry;
        } else {
          const retryQueue: RetryQueue | null =
            (suspenseBoundary.updateQueue: any);
          if (retryQueue === null) {
            suspenseBoundary.updateQueue = new Set([wakeable]);
          } else {
            retryQueue.add(wakeable);
          }

          if (disableLegacyMode || suspenseBoundary.mode & ConcurrentMode) {
            attachPingListener(root, wakeable, rootRenderLanes);
          }
        }
        return false;
      }

      // ... Offscreen branch omitted
    }
  }

  // ... no-boundary case (a ConcurrentRoot is allowed to suspend indefinitely)
}

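The "narrative" of this code matters a lot:

- The boundary's updateQueue is used as a RetryQueue (Set<Wakeable>): it is only a staging area until the commit phase attaches the retry listeners.
- attachPingListener attaches immediately, during the render phase:
  - because the data may resolve before the fallback is committed,
  - or the fallback may never be committed at all (for example refresh/prerender).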

Step 4: `attachPingListener`: deduplicate with `pingCache`, use lanes as the "thread ID"

File: packages/react-reconciler/src/ReactFiberWorkLoop.js

export function attachPingListener(
  root: FiberRoot,
  wakeable: Wakeable,
  lanes: Lanes,
) {
  let pingCache = root.pingCache;
  let threadIDs;
  if (pingCache === null) {
    pingCache = root.pingCache = new PossiblyWeakMap();
    threadIDs = new Set<mixed>();
    pingCache.set(wakeable, threadIDs);
  } else {
    threadIDs = pingCache.get(wakeable);
    if (threadIDs === undefined) {
      threadIDs = new Set();
      pingCache.set(wakeable, threadIDs);
    }
  }
  if (!threadIDs.has(lanes)) {
    workInProgressRootDidAttachPingListener = true;

    threadIDs.add(lanes);
    const ping = pingSuspendedRoot.bind(null, root, wakeable, lanes);
    // ... devtools restorePendingUpdaters
    wakeable.then(ping, ping);
  }
}

And the callback that gets bound:

function pingSuspendedRoot(
  root: FiberRoot,
  wakeable: Wakeable,
  pingedLanes: Lanes,
) {
  const pingCache = root.pingCache;
  if (pingCache !== null) {
    pingCache.delete(wakeable);
  }

  markRootPinged(root, pingedLanes);

  // ... profiler/act warning

  if (
    workInProgressRoot === root &&
    isSubsetOfLanes(workInProgressRootRenderLanes, pingedLanes)
  ) {
    if (
      workInProgressRootExitStatus === RootSuspendedWithDelay ||
      (workInProgressRootExitStatus === RootSuspended &&
        includesOnlyRetries(workInProgressRootRenderLanes) &&
        now() - globalMostRecentFallbackTime < FALLBACK_THROTTLE_MS)
    ) {
      if ((executionContext & RenderContext) === NoContext) {
        prepareFreshStack(root, NoLanes);
      }
    } else {
      workInProgressRootPingedLanes = mergeLanes(
        workInProgressRootPingedLanes,
        pingedLanes,
      );
    }

    if (workInProgressSuspendedRetryLanes === workInProgressRootRenderLanes) {
      workInProgressSuspendedRetryLanes = NoLanes;
    }
  }

  ensureRootIsScheduled(root);
}

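Why deduplicate by (wakeable, lanes)?

- The same wakeable may be encountered repeatedly across different render attempts.
- lanes play the role of a "thread ID" here:
  - Within a render of the same root at the same set of lanes, a listener should not be attached to the same wakeable more than once.
  - But if the lanes differ (for example a higher-priority update cut in), that is a different "thread", so another attachment is allowed.

Trade-off: what does this dedup strategy cost?

- Benefit: it prevents attaching N callbacks to the same promise, which would cause memory growth and a scheduling storm.
- Cost: "concurrent rendering semantics" get encoded into the cache key, which raises the cost of understanding the code.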
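
The dedup rule is easy to check with a toy version of the cache. `attachPingListenerSketch`, the bare `root` object, and the counting wakeable are hypothetical. The data structure (a map from wakeable to a set of lanes) follows the excerpt above.

```js
// Toy model of root.pingCache: WeakMap<wakeable, Set<lanes>>.
function attachPingListenerSketch(root, wakeable, lanes, onPing) {
  if (root.pingCache === null) {
    root.pingCache = new WeakMap();
  }
  let threadIDs = root.pingCache.get(wakeable);
  if (threadIDs === undefined) {
    threadIDs = new Set();
    root.pingCache.set(wakeable, threadIDs);
  }
  if (threadIDs.has(lanes)) {
    return false; // already subscribed for this "thread"
  }
  threadIDs.add(lanes);
  const ping = () => onPing(lanes);
  wakeable.then(ping, ping);
  return true;
}

// A wakeable that only counts how many times it was subscribed to.
let subscriptions = 0;
const wakeable = { then() { subscriptions++; } };
const root = { pingCache: null };
const noop = () => {};

attachPingListenerSketch(root, wakeable, 0b0100, noop); // first attempt: subscribes
attachPingListenerSketch(root, wakeable, 0b0100, noop); // same lanes: skipped
attachPingListenerSketch(root, wakeable, 0b1000, noop); // different lanes: subscribes
console.log(subscriptions); // 2
```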

Step 5: Retry lane allocation and the semantics of spawned retry lanes

5.1 `requestRetryLane`: only concurrent roots use retry lanes

File: packages/react-reconciler/src/ReactFiberWorkLoop.js

function requestRetryLane(fiber: Fiber) {
  const mode = fiber.mode;
  if (!disableLegacyMode && (mode & ConcurrentMode) === NoMode) {
    return (SyncLane: Lane);
  }

  return claimNextRetryLane();
}

This explains an important fork:

- Legacy roots (or non-concurrent mode) have no room to "suspend and resume later", so retries go straight to SyncLane.
- Only concurrent mode has the RetryLane pool.

5.2 `claimNextRetryLane`: rotating through 4 retry lanes

File: packages/react-reconciler/src/ReactFiberLane.js

export function claimNextRetryLane(): Lane {
  const lane = nextRetryLane;
  nextRetryLane <<= 1;
  if ((nextRetryLane & RetryLanes) === NoLanes) {
    nextRetryLane = RetryLane1;
  }
  return lane;
}

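This is a typical "bitmap ring allocator":

- The constant pool is tiny (4 lanes).
- It cycles by bit-shifting.
- It avoids unbounded growth in lane allocation.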
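
Running the allocator makes the wrap-around visible. The function body is the same logic as the excerpt above. The 4-bit lane values are assumptions for illustration, since the real retry lanes sit higher up in a 31-bit bitmask.

```js
// Hypothetical 4-bit layout; only the relative positions matter here.
const NoLanes = 0b0000;
const RetryLane1 = 0b0001;
const RetryLane2 = 0b0010;
const RetryLane3 = 0b0100;
const RetryLane4 = 0b1000;
const RetryLanes = RetryLane1 | RetryLane2 | RetryLane3 | RetryLane4;

let nextRetryLane = RetryLane1;

function claimNextRetryLane() {
  const lane = nextRetryLane;
  nextRetryLane <<= 1;
  // Shifted past the end of the pool: wrap back to the first retry lane.
  if ((nextRetryLane & RetryLanes) === NoLanes) {
    nextRetryLane = RetryLane1;
  }
  return lane;
}

const claimed = [];
for (let i = 0; i < 6; i++) {
  claimed.push(claimNextRetryLane().toString(2).padStart(4, '0'));
}
console.log(claimed.join(' ')); // 0001 0010 0100 1000 0001 0010
```

The fifth claim reuses the first lane, so two unrelated retries can end up sharing a lane. That is acceptable because lanes express priority batches, not identities.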

5.3 `ScheduleRetry`: arranging an "immediate retry" during the complete phase

File: packages/react-reconciler/src/ReactFiberCompleteWork.js

function scheduleRetryEffect(
  workInProgress: Fiber,
  retryQueue: RetryQueue | null,
) {
  const wakeables = retryQueue;
  if (wakeables !== null) {
    workInProgress.flags |= Update;
  }

  if (workInProgress.flags & ScheduleRetry) {
    const retryLane =
      workInProgress.tag !== OffscreenComponent
        ? claimNextRetryLane()
        : OffscreenLane;
    workInProgress.lanes = mergeLanes(workInProgress.lanes, retryLane);

    markSpawnedRetryLane(retryLane);
  }
}

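This part is easy to overlook, but it is the key to understanding why retry lanes get marked as suspended on the root:

- In some cases (for example when siblings were skipped, for the sake of sibling prerendering), React queues a retry lane to do the prerender before the data is ready.
- This produces a "spawned retry lane", which has to be handled as part of suspendedRetryLanes when the root finishes.

The WorkLoop records it with markSpawnedRetryLane: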

export function markSpawnedRetryLane(lane: Lane): void {
  workInProgressSuspendedRetryLanes = mergeLanes(
    workInProgressSuspendedRetryLanes,
    lane,
  );
}

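When the root finishes, markRootFinished in ReactFiberLane adds the freshly spawned retry lanes (excluding any that were already pending before this attempt and were not part of it) to root.suspendedLanes: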

if (
  suspendedRetryLanes !== NoLanes &&
  updatedLanes === NoLanes &&
  !(disableLegacyMode && root.tag === LegacyRoot)
) {
  const freshlySpawnedRetryLanes =
    suspendedRetryLanes &
    ~(previouslyPendingLanes & ~finishedLanes);
  root.suspendedLanes |= freshlySpawnedRetryLanes;
}

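Why mark them as suspended?

It effectively tells the scheduler:

- These retry lanes now have work pending, but do not treat them as ordinary updates that can complete right away.
- They are inherently more likely to suspend again, so strategies such as prerendering and skipping siblings should be used to avoid blocking.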
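
The mask in that excerpt is easier to read with concrete numbers. The sketch below reuses the same expression. The helper name `getFreshlySpawnedRetryLanes` and the bit positions are hypothetical.

```js
// Hypothetical bit positions, for illustration only.
const RetryLane1 = 0b0001;
const RetryLane2 = 0b0010;
const TransitionLane = 0b0100;

function getFreshlySpawnedRetryLanes(
  suspendedRetryLanes,
  previouslyPendingLanes,
  finishedLanes,
) {
  // Lanes that were pending before this attempt but were not part of it.
  const pendingButNotAttempted = previouslyPendingLanes & ~finishedLanes;
  return suspendedRetryLanes & ~pendingButNotAttempted;
}

// RetryLane1 was already pending before the render and was not rendered.
// The render worked on TransitionLane and spawned retries on both retry lanes.
const fresh = getFreshlySpawnedRetryLanes(
  RetryLane1 | RetryLane2, // suspendedRetryLanes
  TransitionLane | RetryLane1, // previouslyPendingLanes
  TransitionLane, // finishedLanes
);
console.log(fresh === RetryLane2); // true
```

Only RetryLane2 ends up marked as suspended. RetryLane1 carried earlier pending work that this attempt did not include, so it is left alone.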

Step 6: The commit phase attaches retry listeners and throttles the fallback appearing/disappearing

File: packages/react-reconciler/src/ReactFiberCommitWork.js

When the Update flag is set on a Suspense boundary (usually by scheduleRetryEffect), the commit phase takes the retryQueue accumulated during render and attaches the retry listeners:

function attachSuspenseRetryListeners(
  finishedWork: Fiber,
  wakeables: RetryQueue,
) {
  const retryCache = getRetryCache(finishedWork);
  wakeables.forEach(wakeable => {
    if (!retryCache.has(wakeable)) {
      retryCache.add(wakeable);

      const retry = resolveRetryWakeable.bind(null, finishedWork, wakeable);
      wakeable.then(retry, retry);
    }
  });
}

The same commit branch is also responsible for updating globalMostRecentFallbackTime, which is used for throttling:

if (offscreenFiber.flags & Visibility) {
  const isShowingFallback =
    (finishedWork.memoizedState: SuspenseState | null) !== null;
  const wasShowingFallback =
    current !== null &&
    (current.memoizedState: SuspenseState | null) !== null;

  if (alwaysThrottleRetries) {
    if (isShowingFallback !== wasShowingFallback) {
      markCommitTimeOfFallback();
    }
  } else {
    if (isShowingFallback && !wasShowingFallback) {
      markCommitTimeOfFallback();
    }
  }
}

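Why does throttling belong to commit?

- Because the goal of throttling is to control a "UI fact" (whether the fallback is on screen),
- and UI facts can only be defined by commit.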
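
The two branches of that condition can be reduced to a small predicate. `shouldMarkCommitTimeOfFallback` is a hypothetical helper written only to tabulate the cases. It mirrors the excerpt, with the `alwaysThrottleRetries` flag passed in as a parameter.

```js
// Returns true when this commit should refresh the "most recent fallback" timestamp.
function shouldMarkCommitTimeOfFallback(
  isShowingFallback,
  wasShowingFallback,
  alwaysThrottleRetries,
) {
  return alwaysThrottleRetries
    ? isShowingFallback !== wasShowingFallback // any visibility flip counts
    : isShowingFallback && !wasShowingFallback; // only primary -> fallback counts
}

// primary -> fallback is recorded under both settings.
console.log(shouldMarkCommitTimeOfFallback(true, false, false)); // true
console.log(shouldMarkCommitTimeOfFallback(true, false, true)); // true

// fallback -> primary is recorded only when alwaysThrottleRetries is on.
console.log(shouldMarkCommitTimeOfFallback(false, true, false)); // false
console.log(shouldMarkCommitTimeOfFallback(false, true, true)); // true

// No visibility change: never recorded.
console.log(shouldMarkCommitTimeOfFallback(true, true, true)); // false
```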

Step 7: After the wakeable resolves, how is the retry actually triggered?

Back in the WorkLoop: resolveRetryWakeable and retryTimedOutBoundary.

File: packages/react-reconciler/src/ReactFiberWorkLoop.js

function retryTimedOutBoundary(boundaryFiber: Fiber, retryLane: Lane) {
  if (retryLane === NoLane) {
    retryLane = requestRetryLane(boundaryFiber);
  }
  const root = enqueueConcurrentRenderForLane(boundaryFiber, retryLane);
  if (root !== null) {
    markRootUpdated(root, retryLane);
    ensureRootIsScheduled(root);
  }
}

export function resolveRetryWakeable(boundaryFiber: Fiber, wakeable: Wakeable) {
  let retryLane: Lane = NoLane; // Default
  let retryCache: WeakSet<Wakeable> | Set<Wakeable> | null;
  switch (boundaryFiber.tag) {
    case ActivityComponent:
    case SuspenseComponent:
      retryCache = boundaryFiber.stateNode;
      const suspenseState: null | SuspenseState | ActivityState =
        boundaryFiber.memoizedState;
      if (suspenseState !== null) {
        retryLane = suspenseState.retryLane;
      }
      break;
    case SuspenseListComponent:
      retryCache = boundaryFiber.stateNode;
      break;
    case OffscreenComponent: {
      const instance: OffscreenInstance = boundaryFiber.stateNode;
      retryCache = instance._retryCache;
      break;
    }
    default:
      throw new Error(
        'Pinged unknown suspense boundary type. ' +
          'This is probably a bug in React.',
      );
  }

  if (retryCache !== null) {
    retryCache.delete(wakeable);
  }

  retryTimedOutBoundary(boundaryFiber, retryLane);
}

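This code looks a lot like "a tiny scheduler":

- It tries to reuse an existing retry lane from memoizedState.retryLane (for more stable lane attribution).
- Otherwise it allocates a new retry lane with requestRetryLane.
- It hands the work back to the root scheduler through enqueueConcurrentRenderForLane + markRootUpdated + ensureRootIsScheduled.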
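
Putting Step 6 and Step 7 together, the lifecycle of a retry cache entry can be sketched as follows. `createWakeable`, `attachRetryListenersSketch`, and `scheduleRetry` are hypothetical stand-ins. The real code binds resolveRetryWakeable and schedules a lane rather than pushing to an array.

```js
// Hypothetical synchronous wakeable that also exposes its listener count.
function createWakeable() {
  const listeners = [];
  return {
    then(onFulfilled) {
      listeners.push(onFulfilled);
    },
    resolve() {
      listeners.splice(0).forEach(listener => listener());
    },
    listenerCount: () => listeners.length,
  };
}

function attachRetryListenersSketch(boundary, wakeables, scheduleRetry) {
  wakeables.forEach(wakeable => {
    if (!boundary.retryCache.has(wakeable)) {
      boundary.retryCache.add(wakeable);
      const retry = () => {
        boundary.retryCache.delete(wakeable); // allow a future re-subscription
        scheduleRetry(boundary);
      };
      wakeable.then(retry, retry);
    }
  });
}

const boundary = { name: 'boundary', retryCache: new Set() };
const wakeable = createWakeable();
const scheduled = [];
const scheduleRetry = b => scheduled.push(b.name);

// Two commits see the same wakeable in the retry queue; only one listener is attached.
attachRetryListenersSketch(boundary, new Set([wakeable]), scheduleRetry);
attachRetryListenersSketch(boundary, new Set([wakeable]), scheduleRetry);
console.log(wakeable.listenerCount()); // 1

wakeable.resolve();
console.log(scheduled); // [ 'boundary' ]
console.log(boundary.retryCache.size); // 0
```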

Step 8: Why is a retry commit throttled?

File: packages/react-reconciler/src/ReactFiberWorkLoop.js

When a render contains only retries (no user updates), React makes a throttling decision in finishConcurrentRender:

if (
  includesOnlyRetries(lanes) &&
  (alwaysThrottleRetries || exitStatus === RootSuspended)
) {
  const msUntilTimeout =
    globalMostRecentFallbackTime + FALLBACK_THROTTLE_MS - now();

  if (msUntilTimeout > 10) {
    markRootSuspended(
      root,
      lanes,
      workInProgressDeferredLane,
      !workInProgressRootDidSkipSuspendedSiblings,
    );

    const nextLanes = getNextLanes(root, NoLanes, true);
    if (nextLanes !== NoLanes) {
      return;
    }

    pendingEffectsLanes = lanes;
    root.timeoutHandle = scheduleTimeout(
      commitRootWhenReady.bind(
        null,
        root,
        finishedWork,
        // ... arguments omitted
        'Throttled',
        renderStartTime,
        renderEndTime,
      ),
      msUntilTimeout,
    );
    return;
  }
}

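The intuition:

- A retry usually means "we are gradually filling real content back in, replacing the fallback".
- If every small resolve committed immediately, the UI would flicker, and commit itself has a cost (mutation/layout/passive).
- FALLBACK_THROTTLE_MS (300ms here) acts like a "metronome" that turns gradual filling into a smoother rhythm.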
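
A worked example of the timing arithmetic, assuming the 300ms constant mentioned above and the 10ms cutoff from the excerpt. `getThrottleDecision` is a hypothetical helper that takes timestamps as plain numbers instead of calling `now()`.

```js
const FALLBACK_THROTTLE_MS = 300;

// Decide whether a retry-only render should commit now or wait.
function getThrottleDecision(globalMostRecentFallbackTime, now) {
  const msUntilTimeout =
    globalMostRecentFallbackTime + FALLBACK_THROTTLE_MS - now;
  // Waits of 10ms or less are not worth scheduling a timer for.
  return msUntilTimeout > 10
    ? { action: 'delay-commit', delayMs: msUntilTimeout }
    : { action: 'commit-now', delayMs: 0 };
}

// Fallback committed at t=1000ms.
console.log(getThrottleDecision(1000, 1100)); // { action: 'delay-commit', delayMs: 200 }
console.log(getThrottleDecision(1000, 1295)); // { action: 'commit-now', delayMs: 0 }
console.log(getThrottleDecision(1000, 2000)); // { action: 'commit-now', delayMs: 0 }
```

A retry that finishes 100ms after the fallback appeared waits another 200ms, so the fallback stays on screen for roughly the full throttle window.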

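The easiest pitfalls when reading the source (and how to verify them)

1) Assuming ping = retry

You can use breakpoints (or logging) to verify that the two callbacks coexist:

- ReactFiberWorkLoop.attachPingListener (render phase)
- ReactFiberCommitWork.attachSuspenseRetryListeners (commit phase)

After the same wakeable resolves, both of these fire:

- pingSuspendedRoot (the root's perspective)
- resolveRetryWakeable (the boundary's perspective)

2) Assuming a retry always commits the primary content immediately

In reality:

- A retry only enqueues a new render.
- Whether the primary content gets committed still depends on getNextLanes, markRootSuspended, throttling, and whether higher-priority work is pending.

3) Uncached promises can cause odd behavior and disable error recovery

The WorkLoop has a dedicated guard for this (you can trace it starting from workInProgressRootDidAttachPingListener):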

if (workInProgressRootDidAttachPingListener && !wasRootDehydrated) {
  root.errorRecoveryDisabledLanes = mergeLanes(
    root.errorRecoveryDisabledLanes,
    originallyAttemptedLanes,
  );
  workInProgressRootInterleavedUpdatedLanes |= originallyAttemptedLanes;
  return RootSuspendedWithDelay;
}

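What this means:

- "Unwrapping an uncached promise" and "concurrent error recovery" conflict in some cases.
- React would rather give up this last line of error recovery than risk a data race causing subtler problems.

Summary: React's engineering trade-offs in Suspense

The Ping/Retry mechanism translates "the completion of asynchronous I/O" into "schedulable work". The core ideas boil down to four points:

- Use exceptions for control flow without leaking the business object into user land: SuspenseException + getSuspendedThenable.
- Attach ping during render and retry during commit: one wakeable, two listeners, serving the "scheduling decision" and the "switch back to the primary UI" respectively.
- Express "retry" semantics with lanes: RetryLane lets the scheduler apply dedicated policies to retries (throttling, forced time-slicing, prerendering).
- Take responsibility for user experience: FALLBACK_THROTTLE_MS + globalMostRecentFallbackTime turn "streaming fill-in" into a "steady rhythm".

If you want to keep going, for the next part I suggest widening the view from "suspend/wake" to "Suspense + Offscreen + sibling prerender": you will find that retry lanes and the Offscreen lane increasingly look like two tellings of the same story.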
