truncateHeadForPTLRetry Analysis

Source: src/services/compact/compact.ts:243-291
Key functions called: groupMessagesByApiRound (src/services/compact/grouping.ts:22), getPromptTooLongTokenGap (src/services/api/errors.ts:104)

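1. Overview

truncateHeadForPTLRetry is the last-resort escape hatch in the compaction flow. When the compaction request itself (the message list sent to the LLM for summarization) is too long and comes back as a 413 prompt-too-long error, this function drops the oldest messages from the head so that the retry can proceed.

Why it is needed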

// Comment at compact.ts:235-241
// This is the last-resort escape hatch for CC-1180 --- when the compact request
// itself hits prompt-too-long, the user is otherwise stuck. Dropping the
// oldest context is lossy but unblocks them. The reactive-compact path
// (compactMessages.ts) has the proper retry loop that peels from the tail;
// this helper is the dumb-but-safe fallback for the proactive/manual path
// that wasn't migrated in bfdb472f's unification.

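How the two compaction paths differ:

| Path | Retry mechanism | Strategy |
| --- | --- | --- |
| Reactive compact (triggered after a 413) | Full retry loop in `compactMessages.ts` | Peels from the tail, layer by layer |
| Proactive / manual compact | Calls this function | Drops the oldest messages from the head (lossy but safe) |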

2. Function signature

export function truncateHeadForPTLRetry(
  messages: Message[],            // message list to send to the LLM for summarization
  ptlResponse: AssistantMessage,  // the 413 error response (carries the token gap)
): Message[] | null               // null = nothing more can be dropped

3. Full execution flow

truncateHeadForPTLRetry(messages, ptlResponse)
  │
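  ├─ Step 1: strip the PTL_RETRY_MARKER left by the previous retry ← prevents a zero-progress loop
  │
  ├─ Step 2: groupMessagesByApiRound()
  │    └─ fewer than 2 groups? → return null
  │
  ├─ Step 3: compute how many groups to drop
  │    ├─ token gap available → drop the oldest groups until the running estimate reaches the gap
  │    └─ no token gap → drop 20% of the groups (at least 1)
  │
  ├─ Step 4: keep at least 1 group ← safety floor
  │
  └─ Step 5: fix the role order
       ├─ remainder starts with an assistant message → prepend PTL_RETRY_MARKER
       └─ otherwise → return as is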

4. Step-by-step details

Step 1: Strip the retry marker (lines 250-255)

const PTL_RETRY_MARKER = '[earlier conversation truncated for compaction retry]'

const input =
  messages[0]?.type === 'user' &&
  messages[0].isMeta &&
  messages[0].message.content === PTL_RETRY_MARKER
    ? messages.slice(1)
    : messages

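Purpose: prevent a zero-progress loop on the second retry.

Why it is needed: when an earlier retry left the remainder starting with an assistant message, step 5 prepended the marker. If the next retry did not strip it, groupMessagesByApiRound would place the marker alone in group 0, because the assistant message that follows opens a new group. When the 20% fallback rounds down to a single group (fewer than 10 groups), that one drop removes only the marker, step 5 re-inserts it, and the retry sends exactly the same messages again, so the loop never makes progress.

Illustration: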

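After the first retry (marker prepended):
  [marker][asst A][result A][asst B][result B]...

Second retry, marker NOT stripped:
  groupMessagesByApiRound:
    group 0: [marker]              → the 20% fallback drops only this group
    group 1..N: [asst A ...] ...
  remainder starts with an assistant message → marker is re-inserted
  → input is identical to the previous attempt: zero progress

Second retry, marker stripped first:
  [asst A][result A][asst B][result B]...
    group 0: [asst A][result A]    → dropped
  → real messages are removed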
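
The stall described above can be reproduced with a small simulation. This is a minimal sketch, not the real implementation: `Msg`, `group`, and `truncateOnce` are simplified, hypothetical stand-ins that model only the 20% fallback path, and the `strip` flag toggles step 1 so both behaviors can be compared.

```typescript
// Simplified stand-ins for the real message types (assumption: only the
// fields that matter for grouping and the marker check are modeled).
type Msg =
  | { type: 'user'; text: string; isMeta?: boolean }
  | { type: 'assistant'; id: string; text: string }

const MARKER = '[earlier conversation truncated for compaction retry]'

// Same grouping rule as groupMessagesByApiRound: a new assistant id opens a group.
function group(messages: Msg[]): Msg[][] {
  const groups: Msg[][] = []
  let current: Msg[] = []
  let lastId: string | undefined
  for (const msg of messages) {
    if (msg.type === 'assistant' && msg.id !== lastId && current.length > 0) {
      groups.push(current)
      current = [msg]
    } else {
      current.push(msg)
    }
    if (msg.type === 'assistant') lastId = msg.id
  }
  if (current.length > 0) groups.push(current)
  return groups
}

// One truncation pass using only the 20% fallback; `strip` enables step 1.
function truncateOnce(messages: Msg[], strip: boolean): Msg[] | null {
  const first = messages[0]
  const hasMarker =
    first !== undefined && first.type === 'user' &&
    first.isMeta === true && first.text === MARKER
  const input = strip && hasMarker ? messages.slice(1) : messages
  const groups = group(input)
  if (groups.length < 2) return null
  const dropCount = Math.min(
    Math.max(1, Math.floor(groups.length * 0.2)),
    groups.length - 1,
  )
  const sliced = ([] as Msg[]).concat(...groups.slice(dropCount))
  const head = sliced[0]
  if (head !== undefined && head.type === 'assistant') {
    const marker: Msg = { type: 'user', text: MARKER, isMeta: true }
    return [marker, ...sliced]
  }
  return sliced
}

// One preamble message followed by four API rounds (assistant + tool result).
const convo: Msg[] = [{ type: 'user', text: 'task' }]
for (let i = 1; i <= 4; i++) {
  convo.push({ type: 'assistant', id: `a${i}`, text: 'tool_use' })
  convo.push({ type: 'user', text: 'tool_result' })
}

const afterFirst = truncateOnce(convo, true)!          // preamble dropped, marker added
const withoutStrip = truncateOnce(afterFirst, false)!  // drops only the marker
const withStrip = truncateOnce(afterFirst, true)!      // drops a real round
console.log(afterFirst.length, withoutStrip.length, withStrip.length) // 9 9 7
```

Without stripping, the second pass returns the same 9 messages it received; with stripping, it shrinks to 7.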

Step 2: Group by API round (line 257)

const groups = groupMessagesByApiRound(input)
if (groups.length < 2) return null

groupMessagesByApiRound (grouping.ts:22) starts a new group whenever the assistant message.id changes:

export function groupMessagesByApiRound(messages: Message[]): Message[][] {
  const groups: Message[][] = []
  let current: Message[] = []
  let lastAssistantId: string | undefined

  for (const msg of messages) {
    // New assistant response (different message.id) → start a new group
    if (msg.type === 'assistant' && msg.message.id !== lastAssistantId && current.length > 0) {
      groups.push(current)
      current = [msg]
    } else {
      current.push(msg)
    }
    if (msg.type === 'assistant') lastAssistantId = msg.message.id
  }
  if (current.length > 0) groups.push(current)
  return groups
}

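Grouping rules:

- Each change of the assistant message.id starts a new group
- Multiple messages streamed from a single API response (same message.id) stay in the same group
- Group 0 usually holds the preamble, that is, everything before the first assistant response

Why group by API round:

- Each API round is a complete tool_use/tool_result cycle
- Dropping a whole round never leaves a dangling tool_use or tool_result
- Reactive compact uses the same grouping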
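
To make the grouping rules concrete, here is a minimal sketch that runs the same algorithm on a tiny conversation. The `Msg` type and the `groupByApiRound` name are simplified, hypothetical stand-ins; the real `Message` type nests the id under `message.id`.

```typescript
// Simplified stand-in for the real Message type (assumption: flat `id` field).
type Msg =
  | { type: 'user'; text: string }
  | { type: 'assistant'; id: string; text: string }

function groupByApiRound(messages: Msg[]): Msg[][] {
  const groups: Msg[][] = []
  let current: Msg[] = []
  let lastAssistantId: string | undefined
  for (const msg of messages) {
    // A new assistant id closes the current group, unless nothing is open yet.
    if (msg.type === 'assistant' && msg.id !== lastAssistantId && current.length > 0) {
      groups.push(current)
      current = [msg]
    } else {
      current.push(msg)
    }
    if (msg.type === 'assistant') lastAssistantId = msg.id
  }
  if (current.length > 0) groups.push(current)
  return groups
}

const sample: Msg[] = [
  { type: 'user', text: 'fix the bug' },              // preamble → group 0
  { type: 'assistant', id: 'a1', text: 'thinking' },  // round 1 starts
  { type: 'assistant', id: 'a1', text: 'tool_use' },  // same id → same group
  { type: 'user', text: 'tool_result' },              // stays with its tool_use
  { type: 'assistant', id: 'a2', text: 'done' },      // new id → round 2
]

const groupSizes = groupByApiRound(sample).map(g => g.length)
console.log(groupSizes) // [1, 3, 1]
```

The tool_result lands in the same group as the tool_use that triggered it, which is what makes dropping whole groups safe.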

Step 3: Compute the drop count (lines 260-272)

Strategy A: exact calculation (token gap available)

const tokenGap = getPromptTooLongTokenGap(ptlResponse)
if (tokenGap !== undefined) {
  let acc = 0
  dropCount = 0
  for (const g of groups) {
    acc += roughTokenCountEstimationForMessages(g)
    dropCount++
    if (acc >= tokenGap) break
  }
}

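getPromptTooLongTokenGap (errors.ts:104) extracts the difference actualTokens - limitTokens from the API's 413 error. The loop then accumulates estimated tokens starting from the oldest group and stops as soon as the running total reaches or exceeds the gap, so it drops the smallest number of leading groups that covers the overflow.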
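
A worked example of the accumulation, as a minimal sketch: `countGroupsToCoverGap` is a hypothetical helper that takes per-group token estimates directly (oldest first) in place of calling roughTokenCountEstimationForMessages, and the numbers are made up for illustration.

```typescript
// Hypothetical helper: how many leading groups must go to cover the gap.
function countGroupsToCoverGap(groupTokens: number[], tokenGap: number): number {
  let acc = 0
  let dropCount = 0
  for (const tokens of groupTokens) {
    acc += tokens
    dropCount++
    if (acc >= tokenGap) break // stop as soon as the gap is covered
  }
  return dropCount
}

const estimates = [1200, 800, 3000, 500] // illustrative token estimates per group

const coverSmallGap = countGroupsToCoverGap(estimates, 1500)  // 1200 + 800 = 2000 >= 1500
const coverExactGap = countGroupsToCoverGap(estimates, 2000)  // equality also stops the loop
const coverHugeGap = countGroupsToCoverGap(estimates, 10000)  // gap is never covered
console.log(coverSmallGap, coverExactGap, coverHugeGap) // 2 2 4
```

When the gap exceeds the total estimate, the loop counts every group; the clamp in step 4 is what keeps one group alive in that case. Because the per-group counts are rough estimates, the retry may still overshoot and need another pass.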

Strategy B: proportional drop (no token gap)

dropCount = Math.max(1, Math.floor(groups.length * 0.2))

Without precise information from the API, the function conservatively drops 20% of the groups, rounded down, with a minimum of 1 group.
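
The rounding matters for small conversations. The following sketch evaluates the fallback expression for a few group counts; `fallbackDropCount` is a hypothetical wrapper name, not a function in the source.

```typescript
// Hypothetical wrapper around the fallback expression quoted above.
function fallbackDropCount(groupCount: number): number {
  return Math.max(1, Math.floor(groupCount * 0.2))
}

const fallbackTable = [2, 5, 9, 10, 25].map(n => [n, fallbackDropCount(n)])
console.log(fallbackTable)
// [[2, 1], [5, 1], [9, 1], [10, 2], [25, 5]]
```

With fewer than 10 groups the fallback removes exactly one group per retry, which is the situation in which the marker stall from step 1 would occur.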

Step 4: Keep at least 1 group (lines 275-276)

dropCount = Math.min(dropCount, groups.length - 1)
if (dropCount < 1) return null

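Safety floor: the groups can never all be dropped; at least one must remain for the LLM to summarize. If only 1 group is left and the request still hits PTL, the function returns null to signal that no further truncation is possible, and the caller falls back to the user-facing error. Since step 2 already returns null for fewer than 2 groups, the dropCount < 1 check here acts as a defensive guard.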
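
A minimal sketch of the clamp in isolation; `clampDropCount` is a hypothetical name wrapping the two quoted lines so that the edge cases can be checked directly.

```typescript
// Hypothetical wrapper around the two lines quoted above.
function clampDropCount(requested: number, groupCount: number): number | null {
  const dropCount = Math.min(requested, groupCount - 1)
  return dropCount < 1 ? null : dropCount
}

const clampedAll = clampDropCount(4, 4)  // asked to drop everything → keeps the newest group
const clampedNone = clampDropCount(1, 5) // already within bounds → unchanged
const clampedLast = clampDropCount(1, 1) // single group → nothing can be dropped
console.log(clampedAll, clampedNone, clampedLast) // 3 1 null
```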

Step 5: Fix the role order (lines 278-290)

const sliced = groups.slice(dropCount).flat()

if (sliced[0]?.type === 'assistant') {
  return [
    createUserMessage({ content: PTL_RETRY_MARKER, isMeta: true }),
    ...sliced,
  ]
}
return sliced

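API constraint: the first entry in the messages array must have role: 'user'.

Why the remainder can start with an assistant message: group 0 from groupMessagesByApiRound usually holds the preamble (user role), and every later group starts with an assistant message. Once group 0 is dropped, the first remaining message is an assistant message.

Fix: prepend a synthetic user message (flagged isMeta: true, with PTL_RETRY_MARKER as its content) to satisfy the API requirement.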
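
The role fix can be sketched on its own. `Msg` and `ensureUserFirst` are simplified, hypothetical stand-ins; the real code builds the marker with createUserMessage.

```typescript
// Simplified stand-in for the real message types (assumption).
type Msg =
  | { type: 'user'; content: string; isMeta?: boolean }
  | { type: 'assistant'; id: string; content: string }

const PTL_RETRY_MARKER = '[earlier conversation truncated for compaction retry]'

// Hypothetical helper isolating step 5: guarantee a user message comes first.
function ensureUserFirst(sliced: Msg[]): Msg[] {
  const first = sliced[0]
  if (first !== undefined && first.type === 'assistant') {
    const marker: Msg = { type: 'user', content: PTL_RETRY_MARKER, isMeta: true }
    return [marker, ...sliced]
  }
  return sliced
}

const startsWithAssistant: Msg[] = [
  { type: 'assistant', id: 'a2', content: 'tool_use' },
  { type: 'user', content: 'tool_result' },
]
const startsWithUser: Msg[] = [{ type: 'user', content: 'hello' }]

const fixedRoles = ensureUserFirst(startsWithAssistant).map(m => m.type)
const untouchedRoles = ensureUserFirst(startsWithUser).map(m => m.type)
console.log(fixedRoles)     // ['user', 'assistant', 'user']
console.log(untouchedRoles) // ['user']
```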

5. Full calling context

Call sites

The same call pattern appears twice in compactConversation:

// compact.ts:~467 (proactive compact path)
if (isPromptTooLongMessage(summaryResponse)) {
  const truncated = truncateHeadForPTLRetry(messagesToSummarize, summaryResponse)
  if (truncated) {
    messagesToSummarize = truncated
    continue  // retry
  }
}

// compact.ts:~877 (manual compact path)
if (isPromptTooLongMessage(summaryResponse)) {
  const truncated = truncateHeadForPTLRetry(apiMessages, summaryResponse)
  if (truncated) {
    apiMessages = truncated
    continue  // retry
  }
}

Retry loop

compactConversation()
  │
  └─ LLM summarization request
       │
       ├─ success → use the summary
       │
       └─ 413 PTL → truncateHeadForPTLRetry()
            │
            ├─ returns Message[] → retry the request
            │
            └─ returns null → cannot continue
                 └─ return ERROR_MESSAGE_PROMPT_TOO_LONG
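
The loop above can be sketched end to end with a fake summarizer. Everything here is a hypothetical simplification: groups are represented only by their token estimates, `LIMIT` is an invented context limit, and `fakeSummarize`, `truncateHead`, and `compactWithRetry` are illustrative names, not functions from the source.

```typescript
type Result =
  | { kind: 'summary'; text: string }
  | { kind: 'ptl'; tokenGap?: number }

const LIMIT = 5000 // hypothetical context limit of the fake summarizer

// Fails with a token gap when the input exceeds LIMIT, otherwise "summarizes".
function fakeSummarize(groupTokens: number[]): Result {
  const total = groupTokens.reduce((a, b) => a + b, 0)
  return total > LIMIT
    ? { kind: 'ptl', tokenGap: total - LIMIT }
    : { kind: 'summary', text: `summary of ${groupTokens.length} groups` }
}

// Steps 2-4 on token estimates only (no marker handling in this sketch).
function truncateHead(groupTokens: number[], tokenGap?: number): number[] | null {
  if (groupTokens.length < 2) return null
  let dropCount: number
  if (tokenGap !== undefined) {
    let acc = 0
    dropCount = 0
    for (const t of groupTokens) {
      acc += t
      dropCount++
      if (acc >= tokenGap) break
    }
  } else {
    dropCount = Math.max(1, Math.floor(groupTokens.length * 0.2))
  }
  dropCount = Math.min(dropCount, groupTokens.length - 1)
  return dropCount < 1 ? null : groupTokens.slice(dropCount)
}

// Retry until the summarizer succeeds or nothing more can be dropped.
function compactWithRetry(groupTokens: number[]): string {
  let input = groupTokens
  for (;;) {
    const res = fakeSummarize(input)
    if (res.kind === 'summary') return res.text
    const truncated = truncateHead(input, res.tokenGap)
    if (truncated === null) return 'ERROR: prompt too long'
    input = truncated // strictly shorter, so the loop terminates
  }
}

const recovered = compactWithRetry([3000, 2500, 1500, 1000]) // 8000 tokens, gap 3000
const stuck = compactWithRetry([9000])                       // single oversized group
console.log(recovered) // summary of 3 groups
console.log(stuck)     // ERROR: prompt too long
```

Each successful truncation removes at least one group, so the loop terminates either with a summary or with the error once a single group remains.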

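6. Key constants

| Constant | Value | Purpose |
| --- | --- | --- |
| `PTL_RETRY_MARKER` | `'[earlier conversation truncated for compaction retry]'` | Placeholder user message prepended after messages are dropped |
| `ERROR_MESSAGE_PROMPT_TOO_LONG` | `'Conversation too long. Press esc twice...'` | Error shown to the user when nothing more can be dropped |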

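7. Design notes

Why drop from the head rather than the tail?

Compaction aims to preserve the most recent messages, since the latest context matters most for the rest of the conversation. Dropping the oldest messages from the head is therefore the natural choice.

Reactive compact peels from the tail because it takes a divide-and-conquer approach, halving the amount of messages sent on each attempt in order to find an amount that just fits. Dropping from the head suits the "discard old history" case. The two strategies serve different retry scenarios.

A lossy but safe fallback

The source comment openly calls this a dumb-but-safe approach:

- Lossy: dropped messages are gone for good and never appear in the summary
- Safe: at least 1 group is always kept, so the message list is never emptied

Why 20%?

20% is a conservative ratio. If the summarization request itself already hit a 413, dropping 1% would most likely hit a 413 again; dropping 20% makes a real difference. Dropping enough in one go keeps the number of retries down.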
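
The retry-count argument can be checked with simple arithmetic. This sketch counts groups only (not tokens) and ignores the marker and the token-gap path; `groupsAfterRetries` is a hypothetical helper, and the 1% ratio is included purely for comparison.

```typescript
// Hypothetical helper: groups left after `retries` fallback truncations.
function groupsAfterRetries(groupCount: number, ratio: number, retries: number): number {
  let remaining = groupCount
  for (let i = 0; i < retries && remaining >= 2; i++) {
    const drop = Math.min(Math.max(1, Math.floor(remaining * ratio)), remaining - 1)
    remaining -= drop
  }
  return remaining
}

const leftAt20Percent = groupsAfterRetries(50, 0.2, 3)  // 50 → 40 → 32 → 26
const leftAt1Percent = groupsAfterRetries(50, 0.01, 3)  // 50 → 49 → 48 → 47
console.log(leftAt20Percent, leftAt1Percent) // 26 47
```

After three retries the 20% ratio has removed nearly half of a 50-group conversation, while a 1% ratio would have removed three groups.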