AI Artifact Registry and Release Gates

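Scope: what to draw for the release gate of an Agent platform. Collapse the four version dimensions (model, prompt, tool schema, KB index) into a rollback-capable Registry and wire it to the Gateway, observability (Obs), and Eval. For the Eval workflow see 10; for Gateway tenancy see 07-Gateway; for the ML-side Model Registry see ml-platform/04.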

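L1 · What it is

1.1 One-sentence definition

AI Artifact Registry = the single source of truth (SSOT) for an Agent application's configuration and knowledge artifacts. For any production request you can answer "which four artifact versions were running at that moment", and every release is an atomic binding of code + the four artifact dimensions + an eval report.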

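1.2 Four-dimension version matrix (Staff-level must-know)

| Dimension | Example artifacts | Typical storage | Rollback granularity |
| --- | --- | --- | --- |
| Model | `gpt-4o-mini@2025-11`, LoRA adapter id | Model Registry / Gateway routing table | Routing weights |
| Prompt | system + tool instruction templates | Langfuse Prompt / Git `prompts/` | `prod` label pointer |
| Tool schema | OpenAPI / JSON Schema / MCP manifest | Git + contract tests | schema semver |
| KB / index | vector collection alias `kb_v12` | Milvus alias / ES index alias | Blue-green alias switch |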

flowchart TB
  subgraph reg [AI Artifact Registry]
    M[model_ref]
    P[prompt_ref]
    T[tool_schema_ref]
    K[kb_index_ref]
  end
  REL[Release Record] --> M
  REL --> P
  REL --> T
  REL --> K
  REL --> E[eval_set_ref]
  GW[AI Gateway] --> M
  APP[Agent Runtime] --> P
  APP --> T
  APP --> K
  OBS[Trace/Span] --> REL

Difference from DevOps: a traditional release gate = code + config; an Agent release gate = code + four dimensions + eval_set@sha (13 §11).

1.3 The platform in one diagram (architect's whiteboard)

flowchart TB
  subgraph control [Control plane]
    REG[Artifact Registry]
    POL[Policy OPA]
    BUD[Budget / Quota]
  end
  subgraph data [Data plane]
    GW[AI Gateway]
    AG[Agent Runtime]
    RAG[RAG / Tools]
  end
  subgraph observe [Observability + quality]
    LF[Langfuse / OTel]
    EV[Eval Harness]
  end
  DEV[Git PR] --> EV
  EV -->|pass| REG
  REG --> GW
  REG --> AG
  AG --> RAG
  AG --> LF
  LF --> EV
  POL --> AG
  BUD --> GW

Talk-through order: the Registry lives in the control plane; the Gateway injects tenant_id + release_id; traces must carry release_id to support per-version attribution (08).

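L2 · Registry design per dimension

2.1 Model Registry (division of labor with the ML platform)

| Responsibility | llm-agent side | ml-platform side |
| --- | --- | --- |
| Weight artifacts, training lineage | References them | Primary owner (ml-platform/04) |
| Routing aliases `smart`/`fast` | Gateway config (07) | Synced approval flow |
| Quantized / AWQ packages | Serving (03) | Model card |

An Agent architect only needs to make the boundary clear: in-house small models go through the ML Registry; commercial API models go through the Gateway model_list plus a contract version number, and do not need to enter Kubeflow.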

2.2 Prompt Registry

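| Capability | Implementation |
| --- | --- |
| Versioning | Git tag, or Langfuse `prompt` name + version |
| Environment pointer | `prod` / `staging` label |
| Experiments | A/B arm bound to `prompt_ref` in the release record |
| Audit | Diffs go through PRs; no unrecorded direct edits in the production UI |

Spring AI (L1): PromptTemplate + externalized templates; observation spans record gen_ai.prompt.name / version (Spring AI Observability).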

2.3 Tool Schema Registry

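| Risk | Gate |
| --- | --- |
| Breaking parameter change | schema semver; a major bump blocks the release |
| Over-privileged tool exposure | MCP manifest bound to RBAC |
| Call loops | `max_calls` in registry metadata |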
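
The semver rule in the table can be sketched as a small gate function. This is a minimal illustration, not a prescribed implementation: the function names and the `has_migration` flag are hypothetical, and the sketch assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release suffixes.

```python
from typing import Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def gate_schema_bump(old: str, new: str, has_migration: bool = False) -> str:
    """Return 'pass' or 'block' for a tool schema version change.

    Assumption: a major bump is treated as a breaking change and is
    blocked unless a migration is attached.
    """
    old_v, new_v = parse_semver(old), parse_semver(new)
    if new_v <= old_v:
        return "block"  # versions must move forward
    if new_v[0] > old_v[0] and not has_migration:
        return "block"  # breaking change without a migration
    return "pass"
```

For example, `gate_schema_bump("2.1.0", "2.2.0")` passes, while `gate_schema_bump("2.1.0", "3.0.0")` is blocked until a migration is supplied.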

# tools/refund_api/v2.1.0.yaml
name: refund_api
version: 2.1.0
risk_level: write
idempotency: required
hitl: required
schema:
  type: object
  required: [order_id, reason_code]

MCP (L3): Model Context Protocol. The Server declares tools; the Client builds its allowlist with the registry as the source of truth.
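
A minimal sketch of the client-side allowlist, assuming the registry is available as an in-memory mapping. The `RegisteredTool` type and the shape of the `declared` dictionaries are hypothetical and are not the MCP wire format; in particular, a `version` field on declared tools is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RegisteredTool:
    name: str
    version: str
    risk_level: str  # e.g. "read" or "write"


def allowed_tools(
    declared: List[Dict[str, str]],
    registry: Dict[str, RegisteredTool],
) -> List[str]:
    """Keep only server-declared tools whose name and version match the registry."""
    allowed = []
    for tool in declared:
        entry = registry.get(tool["name"])
        # Unknown tools and version mismatches are dropped, never trusted.
        if entry is not None and entry.version == tool.get("version"):
            allowed.append(tool["name"])
    return allowed
```

The point of the design is that a tool the Server advertises but the registry does not know is silently excluded, so a newly deployed MCP Server cannot widen the Agent's capabilities on its own.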

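2.4 KB / vector index Registry

| Pattern | Description |
| --- | --- |
| Alias switch | `kb_current` → `kb_v12`; keep `kb_v11` for 24h as the rollback target |
| Shadow index | Run eval on `kb_v13_shadow` before switching the alias (nlp 03 §14) |
| dict_version | Tokenizer dictionary is bound to the chunking pipeline (nlp 08) |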
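
The alias-switch pattern can be illustrated with an in-memory stand-in. This is a sketch only: `AliasTable` is a hypothetical class, not the Milvus or Elasticsearch client API, and it ignores the 24h retention window.

```python
from typing import Dict


class AliasTable:
    """In-memory stand-in for a vector store's alias mechanism."""

    def __init__(self) -> None:
        self._aliases: Dict[str, str] = {}
        self._previous: Dict[str, str] = {}

    def switch(self, alias: str, collection: str) -> None:
        """Point the alias at a new collection, remembering the old target."""
        current = self._aliases.get(alias)
        if current is not None:
            self._previous[alias] = current
        self._aliases[alias] = collection

    def rollback(self, alias: str) -> str:
        """Point the alias back at the previously active collection."""
        previous = self._previous.pop(alias, None)
        if previous is None:
            raise LookupError(f"no rollback target recorded for {alias!r}")
        self._aliases[alias] = previous
        return previous

    def resolve(self, alias: str) -> str:
        """Return the collection the alias currently points at."""
        return self._aliases[alias]
```

Readers always query `kb_current`, so a rollback is a pointer change rather than a re-index.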

L3 · Release Gate

3.1 Release Record (immutable)

{
  "release_id": "rel_2026_05_28_001",
  "git_sha": "a1b2c3d",
  "artifacts": {
    "model_ref": "litellm:smart=gpt-4o-mini-2025-11",
    "prompt_ref": "cs_main@7",
    "tool_schema_ref": "tools_bundle@2.4.0",
    "kb_index_ref": "kb_alias:v12"
  },
  "eval_set_ref": "golden_cs@sha9f3a...",
  "eval_report_uri": "s3://eval/rel_001/report.html",
  "approvers": ["tl_ai", "compliance_bot"],
  "canary_percent": 5
}
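
One way to enforce the "immutable" property in application code is a frozen dataclass with a read-only artifact mapping. This is a sketch under assumptions: `ReleaseRecord` and `load_release` are hypothetical names, and only a subset of the fields in the record above is modeled.

```python
import json
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping

REQUIRED_ARTIFACTS = ("model_ref", "prompt_ref", "tool_schema_ref", "kb_index_ref")


@dataclass(frozen=True)
class ReleaseRecord:
    release_id: str
    git_sha: str
    artifacts: Mapping[str, str]
    eval_set_ref: str


def load_release(raw: str) -> ReleaseRecord:
    """Parse a release record and reject it if any of the four refs is missing."""
    data = json.loads(raw)
    artifacts = data.get("artifacts", {})
    missing = [key for key in REQUIRED_ARTIFACTS if key not in artifacts]
    if missing:
        raise ValueError(f"release record is missing artifact refs: {missing}")
    return ReleaseRecord(
        release_id=data["release_id"],
        git_sha=data["git_sha"],
        # MappingProxyType gives a read-only view, so refs cannot be edited in place.
        artifacts=MappingProxyType(dict(artifacts)),
        eval_set_ref=data["eval_set_ref"],
    )
```

Any change to a ref then requires creating a new record with a new `release_id`, which is what makes per-version attribution reliable.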

3.2 Gate pipeline

flowchart LR
  C1[Tool schema contract tests] --> C2[Golden Harness]
  C2 --> C3[Security red-team sampling]
  C3 --> C4[Cost regression]
  C4 --> C5[Manual approval for high risk]
  C5 --> CAN[Canary]
  CAN --> FULL[Full rollout]

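| Stage | Blocking condition |
| --- | --- |
| C1 | Breaking schema change without a migration |
| C2 | Below the thresholds defined in 10 (Eval set governance) |
| C3 | Injection success rate > 0 |
| C4 | $/case +5% without business approval |
| C5 | `risk_level=write` without HITL evidence |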
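
The blocking conditions above can be expressed as one evaluation function. This is a sketch under assumptions: the `GateInput` fields are hypothetical, the C2 threshold is passed in rather than taken from the eval governance chapter, and "+5%" is read as "5% or more". Unlike a real pipeline, which would stop at the first failing stage, the function reports every failing stage so a reviewer sees the full picture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateInput:
    breaking_schema_change: bool
    has_migration: bool
    golden_pass_rate: float
    golden_threshold: float      # assumed to be defined by eval-set governance
    injection_successes: int
    cost_per_case_delta: float   # 0.05 means +5% versus the baseline
    business_cost_approval: bool
    risk_level: str
    hitl_evidence: bool


def blocking_stages(g: GateInput) -> List[str]:
    """Return the IDs of all gate stages that would block this release."""
    blocked = []
    if g.breaking_schema_change and not g.has_migration:
        blocked.append("C1")
    if g.golden_pass_rate < g.golden_threshold:
        blocked.append("C2")
    if g.injection_successes > 0:
        blocked.append("C3")
    if g.cost_per_case_delta >= 0.05 and not g.business_cost_approval:
        blocked.append("C4")
    if g.risk_level == "write" and not g.hitl_evidence:
        blocked.append("C5")
    return blocked
```

An empty list means the release may proceed to Canary.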

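3.3 Rollback matrix (rehearsed in advance)

| Degradation signal | First action | Second action |
| --- | --- | --- |
| faithfulness↓ | KB alias back to v11 | prompt back to v6 |
| cost↑ | Switch Gateway route to fast | Shrink context |
| tool errors↑ | Roll back tool_schema | Disable the new MCP Server |
| latency↑ | Turn off rerank | Reduce max_tokens |

Principle: roll back only one dimension at a time, so the cause can be attributed (13 §11).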
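
The matrix and the one-dimension-at-a-time principle can be encoded as a lookup that hands out a single action per call. A minimal sketch: the signal keys and action strings are hypothetical labels for the rows above.

```python
from typing import Dict, List, Optional

ROLLBACK_MATRIX: Dict[str, List[str]] = {
    "faithfulness_down": ["kb alias -> v11", "prompt -> v6"],
    "cost_up": ["gateway route -> fast", "shrink context"],
    "tool_errors_up": ["rollback tool_schema", "disable new MCP server"],
    "latency_up": ["disable rerank", "reduce max_tokens"],
}


def next_action(signal: str, attempts: int) -> Optional[str]:
    """Return the single next action for a signal.

    `attempts` is how many actions were already tried for this incident;
    None means the matrix is exhausted and a human must decide.
    """
    actions = ROLLBACK_MATRIX[signal]
    return actions[attempts] if attempts < len(actions) else None
```

Because each call returns exactly one action, the operator re-checks the metric between steps, which is what keeps attribution clean.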

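3.4 Multi-tenancy

| Field | Purpose |
| --- | --- |
| `tenant_id` | Registry namespace isolation |
| `release_id` | Tenants may have different prompt/KB; the model can be shared |
| Cache key | `hash(tenant, prompt, prefix)`; cross-tenant sharing is forbidden (07-Gateway) |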
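
A minimal sketch of a tenant-scoped cache key, assuming SHA-256 as the hash and string inputs; the function name `cache_key` is hypothetical. Each field is length-prefixed so that different field splits cannot produce the same digest.

```python
import hashlib


def cache_key(tenant_id: str, prompt_ref: str, prefix: str) -> str:
    """Derive a cache key that always includes the tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required; shared cache entries are not allowed")
    digest = hashlib.sha256()
    for part in (tenant_id, prompt_ref, prefix):
        encoded = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
        digest.update(len(encoded).to_bytes(4, "big"))
        digest.update(encoded)
    return digest.hexdigest()
```

Two tenants sending the same prompt and prefix get different keys, so one tenant's cached completion can never be served to another.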

L4 · FinOps and DR (cross-cutting)

4.1 FinOps tags (bound to the Registry)

Every span / billing record carries:

tenant_id, feature, release_id, model_ref, prompt_ref

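$/successful task = cost / number of successfully completed tasks (not the number of HTTP 200 responses). Over budget → the Gateway downgrades to fast or refuses to answer (03-Serving §10).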
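
The metric and the over-budget behavior can be sketched as two small functions. The names, the USD unit, and the rule of preferring a downgrade over a refusal are assumptions for illustration.

```python
from typing import Optional


def cost_per_successful_task(total_cost_usd: float, successful_tasks: int) -> Optional[float]:
    """Cost divided by tasks that actually succeeded, not by HTTP 200 responses."""
    if successful_tasks <= 0:
        return None  # undefined when nothing succeeded
    return total_cost_usd / successful_tasks


def budget_action(spent_usd: float, budget_usd: float, fast_route_available: bool) -> str:
    """Decide what the Gateway does for the next request."""
    if spent_usd <= budget_usd:
        return "serve"
    # Over budget: downgrade if a cheaper route exists, otherwise refuse.
    return "downgrade_to_fast" if fast_route_available else "refuse"
```

Counting only successful tasks in the denominator matters: a release that returns HTTP 200 with useless answers looks cheap per request but expensive per outcome.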

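4.2 Disaster recovery (vector store + Checkpoint + MCP)

| Component | RPO/RTO key points |
| --- | --- |
| Vector store | Cross-region replicas + alias pointing to the DR collection |
| Checkpoint DB (Postgres) | Consistent with 13 §9.4 |
| MCP Server | Multiple replicas + circuit breaking; the Client caches the manifest version |

Staff one-liner: Agent DR = recoverable state + rollback-capable knowledge + degradable tools, not just backing up model weights.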

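L5 · Staff-level defense

5.1 STAR-M-P: prompt v8 shipped by mistake and promised compensation

| Element | Content |
| --- | --- |
| S | 2h after the new prompt went live, compliance alerted that the frequency of the phrase "guaranteed compensation" was up 8x |
| T | Stop the bleeding within 30min; find the root cause within 24h |
| A | Use `release_id` to pin down prompt@8; one-click `prod→v7` in Langfuse; add 10 induced-promise cases to Golden; add `forbidden_substrings` to the Gate |
| M | Without release_id, fast attribution is impossible |
| P | 100% interception of promise-type outputs; every subsequent PR must bind an eval_report |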

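5.2 Follow-up Q&A in big-tech interviews

Q1 · Should the Registry live in Git or in a platform?

A: Prompt / tool schema / KB pipeline config → Git (diffable); runtime pointers → a Registry service (Langfuse label / internal table). Model routing lives in the Gateway.

Q2 · Do all four dimensions have to be released together?

A: No. The KB updates daily, prompts weekly, models monthly, but the Release Record must record the current combination; during a Canary, change only one dimension.

Q3 · How does this relate to the Feature Store?

A: The Feature Store supplies routing/policy features (user risk score → model choice); the Registry manages generation artifacts. When the Agent uses real-time features to pick a release variant, set feature_flags at the Gateway (cross-reference ml-platform/01).

Q4 · How do you bring low-code Dify into the Registry?

A: Export the Dify workflow YAML + prompt snapshot into Git; after production switches to Spring AI, the same release_id carries through end to end (15-Dify).

Q5 · How do you prove it is "auditable"?

A: Any trace_id → release_id → four-dimension refs + eval_report + approver; for money-related operations, add a tool args hash (17-安全, audit fields section).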
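
The audit chain in Q5 amounts to two lookups. A minimal sketch, assuming traces and releases are available as in-memory dictionaries; the function name and the dictionary shapes are hypothetical, with field names borrowed from the Release Record in §3.1.

```python
from typing import Any, Dict


def audit_trail(
    trace_id: str,
    traces: Dict[str, Dict[str, Any]],
    releases: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Resolve trace_id -> release_id -> artifact refs, eval report, approvers."""
    trace = traces[trace_id]
    release_id = trace.get("release_id")
    if release_id is None:
        # A trace without release_id cannot be attributed to any version.
        raise LookupError(f"trace {trace_id!r} carries no release_id")
    release = releases[release_id]
    return {
        "trace_id": trace_id,
        "release_id": release_id,
        "artifacts": release["artifacts"],
        "eval_report_uri": release["eval_report_uri"],
        "approvers": release["approvers"],
    }
```

The failure branch is the interesting one: a trace that was written without `release_id` breaks the chain at the first hop, which is why the Gateway injects it on every request.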

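5.3 Agent platform evolution stages (aligned with 27)

| Stage | Registry capability | Typical team size |
| --- | --- | --- |
| L0 | Git prompts + manual KB | <5 people |
| L1 | Langfuse prompts + eval binding | 5-20 people |
| L2 | Unified `release_id` + Gateway routing | 20-50 people |
| L3 | Multi-tenant chargeback + automatic rollback | 50+ people / multiple BUs |

Talking point: in most "Agent architect" interviews, covering up to L2 is enough; discussing FinOps and DR at L3 earns extra credit.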

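5.4 Connecting to the e-commerce Prompt Registry diagram in 08

Within this domain, 08 §10 already draws the Prompt Registry + Canary. This chapter adds the field-level contract of the Release Record and the joint four-dimension rollback. In implementation, write the same release_id to both the Langfuse trace and the Spring Observation.

5.5 Compliance fields (minimum set for release audit)

| Field | Description |
| --- | --- |
| `release_id` | Immutable |
| `approver` | Two approvers for high-risk changes |
| `eval_set_ref` | Bound to the Golden sha |
| `change_summary` | Human-readable |
| `rollback_target` | Previous stable `release_id` |
| `data_snapshot_id` | KB pipeline batch (optional) |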

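5.6 Joint release with ml-platform (table: ML + LLM)

| Change type | ML Registry | AI Registry |
| --- | --- | --- |
| Rerank model v3 | ✅ Primary gate | References `model_ref` |
| Prompt v9 | n/a | ✅ Primary gate |
| KB index v12 | Features unchanged | ✅ Alias switch |
| Experiment strategy | 05 AB | Mutually exclusive layer bound to `release_id` |

Agent routing features (e.g. risk score → model choice): feature definitions stay in the Feature Store; release variants live in the Gateway config table; the two meet at release_id.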

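5.7 Interview whiteboard template (5 minutes)

Draw the four dimensions + release_id
Draw PR → Harness → Canary → full rollout
Write the rollback order: KB first, then prompt
Point out the trace attribution fields
Mention multi-tenant cache isolation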

§8 Sample release and rollback playbook (worth memorizing)

8.1 Current production pointer (illustrative)

{
  "prod_pointer": {
    "release_id": "rel_2026_05_20_stable",
    "artifacts": {
      "model_ref": "litellm:smart=azure/gpt-4o-2025-11",
      "prompt_ref": "cs_main@6",
      "tool_schema_ref": "tools_bundle@2.3.0",
      "kb_index_ref": "kb_alias:v11"
    },
    "eval_set_ref": "golden_cs@sha8a2c...",
    "canary": { "percent": 0, "target_release": null }
  }
}

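8.2 Rollback playbook (step-by-step talk-through)

| Step | Action | Verification |
| --- | --- | --- |
| 1 | Alert: faithfulness -1.2pp / 1h | dashboard |
| 2 | Check the `release_id` distribution → pin down `rel_2026_05_28_001` | Langfuse |
| 3 | Roll back only `kb_index_ref` v12→v11 | Are citations restored? |
| 4 | Still degraded → roll back `prompt_ref` 7→6 | Stop the Canary |
| 5 | Postmortem: why did eval not catch it? Add Golden cases | 10 |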

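8.3 Canary checklist

SRM: bucket split 5/95 with deviation <1%
Guardrail: compliance incidents = 0
Proxy metrics: human-handoff rate, refusal rate
Primary metric: CSAT (review after a 48h delay)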
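
The SRM (sample ratio mismatch) item can be checked with a simple threshold. This sketch assumes "deviation <1%" means an absolute deviation of less than one percentage point from the expected 5% canary share; the function name and defaults are hypothetical. A production check would more commonly use a chi-square test on the bucket counts.

```python
def srm_ok(
    canary_requests: int,
    control_requests: int,
    expected_canary_share: float = 0.05,
    tolerance: float = 0.01,
) -> bool:
    """Return True if the observed canary share is within tolerance of the plan."""
    total = canary_requests + control_requests
    if total == 0:
        return False  # no traffic yet, nothing can be concluded
    observed = canary_requests / total
    return abs(observed - expected_canary_share) < tolerance
```

If this check fails, the canary metrics should not be trusted at all, because the two buckets are no longer comparable populations.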

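§9 FinOps Chargeback (aligned with spans)

| Tag | Billing dimension | Example value |
| --- | --- | --- |
| `tenant_id` | BU | `bu_electronics` |
| `feature` | Product | `checkout_assist` |
| `release_id` | Version experiment | `rel_*` |
| `model_ref` | Model cost | `gpt-4o-mini` |

$/successful task = token cost in the period / number of successfully completed sessions (consistent with 08).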
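
Chargeback is then a group-by over tagged spans. A minimal sketch, assuming each span is a dictionary that carries the tags above plus a hypothetical `cost_usd` field; the grouping keys are configurable.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def chargeback(
    spans: Iterable[Mapping[str, object]],
    keys: Sequence[str] = ("tenant_id", "feature"),
) -> Dict[Tuple[object, ...], float]:
    """Sum span cost per combination of billing tags."""
    totals: Dict[Tuple[object, ...], float] = defaultdict(float)
    for span in spans:
        group = tuple(span[key] for key in keys)
        totals[group] += float(span["cost_usd"])
    return dict(totals)
```

Grouping by `("release_id",)` instead answers a different question with the same data: which release made the bill move.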

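§10 Mapping to the talk-through items in 98 (G06-G08)

| ID | Section |
| --- | --- |
| G06 Four-dimension matrix | §1.2 |
| G07 Release Record | §3.1, §8 |
| G08 Rollback order | §3.3, §8.2 |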

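6. Pre-interview checklist

Whiteboard the four-dimension matrix + Release Record JSON
Explain the division of labor among Registry / Gateway / Runtime / Obs
List the four rows of the rollback matrix
Distinguish the ML Model Registry from API routing
Multi-tenant cache key and release isolation
DR triangle: vector store / checkpoint / MCP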

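7. Navigation

| Related topic | Path |
| --- | --- |
| Eval governance | 10-Eval集治理 |
| Gateway | 07-AI-Gateway |
| Observability | 08-可观测 |
| Playbook §11 | 13 |
| Seven views | 27 |
| Industrial grade | 96 |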

Official documentation and source code (primary sources)

Writing conventions: docs/official-sources-registry.md §0

L1 · Official documentation

L2 · Official source code