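[Backend Flow Engineering Practice 27] Backend Script Template: How Should a Maintainable Backend Script System Be Organized?

Author: Darren H. Chen

Topic: Backend Flow / Backend Implementation Flow / EDA Tool Engineering / Script Architecture

Demo: LAY-BE-27_backend_script_template

Tags: Backend Flow, EDA, Tcl, Script Template, Flow Architecture, Report, Log, Parameter, Regression, Engineering Methodology

A backend flow may start out as nothing more than one very simple script: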

run.tcl

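That may be enough for early experiments. The script loads libraries, imports a small design, runs a few commands, generates a report, and exits.

In a real project, however, the set of scripts grows quickly: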

library setup
technology setup
design import
link
floorplan
placement
clock tree
routing
post-route checks
ECO
export
PV handoff
report collection
multi-mode / multi-corner setup
run comparison
archive

Without an architecture, the script directory quickly turns into a pile of ad hoc firefighting files:

run_all.tcl
run_new.tcl
run_final.tcl
run_final2.tcl
run_final_fix.tcl
run_final_fix_new.tcl

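The problem here is not the number of scripts; it is the flow architecture.

A maintainable backend script system is not a long list of commands but an engineering control system. It must keep configuration, stage execution, error handling, reporting, logging, parameter capture, version tracking, result comparison, and archive packaging separate from one another.

The core question of this article is:

How should a backend script template be organized so that it stays reproducible, inspectable, maintainable, and extensible as the design grows?

1. A Backend Script Is Not a Command List

A weak backend script is usually organized as a straight sequence of tool commands: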

read_library
read_lef
read_verilog
link_design
create_floorplan
run_placement
run_clock_tree
run_routing
write_reports

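A script like this may run through once, but it cannot answer the engineering questions that come up after the first successful run:

Which library version was used?
Which constraint file was used?
Which stage failed?
Which warnings are new?
Which reports must be checked before entering the next stage?
Which parameters were modified?
Which database snapshot corresponds to the current result?
Can another engineer reproduce the same run?
Can two runs be compared?
Can the flow be extended without rewriting everything?

A backend script system therefore cannot merely execute commands. It must manage each run as an engineering object.

A complete run has at least four dimensions:

Dimension	What it controls	Example artifacts
Execution	Which stages run, and in what order	stage Tcl, run scripts, status files
Observation	What the run reported	QoR reports, summaries, error reports
Reproduction	How the run can be recreated	manifest, config snapshot, command log
Comparison	How the run compares with other runs	QoR diff, violation diff, runtime diff

If the scripts handle only execution, the flow as a whole remains fragile.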

2. Backend Script Stack

A maintainable backend script system can be modeled as a layered stack:

Shell Entry Layer: run_stage.csh / run_all.csh
Environment Layer: tool path / license / run dir
Configuration Layer: design / library / scenario config
Common Utility Layer: error / report / object utilities
Stage Execution Layer: import / floorplan / place / CTS / route
Report and Check Layer: QoR / violation / stage summaries
Snapshot and Archive Layer: database / output / manifest
Regression and Compare Layer: QoR diff / warning diff / runtime diff

Each layer has a clear responsibility.

Layer	Main responsibility	Typical files
Shell entry	Launch the tool in a controlled way	`run_stage.csh`, `run_all.csh`
Environment	Pin down run directories and external state	`env_check.tcl`, `run.env.log`
Configuration	Describe design-specific inputs	`design_config.tcl`, `library_config.tcl`
Common utilities	Provide reusable procedures	`report_utils.tcl`, `error_utils.tcl`
Stage execution	Execute each backend stage	`01_import.tcl`, `03_place.tcl`
Report/check	Turn tool state into evidence	`report_place.tcl`, `check_route.tcl`
Snapshot/archive	Preserve results and identity information	`run_manifest.yaml`, `db/`, `output/`
Compare	Track differences between runs	`qor_compare.rpt`, `runtime_compare.rpt`

This structure is not cosmetic. It keeps design-specific variables, tool execution logic, report generation, error handling, and archive policy from being mixed together in one unreadable file.

3. Recommended Repository Structure

A practical backend script template can start from the following directory structure:

backend_flow/
├─ README.md
├─ config/
│  ├─ design_config.tcl
│  ├─ library_config.tcl
│  ├─ scenario_config.tcl
│  ├─ floorplan_config.tcl
│  ├─ placement_config.tcl
│  ├─ cts_config.tcl
│  ├─ route_config.tcl
│  └─ export_config.tcl
├─ scripts/
│  ├─ run_stage.csh
│  ├─ run_all.csh
│  ├─ clean.csh
│  ├─ archive_run.csh
│  └─ compare_runs.csh
├─ tcl/
│  ├─ main.tcl
│  ├─ common/
│  │  ├─ env_check.tcl
│  │  ├─ config_loader.tcl
│  │  ├─ stage_runner.tcl
│  │  ├─ report_utils.tcl
│  │  ├─ error_utils.tcl
│  │  ├─ object_utils.tcl
│  │  ├─ file_utils.tcl
│  │  └─ manifest_utils.tcl
│  ├─ stages/
│  │  ├─ 01_import.tcl
│  │  ├─ 02_floorplan.tcl
│  │  ├─ 03_place.tcl
│  │  ├─ 04_cts.tcl
│  │  ├─ 05_route.tcl
│  │  ├─ 06_post_route_check.tcl
│  │  ├─ 07_eco.tcl
│  │  └─ 08_export.tcl
│  ├─ reports/
│  │  ├─ report_import.tcl
│  │  ├─ report_floorplan.tcl
│  │  ├─ report_place.tcl
│  │  ├─ report_cts.tcl
│  │  ├─ report_route.tcl
│  │  ├─ report_eco.tcl
│  │  ├─ report_export.tcl
│  │  └─ report_final_summary.tcl
│  └─ checks/
│     ├─ check_import.tcl
│     ├─ check_floorplan.tcl
│     ├─ check_place.tcl
│     ├─ check_cts.tcl
│     ├─ check_route.tcl
│     └─ check_export.tcl
├─ logs/
├─ reports/
├─ output/
├─ db/
├─ tmp/
└─ regression/

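The core idea is separation:

config/      describes what changes from project to project
scripts/     launches and controls runs from the shell
common/      provides reusable Tcl infrastructure
stages/      modify the design database
reports/     observe the design database
checks/      decide whether a stage is ready or has passed
db/          stores stage snapshots
output/      stores handoff deliverables
regression/  stores cross-run comparison results

A backend flow starts to become maintainable when a new engineer can answer the following questions without asking the original author:

Where is the top name defined?
Where are the libraries listed?
Where is placement executed?
Where is placement checked?
Where is the placement report generated?
Where is the run manifest written?
Which output files belong to this run?

4. Configuration Layer: Separating Variables from Execution

One of the most common script failure modes is hardcoding project-specific values directly into stage scripts.

A weak stage script might contain: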

set TOP_NAME my_chip_top
set LIB_PATH /project/lib/slow.lib
set CORE_UTIL 0.68
set MAX_ROUTING_LAYER M8

This makes the stage file hard to reuse. The stage mixes two different concerns:

what the design is
what the stage does

A better structure is:

config/design_config.tcl      -> top, netlist, ports, reset, clock names
config/library_config.tcl     -> LEF, Liberty, tech, GDS, RC model
config/scenario_config.tcl    -> modes, corners, SDC, operating conditions
config/floorplan_config.tcl   -> die, core, utilization, macro constraints
config/route_config.tcl       -> layers, NDR, route rules, antenna options

The stage script then reads from a stable configuration interface:

source $::env(FLOW_ROOT)/config/design_config.tcl
source $::env(FLOW_ROOT)/config/library_config.tcl
source $::env(FLOW_ROOT)/config/scenario_config.tcl

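This has several advantages:

Advantage	Why it matters
Reuse	The same stage logic can serve different designs
Review	Design-specific changes are concentrated in the config files
Debug	A run can capture and archive a config snapshot
Comparison	Two runs can be compared with a config diff
Regression	Stage scripts stay stable while inputs vary

A stable configuration layer is the first step toward a stable backend flow.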
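
One way to make the configuration interface explicit is a small loader that checks each config file exists and defines the variables the stages expect. The following is a minimal sketch under stated assumptions: the proc name `load_flow_config`, the shape of the `required` map, and the variable names in the usage note are illustrative, not part of any tool API.

```tcl
# Hypothetical helper: source config files at global level and verify
# that each one defines the variables the stage scripts rely on.
proc load_flow_config {config_dir required} {
    # required: dict mapping config file name -> list of global variable names
    dict for {fname varnames} $required {
        set path [file join $config_dir $fname]
        if {![file readable $path]} {
            error "Config file not readable: $path"
        }
        # Evaluate at global level so "set X ..." in the file creates ::X.
        uplevel #0 [list source $path]
        foreach v $varnames {
            if {![info exists ::$v]} {
                error "Config $fname does not define required variable: $v"
            }
        }
    }
    return [dict keys $required]
}
```

A call such as `load_flow_config $::env(FLOW_ROOT)/config [dict create design_config.tcl {TOP_NAME}]` would then fail early, with a clear message, if `design_config.tcl` is missing or forgets to set `TOP_NAME`.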

5. Environment Layer: Controlling External State

The behavior of a backend tool depends on external state:

tool binary
working directory
license environment
shell variables
search path
temporary directory
log directory
user HOME configuration

A maintainable script template should not assume that this state is correct; it should check it explicitly.

The shell entry script can define a controlled run environment:

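#!/bin/csh -f

set nonomatch

# Resolve FLOW_ROOT from the script location (scripts/ sits directly under
# the flow root), so the run does not depend on the caller's current directory.
set SCRIPT_DIR = `dirname "$0"`
set FLOW_ROOT = `cd "$SCRIPT_DIR/.." && pwd`
set RUN_ID = `date +%Y%m%d_%H%M%S`

set LOG_DIR = "$FLOW_ROOT/logs/$RUN_ID"
set RPT_DIR = "$FLOW_ROOT/reports/$RUN_ID"
set DB_DIR  = "$FLOW_ROOT/db/$RUN_ID"
set TMP_DIR = "$FLOW_ROOT/tmp/$RUN_ID"

mkdir -p "$LOG_DIR" "$RPT_DIR" "$DB_DIR" "$TMP_DIR"

setenv FLOW_ROOT "$FLOW_ROOT"
setenv RUN_ID "$RUN_ID"
setenv LOG_DIR "$LOG_DIR"
setenv RPT_DIR "$RPT_DIR"
setenv DB_DIR  "$DB_DIR"
setenv TMP_DIR "$TMP_DIR"
# Placeholder path: point this at the actual tool binary.
setenv EDA_TOOL_BIN /path/to/eda_tool

# Capture stdout and stderr, then propagate the tool's exit status.
$EDA_TOOL_BIN -batch "$FLOW_ROOT/tcl/main.tcl" \
  >&! "$LOG_DIR/run.stdout.log"
set TOOL_STATUS = $status
exit $TOOL_STATUS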

The important patterns are:

create a unique run id
create unique run directories
export environment variables explicitly
capture stdout
do not rely on a hidden current directory

The Tcl side should also write an environment summary:

reports/<run_id>/environment_summary.rpt

It should contain:

run id
host
user
tool binary
tool version
working directory
log directory
report directory
database directory
temporary directory
key environment variables

Only then is the run reproducible and auditable.
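
The environment summary can be written with plain Tcl. The sketch below assumes the environment variable names exported by the shell entry script; the proc name `write_environment_summary` is hypothetical, and the tool version is passed in as an argument because the command that queries it is tool-specific.

```tcl
# Hypothetical helper: write an environment summary report.
proc write_environment_summary {rpt_file {tool_version "unknown"}} {
    set fh [open $rpt_file w]
    puts $fh [format "%-20s %s" "host" [info hostname]]
    puts $fh [format "%-20s %s" "user" $::tcl_platform(user)]
    puts $fh [format "%-20s %s" "working_directory" [pwd]]
    puts $fh [format "%-20s %s" "tool_version" $tool_version]
    foreach name {RUN_ID FLOW_ROOT EDA_TOOL_BIN LOG_DIR RPT_DIR DB_DIR TMP_DIR} {
        # Record missing variables explicitly instead of skipping them.
        if {[info exists ::env($name)]} {
            set value $::env($name)
        } else {
            set value "<unset>"
        }
        puts $fh [format "%-20s %s" $name $value]
    }
    close $fh
    return $rpt_file
}
```

Writing `<unset>` for a missing variable keeps the report the same shape from run to run, which makes two environment summaries easy to diff.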

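6. Stage Lifecycle: Precheck, Execute, Postcheck, Report, Save

Every backend stage should follow the same lifecycle:

precheck
execute
postcheck
report
save

[Figure: stage lifecycle state diagram. Precheck leads to Blocked when a required input is missing and to Execute when the stage is ready. Execute leads to Failed when a tool command fails and to Postcheck when the commands complete. Postcheck leads to Failed when the result is invalid and to Report when the result is acceptable. Report is followed by Save, which ends in Passed.]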

This lifecycle turns every stage into a controlled engineering transaction.

6.1 Precheck

Precheck decides whether a stage should start.

For placement, the precheck can verify that:

the design is linked
the floorplan exists
rows exist
standard cells are recognized
placement blockages are legal
a timing scenario exists
output directories are writable

6.2 Execute

Execute runs the core stage commands.

For placement, this may include:

global placement
legalization
detailed placement
placement optimization

6.3 Postcheck

Postcheck decides whether execution produced a valid stage state.

For placement:

no unplaced standard cells
no severe overlaps
row/site legality passed
utilization within the expected range
major congestion warnings reviewed
required reports generated

6.4 Report

Report turns tool state into reviewable evidence.

For placement:

placement_summary.rpt
utilization_after_place.rpt
placement_legality_check.rpt
placement_timing_snapshot.rpt
placement_congestion_snapshot.rpt

6.5 Save

Save preserves the stage result:

stage database
DEF snapshot
stage manifest
stage status file

This structure prevents a stage from being treated as successful merely because its commands returned to the shell.
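
The stage status file is what lets the next stage's precheck act as a gate. The following sketch shows one possible shape. The proc names, the `.status` file extension, and the one-line `status=<value>` format are assumptions made for illustration; the status values follow the status model used later in this article.

```tcl
# Hypothetical helpers: persist a stage status and read it back.
proc write_stage_status {status_dir stage_name status} {
    file mkdir $status_dir
    set fh [open [file join $status_dir "$stage_name.status"] w]
    puts $fh "status=$status"
    close $fh
}

proc read_stage_status {status_dir stage_name} {
    set path [file join $status_dir "$stage_name.status"]
    if {![file exists $path]} {
        return NOT_STARTED
    }
    set fh [open $path r]
    set line [string trim [gets $fh]]
    close $fh
    if {![regexp {^status=(\S+)$} $line -> status]} {
        error "Malformed status file: $path"
    }
    return $status
}

# A stage may start only if its predecessor passed, with or without warnings.
proc previous_stage_ok {status_dir prev_stage} {
    set status [read_stage_status $status_dir $prev_stage]
    return [expr {$status in {PASSED PASS_WITH_WARNINGS}}]
}
```

With this in place, the placement precheck could call `previous_stage_ok` for `02_floorplan` and raise an error before any placement command runs.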

7. A Common Stage Interface

A maintainable template can give every stage the same procedure interface:

proc stage_precheck {} {
    # Verify inputs and previous stage state.
}

proc stage_execute {} {
    # Run the stage commands.
}

proc stage_postcheck {} {
    # Verify output state.
}

proc stage_report {} {
    # Write reports.
}

proc stage_save {} {
    # Save database or output snapshot.
}

A common runner can then execute any stage in a uniform way:

# Run one stage through the five lifecycle steps and stop at the first failure.
proc run_stage {stage_name stage_file} {
    puts "STAGE_BEGIN: $stage_name"

    if {![file exists $stage_file]} {
        error "Stage file not found: $stage_file"
    }

    # Source at global level so variables set in the stage file are globals,
    # not locals of run_stage.
    uplevel #0 [list source $stage_file]

    foreach step {stage_precheck stage_execute stage_postcheck stage_report stage_save} {
        puts "STEP_BEGIN: $stage_name.$step"

        # catch returns a non-zero code when the step raises an error.
        set rc [catch {
            $step
        } msg]

        if {$rc != 0} {
            puts "STEP_ERROR: $stage_name.$step"
            puts "ERROR: $msg"
            puts "ERRORINFO: $::errorInfo"
            error "Stage failed: $stage_name at $step"
        }

        puts "STEP_END: $stage_name.$step"
    }

    puts "STAGE_END: $stage_name"
}

This interface gives every stage the same shape.

The benefits are concrete:

new stages are easier to add
stage behavior is easier to inspect
error handling is uniform
report generation is predictable
stage status can be aggregated
run comparison is easier
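
One subtlety of this interface: every stage file defines the same five procedure names, so a stage file that forgets one of them would silently reuse the definition left behind by the previously sourced stage. A possible guard, sketched here with a hypothetical proc name `load_stage_procs`, is to delete the old definitions before sourcing and to confirm afterwards that all five exist.

```tcl
# Hypothetical helper: remove lifecycle procs left over from a previous stage,
# source the new stage file at global level, and confirm it defines all five.
proc load_stage_procs {stage_file} {
    set steps {stage_precheck stage_execute stage_postcheck stage_report stage_save}
    foreach step $steps {
        if {[llength [info procs ::$step]] > 0} {
            rename ::$step ""
        }
    }
    uplevel #0 [list source $stage_file]
    foreach step $steps {
        if {[llength [info procs ::$step]] == 0} {
            error "Stage file $stage_file does not define $step"
        }
    }
    return $steps
}
```

A runner could call this helper in place of a bare `source`, so an incomplete stage file is rejected before any lifecycle step runs.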

8. Report Layer: Reports Are Stage Interfaces

Reports should not be treated as optional output. In a backend flow, reports are the interface between stages and engineers.

A stage report should answer:

What did this stage consume?
What did this stage change?
Which quality metrics did it produce?
Which warnings or violations remain?
Can the flow proceed to the next stage?

A recommended report contract:

Stage	Required reports	Main purpose
Import	`import_summary.rpt`, `unresolved_reference.rpt`	Verify design database creation
Floorplan	`floorplan_summary.rpt`, `row_site_summary.rpt`	Verify physical world setup
Placement	`placement_summary.rpt`, `legality_check.rpt`	Verify cell placement quality
CTS	`cts_summary.rpt`, `clock_skew_summary.rpt`	Verify real clock network quality
Routing	`route_summary.rpt`, `route_drc_summary.rpt`	Verify routed design health
Post-route	`antenna_summary.rpt`, `fill_summary.rpt`	Verify post-route closure tasks
ECO	`eco_delta_summary.rpt`, `eco_verification_checklist.rpt`	Verify controlled design change
Export	`export_manifest.rpt`, `file_inventory.rpt`	Verify handoff package completeness

The report system should also include a top-level status summary:

reports/<run_id>/flow_status.rpt

Example:

Stage                Status              Blocking Issues
--------------------------------------------------------
01_import            PASSED              0
02_floorplan         PASSED              0
03_place             PASS_WITH_WARNINGS  congestion hotspot near macro U_MEM0
04_cts               PASSED              0
05_route             FAILED              27 spacing violations

This file should let a reviewer understand the state of the run without opening every detailed log.
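
Generating the status table from data, rather than by hand, keeps it consistent across runs. A minimal sketch follows; the proc name `format_flow_status` and the column widths are arbitrary choices for illustration.

```tcl
# Hypothetical helper: format a flow status table.
# rows: list of {stage status blocking_issues} triples.
proc format_flow_status {rows} {
    set lines {}
    lappend lines [format "%-20s %-20s %s" Stage Status "Blocking Issues"]
    lappend lines [string repeat - 60]
    foreach row $rows {
        lassign $row stage status issues
        lappend lines [format "%-20s %-20s %s" $stage $status $issues]
    }
    return [join $lines \n]
}
```

The returned string can be written to `flow_status.rpt` as is. For example, `format_flow_status {{01_import PASSED 0} {05_route FAILED "27 spacing violations"}}` produces a header, a separator, and one line per stage.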

9. Error and Warning Extraction

Backend logs are usually very large. A script template should extract error and warning summaries into structured reports.

A simple extraction policy can search for the following patterns:

ERROR
FATAL
WARN
unresolved
missing
illegal
mismatch
violation
failed
not found
out of range

The flow should generate:

error_summary.rpt
warning_summary.rpt
blocking_issue_summary.rpt

A useful error summary should contain:

stage
source log
line number if available
message
classification
blocking or non-blocking
suggested owner or next action

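This is more than a convenience; it changes how the team reviews a run.

Without extraction:

Engineers manually inspect thousands of lines of log.

With extraction:

Engineers review a prioritized list of actionable issues.

A flow becomes easier to maintain once its failure modes are visible.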
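
A pattern-based extractor can be quite small. The sketch below scans one log file and records the line number, a classification, and the message for each hit. The proc name `extract_log_issues` and the split of patterns into blocking and non-blocking are assumptions for illustration; a real flow would tune both lists to its tool's message format.

```tcl
# Hypothetical helper: scan a log file and collect lines that match
# error or warning patterns, keeping the line number of each hit.
proc extract_log_issues {log_file} {
    # Case-insensitive whole-word matches; the classification is an assumption.
    set blocking_re {(?i)\m(error|fatal|failed|unresolved|not found)\M}
    set warning_re  {(?i)\m(warn|warning|missing|mismatch|violations?)\M}
    set issues {}
    set fh [open $log_file r]
    set lineno 0
    while {[gets $fh line] >= 0} {
        incr lineno
        if {[regexp $blocking_re $line]} {
            lappend issues [list $lineno blocking $line]
        } elseif {[regexp $warning_re $line]} {
            lappend issues [list $lineno non-blocking $line]
        }
    }
    close $fh
    return $issues
}
```

Each returned entry is a `{line_number classification message}` triple, which maps directly onto the line number, classification, and message fields of the error summary.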

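10. Parameter Snapshot: Capturing the Hidden Causes of QoR Changes

Backend tools have a large number of parameters. Some are set explicitly in scripts, some come from defaults, and some are inherited from earlier setup.

Parameter changes can affect: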

placement density behavior
timing cost weight
clock tree balancing
routing layer selection
DRC repair strategy
report thresholds
runtime and memory

Every run should therefore capture parameter snapshots:

parameter_snapshot.rpt
non_default_parameter.rpt
stage_parameter_summary.rpt

The parameter report can be organized by stage:

Stage	Parameter class	Example content
Placement	density / timing / congestion	target density, effort, padding
CTS	skew / latency / buffer	skew target, buffer list, route rule
Routing	layers / DRC / timing	min/max layer, NDR, repair effort
Export	format / naming / hierarchy	DEF version, GDS map, hierarchy policy

When QoR changes between two runs, parameter snapshots help answer:

Did the design change?
Did the library change?
Did the scripts change?
Did the parameter set change?
Did the tool version change?

Without this evidence, flow debugging tends to turn into guesswork.
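
Once two snapshots exist, finding what changed is a simple dictionary comparison. The sketch below assumes each snapshot has already been parsed into a Tcl dict of parameter name to value; the proc name `diff_parameters` and the parameter names in the usage note are hypothetical.

```tcl
# Hypothetical helper: compare two parameter snapshots held as dicts
# (parameter name -> value) and list added, removed, and changed entries.
proc diff_parameters {old new} {
    set result {}
    foreach name [lsort -unique [concat [dict keys $old] [dict keys $new]]] {
        set in_old [dict exists $old $name]
        set in_new [dict exists $new $name]
        if {$in_old && !$in_new} {
            lappend result [list removed $name [dict get $old $name] ""]
        } elseif {!$in_old && $in_new} {
            lappend result [list added $name "" [dict get $new $name]]
        } elseif {[dict get $old $name] ne [dict get $new $name]} {
            # String comparison: 0.70 and 0.7 are reported as different.
            lappend result [list changed $name [dict get $old $name] [dict get $new $name]]
        }
    }
    return $result
}
```

Comparing `{place.target_density 0.70 route.max_layer M8}` with `{place.target_density 0.72 route.max_layer M8}` would report a single `changed` entry for `place.target_density`, which narrows a QoR investigation to one setting.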

11. Manifest: The Identity Card of a Run

Every backend run should generate a manifest, which serves as the identity card of that run.

A practical manifest can use a YAML-like format:

run_id: LAY_RUN_20260427_101500
project: backend_flow_demo
top: demo_top
user: darren
host: eda_server_01
tool: eda_tool
tool_version: 2026.x
flow_root: /path/to/backend_flow
script_version: abc1234
config_version: def5678
library_version: demo_lib_v1
constraint_version: demo_sdc_v1
stages:
  - 01_import
  - 02_floorplan
  - 03_place
status: PASS_WITH_WARNINGS
reports:
  - reports/20260427_101500/flow_status.rpt
  - reports/20260427_101500/qor_summary.rpt
outputs:
  - output/20260427_101500/demo_top.def
  - output/20260427_101500/demo_top.v
known_issues:
  - congestion warning near memory boundary

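A manifest lets later engineers answer:

What produced this result?
Which inputs were used?
Which scripts were used?
Which outputs belong to this run?
Was this run clean, or only conditionally acceptable?

Without a manifest, a backend output directory is just a pile of files.

With a manifest, it becomes a traceable engineering deliverable.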
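
A flat manifest of this kind can be emitted without a YAML library. The sketch below is deliberately minimal: the proc name `write_manifest` is hypothetical, it handles only scalar fields and simple lists, and it performs no YAML quoting or escaping, so values containing characters such as `:` or `#` would need extra care.

```tcl
# Hypothetical helper: write a flat YAML-like manifest.
# scalars: dict of key -> value; lists: dict of key -> Tcl list.
proc write_manifest {path scalars lists} {
    set fh [open $path w]
    dict for {key value} $scalars {
        puts $fh "$key: $value"
    }
    dict for {key items} $lists {
        puts $fh "$key:"
        foreach item $items {
            puts $fh "  - $item"
        }
    }
    close $fh
    return $path
}
```

Because Tcl dicts preserve insertion order, the fields appear in the file in the order they were added, which keeps manifests from different runs easy to diff.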

12. Regression and Comparison Layer

A backend flow is rarely run only once. It is run again and again as the design evolves.

The script template must therefore support run comparison.

Important comparison targets include:

area
utilization
cell count
buffer count
WNS / TNS / violation count
setup / hold status
clock skew
route length
via count
congestion score
DRC count
antenna count
warning count
runtime
memory

A comparison report might look like this:

Metric                    Previous        Current         Delta
----------------------------------------------------------------
Instance count             120,340         120,925          +585
Design area                1.82e6          1.85e6           +1.6%
WNS setup                  -0.041          -0.018           +0.023
TNS setup                  -12.4           -3.1             +9.3
Hold violations            88              31               -57
Route DRC                  142             27               -115
Runtime                    07:42:10        08:11:45         +00:29:35

Comparison changes the review question from:

Did the current run finish?

to:

What changed relative to the last known run?

This is critical for flow development, tool version evaluation, script cleanup, parameter tuning, and design update review.
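
If each run stores its metrics in a machine-readable form, the comparison itself takes only a few lines. The sketch below assumes the metrics of each run are available as a dict of metric name to number; the proc name `compare_metrics` and the metric names in the usage note are hypothetical, and non-numeric metrics such as runtime strings are outside its scope.

```tcl
# Hypothetical helper: compare numeric metrics from two runs.
# prev and curr are dicts of metric name -> number; only metrics present
# in both runs are compared.
proc compare_metrics {prev curr} {
    set lines [list [format "%-20s %12s %12s %12s" Metric Previous Current Delta]]
    dict for {name old} $prev {
        if {![dict exists $curr $name]} {
            continue
        }
        set new [dict get $curr $name]
        set delta [expr {$new - $old}]
        # %+12g prints an explicit sign so improvements and regressions stand out.
        lappend lines [format "%-20s %12g %12g %+12g" $name $old $new $delta]
    }
    return [join $lines \n]
}
```

For example, `compare_metrics {WNS_setup -0.041 Hold_violations 88} {WNS_setup -0.018 Hold_violations 31}` yields one row per metric with a signed delta column. Because it works on captured numbers only, it needs no live tool session.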

13. A Status Model for Backend Stages

A backend stage should not be limited to done or not done.

A more useful status model is:

NOT_STARTED
BLOCKED
RUNNING
FAILED
PASS_WITH_WARNINGS
PASSED
SKIPPED

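[Figure: stage status state diagram. NOT_STARTED moves to BLOCKED when the precheck fails, to RUNNING when the stage starts, and to SKIPPED when the stage is disabled by the run config. RUNNING moves to FAILED on a command error or a failed blocking check, to PASS_WITH_WARNINGS when the stage completes with non-blocking issues, and to PASSED when it completes cleanly.]

This status model is useful because backend stages often complete with warnings.

A placement stage may still be acceptable with minor congestion warnings.

A route stage that still has opens or shorts is usually not acceptable.

A post-route stage may be acceptable with known, waived DRC markers, but only if the waiver database is versioned and has been reviewed.

Stage status should be based on explicit rules, not on subjective interpretation.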
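
Explicit rules can be written down as a small decision procedure. The sketch below is one possible rule set, not a prescribed one: the proc name `classify_stage_status` and its five inputs are assumptions, and a real flow would derive the blocking and warning counts from its check and extraction reports.

```tcl
# Hypothetical rule set: derive a stage status from explicit inputs.
# The checks are ordered from "never ran" to "ran cleanly".
proc classify_stage_status {enabled precheck_ok command_ok blocking_count warning_count} {
    if {!$enabled}           { return SKIPPED }
    if {!$precheck_ok}       { return BLOCKED }
    if {!$command_ok}        { return FAILED }
    if {$blocking_count > 0} { return FAILED }
    if {$warning_count > 0}  { return PASS_WITH_WARNINGS }
    return PASSED
}
```

Under these rules, a route stage with remaining opens or shorts counted as blocking issues can never be reported as passed, regardless of who reviews the run.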

14. Data Flow in the Script System

The script system should make data movement visible.

[Figure: data flow diagram connecting Config Files, Main Tcl, Stage Tcl, EDA Tool Database, Reports, Database Snapshot, Output Files, Run Summary, Archive, and Run Comparison.]

This helps clarify an important principle:

stage scripts modify database state
report scripts observe database state
archive scripts preserve database state
compare scripts evaluate state differences

Mixing these responsibilities makes flow behavior hard to understand.

For example, a report script should not accidentally change placement or routing.

A stage script should not silently overwrite final handoff outputs without updating the manifest.

If all required metrics have already been captured in reports, a compare script should not depend on a live tool session.

15. Naming Methodology

File naming is part of flow design.

Use stable stage numbers:

01_import.tcl
02_floorplan.tcl
03_place.tcl
04_cts.tcl
05_route.tcl
06_post_route_check.tcl
07_eco.tcl
08_export.tcl

Avoid vague names:

run_new.tcl
run_final.tcl
try.tcl
fix.tcl
new2.tcl
latest.tcl

The same rule applies to reports:

placement_summary.rpt
placement_legality_check.rpt
placement_timing_snapshot.rpt

Names like these are better than:

report1.rpt
place_new.rpt
check_final.rpt

A good file name should answer:

Which stage produced it?
What does it summarize?
Is it a check, a summary, a snapshot, or a handoff file?

Naming discipline lowers communication cost.
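
A naming rule is only useful if it is checked. The sketch below encodes the stage file convention as a regular expression. The exact pattern (two digits, an underscore, a lowercase snake_case name, and a `.tcl` extension) is an assumption inferred from the stage file names listed above, and both proc names are hypothetical.

```tcl
# Hypothetical check: a stage file name is two digits, an underscore,
# a lowercase snake_case stage name, and the .tcl extension.
proc is_valid_stage_file {path} {
    return [regexp {^[0-9]{2}_[a-z][a-z0-9]*(_[a-z0-9]+)*\.tcl$} [file tail $path]]
}

# Return the subset of paths that break the naming rule.
proc find_invalid_stage_files {paths} {
    set bad {}
    foreach p $paths {
        if {![is_valid_stage_file $p]} {
            lappend bad $p
        }
    }
    return $bad
}
```

Running `find_invalid_stage_files` over the contents of `tcl/stages/` gives a list that should be empty; any entry in it is a file that slipped in outside the convention.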

16. Demo 27: `LAY-BE-27_backend_script_template`

The purpose of this demo is not to complete a full chip implementation but to show the architecture of a maintainable backend script system.

The demo should focus on:

standard directory structure
configuration loading
stage registration
stage precheck
stage status reporting
manifest generation
report contract generation
archive manifest generation

16.1 Recommended Demo Directory

LAY-BE-27_backend_script_template/
├─ README.md
├─ config/
│  ├─ design_config.tcl
│  ├─ library_config.tcl
│  └─ stage_config.tcl
├─ scripts/
│  ├─ run_stage.csh
│  ├─ run_all.csh
│  └─ clean.csh
├─ tcl/
│  ├─ main.tcl
│  ├─ common/
│  │  ├─ env_check.tcl
│  │  ├─ stage_runner.tcl
│  │  ├─ report_utils.tcl
│  │  └─ manifest_utils.tcl
│  ├─ stages/
│  │  ├─ 01_import.tcl
│  │  ├─ 02_floorplan.tcl
│  │  └─ 03_place.tcl
│  └─ reports/
│     ├─ report_template_structure.tcl
│     ├─ report_stage_list.tcl
│     └─ report_contract.tcl
├─ logs/
├─ reports/
├─ output/
├─ db/
└─ tmp/

16.2 Demo Inputs

The demo inputs are not industrial design data; they are template control inputs:

config/design_config.tcl
config/library_config.tcl
config/stage_config.tcl
stage Tcl files
common utility Tcl files
shell entry scripts

16.3 Demo Outputs

The demo should generate:

reports/template_structure_check.rpt
reports/stage_list.rpt
reports/run_manifest.rpt
reports/report_contract.rpt
reports/archive_manifest.rpt
logs/run.summary.log

16.4 Demo Checks

The demo should verify:

required directories exist
required config files exist
required common utility files exist
stage files follow the expected naming rule
stage list is readable
run manifest can be generated
report contract can be generated
archive manifest can be generated

This demo turns the script template itself into an object that can be checked and tested.
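
The first few demo checks need nothing beyond Tcl's `file` command. The sketch below shows one way to produce the lines of a structure check report; the proc name `check_template_structure` and the PASS/FAIL line format are assumptions, and the directory and file lists are supplied by the caller.

```tcl
# Hypothetical check: verify that required directories and files exist
# under the template root, and return one result line per item.
proc check_template_structure {root dirs files} {
    set results {}
    foreach d $dirs {
        set ok [file isdirectory [file join $root $d]]
        lappend results [format "%-6s dir   %s" [expr {$ok ? "PASS" : "FAIL"}] $d]
    }
    foreach f $files {
        set ok [file isfile [file join $root $f]]
        lappend results [format "%-6s file  %s" [expr {$ok ? "PASS" : "FAIL"}] $f]
    }
    return $results
}
```

A caller could pass `{config scripts tcl logs reports}` and `{config/design_config.tcl tcl/main.tcl}`, write the returned lines to `reports/template_structure_check.rpt`, and treat any line starting with `FAIL` as a blocking issue.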

17. Failure Patterns in Backend Script Systems

Backend script systems tend to fail in predictable ways.

Failure pattern	Symptom	Root cause	Better practice
Monolithic script	One huge `run.tcl` controls everything	No layer separation	Split into config, stage, report, check
Hardcoded paths	The flow works for only one user	Design variables embedded in scripts	Use the config and environment layers
Hidden state	The same script behaves differently	Depends on HOME, PATH, or a previous session	Record the environment and reset state
No stage gates	Later stages still run after an earlier stage fails	No precheck/postcheck	Use the stage lifecycle
Log-only review	Engineers manually grep huge logs	No summary reports	Generate structured summaries
No manifest	Outputs cannot be traced	No run identity	Write a run manifest
No comparison	QoR regressions are not caught early	No diff framework	Generate compare reports
Weak naming	Nobody knows which script is the latest	Ad hoc file names	Use numbered stages and contracts
Mixed responsibilities	A report script changes the design	No separation of roles	Keep observe/modify/archive separate

These failure patterns are not rare. They are the normal result when scripts grow without architectural constraints.

18. Review Checklist for a Maintainable Script Template

The following checklist can be used to review a backend script template.

Entry and environment

Is there a clear shell entry?
Is the tool path explicit?
Is a run id assigned to every run?
Are the log/report/db/tmp directories unique per run?
Is stdout captured?
Is the tool version recorded?

Configuration

Are design variables separated from stage execution?
Are library files managed centrally?
Are scenarios managed centrally?
Can the config files be archived together with the run?

Stage control

Does every stage have a precheck?
Does every stage have a postcheck?
Does every stage write reports?
Does every stage save its status?
Can a single stage be rerun on its own?
Can the whole flow be run in order?

Reports and logs

Are required reports defined for every stage?
Is there a top-level flow status report?
Are warnings and errors summarized?
Are command logs retained?
Are reports placed under the run id directory?

Release and comparison

Is there a run manifest?
Are outputs inventoried?
Can two runs be compared?
Is archive packaging defined?
Can the flow result be reviewed without opening the tool?

If these questions can be answered clearly, the backend script system is moving from personal scripts toward an engineering-grade flow.

19. Engineering Takeaways

A backend script template is more than coding style. It is a method for retaining engineering control over a complex implementation process.

The key principles are:

separate configuration from execution
separate stage actions from reports
separate observation from modification
make every stage checkable
make every run identifiable
make every result comparable
make every output traceable

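A mature backend flow is defined not by how many commands it can run, but by whether each run is reproducible, inspectable, comparable, ready for handoff, and safe to extend.

20. Summary

A maintainable backend script system should be organized as a layered engineering framework: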

Environment Layer
Configuration Layer
Common Utility Layer
Stage Execution Layer
Report and Check Layer
Snapshot and Archive Layer
Regression and Compare Layer

Every backend stage should follow the standard lifecycle:

Precheck -> Execute -> Postcheck -> Report -> Save

Every run should produce:

logs
reports
stage status
parameter snapshot
error summary
warning summary
run manifest
output inventory
comparison-ready metrics

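The goal is not to make scripts look more formal, but to keep backend implementation knowledge from being trapped in unstable personal scripts.

A well-designed script template turns a backend flow into a reproducible, inspectable, report-driven engineering system.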
