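Author: Darren H. Chen
Topic: Functional safety analysis and fault injection practice for automotive chips
Demo: D16_regression_and_trend_tracking
Tags: automotive chips, functional safety, safety regression, fault injection regression, Diagnostic Coverage Trend, Residual FIT Trend, FMEDA Delta, Evidence Package, CI, safety metrics
1. Why does this article matter?
In the previous article, we generated a reviewable safety engineering report from a structured evidence package.
The outputs generated by D15 include: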
text
safety_report.md
safety_report_summary.md
review_action_list.md
metric_tables_for_review.csv
report_warnings.csv
report_manifest.yaml
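That report describes a single analysis snapshot.
Real functional safety engineering, however, is not a one-time activity.
The design changes.
Safety mechanisms change.
The fault list changes.
Campaign results change.
Measured diagnostic coverage changes.
FMEDA rows change.
Residual FIT changes.
Review items are opened, fixed, and sometimes reopened.
The next question is:
How do we track safety analysis results across design iterations and identify safety regressions?
The Demo for this article is: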
text
D16_regression_and_trend_tracking
The general-purpose tool introduced in this article is:
text
safeic-regress
The goal of safeic-regress is to compare multiple safety evidence packages or safety reports across iterations, using these inputs:
text
baseline evidence package
current evidence package
baseline FMEDA table
current FMEDA table
baseline measured DC
current measured DC
baseline fault outcomes
current fault outcomes
review action history
trend policy
regression policy
It generates:
text
regression_summary.md
metric_trend.csv
dc_trend_by_failure_mode.csv
residual_fit_trend.csv
fmeda_delta_trend.csv
fault_outcome_delta.csv
review_item_trend.csv
regression_alerts.csv
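The core idea is:
Functional safety analysis delivers lasting engineering value only when it is tracked as an iterative engineering loop rather than produced as a one-off report.
2. Where D16 fits in the overall flow
D16 is the first multi-iteration Demo in this series.
[Diagram: Evidence Package v1 and Evidence Package v2 feed D16 Regression and Trend Tracking, which produces Metric Trend, FMEDA Delta Trend, Regression Alerts, and Review Item Trend.]
Figure 1: D16 compares evidence packages or reports from different iterations and generates trend and regression results.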
D15 answers:
text
What does this single evidence package say?
D16 answers:
text
What changed compared with the previous package?
Did diagnostic coverage go up or down?
Did residual FIT increase or decrease?
Were unsafe faults fixed?
Did new unsafe faults appear?
Were review items closed, or are they still open?
Did evidence quality improve?
This step moves from single-run reporting to continuous safety improvement.
3. Why is safety regression different from ordinary regression?
Ordinary design regression usually asks:
text
Did the tests pass?
Did timing pass?
Did lint pass?
Did simulation pass?
Safety regression has to ask deeper questions:
text
Did diagnostic coverage change?
Did residual FIT change?
Did a previously detected fault become unsafe?
Did a previously reviewed FMEDA row go back to review_required?
Did a safety mechanism stop detecting faults?
Did the campaign lose observability?
Did evidence quality get weaker?
A test can still PASS while the safety evidence has already regressed.
Example:
text
RTL simulation passes.
Fault campaign runs.
But alarm path stuck-at fault changes from detected to unsafe.
That is a safety regression.
[Diagram: decision flow. Normal Regression PASS → Safety Evidence Changed? If No: No Safety Regression. If Yes: Safety Trend Analysis → Worse? If Yes: Safety Regression Alert. If No: Improvement or Neutral Change.]
Figure 2: Safety regression looks at evidence and metric changes, not just execution pass/fail.
D16 therefore compares structured evidence rather than just comparing logs.
4. What is an iteration?
An iteration is a versioned safety-analysis snapshot.
It may correspond to:
text
new RTL version
new safety mechanism implementation
new fault list
new simulation campaign
new classification policy
new measurement policy
new FMEDA update
new evidence package
new report
A simple iteration record:
yaml
iteration:
  id: iter_2026_05_01
  design_version: toy_counter_v1
  evidence_package: packages/iter_2026_05_01
  report: reports/iter_2026_05_01/safety_report.md
  tag: baseline
Another iteration:
yaml
iteration:
  id: iter_2026_05_08
  design_version: toy_counter_v2_alarm_fix
  evidence_package: packages/iter_2026_05_08
  report: reports/iter_2026_05_08/safety_report.md
  tag: alarm_path_fix
D16 compares these iterations.
5. Baseline and Current
Regression analysis usually compares:
text
baseline
current
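The baseline is the reference snapshot.
The current package is the new snapshot.
Example:
text
baseline:
  D14 package generated before alarm-path fix
current:
  D14 package generated after alarm-path fix
[Diagram: Baseline Package and Current Package feed Compare, which classifies each change as Improvement, Regression, Neutral Change, or New Evidence Gap.]
Figure 3: D16 compares the baseline package with the current package and classifies the changes.
The baseline does not have to be a perfect version.
It is simply the chosen reference point.
6. What should be compared?
D16 should compare along several dimensions: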
text
diagnostic coverage
residual FIT
fault outcomes
FMEDA rows
review items
evidence quality
execution quality
assumptions
policies
traceability completeness
A useful regression tool should not compare only one metric.
For example, measured DC may go up while evidence quality gets worse because the unresolved ratio grows.
Example:
text
measured DC improves from 0.80 to 0.90
unresolved ratio increases from 0.02 to 0.40
This is not a clean improvement.
D16 should flag it as requiring review.
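The rule above can be sketched as a small check. This is a minimal illustration, not the safeic-regress implementation: the function name `classify_dc_change` and the returned labels are hypothetical, and the default 0.10 threshold is an assumption borrowed from the `unresolved_ratio_increase_warn` value in the trend_policy.yaml example later in this article.

```python
def classify_dc_change(dc_before, dc_after,
                       unresolved_before, unresolved_after,
                       unresolved_increase_warn=0.10):
    """Classify a measured-DC change while taking evidence quality into account.

    Hypothetical helper: a DC gain only counts as a clean improvement when the
    unresolved ratio has not grown by more than the warning threshold.
    """
    dc_delta = dc_after - dc_before
    unresolved_delta = unresolved_after - unresolved_before
    if dc_delta > 0 and unresolved_delta > unresolved_increase_warn:
        return "review_required"
    if dc_delta > 0:
        return "improvement"
    if dc_delta < 0:
        return "regression"
    return "no_change"


# The example from the text: DC 0.80 -> 0.90, unresolved ratio 0.02 -> 0.40.
print(classify_dc_change(0.80, 0.90, 0.02, 0.40))
```

With the numbers from the example, the DC gain is accompanied by a 0.38 increase in the unresolved ratio, so the sketch returns `review_required` instead of `improvement`.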
7. Metric Trend
A metric trend tracks how metrics change across multiple iterations.
Example:
csv
iteration,total_base_fit,total_residual_fit,weighted_selected_dc,unsafe_faults,unresolved_faults,review_required_rows
iter_001,0.078,0.0204,0.738,2,0,2
iter_002,0.078,0.0104,0.867,1,0,1
iter_003,0.078,0.0064,0.918,0,1,1
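The trend shows improvement, but it also shows that an unresolved fault appeared in iter_003.
[Diagram: Iteration 1 Metrics, Iteration 2 Metrics, and Iteration 3 Metrics feed the Trend Table, which feeds Regression Alerts.]
Figure 4: Metric trend tables show how safety evidence evolves across iterations.
In a first implementation, trends can simply be stored as CSV.
Later they can be rendered as charts or a dashboard.
8. Diagnostic Coverage Trend
The diagnostic coverage trend should be tracked by meaningful groups: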
text
overall
endpoint
failure mode
safety mechanism
part
sub-part
Example:
csv
iteration,group_type,group_id,measured_dc,selected_dc,confidence
iter_001,failure_mode,FM_ALARM_NOT_ASSERTED,0.000,0.000,LOW
iter_002,failure_mode,FM_ALARM_NOT_ASSERTED,0.500,0.000,LOW
iter_003,failure_mode,FM_ALARM_NOT_ASSERTED,0.900,0.850,MEDIUM
This tells a story:
text
iteration 1:
  alarm-not-asserted is uncovered
iteration 2:
  measured behavior improves but confidence is low
iteration 3:
  measured confidence becomes acceptable and selected DC is updated
The trend must keep these three values apart:
text
measured_dc
selected_dc
confidence
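Otherwise the report may overstate the improvement.
9. Residual FIT Trend
The residual FIT trend is often more useful for engineering decisions than the coverage trend, because it weights coverage by the base failure rate of each failure mode.
Example: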
csv
iteration,failure_mode,base_fit,selected_dc,residual_fit
iter_001,FM_ALARM_NOT_ASSERTED,0.010,0.000,0.0100
iter_002,FM_ALARM_NOT_ASSERTED,0.010,0.500,0.0050
iter_003,FM_ALARM_NOT_ASSERTED,0.010,0.850,0.0015
A lower residual FIT means lower residual risk.
The trend should also record the reason for each change:
text
design change
new safety mechanism
policy change
campaign expansion
manual review
FIT model change
Example:
csv
iteration,failure_mode,residual_fit,change_reason
iter_002,FM_ALARM_NOT_ASSERTED,0.0050,alarm path monitor added
iter_003,FM_ALARM_NOT_ASSERTED,0.0015,campaign expanded and selected DC updated
This is what makes the trend explainable.
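The residual FIT values in the first table of this section are consistent with residual_fit = base_fit × (1 − selected_dc). The sketch below reproduces them under that assumption. The relation is inferred from the example rows, and a real FMEDA may apply additional factors; the function name `residual_fit` is hypothetical.

```python
def residual_fit(base_fit, selected_dc):
    """Residual FIT under the simple model base_fit * (1 - selected_dc).

    Assumption: this relation is inferred from the example table, not taken
    from a stated formula in the article.
    """
    if not 0.0 <= selected_dc <= 1.0:
        raise ValueError("selected_dc must be within [0, 1]")
    return base_fit * (1.0 - selected_dc)


# Rows for FM_ALARM_NOT_ASSERTED: (iteration, base_fit, selected_dc).
rows = [
    ("iter_001", 0.010, 0.000),
    ("iter_002", 0.010, 0.500),
    ("iter_003", 0.010, 0.850),
]
for iteration, base_fit, selected_dc in rows:
    print(f"{iteration},{residual_fit(base_fit, selected_dc):.4f}")
```

Because the result depends on selected DC rather than measured DC, a gain in measured DC alone does not lower residual FIT until the selected value is updated.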
10. Fault Outcome Delta
One of the most important regression checks is the fault outcome delta.
A fault may move between outcomes, for example:
text
unsafe -> detected
unsafe -> safe
detected -> unsafe
detected -> unresolved
unresolved -> detected
Some changes are improvements.
Some are regressions.
Example:
csv
fault_id,baseline_outcome,current_outcome,delta_class
F004,unsafe,detected,improvement
F010,detected,unsafe,regression
F020,unresolved,detected,improvement
F030,detected,unresolved,evidence_regression
[Diagram: Baseline Fault Outcome and Current Fault Outcome feed Compare, which classifies each fault as Improvement, Regression, Evidence Regression, or No Change.]
Figure 5: The fault outcome delta identifies improvements, regressions, and evidence-quality changes at fault level.
A detected-to-unsafe change should trigger a high-severity alert.
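The classification in the example table can be sketched as a lookup over (baseline, current) outcome pairs. This is an illustrative sketch: the set names and `classify_outcome_delta` are hypothetical, and the fallback label `review_required` for transitions the article does not list is an assumption.

```python
# Transition sets taken from the transitions listed in this section.
IMPROVEMENT = {("unsafe", "detected"), ("unsafe", "safe"), ("unresolved", "detected")}
REGRESSION = {("detected", "unsafe")}
EVIDENCE_REGRESSION = {("detected", "unresolved")}


def classify_outcome_delta(baseline_outcome, current_outcome):
    """Return the delta class for one fault matched across two iterations."""
    if baseline_outcome == current_outcome:
        return "no_change"
    pair = (baseline_outcome, current_outcome)
    if pair in IMPROVEMENT:
        return "improvement"
    if pair in REGRESSION:
        return "regression"
    if pair in EVIDENCE_REGRESSION:
        return "evidence_regression"
    # Assumption: unlisted transitions are surfaced for manual review.
    return "review_required"


faults = [
    ("F004", "unsafe", "detected"),
    ("F010", "detected", "unsafe"),
    ("F020", "unresolved", "detected"),
    ("F030", "detected", "unresolved"),
]
for fault_id, baseline, current in faults:
    print(f"{fault_id},{baseline},{current},{classify_outcome_delta(baseline, current)}")
```

Keeping the transition sets as data rather than nested conditionals makes it easier to move them into a policy file later.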
11. Fault Matching Across Iterations
Faults must be matchable across iterations.
The simplest key is:
text
fault_id
But when the fault list is regenerated, fault IDs may change.
More robust matching can use:
text
node
fault_type
endpoint
failure_mode
safety_mechanism
fault_model
injection_mode
Example matching key:
text
node + fault_type + failure_mode + endpoint
Policy example:
yaml
fault_matching:
  primary_key: fault_id
  fallback_keys:
    - node
    - fault_type
    - endpoint
    - failure_mode
The tool should report unmatched faults:
text
new faults
removed faults
renamed faults
unmatched faults
When the design changes, unmatched faults can be normal, but they must be visible.
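The primary-key-then-fallback policy can be sketched in two passes: first match by `fault_id`, then try the fallback key tuple for whatever is left. This is a minimal sketch under assumptions: faults are plain dicts, `match_faults` is a hypothetical name, and a fallback hit is reported as `renamed`.

```python
def match_faults(baseline, current, fallback_keys):
    """Match faults across two iterations; return matched/renamed/removed/added IDs."""
    current_ids = {f["fault_id"] for f in current}
    # Pass 1: exact match on the primary key.
    matched = [b["fault_id"] for b in baseline if b["fault_id"] in current_ids]
    matched_ids = set(matched)

    # Index the still-unmatched current faults by their fallback key tuple.
    by_key = {}
    for f in current:
        if f["fault_id"] not in matched_ids:
            by_key.setdefault(tuple(f[k] for k in fallback_keys), []).append(f["fault_id"])

    # Pass 2: fallback match for baseline faults whose ID disappeared.
    renamed, removed = [], []
    for b in baseline:
        if b["fault_id"] in matched_ids:
            continue
        candidates = by_key.get(tuple(b[k] for k in fallback_keys), [])
        if candidates:
            renamed.append((b["fault_id"], candidates.pop(0)))
        else:
            removed.append(b["fault_id"])

    renamed_targets = {new_id for _, new_id in renamed}
    added = [f["fault_id"] for f in current
             if f["fault_id"] not in matched_ids and f["fault_id"] not in renamed_targets]
    return {"matched": matched, "renamed": renamed, "removed": removed, "added": added}


def fault(fault_id, node, fault_type, endpoint, failure_mode):
    return {"fault_id": fault_id, "node": node, "fault_type": fault_type,
            "endpoint": endpoint, "failure_mode": failure_mode}


KEYS = ["node", "fault_type", "endpoint", "failure_mode"]
baseline = [
    fault("F001", "toy_counter.count[0]", "sa0", "count", "FM_DATA_CORRUPTION"),
    fault("F004", "toy_counter.alarm", "sa0", "alarm", "FM_ALARM_NOT_ASSERTED"),
    fault("F010", "toy_counter.old_node", "sa1", "count", "FM_DATA_CORRUPTION"),
]
current = [
    fault("F001", "toy_counter.count[0]", "sa0", "count", "FM_DATA_CORRUPTION"),
    fault("F104", "toy_counter.alarm", "sa0", "alarm", "FM_ALARM_NOT_ASSERTED"),
    fault("F120", "toy_counter.alarm_mon", "sa0", "alarm", "FM_ALARM_NOT_ASSERTED"),
]
print(match_faults(baseline, current, KEYS))
```

The sample fault types (`sa0`, `sa1`) and node names are illustrative. Running the ID pass first prevents a fallback match from consuming a current fault that has an exact ID match elsewhere.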
12. FMEDA Row Delta
FMEDA rows can change too.
Changes to track include:
text
row added
row removed
base FIT changed
estimated DC changed
measured DC changed
selected DC changed
residual FIT changed
review status changed
evidence source changed
unsafe fault count changed
Example:
csv
row_id,field,baseline_value,current_value,delta_class
R003,selected_dc,0.000,0.850,improvement
R003,residual_fit,0.0100,0.0015,improvement
R003,review_status,review_required,reviewed,improvement
R005,row_status,missing,added,new_row
FMEDA delta tracking matters because the FMEDA is the table safety reviewers look at most often.
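A field-level diff like the table above can be sketched as follows. This is an illustration under assumptions: rows are dicts keyed by `row_id`, `fmeda_row_delta` is a hypothetical name, and the sketch only reports raw changes. Deciding whether a change is an improvement depends on the field and is left to policy.

```python
def fmeda_row_delta(baseline_rows, current_rows, fields):
    """Return (row_id, field, baseline_value, current_value) for every change."""
    deltas = []
    for row_id in sorted(set(baseline_rows) | set(current_rows)):
        before = baseline_rows.get(row_id)
        after = current_rows.get(row_id)
        if before is None:
            deltas.append((row_id, "row_status", "missing", "added"))
        elif after is None:
            deltas.append((row_id, "row_status", "present", "removed"))
        else:
            for field in fields:
                if before[field] != after[field]:
                    deltas.append((row_id, field, before[field], after[field]))
    return deltas


FIELDS = ["selected_dc", "residual_fit", "review_status"]
baseline_rows = {
    "R003": {"selected_dc": "0.000", "residual_fit": "0.0100", "review_status": "review_required"},
}
current_rows = {
    "R003": {"selected_dc": "0.850", "residual_fit": "0.0015", "review_status": "reviewed"},
    "R005": {"selected_dc": "0.900", "residual_fit": "0.0010", "review_status": "reviewed"},
}
for delta in fmeda_row_delta(baseline_rows, current_rows, FIELDS):
    print(",".join(delta))
```

Values are compared as strings here, as they would be when read from CSV; a real comparison would likely parse numeric fields and apply a tolerance. The R005 values are made up for the demonstration.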
13. Review Item Trend
Review items should not drop out of sight between iterations.
D16 should track:
text
new review items
closed review items
reopened review items
persistent review items
severity changes
owner changes
due date changes
Example:
csv
item_id,baseline_status,current_status,delta_class
I001,open,closed,closed
I002,open,open,persistent
I003,missing,open,new
I004,closed,open,reopened
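A reopened high-severity review item should raise a strong alert.
The review item trend turns safety improvement into manageable engineering work.
14. Evidence Quality Trend
Evidence quality should be tracked as well.
Indicators include: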
text
unresolved ratio
not-classified ratio
missing artifact count
low-confidence metric count
scope mismatch count
open high-severity review items
policy change count
Example:
csv
iteration,unresolved_ratio,not_classified_ratio,missing_artifacts,low_confidence_groups,open_high_items
iter_001,0.00,0.00,0,3,1
iter_002,0.05,0.00,0,2,1
iter_003,0.20,0.02,1,1,0
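A trend can improve in one respect while getting worse in another. In the table above, low-confidence groups fall from 3 to 1 while the unresolved ratio rises from 0.00 to 0.20.
D16 should not give an oversimplified pass/fail conclusion.
15. Policy Changes Across Iterations
Metric changes may come from policy changes.
Example: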
text
safe faults included in denominator in one run but excluded in another
late alarms counted as detected in one run but unsafe in another
FIT-weighted DC enabled in one run but not another
low-confidence measured DC allowed in one run but not another
D16 must compare the policy files, or at least record policy hashes.
Example:
csv
policy_name,baseline_hash,current_hash,status
classification_policy,abc123,abc123,unchanged
measurement_policy,def456,789abc,changed
fmeda_update_policy,111aaa,111aaa,unchanged
If a metric changes and a policy changes at the same time, the trend interpretation must say so.
text
Measured DC changed from 0.60 to 0.72, but measurement policy changed.
Review is required before treating the change as design improvement.
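Recording policy hashes can be sketched with the standard library. This is a minimal sketch, not the tool's actual behavior: `policy_hash` and `policy_delta` are hypothetical names, policies are passed in as text so the example is self-contained, and truncating the SHA-256 digest to 12 characters is an arbitrary display choice.

```python
import hashlib


def policy_hash(text):
    """Short SHA-256 digest of a policy file's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def policy_delta(baseline, current):
    """Compare two {policy_name: file_content} mappings by content hash."""
    rows = []
    for name in sorted(set(baseline) | set(current)):
        before, after = baseline.get(name), current.get(name)
        if before is None:
            status = "added"
        elif after is None:
            status = "removed"
        elif policy_hash(before) == policy_hash(after):
            status = "unchanged"
        else:
            status = "changed"
        rows.append((name, status))
    return rows


baseline_policies = {
    "classification_policy": "late_alarm: unsafe\n",
    "measurement_policy": "include_safe_in_denominator: true\n",
}
current_policies = {
    "classification_policy": "late_alarm: unsafe\n",
    "measurement_policy": "include_safe_in_denominator: false\n",
}
for name, status in policy_delta(baseline_policies, current_policies):
    print(f"{name},{status}")
```

A content hash flags any byte-level difference, including comments and whitespace. That is conservative: a `changed` status means the policy needs a look, not that its semantics necessarily differ. The policy keys in the sample data are invented for illustration.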
16. Regression Severity
Not all changes are equally severe.
Suggested severity levels:
text
INFO
LOW
MEDIUM
HIGH
CRITICAL
Example severity rules:
yaml
regression_policy:
  critical:
    - detected_to_unsafe
    - reviewed_to_review_required_with_residual_fit_increase
  high:
    - residual_fit_increase_above_threshold
    - new_unsafe_failure_mode
    - high_severity_review_item_reopened
  medium:
    - measured_dc_drop_above_threshold
    - unresolved_ratio_increase_above_threshold
    - policy_changed_with_metric_change
  low:
    - confidence_drop
    - new_low_severity_review_item
This helps set engineering priorities.
17. Regression Alerts
D16 should generate regression_alerts.csv.
Example:
csv
alert_id,severity,category,item,baseline,current,message
A001,CRITICAL,fault_outcome,F010,detected,unsafe,previously detected fault became unsafe
A002,HIGH,residual_fit,FM_ALARM_NOT_ASSERTED,0.0015,0.0100,residual FIT increased above threshold
A003,MEDIUM,evidence_quality,unresolved_ratio,0.02,0.20,unresolved ratio increased
A004,MEDIUM,policy,measurement_policy,def456,789abc,policy changed with metric trend
Alerts should be concise and actionable.
A good alert states:
text
what changed
why it matters
where to look
what to do next
18. Trend Summary Report
The human-readable trend report should include:
text
baseline iteration
current iteration
overall status
key improvements
key regressions
metric trend
fault outcome delta
FMEDA row delta
review item trend
evidence quality trend
policy changes
recommended actions
Example:
md
# D16 Regression and Trend Tracking Summary
Baseline: iter_001
Current: iter_002
## Overall Status
Status: REVIEW_REQUIRED
## Improvements
- FM_ALARM_NOT_ASSERTED measured DC improved from 0.000 to 0.500.
- Fault F004 changed from unsafe to detected.
## Remaining Issues
- Measurement confidence remains LOW.
- One high-severity review item remains open.
## Required Actions
1. Expand the alarm-path campaign.
2. Keep FMEDA selected DC unchanged until confidence improves.
3. Review measurement policy consistency.
This is the main review artifact for a safety iteration.
19. Trend Database
D16 can maintain a simple trend database.
For the Demo, it can start out as CSV files.
Example layout:
text
trend_db/
  iterations.csv
  metric_trend.csv
  dc_trend_by_failure_mode.csv
  residual_fit_trend.csv
  review_item_history.csv
  policy_hash_history.csv
Example iterations.csv:
csv
iteration_id,date,design_version,package_path,report_path,tag
iter_001,2026-05-01,toy_counter_v1,packages/iter_001,reports/iter_001,baseline
iter_002,2026-05-08,toy_counter_v2,packages/iter_002,reports/iter_002,alarm_fix
A CSV trend database is enough for a GitHub Demo.
Later it can evolve into SQLite or a dashboard backend.
20. CI Integration
Safety regression tracking can be wired into CI.
A CI flow might run:
text
build design
run safety preflight
generate fault list
run selected campaign
classify outcomes
compute measured DC
update FMEDA
package evidence
generate report
compare with baseline
fail or warn on regression
[Diagram: Commit / Tag → Run Safety Flow → Generate Evidence Package → Generate Safety Report → Compare Against Baseline → Regression? If No: Pass / Archive. If Yes: Warn or Fail CI.]
Figure 6: Safety regression tracking can become a CI gate for safety evidence quality.
Early on, D16 can be run manually.
Later it can serve as a lightweight CI check.
21. When should CI fail?
Not every warning should fail CI.
Possible CI fail conditions:
text
detected fault becomes unsafe
new critical review item appears
residual FIT increases above threshold
selected DC drops below threshold
FMEDA row becomes evidence_missing
required artifact missing
policy file changes without review approval
Possible warning-only conditions:
text
confidence remains low
sample size still small
new low-severity review item
new assumption added
non-critical metric change
Policy example:
yaml
ci_policy:
  fail_on:
    - critical_regression_alert
    - missing_required_artifact
    - detected_to_unsafe
    - residual_fit_increase_gt_threshold
  warn_on:
    - low_confidence
    - policy_change
    - unresolved_ratio_increase
CI should be strict enough to catch dangerous regressions, but not so strict that it blocks all exploratory analysis.
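The fail/warn split in the policy example can be sketched as a small gate function. This is a hedged illustration: `ci_decision`, `CI_POLICY` as a Python dict, and the exit-code mapping are assumptions for the sketch, not documented safeic-regress behavior.

```python
# The condition names are copied from the ci_policy example above.
CI_POLICY = {
    "fail_on": {
        "critical_regression_alert",
        "missing_required_artifact",
        "detected_to_unsafe",
        "residual_fit_increase_gt_threshold",
    },
    "warn_on": {"low_confidence", "policy_change", "unresolved_ratio_increase"},
}

# Assumption: warnings do not break the build; only "fail" is non-zero.
EXIT_CODES = {"pass": 0, "warn": 0, "fail": 1}


def ci_decision(conditions, policy=CI_POLICY):
    """Map the set of triggered conditions to pass, warn, or fail."""
    found = set(conditions)
    if found & policy["fail_on"]:
        return "fail"
    if found & policy["warn_on"]:
        return "warn"
    return "pass"


triggered = ["low_confidence", "detected_to_unsafe"]
status = ci_decision(triggered)
print(status, EXIT_CODES[status])
```

Fail conditions take precedence over warnings, so a run that triggers both is reported as `fail`.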
22. Baseline Selection
Choosing the baseline matters.
Possible baseline strategies:
text
last successful package
last reviewed package
release candidate package
golden reference package
specific tag
specific date
manual baseline
Example:
yaml
baseline_selection:
  mode: last_reviewed
  fallback: latest
A safety baseline should normally be a reviewed one.
Comparing against an unreviewed baseline can lead to misleading regression conclusions.
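The `last_reviewed` mode with a `latest` fallback can be sketched as below. This is an illustration under assumptions: `select_baseline` is a hypothetical name, the `reviewed` flag on each iteration record is not part of the iteration format shown earlier, and only these two strategies are handled.

```python
def select_baseline(iterations, mode="last_reviewed", fallback="latest"):
    """Pick a baseline from earlier iterations, ordered oldest to newest.

    Returns (iteration_id, strategy_used) so the report can state whether the
    baseline was a reviewed one or only a fallback.
    """
    if mode == "last_reviewed":
        for iteration in reversed(iterations):
            if iteration["reviewed"]:
                return iteration["id"], "last_reviewed"
    if fallback == "latest" and iterations:
        return iterations[-1]["id"], "latest"
    return None, None


history = [
    {"id": "iter_001", "reviewed": True},
    {"id": "iter_002", "reviewed": False},
]
print(select_baseline(history))
```

Returning the strategy alongside the ID lets the trend summary flag comparisons that fell back to an unreviewed baseline.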
23. Handling Design Changes
When the design changes, some faults and FMEDA rows may disappear or be renamed.
D16 should classify objects as:
text
matched
added
removed
renamed
unmatched
Example:
csv
object_type,object_id,delta_class,comment
fault,F010,removed,node no longer exists
fault,F120,added,new alarm monitor fault
fmeda_row,R003,matched,same failure mode and part
fmeda_row,R010,added,new watchdog row
Added or removed objects are not automatically good or bad.
They need context.
24. Handling Tool or Policy Changes
If tool behavior changes, the trend may be affected.
D16 should record:
text
tool version
script version
policy hash
configuration hash
evidence package hash
Example:
csv
item,baseline,current,status
safeic-classify_version,0.1.0,0.1.1,changed
classification_policy_hash,aaa111,aaa111,unchanged
measurement_policy_hash,bbb222,ccc333,changed
This matters because a metric change may come from:
text
design change
fault list change
classification policy change
tool bug fix
campaign expansion
The trend report should be honest about uncertainty.
25. D16 Core Inputs
Suggested inputs:
text
inputs/
  regression_config.yaml
  trend_policy.yaml
  baseline/
    evidence_package/
    safety_report.md
  current/
    evidence_package/
    safety_report.md
Evidence packages should contain:
text
fmeda_table.csv
safety_metric_summary.csv
measured_dc_by_failure_mode.csv
measured_dc_by_endpoint.csv
measured_residual_fit.csv
fault_outcomes.csv
review_items.csv
assumption_register.csv
package_status.csv
artifact_hashes.csv
D16 should consume package outputs rather than scattered raw files.
26. D16 Main Outputs
Suggested outputs:
text
outputs/
  regression_summary.md
  regression_alerts.csv
  metric_trend.csv
  dc_trend_by_failure_mode.csv
  dc_trend_by_endpoint.csv
  residual_fit_trend.csv
  fault_outcome_delta.csv
  fmeda_delta_trend.csv
  review_item_trend.csv
  evidence_quality_trend.csv
  policy_delta.csv
  trend_manifest.yaml
Each output has a clear purpose:
| Output | Purpose |
|---|---|
| regression_summary.md | Human-readable trend report |
| regression_alerts.csv | Prioritized regression warnings |
| metric_trend.csv | Top-level metric comparison |
| dc_trend_by_failure_mode.csv | Measured/selected DC by failure mode |
| residual_fit_trend.csv | Residual risk change |
| fault_outcome_delta.csv | Fault-level outcome changes |
| fmeda_delta_trend.csv | FMEDA row changes |
| review_item_trend.csv | Review action changes |
| policy_delta.csv | Policy or configuration changes |
27. Example regression_config.yaml
yaml
regression:
  name: toy_counter_alarm_fix_regression
  baseline_iteration: iter_001
  current_iteration: iter_002
inputs:
  baseline_package: inputs/baseline/evidence_package
  current_package: inputs/current/evidence_package
matching:
  fault_matching:
    primary_key: fault_id
    fallback_keys:
      - node
      - fault_type
      - endpoint
      - failure_mode
  fmeda_matching:
    primary_key: row_id
    fallback_keys:
      - part
      - subpart
      - failure_mode
      - design_object
outputs:
  summary: outputs/regression_summary.md
  alerts: outputs/regression_alerts.csv
This makes the comparison reproducible.
28. Example trend_policy.yaml
yaml
trend_policy:
  thresholds:
    measured_dc_drop_warn: 0.05
    measured_dc_drop_fail: 0.10
    residual_fit_increase_warn: 0.001
    residual_fit_increase_fail: 0.005
    unresolved_ratio_increase_warn: 0.10
  severity:
    detected_to_unsafe: CRITICAL
    unsafe_to_detected: INFO
    unsafe_to_safe: INFO
    detected_to_unresolved: HIGH
    unresolved_to_detected: INFO
    new_unsafe_fault: HIGH
    high_review_item_reopened: HIGH
  policy_change:
    warn_if_policy_hash_changed: true
    require_review_if_metric_changed_with_policy_change: true
  ci:
    fail_on_critical_alert: true
    fail_on_missing_required_artifact: true
    warn_on_low_confidence: true
Thresholds should be tailored per project.
A public Demo can start with simple defaults.
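Applying the warn/fail thresholds can be sketched as follows. The threshold values are copied from the example above; the function names, the inclusive `>=` comparison, and the rounding step are assumptions of this sketch. Rounding is there because a difference such as 0.90 − 0.80 is not exactly 0.10 in binary floating point.

```python
THRESHOLDS = {
    "measured_dc_drop_warn": 0.05,
    "measured_dc_drop_fail": 0.10,
    "residual_fit_increase_warn": 0.001,
    "residual_fit_increase_fail": 0.005,
}


def grade(change, warn, fail):
    """Grade an adverse change against warn/fail thresholds (inclusive)."""
    change = round(change, 9)  # avoid float noise right at a threshold
    if change >= fail:
        return "fail"
    if change >= warn:
        return "warn"
    return "ok"


def check_measured_dc(baseline, current, t=THRESHOLDS):
    """A drop in measured DC is the adverse direction."""
    return grade(baseline - current, t["measured_dc_drop_warn"], t["measured_dc_drop_fail"])


def check_residual_fit(baseline, current, t=THRESHOLDS):
    """An increase in residual FIT is the adverse direction."""
    return grade(current - baseline, t["residual_fit_increase_warn"], t["residual_fit_increase_fail"])


print(check_measured_dc(0.90, 0.84))       # drop of 0.06
print(check_residual_fit(0.0015, 0.0100))  # increase of 0.0085
```

Each metric has its own adverse direction, so the sign convention lives in the per-metric wrapper rather than in `grade`.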
29. Tool Architecture
The general-purpose tool safeic-regress can be implemented as a staged pipeline.
[Diagram: manifest.yaml, regression_config.yaml, trend_policy.yaml, the Baseline Evidence Package, and the Current Evidence Package feed safeic-regress. Stages: Load Packages → Validate Comparable Inputs → Match Faults and FMEDA Rows → Compare Metrics, Compare Fault Outcomes, Compare FMEDA Rows, Compare Review Items, Compare Policies → Generate Alerts → Generate Trend Reports.]
Figure 7: safeic-regress loads two evidence packages, compares metrics and evidence, generates alerts, and writes trend reports.
Suggested internal modules:
safeic_regress/
  cli.py
  manifest.py
  load_config.py
  load_package.py
  validate_compare.py
  match_faults.py
  match_fmeda.py
  metric_compare.py
  outcome_delta.py
  fmeda_delta.py
  review_delta.py
  policy_delta.py
  severity.py
  trend_db.py
  report.py
Division of responsibilities:
| Module | Responsibility |
|---|---|
| load_package.py | Read D14 evidence package structure |
| validate_compare.py | Check whether two packages are comparable |
| match_faults.py | Match faults across iterations |
| match_fmeda.py | Match FMEDA rows across iterations |
| metric_compare.py | Compare DC, residual FIT, and summary metrics |
| outcome_delta.py | Compare fault outcomes |
| fmeda_delta.py | Compare FMEDA row changes |
| review_delta.py | Compare review item status |
| policy_delta.py | Compare policy hashes and configs |
| severity.py | Assign regression alert severity |
| trend_db.py | Update trend history |
| report.py | Generate CSV and Markdown outputs |
30. D16 Directory Structure
Suggested layout:
text
D16_regression_and_trend_tracking/
  README.md
  run_demo.sh
  run_demo.csh
  manifest.yaml
  inputs/
    regression_config.yaml
    trend_policy.yaml
    baseline/
      evidence_package/
        package_manifest.yaml
        package_status.csv
        metrics/
          measured_dc_by_failure_mode.csv
          measured_residual_fit.csv
          safety_metric_summary.csv
        fmeda/
          fmeda_table.csv
          fmeda_review_items.csv
        campaign/
          fault_outcomes.csv
        policies/
          classification_policy.yaml
          measurement_policy.yaml
    current/
      evidence_package/
        package_manifest.yaml
        package_status.csv
        metrics/
          measured_dc_by_failure_mode.csv
          measured_residual_fit.csv
          safety_metric_summary.csv
        fmeda/
          fmeda_table.csv
          fmeda_review_items.csv
        campaign/
          fault_outcomes.csv
        policies/
          classification_policy.yaml
          measurement_policy.yaml
  outputs/
    regression_summary.md
    regression_alerts.csv
    metric_trend.csv
    dc_trend_by_failure_mode.csv
    residual_fit_trend.csv
    fault_outcome_delta.csv
    fmeda_delta_trend.csv
    review_item_trend.csv
    policy_delta.csv
    trend_manifest.yaml
This structure makes the baseline/current comparison explicit.
31. D16 Manifest
Example:
yaml
project:
  name: automotive_safeic_practice
  demo: D16_regression_and_trend_tracking
  top_module: toy_counter
inputs:
  regression_config: inputs/regression_config.yaml
  trend_policy: inputs/trend_policy.yaml
  baseline_package: inputs/baseline/evidence_package
  current_package: inputs/current/evidence_package
outputs:
  summary: outputs/regression_summary.md
  alerts: outputs/regression_alerts.csv
  metric_trend: outputs/metric_trend.csv
  dc_trend_by_failure_mode: outputs/dc_trend_by_failure_mode.csv
  residual_fit_trend: outputs/residual_fit_trend.csv
  fault_outcome_delta: outputs/fault_outcome_delta.csv
  fmeda_delta_trend: outputs/fmeda_delta_trend.csv
  review_item_trend: outputs/review_item_trend.csv
  policy_delta: outputs/policy_delta.csv
  trend_manifest: outputs/trend_manifest.yaml
The manifest explicitly defines what is compared and where the outputs go.
32. D16 Execution Flow
[Diagram: Load Manifest → Load Regression Config → Load Trend Policy → Load Baseline Package → Load Current Package → Validate Comparability → Compare Top-Level Metrics → Compare DC by Group → Compare Residual FIT → Compare Fault Outcomes → Compare FMEDA Rows → Compare Review Items → Compare Policies → Generate Regression Alerts → Generate Trend Summary.]
Figure 8: D16 execution flow: load the packages, validate them, compare metrics and evidence, then generate alerts and the trend summary.
Example bash script:
bash
#!/usr/bin/env bash
set -euo pipefail
safeic-regress \
--manifest manifest.yaml \
--output-dir outputs
Example csh script:
csh
#!/bin/csh -f
set DEMO = D16_regression_and_trend_tracking
echo "Running $DEMO"
safeic-regress \
--manifest manifest.yaml \
--output-dir outputs
Expected outputs:
text
outputs/regression_summary.md
outputs/regression_alerts.csv
outputs/metric_trend.csv
outputs/dc_trend_by_failure_mode.csv
outputs/residual_fit_trend.csv
outputs/fault_outcome_delta.csv
outputs/fmeda_delta_trend.csv
outputs/review_item_trend.csv
outputs/policy_delta.csv
outputs/trend_manifest.yaml
33. Example metric_trend.csv
csv
metric,baseline,current,delta,delta_class
total_base_fit,0.078,0.078,0.000,no_change
total_residual_fit,0.0204,0.0104,-0.0100,improvement
weighted_selected_dc,0.738,0.867,0.129,improvement
rows_review_required,2,1,-1,improvement
unsafe_faults,2,1,-1,improvement
unresolved_faults,0,0,0,no_change
This gives a compact top-level view.
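The `delta` and `delta_class` columns can be derived with a direction-aware comparison, sketched below. The grouping of metrics into lower-is-better and higher-is-better sets is an assumption based on the example rows, and `metric_delta` is a hypothetical name.

```python
# Assumption: direction of "better" for each metric in the example table.
LOWER_IS_BETTER = {"total_residual_fit", "rows_review_required",
                   "unsafe_faults", "unresolved_faults"}
HIGHER_IS_BETTER = {"weighted_selected_dc"}


def metric_delta(metric, baseline, current):
    """Return (delta, delta_class) for one top-level metric."""
    delta = round(current - baseline, 6)  # strip float noise
    if delta == 0:
        return delta, "no_change"
    if metric in LOWER_IS_BETTER:
        return delta, "improvement" if delta < 0 else "regression"
    if metric in HIGHER_IS_BETTER:
        return delta, "improvement" if delta > 0 else "regression"
    # Metrics with no defined direction (e.g. total_base_fit) are only flagged.
    return delta, "changed"


rows = [
    ("total_base_fit", 0.078, 0.078),
    ("total_residual_fit", 0.0204, 0.0104),
    ("weighted_selected_dc", 0.738, 0.867),
    ("unsafe_faults", 2, 1),
]
for metric, baseline, current in rows:
    delta, delta_class = metric_delta(metric, baseline, current)
    print(f"{metric},{baseline},{current},{delta},{delta_class}")
```

A base FIT change is reported as `changed` rather than graded, since it usually reflects a FIT model update and not a design improvement or regression.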
34. Example fault_outcome_delta.csv
csv
fault_id,node,failure_mode,baseline_outcome,current_outcome,delta_class,severity
F001,toy_counter.count[0],FM_DATA_CORRUPTION,detected,detected,no_change,INFO
F003,toy_counter.count_parity,FM_DIAGNOSTIC_STATE_CORRUPTION,unsafe,unsafe,no_change,HIGH
F004,toy_counter.alarm,FM_ALARM_NOT_ASSERTED,unsafe,detected,improvement,INFO
F010,toy_counter.alarm_mask,FM_DIAGNOSTIC_MASKED,missing,unsafe,new_unsafe_fault,HIGH
This table tells the detailed story behind the metric changes.
35. Example regression_alerts.csv
csv
alert_id,severity,category,item,message,recommended_action
A001,HIGH,new_unsafe_fault,F010,new unsafe alarm-mask fault appeared,review alarm mask protection
A002,MEDIUM,review_item,I002,diagnostic state issue remains open,add diagnostic state protection or justify residual risk
A003,LOW,confidence,FM_ALARM_NOT_ASSERTED,measured confidence is still low,expand campaign sample size
This is the output that drives action.
36. Example regression_summary.md
md
# D16 Regression and Trend Tracking Summary
Baseline: iter_001
Current: iter_002
Design: toy_counter
## Overall Status
Status: REVIEW_REQUIRED
## Top-Level Metric Changes
- Total residual FIT decreased from 0.0204 to 0.0104.
- Weighted selected DC increased from 0.738 to 0.867.
- Unsafe faults decreased from 2 to 1.
- Review-required FMEDA rows decreased from 2 to 1.
## Improvements
1. Fault F004 changed from unsafe to detected.
2. FM_ALARM_NOT_ASSERTED residual FIT decreased.
## Remaining Issues
1. Diagnostic state corruption remains unsafe.
2. A new alarm-mask unsafe fault appeared.
3. Measured confidence remains low for alarm-path coverage.
## Recommended Actions
1. Add or justify diagnostic state protection.
2. Review new alarm-mask fault F010.
3. Expand campaign sample size before increasing selected DC further.
The summary should be clear enough to be used directly in a review discussion.
37. Validation Rules
safeic-regress should validate that:
text
baseline package exists
current package exists
required metric files exist
required FMEDA files exist
required fault outcome files exist
policy files exist or missing status is reported
iteration IDs are defined
fault matching policy is valid
FMEDA matching policy is valid
trend thresholds are valid
numeric metrics can be parsed
review item statuses are valid
Example messages:
text
[PASS] baseline package loaded
[PASS] current package loaded
[PASS] measured DC tables loaded for both packages
[PASS] FMEDA rows matched: 3 matched, 1 added, 0 removed
[WARN] measurement policy hash changed
[WARN] new unsafe fault F010 found
[ERROR] current package missing fmeda_table.csv
When required artifacts are missing, the tool should refuse to compare incomplete packages.
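The refuse-on-missing-artifact rule can be sketched with `pathlib`. This is a minimal sketch: the required-file list is a subset chosen for illustration from the package layout in section 30, and `missing_artifacts` and `validate_pair` are hypothetical names.

```python
from pathlib import Path

# Assumption: a small illustrative subset of the required package files.
REQUIRED_ARTIFACTS = (
    "fmeda/fmeda_table.csv",
    "metrics/safety_metric_summary.csv",
    "campaign/fault_outcomes.csv",
)


def missing_artifacts(package_dir, required=REQUIRED_ARTIFACTS):
    """Return the required relative paths that are not files under package_dir."""
    root = Path(package_dir)
    return [rel for rel in required if not (root / rel).is_file()]


def validate_pair(baseline_dir, current_dir):
    """Return (ok, messages); ok is False if either package is incomplete."""
    ok = True
    messages = []
    for label, package_dir in (("baseline", baseline_dir), ("current", current_dir)):
        missing = missing_artifacts(package_dir)
        if missing:
            ok = False
            messages.extend(f"[ERROR] {label} package missing {rel}" for rel in missing)
        else:
            messages.append(f"[PASS] {label} package loaded")
    return ok, messages


if __name__ == "__main__":
    ok, messages = validate_pair("inputs/baseline/evidence_package",
                                 "inputs/current/evidence_package")
    print("\n".join(messages))
    raise SystemExit(0 if ok else 1)
```

Collecting every missing file before exiting, rather than stopping at the first one, gives the engineer a complete list to fix in one pass.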
38. Common Mistakes
38.1 Comparing only one metric
A rise in measured DC can hide a regression in residual FIT, confidence, or unresolved evidence.
38.2 Ignoring policy changes
If the classification or measurement policy has changed, metric changes become hard to interpret.
38.3 Automatically treating added faults as regressions
New faults may simply reflect expanded coverage.
They need to be classified with care.
38.4 Ignoring persistent review items
A review item that stays open across several iterations is significant.
38.5 Hiding evidence quality regression
The unresolved ratio and missing artifacts both matter.
38.6 Using an unreviewed baseline
A weak baseline makes trend conclusions misleading.
38.7 Failing CI on every warning
Not every warning is a blocker.
Critical safety regressions should fail; exploratory warnings do not have to.
39. How does D16 connect to later Demos?
D16 makes safety engineering iteration-aware.
Later Demos can use the regression outputs for tool comparison, dashboarding, and publication.
[Diagram: D16 Regression Tracking feeds D17 Commercial Tool Comparison (Comparison Report), D18 Dashboard / Website Demo (Interactive Trend View), and D19 CI Automation (Automated Safety Regression Gate).]
Figure 9: D16 provides the trend and regression foundation for comparison, dashboarding, and automation.
Once regression tracking is in place, the whole workflow looks much more like a real engineering platform.
40. Recommended Implementation Stages
D16 can be implemented in stages.
Stage 1: Two-Package Metric Comparison
Compare the baseline and current package metrics.
Deliverables:
text
metric_trend.csv
regression_summary.md
Stage 2: Fault Outcome Delta
Compare classified outcomes across iterations.
Deliverables:
text
fault_outcome_delta.csv
regression_alerts.csv
Stage 3: FMEDA Row Delta
Compare FMEDA row values and review statuses.
Deliverables:
text
fmeda_delta_trend.csv
Stage 4: Review Item and Policy Delta
Track review item changes and policy hash changes.
Deliverables:
text
review_item_trend.csv
policy_delta.csv
Stage 5: Trend Database and CI Mode
Maintain historical trend tables and a CI pass/warn/fail status.
Deliverables:
text
trend_db/
ci_status.csv
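This staged approach makes D16 useful right away and lets it grow toward automation step by step.
41. Summary
Regression and trend tracking turns safety analysis from a one-off report into an iterative engineering loop.
The D16 Demo: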
text
D16_regression_and_trend_tracking
introduces the general-purpose tool:
text
safeic-regress
The tool consumes:
text
baseline evidence package
current evidence package
regression_config.yaml
trend_policy.yaml
and generates:
text
regression_summary.md
regression_alerts.csv
metric_trend.csv
dc_trend_by_failure_mode.csv
dc_trend_by_endpoint.csv
residual_fit_trend.csv
fault_outcome_delta.csv
fmeda_delta_trend.csv
review_item_trend.csv
policy_delta.csv
trend_manifest.yaml
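The key takeaway is:
Safety evidence must be tracked over time. A single report explains one snapshot, while regression and trend tracking shows whether the safety argument is improving, degrading, or becoming more uncertain.
D16 makes the safety workflow iterative and auditable, and lays the foundation for CI-style automation.
42. D16 Demo Checklist
For D16_regression_and_trend_tracking, the expected deliverables are: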
text
[ ] README.md
[ ] run_demo.sh
[ ] run_demo.csh
[ ] manifest.yaml
[ ] inputs/regression_config.yaml
[ ] inputs/trend_policy.yaml
[ ] inputs/baseline/evidence_package/package_manifest.yaml
[ ] inputs/baseline/evidence_package/package_status.csv
[ ] inputs/baseline/evidence_package/metrics/measured_dc_by_failure_mode.csv
[ ] inputs/baseline/evidence_package/metrics/measured_residual_fit.csv
[ ] inputs/baseline/evidence_package/metrics/safety_metric_summary.csv
[ ] inputs/baseline/evidence_package/fmeda/fmeda_table.csv
[ ] inputs/baseline/evidence_package/fmeda/fmeda_review_items.csv
[ ] inputs/baseline/evidence_package/campaign/fault_outcomes.csv
[ ] inputs/baseline/evidence_package/policies/classification_policy.yaml
[ ] inputs/baseline/evidence_package/policies/measurement_policy.yaml
[ ] inputs/current/evidence_package/package_manifest.yaml
[ ] inputs/current/evidence_package/package_status.csv
[ ] inputs/current/evidence_package/metrics/measured_dc_by_failure_mode.csv
[ ] inputs/current/evidence_package/metrics/measured_residual_fit.csv
[ ] inputs/current/evidence_package/metrics/safety_metric_summary.csv
[ ] inputs/current/evidence_package/fmeda/fmeda_table.csv
[ ] inputs/current/evidence_package/fmeda/fmeda_review_items.csv
[ ] inputs/current/evidence_package/campaign/fault_outcomes.csv
[ ] inputs/current/evidence_package/policies/classification_policy.yaml
[ ] inputs/current/evidence_package/policies/measurement_policy.yaml
[ ] outputs/regression_summary.md
[ ] outputs/regression_alerts.csv
[ ] outputs/metric_trend.csv
[ ] outputs/dc_trend_by_failure_mode.csv
[ ] outputs/dc_trend_by_endpoint.csv
[ ] outputs/residual_fit_trend.csv
[ ] outputs/fault_outcome_delta.csv
[ ] outputs/fmeda_delta_trend.csv
[ ] outputs/review_item_trend.csv
[ ] outputs/policy_delta.csv
[ ] outputs/trend_manifest.yaml
A successful D16 run should answer:
text
What changed between the baseline and current safety evidence packages?
Did measured DC improve or regress?
Did residual FIT decrease or increase?
Did any detected fault become unsafe?
Did any unsafe fault become detected?
Were new unsafe faults introduced?
Did FMEDA rows improve or degrade?
Which review items were opened, closed, or reopened?
Did evidence quality improve or degrade?
Did policies change between iterations?
Should the current iteration pass, warn, or fail the safety regression gate?