Contents
- 1. Business Background of Financial Risk Control
  - 1.1 Three Risk-Control Scenarios and Metric Selection
  - 1.2 Cost-Matrix-Driven Threshold Selection
- 2. Data Review and Compliance Constraints
  - 2.1 Allowed and Prohibited Features
  - 2.2 Feature Audit Document Template
- 3. Feature Design Methodology
  - 3.1 WOE Encoding (Weight of Evidence)
  - 3.2 Key Principles of WOE Encoding
  - 3.3 Time-Decay Features
- 4. Building the Scorecard Model
  - 4.1 Logistic Regression → Score Scaling
  - 4.2 Scorecard Interpretability
- 5. Model Selection: Compliance First vs Performance First
  - 5.1 Two Selection Philosophies
  - 5.2 Practical Selection Strategy
  - 5.3 Dual-Model Strategy
- 6. SHAP Compliance Output
  - 6.1 Regulatory Audit Report Template
  - 6.2 Feature Discrimination Detection
- 7. Model Deployment and Real-Time Decisions
  - 7.1 Inference API Design
  - 7.2 Batch Scoring (100,000 Applications per Day)
- 8. Monitoring and Compliance Iteration
  - 8.1 PSI Feature Stability Monitoring
  - 8.2 Monthly Model Performance Audit
  - 8.3 Approval Process for New Features
- 9. End-to-End Walkthrough: LendingClub Loan Default Prediction
  - 9.1 Project Code Skeleton
- Summary
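Financial risk control is one of the most direct ways for ML to generate revenue. A 1% gain in loan-default prediction accuracy can translate into tens of millions less in bad-debt losses. The financial setting, however, imposes special constraints. Regulators require models to be explainable, features must be compliant, real-time decisions must return in under 100 ms, and every model change needs an audit trail. Together, these constraints turn "train a model" into "build a compliant system".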
Figure: End-to-end workflow: Data review → Compliance screening → WOE/IV encoding → Scorecard vs XGBoost → SHAP compliance output → Deployment → PSI monitoring + monthly audit → Regulatory report
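1. Business Background of Financial Risk Control
1.1 Three Risk-Control Scenarios and Metric Selection
Financial risk control is not a single problem. It covers three core scenarios, and each one has its own business constraints and success criteria:
| Scenario | Target definition | Success criteria | Cost matrix |
|---|---|---|---|
| Loan default prediction | Predict whether a loan will default | recall > 90%, precision > 70% | Missed default = lost principal; good borrower rejected = lost customer |
| Fraud detection | Identify fraudulent transactions | recall > 95%, response < 50 ms | Missed fraud = direct loss; legitimate transaction flagged = worse customer experience |
| Credit scoring | Assess a customer's credit grade | Ranking accuracy + explainability | Score bias = mispriced risk |
This article focuses on loan default prediction and uses the public LendingClub dataset.
1.2 Cost-Matrix-Driven Threshold Selection
Financial applications should not rely on the default 0.5 threshold. The business costs of each kind of error determine the optimal threshold: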
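```python
import numpy as np

def optimal_threshold_by_cost(y_true, y_prob,
                              cost_fn,   # cost of a missed default (lost principal)
                              cost_fp,   # cost of rejecting a good borrower (lost customer)
                              cost_tp=0, cost_tn=0):
    """Find the threshold that minimizes total cost under a cost matrix."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    thresholds = np.arange(0.01, 0.99, 0.01)
    total_costs = []
    for t in thresholds:
        y_pred = (y_prob >= t).astype(int)
        # Count each outcome type and weight it by its cost
        fn_mask = (y_true == 1) & (y_pred == 0)  # missed default
        fp_mask = (y_true == 0) & (y_pred == 1)  # good borrower rejected
        tp_mask = (y_true == 1) & (y_pred == 1)  # default correctly caught
        tn_mask = (y_true == 0) & (y_pred == 0)  # good borrower correctly approved
        total = (fn_mask.sum() * cost_fn + fp_mask.sum() * cost_fp +
                 tp_mask.sum() * cost_tp + tn_mask.sum() * cost_tn)
        total_costs.append(total)
    optimal_idx = int(np.argmin(total_costs))
    optimal_threshold = thresholds[optimal_idx]
    min_cost = total_costs[optimal_idx]
    print(f"Optimal threshold: {optimal_threshold:.2f}")
    print(f"Minimum total cost: {min_cost:.0f}")
    print(f"(missed-default cost={cost_fn}, false-alarm cost={cost_fp})")
    return optimal_threshold

# Example: loan principal 100,000 yuan, customer value 5,000 yuan
# y_test / y_prob: test labels and predicted default probabilities from a trained model
optimal_t = optimal_threshold_by_cost(
    y_true=y_test, y_prob=y_prob,
    cost_fn=100000,  # missed default = lose 100k
    cost_fp=5000     # false alarm = lose 5k of customer value
)
# The optimum usually lies well below 0.5 (favoring recall); the exact value depends
# on the cost ratio and on how well the predicted probabilities are calibrated
```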
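When the predicted probabilities are well calibrated, decision theory offers a closed form that can cross-check the grid search. Approving an applicant with default probability p has an expected cost of p·cost_fn + (1 − p)·cost_tn. Rejecting has an expected cost of p·cost_tp + (1 − p)·cost_fp. Rejecting therefore becomes the cheaper action once p reaches t* = (cost_fp − cost_tn) / ((cost_fp − cost_tn) + (cost_fn − cost_tp)). The helper name `bayes_optimal_threshold` is ours. Class-weighted models, such as the `class_weight='balanced'` model used later in this article, do not output calibrated probabilities. For those, treat this value as a sanity check, not a replacement for the empirical search:

```python
def bayes_optimal_threshold(cost_fn, cost_fp, cost_tp=0.0, cost_tn=0.0):
    """Cost-minimizing threshold for calibrated default probabilities (hypothetical helper).

    Reject when the expected cost of rejecting, p*cost_tp + (1-p)*cost_fp,
    does not exceed the expected cost of approving, p*cost_fn + (1-p)*cost_tn.
    """
    fp_regret = cost_fp - cost_tn   # extra cost of wrongly rejecting a good borrower
    fn_regret = cost_fn - cost_tp   # extra cost of wrongly approving a defaulter
    return fp_regret / (fp_regret + fn_regret)

# Costs from the example above: lost principal 100,000 vs lost customer value 5,000
t_star = bayes_optimal_threshold(cost_fn=100000, cost_fp=5000)
print(f"{t_star:.4f}")  # → 0.0476
```

With these costs, a missed default is 20 times as expensive as a lost customer. That ratio pushes the calibrated threshold down to about 0.048.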
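2. Data Review and Compliance Constraints
2.1 Allowed and Prohibited Features
Feature compliance review is a step that most other ML domains never face. Regulators forbid features that could lead to discriminatory decisions: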
```python
def compliance_feature_audit(feature_list, prohibited_features,
                             sensitive_but_allowed):
    """Feature compliance review."""
    audit_results = []
    for feat in feature_list:
        if feat in prohibited_features:
            status = 'PROHIBITED'
            reason = 'Prohibited by regulation: may lead to discriminatory decisions'
        elif feat in sensitive_but_allowed:
            status = 'SENSITIVE_ALLOWED'
            reason = 'Allowed with an audit record; must not be a primary decision factor'
        else:
            status = 'ALLOWED'
            reason = 'Compliant'
        audit_results.append({
            'feature': feat,
            'status': status,
            'reason': reason
        })
    # Print the review report
    prohibited = [r for r in audit_results if r['status'] == 'PROHIBITED']
    sensitive = [r for r in audit_results if r['status'] == 'SENSITIVE_ALLOWED']
    print(f"Compliance review: {len(feature_list)} features")
    print(f"  Prohibited: {len(prohibited)} → {[r['feature'] for r in prohibited]}")
    print(f"  Sensitive but allowed: {len(sensitive)} → {[r['feature'] for r in sensitive]}")
    print(f"  Fully allowed: {len(feature_list) - len(prohibited) - len(sensitive)}")
    return audit_results

# Compliance review of LendingClub features
all_features = [
    'loan_amnt', 'term', 'int_rate', 'installment', 'grade',
    'emp_length', 'home_ownership', 'annual_inc', 'dti',
    'delinq_2yrs', 'fico_range_low', 'fico_range_high',
    'open_acc', 'pub_rec', 'revol_bal', 'revol_util',
    'total_acc', 'earliest_cr_line', 'application_type',
    'gender', 'race', 'age'  # the last three are sensitive attributes
]
prohibited = ['gender', 'race']
sensitive_but_allowed = ['age']  # usable for eligibility checks (e.g., reject under 18), never to discriminate
audit = compliance_feature_audit(all_features, prohibited, sensitive_but_allowed)
```
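2.2 Feature Audit Document Template
In a compliance setting, every feature needs an audit document. The document records where the feature comes from, why it is used, and what discrimination risk it may carry: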
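```python
def generate_feature_audit_doc(feature_name, source, justification,
                               discrimination_risk, mitigations,
                               dtype='N/A', min_val='N/A', max_val='N/A',
                               missing_pct='N/A', approver='________',
                               approval_date='________'):
    """Generate a feature audit document (fields not yet known default to placeholders)."""
    template = f"""
========================================
Feature audit document: {feature_name}
========================================
1. Source: {source}
2. Justification: {justification}
3. Potential discrimination risk: {discrimination_risk}
4. Mitigations: {mitigations}
5. Data type: {dtype}
6. Value range: [{min_val}, {max_val}]
7. Missing rate: {missing_pct}%
8. Approval status: APPROVED / REJECTED / CONDITIONAL
9. Approver: {approver}
10. Approval date: {approval_date}
"""
    return template

# Example: audit document for dti (debt-to-income ratio)
dti_audit = generate_feature_audit_doc(
    feature_name='dti',
    source='LendingClub application form: debt and income self-reported by the applicant',
    justification='Directly reflects repayment capacity; a core indicator of default risk',
    discrimination_risk='LOW: no direct link to race or gender',
    mitigations='No special mitigation needed, but monitor score distributions across income groups'
)
print(dti_audit)
```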
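3. Feature Design Methodology
3.1 WOE Encoding (Weight of Evidence)
WOE (Weight of Evidence) is the standard encoding in the credit-scoring industry. Each feature is first binned. For each bin, WOE then measures how strongly that bin separates defaulters (target = 1) from non-defaulters (target = 0):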
```python
import numpy as np
import pandas as pd

def calculate_woe_iv(df, feature, target, bins=10, min_pct=0.05):
    """WOE/IV calculation, the credit-scoring industry standard.

    target: 1 = default (event), 0 = non-default (non-event).
    min_pct (minimum bin share) is accepted but not enforced in this simplified version.
    """
    # Binning: raw categories for non-numeric features, equal-frequency bins for numeric ones
    if not pd.api.types.is_numeric_dtype(df[feature]):
        bins_df = df.groupby(feature, observed=True)[target].agg(['sum', 'count'])
    else:
        binned = pd.qcut(df[feature], q=bins, duplicates='drop')  # df itself is not modified
        bins_df = df.groupby(binned, observed=True)[target].agg(['sum', 'count'])
    bins_df.columns = ['events', 'total']
    bins_df['non_events'] = bins_df['total'] - bins_df['events']
    # Total events and non-events
    total_events = bins_df['events'].sum()
    total_non_events = bins_df['non_events'].sum()
    # Distribution of each bin
    bins_df['pct_events'] = bins_df['events'] / total_events
    bins_df['pct_non_events'] = bins_df['non_events'] / total_non_events
    # Clip to a small floor so that log(0) never occurs
    bins_df['pct_events'] = np.clip(bins_df['pct_events'], 1e-4, None)
    bins_df['pct_non_events'] = np.clip(bins_df['pct_non_events'], 1e-4, None)
    # WOE = ln(non-event share / event share)
    bins_df['woe'] = np.log(bins_df['pct_non_events'] / bins_df['pct_events'])
    # IV = Σ (pct_non_events - pct_events) × WOE
    bins_df['iv_contribution'] = (bins_df['pct_non_events'] - bins_df['pct_events']) * bins_df['woe']
    iv = bins_df['iv_contribution'].sum()
    # Classify IV (same bands as the table in 3.2)
    if iv < 0.02:
        iv_cat = 'useless'
    elif iv < 0.1:
        iv_cat = 'weak'
    elif iv < 0.3:
        iv_cat = 'medium'
    elif iv < 0.5:
        iv_cat = 'strong'
    else:
        iv_cat = 'very strong / suspicious'
    print(f"Feature '{feature}' IV={iv:.3f} ({iv_cat})")
    print("WOE distribution:")
    print(bins_df[['woe', 'pct_events', 'pct_non_events']].to_string())
    # Selection rule: keep if IV ≥ 0.1
    recommendation = 'keep' if iv >= 0.1 else 'consider dropping'
    print(f"Recommendation: {recommendation}")
    return bins_df, iv

# Compute IV for a batch of features
feature_ivs = {}
for feat in ['loan_amnt', 'dti', 'fico_range_low', 'revol_util',
             'emp_length', 'delinq_2yrs', 'open_acc']:
    _, iv = calculate_woe_iv(df, feat, 'default', bins=10)
    feature_ivs[feat] = iv
# IV selection result
selected_features = [f for f, iv in feature_ivs.items() if iv >= 0.1]
print(f"\nFeatures with IV ≥ 0.1: {selected_features}")
```
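3.2 Key Principles of WOE Encoding
WOE encoding is more than binning, because it has a clear statistical meaning. WOE is defined as ln(bin's share of all good customers / bin's share of all defaulters), which gives three cases:
- WOE > 0: the bin holds a larger share of all good customers than of all defaulters → the bin leans toward "good"
- WOE < 0: the bin holds a larger share of all defaulters than of all good customers → the bin leans toward "default"
- WOE ≈ 0: the two shares are almost equal → the bin has little discriminating power
IV measures the discriminating power of the whole feature. IV > 0.1 is the common industry cut-off for keeping a feature:
| IV range | Interpretation | Recommendation |
|---|---|---|
| < 0.02 | No predictive power | Drop |
| 0.02~0.1 | Weak | May keep, low weight |
| 0.1~0.3 | Medium | Keep, important feature |
| 0.3~0.5 | Strong | Keep, core feature |
| > 0.5 | Very strong / suspicious | Investigate: possible target leakage |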
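A small worked example makes the table concrete. The sketch below computes WOE and IV by hand from per-bin counts for three hypothetical bins, with 1,000 good customers and 100 defaulters in total. It follows the same WOE = ln(good share / bad share) convention as `calculate_woe_iv`, and every bin must have non-zero counts:

```python
import math

def woe_iv_from_counts(goods, bads):
    """WOE per bin and total IV from per-bin counts of good and bad customers."""
    total_good, total_bad = sum(goods), sum(bads)
    woes, iv = [], 0.0
    for g, b in zip(goods, bads):
        pct_good = g / total_good   # bin's share of all good (non-default) customers
        pct_bad = b / total_bad     # bin's share of all defaulters
        woe = math.log(pct_good / pct_bad)
        woes.append(woe)
        iv += (pct_good - pct_bad) * woe
    return woes, iv

woes, iv = woe_iv_from_counts(goods=[400, 500, 100], bads=[20, 50, 30])
print([round(w, 3) for w in woes], round(iv, 3))  # → [0.693, 0.0, -1.099] 0.358
```

The middle bin has identical shares, so its WOE is exactly 0. The total IV of about 0.358 falls in the 0.3~0.5 "strong" band of the table.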
3.3 Time-Decay Features
Financial data shows a time-decay effect: behavior in the last 3 months matters more than behavior over the last 12 months:
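```python
import numpy as np

def create_time_decay_features(df, base_features, decay_rate=0.95):
    """Time-decay style features: recent behavior gets more weight.

    LendingClub has no per-event timestamps, so these features are proxies;
    base_features and decay_rate are not used here. Assumes precomputed
    columns earliest_cr_line_months and age_months.
    """
    df = df.copy()
    # 1. Weighted delinquency count (intended: recent delinquencies weigh more)
    #    Simulated: 2-year delinquency count with a superlinear penalty
    df['weighted_delinq'] = df['delinq_2yrs'] * (1 + df['delinq_2yrs'] * 0.1)
    # 2. Credit-history length (months) relative to age
    #    Young but with a long credit history = responsible behavior
    #    Note: age is a sensitive attribute (see 2.1); keep this feature under audit
    df['credit_history_density'] = df['earliest_cr_line_months'] / (df['age_months'] + 1)
    # 3. Recent 6-month revolving utilization vs overall revolving utilization
    #    Rising utilization = risk signal (the dataset lacks this field, so it is simulated)
    open_ratio = df['open_acc'] / df['total_acc'].replace(0, np.nan)  # avoid division by zero
    df['utilization_trend'] = df['revol_util'] * (1 + 0.1 * open_ratio.fillna(0))
    # 4. Flag for extreme DTI values
    #    DTI > 40% = high-risk band
    df['high_dti_flag'] = (df['dti'] > 40).astype(int)
    return df

df = create_time_decay_features(df, [])
print("Time-decay features created")
```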
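The function above only approximates time decay because LendingClub has no per-event dates. With event-level history, an exponentially decayed count can use `decay_rate` directly: an event `m` months ago contributes `decay_rate ** m`. This sketch assumes a hypothetical event table with columns `customer_id` and `months_ago`, with one row per delinquency event:

```python
import pandas as pd

def decayed_event_count(events, decay_rate=0.95):
    """Exponentially time-decayed event count per customer (hypothetical helper).

    Each event contributes decay_rate ** months_ago: an event this month counts 1.0,
    and one 12 months ago counts about 0.54 when decay_rate = 0.95.
    """
    weights = decay_rate ** events['months_ago']
    return weights.groupby(events['customer_id']).sum()

events = pd.DataFrame({
    'customer_id': ['A', 'A', 'B'],
    'months_ago':  [1, 12, 24],
})
scores = decayed_event_count(events)
print(scores.round(3).to_dict())
```

Customer A has one recent and one older event, while customer B has a single event two years ago. A therefore ends up with roughly five times B's decayed count, even though a raw count would show only twice as many events.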
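4. Building the Scorecard Model
4.1 Logistic Regression → Score Scaling
The scorecard is the financial industry's standard form of explainable ML. It converts logistic-regression coefficients into user-friendly integer points: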
```python
import numpy as np
import pandas as pd

def build_scorecard(lr_model, feature_names, woe_mappings,
                    pdo=20, base_score=600, base_odds=50):
    """Convert a logistic regression into a scorecard.

    Assumes the model predicts P(default = 1) from WOE-encoded features, so
    ln(good/bad odds) = -(β₀ + Σ βᵢ·WOEᵢ). Core formula:
        Score  = Offset − Factor × (β₀ + Σ βᵢ × WOEᵢ)
        Factor = PDO / ln(2),  Offset = BaseScore − Factor × ln(BaseOdds)
    - PDO (Points to Double the Odds): every PDO extra points double the good/bad odds
    - BaseScore: the score at which the good/bad odds equal BaseOdds
    - BaseOdds: reference good/bad odds
    """
    factor = pdo / np.log(2)                          # points per unit of log-odds
    offset = base_score - factor * np.log(base_odds)  # baseline offset

    scorecard = []
    # Intercept → base points
    intercept_score = offset - factor * lr_model.intercept_[0]
    scorecard.append({'feature': 'BaseScore', 'bin': 'base points', 'woe': np.nan,
                      'coefficient': lr_model.intercept_[0],
                      'score': round(intercept_score)})
    # Each feature × each bin → point contribution
    for i, feat in enumerate(feature_names):
        coef = lr_model.coef_[0][i]
        for bin_label, woe_val in woe_mappings.get(feat, {}).items():
            scorecard.append({'feature': feat, 'bin': bin_label, 'woe': woe_val,
                              'coefficient': coef,
                              'score': round(-factor * coef * woe_val)})
    scorecard_df = pd.DataFrame(scorecard)

    # Sanity check: achievable score range (commonly designed to land around 300-850).
    # Each customer falls into exactly one bin per feature, so sum per-feature min/max.
    per_feature = scorecard_df[scorecard_df['feature'] != 'BaseScore'].groupby('feature')['score']
    total_range = (intercept_score + per_feature.min().sum(),
                   intercept_score + per_feature.max().sum())
    print("Scorecard built:")
    print(f"  PDO={pdo}, base score={base_score}, base odds={base_odds}")
    print(f"  Approximate score range: [{total_range[0]:.0f}, {total_range[1]:.0f}]")
    print("\nScorecard details:")
    print(scorecard_df.to_string(index=False))
    return scorecard_df

# Illustrative scorecard output
print("=== Example scorecard output ===")
print("Customer score 620 = base 600 + debt-to-income -20 + credit history +40")
print("Customer score 380 = base 600 + extreme DTI -80 + multiple delinquencies -120 + short credit history -20")
```
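The same PDO scale can also be applied to a single predicted probability. This is useful for checking `build_scorecard`, or for replacing a rough linear mapping such as `600 + (0.5 - p) * 200`. The sketch below uses the default parameters of `build_scorecard`: PDO = 20, and a base score of 600 at good/bad odds of 50. The function name is hypothetical:

```python
import math

def probability_to_score(p_default, pdo=20, base_score=600, base_odds=50):
    """Map a default probability to the PDO score scale (hypothetical helper).

    Score = Offset + Factor × ln(good/bad odds), with Factor = PDO / ln 2
    and Offset = BaseScore − Factor × ln(BaseOdds).
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    good_bad_odds = (1 - p_default) / p_default
    return offset + factor * math.log(good_bad_odds)

print(round(probability_to_score(1 / 51)))   # good/bad odds = 50 → 600
print(round(probability_to_score(1 / 101)))  # odds doubled to 100 → 620
```

Doubling the good/bad odds from 50 to 100 adds exactly PDO = 20 points, which is the defining property of the scale.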
4.2 Scorecard Interpretability
The scorecard's core advantage is that every customer can receive a clear breakdown of their score:
```python
import pandas as pd

def scorecard_explanation(customer_features, scorecard_df, base_score):
    """Per-customer scorecard explanation: the compliance output regulators require."""
    total_score = base_score
    contributions = []
    for feat, value in customer_features.items():
        feat_rows = scorecard_df[scorecard_df['feature'] == feat]
        # Find the bin containing this value: numeric bins are pd.Interval objects,
        # categorical bins are matched exactly
        matches = [row for _, row in feat_rows.iterrows()
                   if (value in row['bin'] if isinstance(row['bin'], pd.Interval)
                       else row['bin'] == value)]
        if matches:
            score_contribution = int(matches[0]['score'])
            total_score += score_contribution
            contributions.append({
                'feature': feat,
                'value': value,
                'score_contribution': score_contribution,
                'direction': 'improves credit' if score_contribution > 0 else 'worsens credit'
            })
    # Sort: the most influential factors first
    contributions.sort(key=lambda x: abs(x['score_contribution']), reverse=True)
    # Risk level and decision recommendation
    if total_score >= 650:
        risk_level, advice = 'low risk', 'Approve: suggested rate = base rate + 0.5%'
    elif total_score >= 500:
        risk_level, advice = 'medium risk', 'Approve with additional conditions'
    else:
        risk_level, advice = 'high risk', 'Reject, or require collateral'
    report = [
        f"Customer credit score: {total_score} ({risk_level})",
        "",
        "Score breakdown (by impact):"
    ]
    for c in contributions[:5]:
        report.append(
            f"  • {c['feature']}={c['value']}: "
            f"contributes {c['score_contribution']} points ({c['direction']})"
        )
    report.extend(["", "Decision recommendation:", f"  {advice}"])
    return '\n'.join(report)

# Example customer (scorecard_df comes from build_scorecard above)
customer = {'dti': 15, 'fico_range_low': 700, 'delinq_2yrs': 0,
            'revol_util': 25, 'emp_length': '10+ years'}
print(scorecard_explanation(customer, scorecard_df, 600))
```
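5. Model Selection: Compliance First vs Performance First
5.1 Two Selection Philosophies
In finance, model selection is not "pick whichever model is most accurate". Regulatory requirements set the boundaries of the choice. In the table, more ★ is better; for latency, more ★ means faster:
| Model | Explainability | Accuracy | Latency | Regulatory compliance | Best suited for |
|---|---|---|---|---|---|
| Logistic regression (scorecard) | ★★★★★ | ★★★ | ★★★★★ | ★★★★★ | Strict regulatory audit |
| XGBoost + SHAP | ★★★★ | ★★★★★ | ★★★★ | ★★★★ | Performance first + post-hoc explanation |
| LightGBM | ★★★ | ★★★★★ | ★★★★★ | ★★★ | Large data + fast iteration |
| Stacking | ★ | ★★★★★ | ★★ | ★ | Not recommended: regulators cannot audit it |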
5.2 Practical Selection Strategy
```python
def financial_model_selection(data_size, regulatory_level, latency_requirement,
                              explainability_requirement):
    """Model-selection helper for financial scenarios.

    Only regulatory_level drives the choice in this simplified version;
    the other arguments are accepted but not used.
    """
    # Regulatory level → minimum required explainability
    explainability_threshold = {
        'strict': 5,    # bank lending: scorecard-level explainability required
        'moderate': 3,  # internet lending: post-hoc explanations required
        'loose': 2      # internal risk control: black boxes acceptable
    }
    min_explain = explainability_threshold.get(regulatory_level, 3)
    candidates = [
        {'model': 'LogisticRegression_Scorecard', 'explainability': 5,
         'accuracy': 3, 'latency': 5, 'compliance': 5},
        {'model': 'XGBoost_SHAP', 'explainability': 4,
         'accuracy': 5, 'latency': 4, 'compliance': 4},
        {'model': 'LightGBM', 'explainability': 3,
         'accuracy': 5, 'latency': 5, 'compliance': 3},
        {'model': 'Stacking', 'explainability': 1,
         'accuracy': 5, 'latency': 2, 'compliance': 1},
    ]
    # Filter: meet the minimum explainability requirement
    qualified = [c for c in candidates if c['explainability'] >= min_explain]
    # Rank qualified candidates by accuracy, breaking ties by explainability
    qualified.sort(key=lambda x: (x['accuracy'], x['explainability']), reverse=True)
    print(f"Selection result (regulatory level={regulatory_level}):")
    print(f"  Minimum explainability: {min_explain}")
    for c in qualified:
        print(f"  ✓ {c['model']} (explainability={c['explainability']}, accuracy={c['accuracy']})")
    rejected = [c for c in candidates if c['explainability'] < min_explain]
    for c in rejected:
        print(f"  ✗ {c['model']} (explainability={c['explainability']} < minimum)")
    return qualified[0]  # recommendation: the most accurate compliant model

# Bank lending scenario
recommended = financial_model_selection(
    data_size=50000, regulatory_level='strict',
    latency_requirement=100, explainability_requirement='high'
)
# → recommends: LogisticRegression_Scorecard
# Internet lending scenario
recommended2 = financial_model_selection(
    data_size=500000, regulatory_level='moderate',
    latency_requirement=50, explainability_requirement='medium'
)
# → recommends: XGBoost_SHAP
```
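5.3 Dual-Model Strategy
A common practice in finance is a "dual track" setup. The scorecard serves as the compliant primary model, and XGBoost serves as an auxiliary model that lifts performance: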
```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, f1_score, recall_score
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

def dual_model_strategy(X_train, y_train, X_test, y_test, preprocessor):
    """Dual-model strategy: compliant primary model + high-performance auxiliary model."""
    # 1. Compliant primary model: logistic-regression scorecard.
    #    clone() gives each pipeline its own unfitted preprocessor instead of sharing one object
    lr = LogisticRegression(class_weight='balanced', max_iter=1000, random_state=42)
    lr_pipe = Pipeline([('preprocessor', clone(preprocessor)), ('model', lr)])
    lr_pipe.fit(X_train, y_train)
    lr_prob = lr_pipe.predict_proba(X_test)[:, 1]
    lr_pred = (lr_prob >= 0.35).astype(int)  # threshold chosen from the cost matrix
    # 2. Auxiliary performance model: XGBoost
    xgb = XGBClassifier(
        n_estimators=300, max_depth=6, learning_rate=0.05,
        scale_pos_weight=4,  # roughly #non-defaults / #defaults; adjust to the data
        eval_metric='logloss', random_state=42
    )
    xgb_pipe = Pipeline([('preprocessor', clone(preprocessor)), ('model', xgb)])
    xgb_pipe.fit(X_train, y_train)
    xgb_prob = xgb_pipe.predict_proba(X_test)[:, 1]
    xgb_pred = (xgb_prob >= 0.35).astype(int)
    # 3. Agreement check between the two models
    agreement = (lr_pred == xgb_pred).mean()
    print(f"Model agreement: {agreement:.1%}")
    # 4. Which model wins when the two disagree?
    #    Policy: trust the compliant primary model (logistic regression)
    final_pred = lr_pred.copy()
    disagreement_mask = lr_pred != xgb_pred
    #    High-risk disagreement (XGBoost predicts default, LR predicts good) → manual review
    high_risk_disagreement = disagreement_mask & (xgb_pred == 1) & (lr_pred == 0)
    # 5. Summary metrics
    print("\nCompliant primary model (LR scorecard):")
    print(f"  Recall: {recall_score(y_test, lr_pred):.3f}")
    print(f"  F1: {f1_score(y_test, lr_pred):.3f}")
    print(f"  ROC-AUC: {roc_auc_score(y_test, lr_prob):.3f}")
    print("\nAuxiliary performance model (XGBoost):")
    print(f"  Recall: {recall_score(y_test, xgb_pred):.3f}")
    print(f"  F1: {f1_score(y_test, xgb_pred):.3f}")
    print(f"  ROC-AUC: {roc_auc_score(y_test, xgb_prob):.3f}")
    print(f"\nDisagreements: {disagreement_mask.sum()} (high-risk: {high_risk_disagreement.sum()})")
    print("Policy: high-risk disagreements → manual review queue")
    return {'lr_pipe': lr_pipe, 'xgb_pipe': xgb_pipe,
            'final_pred': final_pred, 'manual_review_mask': high_risk_disagreement}

dual_results = dual_model_strategy(X_train, y_train, X_test, y_test, preprocessor)
```
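The disagreement policy described above can be written as one vectorized routing step that turns both models' 0/1 default predictions into final actions. LR wins on disagreement, except when XGBoost flags a default that LR would approve; that case goes to manual review. `route_decisions` is a hypothetical helper, and the action labels match the ones the API in section 7 returns:

```python
import numpy as np

def route_decisions(lr_pred, xgb_pred):
    """Combine two models' 0/1 default predictions into approve / reject / manual_review."""
    lr_pred, xgb_pred = np.asarray(lr_pred), np.asarray(xgb_pred)
    high_risk_disagreement = (xgb_pred == 1) & (lr_pred == 0)
    # np.select picks the first matching condition; everything else is approved
    return np.select(
        [high_risk_disagreement, lr_pred == 1],
        ['manual_review', 'reject'],
        default='approve',
    )

actions = route_decisions([0, 0, 1, 1], [0, 1, 0, 1])
print(actions.tolist())  # → ['approve', 'manual_review', 'reject', 'reject']
```

When LR predicts a default that XGBoost does not flag (third case), the compliant model's verdict stands and the loan is rejected.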
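6. SHAP Compliance Output
6.1 Regulatory Audit Report Template
In finance, SHAP explanations are more than a technical tool: they are a required compliance output. Below is a compliance-grade SHAP report template: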
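```python
import numpy as np

def generate_compliance_shap_report(customer_idx, shap_values, feature_names,
                                    X_processed, prediction, threshold, risk_level):
    """SHAP compliance output that meets regulatory audit requirements.

    shap_values: (n_samples, n_features) SHAP values in log-odds of default;
    X_processed: the preprocessed feature matrix the SHAP values were computed on.
    """
    sv = shap_values[customer_idx]
    top_idx = np.argsort(np.abs(sv))[-7:]  # the 7 largest absolute contributions
    total_abs = np.abs(sv).sum()
    # Table of feature contributions and their direction
    contributions = []
    for idx in reversed(top_idx):
        impact = sv[idx]
        contributions.append({
            'feature': feature_names[idx],
            'value': X_processed[customer_idx, idx],
            'shap_value': round(float(impact), 4),
            'direction': ('favorable (improves credit)' if impact < 0
                          else 'unfavorable (raises default risk)'),
            'weight_pct': round(float(abs(impact) / total_abs * 100), 1)
        })
    # Build the regulatory audit report
    report = f"""
================================================
Loan Default Prediction: Compliance Explanation Report
================================================
1. Prediction:
   - Default probability: {prediction:.2%}
   - Risk level: {risk_level}
   - Decision threshold: {threshold}
   - Decision: {'Reject loan' if prediction >= threshold else 'Approve loan'}
2. Key factors (Top 7):
{'-' * 50}"""
    for c in contributions:
        report += (f"\n   • {c['feature']}: value={c['value']:.2f}, "
                   f"SHAP={c['shap_value']:+.4f}, "
                   f"direction={c['direction']}, "
                   f"weight={c['weight_pct']}%")
    report += f"""
3. Model information:
   - Model type: XGBoost
   - Version: v2.1.0
   - Training date: 2026-06-01
   - Number of features: {len(feature_names)}
4. Compliance statement:
   This prediction does not use protected attributes such as gender or race.
   Every feature in use has passed compliance review and has an audit record.
5. Appeals:
   If you disagree with this decision, you may appeal through:
   - Customer hotline: XXX-XXXX
   - Appeal email: appeal@bank.com
   - Response time: 5 business days
================================================
Approver: ________   Approval date: ________
================================================
"""
    return report

# Example (shap_values, feature_names and X_processed come from the trained XGBoost pipeline)
compliance_report = generate_compliance_shap_report(
    customer_idx=42, shap_values=shap_values, feature_names=feature_names,
    X_processed=X_processed,
    prediction=0.72, threshold=0.35, risk_level='high risk'
)
print(compliance_report)
```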
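The report's `direction` and `weight_pct` fields depend on SHAP's additivity property: the base value plus the per-feature SHAP values equals the model's output in log-odds. For a linear model whose features are treated as independent, SHAP values have a closed form, φⱼ = βⱼ·(xⱼ − E[xⱼ]). That makes additivity easy to check without the `shap` library. The example below is a dependency-free illustration on synthetic data. It is not how `shap` computes values for XGBoost, since tree models use TreeSHAP:

```python
import numpy as np

def linear_shap(coef, intercept, X_background, x):
    """Exact SHAP values of a linear log-odds model, assuming independent features."""
    mean_x = X_background.mean(axis=0)
    phi = coef * (x - mean_x)                # per-feature contribution
    base_value = intercept + coef @ mean_x   # expected model output (log-odds)
    return phi, base_value

rng = np.random.default_rng(0)
X_bg = rng.normal(size=(1000, 3))            # synthetic background data
coef, intercept = np.array([0.8, -0.5, 0.0]), -1.2
x = np.array([1.5, 0.3, -2.0])               # one applicant
phi, base = linear_shap(coef, intercept, X_bg, x)
# Positive phi pushes toward default; negative phi lowers default risk
weight_pct = np.abs(phi) / np.abs(phi).sum() * 100
print(np.round(phi, 3), round(float(base + phi.sum()), 3))
```

The printed total equals the model's log-odds for `x`, which is what allows the report to express each feature's share of the explanation as a percentage.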
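6.2 Feature Discrimination Detection
Even without prohibited features, a model can still be linked to sensitive attributes indirectly, because legitimate features may act as proxies for them. The model therefore needs to be checked for indirect discrimination: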
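```python
import numpy as np
from scipy.stats import ks_2samp

def indirect_discrimination_check(model_predictions, sensitive_attribute,
                                  non_sensitive_features=None):
    """Check whether the model indirectly disadvantages a sensitive group via legitimate features.

    non_sensitive_features is not used in this simplified version.
    """
    model_predictions = np.asarray(model_predictions)
    sensitive_attribute = np.asarray(sensitive_attribute)
    # Method: compare the distribution of predictions across sensitive groups
    group_a = model_predictions[sensitive_attribute == 0]  # e.g., male
    group_b = model_predictions[sensitive_attribute == 1]  # e.g., female
    # Two-sample KS test: do the prediction distributions differ significantly?
    ks_stat, p_value = ks_2samp(group_a, group_b)
    # Difference in mean predicted default probability
    mean_diff = group_b.mean() - group_a.mean()
    print("Indirect discrimination check:")
    print(f"  Group A mean prediction: {group_a.mean():.3f}")
    print(f"  Group B mean prediction: {group_b.mean():.3f}")
    print(f"  Difference: {mean_diff:+.3f}")
    print(f"  KS statistic: {ks_stat:.3f} (p={p_value:.4f})")
    if abs(mean_diff) > 0.05:
        print("  ⚠️ Potential indirect discrimination: review feature correlations further")
    elif abs(mean_diff) > 0.02:
        print("  ⚡ Slight difference: monitor, no immediate action needed")
    else:
        print("  ✓ No significant indirect discrimination risk")
    return {'mean_diff': mean_diff, 'ks_stat': ks_stat, 'p_value': p_value}
```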
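The check above compares predicted scores across groups. Fairness reviews also commonly compare outcomes, meaning the approval rate of each group. A widely used rule of thumb is the "four-fifths rule" from US employment-selection guidelines, which treats a ratio below 0.8 between the lower and higher selection rates as a sign of adverse impact. The minimal sketch below assumes the 0.35 decision threshold used elsewhere in this article and a binary group indicator. `approval_rate_ratio` is a hypothetical helper:

```python
import numpy as np

def approval_rate_ratio(default_probs, group, threshold=0.35):
    """Lower group approval rate divided by the higher one (1.0 means parity)."""
    default_probs, group = np.asarray(default_probs), np.asarray(group)
    approved = default_probs < threshold      # approve when predicted risk is below the threshold
    rate_0 = approved[group == 0].mean()
    rate_1 = approved[group == 1].mean()
    high = max(rate_0, rate_1)
    return 1.0 if high == 0 else min(rate_0, rate_1) / high

probs = [0.1, 0.2, 0.5, 0.6, 0.1, 0.5, 0.6, 0.7]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
ratio = approval_rate_ratio(probs, groups)
print(ratio)  # → 0.5 (group 1 approved at 25% vs group 0 at 50%)
```

A ratio of 0.5 is well below 0.8, so in this toy data the outcome gap would warrant the same feature-level review as a large mean difference.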
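7. Model Deployment and Real-Time Decisions
7.1 Inference API Design
A risk-control inference API has special requirements: response latency under 100 ms, compliance validation of inputs, and a model version in every output: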
```python
import hashlib
import time
import uuid

import pandas as pd
from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel, Field, validator

MODEL_VERSION = "v2.1.0"
THRESHOLD = 0.35    # chosen from the cost matrix
REVIEW_BAND = 0.1   # probabilities in [THRESHOLD - 0.1, THRESHOLD) go to manual review

app = FastAPI(title="Credit Risk API", version="2.1.0")

class LoanApplication(BaseModel):
    """Loan application data with compliance validation."""
    loan_amnt: float = Field(..., gt=0, description="Loan amount")
    term: str = Field(..., description="Loan term")
    int_rate: float = Field(..., ge=0, le=30, description="Interest rate (%)")
    dti: float = Field(..., ge=0, description="Debt-to-income ratio")
    fico_range_low: float = Field(..., ge=300, le=850, description="Lower FICO bound")
    emp_length: str = Field(..., description="Employment length")
    annual_inc: float = Field(..., gt=0, description="Annual income")
    delinq_2yrs: int = Field(..., ge=0, description="Delinquencies in the past 2 years")
    revol_util: float = Field(..., ge=0, le=100, description="Revolving utilization (%)")
    # ⚠️ Prohibited features (gender, race, ...) are not fields of this schema,
    #    so they never reach the model

    @validator('term')
    def validate_term(cls, v):
        if v not in ['36 months', '60 months']:
            raise ValueError('term must be 36 or 60 months')
        return v

class RiskDecision(BaseModel):
    """Risk decision output."""
    application_id: str
    default_probability: float
    decision: str          # approve / reject / manual_review
    risk_level: str
    scorecard_score: int
    xgboost_score: float
    model_version: str
    inference_time_ms: float
    compliance_hash: str   # audit hash: unique fingerprint of each decision record

def risk_level(prob):
    """Map a default probability to a risk band (bands mirror the decision rule)."""
    if prob >= THRESHOLD:
        return 'high'
    return 'medium' if prob >= THRESHOLD - REVIEW_BAND else 'low'

@app.post("/evaluate", response_model=RiskDecision)
async def evaluate_loan(application: LoanApplication, request: Request):
    start_time = time.perf_counter()
    # 1. Input compliance validation (basic checks already done by Pydantic)
    # 2. Feature preprocessing (create_business_features is assumed to be defined elsewhere)
    input_df = pd.DataFrame([application.dict()])
    input_df = create_business_features(input_df)
    # 3. Predictions from both models (dual_results comes from dual_model_strategy)
    lr_prob = float(dual_results['lr_pipe'].predict_proba(input_df)[0, 1])
    xgb_prob = float(dual_results['xgb_pipe'].predict_proba(input_df)[0, 1])
    # 4. Decision logic (the compliant primary model takes precedence)
    final_prob = lr_prob
    if final_prob >= THRESHOLD:
        decision = 'reject'
    elif final_prob >= THRESHOLD - REVIEW_BAND:
        decision = 'manual_review'
    else:
        decision = 'approve'
    # 5. Approximate scorecard score (rough linear mapping, not the PDO-based scale)
    scorecard_score = int(600 + (0.5 - final_prob) * 200)
    # 6. Audit hash (fingerprint of the decision record)
    compliance_hash = hashlib.sha256(
        f"{application.dict()}{final_prob}{decision}{MODEL_VERSION}".encode()
    ).hexdigest()[:12]
    inference_time = (time.perf_counter() - start_time) * 1000
    return RiskDecision(
        application_id=f"APP_{uuid.uuid4().hex[:12]}",  # unique even for concurrent requests
        default_probability=round(final_prob, 4),
        decision=decision,
        risk_level=risk_level(final_prob),
        scorecard_score=scorecard_score,
        xgboost_score=round(xgb_prob, 4),
        model_version=MODEL_VERSION,
        inference_time_ms=round(inference_time, 2),
        compliance_hash=compliance_hash
    )
```
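The API builds its audit hash from `f"{application.dict()}..."`. That text depends on key order and on Python's repr formatting, which makes the hash harder to recompute later from stored data. Serializing the record as canonical JSON, with sorted keys and fixed separators, is one way to make the fingerprint reproducible. `audit_hash` is a hypothetical helper, not part of FastAPI or Pydantic. It also keeps the full 64-character SHA-256 digest, because truncating to 12 hex characters makes collisions plausible across tens of millions of decisions:

```python
import hashlib
import json

def audit_hash(application: dict, default_prob: float, decision: str,
               model_version: str) -> str:
    """Deterministic SHA-256 fingerprint of one decision record."""
    record = {'application': application, 'default_probability': default_prob,
              'decision': decision, 'model_version': model_version}
    # sort_keys + fixed separators → the same record always serializes to the same text
    canonical = json.dumps(record, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()

h1 = audit_hash({'dti': 15.0, 'loan_amnt': 10000.0}, 0.2, 'approve', 'v2.1.0')
h2 = audit_hash({'loan_amnt': 10000.0, 'dti': 15.0}, 0.2, 'approve', 'v2.1.0')
print(h1 == h2)  # key order does not matter
```

An auditor can recompute the digest from the stored application, probability, decision, and model version, and then compare it with the value returned by the API.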
7.2 Batch Scoring (100,000 Applications per Day)
```python
import numpy as np
import pandas as pd

def batch_scoring(applications_df, model_pipeline, threshold=0.35):
    """Batch scoring: a scheduled daily job."""
    # Preprocess
    processed = model_pipeline.named_steps['preprocessor'].transform(applications_df)
    # Batch prediction
    probabilities = model_pipeline.named_steps['model'].predict_proba(processed)[:, 1]
    # Decisions: same bands as the online API
    decisions = np.where(probabilities >= threshold, 'reject',
                         np.where(probabilities >= threshold - 0.1, 'manual_review', 'approve'))
    # Batch results
    results = pd.DataFrame({
        'application_id': applications_df.index,
        'default_probability': probabilities,
        'decision': decisions,
        'scorecard_score': 600 + (0.5 - probabilities) * 200,  # same rough linear mapping as the API
        'batch_id': f"BATCH_{pd.Timestamp.now().strftime('%Y%m%d_%H%M')}"
    })
    # Statistics
    print(f"Batch scoring complete: {len(results)} applications")
    print(f"  Rejected: {(decisions == 'reject').sum()} ({(decisions == 'reject').mean():.1%})")
    print(f"  Manual review: {(decisions == 'manual_review').sum()}")
    print(f"  Approved: {(decisions == 'approve').sum()}")
    return results

# Daily batch example (df_new_applications: the day's new applications)
daily_batch = batch_scoring(df_new_applications, dual_results['lr_pipe'])
```
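8. Monitoring and Compliance Iteration
8.1 PSI Feature Stability Monitoring
Finance places strict requirements on feature stability. In this workflow, any feature with PSI > 0.25 must be reported to the regulator: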
```python
def financial_psi_monitor(reference_df, current_df, features,
                          regulatory_threshold=0.25):
    """Finance-grade PSI monitoring with regulatory escalation logic.

    Relies on calculate_psi(ref, cur), which is not defined in this article and
    must return a dict with a 'psi' key.
    """
    psi_results = []
    for feat in features:
        psi_val = calculate_psi(reference_df[feat], current_df[feat])['psi']
        # Standard industry bands
        if psi_val < 0.1:
            status = 'GREEN: stable'
            action = 'Keep using'
        elif psi_val < regulatory_threshold:
            status = 'YELLOW: minor shift'
            action = 'Monitor closely, update documentation'
        else:
            status = 'RED: significant shift'
            action = 'Suspend + report to regulator + retrain'
        psi_results.append({
            'feature': feat,
            'psi': psi_val,
            'status': status,
            'action': action
        })
    # Build the regulatory report
    red_features = [r for r in psi_results if 'RED' in r['status']]
    print("PSI monitoring report:")
    for r in psi_results:
        print(f"  {r['feature']}: PSI={r['psi']:.3f} → {r['status']}")
    if red_features:
        print(f"\n⚠️ Features to report to the regulator: {[r['feature'] for r in red_features]}")
    return psi_results

psi_report = financial_psi_monitor(df_train, df_current_month,
                                   selected_features, regulatory_threshold=0.25)
```
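`financial_psi_monitor` calls `calculate_psi`, which this article does not define. Below is a minimal stand-in. It assumes quantile bins taken from the reference sample and the usual formula PSI = Σ (cur% − ref%) × ln(cur% / ref%). It returns a dict so that `calculate_psi(...)['psi']` works as in the call above. The bin count and the 1e-4 floor for empty bins are illustrative choices, not a standard:

```python
import numpy as np

def calculate_psi(reference, current, bins=10, eps=1e-4):
    """Population Stability Index of `current` relative to `reference` (hypothetical helper)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    reference = reference[~np.isnan(reference)]
    current = current[~np.isnan(current)]
    # Interior cut points at reference quantiles; duplicates removed for discrete features
    cut_points = np.unique(np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1]))
    n_bins = len(cut_points) + 1
    # np.digitize sends values below the first cut to bin 0, at/above the last cut to n_bins - 1
    ref_pct = np.bincount(np.digitize(reference, cut_points), minlength=n_bins) / len(reference)
    cur_pct = np.bincount(np.digitize(current, cut_points), minlength=n_bins) / len(current)
    # Floor empty bins so that log(0) and division by zero cannot occur
    ref_pct = np.clip(ref_pct, eps, None)
    cur_pct = np.clip(cur_pct, eps, None)
    psi = float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
    return {'psi': psi, 'ref_pct': ref_pct, 'cur_pct': cur_pct}

rng = np.random.default_rng(0)
ref = rng.normal(size=5000)
same = calculate_psi(ref, ref)['psi']                                 # identical data → 0
shifted = calculate_psi(ref, rng.normal(loc=1.0, size=5000))['psi']   # one-sigma shift
print(round(same, 3), round(shifted, 3))
```

Because the bins come from the reference sample, out-of-range values in the current month land in the two outer bins instead of being dropped.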
8.2 Monthly Model Performance Audit
```python
import pandas as pd
from sklearn.metrics import roc_auc_score, f1_score, recall_score, precision_score

def monthly_model_audit(predictions_log, reference_metrics, audit_month, psi_results=None):
    """Monthly model audit, submitted to the regulator.

    predictions_log: CSV with columns actual, predicted, predicted_prob.
    psi_results: optional output of financial_psi_monitor.
    """
    recent = pd.read_csv(predictions_log)
    # Metrics on recent data
    recent_auc = roc_auc_score(recent['actual'], recent['predicted_prob'])
    recent_recall = recall_score(recent['actual'], recent['predicted'])
    recent_precision = precision_score(recent['actual'], recent['predicted'])
    recent_f1 = f1_score(recent['actual'], recent['predicted'])
    # Compare with reference metrics (positive = degradation)
    degradation = {
        'auc': reference_metrics['roc_auc'] - recent_auc,
        'recall': reference_metrics['recall'] - recent_recall,
        'f1': reference_metrics['f1'] - recent_f1
    }
    # Audit verdict
    audit_status = 'PASS' if all(d < 0.05 for d in degradation.values()) else \
                   'WARNING' if all(d < 0.10 for d in degradation.values()) else 'FAIL'
    conclusion = {'PASS': 'Recommend continued use',
                  'WARNING': 'Recommend close monitoring',
                  'FAIL': 'Recommend suspension and retraining'}[audit_status]
    n_red = sum('RED' in r['status'] for r in (psi_results or []))

    def flag(d):
        return 'OK' if d < 0.05 else '⚠️'

    audit_report = f"""
================================================
Monthly Model Audit Report: {audit_month}
================================================
Model version: v2.1.0
Audit scope: {len(recent)} loan applications
Performance comparison:
{'-' * 50}
Metric    Reference  Recent  Drop    Status
ROC-AUC   {reference_metrics['roc_auc']:.3f}      {recent_auc:.3f}   {degradation['auc']:.3f}   {flag(degradation['auc'])}
Recall    {reference_metrics['recall']:.3f}      {recent_recall:.3f}   {degradation['recall']:.3f}   {flag(degradation['recall'])}
F1        {reference_metrics['f1']:.3f}      {recent_f1:.3f}   {degradation['f1']:.3f}   {flag(degradation['f1'])}
Precision (recent): {recent_precision:.3f}
Audit verdict: {audit_status}
{conclusion}
Feature stability (PSI): {n_red} features to report
Approver: ________   Approval date: ________
================================================
"""
    print(audit_report)
    return audit_report

# final_metrics: reference metrics recorded at validation time (keys roc_auc, recall, f1)
monthly_model_audit('predictions_2026_06.csv', final_metrics, '2026-06', psi_results=psi_report)
```
8.3 Approval Process for New Features
```python
def new_feature_approval_pipeline(feature_name, iv_value, psi_value,
                                  discrimination_risk, business_justification):
    """Approval process for putting a new feature into production."""
    approval_checklist = {
        '1. IV meets threshold': iv_value >= 0.1,
        '2. PSI stable': psi_value < 0.25,
        '3. Low discrimination risk': discrimination_risk == 'LOW',
        '4. Business justification given': len(business_justification) > 0,
        '5. Audit document complete': True,       # requires manual confirmation
        '6. No drop in model performance': True   # requires A/B-test validation
    }
    all_pass = all(approval_checklist.values())
    print(f"New feature approval checklist: {feature_name}")
    for step, passed in approval_checklist.items():
        print(f"  {'✓' if passed else '✗'} {step}")
    status = 'APPROVED' if all_pass else 'REJECTED / CONDITIONAL'
    print(f"\nApproval result: {status}")
    if not all_pass:
        failed_steps = [s for s, p in approval_checklist.items() if not p]
        print(f"Needs follow-up: {failed_steps}")
    return {'status': status, 'checklist': approval_checklist}
```
9. End-to-End Walkthrough: LendingClub Loan Default Prediction
9.1 Project Code Skeleton
```python
import pandas as pd

def full_credit_risk_pipeline(data_path, target_col='default'):
    """End-to-end financial risk-control pipeline (skeleton)."""
    print("=" * 60)
    print("Financial risk-control end-to-end project")
    print("=" * 60)
    # Step 1: load data
    df = pd.read_csv(data_path)
    print(f"\n[1] Data loaded: {len(df)} rows, {len(df.columns)} columns")
    # Step 2: compliance review
    prohibited = ['gender', 'race', 'religion']
    available_features = [f for f in df.columns if f not in prohibited + [target_col]]
    print(f"[2] Compliance review: excluded {len(prohibited)} prohibited features, "
          f"kept {len(available_features)} usable features")
    # Step 3: WOE/IV encoding (first 10 candidate features only, to keep the demo short)
    feature_ivs, woe_mappings = {}, {}
    for feat in available_features[:10]:
        bins_df, iv = calculate_woe_iv(df, feat, target_col)
        feature_ivs[feat] = iv
        woe_mappings[feat] = bins_df['woe'].to_dict()  # {bin label: WOE}, input to build_scorecard
    selected = [f for f, iv in feature_ivs.items() if iv >= 0.1]
    print(f"[3] WOE/IV selection: {len(selected)} features passed (IV ≥ 0.1)")
    # Step 4: feature-engineering pipeline
    # ... (uses the preprocessor defined earlier)
    # Step 5: train both models
    # ... (uses dual_model_strategy defined earlier)
    lr_pipe = xgb_pipe = None  # set by Step 5
    # Step 6: SHAP compliance output
    # ... (uses generate_compliance_shap_report defined earlier)
    # Step 7: deployment
    # ... (uses the FastAPI service defined earlier)
    # Step 8: monitoring design
    # ... (uses financial_psi_monitor + monthly_model_audit defined earlier)
    print("\n[✓] Project complete: scorecard + XGBoost dual models, compliance output, monitoring")
    return {
        'scorecard_model': lr_pipe,
        'xgboost_model': xgb_pipe,
        'woe_mappings': woe_mappings,
        'feature_ivs': feature_ivs,
        'selected_features': selected
    }

# Run the full pipeline
results = full_credit_risk_pipeline('lendingclub_data.csv')
```
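Summary
An ML project in financial risk control is not "train a model" but "build a compliant system". Every step is driven by compliance constraints, from feature compliance review to WOE/IV encoding and from explainable scorecard output to regulatory audit documents. The logistic-regression scorecard is not an "outdated algorithm"; in regulated settings it is the most reliable choice. XGBoost does not "replace the scorecard"; it is an auxiliary model that lifts performance. With the dual-model strategy, compliance and performance no longer have to conflict.
The earlier article in this series, an end-to-end ML project walkthrough, focused on the full engineering workflow. This article focuses on ML under compliance constraints. WOE/IV encoding, scorecard models, and SHAP compliance output are knowledge specific to the financial industry. Here, choosing between logistic regression and tree models is no longer "performance first" but "compliance first", and this shift in perspective lies at the core of financial risk control. The earlier article on recommender-system fundamentals supplied algorithmic methodology. This financial risk-control project shows how to put that methodology into practice under strong constraints.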