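End-to-End ML Project in Practice, Part 2: The Full Financial Risk-Control Pipeline (Feature Design, Model Selection, Production Monitoring)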

Table of Contents

- 1. Financial Risk-Control Business Background
  - 1.1 Three Risk-Control Scenarios and Metric Selection
  - 1.2 Cost-Matrix-Driven Threshold Selection
- 2. Data Review and Compliance Constraints
  - 2.1 Allowed and Prohibited Feature Lists
  - 2.2 Feature Audit Document Template
- 3. Feature Design Methodology
  - 3.1 WOE Encoding (Weight of Evidence)
  - 3.2 Key Principles of WOE Encoding
  - 3.3 Time-Decay Features
- 4. Building the Scorecard Model
  - 4.1 Logistic Regression → Score Scaling
  - 4.2 Scorecard Interpretability
- 5. Model Selection: Compliance First vs. Performance First
  - 5.1 Two Selection Philosophies
  - 5.2 Selection Strategy in Practice
  - 5.3 Dual-Model Parallel Strategy
- 6. SHAP Compliance Output
  - 6.1 Regulatory Audit Document Template
  - 6.2 Feature Discrimination Testing
- 7. Model Deployment and Real-Time Decisions
  - 7.1 Inference API Design
  - 7.2 Batch Scoring (100,000 Applications per Day)
- 8. Monitoring and Compliance Iteration
  - 8.1 PSI Feature Stability Monitoring
  - 8.2 Monthly Model Performance Audit
  - 8.3 Approval Process for New Features
- 9. Full Pipeline in Practice: LendingClub Credit Default Prediction
  - 9.1 Project Code Skeleton
- Summary

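Financial risk control is one of the most direct ways ML turns into money: a 1% gain in default-prediction accuracy on a loan book can mean tens of millions less in bad-debt losses. But finance comes with its own constraints: regulators require interpretable models, features must pass compliance review, real-time decisions must return in under 100 ms, and every model change needs an audit trail. These constraints turn "train a model" into "build a compliant system".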
[Flowchart of the pipeline stages: data review, compliance screening, WOE/IV encoding, scorecard vs. XGBoost, SHAP compliance output, deployment, PSI monitoring + monthly audit, regulatory report]

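1. Financial Risk-Control Business Background

1.1 Three Risk-Control Scenarios and Metric Selection

Financial risk control is not a single problem. It covers three core scenarios, each with its own business constraints and success criteria:

| Scenario | Objective | Success criteria | Cost matrix |
| --- | --- | --- | --- |
| Credit default prediction | Predict whether a loan will default | recall > 90%, precision > 70% | Missed default = lost principal; good borrower flagged = lost customer |
| Fraud detection | Identify fraudulent transactions | recall > 95%, response < 50 ms | Missed fraud = direct loss; legitimate transaction flagged = worse customer experience |
| Credit scoring | Assess a customer's credit grade | Ranking accuracy + interpretability | Score bias = mispriced risk |

This article focuses on credit default prediction, using the public LendingClub dataset.

1.2 Cost-Matrix-Driven Threshold Selection

A financial model cannot rely on the default 0.5 threshold; the optimal cutoff is determined by business costs: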

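python

import numpy as np


def optimal_threshold_by_cost(y_true, y_prob,
                              cost_fn,  # cost of a missed default (lost principal)
                              cost_fp,  # cost of flagging a good borrower (lost customer)
                              cost_tp=0, cost_tn=0):
    """Find the threshold that minimizes total cost under a cost matrix."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Search almost the whole (0, 1) range: when cost_fn is much larger than
    # cost_fp, the optimum can fall well below 0.1.
    thresholds = np.arange(0.01, 1.0, 0.01)
    total_costs = []

    for t in thresholds:
        y_pred = (y_prob >= t).astype(int)
        # Count each outcome type
        fn_mask = (y_true == 1) & (y_pred == 0)  # missed default
        fp_mask = (y_true == 0) & (y_pred == 1)  # good borrower flagged
        tp_mask = (y_true == 1) & (y_pred == 1)  # default correctly caught
        tn_mask = (y_true == 0) & (y_pred == 0)  # good borrower correctly passed

        total = (fn_mask.sum() * cost_fn + fp_mask.sum() * cost_fp +
                 tp_mask.sum() * cost_tp + tn_mask.sum() * cost_tn)
        total_costs.append(total)

    optimal_idx = int(np.argmin(total_costs))
    optimal_threshold = thresholds[optimal_idx]
    min_cost = total_costs[optimal_idx]

    print(f"Optimal threshold: {optimal_threshold:.2f}")
    print(f"Minimum total cost: {min_cost:.0f}")
    print(f"(missed-default cost={cost_fn}, false-alarm cost={cost_fp})")

    return optimal_threshold

# Example: loan principal of 100,000 yuan, customer value of 5,000 yuan.
# y_test and y_prob are assumed to come from an already fitted model.
optimal_t = optimal_threshold_by_cost(
    y_true=y_test, y_prob=y_prob,
    cost_fn=100000,  # missed default = 100,000 lost
    cost_fp=5000     # good borrower rejected = 5,000 of customer value lost
)
# The optimum lands below 0.5 (favoring recall); how far below depends on
# the cost ratio and on how well the model's probabilities are calibrated.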
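The grid search has a closed-form counterpart that is useful as a sanity check. If the predicted probability p is well calibrated and correct decisions cost nothing, approving costs p × cost_fn in expectation and rejecting costs (1 − p) × cost_fp, so rejecting is cheaper once p exceeds cost_fp / (cost_fp + cost_fn). The sketch below assumes exactly those conditions; `breakeven_threshold` is a hypothetical helper name, not part of the original code.

```python
def breakeven_threshold(cost_fn: float, cost_fp: float) -> float:
    """Probability above which rejecting is cheaper than approving.

    Assumes calibrated probabilities and zero cost for correct decisions.
    """
    if cost_fn <= 0 or cost_fp < 0:
        raise ValueError("cost_fn must be positive and cost_fp non-negative")
    return cost_fp / (cost_fp + cost_fn)


# Same costs as the example above: 100,000 per missed default,
# 5,000 per wrongly rejected good borrower.
t_star = breakeven_threshold(cost_fn=100_000, cost_fp=5_000)
print(f"Break-even threshold: {t_star:.4f}")
```

Real model scores are rarely perfectly calibrated, so the empirical search can land elsewhere; a large gap between the two values is a hint to check calibration.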

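2. Data Review and Compliance Constraints

2.1 Allowed and Prohibited Feature Lists

Feature compliance review is a step that most other ML domains never need. Regulators prohibit features that could lead to discriminatory decisions: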

python

def compliance_feature_audit(feature_list, prohibited_features,
                             sensitive_but_allowed):
    """Classify each feature as prohibited, sensitive-but-allowed, or allowed."""
    audit_results = []

    for feat in feature_list:
        if feat in prohibited_features:
            status = 'PROHIBITED'
            reason = 'Banned by regulation: may lead to discriminatory decisions'
        elif feat in sensitive_but_allowed:
            status = 'SENSITIVE_ALLOWED'
            reason = 'Usable with an audit record; must not be a primary decision factor'
        else:
            status = 'ALLOWED'
            reason = 'Compliant'

        audit_results.append({
            'feature': feat,
            'status': status,
            'reason': reason
        })

    # Print the audit report
    prohibited = [r for r in audit_results if r['status'] == 'PROHIBITED']
    sensitive = [r for r in audit_results if r['status'] == 'SENSITIVE_ALLOWED']

    print(f"Compliance audit report: {len(feature_list)} features")
    print(f"  Prohibited: {len(prohibited)} → {[r['feature'] for r in prohibited]}")
    print(f"  Sensitive but allowed: {len(sensitive)} → {[r['feature'] for r in sensitive]}")
    print(f"  Fully allowed: {len(feature_list) - len(prohibited) - len(sensitive)}")

    return audit_results

# Compliance audit of LendingClub features
all_features = [
    'loan_amnt', 'term', 'int_rate', 'installment', 'grade',
    'emp_length', 'home_ownership', 'annual_inc', 'dti',
    'delinq_2yrs', 'fico_range_low', 'fico_range_high',
    'open_acc', 'pub_rec', 'revol_bal', 'revol_util',
    'total_acc', 'earliest_cr_line', 'application_type',
    'gender', 'race', 'age'  # the last three are sensitive attributes
]

prohibited = ['gender', 'race']
# Age may be used for eligibility checks (e.g. reject applicants under 18),
# but not as a basis for discrimination.
sensitive_but_allowed = ['age']

audit = compliance_feature_audit(all_features, prohibited, sensitive_but_allowed)

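2.2 Feature Audit Document Template

Compliance requires an audit document for every feature, recording where it comes from, why it is used, and what discrimination risk it carries: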

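python

def generate_feature_audit_doc(feature_name, source, justification,
                               discrimination_risk, mitigations,
                               dtype='N/A', min_val='N/A', max_val='N/A',
                               missing_pct='N/A', approver='________',
                               approval_date='________'):
    """Generate a feature audit document.

    The fields from dtype onward default to placeholders so the template
    renders even before the data profile and sign-off are available.
    """
    template = f"""
    ========================================
    Feature audit document: {feature_name}
    ========================================

    1. Source: {source}
    2. Justification: {justification}
    3. Potential discrimination risk: {discrimination_risk}
    4. Mitigations: {mitigations}
    5. Data type: {dtype}
    6. Value range: [{min_val}, {max_val}]
    7. Missing rate: {missing_pct}%
    8. Approval status: APPROVED / REJECTED / CONDITIONAL
    9. Approver: {approver}
    10. Approval date: {approval_date}
    """
    return template

# Example: audit document for dti (debt-to-income ratio)
dti_audit = generate_feature_audit_doc(
    feature_name='dti',
    source='LendingClub application form: debt and income self-reported by the applicant',
    justification='Directly reflects repayment capacity, a core indicator of default risk',
    discrimination_risk='LOW: no direct link to race or gender',
    mitigations='No special mitigation needed, but monitor score distributions across income groups'
)
print(dti_audit)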

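3. Feature Design Methodology

3.1 WOE Encoding (Weight of Evidence)

WOE (Weight of Evidence) is the standard encoding in the credit-scoring industry. Each feature is binned, and each bin's WOE measures how strongly that bin separates the positive class from the negative class: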

python

import numpy as np
import pandas as pd


def calculate_woe_iv(df, feature, target, bins=10, min_pct=0.05):
    """Compute per-bin WOE and the feature's IV (credit-scoring standard).

    Note: min_pct is kept in the signature but is not used by this version.
    """
    # Binning: equal-frequency bins for numeric features, raw categories otherwise
    if pd.api.types.is_numeric_dtype(df[feature]):
        # Bin into a separate Series so the caller's DataFrame is not modified
        groups = pd.qcut(df[feature], q=bins, duplicates='drop')
    else:
        groups = df[feature]
    bins_df = df.groupby(groups, observed=True)[target].agg(['sum', 'count'])

    bins_df.columns = ['events', 'total']
    bins_df['non_events'] = bins_df['total'] - bins_df['events']

    # Overall event (default) and non-event counts
    total_events = bins_df['events'].sum()
    total_non_events = bins_df['non_events'].sum()

    # Share of all events / non-events that falls in each bin
    bins_df['pct_events'] = bins_df['events'] / total_events
    bins_df['pct_non_events'] = bins_df['non_events'] / total_non_events

    # Floor the shares at a small value to avoid log(0) and division by zero
    bins_df['pct_events'] = np.clip(bins_df['pct_events'], 1e-4, None)
    bins_df['pct_non_events'] = np.clip(bins_df['pct_non_events'], 1e-4, None)

    # WOE = ln(non-event share / event share)
    bins_df['woe'] = np.log(bins_df['pct_non_events'] / bins_df['pct_events'])

    # IV = Σ (pct_non_events - pct_events) × WOE
    bins_df['iv_contribution'] = (bins_df['pct_non_events'] - bins_df['pct_events']) * bins_df['woe']
    iv = bins_df['iv_contribution'].sum()

    # Interpret the IV value
    if iv < 0.02:
        iv_cat = 'useless'
    elif iv < 0.1:
        iv_cat = 'weak'
    elif iv < 0.3:
        iv_cat = 'medium'
    elif iv < 0.5:
        iv_cat = 'strong'
    else:
        iv_cat = 'very strong / suspicious (check for leakage or overfitting)'

    print(f"Feature '{feature}' IV={iv:.3f} ({iv_cat})")
    print("WOE by bin:")
    print(bins_df[['woe', 'pct_events', 'pct_non_events']].to_string())

    # Screening rule: keep features with IV >= 0.1
    recommendation = 'keep' if iv >= 0.1 else 'consider dropping'
    print(f"Recommendation: {recommendation}")

    return bins_df, iv

# Compute IV for a set of features.
# df is assumed to hold the LendingClub data with a binary 'default' column.
feature_ivs = {}
for feat in ['loan_amnt', 'dti', 'fico_range_low', 'revol_util',
             'emp_length', 'delinq_2yrs', 'open_acc']:
    _, iv = calculate_woe_iv(df, feat, 'default', bins=10)
    feature_ivs[feat] = iv

# IV screening result
selected_features = [f for f, iv in feature_ivs.items() if iv >= 0.1]
print(f"\nFeatures with IV >= 0.1: {selected_features}")

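3.2 Key Principles of WOE Encoding

WOE encoding is more than binning; each value has a precise statistical meaning:

- WOE > 0: the bin holds a larger share of the good customers than of the defaulters, so the bin leans "good"
- WOE < 0: the bin holds a larger share of the defaulters than of the good customers, so the bin leans "default"
- WOE ≈ 0: both groups are represented almost equally, so the bin has little discriminative power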
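The sign convention is easy to verify on toy numbers. The sketch below uses made-up counts for three bins (they are not taken from LendingClub) and applies the same definitions as `calculate_woe_iv`: WOE = ln(share of non-events / share of events), and IV as the weighted sum over bins.

```python
import numpy as np

# Hypothetical counts per bin, ordered from low-risk to high-risk
good = np.array([300, 300, 300])  # non-defaults in each bin
bad = np.array([20, 30, 50])      # defaults in each bin

pct_good = good / good.sum()      # share of all good customers per bin
pct_bad = bad / bad.sum()         # share of all defaulters per bin

woe = np.log(pct_good / pct_bad)
iv = float(np.sum((pct_good - pct_bad) * woe))

for i, w in enumerate(woe):
    print(f"bin {i}: WOE = {w:+.3f}")
print(f"IV = {iv:.3f}")
```

The first bin has few defaulters relative to good customers and gets a positive WOE; the last bin is the opposite and gets a negative one. Every bin contributes a non-negative amount to IV, because (pct_good − pct_bad) and the WOE always share the same sign.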

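IV (Information Value) measures the discriminative power of the feature as a whole; IV ≥ 0.1 is the usual industry screening threshold:

| IV range | Interpretation | Suggested handling |
| --- | --- | --- |
| < 0.02 | No predictive power | Drop |
| 0.02 to 0.1 | Weak predictive power | May keep, with low weight |
| 0.1 to 0.3 | Medium predictive power | Keep; important feature |
| 0.3 to 0.5 | Strong predictive power | Keep; core feature |
| > 0.5 | Very strong / suspicious | Investigate; possibly target leakage |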

3.3 Time-Decay Features

Financial data shows a time-decay effect: behavior in the last 3 months matters more than behavior from 12 months ago:

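python

def create_time_decay_features(df, base_features, decay_rate=0.95):
    """Recency-oriented features: recent behavior should weigh more.

    The dataset has no per-month history, so the features below are static
    proxies; base_features and decay_rate are not used by them.
    """

    # 1. Weighted delinquency count.
    # Proxy: the 2-year count with an extra penalty for repeated delinquencies.
    df['weighted_delinq'] = df['delinq_2yrs'] * (1 + df['delinq_2yrs'] * 0.1)

    # 2. Credit history length (months) relative to age (months).
    # A long credit history at a young age suggests responsible behavior.
    # Assumes earliest_cr_line_months and age_months were derived upstream.
    df['credit_history_density'] = df['earliest_cr_line_months'] / (df['age_months'] + 1)

    # 3. Recent revolving utilization vs. overall revolving utilization.
    # Rising recent utilization is a risk signal; the dataset has no such
    # field, so it is simulated here. clip() guards against total_acc == 0.
    df['utilization_trend'] = df['revol_util'] * (
        1 + 0.1 * df['open_acc'] / df['total_acc'].clip(lower=1)
    )

    # 4. Flag for extreme DTI values: DTI > 40% is the high-risk range
    df['high_dti_flag'] = (df['dti'] > 40).astype(int)

    return df

df = create_time_decay_features(df, [])
print("Time-decay features created")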
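When per-month history is available, time decay can be applied literally rather than approximated. The sketch below is an assumption about what such a feature could look like; the function name `decayed_event_count` and the monthly-count input are hypothetical. Each month's event count is weighted by decay_rate raised to the number of months elapsed, so a recent delinquency counts more than an old one.

```python
import numpy as np


def decayed_event_count(monthly_counts, decay_rate=0.95):
    """Exponentially decayed sum of monthly event counts.

    monthly_counts[0] is the most recent month, monthly_counts[k] is k months ago.
    """
    counts = np.asarray(monthly_counts, dtype=float)
    weights = decay_rate ** np.arange(len(counts))  # 1, r, r^2, ...
    return float(np.dot(counts, weights))


# Two delinquencies this month vs. the same two delinquencies five months ago
recent = decayed_event_count([2, 0, 0, 0, 0, 0])
old = decayed_event_count([0, 0, 0, 0, 0, 2])
print(f"recent={recent:.3f}, old={old:.3f}")
```

With decay_rate = 0.95 an event loses roughly 5% of its weight per month; a smaller rate forgets old behavior faster.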

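4. Building the Scorecard Model

4.1 Logistic Regression → Score Scaling

The scorecard is the financial industry's standard for interpretable ML: logistic regression coefficients are converted into user-friendly integer points: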

python

import numpy as np
import pandas as pd


def build_scorecard(lr_model, feature_names, woe_mappings,
                    pdo=20, base_score=600, base_odds=50):
    """Convert a logistic regression on WOE features into a scorecard.

    Core formula: Score = Offset + Factor × ln(good/bad odds), with
        Factor = PDO / ln(2)
        Offset = BaseScore - Factor × ln(BaseOdds)
    The model is assumed to predict default (1 = bad), so
        ln(good/bad odds) = -(β₀ + Σ βᵢ × WOEᵢ)

    - PDO (Points to Double the Odds): each PDO points doubles the good/bad odds
    - BaseScore: the score assigned when the good/bad odds equal BaseOdds
    - BaseOdds: the reference good/bad odds
    """
    factor = pdo / np.log(2)                          # scaling factor
    offset = base_score - factor * np.log(base_odds)  # scaling offset

    scorecard = []

    # Intercept → base points (minus sign: the model outputs log-odds of default)
    intercept_score = offset - factor * lr_model.intercept_[0]
    scorecard.append({
        'feature': 'BaseScore',
        'bin': 'base points',
        'woe': lr_model.intercept_[0],
        'coefficient': 1,
        'score': round(intercept_score)
    })

    # Each feature × each bin → point contribution
    for i, feat in enumerate(feature_names):
        coef = lr_model.coef_[0][i]
        if feat in woe_mappings:
            for bin_label, woe_val in woe_mappings[feat].items():
                score_contribution = -factor * coef * woe_val
                scorecard.append({
                    'feature': feat,
                    'bin': bin_label,
                    'woe': woe_val,
                    'coefficient': coef,
                    'score': round(score_contribution)
                })

    scorecard_df = pd.DataFrame(scorecard)

    # Sanity check: is the attainable range reasonable (commonly about 300 to 850)?
    # A customer falls into exactly one bin per feature, so the range is the
    # base points plus the per-feature minimum (or maximum) summed over features.
    bins_only = scorecard_df[scorecard_df['feature'] != 'BaseScore']
    per_feature = bins_only.groupby('feature')['score']
    total_range = (intercept_score + per_feature.min().sum(),
                   intercept_score + per_feature.max().sum())

    print("Scorecard built:")
    print(f"  PDO={pdo}, base score={base_score}, base odds={base_odds}")
    print(f"  Approximate score range: [{total_range[0]:.0f}, {total_range[1]:.0f}]")
    print("\nScorecard details:")
    print(scorecard_df.to_string(index=False))

    return scorecard_df

# Illustrative scorecard output
print("=== Scorecard output example ===")
print("Score 620 = base 600 + debt-to-income -20 + credit history +40")
print("Score 380 = base 600 + very high DTI -80 + repeated delinquencies -120 + short credit history -20")
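The scaling constants can be checked in isolation. The sketch below uses the same defaults as `build_scorecard` (PDO = 20, base score 600, base odds 50) and a hypothetical helper, `score_from_odds`, that maps good/bad odds directly to a score. It confirms the two defining properties: odds equal to the base odds yield the base score, and doubling the odds adds exactly PDO points.

```python
import math


def score_from_odds(good_bad_odds, pdo=20, base_score=600, base_odds=50):
    """Map good/bad odds to a score using standard PDO scaling."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(good_bad_odds)


s_base = score_from_odds(50)     # odds equal to base_odds
s_double = score_from_odds(100)  # odds doubled
s_half = score_from_odds(25)     # odds halved

print(round(s_base), round(s_double), round(s_half))
```

Because the mapping is linear in log-odds, the score can be split into additive per-feature points, which is what makes the per-customer breakdown possible.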

4.2 Scorecard Interpretability

The scorecard's main advantage is that every customer can be given a clear breakdown of their score:

python

import pandas as pd


def scorecard_explanation(customer_features, scorecard_df, base_score):
    """Per-customer score breakdown: the explanation regulators ask for."""
    total_score = base_score
    contributions = []

    for feat, value in customer_features.items():
        feat_rows = scorecard_df[scorecard_df['feature'] == feat]
        if feat_rows.empty:
            continue
        # Find the bin holding this value: interval bins (numeric features)
        # are matched by containment, category bins by equality.
        is_match = feat_rows['bin'].apply(
            lambda b: value in b if isinstance(b, pd.Interval) else b == value
        ).astype(bool)
        matching_row = feat_rows[is_match]

        if len(matching_row) > 0:
            score_contribution = matching_row['score'].iloc[0]
            total_score += score_contribution
            contributions.append({
                'feature': feat,
                'value': value,
                'score_contribution': score_contribution,
                'direction': 'raises credit' if score_contribution > 0 else 'lowers credit'
            })

    # Sort so the most influential factors come first
    contributions.sort(key=lambda x: abs(x['score_contribution']), reverse=True)

    # Risk level and decision advice
    if total_score >= 650:
        risk_level = 'low risk'
        advice = 'Approve: suggested rate is the base rate + 0.5%'
    elif total_score >= 500:
        risk_level = 'medium risk'
        advice = 'Approve with additional conditions'
    else:
        risk_level = 'high risk'
        advice = 'Reject, or require collateral'

    report = [
        f"Customer credit score: {total_score} ({risk_level})",
        "",
        "Score breakdown (sorted by impact):"
    ]

    for c in contributions[:5]:
        report.append(
            f"  • {c['feature']}={c['value']}: "
            f"contributes {c['score_contribution']} points ({c['direction']})"
        )

    report.extend([
        "",
        "Decision advice:",
        f"  {advice}"
    ])

    return '\n'.join(report)

# Example customer; scorecard_df is the DataFrame returned by build_scorecard
customer = {'dti': 15, 'fico_range_low': 700, 'delinq_2yrs': 0,
            'revol_util': 25, 'emp_length': '10+ years'}
print(scorecard_explanation(customer, scorecard_df, 600))

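5. Model Selection: Compliance First vs. Performance First

5.1 Two Selection Philosophies

In finance, model selection is not "pick whichever is most accurate". Compliance requirements set the boundary within which a model can be chosen:

| Model | Interpretability | Accuracy | Latency | Regulatory compliance | Best suited for |
| --- | --- | --- | --- | --- | --- |
| Logistic regression (scorecard) | ★★★★★ | ★★★ | ★★★★★ | ★★★★★ | Strict regulatory audit |
| XGBoost + SHAP | ★★★★ | ★★★★★ | ★★★★ | ★★★★ | Performance first, with post-hoc explanation |
| LightGBM | ★★★ | ★★★★★ | ★★★★★ | ★★★ | Large data volumes and fast iteration |
| Stacking | ★ | ★★★★★ | ★★ | ★ | Not recommended: regulators cannot audit it |

5.2 Selection Strategy in Practice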

python

def financial_model_selection(data_size, regulatory_level, latency_requirement,
                              explainability_requirement):
    """Model selector for financial use cases.

    Only regulatory_level drives the decision in this simplified version;
    the other arguments are accepted but not used.
    """

    # Regulatory level → minimum interpretability score
    explainability_threshold = {
        'strict': 5,    # bank lending: must be scorecard-level
        'moderate': 3,  # internet finance: post-hoc explanation required
        'loose': 2      # internal risk control: black box acceptable
    }

    min_explain = explainability_threshold.get(regulatory_level, 3)

    candidates = [
        {'model': 'LogisticRegression_Scorecard', 'explainability': 5,
         'accuracy': 3, 'latency': 5, 'compliance': 5},
        {'model': 'XGBoost_SHAP', 'explainability': 4,
         'accuracy': 5, 'latency': 4, 'compliance': 4},
        {'model': 'LightGBM', 'explainability': 3,
         'accuracy': 5, 'latency': 5, 'compliance': 3},
        {'model': 'Stacking', 'explainability': 1,
         'accuracy': 5, 'latency': 2, 'compliance': 1},
    ]

    # Filter: keep models that meet the minimum interpretability
    qualified = [c for c in candidates if c['explainability'] >= min_explain]

    # Rank the qualified candidates by accuracy (stable sort keeps list order on ties)
    qualified.sort(key=lambda x: x['accuracy'], reverse=True)

    print(f"Selection result (regulatory level={regulatory_level}):")
    print(f"  Minimum interpretability: {min_explain}")
    for c in qualified:
        print(f"  ✓ {c['model']} (interpretability={c['explainability']}, accuracy={c['accuracy']})")

    rejected = [c for c in candidates if c['explainability'] < min_explain]
    for c in rejected:
        print(f"  ✗ {c['model']} (interpretability={c['explainability']} < minimum)")

    return qualified[0]  # recommendation: the most accurate compliant model

# Bank lending scenario
recommended = financial_model_selection(
    data_size=50000, regulatory_level='strict',
    latency_requirement=100, explainability_requirement='high'
)
# → recommends: LogisticRegression_Scorecard

# Internet finance scenario
recommended2 = financial_model_selection(
    data_size=500000, regulatory_level='moderate',
    latency_requirement=50, explainability_requirement='medium'
)
# → recommends: XGBoost_SHAP

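5.3 Dual-Model Parallel Strategy

A common practice in finance is a "dual-track" setup: the scorecard serves as the compliant primary model, and XGBoost runs alongside it as a performance-boosting auxiliary model: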

python

from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, f1_score, recall_score
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier


def dual_model_strategy(X_train, y_train, X_test, y_test, preprocessor):
    """Dual-model strategy: compliant primary model + performance auxiliary model."""

    # 1. Compliant primary model: logistic regression scorecard.
    # clone() gives each pipeline its own unfitted copy of the preprocessor.
    lr = LogisticRegression(class_weight='balanced', max_iter=1000, random_state=42)
    lr_pipe = Pipeline([('preprocessor', clone(preprocessor)), ('model', lr)])
    lr_pipe.fit(X_train, y_train)

    lr_prob = lr_pipe.predict_proba(X_test)[:, 1]
    lr_pred = (lr_prob >= 0.35).astype(int)  # cost-matrix-driven threshold

    # 2. Performance auxiliary model: XGBoost
    xgb = XGBClassifier(
        n_estimators=300, max_depth=6, learning_rate=0.05,
        scale_pos_weight=4, eval_metric='logloss', random_state=42
    )
    xgb_pipe = Pipeline([('preprocessor', clone(preprocessor)), ('model', xgb)])
    xgb_pipe.fit(X_train, y_train)

    xgb_prob = xgb_pipe.predict_proba(X_test)[:, 1]
    xgb_pred = (xgb_prob >= 0.35).astype(int)

    # 3. Agreement between the two models
    agreement = (lr_pred == xgb_pred).mean()
    print(f"Model agreement: {agreement:.1%}")

    # 4. Disagreement handling: when the models differ, trust the compliant
    # primary model (logistic regression).
    final_pred = lr_pred.copy()
    disagreement_mask = lr_pred != xgb_pred
    # High-risk disagreement (XGBoost predicts default, LR predicts good) → manual review
    high_risk_disagreement = disagreement_mask & (xgb_pred == 1) & (lr_pred == 0)

    # 5. Summary metrics
    print("\nCompliant primary model (LR scorecard):")
    print(f"  Recall: {recall_score(y_test, lr_pred):.3f}")
    print(f"  F1: {f1_score(y_test, lr_pred):.3f}")
    print(f"  ROC-AUC: {roc_auc_score(y_test, lr_prob):.3f}")

    print("\nPerformance auxiliary model (XGBoost):")
    print(f"  Recall: {recall_score(y_test, xgb_pred):.3f}")
    print(f"  F1: {f1_score(y_test, xgb_pred):.3f}")
    print(f"  ROC-AUC: {roc_auc_score(y_test, xgb_prob):.3f}")

    print(f"\nDisagreements: {disagreement_mask.sum()} (high-risk: {high_risk_disagreement.sum()})")
    print("Handling: high-risk disagreements → manual review queue")

    return {'lr_pipe': lr_pipe, 'xgb_pipe': xgb_pipe,
            'final_pred': final_pred, 'manual_review_mask': high_risk_disagreement}

dual_results = dual_model_strategy(X_train, y_train, X_test, y_test, preprocessor)
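The disagreement rule described in the code comments can be written as a small routing function. This is a minimal sketch of one way to encode it, assuming binary predictions where 1 means "default"; `route_decisions` is a hypothetical name. The scorecard decides by default, and only the case where XGBoost flags a default that the scorecard missed is escalated to a person.

```python
import numpy as np


def route_decisions(lr_pred, xgb_pred):
    """Combine primary (LR) and auxiliary (XGBoost) predictions into decisions.

    - LR predicts default                  -> 'reject' (primary model wins)
    - LR predicts good, XGB predicts bad   -> 'manual_review'
    - both predict good                    -> 'approve'
    """
    lr_pred = np.asarray(lr_pred)
    xgb_pred = np.asarray(xgb_pred)
    needs_review = (lr_pred == 0) & (xgb_pred == 1)
    # Nested np.where builds the array in one step, so the string dtype is
    # wide enough for the longest label.
    return np.where(needs_review, "manual_review",
                    np.where(lr_pred == 1, "reject", "approve"))


decisions = route_decisions(lr_pred=[0, 0, 1, 1], xgb_pred=[0, 1, 0, 1])
print(decisions.tolist())
```

Escalating only one direction of disagreement keeps the review queue small while still catching the costly case of a missed default.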

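6. SHAP Compliance Output

6.1 Regulatory Audit Document Template

In finance, SHAP explanations are more than a technical tool; they are a required output for regulatory compliance. Below is a template for a compliance-grade SHAP report: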

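python

import numpy as np


def generate_compliance_shap_report(customer_idx, shap_values, feature_names,
                                    prediction, threshold, risk_level,
                                    X_processed):
    """SHAP compliance output for regulatory audit.

    X_processed is the preprocessed feature matrix (2-D array) that the
    SHAP values were computed on.
    """

    sv = shap_values[customer_idx]
    top_idx = np.argsort(np.abs(sv))[-7:]
    total_abs = np.abs(sv).sum()

    # Table of feature contributions and their direction
    contributions = []
    for idx in reversed(top_idx):
        impact = sv[idx]
        contributions.append({
            'feature': feature_names[idx],
            'value': X_processed[customer_idx, idx],
            'shap_value': round(float(impact), 4),
            # A negative SHAP value lowers the predicted default probability
            'direction': 'positive (improves credit)' if impact < 0 else 'negative (raises default risk)',
            'weight_pct': round(abs(impact) / total_abs * 100, 1)
        })

    # Build the regulatory audit report
    report = f"""
    ================================================
    Credit Default Prediction: Compliance Explanation Report
    ================================================

    1. Prediction:
       - Default probability: {prediction:.2%}
       - Risk level: {risk_level}
       - Decision threshold: {threshold}
       - Decision: {'Reject loan' if prediction >= threshold else 'Approve loan'}

    2. Key factors (Top 7):
       {'-' * 50}
    """

    for c in contributions:
        report += f"\n       • {c['feature']}: value={c['value']:.2f}, " \
                  f"SHAP={c['shap_value']:+.4f}, " \
                  f"direction={c['direction']}, " \
                  f"weight={c['weight_pct']}%"

    report += f"""

    3. Model information:
       - Model type: XGBoost
       - Version: v2.1.0
       - Training date: 2026-06-01
       - Number of features: {len(feature_names)}

    4. Compliance statement:
       This prediction does not use protected attributes such as gender or race.
       Every feature used has passed compliance review and has an audit record.

    5. Appeal channels:
       If you disagree with this decision, you may appeal through:
       - Customer hotline: XXX-XXXX
       - Appeal email: appeal@bank.com
       - Processing time: 5 business days

    ================================================
    Approver: ________  Approval date: ________
    ================================================
    """

    return report

# Example; shap_values, feature_names and X_processed are assumed to come
# from the fitted XGBoost pipeline.
compliance_report = generate_compliance_shap_report(
    customer_idx=42, shap_values=shap_values, feature_names=feature_names,
    prediction=0.72, threshold=0.35, risk_level='high risk',
    X_processed=X_processed
)
print(compliance_report)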

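6.2 Feature Discrimination Testing

Even when no prohibited feature is used, a model can still pick up sensitive attributes indirectly through legitimate features that correlate with them. The model therefore needs to be tested for indirect discrimination: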

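python

from scipy.stats import ks_2samp


def indirect_discrimination_check(model_predictions, sensitive_attribute,
                                  non_sensitive_features):
    """Check whether the model indirectly discriminates against a sensitive group.

    non_sensitive_features is accepted for a follow-up correlation review
    but is not used by this check.
    """

    # Approach: compare the distribution of predictions across sensitive groups
    group_a = model_predictions[sensitive_attribute == 0]  # e.g. male
    group_b = model_predictions[sensitive_attribute == 1]  # e.g. female

    # KS test: do the two prediction distributions differ significantly?
    ks_stat, p_value = ks_2samp(group_a, group_b)

    # Difference in mean predicted default probability
    mean_diff = group_b.mean() - group_a.mean()

    print("Indirect discrimination check:")
    print(f"  Group A mean prediction: {group_a.mean():.3f}")
    print(f"  Group B mean prediction: {group_b.mean():.3f}")
    print(f"  Difference: {mean_diff:+.3f}")
    print(f"  KS statistic: {ks_stat:.3f} (p={p_value:.4f})")

    if abs(mean_diff) > 0.05:
        print("  ⚠️ Potential indirect discrimination: review feature correlations further")
    elif abs(mean_diff) > 0.02:
        print("  ⚡ Slight difference: monitor, no immediate action needed")
    else:
        print("  ✓ No notable indirect discrimination risk")

    return {'mean_diff': mean_diff, 'ks_stat': ks_stat, 'p_value': p_value}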

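7. Model Deployment and Real-Time Decisions

7.1 Inference API Design

A risk-control inference API has its own requirements: response latency under 100 ms, compliance validation of the input, and a model version in every output: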

python

import hashlib
import time
import uuid

import pandas as pd
from fastapi import FastAPI, Request
from pydantic import BaseModel, Field, validator  # pydantic v1-style API

MODEL_VERSION = "v2.1.0"

app = FastAPI(title="Credit Risk API", version="2.1.0")

class LoanApplication(BaseModel):
    """Loan application payload with compliance validation."""
    loan_amnt: float = Field(..., gt=0, description="Loan amount")
    term: str = Field(..., description="Loan term")
    int_rate: float = Field(..., ge=0, le=30, description="Interest rate")
    dti: float = Field(..., ge=0, description="Debt-to-income ratio")
    fico_range_low: float = Field(..., ge=300, le=850, description="Lower FICO bound")
    emp_length: str = Field(..., description="Employment length")
    annual_inc: float = Field(..., gt=0, description="Annual income")
    delinq_2yrs: int = Field(..., ge=0, description="Delinquencies in the past 2 years")
    revol_util: float = Field(..., ge=0, le=100, description="Revolving utilization")

    class Config:
        # ⚠️ Prohibited features are not accepted: reject any undeclared field
        extra = 'forbid'

    @validator('term')
    def validate_term(cls, v):
        if v not in ['36 months', '60 months']:
            raise ValueError('term must be 36 or 60 months')
        return v

class RiskDecision(BaseModel):
    """Risk decision output."""
    application_id: str
    default_probability: float
    decision: str  # approve / reject / manual_review
    risk_level: str
    scorecard_score: int
    xgboost_score: float
    model_version: str
    inference_time_ms: float
    compliance_hash: str  # audit hash: fingerprint of this decision

@app.post("/evaluate", response_model=RiskDecision)
async def evaluate_loan(application: LoanApplication, request: Request):
    start_time = time.perf_counter()

    # 1. Input validation (basic checks already done by Pydantic)
    # 2. Feature preprocessing.
    # create_business_features and risk_level (used below) are assumed to be
    # defined elsewhere in the project.
    input_df = pd.DataFrame([application.dict()])
    input_df = create_business_features(input_df)

    # 3. Predict with both models
    lr_prob = float(dual_results['lr_pipe'].predict_proba(input_df)[0, 1])
    xgb_prob = float(dual_results['xgb_pipe'].predict_proba(input_df)[0, 1])

    # 4. Decision logic (the compliant primary model takes precedence)
    threshold = 0.35  # cost-matrix-driven
    final_prob = lr_prob
    decision = 'reject' if final_prob >= threshold else \
               'manual_review' if abs(final_prob - threshold) < 0.1 else 'approve'

    # 5. Rough scorecard-style score derived from the probability
    scorecard_score = int(600 + (0.5 - final_prob) * 200)

    # 6. Audit hash: a short fingerprint of the decision record. It helps
    # detect later modification only if it is stored in a separate audit log.
    compliance_hash = hashlib.sha256(
        f"{application.dict()}{final_prob}{decision}{MODEL_VERSION}".encode()
    ).hexdigest()[:12]

    inference_time = (time.perf_counter() - start_time) * 1000

    return RiskDecision(
        application_id=f"APP_{uuid.uuid4().hex}",  # unique even within the same second
        default_probability=round(final_prob, 4),
        decision=decision,
        risk_level=risk_level(final_prob),
        scorecard_score=scorecard_score,
        xgboost_score=round(xgb_prob, 4),
        model_version=MODEL_VERSION,
        inference_time_ms=round(inference_time, 2),
        compliance_hash=compliance_hash
    )
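An audit hash is only useful if the same decision always produces the same digest. The sketch below shows one way to get that property with the standard library alone: serialize the record as canonical JSON (sorted keys, fixed separators) before hashing. The function name `audit_hash` and the record layout are assumptions for illustration, not the endpoint's actual format.

```python
import hashlib
import json


def audit_hash(payload: dict, probability: float, decision: str,
               model_version: str) -> str:
    """SHA-256 over a canonical JSON encoding of the decision record."""
    record = {
        "input": payload,
        "probability": round(probability, 6),  # fixed precision for stability
        "decision": decision,
        "model_version": model_version,
    }
    # sort_keys + fixed separators make the encoding independent of dict order
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


h1 = audit_hash({"dti": 15, "loan_amnt": 10000}, 0.12, "approve", "v2.1.0")
h2 = audit_hash({"loan_amnt": 10000, "dti": 15}, 0.12, "approve", "v2.1.0")
h3 = audit_hash({"dti": 15, "loan_amnt": 10000}, 0.12, "reject", "v2.1.0")
print(h1 == h2, h1 == h3)
```

Keeping the full 64-character digest, rather than a short prefix, makes accidental collisions across a large decision log far less likely.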

7.2 Batch Scoring (100,000 Applications per Day)

python

import numpy as np
import pandas as pd


def batch_scoring(applications_df, model_pipeline, threshold=0.35):
    """Batch scoring, run as a daily scheduled job."""

    # Preprocess
    processed = model_pipeline.named_steps['preprocessor'].transform(applications_df)

    # Predict in bulk
    probabilities = model_pipeline.named_steps['model'].predict_proba(processed)[:, 1]

    # Score and decide: scores within 0.1 below the threshold go to manual review
    decisions = np.where(probabilities >= threshold, 'reject',
                         np.where(probabilities >= threshold - 0.1, 'manual_review', 'approve'))

    # Batch results
    results = pd.DataFrame({
        'application_id': applications_df.index,
        'default_probability': probabilities,
        'decision': decisions,
        'scorecard_score': 600 + (0.5 - probabilities) * 200,
        'batch_id': f"BATCH_{pd.Timestamp.now().strftime('%Y%m%d_%H%M')}"
    })

    # Summary
    print(f"Batch scoring complete: {len(results)} applications")
    print(f"  Rejected: {(decisions == 'reject').sum()} ({(decisions == 'reject').mean():.1%})")
    print(f"  Manual review: {(decisions == 'manual_review').sum()}")
    print(f"  Approved: {(decisions == 'approve').sum()}")

    return results

# Daily batch scoring example
daily_batch = batch_scoring(df_new_applications, dual_results['lr_pipe'])

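8. Monitoring and Compliance Iteration

8.1 PSI Feature Stability Monitoring

Finance places strict requirements on feature stability: any feature whose PSI (Population Stability Index) exceeds 0.25 must be reported to the regulator: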

python

def financial_psi_monitor(reference_df, current_df, features,
                          regulatory_threshold=0.25):
    """Finance-grade PSI monitoring with regulatory reporting logic.

    calculate_psi is assumed to be defined elsewhere and to return a dict
    with a 'psi' key.
    """

    psi_results = []
    for feat in features:
        psi_val = calculate_psi(reference_df[feat], current_df[feat])['psi']

        # Standard industry bands
        if psi_val < 0.1:
            status = 'GREEN: stable'
            action = 'Keep using'
        elif psi_val < regulatory_threshold:
            status = 'YELLOW: slight shift'
            action = 'Monitor closely, update documentation'
        else:
            status = 'RED: significant shift'
            action = 'Suspend use + report to regulator + retrain'

        psi_results.append({
            'feature': feat,
            'psi': psi_val,
            'status': status,
            'action': action
        })

    # Build the regulatory report
    red_features = [r for r in psi_results if 'RED' in r['status']]

    print("PSI monitoring report:")
    for r in psi_results:
        print(f"  {r['feature']}: PSI={r['psi']:.3f} → {r['status']}")

    if red_features:
        print(f"\n⚠️ Features to report to the regulator: {[r['feature'] for r in red_features]}")

    return psi_results

psi_report = financial_psi_monitor(df_train, df_current_month,
                                   selected_features, regulatory_threshold=0.25)
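The monitor above calls `calculate_psi`, which is not defined in this section. The sketch below is one possible implementation for numeric features, written to match the way it is called (it returns a dict with a `'psi'` key); it is an assumption, not the author's version. Bin edges come from quantiles of the reference sample, and PSI is the sum over bins of (current share − reference share) × ln(current share / reference share).

```python
import numpy as np


def calculate_psi(reference, current, bins=10, eps=1e-4):
    """Population Stability Index between two numeric samples."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    reference = reference[~np.isnan(reference)]
    current = current[~np.isnan(current)]

    # Quantile edges from the reference sample; np.unique collapses ties
    edges = np.unique(np.quantile(reference, np.linspace(0, 1, bins + 1)))
    interior = edges[1:-1]  # outer bins are open-ended
    if interior.size == 0:  # (near-)constant feature: a single bin
        return {"psi": 0.0}

    n_bins = interior.size + 1
    ref_counts = np.bincount(np.digitize(reference, interior), minlength=n_bins)
    cur_counts = np.bincount(np.digitize(current, interior), minlength=n_bins)

    # Floor the shares to avoid log(0) for empty bins
    ref_pct = np.clip(ref_counts / len(reference), eps, None)
    cur_pct = np.clip(cur_counts / len(current), eps, None)

    psi = float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
    return {"psi": psi}


rng = np.random.default_rng(0)
reference = rng.normal(0, 1, 5000)
same_dist = rng.normal(0, 1, 5000)   # same distribution, new sample
shifted = rng.normal(1, 1, 5000)     # mean shifted by one standard deviation
print(calculate_psi(reference, same_dist)["psi"])
print(calculate_psi(reference, shifted)["psi"])
```

A fresh sample from the same distribution should land in the GREEN band, while the one-standard-deviation shift should land in RED. Categorical features would need a per-category variant, which this sketch does not cover.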

8.2 模型性能月度审计

python 复制代码

from sklearn.metrics import roc_auc_score, f1_score, recall_score

def monthly_model_audit(predictions_log, reference_metrics, audit_month, psi_results=None):
    """Monthly model audit, formatted for submission to the regulator.

    psi_results: optional output of financial_psi_monitor, used to count RED features.
    """
    
    recent = pd.read_csv(predictions_log)
    
    # Recent-period metrics
    recent_auc = roc_auc_score(recent['actual'], recent['predicted_prob'])
    recent_recall = recall_score(recent['actual'], recent['predicted'])
    recent_f1 = f1_score(recent['actual'], recent['predicted'])
    
    # Degradation relative to the reference metrics (positive = worse)
    degradation = {
        'auc': reference_metrics['roc_auc'] - recent_auc,
        'recall': reference_metrics['recall'] - recent_recall,
        'f1': reference_metrics['f1'] - recent_f1
    }
    
    # Audit verdict: every metric must stay under the limit for that tier
    if all(d < 0.05 for d in degradation.values()):
        audit_status, recommendation = 'PASS', 'Recommend continued use'
    elif all(d < 0.10 for d in degradation.values()):
        audit_status, recommendation = 'WARNING', 'Recommend close monitoring'
    else:
        audit_status, recommendation = 'FAIL', 'Recommend suspension and retraining'
    
    # red_features is local to financial_psi_monitor, so count RED entries from its output
    red_count = sum('RED' in r['status'] for r in (psi_results or []))
    
    audit_report = f"""
    ================================================
    Monthly Model Audit Report: {audit_month}
    ================================================
    
    Model version: v2.1.0
    Audit scope: {len(recent)} loan applications
    
    Performance comparison:
    {'-' * 50}
    Metric        Reference    Recent    Drop     Status
    ROC-AUC       {reference_metrics['roc_auc']:.3f}        {recent_auc:.3f}     {degradation['auc']:.3f}    {'OK' if degradation['auc'] < 0.05 else '⚠️'}
    Recall        {reference_metrics['recall']:.3f}        {recent_recall:.3f}     {degradation['recall']:.3f}    {'OK' if degradation['recall'] < 0.05 else '⚠️'}
    F1            {reference_metrics['f1']:.3f}        {recent_f1:.3f}     {degradation['f1']:.3f}    {'OK' if degradation['f1'] < 0.05 else '⚠️'}
    
    Audit conclusion: {audit_status}
    
    {recommendation}
    
    Feature stability (PSI): {red_count} feature(s) to report
    
    Approved by: ________  Approval date: ________
    ================================================
    """
    
    print(audit_report)
    return audit_report

monthly_model_audit('predictions_2026_06.csv', final_metrics, '2026-06',
                    psi_results=psi_report)
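
The verdict rule in the audit is driven entirely by the worst-performing metric. The sketch below isolates that rule so it can be checked without a prediction log. `classify_audit` and the example numbers are hypothetical and only restate the 0.05 / 0.10 limits used above.

```python
def classify_audit(degradation, pass_limit=0.05, warn_limit=0.10):
    """Map per-metric degradation to PASS / WARNING / FAIL using the worst metric."""
    worst = max(degradation.values())
    if worst < pass_limit:
        return 'PASS'
    if worst < warn_limit:
        return 'WARNING'
    return 'FAIL'

# Illustrative degradation values (reference metric minus recent metric)
examples = {
    'all stable': {'auc': 0.01, 'recall': 0.02, 'f1': 0.00},
    'recall slipping': {'auc': 0.01, 'recall': 0.07, 'f1': 0.03},
    'auc collapsed': {'auc': 0.12, 'recall': 0.02, 'f1': 0.04},
    'improved': {'auc': -0.02, 'recall': -0.01, 'f1': -0.03},
}

for name, deg in examples.items():
    print(f"{name}: {classify_audit(deg)}")
```

Two properties are worth noting: a single metric crossing a limit is enough to downgrade the verdict, and a negative degradation (the model improved) always counts as passing.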

8.3 New Feature Launch Approval Process

python

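def new_feature_approval_pipeline(feature_name, iv_value, psi_value,
                                  discrimination_risk, business_justification,
                                  audit_docs_complete=False, ab_test_passed=False):
    """Approval workflow for launching a new feature.

    audit_docs_complete and ab_test_passed are manual sign-offs. They default to
    False so a feature cannot be approved before a reviewer confirms them.
    """
    
    approval_checklist = {
        '1. IV meets the threshold': iv_value >= 0.1,
        '2. PSI is stable': psi_value < 0.25,
        '3. No discrimination risk': discrimination_risk == 'LOW',
        '4. Business justification provided': bool(business_justification and business_justification.strip()),
        '5. Audit documentation complete': audit_docs_complete,  # confirmed manually
        '6. No drop in model performance': ab_test_passed  # verified by A/B test
    }
    
    all_pass = all(approval_checklist.values())
    
    print(f"New feature approval checklist: {feature_name}")
    for step, passed in approval_checklist.items():
        print(f"  {'✓' if passed else '✗'} {step}")
    
    status = 'APPROVED' if all_pass else 'REJECTED / CONDITIONAL'
    print(f"\nApproval result: {status}")
    
    if not all_pass:
        failed_steps = [s for s, p in approval_checklist.items() if not p]
        print(f"Still required: {failed_steps}")
    
    return {'status': status, 'checklist': approval_checklist}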

9. End-to-End Walkthrough: LendingClub Credit Default Prediction

9.1 Project Code Skeleton

python

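def full_credit_risk_pipeline(data_path, target_col='default'):
    """End-to-end credit risk pipeline."""
    
    print("=" * 60)
    print("Starting the end-to-end credit risk project")
    print("=" * 60)
    
    # Step 1: load the data
    df = pd.read_csv(data_path)
    print(f"\n[1] Data loaded: {len(df)} rows, {len(df.columns)} columns")
    
    # Step 2: compliance review
    prohibited = ['gender', 'race', 'religion']
    removed = [f for f in prohibited if f in df.columns]  # count only columns actually present
    available_features = [f for f in df.columns if f not in prohibited + [target_col]]
    print(f"[2] Compliance review: removed {len(removed)} prohibited features, kept {len(available_features)} usable features")
    
    # Step 3: WOE/IV encoding
    feature_ivs = {}
    woe_mappings = {}
    for feat in available_features[:10]:  # first 10 columns only (not ranked), to keep the demo short
        # The first return value is assumed to be the feature's WOE table
        woe_table, iv = calculate_woe_iv(df, feat, target_col)
        woe_mappings[feat] = woe_table
        feature_ivs[feat] = iv
    selected = [f for f, iv in feature_ivs.items() if iv >= 0.1]
    print(f"[3] WOE/IV screening: {len(selected)} features passed (IV ≥ 0.1)")
    
    # Step 4: feature engineering pipeline
    # ... (uses the preprocessor defined earlier)
    
    # Step 5: dual-model training
    # ... (uses dual_model_strategy defined earlier)
    
    # Step 6: SHAP compliance output
    # ... (uses generate_compliance_shap_report defined earlier)
    
    # Step 7: deployment
    # ... (uses the FastAPI service defined earlier)
    
    # Step 8: monitoring design
    # ... (uses financial_psi_monitor + monthly_model_audit defined earlier)
    
    print("\n[✓] Project complete: scorecard + XGBoost dual models, compliance output, monitoring")
    # lr_pipe and xgb_pipe must be produced by Step 5 (elided above)
    # or already exist at module level from the earlier sections
    return {
        'scorecard_model': lr_pipe,
        'xgboost_model': xgb_pipe,
        'woe_mappings': woe_mappings,
        'feature_ivs': feature_ivs,
        'selected_features': selected
    }

# Run the full pipeline
results = full_credit_risk_pipeline('lendingclub_data.csv')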
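
Step 3 can be sanity-checked on a toy table before touching the real data. The sketch below is a hypothetical stand-in for `calculate_woe_iv`, restricted to categorical features and assuming a binary target where 1 means default. The sign convention for WOE, ln(good% / bad%), is also an assumption and may differ from the one in section 3.1; the IV value does not depend on it.

```python
import numpy as np
import pandas as pd

def iv_sketch(df, feature, target, eps=1e-6):
    """Hypothetical stand-in for calculate_woe_iv on a categorical feature."""
    counts = pd.crosstab(df[feature], df[target])  # rows: categories, columns: 0 / 1
    # Share of all goods (0) and all bads (1) that fall in each category
    good_pct = (counts[0] / counts[0].sum()).clip(lower=eps)
    bad_pct = (counts[1] / counts[1].sum()).clip(lower=eps)
    woe = np.log(good_pct / bad_pct)
    iv = float(((good_pct - bad_pct) * woe).sum())
    return woe, iv

# Toy data: 'grade' separates goods from bads, 'noise' carries no signal
toy = pd.DataFrame({
    'grade': ['A'] * 100 + ['B'] * 100,
    'noise': ['x', 'y'] * 100,
    'default': [0] * 80 + [1] * 20 + [0] * 20 + [1] * 80,
})

toy_ivs = {feat: iv_sketch(toy, feat, 'default')[1] for feat in ['grade', 'noise']}
toy_selected = [feat for feat, iv in toy_ivs.items() if iv >= 0.1]

for feat, iv in toy_ivs.items():
    print(f"{feat}: IV={iv:.4f}")
print("selected:", toy_selected)
```

Here `grade` splits goods and bads 80/20 versus 20/80, so its IV is 1.2 · ln 4, while `noise` is balanced across both classes and gets an IV of 0. Only `grade` passes the IV ≥ 0.1 filter used in the pipeline.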

Summary

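An ML project in financial risk control is not about "training a model"; it is about "building a compliant system". From feature compliance review to WOE/IV encoding, and from interpretable scorecard output to regulatory audit documents, every step is driven by compliance constraints. The logistic regression scorecard is not an "outdated algorithm"; it is the most dependable choice under regulatory scrutiny. XGBoost does not "replace the scorecard"; it is an auxiliary model that adds performance. Running the two models in parallel lets compliance and performance coexist.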

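The previous installment, End-to-End ML Project Practice I, focused on the full engineering workflow; this one focuses on ML under compliance constraints. WOE/IV encoding, scorecard models, and SHAP compliance output are knowledge specific to the financial industry. Choosing between logistic regression and tree models is no longer "performance first" but "compliance first", and that shift in perspective is the core of financial risk control. The earlier article on recommender system fundamentals provided algorithm-level methodology; this project shows how to apply that methodology in a heavily constrained setting.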
