Machine Learning Evaluation Metrics Explained - Intermediate

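This is the second article in the machine learning evaluation metrics series. It covers training process metrics, model parameter metrics, hyperparameter tuning metrics, and model validation metrics in depth, to help you fully understand how machine learning models are trained and optimized.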

Introduction

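In the beginner part, we covered data quality, preprocessing, and basic evaluation metrics. In this intermediate part, we take a closer look at:

Training process monitoring: how to track the state of training in real time
Model complexity assessment: parameter count, computation (FLOPs), and related metrics
Hyperparameter optimization: how to choose the best hyperparameters
Model validation strategies: cross-validation, the hold-out method, and more

📊 Chart generation note: all charts in this article can be generated by running the docs/ml_metrics_visualization_complete.py script. Each metric section includes a code snippet that generates its chart.

These metrics help us:

Catch training problems early (overfitting, underfitting, vanishing gradients, etc.)
Optimize model architecture and hyperparameters
Evaluate the model's generalization ability
Balance model performance against computational resources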

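Training Process Metrics

1. Loss Metrics

Training Loss

Definition: the model's loss value on the training set

Monitoring goals:

Check whether the model is learning at all
Spot training anomalies (loss not decreasing, loss exploding, etc.)

Python implementation:

python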

import matplotlib.pyplot as plt
import numpy as np

class LossTracker:
    """
    Tracks training and validation loss across epochs
    """
    def __init__(self):
        self.train_losses = []
        self.val_losses = []
        self.epochs = []
        self.val_epochs = []  # epochs for which a validation loss was recorded
    
    def update(self, epoch, train_loss, val_loss=None):
        """
        Record the loss values of one epoch
        """
        self.epochs.append(epoch)
        self.train_losses.append(train_loss)
        if val_loss is not None:
            self.val_epochs.append(epoch)
            self.val_losses.append(val_loss)
    
    def plot(self):
        """
        Plot the loss curves
        """
        plt.figure(figsize=(12, 5))
        
        plt.subplot(1, 2, 1)
        plt.plot(self.epochs, self.train_losses, label='Training Loss', color='blue')
        if self.val_losses:
            plt.plot(self.val_epochs, self.val_losses, label='Validation Loss', color='red')
        plt.xlabel('Epoch')
        plt.ylabel('Loss')
        plt.title('Loss Curve')
        plt.legend()
        plt.grid(True, alpha=0.3)
        
        # Log-scale loss (useful when the loss spans several orders of magnitude)
        plt.subplot(1, 2, 2)
        plt.semilogy(self.epochs, self.train_losses, label='Training Loss', color='blue')
        if self.val_losses:
            plt.semilogy(self.val_epochs, self.val_losses, label='Validation Loss', color='red')
        plt.xlabel('Epoch')
        plt.ylabel('Loss (log scale)')
        plt.title('Loss Curve (Log Scale)')
        plt.legend()
        plt.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
    
    def analyze(self):
        """
        Analyze the loss trend
        """
        if len(self.train_losses) < 2:
            return "Not enough data"
        
        # Slope of a linear fit over the last (up to) 10 epochs; negative means the loss is falling
        recent_losses = self.train_losses[-10:]
        loss_trend = np.polyfit(range(len(recent_losses)), recent_losses, 1)[0]
        
        analysis = {
            'current_train_loss': self.train_losses[-1],
            'min_train_loss': min(self.train_losses),
            'loss_trend': loss_trend,
            'is_decreasing': loss_trend < 0
        }
        
        if self.val_losses:
            analysis['current_val_loss'] = self.val_losses[-1]
            analysis['min_val_loss'] = min(self.val_losses)
            # Heuristic: flag overfitting when validation loss is more than 10% above training loss
            analysis['overfitting'] = self.val_losses[-1] > self.train_losses[-1] * 1.1
        
        return analysis

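Validation Loss

Definition: the model's loss value on the validation set

Key observations:

Training loss slightly below validation loss: normal
Training loss far below validation loss: likely overfitting
Training loss ≈ validation loss, with both still high: likely underfitting

Chart generation code:

python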

from docs.ml_metrics_visualization_complete import plot_loss_function_comparison
plot_loss_function_comparison()

Types of Loss Functions

1. Classification Losses

python

import torch
import torch.nn as nn
import torch.nn.functional as F

# Cross-entropy loss
# (since PyTorch 1.10, nn.CrossEntropyLoss(label_smoothing=0.1) also offers built-in
#  label smoothing, which spreads ε uniformly over all K classes including the true one)
criterion_ce = nn.CrossEntropyLoss()

# Focal Loss (handles class imbalance by down-weighting easy examples)
class FocalLoss(nn.Module):
    def __init__(self, alpha=0.25, gamma=2.0):
        super().__init__()
        self.alpha = alpha  # a single scalar weight for all classes in this simplified version
        self.gamma = gamma  # focusing parameter
    
    def forward(self, inputs, targets):
        # Per-sample cross-entropy, i.e. -log(p_t)
        ce_loss = F.cross_entropy(inputs, targets, reduction='none')
        # p_t: predicted probability of the true class
        pt = torch.exp(-ce_loss)
        focal_loss = self.alpha * (1 - pt) ** self.gamma * ce_loss
        return focal_loss.mean()

# Label Smoothing Loss
class LabelSmoothingLoss(nn.Module):
    def __init__(self, num_classes, smoothing=0.1):
        super().__init__()
        self.num_classes = num_classes
        self.smoothing = smoothing
    
    def forward(self, inputs, targets):
        log_probs = F.log_softmax(inputs, dim=1)
        with torch.no_grad():
            # Smoothed targets: 1 - smoothing on the true class, smoothing / (K - 1) on each other class
            true_dist = torch.full_like(log_probs, self.smoothing / (self.num_classes - 1))
            true_dist.scatter_(1, targets.unsqueeze(1), 1.0 - self.smoothing)
        return torch.mean(torch.sum(-true_dist * log_probs, dim=1))

2. Regression Losses

python

import torch.nn as nn

# MSE Loss
mse_loss = nn.MSELoss()

# MAE Loss
mae_loss = nn.L1Loss()

# Huber Loss (more robust to outliers)
huber_loss = nn.HuberLoss(delta=1.0)

# Smooth L1 Loss (equivalent to Huber Loss when beta = delta = 1.0)
smooth_l1_loss = nn.SmoothL1Loss()

3. Object Detection Losses

python

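# NOTE: calculate_iou_batch, calculate_giou_batch and calculate_ciou_batch are helper functions
# not defined in this article. Each is assumed to return a 1-D tensor holding the IoU / GIoU / CIoU
# of every matched (prediction, target) pair, with boxes in (x1, y1, x2, y2) format.

# IoU Loss
def iou_loss(pred_boxes, target_boxes):
    """
    Compute the IoU loss
    """
    iou = calculate_iou_batch(pred_boxes, target_boxes)
    return 1 - iou.mean()

# GIoU Loss
def giou_loss(pred_boxes, target_boxes):
    """
    Compute the GIoU loss
    """
    giou = calculate_giou_batch(pred_boxes, target_boxes)
    return 1 - giou.mean()

# CIoU Loss
def ciou_loss(pred_boxes, target_boxes):
    """
    Compute the CIoU loss
    """
    ciou = calculate_ciou_batch(pred_boxes, target_boxes)
    return 1 - ciou.mean()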

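2. Accuracy Metrics

Training Accuracy vs. Validation Accuracy

Monitoring goals:

Identify overfitting: training accuracy >> validation accuracy
Identify underfitting: both training and validation accuracy are low
Determine the best number of training epochs: stop when validation accuracy no longer improves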
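The "stop when validation accuracy no longer improves" rule is usually implemented as early stopping with a patience window. The following is a minimal sketch in plain Python; the class name EarlyStopping and its patience / min_delta / mode parameters are illustrative choices, not part of any particular library.

```python
class EarlyStopping:
    """Signal a stop once the monitored metric has not improved for `patience` epochs.

    mode='max' suits accuracy; mode='min' suits loss.
    """

    def __init__(self, patience=5, min_delta=0.0, mode='max'):
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode
        self.best_value = None
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def _is_improvement(self, value):
        if self.best_value is None:
            return True
        if self.mode == 'max':
            return value > self.best_value + self.min_delta
        return value < self.best_value - self.min_delta

    def step(self, epoch, value):
        """Record one epoch's metric; return True when training should stop."""
        if self._is_improvement(value):
            self.best_value = value
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Example: validation accuracy peaks at epoch 2 and then stalls
stopper = EarlyStopping(patience=3, mode='max')
val_accuracies = [0.70, 0.78, 0.81, 0.80, 0.81, 0.79, 0.82]
stopped_at = None
for epoch, acc in enumerate(val_accuracies):
    if stopper.step(epoch, acc):
        stopped_at = epoch
        break
print("stopped at epoch", stopped_at, "best epoch", stopper.best_epoch)
```

In a real training loop, saving a checkpoint whenever best_epoch changes lets you restore the best model after stopping.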

Python implementation:

python

import numpy as np
import matplotlib.pyplot as plt

class AccuracyTracker:
    """
    Tracks training and validation accuracy across epochs
    """
    def __init__(self):
        self.train_accuracies = []
        self.val_accuracies = []
        self.epochs = []
        self.val_epochs = []  # epochs for which a validation accuracy was recorded
    
    def update(self, epoch, train_acc, val_acc=None):
        self.epochs.append(epoch)
        self.train_accuracies.append(train_acc)
        if val_acc is not None:
            self.val_epochs.append(epoch)
            self.val_accuracies.append(val_acc)
    
    def plot(self):
        plt.figure(figsize=(10, 6))
        plt.plot(self.epochs, self.train_accuracies, 'o-', label='Training Accuracy', color='blue')
        if self.val_accuracies:
            plt.plot(self.val_epochs, self.val_accuracies, 'o-', label='Validation Accuracy', color='red')
        plt.xlabel('Epoch')
        plt.ylabel('Accuracy')
        plt.title('Accuracy Curve')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.show()
    
    def get_best_epoch(self):
        """
        Return the epoch with the best validation accuracy
        """
        if not self.val_accuracies:
            return None
        best_idx = int(np.argmax(self.val_accuracies))
        best_epoch = self.val_epochs[best_idx]
        return {
            'epoch': best_epoch,
            'val_accuracy': self.val_accuracies[best_idx],
            'train_accuracy': self.train_accuracies[self.epochs.index(best_epoch)]
        }

Chart generation code:

python

from docs.ml_metrics_visualization_complete import plot_training_curves
plot_training_curves()

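3. Gradient Metrics

Gradient Norm

Definition: the L2 norm of all parameter gradients taken together (the square root of the sum of all squared gradient entries)

Monitoring goals:

Exploding gradients: the gradient norm suddenly becomes very large
Vanishing gradients: the gradient norm approaches 0
Training stability: the gradient norm should stay within a stable range, often declining gradually as training converges

Python implementation:

python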

import matplotlib.pyplot as plt

def calculate_gradient_norm(model):
    """
    Compute the global L2 norm of all parameter gradients (call after loss.backward())
    """
    total_norm_sq = 0.0
    for param in model.parameters():
        if param.grad is not None:
            param_norm = param.grad.detach().norm(2)
            total_norm_sq += param_norm.item() ** 2
    return total_norm_sq ** 0.5

class GradientTracker:
    """
    Gradient norm tracker
    """
    def __init__(self):
        self.gradient_norms = []
        self.epochs = []
    
    def update(self, epoch, model):
        grad_norm = calculate_gradient_norm(model)
        self.epochs.append(epoch)
        self.gradient_norms.append(grad_norm)
    
    def plot(self):
        plt.figure(figsize=(10, 6))
        plt.plot(self.epochs, self.gradient_norms, 'o-', color='green')
        # Reference line only; a healthy range depends on the model and the loss scale
        plt.axhline(y=1.0, color='red', linestyle='--', label='Reference (1.0)')
        plt.xlabel('Epoch')
        plt.ylabel('Gradient Norm')
        plt.title('Gradient Norm Over Training')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.yscale('log')
        plt.show()
    
    def analyze(self):
        """
        Check for common gradient problems (the thresholds are heuristics)
        """
        if len(self.gradient_norms) < 2:
            return "Not enough data"
        
        current_norm = self.gradient_norms[-1]
        max_norm = max(self.gradient_norms)
        min_norm = min(self.gradient_norms)
        
        issues = []
        if current_norm > 100:
            issues.append("Exploding gradients: gradient norm is too large")
        if current_norm < 1e-6:
            issues.append("Vanishing gradients: gradient norm is too small")
        # Guard against division by zero when a recorded norm is exactly 0
        if min_norm > 0 and max_norm / min_norm > 1000:
            issues.append("Unstable gradients: gradient norm varies too much")
        
        return {
            'current_norm': current_norm,
            'max_norm': max_norm,
            'min_norm': min_norm,
            'issues': issues
        }

Chart generation code:

python

from docs.ml_metrics_visualization_complete import plot_gradient_analysis
plot_gradient_analysis()

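Gradient Clipping

Purpose: prevent exploding gradients by rescaling the gradients whenever their overall norm exceeds a threshold

Implementation:

python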

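import torch

def clip_gradients(model, max_norm=1.0):
    """
    Gradient clipping: call after loss.backward() and before optimizer.step()
    """
    # Returns the total gradient norm measured before clipping
    return torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)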
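To make the effect of clip_grad_norm_ concrete, here is a NumPy sketch of norm-based clipping: compute the global L2 norm over all gradient arrays and, if it exceeds max_norm, multiply every gradient by the same factor. It illustrates the idea behind the PyTorch function rather than reproducing its source; the function name clip_by_global_norm and the small eps term are assumptions of this sketch.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale a list of gradient arrays so their global L2 norm is at most max_norm.

    Returns the (possibly) clipped gradients and the norm measured before clipping.
    """
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # One common factor for every array, so the overall gradient direction is preserved
        grads = [g * clip_coef for g in grads]
    return grads, total_norm


grads = [np.array([3.0, 0.0]), np.array([[0.0], [4.0]])]  # global norm = 5
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = float(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))
print(norm_before, norm_after)
```

Because all arrays share one scaling factor, clipping limits the step size without changing which direction the update points.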

4. Learning Rate Metrics

Learning Rate Scheduling

Common strategies:

Fixed learning rate

python

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

StepLR: multiply the learning rate by gamma every step_size epochs

python

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

ExponentialLR: exponential decay (multiply the learning rate by gamma every epoch)

python

scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

CosineAnnealingLR: cosine annealing from the initial learning rate down to eta_min over T_max epochs

python

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

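ReduceLROnPlateau: reduce the learning rate when a monitored metric (e.g. validation loss) stops improving; call scheduler.step(val_loss) after each validation

python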

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=10
)
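For the first three schedules, the learning rate at a given epoch has a simple closed form, which is handy for sanity-checking a learning-rate plot. The sketch below writes those formulas in plain Python, assuming the scheduler is stepped once per epoch and that cosine annealing uses eta_min = 0 (PyTorch's default). The function names are illustrative; ReduceLROnPlateau has no closed form because it reacts to the monitored metric.

```python
import math


def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """StepLR: multiply by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def exponential_lr(base_lr, epoch, gamma=0.95):
    """ExponentialLR: multiply by gamma every epoch."""
    return base_lr * gamma ** epoch


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """CosineAnnealingLR: follow half a cosine wave from base_lr down to eta_min over t_max epochs."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


for epoch in (0, 30, 50, 100):
    print(epoch,
          step_lr(0.01, epoch, step_size=30),
          exponential_lr(0.01, epoch),
          cosine_annealing_lr(0.01, epoch, t_max=100))
```

With base_lr = 0.01 and T_max = 100, cosine annealing is exactly halfway (0.005) at epoch 50 and reaches 0 at epoch 100.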

Learning rate tracking:

python

import matplotlib.pyplot as plt

class LearningRateTracker:
    """
    Learning rate tracker
    """
    def __init__(self):
        self.learning_rates = []
        self.epochs = []
    
    def update(self, epoch, optimizer):
        # Reads the learning rate of the first parameter group
        current_lr = optimizer.param_groups[0]['lr']
        self.epochs.append(epoch)
        self.learning_rates.append(current_lr)
    
    def plot(self):
        plt.figure(figsize=(10, 6))
        plt.plot(self.epochs, self.learning_rates, 'o-', color='purple')
        plt.xlabel('Epoch')
        plt.ylabel('Learning Rate')
        plt.title('Learning Rate Schedule')
        plt.yscale('log')
        plt.grid(True, alpha=0.3)
        plt.show()

Chart generation code:

python

from docs.ml_metrics_visualization_complete import plot_learning_rate_schedules
plot_learning_rate_schedules()

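5. Weight Distribution Metrics

Weight Statistics

Monitoring goals:

Verify that weights are being updated normally
Identify weight initialization problems
Detect exploding or vanishing weights

Python implementation:

python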

import numpy as np
import matplotlib.pyplot as plt

def analyze_weight_distribution(model, layer_name=None):
    """
    Summarize the weight distribution of each layer
    (optionally only layers whose name contains layer_name)
    """
    weight_stats = {}
    
    for name, param in model.named_parameters():
        if 'weight' in name and (layer_name is None or layer_name in name):
            weights = param.detach().cpu().numpy().flatten()
            
            weight_stats[name] = {
                'mean': np.mean(weights),
                'std': np.std(weights),
                'min': np.min(weights),
                'max': np.max(weights),
                'median': np.median(weights),
                'zero_ratio': np.sum(weights == 0) / len(weights)
            }
    
    return weight_stats

def plot_weight_distribution(model, layer_name):
    """
    Plot a histogram of the first weight tensor whose name contains layer_name
    """
    for name, param in model.named_parameters():
        if layer_name in name and 'weight' in name:
            weights = param.detach().cpu().numpy().flatten()
            
            plt.figure(figsize=(10, 6))
            plt.hist(weights, bins=50, alpha=0.7)
            plt.xlabel('Weight Value')
            plt.ylabel('Frequency')
            plt.title(f'Weight Distribution: {name}')
            plt.grid(True, alpha=0.3)
            plt.show()
            break

Chart generation code:

python

from docs.ml_metrics_visualization_complete import plot_weight_distribution_analysis
plot_weight_distribution_analysis()

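Model Parameter Metrics

1. Model Complexity Metrics

Parameter Count

Definition: the total number of parameters in the model, usually reported together with the number of trainable parameters

Formula:

Total parameters = Σ (parameters of each layer)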
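As a worked example of the formula, the parameter count of common layers follows directly from their shapes. The helpers below are illustrative (not a library API) and assume standard layers: a fully connected layer has in_features × out_features weights plus out_features biases, and a k×k convolution with groups = 1 has k × k × C_in × C_out weights plus C_out biases. The small network is hypothetical.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameters of a fully connected layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Parameters of a standard (groups=1) 2-D convolution with a square kernel."""
    return kernel_size * kernel_size * in_channels * out_channels + (out_channels if bias else 0)


# Hypothetical network: conv(3->16, 3x3) -> conv(16->32, 3x3) -> linear(32*8*8 -> 10)
layers = [
    conv2d_params(3, 16, 3),         # 3*3*3*16 + 16 = 448
    conv2d_params(16, 32, 3),        # 3*3*16*32 + 32 = 4640
    linear_params(32 * 8 * 8, 10),   # 2048*10 + 10 = 20490
]
total_params = sum(layers)           # total parameters = Σ (parameters of each layer)
print(total_params)
```

For PyTorch's nn.Linear and nn.Conv2d these hand counts should agree with the count_parameters function that follows.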

Python implementation:

python

def count_parameters(model):
    """
    Count model parameters
    """
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    
    return {
        'total_params': total_params,
        'trainable_params': trainable_params,
        'non_trainable_params': total_params - trainable_params,
        'total_params_million': total_params / 1e6,
        'total_params_billion': total_params / 1e9
    }

def print_model_summary(model, input_size=(3, 224, 224)):
    """
    Print a parameter summary of the model
    (input_size is not used by this parameter-only summary)
    """
    total_params = count_parameters(model)
    
    print("=" * 60)
    print("Model Summary")
    print("=" * 60)
    print(f"Total Parameters: {total_params['total_params']:,}")
    print(f"Trainable Parameters: {total_params['trainable_params']:,}")
    print(f"Non-trainable Parameters: {total_params['non_trainable_params']:,}")
    print(f"Total Parameters (millions): {total_params['total_params_million']:.2f}M")
    print("=" * 60)
    
    # Per-layer parameter counts
    print("\nLayer-wise Parameter Count:")
    print("-" * 60)
    for name, param in model.named_parameters():
        print(f"{name:50s} {param.numel():>15,}")

Chart generation code:

python

from docs.ml_metrics_visualization_complete import plot_model_complexity
plot_model_complexity()

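Further analysis:

Model complexity analysis covers three key dimensions:

Parameters vs. accuracy: evaluates parameter efficiency, looking for a good balance between accuracy and parameter count
FLOPs vs. accuracy: evaluates computational efficiency; fewer FLOPs usually means faster inference, although actual speed also depends on memory access patterns and hardware
Model efficiency comparison: weighs accuracy, parameter efficiency, and computational efficiency together

Rough rules of thumb (they depend heavily on the hardware and the task):

Fewer than 10M parameters: generally suitable for mobile deployment
Fewer than 1G FLOPs: generally suitable for real-time inference
Accuracy above 90%: sufficient for many applications, though the required accuracy is highly task-dependent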

FLOPs (Floating Point Operations)

Definition: the number of floating-point operations required for one forward pass of the model
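For intuition, the FLOPs of dense and convolutional layers can be estimated by hand. The sketch below counts one multiply-accumulate (MAC) as 2 FLOPs, a common convention (tools differ here: some report MACs and label them FLOPs), and ignores bias additions and activation functions. The function names are illustrative.

```python
def linear_flops(in_features, out_features, batch_size=1):
    """Approximate FLOPs of a fully connected layer: 2 FLOPs per multiply-accumulate."""
    macs = batch_size * in_features * out_features
    return 2 * macs


def conv2d_flops(in_channels, out_channels, kernel_size, out_h, out_w, batch_size=1):
    """Approximate FLOPs of a standard (groups=1) 2-D convolution."""
    # Each output element needs kernel_size * kernel_size * in_channels MACs
    macs_per_output = kernel_size * kernel_size * in_channels
    num_outputs = batch_size * out_channels * out_h * out_w
    return 2 * macs_per_output * num_outputs


# 3x3 conv, 64 -> 64 channels, producing a 56x56 feature map
conv_gflops = conv2d_flops(64, 64, 3, 56, 56) / 1e9
print(f"{conv_gflops:.3f} GFLOPs")
```

This single layer already costs about 0.23 GFLOPs, which shows how quickly convolutions on large feature maps add up.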

Python implementation:

python

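import torch

try:
    from thop import profile, clever_format
    
    def calculate_flops(model, input_size=(1, 3, 224, 224)):
        """
        Estimate model complexity with thop (the model must be on the CPU here).
        Note: thop's profile() counts multiply-accumulate operations (MACs);
        FLOPs are commonly approximated as 2 x MACs.
        """
        dummy_input = torch.randn(input_size)
        macs, params = profile(model, inputs=(dummy_input,))
        flops = 2 * macs
        macs_str, flops_str, params_str = clever_format([macs, flops, params], "%.3f")
        
        return {
            'macs': macs_str,
            'flops': flops_str,
            'params': params_str
        }
except ImportError:
    print("Please install thop: pip install thop")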

Model Size

Definition: the storage space taken up by the saved model file
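A quick estimate of model size is the parameter count times the bytes per element (4 for float32, 2 for float16, 1 for int8). The saved file is usually slightly larger because it also stores metadata and buffers such as BatchNorm running statistics. The 25M-parameter model below is a hypothetical example, and the helper name is illustrative.

```python
BYTES_PER_ELEMENT = {'float32': 4, 'float16': 2, 'int8': 1}


def estimate_model_size_mb(num_params, dtype='float32'):
    """Estimate weight storage in MB (1 MB = 1024 * 1024 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / (1024 * 1024)


# A hypothetical model with 25 million parameters
for dtype in ('float32', 'float16', 'int8'):
    print(dtype, round(estimate_model_size_mb(25_000_000, dtype), 1), 'MB')
```

Halving the bytes per element halves the weight storage, which is the basic arithmetic behind the size savings of half-precision and 8-bit quantized models.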

Python implementation:

python

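import os
import tempfile

import torch

def calculate_model_size(model):
    """
    Compute the size of the serialized state_dict
    """
    # Save into a temporary directory so no stray file is left behind
    with tempfile.TemporaryDirectory() as tmp_dir:
        save_path = os.path.join(tmp_dir, 'temp_model.pth')
        torch.save(model.state_dict(), save_path)
        size_bytes = os.path.getsize(save_path)
    
    return {
        'size_mb': size_bytes / (1024 * 1024),
        'size_kb': size_bytes / 1024,
        'size_bytes': size_bytes
    }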

2. Memory Usage Metrics

Forward-Pass Memory

Python implementation:

python

import numpy as np
import torch

def estimate_memory_usage(model, input_size, batch_size=1):
    """
    Rough estimate of model memory usage
    """
    # Parameter memory
    param_memory = sum(p.numel() * p.element_size() for p in model.parameters())
    
    # Gradient memory (training only): one gradient per parameter, same size
    grad_memory = param_memory
    
    # Input tensor memory (float32 = 4 bytes per element).
    # Real activation memory depends on the architecture and is usually far larger than the
    # input alone; optimizer state (e.g. Adam's moment buffers) is not counted either.
    input_memory = batch_size * int(np.prod(input_size)) * 4
    
    total_memory = param_memory + grad_memory + input_memory
    
    return {
        'param_memory_mb': param_memory / (1024 ** 2),
        'grad_memory_mb': grad_memory / (1024 ** 2),
        'input_memory_mb': input_memory / (1024 ** 2),
        'total_memory_mb': total_memory / (1024 ** 2),
        'total_memory_gb': total_memory / (1024 ** 3)
    }

3. Inference Speed Metrics

Inference Time

Python implementation:

python

import time

import torch

def measure_inference_time(model, input_size, num_runs=100, device='cuda'):
    """
    Measure the average latency of one forward pass on an input of shape input_size
    """
    model.eval()
    model = model.to(device)
    use_cuda = str(device).startswith('cuda')
    
    # Warm-up runs (lazy initialization, kernel selection, caches)
    dummy_input = torch.randn(input_size, device=device)
    with torch.no_grad():
        for _ in range(10):
            _ = model(dummy_input)
    
    # CUDA kernels run asynchronously, so synchronize before reading the clock
    if use_cuda:
        torch.cuda.synchronize()
    start_time = time.perf_counter()
    
    with torch.no_grad():
        for _ in range(num_runs):
            _ = model(dummy_input)
    
    if use_cuda:
        torch.cuda.synchronize()
    end_time = time.perf_counter()
    
    avg_time = (end_time - start_time) / num_runs
    fps = 1.0 / avg_time  # forward passes (batches) per second
    
    return {
        'avg_inference_time_ms': avg_time * 1000,
        'fps': fps,
        'total_time_s': end_time - start_time
    }

Throughput

Definition: the number of samples processed per unit of time
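Throughput and latency are linked by simple arithmetic: if one forward pass on a batch of size B takes t seconds, throughput is B / t samples per second. Larger batches often raise throughput while also raising per-batch latency, which is why both numbers are worth reporting. The timings below are made-up values for illustration only.

```python
def throughput_samples_per_second(batch_size, batch_latency_s):
    """Samples processed per second for a given batch size and per-batch latency."""
    return batch_size / batch_latency_s


# Hypothetical measurements: (batch size, seconds per forward pass)
measurements = [(1, 0.005), (8, 0.016), (32, 0.050)]
for batch_size, latency in measurements:
    print(f"batch={batch_size:>2}  latency={latency * 1000:.1f} ms  "
          f"throughput={throughput_samples_per_second(batch_size, latency):.0f} samples/s")
```

In this example, going from batch 1 to batch 32 raises throughput from 200 to 640 samples/s while latency grows tenfold.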

Python implementation:

python

def measure_throughput(model, input_size, batch_sizes=(1, 4, 8, 16, 32), device='cuda'):
    """
    Measure throughput at different batch sizes.
    input_size includes the batch dimension, e.g. (1, 3, 224, 224); its first entry is replaced.
    """
    results = {}
    
    for batch_size in batch_sizes:
        batch_input_size = (batch_size,) + tuple(input_size[1:])
        time_result = measure_inference_time(model, batch_input_size, device=device)
        results[batch_size] = {
            'batches_per_second': time_result['fps'],
            'samples_per_second': batch_size * time_result['fps']
        }
    
    return results

Hyperparameter Tuning Metrics

1. Hyperparameter Importance Analysis

Hyperparameter Sensitivity Analysis

Python implementation:

python

import matplotlib.pyplot as plt
import optuna

def hyperparameter_sensitivity_analysis(study):
    """
    Analyze hyperparameter sensitivity of a completed Optuna study
    """
    importance = optuna.importance.get_param_importances(study)
    
    # Visualization
    params = list(importance.keys())
    values = list(importance.values())
    
    plt.figure(figsize=(10, 6))
    plt.barh(params, values)
    plt.xlabel('Importance')
    plt.title('Hyperparameter Importance')
    plt.tight_layout()
    plt.show()
    
    return importance

Chart generation code:

python

from docs.ml_metrics_visualization_complete import plot_hyperparameter_tuning
plot_hyperparameter_tuning()

2. Hyperparameter Search Strategies

Grid Search

Python implementation:

python

from sklearn.model_selection import GridSearchCV

def grid_search_hyperparameters(estimator, param_grid, X, y, cv=5):
    """
    Grid search: exhaustively evaluate every combination in param_grid
    """
    grid_search = GridSearchCV(
        estimator, param_grid, cv=cv, 
        scoring='accuracy', n_jobs=-1, verbose=1
    )
    grid_search.fit(X, y)
    
    return {
        'best_params': grid_search.best_params_,
        'best_score': grid_search.best_score_,
        'cv_results': grid_search.cv_results_
    }

Random Search

Python implementation:

python

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

def random_search_hyperparameters(estimator, param_distributions, X, y, n_iter=100, cv=5):
    """
    Random search: evaluate n_iter candidates sampled from param_distributions
    """
    random_search = RandomizedSearchCV(
        estimator, param_distributions, n_iter=n_iter,
        cv=cv, scoring='accuracy', n_jobs=-1, verbose=1, random_state=42
    )
    random_search.fit(X, y)
    
    return {
        'best_params': random_search.best_params_,
        'best_score': random_search.best_score_,
        'cv_results': random_search.cv_results_
    }
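As a usage sketch, random search needs an estimator and a dictionary of distributions. The example below assumes scikit-learn and SciPy are installed and tunes the regularization strength C of a LogisticRegression on the iris dataset, sampling C log-uniformly. It calls RandomizedSearchCV directly so it runs on its own; the range, n_iter and cv values are illustrative and kept small so it runs quickly.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Sample C on a log scale: each order of magnitude between 1e-3 and 1e2 is equally likely
param_distributions = {'C': loguniform(1e-3, 1e2)}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=10,
    cv=3,
    scoring='accuracy',
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Log-uniform sampling is a common choice for scale-like hyperparameters such as C or the learning rate, whose useful range spans several orders of magnitude.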

Bayesian Optimization

Python implementation:

python

import optuna

def bayesian_optimization(objective, n_trials=100):
    """
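    Bayesian optimization of hyperparameters; objective(trial) must return the value to maximize.
    Optuna's default sampler is TPE (Tree-structured Parzen Estimator).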
    """
    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=n_trials)
    
    return {
        'best_params': study.best_params,
        'best_value': study.best_value,
        'study': study
    }

3. Hyperparameter Tuning Visualization

Hyperparameter Relationship Plots

Python implementation:

python

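import optuna

def plot_hyperparameter_relationships(study):
    """
    Plot hyperparameter relationships (Optuna's visualization module requires plotly)
    """
    # Parallel coordinates: each line is one trial
    fig = optuna.visualization.plot_parallel_coordinate(study)
    fig.show()
    
    # Contour plot: objective value over pairs of hyperparameters
    fig = optuna.visualization.plot_contour(study)
    fig.show()
    
    # Slice plot: objective value against each hyperparameter individually
    fig = optuna.visualization.plot_slice(study)
    fig.show()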

Model Validation Metrics

1. Cross-Validation Metrics

K-Fold Cross-Validation

Python implementation:

python

from sklearn.model_selection import cross_val_score, KFold

def k_fold_cross_validation(estimator, X, y, k=5, scoring='accuracy'):
    """
    K-fold cross-validation
    """
    kfold = KFold(n_splits=k, shuffle=True, random_state=42)
    scores = cross_val_score(estimator, X, y, cv=kfold, scoring=scoring)
    
    return {
        'scores': scores,
        'mean_score': scores.mean(),
        'std_score': scores.std(),
        'min_score': scores.min(),
        'max_score': scores.max()
    }

Stratified K-Fold Cross-Validation

Python implementation:

python

from sklearn.model_selection import StratifiedKFold, cross_val_score

def stratified_k_fold_cv(estimator, X, y, k=5, scoring='accuracy'):
    """
    Stratified K-fold cross-validation (preserves the class proportions in every fold)
    """
    skfold = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
    scores = cross_val_score(estimator, X, y, cv=skfold, scoring=scoring)
    
    return {
        'scores': scores,
        'mean_score': scores.mean(),
        'std_score': scores.std()
    }
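To see why stratification matters for imbalanced data, the sketch below splits a synthetic dataset (90 negatives, 10 positives, positives stored last) with StratifiedKFold and with plain KFold, then counts the positives in each test fold. It assumes scikit-learn is installed; the data is made up.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)  # 10% positive class

stratified_counts = [int(y[test].sum())
                     for _, test in StratifiedKFold(n_splits=5).split(X, y)]
plain_counts = [int(y[test].sum())
                for _, test in KFold(n_splits=5).split(X)]

print("positives per test fold, StratifiedKFold:", stratified_counts)
print("positives per test fold, KFold (no shuffle):", plain_counts)
```

Without stratification (and without shuffling), four of the five test folds contain no positive samples at all, so their scores say nothing about the minority class.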

Chart generation code:

python

from docs.ml_metrics_visualization_complete import plot_cross_validation_results
plot_cross_validation_results()

2. Hold-Out Method

Python implementation:

python

from sklearn.model_selection import train_test_split

def hold_out_validation(X, y, test_size=0.2, random_state=42):
    """
    Hold-out validation: a single random train/test split
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    
    return {
        'X_train': X_train,
        'X_test': X_test,
        'y_train': y_train,
        'y_test': y_test
    }
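The hold-out function above produces one train/test split. When a separate validation set is also needed (for early stopping or hyperparameter tuning, for example), one common pattern is to call train_test_split twice. The 60/20/20 ratio, the stratify option, and the random data below are illustrative choices; scikit-learn is assumed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.RandomState(0).rand(1000, 5)
y = np.array([0, 1] * 500)

# First set aside 20% as the test set, then split the remaining 80% into 60% train / 20% validation
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=42)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))
```

The test set should be touched only once, for the final evaluation; tuning decisions are made on the validation set.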

3. Time Series Cross-Validation

Python implementation:

python

from sklearn.model_selection import TimeSeriesSplit, cross_val_score

def time_series_cross_validation(estimator, X, y, n_splits=5):
    """
    Time series cross-validation: X and y must be in chronological order;
    each fold trains on earlier samples and tests on the samples that follow
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    scores = cross_val_score(estimator, X, y, cv=tscv)
    
    return {
        'scores': scores,
        'mean_score': scores.mean(),
        'std_score': scores.std()
    }
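Unlike K-fold, TimeSeriesSplit never trains on data that comes after the test window, which prevents look-ahead leakage. The short sketch below (assuming scikit-learn, on 12 synthetic time steps) prints the index ranges to show the expanding training window.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 time-ordered samples

splits = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in splits:
    print("train:", train_idx.min(), "-", train_idx.max(),
          " test:", test_idx.min(), "-", test_idx.max())
```

Each fold's training set grows (3, 6, then 9 samples) while the test window slides forward, so later folds are evaluated with more history.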

Advanced Object Detection Metrics

1. AP (Average Precision)

Calculation Methods

11-point interpolation (Pascal VOC 2007 standard):

python

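import numpy as np

def calculate_ap_11point(precision, recall):
    """
    Compute AP with 11-point interpolation
    (precision/recall ordered by descending detection confidence)
    """
    precision = np.asarray(precision)
    recall = np.asarray(recall)
    ap = 0.0
    # Recall thresholds 0.0, 0.1, ..., 1.0 (linspace avoids floating-point drift)
    for t in np.linspace(0, 1, 11):
        if np.sum(recall >= t) == 0:
            p = 0
        else:
            # Interpolated precision: the maximum precision at any recall >= t
            p = np.max(precision[recall >= t])
        ap += p / 11.0
    return ap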

All-point interpolation (Pascal VOC 2010 and later; COCO instead averages precision over 101 recall points):

python

import numpy as np

def calculate_ap_all_points(precision, recall):
    """
    Compute AP with all-point interpolation (Pascal VOC 2010+ style)
    """
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    
    # Make precision monotonically non-increasing (the precision envelope)
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
    
    # Find the indices where recall changes
    i = np.where(mrec[1:] != mrec[:-1])[0]
    
    # Sum the rectangles under the envelope (area under the PR curve)
    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
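Both AP functions above take precision and recall arrays as input. A minimal sketch of where those arrays come from: sort one class's detections by confidence, mark each as TP or FP (the matching against ground truth at a chosen IoU threshold is assumed to be done already), and take cumulative sums. The function name and the toy inputs are illustrative.

```python
import numpy as np


def precision_recall_curve_from_detections(scores, is_true_positive, num_ground_truth):
    """Build precision/recall arrays for one class, ordered by descending confidence."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind='stable')
    tp_flags = np.asarray(is_true_positive, dtype=float)[order]

    tp_cumsum = np.cumsum(tp_flags)
    fp_cumsum = np.cumsum(1.0 - tp_flags)

    recall = tp_cumsum / num_ground_truth
    precision = tp_cumsum / (tp_cumsum + fp_cumsum)
    return precision, recall


# 5 detections for a class with 4 ground-truth objects (toy data)
scores = [0.95, 0.90, 0.80, 0.60, 0.50]
is_tp = [1, 0, 1, 1, 0]
precision, recall = precision_recall_curve_from_detections(scores, is_tp, num_ground_truth=4)
print(np.round(precision, 3), recall)
```

Passing these arrays to calculate_ap_all_points above should give an AP of 0.625 for this toy example; recall stops at 0.75 because one ground-truth object is never detected.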

2. AR (Average Recall)

Python implementation:

python

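def calculate_ar(detections, ground_truth, max_dets_list=(1, 10, 100), iou_threshold=0.5):
    """
    Compute recall for different maxDets values (maximum detections per image).
    Assumptions: detections are sorted by descending confidence and each item has a .bbox;
    calculate_iou is an IoU helper not defined in this article.
    Simplifications: one detection may match several ground-truth boxes, and a single IoU
    threshold is used; COCO's AR also averages recall over IoU 0.50:0.95 and over classes.
    """
    ar_dict = {}
    
    for max_dets in max_dets_list:
        # Keep only the max_dets highest-scoring detections
        limited_detections = detections[:max_dets]
        
        tp = 0
        fn = 0
        
        for gt in ground_truth:
            matched = False
            for det in limited_detections:
                if calculate_iou(det.bbox, gt.bbox) >= iou_threshold:
                    matched = True
                    break
            if matched:
                tp += 1
            else:
                fn += 1
        
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        ar_dict[max_dets] = recall
    
    return ar_dict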

3. mAP (Mean Average Precision)

Python implementation:

python

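import numpy as np

def calculate_map(all_class_aps, iou_thresholds=None):
    """
    Compute COCO-style mAP: average AP over IoU thresholds 0.50:0.05:0.95 and over classes.
    all_class_aps is a list (one entry per class) of dicts mapping an IoU threshold,
    rounded to 2 decimals (0.5, 0.55, ..., 0.95), to that class's AP.
    """
    if iou_thresholds is None:
        # np.linspace avoids the floating-point drift of np.arange(0.5, 1.0, 0.05)
        iou_thresholds = np.linspace(0.5, 0.95, 10)
    
    map_per_iou = []
    for iou_thresh in iou_thresholds:
        key = round(float(iou_thresh), 2)
        aps_at_iou = [ap_dict[key] for ap_dict in all_class_aps]
        map_per_iou.append(np.mean(aps_at_iou))
    
    map_value = np.mean(map_per_iou)
    return map_value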

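Loss Functions in Detail

1. Classification Loss Functions

Cross Entropy Loss

Formula (p is the predicted probability of the true class):

CE = -log(p)

Characteristics:

The most basic classification loss
Pays no special attention to hard examples, so the many easy examples can dominate the total loss
Tends to perform poorly under class imbalance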

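Focal Loss

Formula:

Focal Loss = -α(1-p)^γ log(p)

Parameters:

α: class weighting factor (commonly 0.25)
γ: focusing parameter (commonly 2.0)

Characteristics:

Automatically focuses on hard examples: the factor (1-p)^γ shrinks the loss of well-classified examples (p close to 1)
Mitigates class imbalance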
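A quick numeric check of the focusing term: for an easy example (p = 0.9) and a hard one (p = 0.1), compare cross-entropy -log(p) with focal loss using α = 0.25 and γ = 2. Plain Python is enough; the probabilities are arbitrary illustrative values.

```python
import math


def cross_entropy(p):
    """CE for the predicted probability p of the true class."""
    return -math.log(p)


def focal_loss(p, alpha=0.25, gamma=2.0):
    """Focal loss: cross-entropy scaled by alpha * (1 - p) ** gamma."""
    return -alpha * (1 - p) ** gamma * math.log(p)


for p in (0.9, 0.1):
    ce, fl = cross_entropy(p), focal_loss(p)
    print(f"p={p}: CE={ce:.4f}  Focal={fl:.4f}  ratio={fl / ce:.4f}")
```

The easy example's loss is scaled by 0.0025 (a 400x reduction), the hard one's only by 0.2025, so hard examples dominate the total loss and its gradient.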

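Label Smoothing Loss

Formula (K classes, y the true class, matching the LabelSmoothingLoss implementation above):

LS Loss = -(1-ε)·log(p_y) - (ε/(K-1))·Σ_{i≠y} log(p_i)

Parameters:

ε: smoothing parameter (commonly 0.1)

Characteristics:

Prevents the model from becoming overconfident
Often improves generalization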
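The label smoothing loss can be checked numerically. The NumPy sketch below builds the smoothed target distribution the same way as the LabelSmoothingLoss class earlier (1 - ε on the true class, ε / (K - 1) on each other class) and confirms that ε = 0 reduces to ordinary cross-entropy. The logits are arbitrary example values and the function names are illustrative.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def label_smoothing_loss(logits, target, epsilon=0.1):
    """-(1-ε)·log p_y - (ε/(K-1))·Σ_{i≠y} log p_i for a single example."""
    log_probs = log_softmax(np.asarray(logits, dtype=float))
    num_classes = log_probs.shape[-1]
    true_dist = np.full(num_classes, epsilon / (num_classes - 1))
    true_dist[target] = 1.0 - epsilon
    return float(-(true_dist * log_probs).sum())


logits = [2.0, 0.5, -1.0]
ce = label_smoothing_loss(logits, target=0, epsilon=0.0)   # plain cross-entropy
ls = label_smoothing_loss(logits, target=0, epsilon=0.1)
print(f"CE={ce:.4f}  LS={ls:.4f}")
```

For this confident, correct prediction the smoothed loss is higher than plain cross-entropy: the penalty on the low-probability classes is what discourages overconfidence.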

2. Regression Loss Functions

MSE Loss

Formula:

MSE = (1/n) × Σ(y - ŷ)²

MAE Loss

Formula:

MAE = (1/n) × Σ|y - ŷ|

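Huber Loss

Formula:

Huber = { 0.5 × (y - ŷ)²,  if |y - ŷ| ≤ δ
        { δ × |y - ŷ| - 0.5 × δ²,  otherwise

Characteristics:

More robust to outliers than MSE
Combines the strengths of MSE and MAE: quadratic (smooth) for small errors, linear for large ones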
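The piecewise Huber formula translates directly into NumPy. The sketch below compares MSE, MAE and Huber (δ = 1.0) on residuals that include one large outlier; the data values are illustrative and the helper name huber is hypothetical.

```python
import numpy as np


def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * abs_error - 0.5 * delta ** 2
    return float(np.mean(np.where(abs_error <= delta, quadratic, linear)))


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 14.0])  # the last prediction is off by 10 (an outlier)

mse = float(np.mean((y_true - y_pred) ** 2))
mae = float(np.mean(np.abs(y_true - y_pred)))
hub = huber(y_true, y_pred, delta=1.0)
print(f"MSE={mse:.4f}  MAE={mae:.4f}  Huber={hub:.4f}")
```

The single outlier contributes 100 to the MSE sum but only 9.5 to the Huber sum, which is what "more robust to outliers" means in practice.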

3. Object Detection Loss Functions

IoU Loss

Formula:

IoU Loss = 1 - IoU

GIoU Loss

Formula (C is the smallest box enclosing both A and B):

GIoU Loss = 1 - GIoU
GIoU = IoU - |C \ (A ∪ B)| / |C|

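CIoU Loss

Formula:

CIoU Loss = 1 - CIoU
CIoU = IoU - ρ²(b, b^gt) / c² - αv

where ρ is the Euclidean distance between the centers of the predicted box b and the ground-truth box b^gt, c is the diagonal length of the smallest box enclosing both, v = (4/π²)·(arctan(w^gt/h^gt) - arctan(w/h))² measures aspect-ratio consistency, and α = v / ((1 - IoU) + v).

Characteristics:

The most comprehensive of these IoU-based losses
Accounts for overlap, center distance, and aspect ratio at the same time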
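The calculate_*_batch helpers used in the detection-loss code earlier are not defined in this article. As a sketch for a single pair of boxes, assuming (x1, y1, x2, y2) format with positive width and height, IoU, GIoU and CIoU can be computed as follows. The function names are illustrative, and the CIoU terms follow the standard definitions noted in the comments.

```python
import math


def _area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def _inter_union(a, b):
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    return inter, _area(a) + _area(b) - inter


def _enclosing_box(a, b):
    """Smallest box C that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def iou(a, b):
    inter, union = _inter_union(a, b)
    return inter / union


def giou(a, b):
    """GIoU = IoU - |C minus (A ∪ B)| / |C|."""
    inter, union = _inter_union(a, b)
    c_area = _area(_enclosing_box(a, b))
    return inter / union - (c_area - union) / c_area


def ciou(pred, gt):
    """CIoU = IoU - rho^2 / c^2 - alpha * v."""
    iou_value = iou(pred, gt)
    # rho^2: squared distance between the two box centers
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 + \
           (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    # c^2: squared diagonal of the smallest enclosing box
    c = _enclosing_box(pred, gt)
    c2 = (c[2] - c[0]) ** 2 + (c[3] - c[1]) ** 2
    # v: aspect-ratio consistency term; alpha: its trade-off weight
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou_value) + v) if v > 0 else 0.0
    return iou_value - rho2 / c2 - alpha * v


box_a, box_b = (0, 0, 2, 2), (1, 1, 3, 3)   # partially overlapping
print(iou(box_a, box_b), giou(box_a, box_b), ciou(box_a, box_b))
print(giou((0, 0, 1, 1), (2, 0, 3, 1)))     # disjoint boxes: IoU = 0, GIoU < 0
```

Each loss is 1 minus the corresponding value, for example 1 - giou(pred, gt). GIoU stays informative (negative) for non-overlapping boxes, whereas plain IoU is 0 for every disjoint pair and provides no gradient.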

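Optimizer Metrics

1. SGD Optimizer

Characteristics:

Simple and stable
Requires careful manual tuning of the learning rate (usually together with a schedule)
May get stuck in local optima

2. Adam Optimizer

Characteristics:

Per-parameter adaptive learning rates
Usually converges quickly
Higher memory usage: it keeps two extra buffers (first- and second-moment estimates) for every parameter

3. AdamW Optimizer

Characteristics:

An improved variant of Adam
Decouples weight decay from the gradient-based update
Often generalizes better than Adam with L2 regularization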
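The difference between Adam with L2 regularization and AdamW's decoupled weight decay is easiest to see in a single update step. The NumPy sketch below follows the standard Adam update (bias-corrected moment estimates m and v, the two extra buffers that account for Adam's higher memory use); the function name adam_step, the hyperparameter values, and the toy numbers are illustrative.

```python
import numpy as np


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One Adam (decoupled=False) or AdamW (decoupled=True) update; returns new (w, m, v)."""
    if decoupled:
        # AdamW: shrink the weights directly, independent of the adaptive scaling
        w = w * (1 - lr * weight_decay)
    else:
        # Adam + L2: the decay term joins the gradient and is then adaptively rescaled
        grad = grad + weight_decay * w
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


w0 = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])
zeros = np.zeros_like(w0)
w_adam, _, _ = adam_step(w0, g, zeros, zeros, t=1, lr=0.1, weight_decay=0.1)
w_adamw, _, _ = adam_step(w0, g, zeros, zeros, t=1, lr=0.1, weight_decay=0.1, decoupled=True)
print(w_adam, w_adamw)
```

In this first step, Adam's normalization rescales the combined gradient to roughly lr per coordinate, so the L2 term has almost no visible effect; AdamW shrinks each weight by exactly lr × weight_decay × w regardless of the gradient scale.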

4. Optimizer Performance Comparison

Python implementation:

python

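import copy

import matplotlib.pyplot as plt

def compare_optimizers(model, train_loader, optimizer_factories, loss_fn, num_epochs=10):
    """
    Compare the training loss of different optimizers.
    optimizer_factories maps a name to a callable that builds an optimizer from parameters, e.g.
    {'SGD': lambda p: torch.optim.SGD(p, lr=0.01), 'Adam': lambda p: torch.optim.Adam(p, lr=1e-3)}.
    Each optimizer is created on the copied model's parameters; an optimizer built on the
    original model would update the original weights instead of the copy.
    train_loader is assumed to yield (inputs, targets) batches.
    """
    results = {}
    
    for opt_name, make_optimizer in optimizer_factories.items():
        model_copy = copy.deepcopy(model)  # every optimizer starts from the same initial weights
        optimizer = make_optimizer(model_copy.parameters())
        model_copy.train()
        train_losses = []
        
        for epoch in range(num_epochs):
            epoch_loss = 0.0
            for inputs, targets in train_loader:
                # One standard training step
                optimizer.zero_grad()
                loss = loss_fn(model_copy(inputs), targets)
                loss.backward()
                optimizer.step()
                epoch_loss += loss.item()
            
            train_losses.append(epoch_loss / len(train_loader))
        
        results[opt_name] = train_losses
    
    # Visualization
    plt.figure(figsize=(10, 6))
    for opt_name, losses in results.items():
        plt.plot(losses, label=opt_name)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Optimizer Comparison')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()
    
    return results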

Chart generation code:

python

from docs.ml_metrics_visualization_complete import plot_optimizer_comparison
plot_optimizer_comparison()

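Regularization Metrics

1. L1 Regularization (Lasso)

Formula:

Loss = Original Loss + λ × Σ|w|

Characteristics:

Produces sparse weights (many weights become exactly zero)
Acts as a form of feature selection

2. L2 Regularization (Ridge)

Formula:

Loss = Original Loss + λ × Σw²

Characteristics:

Prevents weights from growing too large
Improves generalization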
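To make the two penalty terms concrete, the NumPy sketch below evaluates λΣ|w| and λΣw² for a toy weight vector, together with their (sub)gradients. The L1 gradient has constant magnitude λ, which keeps pushing small weights all the way to zero (sparsity), whereas the L2 gradient 2λw shrinks in proportion to the weight. The λ value, the weights, and the helper names are illustrative.

```python
import numpy as np


def l1_penalty(w, lam):
    return lam * np.sum(np.abs(w))


def l2_penalty(w, lam):
    return lam * np.sum(w ** 2)


def l1_grad(w, lam):
    # Subgradient of λΣ|w|; np.sign returns 0 at w == 0
    return lam * np.sign(w)


def l2_grad(w, lam):
    return 2 * lam * w


w = np.array([0.5, -0.01, 2.0])
lam = 0.1
print("L1 penalty:", l1_penalty(w, lam), "grad:", l1_grad(w, lam))
print("L2 penalty:", l2_penalty(w, lam), "grad:", l2_grad(w, lam))
```

Note the tiny weight -0.01: L1 still pushes it with force 0.1, while L2 barely touches it (0.002). In PyTorch, the weight_decay argument of torch.optim.SGD adds weight_decay × w to the gradient, which corresponds to the L2 term with λ = weight_decay / 2.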

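3. Dropout

Definition: during training, randomly zero out each neuron's activation with probability p; dropout is disabled in evaluation mode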
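A minimal NumPy sketch of inverted dropout, the variant PyTorch's nn.Dropout uses: during training each activation is zeroed with probability p and the survivors are scaled by 1 / (1 - p), so the expected activation is unchanged and nothing needs rescaling at inference time. The function name, seed, and array size are arbitrary choices for this sketch.

```python
import numpy as np


def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each element with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return x  # identity at inference time
    rng = np.random.default_rng() if rng is None else rng
    keep_mask = rng.random(x.shape) >= p
    return x * keep_mask / (1.0 - p)


rng = np.random.default_rng(0)
x = np.ones(100_000)
y = dropout(x, p=0.3, rng=rng)
print("fraction zeroed:", np.mean(y == 0), "mean activation:", y.mean())
```

Because of the 1 / (1 - p) scaling, the mean activation stays close to 1 even though about 30% of the units are dropped.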

Python implementation:

python

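import matplotlib.pyplot as plt
import torch.nn as nn

class DropoutTracker:
    """
    Tracks the effect of different dropout rates
    """
    def __init__(self):
        self.train_accuracies = []
        self.val_accuracies = []
        self.dropout_rates = []
    
    def evaluate_dropout_rate(self, model, dropout_rate, train_loader, val_loader,
                              train_fn, evaluate_fn):
        """
        Evaluate one dropout rate.
        train_fn(model, loader) and evaluate_fn(model, loader) -> accuracy are user-supplied
        helpers (not defined in this article). Pass a freshly initialized model for each rate:
        changing p only matters during training, because nn.Dropout is inactive in eval mode.
        """
        # Set the dropout rate
        for module in model.modules():
            if isinstance(module, nn.Dropout):
                module.p = dropout_rate
        
        # Train with this dropout rate, then evaluate
        train_fn(model, train_loader)
        train_acc = evaluate_fn(model, train_loader)
        val_acc = evaluate_fn(model, val_loader)
        
        self.dropout_rates.append(dropout_rate)
        self.train_accuracies.append(train_acc)
        self.val_accuracies.append(val_acc)
    
    def plot(self):
        plt.figure(figsize=(10, 6))
        plt.plot(self.dropout_rates, self.train_accuracies, 'o-', label='Train')
        plt.plot(self.dropout_rates, self.val_accuracies, 'o-', label='Val')
        plt.xlabel('Dropout Rate')
        plt.ylabel('Accuracy')
        plt.title('Dropout Rate Analysis')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.show()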

Chart generation code:

python

from docs.ml_metrics_visualization_complete import plot_regularization_effects
plot_regularization_effects()

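4. Batch Normalization

Definition: normalize each feature using the mean and variance of the current mini-batch, then apply a learnable scale (γ) and shift (β)

Metrics to watch:

Training stability
Convergence speed
Final performance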
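The training-time computation of Batch Normalization for a fully connected layer can be sketched in a few lines of NumPy: normalize each feature with the mini-batch mean and variance, then apply the learnable scale γ and shift β. This sketch omits the running mean and variance that frameworks track for inference; eps = 1e-5 matches PyTorch's default, and the input data is random.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-5):
    """x has shape (batch, features); statistics are computed per feature over the batch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)  # biased variance, as used for normalization during training
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # features far from zero mean / unit variance
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(4))
```

With γ = 1 and β = 0, each output feature has roughly zero mean and unit standard deviation; during training the network learns γ and β to restore whatever scale and offset suit the next layer.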

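Summary

Key Takeaways

Training process metrics
- Loss: training loss, validation loss
- Accuracy: training accuracy, validation accuracy
- Gradients: gradient norm, gradient clipping
- Learning rate: learning rate scheduling, learning rate tracking
Model parameter metrics
- Parameter count: total parameters, trainable parameters
- FLOPs: computational complexity
- Model size: storage space
- Memory usage: training memory, inference memory
Hyperparameter tuning metrics
- Search strategies: grid search, random search, Bayesian optimization
- Hyperparameter importance: sensitivity analysis
- Visualization: hyperparameter relationship plots
Model validation metrics
- Cross-validation: K-fold, stratified K-fold
- Hold-out: train/test split
- Time series validation: time series cross-validation

Best Practices

Training monitoring
- Monitor loss and accuracy in real time
- Use early stopping
- Save the best model
Model optimization
- Balance model complexity against performance
- Take computational resource limits into account
- Tune hyperparameters
Validation strategy
- Use cross-validation to evaluate generalization
- Keep data distributions consistent across splits
- Avoid data leakage

Next Steps

In the advanced part, we will cover:

Post-processing metrics: NMS, threshold optimization, etc.
Model quantization metrics: accuracy loss, compression ratio, speedup, etc.
Deployment metrics: latency, throughput, resource usage, etc.
End-to-end monitoring metrics: full-pipeline performance evaluation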
