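Machine Learning Evaluation Metrics Explained - Beginner's Guide

This is the first article in a series on machine learning evaluation metrics. Aimed at beginners, it introduces the basic metrics used across the whole machine learning workflow, including core metrics for data quality, preprocessing, classification, regression, clustering, and other tasks.

Introduction

The success of a machine learning project depends not only on the model algorithm but also on a solid understanding of the data and on metric monitoring throughout the workflow. This guide walks through the metrics used from data preparation to model evaluation.

Overview of the Machine Learning Workflow

Data collection → Data quality assessment → Data preprocessing → Feature engineering → Model training → Model validation → Model deployment → Model monitoring
       ↓                     ↓                       ↓                    ↓                   ↓                 ↓                  ↓                  ↓
 Data metrics         Quality metrics      Preprocessing metrics   Feature metrics   Training metrics  Validation metrics  Deployment metrics  Monitoring metrics

Why are comprehensive metrics needed?

Data quality metrics: make sure the data is usable and avoid "garbage in, garbage out"
Preprocessing metrics: check that preprocessing did what it was meant to do
Training metrics: monitor the training process and catch problems such as overfitting early
Validation metrics: assess generalization and select the best model
Deployment metrics: make sure the model performs as expected in production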

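Data Quality Metrics

1. Data Completeness Metrics

Missing Value Rate

Definition: the proportion of missing values in a column relative to the total number of samples

Formula:

Missing value rate = (number of missing values / total number of samples) × 100%

Rating guide (rules of thumb; acceptable levels depend on the task and on why values are missing):

Excellent: < 5%
Good: 5% - 10%
Fair: 10% - 20%
Poor: > 20%

Python implementation: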

import pandas as pd
import numpy as np

def calculate_missing_rate(df):
    """
    Compute the missing value rate of each column.

    Args:
        df: pandas DataFrame

    Returns:
        dict: per-column missing count, missing rate (%), and rating
    """
    missing_stats = {}
    total_rows = len(df)
    
    for col in df.columns:
        missing_count = df[col].isna().sum()
        missing_rate = (missing_count / total_rows) * 100
        missing_stats[col] = {
            'missing_count': missing_count,
            'missing_rate': missing_rate,
            'status': 'excellent' if missing_rate < 5 else 
                     'good' if missing_rate < 10 else 
                     'fair' if missing_rate < 20 else 'poor'
        }
    
    return missing_stats

# Example
df = pd.DataFrame({
    'feature1': [1, 2, np.nan, 4, 5],
    'feature2': [1, np.nan, np.nan, 4, 5],
    'feature3': [1, 2, 3, 4, 5]
})

missing_stats = calculate_missing_rate(df)
for col, stats in missing_stats.items():
    print(f"{col}: {stats['missing_rate']:.2f}% ({stats['status']})")
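
The per-column loop above can also be written in vectorized form. The following is a minimal sketch, assuming the same toy data; the name `df_demo` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same toy data as in the example above (df_demo is an illustrative name)
df_demo = pd.DataFrame({
    "feature1": [1, 2, np.nan, 4, 5],
    "feature2": [1, np.nan, np.nan, 4, 5],
    "feature3": [1, 2, 3, 4, 5],
})

# isna() yields a boolean frame; the column mean of booleans is the missing fraction
missing_rate_pct = df_demo.isna().mean() * 100
print(missing_rate_pct)
```

This gives the same rates as `calculate_missing_rate` without the rating labels, which is often enough for a first look at a dataset.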

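Visualization: generate the chart with the following code.

# Run the complete visualization script to generate all charts (shell command):
#   python docs/ml_metrics_visualization_complete.py

# Or generate only the missing-value analysis chart (Python):
from docs.ml_metrics_visualization_complete import plot_missing_value_analysis
plot_missing_value_analysis()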

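Data Coverage Rate

Definition: the proportion of valid records among all records (what counts as "valid" has to be defined per project)

Formula:

Data coverage rate = (number of valid records / total number of records) × 100%

Use cases:

Assessing data collection quality
Identifying blind spots in data collection
Guiding the strategy for collecting additional data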
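
The formula above can be sketched in a few lines. This is only an illustration: the helper `coverage_rate`, the validity rule (an age between 0 and 120), and the sample values are all assumptions, since the article does not define what "valid" means.

```python
import numpy as np
import pandas as pd

def coverage_rate(values, is_valid):
    """Percentage of entries that are present and pass is_valid (a hypothetical rule)."""
    values = pd.Series(values)
    if len(values) == 0:
        return float("nan")
    # An entry counts as valid only if it is not missing and satisfies the rule
    valid = [pd.notna(v) and bool(is_valid(v)) for v in values]
    return 100.0 * sum(valid) / len(values)

# Hypothetical "age" column: one missing value and two out-of-range values
ages = [25, 41, np.nan, -3, 130, 58]
rate = coverage_rate(ages, lambda a: 0 <= a <= 120)
print(f"coverage: {rate:.1f}%")
```

Passing the rule as a function keeps the metric reusable across columns with different validity definitions.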

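2. Data Consistency Metrics

Duplicate Rate

Definition: the proportion of duplicate samples in the dataset

Formula:

Duplicate rate = (number of duplicate samples / total number of samples) × 100%

Python implementation: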

def calculate_duplicate_rate(df, subset=None):
    """
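    Compute the duplicate rate.

    Args:
        df: pandas DataFrame
        subset: columns used to identify duplicates (None means all columns)

    Returns:
        dict: duplicate count, duplicate rate (%), and number of unique rows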
    """
    total_rows = len(df)
    duplicate_count = df.duplicated(subset=subset).sum()
    duplicate_rate = (duplicate_count / total_rows) * 100
    
    return {
        'duplicate_count': duplicate_count,
        'duplicate_rate': duplicate_rate,
        'unique_rows': total_rows - duplicate_count
    }
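
One detail worth checking when counting duplicates: by default `DataFrame.duplicated` marks only the second and later copies of a row, not the first occurrence. A small sketch on made-up data (`df_dup` is an illustrative name):

```python
import pandas as pd

# Made-up data: the row (1, "x") appears twice and the row (3, "z") three times
df_dup = pd.DataFrame({
    "a": [1, 1, 2, 3, 3, 3],
    "b": ["x", "x", "y", "z", "z", "z"],
})

# Default keep='first': only the extra copies are flagged
extra_copies = int(df_dup.duplicated().sum())

# keep=False: every row that belongs to a duplicated group is flagged
rows_in_dup_groups = int(df_dup.duplicated(keep=False).sum())

print(extra_copies, rows_in_dup_groups)
```

The function above uses the default, so its duplicate rate is the share of rows that would be dropped by `drop_duplicates()`.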

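Data Consistency Checks

Items to check:

Format consistency: date formats, number formats, etc.
Encoding consistency: character encodings, category encodings, etc.
Unit consistency: uniform units of measurement

Python implementation (this function covers only one signal: numeric outliers flagged by the IQR rule, which can hint at unit or data-entry inconsistencies; format and encoding checks need column-specific rules):

def check_data_consistency(df):
    """
    Flag numeric columns that contain IQR outliers.
    """
    consistency_issues = []

    # Inspect numeric columns only
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    for col in numeric_cols:
        # Detect outliers with the IQR rule (below Q1 - 1.5*IQR or above Q3 + 1.5*IQR)
        Q1 = df[col].quantile(0.25)
        Q3 = df[col].quantile(0.75)
        IQR = Q3 - Q1
        outliers = df[(df[col] < Q1 - 1.5*IQR) | (df[col] > Q3 + 1.5*IQR)]
        if len(outliers) > 0:
            consistency_issues.append({
                'column': col,
                'issue': 'outliers',
                'count': len(outliers),
                'rate': len(outliers) / len(df) * 100
            })

    return consistency_issues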
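
For the format-consistency item in the checklist, one possible approach is to measure how many entries match an expected pattern. The sketch below is an assumption-laden illustration: the helper `format_consistency_rate`, the ISO-style date pattern, and the sample strings are not from the article.

```python
import pandas as pd

def format_consistency_rate(series, pattern):
    """Percentage of non-missing entries that fully match a regex (hypothetical helper)."""
    s = pd.Series(series).dropna().astype(str)
    if s.empty:
        return float("nan")
    # fullmatch requires the entire string to match the pattern
    return 100.0 * s.str.fullmatch(pattern).mean()

# Made-up date column mixing three formats
dates = pd.Series(["2024-01-05", "2024/01/06", "2024-01-07", "05.01.2024"])
iso_rate = format_consistency_rate(dates, r"\d{4}-\d{2}-\d{2}")
print(f"ISO-formatted dates: {iso_rate:.1f}%")
```

A pattern check only verifies the shape of the string, not whether the date itself is valid.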

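3. Data Distribution Metrics

Skewness

Definition: measures the asymmetry of a distribution

Formula:

Skewness = E[(X - μ)³] / σ³

Interpretation:

Skewness = 0: symmetric distribution (e.g. the normal distribution)
Skewness > 0: right-skewed (positive skew), with the tail extending to the right
Skewness < 0: left-skewed (negative skew), with the tail extending to the left

Rating guide:

|Skewness| < 0.5: approximately symmetric
0.5 ≤ |Skewness| < 1: moderately skewed
|Skewness| ≥ 1: highly skewed

Python implementation: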

from scipy import stats

def calculate_skewness(data):
    """
    Compute skewness.
    """
    skewness = stats.skew(data)

    if abs(skewness) < 0.5:
        interpretation = "approximately symmetric"
    elif abs(skewness) < 1:
        interpretation = "moderately skewed"
    else:
        interpretation = "highly skewed"

    return {
        'skewness': skewness,
        'interpretation': interpretation,
        'direction': 'right-skewed' if skewness > 0 else 'left-skewed' if skewness < 0 else 'symmetric'
    }

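Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_data_distribution_analysis
plot_data_distribution_analysis()

Kurtosis

Definition: measures how heavy the tails of a distribution are relative to the normal distribution (often described, less accurately, as "peakedness")

Formula:

Excess kurtosis = E[(X - μ)⁴] / σ⁴ - 3

Interpretation:

Kurtosis = 0: same tail weight as the normal distribution
Kurtosis > 0: leptokurtic, with heavier tails and more extreme values
Kurtosis < 0: platykurtic, with lighter tails and fewer extreme values

Python implementation (scipy's stats.kurtosis returns excess kurtosis by default, matching the formula above):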

def calculate_kurtosis(data):
    """
    Compute excess kurtosis.
    """
    kurtosis = stats.kurtosis(data)

    if abs(kurtosis) < 0.5:
        interpretation = "close to normal"
    elif kurtosis > 0:
        interpretation = "leptokurtic: heavier tails than normal"
    else:
        interpretation = "platykurtic: lighter tails than normal"
    
    return {
        'kurtosis': kurtosis,
        'interpretation': interpretation
    }

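4. Data Balance Metrics

Class Imbalance Ratio

Definition: measures how unevenly samples are distributed across classes in a classification task

Formula:

Imbalance ratio = max(samples per class) / min(samples per class)

Rating guide:

Imbalance ratio < 2: balanced
2 ≤ imbalance ratio < 10: mildly imbalanced
10 ≤ imbalance ratio < 100: moderately imbalanced
Imbalance ratio ≥ 100: severely imbalanced

Python implementation: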

def calculate_class_imbalance(y):
    """
    Compute the class imbalance ratio.

    Args:
        y: array of class labels

    Returns:
        dict: imbalance statistics
    """
    from collections import Counter

    class_counts = Counter(y)
    counts = list(class_counts.values())

    max_count = max(counts)
    min_count = min(counts)
    imbalance_ratio = max_count / min_count if min_count > 0 else float('inf')

    # Share of each class
    total = sum(counts)
    class_proportions = {cls: count/total for cls, count in class_counts.items()}

    if imbalance_ratio < 2:
        status = "balanced"
    elif imbalance_ratio < 10:
        status = "mildly imbalanced"
    elif imbalance_ratio < 100:
        status = "moderately imbalanced"
    else:
        status = "severely imbalanced"
    
    return {
        'imbalance_ratio': imbalance_ratio,
        'status': status,
        'class_counts': class_counts,
        'class_proportions': class_proportions,
        'max_class': max(class_counts, key=class_counts.get),
        'min_class': min(class_counts, key=class_counts.get)
    }

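Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_class_imbalance_analysis
plot_class_imbalance_analysis()

Preprocessing Metrics

1. Standardization / Normalization Metrics

Z-Score Standardization Quality

Definition: evaluates the distribution of the data after standardization

Formula:

Z = (X - μ) / σ

Checks:

Mean close to 0: standardization succeeded
Standard deviation close to 1: standardization succeeded
Value range: mostly within [-3, 3] when the data is roughly normal; values outside this range point to outliers or heavy tails

Python implementation: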

from sklearn.preprocessing import StandardScaler
import numpy as np

def evaluate_scaling(data, method='standard'):
    """
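    Evaluate the effect of standardization / normalization.

    Args:
        data: original data (2D array-like, samples × features)
        method: 'standard' (Z-score) or 'minmax' (Min-Max)

    Returns:
        dict: evaluation results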
    """
    if method == 'standard':
        scaler = StandardScaler()
        scaled_data = scaler.fit_transform(data)
        
        evaluation = {
            'original_mean': np.mean(data, axis=0),
            'scaled_mean': np.mean(scaled_data, axis=0),
            'original_std': np.std(data, axis=0),
            'scaled_std': np.std(scaled_data, axis=0),
            'scaled_min': np.min(scaled_data, axis=0),
            'scaled_max': np.max(scaled_data, axis=0),
            'mean_error': np.abs(np.mean(scaled_data, axis=0)),
            'std_error': np.abs(np.std(scaled_data, axis=0) - 1)
        }
        
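        # Rate the result. On the data the scaler was fitted on, both errors are ~0 by
        # construction, so this is a sanity check; it is informative mainly on new data.
        mean_quality = "excellent" if np.all(evaluation['mean_error'] < 0.01) else \
                      "good" if np.all(evaluation['mean_error'] < 0.1) else "fair"
        std_quality = "excellent" if np.all(evaluation['std_error'] < 0.01) else \
                     "good" if np.all(evaluation['std_error'] < 0.1) else "fair"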
        
        evaluation['mean_quality'] = mean_quality
        evaluation['std_quality'] = std_quality
        
    elif method == 'minmax':
        from sklearn.preprocessing import MinMaxScaler
        scaler = MinMaxScaler()
        scaled_data = scaler.fit_transform(data)
        
        evaluation = {
            'original_min': np.min(data, axis=0),
            'original_max': np.max(data, axis=0),
            'scaled_min': np.min(scaled_data, axis=0),
            'scaled_max': np.max(scaled_data, axis=0),
            'range': np.max(scaled_data, axis=0) - np.min(scaled_data, axis=0)
        }
        
        # Rate the result (each feature should span exactly [0, 1])
        range_quality = "excellent" if np.allclose(evaluation['range'], 1.0) else "fair"
        evaluation['range_quality'] = range_quality
    
    return evaluation

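Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_scaling_comparison
plot_scaling_comparison()

2. Feature Selection Metrics

Feature Importance Scores

Definition: estimates how much each feature contributes to predicting the target variable

Common methods:

Mutual information
Chi-square test
Correlation coefficient
Model-based feature importance (e.g. from tree ensembles)

Python implementation: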

import numpy as np
from sklearn.feature_selection import mutual_info_classif, chi2, f_classif

def evaluate_feature_importance(X, y, method='mutual_info'):
    """
    Score features against a classification target.

    Args:
        X: feature matrix (n_samples × n_features)
        y: target labels
        method: 'mutual_info', 'chi2' (requires non-negative features),
                'f_test', or 'correlation' (numeric labels only)

    Returns:
        dict: feature importance scores
    """
    X = np.asarray(X)
    y = np.asarray(y)

    if method == 'mutual_info':
        scores = mutual_info_classif(X, y, random_state=42)
    elif method == 'chi2':
        scores, _ = chi2(X, y)
    elif method == 'f_test':
        scores, _ = f_classif(X, y)
    elif method == 'correlation':
        # Absolute Pearson correlation between each feature and the target
        scores = np.abs([np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])])
    else:
        raise ValueError(f"Unknown method: {method}")

    # Min-max normalize the scores to [0, 1]
    scores_normalized = (scores - scores.min()) / (scores.max() - scores.min() + 1e-10)

    feature_importance = {
        'scores': scores,
        'scores_normalized': scores_normalized,
        'ranking': np.argsort(scores)[::-1],  # highest score first
        'top_k_features': np.argsort(scores)[::-1][:10]  # top 10 features
    }

    return feature_importance

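Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_feature_importance
plot_feature_importance()

3. Feature Engineering Metrics

Feature Correlation Matrix

Definition: measures pairwise correlation between features to identify redundant ones

Formula:

Pearson correlation coefficient r = Cov(X, Y) / (σX × σY)

Rating guide:

|r| < 0.3: weak correlation
0.3 ≤ |r| < 0.7: moderate correlation
|r| ≥ 0.7: strong correlation (possible redundancy)

Python implementation: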

import seaborn as sns
import matplotlib.pyplot as plt

def analyze_feature_correlation(df, threshold=0.7):
    """
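    Analyze feature correlations.

    Args:
        df: DataFrame of features
        threshold: absolute correlation at or above which a pair is flagged

    Returns:
        dict: correlation analysis results
    """
    # Pearson correlation over numeric columns only (pandas >= 2.0 raises on non-numeric columns otherwise)
    corr_matrix = df.corr(numeric_only=True)

    # Find highly correlated feature pairs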
    high_corr_pairs = []
    for i in range(len(corr_matrix.columns)):
        for j in range(i+1, len(corr_matrix.columns)):
            corr_value = corr_matrix.iloc[i, j]
            if abs(corr_value) >= threshold:
                high_corr_pairs.append({
                    'feature1': corr_matrix.columns[i],
                    'feature2': corr_matrix.columns[j],
                    'correlation': corr_value
                })
    
    # Map each feature to the features it is highly correlated with
    redundant_features = {}
    for pair in high_corr_pairs:
        f1, f2 = pair['feature1'], pair['feature2']
        if f1 not in redundant_features:
            redundant_features[f1] = []
        if f2 not in redundant_features:
            redundant_features[f2] = []
        redundant_features[f1].append(f2)
        redundant_features[f2].append(f1)
    
    return {
        'correlation_matrix': corr_matrix,
        'high_corr_pairs': high_corr_pairs,
        'redundant_features': redundant_features,
        'recommendation': f"Consider dropping one feature from each pair with |r| >= {threshold}"
    }

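Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_correlation_heatmap
plot_correlation_heatmap()

Classification Metrics

1. Basic Classification Metrics

Accuracy

Definition: the proportion of correctly predicted samples among all samples

Formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives

When to use:

Datasets with balanced classes
Cases where all types of error cost the same

Limitations:

Misleading when classes are imbalanced
Does not distinguish between error types

Python implementation: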

from sklearn.metrics import accuracy_score

def calculate_accuracy(y_true, y_pred):
    """
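    Compute accuracy.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = accuracy_score(y_true, y_pred)

    return {
        'accuracy': accuracy,
        'correct_predictions': int(np.sum(y_true == y_pred)),  # arrays, so == is element-wise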
        'total_samples': len(y_true),
        'error_rate': 1 - accuracy
    }
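
The limitation on imbalanced data is easy to reproduce. Below is a minimal sketch on made-up labels (95 negatives, 5 positives) with a degenerate "model" that always predicts the negative class; the data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Made-up imbalanced labels: 95 negatives, 5 positives
y_true_imb = np.array([0] * 95 + [1] * 5)

# A degenerate classifier that always predicts the majority class
y_pred_majority = np.zeros(100, dtype=int)

acc = accuracy_score(y_true_imb, y_pred_majority)
rec = recall_score(y_true_imb, y_pred_majority)

print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

Accuracy looks high even though not a single positive sample is found, which is why precision, recall, and F1 are reported alongside it.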

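Precision

Definition: among all samples predicted as positive, the proportion that are actually positive

Formula:

Precision = TP / (TP + FP)

Meaning: of the samples the model labels positive, how many really are positive

Python implementation: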

from sklearn.metrics import precision_score

def calculate_precision(y_true, y_pred, average='binary'):
    """
    Compute precision.

    Args:
        average: 'binary', 'micro', 'macro', 'weighted'
    """
    precision = precision_score(y_true, y_pred, average=average)

    return {
        'precision': precision,
        'interpretation': 'Share of predicted positives that are actually positive'
    }

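Recall

Definition: among all samples that are actually positive, the proportion correctly predicted as positive

Formula:

Recall = TP / (TP + FN)

Meaning: of the truly positive samples, how many the model finds

Python implementation: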

from sklearn.metrics import recall_score

def calculate_recall(y_true, y_pred, average='binary'):
    """
    Compute recall.
    """
    recall = recall_score(y_true, y_pred, average=average)

    return {
        'recall': recall,
        'interpretation': 'Share of actual positives that are correctly predicted'
    }

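F1-Score

Definition: the harmonic mean of precision and recall

Formula:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Properties:

Balances precision and recall; it is high only when both are high
More informative than accuracy on imbalanced datasets

Python implementation: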

from sklearn.metrics import f1_score

def calculate_f1_score(y_true, y_pred, average='binary'):
    """
    Compute the F1 score.
    """
    f1 = f1_score(y_true, y_pred, average=average)

    return {
        'f1_score': f1,
        'interpretation': 'Harmonic mean of precision and recall'
    }

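2. Multiclass Metrics

Macro Average

Definition: the unweighted mean of the per-class metric (class sizes are ignored)

Formula:

Macro Precision = (P₁ + P₂ + ... + Pₙ) / n
Macro Recall = (R₁ + R₂ + ... + Rₙ) / n
Macro F1 = (F1₁ + F1₂ + ... + F1ₙ) / n

Properties:

Every class carries the same weight
Suitable when performance on each class matters, including rare ones

Micro Average

Definition: the metric computed after pooling TP, FP, and FN over all classes

Formula:

Micro Precision = Total TP / (Total TP + Total FP)
Micro Recall = Total TP / (Total TP + Total FN)

Properties:

Every sample contributes equally, so frequent classes dominate the result
In single-label multiclass problems, micro precision, recall, and F1 all equal accuracy

Weighted Average

Definition: the mean of the per-class metric, weighted by the number of true samples in each class

Formula:

Weighted Precision = Σ(Pᵢ × nᵢ) / Σnᵢ

Properties:

Classes with more samples carry more weight
Reflects the class distribution of the data, but can hide poor performance on rare classes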
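
The three averaging schemes can be compared directly with scikit-learn's `average` parameter. The labels below are made up for illustration: class 0 has 6 samples, class 1 has 3, class 2 has 1, and class 2 is never predicted.

```python
from sklearn.metrics import accuracy_score, precision_score

# Made-up three-class labels (class sizes 6 / 3 / 1)
y_true_mc = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred_mc = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# zero_division=0 scores the never-predicted class 2 as precision 0 without a warning
macro_p = precision_score(y_true_mc, y_pred_mc, average="macro", zero_division=0)
micro_p = precision_score(y_true_mc, y_pred_mc, average="micro", zero_division=0)
weighted_p = precision_score(y_true_mc, y_pred_mc, average="weighted", zero_division=0)
acc_mc = accuracy_score(y_true_mc, y_pred_mc)

print(f"macro={macro_p:.3f} micro={micro_p:.3f} weighted={weighted_p:.3f} accuracy={acc_mc:.3f}")
```

Per-class precision here is 5/7, 2/3, and 0. The macro average is pulled down most by the missed rare class, the weighted average less so, and the micro average coincides with accuracy.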

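3. Confusion Matrix in Detail

Binary Confusion Matrix

	Predicted positive	Predicted negative
Actual positive	TP (true positive)	FN (false negative)
Actual negative	FP (false positive)	TN (true negative)

Metrics derived from the confusion matrix: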

from sklearn.metrics import confusion_matrix

def analyze_confusion_matrix(y_true, y_pred):
    """
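    Analyze a binary confusion matrix.
    """
    cm = confusion_matrix(y_true, y_pred)
    # scikit-learn orders rows/columns by sorted label, so for labels {0, 1} the
    # layout is [[TN, FP], [FN, TP]] (negative class first, unlike the table above)
    tn, fp, fn, tp = cm.ravel()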
    
    metrics = {
        'confusion_matrix': cm,
        'TP': tp,
        'TN': tn,
        'FP': fp,
        'FN': fn,
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
        'precision': tp / (tp + fp) if (tp + fp) > 0 else 0,
        'recall': tp / (tp + fn) if (tp + fn) > 0 else 0,
        'specificity': tn / (tn + fp) if (tn + fp) > 0 else 0,  # true negative rate
        'f1_score': 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 0
    }
    
    return metrics

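Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_confusion_matrix_detailed
plot_confusion_matrix_detailed()

4. ROC Curve and AUC

ROC Curve (Receiver Operating Characteristic Curve)

Definition: a curve with the false positive rate (FPR) on the x-axis and the true positive rate (TPR) on the y-axis, traced out by sweeping the classification threshold

Formula:

TPR (Recall) = TP / (TP + FN)
FPR = FP / (FP + TN)

Properties:

Summarizes performance across all classification thresholds rather than at a single one
Allows different models to be compared

Python implementation: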

from sklearn.metrics import roc_curve, auc

def plot_roc_curve(y_true, y_scores):
    """
    Compute the points of the ROC curve and the AUC (plotting is left to the caller).
    """
    fpr, tpr, thresholds = roc_curve(y_true, y_scores)
    roc_auc = auc(fpr, tpr)
    
    return {
        'fpr': fpr,
        'tpr': tpr,
        'thresholds': thresholds,
        'auc': roc_auc,
        'interpretation': f'AUC = {roc_auc:.3f}; closer to 1 is better'
    }

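Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_roc_curve_multiple
plot_roc_curve_multiple()

AUC (Area Under the Curve)

Definition: the area under the ROC curve

Value range: [0, 1]

Rating guide (rules of thumb):

AUC = 1.0: perfect classifier
AUC > 0.9: excellent
0.7 < AUC ≤ 0.9: good
0.5 < AUC ≤ 0.7: fair
AUC = 0.5: random guessing
AUC < 0.5: worse than random guessing (the ranking is inverted)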
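
A useful way to read these values: AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half). The sketch below checks this on made-up scores; the data and variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and model scores
y_true_auc = np.array([0, 0, 1, 1, 0, 1])
y_score_auc = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

auc_sklearn = roc_auc_score(y_true_auc, y_score_auc)

# Pairwise view: compare every positive score with every negative score
pos = y_score_auc[y_true_auc == 1]
neg = y_score_auc[y_true_auc == 0]
diff = pos[:, None] - neg[None, :]
auc_pairs = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(f"roc_auc_score={auc_sklearn:.4f}, pairwise={auc_pairs:.4f}")
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so both computations give 8/9.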

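5. PR Curve (Precision-Recall Curve)

Definition: a curve with recall on the x-axis and precision on the y-axis

When to use:

Datasets with imbalanced classes
Cases where performance on the positive class matters most

Python implementation: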

from sklearn.metrics import precision_recall_curve, average_precision_score

def plot_pr_curve(y_true, y_scores):
    """
    Compute the points of the PR curve and the AP (plotting is left to the caller).
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    ap = average_precision_score(y_true, y_scores)
    
    return {
        'precision': precision,
        'recall': recall,
        'thresholds': thresholds,
        'ap': ap,  # Average Precision
        'interpretation': f'AP = {ap:.3f}; closer to 1 is better'
    }

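Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_pr_curve_detailed
plot_pr_curve_detailed()

Regression Metrics

1. Basic Regression Metrics

Mean Squared Error (MSE)

Definition: the mean of the squared differences between predicted and true values

Formula:

MSE = (1/n) × Σ(yᵢ - ŷᵢ)²

Properties:

Penalizes large errors more heavily
Its unit is the square of the target variable's unit

Python implementation: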

from sklearn.metrics import mean_squared_error

def calculate_mse(y_true, y_pred):
    """
    Compute the mean squared error.
    """
    mse = mean_squared_error(y_true, y_pred)

    return {
        'mse': mse,
        'rmse': np.sqrt(mse),  # root mean squared error
        'interpretation': 'MSE penalizes large errors more heavily'
    }

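Mean Absolute Error (MAE)

Definition: the mean of the absolute differences between predicted and true values

Formula:

MAE = (1/n) × Σ|yᵢ - ŷᵢ|

Properties:

Weights all errors in proportion to their size, so it is less sensitive to outliers than MSE
Has the same unit as the target variable, which makes it easier to interpret

Python implementation: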

from sklearn.metrics import mean_absolute_error

def calculate_mae(y_true, y_pred):
    """
    Compute the mean absolute error.
    """
    mae = mean_absolute_error(y_true, y_pred)

    return {
        'mae': mae,
        'interpretation': 'MAE weights all errors linearly and is easy to interpret'
    }

Root Mean Squared Error (RMSE)

Definition: the square root of the MSE

Formula:

RMSE = √MSE = √[(1/n) × Σ(yᵢ - ŷᵢ)²]

Properties:

Has the same unit as the target variable
More sensitive to large errors than MAE

Python implementation:

def calculate_rmse(y_true, y_pred):
    """
    Compute the root mean squared error.
    """
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)

    return {
        'rmse': rmse,
        'interpretation': 'RMSE is more sensitive to large errors'
    }

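2. Relative Error Metrics

Mean Absolute Percentage Error (MAPE)

Definition: the average prediction error expressed as a percentage of the true value

Formula:

MAPE = (100/n) × Σ|(yᵢ - ŷᵢ) / yᵢ|

Properties:

Dimensionless, which makes it easy to compare data of different magnitudes
Undefined when a true value is 0 and inflated when true values are close to 0

Python implementation: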

def calculate_mape(y_true, y_pred):
    """
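    Compute the mean absolute percentage error (undefined if any true value is 0).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

    return {
        'mape': mape,
        'interpretation': f'On average, predictions are off by {mape:.2f}% of the true value'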
    }

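Symmetric Mean Absolute Percentage Error (SMAPE)

Definition: a symmetric variant of MAPE that divides by the average magnitude of the true and predicted values. It is bounded by 200% and avoids division by zero unless both values are 0.

Formula:

SMAPE = (100/n) × Σ|yᵢ - ŷᵢ| / ((|yᵢ| + |ŷᵢ|) / 2)

Python implementation: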

def calculate_smape(y_true, y_pred):
    """
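    Compute the symmetric mean absolute percentage error.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    smape = 100 * np.mean(
        np.abs(y_true - y_pred) / ((np.abs(y_true) + np.abs(y_pred)) / 2)
    )

    return {
        'smape': smape,
        'interpretation': f'Symmetric mean absolute percentage error: {smape:.2f}%'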
    }
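
The sensitivity of MAPE to small true values, and the bounded behavior of SMAPE, show up clearly on a tiny made-up example in which one true value is close to zero. The numbers and variable names are illustrative assumptions.

```python
import numpy as np

# Made-up data: the third true value is close to zero
y_true_pct = np.array([100.0, 50.0, 0.5])
y_pred_pct = np.array([110.0, 45.0, 1.5])

# Per-sample absolute percentage errors: 10%, 10%, 200%
ape = np.abs((y_true_pct - y_pred_pct) / y_true_pct) * 100
mape_demo = ape.mean()

# Per-sample symmetric errors are capped at 200%
sape = np.abs(y_true_pct - y_pred_pct) / ((np.abs(y_true_pct) + np.abs(y_pred_pct)) / 2) * 100
smape_demo = sape.mean()

print(f"MAPE={mape_demo:.2f}%, SMAPE={smape_demo:.2f}%")
```

An absolute error of 1.0 on the small value contributes 200% to MAPE, dominating the two 10% errors on the larger values.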

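3. Coefficient of Determination (R² Score)

Definition: measures how well the model fits the data, as the share of the target's variance that the model explains

Formula:

R² = 1 - (SS_res / SS_tot)
    = 1 - [Σ(yᵢ - ŷᵢ)² / Σ(yᵢ - ȳ)²]

Value range: (-∞, 1]

Rating guide (rules of thumb; what counts as good depends heavily on the domain):

R² = 1: perfect fit
R² > 0.9: excellent
0.7 < R² ≤ 0.9: good
0.5 < R² ≤ 0.7: fair
0 < R² ≤ 0.5: poor
R² = 0: no better than always predicting the mean
R² < 0: worse than always predicting the mean

Python implementation: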

from sklearn.metrics import r2_score

def calculate_r2(y_true, y_pred):
    """
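    Compute the coefficient of determination R².
    """
    r2 = r2_score(y_true, y_pred)

    if r2 > 0.9:
        quality = "excellent"
    elif r2 > 0.7:
        quality = "good"
    elif r2 > 0.5:
        quality = "fair"
    else:
        quality = "poor"

    return {
        'r2_score': r2,
        'quality': quality,
        'interpretation': (f'The model explains {r2*100:.2f}% of the variance' if r2 >= 0
                           else 'The model fits worse than always predicting the mean')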
    }
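
The lower end of the R² range can be illustrated with two made-up predictors: one that always outputs the mean of the targets, and one that predicts the values in reverse order.

```python
from sklearn.metrics import r2_score

y_true_r2 = [1, 2, 3, 4, 5]

# Baseline: always predict the mean of y_true (3), so SS_res equals SS_tot
r2_mean = r2_score(y_true_r2, [3, 3, 3, 3, 3])

# A predictor that is worse than the baseline: SS_res = 40, SS_tot = 10
r2_reversed = r2_score(y_true_r2, [5, 4, 3, 2, 1])

print(r2_mean, r2_reversed)
```

The mean predictor scores exactly 0, and the reversed predictor scores 1 - 40/10 = -3, so a negative R² signals a model that should be replaced by the simple average.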

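Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_regression_metrics
plot_regression_metrics()

4. Quantile Error Metrics

Quantile Loss (Pinball Loss)

Definition: evaluates predictions of a given quantile τ of the target by penalizing under-prediction and over-prediction asymmetrically

Formula:

Quantile Loss = max(τ × (y - ŷ), (1-τ) × (ŷ - y))

Use cases:

When a prediction interval is needed rather than a point estimate
When errors in different directions have different costs

Python implementation: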

def calculate_quantile_loss(y_true, y_pred, quantile=0.5):
    """
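    Compute the quantile (pinball) loss averaged over samples.
    """
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    loss = np.maximum(quantile * error, (quantile - 1) * error)
    quantile_loss = np.mean(loss)

    return {
        'quantile_loss': quantile_loss,
        'quantile': quantile,
        'interpretation': f'Loss at the {quantile*100:g}% quantile'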
    }
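
The asymmetry is easiest to see by scoring an under-prediction and an over-prediction of the same size. The sketch below re-implements the formula in a small helper (`pinball`, an illustrative name) on made-up values.

```python
import numpy as np

def pinball(y_true, y_pred, q):
    """Mean quantile (pinball) loss for quantile q (illustrative helper)."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * e, (q - 1) * e)))

y_true_q = [10.0, 10.0, 10.0]
under = [8.0, 8.0, 8.0]    # predictions 2 units too low
over = [12.0, 12.0, 12.0]  # predictions 2 units too high

loss_under_90 = pinball(y_true_q, under, 0.9)  # 0.9 * 2
loss_over_90 = pinball(y_true_q, over, 0.9)    # 0.1 * 2
loss_median = pinball(y_true_q, under, 0.5)    # 0.5 * 2, half the MAE

print(loss_under_90, loss_over_90, loss_median)
```

For τ = 0.9, under-predicting costs nine times as much as over-predicting by the same amount, which pushes the fitted value toward the 90th percentile. For τ = 0.5 the loss is half the MAE.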

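Clustering Metrics

1. Internal Metrics

Silhouette Coefficient

Definition: measures how similar a sample is to its own cluster compared with the nearest other cluster

Formula:

s(i) = (b(i) - a(i)) / max(a(i), b(i))

where:

a(i): the mean distance from sample i to the other samples in its own cluster
b(i): the mean distance from sample i to the samples of the nearest other cluster

Value range: [-1, 1]

Interpretation:

s(i) ≈ 1: the sample is well clustered
s(i) ≈ 0: the sample lies on the boundary between two clusters
s(i) ≈ -1: the sample is probably assigned to the wrong cluster

Python implementation: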

from sklearn.metrics import silhouette_score, silhouette_samples

def calculate_silhouette_score(X, labels):
    """
    Compute the silhouette coefficient.
    """
    silhouette_avg = silhouette_score(X, labels)
    sample_silhouette_values = silhouette_samples(X, labels)

    return {
        'silhouette_avg': silhouette_avg,
        'sample_silhouette_values': sample_silhouette_values,
        'interpretation': f'Average silhouette coefficient: {silhouette_avg:.3f}'
    }

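Rating guide for the average silhouette coefficient

Above 0.7: strong cluster structure
0.5 - 0.7: reasonable cluster structure
0.25 - 0.5: weak cluster structure, needs improvement
Below 0.25: no substantial cluster structure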
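
A common use of the average silhouette coefficient is to compare different numbers of clusters. The following is a minimal sketch under stated assumptions: synthetic data from `make_blobs` with three well-separated centers, and K-Means as the clustering algorithm. Neither choice comes from the article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (illustrative only)
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

# Average silhouette for several candidate cluster counts
scores_by_k = {}
for k in range(2, 6):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_demo)
    scores_by_k[k] = silhouette_score(X_demo, labels_k)

best_k = max(scores_by_k, key=scores_by_k.get)
for k, s in scores_by_k.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

On real data the maximum is usually less clear-cut, so it helps to inspect the per-sample values from `silhouette_samples` as well.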

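Davies-Bouldin Index (DBI)

Definition: the average, over clusters, of the worst-case ratio of within-cluster scatter to between-cluster separation

Formula:

DBI = (1/k) × Σᵢ max(j≠i) [(σᵢ + σⱼ) / d(cᵢ, cⱼ)]

where σᵢ is the average distance from the points of cluster i to its centroid cᵢ, and d(cᵢ, cⱼ) is the distance between the centroids of clusters i and j

Properties:

Lower is better (the ideal value is 0)
Does not require ground-truth labels

Python implementation: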

from sklearn.metrics import davies_bouldin_score

def calculate_dbi(X, labels):
    """
    Compute the Davies-Bouldin index.
    """
    dbi = davies_bouldin_score(X, labels)

    return {
        'dbi': dbi,
        'interpretation': f'DBI = {dbi:.3f}; lower is better'
    }

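Calinski-Harabasz Index (CHI)

Definition: the ratio of between-cluster dispersion to within-cluster dispersion

Formula:

CHI = [tr(Bk) / (k-1)] / [tr(Wk) / (n-k)]

where Bk and Wk are the between-cluster and within-cluster dispersion matrices, k is the number of clusters, and n is the number of samples

Properties:

Higher is better
Also known as the variance ratio criterion

Python implementation: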

from sklearn.metrics import calinski_harabasz_score

def calculate_chi(X, labels):
    """
    Compute the Calinski-Harabasz index.
    """
    chi = calinski_harabasz_score(X, labels)

    return {
        'chi': chi,
        'interpretation': f'CHI = {chi:.3f}; higher is better'
    }

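2. External Metrics

Adjusted Rand Index (ARI)

Definition: measures the agreement between the clustering result and the ground-truth labels, corrected for chance

Formula:

ARI = (RI - E[RI]) / (max(RI) - E[RI])

Value range: at most 1; values near 0 indicate chance-level agreement, and negative values are possible (scikit-learn documents a lower bound of -0.5)

Interpretation:

ARI = 1: perfect agreement
ARI = 0: equivalent to random assignment
ARI < 0: worse than random assignment

Python implementation: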

from sklearn.metrics import adjusted_rand_score

def calculate_ari(labels_true, labels_pred):
    """
    Compute the adjusted Rand index.
    """
    ari = adjusted_rand_score(labels_true, labels_pred)

    return {
        'ari': ari,
        'interpretation': f'ARI = {ari:.3f}; closer to 1 is better'
    }

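Normalized Mutual Information (NMI)

Definition: the mutual information between the clustering result and the ground-truth labels, normalized by their entropies

Formula:

NMI = I(X; Y) / √(H(X) × H(Y))

Value range: [0, 1]

Python implementation: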

from sklearn.metrics import normalized_mutual_info_score

def calculate_nmi(labels_true, labels_pred):
    """
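    Compute normalized mutual information.
    """
    # average_method='geometric' matches the √(H(X) × H(Y)) normalization above;
    # scikit-learn's default is 'arithmetic'
    nmi = normalized_mutual_info_score(labels_true, labels_pred, average_method='geometric')

    return {
        'nmi': nmi,
        'interpretation': f'NMI = {nmi:.3f}; closer to 1 is better'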
    }
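
Both external metrics compare partitions, not label values, so renaming the clusters does not change the score. A short sketch on made-up labels:

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

labels_true_demo = [0, 0, 0, 1, 1, 1]

# Same partition with the cluster ids swapped
relabeled = [1, 1, 1, 0, 0, 0]
ari_relabeled = adjusted_rand_score(labels_true_demo, relabeled)
nmi_relabeled = normalized_mutual_info_score(labels_true_demo, relabeled)

# One sample assigned to the wrong cluster
one_error = [0, 0, 1, 1, 1, 1]
ari_one_error = adjusted_rand_score(labels_true_demo, one_error)

print(ari_relabeled, nmi_relabeled, round(ari_one_error, 3))
```

This is why cluster ids never need to be matched to class ids before computing ARI or NMI.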

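Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_clustering_metrics
plot_clustering_metrics()

Basic Object Detection Metrics

1. IoU (Intersection over Union)

Definition: measures how much two bounding boxes overlap

Formula:

IoU = area(predicted box ∩ ground-truth box) / area(predicted box ∪ ground-truth box)
    = intersection area / union area

Value range: [0, 1]

Rating guide:

IoU = 1: perfect overlap
IoU ≥ 0.5: usually counted as a correct detection
IoU < 0.5: poor localization, usually counted as a miss

Python implementation: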

def calculate_iou(box1, box2):
    """
    Compute the IoU of two bounding boxes.

    Args:
        box1: [x1, y1, x2, y2] coordinates of the predicted box
        box2: [x1, y1, x2, y2] coordinates of the ground-truth box

    Returns:
        iou: the intersection-over-union value
    """
    # Corners of the intersection rectangle
    x1_inter = max(box1[0], box2[0])
    y1_inter = max(box1[1], box2[1])
    x2_inter = min(box1[2], box2[2])
    y2_inter = min(box1[3], box2[3])

    # No overlap: return 0
    if x2_inter <= x1_inter or y2_inter <= y1_inter:
        return 0.0

    # Intersection area
    intersection = (x2_inter - x1_inter) * (y2_inter - y1_inter)

    # Union area
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection

    # IoU
    iou = intersection / union if union > 0 else 0.0
    return iou
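
A worked example makes the arithmetic concrete. The sketch below uses a compact re-implementation (`iou_xyxy`, an illustrative name) so that it runs on its own; the boxes are made up.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2] (compact illustrative version)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes offset by 1 in both directions:
# intersection = 1 * 1 = 1, union = 4 + 4 - 1 = 7
iou_partial = iou_xyxy([0, 0, 2, 2], [1, 1, 3, 3])

# Boxes that only touch at an edge do not overlap
iou_touching = iou_xyxy([0, 0, 2, 2], [2, 0, 4, 2])

print(iou_partial, iou_touching)
```

Two boxes that look substantially overlapping can still score only 1/7, far below the common 0.5 threshold, so IoU is stricter than intuition suggests.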

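Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_iou_visualization
plot_iou_visualization()

2. The Confusion Matrix in Object Detection

In object detection, the confusion matrix depends on an IoU threshold: a prediction counts as a match only if its IoU with a ground-truth box of the same class reaches the threshold.

	Detected (IoU ≥ threshold)	Not detected (IoU < threshold)
Ground-truth object	TP (true positive)	FN (false negative)
No object (background)	FP (false positive)	TN (true negative, usually not counted)

Python implementation: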

def evaluate_detections(predictions, ground_truth, iou_threshold=0.5):
    """
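    Evaluate object detection results.

    Args:
        predictions: list of predicted boxes (dicts with 'bbox', 'class', 'confidence')
        ground_truth: list of ground-truth boxes (dicts with 'bbox', 'class')
        iou_threshold: IoU threshold for a match

    Returns:
        dict: evaluation results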
    """
    tp = 0
    fp = 0
    fn = 0
    
    matched_gt = set()
    
    # Process predictions in order of decreasing confidence
    predictions_sorted = sorted(predictions, key=lambda x: x['confidence'], reverse=True)
    
    for pred in predictions_sorted:
        best_iou = 0
        best_gt_idx = -1
        
        for gt_idx, gt in enumerate(ground_truth):
            if gt_idx in matched_gt:
                continue
            
            iou = calculate_iou(pred['bbox'], gt['bbox'])
            if iou > best_iou and pred['class'] == gt['class']:
                best_iou = iou
                best_gt_idx = gt_idx
        
        if best_gt_idx >= 0 and best_iou >= iou_threshold:
            tp += 1
            matched_gt.add(best_gt_idx)
        else:
            fp += 1
    
    fn = len(ground_truth) - len(matched_gt)
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    
    return {
        'TP': tp,
        'FP': fp,
        'FN': fn,
        'Precision': precision,
        'Recall': recall,
        'F1': f1
    }

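3. Histograms in Object Detection

Confidence Distribution Histogram

Purpose:

Inspect the distribution of the model's confidence scores
Choose a suitable confidence threshold
Check whether the model is overconfident

Python implementation: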

import matplotlib.pyplot as plt
import numpy as np

def plot_confidence_distribution(confidences, bins=50):
    """
    Plot a histogram of detection confidence scores.
    """
    plt.figure(figsize=(10, 6))
    plt.hist(confidences, bins=bins, color='skyblue', edgecolor='black', alpha=0.7)
    plt.axvline(x=0.5, color='red', linestyle='--', linewidth=2, label='Threshold=0.5')
    plt.xlabel('Confidence Score')
    plt.ylabel('Frequency')
    plt.title('Distribution of Detection Confidence Scores')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

IoU Distribution Histogram

Purpose:

Assess the localization accuracy of detected boxes
Spot inaccurate localization

Python implementation:

def plot_iou_distribution(ious, bins=50):
    """
    Plot a histogram of IoU values.
    """
    plt.figure(figsize=(10, 6))
    plt.hist(ious, bins=bins, color='lightgreen', edgecolor='black', alpha=0.7)
    plt.axvline(x=0.5, color='red', linestyle='--', linewidth=2, label='IOU Threshold=0.5')
    plt.axvline(x=0.75, color='orange', linestyle='--', linewidth=2, label='IOU Threshold=0.75')
    plt.xlabel('IoU')
    plt.ylabel('Frequency')
    plt.title('Distribution of IoU Values')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

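Visualization: generate the chart with the following code.

from docs.ml_metrics_visualization_complete import plot_confidence_distribution
plot_confidence_distribution()

Data Visualization Metrics

1. Distribution Visualization

Histogram

Purpose:

Understand the distribution of the data
Spot outliers
Choose suitable thresholds

Python implementation: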

def plot_histogram(data, bins=30, title='Distribution'):
    """
    Plot a histogram.
    """
    plt.figure(figsize=(10, 6))
    plt.hist(data, bins=bins, color='skyblue', edgecolor='black', alpha=0.7)
    plt.xlabel('Value')
    plt.ylabel('Frequency')
    plt.title(title)
    plt.grid(True, alpha=0.3)
    plt.show()

Box Plot

Purpose:

Show the median and quartiles of the data
Identify outliers
Compare the distributions of different groups

Python implementation:

def plot_boxplot(data, labels=None):
    """
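    Draw a box plot.
    """
    plt.figure(figsize=(10, 6))
    plt.boxplot(data)
    if labels is not None:
        # Set tick labels explicitly; the `labels` keyword of boxplot is deprecated in recent matplotlib
        plt.xticks(range(1, len(labels) + 1), labels)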
    plt.ylabel('Value')
    plt.title('Box Plot')
    plt.grid(True, alpha=0.3)
    plt.show()

2. Correlation Visualization

Heatmap

Purpose:

Visualize the correlations between features
Identify highly correlated features

Python implementation:

import seaborn as sns

def plot_correlation_heatmap(corr_matrix):
    """
    Plot a correlation heatmap.
    """
    plt.figure(figsize=(12, 10))
    sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', 
                center=0, square=True, linewidths=0.5)
    plt.title('Feature Correlation Heatmap')
    plt.tight_layout()
    plt.show()

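3. Performance Visualization

Learning Curve

Purpose:

See how training and validation scores change as the training set grows
Identify overfitting / underfitting
Judge whether more training data is likely to help

Python implementation (scikit-learn's learning_curve varies the training set size, not the number of training epochs):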

from sklearn.model_selection import learning_curve

def plot_learning_curve(estimator, X, y, cv=5):
    """
    Plot a learning curve (score versus training set size).
    """
    train_sizes, train_scores, val_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=-1
    )

    # Mean and standard deviation across cross-validation folds
    train_mean = np.mean(train_scores, axis=1)
    train_std = np.std(train_scores, axis=1)
    val_mean = np.mean(val_scores, axis=1)
    val_std = np.std(val_scores, axis=1)

    plt.figure(figsize=(10, 6))
    plt.plot(train_sizes, train_mean, 'o-', color='blue', label='Training Score')
    plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.1, color='blue')
    plt.plot(train_sizes, val_mean, 'o-', color='red', label='Validation Score')
    plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.1, color='red')
    plt.xlabel('Training Set Size')
    plt.ylabel('Score')
    plt.title('Learning Curve')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

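Summary

Key Points

Data quality metrics
- Missing value rate, duplicate rate
- Skewness, kurtosis
- Class imbalance ratio
Preprocessing metrics
- Standardization / normalization quality
- Feature importance
- Feature correlation
Classification metrics
- Accuracy, precision, recall, F1
- ROC curve, AUC
- PR curve, AP
Regression metrics
- MSE, MAE, RMSE
- MAPE, SMAPE
- R² Score
Clustering metrics
- Silhouette coefficient
- DBI, CHI
- ARI, NMI
Object detection metrics
- IoU
- Confusion matrix
- Confidence / IoU distributions

Metric Selection Guide

Task type	Primary metrics	Secondary metrics
Binary classification	AUC, F1	Precision, Recall
Multiclass classification	Macro F1, Weighted F1	Per-class Precision/Recall
Regression	RMSE, R²	MAE, MAPE
Clustering	Silhouette Score	DBI, CHI
Object detection	mAP	Precision, Recall, IoU

Next Steps

The advanced guide will cover:

Training process metrics: loss functions, learning rate, gradients, etc.
Model parameter metrics: parameter count, FLOPs, memory footprint, etc.
Hyperparameter tuning metrics: learning rate schedules, regularization, etc.
Model validation metrics: cross-validation, hold-out validation, etc.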
