C++ Gradient Descent: The Core Iterative Algorithm of Numerical Optimization

Introduction

Gradient descent (GD) is the classic first-order optimization algorithm. Its core idea is to iteratively update parameters along the direction opposite to the gradient of the loss function, gradually approaching a minimum. It underpins machine learning (linear regression, neural-network training), numerical optimization, and nonlinear equation solving, and it is simple to implement, computationally cheap, and broadly applicable.

This article walks through the mathematics of gradient descent, its main variants, a C++ implementation framework, and two worked examples (function minimization and linear regression), with complete code throughout, so you can grasp both the core idea and how to put it into practice.

I. Core Principles of Gradient Descent

1. Mathematical Foundations: Gradients and Optimization

(1) Definition of the Gradient

For a multivariate function $f(\boldsymbol{x})$ with $\boldsymbol{x} = [x_1, x_2, \dots, x_n]^T$, the gradient is the vector

$$\nabla f(\boldsymbol{x}) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)^T$$

Geometrically, the gradient points in the direction of steepest ascent at that point; its opposite is the direction of steepest descent.

(2) The Core Update Rule

The gradient descent iteration is:

$$\boldsymbol{x}_{t+1} = \boldsymbol{x}_t - \eta \cdot \nabla f(\boldsymbol{x}_t)$$

where:

  • $\boldsymbol{x}_t$: the parameter vector at iteration $t$;
  • $\eta$ (learning rate): the step size, controlling how far each update moves;
  • $\nabla f(\boldsymbol{x}_t)$: the gradient of the function at $\boldsymbol{x}_t$;
  • the minus sign: move against the gradient, toward a minimum.
(3) Convergence Criteria

Iteration stops when any one of the following holds:

  1. the iteration count reaches a preset maximum;
  2. the gradient norm falls below a threshold ($\|\nabla f(\boldsymbol{x}_t)\| < \epsilon$, e.g. $\epsilon = 10^{-6}$);
  3. the parameter update is smaller than a threshold ($\|\boldsymbol{x}_{t+1} - \boldsymbol{x}_t\| < \epsilon$);
  4. the change in loss is smaller than a threshold ($|f(\boldsymbol{x}_{t+1}) - f(\boldsymbol{x}_t)| < \epsilon$).

2. Core Variants of Gradient Descent

Based on how much data each update uses, there are three main variants:

| Variant | Core idea | Pros | Cons | Typical use |
| --- | --- | --- | --- | --- |
| Batch gradient descent (BGD) | Uses the full training set per gradient | Stable convergence; finds the global optimum for convex problems | Expensive per step; slow on large datasets | Small datasets, convex problems |
| Stochastic gradient descent (SGD) | Uses one randomly chosen sample per gradient | Fast per step; noise helps escape local minima | Noisy, oscillating convergence | Large datasets, non-convex problems |
| Mini-batch gradient descent (MBGD) | Uses a small batch per gradient (e.g. batch_size = 32) | Balances BGD and SGD: stable and efficient | batch_size must be tuned | Most machine-learning workloads |

3. Key Parameters

| Parameter | Role | Tuning advice |
| --- | --- | --- |
| Learning rate $\eta$ | Controls update size | Too small: slow; too large: oscillates or diverges; start from 0.01 or 0.001 |
| Iteration cap | Terminates the loop | Combine with a convergence threshold to avoid wasted iterations |
| batch_size | Samples per mini-batch | 32/64/128 are common; balances speed and stability |
| Convergence threshold $\epsilon$ | Decides when to stop | Typically 1e-6 to 1e-8, per required precision |

II. A Basic C++ Implementation Framework

1. Shared Data Structures and Utility Functions

cpp
#include <iostream>
#include <vector>
#include <cmath>
#include <random>
#include <numeric>
#include <algorithm>
#include <iomanip>
#include <functional>   // std::function (used by the optimizer below)
#include <stdexcept>    // std::invalid_argument
#include <string>

using namespace std;

// Vector type (for multivariate parameters)
using Vector = vector<double>;

// Matrix type (for batches of samples)
using Matrix = vector<Vector>;

// Random number generator (singleton, seeded once)
class RandomGenerator {
public:
    static RandomGenerator& get_instance() {
        static RandomGenerator instance;
        return instance;
    }

    // Uniform random double in [min, max]
    double rand_double(double min = 0.0, double max = 1.0) {
        uniform_real_distribution<double> dist(min, max);
        return dist(rng);
    }

    // Normally distributed random double (for parameter initialization)
    double rand_normal(double mean = 0.0, double stddev = 1.0) {
        normal_distribution<double> dist(mean, stddev);
        return dist(rng);
    }

    // Expose the engine (needed by std::shuffle in the optimizer below)
    mt19937& get_rng() { return rng; }

private:
    RandomGenerator() {
        random_device rd;
        rng = mt19937(rd()); // Mersenne Twister engine
    }

    // Non-copyable
    RandomGenerator(const RandomGenerator&) = delete;
    RandomGenerator& operator=(const RandomGenerator&) = delete;

    mt19937 rng;
};

// Dot product
double dot_product(const Vector& a, const Vector& b) {
    if (a.size() != b.size()) {
        throw invalid_argument("vector dimensions do not match");
    }
    double res = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        res += a[i] * b[i];
    }
    return res;
}

// Element-wise vector addition
Vector vector_add(const Vector& a, const Vector& b) {
    if (a.size() != b.size()) {
        throw invalid_argument("vector dimensions do not match");
    }
    Vector res(a.size());
    for (size_t i = 0; i < a.size(); ++i) {
        res[i] = a[i] + b[i];
    }
    return res;
}

// Scalar-vector multiplication
Vector scalar_multiply(double scalar, const Vector& vec) {
    Vector res(vec.size());
    for (size_t i = 0; i < vec.size(); ++i) {
        res[i] = scalar * vec[i];
    }
    return res;
}

// L2 norm of a vector
double vector_norm(const Vector& vec) {
    double sum = 0.0;
    for (double v : vec) {
        sum += v * v;
    }
    return sqrt(sum);
}

// Print a vector
void print_vector(const Vector& vec, const string& name = "vector") {
    cout << name << ": [";
    for (size_t i = 0; i < vec.size(); ++i) {
        cout << fixed << setprecision(6) << vec[i];
        if (i != vec.size() - 1) {
            cout << ", ";
        }
    }
    cout << "]" << endl;
}

// Print a matrix
void print_matrix(const Matrix& mat, const string& name = "matrix") {
    cout << name << ":" << endl;
    for (const auto& row : mat) {
        cout << "[";
        for (size_t i = 0; i < row.size(); ++i) {
            cout << fixed << setprecision(4) << row[i];
            if (i != row.size() - 1) {
                cout << ", ";
            }
        }
        cout << "]" << endl;
    }
}

2. The Core Gradient Descent Routine

cpp
// Gradient descent configuration
struct GDParams {
    double learning_rate = 0.01;    // learning rate
    int max_iter = 10000;           // maximum iterations
    double tol = 1e-6;              // convergence threshold on the gradient norm
    int batch_size = 32;            // mini-batch size (set to the dataset size for BGD, 1 for SGD)
    bool verbose = true;            // print progress
    int print_interval = 1000;      // print every N iterations
};

// Gradient descent result
struct GDResult {
    Vector params;                  // optimized parameters
    double final_loss;              // final loss value
    int iter_num;                   // iterations actually run
    bool converged;                 // whether the tolerance was reached
};

// Core gradient descent routine (generic: works with any loss whose gradient is computable)
// Arguments:
// - init_params: initial parameter vector
// - loss_func: loss function (parameters + batch -> loss value)
// - grad_func: gradient function (parameters + batch -> gradient vector)
// - data: training data (may be empty for data-free optimization)
// - gd_params: configuration
GDResult gradient_descent(
    Vector init_params,
    function<double(const Vector&, const Matrix&)> loss_func,
    function<Vector(const Vector&, const Matrix&)> grad_func,
    const Matrix& data,
    const GDParams& gd_params
) {
    GDResult result;
    result.params = init_params;
    result.final_loss = 0.0;
    result.iter_num = 0;
    result.converged = false;

    // n_samples == 0 means data-free optimization (loss/grad ignore the batch)
    int n_samples = data.size();

    // Iterate
    for (int iter = 0; iter < gd_params.max_iter; ++iter) {
        // 1. Select the batch (BGD/SGD/MBGD)
        Matrix batch_data;
        if (n_samples == 0 || gd_params.batch_size >= n_samples) {
            // Batch gradient descent (BGD): use all the data
            batch_data = data;
        } else if (gd_params.batch_size == 1) {
            // Stochastic gradient descent (SGD): pick one sample uniformly at random
            int rand_idx = min(
                (int)(RandomGenerator::get_instance().rand_double(0.0, 1.0) * n_samples),
                n_samples - 1);
            batch_data.push_back(data[rand_idx]);
        } else {
            // Mini-batch gradient descent (MBGD): pick batch_size samples without replacement
            vector<int> indices(n_samples);
            iota(indices.begin(), indices.end(), 0);
            shuffle(indices.begin(), indices.end(), RandomGenerator::get_instance().get_rng());
            for (int i = 0; i < min(gd_params.batch_size, n_samples); ++i) {
                batch_data.push_back(data[indices[i]]);
            }
        }

        // 2. Current loss and gradient
        double current_loss = loss_func(result.params, batch_data);
        Vector grad = grad_func(result.params, batch_data);
        double grad_norm = vector_norm(grad);

        // 3. Progress output
        if (gd_params.verbose && iter % gd_params.print_interval == 0) {
            cout << "iter: " << iter
                 << " | loss: " << fixed << setprecision(6) << current_loss
                 << " | grad norm: " << fixed << setprecision(8) << grad_norm << endl;
        }

        // 4. Convergence check (gradient norm below tolerance)
        if (grad_norm < gd_params.tol) {
            result.converged = true;
            result.iter_num = iter;
            result.final_loss = current_loss;
            break;
        }

        // 5. Parameter update: x = x - eta * grad f(x)
        Vector update = scalar_multiply(gd_params.learning_rate, grad);
        for (size_t i = 0; i < result.params.size(); ++i) {
            result.params[i] -= update[i];
        }

        // 6. Bookkeeping
        result.iter_num = iter + 1;
        result.final_loss = current_loss;
    }

    // Final convergence check
    if (!result.converged && gd_params.verbose) {
        cout << "warning: reached max iterations without converging" << endl;
    }

    return result;
}

III. Case Study 1: Finding a Function Minimum (Unconstrained Optimization)

1. Problem

Find a minimum of the bivariate function $f(x, y) = x^2 + 2y^2 + 2\sin(x)\cos(y)$.

2. Gradient Derivation

The gradient is:

$$\nabla f(x, y) = \left( 2x + 2\cos(x)\cos(y),\; 4y - 2\sin(x)\sin(y) \right)^T$$

3. Implementation

cpp
// Target function: f(x, y) = x^2 + 2y^2 + 2*sin(x)*cos(y)
double target_function(const Vector& params) {
    if (params.size() != 2) {
        throw invalid_argument("parameters must be a 2-D vector (x, y)");
    }
    double x = params[0];
    double y = params[1];
    return x*x + 2*y*y + 2*sin(x)*cos(y);
}

// Loss adapter (no data needed; just the function value)
double loss_func_unconstrained(const Vector& params, const Matrix& /*data*/) {
    return target_function(params);
}

// Gradient of the target function
Vector grad_func_unconstrained(const Vector& params, const Matrix& /*data*/) {
    if (params.size() != 2) {
        throw invalid_argument("parameters must be a 2-D vector (x, y)");
    }
    double x = params[0];
    double y = params[1];
    Vector grad(2);
    grad[0] = 2*x + 2*cos(x)*cos(y);  // df/dx
    grad[1] = 4*y - 2*sin(x)*sin(y);  // df/dy
    return grad;
}

// Test: unconstrained optimization (function minimum)
void test_unconstrained_optimization() {
    cout << "===== Unconstrained optimization: function minimum =====" << endl;

    // 1. Random initial point
    Vector init_params = {
        RandomGenerator::get_instance().rand_double(-3.0, 3.0),
        RandomGenerator::get_instance().rand_double(-3.0, 3.0)
    };
    print_vector(init_params, "initial point (x, y)");
    cout << "initial function value: " << fixed << setprecision(6) << target_function(init_params) << endl;

    // 2. Configure gradient descent
    GDParams gd_params;
    gd_params.learning_rate = 0.01;
    gd_params.max_iter = 20000;
    gd_params.tol = 1e-8;
    gd_params.verbose = true;
    gd_params.print_interval = 2000;
    gd_params.batch_size = 1;  // no data, so BGD/SGD/MBGD behave identically here

    // 3. Empty data (unconstrained optimization needs no training data)
    Matrix empty_data;

    // 4. Run gradient descent
    GDResult result = gradient_descent(
        init_params,
        loss_func_unconstrained,
        grad_func_unconstrained,
        empty_data,
        gd_params
    );

    // 5. Report
    cout << "\n===== Result =====" << endl;
    print_vector(result.params, "minimum point (x, y)");
    cout << "minimum value: " << fixed << setprecision(6) << result.final_loss << endl;
    cout << "iterations: " << result.iter_num << endl;
    cout << "converged: " << (result.converged ? "yes" : "no") << endl;
}

IV. Case Study 2: Linear Regression (Supervised Learning)

1. Problem

Given a dataset $\{(\boldsymbol{x}_i, y_i)\}$, fit the linear model $y = \boldsymbol{w}^T \boldsymbol{x} + b$ (i.e. $y = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$) by minimizing the mean squared error (MSE):

$$L(\boldsymbol{w}, b) = \frac{1}{m} \sum_{i=1}^m \left( y_i - (\boldsymbol{w}^T \boldsymbol{x}_i + b) \right)^2$$

2. Gradient Derivation

Folding the bias into the weights (let $w_0 = b$ and $x_{i0} = 1$) simplifies the model to $y = \boldsymbol{w}^T \boldsymbol{x}$, with gradient

$$\frac{\partial L}{\partial w_j} = -\frac{2}{m} \sum_{i=1}^m \left( y_i - \boldsymbol{w}^T \boldsymbol{x}_i \right) x_{ij}$$

3. Implementation

cpp
// Generate linear regression test data (y = 2*x1 + 3*x2 + 4 + noise)
Matrix generate_linear_regression_data(int n_samples, int n_features, double noise_std = 0.1) {
    Matrix data(n_samples, Vector(n_features + 1));  // last column is y

    // Ground-truth parameters: w = [2, 3], b = 4
    Vector true_weights = {2.0, 3.0};
    double true_bias = 4.0;

    for (int i = 0; i < n_samples; ++i) {
        // Features x1, x2
        for (int j = 0; j < n_features; ++j) {
            data[i][j] = RandomGenerator::get_instance().rand_double(0.0, 10.0);
        }
        // Label y = 2*x1 + 3*x2 + 4 + noise
        double y = true_bias;
        for (int j = 0; j < n_features; ++j) {
            y += true_weights[j] * data[i][j];
        }
        // Gaussian noise
        y += RandomGenerator::get_instance().rand_normal(0.0, noise_std);
        data[i][n_features] = y;
    }

    return data;
}

// Linear regression loss (MSE)
double loss_func_linear_regression(const Vector& params, const Matrix& batch_data) {
    int n_samples = batch_data.size();
    int n_features = batch_data[0].size() - 1;  // last column is y

    if (params.size() != (size_t)(n_features + 1)) {  // params = [b, w1, w2] (b: bias, w1/w2: weights)
        throw invalid_argument("wrong parameter dimension: expected " + to_string(n_features + 1));
    }

    double loss = 0.0;
    for (const auto& sample : batch_data) {
        // Prediction: y_pred = b + w1*x1 + w2*x2
        double y_pred = params[0];  // bias b
        for (int j = 0; j < n_features; ++j) {
            y_pred += params[j+1] * sample[j];
        }
        // True value
        double y_true = sample[n_features];
        // Squared error
        loss += (y_true - y_pred) * (y_true - y_pred);
    }
    return loss / n_samples;  // mean loss
}

// Linear regression gradient
Vector grad_func_linear_regression(const Vector& params, const Matrix& batch_data) {
    int n_samples = batch_data.size();
    int n_features = batch_data[0].size() - 1;

    if (params.size() != (size_t)(n_features + 1)) {
        throw invalid_argument("wrong parameter dimension: expected " + to_string(n_features + 1));
    }

    Vector grad(params.size(), 0.0);
    for (const auto& sample : batch_data) {
        // Prediction
        double y_pred = params[0];
        for (int j = 0; j < n_features; ++j) {
            y_pred += params[j+1] * sample[j];
        }
        // True value
        double y_true = sample[n_features];
        double error = y_true - y_pred;

        // Accumulate gradient
        grad[0] -= 2 * error / n_samples;  // gradient of bias b
        for (int j = 0; j < n_features; ++j) {
            grad[j+1] -= 2 * error * sample[j] / n_samples;  // gradient of weight wj
        }
    }

    return grad;
}

// Test: linear regression
void test_linear_regression() {
    cout << "\n===== Linear regression via gradient descent =====" << endl;

    // 1. Generate test data
    int n_samples = 1000;    // number of samples
    int n_features = 2;      // number of features (x1, x2)
    Matrix data = generate_linear_regression_data(n_samples, n_features);
    cout << "generated " << n_samples << " samples with " << n_features << " features" << endl;
    // print_matrix(data, "samples"); // optional: inspect the generated data

    // 2. Initialize parameters (b, w1, w2)
    Vector init_params(n_features + 1);
    for (size_t i = 0; i < init_params.size(); ++i) {
        init_params[i] = RandomGenerator::get_instance().rand_normal(0.0, 0.1);  // small random init
    }
    print_vector(init_params, "initial parameters (b, w1, w2)");

    // 3. Configure mini-batch gradient descent
    GDParams gd_params;
    gd_params.learning_rate = 0.001;
    gd_params.max_iter = 10000;
    gd_params.tol = 1e-7;
    gd_params.verbose = true;
    gd_params.print_interval = 1000;
    gd_params.batch_size = 64;  // mini-batch size

    // 4. Run gradient descent
    GDResult result = gradient_descent(
        init_params,
        loss_func_linear_regression,
        grad_func_linear_regression,
        data,
        gd_params
    );

    // 5. Report
    cout << "\n===== Fit results =====" << endl;
    print_vector(result.params, "fitted parameters [b, w1, w2]");
    cout << "true parameters: [4.0, 2.0, 3.0]" << endl;
    cout << "final MSE loss: " << fixed << setprecision(6) << result.final_loss << endl;
    cout << "iterations: " << result.iter_num << endl;
    cout << "converged: " << (result.converged ? "yes" : "no") << endl;

    // 6. Prediction test
    Vector test_sample = {5.0, 6.0};  // x1 = 5, x2 = 6
    double y_pred = result.params[0] + result.params[1]*test_sample[0] + result.params[2]*test_sample[1];
    double y_true = 4.0 + 2.0*5.0 + 3.0*6.0;
    cout << "\nPrediction test:" << endl;
    cout << "test sample (x1, x2): [" << test_sample[0] << ", " << test_sample[1] << "]" << endl;
    cout << "predicted: " << fixed << setprecision(4) << y_pred << endl;
    cout << "true: " << fixed << setprecision(4) << y_true << endl;
    cout << "absolute error: " << fixed << setprecision(4) << abs(y_pred - y_true) << endl;
}

V. Techniques for Improving Gradient Descent

1. Learning-Rate Schedules

A fixed learning rate tends to converge slowly or oscillate; two common dynamic schedules:

cpp
// Exponential (staircase) decay: the rate drops every decay_step iterations
double exponential_decay_lr(double initial_lr, int iter, double decay_rate = 0.99, int decay_step = 100) {
    return initial_lr * pow(decay_rate, iter / decay_step);  // integer division: piecewise-constant decay
}

// Linear decay from initial_lr down to end_lr at max_iter
double linear_decay_lr(double initial_lr, int iter, int max_iter, double end_lr = 0.0001) {
    return initial_lr - (initial_lr - end_lr) * (double)iter / max_iter;
}

2. Momentum

Momentum adds inertia, in analogy with physics, to accelerate convergence and damp oscillation:

cpp
// Gradient descent with momentum
GDResult gradient_descent_with_momentum(
    Vector init_params,
    function<double(const Vector&, const Matrix&)> loss_func,
    function<Vector(const Vector&, const Matrix&)> grad_func,
    const Matrix& data,
    const GDParams& gd_params,
    double momentum = 0.9  // momentum coefficient (0.9 is typical)
) {
    GDResult result;
    result.params = init_params;
    result.final_loss = 0.0;
    result.iter_num = 0;
    result.converged = false;

    Vector velocity(init_params.size(), 0.0);  // velocity (momentum) term
    int n_samples = data.size();

    for (int iter = 0; iter < gd_params.max_iter; ++iter) {
        // 1. Select the batch (as in the basic version)
        Matrix batch_data;
        if (n_samples == 0 || gd_params.batch_size >= n_samples) {
            batch_data = data;
        } else if (gd_params.batch_size == 1) {
            int rand_idx = min(
                (int)(RandomGenerator::get_instance().rand_double(0.0, 1.0) * n_samples),
                n_samples - 1);
            batch_data.push_back(data[rand_idx]);
        } else {
            vector<int> indices(n_samples);
            iota(indices.begin(), indices.end(), 0);
            shuffle(indices.begin(), indices.end(), RandomGenerator::get_instance().get_rng());
            for (int i = 0; i < min(gd_params.batch_size, n_samples); ++i) {
                batch_data.push_back(data[indices[i]]);
            }
        }

        // 2. Loss and gradient
        double current_loss = loss_func(result.params, batch_data);
        Vector grad = grad_func(result.params, batch_data);
        double grad_norm = vector_norm(grad);

        // 3. Progress output
        if (gd_params.verbose && iter % gd_params.print_interval == 0) {
            cout << "iter: " << iter
                 << " | loss: " << fixed << setprecision(6) << current_loss
                 << " | grad norm: " << fixed << setprecision(8) << grad_norm << endl;
        }

        // 4. Convergence check
        if (grad_norm < gd_params.tol) {
            result.converged = true;
            result.iter_num = iter;
            result.final_loss = current_loss;
            break;
        }

        // 5. Momentum update
        double lr = exponential_decay_lr(gd_params.learning_rate, iter);  // dynamic learning rate
        for (size_t i = 0; i < result.params.size(); ++i) {
            velocity[i] = momentum * velocity[i] + lr * grad[i];  // velocity update
            result.params[i] -= velocity[i];                      // parameter update
        }

        result.iter_num = iter + 1;
        result.final_loss = current_loss;
    }

    return result;
}

3. Adaptive Learning Rates (Adam)

Adam combines momentum with per-parameter adaptive learning rates and is the most widely used gradient descent variant today:

cpp
// Adam optimizer (simplified)
GDResult gradient_descent_adam(
    Vector init_params,
    function<double(const Vector&, const Matrix&)> loss_func,
    function<Vector(const Vector&, const Matrix&)> grad_func,
    const Matrix& data,
    const GDParams& gd_params,
    double beta1 = 0.9,    // first-moment decay rate
    double beta2 = 0.999,  // second-moment decay rate
    double eps = 1e-8      // guards against division by zero
) {
    GDResult result;
    result.params = init_params;
    result.final_loss = 0.0;
    result.iter_num = 0;
    result.converged = false;

    Vector m(init_params.size(), 0.0);  // first moment (momentum)
    Vector v(init_params.size(), 0.0);  // second moment (adaptive learning rate)
    int n_samples = data.size();

    for (int iter = 0; iter < gd_params.max_iter; ++iter) {
        // 1. Select the batch
        Matrix batch_data;
        if (n_samples == 0 || gd_params.batch_size >= n_samples) {
            batch_data = data;
        } else if (gd_params.batch_size == 1) {
            int rand_idx = min(
                (int)(RandomGenerator::get_instance().rand_double(0.0, 1.0) * n_samples),
                n_samples - 1);
            batch_data.push_back(data[rand_idx]);
        } else {
            vector<int> indices(n_samples);
            iota(indices.begin(), indices.end(), 0);
            shuffle(indices.begin(), indices.end(), RandomGenerator::get_instance().get_rng());
            for (int i = 0; i < min(gd_params.batch_size, n_samples); ++i) {
                batch_data.push_back(data[indices[i]]);
            }
        }

        // 2. Loss and gradient
        double current_loss = loss_func(result.params, batch_data);
        Vector grad = grad_func(result.params, batch_data);
        double grad_norm = vector_norm(grad);

        // 3. Progress output
        if (gd_params.verbose && iter % gd_params.print_interval == 0) {
            cout << "iter: " << iter
                 << " | loss: " << fixed << setprecision(6) << current_loss
                 << " | grad norm: " << fixed << setprecision(8) << grad_norm << endl;
        }

        // 4. Convergence check
        if (grad_norm < gd_params.tol) {
            result.converged = true;
            result.iter_num = iter;
            result.final_loss = current_loss;
            break;
        }

        // 5. Adam update
        double lr = gd_params.learning_rate;
        double t = iter + 1;
        for (size_t i = 0; i < result.params.size(); ++i) {
            // Moment estimates
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i];
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] * grad[i];
            // Bias correction
            double m_hat = m[i] / (1 - pow(beta1, t));
            double v_hat = v[i] / (1 - pow(beta2, t));
            // Parameter update
            result.params[i] -= lr * m_hat / (sqrt(v_hat) + eps);
        }

        result.iter_num = iter + 1;
        result.final_loss = current_loss;
    }

    return result;
}

VI. Complete Runnable Program

cpp
#include <iostream>
#include <vector>
#include <cmath>
#include <random>
#include <numeric>
#include <algorithm>
#include <iomanip>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>

using namespace std;

// Vector and matrix types
using Vector = vector<double>;
using Matrix = vector<Vector>;

// Singleton random number generator
class RandomGenerator {
public:
    static RandomGenerator& get_instance() {
        static RandomGenerator instance;
        return instance;
    }

    double rand_double(double min = 0.0, double max = 1.0) {
        uniform_real_distribution<double> dist(min, max);
        return dist(rng);
    }

    double rand_normal(double mean = 0.0, double stddev = 1.0) {
        normal_distribution<double> dist(mean, stddev);
        return dist(rng);
    }

    mt19937& get_rng() { return rng; }

private:
    RandomGenerator() {
        random_device rd;
        rng = mt19937(rd());
    }

    RandomGenerator(const RandomGenerator&) = delete;
    RandomGenerator& operator=(const RandomGenerator&) = delete;

    mt19937 rng;
};

// Vector utilities
double dot_product(const Vector& a, const Vector& b) {
    if (a.size() != b.size()) {
        throw invalid_argument("vector dimensions do not match");
    }
    double res = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        res += a[i] * b[i];
    }
    return res;
}

Vector vector_add(const Vector& a, const Vector& b) {
    if (a.size() != b.size()) {
        throw invalid_argument("vector dimensions do not match");
    }
    Vector res(a.size());
    for (size_t i = 0; i < a.size(); ++i) {
        res[i] = a[i] + b[i];
    }
    return res;
}

Vector scalar_multiply(double scalar, const Vector& vec) {
    Vector res(vec.size());
    for (size_t i = 0; i < vec.size(); ++i) {
        res[i] = scalar * vec[i];
    }
    return res;
}

double vector_norm(const Vector& vec) {
    double sum = 0.0;
    for (double v : vec) {
        sum += v * v;
    }
    return sqrt(sum);
}

void print_vector(const Vector& vec, const string& name = "vector") {
    cout << name << ":[";
    for (size_t i = 0; i < vec.size(); ++i) {
        cout << fixed << setprecision(6) << vec[i];
        if (i != vec.size() - 1) {
            cout << ", ";
        }
    }
    cout << "]" << endl;
}

void print_matrix(const Matrix& mat, const string& name) {
    cout << name << ":" << endl;
    for (const auto& row : mat) {
        cout << "[";
        for (size_t i = 0; i < row.size(); ++i) {
            cout << fixed << setprecision(4) << row[i];
            if (i != row.size() - 1) {
                cout << ", ";
            }
        }
        cout << "]" << endl;
    }
}

// Gradient descent configuration
struct GDParams {
    double learning_rate = 0.01;
    int max_iter = 10000;
    double tol = 1e-6;
    int batch_size = 32;
    bool verbose = true;
    int print_interval = 1000;
};

// Gradient descent result
struct GDResult {
    Vector params;
    double final_loss;
    int iter_num;
    bool converged;
};

// Core gradient descent routine
GDResult gradient_descent(
    Vector init_params,
    function<double(const Vector&, const Matrix&)> loss_func,
    function<Vector(const Vector&, const Matrix&)> grad_func,
    const Matrix& data,
    const GDParams& gd_params
) {
    GDResult result;
    result.params = init_params;
    result.final_loss = 0.0;
    result.iter_num = 0;
    result.converged = false;

    // n_samples == 0 means data-free optimization (loss/grad ignore the batch)
    int n_samples = data.size();

    for (int iter = 0; iter < gd_params.max_iter; ++iter) {
        // Select the batch
        Matrix batch_data;
        if (n_samples == 0 || gd_params.batch_size >= n_samples) {
            batch_data = data;
        } else if (gd_params.batch_size == 1) {
            int rand_idx = min(
                (int)(RandomGenerator::get_instance().rand_double(0.0, 1.0) * n_samples),
                n_samples - 1);
            batch_data.push_back(data[rand_idx]);
        } else {
            vector<int> indices(n_samples);
            iota(indices.begin(), indices.end(), 0);
            shuffle(indices.begin(), indices.end(), RandomGenerator::get_instance().get_rng());
            for (int i = 0; i < min(gd_params.batch_size, n_samples); ++i) {
                batch_data.push_back(data[indices[i]]);
            }
        }

        // Loss and gradient
        double current_loss = loss_func(result.params, batch_data);
        Vector grad = grad_func(result.params, batch_data);
        double grad_norm = vector_norm(grad);

        // Progress output
        if (gd_params.verbose && iter % gd_params.print_interval == 0) {
            cout << "iter: " << iter
                 << " | loss: " << fixed << setprecision(6) << current_loss
                 << " | grad norm: " << fixed << setprecision(8) << grad_norm << endl;
        }

        // Convergence check
        if (grad_norm < gd_params.tol) {
            result.converged = true;
            result.iter_num = iter;
            result.final_loss = current_loss;
            break;
        }

        // Parameter update
        Vector update = scalar_multiply(gd_params.learning_rate, grad);
        for (size_t i = 0; i < result.params.size(); ++i) {
            result.params[i] -= update[i];
        }

        result.iter_num = iter + 1;
        result.final_loss = current_loss;
    }

    if (!result.converged && gd_params.verbose) {
        cout << "warning: reached max iterations without converging" << endl;
    }

    return result;
}

// ===================== Case 1: unconstrained optimization (function minimum) =====================
double target_function(const Vector& params) {
    if (params.size() != 2) {
        throw invalid_argument("parameters must be a 2-D vector (x, y)");
    }
    double x = params[0];
    double y = params[1];
    return x*x + 2*y*y + 2*sin(x)*cos(y);
}

double loss_func_unconstrained(const Vector& params, const Matrix& /*data*/) {
    return target_function(params);
}

Vector grad_func_unconstrained(const Vector& params, const Matrix& /*data*/) {
    if (params.size() != 2) {
        throw invalid_argument("parameters must be a 2-D vector (x, y)");
    }
    double x = params[0];
    double y = params[1];
    Vector grad(2);
    grad[0] = 2*x + 2*cos(x)*cos(y);
    grad[1] = 4*y - 2*sin(x)*sin(y);
    return grad;
}

void test_unconstrained_optimization() {
    cout << "===== Unconstrained optimization: function minimum =====" << endl;

    Vector init_params = {
        RandomGenerator::get_instance().rand_double(-3.0, 3.0),
        RandomGenerator::get_instance().rand_double(-3.0, 3.0)
    };
    print_vector(init_params, "initial point (x, y)");
    cout << "initial function value: " << fixed << setprecision(6) << target_function(init_params) << endl;

    GDParams gd_params;
    gd_params.learning_rate = 0.01;
    gd_params.max_iter = 20000;
    gd_params.tol = 1e-8;
    gd_params.verbose = true;
    gd_params.print_interval = 2000;
    gd_params.batch_size = 1;

    Matrix empty_data;
    GDResult result = gradient_descent(
        init_params,
        loss_func_unconstrained,
        grad_func_unconstrained,
        empty_data,
        gd_params
    );

    cout << "\n===== Result =====" << endl;
    print_vector(result.params, "minimum point (x, y)");
    cout << "minimum value: " << fixed << setprecision(6) << result.final_loss << endl;
    cout << "iterations: " << result.iter_num << endl;
    cout << "converged: " << (result.converged ? "yes" : "no") << endl;
}

// ===================== Case 2: linear regression =====================
Matrix generate_linear_regression_data(int n_samples, int n_features, double noise_std = 0.1) {
    Matrix data(n_samples, Vector(n_features + 1));

    Vector true_weights = {2.0, 3.0};
    double true_bias = 4.0;

    for (int i = 0; i < n_samples; ++i) {
        for (int j = 0; j < n_features; ++j) {
            data[i][j] = RandomGenerator::get_instance().rand_double(0.0, 10.0);
        }

        double y = true_bias;
        for (int j = 0; j < n_features; ++j) {
            y += true_weights[j] * data[i][j];
        }
        y += RandomGenerator::get_instance().rand_normal(0.0, noise_std);
        data[i][n_features] = y;
    }

    return data;
}

double loss_func_linear_regression(const Vector& params, const Matrix& batch_data) {
    int n_samples = batch_data.size();
    int n_features = batch_data[0].size() - 1;

    if (params.size() != (size_t)(n_features + 1)) {
        throw invalid_argument("wrong parameter dimension: expected " + to_string(n_features + 1));
    }

    double loss = 0.0;
    for (const auto& sample : batch_data) {
        double y_pred = params[0];
        for (int j = 0; j < n_features; ++j) {
            y_pred += params[j+1] * sample[j];
        }
        double y_true = sample[n_features];
        loss += (y_true - y_pred) * (y_true - y_pred);
    }
    return loss / n_samples;
}

Vector grad_func_linear_regression(const Vector& params, const Matrix& batch_data) {
    int n_samples = batch_data.size();
    int n_features = batch_data[0].size() - 1;

    if (params.size() != (size_t)(n_features + 1)) {
        throw invalid_argument("wrong parameter dimension: expected " + to_string(n_features + 1));
    }

    Vector grad(params.size(), 0.0);
    for (const auto& sample : batch_data) {
        double y_pred = params[0];
        for (int j = 0; j < n_features; ++j) {
            y_pred += params[j+1] * sample[j];
        }
        double y_true = sample[n_features];
        double error = y_true - y_pred;

        grad[0] -= 2 * error / n_samples;
        for (int j = 0; j < n_features; ++j) {
            grad[j+1] -= 2 * error * sample[j] / n_samples;
        }
    }

    return grad;
}

void test_linear_regression() {
    cout << "\n===== Linear regression via gradient descent =====" << endl;

    int n_samples = 1000;
    int n_features = 2;
    Matrix data = generate_linear_regression_data(n_samples, n_features);

    Vector init_params(n_features + 1);
    for (size_t i = 0; i < init_params.size(); ++i) {
        init_params[i] = RandomGenerator::get_instance().rand_normal(0.0, 0.1);
    }
    print_vector(init_params, "initial parameters (b, w1, w2)");

    GDParams gd_params;
    gd_params.learning_rate = 0.001;
    gd_params.max_iter = 10000;
    gd_params.tol = 1e-7;
    gd_params.verbose = true;
    gd_params.print_interval = 1000;
    gd_params.batch_size = 64;

    GDResult result = gradient_descent(
        init_params,
        loss_func_linear_regression,
        grad_func_linear_regression,
        data,
        gd_params
    );

    cout << "\n===== Fit results =====" << endl;
    print_vector(result.params, "fitted parameters [b, w1, w2]");
    cout << "true parameters: [4.0, 2.0, 3.0]" << endl;
    cout << "final MSE loss: " << fixed << setprecision(6) << result.final_loss << endl;
    cout << "iterations: " << result.iter_num << endl;
    cout << "converged: " << (result.converged ? "yes" : "no") << endl;

    Vector test_sample = {5.0, 6.0};
    double y_pred = result.params[0] + result.params[1]*test_sample[0] + result.params[2]*test_sample[1];
    double y_true = 4.0 + 2.0*5.0 + 3.0*6.0;
    cout << "\nPrediction test:" << endl;
    cout << "test sample (x1, x2): [" << test_sample[0] << ", " << test_sample[1] << "]" << endl;
    cout << "predicted: " << fixed << setprecision(4) << y_pred << endl;
    cout << "true: " << fixed << setprecision(4) << y_true << endl;
    cout << "absolute error: " << fixed << setprecision(4) << abs(y_pred - y_true) << endl;
}

// Entry point
int main() {
    // Case 1: unconstrained optimization
    test_unconstrained_optimization();

    // Case 2: linear regression
    test_linear_regression();

    return 0;
}

VII. Common Pitfalls and How to Avoid Them

1. Bad learning rate

  • Pitfall: too large a learning rate oscillates or diverges; too small converges extremely slowly.
  • Fix: start from 0.01 or 0.001, add learning-rate decay (exponential/linear), or use an adaptive method (Adam).

2. Wrong gradients

  • Pitfall: sign errors or incorrect partial derivatives when deriving gradients by hand.
  • Fix:
    1. For simple functions, verify against a numerical gradient: $\frac{\partial f}{\partial x} \approx \frac{f(x+\epsilon) - f(x-\epsilon)}{2\epsilon}$;
    2. For complex functions, cross-check with an automatic-differentiation tool (e.g. PyTorch).

3. Unnormalized data

  • Pitfall: features with very different ranges (e.g. x1 ∈ [0, 1], x2 ∈ [0, 1000]) make gradient updates badly unbalanced across parameters.
  • Fix: standardize the features (Z-score: $x' = (x - \mu)/\sigma$) or normalize them (Min-Max: $x' = (x - min)/(max - min)$).

4. Vanishing/exploding gradients

  • Pitfall: in deep neural networks, gradients can shrink toward zero or blow up.
  • Fix: ReLU activations, careful weight initialization (Xavier/He), gradient clipping.

5. Local optima

  • Pitfall: on non-convex functions, gradient descent may settle in a local rather than the global optimum.
  • Fix: multiple random restarts, momentum, or noise injection.

VIII. Summary

Key takeaways

  1. Core idea: update parameters against the gradient of the loss, approaching a minimum via $\boldsymbol{x}_{t+1} = \boldsymbol{x}_t - \eta \cdot \nabla f(\boldsymbol{x}_t)$;
  2. Main variants:
    • BGD: full-data gradients, stable but slow;
    • SGD: single-sample gradients, fast but noisy;
    • MBGD: the practical middle ground, standard in industry;
  3. Key improvements:
    • learning-rate decay: lower the rate over time to speed convergence;
    • momentum: add inertia to damp oscillation;
    • Adam: momentum plus adaptive rates, the most broadly applicable;
  4. Applications: function minimization, linear regression, neural-network training, and other numerical optimization problems.

Study suggestions

  1. First implement plain gradient descent and understand the update loop;
  2. Verify gradient calculations on simple functions (e.g. quadratics);
  3. Study momentum, Adam, and other variants, and compare their convergence behavior;
  4. Apply gradient descent to real problems (linear regression, logistic regression) to build engineering intuition.

Gradient descent is the gateway to numerical optimization: once the basic update loop is solid, the more advanced optimizers are small variations on it.
