C++ Calculus - Derivatives - Automatic Differentiation

flyfish

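Automatic differentiation (AD) is a technique for computing the derivative of a function exactly, up to floating-point rounding. It combines the accuracy of symbolic differentiation with the efficiency of numerical differentiation. The core idea is to decompose a function into elementary operations on a computational graph and apply the chain rule to them, so the derivative is obtained without symbolic manipulation and without finite-difference approximation. This lets it compute exact gradients of complicated functions automatically.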

C++ Calculus - Derivatives - Analytical Method (Symbolic Computation, Symbolic Differentiation)
C++ Calculus - Derivatives - Numerical Method

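Basic concepts of automatic differentiation

1 Computational graph:

Automatic differentiation represents a computation as a directed acyclic graph (DAG). Nodes are variables or intermediate results, and edges connect the inputs of each operation to its result. The graph makes it possible to track how every variable influences the output.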

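2 Chain rule:

Automatic differentiation computes derivatives step by step with the chain rule, which states that the derivative of a composite function is the product of the derivatives of its parts.

In the computational graph, each node's contribution to the output is accumulated with the chain rule, either forward from the inputs or backward from the output.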

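3 Forward mode and reverse mode:

Forward mode: propagates derivatives from the inputs toward the outputs, one input variable per pass. It suits functions with few input variables.

Reverse mode: propagates derivatives backward from the output toward the inputs, one output per pass. It suits functions with few outputs, such as a scalar loss function in machine learning.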

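Automatic differentiation yields derivative values that are exact up to floating-point rounding, not approximations. Compared with symbolic differentiation, it is more efficient on complicated functions. The user obtains the derivative of a function directly, with no manual derivation.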

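Symbolic differentiation: deriving the derivative of a complicated function can become very involved and tends to cause expression swell.
Numerical differentiation: prone to rounding error in floating-point arithmetic, on top of the truncation error of the finite-difference formula itself.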

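The strength of automatic differentiation is that it applies a handful of simple calculus rules automatically to differentiate complicated functions. By overloading operators, the Dual struct builds the derivative from these basic rules, with no manual derivation.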

Common differentiation rules and their implementation

1. Sum rule

For two functions $f(x)$ and $g(x)$:

$$(f(x) + g(x))' = f'(x) + g'(x)$$

Implementation: when two Dual objects are added, their values and their derivatives are added separately.

```cpp
Dual operator+(const Dual& other) const {
    return Dual(value + other.value, derivative + other.derivative);
}
```

2. Product rule

For two functions $f(x)$ and $g(x)$:

$$(f(x) \cdot g(x))' = f'(x) \cdot g(x) + f(x) \cdot g'(x)$$

Implementation: when two Dual objects are multiplied, the derivative is computed with the product rule.

```cpp
Dual operator*(const Dual& other) const {
    return Dual(value * other.value,
                value * other.derivative + derivative * other.value);
}
```

3. Quotient rule

For two functions $f(x)$ and $g(x)$ with $g(x) \neq 0$:

$$\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{(g(x))^2}$$

Implementation: when one Dual object is divided by another, the derivative is computed with the quotient rule.

```cpp
Dual operator/(const Dual& other) const {
    return Dual(value / other.value,
                (derivative * other.value - value * other.derivative) / (other.value * other.value));
}
```

4. Chain rule

For a composite function $f(g(x))$:

$$(f(g(x)))' = f'(g(x)) \cdot g'(x)$$

Automatic differentiation supports the chain rule naturally: every operation tracks its own derivative, so the chain rule is applied automatically as the computation proceeds.

Computing the derivative of a composite function with the basic rules

The following program uses these basic rules to compute the derivative of the composite function $h(x) = (x^2 + 1) \cdot \sin(x)$.

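```cpp
#include <iostream>
#include <cmath>

// Pi is defined locally because M_PI is not part of standard C++. A distinct
// name also avoids a clash with the M_PI macro that some <cmath> headers define.
const double kPi = 3.14159265358979323846;

// Dual number struct implementing forward-mode automatic differentiation
struct Dual {
    double value;      // function value
    double derivative; // derivative value

    // Constructor: initialize the dual number
    Dual(double v, double d) : value(v), derivative(d) {}

    // Overloaded addition (sum rule)
    Dual operator+(const Dual& other) const {
        return Dual(value + other.value, derivative + other.derivative);
    }

    // Overloaded multiplication (product rule)
    Dual operator*(const Dual& other) const {
        return Dual(value * other.value,
                    value * other.derivative + derivative * other.value);
    }

    // Sine overload for dual numbers (chain rule)
    friend Dual sin(const Dual& x) {
        return Dual(std::sin(x.value), std::cos(x.value) * x.derivative);
    }
};

int main() {
    // Initialize x as a dual number with value pi/4 and derivative 1
    Dual x(kPi / 4, 1.0);

    // Compute h(x) = (x^2 + 1) * sin(x)
    Dual x_squared = x * x; // x^2
    Dual one(1.0, 0.0);     // the constant 1 has derivative 0
    Dual h = (x_squared + one) * sin(x);

    // Print the results
    std::cout << "Value of h(x): " << h.value << std::endl;
    std::cout << "Derivative of h(x): " << h.derivative << std::endl;

    return 0;
}
```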
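As a sanity check, the same derivative can be worked out by hand with the product rule and evaluated at x = π/4, where sin and cos both equal √2/2. The rounded numbers below are hand-computed reference values; the program's output should agree with them to the digits shown.

```latex
\begin{aligned}
h(x)  &= (x^2 + 1)\,\sin x \\
h'(x) &= 2x \sin x + (x^2 + 1)\cos x \\
h\!\left(\tfrac{\pi}{4}\right)  &= \left(\tfrac{\pi^2}{16} + 1\right)\tfrac{\sqrt{2}}{2} \approx 1.143 \\
h'\!\left(\tfrac{\pi}{4}\right) &= \left(\tfrac{\pi}{2} + \tfrac{\pi^2}{16} + 1\right)\tfrac{\sqrt{2}}{2} \approx 2.254
\end{aligned}
```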

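A fuller example: the next program applies forward-mode automatic differentiation to a richer set of operations, namely addition, subtraction, multiplication, division, negation, and the exponential and logarithm functions.

The Dual struct:
value holds the function value.
derivative holds the derivative value.
Overloaded operators (addition, subtraction, multiplication, division, unary negation) support ordinary algebraic expressions.

Math function support:
exp and log implement automatic differentiation of the exponential and natural logarithm functions.

Evaluating a more complex function:
compute_function implements $f(x) = x^2 + 2x + e^x$ and uses automatic differentiation to compute both its value and its derivative.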
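Differentiating f by hand gives reference values for x = 1, the point at which the program that follows evaluates the function. The rounded numbers are hand-computed from e ≈ 2.71828.

```latex
\begin{aligned}
f(x)  &= x^2 + 2x + e^x \\
f'(x) &= 2x + 2 + e^x \\
f(1)  &= 3 + e \approx 5.71828 \\
f'(1) &= 4 + e \approx 6.71828
\end{aligned}
```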

```cpp
#include <iostream>
#include <cmath>

// Dual number structure for automatic differentiation
struct Dual {
    double value;      // Function value
    double derivative; // Derivative value

    Dual(double v, double d) : value(v), derivative(d) {}

    // Overload addition
    Dual operator+(const Dual& other) const {
        return Dual(value + other.value, derivative + other.derivative);
    }

    // Overload subtraction
    Dual operator-(const Dual& other) const {
        return Dual(value - other.value, derivative - other.derivative);
    }

    // Overload multiplication
    Dual operator*(const Dual& other) const {
        return Dual(value * other.value, value * other.derivative + derivative * other.value);
    }

    // Overload division
    Dual operator/(const Dual& other) const {
        return Dual(value / other.value,
                    (derivative * other.value - value * other.derivative) / (other.value * other.value));
    }

    // Overload unary minus
    Dual operator-() const {
        return Dual(-value, -derivative);
    }
};

// Exponential function
Dual exp(const Dual& x) {
    double exp_value = std::exp(x.value);
    return Dual(exp_value, exp_value * x.derivative);
}

// Logarithm function
Dual log(const Dual& x) {
    return Dual(std::log(x.value), x.derivative / x.value);
}

// Function to compute f(x) = x^2 + 2x + exp(x)
Dual compute_function(const Dual& x) {
    return x * x + Dual(2.0, 0.0) * x + exp(x);
}

int main() {
    // Initialize x = 1.0 with derivative 1.0 (i.e., d(x)/dx = 1)
    Dual x(1.0, 1.0);

    // Compute the function and its derivative
    Dual result = compute_function(x);

    std::cout << "Function value at x = " << x.value << " is " << result.value << std::endl;
    std::cout << "Derivative at x = " << x.value << " is " << result.derivative << std::endl;

    return 0;
}
```