PyTorch Tutorial

Chapter 2. Autograd

1. Review Matrix Calculus

1.1 Definition: Derivative of a Vector with Respect to a Vector

Define the derivative of a function $f:\mathbb{R}^n\to\mathbb{R}^m$ as the $n\times m$ matrix of partial derivatives. That is, if $x\in\mathbb{R}^n$ and $f(x)\in\mathbb{R}^m$, the derivative of $f$ with respect to $x$ is defined entrywise by

$$\begin{bmatrix} \frac{\partial f}{\partial x} \end{bmatrix}_{ij} = \frac{\partial f_j}{\partial x_i}$$

Let

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_m(x) \end{bmatrix}$$

then we define the Jacobian matrix

$$\frac{\partial f}{\partial x} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_2}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_1} \\ \frac{\partial f_1}{\partial x_2} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_1}{\partial x_n} & \frac{\partial f_2}{\partial x_n} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$
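As a quick numerical check of this layout, the sketch below uses `torch.autograd.functional.jacobian`, which returns the $m\times n$ matrix with entries $\partial f_i/\partial x_j$; transposing it gives the $n\times m$ matrix defined above. The map `f` and the input values are assumptions chosen only for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical map f: R^3 -> R^2, used only as an example
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.tensor([1.0, 2.0, 3.0])

J = jacobian(f, x)       # shape (m, n) = (2, 3), entry [i, j] = d f_i / d x_j
J_denominator = J.T      # shape (n, m) = (3, 2), entry [i, j] = d f_j / d x_i

print(J.shape, J_denominator.shape)
print(J_denominator)
```

The transposed result has one row per input component and one column per output component, which is the convention this chapter uses.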

1.2 Definition: Derivative of a Scalar with Respect to a Vector

If $f$ is a scalar, one has

$$\frac{\partial f}{\partial x} = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

This is in fact the denominator layout: the result has the same shape as $x$.
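A minimal sketch of this definition, assuming the illustrative function $f(x) = x_1^2 + x_2^2 + x_3^2$ (not taken from the text): the gradient returned by `torch.autograd.grad` has the same shape as `x`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
f = (x ** 2).sum()                    # scalar f(x) = x_1^2 + x_2^2 + x_3^2

(grad,) = torch.autograd.grad(f, x)   # one gradient per input, returned as a tuple
print(grad.shape)
print(grad)                           # analytically 2 * x
```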

1.3 Definition: Derivative of a Scalar with Respect to a Matrix

Now we give some results on the derivative of scalar functions of a matrix. Let $X = [x_{ij}]$ be a matrix of order $m\times n$ and let $y=f(X)$ be a scalar function of $X$. The derivative of $y$ with respect to $X$, denoted by $\frac{\partial y}{\partial X}$, is defined as the following matrix of order $m\times n$:

$$G = \frac{\partial y}{\partial X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1n}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{m1}} & \frac{\partial y}{\partial x_{m2}} & \cdots & \frac{\partial y}{\partial x_{mn}} \end{bmatrix} = \Big[ \frac{\partial y}{\partial x_{ij}} \Big]$$
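To make the shape of $G$ concrete, here is a small sketch using the scalar function $y = a^T X b$, whose derivative with respect to $X$ is the outer product $a b^T$. The vectors `a` and `b` and the matrix size are assumptions introduced for this example.

```python
import torch

a = torch.tensor([1.0, 2.0])             # length m = 2
b = torch.tensor([3.0, 4.0, 5.0])        # length n = 3
X = torch.ones(2, 3, requires_grad=True)

y = a @ X @ b                            # scalar y = a^T X b
y.backward()

print(X.grad.shape)                      # same shape as X
print(X.grad)                            # analytically the outer product a b^T
```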

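2. Notes on autograd

`torch.Tensor` is the central class of the package. If you set its attribute `tensor.requires_grad` to `True`, autograd starts tracking all operations on the tensor. Once the computation is finished, you can call `tensor.backward()` to compute all gradients automatically. The gradient for this tensor is accumulated into its `tensor.grad` attribute.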
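The word "accumulated" matters in practice: each call to `backward()` adds to the existing `.grad` rather than overwriting it. A minimal sketch, with an illustrative function chosen for this example:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

(x ** 2).sum().backward()
first = x.grad.clone()       # gradient after one backward pass: 2 * x

(x ** 2).sum().backward()    # a second pass adds to the stored gradient
second = x.grad.clone()      # now 4 * x

x.grad.zero_()               # reset before the next independent computation
print(first, second, x.grad)
```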

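To stop a tensor from tracking history, call `tensor.detach()`: it returns a tensor that is detached from the computation history, and future computations on it are not tracked.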
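A short sketch of `detach()`, with variable names chosen for illustration:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 2
y_detached = y.detach()    # same values as y, but cut off from the graph

print(y.requires_grad, y_detached.requires_grad)
print(y_detached.grad_fn)  # the detached tensor has no creating Function
```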

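To stop tracking history (and the memory it uses), you can also wrap a code block in `with torch.no_grad():`. This is particularly useful when evaluating a model: its trainable parameters have `requires_grad=True` so that they can be tuned during training, but gradients are not needed during evaluation.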
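The effect of `torch.no_grad()` can be sketched as follows; the tensor `w` merely stands in for a trainable parameter and is an assumption of this example.

```python
import torch

w = torch.ones(3, requires_grad=True)   # stands in for a trainable parameter

with torch.no_grad():
    out_eval = (w * 2).sum()            # computed without recording history

out_train = (w * 2).sum()               # computed with history recorded

print(out_eval.requires_grad, out_train.requires_grad)
```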

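One more class is essential to the autograd implementation: `Function`. `Tensor` and `Function` are interconnected and build an acyclic graph that stores the complete history of the computation. Each tensor has a `tensor.grad_fn` attribute that references the `Function` that created it (for tensors created by the user, `grad_fn` is `None`).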
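A brief sketch of how `grad_fn` differs between user-created tensors and tensors produced by operations (the operations are illustrative):

```python
import torch

x = torch.ones(2, requires_grad=True)   # created by the user
y = x + 1                               # created by an addition
z = y * 2                               # created by a multiplication

print(x.grad_fn)                        # user-created: no creating Function
print(type(y.grad_fn).__name__)
print(type(z.grad_fn).__name__)
```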

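To compute derivatives, call `tensor.backward()`. If the tensor is a scalar (that is, it holds a single element), `backward()` needs no arguments; if it has more elements, you must pass a `gradient` argument, which is a tensor of the same shape as the output.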
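The sketch below illustrates both cases for a non-scalar output: calling `backward()` without arguments raises a `RuntimeError`, while passing a `gradient` tensor of matching shape works. The tensor values are chosen for illustration.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                                  # non-scalar output

try:
    y.backward()                           # no gradient argument given
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

y.backward(gradient=torch.ones_like(y))    # weights with the same shape as y
print(x.grad)
```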

The final result is stored in the `tensor.grad` attribute.

Set `requires_grad=True` when creating a tensor to enable gradient tracking:

```python
import torch
import numpy as np

# Create a 2x2 tensor of ones and track operations on it
x = torch.ones(2, 2, requires_grad=True)
print(x)
```

The result is:

```text
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
```

A tensor computed from a tracked tensor carries a `tensor.grad_fn` attribute, which records the `Function` that produced it:

```python
y = torch.add(x, 1)
print(y)
print(y.grad_fn)  # the Function that created y
```

The result is:

```text
tensor([[2., 2.],
        [2., 2.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x0000020D723EBE80>
```

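`tensor.requires_grad_(True / False)` changes the tensor's `requires_grad` flag in place. A tensor created without specifying the flag has `requires_grad=False`; when `requires_grad_()` is called with no argument, the flag is set to `True`.

```python
a = torch.randn(2, 2)
a = (a * 3) / (a - 1)
print(a)                 # not tracked yet
a.requires_grad_(True)   # switch tracking on in place
print(a)
a = a + 1                # the result now records its grad_fn
print(a)
```

The result is (the values depend on the random draw):

```text
tensor([[  0.0646, -46.3478],
        [  5.6683,  -0.8896]])
tensor([[  0.0646, -46.3478],
        [  5.6683,  -0.8896]], requires_grad=True)
tensor([[  1.0646, -45.3478],
        [  6.6683,   0.1104]], grad_fn=<AddBackward0>)
```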
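The following sketch toggles the flag in both directions on a user-created tensor, including the call with no argument:

```python
import torch

a = torch.randn(2, 2)
flag_initial = a.requires_grad   # a freshly created tensor is not tracked

a.requires_grad_()               # no argument: the parameter defaults to True
flag_on = a.requires_grad

a.requires_grad_(False)          # switch tracking off again
flag_off = a.requires_grad

print(flag_initial, flag_on, flag_off)
```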

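3. Computing grad

3.1 Manual Computation

You can compute gradients manually with the function `torch.autograd.grad()`, which returns the gradients as a tuple instead of accumulating them into `.grad`; see its documentation for details.

For example, compute the gradient of $y = x_1^2 + x_2^2 + x_1 x_2$: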

```python
x1 = torch.tensor(3., requires_grad=True)
x2 = torch.tensor(1., requires_grad=True)
y = x1**2 + x2**2 + x1*x2

# First-order derivatives; create_graph=True keeps them differentiable
x1_1 = torch.autograd.grad(y, x1, retain_graph=True, create_graph=True)[0]
x2_1 = torch.autograd.grad(y, x2, retain_graph=True, create_graph=True)[0]
print(x1_1, x2_1)

# Second-order partial derivatives, including the mixed ones
x1_11 = torch.autograd.grad(x1_1, x1)[0]
x1_12 = torch.autograd.grad(x1_1, x2)[0]
x2_21 = torch.autograd.grad(x2_1, x1)[0]
x2_22 = torch.autograd.grad(x2_1, x2)[0]
print(x1_11, x1_12, x2_21, x2_22)
```

The result is:

```text
tensor(7., grad_fn=<AddBackward0>) tensor(5., grad_fn=<AddBackward0>)
tensor(2.) tensor(1.) tensor(1.) tensor(2.)
```
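The four second-order derivatives above form the Hessian of $y$. As a cross-check, the sketch below computes it in a single call with `torch.autograd.functional.hessian`; packing $x_1$ and $x_2$ into one vector and the helper name `y_fn` are assumptions made for this example.

```python
import torch
from torch.autograd.functional import hessian

def y_fn(x):
    # x[0] plays the role of x1 and x[1] the role of x2
    return x[0] ** 2 + x[1] ** 2 + x[0] * x[1]

x = torch.tensor([3.0, 1.0])
H = hessian(y_fn, x)   # 2x2 matrix of second-order partial derivatives
print(H)
```

The diagonal entries correspond to `x1_11` and `x2_22`, and the off-diagonal entries to the mixed derivatives `x1_12` and `x2_21`.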

3.2 Automatic Computation with backward()

When the output is a scalar function

Consider the following computation:

```python
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
print(y)
z = y * y * 3
out = z.mean()
print(z, out)   # out is a scalar
out.backward()  # no gradient argument is needed for a scalar output
print(x.grad)
```

The output is:

```text
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
```

Here the input is

$$X = \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

The intermediate variable is

$$Z = \begin{bmatrix} z_1 & z_2 \\ z_3 & z_4 \end{bmatrix} = \begin{bmatrix} 3(x_1+2)^2 & 3(x_2+2)^2 \\ 3(x_3+2)^2 & 3(x_4+2)^2 \end{bmatrix}$$

and the final output is

$$\begin{aligned} \text{out} & = \frac{1}{4}\sum_{i=1}^{4} z_i = \frac{1}{4}(z_1+z_2+z_3+z_4) \\ & = \frac{1}{4}\big(3(x_1+2)^2+3(x_2+2)^2+3(x_3+2)^2+3(x_4+2)^2\big) \\ & = f(\mathrm{x}) \end{aligned}$$

where the entries of the matrices $X$ and $Z$ are flattened into the vectors

$$\mathrm{x} = [x_1,x_2,x_3,x_4]^T, \quad \mathrm{z} = [z_1,z_2,z_3,z_4]^T$$

Applying the chain rule for matrix derivatives gives

$$\frac{\partial f}{\partial \mathrm{x}} = \frac{\partial f(\mathrm{x})}{\partial \mathrm{x}} = \frac{\partial \mathrm{z}}{\partial \mathrm{x}} \frac{\partial f(\mathrm{x})}{\partial \mathrm{z}}$$

Using the definitions from Section 1, we then have

$$\begin{aligned} \frac{\partial f}{\partial \mathrm{x}} & = \begin{bmatrix} \frac{\partial z_1}{\partial x_1} & \frac{\partial z_2}{\partial x_1} & \frac{\partial z_3}{\partial x_1} & \frac{\partial z_4}{\partial x_1} \\ \frac{\partial z_1}{\partial x_2} & \frac{\partial z_2}{\partial x_2} & \frac{\partial z_3}{\partial x_2} & \frac{\partial z_4}{\partial x_2} \\ \frac{\partial z_1}{\partial x_3} & \frac{\partial z_2}{\partial x_3} & \frac{\partial z_3}{\partial x_3} & \frac{\partial z_4}{\partial x_3} \\ \frac{\partial z_1}{\partial x_4} & \frac{\partial z_2}{\partial x_4} & \frac{\partial z_3}{\partial x_4} & \frac{\partial z_4}{\partial x_4} \end{bmatrix} \begin{bmatrix} \frac{\partial f}{\partial z_1} \\ \frac{\partial f}{\partial z_2} \\ \frac{\partial f}{\partial z_3} \\ \frac{\partial f}{\partial z_4} \end{bmatrix} \\ & = \begin{bmatrix} 6(x_1+2) & 0 & 0 & 0 \\ 0 & 6(x_2+2) & 0 & 0 \\ 0 & 0 & 6(x_3+2) & 0 \\ 0 & 0 & 0 & 6(x_4+2) \end{bmatrix} \begin{bmatrix} \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \end{bmatrix} = \begin{bmatrix} 4.5 \\ 4.5 \\ 4.5 \\ 4.5 \end{bmatrix} \end{aligned}$$

So the derivative with respect to the matrix $X$ is

$$\frac{\partial f}{\partial X} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} \\ \frac{\partial f}{\partial x_3} & \frac{\partial f}{\partial x_4} \end{bmatrix} = \begin{bmatrix} 4.5 & 4.5 \\ 4.5 & 4.5 \end{bmatrix}$$
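The derivation reduces to the closed form $\partial f/\partial x_i = \frac{1}{4}\cdot 6(x_i+2) = 1.5\,(x_i+2)$. A minimal sketch comparing this expression with what autograd returns (the name `closed_form` is introduced here for illustration):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

# Closed form from the derivation: (1/4) * 6 * (x_i + 2)
closed_form = 1.5 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, closed_form))
```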

When the output is a tensor function

```python
x = torch.tensor([[1.0, 2, 3], [4, 5, 6], [7, 8, 9]], requires_grad=True)
w = torch.tensor([[1.0, 2, 3], [4, 5, 6]], requires_grad=True)

y = torch.matmul(x, w.T)  # y has shape (3, 2), so it is not a scalar
print(y)
print(torch.ones_like(y))
y.backward(gradient=torch.ones_like(y))  # one weight per entry of y
print(x.grad)
```

The output is:

```text
tensor([[ 14.,  32.],
        [ 32.,  77.],
        [ 50., 122.]], grad_fn=<MmBackward0>)
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
tensor([[5., 7., 9.],
        [5., 7., 9.],
        [5., 7., 9.]])
```

Here

$$Y = XW^T$$

$$\begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \\ y_{31} & y_{32} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{bmatrix}$$

$$\begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \\ y_{31} & y_{32} \end{bmatrix} = \begin{bmatrix} (x_{11}w_{11} + x_{12}w_{12} + x_{13}w_{13}) & (x_{11}w_{21} + x_{12}w_{22} + x_{13}w_{23}) \\ (x_{21}w_{11} + x_{22}w_{12} + x_{23}w_{13}) & (x_{21}w_{21} + x_{22}w_{22} + x_{23}w_{23}) \\ (x_{31}w_{11} + x_{32}w_{12} + x_{33}w_{13}) & (x_{31}w_{21} + x_{32}w_{22} + x_{33}w_{23}) \end{bmatrix}$$

`gradient=torch.ones_like(y)` assigns weight 1 to every entry of the matrix $Y$. The scalar function obtained as the weighted sum of the entries of $Y$ is

$$\begin{aligned} f(\mathrm{x},\mathrm{w}) & = 1\times y_{11}+1\times y_{12}+1\times y_{21}+1\times y_{22}+1\times y_{31}+1\times y_{32}, \\ \mathrm{x} & = [x_{11}, x_{12}, x_{13}, x_{21}, x_{22}, x_{23}, x_{31}, x_{32}, x_{33}]^T \\ \mathrm{w} & = [w_{11}, w_{12}, w_{13}, w_{21}, w_{22}, w_{23}]^T \end{aligned}$$

No composite differentiation is involved here, so the derivative can be computed directly:

$$\begin{aligned} \frac{\partial f}{\partial \mathrm{x}} & = \Big[ \frac{\partial f}{\partial x_{11}}, \frac{\partial f}{\partial x_{12}}, \frac{\partial f}{\partial x_{13}}, \frac{\partial f}{\partial x_{21}}, \frac{\partial f}{\partial x_{22}}, \frac{\partial f}{\partial x_{23}}, \frac{\partial f}{\partial x_{31}}, \frac{\partial f}{\partial x_{32}}, \frac{\partial f}{\partial x_{33}} \Big]^T \\ & = \Big[ w_{11} + w_{21}, w_{12} + w_{22}, w_{13} + w_{23}, w_{11} + w_{21}, w_{12} + w_{22}, w_{13} + w_{23}, w_{11} + w_{21}, w_{12} + w_{22}, w_{13} + w_{23} \Big]^T \\ & = [5, 7, 9, 5, 7, 9, 5, 7, 9]^T \end{aligned}$$

Written in matrix form, this gives

$$\frac{\partial f}{\partial X} = \begin{bmatrix} 5 & 7 & 9 \\ 5 & 7 & 9 \\ 5 & 7 & 9 \end{bmatrix}$$
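For $Y = XW^T$ with weights $G$ passed as `gradient`, the entrywise computation above is equivalent to the matrix expressions $\partial f/\partial X = GW$ and $\partial f/\partial W = G^T X$. The sketch below checks both against autograd; the names `g`, `expected_x_grad` and `expected_w_grad` are introduced for this example.

```python
import torch

x = torch.tensor([[1.0, 2, 3], [4, 5, 6], [7, 8, 9]], requires_grad=True)
w = torch.tensor([[1.0, 2, 3], [4, 5, 6]], requires_grad=True)

y = x @ w.T
g = torch.ones_like(y)                # the weights passed to backward()
y.backward(gradient=g)

# Closed forms for Y = X W^T with upstream weights G
expected_x_grad = g @ w.detach()      # G W,   shape (3, 3)
expected_w_grad = g.T @ x.detach()    # G^T X, shape (2, 3)

print(torch.allclose(x.grad, expected_x_grad))
print(torch.allclose(w.grad, expected_w_grad))
```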

Next, consider a more general case in which the output is a composition of two operations and the weights are not all equal:

```python
x = torch.ones(3, requires_grad=True)
print(x)
y = x * 2
print(y)
z = y * 2
print(z)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)  # one weight per entry of z
z.backward(gradient=v)
print(x.grad)
```

The result is:

```text
tensor([1., 1., 1.], requires_grad=True)
tensor([2., 2., 2.], grad_fn=<MulBackward0>)
tensor([4., 4., 4.], grad_fn=<MulBackward0>)
tensor([4.0000e-01, 4.0000e+00, 4.0000e-04])
```

where

$$\begin{aligned} \mathrm{x} & = [x_1,x_2,x_3]^T = [1,1,1]^T \\ \mathrm{y} = 2\mathrm{x} & = [y_1,y_2,y_3]^T = [2,2,2]^T \\ \mathrm{z} = 2\mathrm{y} & = [z_1,z_2,z_3]^T = [4,4,4]^T \end{aligned}$$

With `gradient=torch.tensor([a1, a2, a3], dtype=torch.float)`, the resulting weighted scalar function is

$$f = a_1 z_1 + a_2 z_2 + a_3 z_3$$

Taking partial derivatives with respect to $\mathrm{x}$ then gives

$$\begin{aligned} \frac{\partial f}{\partial \mathrm{x}} & = \frac{\partial \mathrm{y}}{\partial \mathrm{x}} \frac{\partial f}{\partial \mathrm{y}} \\ & = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \frac{\partial y_3}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_3}{\partial x_2} \\ \frac{\partial y_1}{\partial x_3} & \frac{\partial y_2}{\partial x_3} & \frac{\partial y_3}{\partial x_3} \end{bmatrix} \begin{bmatrix} \frac{\partial f}{\partial y_1} \\ \frac{\partial f}{\partial y_2} \\ \frac{\partial f}{\partial y_3} \end{bmatrix} \\ & = \begin{bmatrix} 2 & & \\ & 2 & \\ & & 2 \end{bmatrix} \begin{bmatrix} 2 a_1 \\ 2 a_2 \\ 2 a_3 \end{bmatrix} \\ & = [4a_1, 4a_2, 4a_3]^T \end{aligned}$$

With $[a_1, a_2, a_3] = [0.1, 1.0, 0.0001]$ this evaluates to $[0.4, 4, 0.0004]^T$, which matches the printed `x.grad`.
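In other words, passing `gradient=v` to `backward()` gives the same result as building the weighted scalar $f = \sum_i a_i z_i$ explicitly and calling `backward()` on it. A minimal sketch of that equivalence, with the variable names `x1`, `x2`, `z1`, `z2` introduced only for this comparison:

```python
import torch

v = torch.tensor([0.1, 1.0, 0.0001])

# Route 1: pass the weights to backward()
x1 = torch.ones(3, requires_grad=True)
z1 = (x1 * 2) * 2
z1.backward(gradient=v)

# Route 2: build the weighted scalar f = sum_i a_i * z_i explicitly
x2 = torch.ones(3, requires_grad=True)
z2 = (x2 * 2) * 2
(v * z2).sum().backward()

print(x1.grad, x2.grad)
print(torch.allclose(x1.grad, x2.grad))
```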

Some Takeaways
