Dive into Deep Learning (PyTorch Edition), 5.3 Deferred Initialization

python
import torch
from torch import nn
from d2l import torch as d2l

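The input dimension of the multilayer perceptron instantiated below is unknown, so the framework has not yet initialized any parameters, and the weight is displayed as "UninitializedParameter".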

python
net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

net[0].weight
Output:
c:\Software\Miniconda3\envs\d2l\lib\site-packages\torch\nn\modules\lazy.py:178: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment.
  warnings.warn('Lazy modules are a new feature under heavy development '

<UninitializedParameter>

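Once the input dimension is known, which happens when data is first passed through the network, the framework can initialize the parameters layer by layer.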

python
X = torch.rand(2, 20)
net(X)

net[0].weight.shape
Output:
torch.Size([256, 20])
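
To make the before-and-after state explicit, the following minimal sketch checks every parameter against UninitializedParameter around the first forward pass. The helper is_uninitialized and the variables before, after, and shapes are names introduced here for illustration; they do not appear in the original text.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter


def is_uninitialized(param):
    """Return True if the parameter has not been materialized yet.

    Hypothetical helper for this sketch, not part of PyTorch or d2l.
    """
    return isinstance(param, UninitializedParameter)


net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

# Before any data is seen, weights and biases of both lazy layers are placeholders.
before = [is_uninitialized(p) for p in net.parameters()]

# The first forward pass lets each layer infer its input size from the data.
net(torch.rand(2, 20))

after = [is_uninitialized(p) for p in net.parameters()]
shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}

print(before)
print(after)
print(shapes)
```

Under these assumptions, all four parameters (two weights and two biases) should report as uninitialized before the call and as materialized afterwards, with the first weight taking the shape (256, 20) inferred from the input.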

Exercises

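(1) What happens if you specify the input dimension of the first layer but not the dimensions of the subsequent layers? Are the parameters initialized immediately?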

python
net = nn.Sequential(
    nn.Linear(20, 256), nn.ReLU(),
    nn.LazyLinear(128), nn.ReLU(),
    nn.LazyLinear(10)
)
net[0].weight, net[2].weight, net[4].weight
Output:
c:\Software\Miniconda3\envs\d2l\lib\site-packages\torch\nn\modules\lazy.py:178: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment.
  warnings.warn('Lazy modules are a new feature under heavy development '

(Parameter containing:
 tensor([[ 0.1332,  0.1372, -0.0939,  ..., -0.0579, -0.0911, -0.1820],
         [-0.1570, -0.0993, -0.0685,  ..., -0.0469, -0.0208,  0.0665],
         [ 0.0861,  0.1135,  0.1631,  ..., -0.1407,  0.1088, -0.2052],
         ...,
         [-0.1454, -0.0283, -0.1074,  ..., -0.2164, -0.2169,  0.1913],
         [-0.1617,  0.1206, -0.2119,  ..., -0.1862, -0.0951,  0.1535],
         [-0.0229, -0.2133, -0.1027,  ...,  0.1973,  0.1314,  0.1283]],
        requires_grad=True),
 <UninitializedParameter>,
 <UninitializedParameter>)

python
net(X)  # the first forward pass triggers deferred initialization of the lazy layers
net[0].weight.shape, net[2].weight.shape, net[4].weight.shape
Output:
(torch.Size([256, 20]), torch.Size([128, 256]), torch.Size([10, 128]))
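
The two outputs above suggest the answer: the first layer, whose input dimension was given explicitly, is initialized as soon as it is constructed, while the two lazy layers stay uninitialized until the first forward pass. The sketch below records the same transition through each layer's class name and feature sizes. The helper describe is a name made up for this example. The placeholder value that an uninitialized lazy layer reports for in_features, and the switch of its class to nn.Linear after initialization, reflect how current PyTorch releases behave as far as I know, so treat both as assumptions rather than guarantees.

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.Linear(20, 256), nn.ReLU(),
    nn.LazyLinear(128), nn.ReLU(),
    nn.LazyLinear(10)
)


def describe(layer):
    """Summarize a linear layer as (class name, in_features, out_features).

    Hypothetical helper for this sketch.
    """
    return type(layer).__name__, layer.in_features, layer.out_features


linear_layers = [net[0], net[2], net[4]]

# Only net[0] knows its input size at construction time.
before = [describe(layer) for layer in linear_layers]

# One forward pass propagates the sizes: 20 -> 256 -> 128 -> 10.
net(torch.rand(2, 20))

after = [describe(layer) for layer in linear_layers]

print(before)
print(after)
```

Printing before and after side by side shows which layers were waiting for data and which sizes they inferred once it arrived.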

(2) What happens if you specify mismatched dimensions?

python
X = torch.rand(2, 10)
net(X)  # raises RuntimeError: net[0] expects 20 input features, not 10
Output:
---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

Cell In[14], line 2
      1 X = torch.rand(2, 10)
----> 2 net(X)


File c:\Software\Miniconda3\envs\d2l\lib\site-packages\torch\nn\modules\module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []


File c:\Software\Miniconda3\envs\d2l\lib\site-packages\torch\nn\modules\container.py:139, in Sequential.forward(self, input)
    137 def forward(self, input):
    138     for module in self:
--> 139         input = module(input)
    140     return input


File c:\Software\Miniconda3\envs\d2l\lib\site-packages\torch\nn\modules\module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []


File c:\Software\Miniconda3\envs\d2l\lib\site-packages\torch\nn\modules\linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)


RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x10 and 20x256)
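
The error arises because net[0] is an ordinary nn.Linear(20, 256) whose weight shape is already fixed, so an input with 10 features cannot be multiplied against it. A freshly constructed lazy layer has no fixed shape to conflict with and would simply adopt 10 as its input size. The sketch below contrasts the two cases and adds a small guard that reports a mismatch before the forward pass. The names fixed, lazy, and check_input are introduced here for illustration only.

```python
import torch
from torch import nn

X_bad = torch.rand(2, 10)

# Case 1: the input size is already fixed at 20, so 10 features cannot be used.
fixed = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
try:
    fixed(X_bad)
    error_message = None
except RuntimeError as err:
    error_message = str(err)

# Case 2: an uninitialized lazy layer infers its input size from the data it sees.
lazy = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
lazy(X_bad)
lazy_weight_shape = tuple(lazy[0].weight.shape)


def check_input(net, X):
    """Return True if X's last dimension matches the first layer's in_features.

    Hypothetical helper; assumes net[0] is an initialized nn.Linear.
    """
    return X.shape[-1] == net[0].in_features


print(error_message)
print(lazy_weight_shape)
print(check_input(fixed, X_bad), check_input(fixed, torch.rand(2, 20)))
```

A check like check_input only makes sense once the first layer is initialized, since an uninitialized lazy layer has no meaningful input size to compare against.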

(3) What do you need to do if the inputs have different dimensions?

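Adjust the inputs so they all have the dimension the network expects: either pad the smaller inputs up to that size or reduce the dimensionality of the larger ones.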
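
One way to picture both options is the sketch below, which assumes a network expecting 20 input features. Zero padding, truncation, and a learned linear projection are illustrative choices made for this example; the original answer does not say which padding or reduction method to use.

```python
import torch
import torch.nn.functional as F
from torch import nn

expected = 20  # assumed input size of the network in this sketch
net = nn.Sequential(nn.Linear(expected, 256), nn.ReLU(), nn.Linear(256, 10))

# Padding: append zeros on the right of the feature axis, 10 -> 20 features.
X_small = torch.rand(2, 10)
X_padded = F.pad(X_small, (0, expected - X_small.shape[-1]))

# Reduction, option A: keep only the first 20 of 30 features.
X_large = torch.rand(2, 30)
X_truncated = X_large[:, :expected]

# Reduction, option B: map 30 features to 20 with a learnable projection.
projection = nn.Linear(30, expected)
X_projected = projection(X_large)

# All three adjusted inputs now pass through the same network.
outputs = [net(x) for x in (X_padded, X_truncated, X_projected)]
print([tuple(out.shape) for out in outputs])
```

Truncation discards information outright, whereas a projection layer adds parameters that would need to be trained along with the rest of the network, so the right choice depends on the data.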
