LLM Series: 2. Getting Started with PyTorch: 7. Deep Neural Networks, Part 1

FreeGo~2026-05-02 10:24

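Basic Concepts of Deep Neural Networks

A deep neural network (DNN) stacks multiple hidden layers to model complex nonlinear mappings. Each layer consists of a linear transformation (multiplication by a weight matrix, plus a bias) followed by a nonlinear activation function. During training, backpropagation carries the gradient of the loss backward from layer to layer, and the optimizer uses those gradients to update the parameters.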
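
To make the layer structure concrete, the following minimal sketch computes one hidden layer by hand as relu(x Wᵀ + b) and compares it with `nn.Linear` followed by `nn.ReLU`. The sizes (a batch of 4, 3 input features, 5 hidden units) and the toy loss are arbitrary choices for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(4, 3)        # batch of 4 samples, 3 features each (arbitrary sizes)
linear = nn.Linear(3, 5)     # holds weight W of shape (5, 3) and bias b of shape (5,)

# One layer computed by hand: linear transformation, then nonlinearity
manual = torch.relu(x @ linear.weight.T + linear.bias)
# The same layer computed with PyTorch modules
builtin = nn.ReLU()(linear(x))

# Backpropagation: reduce to a toy scalar loss, then compute gradients for W and b
loss = builtin.sum()
loss.backward()

print(torch.allclose(manual, builtin, atol=1e-6))
print(linear.weight.grad.shape, linear.bias.grad.shape)
```

After `loss.backward()`, each parameter's `.grad` has the same shape as the parameter itself, which is what an optimizer needs for the update step.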

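Network Architecture Design

The input layer's dimension must match the number of features in the data. Hidden layers usually either narrow layer by layer or keep a constant width. The output layer's dimension is determined by the task (for classification, the number of classes). In PyTorch, a network is defined by subclassing nn.Module: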

```python
import torch.nn as nn

class DNN(nn.Module):
    def __init__(self, input_dim, hidden_dims, output_dim):
        super().__init__()
        layers = []
        prev_dim = input_dim
        # Stack one Linear + ReLU pair per hidden layer
        for dim in hidden_dims:
            layers.append(nn.Linear(prev_dim, dim))
            layers.append(nn.ReLU())
            prev_dim = dim
        # Final Linear layer maps to the output dimension (no activation: raw scores)
        self.net = nn.Sequential(*layers, nn.Linear(prev_dim, output_dim))

    def forward(self, x):
        return self.net(x)
```
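
As a usage sketch, the class can be instantiated and run on a random batch to confirm the output shape and parameter count. The class is repeated in compact form so the snippet runs on its own; the sizes (20 features, hidden widths 64 and 32, 3 classes, batch of 8) are hypothetical.

```python
import torch
import torch.nn as nn

class DNN(nn.Module):
    # Same structure as the class above, repeated so this snippet is self-contained
    def __init__(self, input_dim, hidden_dims, output_dim):
        super().__init__()
        layers = []
        prev_dim = input_dim
        for dim in hidden_dims:
            layers += [nn.Linear(prev_dim, dim), nn.ReLU()]
            prev_dim = dim
        self.net = nn.Sequential(*layers, nn.Linear(prev_dim, output_dim))

    def forward(self, x):
        return self.net(x)

# Hypothetical sizes: 20 input features, hidden widths 64 -> 32, 3 output classes
model = DNN(input_dim=20, hidden_dims=[64, 32], output_dim=3)
x = torch.randn(8, 20)       # batch of 8 random samples
logits = model(x)            # shape (8, 3): one raw score per class per sample

n_params = sum(p.numel() for p in model.parameters())
print(logits.shape, n_params)
```

The parameter count follows directly from the layer sizes: (20·64 + 64) + (64·32 + 32) + (32·3 + 3) = 3523.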

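Choosing an Activation Function

ReLU and its variants (LeakyReLU, PReLU) mitigate the vanishing-gradient problem, because their gradient does not saturate for positive inputs:

ReLU: $f(x) = \max(0, x)$
LeakyReLU: $f(x) = \max(0.01x, x)$
Sigmoid/Tanh: suited to specific output-layer cases, such as sigmoid for a binary-classification probability or tanh for outputs bounded in $(-1, 1)$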
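
The functions listed above can be evaluated directly on a small tensor. This is a minimal sketch using `torch.nn.functional`; the input values are arbitrary.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])   # arbitrary sample inputs

relu_out = F.relu(x)                               # max(0, x)
leaky_out = F.leaky_relu(x, negative_slope=0.01)   # max(0.01 * x, x)
sigmoid_out = torch.sigmoid(x)                     # squashes into (0, 1)
tanh_out = torch.tanh(x)                           # squashes into (-1, 1)

print(relu_out)
print(leaky_out)
print(sigmoid_out)
print(tanh_out)
```

Negative inputs are zeroed by ReLU but only scaled down by LeakyReLU, so LeakyReLU still passes a small gradient for them.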

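Parameter Initialization

Xavier (Glorot) initialization is a good fit for fully connected layers. For ReLU networks, Kaiming (He) initialization is a common alternative: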

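```python
import torch.nn as nn

def init_weights(m):
    # net.apply() calls this on every submodule; only Linear layers are touched
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.constant_(m.bias, 0.01)

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # placeholder model
net.apply(init_weights)
```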
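
Xavier uniform initialization draws weights from U(-a, a) with a = gain · sqrt(6 / (fan_in + fan_out)). The sketch below checks that bound on a single layer; the layer sizes are hypothetical and the default gain of 1 is assumed.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

fan_in, fan_out = 256, 128            # hypothetical layer sizes
layer = nn.Linear(fan_in, fan_out)
nn.init.xavier_uniform_(layer.weight)
nn.init.constant_(layer.bias, 0.01)

# Theoretical limit of the uniform distribution for gain = 1
bound = math.sqrt(6.0 / (fan_in + fan_out))
max_abs = layer.weight.abs().max().item()

print(f"bound={bound:.4f}, largest |weight|={max_abs:.4f}")
```

Scaling the range by fan-in and fan-out keeps the variance of activations and gradients roughly constant from layer to layer, which is the motivation for this scheme.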

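Batch Normalization

nn.BatchNorm1d normalizes a layer's outputs over the mini-batch, which speeds up and stabilizes the training of deep networks: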

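```python
# Fragment of an nn.Module subclass; self.linear is assumed to be an nn.Linear
# whose output size is hidden_dim. In __init__:
self.bn = nn.BatchNorm1d(hidden_dim)

def forward(self, x):
    return self.bn(self.linear(x))  # normalize the linear output over the batch
```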
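
A complete, runnable version of this fragment might look like the sketch below. The class name `LinearBN` and all sizes are hypothetical. It also illustrates that the layer behaves differently in training mode (batch statistics) and evaluation mode (running statistics).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class LinearBN(nn.Module):   # hypothetical name for illustration
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, hidden_dim)
        self.bn = nn.BatchNorm1d(hidden_dim)

    def forward(self, x):
        return self.bn(self.linear(x))

block = LinearBN(10, 16)
x = torch.randn(32, 10) * 5 + 3      # inputs deliberately off-center and wide

block.train()                        # training mode: normalize with batch statistics
out = block(x)
batch_mean = out.mean(dim=0)                  # per-feature mean, close to 0
batch_std = out.std(dim=0, unbiased=False)    # per-feature std, close to 1

block.eval()                         # eval mode: use running statistics instead
out_eval = block(x)

print(batch_mean.abs().max().item(), batch_std.mean().item())
```

Remember to call `model.eval()` before inference; otherwise the output for one sample depends on the other samples in its batch.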

Gradient Clipping

Clipping the overall gradient norm guards against exploding gradients:

```python
# Call after loss.backward() and before optimizer.step()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```
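
The placement of the clipping call within a training step can be sketched as follows. The model, data, and learning rate are placeholders; the targets are scaled up on purpose so that the unclipped gradient norm is large.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.randn(16, 10)
y = torch.randn(16, 1) * 100         # large targets to provoke large gradients

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()

# Returns the total norm measured before clipping; gradients are rescaled in place
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# Recompute the total norm to confirm it now respects max_norm
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

optimizer.step()
print(float(norm_before), float(norm_after))
```

Clipping rescales all gradients by the same factor, so the update direction is preserved and only its magnitude is limited.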

Regularization Strategies

L2 regularization is applied through the optimizer's weight_decay argument:

```python
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-4)
```
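
To see what `weight_decay` does numerically, the sketch below uses plain SGD rather than Adam, since the arithmetic is easier to follow: the optimizer adds `weight_decay * w` to the gradient before the step. The data loss is constructed to have zero gradient so that only the decay term acts. All values are arbitrary.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, -2.0, 3.0]))
optimizer = torch.optim.SGD([w], lr=0.1, weight_decay=0.5)

optimizer.zero_grad()
loss = (w * 0.0).sum()     # zero gradient w.r.t. w, so only weight decay changes w
loss.backward()
optimizer.step()

# Update rule: w <- w - lr * (grad + weight_decay * w) = w * (1 - 0.1 * 0.5)
expected = torch.tensor([1.0, -2.0, 3.0]) * 0.95
print(w.data, expected)
```

With Adam, the same penalty term is added to the gradient and then passes through Adam's adaptive scaling; `torch.optim.AdamW` applies the decay separately from the gradient instead.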

A Dropout layer randomly zeroes activations during training:

```python
# In __init__ of an nn.Module subclass; p is the probability of zeroing each element
self.dropout = nn.Dropout(p=0.5)
```
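
Dropout is only active in training mode. A minimal sketch of the difference, using an all-ones input of arbitrary length so the effect is easy to read:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
train_out = dropout(x)       # each element is either 0 or 1 / (1 - p) = 2.0

dropout.eval()
eval_out = dropout(x)        # identity at inference time

kept_fraction = (train_out != 0).float().mean().item()
print(kept_fraction, train_out.unique())
```

Surviving activations are scaled by 1 / (1 - p) during training so that the expected value of each activation matches what the network sees at inference time.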

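Loss Function Configuration

Cross-entropy is the usual choice for classification. nn.CrossEntropyLoss expects raw logits, not softmax outputs: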

```python
criterion = nn.CrossEntropyLoss()
```
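
A short sketch of the expected input shapes, with made-up logits for a batch of 2 samples and 3 classes. It also shows that the loss equals log-softmax followed by negative log-likelihood, which is why no softmax layer is needed at the end of the network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw scores, shape (batch, num_classes)
targets = torch.tensor([0, 2])             # class indices, dtype int64, shape (batch,)

loss = criterion(logits, targets)

# Equivalent manual computation: log-softmax, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(loss.item(), manual.item())
```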

Regression tasks use mean squared error (MSE):

```python
criterion = nn.MSELoss()
```
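
A minimal usage sketch with made-up predictions and targets. With the default `reduction="mean"`, the loss is the average of the squared differences.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()     # default reduction="mean"

pred = torch.tensor([[2.5], [0.0], [2.0]])      # model outputs, shape (batch, 1)
target = torch.tensor([[3.0], [-0.5], [2.0]])   # ground truth, same shape as pred

loss = criterion(pred, target)
manual = ((pred - target) ** 2).mean()   # (0.25 + 0.25 + 0.0) / 3
print(loss.item(), manual.item())
```

Keeping `pred` and `target` the same shape avoids unintended broadcasting, for example between shapes (batch, 1) and (batch,).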