Classic CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet

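Contents

LeNet-5: The Pioneering CNN
AlexNet: The Deep Learning Revolution
VGG: Depth and Simplicity
GoogLeNet (the Inception Module)
ResNet: Solving the Deep-Network Training Problem
Summary of Architectural Evolution
- Key innovations of each architecture
Practical Recommendations
- Choosing an architecture
- Transfer learning
Summary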

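Having covered the basic building blocks of a CNN, we now turn to several classic CNN architectures. These networks delivered breakthrough results when they appeared, and their design ideas still shape deep learning today. This article walks through five milestone architectures: LeNet, AlexNet, VGG, GoogLeNet, and ResNet.

LeNet-5: The Pioneering CNN

LeNet-5 was proposed by Yann LeCun and colleagues in 1998 and is one of the earliest convolutional neural networks to be deployed successfully in practice. It was designed for handwritten digit recognition and is usually demonstrated on the MNIST dataset.

Network structure

LeNet-5 has a fairly simple structure with 7 layers (not counting the input layer):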

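```
Input: 32×32×1 (grayscale image)
↓
C1: convolution (6 kernels of 5×5, stride 1) → 28×28×6
↓
S2: subsampling (2×2 average pooling, stride 2) → 14×14×6
↓
C3: convolution (16 kernels of 5×5, stride 1) → 10×10×16
↓
S4: subsampling (2×2 average pooling, stride 2) → 5×5×16
↓
C5: convolution (120 kernels of 5×5) → 1×1×120
↓
F6: fully connected (84 neurons)
↓
Output: fully connected (10 neurons, one per digit)
```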
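Each spatial size in the listing follows from the usual output-size rule, out = floor((in + 2·padding − kernel) / stride) + 1. The short sketch below traces the size through the convolution and pooling layers; the helper name `out_size` and the layer tuples are illustrative and not part of the original LeNet-5 description.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride) for the spatial layers of LeNet-5 as listed above
layers = [("C1", 5, 1), ("S2", 2, 2), ("C3", 5, 1), ("S4", 2, 2), ("C5", 5, 1)]

size = 32  # input height and width
trace = []
for name, kernel, stride in layers:
    size = out_size(size, kernel, stride)
    trace.append((name, size))
    print(f"{name}: {size}×{size}")
```

Because C5 applies 5×5 kernels to a 5×5 input, its output is 1×1, which is why it behaves like a fully connected layer.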


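Key ideas in LeNet-5

Convolutional layers: one of the first successful uses of convolution to extract image features
Weight sharing: each kernel is reused at every position, which greatly reduces the parameter count
Subsampling layers: average pooling shrinks the feature maps
Activation function: Tanh
Output layer: the original paper classifies with Euclidean RBF units; modern reimplementations usually use a plain fully connected layer, as in the structure listing above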
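To give a feel for how much weight sharing saves, the sketch below compares the parameter count of C1 with a hypothetical fully connected layer that maps the same 32×32×1 input to the same 28×28×6 output. The dense layer is only a thought experiment for comparison; LeNet-5 contains no such layer.

```python
# C1 in LeNet-5: 6 kernels of 5×5 over 1 input channel, plus one bias per kernel
conv_params = 6 * (5 * 5 * 1) + 6

# Hypothetical dense layer with the same input (32×32×1) and output (28×28×6)
n_in = 32 * 32 * 1
n_out = 28 * 28 * 6
dense_params = n_in * n_out + n_out

print(f"C1 convolution:         {conv_params:,} parameters")
print(f"Equivalent dense layer: {dense_params:,} parameters")
print(f"Ratio: about {dense_params / conv_params:,.0f}×")
```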

Implementing LeNet-5

```python
import numpy as np


class LeNet5:
    """Simplified LeNet-5 forward pass (randomly initialised, untrained)."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        # Weights are random placeholders; a real model would learn them.
        self.params = {
            "C1": self._init(rng, (6, 1, 5, 5)),
            "C3": self._init(rng, (16, 6, 5, 5)),
            "C5": self._init(rng, (120, 16, 5, 5)),
            "F6": self._init(rng, (84, 120)),
            "OUT": self._init(rng, (10, 84)),
        }

    @staticmethod
    def _init(rng, shape):
        """Return (weights, bias): small random weights and a zero bias."""
        return rng.normal(0.0, 0.1, size=shape), np.zeros(shape[0])

    def forward(self, x):
        """Forward pass for a single image of shape (1, 32, 32)."""
        # C1: 6 kernels of 5×5, 32×32×1 → 28×28×6
        c1 = np.tanh(self.conv2d(x, *self.params["C1"]))
        # S2: 2×2 average pooling, 28×28×6 → 14×14×6
        s2 = self.avg_pool2d(c1, pool_size=2)
        # C3: 16 kernels of 5×5, 14×14×6 → 10×10×16
        c3 = np.tanh(self.conv2d(s2, *self.params["C3"]))
        # S4: 2×2 average pooling, 10×10×16 → 5×5×16
        s4 = self.avg_pool2d(c3, pool_size=2)
        # C5: 120 kernels of 5×5, 5×5×16 → 1×1×120
        c5 = np.tanh(self.conv2d(s4, *self.params["C5"]))
        # F6: fully connected, 120 → 84
        f6 = np.tanh(self.dense(c5.ravel(), *self.params["F6"]))
        # Output: fully connected, 84 → 10 (raw scores, no softmax)
        return self.dense(f6, *self.params["OUT"])

    @staticmethod
    def conv2d(x, weights, bias, stride=1):
        """Convolution without padding. x: (C, H, W); weights: (F, C, k, k)."""
        k = weights.shape[-1]
        # windows has shape (C, out_H, out_W, k, k)
        windows = np.lib.stride_tricks.sliding_window_view(x, (k, k), axis=(1, 2))
        windows = windows[:, ::stride, ::stride]
        return np.einsum("chwij,fcij->fhw", windows, weights) + bias[:, None, None]

    @staticmethod
    def avg_pool2d(x, pool_size):
        """Non-overlapping average pooling (stride equals pool_size)."""
        C, H, W = x.shape
        out_H, out_W = H // pool_size, W // pool_size
        x = x[:, :out_H * pool_size, :out_W * pool_size]
        return x.reshape(C, out_H, pool_size, out_W, pool_size).mean(axis=(2, 4))

    @staticmethod
    def dense(x, weights, bias):
        """Fully connected layer."""
        return weights @ x + bias


# Test LeNet-5 on a random input
lenet = LeNet5()
input_img = np.random.default_rng(1).normal(size=(1, 32, 32))
output = lenet.forward(input_img)

print(f"LeNet-5 input shape: {input_img.shape}")
print(f"LeNet-5 output shape: {output.shape}")
```

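AlexNet: The Deep Learning Revolution

AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, won the ImageNet competition (ILSVRC) in 2012 and is widely regarded as the start of the deep learning era.

Network structure

AlexNet has 8 learned layers (5 convolutional and 3 fully connected):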

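```
Input: 227×227×3 (the paper states 224×224, but a 55×55 Conv1 output without padding requires 227×227)
↓
Conv1: 96 kernels of 11×11, stride 4, ReLU → 55×55×96
↓
MaxPool: 3×3, stride 2 → 27×27×96
↓
Conv2: 256 kernels of 5×5, padding 2, ReLU → 27×27×256
↓
MaxPool: 3×3, stride 2 → 13×13×256
↓
Conv3: 384 kernels of 3×3, padding 1, ReLU → 13×13×384
↓
Conv4: 384 kernels of 3×3, padding 1, ReLU → 13×13×384
↓
Conv5: 256 kernels of 3×3, padding 1, ReLU → 13×13×256
↓
MaxPool: 3×3, stride 2 → 6×6×256
↓
FC6: fully connected, 4096 neurons, ReLU
↓
Dropout (p=0.5)
↓
FC7: fully connected, 4096 neurons, ReLU
↓
Dropout (p=0.5)
↓
FC8: fully connected, 1000 neurons (one per class)
```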
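The 55×55 size after Conv1 is worth checking by hand, because an 11×11 kernel with stride 4 and no padding does not tile a 224×224 input exactly, whereas it does tile 227×227. The sketch below applies the standard output-size formula; the helper `out_size` and the padding values for Conv2 to Conv5 are assumptions chosen so that the sizes in the listing come out as stated.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Return (output size, whether the kernel tiles the input exactly)."""
    span = n + 2 * padding - kernel
    return span // stride + 1, span % stride == 0


# Conv1: 11×11 kernel, stride 4, no padding
for n in (224, 227):
    size, exact = out_size(n, kernel=11, stride=4)
    print(f"input {n}: Conv1 output {size}, exact fit: {exact}")

# Remaining spatial layers: (name, kernel, stride, padding)
layers = [
    ("MaxPool1", 3, 2, 0), ("Conv2", 5, 1, 2), ("MaxPool2", 3, 2, 0),
    ("Conv3", 3, 1, 1), ("Conv4", 3, 1, 1), ("Conv5", 3, 1, 1),
    ("MaxPool3", 3, 2, 0),
]
size = 55
sizes = {}
for name, kernel, stride, padding in layers:
    size, _ = out_size(size, kernel, stride, padding)
    sizes[name] = size
    print(f"{name}: {size}×{size}")
```

The final 6×6×256 feature map flattens to 9216 values, which is the input width of FC6.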


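Key ideas in AlexNet

Depth: one of the first deep CNNs (8 layers) trained successfully on a large-scale dataset
ReLU activation: replaces Sigmoid/Tanh, mitigates vanishing gradients, and speeds up training
Dropout: randomly drops neurons during training to reduce overfitting
Data augmentation: flips, random crops, and similar transforms enlarge the training set
GPU acceleration: training ran in parallel on two GPUs
Local response normalization (LRN): normalizes each activation using neighboring channels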
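To make the Dropout item concrete, here is a minimal NumPy sketch. It uses the "inverted" variant common in current frameworks, which rescales the surviving activations during training; the original AlexNet instead scaled the outputs at test time. The function name `dropout` and its arguments are illustrative.

```python
import numpy as np


def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each element with probability p while training."""
    if not training or p == 0.0:
        return x  # inference: activations pass through unchanged
    rng = np.random.default_rng() if rng is None else rng
    keep_mask = rng.random(x.shape) >= p
    # Divide by the keep probability so the expected activation is unchanged
    return x * keep_mask / (1.0 - p)


rng = np.random.default_rng(0)
activations = np.ones((4, 4096))
train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False)

print(f"fraction zeroed in training: {(train_out == 0).mean():.3f}")
print(f"mean activation in training: {train_out.mean():.3f}")
print(f"mean activation at inference: {eval_out.mean():.3f}")
```

With p = 0.5, roughly half of the activations are zeroed and the rest are doubled, so the mean stays close to its original value.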

AlexNet parameter count

```python
def calculate_alexnet_params():
    """Count the parameters of the original two-GPU AlexNet.

    Conv2, Conv4 and Conv5 see only half of the previous layer's channels
    (48 or 192) because the original model split them across two GPUs.
    """

    params = {}

    # Conv1: 96 kernels of 11×11×3 + 96 biases
    params['Conv1'] = 96 * 11 * 11 * 3 + 96

    # Conv2: 256 kernels of 5×5×48 + 256 biases
    params['Conv2'] = 256 * 5 * 5 * 48 + 256

    # Conv3: 384 kernels of 3×3×256 + 384 biases (connected to both GPUs)
    params['Conv3'] = 384 * 3 * 3 * 256 + 384

    # Conv4: 384 kernels of 3×3×192 + 384 biases
    params['Conv4'] = 384 * 3 * 3 * 192 + 384

    # Conv5: 256 kernels of 3×3×192 + 256 biases
    params['Conv5'] = 256 * 3 * 3 * 192 + 256

    # FC6: 9216×4096 weights + 4096 biases (9216 = 6×6×256)
    params['FC6'] = 9216 * 4096 + 4096

    # FC7: 4096×4096 weights + 4096 biases
    params['FC7'] = 4096 * 4096 + 4096

    # FC8: 4096×1000 weights + 1000 biases
    params['FC8'] = 4096 * 1000 + 1000

    total = sum(params.values())

    print("AlexNet parameters per layer:")
    for layer, count in params.items():
        print(f"  {layer}: {count:,} ({count/total*100:.1f}%)")
    print(f"\nTotal parameters: {total:,}")

    return params

calculate_alexnet_params()
```

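VGG: Depth and Simplicity

VGG was proposed in 2014 by the Visual Geometry Group at the University of Oxford and is known for its simple, uniform architecture and strong performance.

VGG16 network structure

VGG16 has 13 convolutional layers and 3 fully connected layers. Every convolution is 3×3 with stride 1 and padding 1, and every max-pooling layer is 2×2 with stride 2: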

```
Input: 224×224×3
↓
Conv1-1: 64 kernels of 3×3, ReLU → 224×224×64
↓
Conv1-2: 64 kernels of 3×3, ReLU → 224×224×64
↓
MaxPool: 2×2 → 112×112×64
↓
Conv2-1: 128 kernels of 3×3, ReLU → 112×112×128
↓
Conv2-2: 128 kernels of 3×3, ReLU → 112×112×128
↓
MaxPool: 2×2 → 56×56×128
↓
Conv3-1: 256 kernels of 3×3, ReLU → 56×56×256
↓
Conv3-2: 256 kernels of 3×3, ReLU → 56×56×256
↓
Conv3-3: 256 kernels of 3×3, ReLU → 56×56×256
↓
MaxPool: 2×2 → 28×28×256
↓
Conv4-1: 512 kernels of 3×3, ReLU → 28×28×512
↓
Conv4-2: 512 kernels of 3×3, ReLU → 28×28×512
↓
Conv4-3: 512 kernels of 3×3, ReLU → 28×28×512
↓
MaxPool: 2×2 → 14×14×512
↓
Conv5-1: 512 kernels of 3×3, ReLU → 14×14×512
↓
Conv5-2: 512 kernels of 3×3, ReLU → 14×14×512
↓
Conv5-3: 512 kernels of 3×3, ReLU → 14×14×512
↓
MaxPool: 2×2 → 7×7×512
↓
FC6: fully connected, 4096
↓
FC7: fully connected, 4096
↓
FC8: fully connected, 1000
```


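VGG design principles

3×3 kernels throughout: stacks of small kernels replace a single large kernel
- two stacked 3×3 convolutions have the same receptive field as one 5×5 convolution
- three stacked 3×3 convolutions have the same receptive field as one 7×7 convolution
- fewer parameters and more non-linearities
Deep networks: VGG16 has 16 weight layers and VGG19 has 19
Fixed pooling positions: a max-pooling layer follows each group of convolutions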

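```python
def compare_kernel_sizes():
    """Compare one 7×7 kernel with three stacked 3×3 kernels.

    Weight counts are per input/output channel pair; biases are ignored.
    """

    # One 7×7 kernel
    params_7x7 = 7 * 7
    receptive_field_7x7 = 7

    # Three stacked 3×3 kernels
    params_3x3 = 3 * (3 * 3)
    receptive_field_3x3 = 7  # 3 + 2 + 2

    print("Comparison at a 7×7 receptive field:")
    print(f"  One 7×7 kernel:")
    print(f"    Weights: {params_7x7}")
    print(f"    Receptive field: {receptive_field_7x7}")
    print(f"  Three 3×3 kernels:")
    print(f"    Weights: {params_3x3}")
    print(f"    Receptive field: {receptive_field_3x3}")
    print(f"  The 7×7 kernel needs {params_7x7/params_3x3:.2f}× as many weights")

compare_kernel_sizes()
```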
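The comparison above hard-codes the receptive field. As a sketch of where the numbers come from, the recurrence below grows the receptive field by (kernel − 1) × jump at each layer, where jump is the product of the preceding strides. The function names and the channel count C = 256 are illustrative assumptions.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride) pairs, ordered from input to output."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k-1) steps
        jump *= stride             # later layers step further apart in the input
    return rf


def conv_weights(kernel, channels, n_layers=1):
    """Weights (biases ignored) of n_layers convs with equal in/out channels."""
    return n_layers * kernel * kernel * channels * channels


C = 256  # assumed channel count, as in block 3 of VGG16
rf_stack = receptive_field([(3, 1)] * 3)
rf_single = receptive_field([(7, 1)])
w_stack = conv_weights(3, C, n_layers=3)
w_single = conv_weights(7, C)

print(f"three 3×3: receptive field {rf_stack}, weights {w_stack:,}")
print(f"one 7×7:   receptive field {rf_single}, weights {w_single:,}")
```

With C input and output channels the counts become 27·C² versus 49·C², so the ratio between the two options stays the same at any width.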

VGG parameter count

```python
def calculate_vgg_params():
    """Count the parameters of VGG16 (weights + biases per layer)."""

    params = {}

    # Block 1
    params['Conv1-1'] = 64 * 3 * 3 * 3 + 64
    params['Conv1-2'] = 64 * 3 * 3 * 64 + 64

    # Block 2
    params['Conv2-1'] = 128 * 3 * 3 * 64 + 128
    params['Conv2-2'] = 128 * 3 * 3 * 128 + 128

    # Block 3
    params['Conv3-1'] = 256 * 3 * 3 * 128 + 256
    params['Conv3-2'] = 256 * 3 * 3 * 256 + 256
    params['Conv3-3'] = 256 * 3 * 3 * 256 + 256

    # Block 4
    params['Conv4-1'] = 512 * 3 * 3 * 256 + 512
    params['Conv4-2'] = 512 * 3 * 3 * 512 + 512
    params['Conv4-3'] = 512 * 3 * 3 * 512 + 512

    # Block 5
    params['Conv5-1'] = 512 * 3 * 3 * 512 + 512
    params['Conv5-2'] = 512 * 3 * 3 * 512 + 512
    params['Conv5-3'] = 512 * 3 * 3 * 512 + 512

    # Fully connected layers (FC6 takes the flattened 7×7×512 feature map)
    params['FC6'] = 512 * 7 * 7 * 4096 + 4096
    params['FC7'] = 4096 * 4096 + 4096
    params['FC8'] = 4096 * 1000 + 1000

    total = sum(params.values())

    print("VGG16 parameters per layer:")
    for layer, count in params.items():
        print(f"  {layer}: {count:,} ({count/total*100:.1f}%)")
    print(f"\nTotal parameters: {total:,}")

    return params

calculate_vgg_params()
```

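GoogLeNet (the Inception Module)

GoogLeNet, proposed by Google in 2014, introduced the Inception module, which improved accuracy while using far fewer parameters than AlexNet or VGG.

The Inception module

The core idea of the Inception module is to apply kernels of different sizes in parallel to the same input and then concatenate the results along the channel dimension. It has four branches:

Branch 1: 1×1 Conv
Branch 2: 1×1 Conv → 3×3 Conv
Branch 3: 1×1 Conv → 5×5 Conv
Branch 4: 3×3 MaxPool → 1×1 Conv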

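Advantages of the Inception module

Multi-scale feature extraction: kernels of different sizes capture features at different scales
Parameter efficiency: 1×1 convolutions reduce the channel count before the expensive 3×3 and 5×5 convolutions
Greater width: the network becomes wider, which increases feature diversity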
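The parameter-efficiency point can be checked with a little arithmetic. The sketch below compares a direct 5×5 convolution with a 1×1 reduction followed by a 5×5 convolution. The channel counts (192 → 16 → 32) are example values, and the helper `conv_weights` is hypothetical.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weight count of a k×k convolution, biases ignored."""
    return kernel * kernel * in_channels * out_channels


in_ch, reduce_ch, out_ch = 192, 16, 32  # assumed example sizes

# Option A: 5×5 convolution applied directly to all 192 input channels
direct = conv_weights(5, in_ch, out_ch)

# Option B: 1×1 convolution down to 16 channels, then the 5×5 convolution
bottleneck = conv_weights(1, in_ch, reduce_ch) + conv_weights(5, reduce_ch, out_ch)

print(f"direct 5×5:       {direct:,} weights")
print(f"1×1 then 5×5:     {bottleneck:,} weights")
print(f"reduction factor: {direct / bottleneck:.1f}×")
```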

Implementing a simplified Inception module

```python
import numpy as np


class InceptionModule:
    """Simplified Inception module (shape-only placeholder, NCHW layout)."""

    def __init__(self, in_channels, out_channels_list):
        """
        out_channels_list: [branch1, branch2_reduce, branch2_out,
                            branch3_reduce, branch3_out, branch4]
        """
        self.in_channels = in_channels
        self.branch1_out = out_channels_list[0]
        self.branch2_reduce = out_channels_list[1]
        self.branch2_out = out_channels_list[2]
        self.branch3_reduce = out_channels_list[3]
        self.branch3_out = out_channels_list[4]
        self.branch4_out = out_channels_list[5]

    def forward(self, x):
        """Forward pass for x of shape (N, C, H, W)."""
        # Branch 1: 1×1 convolution
        branch1 = self.conv1x1(x, self.branch1_out)

        # Branch 2: 1×1 convolution → 3×3 convolution
        branch2_reduce = self.conv1x1(x, self.branch2_reduce)
        branch2 = self.conv3x3(branch2_reduce, self.branch2_out)

        # Branch 3: 1×1 convolution → 5×5 convolution
        branch3_reduce = self.conv1x1(x, self.branch3_reduce)
        branch3 = self.conv5x5(branch3_reduce, self.branch3_out)

        # Branch 4: 3×3 max pooling → 1×1 convolution
        branch4_pool = self.maxpool3x3(x)
        branch4 = self.conv1x1(branch4_pool, self.branch4_out)

        # Concatenate all branches along the channel axis
        output = np.concatenate([branch1, branch2, branch3, branch4], axis=1)

        return output

    # The helpers below are placeholders: they return random values with the
    # shape a padded, stride-1 layer would produce, (N, out_channels, H, W).

    def conv1x1(self, x, out_channels):
        """1×1 convolution (placeholder)."""
        return np.random.randn(x.shape[0], out_channels, x.shape[2], x.shape[3])

    def conv3x3(self, x, out_channels):
        """3×3 convolution with padding 1 (placeholder)."""
        return np.random.randn(x.shape[0], out_channels, x.shape[2], x.shape[3])

    def conv5x5(self, x, out_channels):
        """5×5 convolution with padding 2 (placeholder)."""
        return np.random.randn(x.shape[0], out_channels, x.shape[2], x.shape[3])

    def maxpool3x3(self, x):
        """3×3 max pooling with stride 1 and padding 1 (placeholder)."""
        return np.random.randn(*x.shape)

# Test the Inception module
inception = InceptionModule(192, [64, 96, 128, 16, 32, 32])
input_data = np.random.randn(1, 192, 28, 28)
output = inception.forward(input_data)

print(f"Inception module input shape: {input_data.shape}")
print(f"Inception module output shape: {output.shape}")
print(f"Output channels: {output.shape[1]}")  # 64 + 128 + 32 + 32
```

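ResNet: Solving the Deep-Network Training Problem

ResNet (Residual Network), proposed by Microsoft Research in 2015, uses residual connections to make very deep networks trainable.

Residual connections

The core innovation in ResNet is the residual (skip) connection, which lets gradients flow directly across several layers.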

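Plain block: y = F(x, {Wi})

Residual block: y = F(x, {Wi}) + x

[Figure: residual block. The input x passes through the weight layers F, the skip connection adds x to F(x), and a ReLU produces the output.]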

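Why do residual connections work?

Gradient flow: the skip connection gives gradients a direct path to earlier layers
Identity mapping: when the identity is the best mapping, the block only has to drive F(x) toward zero, which is easier than fitting an identity with stacked non-linear layers
Depth: very deep networks become trainable (for example ResNet-152)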
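The gradient-flow argument can be illustrated with a toy scalar model. For a plain layer y = f(x) the local derivative is f'(x), while for a residual layer y = f(x) + x it is f'(x) + 1, so the product over many layers no longer collapses toward zero. The per-layer derivative of 0.1 and the depth of 20 below are arbitrary assumptions, and real networks involve Jacobian matrices rather than scalars.

```python
import numpy as np


def chain_gradient(local_grads, residual):
    """Gradient of the output w.r.t. the input for a chain of scalar layers.

    Plain layer:    y = f(x)      → dy/dx = f'(x)
    Residual layer: y = f(x) + x  → dy/dx = f'(x) + 1
    """
    factors = np.asarray(local_grads, dtype=float)
    if residual:
        factors = factors + 1.0
    return float(np.prod(factors))


depth = 20
local_grads = np.full(depth, 0.1)  # assumed small derivative f'(x) in each layer

plain = chain_gradient(local_grads, residual=False)
resid = chain_gradient(local_grads, residual=True)

print(f"plain chain, {depth} layers:    gradient = {plain:.3e}")
print(f"residual chain, {depth} layers: gradient = {resid:.3e}")
```

In this toy setting the plain chain's gradient vanishes, while the residual chain's gradient stays of order one.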

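```python
import numpy as np


class ResidualBlock:
    """Basic residual block (shape-only placeholder, NCHW layout, no batch norm)."""

    def __init__(self, in_channels, out_channels, stride=1):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.stride = stride

    def forward(self, x):
        """Forward pass for x of shape (N, C, H, W)."""
        identity = x

        # Main path: conv → ReLU → conv
        out = self.conv3x3(x, self.out_channels, stride=self.stride)
        out = self.relu(out)
        out = self.conv3x3(out, self.out_channels)

        # Shortcut: project with a 1×1 convolution when the shape changes
        if self.in_channels != self.out_channels or self.stride != 1:
            identity = self.conv1x1(x, self.out_channels, stride=self.stride)

        out = out + identity
        out = self.relu(out)

        return out

    @staticmethod
    def _strided(size, stride):
        """Output size of a size-preserving (padded) layer with this stride."""
        return (size + stride - 1) // stride

    def conv3x3(self, x, out_channels, stride=1):
        """3×3 convolution with padding 1 (placeholder: random values)."""
        h, w = self._strided(x.shape[2], stride), self._strided(x.shape[3], stride)
        return np.random.randn(x.shape[0], out_channels, h, w)

    def conv1x1(self, x, out_channels, stride=1):
        """1×1 convolution (placeholder: random values)."""
        h, w = self._strided(x.shape[2], stride), self._strided(x.shape[3], stride)
        return np.random.randn(x.shape[0], out_channels, h, w)

    def relu(self, x):
        """ReLU activation."""
        return np.maximum(0, x)

# Test an identity-shortcut block and a downsampling block
res_block = ResidualBlock(64, 64)
input_data = np.random.randn(1, 64, 32, 32)
output = res_block.forward(input_data)

print(f"Residual block input shape: {input_data.shape}")
print(f"Residual block output shape: {output.shape}")

down_block = ResidualBlock(64, 128, stride=2)
print(f"Downsampling block output shape: {down_block.forward(input_data).shape}")
```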

Comparing ResNet variants

```python
def compare_resnet_architectures():
    """Compare the standard ResNet variants."""

    # name: (block type, residual blocks per stage, approx. parameters in millions)
    configs = {
        'ResNet-18': ('basic', [2, 2, 2, 2], 11.7),
        'ResNet-34': ('basic', [3, 4, 6, 3], 21.8),
        'ResNet-50': ('bottleneck', [3, 4, 6, 3], 25.6),
        'ResNet-101': ('bottleneck', [3, 4, 23, 3], 44.5),
        'ResNet-152': ('bottleneck', [3, 8, 36, 3], 60.2),
    }

    print("ResNet architecture comparison:")
    print(f"{'Model':<12} {'Layers':<8} {'Params (M)':<12}")
    print("-" * 35)

    for name, (block_type, blocks, params_m) in configs.items():
        # Basic blocks have 2 conv layers each, bottleneck blocks have 3
        layers_per_block = 3 if block_type == 'bottleneck' else 2
        # +2 for the initial 7×7 convolution and the final fully connected layer
        layers = 2 + layers_per_block * sum(blocks)

        print(f"{name:<12} {layers:<8} {params_m:<12}")

compare_resnet_architectures()
```

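Summary of Architectural Evolution

1998 LeNet-5: 7 layers, MNIST
2012 AlexNet: 8 layers, ImageNet
2014 VGG: 16/19 layers, uniform 3×3 kernels
2014 GoogLeNet: 22 layers, Inception
2015 ResNet: up to 152 layers, residual connections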

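Key innovations of each architecture

Architecture	Year	Key innovation	Notes
LeNet-5	1998	Convolutional layers	The pioneering CNN
AlexNet	2012	ReLU, Dropout	The deep learning revolution
VGG	2014	3×3 kernels	Simple and effective
GoogLeNet	2014	Inception module	Multi-scale features
ResNet	2015	Residual connections	Makes very deep networks trainable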

Practical Recommendations

Choosing an architecture

```python
def choose_architecture():
    """Print rough rules of thumb for choosing an architecture."""
    print("Architecture selection guidelines:")
    print("\n1. By task complexity:")
    print("   - Simple tasks (e.g. MNIST): LeNet-5, VGG11")
    print("   - Medium tasks (CIFAR-10): VGG16, ResNet18")
    print("   - Complex tasks (ImageNet): ResNet50, EfficientNet")

    print("\n2. By dataset size:")
    print("   - Small (<1,000 samples): shallow network + transfer learning")
    print("   - Medium (1,000-10,000 samples): VGG16, ResNet18")
    print("   - Large (>10,000 samples): ResNet50 or deeper, pretrained models")

    print("\n3. By compute budget:")
    print("   - CPU / low-end GPU: MobileNet, ShuffleNet")
    print("   - Single GPU: ResNet18, ResNet34")
    print("   - Multiple GPUs: ResNet50 or deeper, EfficientNet-B4 or larger")

choose_architecture()
```

Transfer learning

Starting from a pretrained model is usually the best default for a new task:

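```python
def transfer_learning_workflow():
    """Print the typical transfer learning workflow."""
    print("Transfer learning steps:")
    print("1. Choose a pretrained model (e.g. ResNet50 pretrained on ImageNet)")
    print("2. Replace the final fully connected layer to match the new task")
    print("3. Freeze most layers and train only the last few")
    print("4. Fine-tune with a small learning rate")
    print("5. Optional: unfreeze more layers for end-to-end fine-tuning")

transfer_learning_workflow()
```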
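The "freeze the backbone, train a new head" idea can be sketched without a deep learning framework. In the toy example below a fixed random ReLU projection stands in for the pretrained backbone, and only a small logistic-regression head is trained on its features. The data, sizes, learning rate, and all names are assumptions for illustration; in practice the backbone would be a real pretrained network such as ResNet50.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a fixed random ReLU feature map.
# (Assumption for illustration only; it is not actually pretrained.)
W_backbone = rng.normal(size=(20, 64)) / np.sqrt(20)
W_before = W_backbone.copy()


def backbone(x):
    """Frozen feature extractor: its weights are never updated below."""
    return np.maximum(0.0, x @ W_backbone)


# Toy binary task for the new head
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

features = backbone(X)  # computed once, since the backbone is frozen
w_head = np.zeros(features.shape[1])
b_head = 0.0
lr = 0.1  # illustrative learning rate


def loss_and_grads(w, b):
    """Binary cross-entropy of a logistic head, with gradients for w and b."""
    probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    eps = 1e-12
    loss = -np.mean(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps))
    err = probs - y
    return loss, features.T @ err / len(y), err.mean()


initial_loss, _, _ = loss_and_grads(w_head, b_head)
for _ in range(300):
    _, grad_w, grad_b = loss_and_grads(w_head, b_head)
    w_head -= lr * grad_w  # only the head parameters change
    b_head -= lr * grad_b
final_loss, _, _ = loss_and_grads(w_head, b_head)

print(f"loss before training the head: {initial_loss:.3f}")
print(f"loss after training the head:  {final_loss:.3f}")
```

Because the backbone is frozen, its features can be computed once and cached, which is one reason this stage of transfer learning is cheap.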

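Summary

This article introduced several classic CNN architectures:

LeNet-5:

The pioneering CNN
Established the basic structure of CNNs
Applied to handwritten digit recognition

AlexNet:

The start of the deep learning revolution
Introduced ReLU and Dropout
Used GPUs to accelerate training

VGG:

Uses 3×3 kernels throughout
Simple architecture that is easy to understand
Many parameters (about 138 million for VGG16), but strong performance

GoogLeNet:

Introduced the Inception module
Multi-scale feature extraction
High parameter efficiency

ResNet:

Residual connections make very deep networks trainable
Can train extremely deep networks (152 layers)
Residual connections underpin many modern architectures

These architectures delivered breakthrough results when they appeared, and their design ideas still shape deep learning today. Understanding them helps us design better networks and lays the groundwork for studying more advanced models.

In practice we rarely train these networks from scratch. Instead we start from pretrained models and apply transfer learning, which greatly reduces training time and tends to give better results when data is limited.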