Convolutional Neural Network (CNN) Architecture Explained

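Contents

Why do we need CNNs?
- The problem with fully connected networks on images
- The core ideas behind CNNs
Convolutional layers: the core of feature extraction
Pooling layers: downsampling and feature selection
- Max pooling
- Average pooling
- Global average pooling
Fully connected layers: the final decision
A complete CNN layer
- Building a simple CNN
What do convolution kernels learn?
- Edge-detection kernels
- Visualizing trained kernels
Typical CNN architecture
Summary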

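Convolutional neural networks are among the most successful deep learning models in computer vision. Compared with fully connected networks, CNNs use convolution and pooling to process image data efficiently and extract spatial features. This article explains the basic building blocks of a CNN: how convolutional, pooling, and fully connected layers work.

Why do we need CNNs?

Before introducing CNNs, consider one question: why are ordinary neural networks poorly suited to images?

The problem with fully connected networks on images

Suppose we want to process a 28×28 grayscale image, so the input is 784 pixel values. With a fully connected network:

- Input layer: 784 neurons
- Hidden layer 1: 1,000 neurons
- Weights: 784 × 1,000 = 784,000 parameters

If the image grows to 224×224 (the spatial size of a VGG input; a single channel is assumed here), then:

- Input layer: 50,176 neurons
- Hidden layer 1: 4,096 neurons
- Weights: 50,176 × 4,096 ≈ 205.5 million parameters

This causes two serious problems:

- Parameter explosion: the computation is expensive and the model overfits easily
- Loss of spatial information: flattening the image for a fully connected layer discards its 2D structure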

# Compare the parameter counts of a fully connected layer and a convolutional layer
def compare_params():
    """Compare parameter counts (weights only, biases ignored)."""

    # Fully connected layer
    fc_input_size = 224 * 224 * 3  # RGB image
    fc_hidden_size = 4096
    fc_params = fc_input_size * fc_hidden_size

    # Convolutional layer (assuming 5×5 kernels and 32 output channels)
    cnn_kernel_size = 5 * 5 * 3  # 5×5×3
    cnn_num_kernels = 32
    cnn_params = cnn_kernel_size * cnn_num_kernels

    print("Parameter counts:")
    print(f"Fully connected: {fc_params:,} parameters")
    print(f"CNN: {cnn_params:,} parameters")
    print(f"Reduction factor: {fc_params / cnn_params:.1f}x")

compare_params()

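The core ideas behind CNNs

CNNs rest on two key insights:

- Local connectivity: each neuron connects only to a small region of the input
- Weight sharing: the same kernel is slid across the entire image

Together, these two design choices let a CNN:

- Greatly reduce the number of parameters
- Preserve the spatial structure of the image
- Extract local features such as edges and textures

Image → Convolutional layer (local connectivity + weight sharing) → Pooling layer (downsampling + feature selection) → Fully connected layer (classification/regression) → Output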

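Convolutional layers: the core of feature extraction

The convolutional layer is the central component of a CNN; it extracts features from its input by convolution.

How convolution works

A convolution slides a small matrix (the kernel, or filter) over the input image and, at each position, multiplies the overlapping elements pairwise and sums the products. (Strictly speaking this is cross-correlation, because the kernel is not flipped; deep learning libraries follow the same convention.)

For a single-channel image:

Input (5×5):            Kernel (3×3):         Output (3×3):
[1 1 1 0 0]           [1 0 1]              [4 3 4]
[0 1 1 1 0]           [0 1 0]              [2 4 3]
[0 0 1 1 1]           [1 0 1]              [2 3 4]
[0 0 1 1 0]
[0 1 1 0 0]

Computation for the top-left output (position (1,1) counting from 1, i.e. output[0, 0] in the code below):
1×1 + 1×0 + 1×1 + 0×0 + 1×1 + 1×0 + 0×1 + 0×0 + 1×1 = 4

import numpy as np

def manual_conv2d(input_matrix, kernel):
    """
    Naive 2D convolution (no padding, stride 1)
    input_matrix: input matrix of shape (H, W)
    kernel: kernel of shape (KH, KW)
    """
    H, W = input_matrix.shape
    KH, KW = kernel.shape

    # Output size
    OH = H - KH + 1
    OW = W - KW + 1
    output = np.zeros((OH, OW))

    # Slide the kernel over the input
    for i in range(OH):
        for j in range(OW):
            output[i, j] = np.sum(input_matrix[i:i+KH, j:j+KW] * kernel)

    return output

# Example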
input_img = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0]
])

kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1]
])

output = manual_conv2d(input_img, kernel)

print("Input image:")
print(input_img)
print("\nKernel:")
print(kernel)
print("\nConvolution output:")
print(output)
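
The double loop above is easy to follow but slow in pure Python. As a hedged sketch (not part of the original implementation), the same result can be obtained without explicit loops using NumPy's `sliding_window_view`, which is available in NumPy 1.20 and later. The names `conv2d_vectorized`, `example_input`, and `example_kernel` are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_vectorized(input_matrix, kernel):
    """Valid, stride-1 2D convolution without explicit Python loops."""
    # windows has shape (OH, OW, KH, KW): one KH x KW patch per output position
    windows = sliding_window_view(input_matrix, kernel.shape)
    # Multiply every patch by the kernel and sum over the two patch axes
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Same 5x5 input and 3x3 kernel as the worked example
example_input = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
example_kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

example_output = conv2d_vectorized(example_input, example_kernel)
print(example_output)
```

The result should match the 3×3 output table in the worked example.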

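Multi-channel convolution

Real images usually have three RGB channels, so the kernel must span all input channels.

For a multi-channel input and a single kernel:

- Input shape: (C, H, W), where C is the number of channels
- Kernel shape: (C, KH, KW)
- Output shape: (OH, OW)

With multiple kernels, each kernel produces one output channel:

- Kernel shape: (N, C, KH, KW), where N is the number of kernels
- Output shape: (N, OH, OW)

Input image (3×H×W) → kernels 1, 2, 3 (each 3×3×3) → output channels 1, 2, 3 → output feature map (3×H'×W')

def conv2d_multi_channel(input_matrix, kernels):
    """
    Multi-channel convolution
    input_matrix: input of shape (C, H, W)
    kernels: kernels of shape (N, C, KH, KW)
    """
    C, H, W = input_matrix.shape
    N, _, KH, KW = kernels.shape

    # Output size
    OH = H - KH + 1
    OW = W - KW + 1
    output = np.zeros((N, OH, OW))

    # For each kernel
    for n in range(N):
        for i in range(OH):
            for j in range(OW):
                # Sum over all channels and kernel positions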
                output[n, i, j] = np.sum(
                    input_matrix[:, i:i+KH, j:j+KW] * kernels[n]
                )
    
    return output

# Example: convolving an RGB image
input_rgb = np.random.randn(3, 6, 6)  # 3 channels, 6×6
kernels = np.random.randn(4, 3, 3, 3)  # 4 kernels, each 3×3×3
output = conv2d_multi_channel(input_rgb, kernels)

print(f"Input shape: {input_rgb.shape}")
print(f"Kernel shape: {kernels.shape}")
print(f"Output shape: {output.shape}")
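
To make "summing over all channels" concrete, the sketch below checks that one multi-channel kernel gives the same result as convolving each channel with its own 2D slice of the kernel and adding the per-channel outputs. The helper `conv2d_single` and the variable names are illustrative, and the random data is only a stand-in for a real image.

```python
import numpy as np

def conv2d_single(x, k):
    """Valid, stride-1 convolution of one 2D channel with one 2D kernel slice."""
    H, W = x.shape
    KH, KW = k.shape
    out = np.zeros((H - KH + 1, W - KW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+KH, j:j+KW] * k)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 6, 6))       # (C, H, W)
one_kernel = rng.standard_normal((3, 3, 3))  # (C, KH, KW)

# Convolve each channel with its own kernel slice, then add the results
per_channel_sum = sum(conv2d_single(image[c], one_kernel[c]) for c in range(3))

# Direct computation: sum over channels and kernel positions in one step
direct = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        direct[i, j] = np.sum(image[:, i:i+3, j:j+3] * one_kernel)

channels_match = np.allclose(per_channel_sum, direct)
print("per-channel sum equals direct result:", channels_match)
```

This is why a single kernel produces a single 2D output map no matter how many input channels there are.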

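Stride

The stride is the number of pixels the kernel moves between successive positions. The larger the stride, the smaller the output.

def conv2d_stride(input_matrix, kernel, stride=1):
    """
    Convolution with a stride
    stride: step between successive kernel positions
    """
    H, W = input_matrix.shape
    KH, KW = kernel.shape

    # Output size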
    OH = (H - KH) // stride + 1
    OW = (W - KW) // stride + 1
    output = np.zeros((OH, OW))
    
    for i in range(OH):
        for j in range(OW):
            output[i, j] = np.sum(
                input_matrix[i*stride:i*stride+KH, j*stride:j*stride+KW] * kernel
            )
    
    return output

# Compare the effect of different strides
input_img = np.random.randn(10, 10)
kernel = np.random.randn(3, 3)

for stride in [1, 2, 3]:
    output = conv2d_stride(input_img, kernel, stride)
    print(f"stride={stride}: input {input_img.shape} -> output {output.shape}")
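
One way to build intuition for stride: a convolution with stride S visits every S-th position of the stride-1 convolution, so its output should equal the stride-1 output subsampled by S in both directions. The sketch below checks this numerically; `conv2d_strided` is an illustrative re-implementation so that the block runs on its own.

```python
import numpy as np

def conv2d_strided(x, k, stride=1):
    """Naive valid 2D convolution with a stride."""
    H, W = x.shape
    KH, KW = k.shape
    OH = (H - KH) // stride + 1
    OW = (W - KW) // stride + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(x[r:r+KH, c:c+KW] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10))
k = rng.standard_normal((3, 3))

dense = conv2d_strided(x, k, stride=1)  # shape (8, 8)

# For each stride, compare against the subsampled stride-1 output
subsample_matches = {
    s: np.allclose(conv2d_strided(x, k, stride=s), dense[::s, ::s])
    for s in (2, 3)
}
print(subsample_matches)
```

In practice the strided version is preferred because it never computes the positions that would be thrown away.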

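Padding

Padding adds zeros around the border of the input, which gives control over the output size. For a 5×5 input and a 3×3 kernel with stride 1:

- No padding (valid padding): 5×5 → 3×3
- One ring of padding (same padding): 5×5 → 5×5
- Two rings of padding (full padding): 5×5 → 7×7

def conv2d_padding(input_matrix, kernel, padding=0):
    """
    Convolution with zero padding
    padding: number of rings of zeros added around the input
    """
    H, W = input_matrix.shape
    KH, KW = kernel.shape

    # Add padding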
    if padding > 0:
        padded = np.pad(input_matrix, padding, mode='constant')
    else:
        padded = input_matrix
    
    PH, PW = padded.shape
    OH = PH - KH + 1
    OW = PW - KW + 1
    output = np.zeros((OH, OW))
    
    for i in range(OH):
        for j in range(OW):
            output[i, j] = np.sum(padded[i:i+KH, j:j+KW] * kernel)
    
    return output

# Example
input_img = np.random.randn(5, 5)
kernel = np.random.randn(3, 3)

for padding in [0, 1, 2]:
    output = conv2d_padding(input_img, kernel, padding)
    print(f"padding={padding}: input {input_img.shape} -> output {output.shape}")
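
For an odd kernel size K and stride 1, the amount of padding that keeps the output the same size as the input is P = (K - 1) / 2. The sketch below computes that value and confirms it on a dummy 1D signal; `same_padding` and `padded_output_size` are hypothetical helper names, and even kernel sizes (which need asymmetric padding) are deliberately not handled.

```python
import numpy as np

def same_padding(kernel_size):
    """Padding per side that preserves size for an odd kernel at stride 1."""
    if kernel_size % 2 == 0:
        raise ValueError("this sketch only handles odd kernel sizes")
    return (kernel_size - 1) // 2

def padded_output_size(input_size, kernel_size, padding):
    # Pad a dummy 1D signal with zeros and count the valid kernel positions
    padded = np.pad(np.zeros(input_size), padding, mode="constant")
    return padded.shape[0] - kernel_size + 1

# Output size for a length-5 input with kernels of size 1, 3, and 5
same_sizes = {k: padded_output_size(5, k, same_padding(k)) for k in (1, 3, 5)}
print(same_sizes)
```

Every entry should equal the input size of 5, matching the "same padding" case listed above for the 3×3 kernel.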

Output size formula

Combining stride and padding, the output size is:

OH = ⌊(H + 2P - KH) / S⌋ + 1

OW = ⌊(W + 2P - KW) / S⌋ + 1

where:

- H, W: height and width of the input
- KH, KW: height and width of the kernel
- P: padding added on each side
- S: stride
- ⌊⌋: floor (round down)

def calculate_output_size(input_size, kernel_size, padding=0, stride=1):
    """
    Compute the output size of a convolution along one dimension
    """
    return (input_size + 2 * padding - kernel_size) // stride + 1

# Example calculations
print("Output size for different configurations:")
for kernel_size in [3, 5, 7]:
    for stride in [1, 2]:
        for padding in [0, 1, 2]:
            output_size = calculate_output_size(28, kernel_size, padding, stride)
            print(f"  input 28, kernel {kernel_size}, stride {stride}, padding {padding} -> output {output_size}")
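
As a sanity check on the formula, the sketch below compares it against the shape actually produced by a naive convolution whose loop bounds come from sliding the kernel until it no longer fits, not from the formula itself. `conv2d_general` and `formula_size` are illustrative names, and the random input carries no meaning beyond its shape.

```python
import numpy as np

def conv2d_general(x, k, stride=1, padding=0):
    """Naive 2D convolution with zero padding and a stride."""
    x = np.pad(x, padding, mode="constant")
    KH, KW = k.shape
    # Every top-left corner at which the kernel still fits inside the input
    rows = range(0, x.shape[0] - KH + 1, stride)
    cols = range(0, x.shape[1] - KW + 1, stride)
    return np.array([[np.sum(x[r:r+KH, c:c+KW] * k) for c in cols] for r in rows])

def formula_size(n, kernel, padding, stride):
    return (n + 2 * padding - kernel) // stride + 1

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28))

mismatches = []
for kernel in (3, 5, 7):
    for stride in (1, 2, 3):
        for padding in (0, 1, 2):
            k = rng.standard_normal((kernel, kernel))
            actual = conv2d_general(x, k, stride, padding).shape
            expected = (formula_size(28, kernel, padding, stride),) * 2
            if actual != expected:
                mismatches.append((kernel, stride, padding, actual, expected))

print("mismatches:", mismatches)
```

An empty list means the formula and the brute-force count agree for all 27 configurations tried.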

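Activation functions

A convolution is usually followed by an activation function; the most common choice is ReLU.

def conv2d_with_activation(input_matrix, kernel, activation='relu'):
    """
    Convolution followed by an activation function
    """
    # Convolution
    output = manual_conv2d(input_matrix, kernel)

    # Activation function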
    if activation == 'relu':
        output = np.maximum(0, output)
    elif activation == 'sigmoid':
        output = 1 / (1 + np.exp(-output))
    elif activation == 'tanh':
        output = np.tanh(output)
    
    return output

# Visualize the output before and after activation
import matplotlib.pyplot as plt

input_img = np.random.randn(10, 10)
kernel = np.array([
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1]
])

output_raw = manual_conv2d(input_img, kernel)
output_relu = conv2d_with_activation(input_img, kernel, activation='relu')

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].imshow(input_img, cmap='gray')
axes[0].set_title('Input')
axes[1].imshow(output_raw, cmap='gray')
axes[1].set_title('Convolution output (before activation)')
axes[2].imshow(output_relu, cmap='gray')
axes[2].set_title('After ReLU')
for ax in axes:
    ax.axis('off')
plt.tight_layout()
plt.show()

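Pooling layers: downsampling and feature selection

A pooling layer shrinks the feature map, which cuts the computation in later layers (and the parameter count of any fully connected layer that follows), while making the features more robust to small translations. Pooling itself has no learnable parameters.

Max pooling

Max pooling takes the maximum value in each pooling window, keeping the strongest response.

def max_pool2d(input_matrix, pool_size=2, stride=2):
    """
    Max pooling
    pool_size: size of the pooling window
    stride: step between successive windows
    """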
    H, W = input_matrix.shape
    OH = (H - pool_size) // stride + 1
    OW = (W - pool_size) // stride + 1
    output = np.zeros((OH, OW))
    
    for i in range(OH):
        for j in range(OW):
            window = input_matrix[i*stride:i*stride+pool_size, 
                               j*stride:j*stride+pool_size]
            output[i, j] = np.max(window)
    
    return output

# Example
input_img = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
])

pooled = max_pool2d(input_img, pool_size=2, stride=2)

print("Input:")
print(input_img)
print("\nMax pooling output (2×2, stride 2):")
print(pooled)
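
When the stride equals the window size, the windows do not overlap, and the pooling can be written as a reshape followed by a reduction. The following is a minimal sketch of that trick under that assumption; `max_pool2d_reshape` and `grid` are illustrative names, and rows or columns that do not fill a complete window are dropped, as in the loop version.

```python
import numpy as np

def max_pool2d_reshape(x, pool_size=2):
    """Non-overlapping max pooling (stride == pool_size) via reshape."""
    H, W = x.shape
    OH, OW = H // pool_size, W // pool_size
    # Drop trailing rows/columns that do not fill a complete window
    x = x[:OH * pool_size, :OW * pool_size]
    # Axes: (window row, row inside window, window column, column inside window)
    blocks = x.reshape(OH, pool_size, OW, pool_size)
    # Reduce over the two "inside window" axes
    return blocks.max(axis=(1, 3))

# Same 4x4 input as the max pooling example
grid = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
])

pooled_fast = max_pool2d_reshape(grid)
print(pooled_fast)
```

Replacing `max` with `mean` in the last line of the function gives average pooling by the same route.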

Average pooling

Average pooling takes the mean of each pooling window.

def avg_pool2d(input_matrix, pool_size=2, stride=2):
    """
    Average pooling
    """
    H, W = input_matrix.shape
    OH = (H - pool_size) // stride + 1
    OW = (W - pool_size) // stride + 1
    output = np.zeros((OH, OW))
    
    for i in range(OH):
        for j in range(OW):
            window = input_matrix[i*stride:i*stride+pool_size, 
                               j*stride:j*stride+pool_size]
            output[i, j] = np.mean(window)
    
    return output

# Compare max pooling and average pooling
input_img = np.random.randn(8, 8)
max_pooled = max_pool2d(input_img, pool_size=2, stride=2)
avg_pooled = avg_pool2d(input_img, pool_size=2, stride=2)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].imshow(input_img, cmap='gray')
axes[0].set_title(f'Input {input_img.shape}')
axes[1].imshow(max_pooled, cmap='gray')
axes[1].set_title(f'Max pooling {max_pooled.shape}')
axes[2].imshow(avg_pooled, cmap='gray')
axes[2].set_title(f'Average pooling {avg_pooled.shape}')
for ax in axes:
    ax.axis('off')
plt.tight_layout()
plt.show()

Global average pooling

Global average pooling averages over the entire feature map, reducing each channel to a single number.

def global_avg_pool2d(input_matrix):
    """
    Global average pooling
    input_matrix: (C, H, W)
    returns: (C,)
    """
    C, H, W = input_matrix.shape
    output = np.mean(input_matrix.reshape(C, -1), axis=1)
    return output

# Example
input_features = np.random.randn(64, 7, 7)  # 64 feature maps of size 7×7
output = global_avg_pool2d(input_features)

print(f"Input shape: {input_features.shape}")
print(f"Shape after global average pooling: {output.shape}")
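
One common reason to use global average pooling is that it shrinks the classifier that follows. The sketch below compares the number of weights in a single linear classifier placed after flattening with one placed after global average pooling, for the 64×7×7 feature maps used above. The class count of 10 and the helper name `head_weight_counts` are assumptions made for illustration.

```python
def head_weight_counts(channels, height, width, num_classes):
    """Weights (biases ignored) of a linear classifier placed after
    flattening versus after global average pooling."""
    after_flatten = channels * height * width * num_classes
    after_gap = channels * num_classes
    return after_flatten, after_gap

# 64 feature maps of size 7x7, 10 output classes (hypothetical)
flatten_weights, gap_weights = head_weight_counts(64, 7, 7, num_classes=10)

print(f"after flatten: {flatten_weights:,} weights")
print(f"after global average pooling: {gap_weights:,} weights")
```

The saving is a factor of H × W, here 49.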

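Fully connected layers: the final decision

After convolution and pooling, the feature maps are flattened into a vector and passed through fully connected layers for the final classification or regression.

Convolution + pooling (C×H×W) → Flatten (vector of length C·H·W) → Fully connected layer 1 (N neurons) → Fully connected layer 2 (M neurons) → Output layer (K classes)

def flatten(input_matrix):
    """
    Flatten operation
    input_matrix: (C, H, W)
    returns: (C*H*W,)
    """
    return input_matrix.flatten()

# Example
input_features = np.random.randn(32, 7, 7)
flattened = flatten(input_features)

print(f"Input shape: {input_features.shape}")
print(f"Shape after flattening: {flattened.shape}")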
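
To show what happens after flattening, here is a minimal sketch of one fully connected layer followed by softmax, which turns the flattened features into class probabilities. The functions `dense` and `softmax`, the class count of 10, and the small random weights (standing in for trained parameters) are all assumptions for illustration.

```python
import numpy as np

def dense(x, weights, bias):
    """Fully connected layer: every output depends on every input."""
    return weights @ x + bias

def softmax(z):
    # Subtracting the maximum improves numerical stability
    # and does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.standard_normal((32, 7, 7)).flatten()  # 1568 values
num_classes = 10                                      # hypothetical

# Small random weights stand in for trained parameters
W = rng.standard_normal((num_classes, features.size)) * 0.01
b = np.zeros(num_classes)

probs = softmax(dense(features, W, b))
print("output shape:", probs.shape)
print("sum of probabilities:", probs.sum())
```

With untrained weights the probabilities are meaningless; the point is only the shapes and the fact that the outputs form a probability distribution.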

A complete CNN layer

Let us implement a complete CNN layer, comprising convolution, activation, and pooling.

class Conv2DLayer:
    """Convolutional layer"""
    
    def __init__(self, in_channels, out_channels, kernel_size, 
                 stride=1, padding=0, activation='relu'):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.activation = activation
        
        # Initialize the kernels (He initialization: std = sqrt(2 / fan_in))
        scale = np.sqrt(2.0 / (in_channels * kernel_size * kernel_size))
        self.weights = np.random.randn(out_channels, in_channels, 
                                       kernel_size, kernel_size) * scale
        self.bias = np.zeros(out_channels)
    
    def forward(self, x):
        """
        Forward pass
        x: (N, C, H, W) or (C, H, W)
        """
        # Add a batch dimension if a single sample was passed
        if len(x.shape) == 3:
            x = x[np.newaxis, :]
        
        N, C, H, W = x.shape
        
        # Zero padding
        if self.padding > 0:
            x = np.pad(x, [(0, 0), (0, 0), (self.padding, self.padding), 
                          (self.padding, self.padding)], mode='constant')
        
        # Output size
        H_padded, W_padded = x.shape[2], x.shape[3]
        OH = (H_padded - self.kernel_size) // self.stride + 1
        OW = (W_padded - self.kernel_size) // self.stride + 1
        
        # Convolution
        output = np.zeros((N, self.out_channels, OH, OW))
        
        for n in range(N):
            for oc in range(self.out_channels):
                for i in range(OH):
                    for j in range(OW):
                        h_start = i * self.stride
                        h_end = h_start + self.kernel_size
                        w_start = j * self.stride
                        w_end = w_start + self.kernel_size
                        
                        output[n, oc, i, j] = np.sum(
                            x[n, :, h_start:h_end, w_start:w_end] * 
                            self.weights[oc]
                        ) + self.bias[oc]
        
        # Activation function
        if self.activation == 'relu':
            output = np.maximum(0, output)
        elif self.activation == 'sigmoid':
            output = 1 / (1 + np.exp(-output))
        elif self.activation == 'tanh':
            output = np.tanh(output)
        
        return output

class MaxPool2D:
    """Max pooling layer"""
    
    def __init__(self, pool_size=2, stride=None):
        self.pool_size = pool_size
        self.stride = stride if stride is not None else pool_size
    
    def forward(self, x):
        """
        Forward pass
        x: (N, C, H, W)
        """
        N, C, H, W = x.shape
        
        OH = (H - self.pool_size) // self.stride + 1
        OW = (W - self.pool_size) // self.stride + 1
        
        output = np.zeros((N, C, OH, OW))
        
        for n in range(N):
            for c in range(C):
                for i in range(OH):
                    for j in range(OW):
                        h_start = i * self.stride
                        h_end = h_start + self.pool_size
                        w_start = j * self.stride
                        w_end = w_start + self.pool_size
                        
                        output[n, c, i, j] = np.max(
                            x[n, c, h_start:h_end, w_start:w_end]
                        )
        
        return output

class Flatten:
    """Flatten layer"""
    
    def forward(self, x):
        """
        Forward pass
        x: (N, C, H, W)
        returns: (N, C*H*W)
        """
        N = x.shape[0]
        return x.reshape(N, -1)

Building a simple CNN

class SimpleCNN:
    """A simple CNN model"""
    
    def __init__(self):
        self.conv1 = Conv2DLayer(in_channels=1, out_channels=16, 
                                kernel_size=3, stride=1, padding=1, activation='relu')
        self.pool1 = MaxPool2D(pool_size=2, stride=2)
        self.conv2 = Conv2DLayer(in_channels=16, out_channels=32, 
                                kernel_size=3, stride=1, padding=1, activation='relu')
        self.pool2 = MaxPool2D(pool_size=2, stride=2)
        self.flatten = Flatten()
        
        # Size after flattening:
        # input: 1×28×28 -> conv1: 16×28×28 -> pool1: 16×14×14 
        # -> conv2: 32×14×14 -> pool2: 32×7×7 -> flatten: 1568
        self.fc1 = None  # placeholder: to be initialized once the input size is known
        self.fc2 = None
    
    def forward(self, x):
        """Forward pass (returns the flattened features; fc1/fc2 are not applied)"""
        # Convolution and pooling layers
        x = self.conv1.forward(x)
        x = self.pool1.forward(x)
        x = self.conv2.forward(x)
        x = self.pool2.forward(x)
        
        # Flatten
        x = self.flatten.forward(x)
        
        return x

# Test the network
cnn = SimpleCNN()

# Create a dummy input (batch_size=4, channels=1, height=28, width=28)
input_data = np.random.randn(4, 1, 28, 28)

# Forward pass
output = cnn.forward(input_data)

print(f"Input shape: {input_data.shape}")
print(f"Output shape: {output.shape}")
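
It can be instructive to count how many parameters this small network holds. The sketch below does so for the two convolutional layers defined above, whose weights have shape (out_channels, in_channels, kernel_size, kernel_size) with one bias per output channel. The final fully connected layer to 10 classes is a hypothetical head, since the model above leaves `fc1` and `fc2` unset.

```python
def conv_param_count(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + out_channels

conv1_params = conv_param_count(1, 16, 3)   # 16*1*3*3 weights + 16 biases
conv2_params = conv_param_count(16, 32, 3)  # 32*16*3*3 weights + 32 biases

# Size of the flattened features after pool2: 32 channels of 7x7
flattened_size = 32 * 7 * 7

# Hypothetical head: one fully connected layer mapping to 10 classes
fc_params = flattened_size * 10 + 10

print(f"conv1: {conv1_params:,}")
print(f"conv2: {conv2_params:,}")
print(f"fc (hypothetical, 10 classes): {fc_params:,}")
```

Even in this tiny model, the single fully connected layer holds more parameters than both convolutional layers combined.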

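What do convolution kernels learn?

Through training, convolution kernels learn to extract specific features such as edges and textures. To build intuition for what such kernels do, let us first apply a few classic hand-designed kernels.

Edge-detection kernels

# Classic edge-detection kernels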
sobel_x = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]
])

sobel_y = np.array([
    [-1, -2, -1],
    [0, 0, 0],
    [1, 2, 1]
])

laplacian = np.array([
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0]
])

# Create a test image
test_img = np.zeros((10, 10))
test_img[3:7, 3:7] = 1  # a square

# Apply the kernels
edges_x = manual_conv2d(test_img, sobel_x)
edges_y = manual_conv2d(test_img, sobel_y)
edges_combined = np.sqrt(edges_x**2 + edges_y**2)

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

axes[0, 0].imshow(test_img, cmap='gray')
axes[0, 0].set_title('Original image')
axes[0, 0].axis('off')

axes[0, 1].imshow(sobel_x, cmap='coolwarm')
axes[0, 1].set_title('Sobel X kernel')
axes[0, 1].axis('off')

axes[1, 0].imshow(edges_combined, cmap='gray')
axes[1, 0].set_title('Edge detection result')
axes[1, 0].axis('off')

axes[1, 1].imshow(laplacian, cmap='coolwarm')
axes[1, 1].set_title('Laplacian kernel')
axes[1, 1].axis('off')

plt.tight_layout()
plt.show()
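
A small numeric example makes the behavior of the Sobel kernels easier to verify than a plot. In the sketch below, an image with a single vertical edge (dark left half, bright right half) is filtered with both kernels: Sobel X should respond only around the edge, and Sobel Y should not respond at all because nothing changes from row to row. The helper `correlate_valid` and the `_demo` names are illustrative.

```python
import numpy as np

def correlate_valid(x, k):
    """Naive valid, stride-1 sliding-window filter (no kernel flip)."""
    H, W = x.shape
    KH, KW = k.shape
    out = np.zeros((H - KH + 1, W - KW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+KH, j:j+KW] * k)
    return out

sobel_x_demo = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y_demo = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

# 5x6 image with a vertical step edge between columns 2 and 3
step = np.zeros((5, 6))
step[:, 3:] = 1

response_x = correlate_valid(step, sobel_x_demo)
response_y = correlate_valid(step, sobel_y_demo)

print("Sobel X response:")
print(response_x)
print("Sobel Y response:")
print(response_y)
```

Each row of the Sobel X response is 0, 4, 4, 0: zero in the flat regions and 4 in the two windows that straddle the edge.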

Visualizing trained kernels

Once a CNN has been trained, we can visualize the kernels it has learned.

def visualize_kernels(weights, n_cols=8):
    """
    Visualize convolution kernels
    weights: (out_channels, in_channels, H, W)
    """
    out_channels, in_channels, H, W = weights.shape

    # Show only the slice for the first input channel of each kernel
    weights_to_show = weights[:, 0, :, :]

    n_rows = (out_channels + n_cols - 1) // n_cols

    # squeeze=False always returns a 2D array of axes, even for a single row
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols*1.5, n_rows*1.5),
                             squeeze=False)
    axes = axes.ravel()

    for i in range(out_channels):
        axes[i].imshow(weights_to_show[i], cmap='gray')
        axes[i].axis('off')
        axes[i].set_title(f'Kernel {i}')

    # Hide unused subplots
    for i in range(out_channels, len(axes)):
        axes[i].axis('off')

    plt.tight_layout()
    plt.show()

# Visualize randomly initialized kernels (stand-ins for trained weights)
random_kernels = np.random.randn(32, 1, 3, 3)
visualize_kernels(random_kernels, n_cols=8)

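Typical CNN architecture

A typical CNN consists of:

- Input layer: receives the image
- Convolutional layers: extract features
- Activation layers: introduce nonlinearity
- Pooling layers: downsample
- Fully connected layers: classify
- Output layer: produces the prediction

An example pipeline (shapes are channels × height × width; the first convolution halves the spatial size, which implies a stride of 2):

Input image (3×224×224) → Conv + ReLU (64×112×112) → Max Pool (64×56×56) → Conv + ReLU (128×56×56) → Max Pool (128×28×28) → Conv + ReLU (256×28×28) → Max Pool (256×14×14) → Flatten (50,176) → FC + ReLU (1,024) → FC + Softmax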
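
The shapes in this pipeline can be reproduced with the output size formula. The diagram does not state kernel sizes, strides, or paddings, so the values in the sketch below are assumptions chosen to match the listed shapes: a 7×7 first convolution with stride 2 and padding 3, 3×3 convolutions with stride 1 and padding 1 afterwards, and 2×2 max pooling with stride 2. Other choices could produce the same shapes.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output size along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# (name, kind, out_channels, kernel, stride, padding)
# Kernel sizes, strides, and paddings are assumptions, not from the diagram
layers = [
    ("conv1", "conv", 64, 7, 2, 3),
    ("pool1", "pool", None, 2, 2, 0),
    ("conv2", "conv", 128, 3, 1, 1),
    ("pool2", "pool", None, 2, 2, 0),
    ("conv3", "conv", 256, 3, 1, 1),
    ("pool3", "pool", None, 2, 2, 0),
]

channels, size = 3, 224  # input: 3x224x224
shapes = []
for name, kind, out_channels, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    if kind == "conv":
        channels = out_channels  # pooling keeps the channel count
    shapes.append((name, channels, size, size))
    print(f"{name}: {channels}x{size}x{size}")

flattened = channels * size * size
print("flatten:", flattened)
```

The flattened length should come out as 256 × 14 × 14 = 50,176, the value shown in the pipeline.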

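Summary

This article covered the basic building blocks of a CNN:

Convolutional layers:

- How convolution works and how to implement it
- Stride and padding
- Multi-channel convolution
- The output size formula
- Activation functions

Pooling layers:

- Max pooling
- Average pooling
- Global average pooling
- The purpose of pooling: downsampling and robustness to small translations

Fully connected layers:

- Flattening
- The final classification decision

We also looked at:

- The advantages of CNNs over fully connected networks
- Weight sharing and local connectivity
- Classic edge-detection kernels
- A typical CNN architecture

By combining convolution and pooling, a CNN extracts spatial features from images effectively and uses far fewer parameters, while preserving the structure of the image. These design choices have made CNNs among the most successful models in computer vision.

With these fundamentals in place, we can move on to more sophisticated architectures such as VGG and ResNet, and apply CNNs to practical tasks such as image classification, object detection, and image segmentation.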