Introduction to Deep Learning: Neural Network Basics

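Contents

- Introduction
- Biological Inspiration for Neural Networks
- The Perceptron: The Earliest Neural Network Model
  - How the Perceptron Works
  - Limitations of the Perceptron
- Multilayer Perceptron (MLP)
  - Structure of an MLP
- Activation Functions
  - Common Activation Functions
- Loss Functions
  - Common Loss Functions
- Backpropagation
- Hands-On Project: Handwritten Digit Recognition
- Optimization Techniques
  - 1. Learning Rate Scheduling
  - 2. Dropout Regularization
  - 3. Batch Normalization
- Practical Application: Iris Classification
- Summary
- Study Suggestions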

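Introduction

Deep learning is a core technology of artificial intelligence and is widely used in image recognition, natural language processing, speech recognition, and other fields. This article introduces the fundamentals of neural networks: their principles, their structure, and their practical application.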

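Biological Inspiration for Neural Networks

Artificial neural networks (ANNs) are inspired by the structure and function of biological nervous systems. The human brain contains tens of billions of neurons, connected to one another through synapses to form a complex network. Each neuron receives signals from other neurons, processes them, and passes the result on to the next neurons.

An artificial neuron imitates this mechanism:

- Receive input signals
- Compute a weighted sum of the inputs
- Apply an activation function
- Produce an output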
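The four steps above can be sketched as a single function. This is a minimal illustration, assuming a sigmoid activation; the function name `neuron` and the input, weight, and bias values are made up for the example.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Single artificial neuron: weighted sum followed by a sigmoid activation."""
    z = np.dot(inputs, weights) + bias      # weighted sum of the inputs plus a bias
    return 1.0 / (1.0 + np.exp(-z))         # sigmoid squashes z into (0, 1)

# Illustrative values only
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

output = neuron(x, w, b)
print(f"neuron output: {output:.4f}")
```

Here the weighted sum is 0.2 - 0.3 - 0.4 + 0.1 = -0.4, so the output is sigmoid(-0.4), which is a little above 0.40.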

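The Perceptron: The Earliest Neural Network Model

The perceptron, proposed by Frank Rosenblatt in 1957, is the simplest artificial neural network model.

How the Perceptron Works

A perceptron takes several inputs, assigns a weight to each one, computes the weighted sum, and passes that sum through an activation function to produce its output.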

```python
import numpy as np

class Perceptron:
    def __init__(self, input_size, learning_rate=0.01, epochs=100):
        self.weights = np.zeros(input_size + 1)  # +1 for the bias weight
        self.learning_rate = learning_rate
        self.epochs = epochs

    def predict(self, inputs):
        # Prepend a constant 1 so that weights[0] acts as the bias
        inputs_with_bias = np.insert(inputs, 0, 1)
        # Compute the weighted sum
        weighted_sum = np.dot(inputs_with_bias, self.weights)
        # Apply the step activation function
        return 1 if weighted_sum > 0 else 0

    def train(self, training_inputs, labels):
        for _ in range(self.epochs):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                # Compute the error
                error = label - prediction
                # Update the weights (no change when the prediction is correct)
                inputs_with_bias = np.insert(inputs, 0, 1)
                self.weights += self.learning_rate * error * inputs_with_bias

# Example: implementing the AND logic gate with a perceptron
training_inputs = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])

labels = np.array([0, 0, 0, 1])

perceptron = Perceptron(input_size=2)
perceptron.train(training_inputs, labels)

# Test
test_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
for inputs in test_inputs:
    print(f"Input: {inputs}, Output: {perceptron.predict(inputs)}")
```

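Limitations of the Perceptron

A perceptron can only solve linearly separable problems; it cannot learn a nonlinear problem such as exclusive OR (XOR), because no single straight line separates the two XOR classes. This limitation contributed to the first "AI winter".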
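The XOR limitation can be checked empirically. The sketch below is a compact re-implementation of the perceptron learning rule (the helper names `train_perceptron` and `accuracy` are made up for this example, and the learning rate of 1.0 is chosen only to keep the arithmetic exact). It trains on AND, which is linearly separable, and on XOR, which is not.

```python
import numpy as np

def train_perceptron(X, y, learning_rate=1.0, epochs=100):
    """Perceptron learning rule with a step activation; returns weights (bias first)."""
    weights = np.zeros(X.shape[1] + 1)
    X_bias = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a constant bias input
    for _ in range(epochs):
        for xi, target in zip(X_bias, y):
            prediction = 1 if np.dot(xi, weights) > 0 else 0
            weights += learning_rate * (target - prediction) * xi
    return weights

def accuracy(weights, X, y):
    """Fraction of samples the trained perceptron classifies correctly."""
    X_bias = np.hstack([np.ones((X.shape[0], 1)), X])
    predictions = (X_bias @ weights > 0).astype(int)
    return np.mean(predictions == y)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_labels = np.array([0, 0, 0, 1])   # linearly separable
xor_labels = np.array([0, 1, 1, 0])   # not linearly separable

and_acc = accuracy(train_perceptron(X, and_labels), X, and_labels)
xor_acc = accuracy(train_perceptron(X, xor_labels), X, xor_labels)
print(f"AND accuracy: {and_acc:.2f}")
print(f"XOR accuracy: {xor_acc:.2f}")
```

The AND run converges to a perfect classifier, while the XOR run cannot reach 100% accuracy however many epochs it is given: a single linear boundary gets at most three of the four XOR points right.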

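Multilayer Perceptron (MLP)

To overcome the perceptron's limitations, researchers proposed the multilayer perceptron, a type of feedforward neural network.

Structure of an MLP

A multilayer perceptron consists of:

- Input layer: receives the raw data
- Hidden layers: extract features and apply nonlinear transformations
- Output layer: produces the final prediction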

```python
import numpy as np

class MLP:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights randomly and biases to zero
        self.weights1 = np.random.randn(input_size, hidden_size)
        self.bias1 = np.zeros(hidden_size)
        self.weights2 = np.random.randn(hidden_size, output_size)
        self.bias2 = np.zeros(output_size)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # Note: x is expected to be a sigmoid OUTPUT, s = sigmoid(z),
        # so this returns s * (1 - s), the derivative with respect to z
        return x * (1 - x)

    def forward(self, X):
        # Forward pass
        self.hidden = self.sigmoid(np.dot(X, self.weights1) + self.bias1)
        self.output = self.sigmoid(np.dot(self.hidden, self.weights2) + self.bias2)
        return self.output

    def backward(self, X, y, learning_rate):
        # Backward pass
        # Output-layer error (y - output is the negative gradient direction of the squared error)
        output_error = y - self.output
        output_delta = output_error * self.sigmoid_derivative(self.output)

        # Hidden-layer error
        hidden_error = output_delta.dot(self.weights2.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden)

        # Update weights and biases ("+=" because the deltas already point downhill)
        self.weights2 += self.hidden.T.dot(output_delta) * learning_rate
        self.bias2 += np.sum(output_delta, axis=0) * learning_rate
        self.weights1 += X.T.dot(hidden_delta) * learning_rate
        self.bias1 += np.sum(hidden_delta, axis=0) * learning_rate

    def train(self, X, y, epochs, learning_rate):
        for _ in range(epochs):
            self.forward(X)
            self.backward(X, y, learning_rate)

    def predict(self, X):
        return self.forward(X)

# Example: solving the XOR problem with an MLP
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

mlp = MLP(input_size=2, hidden_size=4, output_size=1)
mlp.train(X, y, epochs=10000, learning_rate=0.1)

# Test (results vary between runs because the weights are initialized randomly)
predictions = mlp.predict(X)
for i in range(len(X)):
    print(f"Input: {X[i]}, Expected: {y[i][0]}, Predicted: {predictions[i][0]:.4f}")
```

Activation Functions

Activation functions introduce nonlinearity, which lets a neural network learn complex patterns. Without them, any stack of layers would collapse into a single linear transformation.

Common Activation Functions

Sigmoid
- Formula: $\sigma(x) = \frac{1}{1 + e^{-x}}$
- Properties: output range (0, 1), suitable for probability outputs
- Drawback: suffers from vanishing gradients
Tanh
- Formula: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
- Properties: output range (-1, 1), zero-centered
ReLU
- Formula: $\text{ReLU}(x) = \max(0, x)$
- Properties: cheap to compute, mitigates vanishing gradients
- Drawback: suffers from the "dying ReLU" problem
Leaky ReLU
- Formula: $\text{LeakyReLU}(x) = \max(\alpha x, x)$, where $\alpha$ is a small positive constant (for example 0.01)
- Properties: mitigates the dying ReLU problem

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

# Generate input values
x = np.linspace(-5, 5, 100)

# Plot each activation function in its own panel of a 2x2 grid
activations = [
    ('Sigmoid', sigmoid),
    ('Tanh', tanh),
    ('ReLU', relu),
    ('Leaky ReLU', leaky_relu),
]

plt.figure(figsize=(12, 8))
for index, (name, func) in enumerate(activations, start=1):
    plt.subplot(2, 2, index)
    plt.plot(x, func(x))
    plt.title(name)
    plt.grid(True)

plt.tight_layout()
plt.show()
```
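The vanishing-gradient drawback listed for sigmoid can be made concrete by looking at derivatives rather than function values. The sketch below compares the sigmoid and ReLU derivatives at a few points; the helper names `sigmoid_grad` and `relu_grad` are made up for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for every positive input

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print("sigmoid'(x):", np.round(sigmoid_grad(x), 5))
print("relu'(x):   ", relu_grad(x))

# Backpropagation multiplies one activation derivative per layer, so the
# sigmoid derivatives alone scale the gradient by at most 0.25 ** 10
# across ten stacked sigmoid layers (ignoring the weights).
print("upper bound after 10 sigmoid layers:", 0.25 ** 10)
```

The sigmoid derivative never exceeds 0.25 and is close to zero for large |x|, whereas the ReLU derivative stays at 1 for positive inputs. That is the sense in which ReLU mitigates vanishing gradients.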

Loss Functions

A loss function measures the difference between the model's predictions and the true values.

Common Loss Functions

Mean squared error (MSE)
- Suited to regression problems
- Formula: $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
Cross-entropy loss
- Suited to classification problems
- Binary classification: $L = -[y\log(\hat{y}) + (1-y)\log(1-\hat{y})]$
- Multiclass classification: $L = -\sum_{i=1}^{c}y_i\log(\hat{y}_i)$

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred):
    # Clip predictions to avoid log(0)
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred):
    # Clip predictions to avoid log(0); y_true is one-hot with shape (n_samples, n_classes)
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```
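A small worked example may help build intuition for the magnitudes these losses produce. The numbers below are invented for illustration, and the formulas are written out inline so the block runs on its own.

```python
import numpy as np

# Regression: targets vs. predictions (illustrative values)
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
mse = np.mean((y_true - y_pred) ** 2)   # (0 + 0 + 4) / 3

# Binary cross-entropy for a single sample whose true label is 1:
# the loss reduces to -log(y_hat)
confident_right = -np.log(0.9)          # predicted probability 0.9 for the true class
confident_wrong = -np.log(0.1)          # predicted probability 0.1 for the true class

print(f"MSE: {mse:.4f}")
print(f"BCE with y=1, y_hat=0.9: {confident_right:.4f}")
print(f"BCE with y=1, y_hat=0.1: {confident_wrong:.4f}")
```

The MSE is 4/3, about 1.3333. The cross-entropy is about 0.1054 for the confident correct prediction and about 2.3026 for the confident wrong one, showing how sharply cross-entropy penalizes confident mistakes.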

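Backpropagation

Backpropagation is the core algorithm for training neural networks. It uses the chain rule to compute the gradient of the loss with respect to every weight, working backwards from the output layer.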
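The chain-rule computation can be written out compactly. The equations below are a sketch under stated assumptions: samples are stored as rows, there are no bias terms, hidden layers use an activation $f$, the output layer uses softmax with cross-entropy loss, and the loss is averaged over $m$ samples. The symbols $a^{(l)}$, $z^{(l)}$, $\delta^{(l)}$, and $\eta$ are notation introduced here, not taken from the original text.

```latex
% Forward pass: a^{(0)} = X, each row is one sample
z^{(l)} = a^{(l-1)} W^{(l)}, \qquad a^{(l)} = f\!\left(z^{(l)}\right)

% Output layer (softmax + cross-entropy), with \hat{y} = a^{(L)}
\delta^{(L)} = \hat{y} - y

% Hidden layers: propagate the error backwards through the weights
\delta^{(l)} = \left(\delta^{(l+1)} \, {W^{(l+1)}}^{\top}\right) \odot f'\!\left(z^{(l)}\right)

% Gradient of the mean loss with respect to each weight matrix, and the update
\frac{\partial L}{\partial W^{(l)}} = \frac{1}{m} \, {a^{(l-1)}}^{\top} \delta^{(l)}, \qquad
W^{(l)} \leftarrow W^{(l)} - \eta \, \frac{\partial L}{\partial W^{(l)}}
```

Each $\delta^{(l)}$ reuses the one computed for the layer after it, which is why the gradients are evaluated from the output layer backwards.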

```python
import numpy as np

class NeuralNetwork:
    def __init__(self, layers):
        self.layers = []   # activations of each layer, filled in by forward()
        self.weights = []

        # Initialize one weight matrix per pair of adjacent layers (no bias terms)
        for i in range(len(layers) - 1):
            self.weights.append(np.random.randn(layers[i], layers[i+1]) * 0.01)

    def relu(self, x):
        return np.maximum(0, x)

    def relu_derivative(self, x):
        return (x > 0).astype(float)

    def softmax(self, x):
        # Subtract the row-wise maximum for numerical stability
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

    def forward(self, X):
        self.layers = [X]

        # Forward pass through the hidden layers
        for i in range(len(self.weights) - 1):
            self.layers.append(self.relu(np.dot(self.layers[-1], self.weights[i])))

        # The output layer uses softmax
        self.layers.append(self.softmax(np.dot(self.layers[-1], self.weights[-1])))

        return self.layers[-1]

    def backward(self, X, y, learning_rate):
        m = X.shape[0]  # number of samples

        # Forward pass
        output = self.forward(X)

        # Output-layer gradient (softmax combined with cross-entropy)
        delta = output - y
        d_weights = [np.dot(self.layers[-2].T, delta) / m]

        # Propagate the error backwards through the hidden layers
        for i in range(len(self.weights) - 2, -1, -1):
            delta = np.dot(delta, self.weights[i+1].T) * self.relu_derivative(self.layers[i+1])
            d_weights.insert(0, np.dot(self.layers[i].T, delta) / m)

        # Update the weights
        for i in range(len(self.weights)):
            self.weights[i] -= learning_rate * d_weights[i]

    def train(self, X, y, epochs, learning_rate, batch_size=32):
        n_samples = X.shape[0]

        for epoch in range(epochs):
            # Shuffle the data
            permutation = np.random.permutation(n_samples)
            X_shuffled = X[permutation]
            y_shuffled = y[permutation]

            # Mini-batch training
            for i in range(0, n_samples, batch_size):
                X_batch = X_shuffled[i:i+batch_size]
                y_batch = y_shuffled[i:i+batch_size]
                self.backward(X_batch, y_batch, learning_rate)

            # Compute and report the loss only on the epochs that are printed
            if epoch % 100 == 0:
                output = self.forward(X)
                loss = -np.mean(np.sum(y * np.log(output + 1e-15), axis=1))
                print(f"Epoch {epoch}, Loss: {loss:.4f}")

    def predict(self, X):
        output = self.forward(X)
        return np.argmax(output, axis=1)
```
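The `backward` method relies on the fact that, for a softmax output combined with cross-entropy loss, the gradient with respect to the pre-softmax scores is simply `output - y`. One way to gain confidence in such an identity is a numerical gradient check. The sketch below compares the analytic expression with central finite differences on random data; the standalone functions and variable names are introduced only for this example.

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def cross_entropy(z, y):
    """Mean cross-entropy of softmax(z) against one-hot labels y."""
    return -np.mean(np.sum(y * np.log(softmax(z)), axis=1))

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3))      # scores (logits): 4 samples, 3 classes
y = np.eye(3)[[0, 2, 1, 2]]      # one-hot labels

# Analytic gradient of the mean loss with respect to the scores
analytic = (softmax(z) - y) / z.shape[0]

# Central finite differences, perturbing one score at a time
numeric = np.zeros_like(z)
eps = 1e-6
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        step = np.zeros_like(z)
        step[i, j] = eps
        numeric[i, j] = (cross_entropy(z + step, y) - cross_entropy(z - step, y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

A very small maximum difference indicates that the analytic gradient matches the numerical one. The same check can be applied to the weight gradients of a full network when debugging a hand-written backward pass.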

Hands-On Project: Handwritten Digit Recognition

Let's use the neural network to recognize handwritten digits from the MNIST dataset.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Assumes the NeuralNetwork class from the backpropagation section is already defined.

# Load the MNIST dataset as NumPy arrays (as_frame=False avoids a pandas DataFrame)
print("Loading the MNIST dataset...")
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist['data'], mnist['target']

# Preprocessing: scale pixel values to the [0, 1] range
X = X / 255.0

# Convert the labels to one-hot encoding
# (the keyword is sparse_output in scikit-learn >= 1.2; older versions used sparse)
encoder = OneHotEncoder(sparse_output=False)
y_onehot = encoder.fit_transform(y.reshape(-1, 1))

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_onehot, test_size=0.2, random_state=42)

# Create the neural network
nn = NeuralNetwork(layers=[784, 128, 64, 10])

# Train the network (pure NumPy training for 500 epochs can take a long time)
print("Training the neural network...")
nn.train(X_train, y_train, epochs=500, learning_rate=0.01, batch_size=64)

# Evaluate the model
print("Evaluating model performance...")
predictions = nn.predict(X_test)
true_labels = np.argmax(y_test, axis=1)
accuracy = np.mean(predictions == true_labels)
print(f"Test accuracy: {accuracy:.4f}")

# Visualize some predictions
def plot_predictions(images, true_labels, predicted_labels, n=10):
    plt.figure(figsize=(20, 4))
    for i in range(n):
        plt.subplot(1, n, i+1)
        plt.imshow(images[i].reshape(28, 28), cmap='gray')
        plt.title(f"True: {true_labels[i]}\nPred: {predicted_labels[i]}")
        plt.axis('off')
    plt.show()

# Show the predictions for the first 10 test samples
plot_predictions(X_test[:10], true_labels[:10], predictions[:10])
```

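Optimization Techniques

1. Learning Rate Scheduling

Adjusting the learning rate during training can improve results. A common pattern is to start with larger steps and shrink them as training progresses: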

```python
def learning_rate_schedule(epoch, initial_lr=0.01):
    # Step decay: cut the learning rate by 10x at epoch 100 and again at epoch 300
    if epoch < 100:
        return initial_lr
    elif epoch < 300:
        return initial_lr * 0.1
    else:
        return initial_lr * 0.01
```
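To show how such a schedule plugs into training, the sketch below calls it once per epoch inside a plain gradient-descent loop. The objective is a hypothetical toy function, f(w) = (w - 3)^2, chosen only because its minimum is known. The schedule is repeated so the block runs on its own, and `initial_lr=0.1` is an illustrative value.

```python
def learning_rate_schedule(epoch, initial_lr=0.01):
    """Step decay: the same schedule as above, repeated to keep this block self-contained."""
    if epoch < 100:
        return initial_lr
    elif epoch < 300:
        return initial_lr * 0.1
    return initial_lr * 0.01

# Hypothetical toy objective: f(w) = (w - 3)^2, with gradient 2 * (w - 3)
w = 0.0
for epoch in range(500):
    lr = learning_rate_schedule(epoch, initial_lr=0.1)  # look up this epoch's step size
    gradient = 2.0 * (w - 3.0)
    w -= lr * gradient                                  # steps shrink as training progresses

print(f"w after 500 epochs: {w:.6f}")
```

With the `NeuralNetwork` class, the same idea would mean passing the scheduled value as `learning_rate` for each epoch instead of a fixed constant.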

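2. Dropout Regularization

Dropout helps reduce overfitting by randomly zeroing a fraction of the hidden activations during training: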

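```python
import numpy as np

class NeuralNetworkWithDropout:
    # Forward pass only; a matching backward pass would need to reuse the dropout masks
    def __init__(self, layers, dropout_rate=0.5):
        self.layers = []
        self.weights = []
        self.dropout_rate = dropout_rate

        for i in range(len(layers) - 1):
            self.weights.append(np.random.randn(layers[i], layers[i+1]) * 0.01)

    def relu(self, x):
        return np.maximum(0, x)

    def softmax(self, x):
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

    def dropout(self, X):
        if self.dropout_rate > 0:
            # Inverted dropout: keep each unit with probability 1 - dropout_rate,
            # then rescale so the expected activation is unchanged
            mask = np.random.binomial(1, 1 - self.dropout_rate, size=X.shape)
            return X * mask / (1 - self.dropout_rate)
        return X

    def forward(self, X, training=True):
        self.layers = [X]

        for i in range(len(self.weights) - 1):
            linear = np.dot(self.layers[-1], self.weights[i])
            activation = self.relu(linear)
            if training:
                # Apply dropout only during training, never at prediction time
                activation = self.dropout(activation)
            self.layers.append(activation)

        # Output layer (no dropout)
        self.layers.append(self.softmax(np.dot(self.layers[-1], self.weights[-1])))
        return self.layers[-1]
```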
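The rescaling by `1 / (1 - dropout_rate)` in the `dropout` method keeps the expected activation unchanged, so no extra scaling is needed at prediction time. The sketch below checks this empirically on an array of ones; the array size and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dropout_rate = 0.5
activations = np.ones((1000, 1000))

# Inverted dropout: zero each unit with probability dropout_rate,
# then rescale the survivors by 1 / (1 - dropout_rate)
mask = rng.binomial(1, 1 - dropout_rate, size=activations.shape)
dropped = activations * mask / (1 - dropout_rate)

kept_fraction = mask.mean()
mean_after = dropped.mean()
print(f"fraction of units kept: {kept_fraction:.3f}")
print(f"mean activation after dropout: {mean_after:.3f}")
```

About half of the units survive, each scaled to 2.0, so the mean activation stays close to the original value of 1.0.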

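3. Batch Normalization

Batch normalization normalizes each feature over the current mini-batch, which speeds up training and improves stability: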

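```python
import numpy as np

class BatchNormalization:
    # Forward pass only; gamma and beta would be learned during backpropagation
    def __init__(self, input_dim, momentum=0.9):
        self.gamma = np.ones(input_dim)    # scale parameter
        self.beta = np.zeros(input_dim)    # shift parameter
        self.momentum = momentum
        self.running_mean = np.zeros(input_dim)
        # Start at one rather than zero so that inference before any training
        # step does not divide by a near-zero standard deviation
        self.running_var = np.ones(input_dim)

    def forward(self, X, training=True):
        if training:
            mean = np.mean(X, axis=0)
            var = np.var(X, axis=0)

            # Update the running mean and variance used at inference time
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var

            # Normalize with the statistics of the current batch
            X_normalized = (X - mean) / np.sqrt(var + 1e-8)
            self.X_normalized = X_normalized
        else:
            # Normalize with the running statistics
            X_normalized = (X - self.running_mean) / np.sqrt(self.running_var + 1e-8)

        # Scale and shift
        out = self.gamma * X_normalized + self.beta
        return out
```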
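The effect of the training-mode normalization can be seen on a small batch whose features have very different scales. The sketch below uses a standalone function, `batch_norm_forward`, introduced only for this example, with made-up data.

```python
import numpy as np

def batch_norm_forward(X, gamma, beta, eps=1e-8):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    X_normalized = (X - mean) / np.sqrt(var + eps)
    return gamma * X_normalized + beta

rng = np.random.default_rng(0)
# Hypothetical batch: 64 samples, 3 features with very different scales
X = rng.normal(loc=[0.0, 50.0, -5.0], scale=[1.0, 10.0, 0.1], size=(64, 3))

out = batch_norm_forward(X, gamma=np.ones(3), beta=np.zeros(3))
print("input means: ", np.round(X.mean(axis=0), 2))
print("output means:", np.round(out.mean(axis=0), 6))
print("output stds: ", np.round(out.std(axis=0), 6))
```

With `gamma` set to one and `beta` to zero, every feature comes out with a mean of approximately zero and a standard deviation of approximately one, regardless of its original scale.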

Practical Application: Iris Classification

Let's use the neural network to solve the classic Iris classification problem:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Assumes the NeuralNetwork class from the backpropagation section is already defined.

# Load the data
iris = load_iris()
X = iris.data
y = iris.target.reshape(-1, 1)

# One-hot encode the labels
# (the keyword is sparse_output in scikit-learn >= 1.2; older versions used sparse)
encoder = OneHotEncoder(sparse_output=False)
y_onehot = encoder.fit_transform(y)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_onehot, test_size=0.2, random_state=42)

# Standardize the features, fitting the scaler on the training set only
# so that no test-set statistics leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create and train the network
nn = NeuralNetwork(layers=[4, 8, 3])
nn.train(X_train, y_train, epochs=1000, learning_rate=0.01, batch_size=16)

# Evaluate
predictions = nn.predict(X_test)
true_labels = np.argmax(y_test, axis=1)
accuracy = np.mean(predictions == true_labels)

print(f"Iris classification accuracy: {accuracy:.4f}")

# Show the classification report
print("\nClassification report:")
print(classification_report(true_labels, predictions, target_names=iris.target_names))
```

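Summary

Neural networks are the foundation of deep learning. By imitating the structure and function of biological nervous systems, they can learn complex patterns. This article covered:

- The perceptron and its limitations
- The principles and implementation of the multilayer perceptron
- Common activation functions and loss functions
- The details of the backpropagation algorithm
- Practical project examples and optimization techniques

Neural networks continue to evolve, and new architectures and optimization methods keep emerging. Mastering these fundamentals lays a solid foundation for studying more complex deep learning models.

Study Suggestions

- Explore more advanced optimization algorithms (Adam, RMSprop, and others)
- Learn how convolutional neural networks (CNNs) process image data
- Study how recurrent neural networks (RNNs) process sequence data
- Learn about attention mechanisms and the Transformer architecture
- Try a deep learning framework (TensorFlow, PyTorch)

Deep learning is a constantly evolving field; continued study and practice are the key to mastering it.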