Artificial Neural Networks (ANN) and Deep Learning
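Introduction
What is an artificial neural network?
An artificial neural network (ANN) is a computational model loosely inspired by biological nervous systems. It processes information through many interconnected artificial neurons, forms the foundation of deep learning, and can learn to recognize complex patterns from data.
History
- 1943: McCulloch and Pitts propose the first mathematical model of a neuron
- 1958: Rosenblatt introduces the perceptron
- 1986: Rumelhart, Hinton, and Williams popularize the backpropagation algorithm
- 2006: Hinton and colleagues introduce deep belief networks, which helped start the deep learning era
- 2012: AlexNet wins the ImageNet competition (ILSVRC), and deep learning takes off
Application areas
- Computer vision (image classification, object detection, face recognition)
- Natural language processing (machine translation, sentiment analysis, text generation)
- Speech recognition and synthesis
- Recommender systems
- Autonomous driving
- Medical diagnosis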
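Neural network fundamentals
The neuron model
Biological vs. artificial neurons
A biological neuron consists of dendrites, a cell body, and an axon. An artificial neuron simplifies this structure to:
- Inputs: correspond to the dendrites and receive signals
- Weights: the strength of each connection
- Bias: shifts the activation threshold
- Activation function: determines whether, and how strongly, the neuron fires
- Output: corresponds to the signal sent along the axon
Mathematical formulation
The output of a single neuron can be written as:
y = f(Σ(wi * xi) + b)
where:
- xi: the i-th input signal
- wi: the weight of the i-th input
- b: the bias term
- f: the activation function
- y: the output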
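The formula above can be sketched in a few lines of NumPy. This is only a minimal illustration: the function name `neuron_output`, the choice of sigmoid as f, and the input values are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, used here as the example choice of f."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, f=sigmoid):
    """Compute y = f(sum_i w_i * x_i + b) for a single neuron."""
    z = np.dot(w, x) + b
    return f(z)

x = np.array([1.0, 2.0, 3.0])     # illustrative inputs
w = np.array([0.5, -0.25, 0.0])   # illustrative weights
b = 0.0                           # illustrative bias

# The weighted sum is 0.5 - 0.5 + 0.0 = 0, so y = sigmoid(0) = 0.5
y = neuron_output(x, w, b)
print(y)
```

Swapping `f` for another activation changes only the last step; the weighted sum stays the same.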
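Network architecture
1. Feedforward neural network
The most basic network structure, in which information flows in one direction only:
- Input layer: receives the raw data
- Hidden layers: extract and transform features
- Output layer: produces the final result
2. Network depth and width
- Depth: the number of layers
- Width: the number of neurons in each layer
- Deep learning: there is no strict threshold, but the term usually refers to networks with several hidden layers (three or more layers in total is a common rule of thumb)
3. Fully connected (dense) layer
Every neuron is connected to all neurons in the previous layer. The number of parameters is:
parameters = (input dimension × output dimension) + output dimension (biases)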
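As a quick worked example of the parameter formula, the sketch below counts the parameters of one dense layer and of a small stack of dense layers. The helper names are hypothetical, and the layer sizes are chosen only for illustration; normalization layers, if present, would add their own parameters on top.

```python
def dense_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

def mlp_param_count(layer_sizes):
    """Total parameters of a stack of dense layers with the given sizes."""
    return sum(
        dense_param_count(n_in, n_out)
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )

print(dense_param_count(784, 512))                 # 784 * 512 + 512 = 401920
print(mlp_param_count([784, 512, 256, 128, 10]))   # 567434
```

Most of the parameters sit in the first layer, because the input dimension is by far the largest.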
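The mathematics of neural networks
Forward propagation
Matrix form
For a network with L layers, layer l computes:
Z[l] = W[l] × A[l-1] + b[l]
A[l] = g[l](Z[l])
where:
- W[l]: weight matrix of layer l, with shape (n[l], n[l-1])
- b[l]: bias vector of layer l, with shape (n[l], 1)
- g[l]: activation function of layer l
- A[l]: activations of layer l, with A[0] = X (the input)
- ×: matrix multiplication
Computation flow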
python
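import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_propagation(X, parameters, activation_function=relu, output_activation=sigmoid):
    """
    X: input data, shape (n[0], m)
    parameters: dict containing W1, b1, ..., WL, bL
    The default activations (ReLU hidden layers, sigmoid output) are assumptions; pass others as needed.
    """
    A = X
    caches = []
    L = len(parameters) // 2  # number of layers (one W and one b per layer)

    # Hidden layers 1 .. L-1
    for l in range(1, L):
        A_prev = A
        W = parameters['W' + str(l)]
        b = parameters['b' + str(l)]
        Z = np.dot(W, A_prev) + b
        A = activation_function(Z)  # ReLU, sigmoid, etc.
        cache = (A_prev, W, b, Z)
        caches.append(cache)

    # Output layer (usually uses a different activation function)
    WL = parameters['W' + str(L)]
    bL = parameters['b' + str(L)]
    ZL = np.dot(WL, A) + bL
    AL = output_activation(ZL)  # softmax, sigmoid, etc.
    # Cache the output layer as well so that backpropagation sees all L layers
    caches.append((A, WL, bL, ZL))

    return AL, caches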
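Backward propagation
The chain rule
Backpropagation is based on the chain rule from calculus:
∂L/∂w = ∂L/∂y × ∂y/∂z × ∂z/∂w
Gradient computation
For layer l, with m training examples (⊙ denotes element-wise multiplication; the other products are matrix products):
dZ[l] = dA[l] ⊙ g'[l](Z[l])
dW[l] = (1/m) × dZ[l] × A[l-1].T
db[l] = (1/m) × Σ(dZ[l])    (sum over the m examples)
dA[l-1] = W[l].T × dZ[l]
Implementation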
python
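def relu_derivative(Z):
    return (Z > 0).astype(Z.dtype)

def backward_propagation(AL, Y, caches, activation_derivative=relu_derivative):
    """
    AL: output of forward propagation
    Y: ground-truth labels
    caches: per-layer (A_prev, W, b, Z) tuples saved during forward propagation
    The default ReLU derivative for hidden layers is an assumption; pass another if needed.
    """
    grads = {}
    L = len(caches)
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)

    # Walk backwards from the last cached layer to the first
    for l in reversed(range(L)):
        A_prev, W, b, Z = caches[l]

        if l == L - 1:
            # Cross-entropy loss with a sigmoid/softmax output simplifies to AL - Y,
            # so dAL = -(Y/AL - (1-Y)/(1-AL)) never has to be formed explicitly
            dZ = AL - Y
        else:
            dZ = dA * activation_derivative(Z)

        dW = (1/m) * np.dot(dZ, A_prev.T)
        db = (1/m) * np.sum(dZ, axis=1, keepdims=True)
        dA_prev = np.dot(W.T, dZ)

        grads["dW" + str(l + 1)] = dW
        grads["db" + str(l + 1)] = db

        dA = dA_prev

    return grads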
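One common way to gain confidence in a hand-written backward pass is a numerical gradient check. The sketch below is self-contained and does not reuse the functions above. It assumes a two-layer network with a ReLU hidden layer, a sigmoid output, and binary cross-entropy loss, and compares the layer-wise gradient formulas with central finite differences. All names and sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(params, X, Y):
    """Two-layer net (ReLU hidden layer, sigmoid output) with binary cross-entropy."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    m = X.shape[1]
    # Forward pass
    Z1 = W1 @ X + b1
    A1 = np.maximum(0.0, Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)
    loss = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    # Backward pass, following the layer-wise formulas
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return loss, {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def numerical_grad(params, X, Y, name, eps=1e-6):
    """Central finite differences for one parameter array."""
    grad = np.zeros_like(params[name])
    for idx in np.ndindex(*params[name].shape):
        original = params[name][idx]
        params[name][idx] = original + eps
        loss_plus, _ = loss_and_grads(params, X, Y)
        params[name][idx] = original - eps
        loss_minus, _ = loss_and_grads(params, X, Y)
        params[name][idx] = original  # restore the parameter
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                          # 3 features, 5 examples
Y = (rng.random(size=(1, 5)) > 0.5).astype(float)    # binary labels
params = {
    "W1": rng.normal(size=(4, 3)) * 0.5, "b1": np.zeros((4, 1)),
    "W2": rng.normal(size=(1, 4)) * 0.5, "b2": np.zeros((1, 1)),
}

_, grads = loss_and_grads(params, X, Y)
max_diff = max(
    np.max(np.abs(grads[name] - numerical_grad(params, X, Y, name)))
    for name in params
)
print(f"largest analytic vs. numerical difference: {max_diff:.2e}")
```

If the analytic and numerical gradients disagree by much more than the finite-difference error, the backward pass most likely contains a bug.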
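Parameter initialization
1. Zero initialization (not recommended)
python
W = np.zeros((n_out, n_in))
Problem: symmetry is never broken, so every neuron in a layer receives the same gradient and learns the same features
2. Random initialization
python
W = np.random.randn(n_out, n_in) * 0.01
3. Xavier/Glorot initialization
python
W = np.random.randn(n_out, n_in) * np.sqrt(2 / (n_in + n_out))  # np.sqrt(1 / n_in) is a common simplified variant
4. He initialization (for ReLU activations)
python
W = np.random.randn(n_out, n_in) * np.sqrt(2 / n_in)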
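To see why the weight scale matters, the following sketch pushes random inputs through a stack of ReLU layers and compares the spread of the final activations under a fixed scale of 0.01 and under He initialization. The depth, width, and helper name are illustrative assumptions, not part of the original material.

```python
import numpy as np

def final_activation_std(weight_scale, n_layers=10, width=256, seed=0):
    """Push random inputs through n_layers ReLU layers and return the output std.

    weight_scale maps fan_in to the standard deviation of the weights.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(width, 1000))   # 1000 random input examples
    for _ in range(n_layers):
        W = rng.normal(size=(width, width)) * weight_scale(width)
        A = np.maximum(0.0, W @ A)       # ReLU layer without bias
    return A.std()

std_small = final_activation_std(lambda n_in: 0.01)
std_he = final_activation_std(lambda n_in: np.sqrt(2.0 / n_in))
print(f"scale 0.01: {std_small:.3e}   He: {std_he:.3e}")
```

With the fixed small scale the signal shrinks at every layer and is practically zero after ten layers, while He initialization keeps it at roughly the same order of magnitude as the input.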
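Activation functions in detail
1. Sigmoid
Formula
σ(x) = 1 / (1 + e^(-x))
Derivative: σ'(x) = σ(x) × (1 - σ(x))
Properties
- Output range: (0, 1)
- Suitable for the output layer in binary classification
- Drawbacks: gradients vanish when |x| is large (saturation), and the output is not zero-centered
2. Tanh
Formula
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Derivative: tanh'(x) = 1 - tanh²(x)
Properties
- Output range: (-1, 1)
- Zero-centered
- Still suffers from vanishing gradients
3. ReLU (Rectified Linear Unit)
Formula
ReLU(x) = max(0, x)
Derivative: ReLU'(x) = {1, if x > 0; 0, if x ≤ 0}
Properties
- Simple and cheap to compute
- Mitigates vanishing gradients
- Drawback: the dying ReLU problem (a neuron whose input stays negative outputs 0 and stops receiving gradient)
4. Leaky ReLU
Formula
LeakyReLU(x) = max(αx, x), where α is typically 0.01
Properties
- Addresses the dying ReLU problem
- Lets a small gradient flow for negative inputs
5. ELU (Exponential Linear Unit)
Formula
ELU(x) = {x, if x > 0; α(e^x - 1), if x ≤ 0}
6. Softmax (multi-class output)
Formula
Softmax(xi) = e^xi / Σ(e^xj)
Properties
- Outputs a probability distribution
- All outputs sum to 1
- Used for multi-class classification
Activation function selection guide
Scenario | Recommended activation |
---|---|
Hidden layers (general case) | ReLU |
Hidden layers (avoiding dead neurons) | Leaky ReLU, ELU |
Binary classification output layer | Sigmoid |
Multi-class classification output layer | Softmax |
Regression output layer | Linear (no activation) |
RNN hidden layers | Tanh |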
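The activation functions listed in this section can be written directly in NumPy. The sketch below is one possible implementation under simple assumptions (α = 0.01 for Leaky ReLU, α = 1.0 for ELU). The softmax subtracts the row maximum before exponentiating, a standard trick that avoids overflow without changing the result.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # np.minimum keeps exp() from overflowing on large positive inputs
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def softmax(x):
    # Subtracting the maximum leaves the result unchanged but prevents overflow
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x))
print(np.tanh(x))
print(relu(x))
print(leaky_relu(x))
print(elu(x))

# A naive exp(1000) would overflow; the shifted version stays finite
probs = softmax(np.array([1000.0, 1001.0, 1002.0]))
print(probs, probs.sum())
```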
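Loss functions and optimizers
Loss functions
1. Mean squared error (MSE), for regression
python
MSE = (1/n) × Σ(yi - ŷi)²
2. Cross-entropy loss, for classification
Binary cross-entropy:
python
BCE = -(1/n) × Σ[yi×log(ŷi) + (1-yi)×log(1-ŷi)]
Categorical (multi-class) cross-entropy:
python
CE = -(1/n) × ΣΣ[yij×log(ŷij)]
3. Focal loss, for class imbalance
python
FL = -α × (1-pt)^γ × log(pt)  # pt: predicted probability of the true class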
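The relationship between binary cross-entropy and focal loss is easier to see with numbers. The sketch below implements both formulas in NumPy for the binary case. The clipping constant `eps`, the defaults α = 0.25 and γ = 2, and the sample predictions are illustrative assumptions.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """BCE = -(1/n) * sum(y*log(p) + (1-y)*log(1-p)); eps guards against log(0)."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-12):
    """FL = -alpha * (1 - pt)^gamma * log(pt), averaged over samples."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    p_t = np.where(y_true == 1, p, 1.0 - p)   # probability assigned to the true class
    return np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.1, 0.6, 0.4])   # two easy and two harder examples

bce = binary_cross_entropy(y_true, y_pred)
fl = focal_loss(y_true, y_pred)
fl_as_bce = focal_loss(y_true, y_pred, alpha=1.0, gamma=0.0)
print(bce, fl, fl_as_bce)
```

With α = 1 and γ = 0 the focal loss reduces to binary cross-entropy. For γ > 0 the factor (1 - pt)^γ shrinks the contribution of well-classified examples, which is why the loss is used for imbalanced data.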
Optimizers
1. Gradient descent (GD)
python
θ = θ - α × ∇J(θ)
2. Stochastic gradient descent (SGD)
python
# Uses one sample per update
θ = θ - α × ∇J(θ; xi, yi)
3. Mini-batch gradient descent
python
# Uses batch_size samples per update
θ = θ - α × (1/batch_size) × Σ∇J(θ; xi, yi)
4. Momentum
python
v = β×v - α×∇J(θ)
θ = θ + v
5. Adam (Adaptive Moment Estimation)
python
# First moment (moving average of gradients)
m = β1×m + (1-β1)×g
# Second moment (moving average of squared gradients)
v = β2×v + (1-β2)×g²
# Bias correction
m_hat = m / (1-β1^t)
v_hat = v / (1-β2^t)
# Parameter update
θ = θ - α×m_hat / (√v_hat + ε)
6. RMSprop
python
v = β×v + (1-β)×g²
θ = θ - α×g / (√v + ε)
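To make the Adam update rule concrete, here is a minimal NumPy sketch that applies it to a simple quadratic objective. The function name `adam_minimize`, the objective, and the hyperparameter values are assumptions chosen for illustration; real training would use a framework optimizer.

```python
import numpy as np

def adam_minimize(grad_fn, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Run the Adam update rule for a fixed number of steps and return theta."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)   # first moment estimate
    v = np.zeros_like(theta)   # second moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Minimize J(theta) = ||theta - target||^2, whose gradient is 2 * (theta - target)
target = np.array([3.0, -1.0])
theta = adam_minimize(lambda th: 2.0 * (th - target), theta0=[0.0, 0.0])
print(theta)
```

Because each step is divided by the running gradient magnitude, the early steps have a size close to `lr` regardless of how large the raw gradient is.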
Learning rate scheduling
1. Exponential decay
python
lr = lr_initial × decay_rate^(epoch/decay_steps)
2. Cosine annealing
python
lr = lr_min + 0.5×(lr_max - lr_min)×(1 + cos(π×t/T))
3. Learning rate warmup
python
if epoch < warmup_epochs:
    lr = lr_initial × (epoch / warmup_epochs)
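Warmup and cosine annealing are often combined into a single schedule. The sketch below chains the two formulas from this section. The function name `lr_at_epoch` and the default values are hypothetical. The warmup follows the formula above literally, so the rate is 0 at epoch 0 (some implementations use `epoch + 1` instead to avoid a zero first step).

```python
import math

def lr_at_epoch(epoch, total_epochs, lr_max=0.1, lr_min=0.001, warmup_epochs=5):
    """Linear warmup to lr_max, then cosine annealing down to lr_min."""
    if epoch < warmup_epochs:
        return lr_max * (epoch / warmup_epochs)
    t = epoch - warmup_epochs          # epochs elapsed since warmup ended
    T = total_epochs - warmup_epochs   # length of the cosine phase
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))

for epoch in (0, 4, 5, 30, 55):
    print(epoch, round(lr_at_epoch(epoch, total_epochs=55), 6))
```

The schedule rises linearly during warmup, reaches `lr_max` when the cosine phase starts, and decays smoothly to `lr_min` at `total_epochs`.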
PyTorch implementation
Basic building blocks
1. Tensor operations
python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

# Create a tensor
x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

# GPU support
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)

# Automatic differentiation
x = torch.randn(3, requires_grad=True)
y = x * 2
y.backward(torch.ones_like(x))
print(x.grad)  # dy/dx = 2 for every element
2. Defining a neural network
python
class SimpleANN(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size, dropout_rate=0.2):
        super(SimpleANN, self).__init__()

        # Build the hidden layers: Linear -> BatchNorm -> ReLU -> Dropout
        layers = []
        prev_size = input_size

        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(prev_size, hidden_size))
            layers.append(nn.BatchNorm1d(hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))
            prev_size = hidden_size

        # Output layer (returns raw logits)
        layers.append(nn.Linear(prev_size, output_size))

        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

# Instantiate the model
model = SimpleANN(
    input_size=784,
    hidden_sizes=[512, 256, 128],
    output_size=10
).to(device)

# Inspect the model structure
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params}")
print(f"Trainable parameters: {trainable_params}")
3. Custom layers
python
class CustomLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super(CustomLayer, self).__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

        # Initialization (overwrites the random values above)
        nn.init.xavier_uniform_(self.weight)
        nn.init.zeros_(self.bias)

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)
Complete training workflow
python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
import numpy as np
class AdvancedANN(nn.Module):
def __init__(self, config):
super(AdvancedANN, self).__init__()
self.config = config
# 输入层
self.input_layer = nn.Linear(config['input_dim'], config['hidden_dims'][0])
# 隐藏层
self.hidden_layers = nn.ModuleList()
self.batch_norms = nn.ModuleList()
self.dropouts = nn.ModuleList()
for i in range(len(config['hidden_dims']) - 1):
self.hidden_layers.append(
nn.Linear(config['hidden_dims'][i], config['hidden_dims'][i+1])
)
self.batch_norms.append(nn.BatchNorm1d(config['hidden_dims'][i+1]))
self.dropouts.append(nn.Dropout(config['dropout_rate']))
# 输出层
self.output_layer = nn.Linear(config['hidden_dims'][-1], config['output_dim'])
# 激活函数
self.activation = self._get_activation(config['activation'])
def _get_activation(self, name):
activations = {
'relu': nn.ReLU(),
'leaky_relu': nn.LeakyReLU(0.01),
'elu': nn.ELU(),
'tanh': nn.Tanh(),
'sigmoid': nn.Sigmoid()
}
return activations.get(name, nn.ReLU())
def forward(self, x):
# 输入层
x = self.activation(self.input_layer(x))
# 隐藏层
for hidden, bn, dropout in zip(self.hidden_layers, self.batch_norms, self.dropouts):
x = hidden(x)
x = bn(x)
x = self.activation(x)
x = dropout(x)
# 输出层
x = self.output_layer(x)
return x
class Trainer:
def __init__(self, model, config):
self.model = model
self.config = config
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.model.to(self.device)
# 损失函数
self.criterion = self._get_loss_function(config['loss'])
# 优化器
self.optimizer = self._get_optimizer(config['optimizer'])
# 学习率调度器
self.scheduler = self._get_scheduler(config['scheduler'])
# 记录训练历史
self.history = {
'train_loss': [],
'val_loss': [],
'train_acc': [],
'val_acc': []
}
def _get_loss_function(self, loss_name):
losses = {
'mse': nn.MSELoss(),
'cross_entropy': nn.CrossEntropyLoss(),
'bce': nn.BCELoss(),
'bce_with_logits': nn.BCEWithLogitsLoss()
}
return losses.get(loss_name, nn.MSELoss())
def _get_optimizer(self, optimizer_config):
name = optimizer_config['name']
lr = optimizer_config['lr']
if name == 'adam':
return optim.Adam(self.model.parameters(), lr=lr,
betas=(0.9, 0.999), weight_decay=1e-5)
elif name == 'sgd':
return optim.SGD(self.model.parameters(), lr=lr,
momentum=0.9, weight_decay=1e-5)
elif name == 'rmsprop':
return optim.RMSprop(self.model.parameters(), lr=lr)
else:
return optim.Adam(self.model.parameters(), lr=lr)
def _get_scheduler(self, scheduler_config):
if scheduler_config['name'] == 'step':
return optim.lr_scheduler.StepLR(
self.optimizer,
step_size=scheduler_config['step_size'],
gamma=scheduler_config['gamma']
)
elif scheduler_config['name'] == 'cosine':
return optim.lr_scheduler.CosineAnnealingLR(
self.optimizer,
T_max=scheduler_config['T_max']
)
else:
return None
def train_epoch(self, train_loader):
self.model.train()
total_loss = 0
correct = 0
total = 0
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(self.device), target.to(self.device)
# 前向传播
self.optimizer.zero_grad()
output = self.model(data)
loss = self.criterion(output, target)
# 反向传播
loss.backward()
# 梯度裁剪
torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
# 更新参数
self.optimizer.step()
# 统计
total_loss += loss.item()
_, predicted = output.max(1)
total += target.size(0)
correct += predicted.eq(target).sum().item()
avg_loss = total_loss / len(train_loader)
accuracy = 100. * correct / total
return avg_loss, accuracy
def validate(self, val_loader):
self.model.eval()
total_loss = 0
correct = 0
total = 0
with torch.no_grad():
for data, target in val_loader:
data, target = data.to(self.device), target.to(self.device)
output = self.model(data)
loss = self.criterion(output, target)
total_loss += loss.item()
_, predicted = output.max(1)
total += target.size(0)
correct += predicted.eq(target).sum().item()
avg_loss = total_loss / len(val_loader)
accuracy = 100. * correct / total
return avg_loss, accuracy
def fit(self, train_loader, val_loader, epochs):
best_val_acc = 0
for epoch in range(epochs):
# 训练
train_loss, train_acc = self.train_epoch(train_loader)
# 验证
val_loss, val_acc = self.validate(val_loader)
# 更新学习率
if self.scheduler:
self.scheduler.step()
# 记录历史
self.history['train_loss'].append(train_loss)
self.history['val_loss'].append(val_loss)
self.history['train_acc'].append(train_acc)
self.history['val_acc'].append(val_acc)
# 保存最佳模型
if val_acc > best_val_acc:
best_val_acc = val_acc
torch.save(self.model.state_dict(), 'best_model.pth')
# 打印进度
print(f'Epoch [{epoch+1}/{epochs}] '
f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%, '
f'Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%')
def predict(self, data_loader):
self.model.eval()
predictions = []
with torch.no_grad():
for data, _ in data_loader:
data = data.to(self.device)
output = self.model(data)
_, predicted = output.max(1)
predictions.extend(predicted.cpu().numpy())
return np.array(predictions)
# 使用示例
if __name__ == "__main__":
# 配置
config = {
'input_dim': 784,
'hidden_dims': [512, 256, 128],
'output_dim': 10,
'activation': 'relu',
'dropout_rate': 0.3,
'loss': 'cross_entropy',
'optimizer': {'name': 'adam', 'lr': 0.001},
'scheduler': {'name': 'step', 'step_size': 10, 'gamma': 0.1}
}
# 创建模型
model = AdvancedANN(config)
# 创建训练器
trainer = Trainer(model, config)
# 准备数据(示例)
# X_train, X_val, y_train, y_val = prepare_data()
# train_dataset = TensorDataset(torch.FloatTensor(X_train), torch.LongTensor(y_train))
# val_dataset = TensorDataset(torch.FloatTensor(X_val), torch.LongTensor(y_val))
# train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
# val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)
# 训练
# trainer.fit(train_loader, val_loader, epochs=50)
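Advanced PyTorch techniques
1. Mixed-precision training
python
from torch.cuda.amp import autocast, GradScaler

# Assumes model, optimizer, criterion, train_loader and device are already defined
scaler = GradScaler()
for data, target in train_loader:
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    with autocast():
        output = model(data)
        loss = criterion(output, target)
    # Scale the loss to avoid float16 gradient underflow, then step and update the scale
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()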
2. Distributed training
python
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup(rank, world_size):
    # Assumes MASTER_ADDR and MASTER_PORT are set in the environment (e.g. by torchrun)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

def cleanup():
    dist.destroy_process_group()

# In each process
model = model.to(rank)
ddp_model = DDP(model, device_ids=[rank])
3. Model quantization
python
import torch.quantization as quantization

# Dynamic quantization
quantized_model = quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Static quantization
model.qconfig = quantization.get_default_qconfig('fbgemm')
quantization.prepare(model, inplace=True)
# Calibration: run representative data through the model here
quantization.convert(model, inplace=True)
TensorFlow implementation
Basic building blocks
1. Tensor operations
python
import tensorflow as tf
import numpy as np

# Create a tensor
x = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)

# GPU configuration
physical_devices = tf.config.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

# Automatic differentiation
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
dy_dx = tape.gradient(y, x)  # dy_dx = 6.0
2. Keras Sequential API
python
model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Model summary
model.summary()
3. Keras Functional API
python
inputs = tf.keras.Input(shape=(784,))
x = tf.keras.layers.Dense(512, activation='relu')(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(0.3)(x)
x = tf.keras.layers.Dense(256, activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(0.3)(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(0.3)(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
4. Custom layers
python
class CustomDense(tf.keras.layers.Layer):
    def __init__(self, units, activation=None):
        super(CustomDense, self).__init__()
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # Weights are created lazily, once the input dimension is known
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer='glorot_uniform',
            trainable=True,
            name='kernel'
        )
        self.b = self.add_weight(
            shape=(self.units,),
            initializer='zeros',
            trainable=True,
            name='bias'
        )

    def call(self, inputs):
        output = tf.matmul(inputs, self.w) + self.b
        if self.activation:
            output = self.activation(output)
        return output

    def get_config(self):
        # Needed so the layer can be serialized and re-created
        config = super().get_config()
        config.update({
            'units': self.units,
            'activation': tf.keras.activations.serialize(self.activation)
        })
        return config
Complete training implementation
python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
class AdvancedANN(keras.Model):
def __init__(self, config):
super(AdvancedANN, self).__init__()
self.config = config
# 构建层
self.input_layer = layers.Dense(
config['hidden_dims'][0],
activation=config['activation'],
kernel_initializer='he_normal'
)
# 隐藏层
self.hidden_layers = []
self.batch_norms = []
self.dropouts = []
for i in range(len(config['hidden_dims']) - 1):
self.hidden_layers.append(
layers.Dense(
config['hidden_dims'][i+1],
activation=config['activation'],
kernel_initializer='he_normal'
)
)
self.batch_norms.append(layers.BatchNormalization())
self.dropouts.append(layers.Dropout(config['dropout_rate']))
# 输出层
if config['task'] == 'classification':
self.output_layer = layers.Dense(
config['output_dim'],
activation='softmax'
)
else:
self.output_layer = layers.Dense(config['output_dim'])
def call(self, inputs, training=False):
x = self.input_layer(inputs)
for hidden, bn, dropout in zip(
self.hidden_layers, self.batch_norms, self.dropouts
):
x = hidden(x)
x = bn(x, training=training)
x = dropout(x, training=training)
return self.output_layer(x)
class CustomTrainer:
def __init__(self, model, config):
self.model = model
self.config = config
# 编译模型
self._compile_model()
# 回调函数
self.callbacks = self._get_callbacks()
def _compile_model(self):
# 优化器
optimizer = self._get_optimizer()
# 损失函数
loss = self._get_loss()
# 指标
metrics = self._get_metrics()
self.model.compile(
optimizer=optimizer,
loss=loss,
metrics=metrics
)
def _get_optimizer(self):
opt_config = self.config['optimizer']
name = opt_config['name']
lr = opt_config['lr']
if name == 'adam':
return keras.optimizers.Adam(
learning_rate=lr,
beta_1=0.9,
beta_2=0.999,
epsilon=1e-7
)
elif name == 'sgd':
return keras.optimizers.SGD(
learning_rate=lr,
momentum=0.9,
nesterov=True
)
elif name == 'rmsprop':
return keras.optimizers.RMSprop(learning_rate=lr)
else:
return keras.optimizers.Adam(learning_rate=lr)
def _get_loss(self):
loss_name = self.config['loss']
losses = {
'mse': 'mean_squared_error',
'categorical_crossentropy': 'categorical_crossentropy',
'sparse_categorical_crossentropy': 'sparse_categorical_crossentropy',
'binary_crossentropy': 'binary_crossentropy'
}
return losses.get(loss_name, 'mse')
def _get_metrics(self):
if self.config['task'] == 'classification':
return ['accuracy', keras.metrics.SparseTopKCategoricalAccuracy(k=5)]  # labels are integer class IDs, not one-hot vectors
else:
return ['mae', 'mse']
def _get_callbacks(self):
callbacks = []
# 早停
if self.config.get('early_stopping', True):
callbacks.append(
keras.callbacks.EarlyStopping(
monitor='val_loss',
patience=10,
restore_best_weights=True
)
)
# 学习率调度
if self.config.get('lr_scheduler', True):
callbacks.append(
keras.callbacks.ReduceLROnPlateau(
monitor='val_loss',
factor=0.5,
patience=5,
min_lr=1e-7
)
)
# 模型检查点
callbacks.append(
keras.callbacks.ModelCheckpoint(
'best_model.weights.h5',
save_weights_only=True,  # a subclassed keras.Model cannot be saved as a full HDF5 model
monitor='val_accuracy',
save_best_only=True,
mode='max'
)
)
# TensorBoard
callbacks.append(
keras.callbacks.TensorBoard(
log_dir='./logs',
histogram_freq=1,
write_graph=True,
update_freq='epoch'
)
)
return callbacks
def train(self, X_train, y_train, X_val, y_val, epochs, batch_size):
# 数据增强(如果需要)
if self.config.get('data_augmentation', False):
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rotation_range=10,
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.1
)
datagen.fit(X_train)
history = self.model.fit(
datagen.flow(X_train, y_train, batch_size=batch_size),
validation_data=(X_val, y_val),
epochs=epochs,
callbacks=self.callbacks,
verbose=1
)
else:
history = self.model.fit(
X_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(X_val, y_val),
callbacks=self.callbacks,
verbose=1
)
return history
def evaluate(self, X_test, y_test):
results = self.model.evaluate(X_test, y_test, verbose=0)
print("Test Results:")
for name, value in zip(self.model.metrics_names, results):
print(f"{name}: {value:.4f}")
return results
def predict(self, X):
return self.model.predict(X)
# 自定义训练循环(低级API)
class CustomTrainingLoop:
def __init__(self, model, loss_fn, optimizer):
self.model = model
self.loss_fn = loss_fn
self.optimizer = optimizer
# 指标
self.train_loss = keras.metrics.Mean(name='train_loss')
self.train_accuracy = keras.metrics.SparseCategoricalAccuracy(
name='train_accuracy'
)
self.val_loss = keras.metrics.Mean(name='val_loss')
self.val_accuracy = keras.metrics.SparseCategoricalAccuracy(
name='val_accuracy'
)
@tf.function
def train_step(self, x, y):
with tf.GradientTape() as tape:
predictions = self.model(x, training=True)
loss = self.loss_fn(y, predictions)
gradients = tape.gradient(loss, self.model.trainable_variables)
self.optimizer.apply_gradients(
zip(gradients, self.model.trainable_variables)
)
self.train_loss.update_state(loss)
self.train_accuracy.update_state(y, predictions)
return loss
@tf.function
def test_step(self, x, y):
predictions = self.model(x, training=False)
loss = self.loss_fn(y, predictions)
self.val_loss.update_state(loss)
self.val_accuracy.update_state(y, predictions)
return loss
def fit(self, train_dataset, val_dataset, epochs):
for epoch in range(epochs):
# Reset metrics at the start of each epoch (reset_state() replaces the deprecated reset_states())
self.train_loss.reset_state()
self.train_accuracy.reset_state()
self.val_loss.reset_state()
self.val_accuracy.reset_state()
# 训练
for x_batch, y_batch in train_dataset:
self.train_step(x_batch, y_batch)
# 验证
for x_batch, y_batch in val_dataset:
self.test_step(x_batch, y_batch)
# 打印结果
print(
f'Epoch {epoch + 1}, '
f'Loss: {self.train_loss.result():.4f}, '
f'Accuracy: {self.train_accuracy.result():.4f}, '
f'Val Loss: {self.val_loss.result():.4f}, '
f'Val Accuracy: {self.val_accuracy.result():.4f}'
)
# 使用示例
if __name__ == "__main__":
# 配置
config = {
'input_dim': 784,
'hidden_dims': [512, 256, 128],
'output_dim': 10,
'activation': 'relu',
'dropout_rate': 0.3,
'task': 'classification',
'loss': 'sparse_categorical_crossentropy',
'optimizer': {'name': 'adam', 'lr': 0.001},
'early_stopping': True,
'lr_scheduler': True
}
# 创建模型
model = AdvancedANN(config)
model.build(input_shape=(None, config['input_dim']))
# 创建训练器
trainer = CustomTrainer(model, config)
# 准备数据(MNIST示例)
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0
# 分割验证集
X_val = X_train[-10000:]
y_val = y_train[-10000:]
X_train = X_train[:-10000]
y_train = y_train[:-10000]
# 训练
history = trainer.train(
X_train, y_train,
X_val, y_val,
epochs=50,
batch_size=64
)
# 评估
trainer.evaluate(X_test, y_test)
Advanced TensorFlow features
1. Mixed-precision training
python
# Enable mixed precision
policy = tf.keras.mixed_precision.Policy('mixed_float16')
tf.keras.mixed_precision.set_global_policy(policy)

# Pay attention to the output layer when defining the model
class MixedPrecisionModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = layers.Dense(128, activation='relu')
        self.dense2 = layers.Dense(10)

    def call(self, inputs):
        x = self.dense1(inputs)
        outputs = self.dense2(x)
        # Make sure the output is float32
        outputs = tf.cast(outputs, tf.float32)
        return outputs
2. Distributed training
python
# Multi-GPU strategy
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = create_model()  # create_model() is assumed to be defined elsewhere
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

# TPU strategy
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
3. Model quantization
python
# Post-training quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Quantization-aware training
import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)
q_aware_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
4. Custom training strategy
python
# Assumes strategy and a per-replica train_step function are already defined
@tf.function
def distributed_train_step(dataset_inputs):
    per_replica_losses = strategy.run(
        train_step, args=(dataset_inputs,)
    )
    return strategy.reduce(
        tf.distribute.ReduceOp.SUM,
        per_replica_losses,
        axis=None
    )
Practical case studies
Case 1: MNIST handwritten digit recognition
PyTorch implementation
python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# 数据预处理
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
# 加载数据
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)
# 定义模型
class MNISTNet(nn.Module):
def __init__(self):
super(MNISTNet, self).__init__()
self.flatten = nn.Flatten()
self.fc1 = nn.Linear(784, 512)
self.fc2 = nn.Linear(512, 256)
self.fc3 = nn.Linear(256, 128)
self.fc4 = nn.Linear(128, 10)
self.dropout = nn.Dropout(0.2)
def forward(self, x):
x = self.flatten(x)
x = torch.relu(self.fc1(x))
x = self.dropout(x)
x = torch.relu(self.fc2(x))
x = self.dropout(x)
x = torch.relu(self.fc3(x))
x = self.dropout(x)
x = self.fc4(x)
return torch.log_softmax(x, dim=1)
# 训练
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MNISTNet().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.NLLLoss()
def train(epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')
def test():
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += criterion(output, target).item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader)
accuracy = 100. * correct / len(test_loader.dataset)
print(f'\nTest set: Average loss: {test_loss:.4f}, '
f'Accuracy: {correct}/{len(test_loader.dataset)} ({accuracy:.2f}%)\n')
# 执行训练
for epoch in range(1, 11):
train(epoch)
test()
TensorFlow implementation
python
import tensorflow as tf
from tensorflow import keras
# 加载数据
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
# 预处理
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
# 构建模型
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(512, activation='relu'),
keras.layers.Dropout(0.2),
keras.layers.Dense(256, activation='relu'),
keras.layers.Dropout(0.2),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dropout(0.2),
keras.layers.Dense(10, activation='softmax')
])
# 编译模型
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
# 训练
history = model.fit(
X_train, y_train,
batch_size=64,
epochs=10,
validation_split=0.1,
callbacks=[
keras.callbacks.EarlyStopping(patience=3),
keras.callbacks.ModelCheckpoint('best_mnist_model.h5', save_best_only=True)
]
)
# 评估
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f'Test accuracy: {test_acc:.4f}')
Case 2: Time series forecasting
python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim  # required by optim.Adam in the training code
from sklearn.preprocessing import MinMaxScaler
class TimeSeriesANN(nn.Module):
def __init__(self, input_size, hidden_sizes, output_size):
super(TimeSeriesANN, self).__init__()
layers = []
prev_size = input_size
for hidden_size in hidden_sizes:
layers.extend([
nn.Linear(prev_size, hidden_size),
nn.ReLU(),
nn.BatchNorm1d(hidden_size),
nn.Dropout(0.2)
])
prev_size = hidden_size
layers.append(nn.Linear(prev_size, output_size))
self.model = nn.Sequential(*layers)
def forward(self, x):
return self.model(x)
def create_sequences(data, seq_length, pred_length):
X, y = [], []
for i in range(len(data) - seq_length - pred_length + 1):
X.append(data[i:i+seq_length])
y.append(data[i+seq_length:i+seq_length+pred_length])
return np.array(X), np.array(y)
# 生成示例数据
time = np.arange(0, 100, 0.1)
data = np.sin(time) + 0.1 * np.random.randn(len(time))
# 数据预处理
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data.reshape(-1, 1)).flatten()
# 创建序列
seq_length = 20
pred_length = 5
X, y = create_sequences(data_scaled, seq_length, pred_length)
# 分割数据
split_idx = int(0.8 * len(X))
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]
# 转换为张量
X_train = torch.FloatTensor(X_train)
y_train = torch.FloatTensor(y_train)
X_test = torch.FloatTensor(X_test)
y_test = torch.FloatTensor(y_test)
# 创建模型
model = TimeSeriesANN(
input_size=seq_length,
hidden_sizes=[128, 64, 32],
output_size=pred_length
)
# 训练
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()
epochs = 100
batch_size = 32
for epoch in range(epochs):
model.train()
epoch_loss = 0
for i in range(0, len(X_train), batch_size):
batch_X = X_train[i:i+batch_size]
batch_y = y_train[i:i+batch_size]
optimizer.zero_grad()
predictions = model(batch_X)
loss = criterion(predictions, batch_y)
loss.backward()
optimizer.step()
epoch_loss += loss.item()
if (epoch + 1) % 10 == 0:
model.eval()
with torch.no_grad():
test_predictions = model(X_test)
test_loss = criterion(test_predictions, y_test)
print(f'Epoch [{epoch+1}/{epochs}], Train Loss: {epoch_loss/len(X_train)*batch_size:.4f}, '
f'Test Loss: {test_loss:.4f}')
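The sliding-window step is the part of this case study that most often causes shape errors, so it may help to look at it in isolation. The sketch below repeats the windowing logic of `create_sequences` on a toy series (the series and window sizes are chosen only for illustration) and prints the resulting shapes.

```python
import numpy as np

def create_sequences(data, seq_length, pred_length):
    """Slice a 1-D series into (input window, target window) pairs."""
    X, y = [], []
    for i in range(len(data) - seq_length - pred_length + 1):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length:i + seq_length + pred_length])
    return np.array(X), np.array(y)

series = np.arange(30.0)   # toy series: 0, 1, ..., 29
X, y = create_sequences(series, seq_length=20, pred_length=5)

print(X.shape)   # (6, 20): 30 - 20 - 5 + 1 = 6 windows
print(y.shape)   # (6, 5)
print(y[0])      # the 5 values that follow the first input window: 20 .. 24
```

Each row of `X` is fed to the network as a flat feature vector of length `seq_length`, and the matching row of `y` holds the next `pred_length` values to predict.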
Advanced topics
1. Regularization techniques
L1/L2 regularization
python
# PyTorch
class RegularizedModel(nn.Module):
def __init__(self, lambda_l1=0.01, lambda_l2=0.01):
super().__init__()
self.lambda_l1 = lambda_l1
self.lambda_l2 = lambda_l2
self.fc1 = nn.Linear(784, 256)
self.fc2 = nn.Linear(256, 10)
def forward(self, x):
x = torch.relu(self.fc1(x))
return self.fc2(x)
def l1_regularization(self):
l1_norm = sum(p.abs().sum() for p in self.parameters())
return self.lambda_l1 * l1_norm
def l2_regularization(self):
l2_norm = sum(p.pow(2).sum() for p in self.parameters())
return self.lambda_l2 * l2_norm
# TensorFlow
model = keras.Sequential([
keras.layers.Dense(
256,
activation='relu',
kernel_regularizer=keras.regularizers.l1_l2(l1=0.01, l2=0.01)
),
keras.layers.Dense(10)
])
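Dropout variants
python
# Spatial Dropout: drops entire channels instead of single elements
class SpatialDropout1D(nn.Module):
    def __init__(self, p):
        super().__init__()
        self.p = p

    def forward(self, x):
        # Assumed layout: x has shape (batch_size, channels, length)
        if self.training and self.p > 0:
            keep_prob = 1 - self.p
            # One Bernoulli draw per (sample, channel), broadcast along the length axis
            mask = torch.bernoulli(torch.full_like(x[:, :, :1], keep_prob))
            return x * mask / keep_prob  # rescale so the expected activation is unchanged
        return x

# Alpha Dropout (for SELU activations)
class AlphaDropout(nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p
        # Saturation value of SELU for very negative inputs: -scale * alpha
        self.alpha_p = -1.7580993408473766

    def forward(self, x):
        if self.training and self.p > 0:
            q = 1 - self.p  # keep probability
            # Affine correction that restores zero mean and unit variance
            a = (q + self.alpha_p ** 2 * q * (1 - q)) ** -0.5
            b = -a * (1 - q) * self.alpha_p
            mask = torch.bernoulli(torch.full_like(x, q))
            # Dropped units are set to alpha_p rather than 0
            return a * (mask * x + (1 - mask) * self.alpha_p) + b
        return x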
2. Batch normalization and its variants
python
# Layer Normalization
class LayerNorm(nn.Module):
def __init__(self, features, eps=1e-6):
super().__init__()
self.gamma = nn.Parameter(torch.ones(features))
self.beta = nn.Parameter(torch.zeros(features))
self.eps = eps
def forward(self, x):
mean = x.mean(-1, keepdim=True)
std = x.std(-1, keepdim=True)
return self.gamma * (x - mean) / (std + self.eps) + self.beta
# Group Normalization
class GroupNorm(nn.Module):
def __init__(self, num_groups, num_channels, eps=1e-5):
super().__init__()
self.num_groups = num_groups
self.eps = eps
self.gamma = nn.Parameter(torch.ones(1, num_channels, 1))
self.beta = nn.Parameter(torch.zeros(1, num_channels, 1))
def forward(self, x):
N, C, H = x.shape
x = x.view(N, self.num_groups, -1)
mean = x.mean(-1, keepdim=True)
var = x.var(-1, keepdim=True)
x = (x - mean) / torch.sqrt(var + self.eps)
x = x.view(N, C, H)
return x * self.gamma + self.beta
3. Attention mechanisms
python
class AttentionLayer(nn.Module):
def __init__(self, hidden_size):
super().__init__()
self.hidden_size = hidden_size
self.attention = nn.Sequential(
nn.Linear(hidden_size, hidden_size),
nn.Tanh(),
nn.Linear(hidden_size, 1)
)
def forward(self, x):
# x shape: (batch_size, seq_length, hidden_size)
attention_weights = self.attention(x)
attention_weights = torch.softmax(attention_weights, dim=1)
weighted = x * attention_weights
return weighted.sum(dim=1)
# Self-Attention
class SelfAttention(nn.Module):
def __init__(self, embed_size, heads):
super().__init__()
self.embed_size = embed_size
self.heads = heads
self.head_dim = embed_size // heads
self.queries = nn.Linear(self.head_dim, self.head_dim, bias=False)
self.keys = nn.Linear(self.head_dim, self.head_dim, bias=False)
self.values = nn.Linear(self.head_dim, self.head_dim, bias=False)
self.fc_out = nn.Linear(heads * self.head_dim, embed_size)
def forward(self, values, keys, query, mask):
N = query.shape[0]
value_len, key_len, query_len = values.shape[1], keys.shape[1], query.shape[1]
# Split embedding into heads
values = values.reshape(N, value_len, self.heads, self.head_dim)
keys = keys.reshape(N, key_len, self.heads, self.head_dim)
queries = query.reshape(N, query_len, self.heads, self.head_dim)
values = self.values(values)
keys = self.keys(keys)
queries = self.queries(queries)
# Attention mechanism
energy = torch.einsum("nqhd,nkhd->nhqk", [queries, keys])
if mask is not None:
energy = energy.masked_fill(mask == 0, float("-1e20"))
attention = torch.softmax(energy / (self.embed_size ** (1/2)), dim=3)
out = torch.einsum("nhql,nlhd->nqhd", [attention, values]).reshape(
N, query_len, self.heads * self.head_dim
)
return self.fc_out(out)
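The core computation inside self-attention is small enough to work through by hand. The sketch below is a single-head, unbatched version of scaled dot-product attention written with plain Python lists. The function names are illustrative, and the learned projections and masking are left out on purpose.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """queries: (q_len, d), keys: (k_len, d), values: (k_len, d_v), as nested lists."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, attn = scaled_dot_product_attention(queries, keys, values)
print("weights:", attn[0])
print("output:", out[0])
```

The query matches the first key more closely, so the first value vector receives the larger weight. The weights for each query always sum to 1.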
4. Residual and Skip Connections
python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc1 = nn.Linear(in_features, out_features)
        self.bn1 = nn.BatchNorm1d(out_features)
        self.fc2 = nn.Linear(out_features, out_features)
        self.bn2 = nn.BatchNorm1d(out_features)
        # Skip connection: identity when shapes match, projection otherwise
        self.shortcut = nn.Sequential()
        if in_features != out_features:
            self.shortcut = nn.Sequential(
                nn.Linear(in_features, out_features),
                nn.BatchNorm1d(out_features)
            )

    def forward(self, x):
        residual = x
        out = self.fc1(x)
        out = self.bn1(out)
        out = torch.relu(out)
        out = self.fc2(out)
        out = self.bn2(out)
        out = out + self.shortcut(residual)  # add the skip path
        out = torch.relu(out)
        return out

# DenseNet-style connections
class DenseBlock(nn.Module):
    def __init__(self, in_features, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Layer i sees the block input plus the outputs of all earlier layers
            self.layers.append(
                nn.Sequential(
                    nn.Linear(in_features + i * growth_rate, growth_rate),
                    nn.BatchNorm1d(growth_rate),
                    nn.ReLU()
                )
            )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_features = layer(torch.cat(features, dim=1))
            features.append(new_features)
        return torch.cat(features, dim=1)
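Because every layer in a dense block consumes the concatenation of all earlier outputs, the layer widths grow linearly. The short sketch below works out that arithmetic for the `DenseBlock` above. The helper `dense_block_widths` is a hypothetical name introduced here, and the parameter count covers only the linear layers (weights plus biases), not batch normalization.

```python
def dense_block_widths(in_features, growth_rate, num_layers):
    """Return (input width of each layer, output width of the block, linear parameter count)."""
    layer_inputs = [in_features + i * growth_rate for i in range(num_layers)]
    out_features = in_features + num_layers * growth_rate
    # Each linear layer has (inputs x outputs) weights plus one bias per output
    linear_params = sum(w * growth_rate + growth_rate for w in layer_inputs)
    return layer_inputs, out_features, linear_params

layer_inputs, out_features, linear_params = dense_block_widths(
    in_features=64, growth_rate=32, num_layers=4
)
print(layer_inputs)    # [64, 96, 128, 160]
print(out_features)    # 192
print(linear_params)   # 14464
```

The block output width (192 here) is what the next layer after the block must accept as its input size.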
Performance Optimization and Debugging
1. Diagnosing Gradient Problems
python
import numpy as np
import torch

def check_gradients(model):
    """Check for vanishing and exploding gradients (call after loss.backward())."""
    gradients = []
    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_norm = param.grad.norm(2).item()
            gradients.append({
                'layer': name,
                'grad_norm': grad_norm,
                'shape': list(param.shape)
            })
    if not gradients:
        print("No gradients found; run a backward pass first.")
        return gradients
    # Summary statistics
    grad_norms = [g['grad_norm'] for g in gradients]
    print(f"Mean gradient norm: {np.mean(grad_norms):.6f}")
    print(f"Max gradient norm: {np.max(grad_norms):.6f}")
    print(f"Min gradient norm: {np.min(grad_norms):.6f}")
    # Flag likely problems (the thresholds are rough heuristics)
    if np.max(grad_norms) > 100:
        print("WARNING: Possible gradient explosion!")
    if np.min(grad_norms) < 1e-6:
        print("WARNING: Possible gradient vanishing!")
    return gradients

# Gradient clipping: call between loss.backward() and optimizer.step()
def clip_gradients(model, max_norm=1.0):
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
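Clipping by norm rescales all gradients by one common factor, so the update direction is preserved and only its length is capped. The sketch below reproduces that arithmetic on plain lists. It is an illustration of the idea behind `clip_grad_norm_`, not PyTorch's implementation, and `clip_by_global_norm` is a name chosen for this example.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """grads: list of gradient vectors (one list of floats per parameter).

    Returns (clipped_grads, total_norm_before_clipping).
    """
    # Global L2 norm over all parameters taken together
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Scale down only when the norm exceeds max_norm
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, total_norm

grads = [[3.0, 4.0], [12.0]]
clipped, total_norm = clip_by_global_norm(grads, max_norm=1.0)
new_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(f"norm before: {total_norm:.4f}, norm after: {new_norm:.4f}")
```

Gradients whose global norm is already below `max_norm` pass through unchanged, because the scale factor is capped at 1.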
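2. Model Performance Profiling
python
import time
import torch
import torch.profiler as profiler

def profile_model(model, input_shape, device='cuda'):
    """Measure inference latency and throughput and collect a profiler trace."""
    model = model.to(device).eval()
    input_data = torch.randn(*input_shape, device=device)
    use_cuda = torch.device(device).type == 'cuda'
    activities = [profiler.ProfilerActivity.CPU]
    if use_cuda:
        activities.append(profiler.ProfilerActivity.CUDA)
    with torch.no_grad():
        # Warm-up
        for _ in range(10):
            _ = model(input_data)
        # Timing (CUDA kernels run asynchronously, so synchronize first)
        if use_cuda:
            torch.cuda.synchronize()
        start_time = time.perf_counter()
        with profiler.profile(
            activities=activities,
            record_shapes=True,
            profile_memory=True
        ) as prof:
            for _ in range(100):
                _ = model(input_data)
            if use_cuda:
                torch.cuda.synchronize()
        end_time = time.perf_counter()
    # Results (the measured time includes profiler overhead)
    avg_time = (end_time - start_time) / 100
    print(f"Average inference time: {avg_time*1000:.2f} ms per batch")
    print(f"Throughput: {input_shape[0]/avg_time:.2f} samples/sec")
    # Detailed breakdown
    sort_key = "cuda_time_total" if use_cuda else "cpu_time_total"
    print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
    return prof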
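When only latency numbers are needed, a profiler-free timing loop avoids the profiler's overhead. The following is a minimal, framework-independent sketch of the warm-up-then-measure pattern used above. `benchmark` and `workload` are hypothetical names, and `workload` merely stands in for a call such as `model(input_data)`.

```python
import statistics
import time

def benchmark(fn, warmup=3, repeats=20):
    """Call fn() repeatedly and return (mean_seconds, stdev_seconds)."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

def workload():
    # Stand-in for a forward pass
    return sum(i * i for i in range(10_000))

mean_s, stdev_s = benchmark(workload)
print(f"mean: {mean_s * 1000:.3f} ms, stdev: {stdev_s * 1000:.3f} ms")
```

For GPU code, the timed function would also need to wait for outstanding kernels (for example with `torch.cuda.synchronize()`) before returning, otherwise only the kernel launch time is measured.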
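3. Memory Optimization
python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Memory optimization techniques

# 1. Gradient accumulation: emulate a large batch with several small ones
def gradient_accumulation_training(model, dataloader, criterion, optimizer,
                                   accumulation_steps=4):
    optimizer.zero_grad()
    for i, (inputs, labels) in enumerate(dataloader):
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss = loss / accumulation_steps  # keep the gradient scale of one large batch
        loss.backward()                   # gradients add up across iterations
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

# 2. Gradient checkpointing: recompute activations in the backward pass
class CheckpointedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(784, 256)
        self.layer2 = nn.Linear(256, 128)
        self.layer3 = nn.Linear(128, 10)

    def forward(self, x):
        x = checkpoint(self.layer1, x, use_reentrant=False)
        x = checkpoint(self.layer2, x, use_reentrant=False)
        return self.layer3(x)

# 3. Release cached GPU memory that is no longer in use
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# 4. Use in-place operations (safe only if the old value is not needed for backward)
x = torch.randn(4, 8)
x = torch.relu_(x)  # in-place version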
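Dividing the loss by `accumulation_steps` is what makes the accumulated gradient match the gradient of one large batch. The sketch below checks this numerically for a one-parameter model `y = w * x` with mean squared error, assuming equally sized micro-batches. The model and data are made up for the illustration.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]

# Gradient computed on the full batch of 8 samples
full_batch_grad = mse_grad(w, xs, ys)

# Same gradient built from 4 micro-batches of 2 samples each
accumulation_steps = 4
micro = len(xs) // accumulation_steps
accumulated = 0.0
for step in range(accumulation_steps):
    lo, hi = step * micro, (step + 1) * micro
    # Scaling by 1/accumulation_steps mirrors `loss = loss / accumulation_steps`
    accumulated += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accumulation_steps

print(full_batch_grad, accumulated)
```

If the micro-batches had different sizes, the two values would no longer agree exactly, since each micro-batch mean would carry a different weight.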
4. Hyperparameter Optimization
python
import optuna

def optuna_optimization(trial):
    """Hyperparameter optimization with Optuna."""
    # Search space
    config = {
        'learning_rate': trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True),
        'batch_size': trial.suggest_categorical('batch_size', [16, 32, 64, 128]),
        'n_layers': trial.suggest_int('n_layers', 1, 5),
        'n_units': trial.suggest_int('n_units', 32, 512, step=32),
        'dropout': trial.suggest_float('dropout', 0.0, 0.5),
        'activation': trial.suggest_categorical('activation', ['relu', 'tanh', 'elu'])
    }
    # Build the model (build_model is a user-defined helper, not shown here)
    model = build_model(config)
    # Train and evaluate (train_and_evaluate is a user-defined helper, not shown here)
    val_accuracy = train_and_evaluate(model, config)
    return val_accuracy

# Run the optimization
study = optuna.create_study(direction='maximize')
study.optimize(optuna_optimization, n_trials=100)
print(f"Best parameters: {study.best_params}")
print(f"Best value: {study.best_value}")
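To show what a log-scaled search space means in practice, here is a minimal random-search sketch that uses only the standard library. It is not how Optuna works internally (Optuna's default sampler adapts to earlier trials), and `toy_objective` is an invented stand-in for a real validation score.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Sample so that log(value) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_value = None, float("-inf")
    for _ in range(n_trials):
        config = {
            "learning_rate": sample_log_uniform(rng, 1e-5, 1e-1),
            "dropout": rng.uniform(0.0, 0.5),
            "n_layers": rng.randint(1, 5),
        }
        value = objective(config)
        if value > best_value:
            best_config, best_value = config, value
    return best_config, best_value

def toy_objective(config):
    # Hypothetical score that peaks at learning_rate = 1e-3 and dropout = 0.2
    return -(math.log10(config["learning_rate"]) + 3) ** 2 - (config["dropout"] - 0.2) ** 2

best_config, best_value = random_search(toy_objective, n_trials=200)
print(best_config, best_value)
```

Sampling the learning rate on a log scale gives each order of magnitude between 1e-5 and 1e-1 the same chance of being tried, which a plain uniform sample would not.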
5. Visualization Tools
python
import matplotlib.pyplot as plt

def visualize_training(history):
    """Plot the training and validation curves."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    # Loss curves
    axes[0].plot(history['train_loss'], label='Train Loss')
    axes[0].plot(history['val_loss'], label='Val Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].legend()
    axes[0].set_title('Training and Validation Loss')
    # Accuracy curves
    axes[1].plot(history['train_acc'], label='Train Acc')
    axes[1].plot(history['val_acc'], label='Val Acc')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy')
    axes[1].legend()
    axes[1].set_title('Training and Validation Accuracy')
    plt.tight_layout()
    plt.show()

def visualize_weights(model):
    """Plot the weight distribution of each layer."""
    weights = []
    names = []
    for name, param in model.named_parameters():
        if 'weight' in name:
            weights.append(param.detach().cpu().numpy().flatten())
            names.append(name)
    if not weights:
        return
    # squeeze=False keeps axes two-dimensional even for a single weight tensor
    fig, axes = plt.subplots(len(weights), 1, figsize=(10, 3*len(weights)), squeeze=False)
    for ax, w, name in zip(axes[:, 0], weights, names):
        ax.hist(w, bins=50, alpha=0.7)
        ax.set_title(f'Weight distribution: {name}')
        ax.set_xlabel('Weight value')
        ax.set_ylabel('Frequency')
    plt.tight_layout()
    plt.show()
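The curves are usually read for two things: the epoch where validation loss bottoms out, and how far training and validation loss have drifted apart. A small helper can report both numerically from the same `history` dictionary. The function name and the example values below are made up for illustration.

```python
def summarize_history(history):
    """Report the best validation epoch and the train/val loss gap at that epoch."""
    val_loss = history["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_epoch": best_epoch,                       # zero-based index
        "best_val_loss": val_loss[best_epoch],
        "gap_at_best": val_loss[best_epoch] - history["train_loss"][best_epoch],
        "epochs_since_best": len(val_loss) - 1 - best_epoch,
    }

history = {
    "train_loss": [0.90, 0.60, 0.40, 0.30, 0.20],
    "val_loss": [0.95, 0.70, 0.55, 0.60, 0.70],
}
summary = summarize_history(history)
print(summary)
```

In this example the training loss keeps falling while the validation loss rises after epoch 2, which is the typical signature of overfitting.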
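Summary
Key Takeaways
- Architecture design
  - Choose an appropriate network depth and width
  - Use batch normalization to speed up training
  - Add residual connections to mitigate vanishing gradients
  - Apply regularization judiciously to prevent overfitting
- Training techniques
  - Initialize weights properly
  - Choose a suitable optimizer and learning rate
  - Use a learning-rate schedule
  - Monitor gradients and loss during training
- Performance optimization
  - Use mixed-precision training
  - Use distributed training
  - Quantize and prune the model
  - Optimize memory and compute
- Debugging methods
  - Visualize the training process
  - Check gradient flow
  - Profile performance bottlenecks
  - Search hyperparameters systematically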
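Best Practices
- Data processing
  - Standardize or normalize the data
  - Use data augmentation to improve generalization
  - Handle class imbalance
  - Split the dataset sensibly
- Model development
  - Start with a simple model
  - Increase complexity gradually
  - Use pretrained models
  - Keep the design modular
- Experiment management
  - Version-control code and data
  - Record all hyperparameters
  - Save checkpoints and logs
  - Keep experiment setups reproducible
- Deployment considerations
  - Compress and optimize the model
  - Benchmark inference performance
  - Add error handling and monitoring
  - Update and maintain the model continuously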
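Future Directions
- Automated machine learning (AutoML)
  - Neural architecture search (NAS)
  - Automated hyperparameter optimization
  - Automated feature engineering
- Efficient neural networks
  - Lightweight architecture design
  - Knowledge distillation
  - Network pruning and quantization
- Interpretability
  - Attention visualization
  - Feature importance analysis
  - Decision path tracing
- New architectures
  - Transformers applied across domains
  - Graph neural networks
  - Neural ordinary differential equations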