Table of Contents
1. Introduction to the Algorithm
1.1 Background and Motivation
Traditional fault diagnosis relies on hand-crafted feature extraction (spectral analysis, envelope analysis, wavelet transforms, etc.) to derive features from raw signals. This approach has several limitations:
- Feature design depends on expert experience: domain knowledge is required to design effective features
- Poor generalization: features that work on one machine may fail on another
- No end-to-end optimization: feature extraction and classifier training are separate stages and cannot be optimized jointly
Deep convolutional neural networks (CNNs) offer a transformative solution to this problem. A CNN learns hierarchical feature representations directly from raw data, avoiding the complexity and limitations of manual feature design.
1.2 Core Advantages of CNNs in PHM
| Advantage | Description |
|---|---|
| End-to-end learning | Learns features directly from raw signals/images, no manual feature engineering |
| Hierarchical feature extraction | Lower layers capture edges, frequencies, and other basic features; higher layers capture semantic fault patterns |
| Parameter sharing | Shared kernel weights drastically reduce the parameter count and improve generalization |
| Translation invariance | Insensitive to small shifts of the signal/image, improving robustness |
1.3 Overview of CNN Variants for Fault Diagnosis
This article covers the following CNN variants for fault diagnosis:
┌─────────────────────────────────────────────────────────────────┐
│                 CNN variants for fault diagnosis                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│  │ 1D-CNN           │ │ 2D-CNN           │ │ Multi-scale CNN  │ │
│  │ raw vibration    │ │ time-frequency   │ │ multi-band       │ │
│  │ signals directly │ │ maps (STFT/CWT)  │ │ receptive fields │ │
│  └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│                                                                 │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│  │ Attention CNN    │ │ Wide ResNet      │ │ DenseNet         │ │
│  │ SE-Net / CBAM    │ │ wider + deeper   │ │ dense feature    │ │
│  │ recalibration    │ │ residual blocks  │ │ fusion, gradients│ │
│  └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
2. Algorithm Principles
2.1 1D-CNN: End-to-End Processing of Raw Signals
2.1.1 Core Idea
A 1D-CNN processes one-dimensional time series (vibration, current, or acoustic signals) directly, extracting features automatically through one-dimensional convolutions.
2.1.2 One-Dimensional Convolution
For an input signal $x \in \mathbb{R}^{L}$ (a sequence of length $L$), the 1D convolution is defined as:

$$y[n] = (x * w)[n] = \sum_{m=0}^{K-1} x[n+m] \cdot w[m]$$

where:
- $w \in \mathbb{R}^{K}$ is the convolution kernel (weight vector)
- $K$ is the kernel size (receptive field)
- $n$ is the output position index
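As a quick sanity check, the definition above can be implemented directly in NumPy (a minimal sketch; `conv1d_valid` is an illustrative helper name, not a library function):

```python
import numpy as np

def conv1d_valid(x, w):
    """Valid 1D convolution as defined above: y[n] = sum_m x[n+m] * w[m]."""
    L, K = len(x), len(w)
    return np.array([np.dot(x[n:n+K], w) for n in range(L - K + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])   # a simple difference kernel
y = conv1d_valid(x, w)
print(y)  # [-2. -2. -2.]
```

Note that this sliding dot product is what deep-learning frameworks call "convolution" (technically cross-correlation); `np.correlate(x, w, mode='valid')` computes the same quantity.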
2.1.3 Multi-Channel 1D Convolution
For multi-channel input (e.g., multi-sensor data) $X \in \mathbb{R}^{C \times L}$, the convolution extends to:

$$y[n] = \sum_{c=1}^{C} \sum_{m=0}^{K-1} x_c[n+m] \cdot w_c[m] + b$$

where $C$ is the number of input channels.
2.1.4 Feature Map Size
The output length $L'$ of a convolutional layer is:

$$L' = \left\lfloor \frac{L + 2 \times \text{padding} - \text{kernel\_size}}{\text{stride}} \right\rfloor + 1$$
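This formula is easy to wrap as a helper for checking layer configurations before building a network (a small sketch; `conv_output_length` is an illustrative name):

```python
def conv_output_length(L, kernel_size, stride=1, padding=0):
    """Output length of a conv/pooling layer:
    L' = floor((L + 2*padding - kernel_size) / stride) + 1"""
    return (L + 2 * padding - kernel_size) // stride + 1

# "same" padding (padding = kernel_size // 2 for odd kernels) preserves length
print(conv_output_length(1024, kernel_size=7, padding=3))             # 1024
# stride 2 halves the length
print(conv_output_length(1024, kernel_size=7, padding=3, stride=2))   # 512
# no padding shrinks the output by kernel_size - 1
print(conv_output_length(10, kernel_size=3))                          # 8
```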
2.2 2D-CNN: Feature Extraction from Time-Frequency Maps
2.2.1 Time-Frequency Representations
A one-dimensional time-domain signal is first converted into a two-dimensional time-frequency image. Common methods include:
Short-time Fourier transform (STFT):

$$X(n, \omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$

Continuous wavelet transform (CWT):

$$CWT(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^*\!\left(\frac{t-b}{a}\right) dt$$

where $a$ is the scale parameter, $b$ is the translation parameter, and $\psi$ is the mother wavelet.
2.2.2 Two-Dimensional Convolution
For an input feature map $X \in \mathbb{R}^{H \times W \times C_{in}}$, the convolution is:

$$Y_{i,j,k} = \sum_{u=0}^{K_h-1} \sum_{v=0}^{K_w-1} \sum_{c=1}^{C_{in}} X_{i+u,j+v,c} \cdot W_{u,v,c,k} + b_k$$
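The triple sum can be implemented literally as a naive NumPy reference (channels-last layout; real frameworks use optimized kernels but compute the same quantity):

```python
import numpy as np

def conv2d_forward(X, W, b):
    """Direct implementation of the sum above (valid convolution, stride 1).
    X: (H, W_in, C_in), W: (Kh, Kw, C_in, C_out), b: (C_out,)"""
    H, Wi, Cin = X.shape
    Kh, Kw, _, Cout = W.shape
    Ho, Wo = H - Kh + 1, Wi - Kw + 1
    Y = np.zeros((Ho, Wo, Cout))
    for i in range(Ho):
        for j in range(Wo):
            patch = X[i:i+Kh, j:j+Kw, :]                          # (Kh, Kw, Cin)
            Y[i, j] = np.tensordot(patch, W, axes=([0, 1, 2], [0, 1, 2])) + b
    return Y

rng = np.random.default_rng(0)
X = rng.random((8, 8, 3))
W = rng.random((3, 3, 3, 4))
Y = conv2d_forward(X, W, np.zeros(4))
print(Y.shape)  # (6, 6, 4)
```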
2.3 Multi-Scale CNN
2.3.1 Multi-Scale Feature Extraction
Kernels of different sizes capture features at different scales:
┌────────────────────────────────────────────────────────┐
│                                                        │
│  raw signal ─┬─→ small kernel (3×1) ─→ local detail    │
│              │                                         │
│              ├─→ medium kernel (5×1) ─→ mid-frequency  │
│              │                          periodicity    │
│              └─→ large kernel (7×1) ─→ global trend    │
│                                                        │
│                         ↓                              │
│              feature concatenation and fusion          │
│                         ↓                              │
│              fused multi-scale features                │
│                                                        │
└────────────────────────────────────────────────────────┘
2.3.2 Mathematical Formulation
Multi-scale feature fusion:

$$F_{multi} = \text{Concat}(F_{small}, F_{medium}, F_{large})$$

$$F_{output} = \sigma(W_{fusion} \cdot F_{multi} + b_{fusion})$$
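Concretely, the fusion step amounts to channel-wise concatenation followed by a learned projection. A NumPy sketch (random stand-in weights, channel-first layout; `W_fusion` and `b_fusion` here are illustrative, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 100  # sequence length after the parallel branches

# Three branch outputs, 16 channels each (e.g., from 3x1, 5x1, 7x1 kernels)
F_small, F_medium, F_large = (rng.standard_normal((16, L)) for _ in range(3))

# Concat along the channel axis -> (48, L)
F_multi = np.concatenate([F_small, F_medium, F_large], axis=0)

# 1x1 fusion: a learned projection applied independently at every time step
W_fusion = rng.standard_normal((32, 48)) * 0.1
b_fusion = np.zeros((32, 1))
F_output = 1.0 / (1.0 + np.exp(-(W_fusion @ F_multi + b_fusion)))  # sigma = sigmoid

print(F_multi.shape, F_output.shape)  # (48, 100) (32, 100)
```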
2.4 Attention-Enhanced CNN
2.4.1 SE-Net: Channel Attention
A Squeeze-and-Excitation network learns per-channel weights to emphasize important features.
Squeeze (global information embedding):

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i,j) = \text{GAP}(U)$$

Excitation (adaptive recalibration):

$$s = \sigma(W_2 \cdot \delta(W_1 \cdot z))$$

where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, and $r$ is the reduction ratio.
Feature recalibration:

$$\tilde{x}_c = s_c \cdot x_c$$
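The squeeze-excitation-scale pipeline is compact enough to write out in NumPy (a sketch with random stand-in weights; shapes follow the definitions above with $U \in \mathbb{R}^{C \times H \times W}$):

```python
import numpy as np

def se_block(U, W1, W2):
    """SE recalibration: squeeze (GAP) -> excitation (FC-ReLU-FC-sigmoid) -> scale."""
    z = U.mean(axis=(1, 2))                  # squeeze: z_c = GAP over H x W
    h = np.maximum(W1 @ z, 0.0)              # delta = ReLU, bottleneck of size C/r
    s = 1.0 / (1.0 + np.exp(-(W2 @ h)))      # sigma = sigmoid, s in (0, 1)^C
    return U * s[:, None, None]              # per-channel recalibration

C, r = 8, 4
rng = np.random.default_rng(1)
U = rng.standard_normal((C, 6, 6))
W1 = rng.standard_normal((C // r, C))
W2 = rng.standard_normal((C, C // r))
out = se_block(U, W1, W2)
print(out.shape)  # (8, 6, 6)
```

Because every gate $s_c$ lies in $(0, 1)$, the block can only attenuate channels, never amplify them; informative channels are simply attenuated less.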
2.4.2 CBAM: Channel + Spatial Attention
The Convolutional Block Attention Module applies channel attention and spatial attention in sequence.
Channel attention:

$$M_c(F) = \sigma(\text{MLP}(\text{AvgPool}(F)) + \text{MLP}(\text{MaxPool}(F)))$$

Spatial attention:

$$M_s(F) = \sigma(f^{7\times7}([\text{AvgPool}(F); \text{MaxPool}(F)]))$$

Final output:

$$F' = M_c(F) \otimes F$$

$$F'' = M_s(F') \otimes F'$$
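The two attention steps can be traced with a small NumPy sketch (illustrative only: the $7\times7$ spatial convolution is replaced by a simple weighting of the two pooled maps to keep the example short, and all weights are random stand-ins):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(F_in, W_mlp1, W_mlp2, w_spatial):
    """Sequential channel then spatial attention on F_in of shape (C, H, W)."""
    # Channel attention: shared MLP over avg- and max-pooled channel descriptors
    avg_c, max_c = F_in.mean(axis=(1, 2)), F_in.max(axis=(1, 2))
    mlp = lambda v: W_mlp2 @ np.maximum(W_mlp1 @ v, 0.0)
    M_c = sigmoid(mlp(avg_c) + mlp(max_c))              # (C,)
    F1 = F_in * M_c[:, None, None]                      # F' = M_c(F) ⊗ F
    # Spatial attention: pool over channels, then combine + sigmoid
    avg_s, max_s = F1.mean(axis=0), F1.max(axis=0)      # each (H, W)
    M_s = sigmoid(w_spatial[0] * avg_s + w_spatial[1] * max_s)
    return F1 * M_s[None, :, :]                         # F'' = M_s(F') ⊗ F'

rng = np.random.default_rng(2)
C, r = 8, 4
F_in = rng.standard_normal((C, 5, 5))
out = cbam(F_in,
           rng.standard_normal((C // r, C)),
           rng.standard_normal((C, C // r)),
           np.array([0.5, 0.5]))
print(out.shape)  # (8, 5, 5)
```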
2.5 Wide ResNet
2.5.1 Residual Connections
ResNet's residual learning framework:

$$y = F(x, \{W_i\}) + x$$

where $F$ is the residual mapping and $x$ is the identity shortcut.
2.5.2 Wide vs. Deep
Wide ResNet increases network width (channel count) rather than depth:
| Strategy | Approach | Characteristics |
|---|---|---|
| Deep | Add more layers | Vanishing/exploding gradients, hard to train |
| Wide | Add more channels | Efficient parameter use, stable training |
Wide ResNet residual block (pre-activation form):

$$y = W_2 \cdot \sigma(BN(W_1 \cdot \sigma(BN(x)))) + x$$

where the widths of $W_1, W_2$ are scaled by a widening factor $k$ (typically $k > 1$).
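To see what the widening factor does to capacity, a rough back-of-the-envelope count in pure Python (counting only the three stage-transition 3-tap 1D convolutions, ignoring biases and repeated blocks; the stage layout matches the `nStages` configuration used in the implementation below):

```python
def wrn_stage_channels(widen_factor):
    """Channel configuration used by Wide ResNet stages: [16, 16k, 32k, 64k]."""
    k = widen_factor
    return [16, 16 * k, 32 * k, 64 * k]

def conv_params(c_in, c_out, kernel=3):
    """Weights in one 1D conv layer (no bias)."""
    return c_in * c_out * kernel

for k in (1, 4, 8):
    stages = wrn_stage_channels(k)
    p = sum(conv_params(stages[i], stages[i + 1]) for i in range(3))
    print(f"k={k}: stages={stages}, transition-conv params={p:,}")
```

Parameter count grows roughly quadratically with $k$, which is why widening quickly adds capacity without the optimization difficulties of extra depth.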
2.6 DenseNet: Dense Connectivity
2.6.1 Dense Block Structure
In DenseNet, each layer receives the feature maps of all preceding layers as input:

$$x_l = H_l([x_0, x_1, ..., x_{l-1}])$$

where $[x_0, x_1, ..., x_{l-1}]$ denotes the concatenation of the outputs of the preceding layers.
2.6.2 Growth Rate
Feature map growth in DenseNet is controlled by the growth rate $k$: layer $l$ of a block receives

$$C_{in}^{(l)} = C_{in} + k \times (l-1)$$

input channels, and a block of $L$ layers outputs $C_{in} + k \times L$ channels.
DenseNet's strengths are feature reuse and improved gradient flow.
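The channel bookkeeping implied by the growth rate can be verified in a few lines of pure Python (this mirrors the arithmetic of the `DenseBlock` implementation in Section 3):

```python
def dense_block_channels(c_in, growth_rate, num_layers):
    """Input channel count seen by each layer of a dense block,
    and the channel count of the block's concatenated output."""
    per_layer_in = [c_in + growth_rate * l for l in range(num_layers)]
    c_out = c_in + growth_rate * num_layers  # input + one k-channel map per layer
    return per_layer_in, c_out

per_layer, c_out = dense_block_channels(c_in=32, growth_rate=16, num_layers=4)
print(per_layer)  # [32, 48, 64, 80]
print(c_out)      # 96
```

Each layer contributes only $k$ new channels, which keeps the per-layer parameter count small even though every layer sees all earlier feature maps.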
3. Code Implementation
3.1 Complete Example: Multi-Scale Attention CNN for Fault Diagnosis
This example implements a CNN that combines multi-scale feature extraction with SE attention for bearing fault diagnosis.

```python
"""
Multi-scale attention CNN for fault diagnosis.
End-to-end fault classification of rotating-machinery vibration signals.
"""
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


# ==================== Data generation and preprocessing ====================
def generate_bearing_data(samples_per_class=200, signal_length=1024):
    """
    Generate simulated bearing vibration signals for four classes:
    normal, inner-race fault, outer-race fault, and ball fault.
    """
    np.random.seed(42)
    X, y = [], []
    # Sampling frequency and shaft speed
    fs = 12000               # 12 kHz
    f_rpm = 1800             # 1800 RPM
    f_rotation = f_rpm / 60  # 30 Hz
    for label, fault_type in enumerate(['Normal', 'InnerRace', 'OuterRace', 'Ball']):
        for _ in range(samples_per_class):
            t = np.linspace(0, signal_length/fs, signal_length)
            # Base vibration component at the shaft frequency
            signal = 0.5 * np.sin(2 * np.pi * f_rotation * t)
            # Additive noise
            noise = np.random.normal(0, 0.1, signal_length)
            signal += noise
            # Fault characteristic frequency components
            if fault_type == 'Normal':
                # Normal signal: shaft frequency plus noise only
                pass
            elif fault_type == 'InnerRace':
                # Inner-race fault frequency (assume BPFI = 5.4 * f_rotation)
                bpfi = 5.4 * f_rotation
                signal += 0.8 * np.sin(2 * np.pi * bpfi * t)
                signal += 0.4 * np.sin(2 * np.pi * 2*bpfi * t)
            elif fault_type == 'OuterRace':
                # Outer-race fault frequency (assume BPFO = 2.3 * f_rotation)
                bpfo = 2.3 * f_rotation
                signal += 0.7 * np.sin(2 * np.pi * bpfo * t)
                signal += 0.3 * np.sin(2 * np.pi * 2*bpfo * t)
            elif fault_type == 'Ball':
                # Ball fault frequency (assume BSF = 4.2 * f_rotation)
                bsf = 4.2 * f_rotation
                signal += 0.6 * np.sin(2 * np.pi * bsf * t)
                signal += 0.3 * np.sin(2 * np.pi * 2*bsf * t)
            X.append(signal)
            y.append(label)
    return np.array(X), np.array(y)


class VibrationDataset(Dataset):
    """Vibration signal dataset."""
    def __init__(self, X, y, augment=False):
        self.X = torch.FloatTensor(X).unsqueeze(1)  # add channel dimension
        self.y = torch.LongTensor(y)
        self.augment = augment

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        x = self.X[idx]
        y = self.y[idx]
        if self.augment:
            # Augmentation: random noise injection and circular time shift
            if torch.rand(1) > 0.5:
                noise = torch.randn_like(x) * 0.05
                x = x + noise
            if torch.rand(1) > 0.5:
                shift = torch.randint(-50, 50, (1,)).item()
                x = torch.roll(x, shifts=shift, dims=1)
        return x, y


# ==================== Model definitions ====================
class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention module."""
    def __init__(self, channels, reduction=16):
        super(SEBlock, self).__init__()
        self.squeeze = nn.AdaptiveAvgPool1d(1)
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _ = x.size()
        # Squeeze
        y = self.squeeze(x).view(b, c)
        # Excitation
        y = self.excitation(y).view(b, c, 1)
        # Scale
        return x * y.expand_as(x)


class MultiScaleConvBlock(nn.Module):
    """Multi-scale convolution block: captures features in different frequency ranges."""
    def __init__(self, in_channels, out_channels, kernel_sizes=[3, 5, 7]):
        super(MultiScaleConvBlock, self).__init__()
        # Split out_channels across the branches; the last branch absorbs the
        # remainder so the concatenated width always equals out_channels
        num_branches = len(kernel_sizes)
        branch_channels = [out_channels // num_branches] * num_branches
        branch_channels[-1] += out_channels - sum(branch_channels)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(in_channels, ch, kernel_size=k, padding=k//2),
                nn.BatchNorm1d(ch),
                nn.ReLU(inplace=True)
            )
            for k, ch in zip(kernel_sizes, branch_channels)
        ])
        self.combine = nn.Sequential(
            nn.Conv1d(out_channels, out_channels, kernel_size=1),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        # Parallel multi-scale convolutions
        branch_outputs = [branch(x) for branch in self.branches]
        # Channel-wise concatenation
        concat = torch.cat(branch_outputs, dim=1)
        # Fusion
        return self.combine(concat)


class AttentionCNN(nn.Module):
    """Multi-scale attention CNN for fault diagnosis."""
    def __init__(self, input_length=1024, num_classes=4):
        super(AttentionCNN, self).__init__()
        # Input normalization
        self.input_norm = nn.BatchNorm1d(1)
        # Multi-scale feature extraction layers
        self.conv1 = MultiScaleConvBlock(1, 32, kernel_sizes=[3, 7, 11])
        self.se1 = SEBlock(32)
        self.pool1 = nn.MaxPool1d(2)
        self.conv2 = MultiScaleConvBlock(32, 64, kernel_sizes=[3, 5, 7])
        self.se2 = SEBlock(64)
        self.pool2 = nn.MaxPool1d(2)
        self.conv3 = nn.Sequential(
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(inplace=True),
            nn.Conv1d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(inplace=True)
        )
        self.se3 = SEBlock(128)
        self.pool3 = nn.AdaptiveAvgPool1d(1)
        # Fully connected classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(128, 64),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(64, num_classes)
        )
        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv1d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm1d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        # Input normalization
        x = self.input_norm(x)
        # Multi-scale conv + SE attention + pooling
        x = self.pool1(self.se1(self.conv1(x)))
        x = self.pool2(self.se2(self.conv2(x)))
        x = self.pool3(self.se3(self.conv3(x)))
        # Flatten
        x = x.view(x.size(0), -1)
        # Classify
        x = self.classifier(x)
        return x


# ==================== 2D-CNN spectrogram classification model ====================
def compute_stft_spectrogram(signal, nperseg=128, noverlap=96):
    """
    Compute the STFT spectrogram of a signal.
    """
    from scipy import signal as sig
    frequencies, times, Zxx = sig.stft(
        signal, fs=12000, nperseg=nperseg, noverlap=noverlap
    )
    # Log-magnitude spectrum
    spectrogram = np.log(np.abs(Zxx) + 1e-8)
    return spectrogram


class SpectrogramCNN(nn.Module):
    """2D-CNN model for STFT spectrograms."""
    def __init__(self, num_classes=4):
        super(SpectrogramCNN, self).__init__()
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
            # Block 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
            # Block 3
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x


# ==================== Training and evaluation ====================
def train_model(model, train_loader, val_loader, epochs=50, lr=0.001):
    """Train the model."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, epochs)
    best_acc = 0.0
    history = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}
    for epoch in range(epochs):
        # Training phase
        model.train()
        train_loss, train_correct, train_total = 0.0, 0, 0
        for X, y in train_loader:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            outputs = model(X)
            loss = criterion(outputs, y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
            _, predicted = outputs.max(1)
            train_total += y.size(0)
            train_correct += predicted.eq(y).sum().item()
        train_loss /= len(train_loader)
        train_acc = 100.0 * train_correct / train_total
        # Validation phase
        model.eval()
        val_loss, val_correct, val_total = 0.0, 0, 0
        with torch.no_grad():
            for X, y in val_loader:
                X, y = X.to(device), y.to(device)
                outputs = model(X)
                loss = criterion(outputs, y)
                val_loss += loss.item()
                _, predicted = outputs.max(1)
                val_total += y.size(0)
                val_correct += predicted.eq(y).sum().item()
        val_loss /= len(val_loader)
        val_acc = 100.0 * val_correct / val_total
        scheduler.step()
        history['train_loss'].append(train_loss)
        history['val_loss'].append(val_loss)
        history['train_acc'].append(train_acc)
        history['val_acc'].append(val_acc)
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), 'best_model.pth')
        if (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{epochs}] '
                  f'Train Loss: {train_loss:.4f} Acc: {train_acc:.2f}% | '
                  f'Val Loss: {val_loss:.4f} Acc: {val_acc:.2f}%')
    print(f'Best Validation Accuracy: {best_acc:.2f}%')
    return model, history


def evaluate_model(model, test_loader):
    """Evaluate the model."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    model.eval()
    all_preds, all_labels = [], []
    correct, total = 0, 0
    with torch.no_grad():
        for X, y in test_loader:
            X, y = X.to(device), y.to(device)
            outputs = model(X)
            _, predicted = outputs.max(1)
            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(y.cpu().numpy())
            total += y.size(0)
            correct += predicted.eq(y).sum().item()
    accuracy = 100.0 * correct / total
    # Classification report
    from sklearn.metrics import classification_report, confusion_matrix
    class_names = ['Normal', 'InnerRace', 'OuterRace', 'Ball']
    print(f'\nTest Accuracy: {accuracy:.2f}%')
    print('\nClassification Report:')
    print(classification_report(all_labels, all_preds, target_names=class_names))
    print('\nConfusion Matrix:')
    cm = confusion_matrix(all_labels, all_preds)
    print(cm)
    return accuracy, cm


# ==================== Main program ====================
def main():
    print("=" * 60)
    print("Multi-scale attention CNN for bearing fault diagnosis")
    print("=" * 60)
    # 1. Data generation
    print("\n[1] Generating simulated bearing vibration signals...")
    X, y = generate_bearing_data(samples_per_class=200, signal_length=1024)
    # Standardize the data
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    print(f"Dataset shapes: X={X.shape}, y={y.shape}")
    print(f"Class distribution: {np.bincount(y)}")
    # 2. Train/validation/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.15, random_state=42, stratify=y_train
    )
    print(f"Training set: {X_train.shape[0]} samples")
    print(f"Validation set: {X_val.shape[0]} samples")
    print(f"Test set: {X_test.shape[0]} samples")
    # 3. Datasets and data loaders
    train_dataset = VibrationDataset(X_train, y_train, augment=True)
    val_dataset = VibrationDataset(X_val, y_val, augment=False)
    test_dataset = VibrationDataset(X_test, y_test, augment=False)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
    # 4. Build the model
    print("\n[2] Building the multi-scale attention CNN...")
    model = AttentionCNN(input_length=1024, num_classes=4)
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total parameters: {total_params:,}")
    print(f"Trainable parameters: {trainable_params:,}")
    # 5. Train
    print("\n[3] Training...")
    model, history = train_model(model, train_loader, val_loader, epochs=50, lr=0.001)
    # 6. Evaluate on the test set
    print("\n[4] Evaluating on the test set...")
    model.load_state_dict(torch.load('best_model.pth'))
    test_acc, cm = evaluate_model(model, test_loader)
    # 7. Plot the training curves
    print("\n[5] Plotting training curves...")
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    axes[0].plot(history['train_loss'], label='Train Loss')
    axes[0].plot(history['val_loss'], label='Val Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].set_title('Training and Validation Loss')
    axes[0].legend()
    axes[0].grid(True)
    axes[1].plot(history['train_acc'], label='Train Acc')
    axes[1].plot(history['val_acc'], label='Val Acc')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy (%)')
    axes[1].set_title('Training and Validation Accuracy')
    axes[1].legend()
    axes[1].grid(True)
    plt.tight_layout()
    plt.savefig('training_curves.png', dpi=150)
    print("Training curves saved to training_curves.png")
    print("\n" + "=" * 60)
    print("Training finished!")
    print(f"Final test accuracy: {test_acc:.2f}%")
    print("=" * 60)


if __name__ == '__main__':
    main()
```
3.2 Wide ResNet Implementation for Fault Diagnosis

```python
"""
Wide ResNet for fault diagnosis.
Improves feature extraction capacity by widening the network.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F


class WideBasicBlock(nn.Module):
    """Wide ResNet basic block (pre-activation)."""
    def __init__(self, in_channels, out_channels, stride=1, dropout_rate=0.3):
        super(WideBasicBlock, self).__init__()
        self.bn1 = nn.BatchNorm1d(in_channels)
        self.conv1 = nn.Conv1d(in_channels, out_channels, kernel_size=3,
                               padding=1, bias=False)
        self.dropout = nn.Dropout(dropout_rate)
        self.bn2 = nn.BatchNorm1d(out_channels)
        self.conv2 = nn.Conv1d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        # Shortcut connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv1d(in_channels, out_channels,
                                      kernel_size=1, stride=stride, bias=False)

    def forward(self, x):
        out = self.dropout(self.conv1(F.relu(self.bn1(x))))
        out = self.conv2(F.relu(self.bn2(out)))
        out += self.shortcut(x)
        return out


class WideResNet(nn.Module):
    """
    Wide ResNet fault diagnosis network.
    Key properties:
    - Gains capacity by increasing width (channels) rather than depth
    - Easier to train than very deep networks
    - Residual connections mitigate vanishing gradients
    """
    def __init__(self, input_length=1024, num_classes=4, depth=16, widen_factor=4,
                 dropout_rate=0.3):
        super(WideResNet, self).__init__()
        self.in_channels = 16
        self.depth = depth
        assert (depth - 4) % 6 == 0, 'depth should be 6n+4'
        n = (depth - 4) // 6
        k = widen_factor
        nStages = [16, 16*k, 32*k, 64*k]  # channel configuration
        # Stem
        self.conv1 = nn.Conv1d(1, nStages[0], kernel_size=3, padding=1, bias=False)
        # Residual stages
        self.layer1 = self._wide_layer(WideBasicBlock, nStages[1], n, stride=1,
                                       dropout_rate=dropout_rate)
        self.layer2 = self._wide_layer(WideBasicBlock, nStages[2], n, stride=2,
                                       dropout_rate=dropout_rate)
        self.layer3 = self._wide_layer(WideBasicBlock, nStages[3], n, stride=2,
                                       dropout_rate=dropout_rate)
        self.bn1 = nn.BatchNorm1d(nStages[3])
        self.linear = nn.Linear(nStages[3], num_classes)
        # Weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv1d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm1d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_normal_(m.weight)
                nn.init.constant_(m.bias, 0)

    def _wide_layer(self, block, out_channels, num_blocks, stride, dropout_rate):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride, dropout_rate))
            self.in_channels = out_channels
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = F.relu(self.bn1(out))
        out = F.adaptive_avg_pool1d(out, 1)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


# ==================== DenseNet implementation ====================
class DenseBlock(nn.Module):
    """DenseNet densely connected block."""
    def __init__(self, in_channels, growth_rate, num_layers, bn_size=4,
                 drop_rate=0.3):
        super(DenseBlock, self).__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(self._make_layer(
                in_channels + i * growth_rate,
                growth_rate,
                bn_size,
                drop_rate
            ))

    def _make_layer(self, in_channels, growth_rate, bn_size, drop_rate):
        return nn.Sequential(
            nn.BatchNorm1d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(in_channels, bn_size * growth_rate, kernel_size=1, bias=False),
            nn.BatchNorm1d(bn_size * growth_rate),
            nn.ReLU(inplace=True),
            nn.Conv1d(bn_size * growth_rate, growth_rate, kernel_size=3,
                      padding=1, bias=False),
            nn.Dropout(drop_rate)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_features = layer(torch.cat(features, dim=1))
            features.append(new_features)
        return torch.cat(features, dim=1)


class TransitionBlock(nn.Module):
    """DenseNet transition block: reduces feature-map length and channel count."""
    def __init__(self, in_channels, out_channels):
        super(TransitionBlock, self).__init__()
        self.transition = nn.Sequential(
            nn.BatchNorm1d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.AvgPool1d(kernel_size=2, stride=2)
        )

    def forward(self, x):
        return self.transition(x)


class DenseNet1D(nn.Module):
    """
    DenseNet-1D fault diagnosis network.
    Key properties:
    - Dense connectivity promotes feature reuse
    - Better gradient flow, more stable training
    - Comparatively small parameter count
    """
    def __init__(self, input_length=1024, num_classes=4, growth_rate=16,
                 block_config=(4, 6, 8), num_init_features=32,
                 bn_size=4, drop_rate=0.3):
        super(DenseNet1D, self).__init__()
        # Stem convolution
        self.features = nn.Sequential(
            nn.Conv1d(1, num_init_features, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm1d(num_init_features),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=3, stride=2, padding=1)
        )
        # Dense blocks and transition blocks
        num_features = num_init_features
        for i, num_layers in enumerate(block_config):
            # Dense block
            block = DenseBlock(num_features, growth_rate, num_layers, bn_size, drop_rate)
            self.features.add_module(f'denseblock{i+1}', block)
            num_features = num_features + num_layers * growth_rate
            # Transition block (between dense blocks only)
            if i != len(block_config) - 1:
                trans = TransitionBlock(num_features, num_features // 2)
                self.features.add_module(f'transition{i+1}', trans)
                num_features = num_features // 2
        # Final normalization
        self.features.add_module('bn_final', nn.BatchNorm1d(num_features))
        self.features.add_module('relu_final', nn.ReLU(inplace=True))
        # Classifier
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Dropout(drop_rate),
            nn.Linear(num_features, num_classes)
        )
        # Weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv1d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm1d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        features = self.features(x)
        out = self.classifier(features)
        return out
```
3.3 Complete Example: Spectrogram Classification

```python
"""
End-to-end 2D-CNN spectrogram classification pipeline:
convert vibration signals to STFT spectrograms, then classify with a 2D-CNN.
"""
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from scipy import signal as sig
import warnings
warnings.filterwarnings('ignore')


def signal_to_spectrogram(signal, fs=12000, nperseg=128, noverlap=96):
    """
    Convert a 1D signal into an STFT spectrogram.
    Args:
        signal: 1D time-domain signal
        fs: sampling frequency
        nperseg: FFT window length
        noverlap: number of overlapping samples between windows
    Returns:
        spectrogram: normalized spectrogram (H, W)
    """
    f, t, Zxx = sig.stft(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # Log-magnitude
    spectrogram = np.log(np.abs(Zxx) + 1e-10)
    # Normalize to [0, 1]
    spectrogram = (spectrogram - spectrogram.min()) / (spectrogram.max() - spectrogram.min() + 1e-10)
    return spectrogram


def wavelet_transform(signal, mother_wavelet='cmor1.5-1.0', scales=None, fs=12000):
    """
    Continuous wavelet transform for time-frequency analysis.
    Args:
        mother_wavelet: mother wavelet name ('morl' for real Morlet,
                        'cmor1.5-1.0' for complex Morlet)
        scales: scale sequence
    """
    import pywt
    if scales is None:
        # Derive scales from the target frequency range (10 Hz to fs/2),
        # using scale = fc * fs / f
        fc = pywt.central_frequency(mother_wavelet)
        freqs = np.linspace(10, fs / 2, 127)
        scales = fc * fs / freqs
    # CWT
    coefficients, frequencies = pywt.cwt(signal, scales, mother_wavelet, sampling_period=1/fs)
    # Log-magnitude
    spectrogram = np.log(np.abs(coefficients) + 1e-10)
    return spectrogram, frequencies


class SpectrogramDataset(Dataset):
    """Spectrogram dataset."""
    def __init__(self, signals, labels, target_size=(64, 64)):
        self.signals = signals
        self.labels = labels
        self.target_size = target_size

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Compute the spectrogram on the fly
        spec = signal_to_spectrogram(self.signals[idx])
        # Resize to a uniform shape
        spec = torch.FloatTensor(spec).unsqueeze(0)  # (1, H, W)
        spec = torch.nn.functional.interpolate(
            spec.unsqueeze(0), size=self.target_size, mode='bilinear', align_corners=False
        ).squeeze(0)
        return spec, self.labels[idx]


# Usage example
if __name__ == '__main__':
    # Assumes raw vibration signals are available:
    # signals: (N, signal_length)
    # labels: (N,)
    # Generate placeholder signals (note: all four classes share the same
    # waveform here -- this only demonstrates shapes, not classification)
    np.random.seed(42)
    n_samples = 100
    signal_length = 1024
    signals = []
    labels = []
    for i in range(4):
        for _ in range(n_samples // 4):
            t = np.linspace(0, signal_length/12000, signal_length)
            signal = np.sin(2 * np.pi * 30 * t) + 0.1 * np.random.randn(signal_length)
            signals.append(signal)
            labels.append(i)
    signals = np.array(signals)
    labels = np.array(labels)
    # Build the dataset
    dataset = SpectrogramDataset(signals, labels, target_size=(64, 64))
    # Fetch one sample
    spec, label = dataset[0]
    print(f"Spectrogram shape: {spec.shape}")
    print(f"Label: {label}")
    # Build the model (SpectrogramCNN is defined in Section 3.1)
    model = SpectrogramCNN(num_classes=4)
    # Test the forward pass
    spec_batch = spec.unsqueeze(0)  # add batch dimension
    output = model(spec_batch)
    print(f"Model output shape: {output.shape}")
```
4. PHM Application Scenarios
4.1 When to Use 1D-CNN
| Scenario | Characteristics | Advantage |
|---|---|---|
| Rolling-bearing fault diagnosis | Rich vibration signals with distinct fault signatures | End-to-end learning extracts fault frequencies automatically |
| Motor fault diagnosis | Stator current and vibration signals | Strong noise robustness |
| Gearbox fault diagnosis | Multi-stage gear meshing, complex signals | Multi-scale kernels capture per-stage features |
| Real-time monitoring systems | Limited compute resources | Lightweight models, fast inference |
4.2 When to Use 2D-CNN
| Scenario | Characteristics | Advantage |
|---|---|---|
| Complex modulated signals | Frequency content varies over time | Spectrograms preserve full time-frequency information |
| Non-stationary fault signals | Fault signatures evolve over time | Captures time-varying features |
| Multi-sensor fusion | Spectrograms from multiple sensors stacked as channels | 2D convolutions naturally handle spatial relationships |
| Small-sample diagnosis | Spectrograms augment well | Can leverage transfer learning |
4.3 When to Use Attention-Enhanced CNN
| Scenario | Characteristics | Attention Advantage |
|---|---|---|
| Variable operating conditions | Load/speed changes alter the features | SE/CBAM adaptively reweights features |
| Weak fault detection | Fault signatures easily masked by noise | Attention focuses on key frequency bands |
| Concurrent faults | Multiple faults present simultaneously | Selectively attends to different fault signatures |
| Cross-domain transfer diagnosis | Distribution shift across machines/conditions | Channel attention improves generalization |
4.4 When to Use Wide ResNet / DenseNet
| Model | Suitable Scenarios | Characteristics |
|---|---|---|
| Wide ResNet | Larger datasets, many classes | Width enriches features, stable training |
| DenseNet | Feature reuse matters, gradient-flow issues | Dense connections promote feature utilization, suited to deeper networks |
4.5 Industrial Case Studies
Case 1: Bearing fault diagnosis
Application: wind-turbine gearbox bearing monitoring
Data: vibration acceleration signals, 20 kHz sampling rate, signal length 2048
Model: 1D-CNN + SE attention
Result: compared with traditional envelope-spectrum analysis, accuracy improved by 15% and the missed-detection rate dropped by 60%
Case 2: Tool wear monitoring
Application: tool condition monitoring on CNC machine tools
Data: fused acoustic-emission and vibration signals
Model: multi-scale 2D-CNN on spectrograms
Result: early warning of tool dulling with 97% accuracy
Case 3: Motor bearing faults
Application: health management of metro traction motors
Data: three-phase current signals
Model: Wide ResNet (depth=22, widen_factor=8)
Result: 99.2% accuracy on 6-class fault classification
5. Practical Notes and Summary
5.1 Model Selection Guide
┌──────────────────────────────────────────────────────────────┐
│                CNN model selection decision tree             │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Input data ──┬── raw 1D signal ──→ 1D-CNN                   │
│               │                                              │
│               ├── multi-sensor signals ──→ multi-scale 1D-CNN│
│               │                                              │
│               └── time-frequency maps ──→ 2D-CNN             │
│                                                              │
│  Task needs ──┬── limited compute ──→ lightweight 1D-CNN     │
│               │                                              │
│               ├── weak fault detection ──→ attention CNN     │
│               │                                              │
│               ├── large dataset ──→ Wide ResNet              │
│               │                                              │
│               └── very deep network ──→ DenseNet             │
│                                                              │
└──────────────────────────────────────────────────────────────┘
5.2 Recommended Hyperparameter Ranges
| Parameter | Recommended Range | Notes |
|---|---|---|
| Kernel size | 3-15 | Bearing faults: 5-11; motors: 3-7 |
| Number of conv layers | 3-8 | Too many layers overfit; too few underfit |
| Channel progression | 32→64→128 | Roughly double per stage |
| Dropout | 0.2-0.5 | Prevents overfitting |
| Learning rate | 0.001-0.0001 | Use a decay schedule |
| Batch size | 16-64 | Adjust to GPU memory |
5.3 Data Preprocessing Notes
1. Signal length
   - Must contain enough fault-cycle content
   - At least 2-3 rotation periods is recommended
   - Rule of thumb: $L_{signal} \geq 2 \times \frac{60 \cdot f_s}{rpm_{min}}$ (two rotation periods at the lowest shaft speed)
2. Normalization

```python
# Method 1: standardization (suited to 2D-CNN spectrograms)
spectrogram = (spectrogram - mean) / std

# Method 2: min-max normalization (suited to raw signals for 1D-CNN)
signal = (signal - signal.min()) / (signal.max() - signal.min())
```

3. Data augmentation
   - Random noise injection: $\sigma \in [0.01, 0.1]$
   - Random time shift: $\pm 10\%$ of the signal length
   - Random amplitude scaling: factor in $[0.9, 1.1]$
   - Random cropping and padding
5.4 Common Problems and Remedies
| Problem | Cause | Remedy |
|---|---|---|
| Severe overfitting | Too little data / overly complex model | More augmentation, regularization, shallower model |
| Slow convergence | Poorly chosen learning rate | Learning-rate warmup or cosine annealing |
| Low test accuracy | Domain shift / label noise | Domain adaptation, label smoothing |
| Vanishing gradients | Network too deep | Residual connections, BatchNorm |
| Class imbalance | Few fault samples | Oversampling/SMOTE, class-weighted loss |
5.5 Comparison with Other Methods
| Method | Feature Extraction | Strengths | Weaknesses |
|---|---|---|---|
| Plain CNN | Automatic | End-to-end, self-learned features | Needs large datasets |
| Classical machine learning | Hand-crafted | Interpretable, low data requirements | Relies on expert experience |
| Transformer | Automatic + attention | Models long-range dependencies | Computationally heavy |
| Time-frequency analysis + classifier | Manual + automatic | Rich time-frequency features | Two-stage, not jointly optimized |
5.6 Summary
Deep CNNs provide powerful automatic feature extraction for PHM fault diagnosis:
- 1D-CNN: best for raw one-dimensional vibration signals; end-to-end and fast at inference
- 2D-CNN: best for time-frequency inputs; captures joint time-frequency features
- Multi-scale CNN: suited to complex multi-band fault signatures
- Attention mechanisms: emphasize key features, suppress redundancy, and improve interpretability
- Wide ResNet / DenseNet: suited to large datasets, pushing performance further
Practical recommendations:
- Start with a simple model and add complexity gradually
- Use data augmentation aggressively to improve generalization
- Incorporate domain knowledge into the architecture design
- Value model interpretability; it guides fault-mechanism research
- Monitor deployed performance continuously and retrain as needed