Table of Contents
1. Introduction to the Algorithm
1.1 Background and Motivation
Traditional fault diagnosis relies on hand-crafted feature extraction (spectral analysis, envelope analysis, wavelet transforms, etc.) to derive features from raw signals. This approach has several limitations:
- Feature design depends on expert experience: domain knowledge is required to design effective features
- Poor generalization: features that work on one machine may fail on another
- No end-to-end optimization: feature extraction and classifier training are separate stages and cannot be optimized jointly
Deep convolutional neural networks (CNNs) offer a transformative solution to this problem. A CNN learns hierarchical feature representations directly from raw data, avoiding the complexity and limitations of manual feature design.
1.2 Core Advantages of CNNs in PHM
| Advantage | Description |
|---|---|
| End-to-end learning | Learns features directly from raw signals/images, no manual feature engineering |
| Hierarchical feature extraction | Lower layers capture edges, frequencies, and other basic features; higher layers capture semantic fault patterns |
| Parameter sharing | Shared kernel weights drastically reduce the parameter count and improve generalization |
| Translation invariance | Insensitive to small shifts of the signal/image, improving robustness |
1.3 Overview of CNN Variants for Fault Diagnosis
This article covers the following CNN variants for fault diagnosis:
┌─────────────────────────────────────────────────────────────────┐
│                 CNN variants for fault diagnosis                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│  │ 1D-CNN           │ │ 2D-CNN           │ │ Multi-scale CNN  │ │
│  │ raw vibration    │ │ time-frequency   │ │ multi-band       │ │
│  │ signals directly │ │ maps (STFT/CWT)  │ │ receptive fields │ │
│  └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│                                                                 │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│  │ Attention CNN    │ │ Wide ResNet      │ │ DenseNet         │ │
│  │ SE-Net / CBAM    │ │ wider + deeper   │ │ dense feature    │ │
│  │ recalibration    │ │ residual blocks  │ │ fusion, gradients│ │
│  └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
2. Algorithm Principles
2.1 1D-CNN: End-to-End Processing of Raw Signals
2.1.1 Core Idea
A 1D-CNN processes one-dimensional time series (vibration, current, or acoustic signals) directly, extracting features automatically through one-dimensional convolutions.
2.1.2 One-Dimensional Convolution
For an input signal $x \in \mathbb{R}^{L}$ (a sequence of length $L$), the 1D convolution is defined as:

$$y[n] = (x * w)[n] = \sum_{m=0}^{K-1} x[n+m] \cdot w[m]$$

where:
- $w \in \mathbb{R}^{K}$ is the convolution kernel (weight vector)
- $K$ is the kernel size (receptive field)
- $n$ is the output position index
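As a quick sanity check, the definition above can be implemented directly in NumPy (a minimal sketch; `conv1d_valid` is an illustrative helper name, not a library function):

```python
import numpy as np

def conv1d_valid(x, w):
    """Valid 1D convolution as defined above: y[n] = sum_m x[n+m] * w[m]."""
    L, K = len(x), len(w)
    return np.array([np.dot(x[n:n+K], w) for n in range(L - K + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])   # a simple difference kernel
y = conv1d_valid(x, w)
print(y)  # [-2. -2. -2.]
```

Note that this sliding dot product is what deep-learning frameworks call "convolution" (technically cross-correlation); `np.correlate(x, w, mode='valid')` computes the same quantity.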
2.1.3 Multi-Channel 1D Convolution
For multi-channel input (e.g., multi-sensor data) $X \in \mathbb{R}^{C \times L}$, the convolution extends to:

$$y[n] = \sum_{c=1}^{C} \sum_{m=0}^{K-1} x_c[n+m] \cdot w_c[m] + b$$

where $C$ is the number of input channels.
2.1.4 Feature Map Size
The output length $L'$ of a convolutional layer is:

$$L' = \left\lfloor \frac{L + 2 \times \text{padding} - \text{kernel\_size}}{\text{stride}} \right\rfloor + 1$$
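This formula is easy to wrap as a helper for checking layer configurations before building a network (a small sketch; `conv_output_length` is an illustrative name):

```python
def conv_output_length(L, kernel_size, stride=1, padding=0):
    """Output length of a conv/pooling layer:
    L' = floor((L + 2*padding - kernel_size) / stride) + 1"""
    return (L + 2 * padding - kernel_size) // stride + 1

# "same" padding (padding = kernel_size // 2 for odd kernels) preserves length
print(conv_output_length(1024, kernel_size=7, padding=3))             # 1024
# stride 2 halves the length
print(conv_output_length(1024, kernel_size=7, padding=3, stride=2))   # 512
# no padding shrinks the output by kernel_size - 1
print(conv_output_length(10, kernel_size=3))                          # 8
```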
2.2 2D-CNN: Feature Extraction from Time-Frequency Maps
2.2.1 Time-Frequency Representations
A one-dimensional time-domain signal is first converted into a two-dimensional time-frequency image. Common methods include:
Short-time Fourier transform (STFT):

$$X(n, \omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$

Continuous wavelet transform (CWT):

$$CWT(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^*\!\left(\frac{t-b}{a}\right) dt$$

where $a$ is the scale parameter, $b$ is the translation parameter, and $\psi$ is the mother wavelet.
2.2.2 Two-Dimensional Convolution
For an input feature map $X \in \mathbb{R}^{H \times W \times C_{in}}$, the convolution is:

$$Y_{i,j,k} = \sum_{u=0}^{K_h-1} \sum_{v=0}^{K_w-1} \sum_{c=1}^{C_{in}} X_{i+u,j+v,c} \cdot W_{u,v,c,k} + b_k$$
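The triple sum can be implemented literally as a naive NumPy reference (channels-last layout; real frameworks use optimized kernels but compute the same quantity):

```python
import numpy as np

def conv2d_forward(X, W, b):
    """Direct implementation of the sum above (valid convolution, stride 1).
    X: (H, W_in, C_in), W: (Kh, Kw, C_in, C_out), b: (C_out,)"""
    H, Wi, Cin = X.shape
    Kh, Kw, _, Cout = W.shape
    Ho, Wo = H - Kh + 1, Wi - Kw + 1
    Y = np.zeros((Ho, Wo, Cout))
    for i in range(Ho):
        for j in range(Wo):
            patch = X[i:i+Kh, j:j+Kw, :]                          # (Kh, Kw, Cin)
            Y[i, j] = np.tensordot(patch, W, axes=([0, 1, 2], [0, 1, 2])) + b
    return Y

rng = np.random.default_rng(0)
X = rng.random((8, 8, 3))
W = rng.random((3, 3, 3, 4))
Y = conv2d_forward(X, W, np.zeros(4))
print(Y.shape)  # (6, 6, 4)
```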
2.3 Multi-Scale CNN
2.3.1 Multi-Scale Feature Extraction
Kernels of different sizes capture features at different scales:
┌────────────────────────────────────────────────────────┐
│                                                        │
│  raw signal ─┬─→ small kernel (3×1) ─→ local detail    │
│              │                                         │
│              ├─→ medium kernel (5×1) ─→ mid-frequency  │
│              │                          periodicity    │
│              └─→ large kernel (7×1) ─→ global trend    │
│                                                        │
│                         ↓                              │
│              feature concatenation and fusion          │
│                         ↓                              │
│              fused multi-scale features                │
│                                                        │
└────────────────────────────────────────────────────────┘
2.3.2 Mathematical Formulation
Multi-scale feature fusion:

$$F_{multi} = \text{Concat}(F_{small}, F_{medium}, F_{large})$$

$$F_{output} = \sigma(W_{fusion} \cdot F_{multi} + b_{fusion})$$
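Concretely, the fusion step amounts to channel-wise concatenation followed by a learned projection. A NumPy sketch (random stand-in weights, channel-first layout; `W_fusion` and `b_fusion` here are illustrative, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 100  # sequence length after the parallel branches

# Three branch outputs, 16 channels each (e.g., from 3x1, 5x1, 7x1 kernels)
F_small, F_medium, F_large = (rng.standard_normal((16, L)) for _ in range(3))

# Concat along the channel axis -> (48, L)
F_multi = np.concatenate([F_small, F_medium, F_large], axis=0)

# 1x1 fusion: a learned projection applied independently at every time step
W_fusion = rng.standard_normal((32, 48)) * 0.1
b_fusion = np.zeros((32, 1))
F_output = 1.0 / (1.0 + np.exp(-(W_fusion @ F_multi + b_fusion)))  # sigma = sigmoid

print(F_multi.shape, F_output.shape)  # (48, 100) (32, 100)
```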
2.4 Attention-Enhanced CNN
2.4.1 SE-Net: Channel Attention
A Squeeze-and-Excitation network learns per-channel weights to emphasize important features.
Squeeze (global information embedding):

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i,j) = \text{GAP}(U)$$

Excitation (adaptive recalibration):

$$s = \sigma(W_2 \cdot \delta(W_1 \cdot z))$$

where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, and $r$ is the reduction ratio.
Feature recalibration:

$$\tilde{x}_c = s_c \cdot x_c$$
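The squeeze-excitation-scale pipeline is compact enough to write out in NumPy (a sketch with random stand-in weights; shapes follow the definitions above with $U \in \mathbb{R}^{C \times H \times W}$):

```python
import numpy as np

def se_block(U, W1, W2):
    """SE recalibration: squeeze (GAP) -> excitation (FC-ReLU-FC-sigmoid) -> scale."""
    z = U.mean(axis=(1, 2))                  # squeeze: z_c = GAP over H x W
    h = np.maximum(W1 @ z, 0.0)              # delta = ReLU, bottleneck of size C/r
    s = 1.0 / (1.0 + np.exp(-(W2 @ h)))      # sigma = sigmoid, s in (0, 1)^C
    return U * s[:, None, None]              # per-channel recalibration

C, r = 8, 4
rng = np.random.default_rng(1)
U = rng.standard_normal((C, 6, 6))
W1 = rng.standard_normal((C // r, C))
W2 = rng.standard_normal((C, C // r))
out = se_block(U, W1, W2)
print(out.shape)  # (8, 6, 6)
```

Because every gate $s_c$ lies in $(0, 1)$, the block can only attenuate channels, never amplify them; informative channels are simply attenuated less.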
2.4.2 CBAM: Channel + Spatial Attention
The Convolutional Block Attention Module applies channel attention and spatial attention in sequence.
Channel attention:

$$M_c(F) = \sigma(\text{MLP}(\text{AvgPool}(F)) + \text{MLP}(\text{MaxPool}(F)))$$

Spatial attention:

$$M_s(F) = \sigma(f^{7\times7}([\text{AvgPool}(F); \text{MaxPool}(F)]))$$

Final output:

$$F' = M_c(F) \otimes F$$

$$F'' = M_s(F') \otimes F'$$
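The two attention steps can be traced with a small NumPy sketch (illustrative only: the $7\times7$ spatial convolution is replaced by a simple weighting of the two pooled maps to keep the example short, and all weights are random stand-ins):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(F_in, W_mlp1, W_mlp2, w_spatial):
    """Sequential channel then spatial attention on F_in of shape (C, H, W)."""
    # Channel attention: shared MLP over avg- and max-pooled channel descriptors
    avg_c, max_c = F_in.mean(axis=(1, 2)), F_in.max(axis=(1, 2))
    mlp = lambda v: W_mlp2 @ np.maximum(W_mlp1 @ v, 0.0)
    M_c = sigmoid(mlp(avg_c) + mlp(max_c))              # (C,)
    F1 = F_in * M_c[:, None, None]                      # F' = M_c(F) ⊗ F
    # Spatial attention: pool over channels, then combine + sigmoid
    avg_s, max_s = F1.mean(axis=0), F1.max(axis=0)      # each (H, W)
    M_s = sigmoid(w_spatial[0] * avg_s + w_spatial[1] * max_s)
    return F1 * M_s[None, :, :]                         # F'' = M_s(F') ⊗ F'

rng = np.random.default_rng(2)
C, r = 8, 4
F_in = rng.standard_normal((C, 5, 5))
out = cbam(F_in,
           rng.standard_normal((C // r, C)),
           rng.standard_normal((C, C // r)),
           np.array([0.5, 0.5]))
print(out.shape)  # (8, 5, 5)
```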
2.5 Wide ResNet
2.5.1 Residual Connections
ResNet's residual learning framework:

$$y = F(x, \{W_i\}) + x$$

where $F$ is the residual mapping and $x$ is the identity shortcut.
2.5.2 Wide vs. Deep
Wide ResNet increases network width (channel count) rather than depth:
| Strategy | Approach | Characteristics |
|---|---|---|
| Deep | Add more layers | Vanishing/exploding gradients, hard to train |
| Wide | Add more channels | Efficient parameter use, stable training |
Wide ResNet residual block (pre-activation form):

$$y = W_2 \cdot \sigma(BN(W_1 \cdot \sigma(BN(x)))) + x$$

where the widths of $W_1, W_2$ are scaled by a widening factor $k$ (typically $k > 1$).
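To see what the widening factor does to capacity, a rough back-of-the-envelope count in pure Python (counting only the three stage-transition 3-tap 1D convolutions, ignoring biases and repeated blocks; the stage layout matches the `nStages` configuration used in the implementation below):

```python
def wrn_stage_channels(widen_factor):
    """Channel configuration used by Wide ResNet stages: [16, 16k, 32k, 64k]."""
    k = widen_factor
    return [16, 16 * k, 32 * k, 64 * k]

def conv_params(c_in, c_out, kernel=3):
    """Weights in one 1D conv layer (no bias)."""
    return c_in * c_out * kernel

for k in (1, 4, 8):
    stages = wrn_stage_channels(k)
    p = sum(conv_params(stages[i], stages[i + 1]) for i in range(3))
    print(f"k={k}: stages={stages}, transition-conv params={p:,}")
```

Parameter count grows roughly quadratically with $k$, which is why widening quickly adds capacity without the optimization difficulties of extra depth.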
2.6 DenseNet: Dense Connectivity
2.6.1 Dense Block Structure
In DenseNet, each layer receives the feature maps of all preceding layers as input:

$$x_l = H_l([x_0, x_1, ..., x_{l-1}])$$

where $[x_0, x_1, ..., x_{l-1}]$ denotes the concatenation of the outputs of the preceding layers.
2.6.2 Growth Rate
Feature map growth in DenseNet is controlled by the growth rate $k$: layer $l$ of a block receives

$$C_{in}^{(l)} = C_{in} + k \times (l-1)$$

input channels, and a block of $L$ layers outputs $C_{in} + k \times L$ channels.
DenseNet's strengths are feature reuse and improved gradient flow.
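The channel bookkeeping implied by the growth rate can be verified in a few lines of pure Python (this mirrors the arithmetic of the `DenseBlock` implementation in Section 3):

```python
def dense_block_channels(c_in, growth_rate, num_layers):
    """Input channel count seen by each layer of a dense block,
    and the channel count of the block's concatenated output."""
    per_layer_in = [c_in + growth_rate * l for l in range(num_layers)]
    c_out = c_in + growth_rate * num_layers  # input + one k-channel map per layer
    return per_layer_in, c_out

per_layer, c_out = dense_block_channels(c_in=32, growth_rate=16, num_layers=4)
print(per_layer)  # [32, 48, 64, 80]
print(c_out)      # 96
```

Each layer contributes only $k$ new channels, which keeps the per-layer parameter count small even though every layer sees all earlier feature maps.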
3. Code Implementation
3.1 Complete Example: Multi-Scale Attention CNN for Fault Diagnosis
This example implements a CNN that combines multi-scale feature extraction with SE attention for bearing fault diagnosis.

```python
"""
Multi-scale attention CNN for fault diagnosis.
End-to-end fault classification of rotating-machinery vibration signals.
"""
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


# ==================== Data generation and preprocessing ====================
def generate_bearing_data(samples_per_class=200, signal_length=1024):
    """
    Generate simulated bearing vibration signals for four classes:
    normal, inner-race fault, outer-race fault, and ball fault.
    """
    np.random.seed(42)
    X, y = [], []
    # Sampling frequency and shaft speed
    fs = 12000               # 12 kHz
    f_rpm = 1800             # 1800 RPM
    f_rotation = f_rpm / 60  # 30 Hz
    for label, fault_type in enumerate(['Normal', 'InnerRace', 'OuterRace', 'Ball']):
        for _ in range(samples_per_class):
            t = np.linspace(0, signal_length/fs, signal_length)
            # Base vibration component at the shaft frequency
            signal = 0.5 * np.sin(2 * np.pi * f_rotation * t)
            # Additive noise
            noise = np.random.normal(0, 0.1, signal_length)
            signal += noise
            # Fault characteristic frequency components
            if fault_type == 'Normal':
                # Normal signal: shaft frequency plus noise only
                pass
            elif fault_type == 'InnerRace':
                # Inner-race fault frequency (assume BPFI = 5.4 * f_rotation)
                bpfi = 5.4 * f_rotation
                signal += 0.8 * np.sin(2 * np.pi * bpfi * t)
                signal += 0.4 * np.sin(2 * np.pi * 2*bpfi * t)
            elif fault_type == 'OuterRace':
                # Outer-race fault frequency (assume BPFO = 2.3 * f_rotation)
                bpfo = 2.3 * f_rotation
                signal += 0.7 * np.sin(2 * np.pi * bpfo * t)
                signal += 0.3 * np.sin(2 * np.pi * 2*bpfo * t)
            elif fault_type == 'Ball':
                # Ball fault frequency (assume BSF = 4.2 * f_rotation)
                bsf = 4.2 * f_rotation
                signal += 0.6 * np.sin(2 * np.pi * bsf * t)
                signal += 0.3 * np.sin(2 * np.pi * 2*bsf * t)
            X.append(signal)
            y.append(label)
    return np.array(X), np.array(y)


class VibrationDataset(Dataset):
    """Vibration signal dataset."""
    def __init__(self, X, y, augment=False):
        self.X = torch.FloatTensor(X).unsqueeze(1)  # add channel dimension
        self.y = torch.LongTensor(y)
        self.augment = augment

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        x = self.X[idx]
        y = self.y[idx]
        if self.augment:
            # Augmentation: random noise injection and circular time shift
            if torch.rand(1) > 0.5:
                noise = torch.randn_like(x) * 0.05
                x = x + noise
            if torch.rand(1) > 0.5:
                shift = torch.randint(-50, 50, (1,)).item()
                x = torch.roll(x, shifts=shift, dims=1)
        return x, y


# ==================== Model definitions ====================
class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention module."""
    def __init__(self, channels, reduction=16):
        super(SEBlock, self).__init__()
        self.squeeze = nn.AdaptiveAvgPool1d(1)
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _ = x.size()
        # Squeeze
        y = self.squeeze(x).view(b, c)
        # Excitation
        y = self.excitation(y).view(b, c, 1)
        # Scale
        return x * y.expand_as(x)


class MultiScaleConvBlock(nn.Module):
    """Multi-scale convolution block: captures features in different frequency ranges."""
    def __init__(self, in_channels, out_channels, kernel_sizes=[3, 5, 7]):
        super(MultiScaleConvBlock, self).__init__()
        # Split out_channels across the branches; the last branch absorbs the
        # remainder so the concatenated width always equals out_channels
        num_branches = len(kernel_sizes)
        branch_channels = [out_channels // num_branches] * num_branches
        branch_channels[-1] += out_channels - sum(branch_channels)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(in_channels, ch, kernel_size=k, padding=k//2),
                nn.BatchNorm1d(ch),
                nn.ReLU(inplace=True)
            )
            for k, ch in zip(kernel_sizes, branch_channels)
        ])
        self.combine = nn.Sequential(
            nn.Conv1d(out_channels, out_channels, kernel_size=1),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        # Parallel multi-scale convolutions
        branch_outputs = [branch(x) for branch in self.branches]
        # Channel-wise concatenation
        concat = torch.cat(branch_outputs, dim=1)
        # Fusion
        return self.combine(concat)


class AttentionCNN(nn.Module):
    """Multi-scale attention CNN for fault diagnosis."""
    def __init__(self, input_length=1024, num_classes=4):
        super(AttentionCNN, self).__init__()
        # Input normalization
        self.input_norm = nn.BatchNorm1d(1)
        # Multi-scale feature extraction layers
        self.conv1 = MultiScaleConvBlock(1, 32, kernel_sizes=[3, 7, 11])
        self.se1 = SEBlock(32)
        self.pool1 = nn.MaxPool1d(2)
        self.conv2 = MultiScaleConvBlock(32, 64, kernel_sizes=[3, 5, 7])
        self.se2 = SEBlock(64)
        self.pool2 = nn.MaxPool1d(2)
        self.conv3 = nn.Sequential(
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(inplace=True),
            nn.Conv1d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(inplace=True)
        )
        self.se3 = SEBlock(128)
        self.pool3 = nn.AdaptiveAvgPool1d(1)
        # Fully connected classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(128, 64),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(64, num_classes)
        )
        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv1d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm1d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        # Input normalization
        x = self.input_norm(x)
        # Multi-scale conv + SE attention + pooling
        x = self.pool1(self.se1(self.conv1(x)))
        x = self.pool2(self.se2(self.conv2(x)))
        x = self.pool3(self.se3(self.conv3(x)))
        # Flatten
        x = x.view(x.size(0), -1)
        # Classify
        x = self.classifier(x)
        return x


# ==================== 2D-CNN spectrogram classification model ====================
def compute_stft_spectrogram(signal, nperseg=128, noverlap=96):
    """
    Compute the STFT spectrogram of a signal.
    """
    from scipy import signal as sig
    frequencies, times, Zxx = sig.stft(
        signal, fs=12000, nperseg=nperseg, noverlap=noverlap
    )
    # Log-magnitude spectrum
    spectrogram = np.log(np.abs(Zxx) + 1e-8)
    return spectrogram


class SpectrogramCNN(nn.Module):
    """2D-CNN model for STFT spectrograms."""
    def __init__(self, num_classes=4):
        super(SpectrogramCNN, self).__init__()
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
            # Block 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
            # Block 3
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x


# ==================== Training and evaluation ====================
def train_model(model, train_loader, val_loader, epochs=50, lr=0.001):
    """Train the model."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, epochs)
    best_acc = 0.0
    history = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}
    for epoch in range(epochs):
        # Training phase
        model.train()
        train_loss, train_correct, train_total = 0.0, 0, 0
        for X, y in train_loader:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            outputs = model(X)
            loss = criterion(outputs, y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
            _, predicted = outputs.max(1)
            train_total += y.size(0)
            train_correct += predicted.eq(y).sum().item()
        train_loss /= len(train_loader)
        train_acc = 100.0 * train_correct / train_total
        # Validation phase
        model.eval()
        val_loss, val_correct, val_total = 0.0, 0, 0
        with torch.no_grad():
            for X, y in val_loader:
                X, y = X.to(device), y.to(device)
                outputs = model(X)
                loss = criterion(outputs, y)
                val_loss += loss.item()
                _, predicted = outputs.max(1)
                val_total += y.size(0)
                val_correct += predicted.eq(y).sum().item()
        val_loss /= len(val_loader)
        val_acc = 100.0 * val_correct / val_total
        scheduler.step()
        history['train_loss'].append(train_loss)
        history['val_loss'].append(val_loss)
        history['train_acc'].append(train_acc)
        history['val_acc'].append(val_acc)
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), 'best_model.pth')
        if (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{epochs}] '
                  f'Train Loss: {train_loss:.4f} Acc: {train_acc:.2f}% | '
                  f'Val Loss: {val_loss:.4f} Acc: {val_acc:.2f}%')
    print(f'Best Validation Accuracy: {best_acc:.2f}%')
    return model, history


def evaluate_model(model, test_loader):
    """Evaluate the model."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    model.eval()
    all_preds, all_labels = [], []
    correct, total = 0, 0
    with torch.no_grad():
        for X, y in test_loader:
            X, y = X.to(device), y.to(device)
            outputs = model(X)
            _, predicted = outputs.max(1)
            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(y.cpu().numpy())
            total += y.size(0)
            correct += predicted.eq(y).sum().item()
    accuracy = 100.0 * correct / total
    # Classification report
    from sklearn.metrics import classification_report, confusion_matrix
    class_names = ['Normal', 'InnerRace', 'OuterRace', 'Ball']
    print(f'\nTest Accuracy: {accuracy:.2f}%')
    print('\nClassification Report:')
    print(classification_report(all_labels, all_preds, target_names=class_names))
    print('\nConfusion Matrix:')
    cm = confusion_matrix(all_labels, all_preds)
    print(cm)
    return accuracy, cm


# ==================== Main program ====================
def main():
    print("=" * 60)
    print("Multi-scale attention CNN for bearing fault diagnosis")
    print("=" * 60)
    # 1. Data generation
    print("\n[1] Generating simulated bearing vibration signals...")
    X, y = generate_bearing_data(samples_per_class=200, signal_length=1024)
    # Standardize the data
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    print(f"Dataset shapes: X={X.shape}, y={y.shape}")
    print(f"Class distribution: {np.bincount(y)}")
    # 2. Train/validation/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.15, random_state=42, stratify=y_train
    )
    print(f"Training set: {X_train.shape[0]} samples")
    print(f"Validation set: {X_val.shape[0]} samples")
    print(f"Test set: {X_test.shape[0]} samples")
    # 3. Datasets and data loaders
    train_dataset = VibrationDataset(X_train, y_train, augment=True)
    val_dataset = VibrationDataset(X_val, y_val, augment=False)
    test_dataset = VibrationDataset(X_test, y_test, augment=False)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
    # 4. Build the model
    print("\n[2] Building the multi-scale attention CNN...")
    model = AttentionCNN(input_length=1024, num_classes=4)
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total parameters: {total_params:,}")
    print(f"Trainable parameters: {trainable_params:,}")
    # 5. Train
    print("\n[3] Training...")
    model, history = train_model(model, train_loader, val_loader, epochs=50, lr=0.001)
    # 6. Evaluate on the test set
    print("\n[4] Evaluating on the test set...")
    model.load_state_dict(torch.load('best_model.pth'))
    test_acc, cm = evaluate_model(model, test_loader)
    # 7. Plot the training curves
    print("\n[5] Plotting training curves...")
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    axes[0].plot(history['train_loss'], label='Train Loss')
    axes[0].plot(history['val_loss'], label='Val Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].set_title('Training and Validation Loss')
    axes[0].legend()
    axes[0].grid(True)
    axes[1].plot(history['train_acc'], label='Train Acc')
    axes[1].plot(history['val_acc'], label='Val Acc')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy (%)')
    axes[1].set_title('Training and Validation Accuracy')
    axes[1].legend()
    axes[1].grid(True)
    plt.tight_layout()
    plt.savefig('training_curves.png', dpi=150)
    print("Training curves saved to training_curves.png")
    print("\n" + "=" * 60)
    print("Training finished!")
    print(f"Final test accuracy: {test_acc:.2f}%")
    print("=" * 60)


if __name__ == '__main__':
    main()
```
3.2 Wide ResNet Implementation for Fault Diagnosis

```python
"""
Wide ResNet for fault diagnosis.
Improves feature extraction capacity by widening the network.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F


class WideBasicBlock(nn.Module):
    """Wide ResNet basic block (pre-activation)."""
    def __init__(self, in_channels, out_channels, stride=1, dropout_rate=0.3):
        super(WideBasicBlock, self).__init__()
        self.bn1 = nn.BatchNorm1d(in_channels)
        self.conv1 = nn.Conv1d(in_channels, out_channels, kernel_size=3,
                               padding=1, bias=False)
        self.dropout = nn.Dropout(dropout_rate)
        self.bn2 = nn.BatchNorm1d(out_channels)
        self.conv2 = nn.Conv1d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        # Shortcut connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv1d(in_channels, out_channels,
                                      kernel_size=1, stride=stride, bias=False)

    def forward(self, x):
        out = self.dropout(self.conv1(F.relu(self.bn1(x))))
        out = self.conv2(F.relu(self.bn2(out)))
        out += self.shortcut(x)
        return out


class WideResNet(nn.Module):
    """
    Wide ResNet fault diagnosis network.
    Key properties:
    - Gains capacity by increasing width (channels) rather than depth
    - Easier to train than very deep networks
    - Residual connections mitigate vanishing gradients
    """
    def __init__(self, input_length=1024, num_classes=4, depth=16, widen_factor=4,
                 dropout_rate=0.3):
        super(WideResNet, self).__init__()
        self.in_channels = 16
        self.depth = depth
        assert (depth - 4) % 6 == 0, 'depth should be 6n+4'
        n = (depth - 4) // 6
        k = widen_factor
        nStages = [16, 16*k, 32*k, 64*k]  # channel configuration
        # Stem
        self.conv1 = nn.Conv1d(1, nStages[0], kernel_size=3, padding=1, bias=False)
        # Residual stages
        self.layer1 = self._wide_layer(WideBasicBlock, nStages[1], n, stride=1,
                                       dropout_rate=dropout_rate)
        self.layer2 = self._wide_layer(WideBasicBlock, nStages[2], n, stride=2,
                                       dropout_rate=dropout_rate)
        self.layer3 = self._wide_layer(WideBasicBlock, nStages[3], n, stride=2,
                                       dropout_rate=dropout_rate)
        self.bn1 = nn.BatchNorm1d(nStages[3])
        self.linear = nn.Linear(nStages[3], num_classes)
        # Weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv1d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm1d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_normal_(m.weight)
                nn.init.constant_(m.bias, 0)

    def _wide_layer(self, block, out_channels, num_blocks, stride, dropout_rate):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride, dropout_rate))
            self.in_channels = out_channels
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = F.relu(self.bn1(out))
        out = F.adaptive_avg_pool1d(out, 1)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


# ==================== DenseNet implementation ====================
class DenseBlock(nn.Module):
    """DenseNet densely connected block."""
    def __init__(self, in_channels, growth_rate, num_layers, bn_size=4,
                 drop_rate=0.3):
        super(DenseBlock, self).__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(self._make_layer(
                in_channels + i * growth_rate,
                growth_rate,
                bn_size,
                drop_rate
            ))

    def _make_layer(self, in_channels, growth_rate, bn_size, drop_rate):
        return nn.Sequential(
            nn.BatchNorm1d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(in_channels, bn_size * growth_rate, kernel_size=1, bias=False),
            nn.BatchNorm1d(bn_size * growth_rate),
            nn.ReLU(inplace=True),
            nn.Conv1d(bn_size * growth_rate, growth_rate, kernel_size=3,
                      padding=1, bias=False),
            nn.Dropout(drop_rate)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_features = layer(torch.cat(features, dim=1))
            features.append(new_features)
        return torch.cat(features, dim=1)


class TransitionBlock(nn.Module):
    """DenseNet transition block: reduces feature-map length and channel count."""
    def __init__(self, in_channels, out_channels):
        super(TransitionBlock, self).__init__()
        self.transition = nn.Sequential(
            nn.BatchNorm1d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.AvgPool1d(kernel_size=2, stride=2)
        )

    def forward(self, x):
        return self.transition(x)


class DenseNet1D(nn.Module):
    """
    DenseNet-1D fault diagnosis network.
    Key properties:
    - Dense connectivity promotes feature reuse
    - Better gradient flow, more stable training
    - Comparatively small parameter count
    """
    def __init__(self, input_length=1024, num_classes=4, growth_rate=16,
                 block_config=(4, 6, 8), num_init_features=32,
                 bn_size=4, drop_rate=0.3):
        super(DenseNet1D, self).__init__()
        # Stem convolution
        self.features = nn.Sequential(
            nn.Conv1d(1, num_init_features, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm1d(num_init_features),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=3, stride=2, padding=1)
        )
        # Dense blocks and transition blocks
        num_features = num_init_features
        for i, num_layers in enumerate(block_config):
            # Dense block
            block = DenseBlock(num_features, growth_rate, num_layers, bn_size, drop_rate)
            self.features.add_module(f'denseblock{i+1}', block)
            num_features = num_features + num_layers * growth_rate
            # Transition block (between dense blocks only)
            if i != len(block_config) - 1:
                trans = TransitionBlock(num_features, num_features // 2)
                self.features.add_module(f'transition{i+1}', trans)
                num_features = num_features // 2
        # Final normalization
        self.features.add_module('bn_final', nn.BatchNorm1d(num_features))
        self.features.add_module('relu_final', nn.ReLU(inplace=True))
        # Classifier
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Dropout(drop_rate),
            nn.Linear(num_features, num_classes)
        )
        # Weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv1d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm1d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        features = self.features(x)
        out = self.classifier(features)
        return out
```
3.3 Complete Example: Spectrogram Classification

```python
"""
End-to-end 2D-CNN spectrogram classification pipeline:
convert vibration signals to STFT spectrograms, then classify with a 2D-CNN.
"""
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from scipy import signal as sig
import warnings
warnings.filterwarnings('ignore')


def signal_to_spectrogram(signal, fs=12000, nperseg=128, noverlap=96):
    """
    Convert a 1D signal into an STFT spectrogram.
    Args:
        signal: 1D time-domain signal
        fs: sampling frequency
        nperseg: FFT window length
        noverlap: number of overlapping samples between windows
    Returns:
        spectrogram: normalized spectrogram (H, W)
    """
    f, t, Zxx = sig.stft(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # Log-magnitude
    spectrogram = np.log(np.abs(Zxx) + 1e-10)
    # Normalize to [0, 1]
    spectrogram = (spectrogram - spectrogram.min()) / (spectrogram.max() - spectrogram.min() + 1e-10)
    return spectrogram


def wavelet_transform(signal, mother_wavelet='cmor1.5-1.0', scales=None, fs=12000):
    """
    Continuous wavelet transform for time-frequency analysis.
    Args:
        mother_wavelet: mother wavelet name ('morl' for real Morlet,
                        'cmor1.5-1.0' for complex Morlet)
        scales: scale sequence
    """
    import pywt
    if scales is None:
        # Derive scales from the target frequency range (10 Hz to fs/2),
        # using scale = fc * fs / f
        fc = pywt.central_frequency(mother_wavelet)
        freqs = np.linspace(10, fs / 2, 127)
        scales = fc * fs / freqs
    # CWT
    coefficients, frequencies = pywt.cwt(signal, scales, mother_wavelet, sampling_period=1/fs)
    # Log-magnitude
    spectrogram = np.log(np.abs(coefficients) + 1e-10)
    return spectrogram, frequencies


class SpectrogramDataset(Dataset):
    """Spectrogram dataset."""
    def __init__(self, signals, labels, target_size=(64, 64)):
        self.signals = signals
        self.labels = labels
        self.target_size = target_size

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Compute the spectrogram on the fly
        spec = signal_to_spectrogram(self.signals[idx])
        # Resize to a uniform shape
        spec = torch.FloatTensor(spec).unsqueeze(0)  # (1, H, W)
        spec = torch.nn.functional.interpolate(
            spec.unsqueeze(0), size=self.target_size, mode='bilinear', align_corners=False
        ).squeeze(0)
        return spec, self.labels[idx]


# Usage example
if __name__ == '__main__':
    # Assumes raw vibration signals are available:
    # signals: (N, signal_length)
    # labels: (N,)
    # Generate placeholder signals (note: all four classes share the same
    # waveform here -- this only demonstrates shapes, not classification)
    np.random.seed(42)
    n_samples = 100
    signal_length = 1024
    signals = []
    labels = []
    for i in range(4):
        for _ in range(n_samples // 4):
            t = np.linspace(0, signal_length/12000, signal_length)
            signal = np.sin(2 * np.pi * 30 * t) + 0.1 * np.random.randn(signal_length)
            signals.append(signal)
            labels.append(i)
    signals = np.array(signals)
    labels = np.array(labels)
    # Build the dataset
    dataset = SpectrogramDataset(signals, labels, target_size=(64, 64))
    # Fetch one sample
    spec, label = dataset[0]
    print(f"Spectrogram shape: {spec.shape}")
    print(f"Label: {label}")
    # Build the model (SpectrogramCNN is defined in Section 3.1)
    model = SpectrogramCNN(num_classes=4)
    # Test the forward pass
    spec_batch = spec.unsqueeze(0)  # add batch dimension
    output = model(spec_batch)
    print(f"Model output shape: {output.shape}")
```
4. PHM Application Scenarios
4.1 When to Use 1D-CNN
| Scenario | Characteristics | Advantage |
|---|---|---|
| Rolling-bearing fault diagnosis | Rich vibration signals with distinct fault signatures | End-to-end learning extracts fault frequencies automatically |
| Motor fault diagnosis | Stator current and vibration signals | Strong noise robustness |
| Gearbox fault diagnosis | Multi-stage gear meshing, complex signals | Multi-scale kernels capture per-stage features |
| Real-time monitoring systems | Limited compute resources | Lightweight models, fast inference |
4.2 When to Use 2D-CNN
| Scenario | Characteristics | Advantage |
|---|---|---|
| Complex modulated signals | Frequency content varies over time | Spectrograms preserve full time-frequency information |
| Non-stationary fault signals | Fault signatures evolve over time | Captures time-varying features |
| Multi-sensor fusion | Spectrograms from multiple sensors stacked as channels | 2D convolutions naturally handle spatial relationships |
| Small-sample diagnosis | Spectrograms augment well | Can leverage transfer learning |
4.3 When to Use Attention-Enhanced CNN
| Scenario | Characteristics | Attention Advantage |
|---|---|---|
| Variable operating conditions | Load/speed changes alter the features | SE/CBAM adaptively reweights features |
| Weak fault detection | Fault signatures easily masked by noise | Attention focuses on key frequency bands |
| Concurrent faults | Multiple faults present simultaneously | Selectively attends to different fault signatures |
| Cross-domain transfer diagnosis | Distribution shift across machines/conditions | Channel attention improves generalization |
4.4 When to Use Wide ResNet / DenseNet
| Model | Suitable Scenarios | Characteristics |
|---|---|---|
| Wide ResNet | Larger datasets, many classes | Width enriches features, stable training |
| DenseNet | Feature reuse matters, gradient-flow issues | Dense connections promote feature utilization, suited to deeper networks |
4.5 Industrial Case Studies
Case 1: Bearing fault diagnosis
Application: wind-turbine gearbox bearing monitoring
Data: vibration acceleration signals, 20 kHz sampling rate, signal length 2048
Model: 1D-CNN + SE attention
Result: compared with traditional envelope-spectrum analysis, accuracy improved by 15% and the missed-detection rate dropped by 60%
Case 2: Tool wear monitoring
Application: tool condition monitoring on CNC machine tools
Data: fused acoustic-emission and vibration signals
Model: multi-scale 2D-CNN on spectrograms
Result: early warning of tool dulling with 97% accuracy
Case 3: Motor bearing faults
Application: health management of metro traction motors
Data: three-phase current signals
Model: Wide ResNet (depth=22, widen_factor=8)
Result: 99.2% accuracy on 6-class fault classification
5. Practical Notes and Summary
5.1 Model Selection Guide
┌──────────────────────────────────────────────────────────────┐
│                CNN model selection decision tree             │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Input data ──┬── raw 1D signal ──→ 1D-CNN                   │
│               │                                              │
│               ├── multi-sensor signals ──→ multi-scale 1D-CNN│
│               │                                              │
│               └── time-frequency maps ──→ 2D-CNN             │
│                                                              │
│  Task needs ──┬── limited compute ──→ lightweight 1D-CNN     │
│               │                                              │
│               ├── weak fault detection ──→ attention CNN     │
│               │                                              │
│               ├── large dataset ──→ Wide ResNet              │
│               │                                              │
│               └── very deep network ──→ DenseNet             │
│                                                              │
└──────────────────────────────────────────────────────────────┘
5.2 Recommended Hyperparameter Ranges
| Parameter | Recommended Range | Notes |
|---|---|---|
| Kernel size | 3-15 | Bearing faults: 5-11; motors: 3-7 |
| Number of conv layers | 3-8 | Too many layers overfit; too few underfit |
| Channel progression | 32→64→128 | Roughly double per stage |
| Dropout | 0.2-0.5 | Prevents overfitting |
| Learning rate | 0.001-0.0001 | Use a decay schedule |
| Batch size | 16-64 | Adjust to GPU memory |
5.3 Data Preprocessing Notes
1. Signal length
   - Must contain enough fault-cycle content
   - At least 2-3 rotation periods is recommended
   - Rule of thumb: $L_{signal} \geq 2 \times \frac{60 \cdot f_s}{rpm_{min}}$ (two rotation periods at the lowest shaft speed)
2. Normalization

```python
# Method 1: standardization (suited to 2D-CNN spectrograms)
spectrogram = (spectrogram - mean) / std

# Method 2: min-max normalization (suited to raw signals for 1D-CNN)
signal = (signal - signal.min()) / (signal.max() - signal.min())
```

3. Data augmentation
   - Random noise injection: $\sigma \in [0.01, 0.1]$
   - Random time shift: $\pm 10\%$ of the signal length
   - Random amplitude scaling: factor in $[0.9, 1.1]$
   - Random cropping and padding
5.4 Common Problems and Remedies
| Problem | Cause | Remedy |
|---|---|---|
| Severe overfitting | Too little data / overly complex model | More augmentation, regularization, shallower model |
| Slow convergence | Poorly chosen learning rate | Learning-rate warmup or cosine annealing |
| Low test accuracy | Domain shift / label noise | Domain adaptation, label smoothing |
| Vanishing gradients | Network too deep | Residual connections, BatchNorm |
| Class imbalance | Few fault samples | Oversampling/SMOTE, class-weighted loss |
5.5 Comparison with Other Methods
| Method | Feature Extraction | Strengths | Weaknesses |
|---|---|---|---|
| Plain CNN | Automatic | End-to-end, self-learned features | Needs large datasets |
| Classical machine learning | Hand-crafted | Interpretable, low data requirements | Relies on expert experience |
| Transformer | Automatic + attention | Models long-range dependencies | Computationally heavy |
| Time-frequency analysis + classifier | Manual + automatic | Rich time-frequency features | Two-stage, not jointly optimized |
5.6 Summary
Deep CNNs provide powerful automatic feature extraction for PHM fault diagnosis:
- 1D-CNN: best for raw one-dimensional vibration signals; end-to-end and fast at inference
- 2D-CNN: best for time-frequency inputs; captures joint time-frequency features
- Multi-scale CNN: suited to complex multi-band fault signatures
- Attention mechanisms: emphasize key features, suppress redundancy, and improve interpretability
- Wide ResNet / DenseNet: suited to large datasets, pushing performance further
Practical recommendations:
- Start with a simple model and add complexity gradually
- Use data augmentation aggressively to improve generalization
- Incorporate domain knowledge into the architecture design
- Value model interpretability; it guides fault-mechanism research
- Monitor deployed performance continuously and retrain as needed