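AI-Automated Labeling for Massive Datasets - Temperature Cycle Detection

Overview: Teaching AI to Recognize the "Fingerprint" of Temperature Changes

Imagine a plasma experiment in which the temperature shoots from a few hundred electron volts (eV) to over a thousand eV and then drops again almost instantly, all within a few tens of milliseconds. With manual labeling, an annotator has to inspect the temperature curve frame by frame and mark the rise phase, the peak moment, the fall phase, and so on. The work is slow, and key features are easy to miss.

The temperature-labeling ML Backend acts like an experienced temperature analyst: it automatically recognizes the key patterns in a temperature curve and automates the time-series analysis, so researchers can focus on interpreting the physics rather than on tedious data labeling.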

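I. Business Value: Why - Why Do We Need Intelligent Temperature Labeling?

1.1 Real-World Pain Points

In plasma physics, nuclear fusion, materials science, and related fields, temperature data analysis faces several challenges:

Scenario 1: Plasma discharge experiment analysis ⚡

Each discharge experiment on the SUNIST tokamak:

- Produces data from dozens of temperature sensor channels
- Contains 10,000+ time points per channel
- Needs labels for rise phases, peak moments, fall phases, and anomalous events
- Takes 2-3 hours to label by hand

Scenario 2: Material heat-treatment processes 🔥

In metal heat treatment:

- The temperature curve directly affects material properties
- Heating rate, holding time, and cooling rate must be labeled precisely
- Different process curves need to be compared against each other
- Manual labeling is slow, which makes large-scale comparison impractical

Scenario 3: Non-cyclic temperature anomaly detection 🚨

Non-cyclic events in plasma experiments:

- Sudden sustained high temperature (possibly an instability)
- Abnormal temperature plateaus
- Traditional cycle-based detection cannot identify them
- Key anomalous events are easily missed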

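1.2 Core Value of the ML Backend

Through intelligent pattern recognition, the temperature-labeling ML Backend delivers:

✅ 20x efficiency gain: about 95% of the labeling work is automated; humans only verify the results

✅ Standardization: removes differences in subjective judgment between annotators and keeps labels consistent

✅ Coverage: flags non-cyclic events as well as the regular rise-peak-fall pattern, so fewer anomalies slip through

✅ Fast turnaround: preliminary analysis results are available as soon as the experiment ends

Worked example:

- Manual labeling: 100 experiments × 2 hours = 200 hours
- ML Backend: 100 experiments × 6 minutes of verification = 10 hours
- Time saved: 95% (a 20x speedup), which accelerates research iteration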
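The arithmetic behind these figures can be checked in a few lines. This is only a sketch: the variable names are illustrative, and the per-experiment times are the ones quoted above.

```python
# Back-of-the-envelope check of the time savings quoted above.
experiments = 100
manual_hours_each = 2.0      # manual labeling time per experiment (hours)
review_minutes_each = 6.0    # human verification time per experiment (minutes)

manual_total = experiments * manual_hours_each            # total hours, manual
assisted_total = experiments * review_minutes_each / 60   # total hours, assisted

saved_fraction = 1 - assisted_total / manual_total
speedup = manual_total / assisted_total

print(f"manual: {manual_total:.0f} h, assisted: {assisted_total:.0f} h")
print(f"saved: {saved_fraction:.0%}, speedup: {speedup:.0f}x")
```

Note that the 95% figure only covers annotation time; it assumes the 6-minute review is enough to catch any wrong predictions.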

II. System Architecture: What - What Is the Temperature-Labeling ML Backend?

2.1 Overall Architecture Diagram

graph TB
    subgraph "Data sources"
        A1[CSV time-series data] --> A2[Data loader]
        A3[MinIO storage] --> A2
    end
    subgraph "Label Studio frontend"
        B1[Time-series labeling UI] --> B2[Task management]
    end
    subgraph "ML Backend core"
        C1[_wsgi.py service] --> C2[TemperatureModel<br/>Business coordination]
        C2 --> C3[TemperaturePredictor<br/>Prediction engine]
        C3 --> C4[NonCyclicDetector<br/>Non-cyclic detection]
        C2 --> C5[Utils toolkit<br/>Data preprocessing]
    end
    subgraph "Algorithm layer"
        D1[Rise-phase detection<br/>start_end_time_1D]
        D2[Peak-moment detection<br/>find_temperature_peaks]
        D3[Anomaly detection<br/>detect_anomalies]
        D4[Plateau detection<br/>gradient_analysis]
        D5[Non-cyclic patterns<br/>sustained_high_temp]
    end
    A2 --> C2
    B2 -->|HTTP API| C1
    C3 --> D1
    C3 --> D2
    C3 --> D3
    C3 --> D4
    C4 --> D5
    style C3 fill:#4CAF50,color:#fff
    style C4 fill:#FF5722,color:#fff
    style D1 fill:#2196F3,color:#fff
    style D2 fill:#2196F3,color:#fff
    style D5 fill:#FF9800,color:#fff

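2.2 Core Components in Detail

🎯 TemperatureModel - Business Orchestration Center

This is the system's **"commander"**: it coordinates the other components to complete temperature labeling:

class TemperatureModel(LabelStudioMLBase):
    def setup(self):
        # Initialize the predictor and pass in the threshold parameters
        self.predictor = TemperaturePredictor(
            critical_temp_threshold=1500.0,  # critical temperature threshold
            sustained_period=5.0             # sustained-duration threshold
        )

    def predict(self, tasks):
        # 1. Load the data, keyed by shot
        data_dict = self.get_data(tasks)

        # 2. Run prediction for every shot and collect the results
        model_preds = []
        for shot, data in data_dict.items():
            predictions = self.predictor.user_predict(data)
            model_preds.append(predictions)

        # 3. Return the results (conversion to the Label Studio format is omitted here)
        return model_preds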

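Design highlights:

- Safe parameter access: the _safe_get() method stays compatible with different Label Studio versions
- Online learning: the fit() method adapts the thresholds based on user annotations
- Parameter persistence: threshold parameters are stored in the cache, so the model can evolve over time

🧠 TemperaturePredictor - Intelligent Prediction Engine

This is the system's **"analytical brain"**; it implements six kinds of temperature pattern recognition: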

class TemperaturePredictor(BasePredictor):
    def __init__(self,
                 temp_rise_threshold=1000.0,      # rise threshold
                 temp_fall_threshold=500.0,       # fall threshold
                 gradient_threshold=100.0,        # gradient threshold
                 critical_temp_threshold=1500.0,  # critical temperature
                 sustained_period=5.0):           # sustained duration
        ...

    def user_predict(self, task_data):
        predictions = []

        # 1. Detect temperature rise phases
        predictions.extend(self._detect_rise_phases(...))

        # 2. Detect peak moments
        predictions.extend(self._detect_peak_moments(...))

        # 3. Detect fall phases
        predictions.extend(self._detect_fall_phases(...))

        # 4. Detect anomalous events
        predictions.extend(self._detect_anomalies(...))

        # 5. Detect plateaus
        predictions.extend(self._detect_plateau_phases(...))

        # 6. Detect non-cyclic patterns (new)
        predictions.extend(
            self.non_cyclic_detector.detect_non_cyclic_patterns(...)
        )

        return predictions

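The six detection modules:

Module	Detection target	Physical meaning	Use case
Rise phase	Interval of rapid temperature growth	Heating / discharge start-up	Identify the start of a discharge
Peak moment	Temperature maximum	Energy peak	Mark the key moment
Fall phase	Interval of temperature decay	Cooling / discharge end	Identify the end of a discharge
Anomaly detection	Abrupt temperature change	Instability event	Spot anomalous behavior
Plateau	Interval of stable temperature	Steady-state phase	Process quality assessment
Non-cyclic	Sustained high temperature	Abnormal state	Safety warning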

🔍 NonCyclicDetector - Non-Cyclic Pattern Recognizer

This is the system's **"anomaly radar"**, dedicated to detecting non-cyclic temperature events:

class NonCyclicTemperatureDetector:
    def detect_non_cyclic_patterns(self, time, temp_data, channel):
        predictions = []

        # Check 1: sustained high temperature
        predictions.extend(self._detect_sustained_high_temp(...))

        # Check 2: abnormal temperature plateau
        predictions.extend(self._detect_abnormal_plateau(...))

        # Check 3: sudden temperature spikes
        predictions.extend(self._detect_sudden_spikes(...))

        return predictions

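Non-cyclic vs. cyclic patterns:

Cyclic temperature pattern (normal discharge):
  ┌─┐    ┌─┐    ┌─┐
  │ │    │ │    │ │
──┘ └────┘ └────┘ └──  (regular rise-peak-fall)

Non-cyclic temperature pattern (anomalous event):
  ┌──────────┐
  │          │  (sustained high temperature, no drop)
──┘          └────────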

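III. Technical Implementation: How - Temperature Pattern Recognition in Detail

3.1 Core Algorithm Principles

Algorithm 1: Rise-phase detection

Physical background: during a plasma discharge, the temperature rises rapidly from a few hundred eV to over a thousand eV

Approach: find the contiguous intervals in which the temperature exceeds the threshold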

def _detect_rise_phases(self, time, temp_data, channel):
    predictions = []

    # 1. Call start_end_time_1D to find the intervals above the threshold
    rise_intervals = start_end_time_1D(
        temp_data,
        threshold=self.temp_rise_threshold,  # 1000 eV
        postive=True  # detect values above the positive threshold
    )

    # 2. Walk through each interval
    for start_idx, end_idx in rise_intervals:
        start_time = time[start_idx]
        end_time = time[end_idx]

        # 3. Compute features of the interval
        rise_data = temp_data[start_idx:end_idx+1]
        max_temp = np.max(rise_data)
        avg_temp = np.mean(rise_data)

        # 4. Grade the label by peak temperature
        if max_temp > self.temp_rise_threshold * 1.5:
            label = f"{channel}_快速上升"  # "rapid rise": above 1500 eV
        else:
            label = f"{channel}_上升阶段"   # "rise phase": 1000-1500 eV

        # 5. Confidence grows with the margin above the threshold, capped at 0.9
        threshold = self.temp_rise_threshold
        confidence = min(0.9, (max_temp - threshold) / threshold)

        predictions.append(Prediction(
            label_group='temperature_events',
            label_choice=label,
            start=start_time,
            end=end_time,
            score=confidence
        ))

    return predictions

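How start_end_time_1D works:

def start_end_time_1D(data, threshold, postive=True):
    """
    Find the intervals in which the signal exceeds the threshold.

    Approach:
    1. Binarize the data: 1 where it exceeds the threshold, 0 otherwise
    2. Detect 0→1 transitions (rising edges) and 1→0 transitions (falling edges)
    3. Pair rising and falling edges to get (start, end) index intervals,
       where end is the last sample still beyond the threshold
    """
    data = np.asarray(data)
    if postive:
        mask = (data > threshold).astype(int)
    else:
        mask = (data < -threshold).astype(int)

    # Pad with zeros so that an interval touching either end of the signal
    # still gets both a rising and a falling edge (otherwise zip() would mispair)
    padded = np.concatenate(([0], mask, [0]))
    diff = np.diff(padded)
    start_indices = np.where(diff == 1)[0]      # rising edges
    end_indices = np.where(diff == -1)[0] - 1   # falling edges (inclusive end)

    # Pair up the intervals
    intervals = []
    for start, end in zip(start_indices, end_indices):
        intervals.append((start, end))

    return intervals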
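To see the edge-pairing idea on concrete numbers, here is a minimal, self-contained sketch. `threshold_intervals` is a hypothetical stand-in written for this example, not the project's `start_end_time_1D`. It assumes inclusive end indices and pads the mask with zeros, so a signal that starts or ends above the threshold still yields a complete interval.

```python
import numpy as np

def threshold_intervals(data, threshold):
    """Return (start, end) index pairs where data > threshold (end inclusive)."""
    mask = (np.asarray(data) > threshold).astype(int)
    # Zero padding guarantees one rising and one falling edge per interval.
    padded = np.concatenate(([0], mask, [0]))
    diff = np.diff(padded)
    starts = np.where(diff == 1)[0]       # first sample above the threshold
    ends = np.where(diff == -1)[0] - 1    # last sample above the threshold
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

# The signal starts above the threshold and ends above it as well.
temps = np.array([1200, 1100, 500, 600, 1300, 1600, 1400, 700, 1050])
intervals = threshold_intervals(temps, threshold=1000)
print(intervals)  # [(0, 1), (4, 6), (8, 8)]
```

Without the padding, the first interval would have no rising edge and the last no falling edge, and every later start would be paired with the wrong end.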

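Visual example:

Temperature curve:
  1500 ────┐     ┌────
           │     │
  1000 ────│─────│────  (threshold line)
           │     │
   500 ────┴─────┴────
        ↑  ↑     ↑  ↑
      start end start end

Detection results:
  Interval 1: [t1, t2]  label: "上升阶段" (rise phase; max = 1500 is not above 1.5 × threshold)
  Interval 2: [t3, t4]  label: "快速上升" (rapid rise; max = 1600)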

Algorithm 2: Peak-moment detection

Physical background: the peak corresponds to the moment of highest energy and is a key point in the analysis

Approach: detect extrema from the gradient

def _detect_peak_moments(self, time, temp_data, channel):
    predictions = []

    # Call find_temperature_peaks
    peak_times = find_temperature_peaks(
        temp_data, time,
        gradient_threshold=self.gradient_threshold,  # 100 eV/ms
        min_peak_height=self.min_peak_height         # 800 eV
    )

    for peak_time in peak_times:
        peak_idx = np.argmin(np.abs(time - peak_time))
        peak_temp = temp_data[peak_idx]

        # Grade the label by peak temperature
        if peak_temp > self.temp_rise_threshold * 2:
            label = f"{channel}_极高峰值"    # "extreme peak": >2000 eV
        elif peak_temp > self.temp_rise_threshold * 1.5:
            label = f"{channel}_高峰值"      # "high peak": 1500-2000 eV
        else:
            label = f"{channel}_峰值时刻"    # "peak moment": 800-1500 eV

        predictions.append(Prediction(
            label_group='temperature_events',
            label_choice=label,
            start=peak_time,
            end=None,  # point-in-time label
            score=min(0.95, peak_temp / (self.temp_rise_threshold * 2))
        ))

    return predictions

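How find_temperature_peaks works:

def find_temperature_peaks(temp_data, time, gradient_threshold, min_peak_height):
    """
    Gradient-based peak detection.

    Approach:
    1. Compute the temperature gradient dT/dt
    2. Find the points where the gradient turns from positive to negative (local maxima)
    3. Discard peaks below the minimum height
    """
    # 1. Compute the gradient
    gradient = np.gradient(temp_data, time)

    # 2. Detect changes in the sign of the gradient
    sign_change = np.diff(np.sign(gradient))
    peak_indices = np.where(sign_change < 0)[0]  # positive → negative

    # 3. Filter the peaks
    valid_peaks = []
    for idx in peak_indices:
        if temp_data[idx] > min_peak_height:
            # Require a steep enough slope just before the peak
            # (max() keeps idx - 1 from wrapping around to the last sample)
            if abs(gradient[max(idx - 1, 0)]) > gradient_threshold:
                valid_peaks.append(time[idx])

    return valid_peaks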
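The sign-change idea can be tried on a synthetic triangular pulse. The sketch below is a hypothetical variant written for this example, not the project's `find_temperature_peaks`: it tests neighbouring gradient samples explicitly and reports whichever of the two samples around the sign change is hotter, so the reported peak lands on the true maximum.

```python
import numpy as np

def gradient_peaks(temp, time, gradient_threshold, min_peak_height):
    """Return (time, temperature) pairs for peaks found via gradient sign changes."""
    temp = np.asarray(temp, dtype=float)
    time = np.asarray(time, dtype=float)
    gradient = np.gradient(temp, time)
    peaks = []
    for i in range(len(temp) - 1):
        # Positive slope at i followed by a non-positive slope at i + 1.
        if gradient[i] > 0 and gradient[i + 1] <= 0:
            # The maximum sits on one of the two samples around the sign change.
            p = i if temp[i] >= temp[i + 1] else i + 1
            if temp[p] > min_peak_height and gradient[i] > gradient_threshold:
                peaks.append((float(time[p]), float(temp[p])))
    return peaks

# Triangular pulse: 500 eV -> 1500 eV -> 500 eV over 10 ms, slope ±200 eV/ms.
time_ms = np.arange(11.0)
temp_ev = 500.0 + 200.0 * (5.0 - np.abs(time_ms - 5.0))
peaks = gradient_peaks(temp_ev, time_ms, gradient_threshold=100.0, min_peak_height=800.0)
print(peaks)  # [(5.0, 1500.0)]
```

On noisy data every small wiggle produces a sign change, so some smoothing before the gradient step is usually worthwhile; that step is left out here.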

Visual example:

Temperature curve:
       ●  (peak, gradient = 0)
      ╱ ╲
     ╱   ╲
    ╱     ╲  (gradient < 0)
   ╱       ╲
  ╱         ╲
──           ──
  (gradient > 0)

Gradient curve:
  +100 ──╲
         │╲
    0 ───●─────  (gradient zero crossing → peak)
         │  ╲
 -100 ───    ──

Algorithm 3: Non-cyclic high-temperature detection (the key innovation)

Physical background: when the plasma becomes unstable, the temperature may stay high instead of dropping

Approach: find the time windows in which the temperature stays above the critical value

def _detect_sustained_high_temp(self, time, temp_data, channel):
    """
    Detect sustained high-temperature events.

    Definition: the temperature stays above critical_temp_threshold
    for at least sustained_period.
    """
    # 1. Find the high-temperature intervals
    high_temp_mask = temp_data > self.critical_temp_threshold
    high_temp_intervals = start_end_time_1D(
        high_temp_mask.astype(float),
        threshold=0.5,
        postive=True
    )

    # 2. Filter by duration
    predictions = []
    for start_idx, end_idx in high_temp_intervals:
        duration = time[end_idx] - time[start_idx]

        if duration >= self.sustained_period:  # lasted at least 5 ms
            avg_temp = np.mean(temp_data[start_idx:end_idx+1])
            max_temp = np.max(temp_data[start_idx:end_idx+1])

            # Grade the severity
            if duration > self.sustained_period * 2:
                label = f"{channel}_严重持续高温"  # "severe sustained high temperature"
                severity = "critical"
            else:
                label = f"{channel}_持续高温"      # "sustained high temperature"
                severity = "warning"

            predictions.append(Prediction(
                label_group='temperature_events',
                label_choice=label,
                start=time[start_idx],
                end=time[end_idx],
                score=0.9,
                metadata={
                    'severity': severity,
                    'duration_ms': duration,
                    'avg_temp_eV': avg_temp,
                    'max_temp_eV': max_temp
                }
            ))

    return predictions

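A concrete case:

Normal discharge (cyclic):
  1500eV ─┐  ┌─┐  ┌─┐
         │  │ │  │ │
  1000eV ┴──┴─┴──┴─┴  (drops quickly after each peak)
          2ms 2ms 2ms

Anomalous discharge (non-cyclic):
  1500eV ─┬──────────┐
         │          │  (stays hot for 8 ms)
  1000eV ┴──────────┴──
          ← 8ms →

  Detection result: "持续高温" (sustained high temperature, severity "warning"), because 5 ms <= 8 ms <= 2 × 5 ms.
  The "严重持续高温" (severe) label requires a duration above 2 × 5 ms = 10 ms.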
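Applying the grading rule from `_detect_sustained_high_temp` to a few durations makes the boundaries explicit. `classify_high_temp` is a hypothetical helper written for this illustration; only the 5 ms period and the 2x factor come from the code above.

```python
def classify_high_temp(duration_ms, sustained_period=5.0):
    """Map the duration of an above-critical interval to (label, severity), or None."""
    if duration_ms < sustained_period:
        return None                          # too short: treated as an ordinary peak
    if duration_ms > sustained_period * 2:
        return ("严重持续高温", "critical")    # "severe sustained high temperature"
    return ("持续高温", "warning")             # "sustained high temperature"

for duration in (2.0, 8.0, 12.0):
    print(duration, classify_high_temp(duration))
```

With these settings a 2 ms peak is ignored, an 8 ms interval is a warning, and a 12 ms interval is critical.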

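Algorithm 4: Temperature plateau detection

Physical background: in the steady-state phase the temperature stays flat and the gradient is close to 0

Approach: find the intervals in which the temperature gradient is below a threshold

def _detect_plateau_phases(self, time, temp_data, channel):
    # 1. Compute the temperature gradient with respect to time,
    #    so that it is in eV/ms like the threshold below
    gradient = np.gradient(temp_data, time)

    # 2. Find the low-gradient samples
    plateau_threshold = self.gradient_threshold * 0.1  # 10 eV/ms
    plateau_mask = np.abs(gradient) < plateau_threshold

    # 3. Find the contiguous plateau intervals
    plateau_intervals = start_end_time_1D(
        plateau_mask.astype(float),
        threshold=0.5,
        postive=True
    )

    # 4. Drop short fluctuations (keep intervals spanning more than 10 samples)
    predictions = []
    for start_idx, end_idx in plateau_intervals:
        if end_idx - start_idx > 10:
            plateau_data = temp_data[start_idx:end_idx+1]
            avg_temp = np.mean(plateau_data)
            temp_std = np.std(plateau_data)

            # Classify by temperature level
            if avg_temp > self.temp_rise_threshold:
                label = f"{channel}_高温平台期"  # "high-temperature plateau"
            elif avg_temp > self.temp_fall_threshold:
                label = f"{channel}_中温平台期"  # "mid-temperature plateau"
            else:
                label = f"{channel}_低温平台期"  # "low-temperature plateau"

            # Stability score: lower relative spread gives a higher score (floor 0.5)
            stability_score = max(0.5, 1.0 - temp_std / avg_temp)

            predictions.append(Prediction(
                label_group='temperature_events',
                label_choice=label,
                start=time[start_idx],
                end=time[end_idx],
                score=stability_score
            ))

    return predictions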

3.2 Online Learning and Adaptive Thresholds

Core idea: adjust the threshold parameters automatically based on feedback from user annotations

import time

def fit(self, event, data, **kwargs):
    """
    Handle annotation events and tune the model parameters.
    """
    if event in ['ANNOTATION_CREATED', 'ANNOTATION_UPDATED']:
        # 1. Parse the annotation data
        annotation_data = self._parse_annotation_data(data)

        # 2. Optimize the thresholds
        new_thresholds = self._optimize_thresholds(annotation_data)

        # 3. Update the predictor parameters
        self.predictor.update_thresholds(new_thresholds)

        # 4. Bump the model version
        new_version = f"temperature_v1.0_{int(time.time())}"
        self.set('model_version', new_version)

def _optimize_thresholds(self, annotation_data):
    """
    Adaptive strategy driven by annotation data.
    """
    new_thresholds = {}

    # Group annotations by type; rise labels contain '上升' ("rise")
    rise_annotations = [ann for ann in annotation_data if '上升' in ann['label']]

    # If the user labeled rise phases, lower the threshold by 5%
    # (never below 800) to make detection more sensitive
    if rise_annotations:
        current_rise = self.get('temp_rise_threshold', 1000.0)
        new_thresholds['temp_rise_threshold'] = max(800.0, current_rise * 0.95)

    return new_thresholds

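Adaptation example:

Initial threshold: temp_rise_threshold = 1000 eV

User feedback:
- The user labels a rise phase at 850 eV that the model missed

Adaptive adjustment:
- new_threshold = max(800, 1000 * 0.95) = 950 eV (threshold lowered)

Subsequent predictions:
- One update is not enough: 950 eV is still above 850 eV
- Each further rise annotation lowers the threshold by another 5% (902.5, 857.4, 814.5 eV, ...), so the 850 eV rise is detected after the fourth update
- The threshold never drops below the 800 eV floor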
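How quickly the threshold reaches a given level follows directly from the update rule in `_optimize_thresholds`. The sketch below replays that rule (multiply by 0.95, clamp at 800); the helper name and the choice of six updates are illustrative assumptions.

```python
def thresholds_after_feedback(initial=1000.0, factor=0.95, floor=800.0, updates=6):
    """Apply the multiplicative update repeatedly, clamped at the floor."""
    values = [initial]
    for _ in range(updates):
        values.append(max(floor, values[-1] * factor))
    return values

history = thresholds_after_feedback()

# Number of updates after which an 850 eV rise first exceeds the threshold.
first_hit = next(i for i, t in enumerate(history) if t < 850.0)

print([round(t, 1) for t in history])
print("updates needed:", first_hit)
```

Under these settings four rounds of feedback are needed before an 850 eV rise exceeds the threshold, and from the fifth round on the threshold sits at the 800 eV floor.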

3.3 Multi-Channel Processing

Scenario: a single experiment may have 10+ temperature sensor channels

def user_predict(self, task_data):
    predictions = []
    time = task_data['time'].values  # shared time axis for all channels

    # 1. Extract all temperature channels
    temp_channels = self._extract_temperature_channels(task_data)
    # ['Te1', 'Te2', 'Ti1', 'Ti2', ...]

    # 2. Analyze the channels one by one
    for channel in temp_channels:
        temp_data = task_data[channel].values

        # Skip channels that are entirely NaN
        if np.all(np.isnan(temp_data)):
            continue

        # Handle missing values (forward fill)
        temp_data = self._handle_missing_values(temp_data)

        # Run the six detectors
        predictions.extend(self._detect_rise_phases(time, temp_data, channel))
        predictions.extend(self._detect_peak_moments(time, temp_data, channel))
        predictions.extend(self._detect_fall_phases(time, temp_data, channel))
        predictions.extend(self._detect_anomalies(time, temp_data, channel))
        predictions.extend(self._detect_plateau_phases(time, temp_data, channel))
        predictions.extend(
            self.non_cyclic_detector.detect_non_cyclic_patterns(
                time, temp_data, channel
            )
        )

    return predictions

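IV. In Practice: Real Research Scenarios

Scenario 1: Plasma discharge data analysis

Background: the SUNIST tokamak, studying plasma confinement

Data characteristics:

- 10 electron temperature channels (Te1-Te10)
- 8 ion temperature channels (Ti1-Ti8)
- Sampling rate: 100 kHz
- Each discharge lasts 50-100 ms

Applying the ML Backend: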

# Input: CSV file shot_12345.csv
#   time,   Te1,   Te2,   Ti1,   ...
#   0.001,  500,   480,   450,   ...
#   0.002,  850,   820,   780,   ...
#   0.003, 1200,  1150,  1100,   ...
#   ...

# Prediction output
predictions = [
    # Channel Te1
    Prediction(label="Te1_上升阶段", start=0.001, end=0.015),   # rise phase
    Prediction(label="Te1_高峰值", start=0.018, end=None),      # high peak
    Prediction(label="Te1_下降阶段", start=0.020, end=0.040),   # fall phase

    # Channel Te2
    Prediction(label="Te2_上升阶段", start=0.002, end=0.016),   # rise phase
    Prediction(label="Te2_持续高温", start=0.018, end=0.035),   # sustained high temperature: anomaly!
    ...
]

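Research value:

- Fast screening: quickly find the anomalous discharges among 100 experiments
- Statistical analysis: analyze the distribution of peak temperatures in bulk
- Parameter optimization: compare temperature-curve features across experimental settings

Scenario 2: Optimizing a material heat-treatment process

Background: a metal heat-treatment plant optimizing its quenching process

Requirements:

- Analyze how the heating rate affects material properties
- Compare different cooling curves

Applying the ML Backend: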

# Process 1: fast heating
prediction1 = [
    Prediction(label="temp_快速上升", start=0, end=30),       # rapid rise: 30 s of heating
    Prediction(label="temp_高温平台期", start=30, end=180),   # high-temperature plateau: 150 s hold
    Prediction(label="temp_快速下降", start=180, end=210)     # rapid fall: 30 s of cooling
]

# Process 2: slow heating
prediction2 = [
    Prediction(label="temp_上升阶段", start=0, end=120),      # rise phase: 120 s of heating
    Prediction(label="temp_高温平台期", start=120, end=270),  # high-temperature plateau: 150 s hold
    Prediction(label="temp_下降阶段", start=270, end=360)     # fall phase: 90 s of cooling
]

# Automated comparison report
compare_report = {
    'heating_rate': 'Process 1: 50 ℃/s, Process 2: 12.5 ℃/s',
    'holding_stability': 'Process 1: ±5 ℃, Process 2: ±2 ℃',
    'cooling_rate': 'Process 1: 50 ℃/s, Process 2: 16.7 ℃/s'
}

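Industrial value:

- Standardized labeling of process curves
- Fast comparison between batches
- Quality traceability analysis

Scenario 3: Studying non-cyclic events

Background: research on plasma instability phenomena

Problems with the traditional approach:

- It only detects the cyclic rise-peak-fall pattern
- Abnormal sustained high temperature is treated as a "normal peak"
- Key physical phenomena are missed

Value of non-cyclic detection: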

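# Normal discharge
normal_discharge = {
    'pattern': 'cyclic',
    'peak_temp': 1500,
    'duration': 2,  # peak lasts 2 ms
    'label': 'Te1_高峰值'  # high peak
}

# Anomalous discharge (instability)
abnormal_discharge = {
    'pattern': 'non-cyclic',
    'peak_temp': 1600,
    'duration': 12,  # lasts 12 ms, above the 10 ms "severe" limit
    'label': 'Te1_严重持续高温',  # severe sustained high temperature (new label)
    'physical_meaning': 'loss of plasma confinement'
}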

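Research payoff:

- Helps surface previously overlooked physical phenomena
- Provides labeled data for building instability early-warning models
- Informs the tuning of experimental parameters

V. Technical Optimization and Best Practices

5.1 Performance Optimization

Optimization 1: Vectorized computation

Problem: point-by-point loops are slow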

# ❌ Inefficient: Python-level loop
for i in range(len(temp_data)):
    if temp_data[i] > threshold:
        ...

# ✅ Efficient: vectorized
mask = temp_data > threshold  # NumPy vectorization, typically far faster than the loop
high_temp_indices = np.where(mask)[0]
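Whatever the speedup turns out to be on a given machine, the two forms should select exactly the same samples. A small self-contained check, with made-up sample values:

```python
import numpy as np

temp_data = np.array([400.0, 1200.0, 900.0, 1500.0, 1001.0])
threshold = 1000.0

# Loop version: one Python-level comparison per sample.
loop_indices = [i for i in range(len(temp_data)) if temp_data[i] > threshold]

# Vectorized version: a single comparison over the whole array.
vector_indices = np.flatnonzero(temp_data > threshold)

print(loop_indices)             # [1, 3, 4]
print(vector_indices.tolist())  # [1, 3, 4]
```

To measure the actual gain, time both versions on an array of realistic length (for example with `timeit`) rather than relying on a fixed factor.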

Optimization 2: Processing channels in parallel

from concurrent.futures import ThreadPoolExecutor

def predict_parallel(self, task_data):
    temp_channels = self._extract_temperature_channels(task_data)

    # Process several channels in parallel
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = []
        for channel in temp_channels:
            future = executor.submit(
                self._predict_single_channel,
                task_data, channel
            )
            futures.append(future)

        # Gather the results in submission order
        all_predictions = []
        for future in futures:
            all_predictions.extend(future.result())

    return all_predictions

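Estimated speedup:

- Single thread: 18 channels × 100 ms = 1800 ms
- 4 threads: 18 channels take ceil(18 / 4) = 5 rounds × 100 ms ≈ 500 ms in the ideal case
- About 3.6x faster at best; the real gain depends on how much of the per-channel work releases Python's GIL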
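Because a channel is an indivisible task, the ideal wall time is set by the number of rounds of `workers` tasks rather than by a plain division. The sketch below is an idealized estimate only: it assumes every channel takes the same 100 ms and ignores GIL contention and scheduling overhead. The function name is illustrative.

```python
import math

def estimated_wall_time_ms(n_channels, per_channel_ms, workers):
    """Idealized estimate: channels run in rounds of `workers` tasks each."""
    rounds = math.ceil(n_channels / workers)
    return rounds * per_channel_ms

serial = estimated_wall_time_ms(18, 100, workers=1)
parallel = estimated_wall_time_ms(18, 100, workers=4)

print(serial, parallel, round(serial / parallel, 1))  # 1800 500 3.6
```

With 18 channels and 4 workers, the last round runs only 2 tasks, which is why the estimate comes out below a full 4x.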

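5.2 Threshold Tuning Strategy

Recommended starting values:

Physical scenario	temp_rise_threshold	temp_fall_threshold	gradient_threshold
Plasma discharge	1000 eV	500 eV	100 eV/ms
Metal heat treatment	800 ℃	400 ℃	50 ℃/s
Chemical reaction	300 K	150 K	20 K/min

Tuning method:

- Loose to strict: start with low thresholds so that every candidate event is captured
- Statistical analysis: study the distribution of the predictions to find the best thresholds
- Online learning: adjust automatically based on user feedback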

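5.3 Data Quality Handling

Step 1: Missing-value handling

def _handle_missing_values(self, temp_data):
    """Forward-fill strategy"""
    temp_data_clean = temp_data.copy()

    # Handle leading NaNs first: back-fill them with the first valid value,
    # otherwise the forward fill below would have nothing to copy from
    valid_indices = np.where(~np.isnan(temp_data_clean))[0]
    if len(valid_indices) == 0:
        return temp_data_clean  # all NaN: nothing to fill from
    first_valid = valid_indices[0]
    temp_data_clean[:first_valid] = temp_data_clean[first_valid]

    # Forward-fill the remaining gaps point by point
    for i in range(first_valid + 1, len(temp_data_clean)):
        if np.isnan(temp_data_clean[i]):
            temp_data_clean[i] = temp_data_clean[i-1]

    return temp_data_clean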
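For long channels the point-by-point loop can be replaced by a vectorized forward fill. The sketch below is one possible NumPy formulation, not the project's implementation: it uses `np.maximum.accumulate` to carry the index of the most recent valid sample forward, and back-fills leading NaNs from the first valid sample. The function name `forward_fill` is illustrative.

```python
import numpy as np

def forward_fill(values):
    """Forward-fill NaNs; leading NaNs take the first valid value."""
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    if not valid.any():
        return values.copy()              # nothing to fill from
    # Index of the most recent valid sample at each position.
    idx = np.where(valid, np.arange(len(values)), 0)
    np.maximum.accumulate(idx, out=idx)
    # Positions before the first valid sample point at that first sample.
    first_valid = int(np.argmax(valid))
    idx[:first_valid] = first_valid
    return values[idx]

filled = forward_fill([np.nan, np.nan, 500.0, np.nan, 850.0, np.nan])
print(filled)  # [500. 500. 500. 500. 850. 850.]
```

If the data is already in a pandas object, `Series.ffill()` followed by `Series.bfill()` gives the same result.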

Step 2: Outlier filtering

def _filter_outliers(self, temp_data):
    """Filter outliers with the 3σ rule"""
    mean = np.nanmean(temp_data)
    std = np.nanstd(temp_data)

    # Points more than 3σ from the mean are treated as outliers
    outlier_mask = np.abs(temp_data - mean) > 3 * std

    # Replace each outlier with the average of its two neighbors
    # (outliers at the first or last sample are left unchanged)
    temp_data_clean = temp_data.copy()
    outlier_indices = np.where(outlier_mask)[0]

    for idx in outlier_indices:
        if idx > 0 and idx < len(temp_data) - 1:
            temp_data_clean[idx] = (temp_data[idx-1] + temp_data[idx+1]) / 2

    return temp_data_clean

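VI. Summary and Outlook

6.1 Core Innovations

Joint multi-pattern detection
- Covers all six temperature patterns
- Detects both cyclic and non-cyclic behavior
Adaptive threshold optimization
- Online learning mechanism
- Adjusts automatically based on user feedback
Engineering-oriented design
- Multi-channel parallel processing
- Thorough error handling
- Standard Label Studio interface

6.2 Summary of Application Scenarios

Field	Typical application	Core value
Plasma physics	Discharge data analysis	Faster research iteration
Materials science	Heat-treatment processes	Process standardization
Chemical engineering	Reaction process monitoring	Anomaly warning
Energy	Reactor temperature monitoring	Safety assurance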

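6.3 Future Directions

Direction 1: Deep learning models

Idea: use an LSTM or Transformer to learn temporal temperature patterns

import torch.nn as nn

class TemperatureLSTM(nn.Module):
    def __init__(self):
        super().__init__()  # required before registering submodules
        # batch_first=True matches the [batch, seq_len, 1] input layout below
        self.lstm = nn.LSTM(input_size=1, hidden_size=64, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(64, 6)  # classify into the 6 pattern types

    def forward(self, temp_sequence):
        # Input:  [batch, seq_len, 1]
        # Output: [batch, 6], one probability per pattern
        ...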

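Advantages:

- Learns features automatically, with no hand-designed thresholds
- Captures long-range dependencies

Direction 2: Fusing multiple physical quantities

Idea: combine temperature, density, and magnetic-field signals

class MultiPhysicsPredictor:
    def predict(self, temperature, density, magnetic_field):
        # Fuse several physical quantities and make a joint decision
        ...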

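Value:

- More accurate event detection
- Insight into how the underlying physical mechanisms are related

Direction 3: Real-time prediction

Idea: stream processing, predicting while the data is still being acquired

from collections import deque

class StreamingPredictor:
    def __init__(self, window_size):
        # window_size is passed in explicitly (an assumption; the original left it undefined)
        self.window_size = window_size
        # A bounded deque keeps only the most recent window_size points
        self.buffer = deque(maxlen=window_size)

    def on_new_data(self, new_point):
        self.buffer.append(new_point)

        if len(self.buffer) >= self.window_size:
            prediction = self.predict(list(self.buffer))
            self.emit_prediction(prediction)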
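As a concrete but deliberately simple illustration of the streaming idea, the sketch below keeps a sliding window and raises an alert whenever every sample in the window is above a critical temperature. The class name, the window size, and the rule itself are assumptions made for this example; a real deployment would call the detectors described earlier instead.

```python
from collections import deque

class StreamingThresholdMonitor:
    """Sliding-window monitor: flags windows that stay above a critical value."""

    def __init__(self, window_size=5, critical_temp=1500.0):
        self.buffer = deque(maxlen=window_size)  # old samples drop off automatically
        self.critical_temp = critical_temp
        self.alerts = []

    def on_new_data(self, timestamp, temp):
        self.buffer.append((timestamp, temp))
        window_full = len(self.buffer) == self.buffer.maxlen
        if window_full and all(t > self.critical_temp for _, t in self.buffer):
            self.alerts.append(timestamp)  # stand-in for emitting a prediction

monitor = StreamingThresholdMonitor(window_size=3, critical_temp=1500.0)
for ts, temp in enumerate([1400, 1600, 1700, 1800, 1450, 1900]):
    monitor.on_new_data(ts, temp)

print(monitor.alerts)  # [3]
```

Only the window ending at sample 3 (1600, 1700, 1800) is entirely above 1500, so a single alert is raised.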

Applications:

- Real-time experiment monitoring
- Immediate anomaly alerts

Appendix: Quick Start

1. Data format requirements

time, Te1, Te2, Ti1, Ti2
0.001, 500, 480, 450, 430
0.002, 850, 820, 780, 750
0.003, 1200, 1150, 1100, 1050
...

Requirements:

- A time column is required
- Temperature column names must contain one of the keywords te / ti / temp / temperature
- Time unit: milliseconds (ms)
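The column-naming rule has one trap worth spelling out: the name "time" itself contains the keyword "ti". The sketch below shows one way the selection could work. The function is hypothetical (the source only names `_extract_temperature_channels` without showing it), and it assumes case-insensitive substring matching with the time column excluded explicitly.

```python
KEYWORDS = ("te", "ti", "temp", "temperature")

def extract_temperature_channels(columns, time_column="time"):
    """Pick columns whose lowercase name contains a temperature keyword."""
    channels = []
    for name in columns:
        cleaned = name.strip()
        lowered = cleaned.lower()
        if lowered == time_column:
            continue  # "time" contains "ti", so it must be excluded explicitly
        if any(key in lowered for key in KEYWORDS):
            channels.append(cleaned)
    return channels

header = "time, Te1, Te2, Ti1, Ti2, density".split(",")
print(extract_temperature_channels(header))  # ['Te1', 'Te2', 'Ti1', 'Ti2']
```

Substring matching is permissive, so any unrelated column whose name happens to contain "te" or "ti" would also be picked up; matching on a prefix is a stricter alternative.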

2. Label Studio configuration

<View>
  <!-- timeColumn points the time axis at the CSV's time column -->
  <TimeSeries name="ts" value="$csv" valueType="url" timeColumn="time">
    <Channel column="Te1" strokeColor="#1f77b4"/>
    <Channel column="Te2" strokeColor="#ff7f0e"/>
  </TimeSeries>

  <!-- Label values should match the strings the backend emits; the code above
       prefixes them with the channel name (e.g. Te1_上升阶段) -->
  <TimeSeriesLabels name="temperature_events" toName="ts">
    <Label value="上升阶段" background="#4CAF50"/>
    <Label value="峰值时刻" background="#FF5722"/>
    <Label value="下降阶段" background="#2196F3"/>
    <Label value="持续高温" background="#FF9800"/>
    <Label value="平台期" background="#9C27B0"/>
  </TimeSeriesLabels>
</View>

3. Test a prediction

# Start the service
python _wsgi.py

# Test the API
curl -X POST http://localhost:9090/predict \
  -H "Content-Type: application/json" \
  -d '{
    "tasks": [{
      "data": {
        "shot": 12345,
        "csv": "http://example.com/shot_12345.csv"
      }
    }]
  }'

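Summary: the temperature-labeling ML Backend turns complex temperature time-series analysis into automated, intelligent labeling. Its six pattern-recognition algorithms (rise, peak, fall, anomaly, plateau, and non-cyclic) cover the main features of a temperature curve. The non-cyclic detection in particular fills the blind spot of traditional cycle-based detection: it can surface sustained high temperature, abnormal plateaus, and other important physical phenomena, giving researchers in plasma physics, materials science, and related fields an efficient, standardized data-analysis tool.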