DolphinDB Industrial Data Conversion: Format Standardization and Unit Conversion

Contents

    • Abstract
    • 1. Data Conversion Overview
      • 1.1 Types of Data Conversion
      • 1.2 Conversion Scenarios
    • 2. Data Formatting
      • 2.1 Time Formatting
      • 2.2 Numeric Formatting
      • 2.3 String Formatting
    • 3. Unit Conversion
      • 3.1 Temperature Unit Conversion
      • 3.2 Pressure Unit Conversion
      • 3.3 Flow Unit Conversion
      • 3.4 Energy Unit Conversion
    • 4. Code Mapping
      • 4.1 Device Code Mapping
      • 4.2 Status Code Mapping
      • 4.3 Alert Level Mapping
    • 5. Data Normalization
      • 5.1 Min-Max Normalization
      • 5.2 Z-Score Normalization
      • 5.3 Decimal Scaling Normalization
    • 6. Batch Conversion
      • 6.1 Batch Unit Conversion
      • 6.2 Batch Code Mapping
    • 7. Case Study
      • 7.1 Industrial Data Standardization System
    • 8. Summary
    • References

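Abstract

This article covers industrial data conversion in DolphinDB: formatting time, numeric, and string values; converting temperature, pressure, flow, and energy units; mapping device, status, and alert codes; normalizing measurements; and applying conversions in batch. Each technique is illustrated with a DolphinDB script, and the final section combines them into an end-to-end example.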


1. Data Conversion Overview

1.1 Types of Data Conversion

[Flowchart; node labels: Data conversion, Format conversion, Standardized data, Unit conversion, Code mapping, Type conversion]

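1.2 Conversion Scenarios

| Scenario | Description |
|---|---|
| Format standardization | Unify data formats |
| Unit conversion | Unify units of measurement |
| Code mapping | Translate device codes |
| Type conversion | Unify data types |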

2. Data Formatting

2.1 Time Formatting

// Time formatting
t = table(
    [2024.01.15T10:30:45.123] as timestamp
)

// Render the timestamp as date, time, and date-time strings
select timestamp,
       format(timestamp, "yyyy-MM-dd") as date_str,
       format(timestamp, "HH:mm:ss") as time_str,
       format(timestamp, "yyyy-MM-dd HH:mm:ss") as datetime_str
from t

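2.2 Numeric Formatting

// Numeric formatting
t = table(
    [1234.5678] as value
)

// decimal_2: at most two decimal places
// thousand_sep: thousands separator plus at most two decimal places
// percentage: treats value as a ratio and appends a percent sign
select value,
       format(value, "#.##") as decimal_2,
       format(value, "#,###.##") as thousand_sep,
       format(value * 100, "#.##") + "%" as percentage
from t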

2.3 String Formatting

// String formatting
t = table(
    ["device_001"] as device_id,
    [25.5] as temperature
)

// Build the message with the + operator, which concatenates strings
select device_id,
       temperature,
       "Device " + device_id + " temperature: " + string(temperature) + "°C" as message
from t

3. Unit Conversion

3.1 Temperature Unit Conversion

// Temperature conversion functions
def celsiusToFahrenheit(celsius) {
    return celsius * 1.8 + 32
}

def fahrenheitToCelsius(fahrenheit) {
    return (fahrenheit - 32) / 1.8
}

def celsiusToKelvin(celsius) {
    return celsius + 273.15
}

// Usage
t = table([25.0] as celsius)

select celsius,
       celsiusToFahrenheit(celsius) as fahrenheit,
       celsiusToKelvin(celsius) as kelvin
from t
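
The conversion formulas above are easy to sanity-check outside the database. The following is a minimal sketch in Python, used here only as neutral scratch code; the function names mirror the DolphinDB ones but are otherwise illustrative. It evaluates the formulas at three well-known reference points and runs one round trip.

```python
def celsius_to_fahrenheit(celsius):
    # Same formula as the DolphinDB celsiusToFahrenheit above
    return celsius * 1.8 + 32


def fahrenheit_to_celsius(fahrenheit):
    # Inverse of celsius_to_fahrenheit
    return (fahrenheit - 32) / 1.8


def celsius_to_kelvin(celsius):
    return celsius + 273.15


# Reference points: water freezes at 0 °C, boils at 100 °C,
# and the two scales read the same value at -40 degrees
reference_points = {
    "freezing": celsius_to_fahrenheit(0.0),
    "boiling": celsius_to_fahrenheit(100.0),
    "crossover": celsius_to_fahrenheit(-40.0),
}

# Converting to Fahrenheit and back should return the starting value
round_trip = fahrenheit_to_celsius(celsius_to_fahrenheit(25.0))

print(reference_points)
print(round_trip)
```

Checks like these are worth keeping next to the conversion functions, because a swapped factor or offset produces plausible-looking numbers that are hard to spot in production data.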

3.2 Pressure Unit Conversion

// Pressure conversion (1 bar = 14.5038 psi = 100 kPa)
def barToPsi(bar) {
    return bar * 14.5038
}

def psiToBar(psi) {
    return psi / 14.5038
}

def barToKpa(bar) {
    return bar * 100
}

def kpaToBar(kpa) {
    return kpa / 100
}

3.3 Flow Unit Conversion

// Flow conversion
def m3hToLpm(m3h) {
    return m3h * 1000 / 60  // cubic meters per hour -> liters per minute
}

def lpmToM3h(lpm) {
    return lpm * 60 / 1000  // liters per minute -> cubic meters per hour
}

3.4 Energy Unit Conversion

// Energy conversion
def kwhToMj(kwh) {
    return kwh * 3.6  // kilowatt-hours -> megajoules
}

def mjToKwh(mj) {
    return mj / 3.6  // megajoules -> kilowatt-hours
}
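
The pressure, flow, and energy conversions above are all pure scale factors, so they can be driven from one lookup table instead of one function per unit pair. The sketch below illustrates that idea in Python, used only as neutral scratch code; `TO_BASE` and `convert` are hypothetical names, not DolphinDB APIs. Temperature is deliberately left out because Celsius, Fahrenheit, and Kelvin differ by an offset as well as a factor.

```python
# Factor that converts one unit into the base unit of its quantity.
# The base units (kPa, L/min, MJ) are an arbitrary choice for this sketch.
TO_BASE = {
    "pressure": {"kPa": 1.0, "bar": 100.0, "psi": 100.0 / 14.5038},
    "flow": {"L/min": 1.0, "m3/h": 1000.0 / 60.0},
    "energy": {"MJ": 1.0, "kWh": 3.6},
}


def convert(value, quantity, from_unit, to_unit):
    """Convert value between two units of the same quantity via the base unit."""
    factors = TO_BASE[quantity]
    return value * factors[from_unit] / factors[to_unit]


print(convert(1.5, "pressure", "bar", "psi"))
print(convert(120.0, "flow", "m3/h", "L/min"))
print(convert(10.0, "energy", "kWh", "MJ"))
```

Adding a unit then means adding one table entry rather than two new functions, and an unknown unit fails loudly with a `KeyError` instead of silently passing data through.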

4. Code Mapping

4.1 Device Code Mapping

// Device code mapping table
deviceMapping = table(
    ["D001", "D002", "D003"] as old_code,
    ["DEVICE_001", "DEVICE_002", "DEVICE_003"] as new_code,
    ["Workshop A - Device 1", "Workshop A - Device 2", "Workshop B - Device 1"] as description
)

// Raw data
t = table(
    ["D001", "D002", "D001", "D003"] as device_id,
    [25.0, 26.0, 27.0, 28.0] as temperature
)

// Map old codes to new codes; rows without a match keep NULL in the mapped columns
select deviceMapping.new_code as device_id,
       deviceMapping.description,
       t.temperature
from t
left join deviceMapping on t.device_id = deviceMapping.old_code

4.2 Status Code Mapping

// Status code mapping table
statusMapping = table(
    [0, 1, 2, 3] as code,
    ["Stopped", "Running", "Fault", "Maintenance"] as status,
    ["red", "green", "red", "yellow"] as color
)

// Translate status codes into labels and display colors
t = table([1, 0, 2, 1, 3] as status_code)

select statusMapping.status,
       statusMapping.color
from t
left join statusMapping on t.status_code = statusMapping.code
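
A left join keeps every input row, so a status code that is missing from the mapping table comes through with empty label and color. The Python sketch below, again only an illustration with hypothetical names, shows the same lookup with an explicit fallback for unmapped codes; the "Unknown"/"gray" fallback and the extra code 9 are assumptions made for the example.

```python
# Same content as the status mapping table above, keyed by status code
STATUS_MAPPING = {
    0: ("Stopped", "red"),
    1: ("Running", "green"),
    2: ("Fault", "red"),
    3: ("Maintenance", "yellow"),
}


def map_status(codes, unknown=("Unknown", "gray")):
    """Return (label, color) for each code, using a fallback for unmapped codes."""
    return [STATUS_MAPPING.get(code, unknown) for code in codes]


# Code 9 is not in the mapping table and receives the fallback
mapped = map_status([1, 0, 2, 1, 3, 9])
for code, (label, color) in zip([1, 0, 2, 1, 3, 9], mapped):
    print(code, label, color)
```

Whether to substitute a fallback or to reject the record is a policy decision; the important part is that unmapped codes are handled deliberately rather than left empty.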

4.3 Alert Level Mapping

// Alert level mapping table
alertMapping = table(
    [1, 2, 3, 4] as level,
    ["Info", "Warning", "Critical", "Emergency"] as name,
    ["blue", "yellow", "orange", "red"] as color
)

5. Data Normalization

5.1 Min-Max Normalization

// Min-Max normalization: rescale values linearly to the range [0, 1]
def normalizeMinMax(data) {
    minVal = min(data)
    maxVal = max(data)
    return (data - minVal) / (maxVal - minVal)
}

// Usage: 100 random temperatures drawn uniformly from [20, 30)
t = table((20.0 + rand(10.0, 100)) as temperature)

select temperature,
       normalizeMinMax(temperature) as normalized
from t
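
A small worked example makes the Min-Max formula concrete and exposes its one edge case: when every value is the same, the denominator is zero. The Python sketch below is illustrative only; mapping a constant column to 0.0 is an assumption of this sketch, not behavior described in the article.

```python
def normalize_min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no spread, so map it to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = normalize_min_max([20.0, 22.5, 25.0, 30.0])
constant = normalize_min_max([25.0, 25.0, 25.0])

print(scaled)
print(constant)
```

Here 20.0 maps to 0, 30.0 maps to 1, and 22.5 sits a quarter of the way between them.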

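5.2 Z-Score Normalization

// Z-Score normalization: center on the mean and scale by the standard deviation
def normalizeZScore(data) {
    meanVal = avg(data)
    stdVal = std(data)
    return (data - meanVal) / stdVal
}

// Usage (t is the temperature table created in 5.1)
select temperature,
       normalizeZScore(temperature) as zscore
from t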
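
The Z-Score result depends on which standard deviation is used. The Python sketch below is an illustration with hypothetical names; it uses the sample standard deviation (divisor n - 1), on the assumption that this matches the `std` call in the script above, and compares it with the population version on three values.

```python
import statistics


def normalize_z_score(values, sample=True):
    """Return (v - mean) / sd for each value.

    sample=True uses the sample standard deviation (divisor n - 1);
    sample=False uses the population standard deviation (divisor n).
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


readings = [20.0, 25.0, 30.0]
z_sample = normalize_z_score(readings)
z_population = normalize_z_score(readings, sample=False)

print(z_sample)
print(z_population)
```

For these readings the mean is 25 and the sample standard deviation is 5, so the sample scores are -1, 0, and 1; the population scores are slightly larger in magnitude because the divisor is smaller. The difference shrinks as the number of rows grows.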

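5.3 Decimal Scaling Normalization

// Decimal scaling: divide by the smallest power of 10 that brings every |value| below 1
def normalizeDecimal(data) {
    maxAbs = max(abs(data))
    // floor(...) + 1 ensures that a maximum of exactly 10^k is also scaled below 1
    j = floor(log10(maxAbs)) + 1
    return data / pow(10, j)
}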
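
Decimal scaling divides every value by the smallest power of ten that brings the largest absolute value strictly below 1. The Python sketch below, with hypothetical names, works through the boundary case in which the maximum is itself a power of ten; returning an all-zero column unchanged is an assumption of this sketch.

```python
import math


def normalize_decimal(values):
    """Divide by 10**j, where j is the smallest integer with max|v| / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        # Assumption: an all-zero column has nothing to scale
        return list(values)
    j = math.floor(math.log10(max_abs)) + 1
    return [v / 10 ** j for v in values]


# The maximum absolute value is 100 = 10**2, so j must be 3, not 2
boundary = normalize_decimal([100.0, -50.0, 7.0])
typical = normalize_decimal([986.0, -120.0, 45.0])

print(boundary)
print(typical)
```

In both cases j is 3, so every result lies strictly between -1 and 1.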

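6. Batch Conversion

6.1 Batch Unit Conversion

// Batch unit conversion: apply a conversion function to each listed column
def batchUnitConvert(data, conversions) {
    result = data
    
    for (col in conversions.keys()) {
        func = conversions[col]
        result[col] = func(data[col])
    }
    
    return result
}

// Usage
t = table(
    [25.0] as temperature_c,
    [1.5] as pressure_bar
)

// Map each column name to its conversion function
conversions = dict(STRING, ANY)
conversions["temperature_c"] = celsiusToFahrenheit
conversions["pressure_bar"] = barToPsi

// Columns keep their original names but now hold Fahrenheit and psi values
converted = batchUnitConvert(t, conversions)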
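
One design point in batch conversion is naming: if a converted column keeps a name such as `temperature_c`, the unit suffix no longer matches the data. The Python sketch below illustrates one way around this, assuming a plain dict of column lists in place of a table; all names are hypothetical. Each source column maps to a target column name and a conversion function, and the source columns are left untouched.

```python
def batch_unit_convert(columns, conversions):
    """Add one converted column per entry in conversions.

    columns:     dict of column name -> list of values
    conversions: dict of source column -> (target column, conversion function)
    """
    result = dict(columns)  # shallow copy, so the caller's dict is not modified
    for source, (target, func) in conversions.items():
        result[target] = [func(v) for v in columns[source]]
    return result


data = {"temperature_c": [25.0, 30.0], "pressure_bar": [1.5, 2.0]}
conversions = {
    "temperature_c": ("temperature_f", lambda c: c * 1.8 + 32),
    "pressure_bar": ("pressure_psi", lambda b: b * 14.5038),
}

converted = batch_unit_convert(data, conversions)
for name, values in converted.items():
    print(name, values)
```

Keeping both the original and the converted columns costs storage, but it makes every downstream value traceable to its source reading.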

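6.2 Batch Code Mapping

// Batch code mapping: look up each value of data[fromCol] in the mapping table
// and add the matching toCol value as a mapped_value column (NULL when there is no match).
// Assumes the codes in mappingTable[fromCol] are unique.
def batchCodeMapping(data, mappingTable, fromCol, toCol) {
    mapping = dict(mappingTable[fromCol], mappingTable[toCol])
    result = data
    result[`mapped_value] = mapping[data[fromCol]]
    return result
}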

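7. Case Study

7.1 Industrial Data Standardization System

// ========== Industrial data standardization system ==========

// 1. Create the raw data: 1000 rows at one-minute intervals, cycling through 10 devices
t = table(
    take(1..10, 1000) as device_id,
    2024.01.01T00:00:00.000 + (0..999) * 60000 as timestamp,
    20.0 + rand(10.0, 1000) as temperature_c,  // degrees Celsius
    1.0 + rand(1.0, 1000) as pressure_bar,     // bar
    100.0 + rand(100.0, 1000) as flow_m3h      // cubic meters per hour
)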

// 2. Convert units
converted = select 
    device_id,
    timestamp,
    temperature_c,
    celsiusToFahrenheit(temperature_c) as temperature_f,
    pressure_bar,
    barToPsi(pressure_bar) as pressure_psi,
    flow_m3h,
    m3hToLpm(flow_m3h) as flow_lpm
from t

// 3. Normalize the data
normalized = select
    device_id,
    timestamp,
    temperature_c,
    normalizeZScore(temperature_c) as temp_zscore,
    pressure_bar,
    normalizeZScore(pressure_bar) as pressure_zscore,
    flow_m3h,
    normalizeZScore(flow_m3h) as flow_zscore
from converted

// 4. Write to a distributed (DFS) table
db = database("dfs://standardized_db", VALUE, 1..10)
db.createPartitionedTable(normalized, `standardized_data, `device_id)
loadTable("dfs://standardized_db", "standardized_data").append!(normalized)

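// 5. Verify by reading back from the DFS table
select top 10 * from loadTable("dfs://standardized_db", "standardized_data")

print("Industrial data standardization complete")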

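8. Summary

This article covered industrial data conversion in DolphinDB:

  1. Data formatting: time, numeric, and string formatting
  2. Unit conversion: temperature, pressure, flow, and energy
  3. Code mapping: device codes, status codes, and alert levels
  4. Data normalization: Min-Max, Z-Score, and decimal scaling
  5. Batch conversion: batch unit conversion and batch code mapping

Questions for Reflection

  1. How would you design a general-purpose unit conversion function?
  2. How would you handle complex data mapping relationships?
  3. How can you keep data conversions consistent?

References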