Contents
- Abstract
- 1. Overview of Data Conversion
  - 1.1 Types of Data Conversion
  - 1.2 Conversion Scenarios
- 2. Data Formatting
  - 2.1 Time Formatting
  - 2.2 Numeric Formatting
  - 2.3 String Formatting
- 3. Unit Conversion
  - 3.1 Temperature Unit Conversion
  - 3.2 Pressure Unit Conversion
  - 3.3 Flow Unit Conversion
  - 3.4 Energy Unit Conversion
- 4. Code Mapping
  - 4.1 Device Code Mapping
  - 4.2 Status Code Mapping
  - 4.3 Alarm Level Mapping
- 5. Data Normalization
  - 5.1 Min-Max Normalization
  - 5.2 Z-Score Standardization
  - 5.3 Decimal Scaling Normalization
- 6. Batch Conversion
  - 6.1 Batch Unit Conversion
  - 6.2 Batch Code Mapping
- 7. Case Study
  - 7.1 Industrial Data Standardization System
- 8. Summary
- References
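Abstract
This article takes a close look at industrial data conversion in DolphinDB. It covers the core techniques in turn: data formatting, unit conversion, code mapping, data normalization, and batch conversion, and closes with an end-to-end standardization pipeline. Code examples throughout show how to apply each technique to industrial data.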
1. Overview of Data Conversion
1.1 Types of Data Conversion
[Figure: Data conversion → format conversion / unit conversion / code mapping / type conversion → standardized data]
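1.2 Conversion Scenarios
| Scenario | Description |
|---|---|
| Format standardization | Unify data formats |
| Unit conversion | Unify units of measurement |
| Code mapping | Convert device codes |
| Type conversion | Unify data types |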
2. Data Formatting
2.1 Time Formatting
```dolphindb
// Time formatting
t = table([2024.01.15T10:30:45.123] as timestamp)
// Convert the timestamp to strings with patterns such as yyyy-MM-dd
select timestamp,
    temporalFormat(timestamp, "yyyy-MM-dd") as date_str,
    temporalFormat(timestamp, "HH:mm:ss") as time_str,
    temporalFormat(timestamp, "yyyy-MM-dd HH:mm:ss") as datetime_str
from t
```
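When checking formatted output outside DolphinDB, the same three patterns can be reproduced with Python's `datetime.strftime`. This is a plain-Python cross-check, not DolphinDB script: Python uses `%Y-%m-%d`-style directives instead of `yyyy-MM-dd`, and it has no millisecond directive, so the `SSS` part is derived from `microsecond`.

```python
from datetime import datetime

# The same instant as the DolphinDB example: 2024.01.15T10:30:45.123
ts = datetime(2024, 1, 15, 10, 30, 45, 123000)

date_str = ts.strftime("%Y-%m-%d")               # yyyy-MM-dd
time_str = ts.strftime("%H:%M:%S")               # HH:mm:ss
datetime_str = ts.strftime("%Y-%m-%d %H:%M:%S")  # yyyy-MM-dd HH:mm:ss
# %f yields microseconds, so the millisecond part (SSS) is derived by hand
millis_str = f"{ts.microsecond // 1000:03d}"

print(date_str, time_str, datetime_str, millis_str)
```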
2.2 Numeric Formatting
```dolphindb
// Numeric formatting
t = table([1234.5678] as value)
select value,
    format(value, "#.##") as decimal_2,          // two decimal places
    format(value, "#,###.##") as thousand_sep,   // thousands separator
    format(value * 100, "#.##") + "%" as percentage
from t
```
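For comparison, Python's format specification covers the same three cases. This is a plain-Python sketch, not DolphinDB script. One difference worth knowing: in DecimalFormat-style patterns a `#` digit is optional, so trailing zeros are typically dropped, whereas Python's `.2f` always prints two decimals; and Python's `%` type multiplies by 100 itself, so no manual `* 100` is needed.

```python
value = 1234.5678

decimal_2 = f"{value:.2f}"       # fixed two decimals
thousand_sep = f"{value:,.2f}"   # thousands separator plus two decimals
percentage = f"{value:.2%}"      # the % type multiplies by 100 and appends "%"

# .2f always prints both decimals, keeping the trailing zero
padded = f"{1234.5:.2f}"

print(decimal_2, thousand_sep, percentage, padded)
```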
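2.3 String Formatting
```dolphindb
// String formatting
t = table(["device_001"] as device_id, [25.5] as temperature)
// Build the message with the + string-concatenation operator
select device_id,
    temperature,
    "Device " + device_id + " temperature: " + string(temperature) + "°C" as message
from t
```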
3. Unit Conversion
3.1 Temperature Unit Conversion
```dolphindb
// Temperature conversion functions
def celsiusToFahrenheit(celsius) {
    return celsius * 1.8 + 32
}
def fahrenheitToCelsius(fahrenheit) {
    return (fahrenheit - 32) / 1.8
}
def celsiusToKelvin(celsius) {
    return celsius + 273.15
}
// Usage
t = table([25.0] as celsius)
select celsius,
    celsiusToFahrenheit(celsius) as fahrenheit,
    celsiusToKelvin(celsius) as kelvin
from t
```
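Because the Celsius/Fahrenheit functions are inverse linear maps, converting there and back should return the input up to floating-point error. The Python sketch below mirrors the DolphinDB functions (the snake_case names are ours) and checks the well-known fixed points 0 °C = 32 °F, 100 °C = 212 °F, and -40 °C = -40 °F. It is a cross-check, not part of the DolphinDB script.

```python
import math

def celsius_to_fahrenheit(c):
    return c * 1.8 + 32

def fahrenheit_to_celsius(f):
    return (f - 32) / 1.8

def celsius_to_kelvin(c):
    return c + 273.15

# Fixed points: 0 °C = 32 °F, 100 °C = 212 °F, and -40 is the same on both scales
for c, f in [(0.0, 32.0), (100.0, 212.0), (-40.0, -40.0)]:
    assert math.isclose(celsius_to_fahrenheit(c), f)
    assert math.isclose(fahrenheit_to_celsius(f), c, abs_tol=1e-12)

print(celsius_to_fahrenheit(25.0), celsius_to_kelvin(25.0))
```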
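3.2 Pressure Unit Conversion
```dolphindb
// Pressure conversion (1 bar ≈ 14.5038 psi, 1 bar = 100 kPa)
def barToPsi(bar) {
    return bar * 14.5038
}
def psiToBar(psi) {
    return psi / 14.5038
}
def barToKpa(bar) {
    return bar * 100
}
def kpaToBar(kpa) {
    return kpa / 100.0   // 100.0 keeps the division floating-point for integer input
}
```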
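The factor 14.5038 is a rounded value. Where more precision matters, it can be derived from the exact definitions 1 bar = 100 000 Pa and 1 psi = 1 lbf/in², with 1 lbf = 4.4482216152605 N and 1 in = 0.0254 m. The plain-Python sketch below (constant and function names are ours) does exactly that.

```python
# Exact definitions: 1 bar = 100 000 Pa; 1 psi = 1 lbf/in^2
NEWTONS_PER_LBF = 4.4482216152605
METRES_PER_INCH = 0.0254
PA_PER_PSI = NEWTONS_PER_LBF / METRES_PER_INCH ** 2   # about 6894.757 Pa
PA_PER_BAR = 100_000.0

PSI_PER_BAR = PA_PER_BAR / PA_PER_PSI   # about 14.50377, usually rounded to 14.5038
KPA_PER_BAR = PA_PER_BAR / 1000.0       # exactly 100

def bar_to_psi(bar):
    return bar * PSI_PER_BAR

def psi_to_bar(psi):
    return psi / PSI_PER_BAR

def bar_to_kpa(bar):
    return bar * KPA_PER_BAR

def kpa_to_bar(kpa):
    return kpa / KPA_PER_BAR

print(round(PSI_PER_BAR, 4), bar_to_kpa(1.5))
```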
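3.3 Flow Unit Conversion
```dolphindb
// Flow conversion (1 m³ = 1000 L, 1 h = 60 min)
def m3hToLpm(m3h) {
    return m3h * 1000.0 / 60   // m³/h -> L/min; 1000.0 avoids integer division for integer input
}
def lpmToM3h(lpm) {
    return lpm * 60.0 / 1000   // L/min -> m³/h
}
```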
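The flow factor follows from two unit facts: 1 m³ = 1000 L and 1 h = 60 min, so m³/h → L/min is ×1000 then ÷60 (about ×16.67). The plain-Python sketch below spells this out (names are ours). Python's `/` is always true division; to our understanding, DolphinDB's `/` performs integer division when both operands are integers, so an integer input such as 1 m³/h could come out as 16 rather than 16.67 unless one operand is a floating-point literal. That behaviour is worth verifying on your DolphinDB version.

```python
LITRES_PER_M3 = 1000
MINUTES_PER_HOUR = 60

def m3h_to_lpm(m3h):
    # m³/h -> L/min: ×1000 (L per m³), then ÷60 (min per h)
    return m3h * LITRES_PER_M3 / MINUTES_PER_HOUR

def lpm_to_m3h(lpm):
    # L/min -> m³/h: the inverse, ×60 then ÷1000
    return lpm * MINUTES_PER_HOUR / LITRES_PER_M3

print(m3h_to_lpm(150.0), lpm_to_m3h(2500.0))  # 2500.0 150.0
```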
3.4 Energy Unit Conversion
```dolphindb
// Energy conversion (1 kWh = 3.6 MJ)
def kwhToMj(kwh) {
    return kwh * 3.6   // kWh -> MJ
}
def mjToKwh(mj) {
    return mj / 3.6    // MJ -> kWh
}
```
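The constant 3.6 comes from 1 kWh = 1000 W × 3600 s = 3.6 × 10⁶ J = 3.6 MJ. A plain-Python sketch (names are ours) that derives the factor instead of hard-coding it:

```python
JOULES_PER_KWH = 1000 * 3600                   # 1 kW sustained for 3600 s
JOULES_PER_MJ = 1_000_000
MJ_PER_KWH = JOULES_PER_KWH / JOULES_PER_MJ   # 3.6

def kwh_to_mj(kwh):
    return kwh * MJ_PER_KWH

def mj_to_kwh(mj):
    return mj / MJ_PER_KWH

print(MJ_PER_KWH, kwh_to_mj(10.0), mj_to_kwh(36.0))
```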
4. Code Mapping
4.1 Device Code Mapping
```dolphindb
// Device code mapping table
deviceMapping = table(
    ["D001", "D002", "D003"] as old_code,
    ["DEVICE_001", "DEVICE_002", "DEVICE_003"] as new_code,
    ["Workshop A - Device 1", "Workshop A - Device 2", "Workshop B - Device 1"] as description
)
// Raw data
t = table(
    ["D001", "D002", "D001", "D003"] as device_id,
    [25.0, 26.0, 27.0, 28.0] as temperature
)
// Mapping: the left join keeps every raw row; unmatched codes get NULL new_code,
// so the raw code is kept as well
select t.device_id as raw_code,
    deviceMapping.new_code as device_id,
    deviceMapping.description,
    t.temperature
from t
left join deviceMapping on t.device_id = deviceMapping.old_code
```
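In a left join, raw rows whose code has no entry in the mapping table are kept, with empty (NULL) mapping columns. If only the mapped code is selected, those rows also lose their original device ID. The plain-Python sketch below models the behaviour with a dict; the unmapped code `D004` is hypothetical and added only for illustration.

```python
device_mapping = {
    "D001": ("DEVICE_001", "Workshop A - Device 1"),
    "D002": ("DEVICE_002", "Workshop A - Device 2"),
    "D003": ("DEVICE_003", "Workshop B - Device 1"),
}

# D004 is a hypothetical code with no entry in the mapping table
raw = [("D001", 25.0), ("D002", 26.0), ("D001", 27.0), ("D004", 28.0)]

def map_devices(rows, mapping):
    """Left-join semantics: keep every row; unmatched codes get None for
    new_code and description. The raw code is kept so unmapped devices stay identifiable."""
    result = []
    for raw_code, temperature in rows:
        new_code, description = mapping.get(raw_code, (None, None))
        result.append((raw_code, new_code, description, temperature))
    return result

for row in map_devices(raw, device_mapping):
    print(row)
```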
4.2 Status Code Mapping
```dolphindb
// Status code mapping
statusMapping = table(
    [0, 1, 2, 3] as code,
    ["Stopped", "Running", "Fault", "Maintenance"] as status,
    ["red", "green", "red", "yellow"] as color
)
// Status conversion
t = table([1, 0, 2, 1, 3] as status_code)
select t.status_code,
    statusMapping.status,
    statusMapping.color
from t
left join statusMapping on t.status_code = statusMapping.code
```
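A status code missing from the mapping table (for example, one introduced by a firmware update) comes back from a left join with an empty status and color. One option is an explicit fallback label. The plain-Python sketch below shows the idea; the `("Unknown", "gray")` default and the code `9` are hypothetical, not part of the original mapping.

```python
STATUS_MAPPING = {
    0: ("Stopped", "red"),
    1: ("Running", "green"),
    2: ("Fault", "red"),
    3: ("Maintenance", "yellow"),
}
# Hypothetical fallback for codes the table does not define
UNKNOWN_STATUS = ("Unknown", "gray")

def map_status(codes, mapping=STATUS_MAPPING, default=UNKNOWN_STATUS):
    """Return (code, status, color) per input code, using the default when unmapped."""
    return [(code, *mapping.get(code, default)) for code in codes]

mapped = map_status([1, 0, 2, 1, 3, 9])   # 9 is not a defined status code
print(mapped)
```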
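4.3 Alarm Level Mapping
```dolphindb
// Alarm level mapping
alertMapping = table(
    [1, 2, 3, 4] as level,
    ["Info", "Warning", "Critical", "Emergency"] as name,
    ["blue", "yellow", "orange", "red"] as color
)
// Usage: attach level names and colors to alarm records
alerts = table([2, 4, 1] as alert_level)
select alerts.alert_level, alertMapping.name, alertMapping.color
from alerts
left join alertMapping on alerts.alert_level = alertMapping.level
```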
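5. Data Normalization
5.1 Min-Max Normalization
```dolphindb
// Min-Max normalization: rescale to [0, 1]
def normalizeMinMax(data) {
    minVal = min(data)
    maxVal = max(data)
    // If all values are equal, maxVal - minVal is 0 and the result is not meaningful
    return (data - minVal) / (maxVal - minVal)
}
// Usage: 100 random temperatures drawn uniformly from [20, 30)
t = table(20.0 + rand(10.0, 100) as temperature)
select temperature,
    normalizeMinMax(temperature) as normalized
from t
```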
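Min-Max normalization maps each value to x' = (x − min) / (max − min), so the smallest reading becomes 0 and the largest 1. The formula divides by zero when every value is the same. The plain-Python sketch below guards that case; returning 0.0 there is our own choice, not a DolphinDB convention.

```python
def normalize_min_max(data):
    """Rescale to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(data), max(data)
    span = hi - lo
    if span == 0:
        # All values equal: the formula would divide by zero; 0.0 is an arbitrary choice here
        return [0.0 for _ in data]
    return [(x - lo) / span for x in data]

print(normalize_min_max([20.0, 25.0, 30.0]))   # [0.0, 0.5, 1.0]
print(normalize_min_max([22.0, 22.0]))         # [0.0, 0.0]
```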
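5.2 Z-Score Standardization
```dolphindb
// Z-Score standardization: mean 0, standard deviation 1
def normalizeZScore(data) {
    meanVal = avg(data)
    stdVal = std(data)   // sample standard deviation
    return (data - meanVal) / stdVal
}
// Usage (reuses table t from 5.1)
select temperature,
    normalizeZScore(temperature) as zscore
from t
```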
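Z-Score standardization computes z = (x − μ) / σ. A detail worth checking is which σ is used: to our understanding, DolphinDB's `std` returns the sample standard deviation (divisor n − 1), while some tools default to the population value (divisor n). Python's `statistics` module exposes both, which makes the difference easy to see in the plain-Python sketch below (function name and data are ours).

```python
import statistics

def normalize_z_score(data, sample=True):
    """z = (x - mean) / sd; sample=True divides by n - 1, sample=False by n."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    if sd == 0:
        raise ValueError("constant series: standard deviation is 0")
    return [(x - mean) / sd for x in data]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # mean 5, population sd 2
z_pop = normalize_z_score(data, sample=False)
z_sample = normalize_z_score(data)

print(z_pop[-1], z_sample[-1])
```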
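5.3 Decimal Scaling Normalization
```dolphindb
// Decimal scaling: divide by 10^j, where j is the smallest integer with max(abs(result)) < 1
def normalizeDecimal(data) {
    maxAbs = max(abs(data))
    // floor(...) + 1 rather than ceil(...): with ceil, maxAbs = 100 gives j = 2 and a maximum of exactly 1
    j = floor(log10(maxAbs)) + 1
    return data / pow(10, j)
}
// Usage (reuses table t from 5.1)
select temperature, normalizeDecimal(temperature) as decimal_scaled from t
```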
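Decimal scaling divides by 10^j, where j is the smallest integer for which every scaled value has magnitude below 1, i.e. j = ⌊log₁₀ max|x|⌋ + 1. Using ⌈log₁₀ max|x|⌉ instead gives the wrong j when max|x| is an exact power of ten: for 100 it yields j = 2 and a scaled maximum of exactly 1. The plain-Python sketch below (function name is ours) uses the floor form and also handles an all-zero input.

```python
import math

def decimal_scaling(data):
    """Divide by 10**j, where j is the smallest integer with max(|x|) / 10**j < 1."""
    max_abs = max(abs(x) for x in data)
    if max_abs == 0:
        return list(data), 0              # nothing to scale
    j = math.floor(math.log10(max_abs)) + 1
    return [x / 10 ** j for x in data], j

scaled, j = decimal_scaling([-45.0, 100.0, 7.5])
print(j, scaled)   # j = 3; ceil(log10(100)) would give 2 and a maximum of exactly 1.0
```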
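6. Batch Conversion
6.1 Batch Unit Conversion
```dolphindb
// Batch unit conversion: apply one function per column, driven by a dictionary
// (uses celsiusToFahrenheit and barToPsi from Section 3)
def batchUnitConvert(data, conversions) {
    result = select * from data          // work on a copy, leave the input untouched
    for (col in conversions.keys()) {
        f = conversions[col]
        // Overwrites the column in place; its name still reflects the original unit
        replaceColumn!(result, col, f(data[col]))
    }
    return result
}
// Usage
t = table([25.0] as temperature_c, [1.5] as pressure_bar)
conversions = dict(STRING, ANY)
conversions["temperature_c"] = celsiusToFahrenheit
conversions["pressure_bar"] = barToPsi
converted = batchUnitConvert(t, conversions)
```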
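One pitfall of converting columns in place is that a column named `temperature_c` ends up holding Fahrenheit values. Storing a target column name alongside each function avoids this. The plain-Python sketch below models a table as a dict of column lists (a simplification, not DolphinDB's table type); the rule format and all names are hypothetical.

```python
def celsius_to_fahrenheit(c):
    return c * 1.8 + 32

def bar_to_psi(bar):
    return bar * 14.5038

# Hypothetical rule format: source column -> (target column, conversion function)
CONVERSIONS = {
    "temperature_c": ("temperature_f", celsius_to_fahrenheit),
    "pressure_bar": ("pressure_psi", bar_to_psi),
}

def batch_unit_convert(table, conversions):
    """table maps column name -> list of values; returns a new dict, input unchanged."""
    result = dict(table)   # shallow copy: new dict, same column lists
    for src, (dst, fn) in conversions.items():
        result[dst] = [fn(v) for v in table[src]]
    return result

raw = {"temperature_c": [25.0, 30.0], "pressure_bar": [1.5, 2.0]}
converted = batch_unit_convert(raw, CONVERSIONS)
print(converted)
```

Keeping both the source and the converted column makes the unit of every column explicit from its name.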
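6.2 Batch Code Mapping
```dolphindb
// Batch code mapping: left-join any table against a mapping table on the given key columns
def batchCodeMapping(data, mappingTable, dataKeyCol, mapKeyCol) {
    // lj keeps every row of data and appends the mapping table's other columns
    return lj(data, mappingTable, dataKeyCol, mapKeyCol)
}
// Usage (deviceMapping from Section 4.1)
raw = table(["D001", "D003", "D009"] as device_id, [25.0, 28.0, 30.0] as temperature)
mapped = batchCodeMapping(raw, deviceMapping, `device_id, `old_code)
```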
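A generic mapping helper needs two key names (one per table) because the raw data and the mapping table rarely share a column name. The plain-Python sketch below implements left-join semantics on lists of dicts; it assumes the mapping key is unique, and the function name and sample rows are hypothetical.

```python
def batch_code_mapping(rows, mapping_rows, data_key, map_key):
    """Left join: every input row is kept; mapping columns are None when the key is
    unmatched. Assumes map_key is unique in mapping_rows (a later duplicate would win).
    Mapping columns overwrite same-named input columns."""
    index = {m[map_key]: m for m in mapping_rows}
    # Columns the mapping table contributes: everything except its own key
    extra_cols = [c for c in mapping_rows[0] if c != map_key] if mapping_rows else []
    out = []
    for row in rows:
        match = index.get(row[data_key])
        merged = dict(row)
        for c in extra_cols:
            merged[c] = match[c] if match is not None else None
        out.append(merged)
    return out

device_mapping = [
    {"old_code": "D001", "new_code": "DEVICE_001", "description": "Workshop A - Device 1"},
    {"old_code": "D003", "new_code": "DEVICE_003", "description": "Workshop B - Device 1"},
]
raw = [
    {"device_id": "D001", "temperature": 25.0},
    {"device_id": "D009", "temperature": 30.0},   # hypothetical unmapped code
]
mapped = batch_code_mapping(raw, device_mapping, "device_id", "old_code")
print(mapped)
```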
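7. Case Study
7.1 Industrial Data Standardization System
```dolphindb
// ========== Industrial data standardization system ==========
// Requires the conversion and normalization functions from Sections 3 and 5 in the same session
// 1. Create raw data
t = table(
    take(1..10, 1000) as device_id,
    // TIMESTAMP literal (milliseconds) + (0..999) * 60000: one reading per minute
    2024.01.01T00:00:00.000 + (0..999) * 60000 as timestamp,
    20.0 + rand(10.0, 1000) as temperature_c,   // °C, uniform in [20, 30)
    1.0 + rand(1.0, 1000) as pressure_bar,      // bar, uniform in [1, 2)
    100.0 + rand(100.0, 1000) as flow_m3h       // m³/h, uniform in [100, 200)
)
// 2. Unit conversion
converted = select
    device_id,
    timestamp,
    temperature_c,
    celsiusToFahrenheit(temperature_c) as temperature_f,
    pressure_bar,
    barToPsi(pressure_bar) as pressure_psi,
    flow_m3h,
    m3hToLpm(flow_m3h) as flow_lpm
from t
// 3. Data standardization
normalized = select
    device_id,
    timestamp,
    temperature_c,
    normalizeZScore(temperature_c) as temp_zscore,
    pressure_bar,
    normalizeZScore(pressure_bar) as pressure_zscore,
    flow_m3h,
    normalizeZScore(flow_m3h) as flow_zscore
from converted
// 4. Write to a distributed table (drop the database first so the script can be re-run)
dbPath = "dfs://standardized_db"
if (existsDatabase(dbPath)) {
    dropDatabase(dbPath)
}
db = database(dbPath, VALUE, 1..10)
db.createPartitionedTable(normalized, `standardized_data, `device_id)
loadTable(dbPath, "standardized_data").append!(normalized)
// 5. Verify by reading back from the distributed table
select top 10 * from loadTable(dbPath, "standardized_data")
print("Industrial data standardization complete")
```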
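The DolphinDB script above needs a running DolphinDB server. To make its expected properties checkable anywhere, the plain-Python sketch below reproduces the same three steps on in-memory rows: readings one minute apart, unit conversion with the same factors, and Z-scores with mean 0 and sample standard deviation 1 per column. The seed, row layout, and names are hypothetical.

```python
import random
import statistics
from datetime import datetime, timedelta

rng = random.Random(42)          # fixed seed: hypothetical, for reproducibility only
N = 1000
START = datetime(2024, 1, 1)

# 1. Raw data: 10 devices in rotation, one reading per minute
rows = [
    {
        "device_id": i % 10 + 1,                   # same order as take(1..10, 1000)
        "timestamp": START + timedelta(minutes=i),
        "temperature_c": rng.uniform(20.0, 30.0),
        "pressure_bar": rng.uniform(1.0, 2.0),
        "flow_m3h": rng.uniform(100.0, 200.0),
    }
    for i in range(N)
]

# 2. Unit conversion, using the same factors as Section 3
for r in rows:
    r["temperature_f"] = r["temperature_c"] * 1.8 + 32
    r["pressure_psi"] = r["pressure_bar"] * 14.5038
    r["flow_lpm"] = r["flow_m3h"] * 1000 / 60

# 3. Z-score standardization over all rows (sample standard deviation)
def z_scores(values):
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

for src, dst in [("temperature_c", "temp_zscore"),
                 ("pressure_bar", "pressure_zscore"),
                 ("flow_m3h", "flow_zscore")]:
    for r, z in zip(rows, z_scores([row[src] for row in rows])):
        r[dst] = z

print(rows[-1]["timestamp"], statistics.fmean(r["temp_zscore"] for r in rows))
```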
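8. Summary
This article covered industrial data conversion in DolphinDB:
- Data formatting: time, numeric, and string formatting
- Unit conversion: temperature, pressure, flow, and energy
- Code mapping: device codes, status codes, and alarm levels
- Data normalization: Min-Max, Z-Score, and decimal scaling
- Batch conversion: batch unit conversion and batch code mapping
Questions for reflection:
- How would you design a general-purpose unit conversion function?
- How would you handle complex data mapping relationships?
- How can you keep data conversions consistent?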