Contents
- Abstract
- I. Overview of Database Synchronization
  - 1.1 Synchronization Scenarios
  - 1.2 Synchronization Approaches
- II. MySQL Data Synchronization
  - 2.1 Connecting to MySQL
  - 2.2 Full Sync
  - 2.3 Incremental Sync
- III. PostgreSQL Data Synchronization
  - 3.1 Connecting to PostgreSQL
  - 3.2 Full Sync
  - 3.3 Incremental Sync
- IV. Data Transformation
  - 4.1 Type Mapping
  - 4.2 Data Cleaning
  - 4.3 Data Validation
- V. Real-Time Sync
  - 5.1 Binlog Sync (MySQL)
  - 5.2 CDC Sync
- VI. Sync Monitoring
  - 6.1 Sync Status Table
  - 6.2 Monitoring Function
- VII. Case Study
  - 7.1 Complete MySQL-to-DolphinDB Sync
- VIII. Summary
- References
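Abstract
This article explains database synchronization with DolphinDB. It covers designing a sync approach, migrating data, incremental sync, data transformation, scheduled jobs, and real-time sync, with a code example for each technique.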
I. Overview of Database Synchronization
1.1 Synchronization Scenarios
[Figure: Data synchronization architecture. MySQL and PostgreSQL sources, a sync job, and DolphinDB as the target; sync modes: full, incremental, and real-time sync.]
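1.2 Synchronization Approaches
| Approach | Description | Typical use |
|---|---|---|
| Full sync | Migrate all data in one pass | Initial load, historical data |
| Incremental sync | Periodically sync new or changed rows | Regular updates |
| Real-time sync | Capture changes as they happen | Real-time analytics |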
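II. MySQL Data Synchronization
2.1 Connecting to MySQL
```
// Load the MySQL plugin (recent versions accept the plugin name after installPlugin("mysql");
// older versions expect the path to PluginMySQL.txt)
loadPlugin("mysql")
// Connect: host, port, user, password, database
conn = mysql::connect("localhost", 3306, "root", "password", "test_db")
// Test the connection
mysql::query(conn, "SELECT 1")
```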
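2.2 Full Sync
```
// Full sync of a MySQL table into DolphinDB
// 1. Query the MySQL data (mysql::query loads the whole result into memory;
//    for large tables, mysql::loadEx can write directly into a partitioned table)
mysqlData = mysql::query(conn, "SELECT device_id, timestamp, temperature, humidity FROM sensor_data")
// 2. Create the DolphinDB database and table, VALUE-partitioned on device_id
db = database("dfs://mysql_sync_db", VALUE, 1..100)
schema = table(1:0, `device_id`timestamp`temperature`humidity,
[INT, TIMESTAMP, DOUBLE, DOUBLE])
db.createPartitionedTable(schema, `sensor_data, `device_id)
// 3. Write the data
loadTable("dfs://mysql_sync_db", "sensor_data").append!(mysqlData)
// 4. Verify the row count
select count(*) from loadTable("dfs://mysql_sync_db", "sensor_data")
```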
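The full-sync steps (query, create, write, verify) can be made concrete outside DolphinDB. Below is a minimal Python sketch that uses two in-memory SQLite databases as stand-ins for MySQL and DolphinDB. The function name `full_sync` and the table layout are illustrative assumptions. They are not part of the MySQL plugin or DolphinDB.

```python
import sqlite3

COLUMNS = "device_id, timestamp, temperature, humidity"


def full_sync(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Copy every row of `table` from src into a freshly created table in dst."""
    # `table` is interpolated into SQL, so it must come from trusted configuration
    rows = src.execute(f"SELECT {COLUMNS} FROM {table}").fetchall()
    # Recreate the target so a re-run does not append duplicates
    dst.execute(f"DROP TABLE IF EXISTS {table}")
    dst.execute(
        f"CREATE TABLE {table} "
        "(device_id INTEGER, timestamp TEXT, temperature REAL, humidity REAL)"
    )
    dst.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)
    dst.commit()
    # Verification: compare row counts on both sides
    (src_count,) = src.execute(f"SELECT count(*) FROM {table}").fetchone()
    (dst_count,) = dst.execute(f"SELECT count(*) FROM {table}").fetchone()
    if src_count != dst_count:
        raise RuntimeError(f"row count mismatch: source={src_count}, target={dst_count}")
    return dst_count


# Usage: two in-memory databases stand in for MySQL and DolphinDB
src = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE sensor_data "
    "(device_id INTEGER, timestamp TEXT, temperature REAL, humidity REAL)"
)
src.executemany(
    "INSERT INTO sensor_data VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01 00:00:00", 21.5, 40.0),
        (2, "2024-01-01 00:00:00", 22.0, 41.0),
        (1, "2024-01-01 00:01:00", 21.7, 40.2),
    ],
)
dst = sqlite3.connect(":memory:")
copied = full_sync(src, dst, "sensor_data")
print(copied)  # 3
```

Dropping and recreating the target keeps a re-run from appending duplicates. The final count comparison plays the role of the verification step in the DolphinDB script.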
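2.3 Incremental Sync
```
// Incremental sync based on an update-time watermark
// Assumes the MySQL table has an update_time column that changes on every insert/update
// Last synchronized update_time per table (an in-memory shared table: lost on node restart)
share table(1:0, `table_name`last_sync_time,
[STRING, TIMESTAMP]) as sync_status
// Incremental sync function
def incrementalSync(conn, tableName) {
    // max() yields a scalar, which is NULL if this table has never been synchronized
    lastTime = exec max(last_sync_time) from sync_status where table_name = tableName
    isFirst = isNull(lastTime)
    if (isFirst) {
        lastTime = 1970.01.01T00:00:00.000 // first sync
    }
    // Format the watermark as a literal MySQL understands
    sql = "SELECT device_id, timestamp, temperature, humidity, update_time FROM " + tableName + " WHERE update_time > '" + temporalFormat(lastTime, "yyyy-MM-dd HH:mm:ss.SSS") + "'"
    newData = mysql::query(conn, sql)
    if (newData.rows() > 0) {
        maxTime = exec max(update_time) from newData
        // Write only the columns that exist in the DolphinDB table
        loadTable("dfs://mysql_sync_db", tableName).append!(select device_id, timestamp, temperature, humidity from newData)
        // Insert the watermark on the first sync, update it afterwards
        if (isFirst) {
            insert into sync_status values(tableName, maxTime)
        } else {
            update sync_status set last_sync_time = maxTime where table_name = tableName
        }
    }
    return newData.rows()
}
// A scheduled job runs in its own session and cannot see the local variable conn,
// so it opens its own connection (the mysql plugin must be loaded for the job, e.g. preloaded at startup)
def mysqlSyncJob() {
    conn = mysql::connect("localhost", 3306, "root", "password", "test_db")
    return incrementalSync(conn, "sensor_data")
}
// Run daily at 00:05
scheduleJob("mysql_incremental", "MySQL incremental sync", mysqlSyncJob, 00:05m, 2024.01.01, 2030.12.31, 'D')
```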
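The watermark logic can be sketched in Python as well. Again, in-memory SQLite databases stand in for MySQL and DolphinDB; this is an assumption for illustration, and `incremental_sync` and `EPOCH` are hypothetical names. The sketch keeps one `last_sync_time` per table, reads only rows with a later `update_time`, and stores the new maximum.

```python
import sqlite3

EPOCH = "1970-01-01 00:00:00"  # watermark used before the first sync


def incremental_sync(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Copy rows whose update_time is later than the watermark stored for `table`."""
    dst.execute(
        "CREATE TABLE IF NOT EXISTS sync_status "
        "(table_name TEXT PRIMARY KEY, last_sync_time TEXT)"
    )
    row = dst.execute(
        "SELECT last_sync_time FROM sync_status WHERE table_name = ?", (table,)
    ).fetchone()
    last_time = row[0] if row else EPOCH
    # Bound parameter instead of string concatenation; ISO strings compare chronologically
    new_rows = src.execute(
        f"SELECT device_id, timestamp, temperature, humidity, update_time FROM {table} "
        "WHERE update_time > ? ORDER BY update_time",
        (last_time,),
    ).fetchall()
    if new_rows:
        # Only the four data columns go to the target; update_time drives the watermark
        dst.executemany(
            f"INSERT INTO {table} VALUES (?, ?, ?, ?)", [r[:4] for r in new_rows]
        )
        max_time = max(r[4] for r in new_rows)
        # INSERT OR REPLACE creates the status row on the first run and overwrites it later
        dst.execute("INSERT OR REPLACE INTO sync_status VALUES (?, ?)", (table, max_time))
    # Data and watermark are committed in one transaction
    dst.commit()
    return len(new_rows)


src = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE sensor_data (device_id INTEGER, timestamp TEXT, "
    "temperature REAL, humidity REAL, update_time TEXT)"
)
src.executemany(
    "INSERT INTO sensor_data VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-01-01 00:00:00", 21.5, 40.0, "2024-01-01 00:00:00"),
        (2, "2024-01-01 00:00:00", 22.0, 41.0, "2024-01-01 00:00:00"),
    ],
)
dst = sqlite3.connect(":memory:")
dst.execute(
    "CREATE TABLE sensor_data (device_id INTEGER, timestamp TEXT, temperature REAL, humidity REAL)"
)
first = incremental_sync(src, dst, "sensor_data")   # no watermark yet: both rows
src.execute(
    "INSERT INTO sensor_data VALUES (1, '2024-01-02 00:00:00', 23.0, 39.0, '2024-01-02 00:00:00')"
)
second = incremental_sync(src, dst, "sensor_data")  # only the newer row
third = incremental_sync(src, dst, "sensor_data")   # nothing changed
print(first, second, third)  # 2 1 0
```

The filter is strict (`>`), so a row that commits later with an `update_time` equal to the stored watermark is skipped. Reading with `>=` and deduplicating on a key closes that gap, at the cost of re-reading the boundary rows.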
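III. PostgreSQL Data Synchronization
3.1 Connecting to PostgreSQL
```
// Load the PostgreSQL plugin
// (assumes a plugin exposing postgresql::connect/query is installed;
//  the ODBC plugin, via odbc::connect/odbc::query, is an alternative)
loadPlugin("postgresql")
// Connect: host, port, user, password, database
conn = postgresql::connect("localhost", 5432, "postgres", "password", "test_db")
// Test the connection
postgresql::query(conn, "SELECT 1")
```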
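3.2 Full Sync
```
// Full sync of a PostgreSQL table
// Assumes dfs://pg_sync_db with a sensor_data table was created beforehand, as in 2.2
pgData = postgresql::query(conn, "SELECT device_id, timestamp, temperature, humidity FROM sensor_data")
// Write into DolphinDB
loadTable("dfs://pg_sync_db", "sensor_data").append!(pgData)
```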
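3.3 Incremental Sync
```
// PostgreSQL incremental sync, reusing the sync_status table from 2.3
// Assumes the PostgreSQL table has an updated_at column.
// Note: sync_status is keyed by table name only, so a MySQL and a PostgreSQL table
// with the same name would share one watermark.
def pgIncrementalSync(conn, tableName) {
    lastTime = exec max(last_sync_time) from sync_status where table_name = tableName
    isFirst = isNull(lastTime)
    if (isFirst) {
        lastTime = 1970.01.01T00:00:00.000
    }
    sql = "SELECT device_id, timestamp, temperature, humidity, updated_at FROM " + tableName + " WHERE updated_at > '" + temporalFormat(lastTime, "yyyy-MM-dd HH:mm:ss.SSS") + "'"
    newData = postgresql::query(conn, sql)
    if (newData.rows() > 0) {
        maxTime = exec max(updated_at) from newData
        loadTable("dfs://pg_sync_db", tableName).append!(select device_id, timestamp, temperature, humidity from newData)
        if (isFirst) {
            insert into sync_status values(tableName, maxTime)
        } else {
            update sync_status set last_sync_time = maxTime where table_name = tableName
        }
    }
    return newData.rows()
}
```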
IV. Data Transformation
4.1 Type Mapping
```
// MySQL -> DolphinDB type mapping
def mysqlTypeToDolphinDB(mysqlType) {
    // dict(keys, values) builds the lookup table from two vectors
    typeMap = dict(["INT", "BIGINT", "FLOAT", "DOUBLE", "VARCHAR", "DATETIME", "TIMESTAMP"],
    ["INT", "LONG", "FLOAT", "DOUBLE", "STRING", "DATETIME", "TIMESTAMP"])
    // Normalize case; an unmapped type returns NULL
    return typeMap[upper(mysqlType)]
}
```
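In practice the type strings come from the catalog, for example `COLUMN_TYPE` in MySQL's `information_schema.COLUMNS`. They carry lengths and modifiers such as `varchar(64)` or `int(10) unsigned`, which an exact-match lookup misses. Here is a hedged Python sketch of a normalization step that reuses the mapping above. The function name and the decision to reject unsigned types are assumptions.

```python
import re

# Base-type mapping from the DolphinDB function above (upper-case MySQL names)
MYSQL_TO_DOLPHINDB = {
    "INT": "INT",
    "BIGINT": "LONG",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "VARCHAR": "STRING",
    "DATETIME": "DATETIME",
    "TIMESTAMP": "TIMESTAMP",
}


def mysql_type_to_dolphindb(column_type: str) -> str:
    """Map a catalog type string such as 'varchar(64)' to a DolphinDB type name."""
    upper = column_type.strip().upper()
    # Unsigned integers can exceed the signed DolphinDB type, so refuse to guess
    if "UNSIGNED" in upper:
        raise ValueError(f"unsigned type needs an explicit decision: {column_type!r}")
    # Keep the leading type name, dropping length/precision such as '(64)'
    match = re.match(r"[A-Z]+", upper)
    if match is None or match.group(0) not in MYSQL_TO_DOLPHINDB:
        raise ValueError(f"no DolphinDB mapping for MySQL type {column_type!r}")
    return MYSQL_TO_DOLPHINDB[match.group(0)]


print(mysql_type_to_dolphindb("varchar(64)"))  # STRING
print(mysql_type_to_dolphindb("bigint"))       # LONG
print(mysql_type_to_dolphindb("int(11)"))      # INT
```

Rejecting `UNSIGNED` is deliberate. `INT UNSIGNED` goes up to 4294967295, which does not fit DolphinDB's 32-bit signed `INT`, so such columns need a wider target chosen explicitly.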
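4.2 Data Cleaning
```
// Data cleaning: fill NULL temperature/humidity with the column mean
def cleanData(data) {
    // avg ignores NULLs; compute the means first so the select has only per-row expressions
    tempAvg = avg(data.temperature)
    humAvg = avg(data.humidity)
    cleaned = select
    device_id,
    timestamp,
    nullFill(temperature, tempAvg) as temperature,
    nullFill(humidity, humAvg) as humidity
    from data
    return cleaned
}
```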
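The same mean-imputation rule can be written in plain Python, which is handy for checking expected results. This sketch assumes rows are dicts with the four sensor columns; `fill_nulls_with_mean` and `clean_rows` are hypothetical names. Aggregates in DolphinDB ignore NULLs, and likewise the mean here is computed without the `None` entries.

```python
from statistics import mean


def fill_nulls_with_mean(values: list) -> list:
    """Replace None with the mean of the non-None values (one mean for the whole column)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-null values to average")
    fill = mean(present)
    return [fill if v is None else v for v in values]


def clean_rows(rows: list) -> list:
    """rows: dicts with device_id, timestamp, temperature, humidity; returns new dicts."""
    temps = fill_nulls_with_mean([r["temperature"] for r in rows])
    hums = fill_nulls_with_mean([r["humidity"] for r in rows])
    return [dict(r, temperature=t, humidity=h) for r, t, h in zip(rows, temps, hums)]


rows = [
    {"device_id": 1, "timestamp": "2024-01-01 00:00:00", "temperature": 20.0, "humidity": None},
    {"device_id": 2, "timestamp": "2024-01-01 00:00:00", "temperature": None, "humidity": 50.0},
    {"device_id": 3, "timestamp": "2024-01-01 00:00:00", "temperature": 22.0, "humidity": 40.0},
]
cleaned = clean_rows(rows)
print([r["temperature"] for r in cleaned])  # [20.0, 21.0, 22.0]
print([r["humidity"] for r in cleaned])     # [45.0, 50.0, 40.0]
```

A column-wide mean mixes readings from different devices. For sensor data, filling per device (grouping by `device_id` first) is often a better fit.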
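4.3 Data Validation
```
// Data validation: throw on the first failed check
def validateData(data) {
    // Required field
    if (sum(isNull(data.device_id)) > 0) {
        throw "device_id contains NULL values"
    }
    // Value range; NULLs are excluded explicitly because DolphinDB compares NULL as the minimum value
    t = data.temperature
    if (sum(!isNull(t) && (t < -40 || t > 100)) > 0) {
        throw "temperature out of range [-40, 100]"
    }
    return true
}
```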
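The DolphinDB function stops at the first failed check. When a whole batch is inspected before deciding whether to load it, collecting every problem is often more useful. Here is a Python sketch of that variant; `validate_rows` and its default bounds mirror the checks above and are otherwise assumptions.

```python
def validate_rows(rows: list, temp_min: float = -40.0, temp_max: float = 100.0) -> list:
    """Return every problem found; an empty list means the batch passed."""
    problems = []
    for i, r in enumerate(rows):
        # Required field
        if r.get("device_id") is None:
            problems.append(f"row {i}: device_id is null")
        # Range check on present values only; nulls are left to the cleaning step
        t = r.get("temperature")
        if t is not None and not (temp_min <= t <= temp_max):
            problems.append(f"row {i}: temperature {t} outside [{temp_min}, {temp_max}]")
    return problems


batch = [
    {"device_id": 1, "temperature": 25.0},
    {"device_id": None, "temperature": 30.0},
    {"device_id": 3, "temperature": 120.0},
    {"device_id": 4, "temperature": None},
]
problems = validate_rows(batch)
for p in problems:
    print(p)
# row 1: device_id is null
# row 2: temperature 120.0 outside [-40.0, 100.0]
```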
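V. Real-Time Sync
5.1 Binlog Sync (MySQL)
```
// Real-time sync from the MySQL binlog
// Prerequisites on the MySQL side: binary logging enabled (log_bin) with binlog_format=ROW
// Binlog listener configuration
binlogConfig = dict(STRING, ANY)
binlogConfig["host"] = "localhost"
binlogConfig["port"] = 3306
binlogConfig["user"] = "root"
binlogConfig["password"] = "password"
// serverId identifies this client as a replica; it must differ from the server_id of the MySQL server and of any other replica
binlogConfig["serverId"] = 100
// Start the binlog listener (hypothetical interface, left commented out; check whether your plugin version provides one)
// mysql::startBinlogListener(binlogConfig, handler)
```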
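5.2 CDC Sync
CDC with Debezium works in four steps:
1. Deploy the Debezium connector for the source database.
2. Debezium captures row-level change events from the source database.
3. The change events are published to Kafka.
4. DolphinDB consumes the events from Kafka and applies them.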
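To show what step 4 involves, here is a Python sketch that applies Debezium change events to an in-memory table keyed by primary key. It assumes the JSON converter with schemas enabled, so each message wraps the event in a `payload` object. That object's `op` field is `c` (create), `u` (update), `d` (delete), or `r` (snapshot read), and it carries `before`/`after` row images. Using `device_id` as the key and the `event` helper are illustrative assumptions. Consuming from Kafka itself is left out.

```python
import json
from typing import Optional


def apply_change_event(state: dict, message: Optional[str], key: str = "device_id") -> None:
    """Apply one Debezium-style change event (JSON) to `state`, a dict keyed by primary key."""
    if message is None:
        return  # tombstone record that follows a delete; nothing to apply
    payload = json.loads(message)["payload"]
    op = payload["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: store the row image after the change
        row = payload["after"]
        state[row[key]] = row
    elif op == "d":
        # delete: only the row image before the change carries the key
        state.pop(payload["before"][key], None)
    else:
        raise ValueError(f"unsupported op {op!r}")


def event(op, before=None, after=None):
    """Build a minimal test event; real events also carry source, ts_ms, and schema fields."""
    return json.dumps({"payload": {"op": op, "before": before, "after": after}})


state = {}
for msg in [
    event("c", after={"device_id": 1, "temperature": 20.0}),
    event("c", after={"device_id": 2, "temperature": 22.0}),
    event("u", before={"device_id": 1, "temperature": 20.0},
          after={"device_id": 1, "temperature": 21.0}),
    event("d", before={"device_id": 1, "temperature": 21.0}),
    None,  # tombstone
]:
    apply_change_event(state, msg)
print(state)  # {2: {'device_id': 2, 'temperature': 22.0}}
```

By default, Debezium follows each delete with a tombstone record (a message with a null value) for Kafka log compaction. That is why `None` is skipped.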
VI. Sync Monitoring
6.1 Sync Status Table
```
// Create the sync log table
share table(1:0,
`source_table`target_table`sync_time`sync_count`status`error_msg,
[STRING, STRING, TIMESTAMP, LONG, STRING, STRING]) as sync_log
// Record one sync run (errorMsg defaults to an empty string)
def logSync(sourceTable, targetTable, rowCount, status, errorMsg = "") {
    insert into sync_log values(sourceTable, targetTable, now(), rowCount, status, errorMsg)
}
```
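The log table is most useful when failures are recorded too, not just successes. Below is a Python sketch that wraps a sync function and logs either outcome, with SQLite standing in for the shared `sync_log` table. `run_and_log` and the simulated `broken_sync` are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable


def run_and_log(log_db: sqlite3.Connection, source_table: str, target_table: str,
                sync_fn: Callable[[], int]) -> int:
    """Run sync_fn and append a SUCCESS or FAILED row to sync_log; failures are re-raised."""
    log_db.execute(
        "CREATE TABLE IF NOT EXISTS sync_log (source_table TEXT, target_table TEXT, "
        "sync_time TEXT, sync_count INTEGER, status TEXT, error_msg TEXT)"
    )
    started = datetime.now(timezone.utc).isoformat()  # time the run started
    try:
        count = sync_fn()
    except Exception as exc:
        log_db.execute(
            "INSERT INTO sync_log VALUES (?, ?, ?, ?, ?, ?)",
            (source_table, target_table, started, 0, "FAILED", str(exc)),
        )
        log_db.commit()
        raise  # let the caller (or scheduler) see the failure too
    log_db.execute(
        "INSERT INTO sync_log VALUES (?, ?, ?, ?, ?, ?)",
        (source_table, target_table, started, count, "SUCCESS", ""),
    )
    log_db.commit()
    return count


def broken_sync() -> int:
    raise RuntimeError("connection refused")  # simulated source outage


log_db = sqlite3.connect(":memory:")
run_and_log(log_db, "sensor_data", "sensor_data", lambda: 42)
try:
    run_and_log(log_db, "sensor_data", "sensor_data", broken_sync)
except RuntimeError:
    pass
logged = log_db.execute(
    "SELECT status, sync_count, error_msg FROM sync_log ORDER BY rowid"
).fetchall()
print(logged)  # [('SUCCESS', 42, ''), ('FAILED', 0, 'connection refused')]
```

Re-raising after logging keeps the failure visible to the caller. Swallowing the exception would make a failed job look successful to the scheduler.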
6.2 Monitoring Function
```
// Sync monitoring
def monitorSync() {
    print("=== Data sync monitor ===")
    // 10 most recent sync runs
    recentSyncs = select top 10 * from sync_log order by sync_time desc
    print(recentSyncs)
    // Number of failed runs (exec returns a scalar, so string() yields a single value)
    failures = exec count(*) from sync_log where status = "FAILED"
    print("Failures: " + string(failures))
}
monitorSync()
```
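A total failure count only grows over time, so it says little about the current state. Below is a Python sketch of a more actionable check that lists tables whose most recent run failed. SQLite stands in for `sync_log`, and the `orders` table and timestamps are made-up sample data.

```python
import sqlite3


def failure_count(log_db: sqlite3.Connection) -> int:
    """Total number of FAILED runs, as in the DolphinDB monitor."""
    (n,) = log_db.execute("SELECT count(*) FROM sync_log WHERE status = 'FAILED'").fetchone()
    return n


def tables_needing_attention(log_db: sqlite3.Connection) -> list:
    """Source tables whose most recent run (by sync_time) failed."""
    rows = log_db.execute(
        """
        SELECT l.source_table FROM sync_log AS l
        WHERE l.status = 'FAILED'
          AND l.sync_time = (SELECT max(sync_time) FROM sync_log
                             WHERE source_table = l.source_table)
        ORDER BY l.source_table
        """
    ).fetchall()
    return [t for (t,) in rows]


log_db = sqlite3.connect(":memory:")
log_db.execute(
    "CREATE TABLE sync_log (source_table TEXT, target_table TEXT, sync_time TEXT, "
    "sync_count INTEGER, status TEXT, error_msg TEXT)"
)
log_db.executemany(
    "INSERT INTO sync_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("orders", "orders", "2024-01-01T00:10:00", 100, "SUCCESS", ""),
        ("orders", "orders", "2024-01-02T00:10:00", 0, "FAILED", "timeout"),
        ("sensor_data", "sensor_data", "2024-01-01T00:10:00", 0, "FAILED", "timeout"),
        ("sensor_data", "sensor_data", "2024-01-02T00:10:00", 50, "SUCCESS", ""),
    ],
)
print(failure_count(log_db))             # 2
print(tables_needing_attention(log_db))  # ['orders']
```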
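VII. Case Study
7.1 Complete MySQL-to-DolphinDB Sync
```
// ========== Complete MySQL-to-DolphinDB sync ==========
// Uses sync_log and logSync from 6.1
// 1. Load the plugin
loadPlugin("mysql")
// 2. Connect to MySQL
mysqlConn = mysql::connect("localhost", 3306, "root", "password", "iot_db")
// 3. Create the DolphinDB database and table (only if they do not exist yet)
if (!existsDatabase("dfs://sync_db")) {
    db = database("dfs://sync_db", VALUE, 1..1000)
    schema = table(1:0,
    `device_id`timestamp`temperature`humidity`pressure,
    [INT, TIMESTAMP, DOUBLE, DOUBLE, DOUBLE])
    db.createPartitionedTable(schema, `sensor_data, `device_id)
}
// 4. Full sync (assumes the MySQL table has exactly these five columns;
//    re-running it on a non-empty table appends duplicates)
def fullSync(conn, tableName) {
    print("Starting full sync: " + tableName)
    data = mysql::query(conn, "SELECT * FROM " + tableName)
    loadTable("dfs://sync_db", tableName).append!(data)
    print("Sync complete: " + string(data.rows()) + " rows")
    logSync(tableName, tableName, data.rows(), "SUCCESS")
}
// 5. Incremental sync, using the latest timestamp already in DolphinDB as the watermark
def incrementalSync(conn, tableName) {
    print("Starting incremental sync: " + tableName)
    lastTime = exec max(timestamp) from loadTable("dfs://sync_db", tableName)
    // Format as a MySQL literal; string() would produce DolphinDB's 2024.01.01T... format
    sql = "SELECT * FROM " + tableName + " WHERE timestamp > '" + temporalFormat(lastTime, "yyyy-MM-dd HH:mm:ss.SSS") + "'"
    data = mysql::query(conn, sql)
    if (data.rows() > 0) {
        loadTable("dfs://sync_db", tableName).append!(data)
        print("Incremental sync: " + string(data.rows()) + " rows")
    }
    logSync(tableName, tableName, data.rows(), "SUCCESS")
}
// 6. Run the initial full sync
fullSync(mysqlConn, "sensor_data")
// 7. Daily incremental sync at 00:10; the job opens its own connection because it runs in a separate session
def incrementalSyncJob() {
    conn = mysql::connect("localhost", 3306, "root", "password", "iot_db")
    incrementalSync(conn, "sensor_data")
}
scheduleJob("incremental_sync", "Incremental sync", incrementalSyncJob, 00:10m, 2024.01.01, 2030.12.31, 'D')
print("MySQL-to-DolphinDB sync system started")
```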
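The incremental step in the case study takes `max(timestamp)` from the target and reads strictly newer rows. A row that arrives late with exactly the watermark timestamp is therefore never copied. One common remedy is to re-read from the watermark inclusively (`>=`) and upsert on a key. The Python sketch below uses SQLite as a stand-in, and the `(device_id, timestamp)` primary key is an assumption about the data.

```python
import sqlite3


def incremental_sync_from_target(src: sqlite3.Connection, dst: sqlite3.Connection,
                                 table: str = "sensor_data") -> int:
    """Re-read from the newest timestamp already in the target and upsert on the primary key."""
    (watermark,) = dst.execute(f"SELECT max(timestamp) FROM {table}").fetchone()
    # Empty target: '' sorts before every timestamp string, so everything is read
    watermark = watermark or ""
    rows = src.execute(
        f"SELECT device_id, timestamp, temperature, humidity FROM {table} WHERE timestamp >= ?",
        (watermark,),
    ).fetchall()
    # >= re-reads the boundary timestamp; the primary key turns those repeats into overwrites
    dst.executemany(f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?, ?)", rows)
    dst.commit()
    return len(rows)


SCHEMA = ("(device_id INTEGER, timestamp TEXT, temperature REAL, humidity REAL, "
          "PRIMARY KEY (device_id, timestamp))")
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute(f"CREATE TABLE sensor_data {SCHEMA}")
dst.execute(f"CREATE TABLE sensor_data {SCHEMA}")
src.executemany("INSERT INTO sensor_data VALUES (?, ?, ?, ?)", [
    (1, "2024-01-01 00:00:00", 21.5, 40.0),
    (2, "2024-01-01 00:00:00", 22.0, 41.0),
])
first = incremental_sync_from_target(src, dst)
# A late row carrying the same timestamp as the watermark, plus a newer row
src.executemany("INSERT INTO sensor_data VALUES (?, ?, ?, ?)", [
    (3, "2024-01-01 00:00:00", 20.9, 42.0),
    (1, "2024-01-01 00:01:00", 21.7, 40.2),
])
second = incremental_sync_from_target(src, dst)
(total,) = dst.execute("SELECT count(*) FROM sensor_data").fetchone()
print(first, second, total)  # 2 4 4
```

In DolphinDB, a comparable effect needs deduplication on the target, for example a TSDB table created with `keepDuplicates=LAST` or writes through `upsert!`. Check the documentation for your version before relying on either.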
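VIII. Summary
This article covered database synchronization with DolphinDB:
- Sync approaches: full, incremental, and real-time sync
- MySQL sync: connection, full sync, incremental sync
- PostgreSQL sync: connection, full sync, incremental sync
- Data transformation: type mapping, data cleaning, data validation
- Real-time sync: binlog, CDC
- Sync monitoring: status table, monitoring function
Questions to consider:
- How do you choose the right sync approach?
- How do you keep synchronized data consistent?
- How do you handle sync failures?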