DolphinDB数据库同步:MySQL/PostgreSQL到DolphinDB

目录

摘要

本文深入讲解DolphinDB数据库同步技术。从同步方案设计到数据迁移,从增量同步到数据转换,从定时任务到实时同步,全面介绍数据库同步的核心方法。通过丰富的代码示例,帮助读者掌握数据同步的核心技能。


一、数据库同步概述

1.1 同步场景

#mermaid-svg-gJ7DuSUWervnnArC{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg-gJ7DuSUWervnnArC .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg-gJ7DuSUWervnnArC .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg-gJ7DuSUWervnnArC .error-icon{fill:#552222;}#mermaid-svg-gJ7DuSUWervnnArC .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-gJ7DuSUWervnnArC .edge-thickness-normal{stroke-width:1px;}#mermaid-svg-gJ7DuSUWervnnArC .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-gJ7DuSUWervnnArC .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-gJ7DuSUWervnnArC .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg-gJ7DuSUWervnnArC .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-gJ7DuSUWervnnArC .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-gJ7DuSUWervnnArC .marker{fill:#333333;stroke:#333333;}#mermaid-svg-gJ7DuSUWervnnArC .marker.cross{stroke:#333333;}#mermaid-svg-gJ7DuSUWervnnArC svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-gJ7DuSUWervnnArC p{margin:0;}#mermaid-svg-gJ7DuSUWervnnArC .label{font-family:"trebuchet ms",verdana,arial,sans-serif;color:#333;}#mermaid-svg-gJ7DuSUWervnnArC .cluster-label text{fill:#333;}#mermaid-svg-gJ7DuSUWervnnArC .cluster-label span{color:#333;}#mermaid-svg-gJ7DuSUWervnnArC .cluster-label span p{background-color:transparent;}#mermaid-svg-gJ7DuSUWervnnArC .label text,#mermaid-svg-gJ7DuSUWervnnArC span{fill:#333;color:#333;}#mermaid-svg-gJ7DuSUWervnnArC .node rect,#mermaid-svg-gJ7DuSUWervnnArC .node circle,#mermaid-svg-gJ7DuSUWervnnArC .node ellipse,#mermaid-svg-gJ7DuSUWervnnArC .node polygon,#mermaid-svg-gJ7DuSUWervnnArC .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-gJ7DuSUWervnnArC .rough-node .label text,#mermaid-svg-gJ7DuSUWervnnArC .node .label text,#mermaid-svg-gJ7DuSUWervnnArC .image-shape .label,#mermaid-svg-gJ7DuSUWervnnArC .icon-shape .label{text-anchor:middle;}#mermaid-svg-gJ7DuSUWervnnArC .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#mermaid-svg-gJ7DuSUWervnnArC .rough-node .label,#mermaid-svg-gJ7DuSUWervnnArC .node .label,#mermaid-svg-gJ7DuSUWervnnArC .image-shape .label,#mermaid-svg-gJ7DuSUWervnnArC .icon-shape .label{text-align:center;}#mermaid-svg-gJ7DuSUWervnnArC .node.clickable{cursor:pointer;}#mermaid-svg-gJ7DuSUWervnnArC .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#mermaid-svg-gJ7DuSUWervnnArC .arrowheadPath{fill:#333333;}#mermaid-svg-gJ7DuSUWervnnArC .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-gJ7DuSUWervnnArC .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-gJ7DuSUWervnnArC .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#mermaid-svg-gJ7DuSUWervnnArC .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#mermaid-svg-gJ7DuSUWervnnArC .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#mermaid-svg-gJ7DuSUWervnnArC .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#mermaid-svg-gJ7DuSUWervnnArC .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-gJ7DuSUWervnnArC .cluster text{fill:#333;}#mermaid-svg-gJ7DuSUWervnnArC .cluster span{color:#333;}#mermaid-svg-gJ7DuSUWervnnArC div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-gJ7DuSUWervnnArC .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#mermaid-svg-gJ7DuSUWervnnArC rect.text{fill:none;stroke-width:0;}#mermaid-svg-gJ7DuSUWervnnArC .icon-shape,#mermaid-svg-gJ7DuSUWervnnArC .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#mermaid-svg-gJ7DuSUWervnnArC .icon-shape p,#mermaid-svg-gJ7DuSUWervnnArC .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#mermaid-svg-gJ7DuSUWervnnArC .icon-shape .label rect,#mermaid-svg-gJ7DuSUWervnnArC .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#mermaid-svg-gJ7DuSUWervnnArC .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#mermaid-svg-gJ7DuSUWervnnArC .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#mermaid-svg-gJ7DuSUWervnnArC :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;} 数据同步架构
MySQL
同步任务
PostgreSQL
DolphinDB
同步方式
全量同步
增量同步
实时同步

1.2 同步方案

方案 说明 适用场景
全量同步 一次性迁移全部数据 初始化、历史数据
增量同步 定时同步新增数据 定期更新
实时同步 实时捕获变更 实时分析

二、MySQL数据同步

2.1 连接MySQL

python 复制代码
// 加载MySQL插件
loadPlugin("mysql")

// 连接MySQL
conn = mysql::connect("localhost", 3306, "root", "password", "test_db")

// 测试连接
mysql::query(conn, "SELECT 1")

2.2 全量同步

python 复制代码
// 全量同步MySQL表到DolphinDB

// 1. 查询MySQL数据
mysqlData = mysql::query(conn, "SELECT * FROM sensor_data")

// 2. 创建DolphinDB表
db = database("dfs://mysql_sync_db", VALUE, 1..100)
schema = table(1:0, `device_id`timestamp`temperature`humidity,
               [INT, TIMESTAMP, DOUBLE, DOUBLE])
db.createPartitionedTable(schema, `sensor_data, `device_id)

// 3. 写入数据
loadTable("dfs://mysql_sync_db", "sensor_data").append!(mysqlData)

// 4. 验证
select count(*) from loadTable("dfs://mysql_sync_db", "sensor_data")

2.3 增量同步

python 复制代码
// 增量同步:基于时间戳

// 记录最后同步时间
share table(1:0, `table_name`last_sync_time,
            [STRING, TIMESTAMP]) as sync_status

// 增量同步函数
def incrementalSync(conn, tableName) {
    // 获取最后同步时间
    lastTime = exec last_sync_time from sync_status where table_name = tableName
    
    if (lastTime.size() == 0) {
        lastTime = 1970.01.01  // 首次同步
    }
    
    // 查询增量数据
    sql = "SELECT * FROM " + tableName + " WHERE update_time > '" + lastTime + "'"
    newData = mysql::query(conn, sql)
    
    // 写入DolphinDB
    if (newData.rows() > 0) {
        loadTable("dfs://mysql_sync_db", tableName).append!(newData)
        
        // 更新同步状态
        maxTime = exec max(update_time) from newData
        update sync_status set last_sync_time = maxTime where table_name = tableName
    }
    
    return newData.rows()
}

// 定时执行
scheduleJob("mysql_incremental", "MySQL增量同步",
    def() { incrementalSync(conn, "sensor_data") },
    00:05, 2024.01.01, 2030.12.31, 'D')

三、PostgreSQL数据同步

3.1 连接PostgreSQL

python 复制代码
// 加载PostgreSQL插件
loadPlugin("postgresql")

// 连接PostgreSQL
conn = postgresql::connect("localhost", 5432, "postgres", "password", "test_db")

// 测试连接
postgresql::query(conn, "SELECT 1")

3.2 全量同步

python 复制代码
// 全量同步PostgreSQL表
pgData = postgresql::query(conn, "SELECT * FROM sensor_data")

// 写入DolphinDB
loadTable("dfs://pg_sync_db", "sensor_data").append!(pgData)

3.3 增量同步

python 复制代码
// PostgreSQL增量同步
def pgIncrementalSync(conn, tableName) {
    lastTime = exec last_sync_time from sync_status where table_name = tableName
    
    sql = "SELECT * FROM " + tableName + " WHERE updated_at > '" + lastTime + "'"
    newData = postgresql::query(conn, sql)
    
    if (newData.rows() > 0) {
        loadTable("dfs://pg_sync_db", tableName).append!(newData)
        
        maxTime = exec max(updated_at) from newData
        update sync_status set last_sync_time = maxTime where table_name = tableName
    }
    
    return newData.rows()
}

四、数据转换

4.1 类型映射

python 复制代码
// MySQL/PostgreSQL -> DolphinDB 类型映射

// MySQL类型映射
def mysqlTypeToDolphinDB(mysqlType) {
    typeMap = dict(STRING, STRING, [
        ["INT", "INT"],
        ["BIGINT", "LONG"],
        ["FLOAT", "FLOAT"],
        ["DOUBLE", "DOUBLE"],
        ["VARCHAR", "STRING"],
        ["DATETIME", "DATETIME"],
        ["TIMESTAMP", "TIMESTAMP"]
    ])
    return typeMap[mysqlType]
}

4.2 数据清洗

python 复制代码
// 数据清洗函数
def cleanData(data) {
    // 处理NULL值
    cleaned = select 
        device_id,
        timestamp,
        iif(temperature is null, avg(temperature), temperature) as temperature,
        iif(humidity is null, avg(humidity), humidity) as humidity
    from data
    
    return cleaned
}

4.3 数据验证

python 复制代码
// 数据验证
def validateData(data) {
    // 检查必填字段
    if (sum(isNull(data.device_id)) > 0) {
        throw "device_id存在空值"
    }
    
    // 检查数据范围
    if (sum(data.temperature < -40 or data.temperature > 100) > 0) {
        throw "temperature超出范围"
    }
    
    return true
}

五、实时同步

5.1 Binlog同步(MySQL)

python 复制代码
// MySQL Binlog实时同步
// 需要开启MySQL Binlog

// 配置Binlog监听
binlogConfig = dict(STRING, ANY, [
    ["host", "localhost"],
    ["port", 3306],
    ["user", "root"],
    ["password", "password"],
    ["serverId", 1]
])

// 启动Binlog监听
// mysql::startBinlogListener(binlogConfig, handler)

5.2 CDC同步

python 复制代码
// 使用Debezium CDC
// 1. 部署Debezium连接器
// 2. 捕获变更事件
// 3. 推送到Kafka
// 4. DolphinDB消费Kafka

六、同步监控

6.1 同步状态表

python 复制代码
// 创建同步状态表
share table(1:0, 
    `source_table`target_table`sync_time`sync_count`status`error_msg,
    [STRING, STRING, TIMESTAMP, LONG, STRING, STRING]) as sync_log

// 记录同步日志
def logSync(sourceTable, targetTable, count, status, errorMsg = "") {
    insert into sync_log values (
        sourceTable, targetTable, now(), count, status, errorMsg
    )
}

6.2 监控函数

python 复制代码
// 同步监控
def monitorSync() {
    print("=== 数据同步监控 ===")
    
    // 最近同步记录
    recentSyncs = select top 10 * from sync_log order by sync_time desc
    print(recentSyncs)
    
    // 失败记录
    failures = select count(*) as cnt from sync_log where status = "FAILED"
    print("失败次数: " + string(failures.cnt))
}

monitorSync()

七、实战案例

7.1 MySQL到DolphinDB完整同步

python 复制代码
// ========== MySQL到DolphinDB完整同步 ==========

// 1. 加载插件
loadPlugin("mysql")

// 2. 连接MySQL
mysqlConn = mysql::connect("localhost", 3306, "root", "password", "iot_db")

// 3. 创建DolphinDB表
db = database("dfs://sync_db", VALUE, 1..1000)
schema = table(1:0, 
    `device_id`timestamp`temperature`humidity`pressure,
    [INT, TIMESTAMP, DOUBLE, DOUBLE, DOUBLE])
db.createPartitionedTable(schema, `sensor_data, `device_id)

// 4. 全量同步
def fullSync(conn, tableName) {
    print("开始全量同步: " + tableName)
    
    data = mysql::query(conn, "SELECT * FROM " + tableName)
    loadTable("dfs://sync_db", tableName).append!(data)
    
    print("同步完成: " + string(data.rows()) + " 条")
    logSync(tableName, tableName, data.rows(), "SUCCESS")
}

// 5. 增量同步
def incrementalSync(conn, tableName) {
    print("开始增量同步: " + tableName)
    
    lastTime = exec max(timestamp) from loadTable("dfs://sync_db", tableName)
    sql = "SELECT * FROM " + tableName + " WHERE timestamp > '" + string(lastTime) + "'"
    
    data = mysql::query(conn, sql)
    if (data.rows() > 0) {
        loadTable("dfs://sync_db", tableName).append!(data)
        print("增量同步: " + string(data.rows()) + " 条")
    }
    
    logSync(tableName, tableName, data.rows(), "SUCCESS")
}

// 6. 执行同步
fullSync(mysqlConn, "sensor_data")

// 7. 定时增量同步
scheduleJob("incremental_sync", "增量同步",
    def() { incrementalSync(mysqlConn, "sensor_data") },
    00:10, 2024.01.01, 2030.12.31, 'D')

print("MySQL到DolphinDB同步系统启动完成")

八、总结

本文详细介绍了DolphinDB数据库同步:

  1. 同步方案:全量同步、增量同步、实时同步
  2. MySQL同步:连接、全量、增量
  3. PostgreSQL同步:连接、全量、增量
  4. 数据转换:类型映射、数据清洗、数据验证
  5. 实时同步:Binlog、CDC
  6. 同步监控:状态表、监控函数

思考题

  1. 如何选择合适的同步方案?
  2. 如何保证数据同步的一致性?
  3. 如何处理同步失败问题?

参考资料