Running DataX and datax-web Locally on Windows 10

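Setting up DataX

Prerequisites

You need a JDK (the run below used JDK 1.8), Python available as `python`, and a MySQL server on localhost:3306.

Download the prebuilt DataX package:

https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz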

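Go to the download directory and open PowerShell there.

Extract the archive:

```powershell
tar -xzf datax.tar.gz
```

This produces a `datax` directory. The launcher script is `datax\bin\datax.py`, and job files can go in `datax\job`.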
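If `tar` is not available, the download and extraction can be done with Python's standard library instead. `tar` ships with Windows 10 from version 1803 onward, so this mainly matters on older builds. The sketch below assumes the archive's top-level folder is `datax`, which matches the paths used later in this article. `download` and `extract` are hypothetical helper names.

```python
import tarfile
import urllib.request
from pathlib import Path

DATAX_URL = "https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz"


def download(url: str, dest: Path) -> Path:
    """Download url to dest unless the file is already there."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive: Path, target_dir: Path) -> Path:
    """Extract a .tar.gz into target_dir and return the expected datax.py path."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ and recent security releases: refuse unsafe member paths
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    # Assumption: the archive's top-level directory is "datax"
    return target_dir / "datax" / "bin" / "datax.py"
```

Usage: `extract(download(DATAX_URL, Path("datax.tar.gz")), Path("."))` returns the launcher path, and you can check it with `.exists()`.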

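Configure the Windows command line for UTF-8

Without this, Chinese text in the output of the sync command is garbled.

There are two simple ways to make the log readable:

  1. Change the encoding of the current window only (temporary)
    Open cmd and first run

```bat
chcp 65001
```

then run (from the `datax\bin` directory)

```bat
python datax.py C:\Users\longz\Downloads\1.json
```

The log is now written as UTF-8, and Chinese text displays correctly.

  2. Make it permanent (optional)
    On Windows 10/11 you can switch the system-wide console encoding to UTF-8:
    • Settings → Time & Language → Region → Related settings → Administrative language settings → Change system locale → check "Beta: Use Unicode UTF-8 for worldwide language support" → restart.
      After that, every cmd/PowerShell window uses UTF-8 by default, and the boxes and garbled characters no longer appear.

Alternatively, open the same dialog from Control Panel (Region → Administrative → Change system locale), check the box, and restart.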
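The garbling happens because the log bytes are UTF-8 (which is why `chcp 65001` fixes them), while a Chinese-locale console decodes them with code page 936 (GBK). If you launch DataX from a script, you can avoid depending on the console setting: capture the output and decode it as UTF-8 yourself.

The sketch below assumes the log output is UTF-8. `build_command`, `decode_log` and `run_datax` are hypothetical helpers, and the script uses the same Python interpreter that runs it.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple


def build_command(datax_home: Path, job_file: Path) -> List[str]:
    """Equivalent of running `python datax.py <job.json>` from datax/bin."""
    return [sys.executable, str(datax_home / "bin" / "datax.py"), str(job_file)]


def decode_log(raw: bytes) -> str:
    """Decode DataX output as UTF-8 rather than the console's code page (e.g. GBK/936)."""
    return raw.decode("utf-8", errors="replace")


def run_datax(datax_home: Path, job_file: Path) -> Tuple[int, str]:
    """Run a DataX job and return (exit code, log text decoded as UTF-8)."""
    proc = subprocess.run(
        build_command(datax_home, job_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the whole log is captured
    )
    return proc.returncode, decode_log(proc.stdout)
```

For example, `run_datax(Path(r"D:\Information_Technology\worksapce_tool\datax-new\datax"), Path(r"D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json"))` returns the exit code and the log. You can then save the log to a UTF-8 file regardless of the console's code page.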

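Prepare test data

Create two databases, test1 and test2, each with a table of the same name, `user`:

```sql
CREATE DATABASE IF NOT EXISTS test1 DEFAULT CHARSET utf8mb4;
CREATE DATABASE IF NOT EXISTS test2 DEFAULT CHARSET utf8mb4;

-- Run the following in both test1 and test2 (USE test1; ... USE test2; ...)
CREATE TABLE IF NOT EXISTS `user` (
  `id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `username` VARCHAR(50) NOT NULL,
  `password` VARCHAR(100) NOT NULL,
  `email` VARCHAR(100) NOT NULL,
  `created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```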

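Insert test data into the test1 database.

I use a stored procedure for this; calling it generates the test data.

```sql
USE test1;

DELIMITER $$

CREATE PROCEDURE GenerateUserTestData(IN num_records INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE v_username VARCHAR(50);
  DECLARE v_password VARCHAR(100);
  DECLARE v_email VARCHAR(100);

  WHILE i <= num_records DO
    -- Random username, e.g. User_12345
    SET v_username = CONCAT('User_', FLOOR(10000 + RAND() * 90000));
    
    -- Random password (simplified example; real passwords should be hashed)
    SET v_password = CONCAT('Pass_', FLOOR(100000 + RAND() * 900000));
    
    -- Random email, e.g. user12345@example.com
    SET v_email = CONCAT('user', FLOOR(10000 + RAND() * 90000), '@example.com');
    
    INSERT INTO `user` (username, password, email)
    VALUES (v_username, v_password, v_email);
    
    -- Commit every 1000 rows to keep transactions small
    -- (only has an effect when autocommit is off; with MySQL's default autocommit=1 each INSERT commits on its own)
    IF i % 1000 = 0 THEN
      COMMIT;
    END IF;
    
    SET i = i + 1;
  END WHILE;
END$$

DELIMITER ;

-- Generate 100 rows (the sync run below reads 100 records)
CALL GenerateUserTestData(100);
```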
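For comparison, the same kind of rows can be generated on the client side, for example to load them with a driver's batch insert. This sketch mirrors the procedure's value patterns and its 1,000-row batching. It deliberately leaves out the MySQL connection, and `generate_user_rows`, `batched` and `INSERT_SQL` are hypothetical names.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str, str]  # (username, password, email)

INSERT_SQL = "INSERT INTO `user` (username, password, email) VALUES (%s, %s, %s)"


def generate_user_rows(num_records: int, rng: random.Random) -> Iterator[Row]:
    """Yield rows following the same patterns as GenerateUserTestData."""
    for _ in range(num_records):
        # FLOOR(10000 + RAND() * 90000) -> integer in [10000, 99999]
        username = f"User_{rng.randint(10000, 99999)}"
        # FLOOR(100000 + RAND() * 900000) -> integer in [100000, 999999]
        password = f"Pass_{rng.randint(100000, 999999)}"
        email = f"user{rng.randint(10000, 99999)}@example.com"
        yield (username, password, email)


def batched(rows: Iterable[Row], size: int = 1000) -> Iterator[List[Row]]:
    """Group rows into lists of `size`, like the procedure's commit interval."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With a DB-API driver that uses `%s` placeholders, such as PyMySQL, each batch could be sent with `cursor.executemany(INSERT_SQL, batch)` followed by `conn.commit()`.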

Once the data has been generated, the goal is to use DataX to sync the data in test1 into test2.

Create the DataX job file (here it is saved as `datax\job\test-user.json`):

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://localhost:3306/test1"],
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "splitPk": ""
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://localhost:3306/test2",
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```
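The run below logs two warnings about the `"*"` column lists, plus repeated SSL warnings from the MySQL driver, whose own message suggests setting `useSSL=false`. A variant avoids both: list the columns explicitly and append `useSSL=false` to the JDBC URLs. The sketch below generates such a job file from Python. `build_mysql_job` and `write_job` are hypothetical helpers, not part of DataX, and the defaults copy the values used above.

```python
import json
from pathlib import Path
from typing import List


def build_mysql_job(src_db: str, dst_db: str, table: str, columns: List[str],
                    user: str = "root", password: str = "root",
                    host: str = "localhost:3306", channel: int = 1) -> dict:
    """Build a mysqlreader -> mysqlwriter DataX job with explicit columns."""
    def url(db: str) -> str:
        return f"jdbc:mysql://{host}/{db}?useSSL=false"

    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": user,
                        "password": password,
                        # mysqlreader takes jdbcUrl as a list
                        "connection": [{"jdbcUrl": [url(src_db)], "table": [table]}],
                        "column": columns,
                        "splitPk": "",
                    },
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "username": user,
                        "password": password,
                        # mysqlwriter takes jdbcUrl as a single string
                        "connection": [{"jdbcUrl": url(dst_db), "table": [table]}],
                        "column": columns,
                        "writeMode": "insert",
                    },
                },
            }],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job: dict, path: Path) -> None:
    """Write the job as UTF-8 JSON, the format datax.py reads."""
    path.write_text(json.dumps(job, indent=2, ensure_ascii=False), encoding="utf-8")
```

Because the reader also copies `id`, rerunning the job into a non-empty test2 with `writeMode: insert` runs into primary-key conflicts. mysqlwriter's `writeMode` also accepts `replace` for that case.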

Run the sync command from the `datax\bin` directory:

```bat
python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json
```

When the job finishes, the output looks like this:

```text
D:\Information_Technology\worksapce_tool\datax-new\datax\bin>python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2025-08-08 14:00:36.524 [main] INFO  MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2025-08-08 14:00:36.526 [main] INFO  MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2025-08-08 14:00:36.536 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2025-08-08 14:00:36.541 [main] INFO  Engine - the machine info  =>

        osInfo: Windows 10 amd64 10.0
        jvmInfo:        Oracle Corporation 1.8 25.341-b10
        cpu num:        16

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size
        PS Eden Space                  | 256.00MB                       | 256.00MB
        Code Cache                     | 240.00MB                       | 2.44MB
        Compressed Class Space         | 1,024.00MB                     | 0.00MB
        PS Survivor Space              | 42.50MB                        | 42.50MB
        PS Old Gen                     | 683.00MB                       | 683.00MB
        Metaspace                      | -0.00MB                        | 0.00MB


2025-08-08 14:00:36.552 [main] INFO  Engine -
{
        "content":[
                {
                        "reader":{
                                "name":"mysqlreader",
                                "parameter":{
                                        "username":"root",
                                        "password":"****",
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:mysql://localhost:3306/test1"
                                                        ],
                                                        "table":[
                                                                "user"
                                                        ]
                                                }
                                        ],
                                        "column":[
                                                "*"
                                        ],
                                        "splitPk":""
                                }
                        },
                        "writer":{
                                "name":"mysqlwriter",
                                "parameter":{
                                        "username":"root",
                                        "password":"****",
                                        "connection":[
                                                {
                                                        "jdbcUrl":"jdbc:mysql://localhost:3306/test2",
                                                        "table":[
                                                                "user"
                                                        ]
                                                }
                                        ],
                                        "column":[
                                                "*"
                                        ],
                                        "writeMode":"insert"
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":1
                }
        }
}

2025-08-08 14:00:36.565 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2025-08-08 14:00:36.566 [main] INFO  JobContainer - DataX jobContainer starts job.
2025-08-08 14:00:36.566 [main] INFO  JobContainer - Set jobId = 0
Fri Aug 08 14:00:36 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:42.909 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2025-08-08 14:00:42.910 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.091 [job-0] INFO  OriginalConfPretreatmentUtil - table:[user] all columns:[
id,username,password,email,created_at
].
2025-08-08 14:00:43.091 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2025-08-08 14:00:43.092 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,username,password,email,created_at) VALUES(?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://localhost:3306/test2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2025-08-08 14:00:43.093 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2025-08-08 14:00:43.093 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2025-08-08 14:00:43.093 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2025-08-08 14:00:43.094 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2025-08-08 14:00:43.094 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2025-08-08 14:00:43.097 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2025-08-08 14:00:43.097 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2025-08-08 14:00:43.115 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2025-08-08 14:00:43.117 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2025-08-08 14:00:43.119 [job-0] INFO  JobContainer - Running by standalone Mode.
2025-08-08 14:00:43.124 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2025-08-08 14:00:43.126 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2025-08-08 14:00:43.127 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2025-08-08 14:00:43.136 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2025-08-08 14:00:43.140 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from user
] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.179 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from user
] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2025-08-08 14:00:43.456 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[321]ms
2025-08-08 14:00:43.456 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2025-08-08 14:00:53.142 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.142 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2025-08-08 14:00:53.142 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2025-08-08 14:00:53.143 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2025-08-08 14:00:53.143 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2025-08-08 14:00:53.144 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\Information_Technology\worksapce_tool\datax-new\datax\hook
2025-08-08 14:00:53.144 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 1                  | 1                  | 1                  | 0.016s             | 0.016s             | 0.016s
                 PS Scavenge          | 1                  | 1                  | 1                  | 0.008s             | 0.008s             | 0.008s

2025-08-08 14:00:53.144 [job-0] INFO  JobContainer - PerfTrace not enable!
2025-08-08 14:00:53.145 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.146 [job-0] INFO  JobContainer -
任务启动时刻                    : 2025-08-08 14:00:36
任务结束时刻                    : 2025-08-08 14:00:53
任务总计耗时                    :                 16s
任务平均流量                    :              519B/s
记录写入速度                    :             10rec/s
读出记录总数                    :                 100
读写失败总数                    :                   0
```
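To check a run from a script rather than by eye, you can parse the `StandAloneJobContainerCommunicator` summary line. The sketch below assumes the line format shown in the log above, and `parse_summary` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Matches e.g. "Total 100 records, 5192 bytes | ... | Error 0 records, 0 bytes"
SUMMARY_RE = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes"
    r".*?Error (?P<error_records>\d+) records, (?P<error_bytes>\d+) bytes"
)


def parse_summary(log_text: str) -> Optional[Dict[str, int]]:
    """Return the counts from the last summary line, or None if there is none."""
    matches = list(SUMMARY_RE.finditer(log_text))
    if not matches:
        return None
    return {name: int(value) for name, value in matches[-1].groupdict().items()}
```

For example, a wrapper script could treat `error_records > 0`, or a missing summary, as a failed sync.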

Setting up datax-web

Download the source:

```bash
git clone https://github.com/WeiYe-Jing/datax-web.git
```

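Create the database

Run the `datax_web.sql` file under `bin/db`. Note that in older versions the update statements hard-code the database name, so adjust them if your database is named differently.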

Modify the project configuration

1. Edit `resources/application.yml` in datax-admin:

```yaml
spring:
  # data source
  datasource:
    username: root
    password: root
    url: jdbc:mysql://localhost:3306/datax_web?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
    driver-class-name: com.mysql.jdbc.Driver
```

Update the data source settings; only MySQL is supported at the moment.

```yaml
# MyBatis-Plus SQL log level
logging:
  level:
    com.wugui.datax.admin.mapper: error
  path: ./data/applogs/admin
```

Change the log path (`path`) as needed.

```yaml
spring:
  # datax-web email
  mail:
    host: smtp.qq.com
    port: 25
    username: xxx@qq.com
    password: xxx
    properties:
      mail:
        smtp:
          auth: true
          starttls:
            enable: true
            required: true
        socketFactory:
          class: javax.net.ssl.SSLSocketFactory
```

Update the mail settings (optional; leave them unchanged if you don't need email).

2. Edit `resources/application.yml` in datax-executor:

```yaml
# log config
logging:
  config: classpath:logback.xml
  path: ./data/applogs/executor/jobhandler
```

Change the log path (`path`) as needed.

plain 复制代码
datax:
  job:
    admin:
      ### datax-web admin address
      addresses: http://127.0.0.1:8080
    executor:
      appname: datax-executor
      ip:
      port: 9999
      ### job log path
      logpath: ./data/applogs/executor/jobhandler
      ### job log retention days
      logretentiondays: 30
  executor:
    jsonpath: D:\\Information_Technology\\worksapce_tool\\tmp\\executor\\json\\

  pypath: D:\\Information_Technology\\worksapce_tool\\datax-new\\datax\\bin\\datax.py

修改datax.job配置

  • admin.addresses datax_admin部署地址,如调度中心集群部署存在多个地址则用逗号分隔,执行器将会使用该地址进行"执行器心跳注册"和"任务结果回调";
  • executor.appname 执行器AppName,每个执行器机器集群的唯一标示,执行器心跳注册分组依据;
  • executor.ip 默认为空表示自动获取IP,多网卡时可手动设置指定IP,该IP不会绑定Host仅作为通讯实用;地址信息用于 "执行器注册" 和 "调度中心请求并触发任务";
  • executor.port 执行器Server端口号,默认端口为9999,单机部署多个执行器时,注意要配置不同执行器端口;
  • executor.logpath 执行器运行日志文件存储磁盘路径,需要对该路径拥有读写权限;
  • executor.logretentiondays 执行器日志文件保存天数,过期日志自动清理, 限制值大于等于3时生效; 否则, 如-1, 关闭自动清理功能;
  • executor.jsonpath datax json临时文件保存路径
  • pypath DataX启动脚本地址,例如:xxx/datax/bin/datax.py(这个路径是上面搭建 dataX 已经创建好的启动脚本)
    如果系统配置DataX环境变量(DATAX_HOME),logpath、jsonpath、pypath可不配,log文件和临时json存放在环境变量路径下。
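Before starting the executor, it can be worth checking three things: that `pypath` points at a real file, that `jsonpath` can be created, and that the executor port is free.

The sketch below is a hypothetical pre-flight helper, not part of datax-web. Note that it creates the `jsonpath` directory if it is missing.

```python
import socket
from pathlib import Path
from typing import List


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something already accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_executor_config(pypath: str, jsonpath: str, port: int = 9999) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []
    # pypath must point at the datax.py launcher from the DataX setup
    if not Path(pypath).is_file():
        problems.append(f"pypath is not a file: {pypath}")
    # jsonpath must be a directory the executor can write temporary job JSON into
    try:
        Path(jsonpath).mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        problems.append(f"jsonpath cannot be created: {jsonpath} ({exc})")
    # the executor's own server needs its port (9999 by default) to be free
    if port_in_use(port):
        problems.append(f"port {port} is already in use")
    return problems
```

For example: `check_executor_config(r"D:\Information_Technology\worksapce_tool\datax-new\datax\bin\datax.py", r"D:\Information_Technology\worksapce_tool\tmp\executor\json")`.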

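Start the project

In a local IntelliJ IDEA environment:
  1. Run DataXAdminApplication in datax-admin.
  2. Run DataXExecutorApplication in datax-executor.

Once admin has started, its log prints three addresses: two API documentation URLs and one front-end page URL.

When both applications have started successfully, open the page (default administrator username: admin, password: 123456):
http://localhost:8080/index.html#/dashboard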
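If you start the two applications from a script instead of IDEA, you may want to wait for them before opening the page. The admin listens on 8080 and the executor on 9999, per the configuration above. The sketch below uses plain TCP connection checks, which only show that something is listening, not that the application is fully ready. `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until host:port accepts a TCP connection; False once timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("127.0.0.1", 8080)` and `wait_for_port("127.0.0.1", 9999)` before opening the dashboard URL.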

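Walkthrough

Project Management (项目管理) → Add Project (添加项目)
Configure the data sources
Create an executor
Build a task to generate the sync JSON

Click Build (构建). datax-web generates the DataX sync JSON automatically; copy it and go back to the menu.

Create a task

Fill in the task form and paste the JSON into it. Then change the `byte` value under `setting.speed` to 0. This is likely needed because a positive total byte limit requires a positive per-channel byte limit, and this DataX build sets the per-channel limit to -1 (see `byte_speed_limit to -1` in the log above). Setting `byte` to 0 turns the total limit off.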
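The `byte` edit can also be applied to the generated JSON before you paste it. The sketch below assumes the JSON follows DataX's `job.setting.speed` layout (check your own output), and `disable_byte_limit` is a hypothetical helper.

```python
import json


def disable_byte_limit(job_json: str) -> str:
    """Return the DataX job JSON with job.setting.speed.byte set to 0."""
    job = json.loads(job_json)
    # Create any missing levels so the function also works on minimal jobs
    speed = job.setdefault("job", {}).setdefault("setting", {}).setdefault("speed", {})
    speed["byte"] = 0
    return json.dumps(job, indent=2, ensure_ascii=False)
```

All other keys, such as `channel` and the `content` section, are left unchanged.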

The task runs successfully.