[Hands-On] What DataX Is and When to Use It
Introduction
DataX is an open-source data synchronization tool from Alibaba Group, used to synchronize and migrate data between different data sources. It provides a framework that supports a wide range of sources through plugins, such as MySQL, Oracle, HDFS, and HBase. Its core design principles are simplicity, reliability, and efficiency, and it aims to solve the complex data synchronization problems found in big data environments.
Features of DataX
- Plugin-based: DataX uses a plugin architecture, so support for additional data sources is easy to add.
- High performance: DataX is optimized for moving large volumes of data, supporting concurrent multi-threaded reads and writes as well as batched transfers.
- Stability: DataX provides thorough error handling and retry mechanisms to keep synchronization reliable.
- Ease of use: DataX ships with a simple command-line tool driven by JSON configuration files, so defining a synchronization job takes little effort.
DataX Use Cases
1. Data migration
When an enterprise needs to move data from a legacy storage system to a new one, DataX serves as an efficient migration tool. By configuring the appropriate source and target plugins, it moves data between different systems with minimal setup.
2. Data backup
DataX can also be used for backups. An enterprise can periodically sync data from the production environment to a backup environment to keep the data safe; DataX's performance and stability keep the backup process fast and dependable.
3. Data synchronization
DataX can synchronize data across different data sources. For example, an enterprise may want to keep data in a MySQL database and HBase in sync to power online analytics and queries. Note that DataX runs batch jobs, so such cross-source synchronization is scheduled rather than true real-time streaming.
4. Data exchange
In a distributed architecture, different systems often need to share data. DataX can act as the exchange channel, copying data from one system to another; with the right source and target plugins it covers a wide variety of data exchange needs.
5. Data integration
For scenarios that integrate multiple data sources for big data analytics, DataX also provides strong support. An enterprise can use it to consolidate data from many sources into a single unified store, ready for downstream analysis and mining.
Building from Source
Clone the source code and build it:
```bash
git clone https://github.com/alibaba/DataX.git
cd DataX
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
```
If the build fails, the errors are almost always plugin-related. Simply comment the failing plugin modules out of `pom.xml`, e.g. `otsreader` and `otswriter`. The resulting configuration:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-all</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-core</artifactId>
<version>1.3</version>
</dependency>
</dependencies>
<name>datax-all</name>
<packaging>pom</packaging>
<properties>
<jdk-version>1.8</jdk-version>
<datax-project-version>0.0.1-SNAPSHOT</datax-project-version>
<commons-lang3-version>3.3.2</commons-lang3-version>
<commons-configuration-version>1.10</commons-configuration-version>
<commons-cli-version>1.2</commons-cli-version>
<fastjson-version>1.1.46.sec01</fastjson-version>
<guava-version>16.0.1</guava-version>
<diamond.version>3.7.2.1-SNAPSHOT</diamond.version>
<!-- slf4j 1.7.10 and logback-classic 1.0.13 go hand in hand -->
<slf4j-api-version>1.7.10</slf4j-api-version>
<logback-classic-version>1.0.13</logback-classic-version>
<commons-io-version>2.4</commons-io-version>
<junit-version>4.11</junit-version>
<tddl.version>5.1.22-1</tddl.version>
<swift-version>1.0.0</swift-version>
<project-sourceEncoding>UTF-8</project-sourceEncoding>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<maven.compiler.encoding>UTF-8</maven.compiler.encoding>
</properties>
<modules>
<module>common</module>
<module>core</module>
<module>transformer</module>
<!-- reader -->
<module>mysqlreader</module>
<module>drdsreader</module>
<module>sqlserverreader</module>
<module>postgresqlreader</module>
<module>oraclereader</module>
<module>odpsreader</module>
<module>otsreader</module>
<!--<module>otsstreamreader</module>-->
<module>txtfilereader</module>
<module>hdfsreader</module>
<module>streamreader</module>
<module>ossreader</module>
<module>ftpreader</module>
<module>mongodbreader</module>
<module>rdbmsreader</module>
<module>hbase11xreader</module>
<module>hbase094xreader</module>
<module>tsdbreader</module>
<module>opentsdbreader</module>
<module>cassandrareader</module>
<!-- writer -->
<module>mysqlwriter</module>
<module>drdswriter</module>
<module>odpswriter</module>
<module>txtfilewriter</module>
<module>ftpwriter</module>
<module>hdfswriter</module>
<module>streamwriter</module>
<!--<module>otswriter</module>-->
<module>oraclewriter</module>
<module>sqlserverwriter</module>
<module>postgresqlwriter</module>
<module>osswriter</module>
<module>mongodbwriter</module>
<module>adswriter</module>
<module>ocswriter</module>
<module>rdbmswriter</module>
<module>hbase11xwriter</module>
<module>hbase094xwriter</module>
<module>hbase11xsqlwriter</module>
<module>hbase11xsqlreader</module>
<module>elasticsearchwriter</module>
<module>tsdbwriter</module>
<module>adbpgwriter</module>
<module>gdbwriter</module>
<module>cassandrawriter</module>
<!-- common support module -->
<module>plugin-rdbms-util</module>
<module>plugin-unstructured-storage-util</module>
<module>hbase20xsqlreader</module>
<module>hbase20xsqlwriter</module>
</modules>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>${commons-lang3-version}</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>${fastjson-version}</version>
</dependency>
<!--<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava-version}</version>
</dependency>-->
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>${commons-io-version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${slf4j-api-version}</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>${logback-classic-version}</version>
</dependency>
<dependency>
<groupId>com.taobao.tddl</groupId>
<artifactId>tddl-client</artifactId>
<version>${tddl.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
<exclusion>
<groupId>com.taobao.diamond</groupId>
<artifactId>diamond-client</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.taobao.diamond</groupId>
<artifactId>diamond-client</artifactId>
<version>${diamond.version}</version>
</dependency>
<dependency>
<groupId>com.alibaba.search.swift</groupId>
<artifactId>swift_client</artifactId>
<version>${swift-version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junit-version}</version>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-all</artifactId>
<version>1.9.5</version>
<scope>test</scope>
</dependency>
</dependencies>
</dependencyManagement>
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<finalName>datax</finalName>
<descriptors>
<descriptor>package.xml</descriptor>
</descriptors>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>${jdk-version}</source>
<target>${jdk-version}</target>
<encoding>${project-sourceEncoding}</encoding>
</configuration>
</plugin>
</plugins>
</build>
</project>
```
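If the build succeeds, the packaged distribution is written under `target/` (the DataX README places it at `target/datax/datax`; the exact path may differ across versions):

```bash
# Inspect the freshly built bundle (path per the DataX README)
ls target/datax/datax
```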
If you prefer not to build from source, download the prebuilt package instead:
```bash
wget -c "http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz"
```
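Then extract the archive and switch into the resulting directory:

```bash
tar -zxvf datax.tar.gz
cd datax
```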
The tool is then ready to use. Its directory layout:
```bash
# tree -L 2 datax
datax
├── bin # executables
│ ├── datax.py
│ ├── dxprof.py
│ └── perftrace.py
├── conf # configuration files
│ ├── core.json
│ ├── logback.xml
│ ├── mysql01.json
│ ├── mysql01.json.splitPk
│ ├── mysql_writer.json
│ └── mysql_writer.json.bak
├── job
│ ├── job.json
│ └── stream2stream.json
├── lib
│ ├── commons-beanutils-1.9.2.jar
│ ├── commons-cli-1.2.jar
│ ├── commons-codec-1.9.jar
│ ├── commons-collections-3.2.1.jar
│ ├── commons-configuration-1.10.jar
│ ├── commons-io-2.4.jar
│ ├── commons-lang-2.6.jar
│ ├── commons-lang3-3.3.2.jar
│ ├── commons-logging-1.1.1.jar
│ ├── commons-math3-3.1.1.jar
│ ├── datax-common-0.0.1-SNAPSHOT.jar
│ ├── datax-core-0.0.1-SNAPSHOT.jar
│ ├── datax-transformer-0.0.1-SNAPSHOT.jar
│ ├── fastjson-1.1.46.sec01.jar
│ ├── fluent-hc-4.4.jar
│ ├── groovy-all-2.1.9.jar
│ ├── hamcrest-core-1.3.jar
│ ├── httpclient-4.4.jar
│ ├── httpcore-4.4.jar
│ ├── janino-2.5.16.jar
│ ├── logback-classic-1.0.13.jar
│ ├── logback-core-1.0.13.jar
│ └── slf4j-api-1.7.10.jar
├── log
│ ├── 2019-11-28
│ └── 2020-04-22
├── log_perf
│ ├── 2019-11-28
│ └── 2020-04-22
├── plugin
│ ├── reader
│ └── writer
├── script
│ └── Readme.md
└── tmp
└── readme.txt
15 directories, 36 files
```
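Before writing any configuration of your own, the installation can be sanity-checked with the bundled sample job (`job/job.json` in the tree above), which should need no external data source since it is a self-contained stream-to-stream demo:

```bash
# Run the demo job that ships with the release
python bin/datax.py job/job.json
```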
Generating a Configuration File
Create a job configuration file (in JSON format):
```bash
# Generate a configuration template from the command line
python datax.py -r oraclereader -w mysqlwriter > oracle2mysql2.json
# Start the sync
python datax.py ./oracle2mysql2.json
```
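The names accepted by `-r` and `-w` are simply the plugin directories shipped with the release:

```bash
# List every reader and writer plugin available in this installation
ls plugin/reader plugin/writer
```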
A Few Examples
Reading from MySQL and Printing to Stdout
First, the configuration file:
```json
{
"job": {
"setting": {
"speed": {
"channel": 3
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "12345678",
"column": [
"id",
"name"
],
"connection": [
{
"table": [
"item"
],
"jdbcUrl": [
"jdbc:mysql://127.0.0.1:3306/it"
]
}
]
}
},
"writer": {
"name": "streamwriter",
"parameter": {
"print":true
}
}
}
]
}
}
```
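A note on the `setting` block above: `channel` requests the degree of read/write parallelism for the transfer, while `errorLimit.record` and `errorLimit.percentage` cap the absolute number and the proportion of dirty records tolerated; with `record: 0`, this job fails on the first bad record.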
Run the job:
```bash
[root@ip-192-168-2-56 datax]# python bin/datax.py conf/mysql01.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2020-04-22 13:29:09.124 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2020-04-22 13:29:09.190 [main] INFO Engine - the machine info =>
osInfo: Oracle Corporation 1.8 25.121-b13
jvmInfo: Linux amd64 3.10.0-1062.9.1.el7.x86_64
cpu num: 4
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2020-04-22 13:29:09.211 [main] INFO Engine -
{
"content":[
{
"reader":{
"name":"mysqlreader",
"parameter":{
"column":[
"id",
"name"
],
"connection":[
{
"jdbcUrl":[
"jdbc:mysql://127.0.0.1:3306/it"
],
"table":[
"item"
]
}
],
"password":"************",
"username":"root"
}
},
"writer":{
"name":"streamwriter",
"parameter":{
"print":true
}
}
}
],
"setting":{
"errorLimit":{
"percentage":0.02,
"record":0
},
"speed":{
"channel":3
}
}
}
2020-04-22 13:29:09.235 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-04-22 13:29:09.237 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-04-22 13:29:09.238 [main] INFO JobContainer - DataX jobContainer starts job.
2020-04-22 13:29:09.240 [main] INFO JobContainer - Set jobId = 0
2020-04-22 13:29:09.949 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://127.0.0.1:3306/it?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2020-04-22 13:29:10.086 [job-0] INFO OriginalConfPretreatmentUtil - table:[item] has columns:[id,avatar,dept,description,link,name,type].
2020-04-22 13:29:10.107 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2020-04-22 13:29:10.108 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2020-04-22 13:29:10.108 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2020-04-22 13:29:10.110 [job-0] INFO JobContainer - jobContainer starts to do split ...
2020-04-22 13:29:10.110 [job-0] INFO JobContainer - Job set Channel-Number to 3 channels.
2020-04-22 13:29:10.115 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2020-04-22 13:29:10.116 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2020-04-22 13:29:10.144 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2020-04-22 13:29:10.149 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2020-04-22 13:29:10.152 [job-0] INFO JobContainer - Running by standalone Mode.
2020-04-22 13:29:10.163 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-04-22 13:29:10.170 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-04-22 13:29:10.170 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2020-04-22 13:29:10.192 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2020-04-22 13:29:10.198 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select id,name from item
] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/it?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2020-04-22 13:29:10.233 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select id,name from item
] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/it?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
1 Confluence
2 Jira
3 Gitlab
4 XShell
2020-04-22 13:29:10.293 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[107]ms
2020-04-22 13:29:10.294 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2020-04-22 13:29:20.175 [job-0] INFO StandAloneJobContainerCommunicator - Total 4 records, 30 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.044s | Percentage 100.00%
2020-04-22 13:29:20.175 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2020-04-22 13:29:20.176 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do post work.
2020-04-22 13:29:20.176 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2020-04-22 13:29:20.177 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2020-04-22 13:29:20.178 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /root/datax/hook
2020-04-22 13:29:20.180 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
2020-04-22 13:29:20.180 [job-0] INFO JobContainer - PerfTrace not enable!
2020-04-22 13:29:20.180 [job-0] INFO StandAloneJobContainerCommunicator - Total 4 records, 30 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.044s | Percentage 100.00%
2020-04-22 13:29:20.182 [job-0] INFO JobContainer -
任务启动时刻 : 2020-04-22 13:29:09
任务结束时刻 : 2020-04-22 13:29:20
任务总计耗时 : 10s
任务平均流量 : 3B/s
记录写入速度 : 0rec/s
读出记录总数 : 4
读写失败总数 : 0
```
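The console output is also persisted on disk; as the directory tree earlier shows, each run leaves a log under `log/` (and a performance trace under `log_perf/`), grouped into per-date directories:

```bash
# Job logs are grouped by run date
ls log/
```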
Writing Generated Data to MySQL
First, the database schema:
```mysql
MariaDB [(none)]> use test;
MariaDB [test]> show tables;
+----------------+
| Tables_in_test |
+----------------+
| test |
+----------------+
1 row in set (0.00 sec)
MariaDB [test]> desc test;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(32) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
```
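For reference, a table matching this `DESC` output can be created as follows (the credentials are placeholders for your own):

```bash
# Create the demo database and table targeted by the writer job below
mysql -uroot -p -e "
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE IF NOT EXISTS test.test (
  id   INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(32) DEFAULT NULL
);"
```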
Next, the configuration file. (The `#` comments below are annotations for this article only; JSON has no comment syntax, so strip them before running the job.)
```json
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "streamreader",
"parameter": {
"column" : [
{
"value": "DataX", # 这里是要插入的数据的值
"type": "string" # 数据类型
}
],
"sliceRecordCount": 1000 # 插入多少条
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"writeMode": "replace",
"username": "root",
"password": "123456789",
"column": [
"name" # 对应数据库的哪个字段
],
"session": [
"set session sql_mode='ANSI'"
],
"preSql": [
"delete from test"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8",
"table": [
"test"
]
}
]
}
}
}
]
}
}
```
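Two details in this job are worth calling out: `preSql` (`delete from test`) empties the table before every run, and `writeMode: replace` makes the writer emit `REPLACE INTO` statements (visible in the log below), so re-running the job does not pile up duplicate rows.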
Run it:
```bash
# python bin/datax.py conf/mysql_writer.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2020-04-22 13:53:55.201 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2020-04-22 13:53:55.209 [main] INFO Engine - the machine info =>
osInfo: Oracle Corporation 1.8 25.121-b13
jvmInfo: Linux amd64 3.10.0-1062.9.1.el7.x86_64
cpu num: 4
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2020-04-22 13:53:55.230 [main] INFO Engine -
{
"content":[
{
"reader":{
"name":"streamreader",
"parameter":{
"column":[
{
"type":"string",
"value":"DataX"
}
],
"sliceRecordCount":1000
}
},
"writer":{
"name":"mysqlwriter",
"parameter":{
"column":[
"name"
],
"connection":[
{
"jdbcUrl":"jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8",
"table":[
"test"
]
}
],
"password":"************",
"preSql":[
"delete from test"
],
"session":[
"set session sql_mode='ANSI'"
],
"username":"root",
"writeMode":"replace"
}
}
}
],
"setting":{
"speed":{
"channel":1
}
}
}
2020-04-22 13:53:55.249 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-04-22 13:53:55.251 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-04-22 13:53:55.251 [main] INFO JobContainer - DataX jobContainer starts job.
2020-04-22 13:53:55.253 [main] INFO JobContainer - Set jobId = 0
2020-04-22 13:53:55.642 [job-0] INFO OriginalConfPretreatmentUtil - table:[test] all columns:[
id,name
].
2020-04-22 13:53:55.656 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
replace INTO %s (name) VALUES(?)
], which jdbcUrl like:[jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2020-04-22 13:53:55.657 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2020-04-22 13:53:55.658 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do prepare work .
2020-04-22 13:53:55.659 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2020-04-22 13:53:55.667 [job-0] INFO CommonRdbmsWriter$Job - Begin to execute preSqls:[delete from test]. context info:jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2020-04-22 13:53:55.732 [job-0] INFO JobContainer - jobContainer starts to do split ...
2020-04-22 13:53:55.732 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2020-04-22 13:53:55.733 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2020-04-22 13:53:55.734 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2020-04-22 13:53:55.754 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2020-04-22 13:53:55.759 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2020-04-22 13:53:55.761 [job-0] INFO JobContainer - Running by standalone Mode.
2020-04-22 13:53:55.769 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-04-22 13:53:55.774 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-04-22 13:53:55.774 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2020-04-22 13:53:55.785 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2020-04-22 13:53:55.801 [0-0-0-writer] INFO DBUtil - execute sql:[set session sql_mode='ANSI']
2020-04-22 13:53:55.808 [0-0-0-writer] INFO DBUtil - execute sql:[set session sql_mode='ANSI']
2020-04-22 13:53:55.986 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[202]ms
2020-04-22 13:53:55.987 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2020-04-22 13:54:05.783 [job-0] INFO StandAloneJobContainerCommunicator - Total 1000 records, 5000 bytes | Speed 500B/s, 100 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.021s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2020-04-22 13:54:05.783 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2020-04-22 13:54:05.784 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2020-04-22 13:54:05.785 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do post work.
2020-04-22 13:54:05.785 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2020-04-22 13:54:05.786 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /root/datax/hook
2020-04-22 13:54:05.788 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
2020-04-22 13:54:05.788 [job-0] INFO JobContainer - PerfTrace not enable!
2020-04-22 13:54:05.789 [job-0] INFO StandAloneJobContainerCommunicator - Total 1000 records, 5000 bytes | Speed 500B/s, 100 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.021s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2020-04-22 13:54:05.790 [job-0] INFO JobContainer -
任务启动时刻 : 2020-04-22 13:53:55
任务结束时刻 : 2020-04-22 13:54:05
任务总计耗时 : 10s
任务平均流量 : 500B/s
记录写入速度 : 100rec/s
读出记录总数 : 1000
读写失败总数 : 0
```
Check the data in the database. (The ids start above 4000 because `DELETE` does not reset the table's `AUTO_INCREMENT` counter, likely from earlier runs of the same job.)
```mysql
MariaDB [test]> select * from test limit 10;
+------+-------+
| id | name |
+------+-------+
| 4001 | DataX |
| 4002 | DataX |
| 4003 | DataX |
| 4004 | DataX |
| 4005 | DataX |
| 4006 | DataX |
| 4007 | DataX |
| 4008 | DataX |
| 4009 | DataX |
| 4010 | DataX |
+------+-------+
10 rows in set (0.00 sec)
```
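As a final check, the row count should equal the job's `sliceRecordCount`:

```bash
# Expect exactly 1000 rows, one per generated record
mysql -uroot -p -e "SELECT COUNT(*) FROM test.test;"
```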