[Hands-On Series] What DataX Is and Its Use Cases

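Introduction

DataX is a data synchronization tool open-sourced by Alibaba Group for synchronizing and migrating data between different data sources. It provides a framework and supports individual data sources, such as MySQL, Oracle, HDFS, and HBase, through plugins. Its core design goals are "simple, reliable, efficient", and it aims to solve complex data synchronization problems in big-data environments.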

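DataX Features

  1. Pluggable: DataX uses a plugin architecture, so support for additional data sources can be added easily.
  2. High performance: DataX is optimized for moving large volumes of data, with multi-threaded concurrent reads and writes and batched transfer.
  3. Stability: DataX provides error handling and retry mechanisms to keep synchronization reliable.
  4. Ease of use: DataX offers a simple command-line tool and JSON configuration files, so synchronization jobs are easy to define.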
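
The plugin idea in item 1 can be illustrated with a toy sketch. This is not DataX's code: the registries, the decorator names, and the two plugins (listreader, collectwriter) are all hypothetical and exist only to show how a framework can look up a reader and a writer by name and stream records between them.

```python
from typing import Callable, Dict, Iterable, List

Record = List[str]

# Hypothetical registries for this sketch; DataX loads its real plugins
# from the plugin/reader and plugin/writer directories.
READERS: Dict[str, Callable[[dict], Iterable[Record]]] = {}
WRITERS: Dict[str, Callable[[dict, Iterable[Record]], int]] = {}


def reader(name: str):
    """Register a reader function under a plugin name."""
    def register(func):
        READERS[name] = func
        return func
    return register


def writer(name: str):
    """Register a writer function under a plugin name."""
    def register(func):
        WRITERS[name] = func
        return func
    return register


@reader("listreader")
def list_reader(parameter: dict) -> Iterable[Record]:
    # Yield the rows given directly in the job parameters.
    yield from parameter["rows"]


@writer("collectwriter")
def collect_writer(parameter: dict, records: Iterable[Record]) -> int:
    # Append every record to the target list and report how many were written.
    count = 0
    for record in records:
        parameter["target"].append(record)
        count += 1
    return count


def run_job(job: dict) -> int:
    """Look up the reader and writer by name and stream records between them."""
    read = READERS[job["reader"]["name"]]
    write = WRITERS[job["writer"]["name"]]
    return write(job["writer"]["parameter"], read(job["reader"]["parameter"]))
```

Adding a new data source in this sketch means registering one more function; the run_job framework code does not change, which is the property the plugin architecture is after.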

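DataX Use Cases

1. Data migration

When a company needs to move data from an old storage system to a new one, DataX can serve as the migration tool. Configuring the appropriate reader and writer plugins is enough to move data between the two systems.

2. Data backup

DataX can also be used for backups. A company can run DataX on a schedule to copy data from the production environment to a backup environment, so that a second copy of the data is always available.

3. Data synchronization

DataX can synchronize data between different data sources. For example, a company may need to keep data in a MySQL database and in HBase in step to support online analysis and queries. DataX runs as batch jobs, so this kind of synchronization is periodic rather than continuous.

4. Data exchange

In a distributed architecture, different systems often need to share data. DataX can act as the exchange tool that copies data from one system to another, and the choice of reader and writer plugins determines which kinds of exchange are possible.

5. Data integration

For big-data analysis that draws on several data sources, a company can use DataX to load data from each source into one unified storage system for later analysis and mining.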

Building and Installing

Clone the source and build it:

bash
git clone https://github.com/alibaba/DataX.git
cd DataX
mvn -U clean package assembly:assembly -Dmaven.test.skip=true

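If the build fails, the cause is usually one of the plugins. Comment the failing plugin modules out of pom.xml; the OTS plugins are a typical example. The pom.xml below has otsstreamreader and otswriter commented out:

xml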
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.alibaba.datax</groupId>
    <artifactId>datax-all</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.hamcrest</groupId>
            <artifactId>hamcrest-core</artifactId>
            <version>1.3</version>
        </dependency>
    </dependencies>

    <name>datax-all</name>
    <packaging>pom</packaging>

    <properties>
        <jdk-version>1.8</jdk-version>
        <datax-project-version>0.0.1-SNAPSHOT</datax-project-version>
        <commons-lang3-version>3.3.2</commons-lang3-version>
        <commons-configuration-version>1.10</commons-configuration-version>
        <commons-cli-version>1.2</commons-cli-version>
        <fastjson-version>1.1.46.sec01</fastjson-version>
        <guava-version>16.0.1</guava-version>
        <diamond.version>3.7.2.1-SNAPSHOT</diamond.version>

        <!-- slf4j 1.7.10 and logback-classic 1.0.13 work well together -->
        <slf4j-api-version>1.7.10</slf4j-api-version>
        <logback-classic-version>1.0.13</logback-classic-version>
        <commons-io-version>2.4</commons-io-version>
        <junit-version>4.11</junit-version>
        <tddl.version>5.1.22-1</tddl.version>
        <swift-version>1.0.0</swift-version>

        <project-sourceEncoding>UTF-8</project-sourceEncoding>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <maven.compiler.encoding>UTF-8</maven.compiler.encoding>
    </properties>

    <modules>
        <module>common</module>
        <module>core</module>
        <module>transformer</module>

        <!-- reader -->
        <module>mysqlreader</module>
        <module>drdsreader</module>
        <module>sqlserverreader</module>
        <module>postgresqlreader</module>
        <module>oraclereader</module>
        <module>odpsreader</module>
        <module>otsreader</module>
        <!--<module>otsstreamreader</module>-->
        <module>txtfilereader</module>
        <module>hdfsreader</module>
        <module>streamreader</module>
        <module>ossreader</module>
        <module>ftpreader</module>
        <module>mongodbreader</module>
        <module>rdbmsreader</module>
        <module>hbase11xreader</module>
        <module>hbase094xreader</module>
        <module>tsdbreader</module>
        <module>opentsdbreader</module>
        <module>cassandrareader</module>

        <!-- writer -->
        <module>mysqlwriter</module>
        <module>drdswriter</module>
        <module>odpswriter</module>
        <module>txtfilewriter</module>
        <module>ftpwriter</module>
        <module>hdfswriter</module>
        <module>streamwriter</module>
        <!--<module>otswriter</module>-->
        <module>oraclewriter</module>
        <module>sqlserverwriter</module>
        <module>postgresqlwriter</module>
        <module>osswriter</module>
        <module>mongodbwriter</module>
        <module>adswriter</module>
        <module>ocswriter</module>
        <module>rdbmswriter</module>
        <module>hbase11xwriter</module>
        <module>hbase094xwriter</module>
        <module>hbase11xsqlwriter</module>
        <module>hbase11xsqlreader</module>
        <module>elasticsearchwriter</module>
        <module>tsdbwriter</module>
        <module>adbpgwriter</module>
        <module>gdbwriter</module>
        <module>cassandrawriter</module>
        
        <!-- common support module -->
        <module>plugin-rdbms-util</module>
        <module>plugin-unstructured-storage-util</module>
        <module>hbase20xsqlreader</module>
        <module>hbase20xsqlwriter</module>
    </modules>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.apache.commons</groupId>
                <artifactId>commons-lang3</artifactId>
                <version>${commons-lang3-version}</version>
            </dependency>
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>fastjson</artifactId>
                <version>${fastjson-version}</version>
            </dependency>
            <!--<dependency>
                <groupId>com.google.guava</groupId>
                <artifactId>guava</artifactId>
                <version>${guava-version}</version>
            </dependency>-->
            <dependency>
                <groupId>commons-io</groupId>
                <artifactId>commons-io</artifactId>
                <version>${commons-io-version}</version>
            </dependency>
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-api</artifactId>
                <version>${slf4j-api-version}</version>
            </dependency>
            <dependency>
                <groupId>ch.qos.logback</groupId>
                <artifactId>logback-classic</artifactId>
                <version>${logback-classic-version}</version>
            </dependency>

            <dependency>
                <groupId>com.taobao.tddl</groupId>
                <artifactId>tddl-client</artifactId>
                <version>${tddl.version}</version>
                <exclusions>
                    <exclusion>
                        <groupId>com.google.guava</groupId>
                        <artifactId>guava</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>com.taobao.diamond</groupId>
                        <artifactId>diamond-client</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>

            <dependency>
                <groupId>com.taobao.diamond</groupId>
                <artifactId>diamond-client</artifactId>
                <version>${diamond.version}</version>
            </dependency>

            <dependency>
                <groupId>com.alibaba.search.swift</groupId>
                <artifactId>swift_client</artifactId>
                <version>${swift-version}</version>
            </dependency>

            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>${junit-version}</version>
            </dependency>

            <dependency>
                <groupId>org.mockito</groupId>
                <artifactId>mockito-all</artifactId>
                <version>1.9.5</version>
                <scope>test</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <finalName>datax</finalName>
                    <descriptors>
                        <descriptor>package.xml</descriptor>
                    </descriptors>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
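
Commenting modules out by hand is error-prone when several plugins fail. The following is a minimal sketch of doing it with a script; the function name comment_out_modules is hypothetical, and it assumes each module entry sits on its own line, as in the listing above.

```python
import re


def comment_out_modules(pom_text: str, modules: list) -> str:
    """Wrap the <module> entries named in `modules` in XML comments."""
    for name in modules:
        # Match an uncommented entry that occupies a whole line,
        # keeping its indentation in group 1.
        pattern = re.compile(
            r"^([ \t]*)(<module>" + re.escape(name) + r"</module>)[ \t]*$",
            re.MULTILINE,
        )
        pom_text = pattern.sub(r"\1<!--\2-->", pom_text)
    return pom_text
```

To apply it, read pom.xml into a string, pass the names of the modules that fail to build, and write the result back. Entries that are already commented out are left alone, so running it twice is harmless.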

If you would rather not build from source, you can download a prebuilt package:

bash
wget -c "http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz"

Extract the archive and it is ready to use. The directory structure is:

bash
# tree -L 2 datax
datax
├── bin   # executables
│   ├── datax.py
│   ├── dxprof.py
│   └── perftrace.py
├── conf  # configuration files
│   ├── core.json
│   ├── logback.xml
│   ├── mysql01.json
│   ├── mysql01.json.splitPk
│   ├── mysql_writer.json
│   └── mysql_writer.json.bak
├── job
│   ├── job.json
│   └── stream2stream.json
├── lib
│   ├── commons-beanutils-1.9.2.jar
│   ├── commons-cli-1.2.jar
│   ├── commons-codec-1.9.jar
│   ├── commons-collections-3.2.1.jar
│   ├── commons-configuration-1.10.jar
│   ├── commons-io-2.4.jar
│   ├── commons-lang-2.6.jar
│   ├── commons-lang3-3.3.2.jar
│   ├── commons-logging-1.1.1.jar
│   ├── commons-math3-3.1.1.jar
│   ├── datax-common-0.0.1-SNAPSHOT.jar
│   ├── datax-core-0.0.1-SNAPSHOT.jar
│   ├── datax-transformer-0.0.1-SNAPSHOT.jar
│   ├── fastjson-1.1.46.sec01.jar
│   ├── fluent-hc-4.4.jar
│   ├── groovy-all-2.1.9.jar
│   ├── hamcrest-core-1.3.jar
│   ├── httpclient-4.4.jar
│   ├── httpcore-4.4.jar
│   ├── janino-2.5.16.jar
│   ├── logback-classic-1.0.13.jar
│   ├── logback-core-1.0.13.jar
│   └── slf4j-api-1.7.10.jar
├── log
│   ├── 2019-11-28
│   └── 2020-04-22
├── log_perf
│   ├── 2019-11-28
│   └── 2020-04-22
├── plugin
│   ├── reader
│   └── writer
├── script
│   └── Readme.md
└── tmp
    └── readme.txt

15 directories, 36 files
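
Before running a job it can help to confirm that the extracted directory has the pieces a job needs. A small sketch follows; the list of expected paths is taken from the listing above, and the function name missing_paths is hypothetical.

```python
from pathlib import Path

# Paths taken from the directory listing above.
EXPECTED = ["bin/datax.py", "conf/core.json", "plugin/reader", "plugin/writer"]


def missing_paths(datax_home: str) -> list:
    """Return the expected entries that do not exist under datax_home."""
    home = Path(datax_home)
    return [rel for rel in EXPECTED if not (home / rel).exists()]
```

An empty result means all four entries are present; anything else names what is missing from the installation.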

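Generating a Configuration File

Create the job configuration file (JSON format):

bash
# Generate a configuration template for the chosen reader and writer
python datax.py -r oraclereader -w mysqlwriter > oracle2mysql2.json

# Start the synchronization (after filling in the template)
python datax.py ./oracle2mysql2.json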
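
The redirected file may not be pure JSON. The sketch below assumes the generator prints some explanatory text before the template, possibly containing braces of its own, and pulls out the first block that parses as a JSON object. The function name extract_template is hypothetical.

```python
import json


def extract_template(output: str) -> dict:
    """Return the first substring of `output` that parses as a JSON object."""
    decoder = json.JSONDecoder()
    start = output.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            template, _ = decoder.raw_decode(output, start)
            return template
        except json.JSONDecodeError:
            # This brace did not open valid JSON; try the next one.
            start = output.find("{", start + 1)
    raise ValueError("no JSON object found in output")
```

The returned dict can be filled in and written back with json.dump to get a clean job file.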

Examples

Reading from MySQL and printing to standard output

Start with the configuration file:

json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "12345678",
                        "column": [
                            "id",
                            "name"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "item"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://127.0.0.1:3306/it"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print":true
                    }
                }
            }
        ]
    }
}
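
When many tables need the same kind of job, the configuration can be generated instead of written by hand. The following sketch builds a dict with the same shape as the configuration above; the function name and its arguments are hypothetical, the errorLimit block is left out for brevity, and the optional splitPk entry (the mysqlreader option for splitting one table into several tasks) is only added when requested.

```python
import json


def mysql_to_stdout_job(jdbc_url, table, columns, username, password,
                        channel=1, split_pk=None):
    """Build a mysqlreader -> streamwriter job dict."""
    reader_param = {
        "username": username,
        "password": password,
        "column": list(columns),
        "connection": [{"table": [table], "jdbcUrl": [jdbc_url]}],
    }
    if split_pk:
        # Optional: column used to split the table into several tasks.
        reader_param["splitPk"] = split_pk
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},
            "content": [{
                "reader": {"name": "mysqlreader", "parameter": reader_param},
                "writer": {"name": "streamwriter", "parameter": {"print": True}},
            }],
        }
    }


if __name__ == "__main__":
    job = mysql_to_stdout_job("jdbc:mysql://127.0.0.1:3306/it", "item",
                              ["id", "name"], "root", "12345678", channel=3)
    print(json.dumps(job, indent=4))
```

Running the script prints a job file that can be saved under conf/ and passed to datax.py.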

Run the job:

bash
[root@ip-192-168-2-56 datax]# python bin/datax.py conf/mysql01.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2020-04-22 13:29:09.124 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2020-04-22 13:29:09.190 [main] INFO  Engine - the machine info  => 

	osInfo:	Oracle Corporation 1.8 25.121-b13
	jvmInfo:	Linux amd64 3.10.0-1062.9.1.el7.x86_64
	cpu num:	4

	totalPhysicalMemory:	-0.00G
	freePhysicalMemory:	-0.00G
	maxFileDescriptorCount:	-1
	currentOpenFileDescriptorCount:	-1

	GC Names	[PS MarkSweep, PS Scavenge]

	MEMORY_NAME                    | allocation_size                | init_size     
	PS Eden Space                  | 256.00MB                       | 256.00MB       
	Code Cache                     | 240.00MB                       | 2.44MB         
	Compressed Class Space         | 1,024.00MB                     | 0.00MB 
	PS Survivor Space              | 42.50MB                        | 42.50MB       
	PS Old Gen                     | 683.00MB                       | 683.00MB      
	Metaspace                      | -0.00MB                        | 0.00MB  

2020-04-22 13:29:09.211 [main] INFO  Engine - 
{
	"content":[
		{
			"reader":{
				"name":"mysqlreader",
				"parameter":{
					"column":[
						"id",
						"name"
					],
					"connection":[
						{
							"jdbcUrl":[
								"jdbc:mysql://127.0.0.1:3306/it"
							],
							"table":[
								"item"
							]
						}
					],
					"password":"************",
					"username":"root"
				}
			},
			"writer":{
				"name":"streamwriter",
				"parameter":{
					"print":true
				}
			}
		}
	],
	"setting":{
		"errorLimit":{
			"percentage":0.02,
			"record":0
		},
		"speed":{
			"channel":3
		}
	}
}

2020-04-22 13:29:09.235 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-04-22 13:29:09.237 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-04-22 13:29:09.238 [main] INFO  JobContainer - DataX jobContainer starts job.
2020-04-22 13:29:09.240 [main] INFO  JobContainer - Set jobId = 0
2020-04-22 13:29:09.949 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://127.0.0.1:3306/it?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2020-04-22 13:29:10.086 [job-0] INFO  OriginalConfPretreatmentUtil - table:[item] has columns:[id,avatar,dept,description,link,name,type].
2020-04-22 13:29:10.107 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2020-04-22 13:29:10.108 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2020-04-22 13:29:10.108 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2020-04-22 13:29:10.110 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2020-04-22 13:29:10.110 [job-0] INFO  JobContainer - Job set Channel-Number to 3 channels.
2020-04-22 13:29:10.115 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2020-04-22 13:29:10.116 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2020-04-22 13:29:10.144 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2020-04-22 13:29:10.149 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2020-04-22 13:29:10.152 [job-0] INFO  JobContainer - Running by standalone Mode.
2020-04-22 13:29:10.163 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-04-22 13:29:10.170 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-04-22 13:29:10.170 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2020-04-22 13:29:10.192 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2020-04-22 13:29:10.198 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select id,name from item 
] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/it?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2020-04-22 13:29:10.233 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select id,name from item 
] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/it?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
1	Confluence
2	Jira
3	Gitlab
4	XShell
2020-04-22 13:29:10.293 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[107]ms
2020-04-22 13:29:10.294 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2020-04-22 13:29:20.175 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 30 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.044s | Percentage 100.00%
2020-04-22 13:29:20.175 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2020-04-22 13:29:20.176 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2020-04-22 13:29:20.176 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2020-04-22 13:29:20.177 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2020-04-22 13:29:20.178 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /root/datax/hook
2020-04-22 13:29:20.180 [job-0] INFO  JobContainer - 
	 [total cpu info] => 
		averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
		-1.00%                         | -1.00%                         | -1.00%
                        

	 [total gc info] => 
		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
		 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
		 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2020-04-22 13:29:20.180 [job-0] INFO  JobContainer - PerfTrace not enable!
2020-04-22 13:29:20.180 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 30 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.044s | Percentage 100.00%
2020-04-22 13:29:20.182 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2020-04-22 13:29:09
任务结束时刻                    : 2020-04-22 13:29:20
任务总计耗时                    :                 10s
任务平均流量                    :                3B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   4
读写失败总数                    :                   0
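
The closing block of the log is printed in Chinese; its rows are, in order: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, and total read/write failures. The same counters also appear in the English "Total ... records" line, which is easier to parse. Below is a sketch of a parser for that line; the regular expression is written against the log shown above, and the function name parse_summary is hypothetical.

```python
import re

# Matches the counters in a line such as:
# "Total 4 records, 30 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | ..."
SUMMARY = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes \| "
    r"Speed (?P<speed>\S+), (?P<rate>\d+) records/s \| "
    r"Error (?P<error_records>\d+) records, (?P<error_bytes>\d+) bytes"
)


def parse_summary(log_text: str) -> dict:
    """Return the counters from the last 'Total ... records' line in the log."""
    matches = list(SUMMARY.finditer(log_text))
    if not matches:
        raise ValueError("no summary line found")
    last = matches[-1]
    return {
        "records": int(last["records"]),
        "bytes": int(last["bytes"]),
        "speed": last["speed"],
        "records_per_second": int(last["rate"]),
        "error_records": int(last["error_records"]),
        "error_bytes": int(last["error_bytes"]),
    }
```

A wrapper script could call this on the captured output and alert when error_records is not zero.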

Generating data and writing it to MySQL

First, the table structure:

mysql
MariaDB [(none)]> use test;

MariaDB [test]> show tables;
+----------------+
| Tables_in_test |
+----------------+
| test           |
+----------------+
1 row in set (0.00 sec)

MariaDB [test]> desc test;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(32) | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

Next, the job configuration. streamreader generates the data: each record has a single string column with the value "DataX", and sliceRecordCount sets how many records are generated (1000). In mysqlwriter, column names the table column (name) that receives the value.

json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {
                                "value": "DataX",
                                "type": "string"
                            }
                        ],
                        "sliceRecordCount": 1000
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "replace",
                        "username": "root",
                        "password": "123456789",
                        "column": [
                            "name"
                        ],
                        "session": [
                            "set session sql_mode='ANSI'"
                        ],
                        "preSql": [
                            "delete from test"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8",
                                "table": [
                                    "test"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}

Run it:

bash
# python bin/datax.py conf/mysql_writer.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2020-04-22 13:53:55.201 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2020-04-22 13:53:55.209 [main] INFO  Engine - the machine info  => 

	osInfo:	Oracle Corporation 1.8 25.121-b13
	jvmInfo:	Linux amd64 3.10.0-1062.9.1.el7.x86_64
	cpu num:	4

	totalPhysicalMemory:	-0.00G
	freePhysicalMemory:	-0.00G
	maxFileDescriptorCount:	-1
	currentOpenFileDescriptorCount:	-1

	GC Names	[PS MarkSweep, PS Scavenge]
	MEMORY_NAME                    | allocation_size                | init_size 
	PS Eden Space                  | 256.00MB                       | 256.00MB 
	Code Cache                     | 240.00MB                       | 2.44MB 
	Compressed Class Space         | 1,024.00MB                     | 0.00MB
	PS Survivor Space              | 42.50MB                        | 42.50MB
	PS Old Gen                     | 683.00MB                       | 683.00MB
	Metaspace                      | -0.00MB                        | 0.00MB 

2020-04-22 13:53:55.230 [main] INFO  Engine - 
{
	"content":[
		{
			"reader":{
				"name":"streamreader",
				"parameter":{
					"column":[
						{
							"type":"string",
							"value":"DataX"
						}
					],
					"sliceRecordCount":1000
				}
			},
			"writer":{
				"name":"mysqlwriter",
				"parameter":{
					"column":[
						"name"
					],
					"connection":[
						{
							"jdbcUrl":"jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8",
							"table":[
								"test"
							]
						}
					],
					"password":"************",
					"preSql":[
						"delete from test"
					],
					"session":[
						"set session sql_mode='ANSI'"
					],
					"username":"root",
					"writeMode":"replace"
				}
			}
		}
	],
	"setting":{
		"speed":{
			"channel":1
		}
	}
}

2020-04-22 13:53:55.249 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-04-22 13:53:55.251 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-04-22 13:53:55.251 [main] INFO  JobContainer - DataX jobContainer starts job.
2020-04-22 13:53:55.253 [main] INFO  JobContainer - Set jobId = 0
2020-04-22 13:53:55.642 [job-0] INFO  OriginalConfPretreatmentUtil - table:[test] all columns:[
id,name
].
2020-04-22 13:53:55.656 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
replace INTO %s (name) VALUES(?)
], which jdbcUrl like:[jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2020-04-22 13:53:55.657 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2020-04-22 13:53:55.658 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2020-04-22 13:53:55.659 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2020-04-22 13:53:55.667 [job-0] INFO  CommonRdbmsWriter$Job - Begin to execute preSqls:[delete from test]. context info:jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2020-04-22 13:53:55.732 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2020-04-22 13:53:55.732 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2020-04-22 13:53:55.733 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2020-04-22 13:53:55.734 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2020-04-22 13:53:55.754 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2020-04-22 13:53:55.759 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2020-04-22 13:53:55.761 [job-0] INFO  JobContainer - Running by standalone Mode.
2020-04-22 13:53:55.769 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-04-22 13:53:55.774 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-04-22 13:53:55.774 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2020-04-22 13:53:55.785 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2020-04-22 13:53:55.801 [0-0-0-writer] INFO  DBUtil - execute sql:[set session sql_mode='ANSI']
2020-04-22 13:53:55.808 [0-0-0-writer] INFO  DBUtil - execute sql:[set session sql_mode='ANSI']
2020-04-22 13:53:55.986 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[202]ms
2020-04-22 13:53:55.987 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2020-04-22 13:54:05.783 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1000 records, 5000 bytes | Speed 500B/s, 100 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.021s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2020-04-22 13:54:05.783 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2020-04-22 13:54:05.784 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2020-04-22 13:54:05.785 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2020-04-22 13:54:05.785 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2020-04-22 13:54:05.786 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /root/datax/hook
2020-04-22 13:54:05.788 [job-0] INFO  JobContainer - 
	 [total cpu info] => 
		averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
		-1.00%                         | -1.00%                         | -1.00%
                        

	 [total gc info] => 
		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
		 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
		 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2020-04-22 13:54:05.788 [job-0] INFO  JobContainer - PerfTrace not enable!
2020-04-22 13:54:05.789 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1000 records, 5000 bytes | Speed 500B/s, 100 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.021s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2020-04-22 13:54:05.790 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2020-04-22 13:53:55
任务结束时刻                    : 2020-04-22 13:54:05
任务总计耗时                    :                 10s
任务平均流量                    :              500B/s
记录写入速度                    :            100rec/s
读出记录总数                    :                1000
读写失败总数                    :                   0
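
The totals in this log can be checked by hand. The sketch below assumes a string column is counted by the length of its value and that a single channel produces sliceRecordCount records in total, which matches the numbers above; the function name expected_totals is hypothetical. The closing block, printed in Chinese, lists start time, end time, total elapsed time (10s), average throughput, record write speed, total records read, and total failures.

```python
def expected_totals(value: str, slice_record_count: int, elapsed_seconds: int) -> dict:
    """Recompute the totals and integer speeds reported in the job summary."""
    total_bytes = len(value) * slice_record_count
    return {
        "records": slice_record_count,
        "bytes": total_bytes,
        # The log reports whole numbers, so integer division is used here.
        "bytes_per_second": total_bytes // elapsed_seconds,
        "records_per_second": slice_record_count // elapsed_seconds,
    }
```

With "DataX", 1000 records, and 10 seconds this gives 5000 bytes, 500 B/s, and 100 records/s, the same figures as the log. Note that the task itself reports finishing in 202 ms, so the 10-second total seems to reflect the scheduler's reporting interval rather than the actual write time, and the speeds should not be read as a benchmark.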

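Check the data in the table. The ids start at 4001 rather than 1, presumably because the preSql statement "delete from test" removes the old rows without resetting the AUTO_INCREMENT counter left by earlier runs:

mysql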
MariaDB [test]> select * from test limit 10;
+------+-------+
| id   | name  |
+------+-------+
| 4001 | DataX |
| 4002 | DataX |
| 4003 | DataX |
| 4004 | DataX |
| 4005 | DataX |
| 4006 | DataX |
| 4007 | DataX |
| 4008 | DataX |
| 4009 | DataX |
| 4010 | DataX |
+------+-------+
10 rows in set (0.00 sec)