DataX (Part 2): Installation and Getting Started

1. Official Links

Download: http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

Source code: https://github.com/alibaba/DataX (DataX is the open-source version of Alibaba Cloud DataWorks Data Integration.)

2. Prerequisites

  • Linux

  • JDK (1.8 or later; 1.8 recommended)

  • Python (Python 2.6.x recommended)
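Before installing, it is worth confirming that both runtimes are visible on the PATH. A quick sanity check (the hadoop102/xxds prompt follows the cluster naming used in this series):

[xxds@hadoop102 ~]$ java -version   # should report 1.8.x
[xxds@hadoop102 ~]$ python -V       # should report 2.6.x or 2.7.x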

3. Installation

1) Upload the downloaded datax.tar.gz to /opt/software on hadoop102

2) Extract datax.tar.gz to /opt/module

[xxds@hadoop102 ~]$ tar -zxvf /opt/software/datax.tar.gz -C /opt/module/
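After extraction you can take a quick look at the install directory; assuming the stock package layout, you should see subdirectories such as bin (launcher scripts, including datax.py), conf, job (sample jobs, including the job.json used below), lib, and plugin (the reader/writer plugins):

[xxds@hadoop102 ~]$ ls /opt/module/datax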

3) Run the self-check job

[xxds@hadoop102 datax]$ cd bin/
[xxds@hadoop102 bin]$ pwd
/opt/module/datax/bin
[xxds@hadoop102 bin]$ python datax.py /opt/module/datax/job/job.json
If everything is set up correctly, the output looks like the following:
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2022-01-21 20:53:59.460 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2022-01-21 20:53:59.526 [main] INFO  Engine - the machine info  => 

        osInfo: Oracle Corporation 1.8 25.161-b12
        jvmInfo:        Linux amd64 3.10.0-1160.el7.x86_64
        cpu num:        1

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [Copy, MarkSweepCompact]

        MEMORY_NAME                    | allocation_size                | init_size                      
        Eden Space                     | 273.06MB                       | 273.06MB                       
        Code Cache                     | 240.00MB                       | 2.44MB                         
        Survivor Space                 | 34.13MB                        | 34.13MB                        
        Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
        Metaspace                      | -0.00MB                        | 0.00MB                         
        Tenured Gen                    | 682.69MB                       | 682.69MB                       

2022-01-21 20:53:59.640 [main] INFO  Engine - 
{
        "content":[
                {
                        "reader":{
                                "name":"streamreader",
                                "parameter":{
                                        "column":[
                                                {
                                                        "type":"string",
                                                        "value":"DataX"
                                                },
                                                {
                                                        "type":"long",
                                                        "value":19890604
                                                },
                                                {
                                                        "type":"date",
                                                        "value":"1989-06-04 00:00:00"
                                                },
                                                {
                                                        "type":"bool",
                                                        "value":true
                                                },
                                                {
                                                        "type":"bytes",
                                                        "value":"test"
                                                }
                                        ],
                                        "sliceRecordCount":100000
                                }
                        },
                        "writer":{
                                "name":"streamwriter",
                                "parameter":{
                                        "encoding":"UTF-8",
                                        "print":false
                                }
                        }
                }
        ],
        "setting":{
                "errorLimit":{
                        "percentage":0.02,
                        "record":0
                },
                "speed":{
                        "byte":10485760
                }
        }
}

2022-01-21 20:53:59.733 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2022-01-21 20:53:59.742 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2022-01-21 20:53:59.743 [main] INFO  JobContainer - DataX jobContainer starts job.
2022-01-21 20:53:59.752 [main] INFO  JobContainer - Set jobId = 0
2022-01-21 20:53:59.972 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2022-01-21 20:53:59.979 [job-0] INFO  JobContainer - Running by standalone Mode.
2022-01-21 20:54:00.077 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2022-01-21 20:54:00.111 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-01-21 20:54:00.112 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2022-01-21 20:54:00.208 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2022-01-21 20:54:00.528 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[339]ms
2022-01-21 20:54:00.529 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2022-01-21 20:54:10.150 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.194s |  All Task WaitReaderTime 0.263s | Percentage 100.00%
2022-01-21 20:54:10.151 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2022-01-21 20:54:10.156 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2022-01-21 20:54:10.158 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2022-01-21 20:54:10.159 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2022-01-21 20:54:10.164 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /opt/module/datax/hook
2022-01-21 20:54:10.196 [job-0] INFO  JobContainer - 
         [total cpu info] => 
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
                -1.00%                         | -1.00%                         | -1.00%

         [total gc info] => 
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
                 Copy                 | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
                 MarkSweepCompact     | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2022-01-21 20:54:10.197 [job-0] INFO  JobContainer - PerfTrace not enable!
2022-01-21 20:54:10.200 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.194s |  All Task WaitReaderTime 0.263s | Percentage 100.00%
2022-01-21 20:54:10.223 [job-0] INFO  JobContainer - 
Job start time                  : 2022-01-21 20:53:59
Job end time                    : 2022-01-21 20:54:10
Total elapsed time              :                 10s
Average throughput              :          253.91KB/s
Record write speed              :          10000rec/s
Total records read              :              100000
Total read/write failures       :                   0
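Once the self-check passes, the installation is complete. To start building your own jobs, datax.py can also print a starter configuration for a given reader/writer pair through its -r/-w options; a minimal sketch using the same stream plugins as the self-check:

[xxds@hadoop102 datax]$ python bin/datax.py -r streamreader -w streamwriter

Redirect the printed template into a file under job/, fill in the parameters, and run it the same way as the self-check job above.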