I. Database environments
# MySQL environment
[email protected] 20:28: [(none)]> select version();
+-----------+
| version() |
+-----------+
| 8.0.26    |
+-----------+
1 row in set (0.00 sec)
[email protected] 20:28: [(none)]> show variables like '%char%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | utf8mb4                           |
| character_set_connection | utf8mb4                           |
| character_set_database   | utf8mb4                           |
| character_set_filesystem | binary                            |
| character_set_results    | utf8mb4                           |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8mb3                           |
| character_sets_dir       | /usr/local/mysql8/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)
# OceanBase environment
obclient [test]> select version();
+------------------------------+
| version()                    |
+------------------------------+
| 5.7.25-OceanBase_CE-v4.2.1.2 |
+------------------------------+
1 row in set (0.002 sec)
obclient [test]> show variables like '%chara%';
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8mb4 |
+--------------------------+---------+
7 rows in set (0.005 sec)
Confirm that the MySQL and OceanBase character sets match: the session and server variables are utf8mb4 on both sides. (Only character_set_system differs, utf8mb3 vs. utf8mb4; it is a read-only internal setting.)
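As a quick sanity check, the two variable lists above can also be diffed programmatically. A minimal sketch, with the values hardcoded from the output shown (in practice you would populate the dicts from `show variables like 'character%'` on each server):

```python
# Compare character-set variables between the two servers.
# Values below are hardcoded from the transcripts above; in practice,
# fill these dicts from `show variables like 'character%'` on each side.
mysql_vars = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "utf8mb4",
    "character_set_results": "utf8mb4",
    "character_set_server": "utf8mb4",
    "character_set_system": "utf8mb3",
}
ob_vars = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "utf8mb4",
    "character_set_results": "utf8mb4",
    "character_set_server": "utf8mb4",
    "character_set_system": "utf8mb4",
}

def charset_mismatches(a, b):
    """Return {variable: (left, right)} for values that differ."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}

print(charset_mismatches(mysql_vars, ob_vars))
# Only character_set_system differs, a read-only internal setting.
```

Any mismatch in the client/connection/server variables is worth fixing before the migration, since it can silently corrupt non-ASCII data in the dump-and-restore path.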
II. Migrating data to OceanBase with mysqldump
Export the data as SQL text with mysqldump on the MySQL side, transfer the dump file to the OceanBase database host, and then import it into OceanBase with the source command.
# Tables currently in MySQL
MySQL [(none)]> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MySQL [test]> show tables;
+-------------------+
| Tables_in_test    |
+-------------------+
| cluster_test      |
| cluster_test1     |
| cluster_test2     |
| cluster_test3     |
| cluster_test4     |
| t1                |
| t2                |
| t8                |
| t_smallint        |
| test_clustered    |
| test_nonclustered |
+-------------------+
11 rows in set (0.00 sec)
Export the data with mysqldump (for InnoDB tables, adding --single-transaction gives a consistent, non-blocking dump):
mysqldump -h 192.168.150.162 -uroot -P4000 -p --databases test > test_oceanbase.sql
# Transfer the dump file to the OceanBase server
scp test_oceanbase.sql 192.168.150.116:/home/admin
# Import into OceanBase
obclient [test]> source test_oceanbase.sql
obclient [test]> show tables;
+-------------------+
| Tables_in_test    |
+-------------------+
| cluster_test      |
| cluster_test1     |
| cluster_test2     |
| cluster_test3     |
| cluster_test4     |
| t1                |
| t2                |
| t8                |
| t_smallint        |
| test_clustered    |
| test_nonclustered |
+-------------------+
11 rows in set (0.004 sec)
# Spot-check that the tables and data were imported
obclient [test]> select * from big
-> ;
+----------------------+---------------------+
| id                   | id1                 |
+----------------------+---------------------+
| 18446744073709551615 | 9223372036854775807 |
+----------------------+---------------------+
1 row in set (0.003 sec)
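The spot-check row is a useful boundary test: the two values are exactly the maxima of MySQL's BIGINT UNSIGNED and signed BIGINT types, so the import preserved the full 64-bit range without truncation or overflow. A quick arithmetic check:

```python
# The spot-check row above holds the extreme 64-bit integer values,
# confirming the import did not truncate or overflow them.
max_bigint_unsigned = 2**64 - 1  # BIGINT UNSIGNED upper bound
max_bigint_signed = 2**63 - 1    # signed BIGINT upper bound

print(max_bigint_unsigned)  # 18446744073709551615
print(max_bigint_signed)    # 9223372036854775807
```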
III. Offline data import from MySQL to OceanBase with DataX
# Deploy and install DataX
DataX download URL: https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202210/datax.tar.gz
1. Download DataX directly on the server
wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202210/datax.tar.gz
2. Extract DataX
[admin@localhost ~]$ tar zxvf datax.tar.gz
3. Install Java
yum install java
4. Test whether DataX installed successfully
[admin@localhost bin]$ python datax.py .../job/job.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-12-21 23:22:50.245 [main] INFO MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2023-12-21 23:22:50.248 [main] INFO MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2023-12-21 23:22:50.307 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-12-21 23:22:50.312 [main] INFO Engine - the machine info =>
osInfo: Red Hat, Inc. 1.8 25.392-b08
jvmInfo: Linux amd64 3.10.0-1160.el7.x86_64
cpu num: 8
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2023-12-21 23:22:50.329 [main] INFO Engine -
{
"content":[
{
"reader":{
"name":"streamreader",
"parameter":{
"column":[
{
"type":"string",
"value":"DataX"
},
{
"type":"long",
"value":19890604
},
{
"type":"date",
"value":"1989-06-04 00:00:00"
},
{
"type":"bool",
"value":true
},
{
"type":"bytes",
"value":"test"
}
],
"sliceRecordCount":100000
}
},
"writer":{
"name":"streamwriter",
"parameter":{
"encoding":"UTF-8",
"print":false
}
}
}
],
"setting":{
"errorLimit":{
"percentage":0.02,
"record":0
},
"speed":{
"channel":1
}
}
}
2023-12-21 23:22:50.348 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-12-21 23:22:50.361 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-12-21 23:22:50.361 [main] INFO JobContainer - DataX jobContainer starts job.
2023-12-21 23:22:50.365 [main] INFO JobContainer - Set jobId = 0
2023-12-21 23:22:50.402 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2023-12-21 23:22:50.402 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-12-21 23:22:50.402 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-12-21 23:22:50.402 [job-0] INFO JobContainer - jobContainer starts to do split ...
2023-12-21 23:22:50.403 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2023-12-21 23:22:50.403 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2023-12-21 23:22:50.403 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2023-12-21 23:22:50.421 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2023-12-21 23:22:50.426 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2023-12-21 23:22:50.429 [job-0] INFO JobContainer - Running by standalone Mode.
2023-12-21 23:22:50.438 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-12-21 23:22:50.442 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-12-21 23:22:50.442 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2023-12-21 23:22:50.468 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-12-21 23:22:50.789 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[335]ms
2023-12-21 23:22:50.790 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-12-21 23:23:00.452 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.046s | All Task WaitReaderTime 0.056s | Percentage 100.00%
2023-12-21 23:23:00.453 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2023-12-21 23:23:00.453 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-12-21 23:23:00.453 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do post work.
2023-12-21 23:23:00.453 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2023-12-21 23:23:00.454 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /home/admin/datax/hook
2023-12-21 23:23:00.455 [job-0] INFO JobContainer -
[total cpu info] =>
    averageCpu | maxDeltaCpu | minDeltaCpu
    -1.00%     | -1.00%      | -1.00%

[total gc info] =>
    NAME         | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
    PS MarkSweep | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s
    PS Scavenge  | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s

2023-12-21 23:23:00.455 [job-0] INFO JobContainer - PerfTrace not enable!
2023-12-21 23:23:00.456 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.046s | All Task WaitReaderTime 0.056s | Percentage 100.00%
2023-12-21 23:23:00.457 [job-0] INFO JobContainer -
Job start time      : 2023-12-21 23:22:50
Job end time        : 2023-12-21 23:23:00
Total elapsed time  : 10s
Average traffic     : 253.91KB/s
Record write speed  : 10000rec/s
Records read        : 100000
Read/write failures : 0
5. Create the DataX job JSON
{
    "job": {
        "entry": {
            "jvm": "-Xms1024m -Xmx1024m"
        },
        "setting": {
            "speed": {
                "channel": 4
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.1
            }
        },
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "root",
                    "password": "oracle123",
                    "column": ["*"],
                    "connection": [{
                        "table": ["Tab_A"],
                        "jdbcUrl": ["jdbc:mysql://192.168.150.162:4000/test?useUnicode=true&characterEncoding=utf8&useSSL=false"]
                    }]
                }
            },
            "writer": {
                "name": "oceanbasev10writer",
                "parameter": {
                    "obWriteMode": "insert",
                    "column": ["*"],
                    "preSql": ["truncate table Tab_A"],
                    "connection": [{
                        "jdbcUrl": "||_dsc_ob10_dsc_||obdemo:obmysql||_dsc_ob10_dsc_||jdbc:oceanbase://192.168.150.116:2883/test?useLocalSessionState=true&allowBatch=true&allowMultiQueries=true&rewriteBatchedStatements=true",
                        "table": ["Tab_A"]
                    }],
                    "username": "root",
                    "password": "oracle123",
                    "writerThreadCount": 10,
                    "batchSize": 1000,
                    "memstoreThreshold": "0.9"
                }
            }
        }]
    }
}
6. Run the offline data sync
Source data:
MySQL [test]> select * from Tab_A;
+----+------+------+------+------+------+--------+
| id | bid  | cid  | name | type | num  | amt    |
+----+------+------+------+------+------+--------+
|  1 |    1 |    1 | A01  | 01   |  111 | 111.00 |
|  2 |    2 |    2 | A01  | 01   |  112 | 111.00 |
|  3 |    3 |    3 | A02  | 02   |  113 | 111.00 |
|  4 |    4 |    4 | A02  | 02   |  112 | 111.00 |
|  5 |    5 |    5 | A01  | 01   |  111 | 111.00 |
|  6 |    6 |    6 | A02  | 02   |  113 | 111.00 |
|  7 |    5 |    7 | A01  | 01   |  111 |  88.00 |
|  8 |    6 |    8 | A02  | 02   |  113 |  88.00 |
+----+------+------+------+------+------+--------+
8 rows in set (0.26 sec)
Target data:
obclient [test]> select * from Tab_A;
Empty set (0.133 sec)
Run the sync:
python ./datax.py .../job/mysql2ob.json
2023-12-22 00:42:13.745 [job-0] INFO JobContainer - PerfTrace not enable!
2023-12-22 00:42:13.745 [job-0] INFO StandAloneJobContainerCommunicator - Total 8 records, 134 bytes | Speed 13B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.020s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-12-22 00:42:13.747 [job-0] INFO JobContainer -
Job start time      : 2023-12-22 00:41:54
Job end time        : 2023-12-22 00:42:13
Total elapsed time  : 19s
Average traffic     : 13B/s
Record write speed  : 0rec/s
Records read        : 8
Read/write failures : 0
7. Check the data:
obclient [test]> select * from Tab_A;
+----+------+------+------+------+------+--------+
| id | bid  | cid  | name | type | num  | amt    |
+----+------+------+------+------+------+--------+
|  1 |    1 |    1 | A01  | 01   |  111 | 111.00 |
|  2 |    2 |    2 | A01  | 01   |  112 | 111.00 |
|  3 |    3 |    3 | A02  | 02   |  113 | 111.00 |
|  4 |    4 |    4 | A02  | 02   |  112 | 111.00 |
|  5 |    5 |    5 | A01  | 01   |  111 | 111.00 |
|  6 |    6 |    6 | A02  | 02   |  113 | 111.00 |
|  7 |    5 |    7 | A01  | 01   |  111 |  88.00 |
|  8 |    6 |    8 | A02  | 02   |  113 |  88.00 |
+----+------+------+------+------+------+--------+
8 rows in set (0.002 sec)
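A malformed job file is a common cause of DataX startup failures, so it can help to parse the JSON before invoking datax.py. A minimal sketch; the inline job_text stands in for reading the real mysql2ob.json from disk:

```python
import json

# Sanity-check a DataX job file before running datax.py: parse the JSON
# and report each reader -> writer plugin pair. job_text stands in for
# the real file; in practice use json.load(open("mysql2ob.json")).
job_text = """
{"job": {"setting": {"speed": {"channel": 4}},
         "content": [{"reader": {"name": "mysqlreader"},
                      "writer": {"name": "oceanbasev10writer"}}]}}
"""
job = json.loads(job_text)  # raises json.JSONDecodeError if malformed
pairs = [(c["reader"]["name"], c["writer"]["name"])
         for c in job["job"]["content"]]
print(pairs)  # [('mysqlreader', 'oceanbasev10writer')]
```

This catches trailing commas, stray quotes, and similar copy-paste damage before DataX spends time spinning up the JVM.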