OceanBase Learning (Part 1): Migrating MySQL Data to OceanBase

I. Database Environment

# MySQL environment

root@192.168.150.162 20:28: [(none)]> select version();
+-----------+
| version() |
+-----------+
| 8.0.26    |
+-----------+
1 row in set (0.00 sec)

root@192.168.150.162 20:28: [(none)]> show variables like '%char%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | utf8mb4                           |
| character_set_connection | utf8mb4                           |
| character_set_database   | utf8mb4                           |
| character_set_filesystem | binary                            |
| character_set_results    | utf8mb4                           |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8mb3                           |
| character_sets_dir       | /usr/local/mysql8/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)


# OceanBase environment

obclient [test]> select version();
+------------------------------+
| version()                    |
+------------------------------+
| 5.7.25-OceanBase_CE-v4.2.1.2 |
+------------------------------+
1 row in set (0.002 sec)

obclient [test]> show variables like '%chara%';
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8mb4 |
+--------------------------+---------+
7 rows in set (0.005 sec)

The character sets of MySQL and OceanBase match (utf8mb4 on both sides), so the data can be migrated without any character-set conversion.
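It is also worth ruling out collation mismatches, since those can change comparison and ordering behavior even when the character sets agree. A quick sketch (hosts and ports are taken from the sessions above; the obclient user typically carries a tenant suffix such as root@obmysql, so adjust for your environment):

# collation settings on the MySQL side ...
mysql -h 192.168.150.162 -P4000 -uroot -p -e "show variables like 'collation%';"
# ... and on the OceanBase side
obclient -h 192.168.150.116 -P2883 -uroot@obmysql -p -e "show variables like 'collation%';"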

II. Migrating Data to OceanBase with mysqldump

Use mysqldump on the MySQL side to export the data as a SQL text file, transfer the dump file to the OceanBase host, and then import it into the OceanBase database with the source command.

# Tables in the current MySQL database

MySQL [(none)]> use test;

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Database changed

MySQL [test]> show tables;
+-------------------+
| Tables_in_test    |
+-------------------+
| cluster_test      |
| cluster_test1     |
| cluster_test2     |
| cluster_test3     |
| cluster_test4     |
| t1                |
| t2                |
| t8                |
| t_smallint        |
| test_clustered    |
| test_nonclustered |
+-------------------+
11 rows in set (0.00 sec)

Export the data with mysqldump:

mysqldump -h 192.168.150.162 -uroot -P4000 -p --databases test > test_oceanbase.sql
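For a larger or busier source instance, a few additional mysqldump options are worth considering; this is a sketch under the assumption that the source tables are InnoDB, so pick the flags that fit your setup:

# --single-transaction : consistent snapshot without locking InnoDB tables
# --set-gtid-purged=OFF : omit GTID statements the target may reject
# --default-character-set=utf8mb4 : match the character set confirmed above
mysqldump -h 192.168.150.162 -uroot -P4000 -p \
  --single-transaction \
  --set-gtid-purged=OFF \
  --default-character-set=utf8mb4 \
  --databases test > test_oceanbase.sql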

# Transfer the dump file to the OceanBase server

scp test_oceanbase.sql 192.168.150.116:/home/admin
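Optionally verify the transfer before importing; the checksums printed on the two hosts should match:

# run on both the MySQL host and the OceanBase host, then compare
md5sum test_oceanbase.sql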

# Import into OceanBase

obclient [test]> source test_oceanbase.sql
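As a non-interactive alternative to source, the dump can also be piped into obclient from the shell (a sketch; connection parameters follow the environment above, and the dump's own USE statement selects the database):

# one-shot import from the shell
obclient -h 192.168.150.116 -P2883 -uroot@obmysql -p < /home/admin/test_oceanbase.sql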

obclient [test]> show tables;
+-------------------+
| Tables_in_test    |
+-------------------+
| cluster_test      |
| cluster_test1     |
| cluster_test2     |
| cluster_test3     |
| cluster_test4     |
| t1                |
| t2                |
| t8                |
| t_smallint        |
| test_clustered    |
| test_nonclustered |
+-------------------+
11 rows in set (0.004 sec)

# Spot-check that the tables and data have been imported

obclient [test]> select * from big;
+----------------------+---------------------+
| id                   | id1                 |
+----------------------+---------------------+
| 18446744073709551615 | 9223372036854775807 |
+----------------------+---------------------+
1 row in set (0.003 sec)
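Beyond spot checks, comparing per-table row counts on both sides catches missing tables or truncated data quickly. A minimal sketch using information_schema (table_rows is an estimate in MySQL, so fall back to count(*) where exact numbers matter):

# rough per-table row counts; run against each side and diff the output
mysql -h 192.168.150.162 -P4000 -uroot -p -e "select table_name, table_rows from information_schema.tables where table_schema='test';"
obclient -h 192.168.150.116 -P2883 -uroot@obmysql -p -e "select table_name, table_rows from information_schema.tables where table_schema='test';"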

III. Offline Data Import from MySQL to OceanBase with DataX

# DataX deployment and installation

DataX download URL: https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202210/datax.tar.gz

1. Download DataX directly on the server

wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202210/datax.tar.gz

2. Extract DataX

[admin@localhost ~]$ tar zxvf datax.tar.gz

3. Install Java

yum install java
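DataX runs on the JVM (the self-test log below picks up JDK 1.8), so confirm the installation before proceeding:

# a JDK must be on the PATH for datax.py to work
java -version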

4. Test whether DataX installed successfully

[admin@localhost bin]$ python datax.py .../job/job.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !

Copyright © 2010-2017, Alibaba Group. All Rights Reserved.

2023-12-21 23:22:50.245 [main] INFO MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN

2023-12-21 23:22:50.248 [main] INFO MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]

2023-12-21 23:22:50.307 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl

2023-12-21 23:22:50.312 [main] INFO Engine - the machine info =>

osInfo: Red Hat, Inc. 1.8 25.392-b08

jvmInfo: Linux amd64 3.10.0-1160.el7.x86_64

cpu num: 8

totalPhysicalMemory: -0.00G

freePhysicalMemory: -0.00G

maxFileDescriptorCount: -1

currentOpenFileDescriptorCount: -1

GC Names [PS MarkSweep, PS Scavenge]

MEMORY_NAME | allocation_size | init_size

PS Eden Space | 256.00MB | 256.00MB

Code Cache | 240.00MB | 2.44MB

Compressed Class Space | 1,024.00MB | 0.00MB

PS Survivor Space | 42.50MB | 42.50MB

PS Old Gen | 683.00MB | 683.00MB

Metaspace | -0.00MB | 0.00MB

2023-12-21 23:22:50.329 [main] INFO Engine -

{
    "content":[
        {
            "reader":{
                "name":"streamreader",
                "parameter":{
                    "column":[
                        {
                            "type":"string",
                            "value":"DataX"
                        },
                        {
                            "type":"long",
                            "value":19890604
                        },
                        {
                            "type":"date",
                            "value":"1989-06-04 00:00:00"
                        },
                        {
                            "type":"bool",
                            "value":true
                        },
                        {
                            "type":"bytes",
                            "value":"test"
                        }
                    ],
                    "sliceRecordCount":100000
                }
            },
            "writer":{
                "name":"streamwriter",
                "parameter":{
                    "encoding":"UTF-8",
                    "print":false
                }
            }
        }
    ],
    "setting":{
        "errorLimit":{
            "percentage":0.02,
            "record":0
        },
        "speed":{
            "channel":1
        }
    }
}

2023-12-21 23:22:50.348 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null

2023-12-21 23:22:50.361 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0

2023-12-21 23:22:50.361 [main] INFO JobContainer - DataX jobContainer starts job.

2023-12-21 23:22:50.365 [main] INFO JobContainer - Set jobId = 0

2023-12-21 23:22:50.402 [job-0] INFO JobContainer - jobContainer starts to do prepare ...

2023-12-21 23:22:50.402 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do prepare work .

2023-12-21 23:22:50.402 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do prepare work .

2023-12-21 23:22:50.402 [job-0] INFO JobContainer - jobContainer starts to do split ...

2023-12-21 23:22:50.403 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.

2023-12-21 23:22:50.403 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.

2023-12-21 23:22:50.403 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.

2023-12-21 23:22:50.421 [job-0] INFO JobContainer - jobContainer starts to do schedule ...

2023-12-21 23:22:50.426 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.

2023-12-21 23:22:50.429 [job-0] INFO JobContainer - Running by standalone Mode.

2023-12-21 23:22:50.438 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.

2023-12-21 23:22:50.442 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.

2023-12-21 23:22:50.442 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.

2023-12-21 23:22:50.468 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started

2023-12-21 23:22:50.789 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[335]ms

2023-12-21 23:22:50.790 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.

2023-12-21 23:23:00.452 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.046s | All Task WaitReaderTime 0.056s | Percentage 100.00%

2023-12-21 23:23:00.453 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.

2023-12-21 23:23:00.453 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do post work.

2023-12-21 23:23:00.453 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do post work.

2023-12-21 23:23:00.453 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.

2023-12-21 23:23:00.454 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /home/admin/datax/hook

2023-12-21 23:23:00.455 [job-0] INFO JobContainer -

[total cpu info] =>

averageCpu | maxDeltaCpu | minDeltaCpu

-1.00% | -1.00% | -1.00%

[total gc info] =>

NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime

PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s

PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s

2023-12-21 23:23:00.455 [job-0] INFO JobContainer - PerfTrace not enable!

2023-12-21 23:23:00.456 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.046s | All Task WaitReaderTime 0.056s | Percentage 100.00%

2023-12-21 23:23:00.457 [job-0] INFO JobContainer -

任务启动时刻 : 2023-12-21 23:22:50

任务结束时刻 : 2023-12-21 23:23:00

任务总计耗时 : 10s

任务平均流量 : 253.91KB/s

记录写入速度 : 10000rec/s

读出记录总数 : 100000

读写失败总数 : 0

5. Create the DataX job JSON

{
    "job": {
        "entry": {
            "jvm": "-Xms1024m -Xmx1024m"
        },
        "setting": {
            "speed": {
                "channel": 4
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.1
            }
        },
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "root",
                    "password": "oracle123",
                    "column": ["*"],
                    "connection": [{
                        "table": ["Tab_A"],
                        "jdbcUrl": ["jdbc:mysql://192.168.150.162:4000/test?useUnicode=true&characterEncoding=utf8&useSSL=false"]
                    }]
                }
            },
            "writer": {
                "name": "oceanbasev10writer",
                "parameter": {
                    "obWriteMode": "insert",
                    "column": ["*"],
                    "preSql": ["truncate table Tab_A"],
                    "connection": [{
                        "jdbcUrl": "||_dsc_ob10_dsc_||obdemo:obmysql||_dsc_ob10_dsc_||jdbc:oceanbase://192.168.150.116:2883/test?useLocalSessionState=true&allowBatch=true&allowMultiQueries=true&rewriteBatchedStatements=true",
                        "table": ["Tab_A"]
                    }],
                    "username": "root",
                    "password": "oracle123",
                    "writerThreadCount": 10,
                    "batchSize": 1000,
                    "memstoreThreshold": "0.9"
                }
            }
        }]
    }
}
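Two details in this job file deserve attention. First, preSql truncates Tab_A on every run, so this job is only safe for full reloads. Second, the writer's jdbcUrl uses the oceanbasev10writer connection format: ||_dsc_ob10_dsc_||cluster:tenant||_dsc_ob10_dsc_|| followed by the actual JDBC URL (obdemo:obmysql above). If you want to reuse one job file for several tables, DataX supports variable substitution via -p; a sketch in which the ${table} placeholder is illustrative and not part of the job file above:

# replace "Tab_A" with "${table}" in the job json, then pass the value at run time
python ./datax.py -p "-Dtable=Tab_A" job/mysql2ob.json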

6. Run the offline data sync

Source data:

MySQL [test]> select * from Tab_A;
+----+------+------+------+------+------+--------+
| id | bid  | cid  | name | type | num  | amt    |
+----+------+------+------+------+------+--------+
|  1 |    1 |    1 | A01  | 01   |  111 | 111.00 |
|  2 |    2 |    2 | A01  | 01   |  112 | 111.00 |
|  3 |    3 |    3 | A02  | 02   |  113 | 111.00 |
|  4 |    4 |    4 | A02  | 02   |  112 | 111.00 |
|  5 |    5 |    5 | A01  | 01   |  111 | 111.00 |
|  6 |    6 |    6 | A02  | 02   |  113 | 111.00 |
|  7 |    5 |    7 | A01  | 01   |  111 |  88.00 |
|  8 |    6 |    8 | A02  | 02   |  113 |  88.00 |
+----+------+------+------+------+------+--------+
8 rows in set (0.26 sec)

Target data (before the sync):

obclient [test]> select * from Tab_A;

Empty set (0.133 sec)
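Note that DataX moves rows but does not create tables, which is why Tab_A already exists (empty) on the OceanBase side before the job runs. If you need to create it first, here is a hypothetical DDL consistent with the data shown in this section; the column types are assumptions, not taken from the real source schema:

# hypothetical target DDL; adjust the types to match the actual source table
obclient -h 192.168.150.116 -P2883 -uroot@obmysql -p test -e "
create table if not exists Tab_A (
  id   int primary key,
  bid  int,
  cid  int,
  name varchar(32),
  type varchar(8),
  num  int,
  amt  decimal(10,2)
);"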

Run the sync:

python ./datax.py .../job/mysql2ob.json

2023-12-22 00:42:13.745 [job-0] INFO JobContainer - PerfTrace not enable!

2023-12-22 00:42:13.745 [job-0] INFO StandAloneJobContainerCommunicator - Total 8 records, 134 bytes | Speed 13B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.020s | All Task WaitReaderTime 0.000s | Percentage 100.00%

2023-12-22 00:42:13.747 [job-0] INFO JobContainer -

任务启动时刻 : 2023-12-22 00:41:54

任务结束时刻 : 2023-12-22 00:42:13

任务总计耗时 : 19s

任务平均流量 : 13B/s

记录写入速度 : 0rec/s

读出记录总数 : 8

读写失败总数 : 0

7. Verify the data:

obclient [test]> select * from Tab_A;
+----+------+------+------+------+------+--------+
| id | bid  | cid  | name | type | num  | amt    |
+----+------+------+------+------+------+--------+
|  1 |    1 |    1 | A01  | 01   |  111 | 111.00 |
|  2 |    2 |    2 | A01  | 01   |  112 | 111.00 |
|  3 |    3 |    3 | A02  | 02   |  113 | 111.00 |
|  4 |    4 |    4 | A02  | 02   |  112 | 111.00 |
|  5 |    5 |    5 | A01  | 01   |  111 | 111.00 |
|  6 |    6 |    6 | A02  | 02   |  113 | 111.00 |
|  7 |    5 |    7 | A01  | 01   |  111 |  88.00 |
|  8 |    6 |    8 | A02  | 02   |  113 |  88.00 |
+----+------+------+------+------+------+--------+
8 rows in set (0.002 sec)
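For a stronger check than eyeballing rows, compare an aggregate fingerprint of the table on both sides; the two result rows should be identical if the sync is complete (a simple sketch; any deterministic aggregate over the columns of interest works):

# compare a count plus numeric sums across source and target
mysql -h 192.168.150.162 -P4000 -uroot -p -e "select count(*), sum(num), sum(amt) from test.Tab_A;"
obclient -h 192.168.150.116 -P2883 -uroot@obmysql -p -e "select count(*), sum(num), sum(amt) from test.Tab_A;"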
