OceanBase 4.3.3 Feature Walkthrough: Columnstore Replicas

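OceanBase has supported columnar storage since version 4.3.0. Depending on the workload, users can create columnstore tables, rowstore tables, or hybrid row-column tables. Whichever table type is chosen, the tenant's replicas use the same storage format in every zone. For details, see the official documentation: https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001429675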

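To achieve strict physical isolation between TP and AP resources, OceanBase 4.3.3.0 introduces a new deployment mode: an independent zone can be added to an existing cluster to hold columnstore replicas (C replicas for short). In versions 4.3.3.0 and 4.3.3.1, however, columnstore replicas are classified as an experimental feature and are not recommended for production use.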

For a description of the replica types, see the official documentation: https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001431874

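+-------------------+--------------------+---------------+---------------------------------------+-------------------------------------+
| Replica type      | Votes in elections | Votes on logs | SSTable / clog / MemTable             | Replica type conversion             |
+-------------------+--------------------+---------------+---------------------------------------+-------------------------------------+
| F (full-featured) | Yes                | Yes           | Present; major SSTable is row-store   | Can be converted to R               |
| R (read-only)     | No                 | No            | Present; major SSTable is row-store   | Can be converted to F               |
| C (columnstore)   | No                 | No            | Present; major SSTable is columnstore | Cannot be converted to another type |
+-------------------+--------------------+---------------+---------------------------------------+-------------------------------------+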
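
The replica type table can also be read as a small lookup structure. The Python sketch below encodes only what the table states. The class and function names are illustrative assumptions and are not part of any OceanBase API, and nothing here talks to a real cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaType:
    """One row of the replica type table."""
    code: str
    votes_in_election: bool
    votes_on_log: bool
    major_sstable_format: str   # "row" or "column"
    convertible_to: frozenset   # replica type codes this type can become


# Values transcribed from the table; hypothetical names, no cluster access.
REPLICA_TYPES = {
    "F": ReplicaType("F", True, True, "row", frozenset({"R"})),
    "R": ReplicaType("R", False, False, "row", frozenset({"F"})),
    "C": ReplicaType("C", False, False, "column", frozenset()),
}


def can_convert(src: str, dst: str) -> bool:
    """Return True if the table allows converting replica type src to dst."""
    return dst in REPLICA_TYPES[src].convertible_to


print(can_convert("F", "R"))  # True
print(can_convert("C", "F"))  # False
```

The empty `convertible_to` set for C mirrors the last row of the table: a columnstore replica has to be removed and recreated rather than converted.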

Environment before creating the columnstore replica

# Cluster topology
MySQL [oceanbase]> select * from dba_ob_servers order by zone;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| SVR_IP         | SVR_PORT | ID | ZONE  | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME         | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME                | MODIFY_TIME                | BUILD_VERSION                                                                             | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 |    12882 |  1 | zone1 |    12881 | YES             | ACTIVE | 2024-11-04 10:27:09.942001 | NULL      | NULL                  | 2024-10-22 20:07:13.974171 | 2024-11-04 10:27:22.872264 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.191 |    22882 |  2 | zone2 |    22881 | NO              | ACTIVE | 2024-11-04 10:28:31.472704 | NULL      | NULL                  | 2024-10-22 20:07:13.986746 | 2024-11-04 10:28:31.882765 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.192 |    32882 |  3 | zone3 |    32881 | NO              | ACTIVE | 2024-11-04 10:29:29.111769 | NULL      | NULL                  | 2024-10-22 20:07:13.995302 | 2024-11-04 10:29:30.161822 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
3 rows in set (0.01 sec)


# Simulate an existing tenant
create resource unit u1 min_cpu=3,max_cpu=3,memory_size='4g',log_disk_size='12g',max_iops=10000;

create resource pool p1_1 unit='u1',zone_list=('zone1'),unit_num=1;
create resource pool p1_2 unit='u1',zone_list=('zone2'),unit_num=1;
create resource pool p1_3 unit='u1',zone_list=('zone3'),unit_num=1;

create tenant test1 resource_pool_list=('p1_1','p1_2','p1_3'),
primary_zone='zone1,zone2,zone3',locality='F@zone1, F@zone2, F@zone3',
charset=utf8mb4,collate=utf8mb4_bin
set ob_tcp_invited_nodes='%';

# Log in to the test1 tenant from a shell, then set the root password
mysql -h127.0.0.1  -P12881 -uroot@test1 -p -A
alter user root identified by 'xxx';

Add zone4 for the columnstore replica

See the obd cluster scale-out guide: https://www.oceanbase.com/docs/community-obd-cn-1000000001477803

oceanbase-ce:
  servers:
    - name: server4
      ip: 11.xxx.xxx.192
  server4:
    zone: zone4
    obshell_port: 45881
    mysql_port: 42881
    rpc_port: 42882
    local_ip: 11.xxx.xxx.192
    home_path: /home/heshun.lxd/observer4
    data_dir: /obdata/data/data4
    redo_dir: /obdata/log/log4

obd cluster scale_out ob433 -c ob433_scale_out_zone4.yaml -v

Cluster topology after scale-out

MySQL [oceanbase]> select * from dba_ob_servers order by zone;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| SVR_IP         | SVR_PORT | ID | ZONE  | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME         | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME                | MODIFY_TIME                | BUILD_VERSION                                                                             | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 |    12882 |  1 | zone1 |    12881 | YES             | ACTIVE | 2024-11-04 10:27:09.942001 | NULL      | NULL                  | 2024-10-22 20:07:13.974171 | 2024-11-04 10:27:22.872264 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.191 |    22882 |  2 | zone2 |    22881 | NO              | ACTIVE | 2024-11-04 10:28:31.472704 | NULL      | NULL                  | 2024-10-22 20:07:13.986746 | 2024-11-04 10:28:31.882765 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.192 |    32882 |  3 | zone3 |    32881 | NO              | ACTIVE | 2024-11-04 10:29:29.111769 | NULL      | NULL                  | 2024-10-22 20:07:13.995302 | 2024-11-04 10:29:30.161822 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.192 |    42882 |  4 | zone4 |    42881 | NO              | ACTIVE | 2024-11-04 11:48:24.538274 | NULL      | NULL                  | 2024-11-04 11:09:44.030541 | 2024-11-04 11:48:26.306543 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
4 rows in set (0.00 sec)

Add a columnstore replica to an existing tenant

1. Replica distribution of the tenant before the change

MySQL [oceanbase]>  select tenant_id,tenant_name,primary_zone,locality  from dba_ob_tenants where tenant_type='user';
+-----------+-------------+-------------------+---------------------------------------------+
| tenant_id | tenant_name | primary_zone      | locality                                    |
+-----------+-------------+-------------------+---------------------------------------------+
|      1010 | test1       | zone1,zone2,zone3 | FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3 |
+-----------+-------------+-------------------+---------------------------------------------+
1 row in set (0.03 sec)

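2. Before adding the replica, check whether the tenant already has a resource pool in the target zone, and note the names of the tenant's current resource pools in each zone.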

MySQL [oceanbase]> select * from dba_ob_resource_pools where tenant_id=(select tenant_id from dba_ob_tenants where tenant_name='test1');
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME                | MODIFY_TIME                | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
|             1008 | p1_1 |      1010 | 2024-11-04 11:01:36.377693 | 2024-11-04 11:02:00.918615 |          1 |           1004 | zone1     | FULL         |
|             1009 | p1_2 |      1010 | 2024-11-04 11:01:36.395700 | 2024-11-04 11:02:01.221993 |          1 |           1004 | zone2     | FULL         |
|             1010 | p1_3 |      1010 | 2024-11-04 11:01:36.410597 | 2024-11-04 11:02:01.224139 |          1 |           1004 | zone3     | FULL         |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
3 rows in set (0.02 sec)

3. Identify the unit config used by each resource pool by matching unit_config_id in dba_ob_resource_pools against dba_ob_unit_configs.

MySQL [oceanbase]> select * from dba_ob_unit_configs;
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
| UNIT_CONFIG_ID | NAME            | CREATE_TIME                | MODIFY_TIME                | MAX_CPU | MIN_CPU | MEMORY_SIZE | LOG_DISK_SIZE | DATA_DISK_SIZE | MAX_IOPS            | MIN_IOPS            | IOPS_WEIGHT | MAX_NET_BANDWIDTH   | NET_BANDWIDTH_WEIGHT |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
|              1 | sys_unit_config | 2024-10-22 20:07:12.701353 | 2024-10-22 20:07:12.701353 |       2 |       2 |  2147483648 |    3221225472 |           NULL | 9223372036854775807 | 9223372036854775807 |           2 | 9223372036854775807 |                    2 |
|           1004 | u1              | 2024-11-04 11:01:30.256177 | 2024-11-04 11:01:30.256177 |       3 |       3 |  4294967296 |   12884901888 |           NULL |               10000 |               10000 |           0 | 9223372036854775807 |                    3 |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
2 rows in set (0.01 sec)

4. Create a resource pool on zone4 for the test1 tenant

create resource pool p1_4 unit='u1' ,unit_num=1,zone_list=('zone4');

5. Update the resource_pool_list of the test1 tenant

alter tenant test1 resource_pool_list=('p1_1','p1_2','p1_3','p1_4');

6. Update the locality of the test1 tenant

alter tenant test1 locality='f@zone1,f@zone2,f@zone3,c@zone4';

7. Check the status of the locality change for the test1 tenant

select * from dba_ob_tenant_jobs  
where job_type='alter_tenant_locality' 
and tenant_id=(select tenant_id from dba_ob_tenants where tenant_name='test1')
order by start_time desc limit 1 \G
*************************** 1. row ***************************
     JOB_ID: 2
   JOB_TYPE: ALTER_TENANT_LOCALITY
 JOB_STATUS: SUCCESS
RESULT_CODE: 0
   PROGRESS: 100
 START_TIME: 2024-11-04 12:01:55.851907
MODIFY_TIME: 2024-11-04 12:02:26.819124
  TENANT_ID: 1010
   SQL_TEXT: alter tenant test1 locality='f@zone1,f@zone2,f@zone3,c@zone4'
 EXTRA_INFO: FROM: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3', TO: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3, COLUMNSTORE{1}@zone4'
  RS_SVR_IP: 11.xxx.xxx.191
RS_SVR_PORT: 12882
1 row in set (0.02 sec)
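
Steps 4 to 6 follow a fixed pattern: create a pool in the new zone, append it to the tenant's pool list, then extend the locality with a C entry. As a rough sketch, the helper below only builds those three statements as strings so they can be reviewed before being run by hand. The function name and its parameters are hypothetical, it assumes every existing zone holds one F replica, and it performs no escaping or validation of the names passed in.

```python
def add_columnstore_replica_sql(tenant, unit, existing_pools, full_zones,
                                new_pool, new_zone):
    """Build the statements for steps 4 to 6; nothing is executed here."""
    # Step 5: the pool list is the existing pools plus the new one.
    pools = ",".join(f"'{p}'" for p in [*existing_pools, new_pool])
    # Step 6: keep one F replica per existing zone, add one C replica.
    locality = ",".join([*(f"F@{z}" for z in full_zones), f"C@{new_zone}"])
    return [
        f"create resource pool {new_pool} unit='{unit}', unit_num=1, "
        f"zone_list=('{new_zone}');",
        f"alter tenant {tenant} resource_pool_list=({pools});",
        f"alter tenant {tenant} locality='{locality}';",
    ]


statements = add_columnstore_replica_sql(
    tenant="test1",
    unit="u1",
    existing_pools=["p1_1", "p1_2", "p1_3"],
    full_zones=["zone1", "zone2", "zone3"],
    new_pool="p1_4",
    new_zone="zone4",
)
for statement in statements:
    print(statement)
```

With the values from this walkthrough, the output corresponds to the statements used in steps 4, 5 and 6. The locality change runs asynchronously, so the dba_ob_tenant_jobs check from step 7 still applies after the last statement.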

Create a columnstore replica when creating a new tenant

create resource unit u2 min_cpu=3,max_cpu=3,memory_size='4g',log_disk_size='12g',max_iops=10000;

create resource pool p2_1 unit='u2',zone_list=('zone1'),unit_num=1;
create resource pool p2_2 unit='u2',zone_list=('zone2'),unit_num=1;
create resource pool p2_3 unit='u2',zone_list=('zone3'),unit_num=1;
create resource pool p2_4 unit='u2',zone_list=('zone4'),unit_num=1;

create tenant test2 
resource_pool_list=('p2_1','p2_2','p2_3','p2_4'),
primary_zone='zone1,zone2,zone3;zone4',
locality='F@zone1, F@zone2, F@zone3, C@zone4',
charset=utf8mb4,collate=utf8mb4_bin
set ob_tcp_invited_nodes='%';

# Log in to the test2 tenant from a shell, then set the root password
mysql -h127.0.0.1  -P12881 -uroot@test2 -p -A
alter user root identified by 'xxx';

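Configure obproxy

Log in to the corresponding obproxy as root@proxysys.

Dedicated obproxy

Deploy a separate obproxy for the columnstore replica, log in to it, and apply the following settings. Setting obproxy_read_consistency to 1 makes the proxy issue weak-consistency reads, and the init_sql statement sets the route policy to COLUMN_STORE_ONLY for each new session: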

alter proxyconfig set obproxy_read_consistency='1';
alter proxyconfig set init_sql = 'set @@ob_route_policy="COLUMN_STORE_ONLY";';

Shared obproxy

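If no separate machine is available for a dedicated proxy, the existing obproxy environment has to be shared. In that case, use obproxy multi-level configuration to apply the settings per tenant. For details on multi-level configuration, see the official documentation: https://www.oceanbase.com/docs/common-odp-doc-cn-1000000001409917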

replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test1', 'obproxy_read_consistency', 1, 'LEVEL_TENANT');
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test1', 'init_sql', 'set @@ob_route_policy="COLUMN_STORE_ONLY";', 'LEVEL_TENANT');

replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test2', 'obproxy_read_consistency', 1, 'LEVEL_TENANT');
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test2', 'init_sql', 'set @@ob_route_policy="COLUMN_STORE_ONLY";', 'LEVEL_TENANT');
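
The tenant-level statements above differ only in the tenant name, so for more than a couple of tenants they can be generated. The following sketch is one way to do that, assuming the same proxy_config columns and the same two settings used in this article. The function name is hypothetical, and the strings are only built, not executed.

```python
def tenant_proxy_config_sql(cluster, tenants):
    """Return the two tenant-level proxy_config statements for each tenant."""
    # Each value is already written as a SQL literal, as in the article.
    settings = [
        ("obproxy_read_consistency", "1"),
        ("init_sql", "'set @@ob_route_policy=\"COLUMN_STORE_ONLY\";'"),
    ]
    template = (
        "replace into proxy_config(cluster_name, tenant_name, name, value, "
        "config_level) values ('{c}', '{t}', '{n}', {v}, 'LEVEL_TENANT');"
    )
    return [
        template.format(c=cluster, t=tenant, n=name, v=value)
        for tenant in tenants
        for name, value in settings
    ]


for statement in tenant_proxy_config_sql("obcluster", ["test1", "test2"]):
    print(statement)
```

Called with the cluster name obcluster and the tenants test1 and test2, it produces the same four statements listed above.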

Test access to the columnstore replica

Log in through the obproxy configured above and run the test.

# sys tenant
MySQL [oceanbase]> select zone,tenant_id,name,value,default_value from gv$ob_parameters where tenant_id=1010 and name='default_table_store_format';
+-------+-----------+----------------------------+-------+---------------+
| zone  | tenant_id | name                       | value | default_value |
+-------+-----------+----------------------------+-------+---------------+
| zone1 |      1010 | default_table_store_format | row   | row           |
| zone4 |      1010 | default_table_store_format | row   | row           |
| zone3 |      1010 | default_table_store_format | row   | row           |
| zone2 |      1010 | default_table_store_format | row   | row           |
+-------+-----------+----------------------------+-------+---------------+
4 rows in set (0.03 sec)

# test1 tenant
MySQL [test]> show create table t1 \G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL
) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_bin ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 3 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0
 partition by hash(id)
(partition `p0`,
partition `p1`,
partition `p2`)
1 row in set (0.01 sec)


MySQL [test]> explain select * from t1;
+----------------------------------------------------------------------+
| Query Plan                                                           |
+----------------------------------------------------------------------+
| ================================================================     |
| |ID|OPERATOR                    |NAME    |EST.ROWS|EST.TIME(us)|     |
| ----------------------------------------------------------------     |
| |0 |PX COORDINATOR              |        |1       |7           |     |
| |1 |└─EXCHANGE OUT DISTR        |:EX10000|1       |7           |     |
| |2 |  └─PX PARTITION ITERATOR   |        |1       |7           |     |
| |3 |    └─COLUMN TABLE FULL SCAN|t1      |1       |7           |     |
| ================================================================     |
| Outputs & filters:                                                   |
| -------------------------------------                                |
|   0 - output([INTERNAL_FUNCTION(t1.id)]), filter(nil), rowset=16     |
|   1 - output([INTERNAL_FUNCTION(t1.id)]), filter(nil), rowset=16     |
|       dop=1                                                          |
|   2 - output([t1.id]), filter(nil), rowset=16                        |
|       force partition granule                                        |
|   3 - output([t1.id]), filter(nil), rowset=16                        |
|       access([t1.id]), partitions(p[0-2])                            |
|       is_index_back=false, is_global_index=false,                    |
|       range_key([t1.__pk_increment]), range(MIN ; MAX)always true    |
+----------------------------------------------------------------------+
19 rows in set (0.01 sec)
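  • The table definition has no WITH COLUMN GROUP clause and default_table_store_format is the default row format, yet the execution plan shows COLUMN TABLE FULL SCAN, which indicates that the query was served by a columnstore scan.
  • The test table t1 belongs to the test1 tenant, whose topology is 3F-1C, that is, four replicas. However, replica_num is 3 in the output of both show create table and show create tenant, because it reflects the number of full-featured replicas.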
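
To make the replica_num observation concrete, the sketch below counts replicas per type from a locality string in the format shown by dba_ob_tenants and dba_ob_tenant_jobs earlier. It is an illustrative helper, not an OceanBase utility: the function name is hypothetical, and it assumes every entry has the form TYPE{n}@zone as in the output above, rejecting anything else.

```python
import re
from collections import Counter

# Matches entries such as "FULL{1}@zone1" or "COLUMNSTORE{1}@zone4".
_ENTRY = re.compile(r"^\s*([A-Z]+)\{(\d+)\}@(\w+)\s*$")


def count_replicas(locality: str) -> Counter:
    """Count replicas per type in a locality string."""
    counts = Counter()
    for entry in locality.split(","):
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognized locality entry: {entry!r}")
        replica_type, number, _zone = match.groups()
        counts[replica_type] += int(number)
    return counts


locality = "FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3, COLUMNSTORE{1}@zone4"
counts = count_replicas(locality)
print(counts["FULL"], counts["COLUMNSTORE"])  # 3 1
```

For the test1 tenant this gives three FULL replicas and one COLUMNSTORE replica, which matches REPLICA_NUM = 3 in the show create table output.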

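Notes

1. observer 4.3.3.0 or later is required.

2. ocp 4.3.3 or later is required (ocp 4.3.3 had not been released at the time of writing).

3. obd 2.10.1-1 or later is required.

4. obproxy 4.3.2 or later is required.

5. Deploying two or more columnstore replicas is not recommended.

6. Full-featured and read-only replicas cannot be converted to columnstore replicas, and columnstore replicas cannot be converted to full-featured or read-only replicas.

7. Physical restore does not support restoring columnstore replicas.

8. If the primary database has no columnstore replica, deploying one on the standby database is not recommended either.

9. A columnstore table is a table whose partition Leader and Followers all use a columnstore schema, so queries against it can be strong reads. A columnstore replica, by contrast, keeps the partition Leader and Followers in rowstore format while the read-only Learner replica is stored in columnstore format, and OLAP queries against it can only be weak reads.

For more details, see the official documentation on columnstore replicas: https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001428590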
