OceanBase 4.3.3 功能解析:列存副本

OceanBase 从4.3.0 版本开始,引入了列式存储的支持。用户可以根据业务的具体需求,选择创建列存表、行存表或是行列混存表。无论选择哪种表类型,在不同的Zone内,租户使用的副本模式都是一致的。详见官网文档: https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001429675

为了达成TP与AP资源在物理层面上的严格隔离,OceanBase 4.3.3.0版本引入了一种创新的部署模式:它允许在原有集群的基础上,增设独立的zone来专门存储列存副本(简称C副本)。但在4.3.3.0和4.3.3.1这两个版本中,列存副本功能被界定为实验性质,因此并不推荐在生产环境中应用。

副本类型的说明详见官网文档:

https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001431874

副本类型 选举投票 日志投票 sstable clog memtable 副本类型转换
F 参与 参与 有,major为行存sstable 可以转为R副本
R 不参与 不参与 有,major为行存sstable 可以转为F副本
C 不参与 不参与 有,major为列存sstable 不能转为其他副本

创建列存副本前的环境

# 集群拓扑
MySQL [oceanbase]> select * from dba_ob_servers order by zone;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| SVR_IP         | SVR_PORT | ID | ZONE  | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME         | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME                | MODIFY_TIME                | BUILD_VERSION                                                                             | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 |    12882 |  1 | zone1 |    12881 | YES             | ACTIVE | 2024-11-04 10:27:09.942001 | NULL      | NULL                  | 2024-10-22 20:07:13.974171 | 2024-11-04 10:27:22.872264 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.191 |    22882 |  2 | zone2 |    22881 | NO              | ACTIVE | 2024-11-04 10:28:31.472704 | NULL      | NULL                  | 2024-10-22 20:07:13.986746 | 2024-11-04 10:28:31.882765 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.192 |    32882 |  3 | zone3 |    32881 | NO              | ACTIVE | 2024-11-04 10:29:29.111769 | NULL      | NULL                  | 2024-10-22 20:07:13.995302 | 2024-11-04 10:29:30.161822 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
3 rows in set (0.01 sec)


# 模拟已有的租户
create resource unit u1 min_cpu=3,max_cpu=3,memory_size='4g',log_disk_size='12g',max_iops=10000;

create resource pool p1_1 unit='u1',zone_list=('zone1'),unit_num=1;
create resource pool p1_2 unit='u1',zone_list=('zone2'),unit_num=1;
create resource pool p1_3 unit='u1',zone_list=('zone3'),unit_num=1;

create tenant test1 resource_pool_list=('p1_1','p1_2','p1_3'),
primary_zone='zone1,zone2,zone3',locality='F@zone1, F@zone2, F@zone3',
charset=utf8mb4,collate=utf8mb4_bin
set ob_tcp_invited_nodes='%';

mysql -h127.0.0.1  -P12881 -uroot@test1 -p -A
alter user root identified by 'xxx';

扩展 zone4 供列存副本使用

参考 obd 集群扩容: https://www.oceanbase.com/docs/community-obd-cn-1000000001477803

oceanbase-ce:
  servers:
    - name: server4
      ip: 11.xxx.xxx.192
  server4:
    zone: zone4
    obshell_port: 45881
    mysql_port: 42881
    rpc_port: 42882
    local_ip: 11.xxx.xxx.192
    home_path: /home/heshun.lxd/observer4
    data_dir: /obdata/data/data4
    redo_dir: /obdata/log/log4

obd cluster scale_out ob433 -c ob433_scale_out_zone4.yaml -v

扩容后的集群拓扑

MySQL [oceanbase]> select * from dba_ob_servers order by zone;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| SVR_IP         | SVR_PORT | ID | ZONE  | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME         | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME                | MODIFY_TIME                | BUILD_VERSION                                                                             | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 |    12882 |  1 | zone1 |    12881 | YES             | ACTIVE | 2024-11-04 10:27:09.942001 | NULL      | NULL                  | 2024-10-22 20:07:13.974171 | 2024-11-04 10:27:22.872264 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.191 |    22882 |  2 | zone2 |    22881 | NO              | ACTIVE | 2024-11-04 10:28:31.472704 | NULL      | NULL                  | 2024-10-22 20:07:13.986746 | 2024-11-04 10:28:31.882765 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.192 |    32882 |  3 | zone3 |    32881 | NO              | ACTIVE | 2024-11-04 10:29:29.111769 | NULL      | NULL                  | 2024-10-22 20:07:13.995302 | 2024-11-04 10:29:30.161822 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.192 |    42882 |  4 | zone4 |    42881 | NO              | ACTIVE | 2024-11-04 11:48:24.538274 | NULL      | NULL                  | 2024-11-04 11:09:44.030541 | 2024-11-04 11:48:26.306543 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
4 rows in set (0.00 sec)

给已有的租户扩列存副本

1、扩容前租户副本分布

MySQL [oceanbase]>  select tenant_id,tenant_name,primary_zone,locality  from dba_ob_tenants where tenant_type='user';
+-----------+-------------+-------------------+---------------------------------------------+
| tenant_id | tenant_name | primary_zone      | locality                                    |
+-----------+-------------+-------------------+---------------------------------------------+
|      1010 | test1       | zone1,zone2,zone3 | FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3 |
+-----------+-------------+-------------------+---------------------------------------------+
1 row in set (0.03 sec)

2、在增加副本之前,需要确认租户在目标 zone 上是否有资源池,并记录好当前该租户在各 zone 上的资源池名。

MySQL [oceanbase]> select * from dba_ob_resource_pools where tenant_id=(select tenant_id from dba_ob_tenants where tenant_name='test1');
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME                | MODIFY_TIME                | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
|             1008 | p1_1 |      1010 | 2024-11-04 11:01:36.377693 | 2024-11-04 11:02:00.918615 |          1 |           1004 | zone1     | FULL         |
|             1009 | p1_2 |      1010 | 2024-11-04 11:01:36.395700 | 2024-11-04 11:02:01.221993 |          1 |           1004 | zone2     | FULL         |
|             1010 | p1_3 |      1010 | 2024-11-04 11:01:36.410597 | 2024-11-04 11:02:01.224139 |          1 |           1004 | zone3     | FULL         |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
3 rows in set (0.02 sec)

3、确认各 resource pool 使用的 unit ,和 dba_ob_resource_pools 的 unit_config_id 进行关联

MySQL [oceanbase]> select * from dba_ob_unit_configs;
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
| UNIT_CONFIG_ID | NAME            | CREATE_TIME                | MODIFY_TIME                | MAX_CPU | MIN_CPU | MEMORY_SIZE | LOG_DISK_SIZE | DATA_DISK_SIZE | MAX_IOPS            | MIN_IOPS            | IOPS_WEIGHT | MAX_NET_BANDWIDTH   | NET_BANDWIDTH_WEIGHT |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
|              1 | sys_unit_config | 2024-10-22 20:07:12.701353 | 2024-10-22 20:07:12.701353 |       2 |       2 |  2147483648 |    3221225472 |           NULL | 9223372036854775807 | 9223372036854775807 |           2 | 9223372036854775807 |                    2 |
|           1004 | u1              | 2024-11-04 11:01:30.256177 | 2024-11-04 11:01:30.256177 |       3 |       3 |  4294967296 |   12884901888 |           NULL |               10000 |               10000 |           0 | 9223372036854775807 |                    3 |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
2 rows in set (0.01 sec)

4、给 test1 租户在 zone4 上创建 resource pool

create resource pool p1_4 unit='u1' ,unit_num=1,zone_list=('zone4');

5、修改 test1 租户的 resource_pool_list

alter tenant test1 resource_pool_list=('p1_1','p1_2','p1_3','p1_4');

6、修改 test1 租户的 locality

alter tenant test1 locality='f@zone1,f@zone2,f@zone3,c@zone4';

7、确认 test1 租户 locality 修改情况

select * from dba_ob_tenant_jobs  
where job_type='alter_tenant_locality' 
and tenant_id=(select tenant_id from dba_ob_tenants where tenant_name='test1')
order by start_time desc limit 1 \G
*************************** 1. row ***************************
     JOB_ID: 2
   JOB_TYPE: ALTER_TENANT_LOCALITY
 JOB_STATUS: SUCCESS
RESULT_CODE: 0
   PROGRESS: 100
 START_TIME: 2024-11-04 12:01:55.851907
MODIFY_TIME: 2024-11-04 12:02:26.819124
  TENANT_ID: 1010
   SQL_TEXT: alter tenant test1 locality='f@zone1,f@zone2,f@zone3,c@zone4'
 EXTRA_INFO: FROM: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3', TO: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3, COLUMNSTORE{1}@zone4'
  RS_SVR_IP: 11.xxx.xxx.191
RS_SVR_PORT: 12882
1 row in set (0.02 sec)

新建租户时创建列存副本

create resource unit u2 min_cpu=3,max_cpu=3,memory_size='4g',log_disk_size='12g',max_iops=10000;

create resource pool p2_1 unit='u2',zone_list=('zone1'),unit_num=1;
create resource pool p2_2 unit='u2',zone_list=('zone2'),unit_num=1;
create resource pool p2_3 unit='u2',zone_list=('zone3'),unit_num=1;
create resource pool p2_4 unit='u2',zone_list=('zone4'),unit_num=1;

create tenant test2 
resource_pool_list=('p2_1','p2_2','p2_3','p2_4'),
primary_zone='zone1,zone2,zone3;zone4',
locality='F@zone1, F@zone2, F@zone3, C@zone4',
charset=utf8mb4,collate=utf8mb4_bin
set ob_tcp_invited_nodes='%';

mysql -h127.0.0.1  -P12881 -uroot@test2 -p -A
alter user root identified by 'xxx';

配置 obproxy

使用 root@proxysys 登录对应的 obproxy

独占的 obproxy

给列存副本单独创建一个 obproxy 并登录后进行如下配置

alter proxyconfig set obproxy_read_consistency='1';
alter proxyconfig set init_sql = 'set @@ob_route_policy="COLUMN_STORE_ONLY";';
共享的 obproxy

没有独立的机器资源供列存副本使用,需要复用已有的 obproxy环境,此时可以设置 obproxy 多级配置,关于 obproxy 的多级配置可以详见 官网文档:

https://www.oceanbase.com/docs/common-odp-doc-cn-1000000001409917

replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test1', 'obproxy_read_consistency', 1, 'LEVEL_TENANT');
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test1', 'init_sql', 'set @@ob_route_policy="COLUMN_STORE_ONLY";', 'LEVEL_TENANT');

replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test2', 'obproxy_read_consistency', 1, 'LEVEL_TENANT');
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test2', 'init_sql', 'set @@ob_route_policy="COLUMN_STORE_ONLY";', 'LEVEL_TENANT');

访问列存副本测试

使用如上配置的 obproxy 登录测试

# sys 租户
MySQL [oceanbase]> select zone,tenant_id,name,value,default_value from gv$ob_parameters where tenant_id=1010 and name='default_table_store_format';
+-------+-----------+----------------------------+-------+---------------+
| zone  | tenant_id | name                       | value | default_value |
+-------+-----------+----------------------------+-------+---------------+
| zone1 |      1010 | default_table_store_format | row   | row           |
| zone4 |      1010 | default_table_store_format | row   | row           |
| zone3 |      1010 | default_table_store_format | row   | row           |
| zone2 |      1010 | default_table_store_format | row   | row           |
+-------+-----------+----------------------------+-------+---------------+
4 rows in set (0.03 sec)

# test1 租户
MySQL [test]> show create table t1 \G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL
) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_bin ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 3 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0
 partition by hash(id)
(partition `p0`,
partition `p1`,
partition `p2`)
1 row in set (0.01 sec)


MySQL [test]> explain select * from t1;
+----------------------------------------------------------------------+
| Query Plan                                                           |
+----------------------------------------------------------------------+
| ================================================================     |
| |ID|OPERATOR                    |NAME    |EST.ROWS|EST.TIME(us)|     |
| ----------------------------------------------------------------     |
| |0 |PX COORDINATOR              |        |1       |7           |     |
| |1 |└─EXCHANGE OUT DISTR        |:EX10000|1       |7           |     |
| |2 |  └─PX PARTITION ITERATOR   |        |1       |7           |     |
| |3 |    └─COLUMN TABLE FULL SCAN|t1      |1       |7           |     |
| ================================================================     |
| Outputs & filters:                                                   |
| -------------------------------------                                |
|   0 - output([INTERNAL_FUNCTION(t1.id)]), filter(nil), rowset=16     |
|   1 - output([INTERNAL_FUNCTION(t1.id)]), filter(nil), rowset=16     |
|       dop=1                                                          |
|   2 - output([t1.id]), filter(nil), rowset=16                        |
|       force partition granule                                        |
|   3 - output([t1.id]), filter(nil), rowset=16                        |
|       access([t1.id]), partitions(p[0-2])                            |
|       is_index_back=false, is_global_index=false,                    |
|       range_key([t1.__pk_increment]), range(MIN ; MAX)always true    |
+----------------------------------------------------------------------+
19 rows in set (0.01 sec)
  • 表结构没有 with column group ,default_table_store_format 是默认的行存,执行计划展示上显示 COLUMN TABLE FULL SCAN,说明使用到了列存的范围扫描。
  • 这里的测试表 t1 是在 test1 租户下测试的,该租户的拓扑 3F-1C ,有4个副本,但是在 show create table 和 show create tenant 结果中 replica_num都等于3,使用的是全功能副本的数量。

注意事项

1、observer 需要 4.3.3.0 及其之上的版本。

2、ocp 需要 4.3.3 及其之上的版本(当前还没有发布ocp 4.3.3)。

3、obd 需要 2.10.1-1 及其之上的版本。

4、obproxy 需要 4.3.2 及其之上的版本。

5、不建议部署 2 个及以上数目的列存副本。

6、全功能和只读副本不支持转为列存副本,列存副本也不支持转为全功能和只读副本。

7、物理恢复不支持恢复列存副本。

8、如果主库未部署列存副本,备库也不建议部署列存副本。

9、列存表是指表的分区 Leader & Follower 的 Schema 均为列存格式,查询可以是强读;

列存副本是在保证表的分区 Leader & Follower 的 Schema 为行存格式的前提下,只读副本 Learner 为列存格式,并且 OLAP 的查询只能是弱读。

其他详见官网文档:

列存副本

https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001428590

相关推荐
OceanBase数据库官方博客12 小时前
半连接转内连接 | OceanBase SQL 查询改写
sql·oceanbase·分布式数据库
OceanBase数据库官方博客19 小时前
解析在OceanBase创建分区的常见问题|OceanBase 用户问题精粹
oceanbase·分布式数据库·分区
OceanBase数据库官方博客19 小时前
半连接转内连接规则的原理与代码解析 |OceanBase查询优化
sql·oceanbase·分布式数据库
IT培训中心-竺老师4 天前
OceanBase 数据库分布式与集中式 能力
数据库·分布式·oceanbase
靖顺4 天前
【OceanBase 诊断调优】—— OceanBase 数据库网络速率配置方案
网络·数据库·oceanbase
尚雷558012 天前
OceanBase 社区版 4.0 离线方式升级bp1至bp2 指南(含避坑总结)
oceanbase
五月高高12 天前
Linux部署oceanbase
linux·oceanbase
靖顺15 天前
【OceanBase 诊断调优】—— 统计信息自动收集超时导致的估行不准 SQL 选择错索引
数据库·sql·oceanbase
it界的哈士奇15 天前
Oceanbase离线集群部署
oceanbase