如何用SQL语句来查询表或索引的行存/列存存储方式|OceanBase 用户问题集锦

一、问题背景

OceanBase 4.3.0版本起,支持了列存引擎,允许表和索引以行存、纯列存或行列冗余的形式创建,且这些存储方式可以自由组合。除了使用 show create table命令来查看表和索引的存储类型外,也有用户询问如何通过SQL语句来查询表或索引的存储方式。那么,具体该如何操作呢?

二、测试表

说明:这里仅列举了部分组合,还有其他的组合应该也是类似的,不再赘述,欢迎测试,拍砖。

复制代码
-- 行存表,行存索引
create table  t1(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t1_c2(c2)) partition by hash(c1) partitions 3;
create table  t2(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t2_c2(c2)) partition by hash(c1) partitions 3 with column group(all columns);
create table  t3(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t3_c2(c2) with column group(all columns)) partition by hash(c1) partitions 3 with column group(all columns);

-- 行存表,纯列存索引
create table  t4(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t4_c2(c2) with column group(each column)) partition by hash(c1) partitions 3;

-- 行存表,行列混合索引
create table  t5(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t5_c2(c2) with column group(each column,all columns)) partition by hash(c1) partitions 3;

-- 纯列存表,行存索引
create table  t6(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t6_c2(c2) with column group(all columns)) partition by hash(c1) partitions 3 with column group(each column);
create table  t7(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t7_c2(c2)) partition by hash(c1) partitions 3 with column group(each column);

-- 行列混合表,行列混合索引
create table  t8(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t8_c2(c2) with column group(each column,all columns)) partition by hash(c1) partitions 3 with column group(each column,all columns);

三、摸索

从列存相关的语法上可以看出,引入列存后新增加了 with column group (xxx) 的关键字,可以尝试搜一下哪些表的列上涉及了 column_group 相关的字段,从下面的结果看目前并没有标准表或者视图提供这样的信息。

复制代码
MySQL [oceanbase]> select distinct table_name from __all_virtual_table 
where table_id in (select distinct table_id from __all_virtual_column 
                   where column_name like '%column_group%');
+--------------------------------------------+
| table_name                                 |
+--------------------------------------------+
| __all_table_history                        |
| __all_column_group                         |
| __all_column_group_history                 |
| __all_column_group_mapping                 |
| __all_column_group_mapping_history         |
| __all_virtual_core_all_table               |
| __all_virtual_table                        |
| __all_virtual_table_history                |
| __all_virtual_column_group                 |
| __all_virtual_column_group_mapping         |
| __all_virtual_column_group_history         |
| __all_virtual_column_group_mapping_history |
+--------------------------------------------+
12 rows in set (0.97 sec)

从 __all_virtual_column_group 表的 column_group_type 字段,凭感觉可以标识。

复制代码
MySQL [oceanbase]> desc __all_virtual_column_group;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| tenant_id         | bigint(20)   | NO   | PRI | NULL    |       |
| table_id          | bigint(20)   | NO   | PRI | NULL    |       |
| column_group_id   | bigint(20)   | NO   | PRI | NULL    |       |
| gmt_create        | timestamp(6) | NO   |     | NULL    |       |
| gmt_modified      | timestamp(6) | NO   |     | NULL    |       |
| column_group_name | varchar(389) | NO   |     |         |       |
| column_group_type | bigint(20)   | NO   |     | NULL    |       |
| block_size        | bigint(20)   | NO   |     | NULL    |       |
| compressor_type   | bigint(20)   | NO   |     | NULL    |       |
| row_store_type    | bigint(20)   | NO   |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
10 rows in set (0.02 sec)

从代码 src/share/schema/ob_schema_struct.h 找到 column_group 类型的枚举值。

复制代码
enum ObColumnGroupType : uint8_t
{
  DEFAULT_COLUMN_GROUP = 0,
  ALL_COLUMN_GROUP,
  ROWKEY_COLUMN_GROUP,
  SINGLE_COLUMN_GROUP,
  NORMAL_COLUMN_GROUP,
  MAX_COLUMN_GROUP
};

四、结论

经过测试发现如何标识 表或者索引是行存/纯列存/行列冗余的方式存储,这里之所以给 "结论"使用引号扩起来,原因:

1、受限于自己测试的 case 可能不完善,存在错误的情况,欢迎一起测试,交流。

2、随着版本的迭代,针对枚举值可能会有调整/比如增删等,应以实际版本中的枚举值为准。

  • 某个租户下当同一个 table_id 的 column_group_type 包含3 但是不包含1,输出:纯列存表
  • 某个租户下当同一个 table_id 的 column_group_type 包含1 和 3,输出:行列混合表
  • 其他情况均输出 :纯行存表

五、查询sql和结果

说明:可以按需调整 tenant_name/database_name

sql_1(混合查询)
复制代码
select 
    t1.tenant_name,
    t2.database_name,
    case 
        when t2.table_type = 'user table' then t2.table_name
        when t2.table_type = 'index' then t2.index_name
    end as table_name,
    t2.table_id,
    t2.data_table_id,
    t2.table_type,
    case
        when t2.table_type = 'user table' then
            case
                when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                     and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存表'
                when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                     and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合表'
                else '纯行存表'
            end
        when t2.table_type = 'index' then
            case
                when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                     and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存索引'
                when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                     and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合索引'
                else '纯行存索引'
            end
    end as storage_type,
    coalesce(t3.table_name, null) as main_table_name
from 
    __all_virtual_column_group cg
join 
    dba_ob_tenants t1 on cg.tenant_id = t1.tenant_id
join 
    cdb_ob_table_locations t2 on cg.tenant_id = t2.tenant_id and cg.table_id = t2.table_id
left join 
    cdb_ob_table_locations t3 on t2.data_table_id = t3.table_id
where 
    t1.tenant_name in ('test1','test7')
    and t2.database_name in ('row_column_db','db100','db600')
    and t2.table_type in ('user table', 'index')
group by 
    t1.tenant_name, t2.database_name, t2.table_name, t2.table_id, t2.data_table_id, t2.table_type, main_table_name
order by t1.tenant_name,t2.database_name, t2.table_name;
sql1_查询结果
复制代码
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
| tenant_name | database_name | table_name | table_id | data_table_id | table_type | storage_type       | main_table_name |
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
| test1       | row_column_db | t1         |   596454 |          NULL | USER TABLE | 纯行存表           | NULL            |
| test1       | row_column_db | t2         |   596462 |          NULL | USER TABLE | 纯行存表           | NULL            |
| test1       | row_column_db | t3         |   596470 |          NULL | USER TABLE | 纯行存表           | NULL            |
| test1       | row_column_db | t4         |   596478 |          NULL | USER TABLE | 纯行存表           | NULL            |
| test1       | row_column_db | idx_t1_c2  |   596455 |        596454 | INDEX      | 纯行存索引         | t1              |
| test1       | row_column_db | idx_t2_c2  |   596463 |        596462 | INDEX      | 纯行存索引         | t2              |
| test1       | row_column_db | idx_t3_c2  |   596471 |        596470 | INDEX      | 纯行存索引         | t3              |
| test1       | row_column_db | idx_t4_c2  |   596479 |        596478 | INDEX      | 纯列存索引         | t4              |
| test7       | db100         | t8         |   500070 |          NULL | USER TABLE | 行列混合表         | NULL            |
| test7       | db100         | idx_t8_c2  |   500071 |        500070 | INDEX      | 行列混合索引       | t8              |
| test7       | db600         | t5         |   500045 |          NULL | USER TABLE | 纯行存表           | NULL            |
| test7       | db600         | t6         |   500053 |          NULL | USER TABLE | 纯列存表           | NULL            |
| test7       | db600         | t7         |   500061 |          NULL | USER TABLE | 纯列存表           | NULL            |
| test7       | db600         | idx_t5_c2  |   500046 |        500045 | INDEX      | 行列混合索引       | t5              |
| test7       | db600         | idx_t6_c2  |   500054 |        500053 | INDEX      | 纯行存索引         | t6              |
| test7       | db600         | idx_t7_c2  |   500062 |        500061 | INDEX      | 纯行存索引         | t7              |
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
16 rows in set (3.45 sec)

sql2_查询纯列存的表/索引

复制代码
select 
    t1.tenant_name,
    t2.database_name,
    case 
        when t2.table_type = 'user table' then t2.table_name
        when t2.table_type = 'index' then t2.index_name
    end as table_name,
    t2.table_id,
    t2.data_table_id,
    t2.table_type,
    case
        when t2.table_type = 'user table' then
            case
                when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                     and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存表'
                when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                     and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合表'
                else '纯行存表'
            end
        when t2.table_type = 'index' then
            case
                when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                     and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存索引'
                when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                     and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合索引'
                else '纯行存索引'
            end
    end as storage_type,
    coalesce(t3.table_name, null) as main_table_name
from 
    __all_virtual_column_group cg
join 
    dba_ob_tenants t1 on cg.tenant_id = t1.tenant_id
join 
    cdb_ob_table_locations t2 on cg.tenant_id = t2.tenant_id and cg.table_id = t2.table_id
left join 
    cdb_ob_table_locations t3 on t2.data_table_id = t3.table_id
where 
    t1.tenant_name in ('test1','test7')
    and t2.database_name in ('row_column_db','db100','db600')
    and t2.table_type in ('user table', 'index')
group by 
    t1.tenant_name, t2.database_name, t2.table_name, t2.table_id, t2.data_table_id, t2.table_type, main_table_name
having
    (t2.table_type = 'user table' and sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0)
    or
    (t2.table_type = 'index' and sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0)
order by 
    t1.tenant_name, t2.database_name, t2.table_name;
sql2_查询结果
复制代码
+-------------+---------------+------------+----------+---------------+------------+-----------------+-----------------+
| tenant_name | database_name | table_name | table_id | data_table_id | table_type | storage_type    | main_table_name |
+-------------+---------------+------------+----------+---------------+------------+-----------------+-----------------+
| test1       | row_column_db | idx_t4_c2  |   596479 |        596478 | INDEX      | 纯列存索引      | t4              |
| test7       | db600         | t6         |   500053 |          NULL | USER TABLE | 纯列存表        | NULL            |
| test7       | db600         | t7         |   500061 |          NULL | USER TABLE | 纯列存表        | NULL            |
+-------------+---------------+------------+----------+---------------+------------+-----------------+-----------------+
3 rows in set (3.48 sec)

sql3_查询行列冗余的表/索引

复制代码
SELECT 
    tenant_name,
    database_name,
    table_name,
    table_id,
    data_table_id,
    table_type,
    storage_type,
    main_table_name
FROM (
    select 
        t1.tenant_name,
        t2.database_name,
        case 
            when t2.table_type = 'user table' then t2.table_name
            when t2.table_type = 'index' then t2.index_name
        end as table_name,
        t2.table_id,
        t2.data_table_id,
        t2.table_type,
        case
            when t2.table_type = 'user table' then
                case
                    when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                         and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存表'
                    when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                         and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合表'
                    else '纯行存表'
                end
            when t2.table_type = 'index' then
                case
                    when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                         and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存索引'
                    when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 
                         and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合索引'
                    else '纯行存索引'
                end
        end as storage_type,
        coalesce(t3.table_name, null) as main_table_name
    from 
        __all_virtual_column_group cg
    join 
        dba_ob_tenants t1 on cg.tenant_id = t1.tenant_id
    join 
        cdb_ob_table_locations t2 on cg.tenant_id = t2.tenant_id and cg.table_id = t2.table_id
    left join 
        cdb_ob_table_locations t3 on t2.data_table_id = t3.table_id
    where 
        t1.tenant_name in ('test1','test7')
        and t2.database_name in ('row_column_db','db100','db600')
        and t2.table_type in ('user table', 'index')
    group by 
        t1.tenant_name, t2.database_name, t2.table_name, t2.table_id, t2.data_table_id, t2.table_type, main_table_name
) subquery
WHERE 
    storage_type IN ('行列混合表', '行列混合索引')
ORDER BY tenant_name, database_name, table_name;
sql3_查询结果
复制代码
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
| tenant_name | database_name | table_name | table_id | data_table_id | table_type | storage_type       | main_table_name |
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
| test7       | db100         | idx_t8_c2  |   500071 |        500070 | INDEX      | 行列混合索引       | t8              |
| test7       | db100         | t8         |   500070 |          NULL | USER TABLE | 行列混合表         | NULL            |
| test7       | db600         | idx_t5_c2  |   500046 |        500045 | INDEX      | 行列混合索引       | t5              |
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
3 rows in set (3.33 sec)
相关推荐
XYiFfang1 天前
【MYSQL】SQL学习指南:从常见错误到高级函数与正则表达式
sql·mysql·正则表达式·regexp_like·group_concat
十碗饭吃不饱1 天前
sql报错:java.sql.SQLSyntaxErrorException: Unknown column ‘as0‘ in ‘where clause‘
java·数据库·sql
呆呆小金人1 天前
SQL入门: HAVING用法全解析
大数据·数据库·数据仓库·sql·数据库开发·etl·etl工程师
l1t1 天前
用parser_tools插件来解析SQL语句
数据库·sql·插件·duckdb
TDengine (老段)2 天前
TDengine 数学函数 ABS() 用户手册
大数据·数据库·sql·物联网·时序数据库·tdengine·涛思数据
Hello.Reader2 天前
Apache StreamPark 快速上手从一键安装到跑起第一个 Flink SQL 任务
sql·flink·apache
养生技术人2 天前
Oracle OCP认证考试题目详解082系列第57题
运维·数据库·sql·oracle·开闭原则
养生技术人2 天前
Oracle OCP认证考试题目详解082系列第53题
数据库·sql·oracle·database·开闭原则·ocp
豆沙沙包?2 天前
2025年--Lc171--H175 .组合两个表(SQL)
数据库·sql
养生技术人2 天前
Oracle OCP认证考试题目详解082系列第48题
运维·数据库·sql·oracle·database·开闭原则·ocp