Json 类型与多值索引 — OceanBase 4.3.2 AP 功能体验

本文来自 2024年OceanBase技术征文大赛------"让技术被看见 | OceanBase 布道师计划" 的用户征文。也欢迎更多的技术爱好者参与征文,赢取万元大奖。和我们一起,用文字让代码跳动起来! 参与**2024年OceanBase技术征文大赛>>**

MySQL在5.7.8版本中引入了JSON数据类型及相应的JSON函数,而在8.0.17版本中,又推出了针对值数组的多值索引。这些功能非常实用,一度成为 ​​​MySQL迁移到OceanBase上的一个阻点。之前,OceanBase 在 3.2.2版本推出了JSON数据类型及JSON函数,4.3.2版本则实现了多值索引。本文将分享OceanBase 4.3.2版本中JSON数据类型及其索引的使用示例和实践。


OB 的 ORACLE 租户和 MySQL 租户都支持 JSON 类型,本文主要演示 MySQL 租户下的 JSON 使用。

首先创建一个含有 JSON 类型的表,示例参考 MySQL 官方文档中 JSON 示例表。

CREATE TABLE customers (
     id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
     modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
     custinfo JSON
 );
 
 INSERT INTO customers VALUES
 (NULL, NOW(), '{"user":"Jack","user_id":37,"zipcode":[94582,94536]}'),
 (NULL, NOW(), '{"user":"Jill","user_id":22,"zipcode":[94568,94507,94582]}'),
 (NULL, NOW(), '{"user":"Bob","user_id":31,"zipcode":[94477,94507]}'),
 (NULL, NOW(), '{"user":"Mary","user_id":72,"zipcode":[94536]}'),
 (NULL, NOW(), '{"user":"Ted","user_id":56,"zipcode":[94507,94582]}')
;

select * from customers;

JSON 列不支持直接索引,如果想对 JSON 列部分字段做索引,可以新增一个虚拟列,然后针对这个虚拟列创建索引。

ALTER TABLE customers ADD v_user varchar(20) GENERATED ALWAYS AS (json_unquote(json_extract (`custinfo`, _utf8mb4'$.user'))) virtual ;
ALTER TABLE customers ADD KEY idx_user(v_user);

再看看查询虚拟列的执行计划。

mysql> explain SELECT * FROM customers WHERE v_user='Bob';
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                                         |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ===============================================================                                                                                                                    |
| |ID|OPERATOR        |NAME               |EST.ROWS|EST.TIME(us)|                                                                                                                    |
| ---------------------------------------------------------------                                                                                                                    |
| |0 |TABLE RANGE SCAN|customers(idx_user)|1       |7           |                                                                                                                    |
| ===============================================================                                                                                                                    |
| Outputs & filters:                                                                                                                                                                 |
| -------------------------------------                                                                                                                                              |
|   0 - output([customers.id], [customers.modified], [customers.custinfo], [column_conv(VARCHAR,utf8mb4_general_ci,length:20,NULL,cast(json_unquote(json_extract(customers.custinfo, |
|        '$.user')), VARCHAR(1048576)))]), filter(nil)                                                                                                                               |
|       access([customers.id], [customers.custinfo], [customers.v_user], [customers.modified]), partitions(p0)                                                                       |
|       is_index_back=true, is_global_index=false,                                                                                                                                   |
|       range_key([customers.v_user], [customers.id]), range(Bob,MIN ; Bob,MAX),                                                                                                     |
|       range_cond([customers.v_user = 'Bob'])                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
13 rows in set, 1 warning (0.02 sec)

上面方法如果要对 JSON 字段 zipcode 做索引就有点困难,这个列的值是个数组。这就用到新推出的多值索引功能。

先看查询场景 SQL 。

EXPLAIN SELECT * FROM customers WHERE 94507 MEMBER OF(custinfo->'$.zipcode');
EXPLAIN SELECT * FROM customers WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
EXPLAIN SELECT * FROM customers WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));

这三个 SQL 使用了常见的 JSON 函数,执行计划全部是全表扫描就不发了。应对方法就是新增多值索引。

mysql> ALTER TABLE customers ADD INDEX zips( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );
ERROR 1235 (0A000): dynamic add multivalue index not supported yet not supported
mysql> 

遗憾的是由于 OB 4.3.2 是刚支持多值索引,目前还只实现在建表的时候创建多值索引,暂不支持后期动态添加多值索引。

所以我们再创建一个带多值索引的新表看看。

create table customers2(
  id bigint not null auto_increment  ,
  modified datetime default current_timestamp on update current_timestamp,
  custinfo json,
  index zips((cast(custinfo->'$.zipcode' as unsigned array))),
  INDEX comp(id, modified,(cast(custinfo->'$.zipcode' as unsigned array)))
);

INSERT INTO customers2 SELECT * FROM customers;

为了减少篇幅,我这里一次性创建两类多值索引。一个是针对单列的多值索引,一个是多列组合索引。

mysql> EXPLAIN SELECT * FROM customers2 WHERE 94507 MEMBER OF(custinfo->'$.zipcode');                                                                                                                                                                                       
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+                                                                                                            
| Query Plan                                                                                                                                                   |                                                                                                            
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+                                                                                                            
| ===========================================================                                                                                                  |                                                                                                            
| |ID|OPERATOR       |NAME            |EST.ROWS|EST.TIME(us)|                                                                                                  |                                                                                                            
| -----------------------------------------------------------                                                                                                  |                                                                                                            
| |0 |TABLE FULL SCAN|customers2(zips)|2       |13          |                                                                                                  |                                                                                                            
| ===========================================================                                                                                                  |                                                                                                            
| Outputs & filters:                                                                                                                                           |                                                                                                            
| -------------------------------------                                                                                                                        |                                                                                                            
|   0 - output([customers2.id], [customers2.modified], [customers2.custinfo]), filter([JSON_MEMBER_OF(94507, JSON_EXTRACT(customers2.custinfo, '$.zipcode'))]) |                                                                                                            
|       access([customers2.__pk_increment], [customers2.custinfo], [customers2.id], [customers2.modified]), partitions(p0)                                     |                                                                                                            
|       is_index_back=true, is_global_index=false, filter_before_indexback[false],                                                                             |                                                                                                            
|       range_key([customers2.SYS_NC_mvi_19], [customers2.__pk_increment], [customers2.__doc_id_1727685398954214]), range(94507,MIN,MIN ; 94507,MAX,MAX)       |                                                                                                            
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+                                                                                                            
11 rows in set (0.00 sec)                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                            
mysql> EXPLAIN SELECT * FROM customers2 WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));                                                                                                                                                          
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+                                                                                                        
| Query Plan                                                                                                                                                       |                                                                                                        
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+                                                                                                        
| ===========================================================                                                                                                      |                                                                                                        
| |ID|OPERATOR       |NAME            |EST.ROWS|EST.TIME(us)|                                                                                                      |                                                                                                        
| -----------------------------------------------------------                                                                                                      |                                                                                                        
| |0 |TABLE FULL SCAN|customers2(zips)|3       |23          |                                                                                                      |                                                                                                        
| ===========================================================                                                                                                      |                                                                                                        
| Outputs & filters:                                                                                                                                               |                                                                                                        
| -------------------------------------                                                                                                                            |
|   0 - output([customers2.id], [customers2.modified], [customers2.custinfo]), filter([JSON_CONTAINS(JSON_EXTRACT(customers2.custinfo, '$.zipcode'), cast('[94507, |
|       94582]', JSON(536870911)))])                                                                                                                               |
|       access([customers2.__pk_increment], [customers2.custinfo], [customers2.id], [customers2.modified]), partitions(p0)                                         |
|       is_index_back=true, is_global_index=false, filter_before_indexback[false],                                                                                 |
|       range_key([customers2.SYS_NC_mvi_19], [customers2.__pk_increment], [customers2.__doc_id_1727685398954214]), range(94507,MIN,MIN ; 94507,MAX,MAX),          |
|       (94582,MIN,MIN ; 94582,MAX,MAX)                                                                                                                            |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM customers2 WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ===========================================================                                                                                                      |
| |ID|OPERATOR       |NAME            |EST.ROWS|EST.TIME(us)|                                                                                                      |
| -----------------------------------------------------------                                                                                                      |
| |0 |TABLE FULL SCAN|customers2(zips)|3       |23          |                                                                                                      |
| ===========================================================                                                                                                      |
| Outputs & filters:                                                                                                                                               |
| -------------------------------------                                                                                                                            |
|   0 - output([customers2.id], [customers2.modified], [customers2.custinfo]), filter([JSON_OVERLAPS(JSON_EXTRACT(customers2.custinfo, '$.zipcode'), cast('[94507, |
|       94582]', JSON(536870911)))])                                                                                                                               |
|       access([customers2.__pk_increment], [customers2.custinfo], [customers2.id], [customers2.modified]), partitions(p0)                                         |
|       is_index_back=true, is_global_index=false, filter_before_indexback[false],                                                                                 |
|       range_key([customers2.SYS_NC_mvi_19], [customers2.__pk_increment], [customers2.__doc_id_1727685398954214]), range(94507,MIN,MIN ; 94507,MAX,MAX),          |
|       (94582,MIN,MIN ; 94582,MAX,MAX)                                                                                                                            |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM customers2 WHERE id = 23 and modified = 103 and 94507 MEMBER OF(custinfo->'$.zipcode');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ============================================================                                                                                                 |
| |ID|OPERATOR        |NAME            |EST.ROWS|EST.TIME(us)|                                                                                                 |
| ------------------------------------------------------------                                                                                                 |
| |0 |TABLE RANGE SCAN|customers2(comp)|1       |7           |                                                                                                 |
| ============================================================                                                                                                 |
| Outputs & filters:                                                                                                                                           |
| -------------------------------------                                                                                                                        |
|   0 - output([customers2.id], [customers2.modified], [customers2.custinfo]), filter([JSON_MEMBER_OF(94507, JSON_EXTRACT(customers2.custinfo, '$.zipcode'))]) |
|       access([customers2.__pk_increment], [customers2.custinfo], [customers2.id], [customers2.modified]), partitions(p0)                                     |
|       is_index_back=true, is_global_index=false, filter_before_indexback[false],                                                                             |
|       range_key([customers2.id], [customers2.modified], [customers2.SYS_NC_mvi_21], [customers2.__pk_increment], [customers2.__doc_id_1727685398954214]),    |
|        range(23,2000-01-03 00:00:00.000000,MIN,MIN,MIN ; 23,2000-01-03 00:00:00.000000,MAX,MAX,MAX),                                                         |
|       range_cond([customers2.id = 23], [customers2.modified = INTERNAL_FUNCTION(103, 110, 17)])                                                              |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)

这里直接将 4 种使用多值索引的查询场景 SQL 以及执行计划放出,这执行计划中的细节大家可以一一详细查看。

多值索引的结构如下。

mysql> show indexes from customers2;
+------------+------------+----------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
| Table      | Non_unique | Key_name | Seq_in_index | Column_name               | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment   | Index_comment | Visible | Expression                                                                                                                     |
+------------+------------+----------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
| customers2 |          1 | zips     |            1 | SYS_NC_mvi_19             | A         |        NULL | NULL     | NULL   | YES  | BTREE      | available |               | YES     | json_query(`custinfo`,'$.zipcode' RETURNING unsigned WITHOUT ARRAY WRAPPER asis error on error null on empty null on mismatch) |
| customers2 |          1 | zips     |            2 | __pk_increment            | A         |        NULL | NULL     | NULL   |      | BTREE      | available |               | YES     | NULL                                                                                                                           |
| customers2 |          1 | zips     |            3 | __doc_id_1727685398954214 | A         |        NULL | NULL     | NULL   |      | BTREE      | available |               | YES     | NULL                                                                                                                           |
| customers2 |          1 | comp     |            1 | id                        | A         |        NULL | NULL     | NULL   |      | BTREE      | available |               | YES     | NULL                                                                                                                           |
| customers2 |          1 | comp     |            2 | modified                  | A         |        NULL | NULL     | NULL   | YES  | BTREE      | available |               | YES     | NULL                                                                                                                           |
| customers2 |          1 | comp     |            3 | SYS_NC_mvi_21             | A         |        NULL | NULL     | NULL   | YES  | BTREE      | available |               | YES     | json_query(`custinfo`,'$.zipcode' RETURNING unsigned WITHOUT ARRAY WRAPPER asis error on error null on empty null on mismatch) |
| customers2 |          1 | comp     |            4 | __pk_increment            | A         |        NULL | NULL     | NULL   |      | BTREE      | available |               | YES     | NULL                                                                                                                           |
| customers2 |          1 | comp     |            5 | __doc_id_1727685398954214 | A         |        NULL | NULL     | NULL   |      | BTREE      | available |               | YES     | NULL                                                                                                                           |
+------------+------------+----------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)

还有一类特殊的场景是多值索引的唯一性索引。

CREATE TABLE customers3 (
 id BIGINT not null primary key,
 modified BIGINT not null,
 custinfo JSON,
 UNIQUE INDEX zips1( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) )
);

INSERT INTO customers3 VALUES
 (10, 21, '{"user":"Jack","user_id":37,"zipcode":[94582,94536]}');
 
mysql> select * from customers3;
+----+----------+------------------------------------------------------------+
| id | modified | custinfo                                                   |
+----+----------+------------------------------------------------------------+
| 10 |       21 | {"user": "Jack", "user_id": 37, "zipcode": [94582, 94536]} |
+----+----------+------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO customers3 VALUES  (11, 22, '{"user":"Jill","user_id":22,"zipcode":[94568,94507,94582]}');
ERROR 1062 (23000): Duplicate entry '94582' for key 'zips1'
mysql> 

mysql> show indexes from customers3;
+------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
| Table      | Non_unique | Key_name | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment   | Index_comment | Visible | Expression                                                                                                                     |
+------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
| customers3 |          0 | PRIMARY  |            1 | id            | A         |        NULL | NULL     | NULL   |      | BTREE      | available |               | YES     | NULL                                                                                                                           |
| customers3 |          0 | zips1    |            1 | SYS_NC_mvi_19 | A         |        NULL | NULL     | NULL   | YES  | BTREE      | available |               | YES     | json_query(`custinfo`,'$.zipcode' RETURNING unsigned WITHOUT ARRAY WRAPPER asis error on error null on empty null on mismatch) |
+------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+--------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

由此可见唯一性的多值索引能拦截导致多值数组中出现重复的值。

多值索引也有一些功能限制。

  • 多值索引不能定义列的顺序 ASC 或 DESC,也不能用于消除排序,多值列不能用于主键。

    EXPLAIN SELECT v_user FROM customers order by v_user;
    EXPLAIN SELECT custinfo FROM customers2 order by custinfo;

相关推荐
慕羽★6 小时前
详细介绍如何使用rapidjson读取json文件
linux·c++·windows·json·file·param·rapidjson
轻口味11 小时前
配置TypeScript:tsconfig.json详解
ubuntu·typescript·json
OceanBase数据库官方博客11 小时前
半连接转内连接 | OceanBase SQL 查询改写
sql·oceanbase·分布式数据库
zybishe12 小时前
免费送源码:Java+ssm++MVC+HTML+CSS+MySQL springboot 社区医院信息管理系统的设计与实现 计算机毕业设计原创定制
java·hadoop·sql·zookeeper·html·json·mvc
OceanBase数据库官方博客18 小时前
解析在OceanBase创建分区的常见问题|OceanBase 用户问题精粹
oceanbase·分布式数据库·分区
OceanBase数据库官方博客18 小时前
半连接转内连接规则的原理与代码解析 |OceanBase查询优化
sql·oceanbase·分布式数据库
m0_738820201 天前
前端模拟接口工具-json-server
前端·json
love静思冥想2 天前
Java 单元测试中 JSON 相关的测试案例
java·单元测试·json
FL16238631292 天前
电力场景输电塔分割数据集labelme格式288张4类别
json
Tinalee-电商API接口呀2 天前
从零开始构建电商数据采集系统:步骤与策略
java·大数据·开发语言·人工智能·信息可视化·json