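The Calcite Validator

Building a SqlValidator requires four key classes: SqlOperatorTable, SqlValidatorCatalogReader, RelDataTypeFactory, and SqlValidator.Config. The factory method SqlValidatorUtil.newValidator takes exactly these four: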


  public static SqlValidatorWithHints newValidator(
      SqlOperatorTable opTab,
      SqlValidatorCatalogReader catalogReader,
      RelDataTypeFactory typeFactory,
      SqlValidator.Config config) {
    return new SqlValidatorImpl(opTab, catalogReader, typeFactory,
        config);
  }

SqlOperatorTable

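SqlOperatorTable is the interface for defining and looking up SQL operators and functions. A SQL operator here means SqlOperator and its subclasses. For example, the statement select id,u.name as u_name from t_user u may involve the AS operator and the DOT operator. You can think of the operator table as the set of SQL keywords the validator supports (functions being one kind of keyword).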
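To make the "lookup" role concrete, here is a toy model in plain Java. It is not Calcite's API: the class ToyOperatorTable, its methods, and the case-insensitive matching are all assumptions of this sketch. It only shows the idea that a name which is not registered in the table cannot be resolved during validation.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Toy stand-in for an operator table; names and kinds are illustrative only. */
final class ToyOperatorTable {
    // Operator name (upper-cased) -> a short description of its kind.
    private final Map<String, String> operators = new LinkedHashMap<>();

    /** Registers an operator and returns this table so calls can be chained. */
    ToyOperatorTable register(String name, String kind) {
        operators.put(name.toUpperCase(Locale.ROOT), kind);
        return this;
    }

    /** Case-insensitive lookup (an assumption of this sketch). */
    Optional<String> lookup(String name) {
        return Optional.ofNullable(operators.get(name.toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        ToyOperatorTable table = new ToyOperatorTable()
                .register("AS", "binary operator")
                .register("SUM", "aggregate function");

        System.out.println(table.lookup("sum")); // registered, so it resolves
        System.out.println(table.lookup("if"));  // not registered, so it does not

        // Registering the missing function makes the same lookup succeed.
        table.register("IF", "function");
        System.out.println(table.lookup("if"));
    }
}
```

In this model, a validator that consults the table would reject a call to an unregistered function and accept it once the function has been added.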

SqlValidatorCatalogReader

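SqlValidatorCatalogReader supplies the validator with catalog information, that is, metadata such as tables, types, and schemas. It is the bridge between the metadata and the validator. To construct its implementation, CalciteCatalogReader, we pass in a CalciteSchema, a default schema name, a type factory (RelDataTypeFactory), and the connection configuration (CalciteConnectionConfig).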


public CalciteCatalogReader(CalciteSchema rootSchema,
      List<String> defaultSchema, RelDataTypeFactory typeFactory, CalciteConnectionConfig config)

In practice the last two arguments are almost always the same, so only rootSchema and the default schema name really need to be supplied.


RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
CalciteConnectionConfig config = CalciteConnectionConfig.DEFAULT;
// Create the CatalogReader, which tells the validator how to read schema information
Prepare.CatalogReader catalogReader = new CalciteCatalogReader(
    rootSchema,
    // Which schema to use when there are several and the SQL does not name one
    StringUtils.isEmpty(currentDatabase)
        ? Collections.emptyList()
        : Collections.singletonList(currentDatabase),
    typeFactory,
    config);

RelDataTypeFactory

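RelDataTypeFactory is the factory for data types. It creates and converts SQL types, Java types, and collection types. Calcite ships two implementations, one for SQL (SqlTypeFactoryImpl) and one for Java (JavaTypeFactoryImpl), and you can write your own extension modelled on them.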

SqlValidator.Config

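Configuration for the validator itself, such as whether implicit type coercion is allowed and whether select columns are expanded. The defaults, SqlValidator.Config.DEFAULT, are usually sufficient.

Practice

Preparation

In the test database, create a user score table to use for this demonstration.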


create table test.t_score (
    id int(11) auto_increment primary key,
    user_id int(11) not null comment 'user ID',
    subjects varchar(255) not null comment 'subject',
    score decimal(5,2) default 0.0 comment 'score'
) ENGINE = InnoDB
  DEFAULT CHARSET = utf8mb4
  COLLATE = utf8mb4_general_ci
  ROW_FORMAT = DYNAMIC
  COMMENT = 'user score table';

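Following the previous article, "Defining and Retrieving Calcite Metadata" (Calcite元数据定义和获取), we first define a CalciteSchema. It is the rootSchema used in the code below.

Getting started

Step 1: build the CatalogReader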


RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
CalciteConnectionConfig config = CalciteConnectionConfig.DEFAULT;
// Create the CatalogReader, which tells the validator how to read schema information
Prepare.CatalogReader catalogReader = new CalciteCatalogReader(
    rootSchema,
    // Which schema to use when there are several and the SQL does not name one
    StringUtils.isEmpty(currentDatabase)
        ? Collections.emptyList()
        : Collections.singletonList(currentDatabase),
    typeFactory,
    config);

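Note that if your database connection does not specify currentDatabase (that is, the second argument is emptyList), every SQL statement must state which table of which database it uses, for example: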


select user_id,sum(score) as s_score from test.t_score group by user_id

Conversely, if you do specify currentDatabase (the second argument is singletonList), you can omit the database name and the validator falls back to the default:


select user_id,sum(score) as s_score from t_score group by user_id
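The two cases above can be illustrated with a toy resolver in plain Java. This is a simplified model under stated assumptions, not Calcite's implementation: the class DefaultSchemaDemo, the resolve method, and the catalog map are all hypothetical. It assumes the name is tried under the default schema first and then directly under the root.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Toy model of default-schema lookup; not Calcite's implementation. */
final class DefaultSchemaDemo {

    /**
     * Resolves a table name that may or may not carry a schema prefix.
     *
     * @param catalog       schema name -> table names (stand-in for rootSchema)
     * @param defaultSchema empty list, or a single default schema name
     * @param names         identifier parts, e.g. ["t_score"] or ["test", "t_score"]
     */
    static Optional<String> resolve(Map<String, Set<String>> catalog,
                                    List<String> defaultSchema,
                                    List<String> names) {
        // Assumed search order: the default schema first, then the root.
        List<List<String>> searchPaths = List.of(defaultSchema, List.of());
        for (List<String> prefix : searchPaths) {
            List<String> full = new ArrayList<>(prefix);
            full.addAll(names);
            // In this toy catalog a table is always addressed as schema.table.
            if (full.size() == 2
                    && catalog.getOrDefault(full.get(0), Set.of()).contains(full.get(1))) {
                return Optional.of(full.get(0) + "." + full.get(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> catalog = Map.of("test", Set.of("t_score"));

        // No default schema: the bare name cannot be resolved.
        System.out.println(resolve(catalog, List.of(), List.of("t_score")));
        // No default schema, qualified name: resolved from the root.
        System.out.println(resolve(catalog, List.of(), List.of("test", "t_score")));
        // Default schema "test": the bare name is resolved inside it.
        System.out.println(resolve(catalog, List.of("test"), List.of("t_score")));
    }
}
```

Run as is, the three calls print Optional.empty, Optional[test.t_score], and Optional[test.t_score].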

Step 2: build the SqlValidator


SqlValidator.Config validatorConfig = SqlValidator.Config.DEFAULT
        .withIdentifierExpansion(true);
SqlValidator validator = SqlValidatorUtil.newValidator(
        SqlStdOperatorTable.instance(), catalogReader, typeFactory, validatorConfig);

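Here I ran into a problem: with MySQL syntax, the default SqlStdOperatorTable does not support the if(bool, exp1, exp2) function. The definition of IF can in fact be found in the Calcite source, but it is only available when the SQL syntax is declared as Hive. My workaround is to create a class that extends SqlStdOperatorTable and assign the existing IF definition to a field of that class (a reference to the same definition, not a copy of its code).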


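import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;
import org.apache.calcite.sql.SqlFunction;
import org.apache.calcite.sql.fun.SqlLibraryOperators;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

public class SqlCustomOperatorTable extends SqlStdOperatorTable {
    // Lazily created singleton, mirroring how SqlStdOperatorTable builds its own instance
    private static final Supplier<SqlCustomOperatorTable> INSTANCE =
            Suppliers.memoize(() ->
                    (SqlCustomOperatorTable) new SqlCustomOperatorTable().init());

    // init() walks the fields by reflection and adds each SqlOperator to its list,
    // so declaring IF as a field here is enough to register it
    public static final SqlFunction IF = SqlLibraryOperators.IF;

    public static SqlCustomOperatorTable instance() {
        return INSTANCE.get();
    }
}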

With that, building the SqlValidator becomes:


SqlValidator validator = SqlValidatorUtil.newValidator(
        SqlCustomOperatorTable.instance(), catalogReader, typeFactory, validatorConfig);

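The if function is now supported as well.

Step 3: run the validation

Call SqlNode validatedSqlNode = validator.validate(sqlNode); to validate. If validation succeeds, the method returns the validated SqlNode rather than null; otherwise it throws an exception. Note that validate takes a SqlNode. If your application assembles its SQL with Calcite, you already have a SqlNode at hand; if you start from a SQL string, you first need to parse it.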


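SqlDialect sqlDialect = MysqlSqlDialect.DEFAULT;
// Derive a SqlParser.Config that matches the dialect
// (named parserConfig so it does not clash with the CalciteConnectionConfig variable above)
SqlParser.Config parserConfig = sqlDialect.configureParser(SqlParser.config());
// querySqlContext.getSql() returns the SQL string to validate
SqlNode sqlNode = SqlParser.create(querySqlContext.getSql(), parserConfig).parseQuery();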

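At this point we can run some simple experiments. For example, deliberately misspell a column name, as in select usr_id,sum(score) as s_score from t_score group by user_id, and the program throws an exception: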


Exception in thread "main" org.apache.calcite.runtime.CalciteContextException: From line 1, column 8 to line 1, column 13: Column 'usr_id' not found in any table

Leaving out the group by clause is reported too:


// sql is select user_id,sum(score) as s_score from t_score
Exception in thread "main" org.apache.calcite.runtime.CalciteContextException: From line 1, column 8 to line 1, column 14: Expression 'user_id' is not being grouped
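The rule behind the "is not being grouped" message can be sketched as a toy check in plain Java. This is a deliberately simplified model, not how Calcite implements the check: the class GroupingRuleDemo and its notGrouped method are hypothetical, and the query is reduced to a list of plain select columns, a flag for aggregates, and the set of group by columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy version of the "is not being grouped" rule; not Calcite's implementation. */
final class GroupingRuleDemo {

    /**
     * Returns the plain (non-aggregated) select columns that break the rule.
     *
     * @param plainColumns columns selected outside any aggregate call, e.g. user_id
     * @param hasAggregate whether the select list contains an aggregate such as sum(score)
     * @param groupBy      columns listed in GROUP BY (empty when the clause is absent)
     */
    static List<String> notGrouped(List<String> plainColumns,
                                   boolean hasAggregate,
                                   Set<String> groupBy) {
        List<String> offenders = new ArrayList<>();
        // Without aggregates and without GROUP BY there is nothing to check.
        if (!hasAggregate && groupBy.isEmpty()) {
            return offenders;
        }
        // Otherwise every plain column must appear in the GROUP BY list.
        for (String column : plainColumns) {
            if (!groupBy.contains(column)) {
                offenders.add(column);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // select user_id, sum(score) as s_score from t_score
        System.out.println(notGrouped(List.of("user_id"), true, Set.of()));
        // select user_id, sum(score) as s_score from t_score group by user_id
        System.out.println(notGrouped(List.of("user_id"), true, Set.of("user_id")));
    }
}
```

The first call prints [user_id], matching the failing query above, and the second prints [] because user_id is now part of the grouping.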

More complex scenarios are left for you to try out. That's all.
