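Background
This article is based on StarRocks 3.3.5.
It explores how StarRocks implements the short-circuit path on the FE side to speed up point queries.
At the user level, the session variable enable_short_circuit must be set to true.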
Analysis
Data flow:
Start directly at the StatementPlanner.createQueryPlan method:
...
OptExpression root = ShortCircuitPlanner.checkSupportShortCircuitRead(logicalPlan.getRoot(), session);
...
optimizedPlan = optimizer.optimize(
        session,
        root,
        mvTransformerContext,
        stmt,
        new PhysicalPropertySet(),
        new ColumnRefSet(logicalPlan.getOutputColumn()),
        columnRefFactory);
First, ShortCircuitPlanner.checkSupportShortCircuitRead determines whether the SQL supports a short-circuit query:
public static OptExpression checkSupportShortCircuitRead(OptExpression root, ConnectContext connectContext) {
    if (!connectContext.getSessionVariable().isEnableShortCircuit()) {
        root.setShortCircuit(false);
        return root;
    }
    boolean supportShortCircuit = root.getOp().accept(new LogicalPlanChecker(), root, null);
    if (supportShortCircuit && OperatorType.LOGICAL_LIMIT.equals(root.getOp().getOpType())) {
        root = root.getInputs().get(0);
    }
    root.setShortCircuit(supportShortCircuit);
    return root;
}
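- isEnableShortCircuit, i.e. the enable_short_circuit variable (false by default), decides whether short-circuit queries are enabled at all.
- The visitor LogicalPlanChecker decides whether the SQL itself supports a short-circuit query.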
The LogicalPlanChecker implementation shows that only the Scan, Project, Filter, and Limit operators are currently supported:
public static class LogicalPlanChecker extends BaseLogicalPlanChecker {
    ...
    @Override
    public Boolean visitLogicalFilter(OptExpression optExpression, Void context) {
        ...
        return visitChild(optExpression, context);
    }

    @Override
    public Boolean visitLogicalProject(OptExpression optExpression, Void context) {
        ...
        return visitChild(optExpression, context);
    }

    @Override
    public Boolean visitLogicalLimit(OptExpression optExpression, Void context) {
        ...
        return visitChild(optExpression, context);
    }

    @Override
    public Boolean visitLogicalTableScan(OptExpression optExpression, Void context) {
        return createLogicalPlanChecker(optExpression, allowFilter, allowLimit, allowProject,
                allowSort, predicate, orderByColumns, limit).visitLogicalTableScan(optExpression, context);
    }

    protected static boolean isPointScan(Table table, List<String> keyColumns,
                                         List<ScalarOperator> conjuncts,
                                         ShortCircuitContext shortCircuitContext) {
        Map<String, PartitionColumnFilter> filters = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        filters.putAll(ColumnFilterConverter.convertColumnFilter(conjuncts, table));
        if (keyColumns == null || keyColumns.isEmpty()) {
            return false;
        }
        long cardinality = 1;
        for (String keyColumn : keyColumns) {
            if (filters.containsKey(keyColumn)) {
                PartitionColumnFilter filter = filters.get(keyColumn);
                if (filter.getInPredicateLiterals() != null) {
                    cardinality *= filter.getInPredicateLiterals().size();
                    // TODO(limit operator place fe)
                    if (cardinality > MAX_RETURN_ROWS ||
                            (shortCircuitContext.getMaxReturnRows() != 0 && cardinality != 1)) {
                        return false;
                    }
                } else if (!filter.isPoint()) {
                    return false;
                }
            } else {
                return false;
            }
        }
        return true;
    }
}
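The whitelist idea behind the checker can be shown in isolation. The following is a minimal sketch, not StarRocks code: OpType, PlanNode, and PlanCheckSketch are hypothetical names, and the scan check is reduced to "is a leaf", whereas the real visitLogicalTableScan also inspects the table itself. It shows a plan qualifying only when every operator from the root down to a single scan is in the allowed set, plus the root-Limit stripping done by checkSupportShortCircuitRead.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified operator kinds.
enum OpType { SCAN, PROJECT, FILTER, LIMIT, JOIN, AGGREGATE }

// Hypothetical stand-in for OptExpression: an operator plus its inputs.
final class PlanNode {
    final OpType type;
    final List<PlanNode> inputs;

    PlanNode(OpType type, PlanNode... inputs) {
        this.type = type;
        this.inputs = Arrays.asList(inputs);
    }
}

final class PlanCheckSketch {
    // Only these operators may appear in a short-circuit plan.
    private static final Set<OpType> ALLOWED =
            EnumSet.of(OpType.SCAN, OpType.PROJECT, OpType.FILTER, OpType.LIMIT);

    // True when the plan is a single chain of allowed operators ending in a scan.
    static boolean supportsShortCircuit(PlanNode node) {
        if (!ALLOWED.contains(node.type)) {
            return false;
        }
        if (node.type == OpType.SCAN) {
            return node.inputs.isEmpty();
        }
        return node.inputs.size() == 1 && supportsShortCircuit(node.inputs.get(0));
    }

    // Same shape as checkSupportShortCircuitRead: when the plan qualifies and
    // its root is a Limit, planning continues from the Limit's child.
    static PlanNode stripRootLimit(PlanNode root) {
        if (supportsShortCircuit(root) && root.type == OpType.LIMIT) {
            return root.inputs.get(0);
        }
        return root;
    }
}
```

A Limit over Project over Filter over Scan passes the check, while any plan containing a Join or Aggregate is rejected at the first disallowed operator.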
- Look directly at the visitLogicalTableScan method.
  Only shared-nothing tables, i.e. LogicalOlapScanOperator instances, support short-circuit queries. The call eventually reaches ShortCircuitPlannerHybrid.LogicalPlanChecker.visitLogicalTableScan:
public Boolean visitLogicalTableScan(OptExpression optExpression, Void context) {
    LogicalScanOperator scanOp = optExpression.getOp().cast();
    Table table = scanOp.getTable();
    if (!(table instanceof OlapTable) ||
            !(KeysType.PRIMARY_KEYS.equals(((OlapTable) table).getKeysType()))) {
        return false;
    }
    for (Column column : table.getFullSchema()) {
        if (IDictManager.getInstance().hasGlobalDict(table.getId(), column.getColumnId())) {
            return false;
        }
    }
    List<String> keyColumns = ((OlapTable) table).getKeyColumns().stream()
            .map(Column::getName).collect(Collectors.toList());
    List<ScalarOperator> conjuncts = Utils.extractConjuncts(predicate);
    return isPointScan(table, keyColumns, conjuncts, shortCircuitContext);
}
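- First, the table must use the primary key model.
- Second, no column in the queried table's full schema may have a global dictionary.
- Finally, the query must be a point query, which requires:
  1. Every filter on a key column is either IN or =.
  2. For IN filters, the product of the IN-list sizes across the key columns must not exceed MAX_RETURN_ROWS (2048); when shortCircuitContext.getMaxReturnRows() is non-zero, that product must be exactly 1.
  3. The filters cover every primary key column (predicates on non-key columns may appear in addition).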
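The point-query conditions above can be exercised with a small stand-alone model. This is a sketch under stated assumptions, not the StarRocks implementation: KeyFilter is a hypothetical replacement for PartitionColumnFilter, the limit parameter plays the role of shortCircuitContext.getMaxReturnRows(), and the value of MAX_RETURN_ROWS is assumed rather than read from the source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for PartitionColumnFilter: a column is constrained by
// "=" (one literal), by IN (several literals), or by a range predicate.
final class KeyFilter {
    final List<String> literals; // values taken from "=" or IN
    final boolean isRange;       // true for <, >, BETWEEN and similar

    private KeyFilter(List<String> literals, boolean isRange) {
        this.literals = literals;
        this.isRange = isRange;
    }

    static KeyFilter eq(String value) {
        return new KeyFilter(Collections.singletonList(value), false);
    }

    static KeyFilter in(String... values) {
        return new KeyFilter(Arrays.asList(values), false);
    }

    static KeyFilter range() {
        return new KeyFilter(Collections.<String>emptyList(), true);
    }
}

final class PointScanSketch {
    static final long MAX_RETURN_ROWS = 2048; // assumed value

    // limit == 0 means the query has no row limit (an assumption of this sketch).
    static boolean isPointScan(List<String> keyColumns, Map<String, KeyFilter> filters, long limit) {
        if (keyColumns == null || keyColumns.isEmpty()) {
            return false;
        }
        // Column names are matched case-insensitively, as in the original method.
        Map<String, KeyFilter> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byName.putAll(filters);

        long cardinality = 1;
        for (String key : keyColumns) {
            KeyFilter filter = byName.get(key);
            if (filter == null || filter.isRange) {
                return false; // a key column is missing or is not pinned to points
            }
            cardinality *= filter.literals.size(); // "=" contributes a factor of 1
            if (cardinality > MAX_RETURN_ROWS || (limit != 0 && cardinality != 1)) {
                return false;
            }
        }
        return true;
    }
}
```

With keys (id, region), the filter set id IN (1, 2, 3) AND region = 'eu' qualifies when there is no limit, but is rejected once a limit is present, because three key combinations remain.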
If the query is confirmed to qualify for the short-circuit path, root.setShortCircuit(true) is called; otherwise the flag is set to false.
Next comes the plan-level optimization in optimizer.optimize. It calls optimizeByCost, which in turn calls rewriteAndValidatePlan:
private OptExpression rewriteAndValidatePlan(
        OptExpression tree,
        TaskContext rootTaskContext) {
    OptExpression result = logicalRuleRewrite(tree, rootTaskContext);
    OptExpressionValidator validator = new OptExpressionValidator();
    validator.validate(result);
    // skip memo
    if (result.getShortCircuit()) {
        result = new OlapScanImplementationRule().transform(result, null).get(0);
        result.setShortCircuit(true);
    }
    return result;
}
The short-circuit path is involved in two places:
- ruleRewriteForShortCircuit, called from logicalRuleRewrite:
private Optional<OptExpression> ruleRewriteForShortCircuit(OptExpression tree, TaskContext rootTaskContext) {
    Boolean isShortCircuit = tree.getShortCircuit();
    if (isShortCircuit) {
        deriveLogicalProperty(tree);
        ruleRewriteIterative(tree, rootTaskContext, RuleSetType.SHORT_CIRCUIT_SET);
        ruleRewriteOnlyOnce(tree, rootTaskContext, new MergeProjectWithChildRule());
        OptExpression result = tree.getInputs().get(0);
        result.setShortCircuit(true);
        return Optional.of(result);
    }
    return Optional.empty();
}
A dedicated set of rewrite rules is applied here for the short-circuit case:
new PruneTrueFilterRule(),
new PushDownPredicateProjectRule(),
PushDownPredicateScanRule.OLAP_SCAN,
new CastToEmptyRule(),
new PruneProjectColumnsRule(),
PruneScanColumnRule.OLAP_SCAN,
new PruneProjectEmptyRule(),
new MergeTwoProjectRule(),
new PruneProjectRule(),
new PartitionPruneRule(),
new DistributionPruneRule();
new MergeProjectWithChildRule()
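These rules only cover projection handling, constant simplification, and pushing filters closer to the data (predicate pushdown plus partition and bucket pruning), which avoids running the general-purpose rule set. As the primary_key_table documentation notes, it is the primary key model that makes this predicate pushdown possible.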
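The name ruleRewriteIterative suggests the rule set is applied repeatedly until the plan stops changing, while ruleRewriteOnlyOnce applies a rule a single time. The following minimal sketch illustrates the repeat-until-stable pattern only. It assumes a plan can be modelled as a top-down list of operator names; RewriteSketch and both rule methods are hypothetical simplifications that merely borrow the intent of PruneTrueFilterRule and MergeTwoProjectRule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

final class RewriteSketch {
    // Drops one filter whose predicate is the constant true.
    static List<String> pruneTrueFilter(List<String> plan) {
        List<String> out = new ArrayList<>(plan);
        out.remove("Filter(true)"); // removes the first occurrence, if any
        return out;
    }

    // Collapses the first pair of adjacent projections into one.
    static List<String> mergeTwoProject(List<String> plan) {
        List<String> out = new ArrayList<>(plan);
        for (int i = 0; i + 1 < out.size(); i++) {
            if (out.get(i).equals("Project") && out.get(i + 1).equals("Project")) {
                out.remove(i + 1);
                break;
            }
        }
        return out;
    }

    // Applies every rule in turn and repeats until a full pass changes nothing.
    static List<String> rewriteIterative(List<String> plan, List<UnaryOperator<List<String>>> rules) {
        List<String> current = plan;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (UnaryOperator<List<String>> rule : rules) {
                List<String> next = rule.apply(current);
                if (!next.equals(current)) {
                    current = next;
                    changed = true;
                }
            }
        }
        return current;
    }
}
```

In this model, removing a trivial filter can make two projections adjacent, so a later pass merges them; that interaction between rules is why a single pass is not always enough.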
- OlapScanImplementationRule().transform
  This step is likewise reached only when the SQL qualifies for the short-circuit path. Its main job is to convert the logical scan into a physical scan.
After these two steps the plan is returned directly, without entering the memo-based CBO optimization.
This completes the short-circuit optimization on the FE side; the next step is generating the physical plan.