Reading Guide
- What: What are the core components of the Nereids optimizer? What is the Cascades framework?
- Why: Why rewrite the optimizer? What advantages does Nereids have over the legacy Planner?
- How: How does the Memo data structure compactly store equivalent expressions?
- How: How are optimization rules classified and executed? How does the cost model compute costs?
Core Concepts
- Cascades framework: the classic query optimization framework
- Memo data structure: a compact representation built from Groups and GroupExpressions
- Rule system: three rule categories, Rewrite, Exploration, and Implementation
- Cost model: CPU, memory, and network cost computation
- Statistics: cardinality estimation and selectivity computation
- Optimization phases: Analyze, Rewrite, Optimize, PostProcess
- DPHyp algorithm: dynamic-programming join-order optimization
- Materialized view rewriting: automatically using materialized views to accelerate queries
1. Nereids Optimizer Architecture Overview (What & Why)
1.1 Why a New Optimizer (Why)
Limitations of the legacy Planner
Problems with the legacy Planner:
- Primarily rule-based optimization (RBO):
  - No mature cost model
  - Cannot make optimal choices based on data statistics
  - Limited join-order optimization
- Hard-coded optimization logic:
  - Adding a rule requires changes in many places
  - Hard to maintain and extend
  - Rules may conflict with each other
- Restricted search space:
  - Cannot explore all equivalent plans
  - Easily stuck in local optima
- Weak subquery handling:
  - Incomplete subquery unnesting
  - Correlated subqueries are hard to optimize
Design goals of Nereids
Core goals:
- A complete CBO optimizer:
  - Built on the Cascades framework
  - Mature cost model and statistics
  - Automatically selects the cheapest execution plan
- A declarative rule system:
  - Pattern matching
  - Independent rules that are easy to extend
  - Rule correctness is easy to verify
- Powerful expression optimization:
  - Complete constant folding
  - Predicate inference and simplification
  - Common subexpression elimination
- Advanced join optimization:
  - DPHyp dynamic-programming algorithm
  - Supports bushy trees
  - Intelligent join-order selection
1.2 Cascades Framework Basics
Core ideas of Cascades:
Three key concepts:
- Memo:
  - Compactly stores all equivalent expressions
  - Avoids generating the same plan twice
  - A Group represents a set of logically equivalent expressions
- Rule driven:
  - New equivalent plans are generated by applying rules
  - Rules split into logical transformations and physical implementations
  - The search space is explored automatically
- Cost driven:
  - Every physical plan gets a cost estimate
  - The cheapest plan is chosen based on statistics
  - Dynamic programming finds the global optimum
1.3 Nereids Architecture Components
Core components:
| Component | Responsibility | Key class |
|---|---|---|
| CascadesContext | Optimization context | CascadesContext.java |
| Memo | Stores equivalent expressions | Memo.java |
| Group | Set of logically equivalent expressions | Group.java |
| GroupExpression | A concrete expression | GroupExpression.java |
| Rewriter | Logical rewriting | Rewriter.java |
| Optimizer | Cost-based optimization | Optimizer.java |
| CostCalculator | Cost computation | CostCalculator.java |
| Statistics | Statistics | Statistics.java |
2. The Memo Data Structure in Detail (How)
2.1 Core Memo Design
Location: fe/fe-core/src/main/java/org/apache/doris/nereids/memo/Memo.java
```java
public class Memo {
    // Group ID generator
    private final IdGenerator<GroupId> groupIdGenerator = GroupId.createGenerator();
    // All groups
    private final Map<GroupId, Group> groups = Maps.newLinkedHashMap();
    // All group expressions (used for deduplication)
    private final Map<GroupExpression, GroupExpression> groupExpressions = Maps.newHashMap();
    // Root group
    private Group root;

    /**
     * Copy a plan into the memo.
     */
    public CopyInResult copyIn(Plan plan, @Nullable Group target, boolean rewrite) {
        CopyInResult result;
        if (rewrite) {
            result = doRewrite(plan, target);
        } else {
            result = doCopyIn(plan, target, null);
        }
        return result;
    }
}
```
Design points:
- Compact storage:
  - All equivalent expressions live in the same Group
  - Identical plan structures are never stored twice
  - The groupExpressions map provides fast deduplication
- Shared references:
  - Child plans are referenced through Groups
  - Multiple GroupExpressions can share child Groups
  - Reduces memory footprint
- Version tracking:
  - refreshVersion tracks changes to the Memo
  - Used for incremental statistics updates
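The deduplication idea above can be sketched in a few lines: a group expression is identified by its operator plus the ids of its child groups, so two structurally identical expressions collapse into one canonical entry. All class and method names here are illustrative toy types, not the actual Doris classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of memo-style deduplication: identity = (operator, child group ids).
public class MemoSketch {
    record GroupExpr(String op, List<Integer> childGroupIds) {}

    private final Map<GroupExpr, GroupExpr> exprs = new HashMap<>();

    /** Returns the canonical instance; copying in a duplicate is a no-op. */
    public GroupExpr copyIn(GroupExpr e) {
        return exprs.computeIfAbsent(e, k -> k);
    }

    public int size() {
        return exprs.size();
    }

    public static void main(String[] args) {
        MemoSketch memo = new MemoSketch();
        GroupExpr a = memo.copyIn(new GroupExpr("Join", List.of(1, 2)));
        GroupExpr b = memo.copyIn(new GroupExpr("Join", List.of(1, 2)));
        // The second copyIn returns the same canonical instance: one entry total.
        assert a == b;
        assert memo.size() == 1;
    }
}
```

Because children are referenced by group id rather than by concrete subtree, equal subplans hash to the same key regardless of how large the shared subtrees are, which is what keeps the memo compact.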
2.2 Group Design
Location: fe/fe-core/src/main/java/org/apache/doris/nereids/memo/Group.java
```java
public class Group {
    private final GroupId groupId;
    // Logical expressions
    private final List<GroupExpression> logicalExpressions = Lists.newArrayList();
    // Physical expressions
    private final List<GroupExpression> physicalExpressions = Lists.newArrayList();
    // Logical properties (output columns, data distribution, etc.)
    private LogicalProperties logicalProperties;
    // Best-plan cache: PhysicalProperties -> (Cost, GroupExpression)
    private final Map<PhysicalProperties, Pair<Cost, GroupExpression>> lowestCostPlans
            = Maps.newLinkedHashMap();
    // Statistics
    private Statistics statistics;

    /**
     * Add a GroupExpression.
     */
    public GroupExpression addGroupExpression(GroupExpression groupExpression) {
        if (groupExpression.getPlan() instanceof LogicalPlan) {
            logicalExpressions.add(groupExpression);
        } else {
            physicalExpressions.add(groupExpression);
        }
        groupExpression.setOwnerGroup(this);
        return groupExpression;
    }

    /**
     * Record the best plan for a given set of physical properties.
     */
    public void setBestPlan(GroupExpression expression, Cost cost,
            PhysicalProperties properties) {
        if (lowestCostPlans.containsKey(properties)) {
            if (lowestCostPlans.get(properties).first.getValue() > cost.getValue()) {
                lowestCostPlans.put(properties, Pair.of(cost, expression));
            }
        } else {
            lowestCostPlans.put(properties, Pair.of(cost, expression));
        }
    }
}
```
Key properties of a Group:
- Logical equivalence:
  - All logical expressions in a Group are logically equivalent
  - They produce the same output data
  - They share the same LogicalProperties
- Multiple physical implementations:
  - One logical plan can have several physical implementations
  - For example: LogicalJoin -> HashJoin / NestLoopJoin
- Best-plan cache:
  - Keyed by the required physical properties (sort order, distribution)
  - Caches the best plan and its cost for each requirement
  - Avoids recomputation
2.3 GroupExpression Design
Location: fe/fe-core/src/main/java/org/apache/doris/nereids/memo/GroupExpression.java
```java
public class GroupExpression {
    private Group ownerGroup;            // owning group
    private final List<Group> children;  // child groups
    private final Plan plan;             // the corresponding plan node
    // Rule-application marks (prevents applying a rule twice)
    private final BitSet ruleMasks;
    // Whether statistics have been derived
    private boolean statDerived;
    // Best-cost table: OutputProperties -> (Cost, ChildrenInputProperties)
    private final Map<PhysicalProperties, Pair<Cost, List<PhysicalProperties>>>
            lowestCostTable;
    // Requested-properties mapping: RequestProperties -> OutputProperties
    private final Map<PhysicalProperties, PhysicalProperties> requestPropertiesMap;

    /**
     * Update the best-cost table.
     */
    public boolean updateLowestCostTable(PhysicalProperties outputProperties,
            List<PhysicalProperties> childrenInputProperties, Cost cost) {
        if (lowestCostTable.containsKey(outputProperties)) {
            if (lowestCostTable.get(outputProperties).first.getValue() > cost.getValue()) {
                lowestCostTable.put(outputProperties, Pair.of(cost, childrenInputProperties));
                return true;
            } else {
                return false;
            }
        } else {
            lowestCostTable.put(outputProperties, Pair.of(cost, childrenInputProperties));
            return true;
        }
    }

    /**
     * Check whether a rule has already been applied.
     */
    public boolean hasApplied(Rule rule) {
        return ruleMasks.get(rule.getRuleType().ordinal());
    }

    public void setApplied(Rule rule) {
        ruleMasks.set(rule.getRuleType().ordinal());
    }
}
```
What a GroupExpression does:
- Connects Plan and Group:
  - Plan: the concrete operator (e.g. HashJoin)
  - Children: a list of child Groups (not concrete plans)
  - Enables sharing of subexpressions
- Cost computation:
  - lowestCostTable: the best cost per output-property requirement
  - childrenInputProperties: property requirements imposed on the children
  - Supports cost accumulation in the dynamic program
- Rule-application tracking:
  - ruleMasks records which rules have been applied
  - Prevents applying the same rule twice
  - Prevents infinite recursion
2.4 A Memo Example
Query:
```sql
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.value > 100;
```
Memo structure:
```text
Group 0: (Root)
├─ LogicalExpression 0: LogicalJoin
│   ├─ child[0] -> Group 1
│   └─ child[1] -> Group 2
├─ PhysicalExpression 0: HashJoin (Build: Group 2, Probe: Group 1)
└─ PhysicalExpression 1: NestLoopJoin
Group 1: (t1 after filter)
├─ LogicalExpression 0: LogicalFilter
│   └─ child[0] -> Group 3
└─ PhysicalExpression 0: PhysicalFilter
    └─ child[0] -> Group 3 (PhysicalOlapScan)
Group 2: (t2 scan)
├─ LogicalExpression 0: LogicalOlapScan(t2)
└─ PhysicalExpression 0: PhysicalOlapScan(t2)
Group 3: (t1 scan)
├─ LogicalExpression 0: LogicalOlapScan(t1)
└─ PhysicalExpression 0: PhysicalOlapScan(t1)
```
Expanding equivalent plans:
From this Memo, many equivalent plans can be generated:
```text
Plan 1: HashJoin(PhysicalFilter(PhysicalOlapScan(t1)), PhysicalOlapScan(t2))
Plan 2: NestLoopJoin(PhysicalFilter(PhysicalOlapScan(t1)), PhysicalOlapScan(t2))
Plan 3: HashJoin(PhysicalOlapScan(t1) with pushed-down filter, PhysicalOlapScan(t2))
...
```
3. The Optimization Rule System (How)
3.1 Rule Categories
Nereids has three categories of rules:
3.1.1 Rewrite Rules (logical rewriting)
Purpose:
- Transform the plan at the logical level
- Preserve semantics while improving performance
- Run before the Memo is initialized
Common rules:
- Predicate pushdown:
  - PushDownFilterThroughProject, PushDownFilterThroughJoin, PushDownFilterThroughAggregation
- Column pruning:
  - ColumnPruning: removes unused output columns
- Constant folding:
  - FoldConstantRule, SimplifyArithmeticRule
- Subquery decorrelation:
  - CorrelateApplyToUnCorrelateApply, ApplyToJoin
Example: predicate pushdown
```java
public class PushDownFilterThroughProject extends OneRewriteRuleFactory {
    @Override
    public Rule build() {
        return logicalFilter(logicalProject()).then(filter -> {
            LogicalProject<Plan> project = filter.child();
            // Rewrite the filter predicates in terms of the slots below the project
            Set<Expr> newPredicates = replaceExpression(
                    filter.getConjuncts(),
                    project.getProjects()
            );
            // Build the new plan: Project(Filter(...))
            return new LogicalProject<>(
                    project.getProjects(),
                    new LogicalFilter<>(newPredicates, project.child())
            );
        });
    }
}
```
3.1.2 Exploration Rules
Purpose:
- Generate logically equivalent alternative plans
- Expand the search space
- Run during the Optimizer phase
Common rules:
- Join commutativity (JoinCommute):
  - t1 JOIN t2 -> t2 JOIN t1: swaps the join inputs
- Join associativity (JoinAssociate):
  - (t1 JOIN t2) JOIN t3 -> t1 JOIN (t2 JOIN t3): enables bushy trees
- Aggregate pushdown:
  - PushDownAggThroughJoin: exploits the uniqueness of the group-by columns
Example: join commute
```java
public class JoinCommute extends OneExplorationRuleFactory {
    @Override
    public Rule build() {
        return innerLogicalJoin().when(join -> {
            // Only apply commutativity to inner joins
            return join.getJoinType() == JoinType.INNER_JOIN;
        }).then(join -> {
            // Swap the left and right subtrees
            return new LogicalJoin<>(
                    join.getJoinType(),
                    swapHashJoinConjuncts(join.getHashJoinConjuncts()),
                    join.getOtherJoinConjuncts(),
                    join.right(), // swapped
                    join.left()   // swapped
            );
        });
    }
}
```
3.1.3 Implementation Rules
Purpose:
- Convert logical plans into physical plans
- Produce executable operators
- Run during the Optimizer phase
Common rules:
- Join implementation:
  - LogicalJoin -> PhysicalHashJoin, LogicalJoin -> PhysicalNestLoopJoin
- Aggregate implementation:
  - LogicalAggregate -> PhysicalHashAggregate, LogicalAggregate -> PhysicalStreamAggregate
- Scan implementation:
  - LogicalOlapScan -> PhysicalOlapScan
Example: hash join implementation
```java
public class LogicalJoinToPhysicalHashJoin extends OneImplementationRuleFactory {
    @Override
    public Rule build() {
        return logicalJoin().then(join -> {
            // Use the left child as the probe side, the right child as the build side
            return new PhysicalHashJoin<>(
                    join.getJoinType(),
                    join.getHashJoinConjuncts(),
                    join.getOtherJoinConjuncts(),
                    join.left(),  // probe side
                    join.right(), // build side
                    JoinHint.NONE
            );
        });
    }
}
```
3.2 The Rule Execution Framework
3.2.1 The Rule Base Class
Location: fe/fe-core/src/main/java/org/apache/doris/nereids/rules/Rule.java
```java
public abstract class Rule {
    private final RuleType ruleType;               // rule type
    private final Pattern<? extends Plan> pattern; // match pattern
    private final RulePromise rulePromise;         // rule priority

    /**
     * Apply the rule.
     */
    public abstract List<Plan> transform(Plan node, CascadesContext context);

    /**
     * Check whether the rule is inapplicable here.
     */
    public boolean isInvalid(BitSet disableRules, GroupExpression groupExpression) {
        return disableRules.get(this.getRuleType().type())
                || !groupExpression.notApplied(this)
                || !this.getPattern().matchRoot(groupExpression.getPlan());
    }
}
```
Key design decisions:
- Pattern matching:
  - Declaratively describes the plan shapes a rule applies to
  - Supports wildcards and type matching
  - Filters candidate plans efficiently
- Rule priority (RulePromise):
  - ALWAYS: always beneficial (e.g. predicate pushdown)
  - HEURISTIC: heuristically beneficial
  - COST_BASED: requires a cost comparison
- Application tracking:
  - Records which rules have been applied
  - Prevents applying the same rule twice
  - Prevents infinite recursion
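The pattern-matching idea can be reduced to its essence: a pattern is a predicate over the plan tree, and a rule fires only when the pattern matches the root of the candidate expression. The toy types below (Plan, Filter, Project, Scan) are illustrative stand-ins, not the real Nereids classes.

```java
import java.util.function.Predicate;

// Minimal illustration of declarative pattern matching over a plan tree.
public class PatternSketch {
    interface Plan { Plan child(); }
    record Scan() implements Plan { public Plan child() { return null; } }
    record Project(Plan c) implements Plan { public Plan child() { return c; } }
    record Filter(Plan c) implements Plan { public Plan child() { return c; } }

    /** Pattern for logicalFilter(logicalProject()): a Filter whose child is a Project. */
    static final Predicate<Plan> FILTER_OVER_PROJECT =
            p -> p instanceof Filter f && f.child() instanceof Project;

    public static void main(String[] args) {
        // Matches Filter(Project(Scan)) but not Project(Filter(Scan)).
        assert FILTER_OVER_PROJECT.test(new Filter(new Project(new Scan())));
        assert !FILTER_OVER_PROJECT.test(new Project(new Filter(new Scan())));
    }
}
```

The real framework builds such predicates compositionally (with wildcards and type checks), which is what makes rules both declarative and cheap to pre-filter before running the transform.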
3.2.2 The Rewriter Executor
Location: fe/fe-core/src/main/java/org/apache/doris/nereids/jobs/executor/Rewriter.java
```java
public class Rewriter extends AbstractBatchJobExecutor {
    // Rewrite jobs (300+ rules)
    public static final List<RewriteJob> DEFAULT_REWRITE_JOBS = jobs(
        // Phase 1: plan normalization
        topic("Plan Normalization",
            topDown(
                new EliminateOrderByConstant(),
                new LogicalSubQueryAliasToLogicalProject(),
                ExpressionNormalizationAndOptimization.FULL_RULE_INSTANCE,
                new ExtractFilterFromCrossJoin()
            )
        ),
        // Phase 2: subquery unnesting
        topic("Subquery unnesting",
            bottomUp(new PullUpProjectUnderApply()),
            topDown(new PushDownFilterThroughProject()),
            custom(RuleType.AGG_SCALAR_SUBQUERY_TO_WINDOW_FUNCTION,
                    AggScalarSubQueryToWindowFunction::new),
            bottomUp(
                new CorrelateApplyToUnCorrelateApply(),
                new ApplyToJoin()
            )
        ),
        // Phase 3: column pruning and predicate pushdown
        topic("Column pruning and predicate pushdown",
            custom(RuleType.COLUMN_PRUNING, ColumnPruning::new),
            bottomUp(
                new PushDownFilterThroughProject(),
                new PushDownFilterThroughJoin(),
                new MergeFilters()
            )
        ),
        // Phase 4: elimination optimizations
        topic("Eliminate optimization",
            bottomUp(
                new EliminateLimit(),
                new EliminateFilter(),
                new EliminateJoinCondition(),
                new EliminateSemiJoin()
            )
        )
        // ... more phases
    );

    public void execute() {
        // Run all rewrite jobs
        for (RewriteJob job : DEFAULT_REWRITE_JOBS) {
            job.execute(cascadesContext);
        }
    }
}
```
Characteristics of the rewrite phase:
- Staged execution:
  - Each topic is one optimization stage
  - Different stages pursue different goals
  - The stage order is deliberately designed
- Traversal strategies:
  - topDown: top-down traversal
  - bottomUp: bottom-up traversal
  - custom: custom traversal
- Fixed-point iteration:
  - Rules are re-applied until the plan stops changing
  - Or until a maximum iteration count is reached
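The fixed-point iteration described above can be sketched as follows. A String stands in for the plan tree, and the "FF" -> "F" replacement plays the role of a MergeFilters-style rule that collapses adjacent filters; both are illustrative assumptions, not the Doris implementation.

```java
import java.util.function.UnaryOperator;

// Sketch of fixed-point rewriting: keep applying a rule until the plan stops
// changing, or an iteration cap is hit.
public class FixedPointSketch {
    static String rewriteToFixedPoint(String plan, UnaryOperator<String> rule, int maxIters) {
        for (int i = 0; i < maxIters; i++) {
            String next = rule.apply(plan);
            if (next.equals(plan)) {
                return plan; // fixed point reached: no rule changed the plan
            }
            plan = next;
        }
        return plan; // iteration cap hit
    }

    public static void main(String[] args) {
        // Toy rule: merge adjacent filters, written as "FF" -> "F".
        UnaryOperator<String> mergeFilters = p -> p.replace("FF", "F");
        // Three stacked filters collapse into one over two iterations.
        String result = rewriteToFixedPoint("FFF(Scan)", mergeFilters, 100);
        assert result.equals("F(Scan)");
    }
}
```

The iteration cap matters in practice: a buggy pair of rules that undo each other would otherwise loop forever, which is also why rule-application tracking exists in the Cascades phase.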
3.3 Rule Example: Predicate Pushdown
Rule definition:
```java
public class PushDownFilterThroughJoin extends OneRewriteRuleFactory {
    @Override
    public Rule build() {
        return logicalFilter(
            logicalJoin()
        ).when(filter -> {
            // Only handle inner joins
            LogicalJoin<?, ?> join = filter.child();
            return join.getJoinType() == JoinType.INNER_JOIN;
        }).then(filter -> {
            LogicalJoin<Plan, Plan> join = filter.child();
            // Split the predicates by which side they reference
            Set<Expr> leftOnly = new HashSet<>();
            Set<Expr> rightOnly = new HashSet<>();
            Set<Expr> both = new HashSet<>();
            for (Expr predicate : filter.getConjuncts()) {
                Set<SlotReference> slots = predicate.getInputSlots();
                boolean leftInput = join.left().getOutputSet().containsAll(slots);
                boolean rightInput = join.right().getOutputSet().containsAll(slots);
                if (leftInput && !rightInput) {
                    leftOnly.add(predicate);
                } else if (rightInput && !leftInput) {
                    rightOnly.add(predicate);
                } else {
                    both.add(predicate);
                }
            }
            // Build the new plan
            Plan newLeft = leftOnly.isEmpty()
                    ? join.left()
                    : new LogicalFilter<>(leftOnly, join.left());
            Plan newRight = rightOnly.isEmpty()
                    ? join.right()
                    : new LogicalFilter<>(rightOnly, join.right());
            Plan newJoin = new LogicalJoin<>(
                    join.getJoinType(),
                    join.getHashJoinConjuncts(),
                    join.getOtherJoinConjuncts(),
                    newLeft,
                    newRight
            );
            return both.isEmpty()
                    ? newJoin
                    : new LogicalFilter<>(both, newJoin);
        });
    }
}
```
Effect:
```sql
-- original plan
Filter(t1.value > 100 AND t2.status = 'active')
  Join(t1.id = t2.id)
    Scan(t1)
    Scan(t2)
-- after pushdown
Join(t1.id = t2.id)
  Filter(t1.value > 100)
    Scan(t1)
  Filter(t2.status = 'active')
    Scan(t2)
```
4. Cost Model and Statistics (How)
4.1 Cost Model Design
Location: fe/fe-core/src/main/java/org/apache/doris/nereids/cost/Cost.java
```java
public class Cost {
    private final double cpuCost;     // CPU cost
    private final double memoryCost;  // memory cost
    private final double networkCost; // network cost
    private final double cost;        // combined cost

    public Cost(SessionVariable sessionVariable, double cpuCost,
            double memoryCost, double networkCost) {
        this.cpuCost = Math.max(0, cpuCost);
        this.memoryCost = Math.max(0, memoryCost);
        this.networkCost = Math.max(0, networkCost);
        // Weighted combination into a single cost value
        CostWeight costWeight = CostWeight.get(sessionVariable);
        this.cost = costWeight.cpuWeight * cpuCost
                + costWeight.memoryWeight * memoryCost
                + costWeight.networkWeight * networkCost;
    }
}
```
Cost components:
- CPU cost:
  - Amount of data processed times a CPU factor
  - Depends on the operator type (Scan, Join, Agg)
  - Cardinality estimates are the key input
- Memory cost:
  - Peak memory usage
  - Size of the hash-join build side
  - Size of intermediate aggregation state
- Network cost:
  - Shuffled data volume
  - Broadcast data volume
  - Cross-node data transfer
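The weighted combination above reduces to a small amount of arithmetic. The sketch below mirrors it with free-standing parameters; the concrete weight values are invented for illustration, they are not Doris defaults.

```java
// Sketch of the weighted cost combination: per-dimension costs are clamped to
// non-negative values, then blended with configurable weights.
public class CostSketch {
    static double totalCost(double cpu, double memory, double network,
                            double cpuW, double memW, double netW) {
        cpu = Math.max(0, cpu);
        memory = Math.max(0, memory);
        network = Math.max(0, network);
        return cpuW * cpu + memW * memory + netW * network;
    }

    public static void main(String[] args) {
        // cpu=100, mem=50, net=10 with weights 1.0 / 0.5 / 2.0
        double c = totalCost(100, 50, 10, 1.0, 0.5, 2.0);
        assert c == 145.0; // 100*1.0 + 50*0.5 + 10*2.0
    }
}
```

Tuning the weights changes which plans win: raising the network weight, for example, penalizes broadcast joins of large build sides relative to shuffle joins.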
4.2 CostCalculator
Location: fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostCalculator.java
```java
public class CostCalculator {
    /**
     * Compute the cost of a GroupExpression.
     */
    public static Cost calculateCost(ConnectContext connectContext,
            GroupExpression groupExpression,
            List<PhysicalProperties> childrenProperties) {
        PlanContext planContext = new PlanContext(connectContext, groupExpression);
        // Detect broadcast joins
        if (childrenProperties.size() >= 2
                && childrenProperties.get(1).getDistributionSpec()
                        instanceof DistributionSpecReplicated) {
            planContext.setBroadcastJoin();
        }
        // Compute the cost via the visitor pattern
        CostModel costModel = new CostModel(connectContext);
        return groupExpression.getPlan().accept(costModel, planContext);
    }
}
```
4.3 Per-Operator Cost Formulas
4.3.1 OlapScan
```java
public Cost visitPhysicalOlapScan(PhysicalOlapScan olapScan, PlanContext context) {
    Statistics stats = olapScan.getStats();
    // CPU cost = scanned rows * scan factor
    double cpuCost = stats.getRowCount() * CPU_SCAN_COST_FACTOR;
    // Network cost = (compressed) data size
    double networkCost = stats.computeSize() * NETWORK_COST_FACTOR;
    return Cost.of(sessionVariable, cpuCost, 0, networkCost);
}
```
4.3.2 HashJoin
```java
public Cost visitPhysicalHashJoin(PhysicalHashJoin<?, ?> hashJoin,
        PlanContext context) {
    Statistics leftStats = hashJoin.left().getStats();
    Statistics rightStats = hashJoin.right().getStats();
    Statistics outputStats = hashJoin.getStats();
    // Build side (right table)
    double buildCost = rightStats.getRowCount() * CPU_BUILD_COST_FACTOR;
    // Probe side (left table)
    double probeCost = leftStats.getRowCount() * CPU_PROBE_COST_FACTOR;
    // Memory cost = build-side data size
    double memoryCost = rightStats.computeSize();
    // Network cost
    double networkCost = 0;
    if (context.isBroadcastJoin()) {
        // Broadcast join: the right table is replicated to every node
        networkCost = rightStats.computeSize() * backendNum;
    } else {
        // Shuffle join: both tables are shuffled
        networkCost = leftStats.computeSize() + rightStats.computeSize();
    }
    double cpuCost = buildCost + probeCost;
    return Cost.of(sessionVariable, cpuCost, memoryCost, networkCost);
}
```
4.3.3 HashAggregate
```java
public Cost visitPhysicalHashAggregate(PhysicalHashAggregate<?, ?> agg,
        PlanContext context) {
    Statistics inputStats = agg.child().getStats();
    Statistics outputStats = agg.getStats();
    // CPU cost = input rows * aggregation factor
    double cpuCost = inputStats.getRowCount() * CPU_AGG_COST_FACTOR;
    // Memory cost = hash-table size (group-by columns + aggregate state)
    double memoryCost = outputStats.getRowCount()
            * (groupByColumnsSize + aggregateStateSize);
    // Network cost (two-phase aggregation requires a shuffle)
    double networkCost = 0;
    if (agg.getAggPhase() == AggPhase.LOCAL) {
        // Local aggregation output is shuffled
        networkCost = outputStats.computeSize();
    }
    return Cost.of(sessionVariable, cpuCost, memoryCost, networkCost);
}
```
4.4 Statistics
Core metrics:
- Row count:
  - Total rows in the table
  - Rows after filtering
  - Rows after a join
- Column statistics:
  - NDV (number of distinct values)
  - Min/Max values
  - Null count
- Data size:
  - Average row size
  - Total data size
4.4.1 Cardinality Estimation
Filter selectivity:
```java
public class FilterEstimation {
    /**
     * Selectivity of an equality filter.
     */
    public static double estimateEqualFilter(ColumnStatistic colStats, Literal value) {
        // selectivity = 1 / NDV
        return 1.0 / colStats.getNdv();
    }

    /**
     * Selectivity of a range filter.
     */
    public static double estimateRangeFilter(ColumnStatistic colStats,
            double min, double max) {
        double colMin = colStats.getMin();
        double colMax = colStats.getMax();
        // selectivity = (filterMax - filterMin) / (colMax - colMin),
        // with the filter range clamped to the column range
        return (Math.min(max, colMax) - Math.max(min, colMin))
                / (colMax - colMin);
    }
}
```
Join cardinality:
```java
public class JoinEstimation {
    /**
     * Estimate the output row count of an inner join.
     */
    public static double estimateInnerJoin(Statistics leftStats,
            Statistics rightStats,
            List<Expr> joinKeys) {
        double leftRows = leftStats.getRowCount();
        double rightRows = rightStats.getRowCount();
        // Selectivity of the join keys
        double selectivity = 1.0;
        for (Expr joinKey : joinKeys) {
            SlotReference leftSlot = extractLeftSlot(joinKey);
            SlotReference rightSlot = extractRightSlot(joinKey);
            double leftNdv = leftStats.getColumnStatistic(leftSlot).getNdv();
            double rightNdv = rightStats.getColumnStatistic(rightSlot).getNdv();
            // selectivity = 1 / max(leftNdv, rightNdv)
            selectivity *= 1.0 / Math.max(leftNdv, rightNdv);
        }
        // output rows = leftRows * rightRows * selectivity
        return leftRows * rightRows * selectivity;
    }
}
```
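To make the join formula concrete, here is a worked example with a single join key, rows(join) = rows(left) * rows(right) / max(ndv(leftKey), ndv(rightKey)). The table sizes and NDVs are invented for illustration.

```java
// Worked example of single-key inner-join cardinality estimation.
public class JoinCardSketch {
    static double estimateInnerJoin(double leftRows, double rightRows,
                                    double leftNdv, double rightNdv) {
        double selectivity = 1.0 / Math.max(leftNdv, rightNdv);
        return leftRows * rightRows * selectivity;
    }

    public static void main(String[] args) {
        // orders: 1,000,000 rows, customer_id NDV = 100,000
        // customers: 100,000 rows, id NDV = 100,000 (a key column)
        double rows = estimateInnerJoin(1_000_000, 100_000, 100_000, 100_000);
        // Each order matches one customer on average -> about 1,000,000 output rows.
        assert Math.abs(rows - 1_000_000.0) < 1.0;
    }
}
```

Note the intuition behind max(): the side with the larger NDV determines how thinly the smaller side's values are spread across join partners, so it dominates the selectivity.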
5. Optimizer Execution (How)
5.1 Core Optimizer Flow
Location: fe/fe-core/src/main/java/org/apache/doris/nereids/jobs/executor/Optimizer.java
```java
public class Optimizer {
    private final CascadesContext cascadesContext;

    public void execute() {
        // 1. Initialize the Memo
        cascadesContext.toMemo();
        // 2. Derive statistics
        cascadesContext.getMemo().getRoot().getLogicalExpressions().forEach(
                groupExpression -> cascadesContext.pushJob(
                        new DeriveStatsJob(groupExpression, cascadesContext.getCurrentJobContext())
                )
        );
        cascadesContext.getJobScheduler().executeJobPool(cascadesContext);
        // 3. Decide whether to use DPHyp
        if (isDpHyp(cascadesContext)) {
            dpHypOptimize();
        }
        // 4. Cascades optimization
        cascadesContext.pushJob(
                new OptimizeGroupJob(
                        cascadesContext.getMemo().getRoot(),
                        cascadesContext.getCurrentJobContext()
                )
        );
        cascadesContext.getJobScheduler().executeJobPool(cascadesContext);
    }

    /**
     * Decide whether to use the DPHyp algorithm.
     */
    public static boolean isDpHyp(CascadesContext cascadesContext) {
        SessionVariable sessionVariable = cascadesContext.getConnectContext()
                .getSessionVariable();
        int maxTableCount = sessionVariable.getMaxTableCountUseCascadesJoinReorder();
        int continuousJoinNum = Memo.countMaxContinuousJoin(
                cascadesContext.getRewritePlan()
        );
        // Use DPHyp when the number of joins exceeds the threshold
        return sessionVariable.enableDPHypOptimizer
                || continuousJoinNum > maxTableCount;
    }
}
```
5.2 DPHyp Join-Order Optimization
DPHyp (Dynamic Programming with Hypergraph):
Core ideas:
- Find the optimal join order with dynamic programming
- Scales to joins of more than 10 tables
- Produces bushy trees (not just left-deep trees)
Implementation sketch:
```java
public class JoinOrderJob extends Job {
    /**
     * DPHyp optimization.
     */
    public void execute() {
        // 1. Collect all join nodes
        List<LogicalJoin> joins = extractJoins(group);
        // 2. Build the hypergraph
        Hypergraph hypergraph = buildHypergraph(joins);
        // 3. Dynamic programming over table subsets
        Map<BitSet, JoinPlan> dpTable = new HashMap<>();
        // Initialize with single-table subsets
        for (int i = 0; i < hypergraph.getNodeCount(); i++) {
            BitSet subset = new BitSet();
            subset.set(i);
            dpTable.put(subset, new JoinPlan(hypergraph.getNode(i)));
        }
        // Enumerate subsets by increasing size
        for (int size = 2; size <= hypergraph.getNodeCount(); size++) {
            // All subsets of this size
            for (BitSet subset : enumerateSubsets(size, hypergraph.getNodeCount())) {
                // All ways to split the subset in two
                for (BitSet leftSubset : enumerateSplits(subset)) {
                    BitSet rightSubset = (BitSet) subset.clone();
                    rightSubset.andNot(leftSubset);
                    // Check whether the two halves can be joined
                    if (hypergraph.hasEdge(leftSubset, rightSubset)) {
                        // Compute the cost
                        JoinPlan leftPlan = dpTable.get(leftSubset);
                        JoinPlan rightPlan = dpTable.get(rightSubset);
                        JoinPlan newPlan = buildJoin(leftPlan, rightPlan);
                        // Keep the cheapest plan for this subset
                        updateBestPlan(dpTable, subset, newPlan);
                    }
                }
            }
        }
        // 4. The optimal plan for the full set of tables
        BitSet allNodes = new BitSet();
        allNodes.set(0, hypergraph.getNodeCount());
        JoinPlan bestPlan = dpTable.get(allNodes);
        // 5. Replace the original join tree
        replaceJoinTree(group, bestPlan);
    }
}
```
DPHyp vs Cascades:
| Aspect | Cascades | DPHyp |
|---|---|---|
| Suited for | < 10-table joins | > 10-table joins |
| Search strategy | Rule driven + pruning | Dynamic programming |
| Plan shapes | All possible | Mostly bushy trees |
| Time complexity | Exponential (with pruning) | O(3^n) |
| Strength | More flexible, broader exploration | Faster, scales to large joins |
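A runnable miniature of the subset dynamic program is below. Real DPHyp avoids enumerating disconnected or non-joinable splits by walking the hypergraph; this sketch drops that refinement, assumes every pair of subsets can join, and uses a toy cost (best split cost plus the cardinality of the subset, modeled as the product of the relation sizes). It is an illustration of the DP recurrence, not the Doris implementation.

```java
import java.util.Arrays;

// Simplified bitmask DP for join ordering (DPsub style).
public class JoinOrderSketch {
    static double bestCost(double[] rows) {
        int n = rows.length;
        double[] size = new double[1 << n]; // toy cardinality: product of relation sizes
        double[] cost = new double[1 << n];
        Arrays.fill(cost, Double.POSITIVE_INFINITY);
        for (int s = 1; s < (1 << n); s++) {
            size[s] = 1;
            for (int i = 0; i < n; i++) {
                if ((s & (1 << i)) != 0) size[s] *= rows[i];
            }
            if (Integer.bitCount(s) == 1) { cost[s] = 0; continue; } // single table: free
            // Enumerate all proper, non-empty splits of s into l and r.
            for (int l = (s - 1) & s; l > 0; l = (l - 1) & s) {
                int r = s ^ l;
                cost[s] = Math.min(cost[s], cost[l] + cost[r] + size[s]);
            }
        }
        return cost[(1 << n) - 1];
    }

    public static void main(String[] args) {
        double best = bestCost(new double[]{10, 20, 30});
        // Every ordering pays size 6000 at the top join; the best plan joins the
        // cheapest pair first, so total = 6000 + min(200, 300, 600) = 6200.
        assert best == 6200.0;
    }
}
```

The `l = (l - 1) & s` trick enumerates every sub-bitmask of `s`, which is exactly the O(3^n) subset-split enumeration quoted in the comparison table above.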
5.3 Automatic Materialized-View Rewriting
Core ideas:
- Detect which materialized views a query can use
- Rewrite the query plan to read from the materialized view
- Improve query performance
Example rewrite rule:
```java
public class MaterializedViewAggregateRule extends AbstractMaterializedViewAggregateRule {
    @Override
    public List<Plan> transform(Plan plan, CascadesContext context) {
        LogicalAggregate<?> aggregate = (LogicalAggregate<?>) plan;
        // 1. Find matching materialized views
        List<MTMV> matchedMVs = findMatchedMaterializedViews(
                aggregate,
                context.getMaterializationContexts()
        );
        List<Plan> results = new ArrayList<>();
        for (MTMV mv : matchedMVs) {
            // 2. Generate a rewritten plan
            Plan rewrittenPlan = rewrite(aggregate, mv);
            if (rewrittenPlan != null) {
                results.add(rewrittenPlan);
            }
        }
        return results;
    }

    private Plan rewrite(LogicalAggregate<?> query, MTMV mv) {
        // Check that the group-by columns match
        if (!matchGroupBy(query, mv)) {
            return null;
        }
        // Check that the aggregate functions match
        if (!matchAggregateFunctions(query, mv)) {
            return null;
        }
        // Build a plan that reads from the materialized view
        return new LogicalOlapScan(
                mv.getRelatedTable(),
                /* ... */
        );
    }
}
```
6. The Complete NereidsPlanner Flow (Putting It Together)
6.1 End-to-End Optimization Flow
6.2 Key Optimization Phases
1. Analyze:
```java
private void analyze(boolean showAnalyzeProcess) {
    cascadesContext.newAnalyzer().analyze();
    // Main work:
    // - type inference
    // - name resolution
    // - subquery normalization
    // - privilege checks
}
```
2. Rewrite:
```java
private void rewrite(boolean showRewriteProcess) {
    new Rewriter(cascadesContext).execute();
    // Main work:
    // - predicate pushdown
    // - column pruning
    // - constant folding
    // - subquery decorrelation
    // - expression simplification
}
```
3. Optimize:
```java
private void optimize(boolean showPlanProcess) {
    new Optimizer(cascadesContext).execute();
    // Main work:
    // - Memo initialization
    // - statistics derivation
    // - DPHyp join-order optimization
    // - exploration rules
    // - implementation rules
    // - cost computation and best-plan selection
}
```
4. PostProcess:
```java
private PhysicalPlan postProcess(PhysicalPlan physicalPlan) {
    return new PlanPostProcessors(cascadesContext).process(physicalPlan);
    // Main work:
    // - runtime filter generation
    // - enforcer insertion (Sort, Shuffle)
    // - distribution property propagation
    // - TopN filter generation
}
```
6.3 A Complete Example
Query:
```sql
SELECT
    c.region,
    SUM(o.amount) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date >= '2024-01-01'
  AND c.status = 'active'
GROUP BY c.region
HAVING SUM(o.amount) > 10000
ORDER BY total DESC
LIMIT 10;
```
Optimization steps:
```text
1. Parser: SQL -> LogicalPlan
   LogicalLimit(10)
     LogicalSort(total DESC)
       LogicalFilter(SUM(amount) > 10000)
         LogicalAggregate(GROUP BY region, SUM(amount))
           LogicalJoin(customer_id = id)
             LogicalFilter(order_date >= '2024-01-01')
               LogicalOlapScan(orders)
             LogicalFilter(status = 'active')
               LogicalOlapScan(customers)
2. Rewrite: predicate pushdown + column pruning
   LogicalLimit(10)
     LogicalSort(total DESC)
       LogicalAggregate(GROUP BY region, SUM(amount), HAVING > 10000)
         LogicalJoin(customer_id = id)
           LogicalOlapScan(orders, Filter: order_date >= '2024-01-01')
           LogicalOlapScan(customers, Filter: status = 'active')
3. Optimize: physical operator selection
   PhysicalTopN(10, total DESC)
     PhysicalHashAggregate(GROUP BY region, SUM(amount), HAVING > 10000)
       PhysicalHashJoin(Build: customers, Probe: orders, Broadcast)
         PhysicalOlapScan(orders, RuntimeFilter on customer_id)
         PhysicalOlapScan(customers)
4. PostProcess: runtime filters
   PhysicalTopN
     PhysicalHashAggregate
       PhysicalHashJoin
         PhysicalOlapScan(orders) [apply RuntimeFilter from customers.id]
         PhysicalOlapScan(customers) [build RuntimeFilter on id]
```