Redistribution plan details

In a Greenplum cluster with 4 segments, joining two tables (sales and customer) on a column that is not the distribution key of both tables forces the planner to move data so that matching rows end up on the same segment. Here is a step-by-step breakdown of what the resulting redistribution query plan might look like:

Tables and Distribution Keys

  • sales table: Distributed by sale_id.

  • customer table: Distributed by cust_id.
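
To make the two distribution keys concrete, here is a minimal Python sketch of how rows might be assigned to 4 segments. The `segment_for` helper and the sample rows are hypothetical, and `key % NUM_SEGMENTS` is only a stand-in for Greenplum's real hash function. Segments are numbered 0 to 3 here rather than 1 to 4.

```python
NUM_SEGMENTS = 4

def segment_for(key: int) -> int:
    """Stand-in for Greenplum's distribution hash (assumption: simple modulo)."""
    return key % NUM_SEGMENTS

# Hypothetical sample rows.
sales = [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.0), (4, 12, 60.0)]  # (sale_id, cust_id, amount)
customer = [(10, "Ann"), (11, "Bob"), (12, "Cho")]                     # (cust_id, cust_name)

# sales is placed by sale_id, customer by cust_id.
sales_segment = {row[0]: segment_for(row[0]) for row in sales}
customer_segment = {row[0]: segment_for(row[0]) for row in customer}

for sale_id, cust_id, _ in sales:
    print(f"sale {sale_id}: on segment {sales_segment[sale_id]}, "
          f"customer {cust_id} on segment {customer_segment[cust_id]}")
```

With this sample data only sale 4 already sits on the same segment as its customer; the other sales rows would have to move before a local join is possible.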

Query

SELECT s.sale_id, s.amount, c.cust_name
FROM sales s
JOIN customer c ON s.cust_id = c.cust_id;

Query Plan Breakdown

  1. Initial Scan:

    • Each segment scans its local portion of the sales and customer tables.

    • Segment 1: Scans sales and customer data assigned to it.

    • Segment 2: Scans sales and customer data assigned to it.

    • Segment 3: Scans sales and customer data assigned to it.

    • Segment 4: Scans sales and customer data assigned to it.

  2. Redistribute Motion:

    • The sales table is distributed by sale_id and the customer table by cust_id, so only customer is already placed by the join key. The join condition s.cust_id = c.cust_id therefore requires that tuples from sales be redistributed by cust_id, while customer rows stay where they are.

    • The query plan will include a redistribute motion operator to redistribute the sales table based on cust_id.

  3. Redistribution Execution:

    • The redistribute motion operator hashes each sales row on its cust_id value and sends the row to the segment that owns that hash value.

    • Because customer rows were placed by hashing the same column, each segment receives exactly the sales rows whose cust_id matches its local portion of the customer table.

  4. Local Join:

    • After redistribution, each segment will perform a local join between the redistributed sales data and its local customer data.

    • Segment 1: Joins redistributed sales data with local customer data.

    • Segment 2: Joins redistributed sales data with local customer data.

    • Segment 3: Joins redistributed sales data with local customer data.

    • Segment 4: Joins redistributed sales data with local customer data.

  5. Gather Motion:

    • The results from each segment are gathered back to the master node.

    • The master node combines the results from all segments to produce the final query result.
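
The five steps above can be sketched as a small Python simulation. This is an illustration under stated assumptions, not Greenplum's implementation: the sample rows, the `segment_for` and `local_join` helpers, and the modulo "hash" are all hypothetical, and segments are numbered 0 to 3.

```python
from collections import defaultdict

NUM_SEGMENTS = 4

def segment_for(key: int) -> int:
    """Stand-in for Greenplum's distribution hash (assumption: simple modulo)."""
    return key % NUM_SEGMENTS

# Hypothetical sample data.
sales = [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.0), (4, 12, 60.0)]  # (sale_id, cust_id, amount)
customer = [(10, "Ann"), (11, "Bob"), (12, "Cho")]                     # (cust_id, cust_name)

# Step 1: initial placement, which is what each segment's local scan sees.
sales_by_seg = defaultdict(list)
for row in sales:
    sales_by_seg[segment_for(row[0])].append(row)      # distributed by sale_id
customer_by_seg = defaultdict(list)
for row in customer:
    customer_by_seg[segment_for(row[0])].append(row)   # distributed by cust_id

# Steps 2-3: Redistribute Motion. Every segment re-sends its sales rows by cust_id.
redistributed = defaultdict(list)
for seg in range(NUM_SEGMENTS):
    for row in sales_by_seg[seg]:
        redistributed[segment_for(row[1])].append(row)

# Step 4: local hash join on each segment.
def local_join(sales_rows, customer_rows):
    names = {cust_id: name for cust_id, name in customer_rows}  # build side
    return [(sale_id, amount, names[cust_id])
            for sale_id, cust_id, amount in sales_rows if cust_id in names]

# Step 5: Gather Motion. The master concatenates the per-segment results.
result = []
for seg in range(NUM_SEGMENTS):
    result.extend(local_join(redistributed[seg], customer_by_seg[seg]))

print(sorted(result))
```

Every sales row finds its customer locally after the redistribution step, so the gathered result is the same as a join computed on a single node.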

Example Query Plan

Here's a simplified example of what the query plan might look like:

Gather Motion 4:1  (slice1; segments: 4)
  ->  Hash Join
        Hash Cond: (s.cust_id = c.cust_id)
        ->  Redistribute Motion 4:4  (slice2; segments: 4)
            Hash Key: s.cust_id
            ->  Seq Scan on sales s
        ->  Seq Scan on customer c

Explanation

  1. Gather Motion 4:1:

    • Collects the final results from all 4 segments and combines them on the master node. The 4:1 means 4 senders and 1 receiver.

  2. Hash Join:

    • Runs on each segment and joins the redistributed sales rows with the local customer rows on the cust_id column.

  3. Redistribute Motion 4:4:

    • Redistributes the sales table across all 4 segments by hashing the cust_id column (the Hash Key line). The 4:4 means all 4 segments both send and receive rows.

  4. Seq Scan on sales s:

    • Each segment performs a sequential scan on its local portion of the sales table.

  5. Seq Scan on customer c:

    • Each segment performs a sequential scan on its local portion of the customer table.
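
The work done by the Redistribute Motion depends on how many sales rows have to leave the segment they start on. A rough way to picture this, again with hypothetical sample rows and a modulo stand-in for Greenplum's hash function (`segment_for` and `rows_moved` are illustrative names, not Greenplum functions):

```python
NUM_SEGMENTS = 4

def segment_for(key: int) -> int:
    """Stand-in for Greenplum's distribution hash (assumption: simple modulo)."""
    return key % NUM_SEGMENTS

# Hypothetical sample rows: (sale_id, cust_id, amount).
sales = [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.0), (4, 12, 60.0)]

def rows_moved(rows, old_key_index, new_key_index):
    """Count rows whose target segment differs from the segment they start on."""
    return sum(
        1 for row in rows
        if segment_for(row[old_key_index]) != segment_for(row[new_key_index])
    )

# sales starts out placed by sale_id (index 0) and is re-hashed on cust_id (index 1).
moved = rows_moved(sales, old_key_index=0, new_key_index=1)
print(f"{moved} of {len(sales)} sales rows cross segments")
```

In this toy data set 3 of the 4 sales rows change segment. The customer table contributes nothing to the count because it is never moved in this plan.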

Conclusion

In this query plan, the redistribution of the sales table based on cust_id ensures that related rows are on the same segment, allowing for efficient local joins. The results from each segment are then gathered back to the master node to produce the final result. Because every segment scans, redistributes, and joins its share of the data in parallel, the plan makes use of Greenplum's MPP architecture; the main cost is the sales rows that must be shipped between segments.
