Redistribution Plan Details

In a Greenplum cluster with 4 segments, joining two tables (sales and customer) that are distributed on different keys forces the planner to move data so that rows with matching join keys end up on the same segment. Here's a detailed breakdown of what such a redistribution plan might look like:

Tables and Distribution Keys

  • sales table: distributed by sale_id.

  • customer table: distributed by cust_id.
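
To make the effect of the two distribution keys concrete, here is a minimal Python sketch that places rows on 4 segments by hashing a chosen column. The `segment_for` helper, the modulo rule, and the sample rows are all assumptions made for illustration; Greenplum uses its own internal hash functions, so real placements will differ.

```python
from collections import defaultdict

NUM_SEGMENTS = 4  # matches the 4-segment cluster in this example


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Hypothetical placement rule: hash the key, then take it modulo the segment count."""
    return hash(key) % num_segments


def distribute(rows, key_column):
    """Group rows by the segment that their distribution key maps to."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_column])].append(row)
    return dict(segments)


def segments_holding(segments, cust_id):
    """Return the set of segment ids that hold at least one row for cust_id."""
    return {
        seg
        for seg, rows in segments.items()
        for row in rows
        if row["cust_id"] == cust_id
    }


# Illustrative sample data (not taken from a real cluster).
sales = [
    {"sale_id": 1, "cust_id": 10, "amount": 50.0},
    {"sale_id": 2, "cust_id": 10, "amount": 20.0},
    {"sale_id": 3, "cust_id": 11, "amount": 75.0},
]

sales_by_sale_id = distribute(sales, "sale_id")  # the layout described above
sales_by_cust_id = distribute(sales, "cust_id")  # the layout the join needs
```

With this sample data, the two sales for customer 10 sit on different segments when the table is distributed by sale_id, but on a single segment when it is distributed by cust_id. That gap is what the plan below has to close.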

Query

SELECT s.sale_id, s.amount, c.cust_name
FROM sales s
JOIN customer c ON s.cust_id = c.cust_id;

Query Plan Breakdown

  1. Initial Scan:

    • Each segment scans its local portion of the sales and customer tables in parallel. No data moves between segments at this stage.

    • Segment 1: scans the sales and customer rows stored on it.

    • Segment 2: scans the sales and customer rows stored on it.

    • Segment 3: scans the sales and customer rows stored on it.

    • Segment 4: scans the sales and customer rows stored on it.

  2. Redistribute Motion:

    • Because sales is distributed by sale_id while customer is distributed by cust_id, rows that match on s.cust_id = c.cust_id may sit on different segments. customer is already distributed on the join key, so only the sales tuples need to be redistributed by cust_id.

    • The query plan therefore includes a Redistribute Motion operator that rehashes the sales rows on cust_id.

  3. Redistribution Execution:

    • The Redistribute Motion operator hashes the cust_id of each sales row and sends the row to the segment that owns that hash value.

    • Because customer is distributed with the same hash on cust_id, each segment receives exactly the sales rows whose cust_id values match its local portion of the customer table.

  4. Local Join:

    • After redistribution, each segment performs a local join between the sales rows it received and its local customer rows.

    • Segment 1: joins redistributed sales data with local customer data.

    • Segment 2: joins redistributed sales data with local customer data.

    • Segment 3: joins redistributed sales data with local customer data.

    • Segment 4: joins redistributed sales data with local customer data.

  5. Gather Motion:

    • The results from each segment are gathered back to the master node.

    • The master node combines the results from all segments to produce the final query result.
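
The five steps above can be sketched as a small in-memory simulation. This is only a model under stated assumptions: `segment_for`, `redistribute`, `local_hash_join`, and `run_query` are hypothetical helpers, the modulo hash stands in for Greenplum's real hash function, and the sample rows are invented. Nothing here talks to a real cluster.

```python
from collections import defaultdict

NUM_SEGMENTS = 4


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Hypothetical placement rule; Greenplum's real hash function differs."""
    return hash(key) % num_segments


def redistribute(segments, key_column):
    """Redistribute Motion: each segment sends every row to the segment its key hashes to."""
    out = defaultdict(list)
    for rows in segments.values():
        for row in rows:
            out[segment_for(row[key_column])].append(row)
    return out


def local_hash_join(sales_rows, customer_rows):
    """Join on one segment: build a hash table on customer, then probe it with sales."""
    by_cust = defaultdict(list)
    for c in customer_rows:
        by_cust[c["cust_id"]].append(c)
    return [
        (s["sale_id"], s["amount"], c["cust_name"])
        for s in sales_rows
        for c in by_cust.get(s["cust_id"], [])
    ]


def join_per_segment(sales_segments, customer_segments):
    """Run the local join on every segment and gather the results in one list."""
    gathered = []
    for seg in range(NUM_SEGMENTS):
        gathered.extend(
            local_hash_join(sales_segments.get(seg, []), customer_segments.get(seg, []))
        )
    return gathered


def run_query(sales_segments, customer_segments):
    """Scan -> redistribute sales by cust_id -> local join -> gather."""
    moved_sales = redistribute(sales_segments, "cust_id")
    return join_per_segment(moved_sales, customer_segments)


# Illustrative sample data, laid out as in the example: sales by sale_id, customer by cust_id.
sales = [
    {"sale_id": 1, "cust_id": 10, "amount": 50.0},
    {"sale_id": 2, "cust_id": 10, "amount": 20.0},
    {"sale_id": 3, "cust_id": 11, "amount": 75.0},
]
customers = [
    {"cust_id": 10, "cust_name": "Ada"},
    {"cust_id": 11, "cust_name": "Lin"},
]
sales_segments = redistribute({0: sales}, "sale_id")
customer_segments = redistribute({0: customers}, "cust_id")

result = run_query(sales_segments, customer_segments)
naive = join_per_segment(sales_segments, customer_segments)  # skips the redistribution step
```

In this toy layout, `naive` misses sale 1 because that row sits on a different segment from customer 10, while `result` contains all three joined rows. That difference is the reason the planner inserts the Redistribute Motion.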

Example Query Plan

Here's a simplified example of what the query plan might look like:

Gather Motion 4:1  (slice1; segments: 4)
  ->  Hash Join
        Hash Cond: (s.cust_id = c.cust_id)
        ->  Redistribute Motion 4:4  (slice2; segments: 4)
              Hash Key: s.cust_id
              ->  Seq Scan on sales s
        ->  Hash
              ->  Seq Scan on customer c

Explanation

  1. Gather Motion 4:1:

    • Collects the final results from all 4 segments and combines them on the master node. The 4:1 means four senders and one receiver.

  2. Hash Join:

    • Joins the sales and customer rows on cust_id with a hash join, run locally on each segment.

  3. Redistribute Motion 4:4:

    • Redistributes the sales rows across all 4 segments by hashing the cust_id column (the Hash Key). The 4:4 means four senders and four receivers.

  4. Seq Scan on sales s:

    • Each segment performs a sequential scan on its local portion of the sales table.

  5. Seq Scan on customer c:

    • Each segment performs a sequential scan on its local portion of the customer table. No motion is needed here because customer is already distributed by cust_id.
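
When reading longer plans, it can help to pull the Motion nodes out mechanically and look only at where data moves. The sketch below does this with a regular expression over plan text modeled on the example above. The `motions` helper and the `PLAN` string are illustrative assumptions, and real EXPLAIN output also carries cost and row estimates that this pattern simply ignores.

```python
import re

# Plan text modeled on the simplified example; not captured from a real cluster.
PLAN = """\
Gather Motion 4:1  (slice1; segments: 4)
  ->  Hash Join
        Hash Cond: (s.cust_id = c.cust_id)
        ->  Redistribute Motion 4:4  (slice2; segments: 4)
              Hash Key: s.cust_id
              ->  Seq Scan on sales s
        ->  Hash
              ->  Seq Scan on customer c
"""

# Matches "<Kind> Motion <senders>:<receivers>" anywhere in a line.
MOTION_RE = re.compile(r"(?P<kind>\w+) Motion (?P<senders>\d+):(?P<receivers>\d+)")


def motions(plan_text):
    """Return (kind, senders, receivers) for every Motion node found in the plan text."""
    return [
        (m.group("kind"), int(m.group("senders")), int(m.group("receivers")))
        for m in MOTION_RE.finditer(plan_text)
    ]


found = motions(PLAN)
for kind, senders, receivers in found:
    print(f"{kind}: {senders} sender(s) -> {receivers} receiver(s)")
```

For this plan the helper finds two motions: the Gather Motion that funnels four segments into one receiver, and the Redistribute Motion that reshuffles rows among all four segments.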

Conclusion

In this query plan, redistributing the sales table on cust_id places every sales row on the same segment as its matching customer row, so each segment can run its part of the join locally and in parallel with the others. The per-segment results are then gathered back to the master node to produce the final result. The price of this plan is the network transfer performed by the Redistribute Motion, which is why the choice of distribution key matters for tables that are joined often.
