Redistribution Plan Details

In a Greenplum cluster with 4 segments, joining two tables (sales and customer) whose distribution keys do not both match the join key requires a Redistribute Motion: rows are moved between segments so that rows with matching join keys end up on the same segment. Here's a detailed breakdown of what such a query plan might look like:

Tables and Distribution Keys

  • sales table : Distributed by sale_id.

  • customer table : Distributed by cust_id.
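To make the effect of the two distribution keys concrete, here is a minimal Python sketch of how rows might be assigned to segments. The `segment_for` function uses plain modulo as a stand-in for Greenplum's internal hash, and the sample rows, names, and zero-based segment numbers are all hypothetical; only the table and column names come from the example above.

```python
NUM_SEGMENTS = 4

def segment_for(key, num_segments=NUM_SEGMENTS):
    """Stand-in for Greenplum's internal hash (assumption): plain modulo."""
    return key % num_segments

def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Place each row on the segment chosen by hashing its distribution key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments

# Hypothetical sample data.
sales = [
    {"sale_id": 1, "cust_id": 10, "amount": 50.0},
    {"sale_id": 2, "cust_id": 11, "amount": 20.0},
    {"sale_id": 3, "cust_id": 10, "amount": 75.0},
    {"sale_id": 4, "cust_id": 13, "amount": 10.0},
]
customer = [
    {"cust_id": 10, "cust_name": "Ada"},
    {"cust_id": 11, "cust_name": "Bob"},
    {"cust_id": 13, "cust_name": "Cy"},
]

sales_segments = distribute(sales, "sale_id")        # sales: distributed by sale_id
customer_segments = distribute(customer, "cust_id")  # customer: distributed by cust_id

for seg in range(NUM_SEGMENTS):
    print(seg, sales_segments[seg], customer_segments[seg])
```

With this sample data, the sale with sale_id 1 is stored on a different segment than the customer row it refers to (cust_id 10), which is the situation the join has to deal with.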

Query

SELECT s.sale_id, s.amount, c.cust_name
FROM sales s
JOIN customer c ON s.cust_id = c.cust_id;

Query Plan Breakdown

  1. Initial Scan:

    • Each segment scans its local portion of the sales and customer tables.

    • Segment 1 : Scans sales and customer data assigned to it.

    • Segment 2 : Scans sales and customer data assigned to it.

    • Segment 3 : Scans sales and customer data assigned to it.

    • Segment 4 : Scans sales and customer data assigned to it.

  2. Redistribute Motion:

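    • The join condition is s.cust_id = c.cust_id. The customer table is already distributed by cust_id, so its rows can stay where they are. The sales table is distributed by sale_id, so a sales row and its matching customer row may sit on different segments; tuples from sales therefore have to be redistributed by cust_id.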

    • The query plan will include a redistribute motion operator to redistribute the sales table based on cust_id.

  3. Redistribution Execution:

    • The redistribute motion operator will redistribute the sales table across all segments based on the cust_id column.

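    • Because both tables are now placed by hashing cust_id, each segment receives exactly the sales rows whose cust_id values hash to it, which are the rows that can match its local portion of the customer table.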

  4. Local Join:

    • After redistribution, each segment will perform a local join between the redistributed sales data and its local customer data.

    • Segment 1 : Joins redistributed sales data with local customer data.

    • Segment 2 : Joins redistributed sales data with local customer data.

    • Segment 3 : Joins redistributed sales data with local customer data.

    • Segment 4 : Joins redistributed sales data with local customer data.

  5. Gather Motion:

    • The results from each segment are gathered back to the master node.

    • The master node combines the results from all segments to produce the final query result.
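The five steps above can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not Greenplum's implementation: `segment_for` uses modulo in place of the real hash function, segments are plain Python lists numbered from 0, and the sample rows and all function names are hypothetical.

```python
from collections import defaultdict

NUM_SEGMENTS = 4

def segment_for(key):
    # Stand-in hash (assumption): Greenplum uses its own hash function.
    return key % NUM_SEGMENTS

def distribute(rows, key_column):
    """Initial placement of a table by its distribution key."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    for row in rows:
        segments[segment_for(row[key_column])].append(row)
    return segments

def redistribute_motion(segments, hash_key):
    """Each segment sends every local row to the segment chosen by hash_key."""
    received = [[] for _ in range(NUM_SEGMENTS)]
    for local_rows in segments:
        for row in local_rows:
            received[segment_for(row[hash_key])].append(row)
    return received

def local_hash_join(sales_rows, customer_rows):
    """Hash join on cust_id using only the rows present on one segment."""
    by_cust = defaultdict(list)
    for c in customer_rows:
        by_cust[c["cust_id"]].append(c)
    return [
        (s["sale_id"], s["amount"], c["cust_name"])
        for s in sales_rows
        for c in by_cust.get(s["cust_id"], [])
    ]

def gather_motion(per_segment_results):
    """Concatenate the per-segment results on the master."""
    return [row for result in per_segment_results for row in result]

# Hypothetical sample data.
sales = [
    {"sale_id": 1, "cust_id": 10, "amount": 50.0},
    {"sale_id": 2, "cust_id": 11, "amount": 20.0},
    {"sale_id": 3, "cust_id": 10, "amount": 75.0},
    {"sale_id": 4, "cust_id": 13, "amount": 10.0},
]
customer = [
    {"cust_id": 10, "cust_name": "Ada"},
    {"cust_id": 11, "cust_name": "Bob"},
    {"cust_id": 13, "cust_name": "Cy"},
]

# Step 1: each segment holds (and scans) its local rows.
sales_segments = distribute(sales, "sale_id")
customer_segments = distribute(customer, "cust_id")

# Steps 2 and 3: redistribute sales by cust_id.
sales_by_cust = redistribute_motion(sales_segments, "cust_id")

# Step 4: local join on every segment.
joined = [local_hash_join(sales_by_cust[i], customer_segments[i])
          for i in range(NUM_SEGMENTS)]

# Step 5: gather the results on the master.
result = gather_motion(joined)

# For comparison: joining without the Redistribute Motion.
naive = gather_motion([local_hash_join(sales_segments[i], customer_segments[i])
                       for i in range(NUM_SEGMENTS)])
```

With this sample data, `result` contains all four joined rows, while `naive` is empty because no sales row starts out on the same segment as its customer row.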

Example Query Plan

Here's a simplified example of what the query plan might look like:

Gather Motion 4:1  (slice1; segments: 4)
  ->  Hash Join
        Hash Cond: (s.cust_id = c.cust_id)
        ->  Redistribute Motion 4:4  (slice2; segments: 4)
              Hash Key: s.cust_id
              ->  Seq Scan on sales s
        ->  Seq Scan on customer c

Explanation

  1. Gather Motion 4:1:

    • Sends the join results from all 4 segments to a single receiver, the master node, which combines them. In the N:M notation, N is the number of sending processes and M the number of receiving processes.

  2. Hash Join:

    • Runs on every segment, joining the redistributed sales rows with the local customer rows on the cust_id column.

  3. Redistribute Motion 4:4:

    • Each of the 4 segments sends every sales row to the segment selected by hashing the Hash Key, s.cust_id. All 4 segments both send and receive.

  4. Seq Scan on sales s:

    • Each segment performs a sequential scan on its local portion of the sales table.

  5. Seq Scan on customer c:

    • Each segment performs a sequential scan on its local portion of the customer table.
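As a reading aid for the N:M notation, the following Python sketch extracts the Motion nodes from the simplified plan text and reports their sender and receiver counts. The regular expression and the `motions` helper are hypothetical and only cover the line shape shown in this example; real EXPLAIN output contains more detail (costs, row estimates) that this sketch ignores.

```python
import re

# The simplified plan from this article, as plain text.
PLAN = """\
Gather Motion 4:1  (slice1; segments: 4)
  ->  Hash Join
        Hash Cond: (s.cust_id = c.cust_id)
        ->  Redistribute Motion 4:4  (slice2; segments: 4)
              Hash Key: s.cust_id
              ->  Seq Scan on sales s
        ->  Seq Scan on customer c
"""

# Matches e.g. "Redistribute Motion 4:4  (slice2; ..." (pattern is an assumption
# based on the plan text above).
MOTION_RE = re.compile(r"(Gather|Redistribute|Broadcast) Motion (\d+):(\d+)\s+\(slice(\d+)")

def motions(plan_text):
    """Return one dict per Motion node, in the order the nodes appear."""
    found = []
    for line in plan_text.splitlines():
        match = MOTION_RE.search(line)
        if match:
            kind, senders, receivers, slice_no = match.groups()
            found.append({
                "kind": kind,
                "senders": int(senders),
                "receivers": int(receivers),
                "slice": int(slice_no),
            })
    return found

for node in motions(PLAN):
    print(f'{node["kind"]} Motion: {node["senders"]} senders -> '
          f'{node["receivers"]} receivers (slice {node["slice"]})')
```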

Conclusion

In this query plan, redistributing the sales table by cust_id places each sales row on the segment that holds the matching customer row, so every segment can complete its part of the join locally and in parallel with the others. The per-segment results are then gathered on the master node to produce the final result. The cost of this approach is the Redistribute Motion itself, which sends sales rows between segments before the join can run.
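The amount of data the Redistribute Motion has to move can be estimated with a short sketch. As before, modulo stands in for Greenplum's hash function (an assumption), and the sample rows and the `rows_moved` helper are hypothetical. The second call shows the hypothetical case where sales is already placed by cust_id, in which no row would need to change segments.

```python
NUM_SEGMENTS = 4

def segment_for(key):
    # Stand-in hash (assumption): Greenplum uses its own hash function.
    return key % NUM_SEGMENTS

def rows_moved(rows, current_key, target_key):
    """Count rows whose target segment differs from the segment they are stored on."""
    return sum(
        1 for row in rows
        if segment_for(row[current_key]) != segment_for(row[target_key])
    )

# Hypothetical sample data.
sales = [
    {"sale_id": 1, "cust_id": 10, "amount": 50.0},
    {"sale_id": 2, "cust_id": 11, "amount": 20.0},
    {"sale_id": 3, "cust_id": 10, "amount": 75.0},
    {"sale_id": 4, "cust_id": 13, "amount": 10.0},
]

# sales stored by sale_id, needed by cust_id (the case in this article).
moved_now = rows_moved(sales, "sale_id", "cust_id")

# Hypothetical: sales already stored by cust_id.
moved_if_colocated = rows_moved(sales, "cust_id", "cust_id")

print(moved_now, moved_if_colocated)
```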
