ReDistribution plan细节

In a Greenplum cluster with 4 segments, when you perform a join between two tables (sales and customer) that are distributed differently, the query plan will involve redistributing data to ensure that related rows are on the same segment. Here's a detailed breakdown of how the redistribution query plan might look:

Tables and Distribution Keys

  • sales table : Distributed by sale_id.

  • customer table : Distributed by cust_id.

Query

sql 复制代码
SELECT s.sale_id, s.amount, c.cust_name
FROM sales s
JOIN customer c ON s.cust_id = c.cust_id;

Query Plan Breakdown

  1. Initial Scan:

    • Each segment scans its local portion of the sales and customer tables.

    • Segment 1 : Scans sales and customer data assigned to it.

    • Segment 2 : Scans sales and customer data assigned to it.

    • Segment 3 : Scans sales and customer data assigned to it.

    • Segment 4 : Scans sales and customer data assigned to it.

  2. Redistribute Motion:

    • Since the sales table is distributed by sale_id and the customer table is distributed by cust_id, the join condition s.cust_id = c.cust_id requires that tuples from sales be redistributed by cust_id.

    • The query plan will include a redistribute motion operator to redistribute the sales table based on cust_id.

  3. Redistribution Execution:

    • The redistribute motion operator will redistribute the sales table across all segments based on the cust_id column.

    • Each segment will receive a portion of the sales table that matches its portion of the customer table.

  4. Local Join:

    • After redistribution, each segment will perform a local join between the redistributed sales data and its local customer data.

    • Segment 1 : Joins redistributed sales data with local customer data.

    • Segment 2 : Joins redistributed sales data with local customer data.

    • Segment 3 : Joins redistributed sales data with local customer data.

    • Segment 4 : Joins redistributed sales data with local customer data.

  5. Gather Motion:

    • The results from each segment are gathered back to the master node.

    • The master node combines the results from all segments to produce the final query result.

Example Query Plan

Here's a simplified example of what the query plan might look like:

复制代码
Gather Motion 4:1  (slice1; segments: 4)
  ->  Hash Join
        Hash Cond: (s.cust_id = c.cust_id)
        ->  Redistribute Motion 4:4  (slice2; segments: 4)
            Hash Key: s.cust_id
            ->  Seq Scan on sales s
        ->  Seq Scan on customer c

Explanation

  1. Gather Motion 4:1:

    • Collects the final results from all 4 segments and combines them on the master node.
  2. Hash Join:

    • Performs a hash join on the cust_id column between the sales and customer tables.
  3. Redistribute Motion 4:4:

    • Redistributes the sales table across all 4 segments based on the cust_id column.
  4. Seq Scan on sales s:

    • Each segment performs a sequential scan on its local portion of the sales table.
  5. Seq Scan on customer c:

    • Each segment performs a sequential scan on its local portion of the customer table.

Conclusion

In this query plan, the redistribution of the sales table based on cust_id ensures that related rows are on the same segment, allowing for efficient local joins. The results from each segment are then gathered back to the master node to produce the final result. This approach leverages Greenplum's MPP architecture to achieve parallel processing and efficient query execution.

相关推荐
少云清3 分钟前
【安全测试】5_应用服务器安全性测试 _SQL注入和文件上传漏洞
数据库·sql·安全性测试
H Journey3 分钟前
Django 教程
数据库·django·sqlite
eWidget5 分钟前
核心业务系统国产化:如何破解 Oracle 迁移中的“重构代价”与“性能瓶颈”?
数据库·oracle·重构·kingbase·数据库平替用金仓·金仓数据库
yuanmenghao6 分钟前
Linux 性能实战 | 第 18 篇:ltrace 与库函数性能分析
linux·python·性能优化
lhxsir7 分钟前
oracle常用命令(DBA)
数据库·oracle·dba
Elastic 中国社区官方博客9 分钟前
Elasticsearch 8.17.2 升级到 9.2.4 完整升级过程
大数据·运维·数据库·elasticsearch·搜索引擎·全文检索·运维开发
熬了夜的程序员10 分钟前
【LeetCode】118. 杨辉三角
linux·算法·leetcode
Re.不晚12 分钟前
Redis事务
数据库·redis·php
数据知道13 分钟前
PostgreSQL:如何定期验证备份的有效性?(灾备演练)
数据库·postgresql
运维闲章印时光14 分钟前
企业跨地域互联:GRE隧道部署与互通配置
linux·服务器·网络