Hive predicate pushdown by example (the difference between ON and WHERE)

Test data

  • Table t1
select * from t1;
+--------+----------+---------+--------+
| t1.id  | t1.name  | t1.age  | t1.dt  |
+--------+----------+---------+--------+
| 1      | aa       | 12      | 01     |
| 1      | aa       | 12      | 02     |
| 2      | aa       | 14      | 01     |
| 2      | bb       | 14      | 02     |
| 3      | cc       | 16      | 02     |
| NULL   | aa       | 12      | 01     |
+--------+----------+---------+--------+
  • Table t2
select * from t2;
+--------+----------+---------+--------+
| t2.id  | t2.name  | t2.age  | t2.dt  |
+--------+----------+---------+--------+
| 1      | 1        | aa      | 12     |
| 2      | 1        | aa      | 12     |
| 1      | NULL     | aa      | 12     |
| 1      | 2        | aa      | 14     |
| 2      | 2        | bb      | 14     |
| 2      | 3        | cc      | 16     |
+--------+----------+---------+--------+

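Join queries

In a LEFT JOIN, the left table (t1, alias a) is the preserved-row table: every one of its rows appears in the result. The right table (t2, alias b) is the null-supplying table: when a t1 row has no match, t2's columns are filled with NULL.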

  • t1 left join t2
select * from t1 a left join t2 b on a.id=b.id;
+-------+---------+--------+-------+-------+---------+--------+-------+
| a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
+-------+---------+--------+-------+-------+---------+--------+-------+
| 3     | cc      | 16     | 02    | NULL  | NULL    | NULL   | NULL  |
| NULL  | aa      | 12     | 01    | NULL  | NULL    | NULL   | NULL  |
| 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
| 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
| 1     | aa      | 12     | 01    | 1     | 2       | aa     | 14    |
| 1     | aa      | 12     | 02    | 1     | NULL    | aa     | 12    |
| 1     | aa      | 12     | 02    | 1     | 1       | aa     | 12    |
| 1     | aa      | 12     | 02    | 1     | 2       | aa     | 14    |
| 2     | aa      | 14     | 01    | 2     | 3       | cc     | 16    |
| 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
| 2     | aa      | 14     | 01    | 2     | 2       | bb     | 14    |
| 2     | bb      | 14     | 02    | 2     | 3       | cc     | 16    |
| 2     | bb      | 14     | 02    | 2     | 1       | aa     | 12    |
| 2     | bb      | 14     | 02    | 2     | 2       | bb     | 14    |
+-------+---------+--------+-------+-------+---------+--------+-------+
  • t1 left join t2 on a.id=b.id where a.dt='01': the predicate on the preserved-row table is pushed down and filtered early on the map side
select * from t1 a left join t2 b on a.id=b.id where a.dt ='01';
+-------+---------+--------+-------+-------+---------+--------+-------+
| a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
+-------+---------+--------+-------+-------+---------+--------+-------+
| NULL  | aa      | 12     | 01    | NULL  | NULL    | NULL   | NULL  |
| 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
| 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
| 1     | aa      | 12     | 01    | 1     | 2       | aa     | 14    |
| 2     | aa      | 14     | 01    | 2     | 3       | cc     | 16    |
| 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
| 2     | aa      | 14     | 01    | 2     | 2       | bb     | 14    |
+-------+---------+--------+-------+-------+---------+--------+-------+
  • t1 left join t2 on a.id=b.id and a.dt='01': the predicate on the preserved-row table is not pushed down and is applied on the reduce side
select * from t1 a left join t2 b on a.id=b.id and a.dt ='01';
+-------+---------+--------+-------+-------+---------+--------+-------+
| a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
+-------+---------+--------+-------+-------+---------+--------+-------+
| 1     | aa      | 12     | 02    | NULL  | NULL    | NULL   | NULL  |
| 2     | bb      | 14     | 02    | NULL  | NULL    | NULL   | NULL  |
| 3     | cc      | 16     | 02    | NULL  | NULL    | NULL   | NULL  |
| NULL  | aa      | 12     | 01    | NULL  | NULL    | NULL   | NULL  |
| 1     | aa      | 12     | 01    | 1     | 2       | aa     | 14    |
| 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
| 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
| 2     | aa      | 14     | 01    | 2     | 2       | bb     | 14    |
| 2     | aa      | 14     | 01    | 2     | 3       | cc     | 16    |
| 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
+-------+---------+--------+-------+-------+---------+--------+-------+

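Note: when the predicate on the preserved-row table is pushed down, rows that fail the condition are filtered out before the join and never reach the output. When it is not pushed down, nothing is filtered early: at join time, rows that fail the condition simply do not take part in matching, but because they belong to the preserved-row table they still appear in the result, padded with NULLs on the t2 side (the dt='02' rows above).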
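The preserved-row case can be replayed outside Hive. The sketch below is only an illustration under stated assumptions: it uses SQLite (through Python's built-in `sqlite3` module) as a stand-in to check query *semantics*, so it says nothing about Hive's map/reduce execution or whether Hive actually pushes a predicate down. The names `T1`, `T2` and `run` are hypothetical helpers introduced here. It checks that the WHERE version returns the same rows as filtering t1 before the join (which is why pushing it down is safe), while the ON version keeps all six t1 rows.

```python
import sqlite3
from collections import Counter

# Test data copied from the t1 / t2 listings above (None stands for NULL).
# t2's values keep the column order shown in its listing.
T1 = [(1, "aa", 12, "01"), (1, "aa", 12, "02"), (2, "aa", 14, "01"),
      (2, "bb", 14, "02"), (3, "cc", 16, "02"), (None, "aa", 12, "01")]
T2 = [(1, 1, "aa", "12"), (2, 1, "aa", "12"), (1, None, "aa", "12"),
      (1, 2, "aa", "14"), (2, 2, "bb", "14"), (2, 3, "cc", "16")]

conn = sqlite3.connect(":memory:")
for table, rows in (("t1", T1), ("t2", T2)):
    # Untyped columns: SQLite stores the Python values as given ("01" stays text).
    conn.execute(f"create table {table} (id, name, age, dt)")
    conn.executemany(f"insert into {table} values (?, ?, ?, ?)", rows)


def run(sql):
    """Hypothetical helper: execute a query and return all result rows."""
    return conn.execute(sql).fetchall()


# WHERE on the preserved-row table a: rows with dt <> '01' are removed.
where_rows = run("select * from t1 a left join t2 b on a.id = b.id where a.dt = '01'")
# Pushed-down form: filter t1 first, then join.
pushed_rows = run("select * from (select * from t1 where dt = '01') a "
                  "left join t2 b on a.id = b.id")
# ON condition on the preserved-row table: every t1 row survives; the
# dt = '02' rows just fail to match and are padded with NULLs.
on_rows = run("select * from t1 a left join t2 b on a.id = b.id and a.dt = '01'")

print(len(where_rows), len(pushed_rows), len(on_rows))  # 7 7 10
print(Counter(where_rows) == Counter(pushed_rows))       # True
print({row[:4] for row in on_rows} == set(T1))           # True
```

`Counter` is used to compare results as multisets, since neither engine guarantees row order.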

  • t1 left join t2 on a.id=b.id and b.dt='12': the predicate on the null-supplying table is pushed down and filtered on the map side
select * from t1 a left join t2 b on a.id=b.id and b.dt ='12';
+-------+---------+--------+-------+-------+---------+--------+-------+
| a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
+-------+---------+--------+-------+-------+---------+--------+-------+
| 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
| 2     | bb      | 14     | 02    | 2     | 1       | aa     | 12    |
| 3     | cc      | 16     | 02    | NULL  | NULL    | NULL   | NULL  |
| NULL  | aa      | 12     | 01    | NULL  | NULL    | NULL   | NULL  |
| 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
| 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
| 1     | aa      | 12     | 02    | 1     | NULL    | aa     | 12    |
| 1     | aa      | 12     | 02    | 1     | 1       | aa     | 12    |
+-------+---------+--------+-------+-------+---------+--------+-------+
  • t1 left join t2 on a.id=b.id where b.dt='12': the predicate on the null-supplying table is not pushed down and is applied on the reduce side
select * from t1 a left join t2 b on a.id=b.id where b.dt ='12';
+-------+---------+--------+-------+-------+---------+--------+-------+
| a.id  | a.name  | a.age  | a.dt  | b.id  | b.name  | b.age  | b.dt  |
+-------+---------+--------+-------+-------+---------+--------+-------+
| 2     | aa      | 14     | 01    | 2     | 1       | aa     | 12    |
| 2     | bb      | 14     | 02    | 2     | 1       | aa     | 12    |
| 1     | aa      | 12     | 01    | 1     | NULL    | aa     | 12    |
| 1     | aa      | 12     | 01    | 1     | 1       | aa     | 12    |
| 1     | aa      | 12     | 02    | 1     | NULL    | aa     | 12    |
| 1     | aa      | 12     | 02    | 1     | 1       | aa     | 12    |
+-------+---------+--------+-------+-------+---------+--------+-------+

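Note: when the predicate on the null-supplying table is pushed down, t2 rows that fail the condition are filtered out before the join, so t1 rows without a qualifying match are still kept and padded with NULLs (ids 3 and NULL above). When it is not pushed down, nothing is filtered early; the condition is applied only after the join completes, so the NULL-padded rows fail it as well and the LEFT JOIN returns the same rows as an inner join.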
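The null-supplying case can be checked the same way. Again this is a sketch that assumes SQLite (via Python's `sqlite3`) is an acceptable stand-in for the *semantics* of these queries only, not for Hive's execution plan; `T1`, `T2` and `run` are hypothetical helpers. It confirms that the ON version equals filtering t2 before the join, and that the WHERE version collapses to an inner join.

```python
import sqlite3
from collections import Counter

# Same test data as the t1 / t2 listings above (None stands for NULL).
T1 = [(1, "aa", 12, "01"), (1, "aa", 12, "02"), (2, "aa", 14, "01"),
      (2, "bb", 14, "02"), (3, "cc", 16, "02"), (None, "aa", 12, "01")]
T2 = [(1, 1, "aa", "12"), (2, 1, "aa", "12"), (1, None, "aa", "12"),
      (1, 2, "aa", "14"), (2, 2, "bb", "14"), (2, 3, "cc", "16")]

conn = sqlite3.connect(":memory:")
for table, rows in (("t1", T1), ("t2", T2)):
    conn.execute(f"create table {table} (id, name, age, dt)")  # untyped columns
    conn.executemany(f"insert into {table} values (?, ?, ?, ?)", rows)


def run(sql):
    """Hypothetical helper: execute a query and return all result rows."""
    return conn.execute(sql).fetchall()


# ON condition on the null-supplying table b ...
on_rows = run("select * from t1 a left join t2 b on a.id = b.id and b.dt = '12'")
# ... gives the same rows as filtering t2 before the join (the pushed-down form).
pushed_rows = run("select * from t1 a left join "
                  "(select * from t2 where dt = '12') b on a.id = b.id")
# WHERE on b is evaluated after the join; NULL-padded rows have b.dt = NULL and fail it,
where_rows = run("select * from t1 a left join t2 b on a.id = b.id where b.dt = '12'")
# so the result matches an inner join with the same filter.
inner_rows = run("select * from t1 a join t2 b on a.id = b.id where b.dt = '12'")

print(len(on_rows), len(pushed_rows), len(where_rows))  # 8 8 6
print(Counter(on_rows) == Counter(pushed_rows))         # True
print(Counter(where_rows) == Counter(inner_rows))       # True
```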

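Summary: whether a predicate is pushed down depends on which table it refers to (preserved-row or null-supplying) and whether it is written in ON or WHERE, and the final results often differ as a consequence, so pay close attention to where you place each condition.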
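The four cases above can be condensed into a small lookup table. This is a hypothetical sketch (the names `Role`, `Clause`, `PUSHED_DOWN` and `is_pushed_down` are invented for illustration): it records the behavior observed in this article's LEFT JOIN examples, not Hive's optimizer code, and the actual plan may vary with the Hive version and optimizer settings. The pattern behind it is that a predicate is filtered before the join only in the cases where doing so leaves the result unchanged.

```python
from enum import Enum


class Role(Enum):
    PRESERVED = "preserved-row table"        # t1 (alias a) in t1 LEFT JOIN t2
    NULL_SUPPLYING = "null-supplying table"  # t2 (alias b) in t1 LEFT JOIN t2


class Clause(Enum):
    ON = "ON"
    WHERE = "WHERE"


# The four LEFT JOIN cases walked through above:
# (table role, clause) -> is the predicate filtered before the join?
PUSHED_DOWN = {
    (Role.PRESERVED, Clause.WHERE): True,
    (Role.PRESERVED, Clause.ON): False,
    (Role.NULL_SUPPLYING, Clause.ON): True,
    (Role.NULL_SUPPLYING, Clause.WHERE): False,
}


def role_in_left_join(alias: str, left_alias: str) -> Role:
    """In `left LEFT JOIN right`, the left side is preserved; the right side supplies NULLs."""
    return Role.PRESERVED if alias == left_alias else Role.NULL_SUPPLYING


def is_pushed_down(alias: str, clause: Clause, left_alias: str = "a") -> bool:
    """Look up whether a predicate on `alias` in `clause` is pushed down."""
    return PUSHED_DOWN[(role_in_left_join(alias, left_alias), clause)]


for alias, clause in [("a", Clause.WHERE), ("a", Clause.ON),
                      ("b", Clause.ON), ("b", Clause.WHERE)]:
    print(f"{alias}.dt in {clause.value}: pushed down = {is_pushed_down(alias, clause)}")
```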

Reference:
Understanding predicate pushdown in Hive in one read (the difference between ON and WHERE)
