Project scenario:
While inspecting the test table test_1, I found that the classes column contains null values and wanted to filter them out.
sql
-- View the table contents
> select * from test_1;
+------------+-----------------+
| test_1.id | test_1.classes |
+------------+-----------------+
| Mary | class 1 |
| James | class 2 |
| lily | null |
| Mike | NULL |
| Herry | class 1 |
+------------+-----------------+
Problem description:
Filtering with where classes is null did not work.
sql
> select * from test_1 where classes is null;
> select * from test_1 where classes is NULL;
> select * from test_1 where classes is not null;
> select * from test_1 where classes is not NULL;
-- Result:
+------------+-----------------+
| test_1.id | test_1.classes |
+------------+-----------------+
+------------+-----------------+
Every query returned an empty result; none of them picked out the ids whose classes value is null or NULL.
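Cause analysis:
is null / is not null has no effect here, because the classes column is a string column holding the text 'null' / 'NULL' rather than a true NULL.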
sql
-- View the table structure
> desc test_1;
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| id | string | |
| classes | string | |
+-----------+------------+----------+
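As shown, classes is of type string. What Hive actually stores in this column is the literal strings 'null' and 'NULL', not a true SQL NULL, so is not null cannot filter these values out.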
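One way to confirm this diagnosis is to look at each value directly. The query below is a minimal sketch against the same test_1 table; the aliases is_true_null and value_length are illustrative names, not part of the original table.

```sql
-- Sketch: distinguish a true SQL NULL from the literal text 'null' / 'NULL'.
-- is_true_null and value_length are illustrative aliases only.
select id,
       classes,
       classes is null as is_true_null,  -- true only for a real SQL NULL
       length(classes) as value_length   -- 4 for the text 'null' or 'NULL'; NULL for a real NULL
from test_1;
```

If is_true_null is false and value_length is 4 for the lily and Mike rows, the column holds text rather than real NULLs, which matches the explanation above.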
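Solution:
For string columns that hold these literal values, filter with = 'null', = 'NULL', != 'null', or != 'NULL'. The comparison is case-sensitive, so 'null' and 'NULL' must be matched separately.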
sql
> select * from test_1 where classes = 'null';
+------------+-----------------+
| test_1.id | test_1.classes |
+------------+-----------------+
| lily | null |
+------------+-----------------+
> select * from test_1 where classes = 'NULL';
+------------+-----------------+
| test_1.id | test_1.classes |
+------------+-----------------+
| Mike | NULL |
+------------+-----------------+
> select * from test_1 where classes != 'null';
+------------+-----------------+
| test_1.id | test_1.classes |
+------------+-----------------+
| Mary | class 1 |
| James | class 2 |
| Mike | NULL |
| Herry | class 1 |
+------------+-----------------+
> select * from test_1 where classes != 'NULL';
+------------+-----------------+
| test_1.id | test_1.classes |
+------------+-----------------+
| Mary | class 1 |
| James | class 2 |
| lily | null |
| Herry | class 1 |
+------------+-----------------+
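Because each comparison above matches only one spelling, removing both 'null' and 'NULL' takes a combined condition. The following is a minimal sketch, assuming the only unwanted values are the text null in any letter case (possibly padded with spaces) and true NULLs:

```sql
-- Sketch: keep only rows whose classes value is a real, meaningful string.
-- lower() folds 'NULL' and 'null' together; trim() guards against padded values.
select *
from test_1
where classes is not null               -- drop true SQL NULLs, if any exist
  and lower(trim(classes)) != 'null';   -- drop the literal text in any letter case
```

With the sample data shown earlier, this should leave only the rows for Mary, James, and Herry. If only the two exact spellings occur, where classes not in ('null', 'NULL') is an equivalent and simpler condition.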