Common Filter Operations in HBase

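HBase offers many kinds of filters; this article introduces several of the most commonly used ones. To make the examples easier to follow, we first use the HBase Shell to create a table named students and write several rows of data into it.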

I. Data Preparation

(1) Create the table students in the default namespace, with two column families, info and score.

hbase:002:0> create 'students','info','score'
2024-03-26 00:22:15,810 INFO  [main] client.HBaseAdmin (HBaseAdmin.java:postOperationResult(3591)) - Operation: CREATE, Table Name: default:students, procId: 290 completed
Created table students
Took 3.1425 seconds                                                                                                                   
=> Hbase::Table - students

(2) Write five rows of data into the students table, then check the result with the scan 'students' command.

hbase:005:0> put 'students','s001','info:name','Jack'
Took 30.6978 seconds
hbase:017:0> put 'students','s001','info:age','18'
Took 0.0419 seconds
hbase:019:0> put 'students','s001','score:English','95'
Took 0.0472 seconds 
hbase:021:0> put 'students','s002','info:name','Tom'
Took 0.0255 seconds                                                                                                                   
hbase:022:0> put 'students','s002','info:age','20'
Took 0.0160 seconds                                                                                                                   
hbase:023:0> put 'students','s002','score:Chinese','85'
Took 0.0296 seconds                                                                                                                   
hbase:024:0> put 'students','s002','score:Math','90'
Took 0.0155 seconds                                                                                                                   
hbase:025:0> put 'students','s003','info:name','Mike'
Took 0.0188 seconds                                                                                                                   
hbase:026:0> put 'students','s003','info:age','19'
Took 0.0183 seconds                                                                                                                   
hbase:027:0> put 'students','s003','score:Chinese','90'
Took 0.0178 seconds                                                                                                                   
hbase:028:0> put 'students','s003','score:Math','95'
Took 0.0445 seconds                                                                                                                   
hbase:029:0> put 'students','s004','info:name','Lucy'
Took 0.0104 seconds                                                                                                                   
hbase:030:0> put 'students','s004','score:English','100'
Took 0.0170 seconds                                                                                                                   
hbase:031:0> put 'students','s005','info:name','Lily'
Took 0.0249 seconds                                                                                                                   
hbase:032:0> put 'students','s005','score:Chinese','99'
Took 0.0228 seconds                                                                                                                   
hbase:033:0> scan 'students'
ROW                                COLUMN+CELL                                                                                        
 s001                              column=info:age, timestamp=2024-03-26T00:25:17.982, value=18                                       
 s001                              column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack                                    
 s001                              column=score:English, timestamp=2024-03-26T00:25:52.207, value=95                                  
 s002                              column=info:age, timestamp=2024-03-26T00:26:46.922, value=20                                       
 s002                              column=info:name, timestamp=2024-03-26T00:26:26.924, value=Tom                                     
 s002                              column=score:Chinese, timestamp=2024-03-26T00:27:13.181, value=85                                  
 s002                              column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90                                     
 s003                              column=info:age, timestamp=2024-03-26T00:28:08.402, value=19                                       
 s003                              column=info:name, timestamp=2024-03-26T00:27:48.629, value=Mike                                    
 s003                              column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90                                  
 s003                              column=score:Math, timestamp=2024-03-26T00:29:01.881, value=95                                     
 s004                              column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy                                    
 s004                              column=score:English, timestamp=2024-03-26T00:29:44.831, value=100                                 
 s005                              column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily                                    
 s005                              column=score:Chinese, timestamp=2024-03-26T00:30:25.477, value=99                                  
5 row(s)
Took 0.3369 seconds 
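
The layout of the scan output above can be modelled in a few lines of plain Ruby. This is only a sketch of the data model: the names STUDENTS and scan_cells are made up for illustration, nothing here talks to HBase, and sorting on the "family:qualifier" string happens to reproduce the order shown for this particular data set.

```ruby
# In-memory model of the students table: row key => { "family:qualifier" => value }.
# It mirrors the sample data only; it does not connect to HBase.
STUDENTS = {
  's001' => { 'info:name' => 'Jack', 'info:age' => '18', 'score:English' => '95' },
  's002' => { 'info:name' => 'Tom',  'info:age' => '20', 'score:Chinese' => '85', 'score:Math' => '90' },
  's003' => { 'info:name' => 'Mike', 'info:age' => '19', 'score:Chinese' => '90', 'score:Math' => '95' },
  's004' => { 'info:name' => 'Lucy', 'score:English' => '100' },
  's005' => { 'info:name' => 'Lily', 'score:Chinese' => '99' }
}.freeze

# Flatten the table into [row, column, value] cells, ordered by row key and
# then by column name, which is the order scan prints for this data.
def scan_cells(table)
  table.sort.flat_map do |row, columns|
    columns.sort.map { |column, value| [row, column, value] }
  end
end

cells = scan_cells(STUDENTS)
puts "#{cells.map(&:first).uniq.size} row(s), #{cells.size} cell(s)"
cells.first(3).each { |row, column, value| puts " #{row}  column=#{column}, value=#{value}" }
```

Note that a row is simply whatever cells share a row key: s004 and s005 have no info:age cell at all, rather than an empty one.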

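II. Using the Filters

1. ValueFilter

ValueFilter filters on the value of each cell. The two comparators used here are binary, which compares the value byte by byte against the given bytes, and substring, which checks whether the value contains the given string.

(1) Binary comparison of values

Use the get command to query the students table for the cells in row s001 whose value is Jack.

# ValueFilter(=,'binary:Jack') is a value filter that uses the binary comparator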
hbase:034:0> get 'students','s001',{FILTER=>"ValueFilter(=,'binary:Jack')"}
COLUMN                             CELL                                                                                               
 info:name                         timestamp=2024-03-26T00:24:39.510, value=Jack                                                      
1 row(s)
Took 0.6506 seconds

Use the scan command to find the cells in the students table whose value is 90.

# The matches span several rows, so a full-table scan is needed; get reads only a single row
hbase:036:0> scan 'students',{FILTER=>"ValueFilter(=,'binary:90')"}
ROW                                COLUMN+CELL                                                                                        
 s002                              column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90                                     
 s003                              column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90                                  
2 row(s)
Took 0.2162 seconds
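
The binary comparison can be sketched in plain Ruby as follows. This is a model rather than HBase code: SCORE_CELLS and value_filter_binary are illustrative names, and the data is a subset of the sample table. It also shows a consequence worth keeping in mind: the scores were written as strings, so a byte-wise comparison orders '100' before '95'.

```ruby
# A subset of the sample cells as [row, column, value]; this does not talk to HBase.
SCORE_CELLS = [
  ['s001', 'score:English', '95'],
  ['s002', 'score:Math',    '90'],
  ['s003', 'score:Chinese', '90'],
  ['s004', 'score:English', '100']
].freeze

# Mimics ValueFilter(op, 'binary:target'): compare each value with the target
# byte by byte and keep the cell when the comparison result satisfies op.
def value_filter_binary(cells, op, target)
  cells.select do |_row, _column, value|
    (value.b <=> target.b).public_send(op, 0)
  end
end

p value_filter_binary(SCORE_CELLS, :==, '90').map(&:first) # rows s002 and s003
p value_filter_binary(SCORE_CELLS, :>, '95').map(&:first)  # empty: '100' sorts before '95'
```

If numeric ordering matters, one common workaround is to store values zero-padded to a fixed width (for example '095' and '100'), so that byte order and numeric order agree.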

(2) Substring matching

Use the get command to query the students table for the cells in row s001 whose value contains the substring ac.

hbase:037:0> get 'students','s001',{FILTER=>"ValueFilter(=,'substring:ac')"}
COLUMN                             CELL                                                                                               
 info:name                         timestamp=2024-03-26T00:24:39.510, value=Jack                                                      
1 row(s)
Took 0.1578 seconds 

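Use the scan command to find the cells in the students table whose value contains the substring 0.

# The matches span several rows, so a full-table scan is needed; get reads only a single row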
hbase:038:0> scan 'students',{FILTER=>"ValueFilter(=,'substring:0')"}
ROW                                COLUMN+CELL                                                                                        
 s002                              column=info:age, timestamp=2024-03-26T00:26:46.922, value=20                                       
 s002                              column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90                                     
 s003                              column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90                                  
 s004                              column=score:English, timestamp=2024-03-26T00:29:44.831, value=100                                 
3 row(s)
Took 0.0868 seconds
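
Substring matching can be modelled the same way. The sketch below assumes the documented behaviour of HBase's SubstringComparator, which ignores case; MIXED_CELLS and value_filter_substring are illustrative names and the data is a subset of the sample table.

```ruby
# A subset of the sample cells as [row, column, value]; this does not talk to HBase.
MIXED_CELLS = [
  ['s001', 'info:name',     'Jack'],
  ['s002', 'info:age',      '20'],
  ['s002', 'score:Chinese', '85'],
  ['s002', 'score:Math',    '90'],
  ['s004', 'score:English', '100']
].freeze

# Mimics ValueFilter(=, 'substring:needle'). The case-insensitive match is
# modelled by lower-casing both the value and the needle.
def value_filter_substring(cells, needle)
  cells.select { |_row, _column, value| value.downcase.include?(needle.downcase) }
end

p value_filter_substring(MIXED_CELLS, '0').map { |cell| cell[2] }  # values 20, 90 and 100
p value_filter_substring(MIXED_CELLS, 'AC').map { |cell| cell[2] } # Jack, despite the upper case
```

The match counts cells, not rows, which is why the scan above lists four cells but reports three rows.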

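2. QualifierFilter

QualifierFilter filters on the column qualifier only and ignores the column family name. It is most often used with the binary comparator.

Use the get command to query the students table for the cells in row s001 whose column qualifier is name.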

hbase:039:0> get 'students','s001',{FILTER=>"QualifierFilter(=,'binary:name')"}
COLUMN                             CELL                                                                                               
 info:name                         timestamp=2024-03-26T00:24:39.510, value=Jack                                                      
1 row(s)
Took 0.3310 seconds

Use the scan command to find the cells in the students table whose column qualifier is name.

hbase:041:0> scan 'students',{FILTER=>"QualifierFilter(=,'binary:name')"}
ROW                                COLUMN+CELL                                                                                        
 s001                              column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack                                    
 s002                              column=info:name, timestamp=2024-03-26T00:26:26.924, value=Tom                                     
 s003                              column=info:name, timestamp=2024-03-26T00:27:48.629, value=Mike                                    
 s004                              column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy                                    
 s005                              column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily                                    
5 row(s)
Took 0.0845 seconds
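
Because only the qualifier is compared, two columns with the same qualifier in different families would both match. The sketch below illustrates this with a hypothetical score:name cell that is not part of the students table; CELLS_WITH_EXTRA and qualifier_filter are illustrative names, and nothing here calls HBase.

```ruby
# Cells as [row, family, qualifier, value]. The score:name cell is hypothetical
# and was added only to show that the family is ignored.
CELLS_WITH_EXTRA = [
  ['s001', 'info',  'age',  '18'],
  ['s001', 'info',  'name', 'Jack'],
  ['s001', 'score', 'name', 'demo']
].freeze

# Mimics QualifierFilter(=, 'binary:qualifier'): the family plays no part.
def qualifier_filter(cells, qualifier)
  cells.select { |_row, _family, cell_qualifier, _value| cell_qualifier == qualifier }
end

qualifier_filter(CELLS_WITH_EXTRA, 'name').each do |row, family, qualifier, value|
  puts "#{row}  column=#{family}:#{qualifier}, value=#{value}"
end
```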

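3. ColumnPrefixFilter

ColumnPrefixFilter filters on a prefix of the column qualifier. A prefix must match from the first character, whereas a substring match can start at any position. The prefix match is case-sensitive.

Use the get command to query the students table for the cells in row s002 whose column qualifier starts with Chi.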

hbase:042:0> get 'students','s002',{FILTER=>"ColumnPrefixFilter('Chi')"}
COLUMN                             CELL                                                                                               
 score:Chinese                     timestamp=2024-03-26T00:27:13.181, value=85                                                        
1 row(s)
Took 0.1693 seconds 

Use the scan command to find the cells in the students table whose column qualifier starts with Chi.

hbase:044:0> scan 'students',{FILTER=>"ColumnPrefixFilter('Chi')"}
ROW                                COLUMN+CELL                                                                                        
 s002                              column=score:Chinese, timestamp=2024-03-26T00:27:13.181, value=85                                  
 s003                              column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90                                  
 s005                              column=score:Chinese, timestamp=2024-03-26T00:30:25.477, value=99                                  
3 row(s)
Took 0.0397 seconds 
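
The difference between prefix and substring matching, and the case sensitivity of the prefix match, can be sketched in plain Ruby. QUALIFIERS and the two helper methods are illustrative names; the list holds the qualifiers used in the sample table.

```ruby
# The column qualifiers that appear in the sample table.
QUALIFIERS = %w[age name Chinese English Math].freeze

# Mimics ColumnPrefixFilter('prefix'): match from the first character, case-sensitive.
def column_prefix_filter(qualifiers, prefix)
  qualifiers.select { |qualifier| qualifier.start_with?(prefix) }
end

# A substring match, for contrast: the needle may appear anywhere.
def qualifier_substring_match(qualifiers, needle)
  qualifiers.select { |qualifier| qualifier.include?(needle) }
end

p column_prefix_filter(QUALIFIERS, 'Chi')       # matches Chinese
p column_prefix_filter(QUALIFIERS, 'chi')       # no match: the case differs
p column_prefix_filter(QUALIFIERS, 'nese')      # no match: not at the start
p qualifier_substring_match(QUALIFIERS, 'nese') # matches Chinese
```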

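4. RowFilter

RowFilter filters cells by their row key.

**Note:** RowFilter is generally not used with get. A get must already specify one complete row key, so there is nothing left to filter on.

(1) Binary comparison

Use the scan command to scan the students table and keep all cells whose row key is s001.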

hbase:045:0> scan 'students',{FILTER=>"RowFilter(=,'binary:s001')"}
ROW                                COLUMN+CELL                                                                                        
 s001                              column=info:age, timestamp=2024-03-26T00:25:17.982, value=18                                       
 s001                              column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack                                    
 s001                              column=score:English, timestamp=2024-03-26T00:25:52.207, value=95                                  
1 row(s)
Took 0.1297 seconds

(2) Substring matching

Use the scan command to scan the students table and keep all cells whose row key contains the substring 01.

hbase:046:0> scan 'students',{FILTER=>"RowFilter(=,'substring:01')"}
ROW                                COLUMN+CELL                                                                                        
 s001                              column=info:age, timestamp=2024-03-26T00:25:17.982, value=18                                       
 s001                              column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack                                    
 s001                              column=score:English, timestamp=2024-03-26T00:25:52.207, value=95                                  
1 row(s)
Took 0.3426 seconds
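
Both row-key comparisons can be modelled on the list of row keys alone. The sketch below is plain Ruby with illustrative names (ROW_KEYS, row_filter_binary, row_filter_substring) and does not call HBase; it also tries an operator other than =, since the binary comparator supports ordering comparisons as well.

```ruby
# The row keys of the sample table.
ROW_KEYS = %w[s001 s002 s003 s004 s005].freeze

# Mimics RowFilter(op, 'binary:target'): byte-wise comparison of the whole row key.
def row_filter_binary(row_keys, op, target)
  row_keys.select { |key| (key.b <=> target.b).public_send(op, 0) }
end

# Mimics RowFilter(=, 'substring:needle'): the needle may appear anywhere in the key.
def row_filter_substring(row_keys, needle)
  row_keys.select { |key| key.downcase.include?(needle.downcase) }
end

p row_filter_binary(ROW_KEYS, :==, 's001') # only s001
p row_filter_binary(ROW_KEYS, :>=, 's003') # s003, s004 and s005
p row_filter_substring(ROW_KEYS, '01')     # only s001
p row_filter_substring(ROW_KEYS, '00')     # all five row keys
```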

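5. PrefixFilter

PrefixFilter filters on a prefix of the row key. The prefix must match from the first character of the row key and is case-sensitive.

Use the scan command to scan the students table and keep the cells whose row key starts with s00.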

hbase:047:0> scan 'students',{FILTER=>"PrefixFilter('s00')"}
ROW                                COLUMN+CELL                                                                                        
 s001                              column=info:age, timestamp=2024-03-26T00:25:17.982, value=18                                       
 s001                              column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack                                    
 s001                              column=score:English, timestamp=2024-03-26T00:25:52.207, value=95                                  
 s002                              column=info:age, timestamp=2024-03-26T00:26:46.922, value=20                                       
 s002                              column=info:name, timestamp=2024-03-26T00:26:26.924, value=Tom                                     
 s002                              column=score:Chinese, timestamp=2024-03-26T00:27:13.181, value=85                                  
 s002                              column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90                                     
 s003                              column=info:age, timestamp=2024-03-26T00:28:08.402, value=19                                       
 s003                              column=info:name, timestamp=2024-03-26T00:27:48.629, value=Mike                                    
 s003                              column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90                                  
 s003                              column=score:Math, timestamp=2024-03-26T00:29:01.881, value=95                                     
 s004                              column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy                                    
 s004                              column=score:English, timestamp=2024-03-26T00:29:44.831, value=100                                 
 s005                              column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily                                    
 s005                              column=score:Chinese, timestamp=2024-03-26T00:30:25.477, value=99                                  
5 row(s)
Took 0.4404 seconds
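
Since row keys are kept in byte order, a row-key prefix selects one contiguous range of rows. The sketch below, in plain Ruby, models the prefix match and shows that it picks the same rows as a range from the prefix up to (but excluding) the prefix with its last byte incremented. All names are illustrative, the keys s010 and t001 are hypothetical additions to the sample data, and nothing here calls HBase.

```ruby
# Row keys in sorted order; s010 and t001 are hypothetical extras.
SORTED_ROWS = %w[s001 s002 s003 s004 s005 s010 t001].freeze

# Mimics PrefixFilter('prefix'): case-sensitive match from the first character.
def prefix_filter(rows, prefix)
  rows.select { |row| row.start_with?(prefix) }
end

# Smallest key that sorts after every key starting with the prefix:
# drop trailing 0xFF bytes, then increment the last remaining byte.
# Returns nil when no such upper bound exists.
def prefix_stop_row(prefix)
  bytes = prefix.bytes
  bytes.pop while bytes.last == 0xFF
  return nil if bytes.empty?

  bytes[-1] += 1
  bytes.pack('C*').force_encoding(prefix.encoding)
end

# Rows in the half-open range [start, stop); a nil stop means "to the end".
def range_scan(rows, start, stop)
  rows.select { |row| row >= start && (stop.nil? || row < stop) }
end

stop = prefix_stop_row('s00')
p stop
p prefix_filter(SORTED_ROWS, 's00') == range_scan(SORTED_ROWS, 's00', stop)
```

On a large table, bounding the scan by such a range is a common way to avoid reading rows that the prefix can never match.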

6. FamilyFilter

FamilyFilter filters on the column family name. Its comparators include binary and substring, among others.

(1) Binary comparison

Use the scan command to scan the students table and keep the cells whose column family is info.

hbase:005:0> scan 'students',FILTER=>"FamilyFilter(=,'binary:info')"
ROW                   COLUMN+CELL                                               
 s001                 column=info:age, timestamp=2024-03-26T00:25:17.982, value=18                                                                         
 s001                 column=info:name, timestamp=2024-03-26T00:24:39.510, value=Jack                                                                    
 s002                 column=info:age, timestamp=2024-03-26T00:26:46.922, value=20                                                                        
 s002                 column=info:name, timestamp=2024-03-26T00:26:26.924, value=Tom                                                     
 s003                 column=info:age, timestamp=2024-03-26T00:28:08.402, value=19                                                       
 s003                 column=info:name, timestamp=2024-03-26T00:27:48.629, value=Mike                                                  
 s004                 column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy                                                    
 s005                 column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily                                                     
5 row(s)
Took 0.0399 seconds 

(2) Substring matching

Use the scan command to scan the students table and keep the cells whose column family name contains the substring s.

hbase:008:0> scan 'students',FILTER=>"FamilyFilter(=,'substring:s')"
ROW                                COLUMN+CELL                                                                                        
 s001                              column=score:English, timestamp=2024-03-26T00:25:52.207, value=95                                  
 s002                              column=score:Chinese, timestamp=2024-03-26T00:27:13.181, value=85                                  
 s002                              column=score:Math, timestamp=2024-03-26T00:27:28.787, value=90                                     
 s003                              column=score:Chinese, timestamp=2024-03-26T00:28:46.714, value=90                                  
 s003                              column=score:Math, timestamp=2024-03-26T00:29:01.881, value=95                                     
 s004                              column=score:English, timestamp=2024-03-26T00:29:44.831, value=100                                 
 s005                              column=score:Chinese, timestamp=2024-03-26T00:30:25.477, value=99                                  
5 row(s)
Took 0.0915 seconds
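
The two family comparisons can be sketched in plain Ruby as well. FAMILY_CELLS and family_filter are illustrative names, the data is a subset of the sample table, and the substring branch assumes the case-insensitive behaviour of HBase's SubstringComparator.

```ruby
# A subset of the sample cells as [row, family, qualifier, value].
FAMILY_CELLS = [
  ['s001', 'info',  'name',    'Jack'],
  ['s001', 'score', 'English', '95'],
  ['s004', 'info',  'name',    'Lucy'],
  ['s004', 'score', 'English', '100']
].freeze

# Mimics FamilyFilter(=, 'binary:target') and FamilyFilter(=, 'substring:target').
def family_filter(cells, comparator, target)
  cells.select do |_row, family, _qualifier, _value|
    case comparator
    when :binary    then family == target
    when :substring then family.downcase.include?(target.downcase)
    else raise ArgumentError, "unsupported comparator: #{comparator}"
    end
  end
end

p family_filter(FAMILY_CELLS, :binary, 'info').map { |cell| cell[1] }  # info cells only
p family_filter(FAMILY_CELLS, :substring, 's').map { |cell| cell[1] }  # score cells only
p family_filter(FAMILY_CELLS, :substring, 'o').size                    # both families contain "o"
```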

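7. SingleColumnValueFilter

SingleColumnValueFilter filters on the cell value of one column, identified by its column family and column qualifier, much like the SQL statement "SELECT column FROM table WHERE column = value".

(1) Binary comparison

Use the scan command to scan the students table and return the info:age cells whose value is 19.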

hbase:006:0> scan 'students',{COLUMN=>'info:age',FILTER=>"SingleColumnValueFilter('info','age',=,'binary:19')"}
ROW                                COLUMN+CELL                                                                                        
 s003                              column=info:age, timestamp=2024-03-26T00:28:08.402, value=19                                       
1 row(s)
Took 0.4166 seconds

(2) Substring matching

Use the scan command to scan the students table and return the info:name cells whose value contains the substring y.

hbase:008:0> scan 'students',{COLUMN=>'info:name',FILTER=>"SingleColumnValueFilter('info','name',=,'substring:y')"}
ROW                                COLUMN+CELL                                                                                        
 s004                              column=info:name, timestamp=2024-03-26T00:29:19.868, value=Lucy                                    
 s005                              column=info:name, timestamp=2024-03-26T00:30:04.231, value=Lily                                    
2 row(s)
Took 0.0658 seconds 
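
Unlike the cell-level filters above, SingleColumnValueFilter decides per row, and by default a row that lacks the tested column is let through (the filter's filterIfMissing flag is false unless set). The plain-Ruby sketch below models that behaviour and shows why the scans above, which also restrict COLUMN to the tested column, return only the matching cells. INFO_TABLE and both helper methods are illustrative names, the data is a subset of the sample table, and nothing here calls HBase.

```ruby
# A subset of the sample table: row key => { "family:qualifier" => value }.
INFO_TABLE = {
  's001' => { 'info:name' => 'Jack', 'info:age' => '18' },
  's003' => { 'info:name' => 'Mike', 'info:age' => '19' },
  's004' => { 'info:name' => 'Lucy' }
}.freeze

# Mimics SingleColumnValueFilter(family, qualifier, =, 'binary:target').
# Rows without the column pass unless filter_if_missing is true.
def single_column_value_filter(table, column, target, filter_if_missing: false)
  table.select do |_row, columns|
    columns.key?(column) ? columns[column] == target : !filter_if_missing
  end
end

# Mimics COLUMN=>'info:age' on the scan: only that column is read, and rows
# left without any cell do not appear in the result.
def restrict_to_column(table, column)
  table.transform_values { |columns| columns.slice(column) }
       .reject { |_row, columns| columns.empty? }
end

p single_column_value_filter(INFO_TABLE, 'info:age', '19').keys
p single_column_value_filter(INFO_TABLE, 'info:age', '19', filter_if_missing: true).keys
p single_column_value_filter(restrict_to_column(INFO_TABLE, 'info:age'), 'info:age', '19')
```

In this model the first call keeps s004 as well as s003, because s004 has no info:age cell; the other two calls keep only s003.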