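[Big Data Storage and Management] Lab 3: Getting Familiar with Common HBase Operations

[Author homepage] Francek Chen

[About this column] The "Principles and Applications of Big Data Technology" column gives a systematic introduction to big data. It has four parts: Big Data Fundamentals, Big Data Storage and Management, Big Data Processing and Analysis, and Big Data Applications. Topics include an overview of big data, the Hadoop processing architecture, the HDFS distributed file system, the HBase distributed database, NoSQL databases, cloud databases, MapReduce, Hadoop revisited, the Hive data warehouse, Spark, stream computing, Flink, graph computing, data visualization, and applications of big data in the internet industry, biomedicine, and other fields.

[GitCode] Resources for this column are kept in my GitCode repository: https://gitcode.com/Morse_Chen/BigData_principle_application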

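I. Objectives

(1) Understand the role HBase plays in the Hadoop architecture.

(2) Become proficient with the common HBase Shell commands.

(3) Become familiar with the common HBase Java API.

II. Lab Platform

(1) Operating system: Ubuntu 16.04.

(2) Hadoop version: 3.3.5.

(3) HBase version: 2.5.4.

(4) JDK version: 1.8.

(5) Java IDE: Eclipse.

III. Lab Content and Requirements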

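1. Implement each of the following functions in code, and complete the same task with HBase Shell commands.

(1) List information about all HBase tables, such as the table name and creation time.

① Shell command

hbase> list

② Java code

// The Java snippets in this lab use the static fields `connection` and `admin`;
// init() opens them and close() releases them.
public static void listTables() throws IOException {
    init(); // open the connection
    List<TableDescriptor> tableDescriptors = admin.listTableDescriptors();
    for (TableDescriptor tableDescriptor : tableDescriptors) {
        TableName tableName = tableDescriptor.getTableName();
        System.out.println("Table: " + tableName);
    }
    close(); // close the connection
}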

(2) Print all records of a given table to the terminal.

① Shell command

hbase> scan 's1'

② Java code

// Print every record of the given table to the terminal
public static void getData(String tableName) throws IOException {
    init();
    Table table = connection.getTable(TableName.valueOf(tableName));
    Scan scan = new Scan();
    ResultScanner scanner = table.getScanner(scan); // iterator over the rows
    for (Result result : scanner) {
        printRecoder(result);
    }
    scanner.close();
    table.close();
    close();
}

// Print the details of one record, one line per cell
public static void printRecoder(Result result) throws IOException {
    for (Cell cell : result.rawCells()) {
        System.out.print("Row key: " + Bytes.toString(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength()));
        System.out.print(" Column family: " + Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength()));
        System.out.print(" Column: " + Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()));
        System.out.print(" Value: " + Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
        System.out.println(" Timestamp: " + cell.getTimestamp());
    }
}
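
The print method passes an (array, offset, length) triple for every part of a cell, because the row key, family, qualifier and value of a cell are typically slices of one larger backing array rather than separate arrays. The following is a minimal plain-Java sketch of that slice decoding, without any HBase dependency. The class name SliceDecoder and the packed sample buffer are hypothetical and only illustrate the idea.

```java
import java.nio.charset.StandardCharsets;

final class SliceDecoder {
    // Decode `length` bytes of `buffer`, starting at `offset`, as UTF-8 text.
    static String decode(byte[] buffer, int offset, int length) {
        return new String(buffer, offset, length, StandardCharsets.UTF_8);
    }

    // Hypothetical layout: row key, family, qualifier and value packed back to back.
    static byte[] packed() {
        return "zhangsanscoreMath69".getBytes(StandardCharsets.UTF_8);
    }
}
```

Decoding the whole array without the offset and length would return all four parts glued together, which is why the getters come in threes.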

(3) Add and delete a specified column family or column in an existing table.

① Shell command

First create a table s1 in the Shell to use as the example table:

hbase> create 's1','score'

Then add data to s1:

hbase> put 's1','zhangsan','score:Math','69'

After that, the following command deletes the specified column:

hbase> delete 's1','zhangsan','score:Math'

② Java code

// Add data to a table
public static void insterRow(String tableName, String rowKey, String colFamily, String col, String val) throws IOException {
    init();
    Table table = connection.getTable(TableName.valueOf(tableName));
    Put put = new Put(Bytes.toBytes(rowKey));
    put.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col), Bytes.toBytes(val));
    table.put(put);
    table.close();
    close();
}

// Delete data: the whole column family when col is null or empty, otherwise a single column
public static void deleRow(String tableName, String rowKey, String colFamily, String col) throws IOException {
    init();
    Table table = connection.getTable(TableName.valueOf(tableName));
    Delete delete = new Delete(Bytes.toBytes(rowKey));
    if (col == null || col.isEmpty()) {
        // delete every column of the family in this row
        delete.addFamily(Bytes.toBytes(colFamily));
    } else {
        // delete the latest version of the specified column
        delete.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col));
    }
    table.delete(delete);
    table.close();
    close();
}

(4) Clear all records of a given table.

① Shell command

hbase> truncate 's1'

② Java code

// Clear all records of the given table by dropping it and recreating it with the same schema
public static void clearRows(String tableName) throws IOException {
    init();
    TableName tablename = TableName.valueOf(tableName);
    // Save the schema first: HBase normally rejects a table that has no column families
    TableDescriptor descriptor = admin.getDescriptor(tablename);
    admin.disableTable(tablename);
    admin.deleteTable(tablename);
    admin.createTable(descriptor);
    close();
}

(5) Count the rows of a table.

① Shell command

hbase> count 's1'

② Java code

// Count the rows of a table
public static void countRows(String tableName) throws IOException {
    init();
    Table table = connection.getTable(TableName.valueOf(tableName));
    Scan scan = new Scan();
    ResultScanner scanner = table.getScanner(scan);
    int num = 0;
    for (Result result = scanner.next(); result != null; result = scanner.next()) {
        num++;
    }
    System.out.println("Row count: " + num);
    scanner.close();
    table.close();
    close();
}

2. The following tables come from a relational database (Tables 1, 2 and 3). Convert them into tables suitable for storage in HBase and insert the data.

Table 1 Student table

Student ID (S_No)	Name (S_Name)	Sex (S_Sex)	Age (S_Age)
2015001	Zhangsan	male	23
2015002	Mary	female	22
2015003	Lisi	male	24

Table 2 Course table

Course ID (C_No)	Course name (C_Name)	Credits (C_Credit)
123001	Math	2.0
123002	Computer Science	5.0
123003	English	3.0

Table 3 Course selection (SC) table

Student ID (SC_Sno)	Course ID (SC_Cno)	Score (SC_Score)
2015001	123001	86
2015001	123003	69
2015002	123002	77
2015002	123003	99
2015003	123001	98
2015003	123002	95
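
To make the target model concrete, here is a minimal in-memory sketch of how HBase addresses data: a row key leads to a set of "family:qualifier" columns, and a row stores only the columns it actually has. It is a plain-Java illustration and not HBase code. The class SparseTable, the family name Info, and the denormalized layout (scores from Table 3 folded into each student's row under a Score family) are assumptions made for this sketch. It is one possible design, not the one-table-per-relation layout used by the Shell commands in this lab.

```java
import java.util.Map;
import java.util.TreeMap;

final class SparseTable {
    // row key -> ("family:qualifier" -> value); sorted maps mimic HBase's ordered keys
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Returns null when the row or the column is absent: missing cells take no space.
    String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Hypothetical layout: one row per student, keyed by name, scores as columns.
    static SparseTable sample() {
        SparseTable t = new SparseTable();
        t.put("Zhangsan", "Info:S_No", "2015001");
        t.put("Zhangsan", "Score:Math", "86");
        t.put("Zhangsan", "Score:English", "69");
        t.put("Mary", "Info:S_No", "2015002");
        t.put("Mary", "Score:Computer Science", "77");
        t.put("Mary", "Score:English", "99");
        t.put("Lisi", "Info:S_No", "2015003");
        t.put("Lisi", "Score:Math", "98");
        t.put("Lisi", "Score:Computer Science", "95");
        return t;
    }
}
```

In this layout each student row has a different set of Score columns, which a relational table could only express with NULLs or with a separate join table such as SC.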

(1) Student table

The HBase Shell command that creates the table is shown here. Each name after the table name becomes a column family, so the put commands that follow write to a family with an empty column qualifier.

hbase> create 'Student','S_No','S_Name','S_Sex','S_Age'

The HBase Shell commands that insert the data:

Row 1:

put 'Student','s001','S_No','2015001'
put 'Student','s001','S_Name','Zhangsan'
put 'Student','s001','S_Sex','male'
put 'Student','s001','S_Age','23'

Row 2:

put 'Student','s002','S_No','2015002'
put 'Student','s002','S_Name','Mary'
put 'Student','s002','S_Sex','female'
put 'Student','s002','S_Age','22'

Row 3:

put 'Student','s003','S_No','2015003'
put 'Student','s003','S_Name','Lisi'
put 'Student','s003','S_Sex','male'
put 'Student','s003','S_Age','24'

(2) Course table

The HBase Shell command that creates the table:

hbase> create 'Course','C_No','C_Name','C_Credit'

The HBase Shell commands that insert the data:

Row 1:

put 'Course','c001','C_No','123001'
put 'Course','c001','C_Name','Math'
put 'Course','c001','C_Credit','2.0'

Row 2:

put 'Course','c002','C_No','123002'
put 'Course','c002','C_Name','Computer Science'
put 'Course','c002','C_Credit','5.0'

Row 3:

put 'Course','c003','C_No','123003'
put 'Course','c003','C_Name','English'
put 'Course','c003','C_Credit','3.0'

(3) Course selection (SC) table

The HBase Shell command that creates the table:

hbase> create 'SC','SC_Sno','SC_Cno','SC_Score'

The HBase Shell commands that insert the data:

Row 1:

put 'SC','sc001','SC_Sno','2015001'
put 'SC','sc001','SC_Cno','123001'
put 'SC','sc001','SC_Score','86'

Row 2:

put 'SC','sc002','SC_Sno','2015001'
put 'SC','sc002','SC_Cno','123003'
put 'SC','sc002','SC_Score','69'

Row 3:

put 'SC','sc003','SC_Sno','2015002'
put 'SC','sc003','SC_Cno','123002'
put 'SC','sc003','SC_Score','77'

Row 4:

put 'SC','sc004','SC_Sno','2015002'
put 'SC','sc004','SC_Cno','123003'
put 'SC','sc004','SC_Score','99'

Row 5:

put 'SC','sc005','SC_Sno','2015003'
put 'SC','sc005','SC_Cno','123001'
put 'SC','sc005','SC_Score','98'

Row 6:

put 'SC','sc006','SC_Sno','2015003'
put 'SC','sc006','SC_Cno','123002'
put 'SC','sc006','SC_Score','95'

In addition, implement the following functions in code.

① createTable(String tableName, String[] fields).

Creates a table. The parameter tableName is the table name, and the string array fields holds the names of the record's fields. If HBase already contains a table named tableName, delete the existing table first and then create the new one.

public static void createTable(String tableName, String[] fields) throws IOException {
    init();
    TableName tablename = TableName.valueOf(tableName);

    if (admin.tableExists(tablename)) {
        System.out.println("Table already exists; deleting it first.");
        admin.disableTable(tablename);
        admin.deleteTable(tablename); // delete the existing table
    }

    TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(tablename);
    for (String str : fields) {
        // each field becomes one column family
        tableDescriptor.setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(str)).build());
    }
    // create the table once, after all column families have been added
    admin.createTable(tableDescriptor.build());
    close();
}

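② addRecord(String tableName, String row, String[] fields, String[] values).

Adds the data in values to the cells identified by table tableName, row row (identified by S_Name) and the string array fields. When an element of fields names a column qualifier under a column family, it is written as "columnFamily:column". For example, to add scores to the three columns "Math", "Computer Science" and "English" at once, fields is {"Score:Math","Score:Computer Science","Score:English"} and values holds the scores of the three courses.

public static void addRecord(String tableName, String row, String[] fields, String[] values) throws IOException {
    init();
    Table table = connection.getTable(TableName.valueOf(tableName));
    Put put = new Put(Bytes.toBytes(row)); // one Put carries every cell of the row
    for (int i = 0; i < fields.length; i++) {
        String[] cols = fields[i].split(":", 2);
        // a field without ":" is a bare column family, written with an empty qualifier
        byte[] qualifier = cols.length > 1 ? Bytes.toBytes(cols[1]) : new byte[0];
        put.addColumn(Bytes.toBytes(cols[0]), qualifier, Bytes.toBytes(values[i]));
    }
    if (fields.length > 0) {
        table.put(put);
    }
    table.close();
    close();
}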
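
The "columnFamily:column" convention described for fields can be isolated into a small helper, so that a bare family name (no colon) and a full column name are handled in one place. This is a minimal sketch in plain Java. The class ColumnSpec is hypothetical and not part of the HBase API, and treating a missing qualifier as the empty string is an assumption of this sketch.

```java
final class ColumnSpec {
    final String family;
    final String qualifier; // empty string when only a family name was given

    private ColumnSpec(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Split at the first ':' only, so a qualifier may itself contain ':' or spaces.
    static ColumnSpec parse(String field) {
        int idx = field.indexOf(':');
        if (idx < 0) {
            return new ColumnSpec(field, "");
        }
        return new ColumnSpec(field.substring(0, idx), field.substring(idx + 1));
    }

    boolean hasQualifier() {
        return !qualifier.isEmpty();
    }
}
```

For example, ColumnSpec.parse("Score:Computer Science") yields the family "Score" and the qualifier "Computer Science", while ColumnSpec.parse("S_Name") yields the family "S_Name" with no qualifier.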

③ scanColumn(String tableName, String column).

Browses the data of one column of table tableName. If a row has no data in that column, return null. When column is the name of a column family that contains several column qualifiers, list the data of every column under it; when column is a specific column name (such as "Score:Math"), list only that column's data.

public static void scanColumn(String tableName, String column) throws IOException {
    init();
    Table table = connection.getTable(TableName.valueOf(tableName));
    Scan scan = new Scan();
    String[] cols = column.split(":", 2);
    if (cols.length > 1) {
        scan.addColumn(Bytes.toBytes(cols[0]), Bytes.toBytes(cols[1])); // a single column
    } else {
        scan.addFamily(Bytes.toBytes(column)); // every column of the family
    }
    ResultScanner scanner = table.getScanner(scan);
    for (Result result = scanner.next(); result != null; result = scanner.next()) {
        showCell(result);
    }
    scanner.close();
    table.close();
    close();
}

// Formatted output, one group of lines per cell
public static void showCell(Result result) {
    Cell[] cells = result.rawCells();
    for (Cell cell : cells) {
        System.out.println("Row key: " + Bytes.toString(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength()));
        System.out.println("Timestamp: " + cell.getTimestamp());
        System.out.println("Column family: " + Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength()));
        System.out.println("Column qualifier: " + Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()));
        System.out.println("Value: " + Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
    }
}
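
One detail of the requirement deserves a sketch: a scan restricted to a column simply omits the rows that lack it, so reporting null for such rows has to be done by the caller. The following plain-Java model shows that logic on an in-memory table (row key to "family:qualifier" to value). It is an illustration under assumptions, not HBase code: the class ColumnScanSketch, its output format and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ColumnScanSketch {
    // A name without ':' selects a whole family; otherwise it selects one exact column.
    // Rows with no matching cell are reported once, with the value "null".
    static List<String> scanColumn(Map<String, Map<String, String>> table, String column) {
        boolean wholeFamily = !column.contains(":");
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            boolean found = false;
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                String name = cell.getKey();
                boolean match = wholeFamily ? name.startsWith(column + ":") : name.equals(column);
                if (match) {
                    out.add(row.getKey() + " " + name + " = " + cell.getValue());
                    found = true;
                }
            }
            if (!found) {
                out.add(row.getKey() + " " + column + " = null");
            }
        }
        return out;
    }

    // Hypothetical sample: Mary has no Math score.
    static Map<String, Map<String, String>> sample() {
        Map<String, Map<String, String>> t = new TreeMap<>();
        t.computeIfAbsent("Zhangsan", k -> new TreeMap<>()).put("Score:Math", "86");
        t.computeIfAbsent("Zhangsan", k -> new TreeMap<>()).put("Score:English", "69");
        t.computeIfAbsent("Mary", k -> new TreeMap<>()).put("Score:English", "99");
        return t;
    }
}
```

With the sample data, scanning "Score:Math" reports null for Mary and 86 for Zhangsan, while scanning "Score" lists every score column of each row.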

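④ modifyData(String tableName, String row, String column).

Modifies the data in the cell identified by table tableName, row row (the student name S_Name can be used) and column column. The implementation takes the new value as a fourth parameter, val.

public static void modifyData(String tableName, String row, String column, String val) throws IOException {
    init();
    Table table = connection.getTable(TableName.valueOf(tableName));
    Put put = new Put(Bytes.toBytes(row));
    String[] cols = column.split(":", 2);
    // "family:qualifier" addresses one column; a bare family name uses an empty qualifier
    byte[] qualifier = cols.length > 1 ? Bytes.toBytes(cols[1]) : new byte[0];
    put.addColumn(Bytes.toBytes(cols[0]), qualifier, Bytes.toBytes(val));
    table.put(put); // the new version becomes the visible value of the cell
    table.close();
    close();
}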

⑤ deleteRow(String tableName, String row).

Deletes the record of the row identified by row from table tableName.

public static void deleteRow(String tableName, String row) throws IOException {
    init();
    Table table = connection.getTable(TableName.valueOf(tableName));
    Delete delete = new Delete(Bytes.toBytes(row)); // no family or column added: the whole row is deleted
    table.delete(delete);
    table.close();
    close();
}

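Summary

This lab built a deeper understanding of HBase's role in the Hadoop ecosystem: a column-oriented NoSQL database suited to real-time reads and writes over massive data. Comparing HBase Shell commands with the Java API covered the core operations: creating tables, inserting, deleting, updating and querying data, counting rows, and clearing a table. Converting the relational model to the HBase model highlighted the difference in design philosophy. Relational databases emphasize normal forms and constraints, whereas HBase uses the row key as its only index and supports sparse storage, with columns that can be added dynamically within a column family. The problems encountered along the way deepened the understanding of HBase's underlying mechanisms: closing connection resources correctly in the API, setting the column families when recreating a table, and distinguishing the scope of a column family from that of a single column in delete operations. The programming exercises accomplish the required functions, but they also expose limitations in multi-column batch operations and transactional guarantees. Overall, the lab lays a foundation for using HBase to process real large-scale data.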
