Query the CREATE TABLE statements of all tables in a given Hive database and generate a data dictionary

Purpose: query the CREATE TABLE statements of all tables in a given Hive database and turn them into a data dictionary

Before processing:

| CREATE TABLE `test_db.customer`(                                    |
|   `c_name` string COMMENT '姓名',                                   |
|   `c_gender` string COMMENT '性别',                                 |
|   `c_type` string COMMENT '证件类型')                               |
| ROW FORMAT SERDE                                                    |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'     |
| WITH SERDEPROPERTIES (                                              |
|   'field.delim'='|',                                                |
|   'serialization.format'='|')                                       |
| STORED AS INPUTFORMAT                                               |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'   |
| OUTPUTFORMAT                                                        |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'  |
| LOCATION                                                            |
|   'hdfs://hacluster/user/hive/warehouse/test_db.db/customer'        |
| TBLPROPERTIES (                                                     |
|   'bucketing_version'='2',                                          |
|   'parquet.compression'='gzip',                                     |
|   'transient_lastDdlTime'='1735109698')                             |
| ;

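After processing, the generated data dictionary looks like this (columns: owner, table name, column name, column type, column comment, external table, partition column; 是 = yes, 否 = no):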

属主    表名    字段名  字段类型        字段注释        是否外表        是否分区字段
test_db customer        c_name  string  姓名    否      否
test_db customer        c_gender        string  性别    否      否
test_db customer        c_type  string  证件类型        否      否

The processing steps and code logic are as follows:

Query the CREATE TABLE statements of all tables in the given Hive database

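cat hive_ddl.sh
#!/bin/bash
#********************************************
# file_name: hive_ddl.sh
# Func: query the CREATE TABLE statements of all tables in a given Hive database
# Author: wx.yangpg
# create_date: 2025-02-01
# modify_info:
# version : V1.0
# execution: sh hive_ddl.sh <dbname>
#********************************************

# Load the environment
source ~/.bash_profile

# The database name is passed as the first argument
dbname=$1

# List all tables in the database and write them to a config file
beeline --silent=true -e "show tables in ${dbname}" | grep '|' > show_tables.txt
sed -i 's/|//g' show_tables.txt
sed -i 's/ //g' show_tables.txt

# File that stores the Hive CREATE TABLE statements (remove any copy left by a previous run)
[ -e create_table.txt ] && rm create_table.txt

# Loop over the config file (skipping its header row) and query each CREATE TABLE statement
for table in $(awk 'NR>1' show_tables.txt)
do
    beeline --silent=true -e "show create table ${dbname}.${table}" >> create_table.txt
    echo '| ;' >> create_table.txt
done

# First-pass cleanup: keep only the table rows and drop the createtab_stmt header
cat create_table.txt | grep '|' | grep -v 'createtab_stmt' > create_table2.txt

Run the shell script and inspect the generated CREATE TABLE statements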

# Run the shell script
sh hive_ddl.sh test_db

# Inspect the generated CREATE TABLE statements
head create_table2.txt
| CREATE TABLE `test_db.customer`(                                    |
|   `c_name` string COMMENT '姓名',                                   |
|   `c_gender` string COMMENT '性别',                                 |
|   `c_type` string COMMENT '证件类型')                               |
| ROW FORMAT SERDE                                                    |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'     |
| WITH SERDEPROPERTIES (                                              |
|   'field.delim'='|',                                                |
|   'serialization.format'='|')                                       |
| STORED AS INPUTFORMAT                                               |
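
The table-name cleanup in hive_ddl.sh (grep, two sed passes, then awk 'NR>1') can be tried without a Hive connection. The following is a minimal sketch, assuming beeline prints its default table layout with a tab_name header row; the function name extract_table_names and the second table, orders, are made up for illustration.

```shell
#!/bin/bash
# Reduce beeline's table-formatted "show tables" output to bare table names.
# extract_table_names is a hypothetical helper, not part of hive_ddl.sh.
extract_table_names() {
    grep '|' |                          # keep the rows, drop the +-----+ border lines
        sed -e 's/|//g' -e 's/ //g' |   # strip column separators and padding spaces
        awk 'NR>1'                      # skip the tab_name header row
}

# Sample input imitating beeline output (the table "orders" is invented).
extract_table_names <<'EOF'
+-----------+
| tab_name  |
+-----------+
| customer  |
| orders    |
+-----------+
EOF
```

Run as is, the sketch prints customer and orders, one per line, which is the list the for loop in hive_ddl.sh iterates over.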

Process the CREATE TABLE statements with Java; the code logic is as follows

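package com.ods.sqoop.common;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class DealHiveDDL {
    // Usage: DealHiveDDL <input DDL file> <output dictionary file>
    public static void main(String[] args) throws Exception {
        // UTF-8 is set explicitly so the Chinese comments do not depend on the platform default charset
        try (BufferedReader bufferedReader = new BufferedReader(
                     new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8));
             BufferedWriter bufferedWriter = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(args[1]), StandardCharsets.UTF_8))) {
            String line;
            String owner = "";
            String tableName = "";
            // External-table flag ("是" = yes, "否" = no)
            String isExternal = "否";
            // Partition-column flag
            String isPartition = "否";
            StringBuilder sb = new StringBuilder();
            while ((line = bufferedReader.readLine()) != null) {
                if (line.startsWith("| CREATE")) {
                    // New table: get the database name, table name and external flag.
                    // Both flags are reset here so they do not leak into the next table.
                    boolean external = line.startsWith("| CREATE EXTERNAL");
                    isExternal = external ? "是" : "否";
                    isPartition = "否";
                    // The qualified name is token 3, or token 4 after the EXTERNAL keyword
                    String[] names = line.split("\\s+")[external ? 4 : 3]
                            .replace("`", "").replace("(", "").split("\\.");
                    owner = names[0];
                    tableName = names[1];
                } else if (line.startsWith("| PARTITIONED")) {
                    // Columns listed after PARTITIONED BY are partition columns
                    isPartition = "是";
                } else if (line.startsWith("|   `")) {
                    // Column line: get the column name, type and comment
                    String[] tokens = line.split("\\s+");
                    String columnName = tokens[1].replace("`", "");
                    String dataType = cleanType(tokens[2]);
                    String comments = "";
                    int pos = line.indexOf(" COMMENT '");
                    if (pos > 0) {
                        // Take everything between the quotes so comments containing spaces stay whole
                        int start = pos + " COMMENT '".length();
                        int end = line.lastIndexOf('\'');
                        if (end >= start) {
                            comments = line.substring(start, end);
                        }
                    }
                    // owner tableName columnName dataType comments isExternal isPartition
                    sb.append(owner).append("\t")
                            .append(tableName).append("\t")
                            .append(columnName).append("\t")
                            .append(dataType).append("\t")
                            .append(comments).append("\t")
                            .append(isExternal).append("\t")
                            .append(isPartition).append("\n");
                }
            }
            // Header: owner, table name, column name, column type, column comment, external table, partition column
            bufferedWriter.write("属主\t表名\t字段名\t字段类型\t字段注释\t是否外表\t是否分区字段\n");
            bufferedWriter.write(sb.toString());
        }
    }

    // Remove the "," or the column-list ")" that follows a type, but keep the type's own
    // parentheses, for example decimal(10,2)
    private static String cleanType(String token) {
        String type = token.endsWith(",") ? token.substring(0, token.length() - 1) : token;
        if (type.endsWith(")") && count(type, ')') > count(type, '(')) {
            type = type.substring(0, type.length() - 1);
        }
        return type;
    }

    private static int count(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }
}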
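
The Java code above picks values by position after splitting each line on whitespace, so it helps to see which index each part of a line lands on. The following is a minimal sketch that uses awk's default whitespace splitting as a stand-in for line.split("\\s+"); the helper name show_tokens is made up for illustration.

```shell
#!/bin/bash
# Print every whitespace-separated token of each input line with its zero-based
# index, followed by a blank line. show_tokens is a hypothetical helper.
show_tokens() {
    awk '{ for (i = 1; i <= NF; i++) printf "[%d] %s\n", i - 1, $i; print "" }'
}

# One CREATE line and one column line taken from the sample DDL above.
show_tokens <<'EOF'
| CREATE TABLE `test_db.customer`(                                    |
|   `c_name` string COMMENT '姓名',                                   |
EOF
```

For the CREATE line the qualified table name is token 3 (it moves to token 4 when the EXTERNAL keyword is present), and for the column line the column name is token 1, the type is token 2 and the first word of the comment is token 4. Any decoration still attached to a token, such as backticks, a trailing comma or a parenthesis, has to be stripped afterwards.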

Run the jar to generate the final data dictionary

# Run the jar (the class name must match the package declared in DealHiveDDL)
java -cp ods_etl.jar com.ods.sqoop.common.DealHiveDDL create_table2.txt result2.txt

# Inspect the final data dictionary
head result2.txt
属主    表名    字段名  字段类型        字段注释        是否外表        是否分区字段
test_db customer        c_name  string  姓名    否      否
test_db customer        c_gender        string  性别    否      否
test_db customer        c_type  string  证件类型        否      否
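
Because the dictionary is tab-separated with seven columns, a quick field-count check can catch rows that were parsed incorrectly. The following is a minimal sketch, assuming the seven-column layout shown above; the function name check_dictionary and the truncated sample row are made up for illustration, and one trailing tab per row is ignored if present.

```shell
#!/bin/bash
# Report rows whose tab-separated field count is not 7.
# Prints nothing and returns 0 when every row is well-formed.
# check_dictionary is a hypothetical helper, not part of the original scripts.
check_dictionary() {
    awk -F'\t' '
        { sub(/\t$/, "") }    # ignore one trailing tab; editing $0 re-splits the fields
        NF != 7 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Demo on a temporary file: one complete row and one deliberately truncated row.
sample=$(mktemp)
printf 'test_db\tcustomer\tc_name\tstring\t姓名\t否\t否\n' > "$sample"
printf 'test_db\tcustomer\tc_gender\tstring\n' >> "$sample"
check_dictionary "$sample"
rm -f "$sample"
```

In the demo only the second row is reported, since it has four fields instead of seven. Against the real output the call would be check_dictionary result2.txt.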
