1. Create the database
hive> create database `scott`;
OK
Time taken: 0.281 seconds
hive> show databases;
OK
default
scott
Time taken: 0.02 seconds, Fetched: 2 row(s)
hive> use scott;
OK
Time taken: 0.045 seconds
hive> show tables;
OK
Time taken: 0.041 seconds
hive>
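Beyond the bare form above, CREATE DATABASE also accepts a comment and an explicit HDFS location. A hedged sketch (the comment text and path here are illustrative assumptions, not taken from the transcript):

```sql
-- Illustrative: comment and location are assumed values.
create database if not exists scott
comment 'demo database for the scott schema'
location '/user/hive/warehouse/scott.db';

-- Inspect what was created (location, comment, owner):
describe database scott;
```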
2. Create the tables
create table if not exists scott.emp(
empno int,
ename string,
job string,
mgr int,
hiredate date,
sal double,
comm double,
deptno int
)
comment 'scott'                                  -- table comment
row format delimited fields terminated by '\t'   -- column delimiter: '\t'
lines terminated by '\n'                         -- row delimiter: '\n'
stored as textfile                               -- storage file format
location '/user/hive/warehouse/scott.db/emp';    -- HDFS directory for the table's data
hive> create table if not exists scott.emp(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate date,
> sal double,
> comm double,
> deptno int
> )
> comment 'scott'
> row format delimited fields terminated by '\t'
> lines terminated by '\n'
> stored as textfile
> location '/user/hive/warehouse/scott.db/emp';
OK
Time taken: 0.241 seconds
hive> create table if not exists scott.dept(
> deptno int,
> dname string,
> loc string
> )
> comment 'scott.dept'
> row format delimited fields terminated by '\t'
> lines terminated by '\n'
> stored as textfile
> location '/user/hive/warehouse/scott.db/dept';
OK
Time taken: 0.09 seconds
hive> show tables;
OK
dept
emp
Time taken: 0.019 seconds, Fetched: 2 row(s)
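To verify that the delimiter, storage format, and location were applied as intended, Hive can print a table's full metadata or regenerate its DDL:

```sql
describe formatted scott.emp;   -- columns plus SerDe, field delimiter, location, etc.
show create table scott.dept;   -- the complete DDL Hive stored for the table
```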
3. Load data

Load from the local filesystem:
load data local inpath '/tmp/emp.csv' into table scott.emp;
load data local inpath '/tmp/dept.csv' into table scott.dept;
hive> load data local inpath '/tmp/emp.csv' into table scott.emp;
Loading data to table scott.emp
OK
Time taken: 1.369 seconds
hive> load data local inpath '/tmp/dept.csv' into table scott.dept;
Loading data to table scott.dept
OK
Time taken: 0.621 seconds
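Two variations worth knowing (note that despite the `.csv` names, the files must actually be tab-delimited to match the table DDL above). Without LOCAL the path is an HDFS path and the file is moved, not copied, into the table directory; OVERWRITE replaces the table's existing files instead of appending. A sketch with hypothetical paths:

```sql
-- HDFS source (hypothetical path); the file is MOVED into the table directory:
load data inpath '/data/emp.csv' into table scott.emp;

-- Replace all existing data instead of appending:
load data local inpath '/tmp/emp.csv' overwrite into table scott.emp;
```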
4. Query
hive> select * from emp e , dept d where e.deptno = d.deptno;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
future versions. Consider using a different execution engine (i.e. spark, tez) or
using Hive 1.X releases.
Query ID = root_20260325152412_91d4c949-38d4-403b-b2d7-20bd9bf29f5c
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/hive-2.1.1/lib/log4j-slf4j-impl-
2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop-
2.7.1/share/hadoop/common/lib/slf4j-log4j12-
1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2026-03-25 15:24:19 Starting to launch local task to process map join;
maximum memory = 477626368
2026-03-25 15:24:20 Dump the side-table for tag: 1 with group count: 4 into
file: file:/opt/bigdata/hive-2.1.1/temp/root/daa2520c-4797-47ab-b0bd-
e027ebf2e63c/hive_2026-03-25_15-24-12_809_1603657098055148777-1/-local-
10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2026-03-25 15:24:20 Uploaded 1 File to: file:/opt/bigdata/hive-
2.1.1/temp/root/daa2520c-4797-47ab-b0bd-e027ebf2e63c/hive_2026-03-25_15-24-
12_809_1603657098055148777-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01-
-.hashtable (404 bytes)
2026-03-25 15:24:20 End of local task; Time Taken: 1.559 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1774421907259_0001, Tracking URL =
http://c001:8099/proxy/application_1774421907259_0001/
Kill Command = /opt/bigdata/hadoop-2.7.1/bin/hadoop job -kill
job_1774421907259_0001
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2026-03-25 15:24:55,911 Stage-3 map = 0%, reduce = 0%
2026-03-25 15:25:04,638 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.89 sec
MapReduce Total cumulative CPU time: 2 seconds 890 msec
Ended Job = job_1774421907259_0001
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 2.89 sec HDFS Read: 8648 HDFS Write:
1198 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 890 msec
OK
7369	SMITH	CLERK	7902	1980-12-17	800.0	NULL	20	20	RESEARCH	DALLAS
7499	ALLEN	SALESMAN	7698	1981-02-20	1600.0	300.0	30	30	SALES	CHICAGO
7521	WARD	SALESMAN	7698	1981-02-22	1250.0	500.0	30	30	SALES	CHICAGO
7566	JONES	MANAGER	7839	1981-04-02	2975.0	NULL	20	20	RESEARCH	DALLAS
7654	MARTIN	SALESMAN	7698	1981-09-28	1250.0	1400.0	30	30	SALES	CHICAGO
7698	BLAKE	MANAGER	7839	1981-05-01	2850.0	NULL	30	30	SALES	CHICAGO
7782	CLARK	MANAGER	7839	1981-06-09	2450.0	NULL	10	10	ACCOUNTING	NEW YORK
7788	SCOTT	ANALYST	7566	1987-07-13	3000.0	NULL	20	20	RESEARCH	DALLAS
7839	KING	PRESIDENT	NULL	1981-11-17	5000.0	NULL	10	10	ACCOUNTING	NEW YORK
7844	TURNER	SALESMAN	7698	1981-09-08	1100.0	0.0	30	30	SALES	CHICAGO
7876	ADAMS	CLERK	7788	1987-07-13	1100.0	NULL	20	20	RESEARCH	DALLAS
7900	JAMES	CLERK	7698	1981-12-03	950.0	NULL	30	30	SALES	CHICAGO
7902	FORD	ANALYST	7566	1981-12-03	3000.0	NULL	20	20	RESEARCH	DALLAS
7934	MILLER	CLERK	7782	1982-01-23	1300.0	NULL	10	10	ACCOUNTING	NEW YORK
Time taken: 54.091 seconds, Fetched: 14 row(s)
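The comma-join with a WHERE filter above can equally be written as an explicit JOIN, which also avoids an accidental Cartesian product if the filter is forgotten:

```sql
select e.empno, e.ename, e.sal, d.dname, d.loc
from emp e
join dept d on e.deptno = d.deptno;
```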
hive> select max(sal) maxsal , min(sal) minsal from emp group by deptno;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
future versions. Consider using a different execution engine (i.e. spark, tez) or
using Hive 1.X releases.
Query ID = root_20260325152658_622daa51-1a5d-423f-856d-a58f541fc784
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1774421907259_0002, Tracking URL =
http://c001:8099/proxy/application_1774421907259_0002/
Kill Command = /opt/bigdata/hadoop-2.7.1/bin/hadoop job -kill
job_1774421907259_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2026-03-25 15:27:29,539 Stage-1 map = 0%, reduce = 0%
2026-03-25 15:27:37,570 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.74 sec
2026-03-25 15:27:44,254 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.64
sec
MapReduce Total cumulative CPU time: 3 seconds 640 msec
Ended Job = job_1774421907259_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.64 sec HDFS Read: 10832
HDFS Write: 163 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 640 msec
OK
5000.0 1300.0
3000.0 800.0
2850.0 950.0
Time taken: 47.891 seconds, Fetched: 3 row(s)
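The result above carries no department labels because deptno is not in the select list; grouping columns may be selected alongside the aggregates:

```sql
select deptno, max(sal) as maxsal, min(sal) as minsal
from emp
group by deptno;
```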
5. Insert
hive> insert into scott.dept (deptno,dname,loc) values (50,"gong guan","tsrj");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
future versions. Consider using a different execution engine (i.e. spark, tez) or
using Hive 1.X releases.
Query ID = root_20260325153147_27aedfad-6e1e-4c82-b961-16baf00fde88
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1774421907259_0003, Tracking URL =
http://c001:8099/proxy/application_1774421907259_0003/
Kill Command = /opt/bigdata/hadoop-2.7.1/bin/hadoop job -kill
job_1774421907259_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2026-03-25 15:32:08,785 Stage-1 map = 0%, reduce = 0%
2026-03-25 15:32:15,979 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.26 sec
MapReduce Total cumulative CPU time: 2 seconds 260 msec
Ended Job = job_1774421907259_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory
hdfs://192.168.100.101:9000/user/hive/warehouse/scott.db/dept/.hive-
staging_hive_2026-03-25_15-31-47_882_4932928533857558034-1/-ext-10000
Loading data to table scott.dept
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.26 sec HDFS Read: 4324 HDFS Write: 84
SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 260 msec
OK
Time taken: 30.66 seconds
hive> select * from dept;
OK
50 gong guan tsrj
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
Time taken: 0.143 seconds, Fetched: 5 row(s)
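Because each INSERT run launches its own jobs and produces another file under the table directory, batching several rows into one statement is cheaper than issuing them one at a time. A sketch (the rows here are made-up examples):

```sql
-- Hypothetical rows, for illustration only:
insert into scott.dept values
  (60, 'it', 'dallas'),
  (70, 'qa', 'boston');
```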
Hive can append new data, but (without transactional tables) it cannot delete or update existing rows, and some SQL constructs and complex subqueries are likewise unsupported.
Note:
Every execution of an INSERT statement effectively adds one more file under the table's directory.
The same effect can be achieved either by loading files through Hive (recommended) or by placing the data files directly into the table's HDFS directory; either approach works.
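The second approach, dropping a file straight into the table's HDFS directory, can be sketched from the Hive CLI itself; since the table is a plain textfile table, the next query picks the rows up automatically (the file name below is hypothetical, and the file must be tab-delimited to match the table DDL):

```sql
dfs -put /tmp/dept_extra.tsv /user/hive/warehouse/scott.db/dept/;
select * from dept;
```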