Hive SQL业务场景:连续5天涨幅超过5%股票

一、需求描述

现有一张股票价格表 dwd_stock_trade_dtl 有3个字段分别是:

股票代码(stock_code),

日期(trade_date),

收盘价格(closing_price) 。

请找出满足连续5天以上(含)每天上涨超过5%的股票,并给出连续满足天数及开始和结束日期。

备注:不考虑停牌或其他情况,仅仅关注每天连续5天上涨超过5%的股票。

二、数据准备

1、建Hive表

sql 复制代码
DROP TABLE IF EXISTS dwd_stock_trade_dtl;
CREATE TABLE IF NOT EXISTS dwd_stock_trade_dtl (  
  stock_code STRING,  
  trade_date DATE,  
  closing_price DECIMAL(10,2)  
)  
ROW FORMAT DELIMITED  
FIELDS TERMINATED BY ','  
STORED AS TEXTFILE; 

2、插入测试数据

sql 复制代码
--样例数据插入
INSERT INTO TABLE dwd_stock_trade_dtl
VALUES 
('AAPP', '2024-02-26', 100.00),
('RAAB', '2024-02-27', 105.00),
('RAAB', '2024-02-28', 110.25),
('RAAB', '2024-03-01', 115.78),
('RAAB', '2024-03-02', 121.59),
('RAAB', '2024-03-03', 128.73),
('RAAB', '2024-03-04', 137.00),
('RAAB', '2024-03-05', 144.67),
('RAAB', '2024-03-06', 147.64),
('EWXN', '2024-02-26', 2000.00),
('EWXN', '2024-02-27', 2100.00),
('EWXN', '2024-02-28', 2205.00),
('EWXN', '2024-03-01', 2313.25),
('EWXN', '2024-03-02', 2431.01),
('EWXN', '2024-03-03', 2547.56),
('EWXN', '2024-03-04', 2680.19),
('EWXN', '2024-03-05', 2814.20),
('EWXN', '2024-03-06', 2955.91);

三、需求分析

用lag函数列出前一天的交易价格

算出每日涨幅:(今天交易价格 / 前一天交易价格)- 1

判断是否满足涨幅大于5%,满足打个flag

用row_number 函数算出连续

最后用min,max,count 函数求出连续上涨的最小日期,最大日期和天数

四、需求实现

1、用lag函数列出前一天的交易价格并算出每日涨幅:(今天交易价格 / 前一天交易价格)- 1

sql 复制代码
SELECT stock_code,
       trade_date,
       closing_price, -- -- 交易价格
       LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) as a_closing_price, -- 前一天交易价格
       (closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) ) - 1 AS daily_return -- 涨幅
FROM dwd_stock_trade_dtl;

2、判断是否满足涨幅大于5%,满足标记1,不满足0

sql 复制代码
SELECT stock_code,
       trade_date, 
       closing_price, -- -- 交易价格
       LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) as a_closing_price, -- 前一天交易价格
       (closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) ) - 1 AS daily_return, -- 涨幅
       if(closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - 1 >= 0.05, 1,
          0) AS flag
FROM  dwd_stock_trade_dtl 

3、用row_number 函数算出每只股票日期排序,和每只股票是否满足条件下的日期排序,然后相减,相同则满足连续

sql 复制代码
SELECT stock_code, 
       trade_date, 
       flag, 
       row_number() OVER (PARTITION BY stock_code ORDER BY trade_date ASC) AS a_rn,
       row_number() OVER (PARTITION BY stock_code, flag ORDER BY trade_date ASC) AS flag_rn,
       row_number() OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - row_number() OVER (PARTITION BY stock_code, flag ORDER BY trade_date ASC) AS diff_rn
FROM (
      SELECT stock_code, 
             trade_date, 
             if(closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - 1 >= 0.05, 1, 0) AS flag
      FROM dwd_stock_trade_dtl
      ) a
ORDER BY stock_code, trade_date

4、最后,求出连续涨幅超过5%的开始日期,结束日期,天数。

sql 复制代码
SELECT stock_code, 
       min(trade_date) AS min_trade_date, -- 开始日期
       max(trade_date) AS max_trade_date, -- 结束日期
       count(1) AS day_cnt -- 天数
FROM (
      SELECT stock_code, 
             trade_date, 
             flag, 
             row_number() OVER (PARTITION BY stock_code ORDER BY trade_date ASC) AS a_rn, 
             row_number() OVER (PARTITION BY stock_code, flag ORDER BY trade_date ASC) AS flag_rn, 
             row_number() OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - row_number() OVER (PARTITION BY stock_code, flag ORDER BY trade_date ASC) AS diff_rn
      FROM (
            SELECT stock_code, 
                   trade_date, 
                   if(closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - 1 >= 0.05, 1, 0) AS flag
            FROM dwd_stock_trade_dtl
            ) a
      ) b
WHERE flag = 1
GROUP BY stock_code, 
         diff_rn
HAVING count(1) >= 5
相关推荐
追光天使19 天前
Hive SQL 和 SQL 的区别总结(持续更新中.....)
sql·hive sql·区别
demo1235671 年前
hivesql执行过程
hive·hive sql
追梦22221 年前
row_number() over(partition by xx order by xx desc)
数据库·根据指定字段去重·row_number
赵延东的一亩三分地1 年前
【SQL开发实战技巧】系列(十七):数据仓库中时间类型操作(初级)确定两个日期之间的工作天数、计算—年中周内各日期出现次数、确定当前记录和下一条记录之间相差的天数
sql·lag·lead·两个日期之间工作日·当前记录的上一条记录