Data Warehouse Design for Restaurant Supply Chains (Part 6): Data Analysis and Business Forecasting

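Contents

  1. Overview
  2. Store Operations Deep Dive
  3. Data-Driven Marketing
  4. Supply Chain Optimization
  5. New-Store Site Selection Modeling
  6. Data Culture and Self-Service Analytics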

Overview

1.1 Core Goals of Data Enablement

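```
From data warehouse to business value:

Phase 1: Data integration (done)
├─ Unify data, eliminate silos
├─ Integrate POS, supply chain and CRM data
└─ Lay the foundation for later analysis

Phase 2: Deep analysis (focus of this article)
├─ Find patterns, spot opportunities, forecast the future
├─ Four areas: stores, marketing, supply chain, site selection
└─ Statistics, modeling, forecasting, causal analysis

Phase 3: Applied innovation (later documents)
├─ Productization, automation, intelligence
├─ BI systems, dashboards, decision engines
└─ Let business teams make their own data-driven decisions

End goal: an agile, efficient, data-driven restaurant empire
```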

1.2 Four Areas of Business Enablement

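1️⃣ Fine-grained store operations

  • Sales forecasting (daily/weekly granularity)
  • Labor efficiency (shift scheduling, menu mix)
  • Customer traffic analysis (peak alerts, customer mix)
  • Dish contribution (revenue and profit contribution)
  • 💰 Expected gains: labor efficiency ↑8-12%, average ticket ↑5-8%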

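2️⃣ Precision marketing and membership operations

  • User profiling (spending behavior, preference analysis)
  • Targeted marketing (target segments, product recommendations)
  • Member lifecycle (acquisition, activity, churn)
  • Repeat-purchase uplift (win-back, reactivation, upselling)
  • 💰 Expected gains: repeat-purchase rate ↑15-25%, customer value ↑30%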

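3️⃣ End-to-end supply chain optimization

  • Demand forecasting (ingredients, inventory)
  • Inventory optimization (minimum stock, turnover)
  • Procurement optimization (quality, cost, delivery)
  • Loss control (waste-rate monitoring)
  • 💰 Expected gains: inventory ↓15-20%, waste ↓30%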

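4️⃣ Data-driven site selection and expansion

  • Location assessment (trade-area analysis, competitive landscape)
  • Site modeling (ramp-up period, cost payback)
  • Risk identification (early failure warning, stop-loss rules)
  • Decision support (open vs. close vs. reformat)
  • 💰 Expected gains: new-store success rate ↑20-30%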

Store Operations Deep Dive

2.1 Sales Forecasting Model

Forecasting model architecture:

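```
Inputs:
├─ Historical sales (last 18-24 months)
├─ External features (weather, holidays, promotions, competition)
├─ Store features (location, size, format, maturity)
└─ Time features (seasonality, cycles, trend)

Model tiers:
├─ L1: Baselines (moving average, exponential smoothing) - typical error ±10-15%
├─ L2: Statistical models (ARIMA, Holt-Winters) - typical error ±8-12%
└─ L3: Machine learning (XGBoost, LightGBM) - typical error ±5-8%

Forecast granularity:
├─ Daily: shift scheduling, stock preparation
├─ Weekly: promotions, marketing plans
├─ Monthly: financial budgets, purchasing plans
└─ Quarterly: strategic planning, regional benchmarking
```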
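The L1 baselines are cheap to run and give the ML models a yardstick to beat. Below is a minimal sketch, assuming a plain list of daily sales values, of a trailing moving average, simple exponential smoothing and a one-step-ahead walk-forward backtest scored by MAPE. The function names and the default `window`/`alpha` values are illustrative, not taken from the original pipeline.

```python
import numpy as np


def moving_average_forecast(history, window=7):
    """Next-day forecast = mean of the last `window` observations."""
    return float(np.mean(history[-window:]))


def exponential_smoothing_forecast(history, alpha=0.3):
    """Simple exponential smoothing: level = alpha * y + (1 - alpha) * level."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return float(level)


def mape(actual, predicted):
    """Mean absolute percentage error in %, skipping zero-sales days."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = actual != 0
    return float(np.mean(np.abs((actual[mask] - predicted[mask]) / actual[mask])) * 100)


def backtest(history, forecaster, start=7):
    """Walk-forward backtest: forecast day t using only days before t."""
    preds = [forecaster(history[:t]) for t in range(start, len(history))]
    return mape(history[start:], preds)
```

A model tier is only worth its extra complexity if it beats these baselines on the same backtest window.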

Python sales forecasting code:

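```python
import numpy as np
import pandas as pd
import xgboost as xgb
from datetime import datetime, timedelta
from sklearn.preprocessing import StandardScaler

# Same-day outcomes are unknown at forecast time; only lagged values are used
SAME_DAY_COLS = ['customer_count', 'order_count', 'avg_order_price']


class SalesForecaster:
    def __init__(self, shop_id, lookback_days=730):
        # 730 days (24 months): the 365-day lag feature consumes the first year
        self.shop_id = shop_id
        self.lookback_days = lookback_days
        self.model = None
        self.scaler = StandardScaler()
        self.feature_cols = None
        self.residual_std = None

    def fetch_historical_data(self, db_connection):
        """Fetch daily history from Doris (MySQL protocol, e.g. a pymysql connection)."""
        # Bound parameters instead of an f-string: quotes string IDs such as
        # 'SHOP001' correctly and avoids SQL injection (%s = pymysql paramstyle)
        query = """
        SELECT stat_date, shop_id, daily_sales AS label,
               customer_count, order_count, avg_order_price,
               max_temperature, rainfall, day_of_week,
               is_holiday, discount_amount, is_promo_day
        FROM dws_daily_shop_analysis
        WHERE shop_id = %s
          AND stat_date >= DATE_SUB(CURDATE(), INTERVAL %s DAY)
        ORDER BY stat_date
        """
        df = pd.read_sql(query, db_connection, params=(self.shop_id, self.lookback_days))
        return self._engineer_features(df)

    def _engineer_features(self, df):
        """Feature engineering; assumes one row per calendar day, sorted by date."""
        df = df.copy()
        df['stat_date'] = pd.to_datetime(df['stat_date'])
        df['month'] = df['stat_date'].dt.month
        df['week'] = df['stat_date'].dt.isocalendar().week.astype(int)
        # shift(1) first so rolling windows only see past days (no target leakage)
        past_sales = df['label'].shift(1)
        df['sales_ma7'] = past_sales.rolling(7, min_periods=1).mean()
        df['sales_ma30'] = past_sales.rolling(30, min_periods=1).mean()
        df['sales_yoy'] = df['label'].shift(365)
        for col in SAME_DAY_COLS:
            df[f'{col}_lag1'] = df[col].shift(1)
        return df.drop(columns=SAME_DAY_COLS).dropna()

    def train(self, df):
        """Train XGBoost on a chronological 80/20 split and report hold-out MAPE."""
        self.feature_cols = [c for c in df.columns if c not in ('stat_date', 'shop_id', 'label')]
        X, y = df[self.feature_cols], df['label']

        split = int(len(df) * 0.8)  # no shuffling: the test set is the latest 20%
        X_train, X_test = X.iloc[:split], X.iloc[split:]
        y_train, y_test = y.iloc[:split], y.iloc[split:]

        # Not needed for trees; kept so linear models can be swapped in
        X_train_scaled = self.scaler.fit_transform(X_train)
        X_test_scaled = self.scaler.transform(X_test)

        # XGBoost >= 2.0 expects early_stopping_rounds in the constructor, not fit().
        # The hold-out set also drives early stopping, so the MAPE is slightly
        # optimistic; carve out a separate validation window in production.
        self.model = xgb.XGBRegressor(n_estimators=200, max_depth=7, learning_rate=0.1,
                                      early_stopping_rounds=20)
        self.model.fit(X_train_scaled, y_train, eval_set=[(X_test_scaled, y_test)], verbose=False)

        y_true = y_test.to_numpy(dtype=float)
        y_pred = self.model.predict(X_test_scaled)
        nonzero = y_true != 0  # MAPE is undefined on zero-sales days
        mape = np.mean(np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])) * 100
        self.residual_std = float(np.std(y_true - y_pred))  # reused by predict()
        return {'mape': round(float(mape), 2), 'model_ready': True}

    def predict(self, future_days=30):
        """Forecast the next `future_days` days with an approximate 95% interval."""
        # Hold-out residual spread; falls back to a fixed 1500 if not trained yet
        std_error = self.residual_std if self.residual_std is not None else 1500
        predictions = []
        for day in range(1, future_days + 1):
            pred_date = datetime.now() + timedelta(days=day)
            pred_value = self._get_forecast_value(pred_date)
            predictions.append({
                'date': pred_date.strftime('%Y-%m-%d'),
                'forecast': round(pred_value, 2),
                'lower_ci': round(max(0, pred_value - 1.96 * std_error), 2),
                'upper_ci': round(pred_value + 1.96 * std_error, 2)
            })
        return predictions

    def _get_forecast_value(self, pred_date):
        # Placeholder: build the feature row for pred_date (calendar fields, weather
        # forecast, promo calendar, lagged sales), scale it with self.scaler and call
        # self.model.predict. Returns a fixed example value for now.
        return 8000
```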
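The interval in `predict()` is a normal approximation: forecast ± 1.96·σ, where σ is the standard deviation of the hold-out residuals. A standalone sketch of that calculation, assuming roughly normal residuals with constant spread over the horizon (the helper name is hypothetical):

```python
import numpy as np


def normal_prediction_interval(forecast, residuals, z=1.96):
    """Approximate 95% interval: forecast ± z * std(residuals), floored at 0."""
    sigma = float(np.std(residuals))
    lower = max(0.0, forecast - z * sigma)  # sales cannot be negative
    upper = forecast + z * sigma
    return round(lower, 2), round(upper, 2)
```

If residuals are clearly skewed (for example, holiday spikes), adding empirical quantiles such as `np.quantile(residuals, [0.025, 0.975])` to the forecast is a more honest alternative.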

2.2 Labor Efficiency and Shift Scheduling

Shift scheduling SQL:

```sql
-- Staffing recommendation per hour, based on the order forecast
CREATE TEMPORARY TABLE hourly_forecast AS
SELECT shop_id, forecast_date, hour, forecasted_orders,
       CASE WHEN forecasted_orders < 20 THEN 2
            WHEN forecasted_orders < 50 THEN 3
            WHEN forecasted_orders < 100 THEN 4
            ELSE 5
       END AS recommended_staff
FROM dws_hourly_forecast
WHERE shop_id = 'SHOP001' AND forecast_date = CURDATE();

-- Compare with the current roster (same shop, same day, same hour);
-- current_shifts is assumed to hold one row per shop and hour
SELECT h.hour, c.actual_staff, h.recommended_staff,
       CASE WHEN c.actual_staff IS NULL THEN 'not scheduled'
            WHEN c.actual_staff > h.recommended_staff + 1 THEN 'can reduce staff'
            WHEN c.actual_staff < h.recommended_staff - 1 THEN 'add staff'
            ELSE 'OK'
       END AS suggestion
FROM hourly_forecast h
LEFT JOIN current_shifts c
       ON c.shop_id = h.shop_id
      AND DATE(c.shift_date) = h.forecast_date
      AND HOUR(c.shift_date) = h.hour
ORDER BY h.hour;
```
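The staffing rule is simple enough to unit-test outside the database. Here is a minimal Python mirror of the SQL above, using the same order thresholds and the ±1 tolerance band. The function names and the dict-based inputs are assumptions for illustration.

```python
def recommended_staff(forecasted_orders):
    """Same thresholds as the SQL CASE: <20 → 2, <50 → 3, <100 → 4, else 5."""
    if forecasted_orders < 20:
        return 2
    if forecasted_orders < 50:
        return 3
    if forecasted_orders < 100:
        return 4
    return 5


def staffing_suggestion(actual_staff, forecasted_orders, tolerance=1):
    """Compare scheduled staff with the recommendation, allowing ±tolerance."""
    if actual_staff is None:
        return 'not scheduled'
    target = recommended_staff(forecasted_orders)
    if actual_staff > target + tolerance:
        return 'can reduce staff'
    if actual_staff < target - tolerance:
        return 'add staff'
    return 'OK'


def hourly_plan(forecast_by_hour, staff_by_hour):
    """Hour-by-hour suggestions; hours missing from the roster count as unscheduled."""
    return {hour: staffing_suggestion(staff_by_hour.get(hour), orders)
            for hour, orders in sorted(forecast_by_hour.items())}
```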

2.3 Dish Contribution Analysis

```sql
-- Revenue, profit and kitchen-efficiency contribution per dish, last 30 days
SELECT
    menu_id, menu_name,
    COUNT(DISTINCT order_id) AS sales_count,  -- orders containing the dish
    SUM(actual_price) AS total_revenue,
    SUM(actual_price) - SUM(cost) AS total_profit,
    -- Window over the grouped totals: SUM(SUM(...)) OVER ()
    ROUND(SUM(actual_price) / SUM(SUM(actual_price)) OVER () * 100, 2) AS revenue_pct,
    ROUND((SUM(actual_price) - SUM(cost)) / NULLIF(SUM(actual_price), 0) * 100, 2) AS profit_margin,
    ROUND(AVG(cook_time), 1) AS avg_cook_time,
    CASE WHEN SUM(actual_price) / SUM(SUM(actual_price)) OVER () >= 0.05 THEN 'core'
         WHEN (SUM(actual_price) - SUM(cost)) / NULLIF(SUM(actual_price), 0) * 100 >= 50 THEN 'high-margin'
         ELSE 'regular'
    END AS menu_category
FROM ods_order_items
WHERE shop_id = 'SHOP001'
  -- No DATE() wrapper on the column, so indexes/partitions can be used
  AND order_time >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
GROUP BY menu_id, menu_name
ORDER BY total_profit DESC;
```
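The same classification can be prototyped in pandas before it is turned into a scheduled query. This sketch assumes an order-item DataFrame with the columns used in the SQL (`menu_id`, `menu_name`, `order_id`, `actual_price`, `cost`). The 5% share and 50% margin thresholds come from the query, and the function name is hypothetical.

```python
import pandas as pd


def classify_dishes(items, core_share=0.05, high_margin_pct=50):
    """Aggregate order-item rows per dish and apply the SQL classification rules."""
    g = items.groupby(['menu_id', 'menu_name'], as_index=False).agg(
        sales_count=('order_id', 'nunique'),
        total_revenue=('actual_price', 'sum'),
        total_cost=('cost', 'sum'),
    )
    g['total_profit'] = g['total_revenue'] - g['total_cost']
    share = g['total_revenue'] / g['total_revenue'].sum()
    g['revenue_pct'] = (share * 100).round(2)
    g['profit_margin'] = (g['total_profit'] / g['total_revenue'] * 100).round(2)
    # Assign high-margin first, then let core overwrite it: same precedence as the CASE
    g['menu_category'] = 'regular'
    g.loc[g['profit_margin'] >= high_margin_pct, 'menu_category'] = 'high-margin'
    g.loc[share >= core_share, 'menu_category'] = 'core'
    return g.sort_values('total_profit', ascending=False).reset_index(drop=True)
```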

Data-Driven Marketing

3.1 User Profiling

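```sql
-- Build the member profile from the last 365 days of activity
CREATE TABLE dws_user_profile AS
WITH member_orders AS (
    SELECT
        m.member_id, m.phone, m.age_group, m.gender,
        -- Spending behavior
        COUNT(DISTINCT t.order_id) AS total_orders,
        COALESCE(SUM(t.order_total), 0) AS total_spending,
        AVG(t.order_total) AS avg_order_value,
        MAX(t.order_date) AS last_order_date,
        DATEDIFF(CURDATE(), MAX(t.order_date)) AS days_since_last_order
    FROM dim_member m
    -- Date filter in ON (not WHERE) so members without recent orders are kept
    LEFT JOIN ods_transactions t
           ON m.member_id = t.member_id
          AND t.order_date >= DATE_SUB(CURDATE(), INTERVAL 365 DAY)
    GROUP BY m.member_id, m.phone, m.age_group, m.gender
),
category_rank AS (
    -- Dish preference: each member's most frequently ordered category
    SELECT member_id, menu_category,
           ROW_NUMBER() OVER (PARTITION BY member_id ORDER BY COUNT(*) DESC) AS rn
    FROM ods_order_items
    WHERE order_time >= DATE_SUB(CURDATE(), INTERVAL 365 DAY)
    GROUP BY member_id, menu_category
),
spend_rank AS (
    -- Spending percentile across all members (0 = lowest, 1 = highest)
    SELECT mo.*, PERCENT_RANK() OVER (ORDER BY total_spending) AS spend_pct
    FROM member_orders mo
)
SELECT
    s.member_id, s.phone, s.age_group, s.gender,
    s.total_orders, s.total_spending, s.avg_order_value,
    s.last_order_date, s.days_since_last_order,
    c.menu_category AS favorite_category,
    -- Lifecycle stage (no order in the window counts as churned)
    CASE WHEN s.days_since_last_order IS NULL THEN 'Churned'
         WHEN s.days_since_last_order <= 7 THEN 'Highly active'
         WHEN s.days_since_last_order <= 30 THEN 'Active'
         WHEN s.days_since_last_order <= 90 THEN 'Occasional'
         WHEN s.days_since_last_order <= 180 THEN 'Dormant'
         ELSE 'Churned'
    END AS lifecycle_stage,
    -- Value segment: top 20% of spenders active within 30 days are VIP
    CASE WHEN s.spend_pct >= 0.8 AND s.days_since_last_order <= 30 THEN 'VIP'
         WHEN s.spend_pct >= 0.5 AND s.days_since_last_order <= 60 THEN 'High value'
         ELSE 'Regular'
    END AS value_segment
FROM spend_rank s
LEFT JOIN category_rank c
       ON s.member_id = c.member_id AND c.rn = 1;
```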
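For prototyping or testing the segmentation rules, the lifecycle and value buckets can be expressed in pandas. This minimal sketch assumes a DataFrame with `total_spending` and `days_since_last_order` columns. `rank(pct=True)` is used as a close stand-in for SQL `PERCENT_RANK()`, though it computes rank/n rather than (rank−1)/(n−1). The function names are hypothetical.

```python
import pandas as pd


def lifecycle_stage(days_since_last_order):
    """Recency buckets from the profile SQL; no order in the window counts as churned."""
    if days_since_last_order is None or pd.isna(days_since_last_order):
        return 'Churned'
    for limit, stage in ((7, 'Highly active'), (30, 'Active'),
                         (90, 'Occasional'), (180, 'Dormant')):
        if days_since_last_order <= limit:
            return stage
    return 'Churned'


def value_segments(profiles):
    """Add spend percentile, value segment and lifecycle stage columns."""
    out = profiles.copy()
    # Fraction of members at or below each spending value
    out['spend_pct'] = out['total_spending'].rank(pct=True)
    recency = out['days_since_last_order']
    out['value_segment'] = 'Regular'
    # Apply the weaker rule first so VIP overwrites it, matching the CASE order
    out.loc[(out['spend_pct'] >= 0.5) & (recency <= 60), 'value_segment'] = 'High value'
    out.loc[(out['spend_pct'] >= 0.8) & (recency <= 30), 'value_segment'] = 'VIP'
    out['lifecycle_stage'] = recency.map(lifecycle_stage)
    return out
```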

3.2 Improving Member Repeat Purchases

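```python
# Member lifecycle management and repeat-purchase optimization
class MemberLifecycleManager:
    """Scoring and recommendation logic. The _get_* and _generate_coupon helpers
    are data-access hooks (e.g. reading dws_user_profile) left to the implementation."""

    def calculate_churn_probability(self, member_id):
        """Churn risk from RFM scores (each scored 1-5)."""
        # R (Recency): days since last purchase; recent buyers score higher
        # F (Frequency): number of purchases
        # M (Monetary): amount spent
        r_score = self._get_recency_score(member_id)
        f_score = self._get_frequency_score(member_id)
        m_score = self._get_monetary_score(member_id)

        rfm_score = (r_score + f_score + m_score) / 3
        # Heuristic risk index in [0, 0.8], not a calibrated probability:
        # a higher RFM score means lower churn risk
        churn_prob = 1 - (rfm_score / 5)

        if churn_prob > 0.7:
            risk_level = 'P0_imminent_churn'
        elif churn_prob > 0.5:
            risk_level = 'P1_high_risk'
        else:
            risk_level = 'low_risk'
        return {
            'rfm_score': round(rfm_score, 2),
            'churn_probability': round(churn_prob, 3),
            'risk_level': risk_level
        }

    def recommend_marketing_action(self, member_id):
        """Recommend personalized marketing actions."""
        profile = self._get_member_profile(member_id)
        recommendations = []

        if profile['days_since_last_order'] > 7:
            recommendations.append({
                'action': 'push_notification',
                'content': f"Long time no see! Limited-time offer on {profile['favorite_category']}",
                'discount_code': self._generate_coupon(member_id),
                'timing': 'lunch_time'
            })

        if profile['avg_order_value'] < 200:
            recommendations.append({
                'action': 'premium_recommendation',
                'message': 'You might like these new dishes...',
                'expected_uplift': '15-20%'
            })

        return recommendations

    def calculate_ltv(self, member_id):
        """Customer lifetime value (LTV) = avg order × orders per year × expected years."""
        profile = self._get_member_profile(member_id)
        avg_order = profile['avg_order_value']
        # order_frequency = average days between orders, so 365 / it = orders per year
        annual_freq = 365 / max(profile['order_frequency'], 1)

        # Expected remaining lifetime in years per stage; unknown stages default to 1
        lifecycle_duration = {
            'New': 1, 'Growing': 2, 'Active': 3,
            'Occasional': 1.5, 'Dormant': 0.5, 'Churned': 0
        }

        ltv = avg_order * annual_freq * lifecycle_duration.get(profile['stage'], 1)
        return round(ltv, 2)

    # Data-access hooks, to be implemented against the warehouse
    def _get_recency_score(self, member_id):
        raise NotImplementedError

    def _get_frequency_score(self, member_id):
        raise NotImplementedError

    def _get_monetary_score(self, member_id):
        raise NotImplementedError

    def _get_member_profile(self, member_id):
        raise NotImplementedError

    def _generate_coupon(self, member_id):
        raise NotImplementedError
```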
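The `_get_*_score` helpers are left unspecified in the class. One common way to produce 1-5 RFM scores is quintile binning across the member base. This minimal sketch assumes a DataFrame with hypothetical `recency_days`, `frequency` and `monetary` columns and at least five members.

```python
import pandas as pd


def rfm_scores(df):
    """Quintile RFM scores (1-5); lower recency is better, so its labels are reversed."""
    out = df.copy()
    # rank(method='first') breaks ties so qcut always finds 5 distinct bin edges
    out['r_score'] = pd.qcut(out['recency_days'].rank(method='first'), 5,
                             labels=[5, 4, 3, 2, 1]).astype(int)
    out['f_score'] = pd.qcut(out['frequency'].rank(method='first'), 5,
                             labels=[1, 2, 3, 4, 5]).astype(int)
    out['m_score'] = pd.qcut(out['monetary'].rank(method='first'), 5,
                             labels=[1, 2, 3, 4, 5]).astype(int)
    out['rfm_score'] = (out['r_score'] + out['f_score'] + out['m_score']) / 3
    # Same heuristic as calculate_churn_probability: a risk index, not a probability
    out['churn_index'] = 1 - out['rfm_score'] / 5
    return out
```

Quintiles are relative to the current member base, so scores should be recomputed on a fixed schedule rather than compared across periods.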

Supply Chain Optimization

4.1 Ingredient Demand Forecasting

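```python
import numpy as np


class IngredientDemandForecaster:
    """Ingredient demand forecasting: dish forecast × recipe (bill of materials)."""

    SAFETY_FACTOR = 1.1  # 10% safety buffer

    def forecast_demand(self, ingredient_id, forecast_days=14):
        """Forecast daily demand for one ingredient."""
        # Dish-level sales forecast: list of {menu_item: qty}, one dict per day
        menu_forecast = self._get_menu_forecast(forecast_days)

        # Dish → {ingredient_id: usage per dish}
        recipe_mapping = self._get_recipe_mapping()

        demand_forecast = {}
        for day in range(forecast_days):
            daily_demand = 0
            for menu_item, qty in menu_forecast[day].items():
                # Dishes missing from the recipe table contribute nothing
                usage = recipe_mapping.get(menu_item, {}).get(ingredient_id, 0)
                daily_demand += qty * usage

            demand_forecast[day] = {
                'base': daily_demand,
                'with_safety': daily_demand * self.SAFETY_FACTOR,
                'reorder_point': daily_demand * self.SAFETY_FACTOR * 2  # assumes a 2-day lead time
            }

        return demand_forecast

    def optimize_inventory(self, ingredient_id, current_stock):
        """Inventory recommendation: reorder point, order quantity, max level."""
        forecast = self.forecast_demand(ingredient_id, 30)
        total_30day = sum(d['with_safety'] for d in forecast.values())
        avg_daily = total_30day / 30

        # Economic order quantity: EOQ = sqrt(2·D·S / H), with D = annual demand,
        # S = cost per order, H = holding cost per unit per year
        params = self._get_ingredient_params(ingredient_id)
        eoq = np.sqrt((2 * avg_daily * 365 * params['order_cost']) / params['holding_cost'])

        # avg_daily already includes the 10% buffer, which acts as safety stock
        reorder_point = avg_daily * params['lead_time_days']
        max_stock = reorder_point + eoq

        return {
            'reorder_point': round(reorder_point, 2),
            'order_qty': round(float(eoq), 2),
            'max_level': round(float(max_stock), 2),
            'action': self._get_action(current_stock, reorder_point, max_stock)
        }

    def _get_action(self, current, reorder, max_level):
        if current <= reorder:
            return 'urgent reorder'
        elif current >= max_level:
            return 'overstocked'
        else:
            return 'OK'

    # Data-access hooks, to be implemented against the warehouse
    def _get_menu_forecast(self, forecast_days):
        raise NotImplementedError

    def _get_recipe_mapping(self):
        raise NotImplementedError

    def _get_ingredient_params(self, ingredient_id):
        raise NotImplementedError  # must return order_cost, holding_cost, lead_time_days
```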
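The core of `forecast_demand` is a bill-of-materials explosion, and the EOQ formula is worth checking in isolation. This minimal sketch uses plain dicts and assumes `usage` is in the same unit as the ordered quantity and that the holding cost is per unit per year. The function names are hypothetical.

```python
import math


def explode_demand(menu_forecast, recipe_mapping, ingredient_id, safety=0.10):
    """Daily ingredient demand = Σ dish qty × usage per dish, plus a safety buffer.

    menu_forecast: list of {menu_item: qty} dicts, one per day.
    recipe_mapping: {menu_item: {ingredient_id: usage_per_dish}}.
    """
    daily = []
    for day_plan in menu_forecast:
        base = sum(qty * recipe_mapping.get(item, {}).get(ingredient_id, 0)
                   for item, qty in day_plan.items())
        daily.append(base * (1 + safety))
    return daily


def eoq(annual_demand, order_cost, holding_cost_per_unit_year):
    """Economic order quantity: sqrt(2·D·S / H)."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost_per_unit_year)
```

For example, an annual demand of 1,000 kg, a cost of 10 per order and a holding cost of 2 per kg per year give EOQ = √(2·1000·10/2) = 100 kg.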

4.2 Inventory and Waste Monitoring

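```sql
-- Flag inventory anomalies and waste risk
SELECT
    ingredient_id, ingredient_name,
    current_stock,
    CASE WHEN current_stock <= reorder_point THEN 'stock-out risk'
         WHEN current_stock >= max_stock * 0.9 THEN 'overstock risk'
         WHEN shelf_life_days - days_in_stock < 5 THEN 'near-expiry risk'
         ELSE 'OK'
    END AS inventory_status,

    -- Waste rate (%); NULL when nothing was purchased
    ROUND(waste_amount / NULLIF(total_purchase, 0) * 100, 2) AS waste_rate,

    -- A column alias cannot be referenced in the same SELECT list,
    -- so the waste-rate expression is repeated here
    CASE WHEN waste_amount / NULLIF(total_purchase, 0) * 100 > 15 THEN 'P0_high_risk'
         WHEN waste_amount / NULLIF(total_purchase, 0) * 100 > 8 THEN 'P1_medium_risk'
         ELSE 'P2_low_risk'
    END AS waste_risk
FROM dws_ingredient_inventory
WHERE stat_date = CURDATE()
ORDER BY waste_risk;  -- 'P0...' sorts before 'P1...' and 'P2...'
```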
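Encoding the same rules in Python makes it easy to unit-test thresholds before changing the query. This sketch mirrors the CASE order above. Stock-out is checked first, so an item that is both low and near expiry is reported as a stock-out risk. The function names are hypothetical.

```python
def inventory_status(current_stock, reorder_point, max_stock, shelf_life_days, days_in_stock):
    """Same rule order as the SQL CASE: stock-out, then overstock, then near-expiry."""
    if current_stock <= reorder_point:
        return 'stock-out risk'
    if current_stock >= max_stock * 0.9:
        return 'overstock risk'
    if shelf_life_days - days_in_stock < 5:
        return 'near-expiry risk'
    return 'OK'


def waste_risk(waste_amount, total_purchase):
    """Waste rate (%) and its P0/P1/P2 bucket.

    As in the SQL, a missing rate (nothing purchased) falls through to the lowest bucket.
    """
    if not total_purchase:
        return None, 'P2_low_risk'
    rate = round(waste_amount / total_purchase * 100, 2)
    if rate > 15:
        return rate, 'P0_high_risk'
    if rate > 8:
        return rate, 'P1_medium_risk'
    return rate, 'P2_low_risk'
```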

New-Store Site Selection Modeling

5.1 Site Evaluation Model

```sql
-- Trade-area evaluation
SELECT
    site_id, site_address, city, district,
    resident_population_within_1km,
    working_population_within_1km,
    competitor_count_within_1km,
    public_transport_density,

    -- Overall score: equal-weighted mean of precomputed sub-scores on a common scale
    (population_score + competition_score + traffic_score) / 3 AS overall_score,

    -- Reports only the first matching risk
    CASE WHEN competitor_count_within_1km > 3 THEN 'high competition'
         WHEN resident_population_within_1km < 20000 THEN 'insufficient population'
         WHEN public_transport_density < 2 THEN 'poor transit access'
         ELSE 'OK'
    END AS risk_level
FROM dws_site_evaluation
ORDER BY overall_score DESC;
```
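The query assumes `population_score`, `competition_score` and `traffic_score` are already on a common scale. One way to derive them is min-max scaling across the candidate sites, inverting competition so that fewer competitors score higher. This minimal sketch assumes equal weights and hypothetical raw field names (`population`, `competitors`, `transit`).

```python
def min_max(values, invert=False):
    """Scale a list to 0-100; invert=True when lower raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [50.0] * len(values)  # no spread: every site gets the midpoint
    scaled = [(v - lo) / (hi - lo) * 100 for v in values]
    return [100 - s for s in scaled] if invert else scaled


def site_scores(sites):
    """overall_score per site = mean of three 0-100 sub-scores."""
    pop = min_max([s['population'] for s in sites])
    comp = min_max([s['competitors'] for s in sites], invert=True)
    traffic = min_max([s['transit'] for s in sites])
    return [round((p + c + t) / 3, 2) for p, c, t in zip(pop, comp, traffic)]
```

Min-max scores are relative to the candidate set, so adding or removing a site changes every other site's score. Fixed benchmark ranges taken from existing stores avoid that.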

5.2 Predicting New-Store Success

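```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


class NewStoreSuccessPredictor:
    """Predict the probability that a new store succeeds."""

    CATEGORICAL = ['store_type']

    def train(self, db_connection):
        """Train on stores opened in the last 5 years."""
        # Success = cumulative ROI >= 1.2. Very young stores may not have had
        # time to reach that yet, so consider excluding them from training.
        query = """
        SELECT location_rent, location_population_1km,
               competitor_count, store_size, store_type,
               CASE WHEN cumulative_roi >= 1.2 THEN 1 ELSE 0 END AS success
        FROM historical_store_performance
        WHERE opening_date >= DATE_SUB(CURDATE(), INTERVAL 5 YEAR)
        """
        df = pd.read_sql(query, db_connection)

        # RandomForest needs numeric input: one-hot encode store_type and keep
        # the column list so predict() can build the same layout
        X = pd.get_dummies(df.drop(columns='success'), columns=self.CATEGORICAL, dtype=int)
        y = df['success']
        self.feature_columns = list(X.columns)

        self.model = RandomForestClassifier(n_estimators=200, max_depth=15, random_state=42)
        self.model.fit(X, y)

        return {'training_samples': len(df), 'success_rate': float(y.mean())}

    def predict(self, site_features):
        """Predict success probability for one site (same raw fields as training)."""
        X = pd.get_dummies(pd.DataFrame([site_features]), columns=self.CATEGORICAL, dtype=int)
        # Align with training columns; unseen store types become all zeros
        X = X.reindex(columns=self.feature_columns, fill_value=0)
        prob = self.model.predict_proba(X)[0][1]

        return {
            'success_probability': round(float(prob), 3),
            'rating': 'Excellent⭐⭐⭐⭐⭐' if prob >= 0.8 else 'Good⭐⭐⭐⭐' if prob >= 0.6 else 'Risky⭐⭐',
            'recommendation': '🟢Open now' if prob >= 0.8 else '🟡Needs review' if prob >= 0.6 else '🔴Not recommended'
        }

    def estimate_financials(self, site_features):
        """Estimate monthly sales, profit and payback period."""
        city = site_features['city']
        benchmarks = self._get_benchmarks(city)
        adjusted = self._adjust_for_location(site_features, benchmarks)

        # Assumed 35% contribution margin on sales, minus monthly operating cost
        monthly_profit = adjusted['sales'] * 0.35 - adjusted['cost']
        # Payback = upfront investment / monthly profit (rent is a monthly cost,
        # already inside adjusted['cost']). 'initial_investment' is an assumed field.
        payback = (site_features['initial_investment'] / monthly_profit
                   if monthly_profit > 0 else None)  # None: never pays back
        return {
            'monthly_sales': adjusted['sales'],
            'monthly_profit': monthly_profit,
            'payback_months': payback
        }

    # Data-access hooks, to be implemented against the warehouse
    def _get_benchmarks(self, city):
        raise NotImplementedError

    def _adjust_for_location(self, site_features, benchmarks):
        raise NotImplementedError
```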
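`train()` reports only the base success rate, which says nothing about how well the model ranks sites it has not seen. This minimal sketch runs an out-of-fold check with stratified 5-fold cross-validation and ROC AUC. It assumes `X` is already numeric (one-hot encoded) and `y` is the 0/1 success label. The metric and fold count are suggestions, not part of the original design.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def validate_success_model(X, y, n_splits=5, seed=42):
    """Out-of-fold ROC AUC for the success classifier: (mean, std) across folds."""
    model = RandomForestClassifier(n_estimators=200, max_depth=15, random_state=seed)
    # Stratified folds keep the success/failure ratio similar in every fold
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring='roc_auc')
    return float(np.mean(scores)), float(np.std(scores))
```

With only a few hundred historical stores, the spread across folds matters as much as the mean. A time-based split (train on older openings, test on newer ones) is a stricter alternative.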

Data Culture and Self-Service Analytics

6.1 Self-Service Analytics Enablement

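```
┌─ Tool democratization ─┬─ BI tools (Tableau/Metabase)
│                        │  └─ 50+ prebuilt templates, drag-and-drop analysis
│                        │
│                        ├─ SQL client (DBeaver)
│                        │  └─ SQL snippet library, query templates
│                        │
│                        └─ Analysis environment (Jupyter)
│                           └─ Python/R, prebuilt frameworks

┌─ Skill levels ─────────┬─ L0: Reading reports (1 hour)
│                        ├─ L1: Self-service BI (4 hours)
│                        ├─ L2: SQL analysis (3 days)
│                        └─ L3: Modeling and statistics (5 days)

┌─ Access governance ────┬─ Data classification: public / department / sensitive
│                        ├─ Role permissions: view / create / manage
│                        └─ Audit log: who queried what
```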
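The governance rules reduce to a lookup plus an audit record. This minimal sketch maps hypothetical role names to the three data classifications. A real deployment would enforce this in the warehouse or BI tool's own permission system rather than in application code.

```python
from datetime import datetime, timezone

# Hypothetical policy: which data classifications each role may read
READ_POLICY = {
    'viewer': {'public'},
    'analyst': {'public', 'department'},
    'admin': {'public', 'department', 'sensitive'},
}


def check_access(user, role, table, classification, audit_log):
    """Return whether `role` may read data of `classification`; log every attempt."""
    allowed = classification in READ_POLICY.get(role, set())  # unknown roles get nothing
    audit_log.append({
        'ts': datetime.now(timezone.utc).isoformat(),
        'user': user,
        'table': table,
        'classification': classification,
        'allowed': allowed,
    })
    return allowed
```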

6.2 Typical Analysis Templates

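Scenario 1: Daily store diagnosis (for store managers)

  • Question: "Why did sales drop today?"
  • Analysis: hourly distribution, dish ranking, customer traffic, external factors
  • Data: daily report template (prebuilt SQL)
  • Outcome: find the root cause → take action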
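A useful first cut at "Why did sales drop today?" splits the change into a traffic effect and a ticket-size effect, since sales = customers × average ticket. This minimal sketch assumes the analyst picks the baseline, for example the same weekday last week. The function name is hypothetical.

```python
def decompose_sales_change(customers_base, ticket_base, customers_now, ticket_now):
    """Split the change in sales (customers × average ticket) into two additive parts.

    traffic_effect: change from customer count, valued at the baseline ticket.
    ticket_effect:  change from average ticket, applied to today's customers.
    The two effects sum exactly to the total change.
    """
    total = customers_now * ticket_now - customers_base * ticket_base
    traffic_effect = (customers_now - customers_base) * ticket_base
    ticket_effect = customers_now * (ticket_now - ticket_base)
    return {'total_change': total, 'traffic_effect': traffic_effect,
            'ticket_effect': ticket_effect}
```

For example, a move from 200 customers at 80 to 180 customers at 85 is a change of −700: −1,600 from traffic and +900 from ticket size.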

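Scenario 2: Campaign evaluation (for marketing managers)

  • Question: "What was this campaign's ROI?"
  • Analysis: participants, conversion rate, incremental sales, return on investment
  • Data: campaign performance template (by dish / region / member dimension)
  • Outcome: assess impact → improve the next campaign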
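ROI is only meaningful on incremental sales, meaning the lift over a comparable group that did not receive the campaign. This minimal sketch assumes a randomly held-out control group and a single gross-margin rate (both assumptions). The function name is hypothetical.

```python
def campaign_roi(treat_sales, treat_size, control_sales, control_size,
                 gross_margin, campaign_cost):
    """ROI = (incremental gross profit - cost) / cost, with a control group as baseline."""
    # Per-member lift over the control group, scaled up to the treated group
    per_member_lift = treat_sales / treat_size - control_sales / control_size
    incremental_sales = per_member_lift * treat_size
    incremental_profit = incremental_sales * gross_margin
    roi = (incremental_profit - campaign_cost) / campaign_cost
    return {'incremental_sales': incremental_sales, 'roi': roi}
```

Without a control group, the same formula silently credits the campaign with sales that would have happened anyway.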

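Scenario 3: Inventory alerts (for purchasing managers)

  • Question: "Which ingredients have abnormal stock levels?"
  • Analysis: automatic benchmarking across all stores and all ingredients
  • Data: inventory alert table (updated in real time)
  • Outcome: adjust purchasing plans promptly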

Next: see document 08, Data Products and Application Innovation
