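Building a Feature Engineering Pipeline from Scratch for Dynamic Outbound Decisions on Warehouse Pallet Tasks
Abstract
This article walks through building a machine-learning feature table for a WMS (warehouse management system) to support intelligent dispatching of pallet tasks (whole-pallet outbound on level 1 vs. picking outbound on level 2). It covers environment setup (CentOS 7 / Kylin V10), the Oracle database (reading the standby database through a DBLINK), the Python feature engineering scripts, inventory zone design (fast zone, rack storage zone, piece-picking zone, pre-packed zone), rolling sales windows, and promotion feature extraction. Code templates are provided for each step to help readers put an automated feature engineering pipeline into production quickly.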
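1. Business Background and Design Goals of the Feature Table
In publishing and e-commerce warehouses, pallet task dispatching is often inefficient because the same pallet gets retrieved repeatedly. Core idea: use machine learning to predict each SKU's monthly average sales, then combine the prediction with the inventory distribution to choose the outbound port dynamically (level 1 whole-pallet vs. level 2 picking). This requires a feature table, WMS_ML_FEATURES, that stores historical sales, inventory, promotion, and other metrics for each SKU and month.
Main fields of the feature table:
- Sales features: pallet counts (tuo) and case counts (qty) over the last 7/30/90 days, trend slope, month-over-month growth rate
- Inventory features: pallet and case counts split by location type into the fast zone (direct-ship zone + cross-dock zone), rack storage zone, piece-picking zone, and pre-packed zone
- Item attributes: publisher, category, days since first listing
- Promotion features: whether a promotion exists, average discount intensity
- Labels: actual monthly average sales in pallets and cases for the target month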
2. Environment Setup (CentOS 7 / Kylin V10 as the example)
2.1 Install Python 3.9
bash
sudo yum install -y gcc gcc-c++ make openssl-devel bzip2-devel libffi-devel sqlite-devel
cd /usr/src
sudo wget https://www.python.org/ftp/python/3.9.18/Python-3.9.18.tgz
sudo tar xzf Python-3.9.18.tgz
cd Python-3.9.18
sudo ./configure --enable-optimizations
sudo make -j$(nproc)
sudo make altinstall
# Verify
/usr/local/bin/python3.9 --version
2.2 Install Oracle Instant Client (thick mode, supports Oracle 11g)
bash
# Download the two RPM packages (requires an Oracle account)
# oracle-instantclient-basic-19.30.0.0.0-1.x86_64.rpm
# oracle-instantclient-devel-19.30.0.0.0-1.x86_64.rpm
sudo yum localinstall -y oracle-instantclient-*.rpm
echo "/usr/lib/oracle/19.30/client64/lib" | sudo tee /etc/ld.so.conf.d/oracle.conf
sudo ldconfig
2.3 Create the project virtual environment and install dependencies
bash
mkdir -p /opt/wms_ml
cd /opt/wms_ml
/usr/local/bin/python3.9 -m venv venv
source venv/bin/activate
pip install --upgrade pip
requirements.txt:
pandas==1.5.3
numpy==1.23.5
scikit-learn==1.0.2
lightgbm==3.3.5
sqlalchemy>=1.4.0
oracledb>=1.4.0
python-dateutil>=2.8.0
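Note: the oracle+oracledb:// URL used in section 4.1 relies on the oracledb dialect that ships with SQLAlchemy 2.0 and later, while the pandas 1.5.3 release notes describe SQLAlchemy 2.0 compatibility as still in progress. Check that the versions pip resolves work together in your environment before relying on these pins.
Install: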
bash
pip install -r requirements.txt --only-binary=:all:
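After installation it can be useful to confirm which versions were actually resolved. The following is a minimal sketch using only the standard library; the package list simply mirrors requirements.txt, and the helper name installed_versions is an illustrative assumption rather than part of the original project.

```python
from importlib import metadata

# Package names mirror requirements.txt above.
REQUIRED = ["pandas", "numpy", "scikit-learn", "lightgbm",
            "sqlalchemy", "oracledb", "python-dateutil"]


def installed_versions(packages):
    """Return {package: version string, or None when the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:16s} {version or 'MISSING'}")
```

Run it inside the activated virtual environment; any line reporting MISSING points to a dependency that pip did not install.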
3. Database Preparation (run in the reporting database)
3.1 Create a DBLINK pointing to the standby of the source database
sql
CREATE DATABASE LINK SOURCE_DB_LINK
CONNECT TO readonly_user IDENTIFIED BY "password"
USING '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=STANDBY)))';
3.2 Create the feature table WMS_ML_FEATURES
Full CREATE TABLE statement (column comments omitted):
sql
CREATE TABLE WMS_ML_FEATURES (
    item_id                  NUMBER(20) NOT NULL,
    begin_month              DATE NOT NULL,
    -- Sales features in pallets (tuo)
    sales_last_7d            NUMBER,
    sales_last_30d           NUMBER,
    sales_last_90d           NUMBER,
    avg_sales_last_30d       NUMBER,
    trend_slope_30d          NUMBER,
    mom_growth               NUMBER,
    -- Label: actual monthly average sales in pallets
    monthly_avg_tuo          NUMBER,
    -- Sales features in cases (qty)
    sales_last_7d_qty        NUMBER,
    sales_last_30d_qty       NUMBER,
    sales_last_90d_qty       NUMBER,
    avg_sales_last_30d_qty   NUMBER,
    -- Label: actual monthly average sales in cases
    monthly_avg_qty          NUMBER,
    piece_sales_last_30d     NUMBER,
    avg_piece_sales_per_day  NUMBER,
    carton_sales_last_30d    NUMBER,
    avg_carton_sales_per_day NUMBER,
    -- Inventory features per zone
    inv_fast_zone_tuo        NUMBER,
    inv_fast_zone_qty        NUMBER,
    inv_rack_tuo             NUMBER,
    inv_rack_qty             NUMBER,
    inv_piece_zone_tuo       NUMBER,
    inv_piece_zone_qty       NUMBER,
    inv_pre_packed_tuo       NUMBER,
    inv_pre_packed_qty       NUMBER,
    -- Item attributes and promotion features
    age_days                 NUMBER,
    publisher_code           VARCHAR2(50),
    category_code            VARCHAR2(50),
    has_promotion            NUMBER(1),
    promotion_intensity      NUMBER
);
ALTER TABLE WMS_ML_FEATURES ADD CONSTRAINT PK_WMS_ML_FEATURES PRIMARY KEY (item_id, begin_month);
-- Indexes
CREATE INDEX idx_ml_features_month ON WMS_ML_FEATURES(begin_month);
-- Optional: the primary key index already leads with item_id
CREATE INDEX idx_ml_features_item ON WMS_ML_FEATURES(item_id);
-- Column comments (omitted)
4. Core Feature Engineering Code
4.1 Configuration file config.py
python
# Reporting database connection (target database)
TARGET_DB = {
    'user': 'report_user',
    'password': '***',
    'host': '10.0.0.2',
    'port': 1521,
    'service_name': 'REPORT'
}
TARGET_DATABASE_URL = f"oracle+oracledb://{TARGET_DB['user']}:{TARGET_DB['password']}@" \
                      f"{TARGET_DB['host']}:{TARGET_DB['port']}/?service_name={TARGET_DB['service_name']}"
# Name of the DBLINK to the source database
SOURCE_DB_LINK = "SOURCE_DB_LINK"
# Table name
FEATURE_TABLE = "WMS_ML_FEATURES"
# Feature columns (numeric)
NUMERIC_FEATURES = [
    'sales_last_7d', 'sales_last_30d', 'sales_last_90d',
    'avg_sales_last_30d', 'trend_slope_30d', 'mom_growth',
    'sales_last_7d_qty', 'sales_last_30d_qty', 'sales_last_90d_qty',
    'avg_sales_last_30d_qty',
    'piece_sales_last_30d', 'avg_piece_sales_per_day',
    'carton_sales_last_30d', 'avg_carton_sales_per_day',
    'inv_fast_zone_tuo', 'inv_fast_zone_qty',
    'inv_rack_tuo', 'inv_rack_qty',
    'inv_piece_zone_tuo', 'inv_piece_zone_qty',
    'inv_pre_packed_tuo', 'inv_pre_packed_qty',
    'age_days', 'has_promotion', 'promotion_intensity'
]
# Feature columns (categorical)
CATEGORICAL_FEATURES = ['publisher_code', 'category_code']
ALL_FEATURES = NUMERIC_FEATURES + CATEGORICAL_FEATURES
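The URL above is assembled with plain string formatting, so a password containing characters such as @ or / would corrupt it. One possible safeguard, sketched below, is to percent-encode the credentials with the standard library before building the URL. The helper name build_oracle_url and the sample credentials are illustrative assumptions, not part of the original config.py.

```python
from urllib.parse import quote_plus


def build_oracle_url(db):
    """Build an oracle+oracledb URL with percent-encoded user name and password."""
    user = quote_plus(db["user"])
    password = quote_plus(db["password"])
    return (
        f"oracle+oracledb://{user}:{password}@"
        f"{db['host']}:{db['port']}/?service_name={db['service_name']}"
    )


# Hypothetical credentials, used only to show the encoding.
example_db = {
    "user": "report_user",
    "password": "p@ss/word",
    "host": "10.0.0.2",
    "port": 1521,
    "service_name": "REPORT",
}
example_url = build_oracle_url(example_db)
print(example_url)
```

With this helper, TARGET_DATABASE_URL could be defined as build_oracle_url(TARGET_DB) while the rest of the configuration stays unchanged.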
4.2 Database utilities db_utils.py
python
import oracledb
import pandas as pd
from sqlalchemy import create_engine, text
from config import TARGET_DATABASE_URL

# Enable thick mode (must run before the first connection is opened)
oracledb.init_oracle_client(lib_dir="/usr/lib/oracle/19.30/client64/lib")


class OracleDB:
    def __init__(self, url):
        self.engine = create_engine(url, pool_pre_ping=True, pool_recycle=3600)

    def query(self, sql, params=None):
        with self.engine.connect() as conn:
            return pd.read_sql(text(sql), conn, params=params)

    def execute(self, sql, params=None):
        # engine.begin() commits on success and rolls back on error
        with self.engine.begin() as conn:
            conn.execute(text(sql), params or {})

    def to_sql(self, df, table, if_exists='append'):
        # Simplified write
        df.to_sql(table.upper(), self.engine, if_exists=if_exists, index=False)


target_db = OracleDB(TARGET_DATABASE_URL)
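Because WMS_ML_FEATURES has a primary key on (item_id, begin_month), appending the same month twice fails with a unique constraint violation. One way to make reruns safe is to delete a month's rows and insert the fresh ones in a single transaction. The sketch below illustrates the pattern with the standard-library sqlite3 module standing in for Oracle; the reduced table, the reload_month helper, and the sample rows are illustrative assumptions only.

```python
import sqlite3


def reload_month(conn, begin_month, rows):
    """Replace all feature rows of one month inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "DELETE FROM wms_ml_features WHERE begin_month = ?", (begin_month,)
        )
        conn.executemany(
            "INSERT INTO wms_ml_features (item_id, begin_month, sales_last_30d) "
            "VALUES (?, ?, ?)",
            rows,
        )


# In-memory stand-in for the feature table, reduced to three columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE wms_ml_features (
           item_id INTEGER NOT NULL,
           begin_month TEXT NOT NULL,
           sales_last_30d REAL,
           PRIMARY KEY (item_id, begin_month))"""
)
sample_rows = [(1001, "2025-01-01", 12.0), (1002, "2025-01-01", 3.5)]
reload_month(conn, "2025-01-01", sample_rows)
reload_month(conn, "2025-01-01", sample_rows)  # rerun does not violate the key
row_count = conn.execute("SELECT COUNT(*) FROM wms_ml_features").fetchone()[0]
print(row_count)
```

Against Oracle the same idea would map to a DELETE through OracleDB.execute followed by OracleDB.to_sql, ideally sharing one transaction.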
4.3 Helper functions utils.py
python
import calendar
import logging
from datetime import datetime, timedelta
import numpy as np


def setup_logger(name, log_file):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_file)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def add_months(dt, months):
    # Add or subtract months, clamping the day to the end of the target month
    new_month = dt.month + months
    new_year = dt.year + (new_month - 1) // 12
    new_month = (new_month - 1) % 12 + 1
    last_day = calendar.monthrange(new_year, new_month)[1]
    return datetime(new_year, new_month, min(dt.day, last_day))


def calculate_trend_slope(series):
    # Slope of a least-squares line; 0.0 when fewer than 3 points are available
    x = np.arange(len(series))
    slope = np.polyfit(x, series, 1)[0] if len(series) >= 3 else 0.0
    return slope


def safe_divide(a, b):
    # Return 0 instead of raising when the denominator is 0
    return a / b if b != 0 else 0
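To show how these helpers behave, here is a small self-contained sketch. It repeats calculate_trend_slope and safe_divide so that it runs on its own. The sample numbers are made up, and computing mom_growth as (current - previous) / previous is an assumption about how the full script uses safe_divide.

```python
import numpy as np


def calculate_trend_slope(series):
    """Slope of a least-squares line through the series (0.0 for fewer than 3 points)."""
    x = np.arange(len(series))
    return np.polyfit(x, series, 1)[0] if len(series) >= 3 else 0.0


def safe_divide(a, b):
    """Divide a by b, returning 0 when b is 0."""
    return a / b if b != 0 else 0


# Made-up daily pallet sales that rise by 2 per day.
daily_sales = [2, 4, 6, 8, 10]
slope = calculate_trend_slope(daily_sales)

# Assumed month-over-month growth: (current - previous) / previous.
mom_growth = safe_divide(120 - 100, 100)
# A SKU with no sales in the previous month falls back to 0.
mom_growth_new_sku = safe_divide(120 - 0, 0)

print(slope, mom_growth, mom_growth_new_sku)
```

A positive slope indicates rising demand within the window. The zero fallback keeps new SKUs from producing infinities, at the cost of hiding the difference between "no growth" and "no history".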
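4.4 Main feature engineering script feature_engineering.py (key excerpts)
Because the code is long, only the key functions are shown here; see the appendix at the end of the article for the complete code.
Inventory snapshot function (zone mapping logic):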
python
from datetime import datetime, timedelta
import pandas as pd
from config import SOURCE_DB_LINK
from db_utils import target_db
from utils import add_months

# Zone mapping used in the CASE expression (loc_type values are stored in Chinese):
#   '直发区' (direct-ship), '越库区' (cross-dock)                 -> '快速区' (fast zone)
#   '异形品区', '托盘立库区', '托盘立库大储位区' (pallet racks)     -> '立库' (rack storage)
#   '箱式立库区' (tote rack), '阁楼区' (mezzanine)                -> '散件区' (piece-picking zone)
#   '预先成件区'                                                 -> '预先成件区' (pre-packed zone)
def fetch_inventory_snapshot(snapshot_date):
    sql = f"""
    WITH inv_detail AS (
        SELECT
            ext.item_id,
            CASE
                WHEN loc.loc_type IN ('直发区', '越库区') THEN '快速区'
                WHEN loc.loc_type IN ('异形品区', '托盘立库区', '托盘立库大储位区') THEN '立库'
                WHEN loc.loc_type IN ('箱式立库区', '阁楼区') THEN '散件区'
                WHEN loc.loc_type = '预先成件区' THEN '预先成件区'
                ELSE '其他'
            END AS zone_type,
            SUM(FLOOR(ext.quantity_bu / pu.convert_figure)) AS total_qty,
            SUM(FLOOR(ext.quantity_bu / (pu.convert_figure * i.supportcode_quantity))) AS total_tuo
        FROM WMS_INVENTORY_EXTEND_GOOD_BAK@{SOURCE_DB_LINK} ext
        JOIN WMS_ITEM@{SOURCE_DB_LINK} i ON ext.item_id = i.id
        JOIN WMS_LOCATION@{SOURCE_DB_LINK} loc ON ext.location_id = loc.id
        JOIN WMS_PACKAGE_UNIT@{SOURCE_DB_LINK} pu ON pu.item_id = ext.item_id AND pu.line_no = 3
        WHERE ext.data_date = TRUNC(:snapshot_date)
          AND ext.status = 'ENABLED'
          AND i.supportcode_quantity > 0
          AND loc.loc_type NOT IN ('不良品区','待检区','退货区','报废区','盘点区','待修复区')
        GROUP BY ext.item_id,
            CASE WHEN loc.loc_type IN ('直发区','越库区') THEN '快速区'
                 WHEN loc.loc_type IN ('异形品区','托盘立库区','托盘立库大储位区') THEN '立库'
                 WHEN loc.loc_type IN ('箱式立库区','阁楼区') THEN '散件区'
                 WHEN loc.loc_type = '预先成件区' THEN '预先成件区'
                 ELSE '其他' END
    )
    SELECT item_id, zone_type, SUM(total_qty) AS qty, SUM(total_tuo) AS tuo
    FROM inv_detail WHERE zone_type != '其他' GROUP BY item_id, zone_type
    """
    df = target_db.query(sql, {'snapshot_date': snapshot_date})
    # Pivot into a wide table... (elided in this excerpt; result_df is its output)
    return result_df
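The pivot step is elided in the excerpt above. As a rough illustration of what it could look like, the sketch below turns the long (item_id, zone_type, qty, tuo) result into one row per SKU with the inv_* columns of the feature table. The ZONE_PREFIX mapping, the pivot_inventory helper, and the sample rows are assumptions for illustration; the author's actual implementation may differ.

```python
import pandas as pd

# Zone labels produced by the CASE expression, mapped to feature-column prefixes.
ZONE_PREFIX = {
    "快速区": "inv_fast_zone",
    "立库": "inv_rack",
    "散件区": "inv_piece_zone",
    "预先成件区": "inv_pre_packed",
}


def pivot_inventory(df):
    """Turn (item_id, zone_type, qty, tuo) rows into one wide row per item_id."""
    df = df.copy()
    df.columns = [c.lower() for c in df.columns]  # normalize upper-case column names
    wide = pd.DataFrame({"item_id": sorted(df["item_id"].unique())})
    for zone, prefix in ZONE_PREFIX.items():
        part = df.loc[df["zone_type"] == zone, ["item_id", "qty", "tuo"]]
        part = part.rename(columns={"qty": f"{prefix}_qty", "tuo": f"{prefix}_tuo"})
        wide = wide.merge(part, on="item_id", how="left")
    return wide.fillna(0)  # a SKU with no stock in a zone gets 0


# Made-up sample shaped like the query result.
sample = pd.DataFrame({
    "ITEM_ID": [1, 1, 2],
    "ZONE_TYPE": ["快速区", "立库", "散件区"],
    "QTY": [40, 400, 15],
    "TUO": [1, 10, 0],
})
result_df = pivot_inventory(sample)
print(result_df)
```

Iterating over a fixed zone mapping, rather than calling a generic pivot, guarantees that all eight inventory columns exist even when a zone is empty on the snapshot date.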
Building features for a single month:
python
def build_features_for_month(target_month):
    month_start = target_month
    # Use the 365 days before the target month as history
    history_end = month_start - timedelta(days=1)
    history_start = history_end - timedelta(days=365)
    sales_df = fetch_sales_data(history_start, history_end)
    sku_df = fetch_sku_master()
    # Inventory snapshot taken on the last day before the target month
    inv_df = fetch_inventory_snapshot(month_start - timedelta(days=1))
    promo_df = fetch_promotion_data(month_start, add_months(month_start, 1))
    rows = []
    for item_id in sales_df['item_id'].unique():
        # Fill gaps in the time series, compute rolling sums, slope, etc.
        # ... detailed calculation omitted
        rows.append({...})
    return pd.DataFrame(rows)
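The per-SKU calculation is omitted in the excerpt. The sketch below shows one plausible way to fill gaps in the daily series and derive the rolling-window features for a single SKU. The column names sale_date and tuo, the helper rolling_sales_features, the mom_growth definition (last 30 days vs. the 30 days before), and the sample data are all assumptions made for illustration.

```python
import pandas as pd


def rolling_sales_features(daily, history_end):
    """daily: DataFrame with columns sale_date (datetime64) and tuo for ONE SKU."""
    days = pd.date_range(end=history_end, periods=90, freq="D")
    # Sum per day, then reindex so that days without sales count as 0.
    series = daily.groupby("sale_date")["tuo"].sum().reindex(days, fill_value=0.0)
    last_30 = series.iloc[-30:].sum()
    prev_30 = series.iloc[-60:-30].sum()
    return {
        "sales_last_7d": series.iloc[-7:].sum(),
        "sales_last_30d": last_30,
        "sales_last_90d": series.sum(),
        "avg_sales_last_30d": last_30 / 30,
        "mom_growth": (last_30 - prev_30) / prev_30 if prev_30 else 0.0,
    }


# Made-up sales for one SKU, in pallets.
daily = pd.DataFrame({
    "sale_date": pd.to_datetime(["2024-12-20", "2025-01-10", "2025-01-30"]),
    "tuo": [5.0, 3.0, 2.0],
})
feats = rolling_sales_features(daily, pd.Timestamp("2025-01-31"))
print(feats)
```

Reindexing onto a complete daily calendar before summing is what makes the windows line up by date rather than by row count, which matters for slow-moving SKUs with many empty days.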
5. Running the Feature Engineering
5.1 Set the time range and run
python
if __name__ == "__main__":
    start = datetime(2025, 1, 1)
    end = datetime(2026, 3, 1)
    build_full_feature_table(start, end)
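build_full_feature_table is not shown in the excerpts. A minimal sketch of the month-by-month loop it presumably performs is given below. The names iter_month_starts and build_full_feature_table_sketch are hypothetical, treating end as inclusive is an assumption, and the build and save steps are passed in as callables so that the example runs without a database.

```python
from datetime import datetime


def iter_month_starts(start, end):
    """Yield the first day of every month from start to end (both inclusive)."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield datetime(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def build_full_feature_table_sketch(start, end, build_month, save):
    """Build and persist features month by month; return the number of months processed."""
    processed = 0
    for month_start in iter_month_starts(start, end):
        save(build_month(month_start))
        processed += 1
    return processed


months = list(iter_month_starts(datetime(2025, 1, 1), datetime(2026, 3, 1)))
print(len(months), months[0].date(), months[-1].date())
```

In the real script, build_month would correspond to build_features_for_month and save to a write through target_db.to_sql.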
5.2 Run in the background
bash
cd /opt/wms_ml
source venv/bin/activate
mkdir -p logs  # the redirect below fails if logs/ does not exist
nohup python feature_engineering.py > logs/feature_engineering.log 2>&1 &
tail -f logs/feature_engineering.log
5.3 Verify the results
sql
SELECT COUNT(*) FROM WMS_ML_FEATURES;
-- Pick a month inside the range built in 5.1
SELECT * FROM WMS_ML_FEATURES WHERE begin_month = DATE '2026-02-01' AND ROWNUM <= 10;
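6. Common Problems and Optimizations
| Problem | Solution |
|---|---|
| Inventory query hangs | Create indexes, or materialize the aggregated result into a local table in the reporting database |
| Out of memory | Shrink the history window (365 → 180 days) or add swap |
| Process dies when the terminal disconnects | Use screen or nohup |
| Missing _sqlite3 module | Install sqlite-devel before compiling Python |
| pandas fails to compile | Install prebuilt wheels with --only-binary=:all: |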
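For the out-of-memory case, another option is to process SKUs in batches instead of loading every SKU at once. The sketch below shows a simple batching helper; the name chunked and the batch size are illustrative assumptions.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical SKU ids, split into batches of 500.
item_ids = list(range(1, 1201))
batches = list(chunked(item_ids, 500))
print([len(batch) for batch in batches])
```

Each batch can then be fetched, transformed, and written separately. Keeping the batch size below 1000 also keeps a literal IN list within Oracle's limit of 1000 expressions.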
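7. Summary
With the workflow described in this article, you can build a high-quality feature table in your own WMS environment and provide the data foundation for intelligent pallet task dispatching. Key points:
- Environment: Python 3.9 + Oracle Instant Client (thick mode)
- Feature design: rolling sales windows, inventory zones, promotions, item attributes
- Performance: indexes, materialized views, batch processing
- Operations: daily incremental updates, with historical inventory backups retained
Next step: train a LightGBM model on the feature table and integrate its predictions into the WMS outbound-port decision rules to achieve dynamic level 1 / level 2 outbound.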
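To make the last step more concrete, the sketch below shows what a rule that consumes the prediction might look like. It is purely hypothetical: the function name, the returned labels, and the rule itself (enough predicted demand plus at least one full pallet in rack storage) are assumptions, and the threshold is left as a parameter because the article does not specify one.

```python
def choose_outbound_level(predicted_monthly_tuo, inv_rack_tuo, whole_pallet_threshold):
    """Pick an outbound route from predicted demand and rack inventory.

    Hypothetical rule: send whole pallets out on level 1 when predicted monthly
    demand reaches the threshold and at least one full pallet is in rack storage;
    otherwise fall back to level 2 picking.
    """
    if predicted_monthly_tuo >= whole_pallet_threshold and inv_rack_tuo >= 1:
        return "level_1_whole_pallet"
    return "level_2_picking"


# Illustrative calls with made-up numbers and a made-up threshold of 1.0.
fast_mover = choose_outbound_level(3.2, 5, 1.0)
slow_mover = choose_outbound_level(0.2, 5, 1.0)
print(fast_mover, slow_mover)
```

In practice the threshold would be tuned against historical task data rather than fixed by hand.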
Appendix: Complete feature_engineering.py code
Omitted here for length; leave a message in the comments section to request it.