Table of Contents
- 1. Recommendation System Architecture Overview
  - 1.1 The Four-Layer Funnel
  - 1.2 Why Not Rank Everything with the Ranking Model?
- 2. Multi-Channel Recall Design
  - 2.1 Five Recall Channels
  - 2.2 Key Design Decisions in Multi-Channel Recall
- 3. The Ranking Feature System
  - 3.1 Four Feature Groups for Ranking
  - 3.2 Why Cross Features Matter Most
- 4. Ranking Model Selection and Implementation
  - 4.1 Choosing a Model
  - 4.2 An XGBoost Ranker
  - 4.3 A LambdaMART Ranker
- 5. Cold-Start Strategy
  - 5.1 Tiered Cold-Start Handling
  - 5.2 The Core Cold-Start Dilemma
- 6. The Real-Time Feature Pipeline
  - 6.1 The End-to-End Real-Time Recommendation Path
  - 6.2 The Latency Budget for Real-Time Features
- 7. A/B Test Design
  - 7.1 An A/B Testing Framework for Recommender Systems
  - 7.2 Choosing A/B Test Metrics
- 8. Recommendation System Monitoring
  - 8.1 A Four-Dimensional Monitoring Framework
  - 8.2 The "North Star Metric" of Recommendation Monitoring
- 9. The Complete Architecture in Practice
  - 9.1 A Complete MovieLens Recommendation Pipeline
  - 9.2 Validating the Complete Pipeline
- 10. Common Pitfalls and Minimum Viable Solutions: A Comparison Table
- Summary
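Recommender-system papers report "NDCG up 5%", but a production recommendation service has a different problem to solve: 10 million users × 1 million items → real-time recommendations in under 200 ms → daily feature updates → weekly model iterations → A/B tests to validate impact → a canary release on 10% of traffic. That is not "training a model"; it is "running a system".
The earlier article on recommender-system fundamentals covered the theory behind collaborative filtering and matrix factorization. This one covers the engineering path from algorithm to system: turning "recommendation in theory" into a recommendation service that runs in production.
1. Recommendation System Architecture Overview
1.1 The Four-Layer Funnel
At its core, a recommender is a funnel that narrows a pool of millions of items down to a few dozen. At each layer the number of candidates shrinks, the model gets more precise, and the compute spent per candidate grows: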
[Figure: the four-layer funnel. Full item pool (1M+) → recall: multi-channel, thousands of candidates → pre-ranking: fast filtering, hundreds → ranking: LTR model, tens → re-ranking: diversity and business rules → final recommendation list. Recall channels: collaborative filtering, content, popular, tag, real-time behavior.]
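| Layer | Input | Output | Latency budget | Compute cost | Core goal |
|---|---|---|---|---|---|
| Recall | 1M+ | 1,000–5,000 | <50 ms | Low (approximate retrieval) | Don't miss good items |
| Pre-ranking | 5,000 | 200–500 | <20 ms | Medium (shallow model) | Quickly filter out obvious misses |
| Ranking | 500 | 50–100 | <100 ms | High (LTR/DNN) | Precise ordering |
| Re-ranking | 100 | 10–30 | <30 ms | Low (rules + lightweight model) | Diversity + business strategy |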
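1.2 Why Not Rank Everything with the Ranking Model?
A single inference of a ranking model (LambdaMART/DNN) costs far more than a recall lookup. If one ranking inference takes 1 ms, scoring all 1 million items takes 1,000,000 × 1 ms = 1,000 seconds per request, which is completely unacceptable. The logic of the funnel is to use the cheap recall layer to make sure good items are not missed, and the expensive ranking layer to make sure good items end up at the top.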
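The arithmetic is easy to check, and it also shows how much the funnel shrinks the ranking workload. A minimal sketch, assuming the same fixed 1 ms per ranking inference and purely sequential scoring (both simplifications); `ranking_cost_seconds` is a hypothetical helper, not part of any library:

```python
def ranking_cost_seconds(n_candidates: int, ms_per_item: float = 1.0) -> float:
    """Sequential ranking cost, assuming a fixed inference time per candidate."""
    return n_candidates * ms_per_item / 1000.0


full_catalog = ranking_cost_seconds(1_000_000)  # rank every item directly
after_funnel = ranking_cost_seconds(500)        # rank only the ~500 pre-ranking survivors

print(f"full catalog: {full_catalog:.0f} s")    # full catalog: 1000 s
print(f"after funnel: {after_funnel:.1f} s ({full_catalog / after_funnel:.0f}x less work)")
```

Even 0.5 s is five times the 100 ms ranking budget in the table, which is why the ranking stage still has to score its candidates in batches rather than one call per item.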
2. Multi-Channel Recall Design
2.1 Five Recall Channels
```python
import numpy as np
from collections import defaultdict
from sklearn.metrics.pairwise import cosine_similarity


class MultiChannelRetrieval:
    """Multi-channel recall aggregator."""

    def __init__(self, n_items, n_users):
        self.n_items = n_items
        self.n_users = n_users
        self.channels = {}  # channel name -> {user_id: {item_id: score}}

    def item_cf_recall(self, user_items, item_similarity_matrix, top_k=200):
        """Item-CF collaborative-filtering recall."""
        recalled = defaultdict(dict)
        for user_id, interacted_items in user_items.items():
            interacted = set(interacted_items)
            scores = defaultdict(float)
            for item_id in interacted:
                if item_id >= len(item_similarity_matrix):
                    continue
                sim_row = item_similarity_matrix[item_id]
                for sim_item in np.argsort(sim_row)[-top_k:]:  # most similar items
                    sim_item = int(sim_item)
                    if sim_item in interacted:  # skip items already interacted with
                        continue
                    scores[sim_item] += sim_row[sim_item]
            recalled[user_id] = dict(sorted(scores.items(),
                                            key=lambda x: x[1], reverse=True)[:top_k])
        self.channels['item_cf'] = recalled
        return recalled

    def content_recall(self, user_profile_embeddings, item_content_embeddings, top_k=200):
        """Content-based recall (covers new items that have no interactions yet)."""
        recalled = defaultdict(dict)
        # Cosine similarity between user-profile vectors and item-content vectors
        sim_matrix = cosine_similarity(user_profile_embeddings, item_content_embeddings)
        for user_id in range(len(user_profile_embeddings)):
            for item_id in np.argsort(sim_matrix[user_id])[-top_k:]:
                recalled[user_id][int(item_id)] = sim_matrix[user_id][item_id]
        self.channels['content'] = recalled
        return recalled

    def popular_recall(self, item_popularity_scores, top_k=100):
        """Popular recall (fallback for new users and cold-start scenarios)."""
        recalled = defaultdict(dict)
        # argsort is ascending: reverse it so that rank 0 is the most popular item
        top_popular = np.argsort(item_popularity_scores)[::-1][:top_k]
        # The list is the same for every user; in production compute it once and share it
        for user_id in range(self.n_users):
            for rank, item_id in enumerate(top_popular):
                # Linear rank decay: items near the cut-off contribute less
                decay = max(0.0, 1 - 0.01 * rank)
                recalled[user_id][int(item_id)] = item_popularity_scores[item_id] * decay
        self.channels['popular'] = recalled
        return recalled

    def tag_recall(self, user_tag_preferences, item_tag_matrix, top_k=150):
        """Tag recall (match the user's interest tags against item tags)."""
        recalled = defaultdict(dict)
        for user_id, tags in user_tag_preferences.items():
            scores = defaultdict(float)
            for tag, preference in tags.items():
                for item_id in range(min(self.n_items, len(item_tag_matrix))):
                    if tag in item_tag_matrix[item_id]:
                        scores[item_id] += preference * item_tag_matrix[item_id][tag]
            recalled[user_id] = dict(sorted(scores.items(),
                                            key=lambda x: x[1], reverse=True)[:top_k])
        self.channels['tag'] = recalled
        return recalled

    def realtime_behavior_recall(self, user_recent_actions, top_k=100):
        """Real-time behavior recall: scores the items the user recently carted,
        clicked or viewed (production systems expand these to similar items)."""
        recalled = defaultdict(dict)
        # Action weights: add to cart (3.0) > click (2.0) > view (1.0)
        action_weights = {'cart': 3.0, 'click': 2.0, 'view': 1.0}
        for user_id, actions in user_recent_actions.items():
            scores = defaultdict(float)
            for action in actions:
                weight = action_weights.get(action['type'], 1.0)
                # Time decay: more recent actions weigh more
                time_decay = np.exp(-0.1 * action['hours_ago'])
                scores[action['item_id']] += weight * time_decay
            recalled[user_id] = dict(sorted(scores.items(),
                                            key=lambda x: x[1], reverse=True)[:top_k])
        self.channels['realtime'] = recalled
        return recalled

    def merge_and_dedup(self, channel_weights=None, top_k=500):
        """Merge and deduplicate: weighted fusion of all channels' results."""
        if channel_weights is None:
            # Default weights: real-time behavior and CF are trusted most,
            # the popular fallback least
            channel_weights = {
                'item_cf': 1.0, 'content': 0.8,
                'popular': 0.3, 'tag': 0.6, 'realtime': 1.2
            }
        merged = defaultdict(dict)
        for channel_name, channel_results in self.channels.items():
            weight = channel_weights.get(channel_name, 0.5)
            for user_id, item_scores in channel_results.items():
                # Channels score on different scales (cosine vs. raw popularity), so
                # normalize by this user's largest score in the channel before weighting
                scale = max((abs(s) for s in item_scores.values()), default=0) or 1.0
                for item_id, score in item_scores.items():
                    # Items hit by several channels accumulate weighted scores
                    merged[user_id][item_id] = merged[user_id].get(item_id, 0) + \
                        weight * score / scale
        # Sort by fused score and keep top_k per user
        result = {}
        for user_id, scores in merged.items():
            sorted_items = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_k]
            result[user_id] = dict(sorted_items)
        return result
```
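2.2 Key Design Decisions in Multi-Channel Recall
Why multiple channels? A single recall channel has limited coverage. Item-CF can only recall items similar to what the user has already interacted with (the "filter bubble" problem); content recall covers new items but is less precise; popular recall covers broadly but is not personalized at all. The core value of multi-channel recall is complementarity: each channel hits a different candidate set, so the merged set covers much more of what the user might want.
Weighted fusion or staged ordering? Weighted fusion is more common in practice because it is simple and easy to tune. Staged ordering (CF first, then content, then popular) introduces cascading dependencies, and a failure in one channel blocks every channel after it. Weighted fusion lets the channels run independently and in parallel, and lets the system degrade gracefully when one of them fails.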
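The independence and fault-tolerance argument can be made concrete: run each channel as its own task under a shared deadline, keep whatever finished in time, and fuse the survivors with weights. A minimal standard-library sketch; the channel functions, the 100 ms deadline, and the weights are illustrative assumptions, not values taken from a production system:

```python
import concurrent.futures as cf
import time


def run_channels(channels, timeout_s=0.1):
    """Run recall channels in parallel; drop any that raise or miss the deadline."""
    pool = cf.ThreadPoolExecutor(max_workers=len(channels))
    futures = {pool.submit(fn): name for name, fn in channels.items()}
    done, _not_done = cf.wait(futures, timeout=timeout_s)
    results = {}
    for future in done:
        try:
            results[futures[future]] = future.result()
        except Exception:
            pass  # graceful degradation: a failed channel contributes nothing
    # Don't block the request on stragglers (cancel_futures needs Python 3.9+)
    pool.shutdown(wait=False, cancel_futures=True)
    return results


def weighted_fusion(results, weights, default_weight=0.5):
    """Sum weighted scores per item across the surviving channels; best first."""
    fused = {}
    for name, scores in results.items():
        w = weights.get(name, default_weight)
        for item, score in scores.items():
            fused[item] = fused.get(item, 0.0) + w * score
    return sorted(fused, key=fused.get, reverse=True)


def broken_channel():
    raise RuntimeError("index unavailable")


def slow_channel():
    time.sleep(1.0)
    return {4: 1.0}


channels = {
    "item_cf": lambda: {1: 0.9, 2: 0.5},
    "popular": lambda: {2: 1.0, 3: 0.8},
    "tag": broken_channel,
    "realtime": slow_channel,
}
results = run_channels(channels)
print(sorted(results))
print(weighted_fusion(results, {"item_cf": 1.0, "popular": 0.3}))
```

With this setup, `tag` drops out because it raises and `realtime` because it misses the deadline; the two surviving channels still produce a ranked list. Threads fit here because real recall channels mostly wait on remote indexes (I/O-bound) rather than compute locally.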
3. The Ranking Feature System
3.1 Four Feature Groups for Ranking
[Figure: the ranking feature system. Four feature groups feed the model input: user features (historical preference vector, demographics, activity metrics, spending power), item features (category/tags, price range, quality score, stock status), cross features (user-item interaction count, same-category purchase rate, price match), and context features (time/time of day, scene/page, device type, weather/holidays).]
```python
from collections import defaultdict

import numpy as np


class RecommendationFeatureBuilder:
    """Feature builder for the ranking stage of a recommender."""

    def __init__(self):
        self.feature_groups = {
            'user': [], 'item': [], 'cross': [], 'context': []
        }

    def build_user_features(self, user_history, user_profile):
        """User features: historical preference + spending power + activity."""
        features = {}
        # Historical preference: category distribution over the last 100 actions
        category_dist = defaultdict(int)
        for action in user_history[-100:]:
            category_dist[action['category']] += 1
        total = sum(category_dist.values()) or 1
        for cat, count in category_dist.items():
            features[f'user_cat_{cat}_ratio'] = count / total
        # Spending power: average order value + spread of prices
        prices = [a['price'] for a in user_history if 'price' in a]
        features['user_avg_price'] = np.mean(prices) if prices else 0
        features['user_price_std'] = np.std(prices) if len(prices) > 1 else 0
        # Activity metrics
        features['user_active_days_30d'] = user_profile.get('active_days', 0)
        features['user_action_count_7d'] = user_profile.get('7d_actions', 0)
        features['user_conversion_rate'] = user_profile.get('conversion_rate', 0)
        self.feature_groups['user'] = list(features.keys())
        return features

    def build_item_features(self, item_info, item_stats):
        """Item features: category / price / quality / popularity."""
        features = {}
        # The ID is needed to match the user's history in build_cross_features;
        # build_full_features drops it again (an ID is not a model feature)
        features['item_id'] = item_info.get('item_id')
        features['item_category'] = item_info.get('category', 'unknown')
        features['item_price'] = item_info.get('price', 0)
        features['item_price_vs_avg'] = item_info.get('price', 0) / \
            max(item_stats.get('avg_price', 1), 1)
        features['item_quality_score'] = item_info.get('quality_score', 0)
        features['item_review_count'] = item_info.get('review_count', 0)
        features['item_avg_rating'] = item_info.get('avg_rating', 0)
        features['item_popularity_7d'] = item_stats.get('7d_views', 0)
        features['item_conversion_rate'] = item_stats.get('conversion_rate', 0)
        features['item_stock_status'] = int(item_info.get('in_stock', False))
        self.feature_groups['item'] = list(features.keys())
        return features

    def build_cross_features(self, user_features, item_features, user_history):
        """Cross features: user-item interaction signals."""
        features = {}
        # The user's share of recent actions in this item's category
        # (used as a proxy for the user's purchase rate in the category)
        cat = item_features.get('item_category', 'unknown')
        features['user_cat_purchase_rate'] = user_features.get(f'user_cat_{cat}_ratio', 0)
        # Price match: item price relative to the user's average order value
        user_avg_price = user_features.get('user_avg_price', 0)
        item_price = item_features.get('item_price', 0)
        features['price_match_ratio'] = item_price / max(user_avg_price, 1)
        # Has the user already viewed / carted / purchased this item?
        flags = {'view': 'user_has_viewed', 'cart': 'user_has_carted',
                 'purchase': 'user_has_purchased'}
        for name in flags.values():
            features[name] = 0
        item_id = item_features.get('item_id')
        for action in user_history:
            action_type = action.get('type')
            if item_id is not None and action.get('item_id') == item_id \
                    and action_type in flags:
                features[flags[action_type]] = 1
        self.feature_groups['cross'] = list(features.keys())
        return features

    def build_context_features(self, request_context):
        """Context features: time / scene / device."""
        features = {}
        hour = request_context.get('hour', 12)
        features['request_hour'] = hour
        # Sine/cosine cyclical encoding (introduced in the earlier time-series article)
        features['hour_sin'] = np.sin(2 * np.pi * hour / 24)
        features['hour_cos'] = np.cos(2 * np.pi * hour / 24)
        features['request_page_type'] = request_context.get('page_type', 'home')
        features['request_device'] = request_context.get('device', 'mobile')
        features['is_weekend'] = int(request_context.get('is_weekend', False))
        features['is_holiday'] = int(request_context.get('is_holiday', False))
        self.feature_groups['context'] = list(features.keys())
        return features

    def build_full_features(self, user_features, item_features,
                            cross_features, context_features):
        """Assemble the full ranking feature dict."""
        # Cross features are the ranking model's most important signal source.
        # Pure user and pure item features have limited predictive power on their own;
        # crosses such as "user's preferred category x item category" decide
        # "recommend A or B".
        full = {}
        full.update(user_features)
        full.update(item_features)
        full.update(cross_features)
        full.update(context_features)
        full.pop('item_id', None)  # only needed to compute the cross features
        # item_category, request_page_type and request_device are strings and must
        # be encoded (e.g. one-hot) before they reach the model
        return full
```
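The builder returns one dict per candidate, but a tree model needs a numeric matrix with a fixed column order. Two details matter: `build_user_features` emits a different set of `user_cat_*_ratio` keys depending on the user's history, so `dict.values()` order is not stable across users, and fields such as `item_category` and `request_device` are strings. A minimal sketch of a fixed schema with one-hot encoding for string values; `fit_schema` and `vectorize` are hypothetical helpers, and scikit-learn's `DictVectorizer` does the same job in a production-ready way:

```python
def _column(key, value):
    """String values become one-hot columns named 'key=value'; numbers keep the key."""
    return f"{key}={value}" if isinstance(value, str) else key


def fit_schema(rows):
    """Learn a fixed, sorted column list from the training feature dicts."""
    return sorted({_column(k, v) for row in rows for k, v in row.items()})


def vectorize(row, schema):
    """Map one feature dict onto the schema; missing or unseen columns stay 0."""
    index = {col: i for i, col in enumerate(schema)}
    vec = [0.0] * len(schema)
    for key, value in row.items():
        col = _column(key, value)
        if col in index:  # categories never seen in training are dropped
            vec[index[col]] = 1.0 if isinstance(value, str) else float(value)
    return vec


train_rows = [
    {"item_price": 99.0, "request_device": "mobile", "user_cat_phone_ratio": 0.5},
    {"item_price": 10.0, "request_device": "pc", "is_weekend": 1},
]
schema = fit_schema(train_rows)
print(schema)
# ['is_weekend', 'item_price', 'request_device=mobile', 'request_device=pc', 'user_cat_phone_ratio']
print(vectorize({"item_price": 5, "request_device": "tablet"}, schema))
# [0.0, 5.0, 0.0, 0.0, 0.0]
```

The schema is fitted once on training data and saved with the model, so serving builds exactly the columns the model was trained on.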
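3.2 Why Cross Features Matter Most
The ranking model's job is not to predict which category a user likes (that is recall's job) but to decide, among 50 candidates in that category, which one the user is most likely to click. The signals that separate items within the same category come mainly from cross features, such as price match, the user's historical purchase rate in the category, and whether the user has already viewed the item. Pure user features contribute little to ordering because they take the same value for every candidate in a request; pure item features do vary across candidates, but they produce the same order for every user, so on their own they cannot personalize the ranking.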
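A toy example makes the argument concrete. Assume three same-category candidates that differ only in price, and two users with very different average order values (all numbers hypothetical). Sorting by an item-only feature gives both users the same order, while sorting by how close `price_match_ratio` is to 1 (one of the cross features built above) gives each user a different order:

```python
candidates = {"a": 40.0, "b": 120.0, "c": 280.0}  # item_id -> item_price (hypothetical)


def rank_by_item_feature():
    """Item-only signal (cheapest first): identical for every user."""
    return sorted(candidates, key=candidates.get)


def rank_by_price_match(user_avg_price):
    """Cross-feature signal: price_match_ratio closest to 1 comes first."""
    def mismatch(item_id):
        ratio = candidates[item_id] / max(user_avg_price, 1)
        return abs(ratio - 1.0)
    return sorted(candidates, key=mismatch)


print(rank_by_item_feature())      # ['a', 'b', 'c'] for everyone
print(rank_by_price_match(50.0))   # budget shopper:  ['a', 'b', 'c']
print(rank_by_price_match(300.0))  # premium shopper: ['c', 'b', 'a']
```

A user-side value such as `user_avg_price` is identical on all three rows of a request, so it cannot move any candidate up or down by itself; it becomes useful only once it is crossed with an item attribute.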
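4. Ranking Model Selection and Implementation
4.1 Choosing a Model
| Model | Best for | Strengths | Weaknesses | Feature requirements |
|---|---|---|---|---|
| Matrix factorization (baseline) | Few features, early-stage products | Simple, interpretable | No cross features | Interaction matrix only |
| XGBoost | Rich features | High accuracy, fast inference | With a pointwise objective, ignores list structure | 20–50 features |
| LambdaMART | Ranking-centric scenarios | Optimizes NDCG directly | Slower to train | 20–100 features |
| Wide&Deep | Large data volumes | Memorization + generalization | High engineering complexity | 100+ features |
4.2 An XGBoost Ranker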
```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import ndcg_score


class XGBoostRanker:
    """XGBoost ranker trained on graded click/conversion labels."""

    def __init__(self, params=None):
        self.params = params or {
            'objective': 'rank:pairwise',  # pairwise ranking objective
            'eval_metric': 'ndcg@10',
            'tree_method': 'hist',  # fast histogram-based tree construction
            'max_depth': 6,
            'learning_rate': 0.1,
            'n_estimators': 200,
            'subsample': 0.8,
            'colsample_bytree': 0.8,
            'reg_alpha': 0.1,
            'reg_lambda': 1.0,
        }
        self.model = None

    def train(self, X_train, y_train, group_sizes):
        """
        X_train: feature matrix [n_samples, n_features], rows grouped by user
        y_train: graded labels (0 = no click, 1 = click, 2 = add to cart, 3 = purchase)
        group_sizes: number of candidate items per user, in row order,
            e.g. [50, 50, 50] means 3 users with 50 candidates each
        """
        # XGBRanker is group-aware: training pairs are formed only within a group
        self.model = xgb.XGBRanker(**self.params)
        self.model.fit(X_train, y_train, group=group_sizes)
        return self

    def predict_scores(self, X_test):
        """Return ranking scores: the higher the score, the higher the priority."""
        return self.model.predict(X_test)

    def rank_items(self, user_candidates_with_features):
        """Rank one user's candidates.

        Each candidate is a feature dict whose values are already numeric and whose
        key order matches the training columns.
        """
        features_matrix = np.array([
            list(f.values()) for f in user_candidates_with_features
        ], dtype=float)
        scores = self.predict_scores(features_matrix)
        ranked_indices = np.argsort(scores)[::-1]  # descending by score
        return [(int(idx), scores[idx]) for idx in ranked_indices]

    def evaluate_ranking(self, X_test, y_test, group_sizes):
        """Evaluate ranking quality with NDCG@10, averaged over groups."""
        scores = self.predict_scores(X_test)
        y_test = np.asarray(y_test)
        ndcg_values = []
        offset = 0
        for group_size in group_sizes:
            group_scores = scores[offset:offset + group_size]
            group_labels = y_test[offset:offset + group_size]
            # NDCG is only informative when the group has at least 2 label levels
            if len(set(group_labels)) > 1:
                ndcg_values.append(ndcg_score([group_labels], [group_scores], k=10))
            offset += group_size
        return np.mean(ndcg_values) if ndcg_values else 0
```
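`evaluate_ranking` delegates the metric to `sklearn.metrics.ndcg_score`. Written out by hand, NDCG@K for one user's group is the DCG of the predicted order divided by the DCG of the ideal order. This sketch uses linear gains and a log2 position discount, which should match scikit-learn's convention when there are no tied scores (scikit-learn averages over ties); the function names are illustrative:

```python
import math


def dcg_at_k(relevances, k):
    """DCG with linear gains and a log2(position + 1) discount, positions from 1."""
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(labels, scores, k=10):
    """NDCG@k for one group: graded labels and model scores in the same row order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    predicted = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(predicted, k) / ideal_dcg if ideal_dcg > 0 else 0.0


labels = [3, 2, 0, 1]  # purchase, cart, no click, click
print(ndcg_at_k(labels, [0.9, 0.8, 0.1, 0.5]))            # perfect order -> 1.0
print(round(ndcg_at_k(labels, [0.1, 0.2, 0.9, 0.5]), 3))  # reversed order -> 0.614
```

A group whose labels are all 0 has an ideal DCG of 0, so its NDCG is undefined; that is one reason `evaluate_ranking` skips groups with a single label level.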
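4.3 A LambdaMART Ranker
The earlier Learning to Rank article covered LambdaMART's three-layer architecture in detail; this section shows how it is put to work in the ranking stage of a recommender: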
```python
import lightgbm as lgb
import numpy as np


class LambdaMARTRanker:
    """LambdaMART ranker: optimizes the NDCG ranking metric directly."""

    def __init__(self):
        self.params = {
            'objective': 'lambdarank',
            'metric': 'ndcg',
            'ndcg_eval_at': [5, 10, 20],
            'learning_rate': 0.05,
            'num_leaves': 31,
            'max_depth': 6,
            'min_data_in_leaf': 50,
            'feature_fraction': 0.8,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'lambda_l1': 0.1,
            'lambda_l2': 1.0,
            'min_gain_to_split': 0.01,
        }
        self.model = None

    def train(self, X_train, y_train, group_sizes,
              X_val=None, y_val=None, val_groups=None):
        """
        Key point: group_sizes (candidates per user) must be passed correctly.
        This is what separates LTR from plain classification/regression: the model
        must know which rows belong to the same ranking group.
        """
        train_data = lgb.Dataset(X_train, label=y_train, group=group_sizes)
        valid_sets = [train_data]
        valid_names = ['train']
        callbacks = [lgb.log_evaluation(period=50)]
        if X_val is not None:
            val_data = lgb.Dataset(X_val, label=y_val, group=val_groups,
                                   reference=train_data)
            valid_sets.append(val_data)
            valid_names.append('valid')
            # Early stopping needs a held-out set to be meaningful
            callbacks.append(lgb.early_stopping(stopping_rounds=30))
        self.model = lgb.train(
            self.params,
            train_data,
            num_boost_round=300,
            valid_sets=valid_sets,
            valid_names=valid_names,
            callbacks=callbacks,
        )
        return self

    def predict_and_rank(self, X_test, group_sizes):
        """Predict ranking scores and sort each group by score (descending)."""
        raw_scores = self.model.predict(X_test)
        ranked_results = []
        offset = 0
        for group_size in group_sizes:
            group_scores = raw_scores[offset:offset + group_size]
            ranked_indices = np.argsort(group_scores)[::-1]
            ranked_results.append(ranked_indices)
            offset += group_size
        return ranked_results, raw_scores
```
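Both rankers take a `group_sizes` list, and it is easy to get wrong: it must follow the row order of `X_train`, and each user's rows must be contiguous. A minimal sketch that derives it from a per-row user-ID column and fails loudly on unsorted data; the helper name is hypothetical (XGBoost's `XGBRanker.fit` also accepts a per-row `qid` array instead of `group`):

```python
from itertools import groupby


def group_sizes_from_user_ids(user_ids):
    """Turn a per-row user-ID column into ranking group sizes.

    Rows of the same user must be contiguous; a user ID that reappears after
    another user's rows means the data was not sorted by user.
    """
    sizes, seen = [], set()
    for user_id, rows in groupby(user_ids):
        if user_id in seen:
            raise ValueError(f"rows for user {user_id!r} are not contiguous")
        seen.add(user_id)
        sizes.append(sum(1 for _ in rows))
    return sizes


print(group_sizes_from_user_ids(["u1", "u1", "u2", "u2", "u2", "u3"]))  # [2, 3, 1]
```

If the check fails, sort the rows by user ID with a stable sort (so each user's candidate order survives) and rebuild X, y, and the group sizes together from the sorted data.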
5. Cold-Start Strategy
5.1 Tiered Cold-Start Handling
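```python
import random
from collections import defaultdict
from datetime import date, timedelta


class ECommerceColdStartHandler:
    """Tiered cold-start handling for e-commerce recommendations."""

    def __init__(self, popular_items, item_content_features):
        self.popular_items = popular_items  # item IDs, most popular first
        self.item_features = item_content_features  # {item_id: {'category': ..., 'tags': [...]}}

    def handle_new_user(self, user_id, level=0, user_choices=None):
        """
        Tiered cold-start strategy:
        Level 0: popular items only (no information)
        Level 1: popular items per demographic group (gender/age/region)
        Level 2: recommendations from interests picked during onboarding
        Level 3: enough behavior accumulated to switch to collaborative filtering
        """
        if level == 0:
            # Nothing known about the user -> popular items + high diversity
            return self._popular_with_diversity(n=30, diversity_ratio=0.4)
        elif level == 1:
            # Demographics known -> popular items within the user's group
            return self._demographic_group_recommend(user_choices or {}, n=30)
        elif level == 2:
            # The user picked interest tags -> content-based recall
            return self._interest_guided_recommend(user_choices or [], n=30)
        # Level 3+: enough behavior, collaborative filtering can take over
        return None  # hand off to the normal recall pipeline

    def handle_new_item(self, item_id, item_info):
        """New-item cold start: content-based recall + exposure boost."""
        # A new item has no interaction history, so collaborative filtering cannot
        # recall it, but the content-recall channel can, via its content features.
        # Extra strategy: boost exposure for the first 7 days to gather interactions faster.
        recommendations = self._content_similar_items(item_info, n=50)
        # Flag as a cold-start item; the re-ranking layer applies the exposure boost
        return {
            'item_id': item_id,
            'similar_items': recommendations,
            'cold_start_boost': 2.0,  # exposure weight doubled for the first 7 days
            'cold_start_end_date': (date.today() + timedelta(days=7)).isoformat(),
        }

    def _popular_with_diversity(self, n, diversity_ratio):
        """Popular items + injected category diversity."""
        popular = self.popular_items[:int(n * (1 - diversity_ratio))]
        # Fill the remaining slots with random items drawn across categories
        diverse = self._random_diverse_items(n - len(popular), exclude=set(popular))
        return popular + diverse

    def _random_diverse_items(self, n, exclude=()):
        """Pick n random items, cycling through categories so each gets a turn."""
        by_category = defaultdict(list)
        for item_id, feats in self.item_features.items():
            if item_id not in exclude:
                by_category[feats.get('category', 'unknown')].append(item_id)
        for items in by_category.values():
            random.shuffle(items)
        picked = []
        while len(picked) < n and any(by_category.values()):
            for items in by_category.values():
                if items and len(picked) < n:
                    picked.append(items.pop())
        return picked

    def _demographic_group_recommend(self, demographics, n):
        """Popular items within the user's demographic group."""
        # Popular items among similar users are more precise than global popular items
        group_key = f"{demographics.get('gender', 'all')}_" \
                    f"{demographics.get('age_group', 'all')}_" \
                    f"{demographics.get('region', 'all')}"
        # Simplified: a real system looks up a per-group popular list by group_key
        return self.popular_items[:n]

    def _interest_guided_recommend(self, interests, n):
        """Recommendations from the interest tags the user selected."""
        if not interests:
            return self.popular_items[:n]  # no tags picked: fall back to popular items
        per_tag = max(1, n // len(interests))
        recalled = []
        for tag in interests:
            tag_items = [i for i, f in self.item_features.items()
                         if tag in f.get('tags', []) and i not in recalled]
            recalled.extend(tag_items[:per_tag])
        return recalled[:n]

    def _content_similar_items(self, item_info, n):
        """Items similar in content features (category, tags, price range)."""
        return self.popular_items[:n]  # simplified placeholder
```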
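`handle_new_item` only flags the item; the boost itself is applied in the re-ranking layer. A minimal sketch, assuming the re-ranker receives `(item_id, score)` pairs and knows each item's launch date, and applying the 2.0 boost as a flat multiplier during the 7-day window; `apply_cold_start_boost` and its parameters are hypothetical:

```python
from datetime import date, timedelta


def apply_cold_start_boost(scored_items, launch_dates, today,
                           boost=2.0, window_days=7):
    """Multiply the scores of items launched within the window by `boost`, then re-sort."""
    boosted = []
    for item_id, score in scored_items:
        launched = launch_dates.get(item_id)
        is_new = launched is not None and today - launched < timedelta(days=window_days)
        boosted.append((item_id, score * boost if is_new else score))
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


launch_dates = {"new_item": date(2024, 1, 8), "old_item": date(2023, 12, 1)}
ranked = apply_cold_start_boost(
    [("old_item", 0.6), ("new_item", 0.4), ("other", 0.5)],
    launch_dates,
    today=date(2024, 1, 10),
)
print(ranked)  # [('new_item', 0.8), ('old_item', 0.6), ('other', 0.5)]
```

The step function matches the "first 7 days doubled" rule; a boost that decays smoothly toward 1.0 over the window is a common alternative that avoids a sudden drop on day 8.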
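5.2 The Core Cold-Start Dilemma
The cold-start dilemma is circular: without data there is no personalization, yet recommendations are how the data gets collected in the first place. The ways out are:
- New users: "break the ice" with non-personalized strategies (popular items, onboarding choices), then move to personalization as soon as some behavior has accumulated
- New items: content features guarantee they can be recalled, and an exposure boost speeds up the collection of interaction data
- Key metrics: the click-through rate of a new user's first recommendations (cold-start quality) and the time from a new item's first exposure to its first purchase (cold-start speed)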
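The routing into `handle_new_user`'s levels and the two key metrics each fit in a few lines. In this sketch the threshold of 5 actions before switching to collaborative filtering is a hypothetical value, as are the function names and the `(hour_offset, action)` event format:

```python
def cold_start_level(n_actions, has_demographics, has_interest_tags,
                     min_actions_for_cf=5):
    """Pick the most informative cold-start level the user qualifies for (0-3)."""
    if n_actions >= min_actions_for_cf:
        return 3  # enough behavior: hand off to collaborative filtering
    if has_interest_tags:
        return 2  # onboarding choices available
    if has_demographics:
        return 1  # demographic group only
    return 0      # nothing known: popular items with diversity


def first_recommendation_ctr(impressions, clicks):
    """Cold-start quality: CTR on new users' first recommendation page."""
    return clicks / impressions if impressions else 0.0


def hours_to_first_purchase(events):
    """Cold-start speed for one new item; events are (hour_offset, action) pairs."""
    exposures = [t for t, action in events if action == "impression"]
    purchases = [t for t, action in events if action == "purchase"]
    if not exposures or not purchases:
        return None  # never exposed, or not purchased yet
    return min(purchases) - min(exposures)


print(cold_start_level(n_actions=2, has_demographics=True, has_interest_tags=True))  # 2
print(hours_to_first_purchase([(2, "impression"), (0, "impression"), (30, "purchase")]))  # 30
```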
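6. The Real-Time Feature Pipeline
6.1 The End-to-End Real-Time Recommendation Path
In a recommender, "real time" does not mean training the model in real time; it means updating features in real time. If a user clicked an electronics item 5 minutes ago, the next recommendation request should already reflect that action: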
[Figure: the real-time pipeline. User actions (clicks, add-to-cart) → Kafka message queue → feature computation engine → online Feature Store (Redis) → read in real time by the ranking model → recommendations within 200 ms. An offline Feature Store (Parquet) feeds daily/weekly model training.]
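The shape of this pipeline can be simulated in a single process: a queue stands in for the Kafka topic, a worker thread plays the feature computation engine, and a dict stands in for the Redis online store. All three stand-ins are assumptions for illustration; a real deployment would use a Kafka consumer and a Redis client:

```python
import queue
import threading
from collections import defaultdict, deque


def run_feature_pipeline(raw_events, window=50):
    """Consume action events and keep each user's category preference up to date."""
    topic = queue.Queue()   # stand-in for a Kafka topic
    online_store = {}       # stand-in for the Redis online store
    recent = defaultdict(lambda: deque(maxlen=window))  # sliding window per user

    def feature_engine():
        while True:
            event = topic.get()
            if event is None:  # sentinel: the producer is done
                break
            user_id = event["user_id"]
            recent[user_id].append(event["category"])
            cats = list(recent[user_id])
            online_store[f"user:{user_id}:cat_preference"] = {
                c: cats.count(c) / len(cats) for c in set(cats)
            }

    worker = threading.Thread(target=feature_engine)
    worker.start()
    for event in raw_events:  # producer side: user actions arriving
        topic.put(event)
    topic.put(None)
    worker.join()
    return online_store


events = [
    {"user_id": 42, "category": "phone"},
    {"user_id": 42, "category": "book"},
    {"user_id": 42, "category": "phone"},
]
print(run_feature_pipeline(events)["user:42:cat_preference"])
print(run_feature_pipeline(events, window=2)["user:42:cat_preference"])
```

`deque(maxlen=...)` gives the fixed-length recent-action window for free; with `window=2` the oldest "phone" event falls out and the preference becomes an even split.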
```python
import json
import time
from collections import defaultdict

import numpy as np


class RealtimeFeatureUpdater:
    """Real-time feature updates: from user action to ranking feature in < 200 ms."""

    def __init__(self, redis_client=None):
        self.redis = redis_client  # online feature store, e.g. a redis-py client
        self.feature_buffer = defaultdict(list)  # local buffer

    def process_user_action(self, user_id, action):
        """Process one user-action event."""
        start_time = time.perf_counter()
        # 1. Parse the event
        event = {
            'user_id': user_id,
            'item_id': action['item_id'],
            'action_type': action['type'],  # view/click/cart/purchase
            'timestamp': action['timestamp'],
            'item_category': action.get('category', ''),
            'item_price': action.get('price', 0),
        }
        # 2. Update the real-time features
        # 2a. The user's recent action sequence (last 50 events)
        self._update_recent_actions(user_id, event)
        # 2b. The user's category preference distribution (sliding window)
        self._update_category_preference(user_id, event)
        # 2c. The user's price preference (moving average)
        self._update_price_preference(user_id, event)
        # 3. Write to the online store
        self._write_to_online_store(user_id)
        elapsed = time.perf_counter() - start_time
        return {'status': 'ok', 'elapsed_ms': elapsed * 1000}

    def _update_recent_actions(self, user_id, event):
        """Update the user's recent action sequence, keeping the last 50 events."""
        key = f'user:{user_id}:recent_actions'
        self.feature_buffer[key].append(event)
        if len(self.feature_buffer[key]) > 50:
            self.feature_buffer[key] = self.feature_buffer[key][-50:]

    def _update_category_preference(self, user_id, event):
        """Update the user's category preference distribution."""
        if event['item_category']:
            key = f'user:{user_id}:cat_preference'
            # Sliding window = the buffered recent actions (at most 50)
            recent = self.feature_buffer[f'user:{user_id}:recent_actions']
            cat_counts = defaultdict(int)
            for a in recent:
                if a.get('item_category'):
                    cat_counts[a['item_category']] += 1
            total = sum(cat_counts.values()) or 1
            self.feature_buffer[key] = {cat: count / total
                                        for cat, count in cat_counts.items()}

    def _update_price_preference(self, user_id, event):
        """Update the user's price preference (moving average over the last 30 actions)."""
        key = f'user:{user_id}:price_preference'
        recent = self.feature_buffer[f'user:{user_id}:recent_actions']
        prices = [a['item_price'] for a in recent[-30:] if a.get('item_price')]
        if prices:
            self.feature_buffer[key] = {
                'avg': float(np.mean(prices)),
                'std': float(np.std(prices)) if len(prices) > 1 else 0.0,
            }

    def _write_to_online_store(self, user_id):
        """Write to the Redis online store, which the ranking model reads directly."""
        if self.redis is None:
            return  # no client configured: features stay in the local buffer
        for key in [f'user:{user_id}:recent_actions',
                    f'user:{user_id}:cat_preference',
                    f'user:{user_id}:price_preference']:
            if key in self.feature_buffer:
                # Values are lists/dicts, so store them as JSON strings
                self.redis.set(key, json.dumps(self.feature_buffer[key], default=str))

    def get_online_features(self, user_id, item_ids):
        """Read online features for ranking; this must finish within 50 ms.

        item_ids is unused here: this example reads user-side features only.
        """
        start_time = time.perf_counter()
        user_features = {}
        for key_prefix in ['recent_actions', 'cat_preference', 'price_preference']:
            key = f'user:{user_id}:{key_prefix}'
            if key in self.feature_buffer:
                user_features[key_prefix] = self.feature_buffer[key]
        elapsed = time.perf_counter() - start_time
        return user_features, elapsed * 1000
```
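6.2 Latency Budget for Real-Time Feature Updates
The total latency budget for ranking is 100ms, split as follows:
- Feature reads: 30ms (a single Redis read takes 1~5ms; a batch read of 50 features takes about 30ms)
- Model inference: 50ms (one XGBoost/LambdaMART prediction takes about 1ms, so scoring 100 candidates one at a time takes about 100ms, which exceeds the budget and calls for batching)
- Post-processing (re-ranking + filtering): 20ms
Key optimization: batched inference. For one user's 100 candidate items, call model.predict(batch) once instead of scoring each item separately. This can cut inference time from 100ms to 5ms.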
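The batching argument can be shown with a minimal NumPy sketch. `LinearScorer` is a hypothetical stand-in for an XGBoost/LambdaMART model; the scikit-learn-style LightGBM and XGBoost rankers also expose a `predict` that accepts a 2-D feature matrix. The point is that one call over all 100 candidates gives the same scores as 100 separate calls, and the per-call overhead is paid only once.

```python
import numpy as np


class LinearScorer:
    """Hypothetical stand-in for a ranking model: score = X @ w + b."""

    def __init__(self, weights, bias=0.0):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def predict(self, X):
        # Accepts a 2-D array (n_candidates, n_features) and scores every row at once
        return np.atleast_2d(X) @ self.weights + self.bias


def score_one_by_one(model, X):
    """Per-candidate inference: one predict() call per row (the slow path)."""
    return np.array([model.predict(row[None, :])[0] for row in X])


def score_batched(model, X):
    """Batched inference: a single predict() call for all candidates of one user."""
    return model.predict(X)


rng = np.random.default_rng(0)
candidates = rng.normal(size=(100, 16))  # 100 candidates x 16 features for one user
model = LinearScorer(rng.normal(size=16), bias=0.1)

per_item = score_one_by_one(model, candidates)
batched = score_batched(model, candidates)
top10 = np.argsort(-batched)[:10]  # indices of the 10 highest-scoring candidates
```

Both paths return identical scores. Only the number of model calls differs (100 vs 1), and that per-call overhead is what dominates per-item inference in practice.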
7. A/B Test Design
7.1 A/B Testing Framework for Recommender Systems
```python
import hashlib
from collections import defaultdict

import numpy as np
from scipy import stats


class RecommendationABTest:
    """A/B testing framework for a recommender system"""

    def __init__(self, experiment_id, variants):
        self.experiment_id = experiment_id
        self.variants = variants  # {'control': model_v1, 'treatment': model_v2}
        self.results = defaultdict(list)  # e.g. 'control_ctr' -> list of per-user CTRs

    def assign_variant(self, user_id):
        """User-level hash bucketing: a given user always lands in the same group"""
        hash_key = f"{self.experiment_id}_{user_id}"
        hash_val = int(hashlib.md5(hash_key.encode()).hexdigest(), 16)
        variant_index = hash_val % len(self.variants)
        return list(self.variants.keys())[variant_index]

    def calculate_sample_size(self, baseline_rate, mde, alpha=0.05, power=0.8):
        """
        Required sample size per group (two-sided test on two proportions)
        baseline_rate: CTR of the control group
        mde: minimum detectable effect, as a relative lift (0.05 = +5%)
        """
        p1 = baseline_rate
        p2 = baseline_rate * (1 + mde)  # expected treatment CTR
        z_alpha = stats.norm.ppf(1 - alpha / 2)
        z_beta = stats.norm.ppf(power)
        p_avg = (p1 + p2) / 2
        n = ((z_alpha * np.sqrt(2 * p_avg * (1 - p_avg)) +
              z_beta * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / \
            (p2 - p1) ** 2
        return int(np.ceil(n))

    def analyze_results(self, metric='ctr'):
        """Statistical significance test on per-user metric values"""
        control_values = np.asarray(self.results[f'control_{metric}'], dtype=float)
        treatment_values = np.asarray(self.results[f'treatment_{metric}'], dtype=float)
        # Two-sample t-test
        t_stat, p_value = stats.ttest_ind(control_values, treatment_values)
        control_mean = control_values.mean()
        treatment_mean = treatment_values.mean()
        # Effect size (Cohen's d, pooled from sample variances)
        mean_diff = treatment_mean - control_mean
        pooled_std = np.sqrt((control_values.var(ddof=1) + treatment_values.var(ddof=1)) / 2)
        cohens_d = mean_diff / pooled_std if pooled_std > 0 else 0
        # Relative lift
        relative_lift = mean_diff / control_mean if control_mean > 0 else 0
        return {
            'control_mean': control_mean,
            'treatment_mean': treatment_mean,
            'absolute_lift': mean_diff,
            'relative_lift': relative_lift,
            'p_value': p_value,
            'significant': p_value < 0.05,
            'cohens_d': cohens_d,
        }

    def check_simpson_paradox(self, stratified_results):
        """
        Simpson's paradox check: the pooled comparison points one way while every
        subgroup points the other way, so the conclusion flips; analyze by segment.
        stratified_results: {subgroup: {'control': [...], 'treatment': [...]}}
        """
        if not stratified_results:
            return {'paradox_detected': False, 'recommendation': 'No subgroups supplied'}
        subgroup_treatment_wins = 0
        subgroup_control_wins = 0
        all_control, all_treatment = [], []
        for data in stratified_results.values():
            all_control.extend(data['control'])
            all_treatment.extend(data['treatment'])
            if np.mean(data['treatment']) > np.mean(data['control']):
                subgroup_treatment_wins += 1
            else:
                subgroup_control_wins += 1
        # Pooled comparison: what a naive "overall" readout would show
        overall_treatment_better = bool(np.mean(all_treatment) > np.mean(all_control))
        n_groups = len(stratified_results)
        paradox_detected = (
            (overall_treatment_better and subgroup_control_wins == n_groups) or
            (not overall_treatment_better and subgroup_treatment_wins == n_groups)
        )
        return {
            'paradox_detected': paradox_detected,
            'overall_treatment_better': overall_treatment_better,
            'subgroup_treatment_wins': subgroup_treatment_wins,
            'subgroup_control_wins': subgroup_control_wins,
            'recommendation': ('Analyze each segment separately; do not rely on the pooled total'
                               if paradox_detected else 'Pooled conclusion is consistent with the segments'),
        }
```
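`assign_variant` splits users evenly across variants, but a gradual rollout usually needs an uneven split, such as 10% treatment. The sketch below shows weighted hash bucketing under that assumption. `assign_bucket` and the 10,000-bucket granularity are hypothetical choices. Putting `experiment_id` into the hash key reshuffles users independently for each experiment.

```python
import hashlib


def assign_bucket(experiment_id, user_id, allocation):
    """
    Weighted, deterministic user-level split (hypothetical helper).
    allocation: ordered {variant: share}, shares summing to 1,
                e.g. {'control': 0.9, 'treatment': 0.1}
    """
    digest = hashlib.md5(f"{experiment_id}_{user_id}".encode()).hexdigest()
    # Map the hash onto 10,000 buckets so shares can be set in 0.01% steps
    bucket = int(digest, 16) % 10_000
    cumulative = 0.0
    for variant, share in allocation.items():
        cumulative += share
        if bucket < cumulative * 10_000:
            return variant
    return list(allocation)[-1]  # guard against floating-point rounding in the shares


allocation = {'control': 0.9, 'treatment': 0.1}
variant = assign_bucket('exp_rerank_v2', 12345, allocation)  # same answer on every call
```

Because the bucket depends only on the hash, widening a rollout from 10% to 20% (by changing the shares) keeps the original 10% of users in treatment. That keeps each user's experience stable.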
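7.2 Choosing Metrics for A/B Tests
A/B tests for a recommender cannot look at CTR alone, because a higher CTR does not mean a better user experience:
| Category | Metric | What it measures | Caveats |
|---|---|---|---|
| Core business | CTR | Recommendation accuracy | Easily gamed by clickbait |
| Core business | Conversion rate (CVR) | Recommendation → purchase funnel | Longer feedback delay; needs a larger sample |
| Core business | GMV | Commercial value | Confounded by price effects |
| User experience | Diversity | Number of categories covered by recommendations | High diversity may lower CTR |
| User experience | Freshness | Share of exposures going to new items | Too high and quality drops |
| User experience | Coverage | Utilization of the candidate item pool | Too low = filter bubble |
| Long-term | Retention | Long-term effect of recommendations on stickiness | Needs a long experiment (2~4 weeks) |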
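One way to act on this table is a launch rule: first require a significant win on the primary metric, then check guardrail metrics for unacceptable drops. The sketch is hypothetical. `ship_decision`, the metric names, and the tolerance values are illustrative, and the p-value is assumed to come from an analysis such as `analyze_results`.

```python
def ship_decision(primary, guardrails, alpha=0.05):
    """
    Hypothetical multi-metric launch rule (not from the article).
    primary:    {'relative_lift': float, 'p_value': float} for the primary metric, e.g. GMV or CVR
    guardrails: {metric_name: (relative_change, max_allowed_drop)},
                e.g. {'diversity': (-0.01, 0.02)} means diversity fell 1% and a 2% drop is tolerated
    Returns ('ship' | 'no_ship', list_of_reasons).
    """
    if primary['p_value'] >= alpha or primary['relative_lift'] <= 0:
        return 'no_ship', ['primary metric not significantly improved']
    violations = [
        f"{name}: {change:+.1%} exceeds the allowed drop of {max_drop:.0%}"
        for name, (change, max_drop) in guardrails.items()
        if change < -max_drop
    ]
    return ('no_ship', violations) if violations else ('ship', [])


decision, reasons = ship_decision(
    {'relative_lift': 0.02, 'p_value': 0.01},  # GMV +2%, significant
    {'diversity': (-0.05, 0.02), 'retention': (0.0, 0.01)},
)
# decision == 'no_ship'; reasons == ['diversity: -5.0% exceeds the allowed drop of 2%']
```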
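Guarding against Simpson's paradox: suppose a new model lifts overall CTR by 2%, but broken down by segment, CTR for highly active users drops 3% while CTR for low-activity users rises 5%. An experiment that looks like a "success" actually hurts the core user base. Strictly speaking, this example shows segments moving in opposite directions. Simpson's paradox proper is the extreme case in which every segment moves against the aggregate, which can happen when the segment mix (for example, impressions per segment) differs between the groups. Either way, A/B results must be analyzed by user segment.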
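Below is a small numeric sketch of Simpson's paradox in the strict sense, using hypothetical click and impression counts. Control wins in both segments, yet the pooled CTR favors treatment. This happens because the treatment arm's impressions are concentrated in the high-CTR segment.

```python
def ctr(clicks, impressions):
    """Click-through rate as clicks / impressions"""
    return clicks / impressions if impressions else 0.0


# Hypothetical (clicks, impressions) per user segment and experiment arm.
# The treatment shifts impressions toward the high-CTR segment.
data = {
    'high_activity': {'control': (160, 2000), 'treatment': (600, 8000)},
    'low_activity': {'control': (160, 8000), 'treatment': (36, 2000)},
}


def segment_report(data):
    """Per-segment CTR plus the pooled CTR of each arm"""
    per_segment = {}
    totals = {}
    for segment, arms in data.items():
        per_segment[segment] = {arm: ctr(c, n) for arm, (c, n) in arms.items()}
        for arm, (clicks, impressions) in arms.items():
            prev_c, prev_n = totals.get(arm, (0, 0))
            totals[arm] = (prev_c + clicks, prev_n + impressions)
    pooled = {arm: ctr(c, n) for arm, (c, n) in totals.items()}
    return per_segment, pooled


per_segment, pooled = segment_report(data)
# per_segment: high_activity control 0.08 vs treatment 0.075; low_activity control 0.02 vs treatment 0.018
# pooled:      control 0.032 vs treatment 0.0636 -> the aggregate verdict flips
```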
8. Recommendation System Monitoring
8.1 Four-Dimensional Monitoring Framework
```python
from collections import defaultdict

import numpy as np


class RecommendationMonitor:
    """Four-dimensional monitoring for a recommender system"""

    def __init__(self):
        self.metrics = defaultdict(list)

    def coverage_rate(self, recommended_items, total_item_pool):
        """Coverage: share of the item pool recommended at least once"""
        pool = set(total_item_pool)
        if not pool:
            return 0.0
        recommended_set = set()
        for items in recommended_items:
            recommended_set.update(items)
        # Count only items that belong to the pool, so coverage stays within 0~1
        return len(recommended_set & pool) / len(pool)

    def freshness_rate(self, recommended_items, item_creation_dates,
                       current_date, new_threshold_days=7):
        """Freshness: share of exposures going to items created in the last 7 days"""
        total_new_exposed = 0
        total_exposed = 0
        for items in recommended_items:
            for item_id in items:
                total_exposed += 1
                if item_id in item_creation_dates:
                    age_days = (current_date - item_creation_dates[item_id]).days
                    if age_days <= new_threshold_days:
                        total_new_exposed += 1
        return total_new_exposed / total_exposed if total_exposed > 0 else 0

    def diversity_score(self, recommended_items, item_categories):
        """Diversity: evenness of the category distribution across all lists (normalized entropy)"""
        category_counts = defaultdict(int)
        for items in recommended_items:
            for item_id in items:
                if item_id in item_categories:
                    category_counts[item_categories[item_id]] += 1
        if len(category_counts) < 2:
            return 0.0  # zero or one category: no diversity (also avoids dividing by log2(1) = 0)
        total = sum(category_counts.values())
        probs = [count / total for count in category_counts.values()]
        # Shannon entropy: higher = more diverse
        entropy = -sum(p * np.log2(p) for p in probs if p > 0)
        return entropy / np.log2(len(category_counts))  # normalized to 0~1

    def business_metric_correlation(self, recommendation_metrics, business_metrics):
        """
        Link recommendation metrics to business metrics
        e.g. recommendation GMV share = GMV driven by recommendations / total GMV
        """
        correlation = np.corrcoef(recommendation_metrics, business_metrics)[0, 1]
        return correlation

    def health_report(self, recommended_items, total_pool,
                      item_categories, item_creation_dates, current_date):
        """Health report for the recommender"""
        coverage = self.coverage_rate(recommended_items, total_pool)
        freshness = self.freshness_rate(recommended_items, item_creation_dates,
                                        current_date)
        diversity = self.diversity_score(recommended_items, item_categories)
        # Health thresholds
        alerts = []
        if coverage < 0.8:
            alerts.append(f"⚠ Coverage {coverage:.1%} < 80% → filter-bubble risk")
        if freshness < 0.05:
            alerts.append(f"⚠ Freshness {freshness:.1%} < 5% → too little exposure for new items")
        if diversity < 0.6:
            alerts.append(f"⚠ Diversity {diversity:.1%} < 60% → categories too concentrated")
        return {
            'coverage': coverage,
            'freshness': freshness,
            'diversity': diversity,
            'alerts': alerts,
            'overall_health': 'healthy' if not alerts else 'attention_needed',
        }
```
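8.2 The "North Star Metric" of Recommendation Monitoring
Coverage, freshness, and diversity are system-level monitoring metrics. Their job is to keep the recommender from drifting off course. The north star metric for judging whether the recommender is actually good is the recommendation GMV share (GMV driven by recommendations / total GMV). If the recommendation GMV share keeps rising, the recommendations are genuinely creating commercial value. If CTR rises but the GMV share stays flat, the recommender is just "spinning its wheels".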
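Below is a minimal sketch of the north star metric and the "spinning" check. The article does not specify how orders are attributed to recommendations (for example, last click within N days). Here `source` is a hypothetical, already-attributed field, and the `tol` threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Order:
    amount: float
    source: str  # hypothetical attribution tag, e.g. 'recommendation', 'search', 'direct'


def recommendation_gmv_share(orders):
    """Recommendation GMV share = GMV attributed to recommendations / total GMV."""
    total = sum(o.amount for o in orders)
    rec = sum(o.amount for o in orders if o.source == 'recommendation')
    return rec / total if total > 0 else 0.0


def is_spinning(ctr_before, ctr_after, share_before, share_after, tol=0.005):
    """Flag the 'CTR up, GMV share flat' pattern (tol is an assumed threshold)."""
    return ctr_after > ctr_before and abs(share_after - share_before) <= tol


orders = [Order(100, 'recommendation'), Order(300, 'search'), Order(100, 'recommendation')]
share = recommendation_gmv_share(orders)  # 200 / 500 = 0.4
```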
9. Complete Hands-On Architecture
9.1 End-to-End MovieLens Recommendation Pipeline
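```python
from collections import defaultdict

import numpy as np
import pandas as pd

# MultiChannelRetrieval, RecommendationFeatureBuilder, LambdaMARTRanker,
# RecommendationMonitor and the cold-start handler are assumed to be the
# classes built in the earlier sections of this article.


class ECommerceRecommendationSystem:
    """End-to-end e-commerce recommender, demonstrated on MovieLens data"""

    def __init__(self):
        self.retrieval = MultiChannelRetrieval(n_items=1000, n_users=500)
        self.feature_builder = RecommendationFeatureBuilder()
        self.ranker = None
        self.cold_start = None  # plug in the tiered cold-start handler from Section 5
        self.monitor = RecommendationMonitor()
        self.item_features = {}

    def load_movielens_data(self, ratings_path, movies_path):
        """Load MovieLens data (ratings.csv / movies.csv)"""
        ratings = pd.read_csv(ratings_path)
        movies = pd.read_csv(movies_path)
        # User -> interacted items (groupby is much faster than iterrows)
        user_items = defaultdict(list, ratings.groupby('userId')['movieId'].apply(list).to_dict())
        # Item content features: first genre as the category, all genres as tags
        item_features = {}
        for row in movies.itertuples(index=False):
            genres = row.genres.split('|')
            item_features[row.movieId] = {
                'category': genres[0],
                'tags': genres,
                'title': row.title,
            }
        self.item_features = item_features
        return ratings, movies, user_items, item_features

    def build_and_train(self, user_items, item_features, ratings):
        """Full training flow"""
        self.item_features = item_features
        n_items = len(item_features)
        # Phase 1: multi-channel retrieval
        print("[Phase 1] Multi-channel retrieval...")
        # Item-CF needs an item-item similarity matrix. Random values are a placeholder;
        # in practice compute cosine_similarity on the user-item rating matrix.
        item_sim = np.random.rand(n_items, n_items)
        self.retrieval.item_cf_recall(user_items, item_sim, top_k=200)
        # Popularity channel (placeholder scores; in practice use rating counts)
        self.retrieval.popular_recall(np.random.rand(n_items), top_k=100)
        # Phase 2: merge and deduplicate
        print("[Phase 2] Merging channels...")
        candidates = self.retrieval.merge_and_dedup(top_k=500)
        # Phase 3: feature construction + ranker training
        print("[Phase 3] Training the ranker...")
        # _build_training_data (not shown) must return features, labels and group sizes
        X_train, y_train, groups = self._build_training_data(
            candidates, user_items, item_features, ratings
        )
        self.ranker = LambdaMARTRanker()
        self.ranker.train(X_train, y_train, groups)
        # Phase 4: re-ranking policy configuration
        print("[Phase 4] System ready")
        return self

    def recommend(self, user_id, n=30):
        """Full flow: retrieval -> ranking -> re-ranking"""
        # 1. Multi-channel retrieval -> merge and deduplicate
        candidates = self.retrieval.merge_and_dedup(top_k=500)
        if user_id not in candidates:
            if self.cold_start is None:
                return []
            return self.cold_start.handle_new_user(user_id, level=0)
        user_candidates = candidates[user_id]  # {item_id: recall_score}
        item_ids = list(user_candidates)
        # 2. Ranking features
        features = []
        for item_id in item_ids:
            feature = self.feature_builder.build_full_features(
                {},  # user features (read from the online store)
                self.item_features.get(item_id, {}),
                {},  # cross features
                {},  # context features
            )
            feature['recall_score'] = user_candidates[item_id]  # recall score doubles as a ranking feature
            features.append(feature)
        # 3. Score with the ranking model
        X = np.array([list(f.values()) for f in features])
        scores = self.ranker.model.predict(X)
        # 4. Re-rank: diversity injection + business rules.
        #    Re-rank from a pool wider than n, and map row indices back to item IDs.
        top_idx = np.argsort(scores)[::-1][:n * 3]
        score_map = {item_ids[i]: float(scores[i]) for i in top_idx}
        return self._rerank_with_diversity(score_map, diversity_ratio=0.3, n=n)

    def _rerank_with_diversity(self, score_map, diversity_ratio, n):
        """
        Re-ranking: inject diversity into the ranked results.
        Algorithm: a variant of MMR (Maximal Marginal Relevance). Each pick balances
        relevance against difference from already-selected items (category level).
        """
        if not score_map:
            return []
        # Min-max normalize ranker scores so relevance and the diversity bonus share a 0~1 scale
        lo, hi = min(score_map.values()), max(score_map.values())
        span = (hi - lo) or 1.0
        relevance = {item: (s - lo) / span for item, s in score_map.items()}
        remaining = sorted(score_map, key=score_map.get, reverse=True)
        selected = [remaining.pop(0)]  # the highest-scored item goes first
        selected_categories = {self.item_features.get(selected[0], {}).get('category', '')}
        while remaining and len(selected) < n:
            best_idx, best_score = 0, -np.inf
            for i, item_id in enumerate(remaining):
                # Difference from selected items (category dimension)
                category = self.item_features.get(item_id, {}).get('category', '')
                diversity_bonus = 1.0 if category not in selected_categories else 0.2
                # MMR combined score
                mmr_score = (1 - diversity_ratio) * relevance[item_id] + \
                    diversity_ratio * diversity_bonus
                if mmr_score > best_score:
                    best_idx, best_score = i, mmr_score
            chosen = remaining.pop(best_idx)
            selected.append(chosen)
            selected_categories.add(self.item_features.get(chosen, {}).get('category', ''))
        return selected
```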
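The `_rerank_with_diversity` variant above approximates similarity with a same-category bonus. For comparison, here is a sketch of standard MMR, which penalizes each candidate by its maximum cosine similarity to the items already selected. Item embeddings are an assumption, since the pipeline above does not define them, and `lam=0.7` is an arbitrary default.

```python
import numpy as np


def mmr_rerank(relevance, embeddings, k, lam=0.7):
    """
    Standard MMR: repeatedly pick argmax  lam * rel(i) - (1 - lam) * max_{j in S} sim(i, j).
    relevance: (n,) ranker scores; embeddings: (n, d) item vectors (assumed available).
    Returns indices of the k selected items in selection order.
    """
    relevance = np.asarray(relevance, dtype=float)
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
    sim = emb @ emb.T  # cosine similarity matrix
    selected = [int(np.argmax(relevance))]
    candidates = set(range(len(relevance))) - set(selected)
    while candidates and len(selected) < k:
        cand = np.array(sorted(candidates))
        redundancy = sim[np.ix_(cand, selected)].max(axis=1)  # closest already-selected item
        mmr = lam * relevance[cand] - (1 - lam) * redundancy
        best = int(cand[np.argmax(mmr)])
        selected.append(best)
        candidates.remove(best)
    return selected


# Items 0 and 1 are near-duplicates; item 2 is less relevant but different
order = mmr_rerank([0.9, 0.85, 0.5], [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]], k=3)
```

With `lam=1.0` the result is a pure relevance sort. At 0.7 the near-duplicate item 1 is pushed behind item 2.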
9.2 End-to-End Pipeline Validation
```python
import numpy as np
import pandas as pd


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG@k; the ideal DCG assumes all relevant items are ranked first"""
    relevant = set(relevant)
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def validate_recommendation_system(system, test_data, ground_truth):
    """End-to-end validation of the recommender pipeline"""
    metrics = {}
    # Retrieval output does not depend on the user, so compute it once
    candidates = system.retrieval.merge_and_dedup(top_k=500)

    # 1. Retrieval layer: recall
    recall_hits = 0
    total_relevant = 0
    for user_id, relevant_items in ground_truth.items():
        if user_id in candidates:
            recalled_items = set(candidates[user_id].keys())
            recall_hits += len(recalled_items & set(relevant_items))
        total_relevant += len(relevant_items)
    metrics['recall_rate'] = recall_hits / total_relevant if total_relevant > 0 else 0

    # 2. Ranking layer: NDCG@10 over every user with ground truth
    #    (users with zero hits count as 0 instead of being skipped)
    ndcg_values = [
        ndcg_at_k(system.recommend(user_id, n=10), relevant_items, k=10)
        for user_id, relevant_items in ground_truth.items()
        if relevant_items
    ]
    metrics['ndcg_at_10'] = float(np.mean(ndcg_values)) if ndcg_values else 0.0

    # 3. Monitoring metrics on a sample of users that actually exist in the ground truth
    sample_users = list(ground_truth)[:100]
    all_recommended = [system.recommend(uid, n=30) for uid in sample_users]
    health = system.monitor.health_report(
        all_recommended,
        total_pool=range(len(test_data)),
        item_categories={},
        item_creation_dates={},
        current_date=pd.Timestamp.now(),
    )
    metrics['monitor'] = health
    print(f"Recall: {metrics['recall_rate']:.3f}")
    print(f"NDCG@10: {metrics['ndcg_at_10']:.3f}")
    print(f"System health: {metrics['monitor']['overall_health']}")
    return metrics
```
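10. Common Pitfalls vs. Minimum Viable Solutions
| Stage | Common pitfall | Minimum viable solution |
|---|---|---|
| Retrieval | Single-channel retrieval only → low coverage | Three channels (CF + Content + Popular), weighted merge |
| Retrieval | Too many candidates → ranking times out | Top 200 per channel, Top 500 after merging |
| Ranking | group_sizes not passed → the model cannot see the ranking structure | Always pass LightGBM's group parameter |
| Ranking | Only user/item features → poor ranking discrimination | Add cross features (category preference × item category) |
| Re-ranking | No diversity → filter bubble | MMR re-ranking, diversity_ratio=0.3 |
| Cold start | New users see a blank page | Tiered cold start: popular → cohort-based → guided onboarding → personalized |
| Real time | User actions not reflected in real time | Kafka + Feature Store + Redis, end to end < 200ms |
| A/B | CTR only → Simpson's paradox | Segment-level analysis + multiple metrics (CTR/GMV/diversity) |
| Monitoring | Train the model and forget it | Coverage/freshness/diversity, linked with GMV share |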
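The retrieval rows of the table (three channels, weighted merge, Top 200 per channel, Top 500 after merging) can be sketched as follows. This is one common merge scheme (a sum of weighted scores) and not necessarily the exact one from Section 2. The channel weights, and the assumption that per-channel scores are already on a comparable 0~1 scale, are illustrative.

```python
from collections import defaultdict


def weighted_merge(channel_results, channel_weights, per_channel_k=200, top_k=500):
    """
    Merge several recall channels for one user.
    channel_results: {channel: {item_id: score}}, scores assumed comparable (e.g. normalized to 0~1)
    channel_weights: {channel: weight}, hypothetical values to be tuned offline
    An item hit by several channels accumulates weight, which also deduplicates it.
    """
    merged = defaultdict(float)
    for channel, scores in channel_results.items():
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:per_channel_k]
        for item_id, score in top:
            merged[item_id] += channel_weights.get(channel, 0.0) * score
    return dict(sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:top_k])


results = {'cf': {'a': 1.0, 'b': 0.5}, 'content': {'b': 1.0, 'c': 0.8}, 'popular': {'d': 1.0}}
weights = {'cf': 0.5, 'content': 0.3, 'popular': 0.2}
merged = weighted_merge(results, weights)  # 'b' is found by two channels and ranks first
```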
[Figure: overall architecture. End-to-end recommendation pipeline: multi-channel retrieval → merge and deduplicate → feature construction → ranking model → diversity re-ranking → recommendations. Monitoring: coverage > 80%, freshness > 5%, diversity > 60%, GMV share. Operations: A/B testing with segment-level analysis, tiered cold-start strategy, real-time feature updates (Kafka + Redis).]
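Summary
An end-to-end recommender is not "train a ranking model"; it is "run a four-layer funnel architecture". Each layer has one job:
- The retrieval layer makes sure good items are not missed (complementary channels, weighted merging).
- The ranking layer puts good items at the top (cross features are the core signal).
- The re-ranking layer keeps recommendations diverse (MMR balances relevance against difference).
- The monitoring system keeps the whole thing on course (coverage/freshness/diversity, linked with GMV share).

This article builds on three earlier ones:
- The article on recommender-system fundamentals covered the algorithms behind collaborative filtering and matrix factorization; this article is the engineering step from algorithm to system.
- The Learning to Rank article covered LambdaMART's three-layer architecture; this article shows how it lands in recommendation ranking, in particular passing the group_sizes parameter correctly.
- The end-to-end project articles (Parts 1 and 2) established "full-pipeline thinking from requirements to monitoring"; this article applies that thinking to the recommender's four-layer funnel.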