## Table of Contents

- Background and Motivation
- Core Idea: Inner and Outer Products
- PNN vs DeepFM Comparison
- Model Architecture
- Common Interview Questions
- Model Evolution
- Implementation Example (CTR Prediction)
- Quick Checklist
- References
## Background and Motivation

### Limitations of DeepFM

DeepFM's structure:

```
y = FM(x) + DNN(Embedding(x))
```

Problems:
- The FM component computes all pairwise feature interactions on its own
- The DNN component consumes the raw embeddings directly
- The two branches share no information

### PNN's Innovation

Explicit feature interactions feeding a deep network:

```
y = Linear(x) + DNN([Embedding(x), Product_Layer(Embedding(x))])
                                   ↑
                     inner/outer product interactions
```

Advantages:
- Computes feature interactions explicitly (inner/outer products)
- Feeds the interaction signal into the DNN (rather than a separate FM branch)
- The DNN can learn richer feature representations
## Core Idea: Inner and Outer Products

### IPNN (Inner Product)

Inner product of two vectors v_i, v_j:

```
v_i · v_j = Σ_k v_i[k] × v_j[k]
```

Result: a scalar (a single number).

Example:

```
v_1 = [0.2, 0.5, 0.8]
v_2 = [0.3, 0.1, 0.6]

inner product = 0.2×0.3 + 0.5×0.1 + 0.8×0.6
              = 0.06 + 0.05 + 0.48
              = 0.59
```
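The same arithmetic in PyTorch, as a quick sanity check (the vectors are just the example values above, not model parameters):

```python
import torch

v1 = torch.tensor([0.2, 0.5, 0.8])
v2 = torch.tensor([0.3, 0.1, 0.6])

# Inner product: elementwise multiply, then sum -> one scalar
print(torch.dot(v1, v2).item())  # ≈ 0.59
```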
### OPNN (Outer Product)

Outer product of two vectors v_i, v_j:

```
v_i ⊗ v_j: an embedding_dim × embedding_dim matrix
```

Result: a matrix, usually flattened into a vector.

Example:

```
v_1 = [0.2, 0.5, 0.8]
v_2 = [0.3, 0.1, 0.6]

outer product = [
    [0.2×0.3, 0.2×0.1, 0.2×0.6],   # row 1
    [0.5×0.3, 0.5×0.1, 0.5×0.6],   # row 2
    [0.8×0.3, 0.8×0.1, 0.8×0.6]    # row 3
]
= [
    [0.06, 0.02, 0.12],
    [0.15, 0.05, 0.30],
    [0.24, 0.08, 0.48]
]

flattened: [0.06, 0.02, 0.12, 0.15, 0.05, 0.30, 0.24, 0.08, 0.48]
```
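And the matching PyTorch check (again using only the example vectors above):

```python
import torch

v1 = torch.tensor([0.2, 0.5, 0.8])
v2 = torch.tensor([0.3, 0.1, 0.6])

# Outer product: a 3×3 matrix, then flattened to a 9-dim vector
outer = torch.outer(v1, v2)
print(outer)
print(outer.flatten())
```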
### IPNN vs OPNN

| Dimension | IPNN | OPNN |
|---|---|---|
| Computation | inner product v_i · v_j | outer product v_i ⊗ v_j |
| Interaction size per pair | scalar (1) | vector (k²) |
| Computational complexity | O(n²k) | O(n²k²) |
| Parameter count | fewer | more |
| Information retained | less | more |
| Suited for | many feature fields | few feature fields |
## PNN vs DeepFM Comparison

### Core Differences

| Dimension | DeepFM | PNN |
|---|---|---|
| Feature interaction | computed independently by FM | computed explicitly, then fed into the DNN |
| Information sharing | none | the DNN consumes the interaction signal |
| Interaction type | fixed (inner product) | selectable (inner/outer product) |
| Flexibility | medium | high |

### Architecture Comparison

DeepFM:

```
input features → Embedding → ┌─────┐
                             │ FM  │ → output 1
                             └─────┘
                             ┌─────┐
                             │ DNN │ → output 2
                             └─────┘
final output: output 1 + output 2
```

PNN:

```
input features → Embedding → ┌───────────────┐
                             │ Product layer │ → interaction features
                             └───────────────┘
                                      ↓
                    [Embedding, interaction features]
                                      ↓
                                  ┌─────┐
                                  │ DNN │ → output
                                  └─────┘
final output: Linear + DNN output
```
## Model Architecture

### Overall Structure

```
input features (discrete indices)
              ↓
        Embedding layer
              ↓
┌─────────────┴──────────────┐
↓                            ↓
raw embeddings          Product layer
(v_1, v_2, ...)         (inner/outer product interactions)
└─────────────┬──────────────┘
              ↓
concat: [embeddings, interactions]
              ↓
          DNN layers
              ↓
   linear part + DNN output
              ↓
        final output
```
### Component Details

**1. Embedding layer**

Maps discrete features to dense vectors:

```
user 123 → [0.23, 0.15, 0.08, ...]
ad 45    → [0.67, 0.32, 0.91, ...]
```
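A minimal lookup sketch (the field sizes here are assumptions, chosen only for illustration):

```python
import torch
import torch.nn as nn

user_emb = nn.Embedding(num_embeddings=1000, embedding_dim=8)  # 1000 user IDs
ad_emb = nn.Embedding(num_embeddings=500, embedding_dim=8)     # 500 ad IDs

user_vec = user_emb(torch.tensor([123]))  # (1, 8) dense vector for user 123
ad_vec = ad_emb(torch.tensor([45]))       # (1, 8) dense vector for ad 45
print(user_vec.shape, ad_vec.shape)
```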
**2. Product layer**

Both snippets below assume `embeddings` has shape `(batch_size, num_features, embedding_dim)`.

IPNN inner products:

```python
# Inner product for every feature pair: one scalar per pair
inner_products = []
for i in range(num_features):
    for j in range(i + 1, num_features):
        # v_i · v_j -> (batch_size,)
        inner_prod = torch.sum(
            embeddings[:, i, :] * embeddings[:, j, :],
            dim=1
        )
        inner_products.append(inner_prod)
```

OPNN outer products:

```python
# Outer product for every feature pair (via einsum), summed elementwise
outer_sum = torch.zeros(batch_size, embedding_dim * embedding_dim)
for i in range(num_features):
    for j in range(i + 1, num_features):
        # v_i ⊗ v_j -> (batch_size, d, d)
        outer = torch.einsum('bi,bj->bij',
                             embeddings[:, i, :],
                             embeddings[:, j, :])
        outer_sum += outer.view(batch_size, -1)
```
**3. DNN layer**

Input:

```
[raw embeddings, inner/outer product interactions]
```

The DNN then learns higher-level representations:

```
input → Linear → ReLU → BatchNorm → ... → output
```
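A minimal sketch of such a stack (the input and hidden sizes are arbitrary placeholders; the real input width is num_features × embedding_dim plus the number of interaction terms):

```python
import torch.nn as nn

dnn = nn.Sequential(
    nn.Linear(50, 64),   # 50 = assumed width of embeddings + interactions
    nn.ReLU(),
    nn.BatchNorm1d(64),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.BatchNorm1d(32),
    nn.Linear(32, 1),    # final score (logit)
)
```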
### Code Implementation

Complete, runnable IPNN and OPNN implementations (including a training loop) are given in the Implementation Example (CTR Prediction) section below.
## Common Interview Questions

### Q1: What is the core difference between PNN and DeepFM?

A:
- DeepFM: the FM and DNN branches compute independently, and their outputs are summed at the end
- PNN: feature interactions are computed explicitly (inner/outer products) and then fed into the DNN

### Q2: What is the difference between the inner and outer product?

A:
- Inner product (IPNN): v_i · v_j → a scalar (a single number)
- Outer product (OPNN): v_i ⊗ v_j → a k×k matrix, flattened into a k² vector

### Q3: What is PNN's computational complexity?

A:

```
IPNN: O(n²k)  - pairwise inner products
OPNN: O(n²k²) - pairwise outer products (much more expensive)

n: number of feature fields
k: embedding dimension
```
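For concreteness: with n = 20 fields and k = 8 there are C(20, 2) = 190 feature pairs, so IPNN does roughly 190 × 8 = 1,520 multiplications per sample, while OPNN does roughly 190 × 8² = 12,160.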
### Q4: When should you use IPNN vs OPNN?

A:

| Scenario | Recommendation | Reason |
|---|---|---|
| Few feature fields (n < 10) | OPNN | the outer product retains more information |
| Many feature fields (n > 10) | IPNN | inner products are cheaper to compute |
| Latency-sensitive | IPNN | faster inference |
| Accuracy-critical | OPNN | retains more information |
### Q5: How many parameters does PNN have?

A:

```
Embedding:     Σ(n_i × k), where n_i is the vocabulary size of field i
Linear:        n
DNN:           depends on hidden_dims
Product layer: no parameters in this simplified version (pure computation)
```

(In the original paper the product layer also has learnable projection weights; the implementation below feeds the raw products to the DNN directly.)
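Plugging in the example configuration used below (feature_dims = [1000, 500, 5, 4, 10], k = 8): the embeddings contribute (1000 + 500 + 5 + 4 + 10) × 8 = 12,152 parameters, the linear part 5 weights plus 1 bias, and the rest comes from the DNN.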
### Q6: How to choose between PNN and DCN?

A:

| Model | Strengths | Weaknesses |
|---|---|---|
| PNN | explicit, flexible interactions | expensive with many feature fields |
| DCN | efficient, explicit high-order crosses | the low-rank structure may lose information |

Guidelines:
- Need explicit inner/outer products → PNN
- Need efficient high-order interactions → DCN
## Model Evolution

```
FM (2010): second-order interactions
    ↓
PNN (2016) ⭐: explicit interactions (inner/outer products)
    ↓
DeepFM (2017): FM + DNN in parallel
    ↓
DCN (2017): Cross Network
    ↓
xDeepFM (2018): CIN for explicit high-order interactions
```
## Implementation Example (CTR Prediction)

```python
import torch
import torch.nn as nn


class IPNN(nn.Module):
    """
    Inner Product-based Neural Network (IPNN)

    Core idea: explicit feature interactions via inner products.
    1. Embedding layer: map discrete features to embedding vectors
    2. Inner product layer: compute v_i · v_j for every feature pair
    3. DNN: fuse the raw embeddings with the inner-product interactions
    """
    def __init__(self, feature_dims, embedding_dim=8, hidden_dims=[64, 32]):
        super().__init__()
        self.feature_dims = feature_dims
        self.num_features = len(feature_dims)
        self.embedding_dim = embedding_dim

        # Embedding layer
        self.embeddings = nn.ModuleList([
            nn.Embedding(dim, embedding_dim) for dim in feature_dims
        ])

        # First-order linear part
        self.linear = nn.Linear(self.num_features, 1)

        # DNN input dimension: raw embeddings + inner-product interactions
        # raw embeddings: num_features * embedding_dim
        # interactions:   C(n, 2) = n * (n - 1) / 2 feature pairs
        num_pairs = self.num_features * (self.num_features - 1) // 2
        dnn_input_dim = self.num_features * embedding_dim + num_pairs

        # DNN
        dnn_layers = []
        for hidden_dim in hidden_dims:
            dnn_layers.append(nn.Linear(dnn_input_dim, hidden_dim))
            dnn_layers.append(nn.ReLU())
            dnn_layers.append(nn.BatchNorm1d(hidden_dim))
            dnn_input_dim = hidden_dim
        dnn_layers.append(nn.Linear(dnn_input_dim, 1))
        self.dnn = nn.Sequential(*dnn_layers)

    def forward_inner_products(self, embeddings):
        """
        Compute the inner product of every feature pair.

        Args:
            embeddings: (batch_size, num_features, embedding_dim)
        Returns:
            inner_products: (batch_size, num_pairs)
        """
        inner_products = []
        for i in range(self.num_features):
            for j in range(i + 1, self.num_features):
                # v_i · v_j -> (batch_size, 1)
                prod = torch.sum(
                    embeddings[:, i, :] * embeddings[:, j, :],
                    dim=1,
                    keepdim=True
                )
                inner_products.append(prod)
        # Concatenate all pairs: (batch_size, num_pairs)
        return torch.cat(inner_products, dim=1)

    def forward(self, x):
        """
        Args:
            x: (batch_size, num_features) - discrete feature indices
        Returns:
            output: (batch_size, 1) - prediction score (logit)
        """
        batch_size = x.shape[0]

        # Embedding lookup
        embedded_features = []
        for i, emb in enumerate(self.embeddings):
            embedded_features.append(emb(x[:, i]))
        all_embeddings = torch.cat(embedded_features, dim=1)
        all_embeddings = all_embeddings.view(
            batch_size, self.num_features, self.embedding_dim
        )

        # First-order linear part
        linear_part = self.linear(x.float())

        # Inner-product interactions
        inner_products = self.forward_inner_products(all_embeddings)

        # Concatenate raw embeddings and interactions
        flatten_embeddings = all_embeddings.view(batch_size, -1)
        dnn_input = torch.cat([flatten_embeddings, inner_products], dim=1)

        # DNN
        dnn_output = self.dnn(dnn_input)

        # Fuse the outputs
        output = linear_part + dnn_output
        return output


class OPNN(nn.Module):
    """
    Outer Product-based Neural Network (OPNN)

    Core idea: explicit feature interactions via outer products.
    1. Embedding layer: map discrete features to embedding vectors
    2. Outer product layer: compute v_i ⊗ v_j for every feature pair
    3. DNN: fuse the raw embeddings with the outer-product interactions
    """
    def __init__(self, feature_dims, embedding_dim=8, hidden_dims=[64, 32],
                 reduce_sum=True):
        super().__init__()
        self.feature_dims = feature_dims
        self.num_features = len(feature_dims)
        self.embedding_dim = embedding_dim
        self.reduce_sum = reduce_sum

        # Embedding layer
        self.embeddings = nn.ModuleList([
            nn.Embedding(dim, embedding_dim) for dim in feature_dims
        ])

        # First-order linear part
        self.linear = nn.Linear(self.num_features, 1)

        # DNN input dimension
        if reduce_sum:
            # each pair's outer product is sum-pooled to a single scalar
            num_pairs = self.num_features * (self.num_features - 1) // 2
            dnn_input_dim = self.num_features * embedding_dim + num_pairs
        else:
            # keep a full d × d interaction map (summed over pairs)
            dnn_input_dim = self.num_features * embedding_dim + embedding_dim * embedding_dim

        # DNN
        dnn_layers = []
        for hidden_dim in hidden_dims:
            dnn_layers.append(nn.Linear(dnn_input_dim, hidden_dim))
            dnn_layers.append(nn.ReLU())
            dnn_layers.append(nn.BatchNorm1d(hidden_dim))
            dnn_input_dim = hidden_dim
        dnn_layers.append(nn.Linear(dnn_input_dim, 1))
        self.dnn = nn.Sequential(*dnn_layers)

    def forward_outer_products(self, embeddings):
        """
        Compute outer-product interactions.

        Args:
            embeddings: (batch_size, num_features, embedding_dim)
        Returns:
            outer_products: (batch_size, num_pairs) if reduce_sum,
                            otherwise (batch_size, embedding_dim^2)
        """
        batch_size = embeddings.shape[0]
        if self.reduce_sum:
            # Sum-pool each pair's outer product to a single scalar.
            # Note that sum over all elements of v_i ⊗ v_j equals
            # sum(v_i) × sum(v_j).
            pooled = []
            for i in range(self.num_features):
                for j in range(i + 1, self.num_features):
                    # v_i ⊗ v_j: (batch, d) × (batch, d) -> (batch, d, d)
                    outer = torch.einsum('bi,bj->bij',
                                         embeddings[:, i, :],
                                         embeddings[:, j, :])
                    pooled.append(outer.sum(dim=(1, 2)).unsqueeze(1))
            return torch.cat(pooled, dim=1)
        else:
            # Keep the full d × d interaction map: sum the outer products
            # of all pairs elementwise, flattened to (batch, d^2)
            outer_sum = torch.zeros(
                batch_size, self.embedding_dim * self.embedding_dim,
                device=embeddings.device
            )
            for i in range(self.num_features):
                for j in range(i + 1, self.num_features):
                    outer = torch.einsum('bi,bj->bij',
                                         embeddings[:, i, :],
                                         embeddings[:, j, :])
                    outer_sum += outer.view(batch_size, -1)
            return outer_sum

    def forward(self, x):
        """
        Args:
            x: (batch_size, num_features) - discrete feature indices
        Returns:
            output: (batch_size, 1) - prediction score (logit)
        """
        batch_size = x.shape[0]

        # Embedding lookup
        embedded_features = []
        for i, emb in enumerate(self.embeddings):
            embedded_features.append(emb(x[:, i]))
        all_embeddings = torch.cat(embedded_features, dim=1)
        all_embeddings = all_embeddings.view(
            batch_size, self.num_features, self.embedding_dim
        )

        # First-order linear part
        linear_part = self.linear(x.float())

        # Outer-product interactions
        outer_products = self.forward_outer_products(all_embeddings)

        # Concatenate raw embeddings and interactions
        flatten_embeddings = all_embeddings.view(batch_size, -1)
        dnn_input = torch.cat([flatten_embeddings, outer_products], dim=1)

        # DNN
        dnn_output = self.dnn(dnn_input)

        # Fuse the outputs
        output = linear_part + dnn_output
        return output


if __name__ == '__main__':
    # Feature definition: vocabulary size of each field
    feature_dims = [1000, 500, 5, 4, 10]

    print('=' * 60)
    print('PNN: Product-based Neural Network')
    print('=' * 60)

    # ==================== IPNN ====================
    print('\n=== IPNN (Inner Product) ===')
    ipnn = IPNN(
        feature_dims=feature_dims,
        embedding_dim=8,
        hidden_dims=[64, 32]
    )
    print(ipnn)
    ipnn_params = sum(p.numel() for p in ipnn.parameters())
    print(f'\nParameter count: {ipnn_params:,}')

    # ==================== OPNN ====================
    print('\n=== OPNN (Outer Product) ===')
    opnn = OPNN(
        feature_dims=feature_dims,
        embedding_dim=8,
        hidden_dims=[64, 32],
        reduce_sum=True
    )
    print(opnn)
    opnn_params = sum(p.numel() for p in opnn.parameters())
    print(f'\nParameter count: {opnn_params:,}')

    # ==================== Comparison ====================
    print('\n=== Parameter count comparison ===')
    print(f'IPNN: {ipnn_params:,}')
    print(f'OPNN: {opnn_params:,}')

    # ==================== Train IPNN ====================
    print('\n=== Training IPNN ===')
    batch_size = 32
    x = torch.tensor([
        [torch.randint(0, dim, size=(1,)).item() for dim in feature_dims]
        for _ in range(batch_size)
    ])
    y = torch.randint(0, 2, (batch_size, 1), dtype=torch.float32)

    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(ipnn.parameters(), lr=0.001)

    for epoch in range(4000):
        pred = ipnn(x)
        loss = criterion(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (epoch + 1) % 400 == 0:
            print(f'Epoch {epoch + 1:4d}, Loss: {loss.item():.6f}')

    # ==================== Prediction ====================
    ipnn.eval()
    with torch.no_grad():
        test_x = torch.tensor([[
            torch.randint(0, dim, size=(1,)).item() for dim in feature_dims
        ]])
        logits = ipnn(test_x)
        click_prob = torch.sigmoid(logits)
    print('\n=== Prediction ===')
    print(f'Model output (logits): {logits.item():.4f}')
    print(f'Click probability (sigmoid): {click_prob.item():.4f}')

    # ==================== How it works ====================
    print('\n=== How PNN works ===')
    print('1. Inner product (IPNN):')
    print('   - v_i · v_j = Σ(v_i[k] × v_j[k])')
    print('   - Result: one scalar per feature pair')
    print('   - Advantage: computationally cheap')
    print('\n2. Outer product (OPNN):')
    print('   - v_i ⊗ v_j: an embedding_dim × embedding_dim matrix')
    print('   - Result: one vector per feature pair')
    print('   - Advantage: retains more information, at higher cost')
    print('\n3. Feature interaction:')
    print('   - Plain DNN: embeddings are simply concatenated')
    print('   - PNN: inner/outer products are computed first, then fed to the DNN')
    print('   - Effect: the DNN can learn better interaction representations')
```
## Quick Checklist

If you understand PNN, you should be able to:
- Explain the difference between PNN and DeepFM
- Describe how the inner and outer products are computed
- Compute IPNN's inner-product interactions
- Compute OPNN's outer-product interactions
- State PNN's computational complexity
- Compare the scenarios where IPNN vs OPNN fits best
- Implement a simple IPNN/OPNN from scratch
## References

- PNN paper (Product-based Neural Networks for User Response Prediction): https://arxiv.org/abs/1611.00144
- Recommendation model library (DeepCTR): https://github.com/shenweichen/DeepCTR
- Optimizing PNN with TensorRT: https://developer.nvidia.com/blog/optimizing-recsys-models-with-tensorrt/