Inference Attacks: Python Examples

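(1) This article uses inference attacks to estimate two things: how many training samples belong to each class, and whether a given sample was part of the training set.

(2) A simple implementation: train a model to fit the training labels, then treat what the fitted model predicts as an estimate of what the training set contains.

(3) Understanding these examples helps us protect data privacy more effectively.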

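An inference attack is an attack on a machine learning model in which the attacker queries the model to obtain sensitive information about its training data. Although the model does not directly expose the training data, an attacker who analyzes its outputs can infer properties of that data, such as the sample distribution, class information, or even specific training samples.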

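Key characteristics:

Data leakage: the attacker can obtain private information from the training set, potentially including sensitive data such as personally identifiable information.
Model queries: the attacker sends inputs to the model and observes the outputs to infer features related to the training data.
Risk: such attacks can leak private data and undermine compliance with data protection requirements.
Defenses: common countermeasures include differential privacy, model distillation, and limiting model queries.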

Research on inference attacks is essential for protecting data privacy in machine learning systems.
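As a small illustration of the "limiting model queries" idea in the list of defenses, the sketch below coarsens what a prediction API returns for a single query. The function name `coarsen_output`, its `mode` and `decimals` parameters, and the three-class probability vector are all hypothetical and only serve the illustration; this is one possible shape for such a filter, not a complete defense.

```python
def coarsen_output(probabilities, mode="label", decimals=1):
    """Reduce the detail a prediction API reveals about one query.

    mode="label":   return only the index of the most likely class.
    mode="rounded": return the probabilities rounded to `decimals` places.
    (Hypothetical helper; names and modes are assumptions for illustration.)
    """
    if mode == "label":
        return max(range(len(probabilities)), key=lambda i: probabilities[i])
    if mode == "rounded":
        return [round(p, decimals) for p in probabilities]
    raise ValueError(f"unknown mode: {mode}")


# Made-up softmax output for one query (3 classes for brevity).
probs = [0.07, 0.91, 0.02]

label_only = coarsen_output(probs, mode="label")
rounded = coarsen_output(probs, mode="rounded", decimals=1)

print("Label only:", label_only)
print("Rounded probabilities:", rounded)
```

Returning less detail mainly hinders attacks that rely on exact confidence values. An attack that only counts predicted labels still works against a label-only API, so output coarsening would normally be combined with other measures such as rate limiting or differential privacy.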

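1. Background

The goal of an inference attack is to obtain sensitive information about the training data by observing the model's outputs. In this example, we use the model's predictions to estimate the number of samples in each class of the CIFAR-10 dataset.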

2. Code Implementation

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np

# Use CUDA if it is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Preprocessing: convert images to tensors and normalize each channel to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Download and load the CIFAR-10 training set
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

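# Define the model: a small CNN with two conv layers and three fully connected layers
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels -> 6 feature maps, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling halves height and width
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 -> 16 feature maps, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 feature maps of size 5x5 after the second pooling
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output logits, one per CIFAR-10 class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 32x32 -> 28x28 -> 14x14
        x = self.pool(F.relu(self.conv2(x)))   # 14x14 -> 10x10 -> 5x5
        x = x.view(-1, 16 * 5 * 5)             # flatten to (batch, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                        # raw logits; softmax is applied by the loss
        return x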

# Move the model to the selected device (CUDA if available)
net = Net().to(device)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Train the model
for epoch in range(2):  # number of epochs
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)  # move the batch to the device

        optimizer.zero_grad()   # reset gradients
        outputs = net(inputs)   # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()         # backward pass
        optimizer.step()        # update the parameters

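# Inference attack: estimate the class counts from the model's predictions
def inference_attack(model, data_loader):
    model.eval()  # switch to evaluation mode
    class_counts = {i: 0 for i in range(10)}  # CIFAR-10 has 10 classes
    true_counts = {i: 0 for i in range(10)}   # ground-truth label counts, used only for comparison

    with torch.no_grad():
        for inputs, labels in data_loader:
            inputs = inputs.to(device)  # move the batch to the device
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)  # index of the highest logit = predicted class

            # Count how often each class is predicted
            for label in predicted:
                class_counts[label.item()] += 1

            # Count how often each class appears in the true labels
            for label in labels:
                true_counts[label.item()] += 1

    return class_counts, true_counts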

# Run the inference attack
predicted_counts, true_counts = inference_attack(net, trainloader)

# Print the results
print("Predicted class counts from inference attack:", predicted_counts)
print("True class counts from training data:", true_counts)

# Compute and print each class's proportion
total_predicted = sum(predicted_counts.values())
total_true = sum(true_counts.values())

print("\nTotal samples predicted:", total_predicted)
print("Total samples true:", total_true)

for class_id in range(10):
    predicted_count = predicted_counts[class_id]
    true_count = true_counts[class_id]
    print(f"Class {class_id}: Predicted {predicted_count} samples, True {true_count} samples, "
          f"Predicted proportion: {predicted_count / total_predicted:.2%}, "
          f"True proportion: {true_count / total_true:.2%}")

Output

Files already downloaded and verified
Predicted class counts from inference attack: {0: 5465, 1: 5794, 2: 5200, 3: 3754, 4: 5434, 5: 6335, 6: 4841, 7: 4568, 8: 5093, 9: 3516}
True class counts from training data: {0: 5000, 1: 5000, 2: 5000, 3: 5000, 4: 5000, 5: 5000, 6: 5000, 7: 5000, 8: 5000, 9: 5000}

Total samples predicted: 50000
Total samples true: 50000
Class 0: Predicted 5465 samples, True 5000 samples, Predicted proportion: 10.93%, True proportion: 10.00%
Class 1: Predicted 5794 samples, True 5000 samples, Predicted proportion: 11.59%, True proportion: 10.00%
Class 2: Predicted 5200 samples, True 5000 samples, Predicted proportion: 10.40%, True proportion: 10.00%
Class 3: Predicted 3754 samples, True 5000 samples, Predicted proportion: 7.51%, True proportion: 10.00%
Class 4: Predicted 5434 samples, True 5000 samples, Predicted proportion: 10.87%, True proportion: 10.00%
Class 5: Predicted 6335 samples, True 5000 samples, Predicted proportion: 12.67%, True proportion: 10.00%
Class 6: Predicted 4841 samples, True 5000 samples, Predicted proportion: 9.68%, True proportion: 10.00%
Class 7: Predicted 4568 samples, True 5000 samples, Predicted proportion: 9.14%, True proportion: 10.00%
Class 8: Predicted 5093 samples, True 5000 samples, Predicted proportion: 10.19%, True proportion: 10.00%
Class 9: Predicted 3516 samples, True 5000 samples, Predicted proportion: 7.03%, True proportion: 10.00%

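Training Phase

Model definition: a simple convolutional neural network (CNN) is defined to classify CIFAR-10 images.
Data loading: PyTorch's DataLoader loads the CIFAR-10 dataset and applies the necessary preprocessing.
Training process:
- During training, the model computes the loss and updates its weights through backpropagation.
- At each training step, the inputs and labels are moved to the CUDA device (when one is available) to speed up computation.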

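Inference Attack Phase

Evaluation mode: model.eval() puts the model in evaluation mode, which disables training-only behavior such as dropout. The network here has no such layers, so the call is good practice rather than a requirement.
No gradient computation: torch.no_grad() disables gradient tracking, which reduces memory use and speeds up inference.
Predicting classes:
- Iterate over the training set in batches, run a forward pass on each batch, and collect the outputs.
- Use torch.max(outputs, 1) to get the predicted class of each sample.
Counting predictions:
- Create a dictionary class_counts to record how many times each class is predicted.
- Iterate over the predictions and increment the matching entry in class_counts.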
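The counting step described above does not depend on PyTorch: an attacker only needs the predicted label for each query. The following framework-free sketch shows the same logic on plain lists. The function name `count_predicted_classes` and the toy logits are made up for illustration.

```python
from collections import Counter


def count_predicted_classes(logits_per_sample, num_classes=10):
    """Count how often each class has the highest score (argmax)."""
    counts = Counter()
    for logits in logits_per_sample:
        # argmax over the scores of one sample
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        counts[predicted] += 1
    # Report every class, including those that were never predicted
    return {c: counts.get(c, 0) for c in range(num_classes)}


# Toy model outputs for four queries and three classes (made-up values).
toy_logits = [
    [2.0, 0.1, -1.0],
    [0.3, 1.5, 0.2],
    [1.1, 0.9, 0.0],
    [-0.5, 0.0, 0.7],
]

toy_counts = count_predicted_classes(toy_logits, num_classes=3)
print(toy_counts)
```

Because only the argmax is used, the same counts would be obtained from a service that returns labels without any probabilities.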

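3. Result Analysis

Class counts: after the attack finishes, the number of predictions per class is printed. This is the model's estimate of how many training samples belong to each class. In the run above, the estimates range from 3,516 (class 9) to 6,335 (class 5) against a true count of 5,000 per class, so they are rough rather than exact.
Proportions: each class's share of all samples is computed and printed to help the attacker understand the class distribution of the training set.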
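To put a number on how close the estimate is, one option is to compare the predicted and true proportions with a summary metric. The sketch below uses the counts printed in the output above and computes the total variation distance (half the sum of absolute differences between the two distributions). The choice of metric and the helper names are assumptions made for this illustration.

```python
# Counts copied from the printed output of the first example.
predicted_counts = {0: 5465, 1: 5794, 2: 5200, 3: 3754, 4: 5434,
                    5: 6335, 6: 4841, 7: 4568, 8: 5093, 9: 3516}
true_counts = {c: 5000 for c in range(10)}


def to_proportions(counts):
    """Convert raw counts into a distribution that sums to 1."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions over the same classes."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)


pred_p = to_proportions(predicted_counts)
true_p = to_proportions(true_counts)

tv = total_variation(pred_p, true_p)
worst_class = max(pred_p, key=lambda c: abs(pred_p[c] - true_p[c]))
worst_error = abs(pred_p[worst_class] - true_p[worst_class])

print(f"Total variation distance: {tv:.4f}")
print(f"Largest per-class error: class {worst_class} ({worst_error:.2%})")
```

For this run the distance is about 0.066, and the largest single error is on class 9, whose share is underestimated by roughly three percentage points.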

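4. How the Inference Works

The core of an inference attack is that, by observing the model's outputs on training data, an attacker can learn about the training set. Although the model does not directly reveal training samples, analyzing the distribution of its predictions lets the attacker estimate the number of samples in certain classes.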
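One way to see why the predicted counts only approximate the true counts: the number of predictions for a class is the sum, over all true classes, of the true count times the rate at which the model maps that class to the predicted one. The sketch below works through this with a hypothetical three-class confusion matrix; the matrix values and the function name are invented for illustration and are not measured from the model above.

```python
def expected_predicted_counts(true_counts, confusion):
    """Mix the true counts through the confusion rates.

    confusion[i][j] is the fraction of class-i samples predicted as class j,
    so every row sums to 1.
    """
    num_classes = len(true_counts)
    return [
        sum(true_counts[i] * confusion[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


# Balanced toy dataset and a made-up confusion matrix.
true_counts = [100, 100, 100]
confusion = [
    [0.8, 0.1, 0.1],  # class 0 is mostly predicted correctly
    [0.2, 0.7, 0.1],  # class 1 is often mistaken for class 0
    [0.0, 0.1, 0.9],  # class 2 is rarely confused
]

expected = expected_predicted_counts(true_counts, confusion)
print([round(x) for x in expected])
```

Even though every class has the same true count, the predicted counts differ because the errors are not symmetric. The more accurate the model, the closer the confusion matrix is to the identity and the closer the predicted counts are to the true ones.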

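Another Example

More sophisticated inference attacks on CIFAR-10 can combine several strategies, such as exploiting the probability distribution the model outputs, generating adversarial examples, or reconstructing training samples. The example below follows the same pattern as before, but uses the model's output probabilities to infer whether a sample was in the training set.

In this example, we use the model's predicted probabilities to analyze whether specific samples are present in the training set.

1. Setup

We train a CNN model and use the probabilities it outputs to infer whether specific samples are present in the training set.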

2. Complete Code

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np

# Use CUDA if it is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Preprocessing: convert images to tensors and normalize each channel to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Download and load the CIFAR-10 training set
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

# Define the model (same CNN as in the first example)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Move the model to the selected device (CUDA if available)
net = Net().to(device)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Train the model
for epoch in range(2):  # number of epochs
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)  # move the batch to the device

        optimizer.zero_grad()   # reset gradients
        outputs = net(inputs)   # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()         # backward pass
        optimizer.step()        # update the parameters

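# Inference attack: check whether certain samples were in the training set
# Note: data_loader is accepted but not used inside this function
def inference_attack(model, data_loader, sample_images):
    model.eval()  # switch to evaluation mode
    existence_scores = []

    with torch.no_grad():
        for image in sample_images:
            image = image.to(device).unsqueeze(0)  # add a batch dimension
            output = model(image)
            probabilities = F.softmax(output, dim=1)  # turn logits into class probabilities
            existence_scores.append(probabilities.cpu().numpy())

    return existence_scores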

# Select some samples for the inference attack
sample_indices = [0, 1, 2, 3, 4]  # here: the first five training samples
sample_images = [trainset[i][0] for i in sample_indices]

# Run the inference attack
scores = inference_attack(net, trainloader, sample_images)

# Print the results
for i, score in enumerate(scores):
    print(f"Sample {i} existence scores: {score}")

Output

Files already downloaded and verified
Sample 0 existence scores: [[1.4911070e-03 7.6075867e-03 2.7023258e-02 1.3261433e-01 1.9510817e-02
  5.9721380e-02 7.2566289e-01 1.8510185e-02 5.5387925e-04 7.3045860e-03]]
Sample 1 existence scores: [[1.1406993e-03 2.6581092e-02 5.5561360e-04 7.9178403e-04 2.6659523e-05
  3.2436097e-04 9.5068201e-05 4.9321837e-04 4.1092065e-04 9.6958053e-01]]
Sample 2 existence scores: [[0.09460393 0.03179372 0.10372082 0.18358988 0.06053074 0.11510303
  0.04988525 0.0965076  0.14990555 0.11435943]]
Sample 3 existence scores: [[3.7419258e-03 3.7223320e-03 2.3206087e-02 9.1478322e-03 1.6883773e-01
  6.7729452e-03 7.8114969e-01 1.5515537e-03 7.0034160e-04 1.1696027e-03]]
Sample 4 existence scores: [[3.3676695e-02 8.6378354e-01 1.4160476e-03 8.6853321e-04 1.3602857e-03
  2.0280134e-04 5.6550006e-04 1.4694600e-04 5.3967834e-03 9.2582904e-02]]

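These results are the model's "existence scores" for each selected sample, that is, the predicted probabilities that the image belongs to each class. Each sample's output is a probability distribution expressing the model's confidence in each class. They can be read as follows: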

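3. Interpreting the Results

Probability distribution: each sample's output is an array of 10 elements giving the probability that it belongs to each CIFAR-10 class. Each element corresponds to one class, in the order airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck (indices 0 to 9).
Per-sample analysis:
- Sample 0:
  - The model considers "class 6" (frog) most likely, with a probability of about 72.57%.
  - The other classes have relatively low probabilities, so the model is fairly confident about this sample.
- Sample 1:
  - The model considers "class 9" (truck) most likely, with a probability of about 96.96%.
  - This is a very high confidence: the model is almost certain the sample belongs to this class.
- Sample 2:
  - The distribution is relatively flat, with a maximum of 18.36%, so the model is not confident about this sample.
- Sample 3:
  - The model considers "class 6" (frog) most likely, with a probability of about 78.11%.
- Sample 4:
  - The model considers "class 1" (automobile) most likely, with a probability of about 86.38%.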

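These "existence scores" can be used to analyze the model's classification confidence. A high probability indicates strong confidence in a class. A low maximum probability means the model is uncertain, and may suggest that the sample was not in the training set or is being misclassified. Confidence alone is not proof of membership, though: all five samples here come from the training set, yet Sample 2 peaks at only 18.36%.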
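To compare samples, it helps to reduce each 10-element distribution to a single number. Two common choices are the maximum probability and the entropy of the distribution (low entropy means a peaked, confident prediction). The sketch below applies both to Sample 1 and Sample 2 from the output above; the helper names are assumptions made for this illustration.

```python
import math


def max_confidence(probabilities):
    """Highest class probability: close to 1 for a confident prediction."""
    return max(probabilities)


def prediction_entropy(probabilities):
    """Shannon entropy in nats: close to 0 when confident, up to ln(10) when flat."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


# Probabilities copied from the printed output (Sample 1 and Sample 2).
sample_1 = [1.1406993e-03, 2.6581092e-02, 5.5561360e-04, 7.9178403e-04, 2.6659523e-05,
            3.2436097e-04, 9.5068201e-05, 4.9321837e-04, 4.1092065e-04, 9.6958053e-01]
sample_2 = [0.09460393, 0.03179372, 0.10372082, 0.18358988, 0.06053074,
            0.11510303, 0.04988525, 0.0965076, 0.14990555, 0.11435943]

for name, probs in [("Sample 1", sample_1), ("Sample 2", sample_2)]:
    print(f"{name}: max confidence = {max_confidence(probs):.4f}, "
          f"entropy = {prediction_entropy(probs):.4f}")
```

Sample 1 has a high maximum and low entropy, while Sample 2 shows the opposite pattern. Since both are training samples, these scores should be treated as weak evidence of membership, not as a verdict.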

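4. Application Scenarios

Model performance analysis: analyzing these scores shows how well the model recognizes specific classes.
Dataset coverage: if scores for certain classes are generally low, those classes may be underrepresented or of poor quality in the training set.
Attack strategy: in an inference attack, the attacker can use this information to identify which samples are likely to be in the training set and then mine the data in a targeted way.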
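Turning confidence scores into a member / non-member decision requires a threshold. One simple approach, sketched below, is to pick the threshold that best separates scores of samples known to be in the training set from scores of samples known to be outside it (for example, images from a held-out test split). The member scores are rounded maximum probabilities from the output above; the non-member scores are invented for illustration, as are the function names.

```python
def calibrate_threshold(member_scores, non_member_scores):
    """Return the threshold (and its accuracy) that best separates the two groups.

    A sample is classified as a member when its score is >= threshold.
    """
    candidates = sorted(set(member_scores) | set(non_member_scores))
    total = len(member_scores) + len(non_member_scores)
    best_threshold, best_accuracy = None, -1.0
    for t in candidates:
        correct = (sum(s >= t for s in member_scores)
                   + sum(s < t for s in non_member_scores))
        accuracy = correct / total
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


def infer_membership(score, threshold):
    """Guess that a sample was in the training set if its score reaches the threshold."""
    return score >= threshold


# Rounded max probabilities of the five training samples from the output above.
member_scores = [0.97, 0.86, 0.78, 0.73, 0.18]
# Hypothetical max probabilities for five samples outside the training set.
non_member_scores = [0.91, 0.55, 0.40, 0.33, 0.21]

threshold, accuracy = calibrate_threshold(member_scores, non_member_scores)
print(f"Chosen threshold: {threshold}, accuracy on calibration data: {accuracy:.0%}")
print("Score 0.90 ->", infer_membership(0.90, threshold))
print("Score 0.50 ->", infer_membership(0.50, threshold))
```

With so few calibration points the threshold is unreliable; a realistic attempt would use many more samples and would report accuracy on data not used for calibration.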

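5. Code Notes

Model training: as before, a CNN model is first trained for image classification.
Inference attack function:
- inference_attack takes the model, a data loader, and the sample images to check. The data loader is not used inside the function.
- It applies softmax to compute each sample's predicted probability distribution.
- It returns each sample's probability distribution as its "existence score".
Selecting samples:
- The example runs the attack on the first five samples of the training set.
Printing the existence scores: each sample's existence score is displayed, reflecting the model's confidence in that sample.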

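6. Further Extensions

The inference attack can be made more sophisticated in the following ways:

Adversarial example generation: create modified samples that resemble the originals and observe how the model responds.
Output-based clustering: cluster the probability distributions the model outputs to infer the sample distribution of the training set.
Class-specific queries: design queries that specifically test the model's sensitivity to certain classes.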

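Summary

With this approach, an attacker can make reasonable estimates of the class distribution of the training dataset. The process is not guaranteed to be fully accurate, but it is enough to show what information a model can leak. Attacks of this kind underline the importance of protecting training-data privacy when designing and deploying machine learning models.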