Machine Learning Security: Test-Time Adversarial Example Transfer Attacks on Multi-Class Image Classification in Practice (Part 1)

Content

[Contribution Declaration](#Contribution Declaration)

[Results and Discussion](#Results and Discussion)

[(1) Training and Single sample attack(Surrogate model, No ensemble for attack sample generation):Error-generic(1-100)](#(1) Training and Single sample attack(Surrogate model, No ensemble for attack sample generation):Error-generic(1-100))

[(2) Error-specific attack(51-100)](#(2) Error-specific attack(51-100))

[(3) Hyperparameters Setting: alpha(step size), steps(step count), momentum(employed in Momentum Iterative PGD)](#(3) Hyperparameters Setting: alpha(step size), steps(step count), momentum(employed in Momentum Iterative PGD))

[(4) Evaluate adversarial attack performance on "ground truth" model(No ensemble, transfer attack)](#(4) Evaluate adversarial attack performance on “ground truth” model(No ensemble, transfer attack))

[(5) Use Ensemble Adversarial Attack Method based on ResNet18 and VGG-11-BN and evaluate on surrogate/"ground truth" model](#(5) Use Ensemble Adversarial Attack Method based on ResNet18 and VGG-11-BN and evaluate on surrogate/”ground truth” model)

[Surrogate Model Case](#Surrogate Model Case)

["Ground Truth" Model Case](#“Ground Truth” Model Case)

Conclusion

Contribution Declaration

The code framework was provided by another group member. I revised the code; optimized the training process so that training and evaluation actually run on NVIDIA GPUs; corrected the ground-truth label marking used in testing; extended the code to test the adversarial attack under different hyperparameter settings; carried out the transfer-attack robustness analysis; and added figure and table recording functions. I also added a batch-based adversarial attack mode to speed up the attack and lower its cost. Note that normalization is necessary and standardization is critical.

Introduction

The task is to carry out a transfer adversarial attack by crafting perturbations on 3x32x32 (CHW) images from the CIFAR10 dataset. The underlying task is image classification. The target model is not visible to the attacker, so surrogate models are used to generate the adversarial examples.

We choose ResNet18 and VGG-11-BN as two robust surrogate models and use the Momentum Iterative PGD (MIPGD) method under an L-infinity norm constraint with a given budget epsilon. We then test the attack on the surrogate models to roughly verify its effectiveness. 10000 images from the 10 evenly distributed classes of CIFAR10 are used for surrogate model training, and 100 images are used for adversarial attack testing.
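To make the attack concrete, here is a minimal sketch of an untargeted momentum iterative PGD step under an L-infinity budget. It is not the project's code: a toy linear softmax classifier (`W`, `b`) stands in for ResNet18 so that the example runs with NumPy alone, and the function names are hypothetical. The defaults alpha=0.01, steps=25, momentum=0.9 are the values quoted later in this post.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def input_grad(W, b, x, y):
    """Per-sample gradient of the cross-entropy loss w.r.t. x for logits = x @ W.T + b."""
    p = softmax(x @ W.T + b)
    p[np.arange(len(y)), y] -= 1.0        # dL/dlogits = softmax - one_hot
    return p @ W

def mi_pgd(W, b, x, y, eps, alpha=0.01, steps=25, momentum=0.9):
    """Untargeted momentum iterative PGD: ascend the loss inside the L-inf ball around x."""
    x_adv = x.copy()
    g = np.zeros_like(x)                  # accumulated (momentum) gradient
    for _ in range(steps):
        grad = input_grad(W, b, x_adv, y)
        grad = grad / (np.abs(grad).sum(axis=1, keepdims=True) + 1e-12)  # L1-normalise
        g = momentum * g + grad
        x_adv = x_adv + alpha * np.sign(g)          # signed step of size alpha
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project back into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # keep valid pixel values
    return x_adv

# Toy stand-in data: 4 random "images" flattened from 3x32x32, 10 classes.
rng = np.random.default_rng(0)
W = 0.01 * rng.normal(size=(10, 3 * 32 * 32))
b = np.zeros(10)
x = rng.uniform(size=(4, 3 * 32 * 32))
y = np.array([0, 1, 2, 3])
x_adv = mi_pgd(W, b, x, y, eps=8 / 255)
print(np.abs(x_adv - x).max())  # never exceeds the budget 8/255
```

For an error-specific (targeted) attack, the same loop would typically descend the loss computed against the target label instead of ascending the loss against the true label.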

Our contributions include:

(1) Discussing the relation between the attack budget epsilon and ASR (Attack Success Rate) to check model robustness and verify attack effectiveness;

(2) Discussing the impact of the PGD step size, PGD iteration count and momentum on ASR;

(3) Discussing the ensemble strategy for adversarial example generation and generalization considerations in the transfer attack scenario;

(4) Discussing a PGD acceleration strategy.

Method

(1) Construct the training dataset and data loader, then train the surrogate models;
(2) Construct the testing dataset and data loader with batch size=1 (in evaluation mode BN uses its stored running statistics rather than batch statistics, so the batch size can be increased to accelerate the attack), run the adversarial attack, and evaluate on samples 1-50 and 51-100 with different targets;
(3) Change the PGD iteration steps, step size and momentum settings and repeat step (2);
(4) Increase the batch size to 50 (the maximum supported size) to verify result consistency and speed up the attack.
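The batch-size remark in the steps above can be checked with a small sketch. The `batch_norm` function below is a simplified stand-in (no affine scale/shift), not the layer of any particular framework. In evaluation mode each sample is normalised with fixed stored statistics, so processing 50 samples at once gives the same outputs as processing them one by one; in training mode the outputs depend on the batch composition.

```python
import numpy as np

def batch_norm(x, running_mean, running_var, training, tiny=1e-5):
    """Simplified BatchNorm over the batch axis, without the affine scale/shift."""
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)   # statistics of the current batch
    else:
        mean, var = running_mean, running_var       # statistics stored during training
    return (x - mean) / np.sqrt(var + tiny)

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 8))                        # 50 samples, 8 features
running_mean, running_var = np.zeros(8), np.ones(8)

def one_by_one(training):
    """Process every sample as its own batch of size 1 and stack the results."""
    rows = [batch_norm(x[i:i + 1], running_mean, running_var, training)
            for i in range(len(x))]
    return np.concatenate(rows)

eval_batched = batch_norm(x, running_mean, running_var, training=False)
train_batched = batch_norm(x, running_mean, running_var, training=True)

print(np.allclose(eval_batched, one_by_one(training=False)))   # True
print(np.allclose(train_batched, one_by_one(training=True)))   # False
```

This is why the attack results at batch size 1 and batch size 50 are expected to agree, as long as the surrogate model is kept in evaluation mode.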

Figure 1 Project Workflow

Experiment

Settings

We run the Python code in the PyCharm IDE and search the hyperparameter space one variable at a time (changing one setting while holding the others fixed). The hyperparameters are the attack budget epsilon, the PGD step size, the PGD iteration count and the momentum of MIPGD.
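A one-variable-at-a-time search like this could be organised as in the sketch below. The epsilon and momentum values are the ones mentioned in this post; the baseline choice and the extra alpha and steps values are hypothetical placeholders, and `one_at_a_time` is an illustrative helper rather than part of the project code.

```python
# Baseline setting (assumed): values quoted in this post.
BASELINE = {"epsilon": 8 / 255, "alpha": 0.01, "steps": 25, "momentum": 0.9}

GRID = {
    "epsilon": [8 / 255, 12 / 255, 16 / 255, 20 / 255],
    "alpha": [0.005, 0.01, 0.02],     # hypothetical values around 0.01
    "steps": [10, 25, 50],            # hypothetical values around 25
    "momentum": [0.9, 0.95, 0.99],
}

def one_at_a_time(baseline, grid):
    """Return the baseline plus every config that changes exactly one hyperparameter."""
    configs = [dict(baseline)]
    for name, values in grid.items():
        for value in values:
            if value != baseline[name]:
                configs.append({**baseline, name: value})
    return configs

configs = one_at_a_time(BASELINE, GRID)
for cfg in configs:
    print(cfg)
```

Each printed configuration would then be passed to one attack-and-evaluate run, so that any change in ASR can be attributed to the single hyperparameter that differs from the baseline.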

Results and Discussion

(1) Training and Single sample attack(Surrogate model, No ensemble for attack sample generation):Error-generic(1-100)

The training result is shown on the left, and an attack result for alpha=0.01, steps=25, momentum=0.9 is plotted on the right. The generalization ability of ResNet18 still appears problematic, since accuracy drops from 62% to 29% without any attack.

For epsilon=8/255, accuracy drops to 11%; for epsilon=12/255, it drops to 5%. For epsilon=16/255 and 20/255, the final accuracy drops to about 4% and stays there, because exactly 4 of the test samples 51-100 are class-2 (bird) images; presumably bird is the class targeted by the error-specific attack, so these images remain correctly classified. (Batch_Infer=1)

(2) Error-specific attack(51-100)

Below are five 2x2 confusion matrix plots covering test images 51-100 under five epsilon budget settings, to further verify attack effectiveness. (Batch_Infer=50)
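One way such a 2x2 matrix could be computed is sketched below. The layout is an assumption, since the post does not define it: rows split the images by whether their true label is the target class, columns by whether the prediction after the attack is the target class. Class 2 (bird) is used as the target here for illustration, and the labels are made-up toy data.

```python
def target_confusion(y_true, y_pred, target):
    """2x2 counts: row = true label is target (0 no, 1 yes); column = prediction is target."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[int(t == target)][int(p == target)] += 1
    return matrix

def targeted_success_rate(matrix):
    """Share of non-target images that the attack pushed into the target class."""
    non_target_total = matrix[0][0] + matrix[0][1]
    return matrix[0][1] / non_target_total if non_target_total else 0.0

BIRD = 2                               # assumed target class
y_true = [0, 1, 2, 3, 2, 5]            # toy ground-truth labels
y_pred = [2, 2, 2, 3, 2, 2]            # toy predictions on adversarial images
matrix = target_confusion(y_true, y_pred, BIRD)
print(matrix)                          # [[1, 3], [0, 2]]
print(targeted_success_rate(matrix))   # 0.75
```

Under this layout, a stronger targeted attack moves counts from the top-left cell to the top-right cell as epsilon grows, while the bottom row stays on the target class.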

(3) Hyperparameters Setting: alpha(step size), steps(step count), momentum(employed in Momentum Iterative PGD)

The influence of momentum is not obvious for momentum in $\{0.9, 0.95, 0.99\}$. However, a larger step count makes the ASR curve (plotted as accuracy under attack versus epsilon) descend faster and its AUC smaller, and alpha (step size) has a greater impact when epsilon is small, causing some fluctuation in the ASR curve, especially when the step count is small. (Batch_Infer=50)
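One possible reading of the interaction between step size and step count, offered as a hedged sketch rather than the author's analysis: with signed steps, each pixel moves by at most alpha per iteration, so after `steps` iterations it can move by at most alpha * steps. If that product is smaller than epsilon, the attack cannot use the whole budget. The helper name and the second setting (alpha=0.001, steps=10) are hypothetical.

```python
def reachable_fraction(alpha, steps, eps):
    """Fraction of the L-inf budget a signed-step attack can cover: min(1, alpha*steps/eps)."""
    return min(1.0, alpha * steps / eps)

for eps_255 in (8, 12, 16, 20):
    eps = eps_255 / 255
    quoted = reachable_fraction(0.01, 25, eps)    # alpha and steps quoted in this post
    small = reachable_fraction(0.001, 10, eps)    # hypothetical small-step setting
    print(f"eps={eps_255}/255  quoted={quoted:.3f}  small={small:.3f}")
```

With alpha=0.01 and steps=25 the product 0.25 exceeds every epsilon tested in this post, so the budget itself is the binding constraint; with few or small steps the reachable fraction drops below 1.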

(4) Evaluate adversarial attack performance on "ground truth" model(No ensemble, transfer attack)

The ASR curve clearly starts from a much higher accuracy and has a much larger AUC than in the surrogate model evaluation. For budget epsilon=20/255, the ASR is 100%-31%=69%, whereas it reaches 100%-4%=96% (the ASR upper bound) in the surrogate model case.
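The arithmetic above can be reproduced with a short sketch. ASR is computed as 1 minus accuracy, as in this post, and the AUC is taken with the trapezoidal rule over the accuracy-versus-epsilon curve. The surrogate accuracies are the numbers reported earlier; treating the 29% clean accuracy as the epsilon=0 point of the same curve is an assumption, and epsilon is measured in units of 1/255.

```python
def asr(accuracy):
    """Attack success rate as used in this post: 1 - accuracy under attack."""
    return 1.0 - accuracy

def auc(xs, ys):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

eps_255 = [0, 8, 12, 16, 20]                     # epsilon in units of 1/255
surrogate_acc = [0.29, 0.11, 0.05, 0.04, 0.04]   # accuracies reported for the surrogate model

surrogate_asr = asr(surrogate_acc[-1])           # epsilon = 20/255 on the surrogate model
target_asr = asr(0.31)                           # epsilon = 20/255 on the "ground truth" model
surrogate_auc = auc(eps_255, surrogate_acc)
print(round(surrogate_asr, 2), round(target_asr, 2), round(surrogate_auc, 2))
```

A smaller AUC of the accuracy curve means the attack degrades the model more quickly as the budget grows, which is how the surrogate and "ground truth" curves are compared here.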

(5) Use Ensemble Adversarial Attack Method based on ResNet18 and VGG-11-BN and evaluate on surrogate/"ground truth" model

Surrogate Model Case

"Ground Truth" Model Case

The ensemble method for adversarial example generation does enhance the generalization ability of the attack. This is interpretable: the true data and model distribution is unknown, and a more powerful and more general model ensemble gets closer to the true model that is to be attacked.
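A minimal sketch of ensemble adversarial example generation follows. It assumes the ensemble averages the cross-entropy losses of the surrogates with equal weights; the post does not state which fusion rule was actually used. Two toy linear softmax classifiers stand in for ResNet18 and VGG-11-BN so that the example runs with NumPy alone, and all function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(model, x, y):
    """Mean cross-entropy of a linear softmax model (W, b)."""
    W, b = model
    p = softmax(x @ W.T + b)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

def ce_input_grad(model, x, y):
    """Per-sample gradient of the cross-entropy w.r.t. the input."""
    W, b = model
    p = softmax(x @ W.T + b)
    p[np.arange(len(y)), y] -= 1.0
    return p @ W

def ensemble_mi_pgd(models, x, y, eps, alpha=0.01, steps=25, momentum=0.9):
    x_adv, g = x.copy(), np.zeros_like(x)
    for _ in range(steps):
        # Gradient of the equally weighted average loss = average of per-model gradients.
        grad = sum(ce_input_grad(m, x_adv, y) for m in models) / len(models)
        grad = grad / (np.abs(grad).sum(axis=1, keepdims=True) + 1e-12)
        g = momentum * g + grad
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)  # stay in the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                               # valid pixel range
    return x_adv

rng = np.random.default_rng(0)
dim, classes = 3 * 32 * 32, 10
surrogates = [(0.01 * rng.normal(size=(classes, dim)), np.zeros(classes))
              for _ in range(2)]
x = rng.uniform(size=(4, dim))
y = np.array([0, 1, 2, 3])
x_adv = ensemble_mi_pgd(surrogates, x, y, eps=8 / 255)
print(np.abs(x_adv - x).max())  # never exceeds the budget 8/255
```

Because the perturbation has to raise the loss of both surrogates at once, it is less tailored to the quirks of a single model, which is one plausible reason it transfers better to an unseen target.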

In addition, collecting more general data may also help a lot, such as images from the rest of the CIFAR10 dataset (about 50000 images have not yet been used for surrogate model training), from CIFAR100, or from the well-known ImageNet-2012 dataset.

Conclusion

To conclude,

(1) An adversarial attack under the L-infinity (maximum) norm depends greatly on the epsilon setting and has an effective epsilon boundary related to model ability and training data.

(2) The momentum parameter is not very important in the MIPGD method, whereas attackers should choose the step size and step count carefully.

(3) Batch-mode adversarial attack can be used to accelerate the attack and lower its cost.

(4) A transfer attack usually performs worse on the actual target model than on the surrogate model used to craft it.

(5) In practice, the ensemble adversarial attack gains more robustly from an increased attack budget in the larger-epsilon scenario.