Machine Learning Security: Uniform Sampling of the ImageNet Dataset and Adversarial Example Generation (Part 2)

Contents

Introduction

Method

Experiment

Settings

Results and Discussion

(1) Cost

(2) Concealment

(3) Damage

Conclusion

Introduction

This project evaluates the classification accuracy and robustness of two models on validation images from the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) under increasing levels of Gaussian noise and of crafted adversarial perturbations. Robustness is judged from the AUC and the trend of the accuracy curves, in order to illustrate the effect of adversarial training.
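
The post does not say how the AUC is computed. A minimal sketch, assuming it is the trapezoidal area under an accuracy-epsilon curve sampled at the tested epsilon values (the function name curve_auc is hypothetical):

```python
def curve_auc(xs, ys):
    """Trapezoidal area under a curve sampled at increasing xs (e.g. epsilons).

    ASSUMPTION: the AUC in this post is taken over accuracy vs. epsilon.
    """
    pairs = list(zip(xs, ys))
    # Sum the area of each trapezoid between neighbouring sample points.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]))
```

Over the same epsilon range, a larger AUC means accuracy degrades more slowly, i.e. a more robust model; dividing by the width of the range turns it into the average accuracy over that range.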

Method

Model Selection: The project attacks and randomly perturbs two ResNet-50 models trained for ILSVRC2012 image classification: the non-robust "Standard_R50", trained without adversarial training, and the robust "Engstrom2019Robustness", trained with adversarial training. "Standard_R50" is downloaded directly from https://download.pytorch.org/models/resnet50-19c8e357.pth and "Engstrom2019Robustness" from https://drive.google.com/uc?id=1T2Fvi1eCJTeAOEzrH_4TAIwO8HTOYVyn.

Data Preprocessing: The whole validation set of 50,000 images is downloaded, and the .index() method is applied to a shuffled index list so that exactly one image is drawn per class. Labels for the images in ILSVRC2012_img_val.tar are obtained from ILSVRC2012_validation_ground_truth.txt and meta.mat (from the ImageNet ILSVRC2012 development kit), together with synset.txt from the Software Heritage archive (Content - a9e8c7f50d144ef6034d5231709dd3545b10b69c - 5dd7047/synset.txt). A one-to-one mapping from each image file name to its label is then built so that the attacked and Gaussian-noised images can be saved and matched to the correct labels. The final test data loader yields three-element tuples. Color input images of varying resolutions pass through the "Res256Crop224" composite transform (resize the shorter side to 256, center-crop to 224x224, convert to a tensor in [0, 1]), which normalizes them.
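
The label mapping and the one-image-per-class sampling described above can be sketched in plain Python. This is a minimal sketch, not the author's code: the function names are hypothetical, parsing meta.mat and synset.txt is left out, and the inputs are assumed to be already loaded as Python lists and dicts. It also assumes synset.txt lists WNIDs in the same order as the model's outputs. The sampling mirrors the .index() idea: after a uniform shuffle, the first image of each class is a uniformly random member of that class.

```python
import random


def build_name_to_label(gt_ids, id_to_wnid, synset_wnids):
    """Map validation file names to model class indices.

    gt_ids:       ILSVRC2012_IDs read line by line from
                  ILSVRC2012_validation_ground_truth.txt (line k -> image k)
    id_to_wnid:   dict ILSVRC2012_ID -> WNID, assumed already parsed from meta.mat
    synset_wnids: WNIDs in synset.txt order, assumed to match the model outputs
    """
    wnid_to_index = {wnid: i for i, wnid in enumerate(synset_wnids)}
    return {
        f"ILSVRC2012_val_{k:08d}.JPEG": wnid_to_index[id_to_wnid[gid]]
        for k, gid in enumerate(gt_ids, start=1)
    }


def sample_one_per_class(name_to_label, num_classes, seed=0):
    """Return one uniformly random file name per class, in class order."""
    names = sorted(name_to_label)           # fixed order before shuffling
    random.Random(seed).shuffle(names)      # seeded shuffle of the index list
    shuffled_labels = [name_to_label[n] for n in names]
    # list.index finds the first position of class c in the shuffled list,
    # i.e. a uniformly random image of that class (ValueError if c is absent).
    return [names[shuffled_labels.index(c)] for c in range(num_classes)]
```

With 1,000 classes this yields 1,000 images, one per class. Each list.index call scans the shuffled list, so the loop costs at most about 1,000 × 50,000 comparisons, which is acceptable for a one-off preprocessing step.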
Attack Performance: For each epsilon in a predefined range with a fixed step, Momentum Iterative PGD (MI-PGD) is run on the clean input images. At every iteration the gradient of each image is normalized over its C, H, W dimensions by its L-infinity norm (i.e. divided by its largest absolute entry); the perturbation is kept within an L-infinity ball of radius epsilon, and pixel values are clipped to the valid domain $[0,1]$. Attacks and noise are injected after normalization to $[0,1]$ and before standardization, whose per-channel RGB mean is $(0.485, 0.456, 0.406)$ and standard deviation is $(0.229, 0.224, 0.225)$.
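
The update described above can be written as $g_{t+1} = \mu g_t + \nabla_x L(x_t) / \|\nabla_x L(x_t)\|_\infty$ and $x_{t+1} = \mathrm{clip}_{[0,1]}\big(\Pi_\epsilon(x_t + \alpha\,\mathrm{sign}(g_{t+1}))\big)$, where $\Pi_\epsilon$ projects onto the L-infinity ball of radius $\epsilon$ around the clean image. Below is a NumPy sketch under stated assumptions: the sign step is borrowed from MI-FGSM (the post does not say which update direction is used), there is no random start, and a hypothetical linear classifier with an analytic cross-entropy gradient stands in for ResNet-50 so the example runs without PyTorch. Standardization is placed inside the model, so the perturbation is applied in [0, 1] pixel space.

```python
import numpy as np

# Per-channel RGB statistics from the post, shaped for NCHW broadcasting.
MEAN = np.array([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def toy_loss_grad(x, y, W, b):
    """Cross-entropy gradient w.r.t. x for a HYPOTHETICAL linear classifier.

    Standardization (x - mean) / std is part of the model, so the attack
    perturbs x in [0, 1] pixel space, before standardization.
    """
    n = x.shape[0]
    z = ((x - MEAN) / STD).reshape(n, -1)  # standardized, flattened input
    p = softmax(z @ W.T + b)               # class probabilities, shape (N, K)
    p[np.arange(n), y] -= 1.0              # dCE/dlogits = softmax - one-hot
    grad_z = (p @ W).reshape(x.shape)      # gradient w.r.t. z, back to NCHW
    return grad_z / STD                    # chain rule through standardization


def mi_pgd(x, y, grad_fn, eps, alpha=0.01, steps=20, mu=0.9):
    """Untargeted L-infinity MI-PGD on a batch x of shape (N, C, H, W)."""
    x_adv = x.copy()
    g = np.zeros_like(x)                   # momentum buffer
    for _ in range(steps):
        grad = grad_fn(x_adv, y)
        # Normalize each image's gradient by its L-infinity norm over C, H, W.
        norm = np.abs(grad).max(axis=(1, 2, 3), keepdims=True)
        g = mu * g + grad / np.maximum(norm, 1e-12)
        x_adv = x_adv + alpha * np.sign(g)          # ASSUMED sign step, as in MI-FGSM
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # keep pixels in [0, 1]
    return x_adv
```

In a PyTorch version the gradient would come from autograd on the real model instead of toy_loss_grad; the per-image L-infinity normalization and the two clipping steps stay the same.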
Attack Evaluation: The momentum, the step size alpha and the number of PGD iterations are fixed in advance to a suboptimal setting. ASR-epsilon curves (attack success rate versus epsilon) are then drawn for both models to verify their robustness and the effect of adversarial training.
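
How ASR is counted is not spelled out in the post. The sketch below assumes ASR is the fraction of all evaluated images whose top-1 prediction is wrong after perturbation, so ASR = 1 - top-1 accuracy under attack. The callables predict_logits and perturb are hypothetical stand-ins for the model and for MI-PGD (or Gaussian noise).

```python
import numpy as np


def attack_success_rate(adv_logits, labels):
    """ASSUMED definition: share of images whose top-1 prediction is wrong
    after perturbation (images already wrong when clean are counted too)."""
    preds = np.argmax(np.asarray(adv_logits), axis=1)
    return float(np.mean(preds != np.asarray(labels)))


def asr_curve(x, y, predict_logits, perturb, epsilons):
    """Return (epsilon, ASR) pairs for plotting an ASR-epsilon curve.

    predict_logits(x) -> (N, K) logits and perturb(x, y, eps) -> perturbed x
    are HYPOTHETICAL callables for the model and the attack or noise.
    """
    return [(eps, attack_success_rate(predict_logits(perturb(x, y, eps)), y))
            for eps in epsilons]
```

Because the attack is passed in as a function, the same loop can produce the Gaussian-noise curve by swapping in a noise function for perturb.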

Experiment

Settings

The models are deployed on an NVIDIA GeForce RTX 5070, where the clean, adversarial and Gaussian-noise evaluations are run. The project involves only validation; no fine-tuning is required. Pure Python code is run in the PyCharm IDE to draw the ASR-epsilon curves of both models on 1,000 randomly selected validation images (the robustbench package supports at most 5,000).

The momentum is 0.9, the number of PGD iterations is 20, the PGD step size alpha is 0.01, and epsilon ranges from 2/255 to 20/255 in steps of 2/255. The threat model is the L-infinity norm. Note that the Gaussian noise is zero-mean, so the setting "centered at epsilon with variance 0.001" given in the instructions is dropped; instead, epsilon is used as the standard deviation (not the variance) of the normal distribution. Attacks are run in batches of 100 images to speed them up, so the input tensors have shape NCHW (here 100x3x224x224).
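
The epsilon grid, the noise setting and the batching can be sketched as follows, again in NumPy and with hypothetical function names. The noise is zero-mean with standard deviation epsilon; clipping the result to [0, 1] is an assumption, chosen to match the domain constraint used for the attacks.

```python
import numpy as np

EPSILONS = [k / 255 for k in range(2, 21, 2)]   # 10 budgets: 2/255, 4/255, ..., 20/255


def gaussian_perturb(x, eps, rng):
    """Add zero-mean Gaussian noise with standard deviation eps (not variance),
    then clip back to the valid pixel range [0, 1] (ASSUMED clipping)."""
    noise = rng.normal(loc=0.0, scale=eps, size=x.shape)
    return np.clip(x + noise, 0.0, 1.0)


def batches(x, batch_size=100):
    """Yield NCHW mini-batches along the first (N) axis."""
    for start in range(0, x.shape[0], batch_size):
        yield x[start:start + batch_size]
```

With 1,000 images and batch_size=100 this gives 10 batches per epsilon, i.e. 100 batched forward passes per model over the whole grid for the noise evaluation.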

Results and Discussion

(1) Cost

Each model is evaluated under 10 epsilon budgets. With alpha, the number of iterations (20), the momentum and the batch size fixed, the time per budget stays at about 55 seconds regardless of epsilon.

With 50 iterations (the default iteration setting of the autoattack package), my implementation takes about 128 s. For comparison, the AutoAttack implementation using the DLR and CE losses, with n_restarts set to 1 and the same batch size, takes about 5 minutes to attack the 1,000 images and raises the ASR from 62% to 70%.

(2) Concealment

At epsilon = 4/255 the perturbation is barely visible; each adversarial image is saved as a 224x224x3 file with the suffix "-a.JPEG".
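
An L-infinity budget of 4/255 means no pixel moves by more than 4 of the 255 intensity levels, which is why the change is hard to see. A sketch of preparing such an image for saving, assuming CHW float arrays in [0, 1] and the "-a" naming described above (both helper names are hypothetical):

```python
import os

import numpy as np


def adversarial_filename(name, suffix="-a"):
    """ILSVRC2012_val_00000001.JPEG -> ILSVRC2012_val_00000001-a.JPEG"""
    stem, ext = os.path.splitext(name)
    return f"{stem}{suffix}{ext}"


def chw_to_hwc_uint8(x):
    """Turn a (3, 224, 224) float image in [0, 1] into a (224, 224, 3) uint8 array."""
    x = np.clip(np.asarray(x, dtype=np.float64), 0.0, 1.0)
    # Move channels last and quantize to the 0..255 integer range.
    return np.round(np.transpose(x, (1, 2, 0)) * 255.0).astype(np.uint8)
```

With Pillow installed, Image.fromarray(chw_to_hwc_uint8(x)).save(adversarial_filename(name)) would write the file. Because JPEG is lossy, the saved file only approximates the computed perturbation, so re-evaluating a model on it may give slightly different results.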

(3) Damage

Under attack, accuracy drops rapidly with epsilon for the non-robust model and relatively steadily for the robust model. Top-1, top-3 and top-5 accuracy are used for evaluation. For "Engstrom2019Robustness", the clean accuracies are 64.6%, 78.5% and 82.9% respectively; the top-3 and top-5 accuracy curves hardly drop below a baseline of 45%, whereas the top-1 accuracy curve finally drops to about 22% at epsilon = 20/255. The best ASR against "Engstrom2019Robustness" is about 70.78% on larger validation sets.
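
Top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. A minimal NumPy sketch (the function name topk_accuracy is hypothetical):

```python
import numpy as np


def topk_accuracy(logits, labels, ks=(1, 3, 5)):
    """Return {k: fraction of samples whose true label is among the k highest logits}."""
    logits = np.asarray(logits)
    labels = np.asarray(labels)
    order = np.argsort(-logits, axis=1)          # class indices, highest score first
    result = {}
    for k in ks:
        hits = (order[:, :k] == labels[:, None]).any(axis=1)
        result[k] = float(hits.mean())
    return result
```

Running it on the clean logits and on the perturbed logits for each epsilon gives the top-1, top-3 and top-5 accuracy curves.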

Conclusion

The "Engstrom2019Robustness" model is indeed more robust than "Standard_R50", and choosing a strong attack is essential for adversarial training and for improving model robustness and generalization. Collecting a comprehensive, augmented dataset or designing a more robust architecture also matters: for example, WideResNet-50-2 raises robust accuracy from 34.96% to 38.14%.

An adversarial attack should balance cost, concealment and damage by controlling the perturbation, while hyperparameters and the threat model (L0, L1, L2 or L-infinity) also play a significant role in attack design.

Adversarial attack evaluation should account for validation set size and randomness. Enlarging the set of candidate classes that count as correct (e.g. top-3 or top-5 instead of top-1 accuracy) greatly lowers the attack's effect.

The code is open-source, with detailed instructions and explanations, in the Gitee repository on my account; visit the following links for the latest information.

AdversarialAttack: Implement adversarial attack using Momentum Iterative PGD algorithm and transfer attack for image classification models (CNNs) on CIFAR10 and ILSVRC2012. - Gitee.com
https://gitee.com/wawaforest4689/AdversarialAttack/tree/master?svcp_stk=1_oaiXyX8LbUfvCjatVDzHhhTfzDSUuzfGJzkIF03NOEyi3obu5UJK2iCIECvxV1g-BWCGzahx8IQapGWve4yj3uW0O4uB9Ey5gQ5rJzllBH0IHzyotqwREPZfFD1UgDt-V8j9DUB0I3OXGB02yQQI3uQPKJ9iKF83I-cTkP1xxl8pRaqqmCUPs_RFHAK8RsGeqiHe5r5Ujwq9tqD4mkDpJg%3D%3D

https://gitee.com/wawaforest4689