Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks (2019)

Adversarial Examples

Fig. 1: Example of attacks on deep learning models with ‘universal adversarial perturbations’: the attacks are shown for CaffeNet, the VGG-F network and GoogLeNet. All the networks recognized the original clean images correctly with high confidence. After small perturbations were added to the images, the networks predicted wrong labels with similarly high confidence. Note that the perturbations are hardly perceptible to the human visual system, yet their effects on the deep learning models are catastrophic. 2018/5/15
Definition of terms
• Adversarial example/image is a modified version of a clean image that is intentionally perturbed (e.g. by adding noise) to confuse/fool a machine learning technique, such as deep neural networks.
• Adversarial perturbation is the noise that is added to the clean image to make it an adversarial example.
• Adversarial training uses adversarial images besides the clean
Adversarial terminology
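In machine learning, "adversarial" usually describes attacks on, or tests of, a model, as in adversarial examples.
1. Adversarial attack: an attack on a machine learning model that aims to change the model's output.
2. Adversarial example: a sample that has been perturbed in a specific way so that it fools a machine learning model into producing a wrong classification.
3. Adversarial training: a training method that improves a model's robustness by including adversarial examples during training.
4. Adversarial defense: a method of defending against adversarial attacks, such as detecting adversarial examples or adversarial training.
5. Gradient descent: an optimization algorithm that updates model parameters using the gradient of the loss function in order to minimize the loss.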
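To make the terms above concrete, here is a minimal sketch of how an adversarial example can be built from a loss gradient. It uses the fast gradient sign method (FGSM), a standard technique that the glossary itself does not name, and a hypothetical two-feature logistic model whose weights, input, and step size `eps` are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One signed-gradient step of size eps that increases the loss on (x, y)."""
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

# Hypothetical toy model and clean sample with true label 1 (assumed values).
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.5)
p_clean = predict_proba(w, b, x)      # above 0.5: classified as class 1
p_adv = predict_proba(w, b, x_adv)    # below 0.5: classified as class 0

print("clean input:", x, "P(class 1) =", round(p_clean, 3))
print("adversarial input:", x_adv, "P(class 1) =", round(p_adv, 3))
```

Gradient descent moves model parameters against the gradient to reduce the loss; the attack here instead moves the input along the sign of the gradient to increase it, which is why the two concepts appear side by side.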
Creating poisoning and evasion attack points is not a trivial task, particularly when many online services avoid disclosing information about their machine learning algorithms. As a result, attackers are forced to craft their attacks in black-box settings, against a surrogate model instead of the real model used by the service, hoping that the attack will be effective on the real model. The transferability property of an attack is satisfied when an attack developed for a particular machine learning model (i.e., a surrogate model) is also effective against the target model. Attack transferability was observed in early studies on adversarial examples [13, 41] and has attracted much more interest in recent years with the advancement of machine learning cloud services. Previous work has reported empirical findings about the transferability of evasion attacks [3, 12, 13, 19, 24, 30, 31, 41, 42, 45] and, only recently, also on the transferability of poisoning integrity attacks [40]. In spite of these efforts, the question of when and why adversarial points transfer remains largely unanswered.
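The surrogate-versus-target setup described above can be sketched as a small experiment: craft evasion points using only the surrogate, then count how many of them also fool the target. Everything below is an illustrative assumption (two hypothetical linear classifiers, four toy points, a per-feature budget `eps`), not the evaluation protocol of the paper.

```python
def sign(v):
    return (v > 0) - (v < 0)

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if score(w, b, x) > 0 else 0

def evade(w, x, y, eps):
    """Shift each feature by eps against the surrogate's score for the true label y."""
    step = -eps if y == 1 else eps
    return [xi + step * sign(wi) for xi, wi in zip(x, w)]

# Hypothetical models: the attacker sees only the surrogate.
surrogate = ([1.0, 1.0], 0.0)   # (weights, bias)
target = ([2.0, 0.2], 0.0)      # hidden model used by the service

# Toy labelled points that both models classify correctly before the attack.
data = [([1.0, 0.5], 1), ([0.4, 0.8], 1), ([-0.6, -0.3], 0), ([-0.2, -0.9], 0)]
eps = 1.0

fooled_surrogate = 0
transferred = 0
for x, y in data:
    x_adv = evade(surrogate[0], x, y, eps)
    if classify(*surrogate, x_adv) != y:
        fooled_surrogate += 1
        # The attack transfers if the unseen target is fooled as well.
        if classify(*target, x_adv) != y:
            transferred += 1

transfer_rate = transferred / fooled_surrogate if fooled_surrogate else 0.0
print("fooled surrogate:", fooled_surrogate, "transfer rate:", transfer_rate)
```

In this toy case the target's weights have the same signs as the surrogate's, so every crafted point transfers; a target whose weights point in a different direction would generally yield a lower rate.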
Marco Melis
DIEE, University of Cagliari, Italy
Maura Pintor
DIEE, University of Cagliari, Italy
Matthew Jagielski
Northeastern University, Boston, MA
Battista Biggio
DIEE, University of Cagliari, Italy; Pluribus One
The wide adoption of machine learning (ML) and deep learning algorithms in many critical applications introduces strong incentives for motivated adversaries to manipulate the results and models generated by these algorithms. Attacks against machine learning systems can happen during multiple stages in the learning pipeline. For instance, in many settings training data is collected online and thus cannot be fully trusted. In poisoning availability attacks, the attacker controls a certain amount of training data, thus influencing the trained model and ultimately the predictions at testing time on most points in the testing set [4, 16, 18, 26–28, 33, 35, 40, 46]. Poisoning integrity attacks have the goal of modifying predictions on a few targeted points by manipulating the training process [18, 40].
On the other hand, evasion attacks involve small manipulations of testing data points that result in mispredictions at testing time on those points [3, 7, 9, 13, 30, 37, 41, 44, 47].
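Unlike evasion, poisoning acts on the training set. The sketch below illustrates this with a nearest-centroid classifier, chosen here only because it is easy to follow: the attacker injects a few mislabelled training points, the learned model shifts, and test accuracy drops. The classifier, the data, and the poisoning points are all hypothetical; real poisoning attacks typically optimize the injected points rather than picking them by hand.

```python
def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def fit(train):
    """Nearest-centroid model: one mean vector per class label."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}

def predict(model, x):
    def sq_dist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(model, key=lambda y: sq_dist(model[y]))

def accuracy(model, test):
    return sum(predict(model, x) == y for x, y in test) / len(test)

# Clean training and test data (toy values).
train = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0),
         ([3.0, 3.0], 1), ([3.0, 4.0], 1), ([4.0, 3.0], 1)]
test = [([1.5, 1.5], 0), ([0.5, 0.5], 0), ([3.5, 3.5], 1), ([2.5, 3.0], 1)]

# Attacker-controlled points: located near class 0 but labelled as class 1.
poison = [([0.0, 0.0], 1), ([0.5, 0.5], 1), ([0.0, 0.5], 1)]

clean_acc = accuracy(fit(train), test)
poisoned_acc = accuracy(fit(train + poison), test)
print("accuracy without poisoning:", clean_acc)
print("accuracy with poisoning:", poisoned_acc)
```

The test points are never modified here; the drop in accuracy comes entirely from the shifted class-1 centroid, which is the defining difference from an evasion attack.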
Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Studying transferability of attacks has gained interest in recent years due to the deployment of cyber-attack detection services based on machine learning. For these applications of machine learning, service providers avoid disclosing information about their machine-learning algorithms. As a result, attackers trying to bypass detection are forced to craft their attacks against a surrogate model instead of the actual target model used by the service. While previous work has shown that finding test-time transferable attack samples is possible, it is not well understood how an attacker may construct adversarial examples that are likely to transfer against different models, in particular in the case of training-time poisoning attacks. In this paper, we present the first empirical analysis aimed at investigating the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying, formal definition of transferability of such attacks and show how it relates to the input gradients of the surrogate and of the target classification models. We assess to what extent some of the most well-known machine-learning systems are vulnerable to transfer attacks, and explain why such attacks succeed (or not) across different models. To this end, we leverage some interesting connections highlighted in this work among the adversarial vulnerability of machine-learning models, their regularization hyperparameters and input gradients.
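The link between transferability and input gradients can be illustrated by comparing the loss gradient of a surrogate with that of a target at the same input. The sketch below uses cosine similarity as the alignment measure, which is an assumption made for illustration since the abstract does not spell out the exact metric; the logistic models and the input are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_input_gradient(w, b, x, y):
    """Gradient of the logistic cross-entropy loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def cosine(u, v):
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return dot / (norm_u * norm_v)

# Hypothetical input with true label 1.
x, y = [0.5, 0.2], 1

g_surrogate = loss_input_gradient([1.0, 1.0], 0.0, x, y)
g_aligned = loss_input_gradient([2.0, 2.0], 0.1, x, y)      # same direction as surrogate
g_orthogonal = loss_input_gradient([1.0, -1.0], 0.0, x, y)  # orthogonal direction

align_high = cosine(g_surrogate, g_aligned)
align_low = cosine(g_surrogate, g_orthogonal)
print("alignment with similar target:", round(align_high, 3))
print("alignment with dissimilar target:", round(align_low, 3))
```

A value near 1 means a step that raises the surrogate's loss raises the target's loss too, so an attack built on the surrogate is more likely to carry over; a value near 0 means the surrogate's gradient says little about the target.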