[1906.03787] Intriguing properties of adversarial training
In this paper, we reveal two intriguing properties of adversarial training: (1) conducting normalization in the right manner is essential for training robust models, and (2) our so-called "deep" networks are still shallow for the task of adversarial learning.
Abstract: Adversarial training is one of the main defenses against adversarial attacks.
In this paper, we provide the first rigorous study on diagnosing elements of
adversarial training, which reveals two intriguing properties.
Figure 1: The relationship between model robustness and the portion of clean images used for training. The strongest robustness is obtained by training entirely without clean images, surpassing the baseline model by 18.3% accuracy against the PGD-2000 attacker.
Figure 2: Comprehensive robustness evaluation on ImageNet. For models trained with different strategies, we show their accuracy against a PGD attacker with 10 to 2000 iterations. Only the curve of 100% adv + 0% clean becomes asymptotic when evaluating against attackers with more iterations.
Figure 3: Disentangling the mixture distribution for normalization secures model robustness. Unlike the blue curves in Figure 2, these new curves become asymptotic when evaluating against attackers with more iterations, indicating that networks using MBNadv behave robustly against PGD attackers with different attack iterations, even if clean images are used for training.
Figure 4: Standard BN (left) estimates normalization statistics on the mixture distribution. Our proposed MBN (right) disentangles the distribution by constructing different mini-batches for clean and adversarial images to estimate normalization statistics.
Figure 5: Statistics of the running mean and running variance of the proposed MBN on 20 randomly sampled channels in a ResNet-152's res3 block, suggesting that clean and adversarial images induce significantly different normalization statistics.
Figure 6: Comparison of batch statistics and running statistics of BN on 20 randomly sampled channels in a ResNet-152's res3 block. We observe that the batch mean converges to the running mean, while the batch variance still differs from the running variance.
Figure 7: Compared to traditional image classification tasks, adversarial training exhibits a stronger demand for deeper networks.
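For reference, the evaluation in Figures 1-3 sweeps a PGD attacker from 10 to 2000 iterations. Below is a minimal PyTorch sketch of an L-infinity PGD attacker under standard assumptions; `epsilon` and `alpha` are illustrative placeholders, not the paper's exact settings.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=4/255, alpha=1/255, num_iter=2000):
    """L-infinity PGD: num_iter steps of signed-gradient ascent,
    projected back into the epsilon-ball around the clean input x."""
    # Random start inside the epsilon-ball, clipped to valid pixel range.
    x_adv = (x.detach() + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                        # ascent step
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project
            x_adv = x_adv.clamp(0, 1)                                  # valid pixels
    return x_adv.detach()

Robustness is then reported as the model's accuracy on pgd_attack(model, x, y, num_iter=k) for increasing k; a curve that flattens out as k grows is what the figures call asymptotic behavior.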
First, we study the role of normalization. Batch normalization (BN) is a
crucial element for achieving state-of-the-art performance on many vision
tasks, but we show it may prevent networks from obtaining strong robustness in
adversarial training. One unexpected observation is that, for models trained
with BN, simply removing clean images from the training data substantially boosts
adversarial robustness, i.e., by 18.3% against a PGD-2000 attacker. We relate this phenomenon to the
hypothesis that clean images and adversarial images are drawn from two
different domains. This two-domain hypothesis may explain the issue of BN when
training with a mixture of clean and adversarial images, as estimating
normalization statistics of this mixture distribution is challenging. Guided by
this two-domain hypothesis, we show disentangling the mixture distribution for
normalization, i.e., applying separate BNs to clean and adversarial images for
statistics estimation, achieves much stronger robustness. Additionally, we find
that enforcing BNs to behave consistently at training and testing can further enhance robustness.
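To make the disentangling idea concrete, here is a minimal PyTorch sketch of a two-branch normalization layer in the spirit of the proposed MBN; the module name and the `adv` routing flag are interface assumptions for illustration, not the authors' exact implementation.

import torch.nn as nn

class MixtureBatchNorm2d(nn.Module):
    """Sketch of MBN: separate BN branches for clean and adversarial
    mini-batches, so each branch estimates statistics on one domain only."""
    def __init__(self, num_features):
        super().__init__()
        self.bn_clean = nn.BatchNorm2d(num_features)
        self.bn_adv = nn.BatchNorm2d(num_features)

    def forward(self, x, adv: bool = False):
        # Clean and adversarial images are kept in separate mini-batches,
        # and each batch is normalized by the branch matching its domain.
        return self.bn_adv(x) if adv else self.bn_clean(x)

At test time a single branch is used for the whole network; per Figure 3, selecting the adversarial branch (MBNadv) yields robust behavior against PGD attackers even when clean images were used during training.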
Second, we study the role of network capacity. We find our so-called "deep"
networks are still shallow for the task of adversarial learning. Unlike
traditional classification tasks where accuracy is only marginally improved by
adding more layers to "deep" networks (e.g., ResNet-152), adversarial training
exhibits a much stronger demand on deeper networks to achieve higher
adversarial robustness. This robustness improvement can be observed
substantially and consistently even as we push the network capacity to an
unprecedented scale, i.e., ResNet-638.
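To make that scale concrete, the sketch below instantiates a 638-layer bottleneck ResNet with torchvision. The per-stage block counts [3, 8, 196, 5] are an illustrative assumption that merely satisfies the depth arithmetic, not the authors' reported configuration.

from torchvision.models.resnet import ResNet, Bottleneck

# Hypothetical stage layout: 3 * (3 + 8 + 196 + 5) + 2 = 638 layers,
# by analogy with ResNet-152's [3, 8, 36, 3]. The actual per-stage
# configuration used in the paper may differ.
resnet638 = ResNet(Bottleneck, [3, 8, 196, 5], num_classes=1000)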