[1912.03192] Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations
We have shown how this framework can be realized by leveraging the StyleGAN architecture, resulting in models that are not only robust under systematic evaluations of insensitivity to variations but also exhibit better generalization, demonstrating that accuracy is not necessarily at odds with robustness.

Abstract
Recent research has made the surprising finding that state-of-the-art deep learning models sometimes fail to generalize to small variations of the input. Adversarial training has been shown to be an effective approach to overcome this problem. However, its application has been limited to enforcing invariance to analytically defined transformations like ℓp-norm bounded perturbations. Such perturbations do not necessarily cover plausible real-world variations that preserve the semantics of the input (such as a change in lighting conditions). In this paper, we propose a novel approach to express and formalize robustness to these kinds of real-world transformations of the input. The two key ideas underlying our formulation are (1) leveraging disentangled representations of the input to define different factors of variation, and (2) generating new input images by adversarially composing the representations of different images. We use a StyleGAN model to demonstrate the efficacy of this framework. Specifically, we leverage the disentangled latent representations computed by a StyleGAN model to generate perturbations of an image that are similar to real-world variations (like adding make-up, or changing the skin tone of a person) and train models to be invariant to these perturbations. Extensive experiments show that our method improves generalization and reduces the effect of spurious correlations.
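The core idea of the abstract, adversarially composing the disentangled latent factors of two images, can be sketched in a few lines. This is not the paper's implementation: the linear "generator" `W`, the logistic classifier `w`, the analytic gradient, and the `mask` selecting the label-irrelevant latent coordinates are all toy stand-ins of our own, used only to illustrate the maximization over mixing coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (names are ours, not the paper's): a linear
# "generator" mapping a disentangled latent z to an image x = W @ z,
# and a logistic classifier f(x) = sigmoid(w . x).
d_latent, d_image = 4, 8
W = rng.normal(size=(d_image, d_latent))   # toy generator weights
w = rng.normal(size=d_image)               # toy classifier weights

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def adv_mix(z_src, z_ref, y, mask, steps=50, lr=0.5):
    """Adversarial mixing sketch: for the latent coordinates selected by
    `mask` (the label-irrelevant factors), interpolate between z_src and
    z_ref with per-coordinate coefficients alpha in [0, 1], chosen by
    gradient ascent to maximize the classifier's cross-entropy loss on
    the true label y. Coordinates with mask == 0 are left untouched."""
    alpha = np.zeros(d_latent)
    for _ in range(steps):
        z = z_src + mask * alpha * (z_ref - z_src)   # mixed latent
        p = sigmoid(w @ (W @ z))                     # P(y = 1 | x)
        # Analytic d(loss)/d(alpha) for this linear/logistic toy model:
        grad = (p - y) * (W.T @ w) * mask * (z_ref - z_src)
        alpha = np.clip(alpha + lr * grad, 0.0, 1.0)  # stay on the segment
    return z_src + mask * alpha * (z_ref - z_src)
```

In the paper, the generator would be a pretrained StyleGAN and only style layers deemed label-irrelevant would be mixed; the sketch keeps just the skeleton of that maximization.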
Figure 1. Two variations of the same face that are classified as both “smiling” and “not smiling” with close to 100% confidence by the same classifier. Note that this person “does not exist” and has been generated using a StyleGAN model. (Introduction)
Figure 2. Comparison of different data augmentation techniques. These transformations tend to destroy the image semantics. (Related work)
Figure 3. Illustration of the maximization process in Equation (??). (Adversarial Mixing with Disentangled Representations)
Figure 4. Comparison of mixup and AdvMix on a toy example. In this example, we are given 200 datapoints from an unknown distribution. Each datapoint (x1, x2) is sampled according to x1 ∼ N(z⊥, √3), where z⊥ ∈ Z⊥ = {0, 10}, and x2 ∼ N(z∥, 1), where z∥ ∈ Z∥ = {0, 20}. The colors represent the label. Note that the latent variable z∥ = 20y depends on the label, while z⊥ is independent of it. Panel (a) shows the original set of 200 datapoints; panel (b) shows the effect of sampling additional data using AdvMix; and panel (c) shows the effect of mixup. Of course, we should point out that our method, AdvMix, is aware of the underlying latent representation, while mixup is not. (Adversarial Mixing with Disentangled Representations)
Figure 5.
Figure 6.
Figure 7. Panel ?? shows how the latents are progressively able to match a target image (on the far right). Panel ?? shows two different variations of the obtained image. (Implementation using StyleGAN)
Figure 8. Mean colors given to each digit in the training set of our Colored-MNIST case study. (Results)
Figure 9. Accuracy of different training methods on images from our unbiased Colored-MNIST test set. The training set is progressively debiased by increasing the standard deviation of the colors present. (Results)
Figure 10. The top row shows examples of clean images from CELEBA that are all classified correctly by the nominal model. The bottom row shows semantically plausible variants of these images that are all misclassified. (CELEBA)
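The toy distribution described in the Figure 4 caption can be reproduced directly. A minimal sketch, assuming balanced binary labels (variable names are ours): the label-dependent factor is z∥ = 20y, the label-independent factor z⊥ is drawn uniformly from {0, 10}, and x1, x2 are Gaussian around those factors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

y = rng.integers(0, 2, size=n)             # binary labels (assumed balanced draw)
z_perp = rng.choice([0.0, 10.0], size=n)   # label-independent factor z⊥
z_par = 20.0 * y                           # label-dependent factor z∥ ∈ {0, 20}

x1 = rng.normal(loc=z_perp, scale=np.sqrt(3.0))  # x1 ~ N(z⊥, √3)
x2 = rng.normal(loc=z_par, scale=1.0)            # x2 ~ N(z∥, 1)
data = np.stack([x1, x2], axis=1)                # the 200 datapoints of panel (a)
```

Because only z∥ carries label information, a mixing method aware of the latent split can resample z⊥ freely without changing labels, which is the contrast the figure draws against mixup.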

Figure 11. mixup. (Additional examples)
Figure 12. Adversarial Training (ε = 0.1). (Additional examples)
Figure 13. AdvMix or RandMix. (Additional examples)
Figure 14. Example of perturbations obtained by different techniques on our Colored-MNIST dataset. The image on the far left is the original image. On the same row are variations of that image. Even rows show the rescaled difference between the original image and its variants. (Additional examples)
Figure 15. mixup. (Additional examples)
Figure 16. Adversarial Training (ε = 8/255). (Additional examples)
Figure 17. AdvMix or RandMix. (Additional examples)
Figure 18. Example of perturbations obtained by different techniques on CELEBA. The image on the far left is the original image. On the same row are variations of that image. Even rows show the rescaled difference between the original image and its variants. (Additional examples)

Figure 19. Example of perturbations obtained by AdvMix on randomly generated images. The top row consists of images generated by a StyleGAN model; all of these images are classified as “not smiling” by the nominal classifier (the numbers indicate the classifier output probability for “smiling”). The second row consists of adversarial perturbations obtained by AdvMix. The last row shows the rescaled differences between the original images and their variants. (Additional examples)