[1805.08657] Robust Conditional Generative Adversarial Networks
We introduce the Robust Conditional GAN (RoCGAN) model, a new conditional GAN capable of leveraging unsupervised data to learn better latent representations, even in the face of large amounts of noise.

Abstract: Conditional generative adversarial networks (cGAN) have led to large
improvements in the task of conditional image generation, which lies at the
heart of computer vision. The major focus so far has been on performance
improvement, while there has been little effort in making cGAN more robust to
noise. The regression (of the generator) might lead to arbitrarily large errors
in the output, which makes cGAN unreliable for real-world applications. In this
work, we introduce a novel conditional GAN model, called RoCGAN, which
leverages structure in the target space of the model to address the issue. Our
model augments the generator with an unsupervised pathway, which promotes the
outputs of the generator to span the target manifold even in the presence of
intense noise. We prove that RoCGAN shares similar theoretical properties with GAN
and experimentally verify that our model outperforms existing state-of-the-art
cGAN architectures by a large margin in a variety of domains including images
from natural scenes and faces.
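The core architectural idea, an unsupervised autoencoder pathway that shares its decoder with the regression pathway so that generator outputs stay on the target manifold, can be illustrated with a minimal linear sketch. This is not the paper's actual convolutional network; the dimensions and weight names below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

D_SRC, D_TGT, D_LAT = 16, 16, 4  # source, target, latent dims (illustrative)

# Shared decoder: both pathways map their latent code through the SAME
# weights, so the regression output is confined to the decoder's span of
# the target manifold (the core idea behind RoCGAN's generator).
W_dec = rng.standard_normal((D_TGT, D_LAT))

W_enc_reg = rng.standard_normal((D_LAT, D_SRC))  # regression-pathway encoder
W_enc_ae = rng.standard_normal((D_LAT, D_TGT))   # autoencoder-pathway encoder

def reg_pathway(x):
    """Source -> latent -> target (the conditional-generation pathway)."""
    return W_dec @ (W_enc_reg @ x)

def ae_pathway(y):
    """Target -> latent -> target (the unsupervised autoencoder pathway)."""
    return W_dec @ (W_enc_ae @ y)

x = rng.standard_normal(D_SRC)   # corrupted/source sample
y = rng.standard_normal(D_TGT)   # clean/target sample

out_reg, out_ae = reg_pathway(x), ae_pathway(y)

# Sanity check: both outputs lie in the column space of the shared decoder,
# i.e. projecting onto span(W_dec) leaves them unchanged.
P = W_dec @ np.linalg.pinv(W_dec)          # projector onto span(W_dec)
print(np.allclose(P @ out_reg, out_reg))   # True
print(np.allclose(P @ out_ae, out_ae))     # True
```

In the linear case, weight sharing makes the constraint exact: whatever the regression encoder produces, the decoder can only emit vectors in its column space, which the autoencoder pathway trains to cover the target manifold.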

Figure 1: The mapping process of the generator of the baseline cGAN (in (a)) and our model (in (b)). (a) The source signal is embedded into a low-dimensional latent subspace, which is then mapped to the target subspace. The lack of constraints might result in outcomes that are arbitrarily far off the target manifold. (b) In RoCGAN, by contrast, steps 1b and 2b learn an autoencoder in the target manifold; by sharing the weights of the decoder, we restrict the output of the regression (step 2a). All figures in this work are best viewed in color. (Introduction)

Figure 2: Schematic of the generator of (a) cGAN versus (b) our proposed RoCGAN. The single pathway of the original model is replaced with two pathways. (Method)

Figure 3: Qualitative results in the synthetic experiment of sec. ??. Each plot corresponds to the respective manifold in the output vector; the first and third depend on both x and y (xyz plot), while the rest depend on x (xz plot). Green visualizes the target manifold, red the baseline, and blue ours. Even though the two models have the same parameters during inference, the baseline does not approximate the target manifold as well as our method. (Experiment on synthetic data)

Figure 4: Qualitative results (best viewed in color). The first row depicts the target image, the second row the corrupted one (used as input to the methods). The third row depicts the output of the baseline cGAN, while the outcome of our method is illustrated in the fourth row. Different evaluations are visualized for faces: (a) denoising, (b) denoising with additional noise at test time, (c) sparse inpainting, (d) sparse inpainting with 75% black pixels. For natural scenes, columns (e) and (f) show the denoising and sparse inpainting results, respectively. (Natural scenes)

Figure 5: Qualitative results in the synthetic experiment (main paper). Output of the 1st function. From left to right: the target (ground-truth) curve in green, the output of the single-pathway network (baseline) in red, the output of the two-pathway network in blue, and all three overlaid. The output vectors of the 1st and 3rd functions are plotted here with respect to x; the full 3D plot is in the manuscript. All figures in this work are best viewed in color. (Introduction)

Figure 6: Qualitative results in the synthetic experiment (main paper). Output of the 2nd function. See Fig. ?? for details. (Introduction)

Figure 7: Qualitative results in the synthetic experiment (main paper). Output of the 3rd function. See Fig. ?? for details. (Introduction)

Figure 8: Qualitative results in the synthetic experiment (main paper). Output of the 4th function. See Fig. ?? for details. (Introduction)

Figure 9: Projection onto a linear subspace. The original image on the left was corrupted (downscaled ×4). The image was upscaled with bilinear interpolation in (c). Both the original and the upscaled images are projected onto and reconstructed from a PCA. The reconstruction of the original image ('reconst-org') and that of the linearly upscaled image ('reconst-ups') are similar. This simple linear projection demonstrates how constraining the output through a linear subspace can result in a more robust output. (Linear generator analogy)

Figure 10: Qualitative results; best viewed in color. The first row depicts the ground-truth image, the second row the corrupted one (input to the methods), the third the output of the baseline cGAN, and the fourth the outcome of our method. The first four columns follow the protocol of the '4layer' network, while the four rightmost columns follow the '4layer-50k' protocol. Different evaluations are visualized for faces: (a), (e) denoising; (b), (f) denoising with augmented noise at test time; (c), (g) sparse inpainting; (d), (h) sparse inpainting with 75% black pixels. (Additional experiments)

Figure 11: Cosine distance distribution plot (sec. ??). A perfect reconstruction per compared image would yield a Dirac delta around one; a narrow distribution centered at one denotes proximity to the target images' embeddings. The word 'den' abbreviates denoising and 'inp' sparse inpainting. (Alternative error metric)

Figure 12: The layer schematics of the generators in (a) the '4layer-skip' case and (b) the '5layer' case. (Additional experimental details)

Figure 13: Histogram plots for different noise cases (experiment of sec. ??). The distribution that is more concentrated toward the right of the histogram is closer to the target distribution. In (a) we note that even in the training-noise case, the two histograms differ, with ours concentrating toward the right. However, as additional noise is added in (b) and (c), the histogram of cGAN deteriorates faster. A similar phenomenon is observed in (d), (e), and (f). (Additional noise)

Figure 14: Sample images depicting the corruption level in each case (sec. ??). (Additional noise)

Figure 15: Qualitative figure illustrating the different noise levels (sec. ??). The first row depicts different target samples, while each three-row block depicts the corrupted image, the baseline output, and our output. The blocks, top down, correspond to 25%, 35%, and 50% noise (25/0, 35/0, and 50/0). The images in the first block are closer to the respective target images; as we increase the noise, the baseline results deteriorate faster than the RoCGAN outputs. Readers can zoom in to further notice the difference in the quality of the outputs. (Additional noise)

Figure 16: Qualitative figure for different types of noise during testing (see Fig. ??). The first row depicts different target samples, while each three-row block depicts the corrupted image, the baseline output, and our output. The blocks, top down, correspond to the 25/10, 25/20, and 25/25 cases (different types of testing noise). The last block contains the most challenging noise in this work, i.e., both increased noise and a type different from the training noise. Nevertheless, our model generates a more realistic image compared to the baseline. (Additional noise)
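The linear-generator analogy of Figure 9 (a clean image and its corrupted counterpart both reconstructed through the same PCA subspace) can be sketched with a toy example. The synthetic low-rank "images", dimensions, and noise level below are invented stand-ins for the real data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "images": vectors lying exactly in a 3-dimensional linear subspace.
D, K, N = 32, 3, 200
basis = np.linalg.qr(rng.standard_normal((D, K)))[0]  # orthonormal basis
data = rng.standard_normal((N, K)) @ basis.T          # clean training set

# PCA via SVD (the data is drawn zero-mean, so explicit centering is skipped).
U, s, Vt = np.linalg.svd(data, full_matrices=False)
components = Vt[:K]                                   # top-K principal axes

def project(v):
    """Reconstruct v from its projection onto the PCA subspace."""
    return components.T @ (components @ v)

original = rng.standard_normal(K) @ basis.T            # a clean sample
corrupted = original + 0.3 * rng.standard_normal(D)    # e.g. upscaling artifacts

# Projecting both through the same low-dimensional subspace discards most of
# the corruption, so the two reconstructions end up much closer than the
# original and corrupted inputs were.
rec_org, rec_cor = project(original), project(corrupted)
print(np.linalg.norm(rec_org - rec_cor) < np.linalg.norm(original - corrupted))
```

This mirrors the figure's point: constraining outputs to a (learned) subspace suppresses the component of the corruption that lies outside it, which is exactly what the shared decoder does nonlinearly in RoCGAN.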
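The alternative error metric of Figure 11 compares embeddings of outputs and targets via cosine similarity, where a distribution concentrated near one indicates faithful reconstructions. A sketch of how such a distribution is computed, on hypothetical embedding vectors (the 64-dimensional embeddings and noise levels are invented, not the paper's face-recognition embeddings):

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings of target images and two sets of reconstructions:
# "good" outputs stay close to their targets, "poor" outputs drift away.
targets = rng.standard_normal((500, 64))
good = targets + 0.1 * rng.standard_normal((500, 64))
poor = targets + 1.0 * rng.standard_normal((500, 64))

sims_good = np.array([cosine_sim(t, g) for t, g in zip(targets, good)])
sims_poor = np.array([cosine_sim(t, p) for t, p in zip(targets, poor)])

# The better model's similarity distribution concentrates near one, i.e. it
# would appear as the narrower, right-shifted curve in a plot like Figure 11.
print(sims_good.mean() > sims_poor.mean())   # True
```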