[1901.11058] HyperGAN: A Generative Model for Diverse, Performant Neural Networks
We have also shown that the uncertainty estimates from the generated ensembles are capable of detecting out-of-distribution data and adversarial examples. In the future, we believe HyperGAN could impact many domains, including meta-learning and reinforcement learning.

Abstract: We introduce HyperGAN, a generative network that learns to generate all the
weights within a deep neural network. HyperGAN employs a novel mixer to
transform independent Gaussian noise into a latent space where dimensions are
correlated, which is then transformed to generate weights in each layer of a
deep neural network. We utilize an architecture that bears resemblance to
generative adversarial networks, but we evaluate the likelihood of samples with
a classification loss. This is equivalent to minimizing the KL-divergence
between the generated network parameter distribution and an unknown true
parameter distribution. We apply HyperGAN to classification, showing that
HyperGAN can learn to generate parameters which solve the MNIST and CIFAR-10
datasets with competitive performance to fully supervised learning, while
learning a rich distribution of effective parameters. We also show that
HyperGAN can provide better uncertainty estimates than standard ensembles. This is
evaluated by the ability of HyperGAN generated ensembles to detect out of
distribution data as well as adversarial examples. We see that in addition to
being highly accurate on inlier data, HyperGAN can provide reasonable
uncertainty estimates.
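The architecture in the abstract — independent Gaussian noise mapped by a mixer into a correlated latent code, which per-layer generators then turn into weights — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: all layer sizes, the two-layer ReLU target network, and the `tanh` mixer nonlinearity are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixer(s, W_mix):
    # Map independent Gaussian noise s to a correlated latent code,
    # then split it into one sub-vector q_i per target-network layer.
    q = np.tanh(s @ W_mix)
    return np.split(q, 2)  # this toy target network has two layers

def generator(q, W_gen, shape):
    # Each generator maps its latent sub-vector to one layer's weights.
    return (q @ W_gen).reshape(shape)

# Hypothetical sizes: noise dim 8, latent dim 2x16, target net 4 -> 3 -> 2.
W_mix = rng.normal(size=(8, 32))
W_g1 = rng.normal(size=(16, 4 * 3))
W_g2 = rng.normal(size=(16, 3 * 2))

def sample_network():
    # One forward pass through the mixer and generators = one new network.
    s = rng.normal(size=8)
    q1, q2 = mixer(s, W_mix)
    return generator(q1, W_g1, (4, 3)), generator(q2, W_g2, (3, 2))

def forward(x, params):
    W1, W2 = params
    return np.maximum(x @ W1, 0) @ W2  # ReLU MLP with generated weights

x = rng.normal(size=4)
outputs = np.stack([forward(x, sample_network()) for _ in range(5)])
print(outputs.shape)  # (5, 2): five distinct generated networks
```

Because each draw of `s` yields a different weight set, sampling the generator repeatedly produces an ensemble "for free", which is what the uncertainty experiments below exploit.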

Figure 1. HyperGAN architecture. The mixer transforms s ∼ S into latent codes {q1, . . . , qN}. The generators each transform a latent sub-vector qi into the parameters of the corresponding layer in the target network. The discriminator forces Q(q|s) to be well-distributed and close to P. (Related Work)

Figure 2. Results of HyperGAN on the 1-D regression task. From left to right, we plot the predictive distribution of 10 and 100 sampled models from a trained HyperGAN. Within each image, the blue line is the target function x³, the red circles show the noisy observations, the grey line is the learned mean function, and the light blue shaded region denotes ±3 standard deviations. (1-D Toy Regression Task)

Figure 3. Empirical CDF of the predictive entropy of all approaches on notMNIST. L2 refers to conventional ensembles trained from different random starts. One can see that the entropy of HyperGAN models is significantly higher than the baselines'. (Anomaly Detection)

Figure 4. Empirical CDF of the predictive entropy on out-of-distribution data: the 5 classes of CIFAR-10 unseen during training. L2 refers to conventional ensembles trained from different random starts. (Anomaly Detection)

Figure 5. Entropy of predictions on FGSM and PGD adversarial examples. HyperGAN generates ensembles that are far more effective than standard ensembles, even with equal population size. Note that for large ensembles, it is hard to find adversarial examples with small norms, e.g. ε = 0.01. (Adversarial Detection)

Figure 6. Ablation of HyperGAN accuracy on CIFAR-10: normal HyperGAN, without the mixer, and without the discriminator, respectively. All of them converge to very similar accuracy, but the version without the mixer struggles significantly early in training. (Ablation Study)

Figure 7. HyperGAN diversity on CIFAR-10 given a normal training run, with the mixer removed, and with the discriminator removed. Diversity is shown as the standard deviation divided by the norm of the weights, within a population of 100 generated networks. (Ablation Study)

Figure 8. Convolutional filters from MNIST classifiers sampled from HyperGAN. For each image we sample the same 5x5 filter from 25 separate generated networks. From left to right: figures a and b show samples of the first two generated filters for layer 1, respectively; figures c and d show samples of filters 1 and 2 for layer 2. Qualitatively, HyperGAN learns to generate classifiers with a variety of filters. (Generated Filter Examples)

Figure 9. Top: MNIST examples to which HyperGAN assigns high entropy (outlier). Bottom: notMNIST examples which are predicted with low entropy (inlier). (Outlier Examples)

Figure 10. Diversity of predictions on adversarial examples. FGSM and PGD examples are created against one network generated by HyperGAN and tested on 500 more generated networks. FGSM transfers better than PGD, though both attacks fail to cover the distribution learned by HyperGAN. (HyperGAN Diversity on Adversarial Examples)
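The two quantities that drive these figures — the weight-diversity measure of Figure 7 (standard deviation of the weights across the generated population, normalized by the weight norm) and the predictive entropy used for the CDFs in Figures 3-5 — are simple to compute. The sketch below uses randomly generated stand-in arrays in place of a real trained ensemble; the shapes (100 networks, 10 classes) follow the paper's setup, but the exact normalization is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a HyperGAN-generated ensemble: flattened
# weight vectors for 100 sampled networks, and each network's logits
# on a single input (10 classes, as in MNIST / CIFAR-10).
weights = rng.normal(loc=1.0, scale=0.1, size=(100, 500))
logits = rng.normal(size=(100, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Figure 7's diversity measure (as sketched here): per-parameter std
# across the population, normalized by the norm of the mean weights.
diversity = weights.std(axis=0).mean() / np.linalg.norm(weights.mean(axis=0))

# Figures 3-5 score an input by the entropy of the ensemble-averaged
# predictive distribution; high entropy flags outliers and adversarial
# examples, low entropy flags confident inlier predictions.
p_mean = probs.mean(axis=0)
entropy = -(p_mean * np.log(p_mean)).sum()

print(diversity > 0, 0.0 <= entropy <= np.log(10))
```

Plotting the empirical CDF of `entropy` over many inputs, separately for in-distribution and out-of-distribution data, reproduces the comparison shown in Figures 3 and 4: a good uncertainty model concentrates in-distribution inputs at low entropy and pushes out-of-distribution inputs toward the maximum, log 10.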