[1911.01382v1] Amortized Population Gibbs Samplers with Neural Sufficient Statistics
APG samplers are very general, and offer a path toward the development of deep generative models that incorporate structured priors to provide meaningful inductive biases in settings where we have little or no supervision.

Abstract

We develop amortized population Gibbs (APG) samplers, a new class of autoencoding variational methods for deep probabilistic models. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. Each conditional update is a neural proposal, which we train by minimizing the inclusive KL divergence relative to the conditional posterior. To appropriately account for the size of the input data, we develop a new parameterization in terms of neural sufficient statistics, resulting in quasi-conjugate variational approximations. Experiments demonstrate that learned proposals converge to the known analytical conditional posterior in conjugate models, and that APG samplers can learn inference networks for highly-structured deep generative models when the conditional posteriors are intractable. Here APG samplers offer a path toward scaling up stochastic variational methods to models in which standard autoencoding architectures fail to produce accurate samples.
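The sweep-and-reweight scheme described in the abstract can be illustrated with a minimal numerical sketch. This is not the paper's implementation: the model, the resampling step, and all names (`apg_sweep`, `propose`, `log_joint`) are illustrative assumptions. In the actual method each block proposal is a trained neural network; here the proposal is the exact Gaussian conditional posterior of a toy conjugate model, so the incremental importance weights collapse to a constant, mirroring the paper's observation that learned proposals converge to the analytical conditional in conjugate models.

```python
import numpy as np

def logsumexp(a):
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def apg_sweep(particles, log_weights, blocks, propose, log_joint, rng):
    """One population Gibbs sweep (toy sketch, not the paper's code).

    particles:   dict mapping block name -> array of K particle values
    log_weights: (K,) unnormalized log importance weights
    blocks:      block names updated in turn, Gibbs-style
    propose:     propose(block, particles) -> (new_vals, log_q_fwd, log_q_rev)
    log_joint:   log_joint(particles) -> (K,) values of log p(x, z)
    """
    K = log_weights.shape[0]
    for block in blocks:
        # Resample the population in proportion to the current weights.
        probs = np.exp(log_weights - logsumexp(log_weights))
        idx = rng.choice(K, size=K, p=probs)
        particles = {k: v[idx] for k, v in particles.items()}
        old_lj = log_joint(particles)
        # Propose new values for this block; the proposal also scores the
        # old values (log_q_rev) so the incremental weight can be formed.
        new_vals, log_q_fwd, log_q_rev = propose(block, particles)
        particles = {**particles, block: new_vals}
        new_lj = log_joint(particles)
        # Incremental importance weight for the block update.
        log_weights = (new_lj - log_q_fwd) - (old_lj - log_q_rev)
    return particles, log_weights

# Toy conjugate model: z ~ N(0, 1), x | z ~ N(z, 1), with x observed.
# The conditional posterior is N(x/2, 1/2), which we use as the proposal.
def log_normal(v, m, s):
    return -0.5 * np.log(2 * np.pi * s**2) - (v - m) ** 2 / (2 * s**2)

rng = np.random.default_rng(0)
x, K = 1.3, 64
particles = {"z": rng.normal(0.0, 1.0, K)}
log_weights = np.zeros(K)

def log_joint(p):
    return log_normal(p["z"], 0.0, 1.0) + log_normal(x, p["z"], 1.0)

def propose(block, p):
    s = np.sqrt(0.5)
    new = rng.normal(x / 2, s, K)
    return new, log_normal(new, x / 2, s), log_normal(p["z"], x / 2, s)

particles_out, log_weights_out = apg_sweep(
    particles, log_weights, ["z"], propose, log_joint, rng)
```

Because the proposal here equals the exact conditional, every particle's incremental weight reduces to log p(x), a constant, so the population stays perfectly balanced; with a learned neural proposal the weights would instead measure the proposal's remaining mismatch to the conditional posterior.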

Figure 1: GMM.
Figure 2: DGMM.
Figure 3: Samples from the GMM and the DGMM. (a) GMM: the left column shows 5 test datasets with different numbers of data points; the subsequent columns show inference results from RWS, followed by results after 4, 8, and 12 APG updates. (b) DGMM: the left column shows 5 test datasets with different numbers of data points; the subsequent columns show inference results from RWS, followed by results after 3 and 6 APG updates; the right column shows reconstructions from the learned generative model. (Neural Sufficient Statistics)

Figure 6: Mean squared error between video frames and reconstructions as a function of the number of APG sweeps. (Time Series Model – Bouncing MNIST)
Figure 7: Full reconstruction for a video where T = 100, D = 3. (More Qualitative Results of Bouncing MNIST)
Figure 8: Full reconstruction for a video where T = 100, D = 4. (More Qualitative Results of Bouncing MNIST)
Figure 9: Full reconstruction for a video where T = 100, D = 5. (More Qualitative Results of Bouncing MNIST)