[1902.04294] Density Estimation and Incremental Learning of Latent Vector for Generative Autoencoders
As a result, an autoencoder equipped with the latent density estimator and the incremental learning strategy not only improves the generation quality of autoencoder-based generative models, but also outperforms previous studies in log-likelihood score.

Abstract: In this paper, we treat the image generation task using the autoencoder, a
representative latent variable model. Unlike many studies that regularize the latent
variable's distribution by assuming a manually specified prior, we approach the
image generation task with an autoencoder by directly estimating the latent
distribution. To do this, we introduce a 'latent density estimator', which
captures the latent distribution explicitly, and propose its structure. In addition,
we propose an incremental learning strategy for latent variables so that the
autoencoder learns important features of the data by exploiting the structural
characteristics of an undercomplete autoencoder, without an explicit
regularization term in the objective function. Through experiments, we show the
effectiveness of the proposed latent density estimator and the incremental
learning strategy for latent variables. We also show that our generative model
generates images with improved visual quality compared to previous generative
models based on autoencoders.
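The generation path described above (fit a density model to the encoder's latent vectors, then sample new latents from it for the decoder) can be illustrated with a toy stand-in. The sketch below uses a one-dimensional Gaussian mixture in place of the paper's latent density estimator; the mixture parameters, sample count, and function names are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm(pi, mu, sigma, n, rng):
    """Draw n scalars from a 1-D Gaussian mixture with weights pi,
    means mu and standard deviations sigma."""
    comp = rng.choice(len(pi), size=n, p=pi)   # pick a mixture component
    return rng.normal(mu[comp], sigma[comp])   # sample within that component

def gmm_log_density(x, pi, mu, sigma):
    """Log-likelihood of x under the same mixture (log-sum-exp for stability)."""
    x = np.asarray(x)[:, None]
    log_comp = (np.log(pi)
                - 0.5 * np.log(2 * np.pi * sigma**2)
                - 0.5 * ((x - mu) / sigma) ** 2)
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

# Toy mixture parameters, standing in for what an estimated latent density
# might look like (entirely made up for illustration).
pi = np.array([0.3, 0.7])
mu = np.array([-2.0, 1.5])
sigma = np.array([0.5, 0.8])

z = sample_gmm(pi, mu, sigma, n=10000, rng=rng)  # latents to feed a decoder
ll = gmm_log_density(z, pi, mu, sigma).mean()    # average log-likelihood of samples
```

In the paper's setting the mixture parameters would come from the proposed estimator fit on the encoder's outputs, and z would be decoded into images; here the decoder is omitted.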

Figure 1: Results of image generation at 128 × 128 resolution by our method on the aligned CelebA dataset. Our generative model produces sharp and detailed images while preserving the diverse characteristics of the data.
Figure 2: Toy example on the MNIST dataset: two-dimensional scatter plots of VAEs' latent variables. Each VAE's regularizer has a different coefficient β from 0 to 1. The stronger the regularizer, the closer the distribution gets to a Gaussian, but the lower the reconstruction performance becomes. In each pair of '5's, the left and right images are the target and the reconstruction, respectively.
Figure 3: Overview of the proposed generative framework based on an autoencoder, composed of an encoder, a decoder, and a Latent Density Estimator (LDE). Solid lines represent the inference and reconstruction paths; dashed lines represent the generation path. The LDE estimates the latent distribution pλ(z) from the empirical latent distribution pdata(z).
Figure 4: Architecture of the proposed latent density estimator. The vectors p and m are a zero-padding vector and a binary masking vector, respectively. Here the dimension of z is 4, the filter size s is 2, and the number of layers in the dilated causal convolution part is ⌈log_s 4⌉ = 2. The mixture density network part outputs the parameters of the Gaussian mixture, μ, σ, and π, from h.
Figure 5: Samples from the distributions of the GMMN and the LDEs. The target distribution is in orange and the sampled distribution is in blue. As the number of Gaussians in the mixture increases, the samples fit the target distribution more sharply. Note the empty region on the upper right.
Figure 6: Comparison of reconstruction results across different autoencoder learning methods on CelebA. In each subfigure, the first column is the input test image; the second to fourth are the results of AAE, AE, and IAE, respectively. In all samples, the results of AE and IAE are clearer than those of AAE.
Figures 7–10: Linear interpolation results for three cases in latent space on the CelebA and Shoes datasets (CelebA ⇐⇒ CelebA, Shoes ⇐⇒ Shoes, CelebA ⇐⇒ Shoes). In each subfigure, from top to bottom, the first row is the result of AAE with a Gaussian prior; the second to fourth are the results of AAE with a mixture of two Gaussians, AE, and IAE, respectively. The leftmost and rightmost images of each subfigure are the original test images, and the images between them are interpolation results from 1:0 to 0:1 at 0.2 intervals.
Figure 11: Comparison of random samples from different autoencoder-based generative models on CelebA. The images generated by IAE with LDE look the best: the most stable and clear.
Figure 12: Our autoencoder architecture. C is the number of channels in the input data, N is the number of convolution blocks, S is the filter size of the encoder's last convolution, and D is the dimension of the latent vector.
Figures 13–15: Test log-likelihood of the interpolated latent variables of AE with LDE and IAE with LDE as a function of α (Section 4.2.2 of the main paper). The blue and yellow lines are the log-likelihoods of samples interpolated within the same domain and across different domains, respectively. The cross-domain log-likelihood is relatively low in the middle of each graph (when α is between 0.4 and 0.6).
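The latent-space interpolation used in the figures (mixing two encoded test images at ratios from 1:0 to 0:1 at 0.2 intervals) reduces to a simple linear blend of latent vectors. A minimal sketch, with the encoder and decoder omitted and a made-up latent dimension of 64; z_a and z_b here are placeholders for two encoded images:

```python
import numpy as np

def interpolate(z_a, z_b, steps=6):
    """Return latents (1 - alpha) * z_a + alpha * z_b for alpha in [0, 1]."""
    alphas = np.linspace(0.0, 1.0, steps)  # 0.0, 0.2, 0.4, 0.6, 0.8, 1.0
    return np.stack([(1 - a) * z_a + a * z_b for a in alphas])

# Placeholder latent vectors; in practice these would be encoder outputs.
z_a = np.zeros(64)
z_b = np.ones(64)
path = interpolate(z_a, z_b)  # path[0] is z_a, path[-1] is z_b
```

Each intermediate latent in path would then be passed through the decoder to produce the interpolated images shown in the figures.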