[1912.13464v1] Model Inversion Networks for Model-Based Optimization
Our experiments show that MINs can solve model-based optimization (MBO) problems in both contextual and non-contextual settings, and are effective even for highly semantic score functions, such as the age of the person in an image.

Abstract In this work, we aim to solve data-driven optimization problems, where the goal is to find an input that maximizes an unknown score function given access to a dataset of inputs with corresponding scores. When the inputs are high-dimensional and valid inputs constitute a small subset of this space (e.g., valid protein sequences or valid natural images), such model-based optimization problems become exceptionally difficult, since the optimizer must avoid out-of-distribution and invalid inputs. We propose to address such problems with model inversion networks (MINs), which learn an inverse mapping from scores to inputs. MINs can scale to high-dimensional input spaces and leverage offline logged data for both contextual and non-contextual optimization problems. MINs can also handle both purely offline data sources and active data collection. We evaluate MINs on tasks from the Bayesian optimization literature, high-dimensional model-based optimization problems over images and protein designs, and contextual bandit optimization from logged data.
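The core idea — learn a mapping from scores back to inputs, with the training data reweighted toward high scores, then query the inverse map at a high score value — can be sketched in a toy 1-D form. This is an illustration under stated assumptions, not the paper's implementation: the paper trains a deep conditional generative model as the inverse map, while here a linear inverse map is fit by weighted least squares, and the names `fit_inverse_map`, `approx_infer`, and the `temperature` parameter are hypothetical.

```python
# Toy sketch of a Model Inversion Network (MIN) on a 1-D problem.
# The unknown score function is f(x) = -(x - 0.7)^2, so the optimum
# is x* = 0.7; we only get a logged dataset of (x, y) pairs.
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, size=200)
ys = -(xs - 0.7) ** 2 + 0.01 * rng.normal(size=200)

def fit_inverse_map(xs, ys, temperature=0.05):
    """Fit an inverse map x = a*y + b by weighted least squares,
    reweighting points toward high scores (a crude stand-in for the
    paper's reweighting of the training distribution)."""
    w = np.exp((ys - ys.max()) / temperature)   # softmax-style weights
    A = np.stack([ys, np.ones_like(ys)], axis=1)
    coef, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * A,
                               np.sqrt(w) * xs, rcond=None)
    return coef  # (a, b)

def approx_infer(coef, y_query):
    """Query the inverse map at a high score value to propose x*."""
    a, b = coef
    return a * y_query + b

coef = fit_inverse_map(xs, ys)
x_star = approx_infer(coef, y_query=ys.max())
```

Because the reweighting concentrates the fit on high-scoring points, querying the inverse map at the highest observed score returns a proposal near the true optimum, without ever taking gradients through the inputs.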
Figure 1: Schematic for MIN training and optimization. Reweighting (Section ??) and, optionally, active data collection (Section ??) are used during training. The MIN is then used to obtain the optimal input x* via the Approx-Infer procedure (Section ??). (Model Inversion Networks)

Figures 2–4: Thickest stroke; thickest digit (3); most blobs (8).
Figure 5: Results for non-contextual static-dataset optimization on MNIST: (a), (b) stroke-width optimization; (c) maximization of disconnected black-pixel blobs. From left to right: MINs; MINs without inference (Section ??), which sample x from the inverse map conditioned on the highest observed value of y; MINs without reweighting (Section ??); and direct optimization of a forward model, which starts from a random dataset image and updates it by stochastic gradient descent toward the highest score under the forward model. MINs produce the thickest characters that still resemble valid digits, whereas optimizing the forward model often turns on non-digit pixels, leaving the valid manifold. Both the reweighting and the inference procedure are important for good results. Scores are shown beneath each figure; larger is better, provided the solution x is a valid digit. The dataset average is 149.0. (Data-Driven Optimization with Static Datasets)

Figures 6–7: Optimized x (trained on ages > 15 and > 25 years, respectively).
Figure 8: MIN optimization to obtain the youngest faces x when trained on faces older than 15 (left) and older than 25 (right). The maximized score is the negative age of the face. Generated outputs x (bottom) are obtained by inference in the inverse map at different points during training. Real faces of varying ages, including ages below those used to train the model, are shown in the top rows. We overlay the actual negative score (age) on the real images, and the age obtained from subjective user rankings on the generated faces. (Data-Driven Optimization with Static Datasets)

Figure 9: Optimized x produced by contextual training on CelebA. The context is one of (brown hair, black hair, bangs, moustache) and f(x) = 1(wavy hair, eyeglasses, smiling, no beard). We show the produced x* for two contexts. The model optimizes the score for observed contexts such as brown or black hair and extrapolates to unobserved contexts such as brown and black hair together. (Optimization with Active Data Collection)

Figure 10: Contextual MBO on MNIST. In (a) and (b), the top half and top quarter of the image, respectively, and in (c) the one-hot encoded label are provided as contexts. The goal is to produce the maximum-stroke-width character that is valid given the context. In (a) and (b) we show triplets of the ground-truth digit (green), the context passed as input (yellow), and the image x produced by the MIN (purple). (Contextual Image Optimization)

Figure 11: Additional results for non-contextual image optimization on the CelebA dataset. The goal is to maximize a score given by the sum of the attributes eyeglasses, smiling, wavy hair, and no beard. MINs produce optimal x; visually, these solutions indeed optimize the score. (Additional results for non-contextual image optimization)

Figure 12: Optimal x produced by a cGAN for the youngest-face optimization task on the IMDB-faces dataset. The cGAN learned to ignore the score value and produced images as an unconditional model, with no noticeable correlation with the score; its samples mostly correspond to the most frequently occurring images in the dataset. (Additional results for non-contextual image optimization)

Figure 13: Images returned by MIN optimization over images. MINs successfully optimize an objective defined by the sum of desired attributes. Moreover, for unseen contexts, such as both brown and black hair, the optimized solutions align reasonably with the context while still optimizing the score. (Additional results for non-contextual image optimization)

Figures 14–16: Thickest stroke; thickest digit (3); most blobs (8).
Figure 17: Results for non-contextual static-dataset optimization on MNIST, annotated with the quantitative score achieved beneath each figure. (Quantitative Scores for Non-contextual MNIST optimization)
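The forward-model baseline contrasted in Figure 5 — fit a model of the score, then run gradient ascent on the input itself — can be sketched in 1-D, where the off-manifold failure mode does not arise but the mechanics are visible. This is a hedged sketch under assumptions: the forward model here is a simple quadratic least-squares fit, and the step size, iteration count, and names are illustrative, not from the paper.

```python
# Sketch of the forward-model baseline: learn f_theta(x) from logged
# data, then ascend its gradient with respect to the input x. With
# image inputs this can push x off the valid data manifold; in 1-D it
# simply converges to the model's predicted maximizer.
import numpy as np

rng = np.random.default_rng(1)
xs = rng.uniform(0.0, 1.0, size=200)
ys = -(xs - 0.7) ** 2 + 0.01 * rng.normal(size=200)

# Forward model: quadratic in x, fit by ordinary least squares.
A = np.stack([xs ** 2, xs, np.ones_like(xs)], axis=1)
theta, *_ = np.linalg.lstsq(A, ys, rcond=None)

def grad_forward(x):
    """Gradient of the learned score model f(x) = t0*x^2 + t1*x + t2."""
    return 2.0 * theta[0] * x + theta[1]

# Gradient ascent on the input, starting from a random dataset point.
x = float(xs[0])
for _ in range(500):
    x += 0.1 * grad_forward(x)
```

In high dimensions nothing constrains these updates to stay on the set of valid inputs, which is exactly the failure mode the Figure 5 caption describes for the forward-model approach on MNIST.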