[1911.03432v1] Penalty Method for Inversion-Free Deep Bilevel Optimization
In this paper we present an efficient algorithm based on the penalty function for solving bilevel optimization; it is simple and has theoretical and practical advantages over existing methods.

Abstract: Bilevel optimization is at the center of several important machine learning problems such as hyperparameter tuning, data denoising, few-shot learning, and data poisoning. Unlike simultaneous or multi-objective optimization, obtaining the exact descent direction for continuous bilevel optimization requires computing the inverse of the Hessian of the lower-level cost function, even for first-order methods. In this paper, we propose a new method for solving bilevel optimization, based on the penalty function, which avoids computing the inverse of the Hessian. We prove convergence of the method under mild conditions and show that it computes the exact hypergradient asymptotically. The small space and time complexity of our method allows us to solve large-scale bilevel optimization problems involving deep neural networks with up to 3.8M upper-level and 1.4M lower-level variables. We present results of our method for data denoising on the MNIST/CIFAR10/SVHN datasets, for few-shot learning on the Omniglot/MiniImagenet datasets, and for training-data poisoning on the MNIST/Imagenet datasets. In all experiments, our method outperforms or is comparable to previously proposed methods in terms of both accuracy and run-time.
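To make the idea concrete, the sketch below applies a penalty-style reformulation to a hypothetical toy bilevel problem (the problem, step sizes, and penalty schedule are illustrative assumptions, not taken from the paper): the lower-level optimality condition ∇_v g(u, v) = 0 is replaced by a quadratic penalty λ/2 · ‖∇_v g(u, v)‖², and u and v are updated by plain gradient descent while λ grows, so no Hessian inverse is ever formed.

```python
import numpy as np

# Hypothetical toy bilevel problem (illustrative only, not from the paper):
#   upper level: min_u f(u, v*) = (v* - 1)^2
#   lower level: v*(u) = argmin_v g(u, v) = (v - u)^2
# Closed form: v*(u) = u, so the bilevel solution is u = 1, v = 1.

def grad_f(u, v):
    # Gradient of the upper-level cost f(u, v) = (v - 1)^2 w.r.t. (u, v).
    return 0.0, 2.0 * (v - 1.0)

def grad_g_v(u, v):
    # Lower-level stationarity residual: d g / d v for g(u, v) = (v - u)^2.
    return 2.0 * (v - u)

def grad_penalized(u, v, lam):
    # Gradients of the penalized objective
    #   F(u, v) = f(u, v) + (lam / 2) * (grad_g_v(u, v))^2
    # For this toy g: d(grad_g_v)/du = -2 and d(grad_g_v)/dv = 2.
    gfu, gfv = grad_f(u, v)
    r = grad_g_v(u, v)
    du = gfu + lam * r * (-2.0)
    dv = gfv + lam * r * 2.0
    return du, dv

def penalty_bilevel(u0=0.0, v0=0.0, lam=0.1, lr=0.05, T=5, iters=400):
    # Alternate T gradient steps on v with one step on u,
    # gradually increasing the penalty weight lam.
    u, v = u0, v0
    for _ in range(iters):
        for _ in range(T):
            _, dv = grad_penalized(u, v, lam)
            v -= lr * dv
        du, _ = grad_penalized(u, v, lam)
        u -= lr * du
        lam *= 1.01  # mild penalty schedule; lam -> infinity recovers exactness
    return u, v

u, v = penalty_bilevel()
# u and v approach the bilevel solution (1, 1)
```

Only first derivatives of f and g appear anywhere in the loop; the Hessian-inverse term of the exact hypergradient is recovered implicitly as λ increases, which is the structural point the paper's method exploits at scale.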

Figure 1. f(u, v) = v^2 − (u − v)^2. (Introduction)

Figure 2. Convergence of GD, RMD, ApproxGrad, and Penalty for two example bilevel problems. The mean curve (blue) is superimposed on 20 independent trials (yellow). (Synthetic examples)

Figure 3. Convergence of GD, RMD, ApproxGrad, and Penalty for two example bilevel problems where A^T A is a rank-deficient random matrix. The mean curve (blue) is superimposed on 20 independent trials (yellow). (Synthetic examples)

Figure 4. Comparison of accuracy and wall-clock time (per upper-level iteration) with the number of lower-level iterations T for Penalty and ApproxGrad (for ApproxGrad, we perform T updates for the linear system) on the data denoising problem (Sec. ?? with 25% noise on MNIST), the few-shot learning problem (Sec. ?? with 20-way 5-shot classification on Omniglot), and untargeted data poisoning (Sec. ?? with 60 poisoned points on MNIST). (Few-shot learning)

Figure 5. Penalty method for T = 1, 5, 10 and λ0 = 0, 10^−4, 10^−2, 1 for Example 1 of Sec. 3.1. Left: with ν. Right: without ν. Averaged over 5 trials. (Impact of various hyperparameters and terms)

Figure 6. Untargeted data poisoning attack on MNIST. The top row shows the learned poisoned images using Penalty, starting from the images in the bottom row as initial poisoned images. The column number represents the fixed label of the image, i.e., the label of the images in the first column is digit 0, the label of those in the second column is digit 1, etc. (Clean label attack)

Figure 7. Targeted data poisoning attack on MNIST. The top row shows the learned poisoned images using Penalty, starting from the images in the bottom row as initial poisoned images. The images in the first 5 columns have the fixed label of digit 3, and those in the next 5 columns have the fixed label of digit 8. (Clean label attack)

Figure 8. Clean label poisoning attack on the dog-fish dataset. The top row shows the target instances from the test set, the second row shows the base instances from the training set used to initialize the poisoned images, and the last row shows the poisoned instances obtained from Penalty. Notice that the poisoned images (third row) are visually indistinguishable from the base images (second row) and can evade visual detection. (Clean label attack)

Figure 9. TODO: (Attack by perturbing training data)