[1911.06987v1] Faster AutoAugment: Learning Augmentation Strategies using Backpropagation
We have verified our method on several standard benchmarks and showed that Faster AutoAugment achieves performance competitive with other methods for automatic data augmentation.

Abstract
Data augmentation methods are indispensable heuristics to boost the performance of deep neural networks, especially in image recognition tasks. Recently, several studies have shown that augmentation strategies found by search algorithms outperform hand-made strategies. Such methods employ black-box search algorithms over image transformations with continuous or discrete parameters and require a long time to obtain better strategies. In this paper, we propose a differentiable policy search pipeline for data augmentation, which is much faster than previous methods. We introduce approximate gradients for several transformation operations with discrete parameters as well as a differentiable mechanism for selecting operations. As the training objective, we minimize the distance between the distributions of augmented data and the original data, which can be differentiated. We show that our method, Faster AutoAugment, achieves significantly faster searching than prior work without a performance drop.
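The abstract mentions approximate gradients for operations with discrete parameters. The paper's exact construction is not reproduced here, but the standard device for this is a straight-through estimator: the forward pass uses the discretized magnitude, while the backward pass treats the discretization as the identity. A minimal sketch under that assumption (all names below are hypothetical, not from the paper's code):

```python
def discretize(mu, num_levels=10):
    """Forward pass: snap a continuous magnitude mu in [0, 1]
    onto one of num_levels evenly spaced discrete steps."""
    return round(mu * (num_levels - 1)) / (num_levels - 1)

def grad_wrt_mu(grad_wrt_discretized):
    """Straight-through backward pass: pretend discretize is the
    identity, so the gradient w.r.t. mu is passed through unchanged."""
    return grad_wrt_discretized
```

In an autograd framework this would be a single custom function whose backward ignores the rounding; the two-function form above only makes the forward/backward asymmetry explicit.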
Figure 1. Overview of our proposed model. We propose a differentiable data augmentation pipeline that achieves faster policy search by using adversarial learning.
Figure 2. We regard data augmentation as a process that fills in missing data points of the original training data; therefore, our objective is to minimize the distance between the distributions of augmented data and the original data using adversarial learning.
Figure 3. Schematic view of the problem setting. Each image is augmented by a sub-policy randomly selected from the policy. A single sub-policy is composed of K consecutive operations (O1, . . . , OK), such as shear x and solarize. An operation Ok transforms a given image with probability pk and magnitude µk.
Figure 4. Schematic view of the selection of operations in a single sub-policy when K = 2. During the search, we apply all operations to an image and take a weighted sum of the results as the augmented image. The weights, w1 and w2, are updated like the other parameters. After the search, we sample operations according to the trained weights.
Figure 5. Original and augmented images from CIFAR-10 (upper) and SVHN (lower). As can be seen, Faster AutoAugment transforms original images into diverse augmented images using the sub-policies shown at the right-hand side.
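The relaxed operation selection described in the Figure 4 caption — apply every candidate operation and mix the results with learnable weights — can be sketched as a softmax-weighted sum. The scalar "operations" below are hypothetical stand-ins for image transforms, chosen only to keep the example self-contained:

```python
import math

def softmax(weights):
    # Numerically stable softmax over a list of raw weights.
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical stand-ins for image operations (e.g. a brightness shift
# and an invert), acting on a single pixel value for illustration.
ops = [lambda x: x + 0.5, lambda x: 1.0 - x]

def relaxed_select(x, weights):
    """During search: apply every operation and mix the results with
    softmax(weights); the weights receive gradients like any parameter."""
    probs = softmax(weights)
    return sum(p * op(x) for p, op in zip(probs, ops))

def hard_select(x, weights):
    """After search: apply only the operation with the highest weight
    (or sample one according to the trained weights)."""
    best = max(range(len(weights)), key=lambda i: weights[i])
    return ops[best](x)
```

With equal weights, `relaxed_select` returns the plain average of all operation outputs; as training sharpens the weights, the mixture approaches the single-operation behavior that `hard_select` uses at deployment time.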

Figure 6. The relationship between the number of sub-policies and the test error rate (CIFAR-10 with WideResNet-40-2). Performance improves as the number of sub-policies grows. We plot test error rates averaged over three runs, with their standard deviations.
Figure 7. The relationship between the operation count of each sub-policy and the average test error rate over three runs (CIFAR-10 with WideResNet-40-2). Performance improves as the operation count grows.