[1910.11319] Progressive Domain Adaptation for Object Detection
Abstract: Recent deep learning methods for object detection rely on a large amount of
bounding box annotations. Collecting these annotations is laborious and costly,
yet supervised models do not generalize well when tested on images from a
different distribution. Domain adaptation provides a solution by adapting
existing labels to the target test data. However, a large gap between
domains can make adaptation challenging, leading to unstable
training and sub-optimal results. In this paper, we propose to bridge
the domain gap with an intermediate domain and progressively solve easier
adaptation subtasks. This intermediate domain is constructed by translating the
source images to mimic the ones in the target domain. To tackle the
domain-shift problem, we adopt adversarial learning to align distributions at
the feature level. In addition, a weighted task loss is applied to deal with
unbalanced image quality in the intermediate domain. Experimental results show
that our method performs favorably against the state-of-the-art method and
further reduces the domain discrepancy under various scenarios, such as
cross-camera adaptation, changing weather conditions, and adaptation to a
large-scale dataset.
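The abstract mentions aligning distributions at the feature level with adversarial learning, and Figure 1c describes a gradient reversal layer (GRL) followed by a domain discriminator. The following is a minimal PyTorch sketch of that building block, assuming the encoder E produces a convolutional feature map; the layer names, channel sizes, and domain-label convention are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer (GRL): identity in the forward pass,
    multiplies gradients by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.lam, None

class DomainDiscriminator(nn.Module):
    """Small convolutional classifier that predicts which domain a feature
    map comes from (labeled source/synthetic vs. unlabeled target)."""
    def __init__(self, in_channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
        )

    def forward(self, feat, lam=1.0):
        # Reversing gradients pushes the encoder toward domain-invariant
        # features while the discriminator learns to separate the domains.
        return self.net(GradReverse.apply(feat, lam))

# Usage sketch (shapes and loss convention are assumptions):
# feat_L = encoder(labeled_images)      # e.g. [B, 512, H, W]
# feat_U = encoder(unlabeled_images)
# bce = nn.BCEWithLogitsLoss()
# d_src, d_tgt = discriminator(feat_L), discriminator(feat_U)
# loss_adv = bce(d_src, torch.zeros_like(d_src)) + bce(d_tgt, torch.ones_like(d_tgt))
```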
Figure 1. The proposed progressive adaptation framework. The algorithm includes two stages of adaptation, as shown in a) and b). In a), we first transform source images to generate synthetic ones using the generator G learned via CycleGAN [36]. We then use the labeled source domain and perform the first-stage adaptation to the synthetic domain. In b), our model applies a second-stage adaptation, which takes the synthetic domain with labels inherited from the source and aligns the synthetic-domain features with the target distribution. In addition, a weight w is obtained from the discriminator D_cycle in CycleGAN to balance synthetic image quality in the detection loss. The overall structure of our adaptation network is shown in c). Labeled and unlabeled images are both passed through the encoder network E to extract CNN features feat_L and feat_U. We then use them to: 1) learn supervised object detection with the detector network from feat_L, and 2) forward both features to the GRL and a domain discriminator, learning domain-invariant features in an adversarial manner.

Figure 2. Visualization of the feature distributions via t-SNE [34], showing that our synthetic images serve as an intermediate feature space between the source and target distributions. Each dot represents one image feature extracted from E. We take 500 images from the Cityscapes validation set and 500 from the KITTI training set for comparison.

Figure 3. Image quality examples from the KITTI dataset synthesized into the Cityscapes domain. a) shows images translated with better quality. Images in b) contain artifacts and fail to preserve details of the cars, which almost blend into the background.

Figure 4. Examples of detection results from our three adaptation tasks. The first two rows show the tasks KITTI → Cityscapes and Cityscapes → Foggy Cityscapes, respectively, while the last two rows show the task Cityscapes → BDD100k. We show detection results on the target domain before and after applying our adaptation method, as well as the ground-truth labels.
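Figure 1b describes a weight w obtained from the CycleGAN discriminator D_cycle to balance synthetic image quality in the detection loss. The sketch below illustrates one plausible way to apply such a per-image weight, assuming the discriminator outputs a realism score in [0, 1]; the exact weighting formula used in the paper may differ.

```python
import torch

def weighted_detection_loss(det_loss_per_image: torch.Tensor,
                            d_cycle_scores: torch.Tensor) -> torch.Tensor:
    """Weight each synthetic image's detection loss by a quality score.

    det_loss_per_image: shape [B], supervised detection loss per image
    d_cycle_scores:     shape [B], D_cycle's confidence (in [0, 1]) that the
                        translated image looks like a real target image
    """
    w = d_cycle_scores.detach()  # quality score used as a constant weight
    return (w * det_loss_per_image).sum() / w.sum().clamp(min=1e-8)
```

The intent is that poorly translated synthetic images (low D_cycle score, as in Figure 3b) contribute less to the supervised detection objective than well-translated ones.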