[1912.10917v1] FasterSeg: Searching for Faster Real-time Semantic Segmentation
We also demonstrate that by seamlessly extending to teacher-student co-searching, our NAS framework can boost the student’s accuracy via effective distillation.
We present FasterSeg, an automatically designed semantic segmentation network with not only state-of-the-art performance but also faster speed than current methods. Utilizing neural architecture search (NAS), FasterSeg is discovered from a novel and broader search space integrating multi-resolution branches, which have recently been found to be vital in manually designed segmentation models. To better calibrate the balance between the goals of high accuracy and low latency, we propose a decoupled and fine-grained latency regularization, which effectively overcomes our observed phenomenon that searched networks are prone to "collapsing" to low-latency yet poor-accuracy models. Moreover, we seamlessly extend FasterSeg to a new collaborative search (co-searching) framework, simultaneously searching for a teacher and a student network in a single run. The teacher-student distillation further boosts the student model's accuracy. Experiments on popular segmentation benchmarks demonstrate the competency of FasterSeg. For example, FasterSeg can run over 30% faster than the closest manually designed competitor on Cityscapes, while maintaining comparable accuracy.
Figure 1: The multi-resolution branching search space for FasterSeg, where we aim to optimize multiple branches with different output resolutions. These outputs are progressively aggregated together in the head module. Each cell is individually searchable and may have two inputs and two outputs, both of different downsampling rates (s). Inside each cell, we enable searching for expansion ratios within a single superkernel.
Figure 2: Our multi-resolution search space covers existing manual designs for real-time segmentation (unused cells omitted). Top: ICNet (Zhao et al., 2018). Bottom: BiSeNet (Yu et al., 2018a).
Figure 3: Correlation between network latency and its estimation via our latency lookup table (linear coefficient: 0.993). Red line indicates "y = x".
Figure 4: Comparing mIoU (%) and latency (ms) between supernets. The search is conducted on the Cityscapes training set and mIoU is measured on the validation set.
Figure 5: Our co-searching framework, which optimizes two architectures during search (left orange) and distills from a complex teacher to a light student during training from scratch (right green).
Figure 6: FasterSeg network discovered by our NAS framework.
Figure 7: Visualization on Cityscapes validation set. Columns from left to right correspond to original images, ground truths, and results of "O, s|χ = 8, b = 1", "O, s|χ = 8, b = 2", "T pruned T", FasterSeg (see Table ?? for annotation details). Best viewed in color.
Continuous Relaxation of Search Space
$$I_{s,l} = \beta^{0}_{s,l}\, O_{s/2 \rightarrow s,\, l-1} + \beta^{1}_{s,l}\, O_{s \rightarrow s,\, l-1}$$
$$O_{s \rightarrow s,\, l} = \sum_{k} \alpha^{k}_{s,l}\, O^{k}_{s \rightarrow s,\, l}\big(I_{s,l},\, \chi^{j}_{s,l},\, \mathrm{stride}=1\big)$$
$$O_{s \rightarrow 2s,\, l} = \sum_{k} \alpha^{k}_{s,l}\, O^{k}_{s \rightarrow 2s,\, l}\big(I_{s,l},\, \chi^{j}_{s,l},\, \mathrm{stride}=2\big)$$
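The continuous relaxation above can be sketched in a few lines of numpy: a cell's output is the softmax(α)-weighted sum of all candidate operators applied to its input, and a cell's input is the softmax(β)-weighted sum of the two predecessor outputs (from downsample rates s/2 and s). The candidate operators below are hypothetical scalar stand-ins for illustration, not the paper's actual convolution operators.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical stand-ins for candidate operators O^k (the real search
# space uses convolutions with searchable expansion ratios chi).
candidate_ops = [
    lambda x: x,        # skip connection
    lambda x: 2.0 * x,  # stand-in for a 3x3 conv
    lambda x: 0.5 * x,  # stand-in for a zoomed conv
]

def relaxed_cell_output(inp, alpha):
    """O_{s->s,l}: softmax(alpha)-weighted sum of candidate operators."""
    w = softmax(alpha)
    return sum(w_k * op(inp) for w_k, op in zip(w, candidate_ops))

def relaxed_cell_input(out_from_s_half, out_from_s, beta):
    """I_{s,l}: beta-weighted sum of the two predecessor outputs."""
    b = softmax(beta)
    return b[0] * out_from_s_half + b[1] * out_from_s
```

With uniform architecture parameters (all zeros before the softmax), every candidate contributes equally, which is the usual initialization before gradient descent sharpens the distribution toward one operator per cell.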
Choosing Efficient operators with large Receptive Fields
$$\mathrm{Latency}(O, s, \chi) = w_1\, \mathrm{Latency}(O \mid s, \chi) + w_2\, \mathrm{Latency}(s \mid O, \chi) + w_3\, \mathrm{Latency}(\chi \mid O, s)$$
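A minimal sketch of the latency lookup table and the decoupled regularization, under assumptions: the table entries, keys, and latency values below are invented placeholders (real entries would come from benchmarking each operator at each downsample rate and expansion ratio on the target GPU), and the three conditional latency terms are passed in as precomputed scalars.

```python
# Hypothetical lookup table: measured latency in ms, keyed by
# (operator name, downsample rate s, expansion ratio chi).
LATENCY_TABLE = {
    ("conv3x3", 8, 4): 0.9,
    ("zoomed_conv", 8, 4): 0.7,
    ("skip", 8, 4): 0.05,
}

def expected_latency(arch_weights, entries):
    """Estimate a cell's latency as the architecture-probability-
    weighted sum of per-operator latencies from the lookup table."""
    return sum(w * LATENCY_TABLE[e] for w, e in zip(arch_weights, entries))

def decoupled_latency(lat_O, lat_s, lat_chi, w=(1.0, 1.0, 1.0)):
    """Decoupled regularization: separate coefficients w1, w2, w3
    balance the operator, downsample-path, and expansion-ratio terms."""
    return w[0] * lat_O + w[1] * lat_s + w[2] * lat_chi
```

Because the estimate is a weighted sum over table entries, it stays differentiable with respect to the (softmaxed) architecture parameters, which is what lets latency act as a regularizer during the search.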