[1912.02175] Differentiation of Blackbox Combinatorial Solvers

Achieving fusion of deep learning with combinatorial algorithms promises transformative changes to artificial intelligence. One possible approach is to introduce combinatorial building blocks into neural networks. Such end-to-end architectures have the potential to tackle combinatorial problems on raw input data such as ensuring global consistency in multi-object tracking or route planning on maps in robotics. In this work, we present a method that implements an efficient backward pass through blackbox implementations of combinatorial solvers with linear objective functions. We provide both theoretical and experimental backing. In particular, we incorporate the Gurobi MIP solver, Blossom V algorithm, and Dijkstra’s algorithm into architectures that extract suitable features from raw inputs for the traveling salesman problem, the min-cost perfect matching problem and the shortest path problem. The code is available at https://github.com/martius-lab/blackbox-backprop.
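The backward pass described above can be sketched in a few lines. The following is a minimal illustration, not the authors' released implementation: it assumes a toy blackbox solver (a min-cost monotone path on a cost grid, which has a linear objective in the cell costs), and the function names and the choice of λ are ours.

```python
import numpy as np

def solve_shortest_path(costs):
    """Blackbox solver: min-cost path from the top-left to the bottom-right
    cell of a cost grid, moving only right or down.  The objective is linear
    in the cell costs, so the solver fits the paper's setting.  Returns a
    0/1 indicator matrix of the cells on the optimal path."""
    n, m = costs.shape
    dp = np.full((n, m), np.inf)
    dp[0, 0] = costs[0, 0]
    for i in range(n):
        for j in range(m):
            if i > 0:
                dp[i, j] = min(dp[i, j], dp[i - 1, j] + costs[i, j])
            if j > 0:
                dp[i, j] = min(dp[i, j], dp[i, j - 1] + costs[i, j])
    # Backtrack from the sink to recover the optimal path indicator.
    y = np.zeros_like(costs)
    i, j = n - 1, m - 1
    y[i, j] = 1.0
    while (i, j) != (0, 0):
        if j == 0 or (i > 0 and np.isclose(dp[i, j], dp[i - 1, j] + costs[i, j])):
            i -= 1
        else:
            j -= 1
        y[i, j] = 1.0
    return y

def blackbox_backward(costs, y, grad_y, lam):
    """Backward pass of the method: perturb the solver input by
    lam * dL/dy, call the unmodified blackbox solver once more, and
    return the finite-difference gradient -(y - y_lam) / lam."""
    y_lam = solve_shortest_path(costs + lam * grad_y)
    return -(y - y_lam) / lam

# Toy usage: pull the predicted path toward a target path.
costs = np.array([[1.0, 10.0],
                  [1.0, 1.0]])
y = solve_shortest_path(costs)          # optimal path goes down, then right
y_target = np.array([[1.0, 1.0],
                     [0.0, 1.0]])       # desired path: right, then down
grad_y = y - y_target                   # gradient of 0.5 * ||y - y_target||^2
grad_costs = blackbox_backward(costs, y, grad_y, lam=5.0)
# A gradient step costs -= eta * grad_costs lowers the cost of cell (0, 1)
# and raises that of (1, 0), steering the solver toward the target path.
```

Note that the solver is called, not differentiated: the backward pass needs only one extra solver invocation on perturbed inputs, which is what makes Gurobi, Blossom V, or Dijkstra usable as-is inside the network.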

Figure 1: Architecture design enabled by Theorem ??. A blackbox combinatorial solver embedded into a neural network. (Introduction)
Figures 2-4: Continuous interpolation of a piecewise constant function. One panel shows f_λ for a small value of λ: the set W_λ^eq is still substantial and only two interpolators, g_1 and g_2, are incomplete; also, all interpolators are 0-interpolators. Another shows f_λ for a high value of λ: most interpolators are incomplete, and we also encounter a δ-interpolator g_3 (between y_1 and y_2) which attains the value f(y_1) δ-away from the set P_1. Despite losing some local structure for high λ, the gradient of f_λ is still informative. (Method)
Figure 5: Example f_λ for w ∈ R^2 and λ = 3, 10, 20 (left to right). As λ grows, the interpolation f_λ is less faithful to the piecewise constant f(y(w)) but provides a reasonable gradient on a larger set. (Method)
Figure 15: The situation for λ = 0. We can see the polytope P on which y(w) attains y_1 ∈ Y. The boundary of P is composed of segments of the lines F(y_1, y_k) for k = 2, ..., 5.
Figure 16: The same situation for some relatively small λ > 0. Each line F_λ(y_1, y_k) is parallel to its corresponding F(y_1, y_k) and encompasses a convex polytope in W_λ.
Figure 17: The family W_λ of all maximal connected sets P on which y_λ is constant. (Proofs)
Figures 6-8: The SP(k) dataset. Each input is a k × k grid of tiles corresponding to a Warcraft II terrain map; the respective label is the matrix indicating the shortest path from top left to bottom right. A further panel shows a different map with the correctly predicted shortest path. (Warcraft Shortest Path)
Figures 9-11: The TSP(k) problem. One panel illustrates the dataset: each input is a sequence of k flags, and the corresponding label is the adjacency matrix of the optimal TSP tour around the corresponding capitals. Another displays the learned locations of 10 country capitals in Southeast Asia and Australia, accurately recovering their true positions. (Globe Traveling Salesman Problem)
Figures 12-14: Visualization of the PM dataset. One panel shows the case of PM(4): each input is a 4 × 4 grid of MNIST digits, and the corresponding label is the indicator vector for the edges in the min-cost perfect matching. Another shows the correct min-cost perfect matching output from the network; the cost of the matching is 348 (46 + 12 horizontally and 27 + 45 + 40 + 67 + 78 + 33 vertically). (MNIST Min-cost Perfect Matching)
Figure 18: The facets of P_1 consist of parts of hyperplanes F(y_1, z_k) in W. Each facet F(y_1, z_k) has its corresponding shifts F_λ and F_{-λ}, of which only one intersects P. The polytope P_1^λ is then bounded by those outer shifts.
Figure 19: The interpolator g attains the value f(y_1) on a part of F_λ(y_1, y_2), one border of the domain G, and the value f(y_2) on a part of F(y_1, y_2), the second border of the strip G.
Figure 20: The polytopes P_1 and P_1^λ and the interpolator g. (Proof of Theorem ??)
Figures 21-23: The Warcraft SP(18) dataset: three random example maps, and the shortest-path distribution in the training set, in which all possible path lengths (18-35) occur. (Warcraft Shortest Path)