[1909.13488v1] Locally Constant Networks
We create a novel neural architecture by casting the derivatives of a deep network as its representation, realizing a new class of neural models equivalent to oblique decision trees
Abstract: We show how neural models can be used to realize piece-wise constant
functions such as decision trees. Our approach builds on ReLU networks, which are
piece-wise linear, so their gradients with respect to the
inputs are locally constant. We formally establish the equivalence between the
classes of locally constant networks and decision trees. Moreover, we highlight
several advantageous properties of locally constant networks, including how
they realize decision trees with parameter sharing across branching nodes and leaves.
Indeed, only $M$ neurons suffice to implicitly model an oblique decision tree
with $2^M$ leaf nodes. The neural representation also enables us to adopt many
tools developed for deep networks (e.g., DropConnect (Wan et al. 2013)) while
implicitly training decision trees. We demonstrate that our method outperforms
alternative techniques for training oblique decision trees in the context of
molecular property classification and regression tasks.
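Since a ReLU network is piece-wise linear, its gradient with respect to the input is constant within each linear region; this gradient is the "locally constant" representation the abstract refers to. Below is a minimal PyTorch sketch of this behavior (our illustration, not the authors' code; the 2-input, 3-neuron architecture, seed, and step sizes are arbitrary):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Tiny ReLU network: M = 3 hidden neurons, hence at most 2^3 = 8 linear regions.
    net = nn.Sequential(nn.Linear(2, 3), nn.ReLU(), nn.Linear(3, 1))

    def input_grad(x):
        # Gradient of the scalar output with respect to the input point x.
        x = x.clone().requires_grad_(True)
        net(x).sum().backward()
        return x.grad

    x = torch.tensor([0.5, -0.2])
    g_near = input_grad(x + 1e-4)  # tiny step: stays in the same linear region
    g_far = input_grad(x + 1.0)    # large step: typically lands in a new region

    print(torch.allclose(input_grad(x), g_near))  # True: gradient is locally constant
    print(input_grad(x), g_far)                   # usually differ across regions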
$$o_j^i \triangleq \frac{\partial a_j^i}{\partial z_j^i} = \mathbb{I}\left[z_j^i > 0\right], \quad \forall\, (i, j)$$
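The reconstructed identity above is just the ReLU derivative: with $a_j^i = \max(z_j^i, 0)$, the derivative $\partial a_j^i / \partial z_j^i$ is the 0/1 gate $\mathbb{I}[z_j^i > 0]$. The $M$ gate bits jointly index one of $2^M$ linear regions, matching the $2^M$ leaves mentioned in the abstract. A small numerical check (our sketch, not from the paper; the pre-activation values are arbitrary):

    import torch

    # M = 3 pre-activations; their signs define the gate bits.
    z = torch.tensor([1.3, -0.7, 0.2], requires_grad=True)
    a = torch.relu(z)
    a.sum().backward()

    o = z.grad                                    # equals the indicator 1[z > 0]
    print(o)                                      # tensor([1., 0., 1.])
    leaf = int((o * torch.tensor([4.0, 2.0, 1.0])).sum())  # read bits as a binary leaf index
    print(leaf, "of", 2 ** 3, "possible leaves")  # 5 of 8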
Figure 1: Toy examples of equivalent representations of the same mappings for different M. Here the locally constant networks have 1 neuron per layer; we show the locally constant networks on the left, the raw mappings in the middle, and the equivalent oblique decision trees on the right. (Canonical locally constant networks)

Figure 2: Learning curve of LCN.

Figure 3: Training performance.

Figure 4: Testing performance.

Figure 5: Empirical analysis for oblique decision trees on the HIV dataset. Fig. 2 is an ablation study for LCN, and Figs. 3-4 compare different training methods. (Experiment)

Figure 6: Visualization of a learned locally constant network in the representation of an oblique decision tree, using the construction from the proof of Theorem ??. The number in each leaf indicates the ranking of its output probability among the 16 leaves (the exact value is not important). See the descriptions in Appendix ??. (Visualization)
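To make the tree equivalence in Figure 1 concrete, here is a hypothetical sketch assuming a canonical LCN with one neuron per layer, in which layer $i$ reads the raw input $x$ and the previous activation (the paper's exact wiring may differ). Fixing the gate bits of a node's ancestors makes each pre-activation an affine function of $x$, i.e., an oblique split:

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    M, d = 3, 2
    W = rng.normal(size=(M, d))  # per-layer weights on the raw input x (hypothetical wiring)
    u = rng.normal(size=M)       # per-layer weight on the previous activation
    b = rng.normal(size=M)

    def split(path):
        """Affine split (w, c) tested at the node reached by ancestor gate bits `path`."""
        w, c = W[0].copy(), b[0]
        for i, o in enumerate(path, start=1):
            # With the gates fixed, a^i = o * z^i; substituting recursively keeps z affine in x.
            w, c = W[i] + u[i] * o * w, b[i] + u[i] * o * c
        return w, c  # the node tests: w . x + c > 0

    # Enumerate all 2^M - 1 internal nodes of the implied depth-M oblique tree.
    for depth in range(M):
        for path in itertools.product([0, 1], repeat=depth):
            w, c = split(path)
            print(depth, path, np.round(w, 2), round(float(c), 2))

Note that all nodes at the same depth reuse the same $W_i$, $u_i$, $b_i$; this is the parameter sharing across branchings that lets only $M$ neurons encode all $2^M - 1$ oblique splits.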