[1912.09393v1] Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks
We demonstrate that two simple baselines that modify the cross-entropy loss are able to outperform the few modern methods tackling this problem.

Abstract: Deep neural networks have improved image classification dramatically over the past decade, but have done so by focusing on performance measures that treat all classes other than the ground truth as equally wrong. This has led to a situation in which mistakes are less likely to be made than before, but are equally likely to be absurd or catastrophic when they do occur. Past works have recognised and tried to address this issue of mistake severity, often by using graph distances in class hierarchies, but this has largely been neglected since the advent of the current deep learning era in computer vision. In this paper, we aim to renew interest in this problem by reviewing past approaches and proposing two simple modifications of the cross-entropy loss which outperform the prior art under several metrics on two large datasets with complex class hierarchies: tieredImageNet and iNaturalist'19.
Figure 1: Top-1 error and distribution of mistakes w.r.t. the WordNet hierarchy for well-known deep neural network architectures on ImageNet; see text for the definition of mistake severity. The top-1 error has enjoyed a spectacular improvement in the last few years, but even though the number of mistakes has decreased in absolute terms, the severity of the mistakes made has remained fairly unchanged over the same period. The grey dashed lines denote the minimal possible values of the respective measures. (Introduction)

Figure 2: Representations of the HXE (Sec. ??) and soft labels (Sec. ??) losses for a simple illustrative hierarchy are drawn in subfigures (a) and (b) respectively. The ground-truth class is underlined, and the edges contributing to the total value of the loss are drawn in bold. (Soft labels)
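Figure 2(b) illustrates the soft-labels variant. As a hedged sketch only (the function names, the distance vector `dist_row`, and the temperature `beta` are illustrative assumptions, not the paper's exact formulation), soft labels of this kind can be built as a softmax over negative hierarchical distances from the ground-truth class, so that classes nearby in the hierarchy receive more probability mass:

```python
import numpy as np

def soft_hierarchical_labels(dist_row, beta=10.0):
    """Soft label distribution over all classes: a softmax over the
    negative hierarchical distances from the ground-truth class
    (distance 0 to itself). Larger beta -> closer to a one-hot label."""
    logits = -beta * np.asarray(dist_row, dtype=float)
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

def soft_label_loss(log_probs, soft_labels):
    """Cross-entropy between the soft label distribution and the
    model's predicted log-probabilities."""
    return -float(np.sum(soft_labels * log_probs))
```

With `dist_row = [0, 1, 2]` (the ground truth, a sibling, and a distant class), the resulting label distribution peaks at the ground truth and decays with hierarchical distance, which is the qualitative behaviour the caption describes.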

Figure 3: Top-1 error vs. hierarchical distance of mistakes, for tieredImageNet-H (top) and iNaturalist19-H (bottom). Points closer to the bottom-left corner of the plot are the ones achieving the best tradeoff. (Experimental results)

Figure 4: Top-1 error vs. average hierarchical distance of top-k (with k ∈ {1, 5, 20}) for tieredImageNet-H (top three) and iNaturalist19-H (bottom three). Points closer to the bottom-left corner of the plot are the ones achieving the best tradeoff. (Experimental results)

Figure 5: Top-1 error vs. hierarchical distance of mistakes (top) and hierarchical distance of top-20 (bottom) for iNaturalist19-H. Points closer to the bottom-left corner of the plots are the ones achieving the best tradeoff. (Experimental results) (Supplementary figures)
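The hierarchical distance of mistakes used in these plots can be sketched as follows. This is a hedged illustration over a toy parent map (the paper uses the full WordNet and iNaturalist hierarchies, and defines severity via the lowest common ancestor of prediction and ground truth); here we count edges from the ground-truth class up to that common ancestor, which is 0 for a correct prediction:

```python
def chain_to_root(parents, node):
    """All nodes on the path from `node` up to the root, inclusive."""
    chain = [node]
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain

def mistake_severity(parents, pred, truth):
    """Edges from the ground-truth class up to the lowest common
    ancestor of prediction and ground truth (0 if pred == truth)."""
    pred_ancestors = set(chain_to_root(parents, pred))
    node, hops = truth, 0
    while node not in pred_ancestors:
        node = parents[node]
        hops += 1
    return hops
```

For example, in a toy hierarchy where "cat" and "dog" share the parent "mammal" and "fish" only shares the root "animal", mistaking a cat for a dog is less severe than mistaking it for a fish, which is exactly the distinction the top-1 error alone cannot capture.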