[1801.05772] Ranking Data with Continuous Labels through Oriented Recursive Partitions

Abstract: We formulate a supervised learning problem, referred to as continuous
ranking, where a continuous real-valued label Y is assigned to an observable
r.v. X taking its values in a feature space $\mathcal{X}$ and the goal is to
order all possible observations x in $\mathcal{X}$ by means of a scoring
function $s:\mathcal{X}\rightarrow \mathbb{R}$ so that s(X) and Y tend to
increase or decrease together with highest probability. This problem
generalizes bi/multi-partite ranking to a certain extent and the task of
finding optimal scoring functions s(x) can be naturally cast as optimization of
a dedicated functional criterion, called the IROC curve here, or as
maximization of the Kendall $\tau$ related to the pair (s(X), Y). From the
theoretical side, we describe the optimal elements of this problem and provide
statistical guarantees for empirical Kendall ${\tau}$ maximization under
appropriate conditions for the class of scoring function candidates. We also
propose a recursive statistical learning algorithm tailored to empirical IROC
curve optimization and producing a piecewise constant scoring function that is
fully described by an oriented binary tree. Preliminary numerical experiments
highlight the difference in nature between regression and continuous ranking
and provide strong empirical evidence of the performance of empirical
optimizers of the criteria proposed.
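The empirical Kendall $\tau$ criterion mentioned in the abstract can be computed directly from the observed pairs. Below is a minimal, stdlib-only sketch of the pairwise (O(n²)) computation; it is an illustration of the criterion itself, not the paper's optimization algorithm:

```python
from itertools import combinations

def kendall_tau(s_vals, y_vals):
    """Empirical Kendall tau between scores s(X_i) and labels Y_i.

    Counts concordant pairs (+1) minus discordant pairs (-1),
    normalized by the number of pairs n(n-1)/2.
    """
    n = len(s_vals)
    total = 0
    for i, j in combinations(range(n), 2):
        ds = (s_vals[i] > s_vals[j]) - (s_vals[i] < s_vals[j])
        dy = (y_vals[i] > y_vals[j]) - (y_vals[i] < y_vals[j])
        total += ds * dy
    return 2 * total / (n * (n - 1))
```

Note that this quantity depends on s only through the ordering it induces: replacing s by any strictly increasing transform of it leaves the value unchanged, which is the sense in which continuous ranking differs from regression (cf. Figure 2).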

Figure 1: A scoring function described by an oriented binary subtree T. For any element x ∈ X, the quantity $s_T(x)$ can be computed very fast in a top-down fashion by means of the heap structure: starting from the initial value $2^J$ at the root node, at each internal node $C_{j,k}$ the score remains unchanged if x moves down to the left child, whereas $2^{J-(j+1)}$ is subtracted from it if x moves down to the right. (Ranking trees and Oriented Recursive Partitions)

Figure 2: The least squares regressor $s_{LS}$ (dotted line) better approximates the regression function m (solid line), in terms of mean squared error, than $s^*$ (dashed line) does. Still, the latter is optimal for the ranking task, as it is a strictly increasing transform of m. (Distribution-free regression vs continuous ranking)

Figure 3

Figure 4

Figure 5: Polynomial regression function m and scoring functions provided by CRank, Kendall and CART. For visualization purposes, $s_{CRank}$ and $s_{Kendall}$ have been renormalized by $2^D = 8$ to take values in [0, 1] and, in Fig. ??, affine functions have been applied to the three scoring functions. (Numerical Experiments (Figures))
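The top-down score computation described in the Figure 1 caption can be sketched as follows. The tree encoding (a nested dict with fields `feature`, `threshold`, `left`, `right`) is a hypothetical representation chosen for illustration, not the paper's data structure; only the arithmetic (start at $2^J$, subtract $2^{J-(j+1)}$ on each right move) follows the caption:

```python
def score(tree, x, J, depth=0, s=None):
    """Evaluate the piecewise constant scoring function s_T(x) top-down.

    tree  : internal node as {"feature", "threshold", "left", "right"},
            or None for a terminal cell (hypothetical encoding).
    J     : depth of the full oriented binary tree.
    """
    if s is None:
        s = 2 ** J                       # initial value at the root
    if tree is None or depth == J:       # terminal cell: score is settled
        return s
    if x[tree["feature"]] <= tree["threshold"]:
        child = tree["left"]             # moving left: score unchanged
    else:
        child = tree["right"]
        s -= 2 ** (J - (depth + 1))      # moving right: subtract 2^{J-(j+1)}
    return score(child, x, J, depth + 1, s)
```

For a depth-1 tree (J = 1) splitting on a single threshold, the left cell keeps the score 2 and the right cell receives 2 − 2⁰ = 1, so cells further to the right are ranked lower, as the oriented structure intends.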