[1910.12379v1] Landmark Ordinal Embedding
Through experimental results we show that this method can still be advantageous in settings where one may wish to sacrifice a small amount of accuracy for significant computational savings

Abstract In this paper, we aim to learn a low-dimensional Euclidean representation from a set of constraints of the form “item j is closer to item i than item k”. Existing approaches for this “ordinal embedding” problem require expensive optimization procedures, which cannot scale to handle increasingly larger datasets. To address this issue, we propose a landmark-based strategy, which we call Landmark Ordinal Embedding (LOE). Our approach trades off statistical efficiency for computational efficiency by exploiting the low-dimensionality of the latent embedding. We derive bounds establishing the statistical consistency of LOE under the popular BradleyTerry-Luce noise model. Through a rigorous analysis of the computational complexity, we show that LOE is significantly more efficient than conventional ordinal embedding approaches as the number of items grows. We validate these characterizations empirically on both synthetic and real datasets. We also present a practical approach that achieves the “best of both worlds”, by using LOE to warm-start existing methods that are more statistically efficient but computationally expensive.
‹

Figure 1: Obtaining W from R (c.f. Line ?? and Line ??). Left: The entries of f W are unshifted estimates of the off-diagonal entries E∗ . Right: We add a diagonal of zeroes to f W to match the diagonal of zeroes in E∗ . Figure 2: Shifting each Ri by s∗ i to get E∗ and F∗ . Figure 3: Shifting the rankings Ri to estimate the first ` columns of D∗ (see Algorithm ??). (Recovering Landmark Column Shifts s∗)Figure 4: (n, d, c) = (105 , 2, 200) Figure 5: Time to LOE error vs n Figure 6: Scalability Figure 7: (n, d, c) = (105 , 2, 50) Figure 8: (n, d, c) = (2 × 104 , 10, 200) Figure 9: Warm-start Figure 10: Time to completion vs n Figure 11: Purity vs n Figure 12: MNIST (Synthetic Experiments)Figure 13: Simulation results: Time vs Performance for d = 2, m = 200n log n with n = 1000. (Alternative Baselines)Figure 19: (n, d) = (100, 2), normal Figure 20: (n, d) = (100, 2), uniform Figure 21: (n, d) = (100, 10), normal Figure 22: (n, d) = (100, 10), uniform Figure 23: Simulation results: Consistency plots. We fix the number of objects to be n = 100. Each scenario corresponds to a different combination of data distribution D ∈ {uniform, normal}, and embedding dimension d ∈ {2, 10}). Here, we plot the performance of our algorithm as the number of triplets m = cn log n increases. The plots demonstrate the statistical consistency of our algorithm. (Consistency Results)Figure 14: n = 20000 Figure 15: n = 40000 Figure 16: n = 60000 Figure 17: n = 80000 Figure 18: Simulation results: Time vs Performance for d = 2, m = 200n log n. (Supplemental Computational Efficiency Results)Figure 24: Visualization of the clustering results on the Food dataset: (Top) STE; (Bottom) LOE-STE. As shown in the figures, the embeddings computed by LOE-STE and STE) are identical. (Food Dataset Embedding Visualization)›