[1910.04536v1] Deep Structured Mixtures of Gaussian Processes
We discussed that DSMGPs can be understood to perform Bayesian model averaging over naive-local-experts (NLE) models and showed that the hierarchical structure of DSMGPs can be exploited to speed up computations and to model non-stationary time-series.

Abstract

Gaussian Processes (GPs) are powerful nonparametric Bayesian regression models that allow exact posterior inference, but exhibit high computational and memory costs. To improve the scalability of GPs, approximate posterior inference is frequently employed, and a prominent class of approximation techniques is based on local GP experts. However, the local-expert techniques proposed so far are either not well-principled, come with limited approximation guarantees, or lead to intractable models. In this paper, we introduce deep structured mixtures of GP experts, a stochastic process model which i) allows exact posterior inference, ii) has attractive computational and memory costs, and iii) when used as a GP approximation, captures predictive uncertainties consistently better than previous approximations. In a variety of experiments, we show that deep structured mixtures have a low approximation error and outperform existing expert-based approaches.
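To make the idea of averaging over local-expert models concrete, the following is a minimal sketch and not the authors' implementation. It assumes 1-D inputs, a squared-exponential kernel with fixed hyperparameters, a uniform prior over a handful of candidate split points, and a single level of partitioning (depth 1). All function names, the split positions, and the synthetic data are hypothetical. Each candidate split defines two independent GP experts fitted exactly via a Cholesky decomposition; the splits are then weighted by their marginal likelihoods and the predictions are combined as a mixture.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_fit_predict(x, y, x_star, noise=0.1):
    """Exact GP regression; returns predictive mean, variance, log marginal likelihood."""
    K = rbf_kernel(x, x) + noise**2 * np.eye(len(x))
    L = np.linalg.cholesky(K)  # cubic in the number of observations of this expert
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(x, x_star)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Predictive variance of noisy observations at x_star.
    var = rbf_kernel(x_star, x_star).diagonal() - np.sum(v**2, axis=0) + noise**2
    log_ml = (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
              - 0.5 * len(x) * np.log(2.0 * np.pi))
    return mean, var, log_ml

def depth1_mixture_predict(x, y, x_star, splits, noise=0.1):
    """Average over candidate partitions, each with two independent GP experts."""
    log_evidence, means, variances = [], [], []
    for s in splits:
        star_left = x_star <= s
        mean = np.empty_like(x_star)
        var = np.empty_like(x_star)
        total = 0.0
        for mask, star_mask in ((x <= s, star_left), (x > s, ~star_left)):
            m, v, lml = gp_fit_predict(x[mask], y[mask], x_star[star_mask], noise)
            mean[star_mask], var[star_mask] = m, v
            total += lml  # independent experts: log evidences add
        log_evidence.append(total)
        means.append(mean)
        variances.append(var)
    log_evidence = np.array(log_evidence)
    # Posterior weights over splits under a uniform prior (assumption).
    w = np.exp(log_evidence - log_evidence.max())
    w /= w.sum()
    means, variances = np.stack(means), np.stack(variances)
    mix_mean = w @ means
    # Moment-matched variance of the resulting mixture of Gaussians.
    mix_var = w @ (variances + means**2) - mix_mean**2
    return mix_mean, mix_var, w

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)
x_star = np.linspace(0.0, 10.0, 25)
mix_mean, mix_var, w = depth1_mixture_predict(x, y, x_star, splits=[3.0, 5.0, 7.0])
print("posterior weights over splits:", w)
```

Because each expert only factorises the kernel matrix of its own region, the cubic cost applies to the expert sizes rather than to the full dataset, which is the source of the computational savings the abstract refers to.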
‹Figure 1: Illustration of a DSMGP (with depth 1). Vertical lines (red) represent hypotheses of independence assumptions, i.e. partitions of the covariate space. (DEEP STRUCTURED MIXTURE OF GAUSSIAN PROCESSES)

Figure 2: Noise parameter of DSMGP after global hyperparameter optimisation (global) and local fine-tuning (fine-tuning) on a synthetic dataset with heteroscedastic noise. (Hyperparameter Optimisation)

Figure 3: generalized PoE

Figure 4: robust BCM

Figure 5: DSMGP (ours)

Figure 6: Comparison of gPoE, rBCM and DSMGP against a GP (shaded area) with 5 observations per expert. (EXPERIMENTS)

Figure 7: Approximation error on Kin40k dataset. (Approximation Error)

Figure 8: Time required to solve the Cholesky decomposition of a DSMGP on a synthetic dataset using a naive approach or using our shared approach. (Shared Cholesky Decomposition)›