[2001.00426v1] Graph Signal Processing -- Part III: Machine Learning on Graphs, from Graph Topology to Applications
The inherent connection between graphs and deep neural networks is further addressed, and the enormous potential of combining the universal function approximation of neural networks with the elegance of graph models is demonstrated through an example of semi-supervised learning.

\begin{abstract}
Many modern data analytics applications on graphs operate on domains where the graph topology is not known a priori, so that its determination becomes part of the problem definition, rather than serving as prior knowledge which aids the problem solution. Part III of this monograph starts by addressing ways to learn graph topology, from the case where the physics of the problem already suggests a possible topology, through to the most general cases where the graph topology is learned from the data. A particular emphasis is on graph topology definition based on the correlation and precision matrices of the observed data, combined with additional prior knowledge and structural conditions, such as the smoothness or sparsity of graph connections. For learning sparse graphs (with a small number of edges), the least absolute shrinkage and selection operator, known as the LASSO, is employed, along with its graph-specific variant, the graphical LASSO. For completeness, both variants of the LASSO are derived in an intuitive way and explained. An in-depth elaboration of the graph topology learning paradigm is provided through several examples on physically well-defined graphs, such as electric circuits, linear heat transfer, social and computer networks, and spring-mass systems. As many graph neural networks (GNNs) and graph convolutional networks (GCNs) are emerging, we also review the main trends in GNNs and GCNs from the perspective of graph signal filtering. In particular, we study the diffusion process over graphs and show that various improvements on GCNs can also be understood from the graph diffusion perspective. Given that the existing GCNs have been introduced largely in a heuristic manner, the definition of different diffusion processes can also serve as a basis for a new design of GCNs. Tensor representation of lattice-structured graphs is considered next, and it is shown that tensors (multidimensional data arrays) are a special class of graph signals, whereby the graph vertices reside on a high-dimensional regular lattice structure.
This part of the monograph concludes with two emerging applications: financial data processing and underground transportation network modeling. By means of portfolio cuts of an asset graph, we show how domain knowledge can be meaningfully incorporated into investment analysis. In the underground traffic example, we demonstrate how graph theory can be used to identify the stations in the London underground network which have the greatest influence on the functionality of the traffic, and proceed, in an innovative way, to assess the impact of a station closure on service levels across the city.
\end{abstract}
Figure 1: Concept of graph definition based on problem geometry. (a) Vertices (points) on a three-dimensional manifold called the Swiss roll surface. (b) A graph representation on the Swiss roll manifold. (c) Two-dimensional presentation of the three-dimensional graph from (b), obtained by unfolding the original 3D surface. (d) Vertices colored using the spectral vector, q_n = [u_1(n), u_2(n)], formed from the two smoothest generalized eigenvectors of the graph Laplacian, u_1 and u_2. (e) Vertices colored using the spectral vector, q_n = [u_1(n), u_2(n), u_3(n)], formed from the three smoothest eigenvectors of the graph Laplacian, u_1, u_2, and u_3. The vertex indexing in (d) and (e) is based on the sorted values of the smoothest (Fiedler) eigenvector, u_1. (Geometrically Defined Graph Topologies)

Figure 3: Temperatures simulated on the Minnesota roadmap graph. (a) Original synthetic temperature field signal. (b) Noisy temperature signal. (c) Low-pass filtered temperature signal from (b). The signal values are designated by the corresponding vertex color. (Geometrically Defined Graph Topologies)

Figure 4: Graph learning based on the similarity of blocks of image data. (a) Original image with designated blocks of pixels. (b) The graph produced from the blocks in (a). Notice that the resulting graph consists of seven disconnected subgraphs, which correspond to the seven different groups of blocks. (Graph Topology Based on Signal Similarity)

Figure 5: Graph representation of a set of hand-written images of the letter "b". The images serve as vertices, while the weight matrix for the edges is defined through the structural similarity index (SSIM) between the images, with W_mn = SSIM(m, n). The vertices are colored using the smoothest (Fiedler) generalized eigenvector of the Laplacian, u_1, and using the two smoothest generalized eigenvectors, u_1 and u_2 (with spectral vectors q_n = [u_1(n)] and q_n = [u_1(n), u_2(n)]), respectively shown in Fig. ??(c) (left) and (right). (Graph Topology Based on Signal Similarity)

Figure 2: Graph which corresponds to the weighted moving average operator with Gaussian weights given in (??). (Geometrically Defined Graph Topologies)

Figure 6: Estimation of the weight matrix for the graph from Fig. 2 in Part I, with color-coded element values. (a) Ground truth weight matrix. (b) Estimated weight matrix with LASSO and ρ = 0.2. (c) Estimated weight matrix with LASSO and ρ = 0.05. (d) Estimated weight matrix with LASSO and ρ = 1. (Learning of Graph Laplacian from Data)
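The spectral vectors q_n used for vertex coloring in Figures 1 and 5 are built from the smoothest eigenvectors of the graph Laplacian. A minimal sketch of this construction, assuming the simple unnormalized Laplacian L = D − W and ordinary (rather than generalized) eigenvectors:

```python
import numpy as np

def laplacian_spectral_vectors(W, k):
    """Return the k smoothest nontrivial eigenvectors u_1, ..., u_k of the
    graph Laplacian L = D - W, used as spectral coordinates q_n."""
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, 1:k + 1]             # skip the constant eigenvector u_0

# Path graph on 4 vertices: the Fiedler vector u_1 orders vertices along the path
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
q = laplacian_spectral_vectors(W, 1)
u1 = q[:, 0]                            # monotone along the path (up to sign)
```

For a path graph, the Fiedler vector is monotone along the path, which is exactly the property exploited when vertex indexing is based on its sorted values.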

\[
W_{mn} =
\begin{cases}
\dfrac{1}{r_{mn}}, & \text{for } r_{mn} \le \kappa, \\[4pt]
0, & \text{for } r_{mn} > \kappa \text{ and for } m = n,
\end{cases}
\]
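As a small illustration of this thresholded inverse-distance rule, the sketch below builds W from vertex coordinates; the function name and the use of Euclidean distances for r_mn are assumptions made for the example.

```python
import numpy as np

def weight_matrix(points, kappa):
    """W_mn = 1/r_mn if r_mn <= kappa, and 0 otherwise (including m == n).

    `points` is an (N, d) array of vertex coordinates; r_mn is taken as the
    Euclidean distance between vertices m and n (an assumption)."""
    diff = points[:, None, :] - points[None, :, :]
    r = np.linalg.norm(diff, axis=-1)       # pairwise distances r_mn
    W = np.zeros_like(r)
    mask = (r <= kappa) & (r > 0)           # exclude the diagonal (m == n)
    W[mask] = 1.0 / r[mask]
    return W
```

Close vertices receive large weights (inverse distance), and vertices farther apart than κ are simply not connected.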

Figure 8: Data-based learning of graph topology in the temperature sensing example from Part 2, Section 2. (a) Sensing locations in a geographic region along the Adriatic sea. (b) Temperatures measured at N = 16 sensing locations over P = 150 days. (c) Ground truth weight matrix, W, obtained through geographic properties of the sensing locations, as in Part 2, Section 2. (d) The weight matrix, W, estimated solely from the data in (b), using the LASSO approach. (Learning of Graph Laplacian from Data)

Figure 11: Illustration of the eigenvalue calculation based on their second-order polynomial, H(λ_k). (Graph Topology Learning Based on the Eigenvectors)

Figure 10: Weight matrix for the graph from Fig. 2 in Part I. (a) Ground truth weight matrix. (b) Estimated weight matrix using the graphical LASSO and the inverse correlation (precision) matrix. (Learning of Generalized Laplacian Graphical LASSO)

Figure 14: Estimation of the weight matrix for a graph with N = 50 randomly positioned vertices. (a) Ground truth weight matrix. (b) Estimated weight matrix using the sparsity minimization of the normalized Laplacian and the polynomial fitting method. (Graph Topology Learning Based on the Eigenvectors)
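The captions above refer to weight-matrix estimation via the LASSO for several values of ρ. As a generic illustration (not the monograph's exact graph formulation), a minimal iterative soft-thresholding (ISTA) solver for the standard LASSO problem, min_x ½‖y − Ax‖² + ρ‖x‖₁, can be sketched as follows; the function names and iteration count are illustrative choices.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (element-wise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(A, y, rho, n_iter=3000):
    """Minimal ISTA solver for min_x 0.5*||y - A x||_2^2 + rho*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * (A.T @ (A @ x - y)), rho * step)
    return x
```

Larger ρ values shrink more coefficients exactly to zero, which is why the estimated weight matrices in Figure 6 become sparser as ρ grows.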


Figure 7: Adjacency matrix for the unweighted graph from Fig. 1(a) in Part I. (a) Ground truth adjacency matrix. (b) Estimated adjacency matrix with LASSO and ρ = 0.2. (Learning of Graph Laplacian from Data)

Figure 9: Weight matrix for the graph from Fig. 2 in Part I. (a) Ground truth weight matrix. (b) Estimated weight matrix using the inverse correlation (precision) matrix. (Learning of Generalized Laplacian Graphical LASSO)

Figure 12: Estimation of the weight matrix, W = L − I, for the graph with N = 8 vertices. (a) Ground truth weight matrix. (b) Estimated weight matrix using the sparsity minimization of the normalized Laplacian. (c) Sparsity measure minimization, as a function of the parameter ξ_1. (d) The exact (blue lines) and estimated (red crosses) eigenvalues of the normalized Laplacian. (Graph Topology Learning Based on the Eigenvectors)

Figure 13: Estimation of the weight matrix for the graph with N = 8 vertices. (a) Ground truth weight matrix. (b) Estimated weight matrix using the polynomial fitting method. (c) Sparsity measure minimization, as a function of the parameters ξ_1 and ξ_2. (d) The exact (blue lines) and estimated (red crosses) eigenvalues of the normalized Laplacian. (e) The estimated weight matrix using the LASSO minimization. (f) The estimated weight matrix using the graphical LASSO. (Graph Topology Learning Based on the Eigenvectors)

Figure 16: Electric potential, x(n), as a signal on an electric circuit graph. (Resistive Electrical Circuits)

Figure 17: Electric potential, x(n), as a signal on an electric circuit graph, at the three vertices with nonzero external sources. For this graph, all other values of x(n) in Fig. ?? can be calculated based on the signal values at the vertices n = 2, n = 5, and n = 7. (Resistive Electrical Circuits)
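Figures 9 and 10 connect the graph weight matrix with the inverse correlation (precision) matrix. A minimal sketch of that idea, assuming the precision matrix behaves as a generalized Laplacian so that edge weights can be read off its negated off-diagonal entries; the threshold value and function name are illustrative assumptions.

```python
import numpy as np

def weights_from_precision(X, thresh=1e-2):
    """Estimate a weight matrix from data X (P observations x N vertices).

    The precision (inverse covariance) matrix Q approximates a generalized
    Laplacian, so W_mn is taken as -Q_mn for m != n, with small values
    thresholded to zero to promote a sparse graph."""
    R = np.cov(X, rowvar=False)     # sample covariance
    Q = np.linalg.inv(R)            # precision matrix
    W = -Q
    np.fill_diagonal(W, 0.0)        # no self-loops
    W[np.abs(W) < thresh] = 0.0     # prune weak (spurious) connections
    return W
```

With enough observations, the nonzero pattern of the estimated W matches the conditional-dependence structure of the data, which is the principle behind Figures 9, 10 and 28.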


Figure 15: Graph for which the Laplacian eigenvectors are the Hadamard transform basis functions, for N = 8 and N = 16. (Graph Topology Learning Based on the Eigenvectors)

Figure 20: Original piece-wise linear noisy signal (top) and the reconstructed signal (bottom), with the Laplacian of the noisy observations and its re-estimated sparse version (middle panels). (Graph Data Denoising for Sparse External Sources)

Figure 21: Temperature, x(n) = T(n), as a signal on a heat transfer graph. (Heat Transfer)

Figure 22: Spring-mass system on a path graph. (Spring-Mass Systems)


Figure 18: Graph signal, x(n), from Fig. ??, observed on a graph with a reduced number of vertices ("downsampling"), whereby the vertices n = 6 and n = 1 are removed (green crosses). Observe that the signal values at the active vertices, n = 0, n = 5, and n = 7, are not changed. The edge weights in gray shade are the equivalent values obtained using the standard resistor, R_mn = 1/W_mn, transformations. (Graph transformations)

Figure 19: Graph signal, x(n), from Fig. ??, observed on a graph with an extended number of vertices ("interpolation"). Observe that the signal values at all vertices, n = 0, 1, 2, 3, 4, 5, 6, 7, from Fig. ?? are not changed. At the locations where the new vertices n = 8 and n = 9 are added, the graph signal is interpolated using x(2), x(5), and x(7), as in (??), and the corresponding edge weights are shown in gray. (Graph transformations)

Figure 25: Electric circuit interpretation of the commute time, CT(m, n) = D R_eff(m, n). (Hitting and Commute Time)

Figure 29: Smoothness and graph learning. (a) The observed graph signal x = [0.7, 0.2, 0.6, 1.1, −0.3, −1.1, 1.3, −0.7]^T, with (b)–(c) two types of possible path graph connections, resulting in different smoothness values, x^T L x. (Factor analysis model)

Figure 26: Estimation of the graph Laplacian, L, for a graph with N = 50 randomly positioned vertices and a small number of edges. (a) Ground truth graph Laplacian, L. (b) Estimated graph Laplacian using the norm-two, for a large number of observations, P = 60 > N = 50. (c) Estimated graph Laplacian using the LASSO, for a large number of observations, P = 60 > N = 50. (d) Estimated graph Laplacian using the norm-two, for a small number of observations, P = 40 < N = 50. (e) Estimated graph Laplacian using the LASSO, for a small number of observations, P = 40 < N = 50.
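Figure 25 interprets the commute time through the effective resistance, CT(m, n) = D R_eff(m, n). A sketch of this computation via the Moore–Penrose pseudoinverse of the Laplacian, taking D as the sum of all vertex degrees (an assumption consistent with the usual resistance-distance interpretation):

```python
import numpy as np

def commute_time(W, m, n):
    """Commute time CT(m, n) = D * R_eff(m, n), where D is the sum of all
    vertex degrees and R_eff(m, n) the effective resistance between
    vertices m and n, computed via the pseudoinverse of L = diag(W @ 1) - W."""
    L = np.diag(W.sum(axis=1)) - W
    L_pinv = np.linalg.pinv(L)
    e = np.zeros(len(W))
    e[m], e[n] = 1.0, -1.0
    r_eff = e @ L_pinv @ e          # effective resistance between m and n
    return W.sum() * r_eff          # W.sum() equals the sum of degrees, D
```

For a unit-weight path graph 0–1–2, the two "resistors" between the end vertices are in series, so R_eff(0, 2) = 2 and, with D = 4, the commute time is 8.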
(Graph Learning from Data and External Sources)

Figure 27: A sparse signal with N = 60 and K = 4, reconstructed using a reduced set of M = 40 observations and the LASSO iterative algorithm. The results for the matched filter (initial estimate), x_0 = A^T y, and for the LASSO iterative algorithm with ρ = 0.01 and ρ = 0.0001 are shown. (LASSO)

Figure 23: Hyper-linked pages represented as a directed graph. (Social Networks and Linked Pages)

Figure 24: An example of a small social network represented as an undirected graph. (Random Walk)

Figure 28: Estimation of the weight matrix, W, for a graph with N = 50 randomly positioned vertices. (a) Ground truth weight matrix, W. (b) Precision matrix, for a large number of observations, P = 1000 ≫ N = 50. (c) Estimated weight matrix using the graphical LASSO, for a large number of observations, P = 1000 ≫ N = 50. (d) Precision matrix, for a small number of observations, P = 40 < N = 50 (the correlation matrix, R, is singular, with a rank lower than or equal to P, so that the pseudo-inverse is used). (e) Estimated weight matrix using the graphical LASSO, for a small number of observations, P = 40 < N = 50. (Graphical LASSO)

Figure 41: Path graph signal, x_1 ∈ R^{I_1}, sampled from x_1 : N → R.

Figure 42: Path graph signal, x_2 ∈ R^{I_2}, sampled from x_2 : N → R. (Connectivity of a tensor)

Figure 32: Label propagation via two labelled images out of the ten available images per digit from the MNIST dataset. Three sets of digits (1, 5, and 9) are chosen, and each set contains ten images. (a) The constructed graph via the SSIM metric, where two images (nodes) are connected when their SSIM is larger than a threshold (set to 0.35). (b) The ground truth labels for the 30 images considered. (c) Only two labels are provided for each set of images, as indicated by the red color. (d) Predicted labels from the given six (i.e., 2×3) labels, via label propagation over the graph Laplacian matrix L.
The color bar designates the certainty of the predictions: red denotes an almost sure prediction, with probability approaching 1, while green denotes a poor prediction. (Label propagation as a diffusion process)

Figure 33: The structure of the GCN proposed in [74] for semi-supervised learning. (Graph spectral filtering via neural networks)

Figure 34: Portions of data used for training versus the test accuracy on the Cora dataset [82], for a simple implementation of one typical GCN [74] for semi-supervised learning. In this example, we used one hidden layer with 256 neurons. The dropout rate was set to 0.5 and the learning rate to 0.01. (Graph spectral filtering via neural networks)

Figure 35: Tensorization of discrete samples from a field x : N^3 → R. (Tensorization of graph signals in high-dimensional spaces)
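Figure 32 illustrates label propagation over the graph Laplacian. A minimal sketch of harmonic label propagation, where labelled scores are clamped and unlabelled scores solve L_uu x_u = −L_ul x_l; this particular variant is an assumption, as the monograph's exact propagation scheme is not reproduced here.

```python
import numpy as np

def propagate_labels(W, labels):
    """Harmonic label propagation over the graph Laplacian: labelled scores
    are clamped, and unlabelled scores solve L_uu x_u = -L_ul x_l.
    `labels` holds a class index for labelled vertices and -1 otherwise."""
    labels = np.asarray(labels)
    N = len(W)
    L = np.diag(W.sum(axis=1)) - W
    classes = sorted(set(labels.tolist()) - {-1})
    Y = np.zeros((N, len(classes)))             # one-hot scores, clamped rows
    for j, c in enumerate(classes):
        Y[labels == c, j] = 1.0
    u = np.flatnonzero(labels == -1)            # unlabelled vertices
    l = np.flatnonzero(labels != -1)            # labelled (clamped) vertices
    scores = Y.copy()
    scores[u] = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, l)] @ Y[l])
    return np.array(classes)[np.argmax(scores, axis=1)]
```

The solved scores play the role of the prediction certainty shown by the color bar in Figure 32: vertices deep inside a labelled community receive scores close to 1.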


Figure 36: Rank-1 CPD of an order-3 tensor. (Tensor decomposition)

Figure 37: Rank-1 CPD of the Netflix ratings data matrix. (Tensor decomposition)

Figure 38: Cartesian product of 3 disjoint path graphs. (Connectivity of a tensor)

Figure 39: A field, x : N^2 → R. (Connectivity of a tensor)

Figure 30: The graph signal spectrum values corresponding to the two types of graph connections in Figure ??. The top panel corresponds to Fig. ??-(b) and the bottom panel to Fig. ??-(c). The energy is calculated via x^T L x, where small values indicate a smooth graph signal. (Factor analysis model)

Figure 31: An illustrative graph, which is a simplified version of Figure 1-(a) of Part 1. (Connection to the Laplacian operator in function analysis)

Figure 40: Order-2 tensor, X ∈ R^{I_1×I_2}, sampled from x : N^2 → R. (Connectivity of a tensor)

Figure 45: Factorization of a multi-relational adjacency tensor, A ∈ R^{N×N×M}, as in (??). (Tensor representation of multi-relational graphs)

Figure 43: Unstructured graph, x̃ ∈ R^K, sampled from x : N^2 → R. (Unstructured graphs)

Figure 44: Construction of a multi-relational adjacency tensor, A ∈ R^{N×N×M}, where E_n denotes the n-th entity and R_m the m-th relation type. (Tensor representation of multi-relational graphs)
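Figures 36 and 40 involve rank-1 CPD terms and tensor unfoldings. A small sketch of the rank-1 building block, the outer product of three vectors, with illustrative vector values:

```python
import numpy as np

# A rank-1 order-3 tensor is the outer product of three vectors,
# X_{ijk} = a_i * b_j * c_k -- the building block of the CPD.
a = np.array([1.0, 2.0])
b = np.array([1.0, -1.0, 0.5])
c = np.array([2.0, 1.0])
X = np.einsum('i,j,k->ijk', a, b, c)

# Every matricization (unfolding) of a rank-1 tensor has matrix rank 1
unfolding = X.reshape(2, -1)    # mode-1 unfolding, shape (I_1, I_2 * I_3)
```

A general rank-R CPD simply sums R such outer products, which is what the factorization of the adjacency tensor in Figure 45 exploits.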


Figures 46–48: Social network modelled as a multi-relational graph. (a) Graph representation of the social network. (b) Adjacency tensor, A ∈ R^{3×3×4}, associated with the social network in (a). (Tensor representation of multi-relational graphs)

Figures 55–58: Graph cut based asset allocation strategies. (a) Hierarchical graph structure resulting from K = 4 portfolio cuts. (b) A graph tree based on the 1/2^{K_i} scheme. (c) A graph tree based on the 1/(K+1) scheme. (Repeated portfolio cuts)
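Figures 44 and 48 pair a small social network with its adjacency tensor A ∈ R^{N×N×M}, one frontal slice per relation type. A minimal sketch of such a construction, with hypothetical entities and relation types (here N = 3 and M = 2 for brevity):

```python
import numpy as np

# Hypothetical 3-entity, 2-relation social network: each frontal slice
# A[:, :, m] is the adjacency matrix of relation R_m.
N, M = 3, 2
A = np.zeros((N, N, M))
A[0, 1, 0] = 1.0    # E1 -> E2 under relation R_1 (directed, hypothetical)
A[1, 2, 0] = 1.0    # E2 -> E3 under relation R_1
A[0, 2, 1] = 1.0    # E1 -- E3 under relation R_2 (undirected, hypothetical)
A[2, 0, 1] = 1.0    # symmetric entry for the undirected relation

aggregate = A.sum(axis=2)   # collapse all relations into one weighted graph
```

Keeping the relations in separate slices, rather than aggregating them immediately, is what allows the factorization in Figure 45 to model each relation type jointly but distinctly.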


Figure 49: Graph model of the London underground network in Zones 1–3. (Traffic centrality as a graph-theoretic measure)

Figure 50: Betweenness centrality, designated by magenta-coloured bars, of the London underground network in Zones 1–3. The largest betweenness centrality is observed for the following stations: Green Park, Earl's Court, Baker Street, Waterloo and Westminster. (Traffic centrality as a graph-theoretic measure)
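Figure 50 ranks stations by betweenness centrality, the fraction of shortest paths passing through each vertex. A self-contained sketch of Brandes' algorithm for unweighted, undirected graphs; the London network itself is not reproduced, and the adjacency-list format is an assumption.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph
    given as an adjacency list {vertex: [neighbours]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # each undirected shortest path is counted from both endpoints
    return {v: b / 2 for v, b in bc.items()}
```

A station like Green Park scores highly because many inter-station shortest paths are forced through it, which is exactly what the magenta bars in Figure 50 visualize.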


Figure 51: Closeness vitality, designated in magenta bars, of the London underground network in Zones 1–3. (Traffic centrality as a graph-theoretic measure)

Figure 52: Towards a graph representation of the London underground network. A simplified path graph with two stations, surrounded by the respective populations, φ(1) and φ(2), exhibits the corresponding net fluxes, q(1) and q(2). Intuitively, stations surrounded by large populations experience net in-flows of passengers, whereas stations surrounded by low populations experience net out-flows of passengers. (Modeling commuter population from net passenger flow)


Figures 59–63: Visualisation of the 100-vertex market graph connectivity for the 100 most liquid stocks in the S&P 500 index, and its partitions into disjoint sub-graphs (separated by dashed grey lines). The edges (blue lines) were calculated based on the correlation between assets. (a) Fully connected market graph with 5050 edges. (b) Partitioned graph after K = 1 portfolio cuts (CutV), with 2746 edges. (c) Partitioned graph after K = 2 portfolio cuts (CutV), with 1731 edges. (d) Partitioned graph after K = 10 portfolio cuts (CutV), with 575 edges. Notice that the number of edges required to model the market graph is significantly reduced with each subsequent portfolio cut, since ∑_{i=1}^{K+1} (N_i² + N_i)/2 < (N² + N)/2, ∀K > 0. (Numerical Example)

Figure 64: Evolution of wealth for both the traditional (EW and MV) and the graph-theoretic asset allocation strategies, based on K = 10 portfolio cuts.

Figure 65: Sharpe ratios attained for a varying number of portfolio cuts, K.

Figure 66: Out-of-sample performance of the asset allocation strategies. Notice that the Sharpe ratio typically improves with each subsequent portfolio cut. The traditional portfolio strategies, EW and MV, attained the respective Sharpe ratios of SR_EW = 1.85 and SR_MV = 1.6. (Numerical Example)
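Figures 64–66 compare allocation strategies through their Sharpe ratios. A hedged sketch of an annualized Sharpe ratio computed from a return series; the annualization convention (daily returns, 252 trading days) and the zero risk-free default are assumptions, and the monograph's exact definition may differ.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample standard
    deviation, scaled by sqrt(periods_per_year). Annualization convention
    and per-period risk-free rate are illustrative assumptions."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)
```

Applied to the out-of-sample return series of each strategy, this single number is what is being compared against SR_EW and SR_MV in Figure 66.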


Figure 53: Net passenger outflow during the morning rush hour within Zones 1–3 of the London underground network. The magenta bars designate a net outflow of passengers, while the cyan bars designate a net inflow of passengers. Stations located within business districts exhibit the greatest net outflow of passengers, while stations located in residential areas toward the boundaries of Zones 2–3 exhibit the largest net inflow of passengers. (Modeling commuter population from net passenger flow)

Figure 54: Population distribution implied by our graph model in (??), calculated from the net passenger outflow during the morning rush hour within Zones 1–3. As expected, business districts exhibit the lowest population density, while residential areas (Zones 2–3) exhibit the highest commuter population density. (Modeling commuter population from net passenger flow)