[1912.12693v1] A Gentle Introduction to Deep Learning for Graphs
In relation to this challenge, recent progress has been facilitated by the growth and wide community adoption of modern software packages for the adaptive processing of graphs.

\begin{abstract}
The adaptive processing of graph data is a long-standing research topic which has been lately consolidated as a theme of major interest in the deep learning community.
The sharp increase in the amount and breadth of related research has come at the price of little systematization of knowledge and attention to earlier literature. This work is designed as a tutorial introduction to the field of deep learning for graphs. It favours a consistent and progressive introduction of the main concepts and architectural aspects over an exposition of the most recent literature, for which the reader is referred to available surveys. The paper takes a top-down view of the problem, introducing a generalized formulation of graph representation learning based on a local and iterative approach to structured information processing. It introduces the basic building blocks that can be combined to design novel and effective neural models for graphs. The methodological exposition is complemented by a discussion of interesting research challenges and applications in the field.
\end{abstract}
Figure 4: (a) An input graph with undirected and directed arcs is shown. (b) We transform undirected arcs into directed ones to obtain a viable input for graph learning methods. (c) We visually represent the (open) neighborhood of node $v_1$. (High-level Overview)

Figure 5: The bigger picture that all graph learning methods share. A DGN takes an input graph and produces node representations $h_v \ \forall v \in V_g$. Such representations can be aggregated to form a single graph representation $h_g$. (The Bigger Picture)

Figure 6: The road-map of the architectures we will discuss in detail. (The Bigger Picture)

Figure 7: Context spreading in an undirected graph is shown for a network of depth 3, where wavy arrows represent the context flow. Specifically, we focus on the context of node $u$ at the last layer, by looking at the figure from right to left. The context of node $u$ at $\ell = 2$ depends on its only neighbor $v$ at $\ell = 1$, which in turn depends on its neighboring node representations at $\ell = 0$ ($u$ included). Therefore, the context of $u$ is given by almost all the nodes in the graph. (Three Mechanisms of Context Diffusion)

Figure 8: The sampling technique affects the neighborhood aggregation procedure, by selecting either a subset of the neighbors [18] or a subset of the nodes in the graph [42] to compute $h_v^{\ell+1}$. Here, nodes in red have been randomly excluded from the neighborhood aggregation of node $v$, and the context flows only through the wavy arrows. (Neighborhood Aggregation)

Figure 9: We show an example of pooling for graph classification. Each pooling layer coarsens the graph by clustering nodes of the same community together, so that each group becomes a node of the coarsened graph. If we reduce the graph to a single node, we can interpret that as a supersource node that represents the whole graph. At that point, a standard classifier can be applied to output a graph prediction $y_g$. (Pooling)

Figure 10: Two possible architectures (feedforward and recurrent) for node and graph classification. Inside each layer, one can apply the attention and sampling techniques described in this Section. After pooling is applied, it is no longer possible to perform node classification, which is why a model for node classification can simply combine graph convolutional layers. A recurrent architecture (bottom) iteratively applies the same neighborhood aggregation, possibly until a convergence criterion is met. (Summary)

Figure 11: A simplified schema of graph-level (top row) and node-level (bottom row) generative decoders is shown. Tilde symbols on top of arrows indicate sampling. Dashed arrows indicate that the corresponding sampling procedure is not differentiable in general. Darker shades of blue indicate higher probabilities. (Generative learning)
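The captions above outline the pipeline shared by most deep graph networks: repeated neighborhood aggregation (optionally over a sampled neighborhood, as in Figure 8) produces node representations $h_v$, which a readout or pooling step condenses into a graph representation $h_g$ (Figure 5). The following is a minimal, library-free sketch of that pipeline under stated assumptions: the mean aggregator, the fixed 0.5/0.5 self/neighbor mixing, and the mean readout are illustrative choices, not the specific layers discussed in the paper.

```python
import random
from typing import Dict, List, Optional

Vec = List[float]


def aggregate_layer(
    h: Dict[int, Vec],
    neighbors: Dict[int, List[int]],
    sample_size: Optional[int] = None,
    seed: int = 0,
) -> Dict[int, Vec]:
    """One round of neighborhood aggregation: each node mixes its own
    representation with the mean of its (possibly sampled) neighbors'."""
    rng = random.Random(seed)
    h_next: Dict[int, Vec] = {}
    for v, h_v in h.items():
        nbrs = neighbors.get(v, [])
        if sample_size is not None and len(nbrs) > sample_size:
            # neighbor sampling (cf. Figure 8): aggregate over a random subset
            nbrs = rng.sample(nbrs, sample_size)
        if nbrs:
            agg = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(len(h_v))]
        else:
            agg = [0.0] * len(h_v)
        # illustrative fixed mix of self and neighborhood information;
        # real layers use learned weights and a nonlinearity
        h_next[v] = [0.5 * s + 0.5 * a for s, a in zip(h_v, agg)]
    return h_next


def readout(h: Dict[int, Vec]) -> Vec:
    """Mean readout: condense node representations h_v into one graph
    representation h_g (a crude stand-in for the pooling of Figure 9)."""
    dim = len(next(iter(h.values())))
    return [sum(h_v[i] for h_v in h.values()) / len(h) for i in range(dim)]
```

Stacking $\ell$ calls to `aggregate_layer` widens each node's context by one hop per call, mirroring the context diffusion of Figure 7: on a triangle graph, a single layer already lets every node's representation depend on the whole graph.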