[1612.00222] Interaction Networks for Learning about Objects, Relations and Physics

Abstract: Reasoning about objects, relations, and physics is central to human
intelligence, and a key goal of artificial intelligence. Here we introduce the
interaction network, a model which can reason about how objects in complex
systems interact, supporting dynamical predictions, as well as inferences about
the abstract properties of the system. Our model takes graphs as input,
performs object- and relation-centric reasoning in a way that is analogous to a
simulation, and is implemented using deep neural networks. We evaluate its
ability to reason about several challenging physical domains: n-body problems,
rigid-body collision, and non-rigid dynamics. Our results show it can be
trained to accurately simulate the physical trajectories of dozens of objects
over thousands of time steps, estimate abstract quantities such as energy, and
generalize automatically to systems with different numbers and configurations
of objects and relations. Our interaction network implementation is the first
general-purpose, learnable physics engine, and a powerful general framework for
reasoning about objects and relations in a wide variety of complex real-world
domains.

‹Figure 1: Schematic of an interaction network. a. For physical reasoning, the model takes objects and relations as input, reasons about their interactions, and applies the effects and physical dynamics to predict new states. b. For more complex systems, the model takes as input a graph that represents a system of objects, $o_j$, and relations, $\langle i, j, r_k \rangle_k$, instantiates the pairwise interaction terms, $b_k$, and computes their effects, $e_k$, via a relational model, $f_R(\cdot)$. The $e_k$ are then aggregated and combined with the $o_j$ and external effects, $x_j$, to generate input (as $c_j$) for an object model, $f_O(\cdot)$, which predicts how the interactions and dynamics influence the objects, $p$. (Introduction)

Figure 3: Prediction experiment accuracy and generalization. Each colored bar represents the MSE between a model's predicted velocity and the ground truth physics engine's (the y-axes are log-scaled). Subplots (a-c) show n-body performance, (d-f) show balls, and (g-k) show string. The leftmost subplot for each domain (a, d, g) compares the constant velocity model (black), baseline MLP (grey), dynamics-only IN (red), and full IN (blue). The other panels show the IN's generalization performance to different numbers and configurations of objects, as indicated by the subplot titles. For the string systems, the numbers correspond to: (the number of masses, how many ends were pinned). (Results)

Figure 2: Prediction rollouts. Each column contains three panels of three video frames (with motion blur), each spanning 1000 rollout steps. Columns 1-2 are ground truth and model predictions for n-body systems, 3-4 are bouncing balls, and 5-6 are strings. Each model column was generated by a single model, trained on the underlying states of a system of the size in the top panel. The middle and bottom panels show its generalization to systems of different sizes and structure. For n-body, the training was on 6 bodies, and generalization was to 3 and 12 bodies. For balls, the training was on 6 balls, and generalization was to 3 and 9 balls. For strings, the training was on 15 masses with 1 end pinned, and generalization was to 30 masses with 0 and 2 ends pinned. The URLs to the full videos of each rollout are in Table ??. (Experiments)›
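
The data flow described in the Figure 1b caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name interaction_network_step, the list-based data layout, and the toy f_R and f_O below are hypothetical stand-ins for the learned neural networks, and summation is assumed as the aggregation step, since the caption only says the effects are "aggregated".

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# One relation: (sender index i, receiver index j, relation attributes r_k).
Relation = Tuple[int, int, Vector]


def interaction_network_step(
    objects: Sequence[Vector],
    relations: Sequence[Relation],
    external: Sequence[Vector],
    f_R: Callable[[Vector], Vector],
    f_O: Callable[[Vector], Vector],
    effect_size: int,
) -> List[Vector]:
    """Apply one pass: relational model, aggregation, then object model."""
    # One accumulator per object for the effects it receives.
    aggregated = [[0.0] * effect_size for _ in objects]
    for sender, receiver, attrs in relations:
        # Interaction term b_k: sender state, receiver state, relation attributes.
        b_k = list(objects[sender]) + list(objects[receiver]) + list(attrs)
        e_k = f_R(b_k)  # effect of this interaction
        for d in range(effect_size):
            aggregated[receiver][d] += e_k[d]  # assumed aggregation: sum per receiver
    # c_j combines the object state, external effect, and aggregated effects.
    return [
        f_O(list(o_j) + list(x_j) + agg_j)
        for o_j, x_j, agg_j in zip(objects, external, aggregated)
    ]


# Toy stand-ins (hypothetical): two 1-D masses joined by a spring.
def toy_f_R(b_k: Vector) -> Vector:
    pos_i, _vel_i, pos_j, _vel_j, stiffness = b_k
    return [stiffness * (pos_i - pos_j)]  # force pulling the receiver toward the sender


def toy_f_O(c_j: Vector) -> Vector:
    _pos, vel, ext, effect = c_j
    dt = 0.5
    return [vel + dt * (ext + effect)]  # predicted next velocity


objects = [[0.0, 0.0], [2.0, 0.0]]               # [position, velocity] per object
relations = [(0, 1, [1.0]), (1, 0, [1.0])]        # one directed relation each way
external = [[0.0], [0.0]]                         # no external force
predictions = interaction_network_step(
    objects, relations, external, toy_f_R, toy_f_O, effect_size=1
)
print(predictions)
```

Because the relational model is applied per relation and the object model per object, the same two functions can be reused on graphs with a different number of objects or relations, which is the property the generalization experiments rely on.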