[1910.13616] Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation
We present a novel approach that leverages the strengths of both model-based and model-agnostic meta-learners to discover and exploit the structure of multimodal task distributions.

Abstract: Model-agnostic meta-learners aim to acquire meta-learned parameters from
similar tasks to adapt to novel tasks from the same distribution with few
gradient updates. With this flexibility in the choice of models, such
frameworks demonstrate appealing performance on a variety of domains such as
few-shot image classification and reinforcement learning. However, one
important limitation of such frameworks is that they seek a common
initialization shared across the entire task distribution, substantially
limiting the diversity of the task distributions that they are able to learn
from. In this paper, we augment MAML with the capability to identify the mode
of tasks sampled from a multimodal task distribution and adapt quickly through
gradient updates. Specifically, we propose a multimodal MAML (MMAML) framework,
which is able to modulate its meta-learned prior parameters according to the
identified mode, allowing more efficient fast adaptation. We evaluate the
proposed model on a diverse set of few-shot learning tasks, including
regression, image classification, and reinforcement learning. The results not
only demonstrate the effectiveness of our model in modulating the meta-learned
prior in response to the characteristics of tasks but also show that training
on a multimodal distribution can produce an improvement over unimodal training.
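The modulate-then-adapt procedure described in the abstract can be sketched in miniature. The code below is illustrative only, not the paper's architecture: the network is collapsed to a scalar linear regressor, `tau` stands in for the modulation produced by the task encoder (a FiLM-style scale on a meta-learned weight), and the inner loop is a plain MAML-style gradient adaptation with analytic MSE gradients.

```python
def predict(x, w, b, tau):
    # FiLM-style modulation, reduced here to a single scale `tau`
    # applied to the meta-learned weight `w`.
    return tau * w * x + b

def mse(xs, ys, w, b, tau):
    # Mean squared error of the modulated model on the support set.
    return sum((predict(x, w, b, tau) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def inner_adapt(xs, ys, w, b, tau, lr=0.1, steps=5):
    # MAML-style inner loop: a few gradient steps on the modulated
    # parameters, using the analytic gradient of the MSE.
    n = len(xs)
    for _ in range(steps):
        errs = [predict(x, w, b, tau) - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * tau * x for e, x in zip(errs, xs)) / n
        grad_b = sum(2 * e for e in errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Meta-learned prior (illustrative values) and a hypothetical
# modulation chosen by the task encoder for a "steep linear" mode.
w0, b0, tau = 1.0, 0.0, 2.0
xs, ys = [0.0, 1.0, 2.0], [1.0, 4.0, 7.0]   # target task: y = 3x + 1
w1, b1 = inner_adapt(xs, ys, w0, b0, tau)
```

The point of the sketch is the division of labor: modulation moves the prior toward the identified mode before any gradients are taken (here, scaling the effective slope from 1 to 2), and the few inner-loop steps then close the remaining gap to the target task.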

Figure 1 (Regression Experiments): Qualitative visualization of regression on the Five-Mode Simple Functions dataset. (a) Predicted function shapes of modulated MMAML compared with the prior models of MAML and Multi-MAML, before gradient updates; our model fits the target function with limited observations and no gradient updates. (b) Predicted function shapes after five steps of gradient updates; MMAML's fit is qualitatively better. More visualizations are in the Supplementary Material.

Figure 2 (Image Classification): tSNE plots of the task embeddings produced by our model from randomly sampled tasks; marker color indicates the mode of the task distribution. Plots (b) and (d) reveal clear clustering by task mode, demonstrating that MMAML can identify the task from a few samples and produce a meaningful embedding υ. (a) Regression: the distances between modes align with intuitions about function similarity (e.g., a quadratic function can sometimes resemble a sinusoidal or a linear function, whereas a sinusoidal function is usually different from a linear one). (b) Few-shot image classification: each dataset (i.e., mode) forms its own cluster. (c-d) Reinforcement learning: the numbered clusters represent different modes of the task distribution; tasks from different modes are clearly clustered apart in the embedding space.

Figure 3 (Reinforcement Learning): RL environments. Three environments are used to explore MMAML's ability to adapt to multimodal task distributions in RL. In all environments the agent must reach a goal, marked by a star or a sphere in the figures. Goals are sampled from a multimodal distribution in two or three dimensions, depending on the environment. In POINT MASS (a), the agent navigates a simple point mass in two dimensions. In REACHER (b), the agent controls a 3-link robot arm in two dimensions. In ANT (c), the agent controls a four-legged ant robot and must navigate to the goal; the goals are drawn from the 2-dimensional distribution shown in (d), while the agent itself is 3-dimensional.

Figure 4 (Reinforcement Learning): Visualizations of MMAML and ProMP trajectories in the 4-mode Point Mass 2D environment. Each trajectory originates at the green star; the contours show the multimodal goal distribution, and multiple trajectories are shown for each update step. In each column, the leftmost figure depicts the initial exploratory trajectories with neither modulation nor gradient adaptation applied. The middle figure shows ProMP after one gradient adaptation step, and MMAML after a gradient adaptation step plus the modulation step, both computed from the same initial trajectories. The rightmost figure shows the methods after two gradient adaptation steps (in addition to MMAML's modulation step).

Figure 5 (Reinforcement Learning): Visualizations of MMAML and ProMP trajectories in the ANT and REACHER environments. The figures show randomly sampled trajectories after the modulation step and two gradient steps for REACHER, and three for ANT. Each frame sequence represents a complete trajectory, with its beginning, middle, and end captured by the left, middle, and right frames, respectively. Videos of the trained agents are available at https://vuoristo.github.io/MMAML/.

Figure 6 (Regression): tSNE plots of the task embeddings produced by our model from randomly sampled regression tasks. We visualize the task embeddings for two, three, and five modes.

Figure 8 (Meta-dataset): Examples of images from all the datasets.

Figure 9 (Network Architectures): tSNE plots of task embeddings produced in the multimodal few-shot image classification domain. (a) 2-mode 5-way 1-shot. (b) 3-mode 5-way 1-shot. (c) 5-mode 5-way 5-shot.

Figure 10 (Reinforcement Learning): Training curves for MMAML and ProMP in the reinforcement learning environments. The curves indicate the average return per episode after gradient-based updates and modulation. The shaded regions indicate the standard deviation across three random seeds; the curves have been smoothed by averaging within a window of 10 steps.

Figure 11 (Regression): Additional qualitative results on the regression tasks: MMAML after adaptation versus other posterior models.

Figure 12 (Reinforcement Learning): Additional trajectories sampled from the point mass environment with MMAML and ProMP for six tasks. The contour plots represent the multimodal task distribution, and the stars mark the start and goal locations. The curves depict five trajectories sampled with each method after zero, one, and two update steps; the modulation step takes place between the initial policy and the one-update step.
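The clustering behavior in the tSNE plots of task embeddings (Figure 2) can be illustrated with a toy stand-in for the learned task encoder: even mean-pooling a couple of hand-picked features of the few-shot (x, y) samples separates task modes in embedding space. Everything below (the pooled features, the example tasks) is illustrative and not the paper's architecture, which learns the encoder end to end.

```python
import math

def embed(xs, ys):
    # Permutation-invariant pooling over the sample set, as a set
    # encoder would do; the "features" here are simply y and x*y,
    # chosen by hand for illustration.
    n = len(xs)
    return (sum(ys) / n,
            sum(x * y for x, y in zip(xs, ys)) / n)

xs = [-1.0, 0.0, 1.0, 2.0]
linear_a  = embed(xs, [2.0 * x for x in xs])   # linear mode, slope 2
linear_b  = embed(xs, [2.5 * x for x in xs])   # linear mode, slope 2.5
quadratic = embed(xs, [x * x for x in xs])     # quadratic mode

# Same-mode tasks land closer together in embedding space than
# cross-mode tasks, which is what produces the clusters that tSNE
# then visualizes in two dimensions.
d_same  = math.dist(linear_a, linear_b)
d_cross = min(math.dist(linear_a, quadratic),
              math.dist(linear_b, quadratic))
```

With a learned encoder the separation is of course far stronger, but the mechanism is the same: a mode-discriminative embedding is what makes mode-specific modulation possible.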