[1910.08041v1] Discrete Residual Flow for Probabilistic Pedestrian Behavior Prediction
The strong performance of DRF-NET’s discrete predictions is very promising for cost-based and constrained robotic planning.

Abstract: Self-driving vehicles plan around both static and dynamic objects, applying predictive models of behavior to estimate future locations of the objects in the environment. However, future behavior is inherently uncertain, and models of motion that produce deterministic outputs are limited to short timescales. Particularly difficult is the prediction of human behavior. In this work, we propose the discrete residual flow network (DRF-NET), a convolutional neural network for human motion prediction that captures the uncertainty inherent in long-range motion forecasting. In particular, our learned network effectively captures multimodal posteriors over future human motion by predicting and updating a discretized distribution over spatial locations. We compare our model against several strong competitors and show that our model outperforms all baselines.
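The idea of "predicting and updating a discretized distribution over spatial locations" can be illustrated with a small NumPy sketch. It assumes the per-timestep update is an additive residual applied to a log potential over grid cells, followed by normalization over the grid. The function names (`log_softmax_2d`, `residual_flow_rollout`), the 5x5 grid, and the random stand-in residuals are hypothetical; in DRF-NET the residuals are predicted by a learned network, which is not reproduced here.

```python
import numpy as np

def log_softmax_2d(log_potential):
    """Normalize an unnormalized log potential over all grid cells."""
    flat = log_potential.ravel()
    m = flat.max()  # subtract the max for numerical stability
    log_z = m + np.log(np.exp(flat - m).sum())
    return log_potential - log_z

def residual_flow_rollout(initial_log_potential, residuals):
    """Add one residual per timestep to the running log potential.

    Returns a list of normalized probability grids, one per future timestep.
    """
    log_potential = initial_log_potential.copy()
    distributions = []
    for residual in residuals:
        # Residual update in log space (assumed form of the update).
        log_potential = log_potential + residual
        distributions.append(np.exp(log_softmax_2d(log_potential)))
    return distributions

# Toy example: 5x5 grid with the pedestrian initially at the center cell.
rng = np.random.default_rng(0)
initial = np.full((5, 5), -10.0)
initial[2, 2] = 0.0
# Stand-in residuals; a learned model would predict these from scene context.
residuals = [rng.normal(scale=0.5, size=(5, 5)) for _ in range(3)]
rollout = residual_flow_rollout(initial, residuals)
```

Because each step only perturbs the previous log potential, probability mass can split across several cells over time, which is one way a discrete grid can represent multimodal futures.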
Figure 1: Challenging urban scenarios for pedestrian prediction, depicting pedestrian detections (circles) and future state posteriors colored by time horizon. (a) Gaussian distributions often poorly express scene-sensitive behaviors. (b) Inherent multimodality: the pedestrian may cross a crosswalk or continue along a sidewalk. (c) Partial observability: signals and actors may be occluded. (Introduction)
Figure 2: Overview of the Discrete Residual Flow Network. Pedestrian of Interest (PoI) and actor detections are aligned with a semantic map. A multi-scale backbone jointly reasons over spatiotemporal information in the input, embedding context into a feature F. Finally, the DRF head recursively adapts an initial distribution to predict future pedestrian states on long time horizons. (Discrete Residual Flow Network)
Figure 3: Scene history and context representation. DRF-NET rasterizes map elements into a shared spatial representation (b), augmented with spatio-temporal encodings of actor motion (c). (Discrete Residual Flow Network)
Figure 4: One step of recursive Discrete Residual Flow. The log potential is used to update the global feature map F. DRF then predicts a residual ψ_{t;θ_t} to flow to the log potential for the next timestep. (Discrete Residual Flow Network)
Figure 5: Calibration curves and expected calibration error (× 10⁻³ %). (Baselines)
Figure 6: Test metrics. DRF-NET has low NLL (a) and captures the multimodality inherent in long-range futures (b). Discrete state space (DRF, ConvLSTM) yields the lowest NLL and entropy (c), and entropy per mode saturates. However, EPM increases with horizon for continuous MDNs (d). (Results)
Figure 7: Pedestrian predictions: ground truth past trajectory is green, future is black, opacity shows density, and color shows time horizon. MDN-4 predictions are omitted due to similarity to MDN-8; both are largely unimodal. More results in the supplementary video. (Results)
Figure 8: Backbone feature pyramid network (FPN). N denotes the batch size, e.g. the number of pedestrians of interest for inference or number of scenarios per batch for training. (Backbone network)
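The Figure 6 caption refers to NLL and entropy for models with a discrete state space. As a rough sketch of how such quantities can be computed for a single probability grid, the following assumes NLL is the negative log probability of the cell containing the ground-truth position and entropy is Shannon entropy in nats. The paper's exact metric definitions are not given in this excerpt, and the helper names `discrete_nll` and `discrete_entropy` are hypothetical.

```python
import numpy as np

def discrete_nll(prob_grid, true_cell):
    """Negative log-likelihood of the cell containing the ground truth."""
    eps = 1e-12  # guards against log(0) for cells with zero mass
    return float(-np.log(prob_grid[true_cell] + eps))

def discrete_entropy(prob_grid):
    """Shannon entropy (in nats) of a probability grid."""
    p = prob_grid[prob_grid > 0]  # zero-mass cells contribute nothing
    return float(-(p * np.log(p)).sum())

# Toy check: a uniform distribution over a 4x4 grid.
uniform = np.full((4, 4), 1.0 / 16)
nll_uniform = discrete_nll(uniform, (1, 2))
entropy_uniform = discrete_entropy(uniform)
```

For a uniform grid both quantities equal the log of the number of cells, which gives a simple sanity check before applying the functions to model outputs.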