[1911.01562] DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning
The platform integrates state-of-the-art deep RL algorithms and multiple simulation engines with an OpenAI Gym interface, and provides on-demand compute and distributed rollouts that facilitate domain randomization and robust evaluation in parallel.
Abstract— DeepRacer is a platform for end-to-end experimentation with RL and can be used to systematically investigate the key challenges in developing intelligent control systems. Using the platform, we demonstrate how a 1/18th scale car can learn to drive autonomously using RL with a monocular camera. It is trained in simulation with no additional tuning in the physical world and demonstrates: 1) formulation and solution of a robust reinforcement learning algorithm, 2) narrowing the reality gap through joint perception and dynamics, 3) a distributed on-demand compute architecture for training optimal policies, and 4) a robust evaluation method to identify when to stop training. It is the first successful large-scale deployment of deep reinforcement learning on a robotic control agent that uses only raw camera images as observations and a model-free learning method to perform robust path planning. We open source our code and video demo on GitHub [DeepRacer training source code: https://git.io/fjxoJ].
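The distributed-rollout idea in the abstract can be sketched as follows: each rollout worker samples fresh simulation parameters before collecting an episode (domain randomization), and the trainer aggregates returns across workers. This is a minimal illustrative sketch only; the function names, parameter names, and ranges here are assumptions, not the actual DeepRacer API.

```python
import random

def randomized_env_params(rng):
    """Sample per-episode simulation parameters (domain randomization).

    Hypothetical parameters chosen for illustration; the paper randomizes
    properties of the simulated track and car dynamics.
    """
    return {
        "track_friction": rng.uniform(0.8, 1.2),
        "max_throttle_mps": rng.choice([1.0, 1.67]),
        "lighting_scale": rng.uniform(0.5, 1.5),
    }

def rollout(worker_seed, episode_len=20):
    """One worker's episode under freshly randomized dynamics."""
    rng = random.Random(worker_seed)
    params = randomized_env_params(rng)
    total_reward = 0.0
    for _ in range(episode_len):
        # Placeholder reward: random per-step progress scaled by the
        # sampled throttle limit and friction (stands in for the real
        # track-progress reward).
        progress = rng.random() * params["max_throttle_mps"]
        total_reward += progress * params["track_friction"]
    return total_reward

# "Distributed" rollouts, run serially here for illustration; in the
# platform each worker would run in its own simulation instance.
returns = [rollout(seed) for seed in range(4)]
mean_return = sum(returns) / len(returns)
```

Because every worker draws its own randomized parameters, the aggregated experience covers a spread of dynamics rather than a single fixed simulator, which is what narrows the sim2real gap.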
Fig. 1: Observation, action and reward for DeepRacer agent.
Fig. 2: Training the agent with DeepRacer distributed rollouts.
Fig. 3: Simulation tracks.
Fig. 4: Camera view of simulation tracks.
Fig. 5: Camera view of real world tracks.
Fig. 6: We train in multiple tracks and evaluate with a replica track as well as a track made with duct tape.
Fig. 7: DeepRacer hardware specifications.
Fig. 8: Training with Track A and maximum throttle of 1 m/s.
Fig. 9: Training with Track A and maximum throttle of 1.67 m/s.
Fig. 10: Training with Track B and maximum throttle of 1.67 m/s.
Fig. 11: Training with multiple rollout workers. Progress on track is reported across two runs.
Fig. 12: Robust evaluation with domain randomization as a criterion to select policy checkpoints for sim2real transfer.
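The robust-evaluation criterion referenced in Fig. 12 (and in point 4 of the abstract) can be sketched as scoring each saved policy checkpoint across several domain-randomized evaluation tracks and keeping the checkpoint that scores best. The scoring function and checkpoint identifiers below are illustrative stand-ins, not the paper's actual evaluation code.

```python
import random

def evaluate_checkpoint(checkpoint_id, n_tracks=5):
    """Mean progress of one checkpoint across randomized eval tracks.

    Toy stand-in: real evaluation would roll out the checkpoint's policy
    in n_tracks randomized simulations and measure track progress.
    """
    rng = random.Random(checkpoint_id)  # deterministic toy scores
    scores = [rng.uniform(0.0, 100.0) for _ in range(n_tracks)]
    return sum(scores) / len(scores)

def select_checkpoint(checkpoint_ids):
    """Stopping rule: keep the checkpoint that evaluates best under
    domain randomization, rather than the last one trained."""
    return max(checkpoint_ids, key=evaluate_checkpoint)

best = select_checkpoint(range(10))
```

Evaluating under randomization rather than on the training track is what lets the criterion identify checkpoints likely to transfer to the physical car.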