HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing

10Citations
Citations of this article
29Readers
Mendeley users who have this article in their library.

Abstract

In this work we present a planning and control method for a quadrotor in an autonomous drone race. Our method combines the advantages of both model-based optimal control and model-free deep reinforcement learning. We consider a single drone racing on a track marked by a series of gates, through which it must maneuver in minimum time. Firstly we solve the discretized Hamilton-Jacobi-Bellman (HJB) equation to produce a closed-loop policy for a simplified, reduced order model of the drone. Next, we train a deep network policy in a supervised fashion to mimic the HJB policy. Finally, we further train this network using policy gradient reinforcement learning on the full drone dynamics model with a low-level feedback controller in the loop. This gives a deep network policy for controlling the drone to pass through a single gate. In a race course, this policy is applied successively to each new oncoming gate to guide the drone through the course. The resulting policy completes a highfidelity AirSim drone race with 12 gates in 34.89s (on average), outracing a model-based HJB policy by 33.20s, a supervised learning policy by 1.24s, and a trajectory planning policy by 12.99s, while a model-free RL policy was never able to complete the race.

Cite

CITATION STYLE

APA

Nagami, K., & Schwager, M. (2021). HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing. In Robotics: Science and Systems. MIT Press Journals. https://doi.org/10.15607/RSS.2021.XVII.062

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free