HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing

Keiko Nagami; Mac Schwager

Conference ProceedingsOPEN ACCESS

HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing

Robotics: Science and Systems (2021)

DOI: 10.15607/RSS.2021.XVII.062

10Citations

29Readers

Abstract

In this work we present a planning and control method for a quadrotor in an autonomous drone race. Our method combines the advantages of both model-based optimal control and model-free deep reinforcement learning. We consider a single drone racing on a track marked by a series of gates, through which it must maneuver in minimum time. Firstly we solve the discretized Hamilton-Jacobi-Bellman (HJB) equation to produce a closed-loop policy for a simplified, reduced order model of the drone. Next, we train a deep network policy in a supervised fashion to mimic the HJB policy. Finally, we further train this network using policy gradient reinforcement learning on the full drone dynamics model with a low-level feedback controller in the loop. This gives a deep network policy for controlling the drone to pass through a single gate. In a race course, this policy is applied successively to each new oncoming gate to guide the drone through the course. The resulting policy completes a highfidelity AirSim drone race with 12 gates in 34.89s (on average), outracing a model-based HJB policy by 33.20s, a supervised learning policy by 1.24s, and a trajectory planning policy by 12.99s, while a model-free RL policy was never able to complete the race.

Cite

CITATION STYLE

APA

Nagami, K., & Schwager, M. (2021). HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing. In Robotics: Science and Systems. MIT Press Journals. https://doi.org/10.15607/RSS.2021.XVII.062

HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing

Abstract

Cite

Register to see more suggestions