OptionGAN: Learning joint reward-policy options using generative adversarial inverse reinforcement learning

Peter Henderson; Wei Di Chang; Pierre Luc Bacon; David Meger; Joelle Pineau; Doina Precup

Conference Proceedings, Open Access

32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (2018) 3199-3206

DOI: 10.1609/aaai.v32i1.11775

37 Citations

184 Readers

Abstract

Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning.
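The abstract says reward options are recovered alongside policy options from expert states alone. The sketch below is one way a state-only adversarial discriminator could be split into options. It assumes a softmax gating network (a policy over options) that weights K logistic discriminators D_k(s), and it uses the reward convention r(s) = log D(s). The class name, the linear models, the reward convention and the plain cross-entropy loss are all hypothetical choices made for illustration. This is not the authors' implementation, and it leaves out any extra regularization terms the paper may add.

```python
import math
from typing import List, Sequence

EPS = 1e-8  # keep probabilities away from 0 and 1 before taking logs


def clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def dot(w: Sequence[float], s: Sequence[float]) -> float:
    return sum(wi * si for wi, si in zip(w, s))


class OptionDiscriminatorSketch:
    """Hypothetical mixture of K state-only discriminators D_k(s), weighted by a
    gating network pi(k | s). D_k(s) is the probability that state s came from
    the expert. Only states are used, never actions."""

    def __init__(self, disc_w, disc_b, gate_w, gate_b):
        assert len(disc_w) == len(disc_b) == len(gate_w) == len(gate_b)
        self.disc_w, self.disc_b = disc_w, disc_b
        self.gate_w, self.gate_b = gate_w, gate_b

    def gate(self, s: Sequence[float]) -> List[float]:
        # Policy over options: a softmax over K linear scores of the state.
        return softmax([dot(v, s) + c for v, c in zip(self.gate_w, self.gate_b)])

    def option_probs(self, s: Sequence[float]) -> List[float]:
        # Each option discriminator is a logistic model of the state.
        return [sigmoid(dot(w, s) + b) for w, b in zip(self.disc_w, self.disc_b)]

    def prob_expert(self, s: Sequence[float]) -> float:
        # Mixture D(s) = sum_k pi(k|s) * D_k(s), which stays in [0, 1].
        return sum(g * d for g, d in zip(self.gate(s), self.option_probs(s)))

    def option_rewards(self, s: Sequence[float]) -> List[float]:
        # One reward option per discriminator: r_k(s) = log D_k(s).
        return [math.log(clamp(d)) for d in self.option_probs(s)]

    def reward(self, s: Sequence[float]) -> float:
        # Combined reward fed to the policy learner: r(s) = log D(s).
        return math.log(clamp(self.prob_expert(s)))

    def loss(self, expert_states, policy_states) -> float:
        # Binary cross-entropy: expert states are labelled 1, policy states 0.
        exp_term = sum(math.log(clamp(self.prob_expert(s)))
                       for s in expert_states) / len(expert_states)
        pol_term = sum(math.log(1.0 - clamp(self.prob_expert(s)))
                       for s in policy_states) / len(policy_states)
        return -(exp_term + pol_term)


if __name__ == "__main__":
    # Two options on a 1-D state. The gate prefers option 0 for positive states.
    disc = OptionDiscriminatorSketch(
        disc_w=[[1.0], [-1.0]], disc_b=[0.0, 0.0],
        gate_w=[[2.0], [-2.0]], gate_b=[0.0, 0.0],
    )
    s = [1.0]
    print("gate:", disc.gate(s))
    print("D(s):", disc.prob_expert(s))
    print("reward options:", disc.option_rewards(s))
    print("loss:", disc.loss([[1.0], [2.0]], [[-1.0]]))
```

The mixture form keeps the combined output a valid probability, so the usual GAN discriminator loss still applies. Each option's own discriminator also gives a separate reward signal that an option-specific policy could be trained on.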

Cite (APA)

Henderson, P., Chang, W. D., Bacon, P. L., Meger, D., Pineau, J., & Precup, D. (2018). OptionGAN: Learning joint reward-policy options using generative adversarial inverse reinforcement learning. In 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (pp. 3199–3206). AAAI Press. https://doi.org/10.1609/aaai.v32i1.11775