Self-correcting models for model-based reinforcement learning

Erik Talvitie

Conference Proceedings · Open Access

31st AAAI Conference on Artificial Intelligence (AAAI 2017), pp. 2597-2603

DOI: 10.1609/aaai.v31i1.10850

58 citations · 111 readers

Abstract

When an agent cannot represent a perfectly accurate model of its environment's dynamics, model-based reinforcement learning (MBRL) can fail catastrophically. Planning involves composing the predictions of the model; when flawed predictions are composed, even minor errors can compound and render the model useless for planning. Hallucinated Replay (Talvitie 2014) trains the model to "correct" itself when it produces errors, substantially improving MBRL with flawed models. This paper theoretically analyzes this approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error. These results inspire an MBRL algorithm for deterministic MDPs with performance guarantees that are robust to model class limitations.
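The abstract contrasts a model's one-step error with the errors that compound when planning composes its predictions. A small sketch can make that contrast concrete. The Python below is a minimal, hypothetical illustration, not the paper's algorithm. It assumes deterministic dynamics and a model given as a plain function `model(state, action)`. The helper names (`one_step_errors`, `rollout`, `rollout_errors`, `hallucinated_pairs`), the `depth` parameter, and the toy integer dynamics are all invented for this example. `hallucinated_pairs` is a simplified, deterministic reading of the Hallucinated Replay idea: roll the model forward from a real state along the recorded actions, then pair each hallucinated state with the real next state so that a learner can be trained to "correct" its own errors. The exact procedure in Talvitie (2014) may differ.

```python
from collections.abc import Hashable
from typing import Callable, List, Sequence, Tuple

# Assumption: deterministic dynamics, so a model is just a function (s, a) -> s'.
State = Hashable
Action = Hashable
Model = Callable[[State, Action], State]
Triple = Tuple[State, Action, State]  # (input state, action, target next state)


def one_step_errors(model: Model, states: Sequence[State], actions: Sequence[Action]) -> int:
    """Count steps where the model, fed the REAL state s_t, mispredicts s_{t+1}."""
    return sum(model(states[t], actions[t]) != states[t + 1] for t in range(len(actions)))


def rollout(model: Model, start: State, actions: Sequence[Action]) -> List[State]:
    """Compose predictions: each predicted state is fed back in as the next input."""
    trajectory = [start]
    for a in actions:
        trajectory.append(model(trajectory[-1], a))
    return trajectory


def rollout_errors(model: Model, states: Sequence[State], actions: Sequence[Action]) -> int:
    """Count time steps where a rollout from the real s_0 disagrees with the real states."""
    predicted = rollout(model, states[0], actions)
    return sum(p != s for p, s in zip(predicted, states))


def hallucinated_pairs(model: Model, states: Sequence[State],
                       actions: Sequence[Action], depth: int) -> List[Triple]:
    """Build (hallucinated state, real action, real next state) training triples.

    From each real state s_t, roll the model forward up to `depth` steps along the
    recorded actions; each hallucinated state s_hat_j is paired with the real
    action a_j and the real successor s_{j+1}.
    """
    triples: List[Triple] = []
    T = len(actions)
    for t in range(T):
        s_hat = states[t]  # every rollout starts from a real state
        for k in range(1, depth + 1):
            j = t + k  # time index of the state being hallucinated
            if j >= T:  # need a real action a_j and a real target s_{j+1}
                break
            s_hat = model(s_hat, actions[j - 1])
            triples.append((s_hat, actions[j], states[j + 1]))
    return triples


def flawed_model(s: int, a: int) -> int:
    # Hypothetical toy model: correct (s + a) except for one error at state 2.
    return s + a + (10 if s == 2 else 0)


if __name__ == "__main__":
    states = [0, 1, 2, 3, 4, 5]  # real trajectory under true dynamics s' = s + a
    actions = [1, 1, 1, 1, 1]
    print(one_step_errors(flawed_model, states, actions))  # 1
    print(rollout(flawed_model, 0, actions))  # [0, 1, 2, 13, 14, 15]
    print(rollout_errors(flawed_model, states, actions))  # 3
    print(hallucinated_pairs(flawed_model, states, actions, depth=2))
    # [(1, 1, 2), (2, 1, 3), (2, 1, 3), (13, 1, 4), (13, 1, 4), (14, 1, 5), (4, 1, 5)]
```

In this toy run the flawed model makes a single one-step error (at state 2). Yet a rollout that composes its predictions is wrong at every step after that error. The triple `(13, 1, 4)` is the kind of training example that asks the model to map its own erroneous prediction back onto the real trajectory.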

Citation (APA)

Talvitie, E. (2017). Self-correcting models for model-based reinforcement learning. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017 (pp. 2597–2603). AAAI Press. https://doi.org/10.1609/aaai.v31i1.10850
