Gradient-aware model-based policy search


Abstract

Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, to define a suitable objective function that is optimized for learning the approximate transition model. We then integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the collected trajectories, to compute the new policy parameters. Finally, we empirically validate GAMPS on benchmark domains, analyzing and discussing its properties.
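To make the idea concrete, the following is a minimal, illustrative sketch of the two ingredients the abstract describes: weighting each observed transition by a quantity tied to the policy gradient (here, a simple proxy based on the cumulative policy score along the trajectory), and fitting the transition model by weighted regression so that it is most accurate where it matters for policy improvement. This is not the authors' implementation; the linear-Gaussian policy, the linear dynamics model, and all function names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_score(theta, s, a, sigma=1.0):
    # Score function of a linear-Gaussian policy a ~ N(theta * s, sigma^2)
    # with respect to the parameter theta (illustrative choice).
    return (a - theta * s) * s / sigma**2

def gradient_aware_weights(theta, trajectory, gamma=0.99):
    # Weight each transition by the discounted magnitude of the
    # cumulative policy score up to that step -- a simple proxy for
    # the gradient-based weighting scheme described in the abstract.
    weights, cum_score = [], 0.0
    for t, (s, a, _, _) in enumerate(trajectory):
        cum_score += policy_score(theta, s, a)
        weights.append(gamma**t * abs(cum_score))
    return np.array(weights)

def fit_weighted_model(trajectories, weights):
    # Weighted least squares for a linear dynamics model s' = k*s + b*a:
    # transitions with larger gradient-aware weight dominate the fit.
    X, y, w = [], [], []
    for traj, traj_w in zip(trajectories, weights):
        for (s, a, _, s_next), wi in zip(traj, traj_w):
            X.append([s, a]); y.append(s_next); w.append(wi)
    X, y, w = np.array(X), np.array(y), np.array(w)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef  # estimated [k, b]
```

In the full algorithm the learned model would then be used, together with the collected trajectories, to estimate the policy gradient and update the policy parameters; this loop is iterated in batch fashion.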

Citation (APA)

D’Oro, P., Metelli, A. M., Tirinzoni, A., Papini, M., & Restelli, M. (2020). Gradient-aware model-based policy search. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 3801–3808). AAAI press. https://doi.org/10.1609/aaai.v34i04.5791
