Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention shifts away from tabula rasa approaches, many different reward shaping methods have been developed. In this paper, we compare two reward shaping methods: plan-based reward shaping, in which an agent is provided with a plan and extra rewards are given according to the steps of the plan the agent satisfies, and reward shaping via an abstract Markov decision process (MDP), in which an abstract high-level MDP of the environment is solved and the resulting value function is used to shape the agent. The comparison is conducted in terms of total reward, convergence speed, and scaling up to more complex environments. Empirical results demonstrate the need to correctly select and set up reward shaping methods according to the needs of the environment in which the agents act. This leads to a more interesting question: is there a reward shaping method that is universally better than all other approaches, regardless of the environment dynamics?
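The abstract MDP approach described above can be illustrated with standard potential-based reward shaping, where the potential function is the value function of the solved abstract MDP. The sketch below is a minimal illustration under assumed names and structures (solve_abstract_mdp, shaping_reward, a deterministic hand-built abstract transition model); it is not the authors' implementation.

```python
# Minimal sketch (assumed names, not the paper's code) of potential-based
# reward shaping where the potential Phi is the value function of a solved
# abstract MDP.

GAMMA = 0.99  # discount factor (illustrative assumption)


def solve_abstract_mdp(abstract_states, transitions, rewards,
                       gamma=GAMMA, n_iterations=1000):
    """Value iteration over a small, hand-built abstract MDP.

    transitions[s][a] -> next abstract state (deterministic for simplicity);
    rewards[s][a]     -> immediate reward for taking action a in abstract state s.
    """
    V = {s: 0.0 for s in abstract_states}
    for _ in range(n_iterations):
        for s in abstract_states:
            if transitions.get(s):
                V[s] = max(rewards[s][a] + gamma * V[s_next]
                           for a, s_next in transitions[s].items())
    return V


def shaping_reward(phi, abstraction, s, s_next, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s),
    where `abstraction` maps a ground state to its abstract state and
    `phi` is the abstract MDP's value function."""
    return gamma * phi[abstraction(s_next)] - phi[abstraction(s)]
```

In a learning update, the agent would then receive r + F(s, s') in place of the environment reward r; with a potential-based shaping term of this form, the optimal policy of the original MDP is preserved.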
CITATION STYLE
Efthymiadis, K., & Kudenko, D. (2014). A comparison of plan-based and abstract MDP reward shaping. Connection Science, 26(1), 85–99. https://doi.org/10.1080/09540091.2014.885283