A comparison of plan-based and abstract MDP reward shaping

Abstract

Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention shifts away from tabula-rasa approaches, many different reward shaping methods have been developed. In this paper, we compare two methods for reward shaping: plan-based shaping, in which an agent is provided with a plan and extra rewards are given according to the steps of the plan the agent satisfies, and reward shaping via abstract Markov decision processes (MDPs), in which an abstract high-level MDP of the environment is solved and the resulting value function is used to shape the agent. The comparison is conducted in terms of total reward, convergence speed, and scaling up to more complex environments. Empirical results demonstrate the need to correctly select and set up reward shaping methods according to the needs of the environment the agents are acting in. This leads to the more interesting question: is there a reward shaping method which is universally better than all other approaches, regardless of the environment dynamics? © 2014 Taylor & Francis.
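
Both approaches described in the abstract are usually realised as potential-based reward shaping, where the extra reward for a transition from s to s' is F(s, s') = γΦ(s') − Φ(s) and only the source of the potential Φ differs. The sketch below is a hypothetical Python illustration under that assumption (the plan PLAN, the abstract value table ABSTRACT_V, and all helper names are invented for the example, not the authors' implementation): Φ is taken either from the number of plan steps the state satisfies or from the value function of a solved abstract MDP, and the shaped reward then feeds a standard tabular Q-learning update.

```python
from collections import defaultdict

GAMMA, ALPHA, OMEGA = 0.99, 0.1, 10.0  # discount, learning rate, plan-step scale

# Hypothetical plan: ordered subgoals the agent is expected to achieve.
PLAN = ["got_key", "opened_door", "reached_goal"]

def plan_based_potential(flags):
    """Plan-based shaping: the potential grows with the number of leading
    plan steps the current state satisfies. `flags` maps subgoal -> bool."""
    satisfied = 0
    for step in PLAN:
        if flags.get(step):
            satisfied += 1
        else:
            break
    return OMEGA * satisfied

# Stand-in for the value function of a solved abstract MDP (in practice this
# would come from value iteration over the abstract model of the environment).
ABSTRACT_V = {"start": 0.0, "has_key": 5.0, "door_open": 12.0, "goal": 20.0}

def abstract_mdp_potential(abstract_state):
    """Abstract-MDP shaping: the potential of a ground state is the value of
    the abstract state it maps to."""
    return ABSTRACT_V.get(abstract_state, 0.0)

def shaped_reward(r, phi_s, phi_s_next):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s),
    added to the environment reward r."""
    return r + GAMMA * phi_s_next - phi_s

def q_update(Q, s, a, r_shaped, s_next, actions):
    """One tabular Q-learning step using the shaped reward."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (r_shaped + GAMMA * best_next - Q[(s, a)])

# Example: one shaped update using the abstract-MDP potential.
Q = defaultdict(float)
r_shaped = shaped_reward(r=-1.0,
                         phi_s=abstract_mdp_potential("start"),
                         phi_s_next=abstract_mdp_potential("has_key"))
q_update(Q, s="s0", a="pickup_key", r_shaped=r_shaped,
         s_next="s1", actions=["pickup_key", "move"])
```

Because the shaping term in this sketch is potential-based, it does not change which policy is optimal; what differs between the two choices of Φ, and what the paper's comparison examines, is how much and how reliably the shaping accelerates learning in a given environment.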

Citation (APA)

Efthymiadis, K., & Kudenko, D. (2014). A comparison of plan-based and abstract MDP reward shaping. Connection Science, 26(1), 85–99. https://doi.org/10.1080/09540091.2014.885283
