Fitness landscape features and reward shaping in reinforcement learning policy spaces

Nathaniel du Preez-Wilkinson; Marcus Gallagher

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12270 LNCS 500-514

DOI: 10.1007/978-3-030-58115-2_35

Abstract

Reinforcement learning (RL) algorithms have received a lot of attention in recent years. However, relatively little work has been dedicated to analysing RL problems, which are thought to contain unique challenges such as sparsity of the reward signal. Reward shaping is one approach that may help alleviate the sparse reward problem. In this paper, we use fitness landscape features to study how reward shaping affects the underlying optimisation landscape of RL problems. Our results indicate that features such as deception, ruggedness, searchability, and symmetry can all be greatly affected by reward shaping, while neutrality, dispersion, and the number of local optima remain relatively invariant. This may provide some guidance as to the potential effectiveness of reward shaping for different algorithms, depending on which features they are sensitive to. Additionally, all of the reward functions we studied produced policy landscapes that contain a single local optimum and very high neutrality. This suggests that algorithms that explore spaces globally, rather than locally, may perform well on RL problems, and may help explain the success of evolutionary methods on such problems. Furthermore, we suspect that the high neutrality of these landscapes is connected to the issue of reward sparsity in RL.

Cite

du Preez-Wilkinson, N., & Gallagher, M. (2020). Fitness landscape features and reward shaping in reinforcement learning policy spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12270 LNCS, pp. 500–514). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58115-2_35
