The fields of Reinforcement Learning (RL) and Optimization aim at finding an optimal solution to a problem characterized by an objective function. The exploration-exploitation dilemma (EED) is a well-known subject in those fields: a substantial body of literature has shown that it must be handled carefully to achieve good performance. Yet, many real-life problems involve the optimization of multiple objectives. Multi-Policy Multi-Objective Reinforcement Learning (MPMORL) offers a way to learn various optimised behaviours for the agent in such problems. This work introduces a modular framework for the learning phase of such algorithms, which eases the study of the EED in Inner-Loop MPMORL algorithms. We present three new exploration strategies inspired by the metaheuristics domain. To assess the performance of our methods on various environments, we use the Deep Sea Treasure (DST), a classical benchmark, as well as a harder version of it that we propose. Our experiments show that all of the proposed strategies outperform the current state-of-the-art ε-greedy based methods on the studied benchmarks.
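For context on the baseline the abstract refers to, the following is a minimal sketch of ε-greedy action selection over linearly scalarized multi-objective Q-values; the function name, the use of linear scalarization, and the array shapes are illustrative assumptions and do not reproduce the paper's actual implementation.

```python
import numpy as np

def epsilon_greedy_action(q_values, weights, epsilon, rng=None):
    """Illustrative epsilon-greedy selection over scalarized multi-objective Q-values.

    q_values: array of shape (n_actions, n_objectives), one Q-vector per action.
    weights:  array of shape (n_objectives,), preference weights (assumed linear scalarization).
    epsilon:  exploration rate in [0, 1].
    """
    rng = rng or np.random.default_rng()
    n_actions = q_values.shape[0]
    if rng.random() < epsilon:
        # Explore: pick an action uniformly at random.
        return int(rng.integers(n_actions))
    # Exploit: pick the action maximizing the scalarized value w . Q(s, a).
    scalarized = q_values @ weights
    return int(np.argmax(scalarized))
```

The metaheuristics-inspired strategies proposed in the paper replace this uniform random exploration step with more structured search behaviour.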
Felten, F., Danoy, G., Talbi, E. G., & Bouvry, P. (2022). Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning. In International Conference on Agents and Artificial Intelligence (Vol. 2, pp. 662–673). Science and Technology Publications, Lda. https://doi.org/10.5220/0010989100003116