Abstract
Hierarchical reinforcement learning (HRL) is a promising approach for solving tasks with long time horizons and sparse rewards. It is often implemented as a high-level policy that assigns subgoals to a low-level policy. However, it suffers from high-level non-stationarity, since the low-level policy is constantly changing. This non-stationarity also causes a data-efficiency problem: policies need more data at non-stationary states to stabilize training. To address these issues, we propose a novel HRL method: Interactive Influence-based Hierarchical Reinforcement Learning (I2HRL). First, inspired by agent modeling, we enable interaction between the low-level and high-level policies, i.e., the low-level policy sends its policy representation to the high-level policy. The high-level policy then makes decisions conditioned on the received low-level policy representation as well as the state of the environment. Second, we stabilize the training of the high-level policy via an information-theoretic regularization that minimizes its dependence on the changing low-level policy. Third, we propose influence-based exploration to more frequently visit the non-stationary states where more transition data is needed. We experimentally validate the effectiveness of the proposed solution on several MuJoCo tasks, demonstrating that our approach significantly boosts learning performance and accelerates learning compared with state-of-the-art HRL methods.
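The abstract's core architectural idea is that the high-level policy conditions on both the environment state and a representation of the current low-level policy. The sketch below is a minimal, hypothetical illustration of that wiring (not the authors' implementation): the module names, dimensions, and the way the low-level embedding is produced are assumptions for illustration only.

```python
# Minimal sketch of the interaction described in the abstract, assuming a
# PyTorch setup. The high-level policy maps (state, low-level policy
# embedding) -> subgoal; the low-level policy maps (state, subgoal) -> action.
import torch
import torch.nn as nn


class HighLevelPolicy(nn.Module):
    """Chooses a subgoal conditioned on the state and the low-level embedding."""

    def __init__(self, state_dim, policy_emb_dim, subgoal_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + policy_emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, subgoal_dim),
        )

    def forward(self, state, low_level_emb):
        return self.net(torch.cat([state, low_level_emb], dim=-1))


class LowLevelPolicy(nn.Module):
    """Chooses an action conditioned on the state and the assigned subgoal."""

    def __init__(self, state_dim, subgoal_dim, action_dim, emb_dim=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + subgoal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        # Hypothetical stand-in for the "policy representation" the low level
        # sends upward; in practice it could be learned from recent
        # (state, subgoal, action) behaviour of the low-level policy.
        self.embedding = nn.Parameter(torch.zeros(emb_dim))

    def forward(self, state, subgoal):
        return self.net(torch.cat([state, subgoal], dim=-1))


# Usage sketch: the high level reads the current low-level representation
# before choosing a subgoal, so its decision can track the changing low level.
state_dim, subgoal_dim, action_dim = 17, 3, 6
high = HighLevelPolicy(state_dim, policy_emb_dim=32, subgoal_dim=subgoal_dim)
low = LowLevelPolicy(state_dim, subgoal_dim, action_dim)

state = torch.randn(1, state_dim)
subgoal = high(state, low.embedding.expand(1, -1))
action = low(state, subgoal)
```

The information-theoretic regularization and influence-based exploration mentioned in the abstract would sit on top of this structure as additional training losses and an exploration bonus; they are not shown here.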
Citation
Wang, R., Yu, R., An, B., & Rabinovich, Z. (2020). I2HRL: Interactive influence-based hierarchical reinforcement learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (pp. 3131–3138). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/433