We present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and defines subtasks for moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from non-stationarities induced by high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. We show that our approach is significantly more sample efficient than a flat learner and similar hierarchical approaches when the set of boundary states is smaller than the entire state space.
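To make the compositionality the abstract relies on concrete, the following is a minimal NumPy sketch, not taken from the paper: the LMDP instance, variable names, and the choice of indicator boundary conditions are illustrative assumptions. In a first-exit linearly-solvable MDP, the desirability function z = exp(-V) satisfies a linear Bellman equation, so subtask solutions computed with basis (indicator) boundary conditions can be linearly combined to recover the optimal values of interior states under any boundary values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy first-exit LMDP: 6 interior states and 3 boundary (terminal) states.
n_int, n_bnd = 6, 3
# Passive dynamics: each row is a distribution over all interior + boundary states.
P = rng.random((n_int, n_int + n_bnd))
P /= P.sum(axis=1, keepdims=True)
P_ii, P_ib = P[:, :n_int], P[:, n_int:]     # interior->interior, interior->boundary
q = rng.uniform(0.1, 1.0, n_int)            # positive interior state costs

def solve_z_interior(z_boundary):
    """Solve the linear Bellman equation z = exp(-q) * (P_ii z + P_ib z_boundary)
    for the interior desirabilities, given fixed boundary desirabilities."""
    G = np.diag(np.exp(-q))
    A = np.eye(n_int) - G @ P_ii
    b = G @ P_ib @ z_boundary
    return np.linalg.solve(A, b)

# Base subtasks: one per boundary state, each with an indicator boundary desirability.
Z_base = np.stack([solve_z_interior(np.eye(n_bnd)[k]) for k in range(n_bnd)], axis=1)

# A new task whose boundary desirability is a weighted combination of the bases
# can either be solved directly or composed from the base solutions.
w = np.array([0.2, 0.5, 0.3])
z_direct = solve_z_interior(w)
z_composed = Z_base @ w

print(np.allclose(z_direct, z_composed))    # True: compositionality holds exactly
```

Because the interior solution is a linear function of the boundary desirabilities, solving one subtask per boundary state suffices to evaluate every state of a partition under arbitrary exit values, which is what allows the hierarchical estimates to remain globally consistent.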
CITATION STYLE
Infante, G., Jonsson, A., & Gómez, V. (2022). Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 6970–6977). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i6.20655