We introduce a new method for hierarchical reinforcement learning. High-level policies automatically discover subgoals; low-level policies learn to specialize on different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space at a fine-grained level. Experiments show that this method outperforms several flat reinforcement learning methods in a deterministic task and in a stochastic task.
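The two-level decomposition described above can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes raw observations are points in the plane, abstract observations are nearest-centre cluster indices, and both levels use tabular Q-learning. All names (`abstract_obs`, `Q_high`, `Q_low`) and the fixed cluster centres are illustrative assumptions.

```python
import numpy as np

# Assumed stand-in for the learned clustering of raw input data:
# three fixed cluster centres in the plane.
CENTRES = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K = len(CENTRES)      # number of abstract observations / candidate subgoals
N_ACTIONS = 4         # low-level primitive actions (illustrative)

def abstract_obs(x):
    """Map a raw observation to its abstract observation (cluster index)."""
    return int(np.argmin(np.linalg.norm(CENTRES - x, axis=1)))

# High-level values: coarse coverage, over (abstract state, chosen subgoal).
Q_high = np.zeros((K, K))

# Low-level values: one fine-grained table per subgoal, over
# (abstract state, primitive action); each specialises on its subgoal.
Q_low = np.zeros((K, K, N_ACTIONS))

def high_level_update(s, g, r, s2, alpha=0.1, gamma=0.9):
    """One coarse Q-learning step after a subgoal episode:
    s, s2 are abstract states, g the subgoal chosen, r the external reward."""
    Q_high[s, g] += alpha * (r + gamma * Q_high[s2].max() - Q_high[s, g])

def low_level_update(g, s, a, r, s2, alpha=0.1, gamma=0.9):
    """One Q-learning step for the policy specialised on subgoal g;
    r is an intrinsic reward, e.g. 1 when the subgoal's cluster is reached."""
    Q_low[g, s, a] += alpha * (r + gamma * Q_low[g, s2].max() - Q_low[g, s, a])
```

The point of the sketch is the division of labour: `Q_high` is defined over abstract observations only (coarse), while each slice `Q_low[g]` is trained only on experience gathered while pursuing subgoal `g` (fine-grained, local coverage).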