Scaling Up Reinforcement Learning through Targeted Exploration

Abstract

Recent Reinforcement Learning (RL) algorithms, such as R-MAX, make (with high probability) only a small number of poor decisions. In practice, however, these algorithms scale poorly as the number of states grows because they spend too much effort exploring. We introduce an RL algorithm, State TArgeted R-MAX (STAR-MAX), that explores only a subset of the state space, called the exploration envelope ξ. When ξ equals the total state space, STAR-MAX behaves identically to R-MAX. When ξ is a proper subset of the state space, a recovery rule β is needed to keep exploration within ξ. We compare STAR-MAX, equipped with various exploration envelopes, against existing algorithms. With an appropriate choice of ξ, STAR-MAX scales far better than existing RL algorithms as the number of states increases. A possible drawback of our algorithm is its dependence on a good choice of ξ and β. However, we show that an effective recovery rule β can be learned on-line and that ξ can be learned from demonstrations. We also find that even randomly sampled exploration envelopes can improve cumulative rewards compared to R-MAX. We expect these results to lead to more efficient methods for RL in large-scale problems.
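
The envelope idea lends itself to a compact sketch. The Python below is a minimal, illustrative rendering of an R-MAX-style agent restricted to an exploration envelope; everything in it (the class name StarMaxSketch, the visit-count threshold m, and the one-step backup standing in for full model-based planning) is an assumption made for illustration, not the paper's actual implementation.

```python
from collections import defaultdict

class StarMaxSketch:
    """Illustrative R-MAX-style agent restricted to an exploration envelope.

    Simplified sketch only: a one-step backup stands in for the
    model-based planning an actual R-MAX-style agent would perform.
    """

    def __init__(self, actions, envelope, recovery_rule, r_max=1.0, m=5, gamma=0.95):
        self.actions = actions          # finite action set
        self.envelope = envelope        # xi: states targeted for exploration
        self.recovery = recovery_rule   # beta: maps a state outside xi to an action
        self.r_max, self.m, self.gamma = r_max, m, gamma
        self.counts = defaultdict(int)  # visit counts n(s, a)
        # Optimistic initialization: unknown pairs look as valuable as possible.
        self.q = defaultdict(lambda: r_max / (1.0 - gamma))

    def act(self, state):
        if state not in self.envelope:
            # Outside the envelope: follow the recovery rule back toward xi.
            return self.recovery(state)
        # Inside the envelope: greedy with respect to optimistic value estimates,
        # which is what drives R-MAX-style targeted exploration.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Only state-action pairs inside the envelope are ever "learned";
        # plain R-MAX would update every pair it visits.
        if state not in self.envelope:
            return
        self.counts[(state, action)] += 1
        if self.counts[(state, action)] >= self.m:
            # Pair is "known": replace optimism with a crude one-step backup.
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            self.q[(state, action)] = reward + self.gamma * best_next

# Toy usage on a hypothetical 10-state chain with actions "left"/"right":
# the envelope covers states 0-4 and the recovery rule simply moves left.
agent = StarMaxSketch(actions=["left", "right"],
                      envelope=set(range(5)),
                      recovery_rule=lambda s: "left")
```

If envelope is set to the whole state space, the recovery rule is never invoked and the sketch reduces to ordinary optimistic R-MAX-style behavior, mirroring the abstract's claim that STAR-MAX and R-MAX coincide when ξ covers every state.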

Cite (APA)

Mann, T. A., & Choe, Y. (2011). Scaling Up Reinforcement Learning through Targeted Exploration. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, AAAI 2011 (pp. 435–440). AAAI Press. https://doi.org/10.1609/aaai.v25i1.7929
