Filtered reinforcement learning

Douglas Aberdeen

Conference Proceedings (Open Access)

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (2004) 3201 27-38

DOI: 10.1007/978-3-540-30115-8_6

Abstract

Reinforcement learning (RL) algorithms attempt to assign the credit for rewards to the actions that contributed to them. Thus far, credit assignment has been done in one of two ways: uniformly, or using a discounting model that assigns exponentially more credit to recent actions. This paper demonstrates an alternative approach to temporal credit assignment that takes advantage of exact or approximate prior information about correct credit assignment. Infinite impulse response (IIR) filters are used to model this credit assignment information; they generalise exponentially discounting eligibility traces to arbitrary credit assignment models. The approach can be applied to any RL algorithm that employs an eligibility trace. The use of IIR credit assignment filters is explored using both the GPOMDP policy-gradient algorithm and the Sarsa(λ) temporal-difference algorithm. A drop in the bias and variance of value or gradient estimates is demonstrated, resulting in faster convergence to better policies. © Springer-Verlag Berlin Heidelberg 2004.
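
The idea that an exponentially discounted eligibility trace is just one particular IIR filter can be illustrated with a short Python sketch. It is not the paper's implementation: the names `IIRTrace`, `exponential_trace`, `delayed_trace` and `gpomdp_estimate` are hypothetical, the policy parameter is assumed to be scalar, and the filter is written in the common difference-equation form z_t = Σ_k b_k x_{t-k} − Σ_{k≥1} a_k z_{t-k}, with feed-forward coefficients `b` and feedback coefficients `a` normalised so that `a[0] = 1`. With b = [1] and a = [1, −β] this reduces to the familiar discounted trace z_t = β z_{t-1} + x_t.

```python
from typing import List, Sequence


class IIRTrace:
    """Eligibility trace produced by a general IIR filter (hypothetical helper).

    Output at step t:
        z_t = sum_k b[k] * x_{t-k}  -  sum_{k>=1} a[k] * z_{t-k}
    where x_t is the new input (e.g. a score-function gradient) and a[0] == 1.
    """

    def __init__(self, b: Sequence[float], a: Sequence[float] = (1.0,)) -> None:
        if not b or not a or a[0] != 1.0:
            raise ValueError("need non-empty b and a with a[0] == 1")
        self.b = list(b)
        self.a = list(a)
        self.xs: List[float] = []  # recent inputs, newest first
        self.zs: List[float] = []  # recent outputs, newest first

    def update(self, x: float) -> float:
        self.xs.insert(0, x)
        del self.xs[len(self.b):]  # keep only as many inputs as b has taps
        z = sum(bk * xk for bk, xk in zip(self.b, self.xs))
        # Feedback terms use the previous outputs z_{t-1}, z_{t-2}, ...
        z -= sum(ak * zk for ak, zk in zip(self.a[1:], self.zs))
        self.zs.insert(0, z)
        del self.zs[len(self.a) - 1:]  # keep only as many outputs as a needs
        return z


def exponential_trace(beta: float) -> IIRTrace:
    """Standard discounted trace z_t = beta * z_{t-1} + x_t as a first-order IIR filter."""
    return IIRTrace(b=[1.0], a=[1.0, -beta])


def delayed_trace(delay: int) -> IIRTrace:
    """Credits each reward entirely to the input exactly `delay` steps ago.

    This is a finite impulse response filter, i.e. an IIR filter with no feedback.
    """
    return IIRTrace(b=[0.0] * delay + [1.0])


def gpomdp_estimate(grads: Sequence[float], rewards: Sequence[float], trace: IIRTrace) -> float:
    """GPOMDP-style average of r_t * z_t, with the usual trace replaced by `trace`.

    Assumes a scalar parameter and that rewards[t] arrives after the action whose
    log-probability gradient is grads[t].
    """
    total = 0.0
    for g, r in zip(grads, rewards):
        total += r * trace.update(g)
    return total / len(grads)


if __name__ == "__main__":
    exp = exponential_trace(0.5)
    print([exp.update(x) for x in [1.0, 0.0, 0.0, 1.0]])  # [1.0, 0.5, 0.25, 1.125]
```

Swapping `exponential_trace(beta)` for `delayed_trace(2)` changes only the credit-assignment model: each reward is then credited entirely to the action taken two steps earlier, which is one example of the prior information about credit assignment that the abstract refers to. Under the same assumptions, one such filter per state-action pair could stand in for the λ-decayed trace in Sarsa(λ).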

Citation (APA)

Aberdeen, D. (2004). Filtered reinforcement learning. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3201, pp. 27–38). Springer Verlag. https://doi.org/10.1007/978-3-540-30115-8_6