Feedback of delayed rewards in XCS for environments with aliasing states


Abstract

Wilson [13] showed how delayed-reward feedback can be used to solve many multi-step problems in the widely used XCS learning classifier system. However, Wilson's method, based on back-propagation with discounting from Q-learning, runs into difficulties in environments with aliasing states, since the local reward function often does not converge. This paper describes a different approach to reward feedback, in which a layered reward scheme for XCS classifiers is learnt during training. We show that, with a relatively minor modification to XCS feedback, the approach not only solves problems such as Woods1 but can also solve aliased-state problems such as Littman57, MiyazakiA and MazeB. © Springer-Verlag Berlin Heidelberg 2009.
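The difficulty described above can be illustrated with a minimal sketch of discounted delayed-reward feedback of the kind XCS uses, where a classifier's prediction is nudged toward the target r + γ·maxP of the following step. All names, parameter values, and the toy two-state scenario below are illustrative assumptions, not the paper's actual method:

```python
# Sketch of discounted reward back-propagation (Q-learning-style), assumed
# parameter values; illustrates why aliasing states hinder convergence.

GAMMA = 0.71  # discount factor (illustrative)
BETA = 0.2    # Widrow-Hoff learning rate (illustrative)

def update_prediction(classifier, target_payoff):
    """Move the classifier's payoff prediction toward the target."""
    classifier["prediction"] += BETA * (target_payoff - classifier["prediction"])

def backpropagate(prev_action_set, reward, next_max_prediction):
    """Feed the discounted payoff target r + gamma * maxP back to the
    classifiers that acted on the previous step."""
    target = reward + GAMMA * next_max_prediction
    for cl in prev_action_set:
        update_prediction(cl, target)

# Aliasing: two distinct environment states present the same sensory input,
# so the SAME classifier receives conflicting targets and its local reward
# prediction oscillates between them instead of converging.
cl = {"prediction": 0.0}
for _ in range(100):
    backpropagate([cl], 0.0, 1000.0)  # state A: high payoff lies ahead
    backpropagate([cl], 0.0, 0.0)     # aliased state B: no payoff ahead
print(cl["prediction"])  # stuck between the two conflicting targets
```

Under this toy dynamic the prediction settles into an oscillation between the two targets rather than converging to either, which is the symptom the paper's layered reward scheme is designed to avoid.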

Citation (APA)

Chen, K. Y., & Lindsay, P. A. (2009). Feedback of delayed rewards in XCS for environments with aliasing states. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5865 LNAI, pp. 252–261). https://doi.org/10.1007/978-3-642-10427-5_25
