Wilson [13] showed how delayed-reward feedback can be used to solve many multi-step problems with the widely used XCS learning classifier system. However, Wilson's method, based on back-propagation of discounted rewards as in Q-learning, runs into difficulties in environments with aliasing states, since the local reward function often fails to converge. This paper describes a different approach to reward feedback, in which a layered reward scheme for XCS classifiers is learnt during training. We show that, with a relatively minor modification to XCS feedback, the approach not only solves problems such as Woods1 but also solves aliasing-state problems such as Littman57, MiyazakiA and MazeB. © Springer-Verlag Berlin Heidelberg 2009.
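The discounted back-propagation the abstract refers to can be sketched as follows. This is a minimal illustration of the Q-learning-style payoff update used in standard XCS, not the paper's layered-reward method; the function names, the parameter values (γ = 0.71 is the discount commonly used in XCS maze experiments, β is the Widrow-Hoff learning rate), and the two-step scenario are illustrative assumptions.

```python
GAMMA = 0.71   # discount factor, a value commonly used in XCS maze experiments
BETA = 0.2     # Widrow-Hoff learning rate (illustrative choice)

def update_prediction(p_prev, reward, max_next_prediction):
    """Move the classifier's payoff prediction toward the discounted
    Q-learning target r + gamma * max P', as in standard XCS."""
    target = reward + GAMMA * max_next_prediction
    return p_prev + BETA * (target - p_prev)

# Illustrative scenario: a classifier one step before a state whose best
# prediction has already converged to 710 (gamma * 1000, i.e. one step
# from a goal paying reward 1000). Its own prediction should converge
# to gamma * 710 = 504.1.
p = 0.0
for _ in range(200):
    p = update_prediction(p, 0.0, 710.0)
```

Under aliasing, two physically distinct states present the same sensory input, so `max_next_prediction` alternates between different true values and the update above chases a moving target, which is the non-convergence problem the paper addresses.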
Chen, K. Y., & Lindsay, P. A. (2009). Feedback of delayed rewards in XCS for environments with aliasing states. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5865 LNAI, pp. 252–261). https://doi.org/10.1007/978-3-642-10427-5_25