Temporal Difference Coding in Reinforcement Learning

Abstract

In this paper, we regard the sequence of returns as outputs from a parametric compound source. The coding rate of the source represents the amount of information carried by the return, so the information gain concerning future information is given by the sum of the discounted coding rates. We accordingly formulate a temporal difference learning method for estimating the expected information gain, and give a convergence proof of the information gain estimate under certain conditions. As an example application, we propose the ratio ω of return loss to information gain to be used in probabilistic action selection strategies. We found in experiments that our ω-based strategy performs well compared with the conventional Q-based strategy. © Springer-Verlag 2003.
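To make the abstract's idea concrete, the sketch below shows a tabular agent that runs a standard TD(0) update for the expected return Q(s, a) alongside an analogous TD-style update for an information-gain estimate I(s, a), i.e. a discounted sum of per-step coding rates. This is only a minimal illustration under loose assumptions: the class name TDCodingAgent, the squared-prediction-error surrogate used as a "coding rate", the simple Q + I softmax score, and the parameters alpha, gamma, and tau are all hypothetical choices for this example, not the paper's parametric compound-source coding or its ω-based selection rule.

```python
import numpy as np

class TDCodingAgent:
    """Illustrative tabular agent: TD(0) for returns plus a TD-style
    estimate of discounted coding rates (a stand-in for information gain)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, tau=1.0):
        self.Q = np.zeros((n_states, n_actions))   # expected return
        self.I = np.zeros((n_states, n_actions))   # expected information gain (surrogate)
        self.alpha, self.gamma, self.tau = alpha, gamma, tau

    def coding_rate(self, reward, predicted):
        # Placeholder surrogate: squared prediction error stands in for the
        # coding rate of the return source (an assumption, not the paper's
        # compound-source formulation).
        return (reward - predicted) ** 2

    def update(self, s, a, r, s_next):
        # Per-step coding rate, computed from the current return prediction.
        rate = self.coding_rate(r, self.Q[s, a])
        # TD(0) update for the expected return.
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])
        # Analogous TD update for the discounted sum of coding rates.
        ig_target = rate + self.gamma * self.I[s_next].max()
        self.I[s, a] += self.alpha * (ig_target - self.I[s, a])

    def select_action(self, s):
        # Boltzmann (softmax) selection over a score that trades off return
        # against information gain; the paper's exact ω-based rule is not
        # reproduced here.
        score = self.Q[s] + self.I[s]
        prefs = np.exp((score - score.max()) / self.tau)
        probs = prefs / prefs.sum()
        return np.random.choice(len(probs), p=probs)
```

In use, such an agent would be stepped through an environment loop like any Q-learning agent (select_action, observe reward and next state, then update); the only difference from conventional Q-based selection is that the action probabilities also reflect the information-gain estimate.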

Citation (APA)
Iwata, K., & Ikeda, K. (2004). Temporal Difference Coding in Reinforcement Learning. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2690, 218–227. https://doi.org/10.1007/978-3-540-45080-1_30
