In this paper, we regard the sequence of returns as outputs from a parametric compound source. The coding rate of the source represents the amount of information carried by the returns, so the information gain with respect to future information is given by the sum of the discounted coding rates. We accordingly formulate a temporal difference learning method for estimating the expected information gain, and prove that the estimate converges under certain conditions. As an example application, we propose using the ratio w of return loss to information gain in probabilistic action selection strategies. Experiments show that our w-based strategy performs well compared with the conventional Q-based strategy. © Springer-Verlag 2003.
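The idea of estimating a discounted sum by temporal difference learning can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the coding-rate signal is replaced by a placeholder scalar reward `r`, the state space, transitions, `alpha`, and `gamma` are all hypothetical, and `softmax_select` stands in generically for a probabilistic action selection rule driven by per-action scores such as the proposed ratio w.

```python
import math
import random

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update: move V[s] toward the target r + gamma * V[s_next].
    Here r is a stand-in for the per-step signal (e.g., a coding rate)."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

def softmax_select(scores, temperature=1.0):
    """Probabilistic action selection over per-action scores
    (e.g., hypothetical w-style ratios); higher score means
    a higher probability of being chosen."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    r = random.random()
    cum = 0.0
    for a, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return a
    return len(exps) - 1

# Toy two-state chain: state 0 yields signal 1.0 and moves to state 1;
# state 1 yields 0.0 and moves back to state 0.
V = {0: 0.0, 1: 0.0}
for _ in range(1000):
    td0_update(V, 0, 1.0, 1)
    td0_update(V, 1, 0.0, 0)

# Fixed point: V[0] = 1 + 0.9 * V[1], V[1] = 0.9 * V[0],
# so V[0] = 1 / (1 - 0.81) ~ 5.26 and V[1] ~ 4.74.
```

The same update shape applies whatever the per-step signal is; estimating the expected information gain amounts to running this recursion on the coding rates rather than on rewards.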
CITATION STYLE
Iwata, K., & Ikeda, K. (2004). Temporal Difference Coding in Reinforcement Learning. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2690, 218–227. https://doi.org/10.1007/978-3-540-45080-1_30