Proposal of detour path suppression method in PS reinforcement learning and its application to altruistic multi-agent environment

Abstract

Profit Sharing (PS) is a well-known reinforcement learning method. In PS, a reward is generally distributed with a geometrically decreasing function, and the common ratio of that function is called the discount rate. A large discount rate increases the learning speed, but a non-optimal policy may be learned. A small discount rate, on the other hand, improves the performance of the learned policy, but learning may not proceed smoothly because the learning depth is shallow. To cope with these problems, this paper proposes a method that reinforces detour paths and the non-detour path with different discount rates. Finally, the method is applied to an altruistic multi-agent environment to confirm its effectiveness.
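The geometric reward distribution described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `distribute_reward` and `update_weights`, the representation of an episode as a list of (state, action) pairs, and the sample values are all assumptions made for illustration. The sketch covers only the basic single-rate distribution; the abstract does not say how detour paths are identified, so the proposed two-rate method is not reproduced here.

```python
from collections import defaultdict


def distribute_reward(episode, reward, discount_rate):
    """Return one credit per step of the episode.

    The last step receives the full reward; each earlier step receives
    the credit of the following step multiplied by discount_rate
    (a geometrically decreasing function with that common ratio).
    """
    credits = []
    credit = reward
    for _ in reversed(episode):
        credits.append(credit)
        credit *= discount_rate
    credits.reverse()
    return credits


def update_weights(weights, episode, reward, discount_rate):
    """Add each step's credit to the weight of its (state, action) rule."""
    credits = distribute_reward(episode, reward, discount_rate)
    for rule, credit in zip(episode, credits):
        weights[rule] += credit
    return weights


# Hypothetical three-step episode that ends with a reward of 100.
episode = [("s0", "right"), ("s1", "right"), ("s2", "up")]
weights = update_weights(defaultdict(float), episode, reward=100.0, discount_rate=0.5)
print(distribute_reward(episode, 100.0, 0.5))  # [25.0, 50.0, 100.0]
```

With a common ratio of 0.5, the credit halves at each step back from the reward, so rules far from the reward receive very little reinforcement. A ratio closer to 1 carries credit further back along the episode, which matches the trade-off between learning speed and learning depth that the abstract describes.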

Cite

Shiraishi, D., Miyazaki, K., & Kobayashi, H. (2018). Proposal of detour path suppression method in PS reinforcement learning and its application to altruistic multi-agent environment. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11224 LNAI, pp. 638–645). Springer Verlag. https://doi.org/10.1007/978-3-030-03098-8_51
