On-line profit sharing works efficiently


Abstract

Reinforcement learning constructs knowledge in the form of state-to-action decision rules from an agent's experiences. Most reinforcement learning methods are action-value estimation methods, which estimate the true values of state-action pairs and derive the optimal policy from those value estimates. However, these methods have a serious drawback: they stray when the values of "opposite" actions, such as moving left and moving right, are equal. This paper describes the basic mechanism of on-line profit sharing (OnPS), an action-preference learning method. The main contribution of this paper is to show the equivalence of off-line and on-line profit sharing. We also present a typical benchmark example comparing OnPS with Q-learning.
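To make the contrast concrete, the following is a minimal sketch of the off-line profit-sharing credit assignment the abstract refers to: after an episode ends, the reward is distributed backward over the state-action rules fired during the episode. The geometric credit function, the decay value, and all names here are illustrative assumptions, not details taken from the paper (OnPS performs an equivalent update incrementally, during the episode).

```python
def profit_sharing_update(weights, episode, reward, decay=0.3):
    """Off-line profit-sharing sketch (assumed geometric credit function).

    weights : dict mapping (state, action) -> accumulated preference
    episode : list of (state, action) pairs in the order they occurred
    reward  : reward received at the end of the episode
    decay   : geometric discount applied per step back from the goal
    """
    credit = reward
    # Walk the episode backward: rules closer to the reward get more credit.
    for state, action in reversed(episode):
        weights[(state, action)] = weights.get((state, action), 0.0) + credit
        credit *= decay
    return weights
```

Because the update strengthens preferences for actions that actually led to reward rather than estimating exact action values, equal values for opposite actions do not arise in the same way as in value-estimation methods such as Q-learning.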

Citation (APA)

Matsui, T., Inuzuka, N., & Seki, H. (2003). On-line profit sharing works efficiently. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2773 PART 1, pp. 317–324). Springer Verlag. https://doi.org/10.1007/978-3-540-45224-9_45
