Reinforcement learning constructs knowledge containing state-to-action decision rules from an agent's experiences. Most reinforcement learning methods are action-value estimation methods, which estimate the true values of state–action pairs and derive the optimal policy from those value estimates. However, these methods have a serious drawback: they stray when the values of "opposite" actions, such as moving left and moving right, are equal. This paper describes the basic mechanism of on-line profit sharing (OnPS), an action-preference learning method. The main contribution of this paper is to show the equivalence of off-line and on-line updates in profit sharing. We also present a typical benchmark example comparing OnPS with Q-learning.
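To make the contrast with value estimation concrete, here is a minimal sketch of the classical off-line profit-sharing update, assuming a geometric reinforcement function (a common choice; the paper's specific reinforcement function and the OnPS incremental form are detailed in the paper itself). After an episode ends with a reward, every visited state–action pair receives credit decayed by its distance from the end of the episode:

```python
def profit_sharing_update(weights, episode, reward, gamma=0.5):
    """Off-line profit sharing (sketch): after an episode ends with
    `reward`, add geometrically decayed credit to the preference weight
    of every (state, action) pair visited along the trajectory.
    `gamma` is an assumed geometric decay rate, not taken from the paper."""
    T = len(episode)
    for t, (s, a) in enumerate(episode):
        # the last pair gets full credit, earlier pairs decayed credit
        weights[(s, a)] = weights.get((s, a), 0.0) + reward * gamma ** (T - 1 - t)
    return weights

# usage: a three-step episode ending in reward 1.0
w = {}
episode = [("s0", "right"), ("s1", "right"), ("s2", "left")]
profit_sharing_update(w, episode, reward=1.0)
# the final action gets weight 1.0; the first gets 1.0 * 0.5**2 = 0.25
```

Because the update touches preference weights rather than value estimates, ties between opposite actions do not destabilize the learned policy; OnPS performs the equivalent credit assignment incrementally during the episode rather than at its end.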
Citation
Matsui, T., Inuzuka, N., & Seki, H. (2003). On-line profit sharing works efficiently. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2773 PART 1, pp. 317–324). Springer Verlag. https://doi.org/10.1007/978-3-540-45224-9_45