Reinforcement learning constructs knowledge containing state-to-action decision rules from an agent's experiences. Most reinforcement learning methods are action-value estimation methods, which estimate the true values of state–action pairs and derive the optimal policy from those value estimates. However, these methods have a serious drawback: they stray when the values of "opposite" actions, such as moving left and moving right, are equal. This paper describes the basic mechanism of on-line profit sharing (OnPS), an action-preference learning method. The main contribution of this paper is to show the equivalence of off-line and on-line updates in profit sharing. We also present a typical benchmark example comparing OnPS with Q-learning.
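To make the contrast with value estimation concrete, here is a minimal sketch of the classical off-line profit-sharing update, assuming a geometric reinforcement function (a common choice; the paper's specific reinforcement function and the OnPS incremental form are detailed in the paper itself). After an episode ends with a reward, every visited state–action pair receives credit decayed by its distance from the end of the episode:

```python
def profit_sharing_update(weights, episode, reward, gamma=0.5):
    """Off-line profit sharing (sketch): after an episode ends with
    `reward`, add geometrically decayed credit to the preference weight
    of every (state, action) pair visited along the trajectory.
    `gamma` is an assumed geometric decay rate, not taken from the paper."""
    T = len(episode)
    for t, (s, a) in enumerate(episode):
        # the last pair gets full credit, earlier pairs decayed credit
        weights[(s, a)] = weights.get((s, a), 0.0) + reward * gamma ** (T - 1 - t)
    return weights

# usage: a three-step episode ending in reward 1.0
w = {}
episode = [("s0", "right"), ("s1", "right"), ("s2", "left")]
profit_sharing_update(w, episode, reward=1.0)
# the final action gets weight 1.0; the first gets 1.0 * 0.5**2 = 0.25
```

Because the update touches preference weights rather than value estimates, ties between opposite actions do not destabilize the learned policy; OnPS performs the equivalent credit assignment incrementally during the episode rather than at its end.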
Citation
Matsui, T., Inuzuka, N., & Seki, H. (2003). On-line profit sharing works efficiently. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2773 PART 1, pp. 317–324). Springer Verlag. https://doi.org/10.1007/978-3-540-45224-9_45