This paper explores ways to discover strategy from a state-action-state-reward log recorded during a reinforcement learning session. The term strategy here implies that we are interested not only in a one-step state-action pair but also in a fruitful sequence of state-actions. Traditional RL has been shown to learn good sequences of actions. However, the learned action sequences are often less effective than they could be. For example, an effective five-step navigation to the north can be achieved in thousands of ways if there are no other constraints, since an agent could follow numerous tactics to reach the same end result. Traditional RL methods such as value learning or state-action value learning do not directly address this issue. In this preliminary experiment, sets of state-action pairs (i.e., one-step policies) are extracted from 10,446 records, grouped, and then joined to form a directed graph. This graph summarizes the policy learned by the agent. We argue that strategy could be extracted from the analysis of this graph network.
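The graph construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(state, action, next_state, reward)` and the choice to annotate edges with visit counts and accumulated reward are assumptions, since the abstract does not give a schema.

```python
from collections import defaultdict

def build_policy_graph(log):
    """Group one-step (state, action) pairs from an RL log and join them
    into a directed graph: nodes are states, edges are observed transitions.

    `log` is assumed to be a list of (state, action, next_state, reward)
    tuples; the actual record format in the paper may differ.
    """
    graph = defaultdict(dict)
    for state, action, next_state, reward in log:
        edge = graph[state].setdefault((action, next_state),
                                       {"count": 0, "reward": 0.0})
        edge["count"] += 1        # how often this transition was logged
        edge["reward"] += reward  # total reward observed along this edge
    return graph

# Toy log: two alternative routes from s0 to s2.
log = [
    ("s0", "N", "s1", 0.0),
    ("s1", "N", "s2", 1.0),
    ("s0", "E", "s3", 0.0),
    ("s3", "N", "s2", 1.0),
    ("s0", "N", "s1", 0.0),
]
graph = build_policy_graph(log)
```

Analyzing such a graph (e.g., comparing edge counts or rewards along alternative paths between the same pair of states) is one way the paper's notion of strategy extraction could proceed.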
CITATION STYLE
Haji Mohd Sani, N., Phon-Amnuaisuk, S., & Au, T. W. (2019). Discovering strategy in navigation problem. In Communications in Computer and Information Science (Vol. 1071, pp. 231–239). Springer Verlag. https://doi.org/10.1007/978-981-32-9563-6_24