Discovering strategy in navigation problem

Abstract

This paper explores ways to discover strategy from a state-action-state-reward log recorded during a reinforcement learning session. The term strategy here implies that we are interested not only in a one-step state-action pair but also in a fruitful sequence of state-actions. Traditional RL has proved that it can successfully learn a good sequence of actions. However, it is often observed that some of the learned action sequences are less effective than they could be. For example, an effective five-step navigation to the north can be achieved in thousands of ways if there are no other constraints, since an agent could employ numerous tactics to achieve the same end result. Traditional RL, such as value learning or state-action value learning, does not directly address this issue. In this preliminary experiment, sets of state-action pairs (i.e., one-step policies) are extracted from 10,446 records, grouped, and then joined to form a directed graph. This graph summarizes the policy learned by the agent. We argue that strategy can be extracted from an analysis of this graph.
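As a rough illustration of the graph-building step the abstract describes, the sketch below aggregates a hypothetical (state, action, next_state, reward) log into a directed graph whose edges carry visit counts and mean rewards. This is a minimal sketch under assumed record and field names, not the authors' implementation.

```python
# Minimal sketch of building a policy graph from an RL session log.
# The log format (state, action, next_state, reward) and the name
# build_policy_graph are assumptions for illustration only.
from collections import defaultdict

def build_policy_graph(log):
    """Group repeated (state, action, next_state) transitions into
    directed edges annotated with visit counts and mean rewards."""
    edges = defaultdict(lambda: {"count": 0, "total_reward": 0.0})
    for state, action, next_state, reward in log:
        key = (state, next_state, action)  # one directed, action-labelled edge
        edges[key]["count"] += 1
        edges[key]["total_reward"] += reward

    graph = defaultdict(list)  # adjacency list: state -> outgoing edges
    for (s, s_next, a), stats in edges.items():
        graph[s].append({
            "to": s_next,
            "action": a,
            "count": stats["count"],
            "mean_reward": stats["total_reward"] / stats["count"],
        })
    return dict(graph)

# Toy usage: three records from a grid-world session.
log = [
    ((0, 0), "N", (0, 1), -1.0),
    ((0, 1), "N", (0, 2), -1.0),
    ((0, 0), "N", (0, 1), -1.0),
]
for state, out_edges in build_policy_graph(log).items():
    print(state, out_edges)
```

Paths and cycles in such a graph correspond to multi-step action sequences, which is the level at which the paper argues strategy can be analyzed.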

Citation (APA)

Haji Mohd Sani, N., Phon-Amnuaisuk, S., & Au, T. W. (2019). Discovering strategy in navigation problem. In Communications in Computer and Information Science (Vol. 1071, pp. 231–239). Springer Verlag. https://doi.org/10.1007/978-981-32-9563-6_24
