Blind Decision Making: Reinforcement Learning with Delayed Observations

7 citations · 18 Mendeley readers

Abstract

In Reinforcement Learning (RL), the current state of the environment is not always available. One remedy is to include the actions taken since the last-known state as part of the state information; however, this enlarges the state space, making the problem more complex and slower to converge. We propose an approach that exploits the known delay in the state observation and makes decisions that maximize the expected state-action value function. The proposed algorithm offers an alternative in which the state space is not enlarged relative to the case with no delay in the state update. Evaluations on basic RL environments further illustrate the improved performance of the proposed algorithm.
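The decision rule the abstract describes — propagate the last-known state forward through the actions already taken, then pick the action that maximizes the expected Q-value — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the transition model `P`, the Q-table `Q`, and the function names are all hypothetical stand-ins, and in the paper these quantities would be learned rather than given.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Hypothetical stand-ins: a known transition model P[s, a] giving a
# distribution over next states, and a Q-table Q[s, a]. In practice
# both would be estimated during learning.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
Q = rng.random((n_states, n_actions))

def belief_after_delay(last_state, pending_actions):
    """Propagate a point belief over states through the actions taken
    since the last observed state (the delayed-observation gap)."""
    b = np.zeros(n_states)
    b[last_state] = 1.0
    for a in pending_actions:
        b = b @ P[:, a, :]  # b'[s'] = sum_s b[s] * P[s, a, s']
    return b

def blind_action(last_state, pending_actions):
    """Choose the action maximizing the expected state-action value
    under the belief, without enlarging the state space."""
    b = belief_after_delay(last_state, pending_actions)
    expected_q = b @ Q  # E_s[Q(s, a)] for each action a
    return int(np.argmax(expected_q))
```

The key contrast with the augmented-state baseline is that the belief vector is computed on the fly from the original state space, rather than indexing a Q-table over (state, pending-action-sequence) pairs.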

Citation (APA)

Agarwal, M., & Aggarwal, V. (2021). Blind Decision Making: Reinforcement Learning with Delayed Observations. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS) (Vol. 31, pp. 2–6). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/icaps.v31i1.15940
