In reinforcement learning (RL), the current state of the environment may not always be available to the agent. One way to address this is to augment the state with the actions taken since the last known state; however, this enlarges the state space, making the problem more complex and slowing convergence. We propose an approach that exploits the known delay in the state observation and selects actions to maximize the expected state-action value function. In contrast to the augmentation approach, the proposed algorithm keeps the state space the same size as in the delay-free case. Evaluations on standard RL environments further illustrate the improved performance of the proposed algorithm.
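The decision rule described above can be sketched in a tabular setting: propagate a belief over the current state forward from the last observed state through the known intervening actions, then pick the action with the highest expected Q-value under that belief. The transition model, Q-table, and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy MDP with 3 states and 2 actions. P[a, s, s'] is an assumed known
# transition model; numbers are hypothetical, for illustration only.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.1, 0.0, 0.9]],  # action 1
])
# A previously learned Q-table (hypothetical values).
Q = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 2.0]])

def belief_after_delay(last_state, actions_since):
    """Propagate a point belief on the delayed state through the known
    intervening actions to get a distribution over the current state."""
    b = np.zeros(P.shape[1])
    b[last_state] = 1.0
    for a in actions_since:
        b = b @ P[a]  # b'(s') = sum_s b(s) * P(s' | s, a)
    return b

def act_blind(last_state, actions_since):
    """Choose the action maximizing the expected Q-value under the belief,
    instead of enlarging the state with the action history."""
    b = belief_after_delay(last_state, actions_since)
    return int(np.argmax(b @ Q))  # argmax_a sum_s b(s) * Q(s, a)
```

With no delay the rule reduces to the usual greedy policy on Q; with a delay of d steps, only a length-d action history is needed at decision time, while Q itself stays defined on the original state space.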
Agarwal, M., & Aggarwal, V. (2021). Blind Decision Making: Reinforcement Learning with Delayed Observations. In Proceedings of the International Conference on Automated Planning and Scheduling, ICAPS (Vol. 31, pp. 2–6). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/icaps.v31i1.15940