Deep Recurrent Policy Networks for Planning Under Partial Observability

Abstract

QMDP-net is a recurrent network architecture that combines model-free learning with model-based planning under partial observability. The architecture represents a policy by embedding a partially observable Markov decision process (POMDP) model together with the QMDP algorithm, which applies value iteration to that model. However, because the value iteration in QMDP sweeps the entire state space, it can suffer from the "curse of dimensionality". Moreover, QMDP-based policies never take actions purely to gather information, which can yield poor policies in domains where information gathering is essential. To address these two issues, this paper introduces two deep recurrent policy networks, asynchronous QMDP-net and ReplicatedQ-net, built on the plain QMDP-net. The former incorporates asynchronous updates into the value iteration process of QMDP and learns a smaller abstract state-space representation for planning. The latter partially replaces QMDP with the replicated Q-learning algorithm so that the policy can take informative actions. Experimental results demonstrate that the proposed networks outperform the plain QMDP-net on simulated robotic tasks.
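
The abstract does not include code; as a point of reference, below is a minimal NumPy sketch of the two classical components the paper builds on: the tabular QMDP value-iteration approximation and a replicated Q-learning style update. All names, tensor shapes, and hyperparameters (T, R, gamma, alpha, n_iters) are illustrative assumptions, not taken from the paper, and the networks described above learn these computations end to end rather than running them on a known model.

```python
import numpy as np

def qmdp_q_values(T, R, gamma=0.95, n_iters=100):
    """Tabular QMDP approximation.

    T: transition tensor of shape (A, S, S), T[a, s, s'] = P(s' | s, a)
    R: reward matrix of shape (S, A)
    Returns Q of shape (S, A): Q-values of the underlying fully observable MDP.
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(n_iters):
        V = Q.max(axis=1)                              # V(s) = max_a Q(s, a)
        # Q(s, a) = R(s, a) + gamma * sum_{s'} T[a, s, s'] * V(s')
        Q = R + gamma * np.einsum('asn,n->sa', T, V)
    return Q

def qmdp_action(Q, belief):
    """QMDP action selection: argmax_a sum_s b(s) * Q(s, a)."""
    return int(np.argmax(belief @ Q))

def replicated_q_update(Q, belief, action, reward, next_belief,
                        alpha=0.1, gamma=0.95):
    """One replicated Q-learning step (sketch): every state's Q(s, a) is
    nudged toward the belief-level TD target, weighted by its belief mass."""
    target = reward + gamma * np.max(next_belief @ Q)  # max_a' Q(b', a')
    Q[:, action] += alpha * belief * (target - Q[:, action])
    return Q

if __name__ == "__main__":
    # Toy random POMDP model, purely for illustration.
    rng = np.random.default_rng(0)
    A, S = 3, 5
    T = rng.random((A, S, S))
    T /= T.sum(axis=2, keepdims=True)                  # valid transition probabilities
    R = rng.random((S, A))
    b = np.full(S, 1.0 / S)                            # uniform initial belief
    Q = qmdp_q_values(T, R)
    print("QMDP action at uniform belief:", qmdp_action(Q, b))
```

Because the QMDP action score weights each state's fully observable Q-value by the belief, it implicitly assumes all uncertainty disappears after one step, which is why such policies never choose purely information-gathering actions; the replicated Q-learning style update above is one classical way to let belief-dependent experience reshape the Q-values instead.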

Citation (APA)

Chen, Z., & Zhang, Z. (2019). Deep Recurrent Policy Networks for Planning Under Partial Observability. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11727 LNCS, pp. 598–610). Springer Verlag. https://doi.org/10.1007/978-3-030-30487-4_46
