Deep Recurrent Policy Networks for Planning Under Partial Observability

Abstract

QMDP-net is a recurrent network architecture that combines model-free learning with model-based planning under partial observability. The architecture represents a policy by embedding a partially observable Markov decision process (POMDP) model together with the QMDP algorithm, which applies value iteration to that model. However, because the value iteration in QMDP sweeps the entire state space, it can suffer from the "curse of dimensionality". Moreover, QMDP-based policies never take actions purely to gather information, which can yield poor policies in domains where information gathering is essential. To address these two issues, this paper introduces two deep recurrent policy networks, asynchronous QMDP-net and ReplicatedQ-net, built on the plain QMDP-net. The former incorporates asynchronous updates into the value iteration process of QMDP and learns a smaller abstract state-space representation for planning. The latter partially replaces QMDP with the replicated Q-learning algorithm so that the policy can take informative actions. Experimental results demonstrate that the proposed networks outperform the plain QMDP-net on simulated robotic tasks.
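
The abstract does not include code; as a point of reference, below is a minimal NumPy sketch of the two classical components the paper builds on: the tabular QMDP value-iteration approximation and a replicated Q-learning style update. All names, tensor shapes, and hyperparameters (T, R, gamma, alpha, n_iters) are illustrative assumptions, not taken from the paper, and the networks described above learn these computations end to end rather than running them on a known model.

```python
import numpy as np

def qmdp_q_values(T, R, gamma=0.95, n_iters=100):
    """Tabular QMDP approximation.

    T: transition tensor of shape (A, S, S), T[a, s, s'] = P(s' | s, a)
    R: reward matrix of shape (S, A)
    Returns Q of shape (S, A): Q-values of the underlying fully observable MDP.
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(n_iters):
        V = Q.max(axis=1)                              # V(s) = max_a Q(s, a)
        # Q(s, a) = R(s, a) + gamma * sum_{s'} T[a, s, s'] * V(s')
        Q = R + gamma * np.einsum('asn,n->sa', T, V)
    return Q

def qmdp_action(Q, belief):
    """QMDP action selection: argmax_a sum_s b(s) * Q(s, a)."""
    return int(np.argmax(belief @ Q))

def replicated_q_update(Q, belief, action, reward, next_belief,
                        alpha=0.1, gamma=0.95):
    """One replicated Q-learning step (sketch): every state's Q(s, a) is
    nudged toward the belief-level TD target, weighted by its belief mass."""
    target = reward + gamma * np.max(next_belief @ Q)  # max_a' Q(b', a')
    Q[:, action] += alpha * belief * (target - Q[:, action])
    return Q

if __name__ == "__main__":
    # Toy random POMDP model, purely for illustration.
    rng = np.random.default_rng(0)
    A, S = 3, 5
    T = rng.random((A, S, S))
    T /= T.sum(axis=2, keepdims=True)                  # valid transition probabilities
    R = rng.random((S, A))
    b = np.full(S, 1.0 / S)                            # uniform initial belief
    Q = qmdp_q_values(T, R)
    print("QMDP action at uniform belief:", qmdp_action(Q, b))
```

Because the QMDP action score weights each state's fully observable Q-value by the belief, it implicitly assumes all uncertainty disappears after one step, which is why such policies never choose purely information-gathering actions; the replicated Q-learning style update above is one classical way to let belief-dependent experience reshape the Q-values instead.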

Citation (APA)

Chen, Z., & Zhang, Z. (2019). Deep Recurrent Policy Networks for Planning Under Partial Observability. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11727 LNCS, pp. 598–610). Springer Verlag. https://doi.org/10.1007/978-3-030-30487-4_46
