Influence-aware memory architectures for deep reinforcement learning in POMDPs

Abstract

Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNNs) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high-dimensional data. In this paper, we propose influence-aware memory, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q-values is inevitably fed back into the network for the next prediction, our model allows information to flow without necessarily being stored in the RNN’s internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.
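To make the described split concrete, the sketch below implements the idea in PyTorch under several assumptions that are not from the paper itself: observations are flat vectors, the indices of the influence-source variables (d_indices) are known in advance, and a GRU serves as the recurrent layer. Class and parameter names are illustrative; this is a minimal sketch of the architecture, not the authors' implementation.

```python
# Minimal sketch of an influence-aware memory (IAM) network in PyTorch.
# ASSUMPTIONS (hypothetical, not from the paper): flat observation vectors,
# influence-source indices known a priori, GRU as the recurrent layer.
import torch
import torch.nn as nn


class InfluenceAwareMemory(nn.Module):
    def __init__(self, obs_dim, d_indices, rnn_hidden=64, fnn_hidden=128, n_actions=4):
        super().__init__()
        self.register_buffer("d_indices", torch.as_tensor(d_indices, dtype=torch.long))
        # Recurrent branch: only the influence-source variables are memorized.
        self.rnn = nn.GRU(input_size=len(d_indices), hidden_size=rnn_hidden,
                          batch_first=True)
        # Feedforward branch: the remaining information flows to the Q-value
        # head without being stored in the RNN's internal memory.
        self.fnn = nn.Sequential(
            nn.Linear(obs_dim, fnn_hidden), nn.ReLU(),
            nn.Linear(fnn_hidden, fnn_hidden), nn.ReLU(),
        )
        # Q-value head combines both branches.
        self.q_head = nn.Linear(rnn_hidden + fnn_hidden, n_actions)

    def forward(self, obs, h=None):
        # obs: (batch, seq_len, obs_dim); h: optional GRU hidden state.
        d_obs = obs[..., self.d_indices]               # influence sources only
        mem, h = self.rnn(d_obs, h)                    # (batch, seq_len, rnn_hidden)
        ff = self.fnn(obs)                             # (batch, seq_len, fnn_hidden)
        q = self.q_head(torch.cat([mem, ff], dim=-1))  # per-step Q-values
        return q, h


# Example: Q-values for a batch of 8 trajectories of length 10.
net = InfluenceAwareMemory(obs_dim=32, d_indices=[0, 1, 2], n_actions=4)
q_values, hidden = net(torch.randn(8, 10, 32))
```

Only the small subset of influence sources passes through the recurrent layer, so backpropagation through time operates on far fewer inputs, while the rest of the observation reaches the Q-value head directly through the feedforward branch; this is the division of labor the abstract credits for the gains in training speed and policy performance.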




Citation (APA)

Suau, M., He, J., Congeduti, E., Starre, R. A. N., Czechowski, A., & Oliehoek, F. A. (2022). Influence-aware memory architectures for deep reinforcement learning in POMDPs. Neural Computing and Applications. https://doi.org/10.1007/s00521-022-07691-7

Readers' Seniority

Tooltip

PhD / Post grad / Masters / Doc 3

100%

Readers' Discipline

Tooltip

Computer Science 3

75%

Engineering 1

25%
