Using rewards for belief state updates in partially observable Markov decision processes

Masoumeh T. Izadi; Doina Precup

Conference Proceedings (Open Access)


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3720 LNAI 593-600

DOI: 10.1007/11564096_58


Abstract

Partially Observable Markov Decision Processes (POMDPs) provide a standard framework for sequential decision making in stochastic environments. In this setting, an agent takes actions and receives observations and rewards from the environment. Many POMDP solution methods are based on computing a belief state, which is a probability distribution over the states the agent could be in. The agent then chooses its actions based on the belief state. The belief state is computed from a model of the environment and the history of actions and observations seen by the agent. However, reward information is not taken into account in updating the belief state. In this paper, we argue that rewards can carry useful information that can help disambiguate the hidden state. We present a method for updating the belief state that takes rewards into account. We present experiments with exact and approximate planning methods on several standard POMDP domains using this belief update method, and show that it can provide advantages in terms of both speed and the quality of the solution obtained. © Springer-Verlag Berlin Heidelberg 2005.
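To make the idea in the abstract concrete, the following is a minimal sketch of a standard belief update next to a reward-aware variant. It is not taken from the paper: it assumes a discrete POMDP with a known model, a deterministic reward that depends only on the action and the state in which it was taken, and it treats the received reward as one more piece of evidence. The names `belief_update`, `belief_update_with_reward`, `T`, `O`, and `R` are hypothetical and chosen for illustration.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total == 0.0:
        raise ValueError("evidence has zero probability under the model")
    return [w / total for w in weights]


def belief_update(belief, action, obs, T, O):
    """Standard update: b'(s2) is proportional to
    O[action][s2][obs] * sum_s T[action][s][s2] * belief[s]."""
    n = len(belief)
    unnormalized = [
        O[action][s2][obs]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    return normalize(unnormalized)


def belief_update_with_reward(belief, action, obs, reward, T, O, R):
    """Reward-aware variant (assumed form, not the paper's exact rule):
    prior states whose reward R[action][s] differs from the received
    reward get zero weight before the usual transition/observation step."""
    n = len(belief)
    consistent = [
        belief[s] if R[action][s] == reward else 0.0 for s in range(n)
    ]
    unnormalized = [
        O[action][s2][obs]
        * sum(T[action][s][s2] * consistent[s] for s in range(n))
        for s2 in range(n)
    ]
    return normalize(unnormalized)


# Toy model (hypothetical): two states, one action that leaves the state
# unchanged, and an observation that says nothing about the state.
T = [[[1.0, 0.0], [0.0, 1.0]]]   # T[a][s][s2]
O = [[[0.5, 0.5], [0.5, 0.5]]]   # O[a][s2][o]
R = [[0.0, 1.0]]                 # R[a][s]

uniform = [0.5, 0.5]
plain = belief_update(uniform, 0, 0, T, O)
informed = belief_update_with_reward(uniform, 0, 0, 1.0, T, O, R)
print(plain, informed)
```

In this toy model the observation is uninformative, so the standard update leaves the belief uniform, while the reward-aware update concentrates all probability on the only state that could have produced the received reward. A stochastic reward model would replace the equality check with a likelihood term.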


Cite (APA)

Izadi, M. T., & Precup, D. (2005). Using rewards for belief state updates in partially observable Markov decision processes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3720 LNAI, pp. 593–600). https://doi.org/10.1007/11564096_58
