Abstract
A non-stationary Bayesian dynamic decision model with general state, action and parameter spaces is considered. It is shown that this model can be reduced to a non-Markovian decision model with completely known transition probabilities. Under rather weak convergence assumptions on the expected total rewards, some general results are presented concerning the restriction to deterministic generalized Markov policies, the criteria of optimality and the existence of Bayes policies.
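The reduction described in the abstract — augmenting the state with the posterior over the unknown parameter so that transition probabilities become completely known (posterior predictive) — can be illustrated with a minimal sketch. The example below is not from the paper; it is an illustrative finite-horizon two-armed Bernoulli bandit with Beta priors, where backward induction over the augmented (posterior-count) state yields a Bayes policy. All names and the specific model are assumptions for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(t, a1, b1, a2, b2):
    """Bayes-optimal expected total reward with t pulls remaining,
    given independent Beta(a_i, b_i) posteriors over each arm's
    success probability.  The posterior counts (a1, b1, a2, b2) are
    the augmented state: given them, the transition probabilities
    (posterior predictive success probabilities) are fully known."""
    if t == 0:
        return 0.0
    # Arm 1: predictive success probability a1 / (a1 + b1)
    p1 = a1 / (a1 + b1)
    q1 = (p1 * (1.0 + value(t - 1, a1 + 1, b1, a2, b2))
          + (1.0 - p1) * value(t - 1, a1, b1 + 1, a2, b2))
    # Arm 2: predictive success probability a2 / (a2 + b2)
    p2 = a2 / (a2 + b2)
    q2 = (p2 * (1.0 + value(t - 1, a1, b1, a2 + 1, b2))
          + (1.0 - p2) * value(t - 1, a1, b1, a2, b2 + 1))
    return max(q1, q2)
```

With uniform Beta(1, 1) priors and two pulls, the Bayes value is 13/12, strictly greater than the myopic value 1.0 — the gap is the value of the information carried by the posterior update, which is exactly what the augmented-state formulation makes explicit.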
Rieder, U. (1975). Bayesian dynamic programming. Advances in Applied Probability, 7(2), 330–348. https://doi.org/10.1017/S0001867800046012