The explore-exploit dilemma in nonstationary decision making under uncertainty


Abstract

It is often assumed that autonomous systems operate in environments that can be described by a stationary (time-invariant) model. However, real-world environments are often nonstationary (time-varying): the underlying phenomena change over time, so stationary approximations of the environment may quickly lose relevance. Here, two approaches are presented and applied in the context of reinforcement learning in nonstationary environments. In Sect. 2.2, the first approach leverages reinforcement learning in the presence of a changing reward model. In particular, a functional termed the Fog-of-War is used to drive exploration, which results in the timely discovery of new models in nonstationary environments. In Sect. 2.3, the Fog-of-War functional is adapted in real time to reflect the heterogeneous information content of a real-world environment; this is critically important for applying the approach of Sect. 2.2 in real-world environments.
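The abstract describes exploration driven by an uncertainty functional that accumulates over unvisited parts of the environment. As a rough illustration of that idea (not the authors' actual formulation, which is given in the chapter itself), the sketch below shows a nonstationary multi-armed bandit agent whose per-arm "fog" bonus grows on arms it has not recently sampled, prompting periodic re-exploration so drifting reward models are rediscovered. All class and parameter names here (`FogOfWarBandit`, `fog_rate`, `alpha`) are hypothetical choices for this sketch.

```python
class FogOfWarBandit:
    """Illustrative agent: an exploration bonus ("fog") accumulates on
    arms not recently sampled, so changes in their reward models are
    eventually rediscovered. A simplified stand-in for the chapter's
    Fog-of-War functional, not the authors' formulation."""

    def __init__(self, n_arms, fog_rate=0.1):
        self.n_arms = n_arms
        self.fog_rate = fog_rate          # how fast uncertainty accumulates
        self.estimates = [0.0] * n_arms   # running reward estimates
        self.fog = [1.0] * n_arms         # per-arm uncertainty ("fog") level

    def select_arm(self):
        # Act greedily on value estimate plus the fog-driven bonus.
        scores = [q + f for q, f in zip(self.estimates, self.fog)]
        return max(range(self.n_arms), key=lambda a: scores[a])

    def update(self, arm, reward, alpha=0.2):
        # A constant step size (rather than 1/n averaging) lets the
        # estimate track a nonstationary reward.
        self.estimates[arm] += alpha * (reward - self.estimates[arm])
        # Sampling an arm clears its fog; fog accumulates on the rest.
        for a in range(self.n_arms):
            self.fog[a] = 0.0 if a == arm else self.fog[a] + self.fog_rate
```

Because the fog term grows without bound on neglected arms, every arm is eventually revisited, which is the property the abstract attributes to Fog-of-War-driven exploration in changing environments.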

Citation (APA)

Axelrod, A., & Chowdhary, G. (2015). The explore-exploit dilemma in nonstationary decision making under uncertainty. In Studies in Systems, Decision and Control (Vol. 42, pp. 29–52). Springer International Publishing. https://doi.org/10.1007/978-3-319-26327-4_2
