Policy Gradient Planning for Environmental Decision Making with Existing Simulators

  • Mark Crowley
  • David Poole

In environmental and natural resource planning domains, actions are taken at a large number of locations over multiple time periods. These problems have enormous state and action spaces, spatial correlation between actions, uncertainty and complex utility models. We present an approach for modeling these planning problems as factored Markov decision processes. The reward model can contain local and global components as well as spatial constraints between locations. The transition dynamics can be provided by existing simulators developed by domain experts. We propose a landscape policy defined as the equilibrium distribution of a Markov chain built from many locally-parameterized policies. This policy is optimized using a policy gradient algorithm. Experiments using a forestry simulator demonstrate the algorithm's ability to devise policies for sustainable harvest planning of a forest.
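The abstract's landscape policy is optimized with a policy gradient method over a huge factored action space. As a minimal illustration of the underlying policy-gradient idea only (not the authors' landscape-policy algorithm, and with a hypothetical two-action toy problem in place of a forestry simulator), here is a REINFORCE-style sketch with a running-mean baseline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: two actions whose true expected rewards differ.
true_rewards = np.array([0.2, 0.8])

theta = np.zeros(2)  # policy parameters (softmax logits)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

alpha = 0.1      # step size
baseline = 0.0   # running-mean reward baseline to reduce gradient variance
for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                 # sample action from the policy
    r = true_rewards[a] + rng.normal(0, 0.1)   # noisy reward

    # REINFORCE: grad of log pi(a) for a softmax policy is one_hot(a) - probs
    grad_log = -probs
    grad_log[a] += 1.0

    advantage = r - baseline
    baseline += 0.05 * (r - baseline)
    theta += alpha * advantage * grad_log      # stochastic gradient ascent
```

After training, the policy concentrates probability on the better action; the paper's contribution is making this kind of gradient update tractable when actions are taken at many spatially correlated locations simultaneously.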

