Hindsight optimization for hybrid state and action MDPs


Abstract

Hybrid (mixed discrete and continuous) state and action Markov Decision Processes (HSA-MDPs) provide an expressive formalism for modeling stochastic and concurrent sequential decision-making problems. Existing solvers for HSA-MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection. Our main contribution is a linear-time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective when the dynamics are specified as location-scale probability distributions parametrized by piecewise linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower bound generated by straight-line plans. Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines capable of scaling to such large hybrid MDPs.
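
For orientation, the HOP upper bound referenced in the abstract comes from swapping the order of expectation and maximization over sampled futures. A minimal sketch in illustrative notation (the symbols s, a, H, r, and the future variable f are our labels, not the paper's):

\[
Q_{\mathrm{HOP}}(s, a) \;=\; \mathbb{E}_{f \sim F}\!\left[\, \max_{a_{2:H}} \sum_{t=1}^{H} r(s_t, a_t; f) \,\right] \;\ge\; Q^{*}_{H}(s, a),
\]

where the first action is fixed to a_1 = a, f is a sampled determinization (future) of the exogenous randomness F, and Q*_H is the optimal H-horizon action value. The inequality holds because maximizing separately inside each sampled future can only increase the value relative to any single closed-loop policy, so the expectation of the per-future maxima upper-bounds the optimal value. In practice the outer expectation is approximated by sampling futures, and each inner maximization is a deterministic planning problem; the paper's contribution is that, under PWL location-scale dynamics, these per-future problems can be encoded as a MILP. Conversely, constraining all sampled futures to share one action sequence (a straight-line plan) yields a value no greater than the optimal closed-loop value, which is the lower bound the abstract mentions.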

Cite

APA

Raghavan, A., Sanner, S., Khardon, R., Tadepalli, P., & Fern, A. (2017). Hindsight optimization for hybrid state and action MDPs. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017 (pp. 3790–3796). AAAI Press. https://doi.org/10.1609/aaai.v31i1.11056
