Hindsight optimization for hybrid state and action MDPs


Abstract

Hybrid (mixed discrete and continuous) state and action Markov Decision Processes (HSA-MDPs) provide an expressive formalism for modeling stochastic and concurrent sequential decision-making problems. Existing solvers for HSA-MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection. Our main contribution is a linear-time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective when the dynamics are specified as location-scale probability distributions parametrized by piecewise linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower bound generated by straight-line plans. Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines capable of scaling to such large hybrid MDPs.
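
For orientation, the HOP upper bound referenced in the abstract comes from swapping the order of expectation and maximization over sampled futures. A minimal sketch in illustrative notation (the symbols s, a, H, r, and the future variable f are our labels, not the paper's):

\[
Q_{\mathrm{HOP}}(s, a) \;=\; \mathbb{E}_{f \sim F}\!\left[\, \max_{a_{2:H}} \sum_{t=1}^{H} r(s_t, a_t; f) \,\right] \;\ge\; Q^{*}_{H}(s, a),
\]

where the first action is fixed to a_1 = a, f is a sampled determinization (future) of the exogenous randomness F, and Q*_H is the optimal H-horizon action value. The inequality holds because maximizing separately inside each sampled future can only increase the value relative to any single closed-loop policy, so the expectation of the per-future maxima upper-bounds the optimal value. In practice the outer expectation is approximated by sampling futures, and each inner maximization is a deterministic planning problem; the paper's contribution is that, under PWL location-scale dynamics, these per-future problems can be encoded as a MILP. Conversely, constraining all sampled futures to share one action sequence (a straight-line plan) yields a value no greater than the optimal closed-loop value, which is the lower bound the abstract mentions.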

Cite

APA

Raghavan, A., Sanner, S., Khardon, R., Tadepalli, P., & Fern, A. (2017). Hindsight optimization for hybrid state and action MDPs. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017 (pp. 3790–3796). AAAI Press. https://doi.org/10.1609/aaai.v31i1.11056
