In the Markov decision process (MDP) model, policies are usually evaluated by their expected cumulative reward. Because this decision criterion is not always suitable, we propose an algorithm for computing a policy that is optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally, we experimentally evaluate our approach on random MDPs and on a data center control problem.
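To illustrate why the two criteria can disagree, here is a minimal sketch that compares the expected value and a quantile of the return distributions of two policies. It is not the algorithm proposed in the paper. The names `expected_value`, `lower_quantile`, `policy_a`, and `policy_b` are hypothetical, the return distributions are made-up toy numbers, and the lower-quantile definition used (the smallest outcome x with P(X <= x) >= tau) is an assumption that may differ from the paper's definition.

```python
def expected_value(dist):
    """Expected return of a discrete distribution given as (outcome, probability) pairs."""
    return sum(p * x for x, p in dist)


def lower_quantile(dist, tau):
    """Smallest outcome x such that P(X <= x) >= tau (assumed definition)."""
    cumulative = 0.0
    for x, p in sorted(dist):
        cumulative += p
        # Small tolerance guards against floating-point rounding in the running sum.
        if cumulative >= tau - 1e-12:
            return x
    return max(x for x, _ in dist)


# Toy return distributions (illustrative numbers, not taken from the paper).
policy_a = [(10.0, 1.0)]              # always returns 10
policy_b = [(0.0, 0.6), (30.0, 0.4)]  # returns 0 with prob. 0.6, 30 with prob. 0.4

for name, dist in [("A", policy_a), ("B", policy_b)]:
    print(name, expected_value(dist), lower_quantile(dist, 0.5))
```

Under these assumptions, policy B has the higher expected return, while policy A has the higher median (tau = 0.5), so the two criteria rank the policies in opposite order.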
Gilbert, H., Weng, P., & Xu, Y. (2017). Optimizing quantiles in preference-based Markov decision processes. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017 (pp. 3569–3575). AAAI Press. https://doi.org/10.1609/aaai.v31i1.11026