Using Bisimulation for Policy Transfer in MDPs

Pablo Samuel Castro; Doina Precup

Conference Proceedings (Open Access)

Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI 2010 (2010) 1065-1070

DOI: 10.1609/aaai.v24i1.7751

Abstract

Knowledge transfer has been suggested as a useful approach for solving large Markov Decision Processes (MDPs). The main idea is to compute a decision-making policy in one environment and use it in a different environment, provided the two are “close enough”. In this paper, we use bisimulation-style metrics (Ferns et al., 2004) to guide knowledge transfer. We propose algorithms that decide which actions to transfer from the policy computed on a small MDP task to a large task, given the bisimulation distance between states in the two tasks. We demonstrate the inherent “pessimism” of bisimulation metrics and present variants of this metric designed to overcome this pessimism, leading to improved action transfer. We also show that using this approach to transfer temporally extended actions (Sutton et al., 1999) is more successful than using it exclusively with primitive actions. We present theoretical guarantees on the quality of the transferred policy, as well as promising empirical results.
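
The idea of distance-guided action transfer can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the distances between every large-task state and every small-task state have already been computed (the abstract does not say how), and the names `transfer_policy`, `dist`, and `source_policy` are hypothetical. Each large-task state simply copies the action of its nearest small-task state and records that distance, which could then be used to judge how far the transfer should be trusted.

```python
import numpy as np

def transfer_policy(dist, source_policy):
    """Copy actions from a small (source) task to a large (target) task.

    dist          -- assumed precomputed array of shape (n_target, n_source);
                     dist[t, s] is the distance between target state t and
                     source state s (hypothetical input, not computed here).
    source_policy -- array of length n_source; source_policy[s] is the action
                     the source policy takes in state s.
    Returns the transferred action and the matching distance per target state.
    """
    dist = np.asarray(dist, dtype=float)
    source_policy = np.asarray(source_policy)
    # For each target state, find the closest source state.
    nearest = dist.argmin(axis=1)
    # Distance to that closest source state (smaller means a closer match).
    gaps = dist[np.arange(dist.shape[0]), nearest]
    return source_policy[nearest], gaps

# Toy example: 3 target states, 2 source states, made-up distances.
dist = np.array([[0.1, 0.9],
                 [0.7, 0.2],
                 [0.4, 0.5]])
source_policy = np.array([1, 0])
target_policy, gaps = transfer_policy(dist, source_policy)
```

In this toy example the third target state is not especially close to either source state, so its copied action is the least reliable of the three.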

Cite

Castro, P. S., & Precup, D. (2010). Using Bisimulation for Policy Transfer in MDPs. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI 2010 (pp. 1065–1070). AAAI Press. https://doi.org/10.1609/aaai.v24i1.7751