PG3: Policy-Guided Planning for Generalized Policy Generation


Abstract

A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study generalized policy search-based methods with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions, policy evaluation and plan comparison, and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines.
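To make the scoring idea concrete, below is a minimal Python sketch (not from the paper) of policy-guided evaluation: a candidate policy is rolled out on each training problem, standing in for the policy-guided planner, and the candidate is scored by the cost of the resulting plans. The Task class, the rollout fallback, and all names here are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    # Toy planning task; all fields are illustrative, not from the paper.
    initial_state: tuple
    goal: tuple
    actions: List[str]
    successor: Callable[[tuple, str], tuple]

    def goal_holds(self, state: tuple) -> bool:
        return state == self.goal


def policy_guided_score(policy, tasks, max_steps=100):
    # Core PG3 idea: evaluate a candidate policy by the plans it helps find,
    # not only by executing it directly. This sketch uses the simplest
    # guidance scheme: roll the policy out, and when it returns no action,
    # fall back to a default choice (a real implementation would invoke a
    # planner at this point).
    total_cost = 0.0
    for task in tasks:
        state, steps = task.initial_state, 0
        while not task.goal_holds(state) and steps < max_steps:
            action = policy(state) or task.actions[0]  # fallback stands in for search
            state = task.successor(state, action)
            steps += 1
        total_cost += steps if task.goal_holds(state) else max_steps  # penalize failure
    return total_cost  # lower is better; this score guides the search over policies


# Toy 1-D corridor: the goal is to reach position 3 by moving right.
corridor = Task(initial_state=(0,), goal=(3,), actions=["right"],
                successor=lambda s, a: (s[0] + 1,))
print(policy_guided_score(lambda s: "right", [corridor]))  # -> 3.0

In this toy usage, a policy that always moves right solves the corridor in three steps, so it receives score 3.0; a policy that never suggests an applicable action would be penalized with the maximum cost, steering the policy search elsewhere.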

Citation (APA)

Yang, R., Silver, T., Curtis, A., Lozano-Pérez, T., & Kaelbling, L. (2022). PG3: Policy-Guided Planning for Generalized Policy Generation. In IJCAI International Joint Conference on Artificial Intelligence (pp. 4686–4692). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2022/650
