Importance Prioritized Policy Distillation

Abstract

Policy distillation (PD) has been widely studied in deep reinforcement learning (RL), yet existing PD approaches assume that the demonstration data (i.e., state-action pairs in frames) in a decision-making sequence is uniformly distributed. This may introduce unwanted bias, since RL is a reward-maximizing process rather than simple label matching. Given this issue, we define frame importance as a frame's contribution to the expected reward, and hypothesize that accounting for frame importance could improve the performance of the distilled student policy. To verify our hypothesis, we analyze why and how frame importance matters in RL settings. Based on this analysis, we propose an importance prioritized PD framework that emphasizes training on important frames so as to learn efficiently. In particular, frame importance is measured by the reciprocal of the weighted Shannon entropy of a teacher policy's action prescriptions. Experiments on Atari games and policy compression tasks show that capturing frame importance significantly boosts the performance of the distilled policies.
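The importance measure described above can be sketched in a few lines of code. This is a minimal illustration, assuming a discrete action space; the uniform default weights and the epsilon used for numerical stability are assumptions for this sketch, not the authors' exact formulation.

```python
import numpy as np

def frame_importance(action_probs, weights=None, eps=1e-8):
    """Frame importance as the reciprocal of the weighted Shannon
    entropy of the teacher policy's action distribution.

    A near-deterministic teacher distribution (low entropy) yields a
    high importance score; a near-uniform one (high entropy) yields a
    low score.
    """
    p = np.asarray(action_probs, dtype=float)
    if weights is None:
        # Uniform weights reduce this to ordinary Shannon entropy
        # (an assumption for illustration).
        weights = np.ones_like(p)
    # eps guards against log(0) and division by zero.
    entropy = -np.sum(weights * p * np.log(p + eps))
    return 1.0 / (entropy + eps)
```

For example, a frame where the teacher is confident (e.g., probabilities `[0.97, 0.01, 0.01, 0.01]`) receives a higher importance score than one where the teacher is indifferent (`[0.25, 0.25, 0.25, 0.25]`), so the distillation loss on the former would be prioritized.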

Citation (APA)

Qu, X., Ong, Y. S., Gupta, A., Wei, P., Sun, Z., & Ma, Z. (2022). Importance Prioritized Policy Distillation. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1420–1429). Association for Computing Machinery. https://doi.org/10.1145/3534678.3539266
