In this work, we study the credit assignment problem in reward-augmented maximum likelihood (RAML) learning and establish a theoretical equivalence between the token-level counterpart of RAML and entropy-regularized reinforcement learning. Inspired by this connection, we propose two sequence prediction algorithms, one extending RAML with fine-grained credit assignment and the other improving Actor-Critic with systematic entropy regularization. On two benchmark datasets, we show that the proposed algorithms outperform RAML and Actor-Critic, respectively, providing new alternatives for sequence prediction.
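For context, a brief sketch of the two sequence-level objectives the abstract alludes to may help. These are the standard formulations (RAML as introduced by Norouzi et al., 2016, and the usual maximum-entropy RL objective), not the paper's token-level derivation; the symbols $p_{\theta}$, $q$, $r$, $\tau$, and $y^{*}$ are not defined in this excerpt and are used here only for illustration. RAML maximizes expected log-likelihood under an exponentiated-payoff distribution centered at the reference $y^{*}$:

\[
\mathcal{L}_{\mathrm{RAML}}(\theta) \;=\; \sum_{y \in \mathcal{Y}} q(y \mid y^{*}; \tau)\, \log p_{\theta}(y \mid x),
\qquad
q(y \mid y^{*}; \tau) \;\propto\; \exp\!\big( r(y, y^{*}) / \tau \big),
\]

while entropy-regularized RL maximizes expected reward plus a scaled entropy bonus:

\[
\mathcal{J}_{\mathrm{ERL}}(\theta) \;=\; \mathbb{E}_{y \sim p_{\theta}(\cdot \mid x)}\!\big[ r(y, y^{*}) \big] \;+\; \tau\, \mathcal{H}\!\big( p_{\theta}(\cdot \mid x) \big).
\]

Up to additive constants, the former minimizes $\mathrm{KL}\big(q \,\|\, p_{\theta}\big)$ while the latter minimizes $\mathrm{KL}\big(p_{\theta} \,\|\, q\big)$; the paper's token-level analysis refines this sequence-level picture.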
CITATION
Dai, Z., Xie, Q., & Hovy, E. (2018). From credit assignment to entropy regularization: Two new algorithms for neural sequence prediction. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (Vol. 1, pp. 1672–1682). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p18-1155