What does a policy network learn after mastering a Pong game?

Abstract

Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular the following quantities: state values V, state-action values Q, and the policy π. Due to high computational cost, the reinforcement learning problem is commonly formulated for learning task-specific representations from hand-crafted input features. In this report, we discuss an alternative end-to-end approach in which the RL agent attempts to learn general task representations, here by learning how to play the Pong game from a sequence of screen snapshots. We apply artificial neural networks to approximate the policy of a reinforcement learning model. The policy network learns to play the game from a sequence of frames without any extra semantics beyond the pixel information and the score. Many games are simulated using different network architectures and different parameter settings. We examine the activations of hidden nodes and the weights between the input and hidden layers, before and after the agent has successfully learned to play the game. Insights into the internal learning mechanisms and future research directions are discussed.
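
The abstract does not specify the network or training details; the following is a minimal Python (NumPy) sketch of the kind of setup it describes: a single-hidden-layer policy network mapping preprocessed Pong frames to an action probability, trained with a REINFORCE-style policy gradient. The frame size, hidden-layer width, preprocessing, and discounting scheme are illustrative assumptions, not values taken from the paper.

import numpy as np

D = 80 * 80   # assumed size of a flattened, preprocessed (cropped, downsampled) frame
H = 200       # assumed number of hidden units
rng = np.random.default_rng(0)

# Input-to-hidden and hidden-to-output weights; the input-to-hidden weights W1
# correspond to the quantities the paper inspects before and after learning.
W1 = rng.standard_normal((H, D)) / np.sqrt(D)
W2 = rng.standard_normal(H) / np.sqrt(H)

def policy_forward(x):
    """Map a preprocessed frame (or frame difference) to P(move paddle up)."""
    h = np.maximum(0.0, W1 @ x)            # hidden activations (ReLU)
    p = 1.0 / (1.0 + np.exp(-(W2 @ h)))    # sigmoid output: probability of "up"
    return p, h

def discounted_returns(rewards, gamma=0.99):
    """Discounted returns used to weight the policy-gradient update.
    Pong yields a nonzero reward only when a point ends, so the running sum
    is reset there (a common convention, assumed rather than taken from the paper)."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0
        running = gamma * running + rewards[t]
        out[t] = running
    return out

Inspecting the hidden activations h and the rows of W1 of a trained network, as the paper does, shows which input pixels each hidden unit has learned to respond to.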

Citation (APA)

Phon-Amnuaisuk, S. (2017). What does a policy network learn after mastering a pong game? In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10607 LNAI, pp. 213–222). Springer Verlag. https://doi.org/10.1007/978-3-319-69456-6_18
