What does a policy network learn after mastering a Pong game?

Abstract

Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular the following quantities: state values V, state-action values Q, and the policy π. Due to high computational cost, the reinforcement learning problem is commonly formulated for learning task-specific representations from hand-crafted input features. In this report, we discuss an alternative end-to-end approach in which the RL agent attempts to learn general task representations, here by learning how to play the Pong game from a sequence of screen snapshots. We apply artificial neural networks to approximate the policy of a reinforcement learning model. The policy network learns to play the game from a sequence of frames without any extra semantics beyond the pixel information and the score. Many games are simulated using different network architectures and different parameter settings. We examine the activations of hidden nodes and the weights between the input and hidden layers, before and after the agent has successfully learned to play the game. Insights into the internal learning mechanisms and future research directions are discussed.
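
The abstract does not specify the network or training details; the following is a minimal Python (NumPy) sketch of the kind of setup it describes: a single-hidden-layer policy network mapping preprocessed Pong frames to an action probability, trained with a REINFORCE-style policy gradient. The frame size, hidden-layer width, preprocessing, and discounting scheme are illustrative assumptions, not values taken from the paper.

import numpy as np

D = 80 * 80   # assumed size of a flattened, preprocessed (cropped, downsampled) frame
H = 200       # assumed number of hidden units
rng = np.random.default_rng(0)

# Input-to-hidden and hidden-to-output weights; the input-to-hidden weights W1
# correspond to the quantities the paper inspects before and after learning.
W1 = rng.standard_normal((H, D)) / np.sqrt(D)
W2 = rng.standard_normal(H) / np.sqrt(H)

def policy_forward(x):
    """Map a preprocessed frame (or frame difference) to P(move paddle up)."""
    h = np.maximum(0.0, W1 @ x)            # hidden activations (ReLU)
    p = 1.0 / (1.0 + np.exp(-(W2 @ h)))    # sigmoid output: probability of "up"
    return p, h

def discounted_returns(rewards, gamma=0.99):
    """Discounted returns used to weight the policy-gradient update.
    Pong yields a nonzero reward only when a point ends, so the running sum
    is reset there (a common convention, assumed rather than taken from the paper)."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0
        running = gamma * running + rewards[t]
        out[t] = running
    return out

Inspecting the hidden activations h and the rows of W1 of a trained network, as the paper does, shows which input pixels each hidden unit has learned to respond to.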

Citation (APA)

Phon-Amnuaisuk, S. (2017). What does a policy network learn after mastering a pong game? In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10607 LNAI, pp. 213–222). Springer Verlag. https://doi.org/10.1007/978-3-319-69456-6_18
