A brief survey of parametric value function approximation

Abstract

Reinforcement learning is a machine learning answer to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. An important subtopic of reinforcement learning is the computation of an approximation of this value function when the system is too large for an exact representation. This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches. Related algorithms are derived by considering one of the associated cost functions and a specific way to minimize it, almost always a stochastic gradient descent or a recursive least-squares approach.
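
To make the bootstrapping category concrete, here is a minimal sketch of semi-gradient TD(0) with a linear value function, i.e., a bootstrapped cost minimized by stochastic gradient descent, as described in the abstract. The toy random-walk environment, one-hot features, and step size are illustrative assumptions, not details taken from the survey.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 5-state random walk: reward +1 on exiting to the right, 0 otherwise.
    # Environment, discount, and step size are assumptions for illustration.
    N_STATES, GAMMA, ALPHA = 5, 1.0, 0.05

    def phi(s):
        """One-hot features; linear value function V(s) = theta @ phi(s)."""
        x = np.zeros(N_STATES)
        x[s] = 1.0
        return x

    theta = np.zeros(N_STATES)  # parameters of the approximate value function

    for episode in range(3000):
        s = N_STATES // 2  # start in the middle state
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            done = s2 < 0 or s2 >= N_STATES
            r = 1.0 if s2 >= N_STATES else 0.0
            v_next = 0.0 if done else theta @ phi(s2)
            # Bootstrapping: the current estimate of V(s') stands in for the return.
            delta = r + GAMMA * v_next - theta @ phi(s)
            theta += ALPHA * delta * phi(s)  # semi-gradient stochastic update
            if done:
                break
            s = s2

    print(np.round(theta, 2))  # approaches the true values [1/6, 2/6, ..., 5/6]

Projected fixed-point methods such as LSTD target the same fixed-point condition but solve it with a (recursive) least-squares computation rather than a stochastic gradient step.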

Author-supplied keywords

  • reinforcement learning
  • survey
  • value function approximation

Authors

  • Matthieu Geist

  • Olivier Pietquin
