Many reinforcement learning methods are based on a function Q(s,a) whose value is the discounted total reward expected after performing the action a in the state s. This paper explores the implications of representing the Q function as $Q(s,a) = s^{\top} W a$, where W is a matrix that is learned. In this representation, both s and a are real-valued vectors that may have high dimension. We show that action selection can be done using standard linear programming, and that W can be learned using standard linear regression in the algorithm known as fitted Q iteration. Experimentally, the resulting method learns to solve the mountain car task in a sample-efficient way. The same method is also applicable to an inventory management task where the state space and the action space are continuous and high-dimensional. © 2012 Springer-Verlag.
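The two computational claims in the abstract can be illustrated with a minimal sketch; it is not the paper's implementation. It rests on several assumptions that the abstract does not state: actions are constrained to a simple box `lower <= a <= upper` (so the linear program has a closed-form solution and no LP solver is needed), training data arrive as arrays of transitions `(S, A, R, S_next)`, and plain least squares is used without regularization. All function names are hypothetical. The key facts used are that, for a fixed state s, Q is linear in a with coefficient vector W^T s, and that s^T W a is linear in the entries of W, with the flattened outer product of s and a as the feature vector.

```python
import numpy as np

def q_value(s, W, a):
    """Bilinear Q function: Q(s, a) = s^T W a."""
    return float(s @ W @ a)

def greedy_action(s, W, lower, upper):
    """Maximize s^T W a over the box lower <= a <= upper (an assumed constraint set).

    For fixed s the objective is linear in a with coefficients c = W^T s,
    so the optimum sets each component to its upper or lower bound.
    """
    c = W.T @ s
    return np.where(c > 0, upper, lower)

def fit_W(S, A, y):
    """Least-squares fit of W so that s_n^T W a_n approximates y_n.

    Uses s^T W a = <vec(W), vec(s a^T)>: each row of X is a flattened outer product.
    """
    X = np.einsum("ni,nj->nij", S, A).reshape(len(S), -1)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w.reshape(S.shape[1], A.shape[1])

def fitted_q_iteration(S, A, R, S_next, lower, upper, gamma=0.9, n_iters=20):
    """Sketch of fitted Q iteration with a bilinear Q function.

    Each pass builds targets r + gamma * max_a Q(s', a) and refits W by regression.
    """
    W = np.zeros((S.shape[1], A.shape[1]))
    for _ in range(n_iters):
        C = S_next @ W                       # row n holds W^T s'_n
        best = np.where(C > 0, upper, lower) # greedy action for each next state
        max_q = np.sum(C * best, axis=1)     # max_a Q(s'_n, a) over the box
        y = R + gamma * max_q
        W = fit_W(S, A, y)
    return W
```

With general linear constraints on the action rather than a box, the call to `np.where` would be replaced by a call to a linear programming solver, which is the setting the abstract describes.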
CITATION STYLE
Elkan, C. (2012). Reinforcement learning with a bilinear Q function. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7188 LNAI, pp. 78–88). https://doi.org/10.1007/978-3-642-29946-9_11