Bias-Optimal Incremental Learning of Control Sequences for Virtual Robots
Abstract
Learning and planning control is hard. The search space of traditional planners consists of sequences of primitive actions. To exploit reusable subsequences and other algorithmic regularities, however, we should instead search the general space of programs that compute action sequences. Such programs may invoke very fast thinking actions consuming only nanoseconds (such as conditional jumps to certain code addresses) as well as very slow control actions consuming seconds in the real world (such as stretch-arm-until-obstacle-sensation). What is an optimal way of allocating time to tests of such non-homogeneous programs? What is an optimal way of reusing experience with previous tasks to learn solutions to new tasks? One answer is given by the recent Optimal Ordered Problem Solver (OOPS), a near-bias-optimal incremental extension of Levin's nonincremental universal search, which we apply to virtual robotics for the first time: our snake robot uses OOPS to learn to walk and jump in a partially observable environment (POMDP) with a huge state/action space.
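To make the time-allocation question concrete, here is a minimal sketch of a Levin-style search over programs whose instructions have very different costs. It is an illustration only, not the paper's OOPS implementation: the instruction set (`inc`, `dbl`, `go`), the costs, the uniform prior over tokens, and the one-dimensional "reach a target position" task are all assumptions invented for this example, and the incremental reuse of earlier solutions that distinguishes OOPS is not modeled. In each phase the total budget doubles, and every candidate program receives a share of it proportional to its prior probability.

```python
from itertools import product

# Hypothetical toy instruction set (not from the paper): name -> time cost.
# "inc" and "dbl" are cheap "thinking" actions on an internal register;
# "go" is a slow "control" action that moves the robot by the register value.
COSTS = {"inc": 0.01, "dbl": 0.01, "go": 1.0}


def run(program, budget):
    """Run a program from a fresh state.

    Returns the final position, or None if the time budget is exceeded.
    """
    reg, pos, used = 1, 0, 0.0
    for tok in program:
        used += COSTS[tok]
        if used > budget:
            return None
        if tok == "inc":
            reg += 1
        elif tok == "dbl":
            reg *= 2
        else:  # "go"
            pos += reg
    return pos


def levin_style_search(target, max_phase=12):
    """Search for a program that reaches `target`.

    Phase i has total budget 2**i. A program of length L has prior
    probability k**-L (uniform over k tokens) and may run for at most
    prior * total time units. Returns (program, phase) or None.
    """
    tokens = list(COSTS)
    cheapest = min(COSTS.values())
    for phase in range(1, max_phase + 1):
        total = 2.0 ** phase
        length = 1
        while True:
            budget = total * len(tokens) ** -length
            if budget < cheapest:
                break  # longer programs cannot execute even one instruction
            for program in product(tokens, repeat=length):
                if run(program, budget) == target:
                    return program, phase
            length += 1
    return None


if __name__ == "__main__":
    print(levin_style_search(6))
```

Because budgets are measured in time rather than instruction counts, a program that does most of its work with cheap thinking actions fits into a given share sooner than an equally long program that repeats the slow control action.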

