A Sample-Efficient Black-Box Optimizer to Train Policies for Human-in-the-Loop Systems with User Preferences

Nitish Thatte; Helei Duan; Hartmut Geyer

Journal Article, Open Access

IEEE Robotics and Automation Letters (2017) 2(2) 993-1000

DOI: 10.1109/LRA.2017.2656948

Abstract

We present a new algorithm for optimizing control policies for human-in-the-loop systems based on qualitative preference feedback. This method is especially applicable to systems such as lower limb prostheses and exoskeletons, for which it is difficult to define an objective function, hard to identify a model, and costly to repeat hardware experiments. To solve these problems, we combine and extend an algorithm for learning from preferences and the Predictive Entropy Search Bayesian optimization method. The resulting algorithm, Predictive Entropy Search with Preferences (PES-P), solicits preferences between pairs of control parameter sets, choosing the pairs that optimally reduce the uncertainty in the distribution of objective function optima so that as few experiments as possible are needed. We find that this algorithm outperforms the expected improvement method (EI) and random comparisons via Latin hypercubes (LH) in three simulation tests that range from optimizing randomly generated functions to tuning control parameters of linear systems and of a walking model. Furthermore, we find in a pilot study on the control of a robotic transfemoral prosthesis that PES-P finds good control parameters quickly and more consistently than EI or LH given real user preferences. The results suggest the proposed algorithm can help engineers optimize certain robotic systems more accurately, efficiently, and consistently.
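To make the preference-query setting concrete, the following is a minimal sketch of the random-comparison baseline that the abstract calls LH: draw parameter sets from a Latin hypercube, pair them, and ask for a preference on each pair. It is not PES-P and not the authors' code. The function names, the quadratic `hidden_objective`, and the Gaussian noise on the simulated user's judgment are all hypothetical choices made for illustration.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per bin on every axis."""
    columns = []
    for _ in range(n_dims):
        # One draw inside each of n_samples equal-width bins, then shuffle bin order.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def hidden_objective(x):
    """Hypothetical stand-in for the user's unobservable utility (higher is better)."""
    return -sum((xi - 0.3) ** 2 for xi in x)


def simulated_preference(a, b, rng, noise=0.05):
    """Return whichever of a or b the simulated user prefers, judged on noisy utilities."""
    ua = hidden_objective(a) + rng.gauss(0.0, noise)
    ub = hidden_objective(b) + rng.gauss(0.0, noise)
    return a if ua >= ub else b


def random_comparison_baseline(n_pairs, n_dims, seed=0):
    """Query preferences on Latin hypercube pairs; return a list of (winner, loser)."""
    rng = random.Random(seed)
    points = latin_hypercube(2 * n_pairs, n_dims, rng)
    results = []
    # Consecutive points form one pair, so each point is compared exactly once.
    for i in range(0, len(points), 2):
        a, b = points[i], points[i + 1]
        winner = simulated_preference(a, b, rng)
        loser = b if winner is a else a
        results.append((winner, loser))
    return results


comparisons = random_comparison_baseline(n_pairs=5, n_dims=2)
```

Each entry of `comparisons` is the only kind of data such an optimizer receives: which of two parameter sets was preferred, never a numeric score. A method like PES-P differs from this baseline in how it chooses the next pair, not in the form of the feedback.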

Cite

Thatte, N., Duan, H., & Geyer, H. (2017). A Sample-Efficient Black-Box Optimizer to Train Policies for Human-in-the-Loop Systems with User Preferences. IEEE Robotics and Automation Letters, 2(2), 993–1000. https://doi.org/10.1109/LRA.2017.2656948
