Preference-based reinforcement learning has recently been introduced as a generalization of conventional reinforcement learning. Instead of numerical rewards, which are often difficult to specify, it assumes weaker feedback in the form of qualitative preferences between states or trajectories. A specific realization of preference-based reinforcement learning is approximate policy iteration using label ranking. We propose an extension of this method in which label ranking is replaced by so-called dyad ranking. The main advantage of this extension is the ability of dyad ranking to learn from feature descriptions of actions, which are often available in reinforcement learning. Several simulation studies confirm the usefulness of the approach.
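To make the core idea concrete, the following is a minimal, self-contained sketch of dyad-style preference learning: a bilinear utility v(s, a) = sᵀW a over state and action feature vectors is fitted from pairwise preferences "action a is preferred to action b in state s" using a Bradley–Terry (logistic) model and plain stochastic gradient ascent. The bilinear form, the toy data, and all names are illustrative assumptions, not the authors' implementation or the Plackett–Luce dyad-ranking model of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_a = 4, 3  # state and action feature dimensions (assumed toy sizes)

# Hidden "true" utility used only to generate synthetic preferences.
W_true = rng.normal(size=(d_s, d_a))

def utility(W, s, a):
    """Bilinear joint utility of a state-action dyad: v(s, a) = s^T W a."""
    return s @ W @ a

def sample_preference():
    """Draw a random state and two actions; return (s, winner, loser)."""
    s = rng.normal(size=d_s)
    a, b = rng.normal(size=d_a), rng.normal(size=d_a)
    if utility(W_true, s, a) >= utility(W_true, s, b):
        return s, a, b
    return s, b, a

# Fit W by ascending the Bradley-Terry log-likelihood
# P(a > b | s) = sigmoid(v(s, a) - v(s, b)).
W = np.zeros((d_s, d_a))
lr = 0.05
for _ in range(3000):
    s, a, b = sample_preference()
    p = 1.0 / (1.0 + np.exp(-(utility(W, s, a) - utility(W, s, b))))
    # Gradient of log P(a > b | s) w.r.t. W is (1 - p) * s (a - b)^T.
    W += lr * (1.0 - p) * np.outer(s, a - b)

# Evaluate on fresh preferences: fraction ranked consistently with W_true.
correct = sum(
    utility(W, s, a) > utility(W, s, b)
    for s, a, b in (sample_preference() for _ in range(1000))
)
print("pairwise accuracy:", correct / 1000)
```

Because actions enter through their feature vectors rather than as atomic labels, the learned W generalizes to actions never observed during training, which is the practical advantage of dyad ranking over label ranking highlighted in the abstract.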
Schäfer, D., & Hüllermeier, E. (2018). Preference-Based Reinforcement Learning Using Dyad Ranking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11198 LNAI, pp. 161–175). Springer Verlag. https://doi.org/10.1007/978-3-030-01771-2_11