Preference-Based Reinforcement Learning Using Dyad Ranking

Abstract

Preference-based reinforcement learning has recently been introduced as a generalization of conventional reinforcement learning. Instead of numerical rewards, which are often difficult to specify, it assumes weaker feedback in the form of qualitative preferences between states or trajectories. A specific realization of preference-based reinforcement learning is approximate policy iteration using label ranking. We propose an extension of this method in which label ranking is replaced by so-called dyad ranking. The main advantage of this extension is the ability of dyad ranking to learn from feature descriptions of actions, which are often available in reinforcement learning. Several simulation studies confirm the usefulness of the approach.
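To make the dyad-ranking component concrete, the following Python snippet is a minimal sketch, not the authors' implementation. It assumes a bilinear Plackett-Luce model, in the spirit of the dyad rankers described in this line of work: a dyad pairs a state feature vector x with an action feature vector a, its latent skill is exp(x^T W a), and the matrix W is fitted to observed action rankings by gradient ascent. All function names here are hypothetical.

import numpy as np

def pl_gradient(W, x, actions, ranking):
    """Gradient of the Plackett-Luce log-likelihood with respect to W
    for one observed ranking (indices into `actions`, best first)."""
    scores = np.array([x @ W @ actions[i] for i in ranking])
    grad = np.zeros_like(W)
    for k in range(len(ranking) - 1):
        tail = scores[k:]
        probs = np.exp(tail - tail.max())
        probs /= probs.sum()
        # The stage-k winner contributes +x a^T; every dyad still in
        # the race contributes -P(chosen at stage k) * x a^T.
        grad += np.outer(x, actions[ranking[k]])
        for idx, p in zip(ranking[k:], probs):
            grad -= p * np.outer(x, actions[idx])
    return grad

def fit_dyad_ranker(data, d_state, d_action, lr=0.05, epochs=300):
    """Fit the bilinear skill matrix W by gradient ascent.
    `data` is a list of (x, actions, ranking) triples."""
    W = np.zeros((d_state, d_action))
    for _ in range(epochs):
        for x, actions, ranking in data:
            W += lr * pl_gradient(W, x, actions, ranking)
    return W

def greedy_policy(W, x, actions):
    """Return the index of the action whose dyad (x, a) has the
    highest learned skill exp(x^T W a) in state x."""
    return int(max(range(len(actions)), key=lambda i: x @ W @ actions[i]))

In a preference-based approximate policy iteration scheme of the kind the abstract describes, such a ranker would be retrained in each iteration on rankings of candidate actions collected at sampled states, and the greedy policy above would serve as the improved policy.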

Citation (APA)

Schäfer, D., & Hüllermeier, E. (2018). Preference-Based Reinforcement Learning Using Dyad Ranking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11198 LNAI, pp. 161–175). Springer Verlag. https://doi.org/10.1007/978-3-030-01771-2_11
