Preference-based reinforcement learning has recently been introduced as a generalization of conventional reinforcement learning. Instead of numerical rewards, which are often difficult to specify, it assumes weaker feedback in the form of qualitative preferences between states or trajectories. A specific realization of preference-based reinforcement learning is approximate policy iteration using label ranking. We propose an extension of this method in which label ranking is replaced by so-called dyad ranking. The main advantage of this extension is the ability of dyad ranking to learn from feature descriptions of actions, which are often available in reinforcement learning. Several simulation studies confirm the usefulness of the approach.
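To make the core idea concrete, the following is a minimal, self-contained sketch of dyad-style preference learning: a bilinear utility v(s, a) = sᵀW a over state and action feature vectors is fitted from pairwise preferences "action a is preferred to action b in state s" using a Bradley–Terry (logistic) model and plain stochastic gradient ascent. The bilinear form, the toy data, and all names are illustrative assumptions, not the authors' implementation or the Plackett–Luce dyad-ranking model of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_a = 4, 3  # state and action feature dimensions (assumed toy sizes)

# Hidden "true" utility used only to generate synthetic preferences.
W_true = rng.normal(size=(d_s, d_a))

def utility(W, s, a):
    """Bilinear joint utility of a state-action dyad: v(s, a) = s^T W a."""
    return s @ W @ a

def sample_preference():
    """Draw a random state and two actions; return (s, winner, loser)."""
    s = rng.normal(size=d_s)
    a, b = rng.normal(size=d_a), rng.normal(size=d_a)
    if utility(W_true, s, a) >= utility(W_true, s, b):
        return s, a, b
    return s, b, a

# Fit W by ascending the Bradley-Terry log-likelihood
# P(a > b | s) = sigmoid(v(s, a) - v(s, b)).
W = np.zeros((d_s, d_a))
lr = 0.05
for _ in range(3000):
    s, a, b = sample_preference()
    p = 1.0 / (1.0 + np.exp(-(utility(W, s, a) - utility(W, s, b))))
    # Gradient of log P(a > b | s) w.r.t. W is (1 - p) * s (a - b)^T.
    W += lr * (1.0 - p) * np.outer(s, a - b)

# Evaluate on fresh preferences: fraction ranked consistently with W_true.
correct = sum(
    utility(W, s, a) > utility(W, s, b)
    for s, a, b in (sample_preference() for _ in range(1000))
)
print("pairwise accuracy:", correct / 1000)
```

Because actions enter through their feature vectors rather than as atomic labels, the learned W generalizes to actions never observed during training, which is the practical advantage of dyad ranking over label ranking highlighted in the abstract.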
Schäfer, D., & Hüllermeier, E. (2018). Preference-Based Reinforcement Learning Using Dyad Ranking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11198 LNAI, pp. 161–175). Springer Verlag. https://doi.org/10.1007/978-3-030-01771-2_11