PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning


Abstract

Preference-based reinforcement learning (RL) has emerged as a new field in robot learning, where humans play a pivotal role in shaping robot behavior by expressing preferences over different sequences of state-action pairs. However, learning realistic robot policies demands human responses to an extensive array of queries. In this work, we address the sample-efficiency challenge by expanding the information collected per query to contain both preferences and optional text prompting. To accomplish this, we leverage the zero-shot capabilities of a large language model (LLM) to reason from the text provided by humans. To accommodate the additional query information, we reformulate the reward learning objectives to contain flexible highlights: state-action pairs that carry relatively high information and relate to features processed zero-shot by a pretrained LLM. In both a simulated scenario and a user study, we demonstrate the effectiveness of our approach by analyzing the feedback and its implications. Additionally, the collected feedback serves to train a robot on socially compliant trajectories in a simulated social navigation landscape. We provide video examples of the trained policies at https://sites.google.com/view/rl-predilect.
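The reward-learning setup the abstract describes builds on standard preference-based RL, where a reward model is fit to pairwise human preferences via a Bradley-Terry likelihood. A minimal sketch is below; the optional per-pair `weights` argument is an illustrative stand-in for up-weighting highlighted state-action pairs, an assumption of this sketch and not the paper's exact objective:

```python
import math

def segment_return(reward_fn, segment, weights=None):
    """Sum predicted rewards over a segment of (state, action) pairs.

    `weights` optionally up-weights individual pairs -- here an
    illustrative stand-in for highlight information; the paper's
    actual formulation may differ.
    """
    if weights is None:
        weights = [1.0] * len(segment)
    return sum(w * reward_fn(s, a) for w, (s, a) in zip(weights, segment))

def preference_loss(reward_fn, seg_a, seg_b, pref, w_a=None, w_b=None):
    """Bradley-Terry negative log-likelihood for one preference query.

    pref = 1.0 if the human preferred segment A, 0.0 if segment B.
    """
    ra = segment_return(reward_fn, seg_a, w_a)
    rb = segment_return(reward_fn, seg_b, w_b)
    p_a = 1.0 / (1.0 + math.exp(rb - ra))  # P(human prefers A)
    return -(pref * math.log(p_a) + (1.0 - pref) * math.log(1.0 - p_a))
```

Minimizing this loss over many queries pushes the learned reward to assign higher return to segments humans prefer; highlight weights would concentrate that learning signal on the pairs the LLM flagged as informative.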



Citation (APA)

Holk, S., Marta, D., & Leite, I. (2024). PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning. In ACM/IEEE International Conference on Human-Robot Interaction (pp. 259–268). IEEE Computer Society. https://doi.org/10.1145/3610977.3634970


