Training for Implicit Norms in Deep Reinforcement Learning Agents through Adversarial Multi-Objective Reward Optimization

Markus Peschl

Conference Proceedings, Open Access

AIES 2021 - Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (2021) 275-276

DOI: 10.1145/3461702.3462473

Abstract

We propose a deep reinforcement learning algorithm that uses an adversarial training strategy to adhere to implicit human norms while optimizing for a narrow goal objective. Previous methods that incorporate human values into reinforcement learning algorithms either scale poorly or assume hand-crafted state features. Our algorithm drops these assumptions and automatically infers norms from human demonstrations, so it can be integrated into existing agents as a multi-objective optimization problem. We benchmark our approach in a search-and-rescue grid world and show that, conditioned on respecting human norms, our agent maintains optimal performance with respect to the predefined goal.
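The abstract does not give the form of the combined objective, so the following is only a minimal sketch of how a goal reward and a norm signal learned adversarially from demonstrations might be merged into one scalar reward. The logit-shaped norm reward is borrowed from common adversarial imitation learning practice and is an assumption here, as are the linear weighting and every name used (`norm_reward`, `combined_reward`, `w_goal`, `w_norm`); none of it is taken from the paper.

```python
import math


def norm_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Surrogate norm reward from a discriminator output (assumed form).

    d_prob is a hypothetical discriminator's estimated probability that a
    state-action pair came from the human demonstrations. The logit
    log(D) - log(1 - D) is positive when the behavior looks
    demonstration-like and negative otherwise.
    """
    d = min(max(d_prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return math.log(d) - math.log(1.0 - d)


def combined_reward(goal_reward: float, d_prob: float,
                    w_goal: float = 0.5, w_norm: float = 0.5) -> float:
    """Linear scalarization of the two objectives (illustrative assumption)."""
    return w_goal * goal_reward + w_norm * norm_reward(d_prob)


if __name__ == "__main__":
    # A step that reaches the goal but looks unlike the demonstrations
    # is penalized relative to one that looks demonstration-like.
    print(combined_reward(goal_reward=1.0, d_prob=0.1))
    print(combined_reward(goal_reward=1.0, d_prob=0.9))
```

A fixed linear weighting is the simplest way to trade off two objectives; other multi-objective schemes would replace `combined_reward` without changing the discriminator-based norm signal.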

Citation (APA)

Peschl, M. (2021). Training for Implicit Norms in Deep Reinforcement Learning Agents through Adversarial Multi-Objective Reward Optimization. In AIES 2021 - Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 275–276). Association for Computing Machinery, Inc. https://doi.org/10.1145/3461702.3462473
