Reinforcement learning (RL) algorithms perform well when rewards are well defined, but they struggle with sparse or deceptive rewards and require additional exploration strategies. This work introduces a deep exploration method based on an Upper Confidence Bound (UCB) bonus that can be plugged into actor-critic algorithms whose critic is a deep neural network. Following the regret-bound analysis under the linear Markov decision process approximation, we use the feature matrix to compute the UCB bonus for deep exploration. The proposed method reduces to count-based exploration in special cases while remaining applicable in general settings. Because it relies only on the last d-dimensional feature vector of the critic network, it is easy to deploy. We design a simple task, 'swim', to illustrate how the method achieves exploration in sparse/deceptive reward environments, and we then evaluate it empirically on sparse/deceptive reward versions of Gym environments and on Ackermann robot control tasks. The results confirm that the proposed algorithm performs effective deep exploration in sparse/deceptive reward tasks.
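The abstract describes the bonus only at a high level, so the following is a minimal sketch of how a linear-feature UCB bonus of this kind is typically computed, not the authors' released implementation. It assumes that phi(s, a) denotes the critic's last d-dimensional feature vector, that a regularized feature covariance Lambda = lam*I + sum_i phi_i phi_i^T is accumulated over visited transitions, and that the bonus is beta * sqrt(phi^T Lambda^{-1} phi), as in LSVI-UCB-style analyses under the linear MDP approximation. The names LinearFeatureBonus, beta, and lam are illustrative.

```python
import numpy as np

class LinearFeatureBonus:
    """Sketch of a UCB exploration bonus built from critic features (assumed form)."""

    def __init__(self, feature_dim, beta=1.0, lam=1.0):
        self.beta = beta                      # bonus scale
        self.cov = lam * np.eye(feature_dim)  # regularized feature covariance Lambda

    def update(self, phi):
        """Accumulate the outer product of a visited feature vector phi(s, a)."""
        phi = np.asarray(phi, dtype=np.float64)
        self.cov += np.outer(phi, phi)

    def bonus(self, phi):
        """UCB-style bonus: beta * sqrt(phi^T Lambda^{-1} phi)."""
        phi = np.asarray(phi, dtype=np.float64)
        return self.beta * float(np.sqrt(phi @ np.linalg.solve(self.cov, phi)))

# Hypothetical usage: add the bonus to the environment reward before the critic update.
# bonus_model = LinearFeatureBonus(feature_dim=d)
# r_tilde = r + bonus_model.bonus(phi)
# bonus_model.update(phi)
```

Under this form, one-hot features make Lambda diagonal, so phi^T Lambda^{-1} phi reduces to 1/(lam + N(s, a)), recovering a count-based bonus; this is consistent with the special-case equivalence mentioned in the abstract.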
Qiu, J., & Wang, Y. (2023). LiFE: Deep Exploration via Linear-Feature Bonus in Continuous Control. Tsinghua Science and Technology, 28(1), 155–166. https://doi.org/10.26599/TST.2021.9010063