Reinforcement learning is a class of algorithms that enables computers to learn how to accumulate rewards effectively in an environment and ultimately achieve excellent results. Within it, the exploration-exploitation tradeoff is a central concept, since a good strategy can improve both learning speed and the final total reward. In this work, we applied the DQN algorithm with different exploration-exploitation strategies to a traditional route-finding problem. The experimental results show that an epsilon-greedy strategy in which the epsilon value drops parabolically as the reward improves performs best, whereas incorporating the softmax function yields unsatisfactory results. We hypothesize that the simplicity of the maze used in this work, in which the agent attempts to find the shortest path, makes applying softmax to further encourage exploration unnecessary. Future work thus involves experimenting with mazes of different scales and complexities to observe which exploration-exploitation strategy works best under each condition.
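The two action-selection strategies compared in the abstract can be sketched as follows. The paper's exact parabolic decay schedule is not given here, so `parabolic_epsilon` and its `progress` argument (a normalized measure of reward improvement) are illustrative assumptions, as are the default parameter values:

```python
import math
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def parabolic_epsilon(progress, eps_start=1.0, eps_end=0.05):
    """Parabolic decay: epsilon falls quadratically as progress grows.

    `progress` in [0, 1] could track normalized reward improvement; the
    concrete schedule used in the paper may differ from this sketch.
    """
    p = min(max(progress, 0.0), 1.0)
    return eps_end + (eps_start - eps_end) * (1.0 - p) ** 2

def softmax_action(q_values, temperature=1.0):
    """Boltzmann exploration: sample actions in proportion to exp(Q/T)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    r = random.random() * total
    acc = 0.0
    for a, e in enumerate(exps):
        acc += e
        if r <= acc:
            return a
    return len(q_values) - 1
```

With epsilon-greedy, exploration is uniform over actions; softmax instead biases exploration toward actions with higher Q-values, which, per the hypothesis above, may add little benefit in a small maze where greedy shortest-path behavior already suffices.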
Wei, P. (2020). Exploration-exploitation strategies in deep Q-networks applied to route-finding problems. Journal of Physics: Conference Series, 1684, 012073. IOP Publishing. https://doi.org/10.1088/1742-6596/1684/1/012073