An Improved On-Policy Reinforcement Learning Algorithm

Moirangthem Tiken Singh; Aninda Chakrabarty; Bhargab Sarma; Sourav Dutta

Conference Proceedings

Advances in Intelligent Systems and Computing (2021) 1248 321-330

DOI: 10.1007/978-981-15-7394-1_30

Citations: 1

Readers: 2

Abstract

This paper addresses path finding for a mobile agent in a stochastic environment. A stochastic environment is chosen to mimic the real world, in that the agent’s actions do not uniquely determine the outcome. The agent is placed in an initial state and allowed to traverse the states, each of which has a different reward function, with the goal of maximizing its reward. Many scholars have used algorithms such as SARSA and Q-Learning to evaluate the agent’s path. Here, a reinforcement learning algorithm is proposed that incorporates ideas from information theory. The aim is for the algorithm to explore and learn, and then to choose the best available action based on the policy it has learned. The learning rate is varied to assess the performance of the proposed algorithm.
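
For orientation, the on-policy baseline named in the abstract (SARSA) can be sketched as below. This is a minimal illustration of standard tabular SARSA only, not the information-theoretic variant the paper proposes, whose details are not given here. The 4x4 grid, the reward values, the `slip` probability, and all function names are assumptions made for the example.

```python
import random

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action, size=4, slip=0.2, rng=random):
    """Hypothetical stochastic grid: with probability `slip`, a random action replaces the chosen one."""
    if rng.random() < slip:
        action = rng.randrange(4)
    r, c = divmod(state, size)
    dr, dc = MOVES[action]
    r = min(max(r + dr, 0), size - 1)  # moves into a wall leave the agent in place
    c = min(max(c + dc, 0), size - 1)
    nxt = r * size + c
    done = nxt == size * size - 1      # goal is the bottom-right cell (assumed)
    reward = 1.0 if done else -0.04    # assumed reward values
    return nxt, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(4)
    row = q[state]
    return row.index(max(row))


def sarsa(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, size=4, seed=0):
    """Tabular SARSA: the update uses the action actually chosen in the next state."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(size * size)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(200):  # cap on episode length
            nxt, reward, done = step(state, action, size, rng=rng)
            nxt_action = epsilon_greedy(q, nxt, epsilon, rng)
            target = reward + (0.0 if done else gamma * q[nxt][nxt_action])
            q[state][action] += alpha * (target - q[state][action])
            state, action = nxt, nxt_action
            if done:
                break
    return q


if __name__ == "__main__":
    # Vary the learning rate, as the abstract describes, and compare start-state values.
    for alpha in (0.05, 0.1, 0.5):
        q = sarsa(alpha=alpha)
        print(alpha, max(q[0]))
```

Replacing `q[nxt][nxt_action]` with `max(q[nxt])` in the target would give Q-Learning, the off-policy counterpart also mentioned in the abstract.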

Citation (APA)

Singh, M. T., Chakrabarty, A., Sarma, B., & Dutta, S. (2021). An Improved On-Policy Reinforcement Learning Algorithm. In Advances in Intelligent Systems and Computing (Vol. 1248, pp. 321–330). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-7394-1_30
