Hybrid learning using profit sharing and genetic algorithm for partially observable Markov decision processes

Abstract

Reinforcement learning is generally performed in Markov decision processes (MDPs). However, an agent may be unable to observe the environment correctly because of the limited perception ability of its sensors; such a setting is called a partially observable Markov decision process (POMDP). In a POMDP environment, an agent may observe the same information in more than one state. HQ-learning and episode-based profit sharing (EPS) are well-known methods for addressing this problem. HQ-learning divides a POMDP environment into subtasks. EPS distributes the same reward to all state-action pairs in an episode when the agent achieves the goal. However, these methods have disadvantages in terms of learning efficiency and convergence to localized solutions. In this paper, we propose a hybrid learning method that combines profit sharing with a genetic algorithm, and we report its effectiveness through experiments on partially observable mazes.
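
The EPS credit assignment described above, where every state-action pair in a successful episode receives the same reward, can be sketched as a simple tabular update. The Python sketch below is only an illustration under assumed details (a toy aliased corridor named ToyCorridor, a credit parameter of 0.1, weighted-choice action selection); the paper's actual environments, reward values, and update rule are not specified in this abstract.

import random
from collections import defaultdict

# Minimal sketch of episode-based profit sharing (EPS) on a toy partially
# observable corridor; the environment, parameters, and update rule are
# illustrative assumptions, not the authors' implementation.

class ToyCorridor:
    """1-D corridor where all interior cells emit the same observation (aliasing)."""
    def __init__(self, length=6):
        self.length = length

    def reset(self):
        self.pos = 0
        return self.observe()

    def observe(self):
        # Interior cells look identical: the agent cannot tell them apart.
        if self.pos == 0:
            return "start"
        if self.pos == self.length - 1:
            return "goal"
        return "corridor"

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.pos = max(0, min(self.length - 1, self.pos + action))
        done = self.pos == self.length - 1
        return self.observe(), done

def run_eps(episodes=200, reward=1.0, credit=0.1, max_steps=50):
    env = ToyCorridor()
    actions = [-1, +1]
    weights = defaultdict(lambda: 1.0)  # (observation, action) -> weight
    for _ in range(episodes):
        obs, done, trace = env.reset(), False, []
        for _ in range(max_steps):
            w = [weights[(obs, a)] for a in actions]
            action = random.choices(actions, weights=w)[0]
            trace.append((obs, action))
            obs, done = env.step(action)
            if done:
                # EPS: every (observation, action) pair in the successful
                # episode receives the same share of the reward.
                for pair in trace:
                    weights[pair] += credit * reward
                break
    return weights

if __name__ == "__main__":
    learned = run_eps()
    for pair, w in sorted(learned.items()):
        print(pair, round(w, 2))

Running this sketch, the weight for moving right under the aliased "corridor" observation grows with each successful episode, which illustrates how EPS reinforces an entire successful trajectory at once rather than updating a single transition as Q-learning does.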

Citation (APA)

Suzuki, K., & Kato, S. (2018). Hybrid learning using profit sharing and genetic algorithm for partially observable markov decision processes. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 7, pp. 463–475). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-65521-5_40
