Reinforcement learning is generally performed in Markov decision processes (MDPs). However, an agent may be unable to observe the environment correctly because of the limited perception ability of its sensors. Such an environment is modeled as a partially observable Markov decision process (POMDP). In a POMDP environment, an agent may receive the same observation in more than one state. HQ-learning and episode-based profit sharing (EPS) are well-known methods for solving this problem. HQ-learning divides a POMDP environment into subtasks. EPS distributes the same reward to all state-action pairs in an episode when the agent achieves the goal. However, these methods have disadvantages in learning efficiency and can converge to localized solutions. In this paper, we propose a hybrid learning method that combines profit sharing with a genetic algorithm. We also demonstrate the effectiveness of our method through experiments on partially observable mazes.
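The EPS credit-assignment rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `eps_update`, the table layout, and the optional `decay` parameter are assumptions. With the default `decay=1.0`, every state-action pair visited in the episode receives the same goal reward, matching the uniform distribution described in the abstract; a decay below 1.0 would credit pairs nearer the goal more strongly, as in other profit-sharing variants.

```python
from collections import defaultdict

def eps_update(q_table, episode, reward, decay=1.0):
    """Distribute the goal reward over an episode's state-action pairs.

    q_table : dict mapping (state, action) -> accumulated value
    episode : list of (state, action) pairs in visit order
    reward  : reward obtained on reaching the goal
    decay   : credit decay per step walking backwards from the goal
              (1.0 reproduces the uniform EPS rule in the abstract)
    """
    credit = reward
    # Walk the episode backwards from the goal so decay, if used,
    # weights later pairs more heavily.
    for state, action in reversed(episode):
        q_table[(state, action)] += credit
        credit *= decay
    return q_table

# Example: a two-step episode that reaches the goal with reward 1.0.
q = eps_update(defaultdict(float), [("s0", "a0"), ("s1", "a1")], 1.0)
```

Because the update keys on state-action pairs rather than full histories, two distinct states that yield the same observation share one table entry, which is exactly the perceptual-aliasing issue the paper's hybrid method targets.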
Suzuki, K., & Kato, S. (2018). Hybrid learning using profit sharing and genetic algorithm for partially observable Markov decision processes. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 7, pp. 463–475). Springer. https://doi.org/10.1007/978-3-319-65521-5_40