Reinforcement learning is generally performed in Markov decision processes (MDPs). However, an agent may be unable to observe the environment correctly because of the limited perception ability of its sensors. Such an environment is modeled as a partially observable Markov decision process (POMDP). In a POMDP environment, an agent may receive the same observation in more than one state. HQ-learning and episode-based profit sharing (EPS) are well-known methods for solving this problem. HQ-learning divides a POMDP environment into subtasks. EPS distributes the same reward to all state-action pairs in an episode when the agent achieves the goal. However, these methods have disadvantages in learning efficiency and can converge to localized solutions. In this paper, we propose a hybrid learning method that combines profit sharing with a genetic algorithm. We also demonstrate the effectiveness of our method through experiments on partially observable mazes.
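The EPS credit-assignment rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `eps_update`, the table layout, and the optional `decay` parameter are assumptions. With the default `decay=1.0`, every state-action pair visited in the episode receives the same goal reward, matching the uniform distribution described in the abstract; a decay below 1.0 would credit pairs nearer the goal more strongly, as in other profit-sharing variants.

```python
from collections import defaultdict

def eps_update(q_table, episode, reward, decay=1.0):
    """Distribute the goal reward over an episode's state-action pairs.

    q_table : dict mapping (state, action) -> accumulated value
    episode : list of (state, action) pairs in visit order
    reward  : reward obtained on reaching the goal
    decay   : credit decay per step walking backwards from the goal
              (1.0 reproduces the uniform EPS rule in the abstract)
    """
    credit = reward
    # Walk the episode backwards from the goal so decay, if used,
    # weights later pairs more heavily.
    for state, action in reversed(episode):
        q_table[(state, action)] += credit
        credit *= decay
    return q_table

# Example: a two-step episode that reaches the goal with reward 1.0.
q = eps_update(defaultdict(float), [("s0", "a0"), ("s1", "a1")], 1.0)
```

Because the update keys on state-action pairs rather than full histories, two distinct states that yield the same observation share one table entry, which is exactly the perceptual-aliasing issue the paper's hybrid method targets.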
Suzuki, K., & Kato, S. (2018). Hybrid learning using profit sharing and genetic algorithm for partially observable Markov decision processes. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 7, pp. 463–475). Springer. https://doi.org/10.1007/978-3-319-65521-5_40