Structure Learning-Based Task Decomposition for Reinforcement Learning in Non-stationary Environments


Abstract

Reinforcement learning (RL) agents empowered by deep neural networks have been considered a feasible solution for automating control functions in a cyber-physical system. In this work, we consider an RL-based agent and address the issue of learning via continual interaction with a time-varying dynamic system modeled as a non-stationary Markov decision process (MDP). We view such a non-stationary MDP as a time series of conventional MDPs that can be parameterized by hidden variables. To infer the hidden parameters, we present a task decomposition method that exploits CycleGAN-based structure learning. This method enables the separation of time-variant tasks from a non-stationary MDP, establishing a task decomposition embedding specific to time-varying information. To mitigate the adverse effect of the inherent noise in task embeddings, we also leverage continual learning on sequential tasks by adapting the orthogonal gradient descent scheme with a sliding window. Through various experiments, we demonstrate that our approach renders the RL agent adaptable to time-varying dynamic environment conditions, outperforming other methods including state-of-the-art non-stationary MDP algorithms.
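The continual-learning component mentioned above can be illustrated with a minimal sketch of orthogonal gradient descent (OGD) restricted to a sliding window of past tasks: gradients from recent tasks are orthonormalized and stored, and each new update is projected orthogonal to that stored basis so it does not overwrite recent knowledge. The class and method names below are hypothetical, and this is a simplified stand-in for the paper's actual scheme, not the authors' implementation.

```python
import numpy as np
from collections import deque


def project_orthogonal(grad, basis):
    """Remove the components of `grad` along each stored unit direction."""
    g = grad.copy()
    for b in basis:
        g -= np.dot(g, b) * b
    return g


class SlidingWindowOGD:
    """Orthogonal gradient descent with a sliding window of past-task gradients.

    Hypothetical sketch: only the most recent `window_size` tasks contribute
    directions to the orthogonality constraint; older bases are discarded.
    """

    def __init__(self, window_size=3):
        # One orthonormal basis (list of unit vectors) per remembered task.
        self.window = deque(maxlen=window_size)

    def finish_task(self, task_grads):
        """Store an orthonormal basis for this task's sample gradients."""
        existing = [b for task in self.window for b in task]
        basis = []
        for g in task_grads:
            # Gram-Schmidt against all currently remembered directions.
            g = project_orthogonal(np.asarray(g, dtype=float), existing + basis)
            norm = np.linalg.norm(g)
            if norm > 1e-8:
                basis.append(g / norm)
        self.window.append(basis)  # oldest task's basis falls out of the window

    def step_direction(self, grad):
        """Project the current gradient orthogonal to all windowed bases."""
        basis = [b for task in self.window for b in task]
        return project_orthogonal(np.asarray(grad, dtype=float), basis)
```

With one remembered direction along the first axis, an update of `[1, 1]` would be projected to `[0, 1]`, leaving the remembered task's direction untouched; as tasks accumulate past the window size, the oldest constraints are released, which is what keeps the scheme workable under non-stationarity.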

Citation (APA)

Woo, H., Yoo, G., & Yoo, M. (2022). Structure Learning-Based Task Decomposition for Reinforcement Learning in Non-stationary Environments. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 8657–8665). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i8.20844
