Structure Learning-Based Task Decomposition for Reinforcement Learning in Non-stationary Environments


Abstract

Reinforcement learning (RL) agents empowered by deep neural networks have been considered a feasible solution for automating control functions in a cyber-physical system. In this work, we consider an RL-based agent and address the issue of learning via continual interaction with a time-varying dynamic system modeled as a non-stationary Markov decision process (MDP). We view such a non-stationary MDP as a time series of conventional MDPs that can be parameterized by hidden variables. To infer the hidden parameters, we present a task decomposition method that exploits CycleGAN-based structure learning. This method enables the separation of time-variant tasks from a non-stationary MDP, establishing a task decomposition embedding specific to time-varying information. To mitigate the adverse effect of the inherent noise in task embeddings, we also leverage continual learning on sequential tasks by adapting the orthogonal gradient descent scheme with a sliding window. Through various experiments, we demonstrate that our approach renders the RL agent adaptable to time-varying dynamic environment conditions, outperforming other methods including state-of-the-art non-stationary MDP algorithms.
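The continual-learning component mentioned above can be illustrated with a minimal sketch of orthogonal gradient descent (OGD) restricted to a sliding window of past tasks: gradients from recent tasks are orthonormalized and stored, and each new update is projected orthogonal to that stored basis so it does not overwrite recent knowledge. The class and method names below are hypothetical, and this is a simplified stand-in for the paper's actual scheme, not the authors' implementation.

```python
import numpy as np
from collections import deque


def project_orthogonal(grad, basis):
    """Remove the components of `grad` along each stored unit direction."""
    g = grad.copy()
    for b in basis:
        g -= np.dot(g, b) * b
    return g


class SlidingWindowOGD:
    """Orthogonal gradient descent with a sliding window of past-task gradients.

    Hypothetical sketch: only the most recent `window_size` tasks contribute
    directions to the orthogonality constraint; older bases are discarded.
    """

    def __init__(self, window_size=3):
        # One orthonormal basis (list of unit vectors) per remembered task.
        self.window = deque(maxlen=window_size)

    def finish_task(self, task_grads):
        """Store an orthonormal basis for this task's sample gradients."""
        existing = [b for task in self.window for b in task]
        basis = []
        for g in task_grads:
            # Gram-Schmidt against all currently remembered directions.
            g = project_orthogonal(np.asarray(g, dtype=float), existing + basis)
            norm = np.linalg.norm(g)
            if norm > 1e-8:
                basis.append(g / norm)
        self.window.append(basis)  # oldest task's basis falls out of the window

    def step_direction(self, grad):
        """Project the current gradient orthogonal to all windowed bases."""
        basis = [b for task in self.window for b in task]
        return project_orthogonal(np.asarray(grad, dtype=float), basis)
```

With one remembered direction along the first axis, an update of `[1, 1]` would be projected to `[0, 1]`, leaving the remembered task's direction untouched; as tasks accumulate past the window size, the oldest constraints are released, which is what keeps the scheme workable under non-stationarity.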

Citation (APA)

Woo, H., Yoo, G., & Yoo, M. (2022). Structure Learning-Based Task Decomposition for Reinforcement Learning in Non-stationary Environments. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 8657–8665). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i8.20844
