Intelligent traffic signal control (TSC) is essential for transportation efficiency in modern road networks. An emerging trend is to train TSC models with deep reinforcement learning in simulators, which reduces trial-and-error in real-world scenarios, and recent studies have shown promising results. The objective of TSC is to minimize the average travel time within a given area. However, directly optimizing this objective by using the average travel time as the reward function is impractical, because the feedback is delayed and credit assignment is difficult. Existing methods instead often define the reward function heuristically, which may bias the optimization away from the real objective, since these methods only optimize the cumulative reward. In this work, we propose PlanLight, a novel planning-based TSC algorithm that uses behavior cloning to learn from demonstrations produced by a rollout algorithm, which obtains suboptimal control with respect to the given objective. We show the effectiveness and efficiency of the rollout algorithm in the multi-intersection control scenario. Moreover, we achieve further policy optimization by iteratively improving the base policy used in the rollout procedure. Through comprehensive experiments, we demonstrate that PlanLight outperforms both conventional transportation approaches and existing learning-based methods on traffic datasets of various sizes. Furthermore, we empirically show the potential of PlanLight to serve as a general algorithm for improving future state-of-the-art TSC methods.
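The abstract describes the core loop only at a high level: a rollout algorithm improves a base policy by trying each action, simulating the base policy afterwards, and keeping the best action; behavior cloning distills that rollout policy into a new policy; and the clone becomes the next base policy. The sketch below illustrates this loop on a hypothetical single-intersection toy model. Everything concrete in it is an assumption for illustration, not PlanLight's implementation: the queue dynamics, the constants `ARRIVALS`, `SERVICE` and `HORIZON`, the per-step cost (vehicles still queued, used as a rough proxy for accumulated travel time), the `fixed_time` base policy, and the 1-nearest-neighbour classifier standing in for a trained neural policy.

```python
from typing import Callable, List, Tuple

# Hypothetical toy model, not PlanLight's simulator or network.
State = Tuple[int, int]               # queue lengths on the two approaches
Policy = Callable[[int, State], int]  # (time step, state) -> green phase 0 or 1

ARRIVALS = (2, 1)  # assumed vehicles arriving per step on each approach
SERVICE = 4        # assumed vehicles discharged per step by the green approach
HORIZON = 30       # assumed episode length in steps


def step(state: State, phase: int) -> Tuple[State, int]:
    """Serve the green approach, then add arrivals. Cost = vehicles left queued."""
    q = list(state)
    q[phase] = max(0, q[phase] - SERVICE)
    nxt = (q[0] + ARRIVALS[0], q[1] + ARRIVALS[1])
    return nxt, nxt[0] + nxt[1]


def simulate(policy: Policy, state: State, t0: int = 0) -> int:
    """Total cost of following `policy` from time t0 to the end of the episode."""
    total = 0
    for t in range(t0, HORIZON):
        state, cost = step(state, policy(t, state))
        total += cost
    return total


def fixed_time(t: int, state: State, cycle: int = 3) -> int:
    """Hypothetical fixed-time base policy: switch phase every `cycle` steps."""
    return (t // cycle) % 2


def rollout_action(base: Policy, t: int, state: State) -> int:
    """Rollout: try each phase, follow `base` to the episode end, keep the cheapest."""
    best_phase, best_cost = 0, float("inf")
    for phase in (0, 1):
        nxt, cost = step(state, phase)
        total = cost + simulate(base, nxt, t + 1)
        if total < best_cost:
            best_phase, best_cost = phase, total
    return best_phase


def demonstrations(base: Policy, start: State) -> List[Tuple[State, int]]:
    """Run the rollout policy for one episode and record (state, action) pairs."""
    data, state = [], start
    for t in range(HORIZON):
        action = rollout_action(base, t, state)
        data.append((state, action))
        state, _ = step(state, action)
    return data


def behavior_clone(data: List[Tuple[State, int]]) -> Policy:
    """Behavior cloning via a 1-nearest-neighbour classifier over queue lengths."""
    def policy(t: int, state: State) -> int:
        _, action = min(
            data,
            key=lambda d: (d[0][0] - state[0]) ** 2 + (d[0][1] - state[1]) ** 2,
        )
        return action
    return policy


def iterate(base: Policy, start: State, rounds: int = 3) -> List[int]:
    """Iterative improvement: each cloned rollout policy becomes the next base.
    Returns the episode cost of every base policy encountered."""
    costs = []
    for _ in range(rounds):
        costs.append(simulate(base, start))
        base = behavior_clone(demonstrations(base, start))
    costs.append(simulate(base, start))
    return costs


if __name__ == "__main__":
    start = (6, 6)
    base_cost = simulate(fixed_time, start)
    rollout_cost = simulate(lambda t, s: rollout_action(fixed_time, t, s), start)
    print("fixed-time:", base_cost, "rollout:", rollout_cost)
    print("costs per iteration:", iterate(fixed_time, start))
```

Because this toy simulator is deterministic and each rollout looks ahead to the end of the episode, the rollout policy can never incur a higher total cost than its base policy (the base policy's own action is always among the candidates). The cloned policy carries no such guarantee in general, since it only approximates the demonstrations.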
Zhang, H., Kafouros, M., & Yu, Y. (2020). PlanLight: Learning to Optimize Traffic Signal Control with Planning and Iterative Policy Improvement. IEEE Access, 8, 219244–219255. https://doi.org/10.1109/ACCESS.2020.3041441