Value Iteration Algorithm for Optimal Consensus Control of Multi-agent Systems


Abstract

In this paper, we investigate the optimal consensus control problem for multi-agent systems using the Heuristic Dynamic Programming (HDP) algorithm, a value iteration algorithm in reinforcement learning, under the centralized learning and decentralized execution framework. Unlike the independent learning framework, a centralized value function shared by all agents is defined. To approach the Nash equilibrium, we prove the equivalence between the Bellman optimality equation and the discrete-time Hamilton-Jacobi-Bellman (DTHJB) equation. For implementation, an actor-critic structure with neural network (NN) approximators is proposed to approximate the solution of the DTHJB equation: the critic network, shared by all agents, is centralized and uses global information, while each agent's actor network is decentralized and uses only local information. Finally, simulation results demonstrate the effectiveness of the proposed HDP algorithm under the centralized learning and decentralized execution framework.
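
As a rough illustration of the value iteration idea behind HDP, the sketch below runs the iteration on a scalar linear-quadratic problem. This is not the paper's method: the scalar error dynamics e[k+1] = a*e[k] + b*u[k], the stage cost q*e^2 + r*u^2, and the function name `hdp_value_iteration` are all assumptions made for illustration, and the quadratic form V_i(e) = p_i * e^2 stands in for the critic network that the abstract describes. The iteration starts from V_0 = 0 and alternates a greedy policy step with a value update.

```python
def hdp_value_iteration(a, b, q, r, tol=1e-12, max_iters=10_000):
    """Value iteration for the hypothetical scalar system e[k+1] = a*e[k] + b*u[k].

    Stage cost: q*e^2 + r*u^2. The value function is kept in the
    quadratic form V_i(e) = p_i * e^2, so only the scalar p_i is updated.
    Returns the converged p and the corresponding feedback gain (u = -gain * e).
    """
    p = 0.0  # V_0(e) = 0, the usual starting point for value iteration
    for _ in range(max_iters):
        # Policy step: the control minimizing q*e^2 + r*u^2 + V_i(a*e + b*u)
        gain = a * b * p / (r + b * b * p)
        # Value step: V_{i+1}(e) = q*e^2 + r*u^2 + V_i(a*e + b*u) with u = -gain*e
        p_next = q + r * gain ** 2 + p * (a - b * gain) ** 2
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    # Recompute the gain from the final value function
    gain = a * b * p / (r + b * b * p)
    return p, gain


if __name__ == "__main__":
    p, gain = hdp_value_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
    print(p, gain)
```

With a = b = q = r = 1, the fixed point satisfies p^2 - p - 1 = 0, so p approaches about 1.618 and the gain about 0.618. In the setting the abstract describes, the scalar p would be replaced by a centralized critic network over the global state, and the single gain by one actor network per agent using local information.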

Zhang, Q., & Zhao, D. (2018). Value Iteration Algorithm for Optimal Consensus Control of Multi-agent Systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11307 LNCS, pp. 200–208). Springer Verlag. https://doi.org/10.1007/978-3-030-04239-4_18
