Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control - Institute of Automation

Oct 30, 2017

Title: Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control

Authors: Luo, BA; Liu, DR; Wu, HN; Wang, D; Lewis, FL

Author Full Names: Luo, Biao; Liu, Derong; Wu, Huai-Ning; Wang, Ding; Lewis, Frank L.

Source: IEEE TRANSACTIONS ON CYBERNETICS, 47 (10): 3341-3354; SI; DOI: 10.1109/TCYB.2016.2623859; OCT 2017

Language: English

Abstract: This paper considers the model-free optimal control problem for general discrete-time nonlinear systems and develops a data-based policy gradient adaptive dynamic programming (PGADP) algorithm for designing an adaptive optimal controller. Using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, an adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence is analyzed, showing that the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
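The abstract's core idea (evaluate a Q-function from data, then improve the policy by a gradient descent step on that Q-function) can be illustrated with a minimal sketch. This is not the paper's algorithm or its simulation setup. It assumes a hypothetical scalar linear plant x' = 0.9x + 0.5u with stage cost x^2 + u^2, a quadratic Q-function, and a linear policy u = -kx. The names `step`, `features`, `evaluate_policy`, and `pg_iteration`, the step size, and the sample count are all illustrative choices. The plant is called only to generate transition data; the learning loop never reads its coefficients.

```python
import numpy as np

def step(x, u, a=0.9, b=0.5):
    """Hypothetical plant, used only as a black-box source of transitions."""
    return a * x + b * u

def features(x, u):
    """Quadratic basis so that Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2."""
    return np.stack([x**2, 2 * x * u, u**2], axis=-1)

def evaluate_policy(k, x, u, x_next, cost):
    """Critic: least-squares fit of Q(x,u) - Q(x', -k*x') = cost(x,u)."""
    u_next = -k * x_next  # action the current policy takes at x'
    regressors = features(x, u) - features(x_next, u_next)
    w, *_ = np.linalg.lstsq(regressors, cost, rcond=None)
    return w  # [h_xx, h_xu, h_uu]

def pg_iteration(k0=0.0, alpha=0.1, iters=50, n=200, seed=0):
    """Alternate data-based Q evaluation with a gradient step on the policy."""
    rng = np.random.default_rng(seed)
    # One batch of transitions collected up front (the "data").
    x = rng.uniform(-1.0, 1.0, n)
    u = rng.uniform(-1.0, 1.0, n)
    x_next = step(x, u)
    cost = x**2 + u**2

    k = k0  # initial policy u = -k0*x must be stabilizing (true for k0 = 0 here)
    for _ in range(iters):
        _, h_xu, h_uu = evaluate_policy(k, x, u, x_next, cost)
        # dQ/du at u = -k*x is 2*x*(h_xu - h_uu*k); the descent step
        # u_new(x) = u(x) - alpha*dQ/du keeps the policy linear with gain:
        k = k + 2.0 * alpha * (h_xu - h_uu * k)
    return k

if __name__ == "__main__":
    print("learned feedback gain:", pg_iteration())
```

In this toy setting the gradient vanishes exactly when k = h_xu / h_uu, which is the greedy policy for the fitted Q-function, so the iteration's fixed point coincides with the linear-quadratic optimal gain. The general nonlinear case treated in the paper would need function approximators for both actor and critic rather than the closed-form quadratic used here.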

ISSN: 2168-2267

eISSN: 2168-2275

IDS Number: FF9BM

Unique ID: WOS:000409311800032
