Talk Title: Model-Free Finite Horizon H-Infinity Control: An Off-Policy Approach with Double Minimax Q-learning
Speaker: Wen Yu (余文)
Time: August 28, 2025, 09:30-10:30
Venue: Conference Room 228, Weixue Building, Lianhua Street Campus
Speaker Biography:
Wen Yu (余文) is a member of the Mexican Academy of Sciences, a full professor at the National Polytechnic Institute of Mexico, and is listed among the world's top 2% most-cited scientists. He received a B.S. degree in Automation from Tsinghua University in 1990, and M.S. and Ph.D. degrees in Automatic Control from Northeastern University in 1992 and 1995, respectively. From 1995 to 1996, he was a lecturer in the Department of Automatic Control at Northeastern University, and he has been with the National Polytechnic Institute of Mexico since 1996. From 2002 to 2003, he held a research position at the Mexican Petroleum Institute. From 2006 to 2007, he was a senior visiting research fellow at Queen's University Belfast, UK, and from 2009 to 2010, he was a visiting associate professor at the University of California, Santa Cruz, USA. Since 2006, he has also served as a visiting professor at Northeastern University.
He has published more than 500 academic papers, including over 200 journal papers, and has authored 8 monographs. He has supervised 38 Ph.D. dissertations and 40 master's theses. According to Google Scholar, his work has been cited more than 12,000 times, with an H-index of 52. He served as General Chair of the IEEE flagship conference SSCI 2023, and has served as an associate editor of journals including IEEE Transactions on Cybernetics, IEEE Transactions on Neural Networks and Learning Systems, Neurocomputing, Scientific Reports, and Intelligence & Robotics.
Abstract:
Finite horizon H-infinity control is essential for robust system design, particularly when guaranteed system performance is required over a specific time interval. Although the finite horizon formulation offers practical benefits over its infinite horizon counterpart, existing model-based frameworks present complexities, notably the time-varying nature of the Difference Riccati Equation (DRE), which significantly complicates solutions for systems with unknown dynamics. This talk presents a novel model-free method that leverages off-policy reinforcement learning (RL), known for its superior data efficiency and flexibility compared to the on-policy methods prevalent in the model-free H-infinity control literature. Recognizing the unique challenges of off-policy RL within the minimax optimization problem inherent to H-infinity control, we propose the Neural Network-based Double Minimax Q-learning (NN-DMQ) algorithm. This algorithm is specifically designed to handle the adversarial interaction between the controller and the worst-case disturbance, while also mitigating the bias introduced by Q-value overestimation, which can destabilize learning. A key theoretical contribution of this work is a rigorous convergence proof of the proposed Double Minimax Q-learning (DMQ) algorithm, which provides strong guarantees of the algorithm's stability and its ability to learn the optimal finite-horizon robust control and worst-case disturbance policies. Extensive experiments were performed to verify the effectiveness and robustness of the approach, demonstrating its applicability to challenging real-world control problems with unknown dynamics.
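For readers unfamiliar with the double minimax Q-learning idea, the sketch below shows a minimal tabular version on a toy finite-horizon problem. It is not the NN-DMQ algorithm of the talk: the toy dynamics, the H-infinity-style stage cost, the attenuation level, and all learning parameters are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

T, nS, nU, nW = 5, 4, 3, 3        # horizon, numbers of states / controls / disturbances
gamma_atten = 2.0                  # assumed disturbance attenuation level
alpha, episodes = 0.1, 20000       # learning rate and number of training episodes

# Toy transition table P[s, u, w] -> next state, and an H-infinity-style stage cost
# (the controller minimizes it, the worst-case disturbance maximizes it). All made up.
P = rng.integers(0, nS, size=(nS, nU, nW))
state_cost = rng.uniform(0.0, 1.0, size=nS)
u_cost = np.linspace(0.0, 1.0, nU) ** 2
w_gain = np.linspace(0.0, 1.0, nW) ** 2

def stage_cost(s, u, w):
    return state_cost[s] + u_cost[u] - gamma_atten ** 2 * w_gain[w]

# Two time-indexed Q tables (double Q-learning) to mitigate the bias that comes from
# taking min/max over noisy estimates; index T holds the zero terminal cost.
QA = np.zeros((T + 1, nS, nU, nW))
QB = np.zeros((T + 1, nS, nU, nW))

def minimax_pair(Q_uw):
    """Control minimizing the worst case over disturbances, and that worst disturbance."""
    u_star = int(Q_uw.max(axis=1).argmin())
    w_star = int(Q_uw[u_star].argmax())
    return u_star, w_star

for _ in range(episodes):
    s = int(rng.integers(nS))
    for t in range(T):
        u, w = int(rng.integers(nU)), int(rng.integers(nW))  # off-policy: uniform exploration
        s_next = int(P[s, u, w])
        c = stage_cost(s, u, w)
        if rng.random() < 0.5:
            # Select the minimax pair with QA, evaluate it with QB (and vice versa below).
            u_n, w_n = minimax_pair(QA[t + 1, s_next])
            QA[t, s, u, w] += alpha * (c + QB[t + 1, s_next, u_n, w_n] - QA[t, s, u, w])
        else:
            u_n, w_n = minimax_pair(QB[t + 1, s_next])
            QB[t, s, u, w] += alpha * (c + QA[t + 1, s_next, u_n, w_n] - QB[t, s, u, w])
        s = s_next

# Extract the learned stage-0 minimax control policy from the averaged tables.
Qavg = 0.5 * (QA + QB)
print("stage-0 control policy:", [minimax_pair(Qavg[0, s])[0] for s in range(nS)])
```

The decoupling of action selection and evaluation across the two tables is the double Q-learning device the abstract refers to for reducing overestimation bias; NN-DMQ replaces the tables with neural network approximators.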
All faculty and students are welcome to attend!
School of Information Science and Engineering
August 25, 2025