Talk Title: Model-Free Finite Horizon H-Infinity Control: An Off-Policy Approach with Double Minimax Q-learning
Speaker: Wen Yu (余文)
Time: August 28, 2025, 09:30-10:30
Venue: Conference Room 228, Weixue Building, Lianhua Street Campus
Speaker Biography:
Wen Yu (余文) is a member of the Mexican Academy of Sciences, a full professor at the National Polytechnic Institute of Mexico, and is listed among the world's top 2% most-cited scientists. He received a B.S. degree in Automation from Tsinghua University in 1990, and M.S. and Ph.D. degrees in Automatic Control from Northeastern University in 1992 and 1995, respectively. From 1995 to 1996, he was a lecturer in the Department of Automatic Control at Northeastern University, and he has been with the National Polytechnic Institute of Mexico since 1996. From 2002 to 2003, he held a research position at the Mexican Petroleum Institute. From 2006 to 2007, he was a senior visiting research fellow at Queen's University Belfast, UK, and from 2009 to 2010, he was a visiting associate professor at the University of California, Santa Cruz, USA. Since 2006, he has also served as a visiting professor at Northeastern University.
He has published more than 500 academic papers, including over 200 journal papers, and has authored 8 monographs. He has supervised 38 Ph.D. dissertations and 40 master's theses. According to Google Scholar, his work has been cited more than 12,000 times, with an H-index of 52. He served as General Chair of the IEEE flagship conference SSCI 2023, and has served as an associate editor of journals including IEEE Transactions on Cybernetics, IEEE Transactions on Neural Networks and Learning Systems, Neurocomputing, Scientific Reports, and Intelligence & Robotics.
Abstract:
Finite horizon H-infinity control is essential for robust system design, particularly when guaranteed system performance is required over a specific time interval. Although the finite horizon formulation offers practical benefits over its infinite horizon counterpart, existing model-based frameworks present complexities, notably the time-varying nature of the Difference Riccati Equation (DRE), which significantly complicates solutions for systems with unknown dynamics. This talk presents a novel model-free method that leverages off-policy reinforcement learning (RL), known for its superior data efficiency and flexibility compared to the on-policy methods prevalent in the model-free H-infinity control literature. Recognizing the unique challenges of off-policy RL within the minimax optimization problem inherent to H-infinity control, we propose the Neural Network-based Double Minimax Q-learning (NN-DMQ) algorithm. This algorithm is specifically designed to handle the adversarial interaction between the controller and the worst-case disturbance, while also mitigating the bias introduced by Q-value overestimation, which can destabilize learning. A key theoretical contribution of this work is a rigorous convergence proof of the proposed Double Minimax Q-learning (DMQ) algorithm, which provides strong guarantees of the algorithm's stability and its ability to learn the optimal finite-horizon robust control and worst-case disturbance policies. Extensive experiments were performed to verify the effectiveness and robustness of the approach, demonstrating its applicability to challenging real-world control problems with unknown dynamics.
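For readers unfamiliar with the double minimax Q-learning idea, the sketch below shows a minimal tabular version on a toy finite-horizon problem. It is not the NN-DMQ algorithm of the talk: the toy dynamics, the H-infinity-style stage cost, the attenuation level, and all learning parameters are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

T, nS, nU, nW = 5, 4, 3, 3        # horizon, numbers of states / controls / disturbances
gamma_atten = 2.0                  # assumed disturbance attenuation level
alpha, episodes = 0.1, 20000       # learning rate and number of training episodes

# Toy transition table P[s, u, w] -> next state, and an H-infinity-style stage cost
# (the controller minimizes it, the worst-case disturbance maximizes it). All made up.
P = rng.integers(0, nS, size=(nS, nU, nW))
state_cost = rng.uniform(0.0, 1.0, size=nS)
u_cost = np.linspace(0.0, 1.0, nU) ** 2
w_gain = np.linspace(0.0, 1.0, nW) ** 2

def stage_cost(s, u, w):
    return state_cost[s] + u_cost[u] - gamma_atten ** 2 * w_gain[w]

# Two time-indexed Q tables (double Q-learning) to mitigate the bias that comes from
# taking min/max over noisy estimates; index T holds the zero terminal cost.
QA = np.zeros((T + 1, nS, nU, nW))
QB = np.zeros((T + 1, nS, nU, nW))

def minimax_pair(Q_uw):
    """Control minimizing the worst case over disturbances, and that worst disturbance."""
    u_star = int(Q_uw.max(axis=1).argmin())
    w_star = int(Q_uw[u_star].argmax())
    return u_star, w_star

for _ in range(episodes):
    s = int(rng.integers(nS))
    for t in range(T):
        u, w = int(rng.integers(nU)), int(rng.integers(nW))  # off-policy: uniform exploration
        s_next = int(P[s, u, w])
        c = stage_cost(s, u, w)
        if rng.random() < 0.5:
            # Select the minimax pair with QA, evaluate it with QB (and vice versa below).
            u_n, w_n = minimax_pair(QA[t + 1, s_next])
            QA[t, s, u, w] += alpha * (c + QB[t + 1, s_next, u_n, w_n] - QA[t, s, u, w])
        else:
            u_n, w_n = minimax_pair(QB[t + 1, s_next])
            QB[t, s, u, w] += alpha * (c + QA[t + 1, s_next, u_n, w_n] - QB[t, s, u, w])
        s = s_next

# Extract the learned stage-0 minimax control policy from the averaged tables.
Qavg = 0.5 * (QA + QB)
print("stage-0 control policy:", [minimax_pair(Qavg[0, s])[0] for s in range(nS)])
```

The decoupling of action selection and evaluation across the two tables is the double Q-learning device the abstract refers to for reducing overestimation bias; NN-DMQ replaces the tables with neural network approximators.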
All faculty and students are welcome to attend!
School of Information Science and Engineering
August 25, 2025