BIMSA

Modern control theory

This course explores the deep connections between optimal control and reinforcement learning, bridging classical techniques (Dynamic Programming, LQR, MPC) with modern data-driven methods (Q-Learning, Policy Gradient, Deep RL). Students will learn the mathematical foundations (Bellman equations, value and policy iteration), optimal control (LQR, LQG, Model Predictive Control), approximate DP and RL (Monte Carlo, TD Learning, Actor-Critic methods), and applications in robotics, autonomous systems, and finance. The course balances theory (convergence, stability) with implementation (Python examples).

Lecturer

Xiaopei Jiao (焦小沛)

Date

September 3, 2025 to January 7, 2026

Location

Weekday	Time	Venue	Online	ID	Password
Wednesday	14:20 - 16:55	A3-1-101	ZOOM 03	242 742 6089	BIMSA

Prerequisites

Linear algebra, probability theory, calculus, optimization

Syllabus

Foundations of Optimal Control & Exact DP
1: Introduction to Dynamic Programming
2: Deterministic Continuous-Time Optimal Control
3: Stochastic DP and the LQG Problem
4: Model Predictive Control (MPC)
5: Infinite Horizon Problems
6: Shortest Path Problems & Computational Methods

Approximate DP & RL Basics
7: Approximate Value Iteration
8: Monte Carlo & Temporal Difference Learning
9: Policy Gradient Methods
10: Approximate Policy Iteration

Advanced Topics
11: Robust DP and H-infinity Control
12: Multiagent RL and Games
13: Inverse Reinforcement Learning
14: Deep Reinforcement Learning

References

Bertsekas, D. P. (2019). Reinforcement Learning and Optimal Control. Athena Scientific.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
da Silva, B. C. (2022). Reinforcement Learning Lecture Notes (Fall 2022).

Audience

Undergraduate, Advanced Undergraduate, Graduate, Postdoc, Researcher

Video Public

No

Notes Public

No

Language

Chinese, English

Lecturer Intro

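Xiaopei Jiao received a bachelor's degree in 2017 from Zhiyuan College (physics class), Shanghai Jiao Tong University, and a Ph.D. in 2022 from the Department of Mathematical Sciences, Tsinghua University, under the supervision of Professor Stephen Shing-Toung Yau (IEEE Fellow, former tenured professor at the University of Illinois at Chicago). Jiao then held postdoctoral positions at the Beijing Institute of Mathematical Sciences and Applications (BIMSA) and at the University of Twente in the Netherlands (advisor: Professor Johannes Schmidt-Hieber, Fellow of the Institute of Mathematical Statistics). Current research interests include control theory, stochastic filtering, numerical methods for differential equations, and physics-informed deep learning. Jiao was awarded a grant from the National Young Scientists Fund (Category C) in 2025.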