| Weekday | Time | Venue | Online | ID | Password |
|---|---|---|---|---|---|
| Sat & Sun | 09:00 - 18:00 | A6-101 | - | - | - |
| Time \ Date | 03-28 Sat | 03-29 Sun |
|---|---|---|
| 09:00-09:30 | 郑喜印 | 文再文 |
| 09:30-10:00 | 张立卫 | 游科友 |
| 10:20-10:50 | | 袁坤 |
| 10:30-11:00 | 刘歆 | |
| 10:50-11:20 | | 李安琪 |
| 11:00-11:30 | 丁超 | |
| 11:20-11:50 | | 孟德宇 |
| 14:00-14:30 | 王晓 | |
| 14:30-15:00 | 温金明 | |
| 15:20-15:50 | 孙海琳 | |
| 15:50-16:20 | 徐姿 | |
| 16:20-16:50 | 梁经纬 | |
*All times on this page are Beijing Time (GMT+8).
09:00-09:30 郑喜印
Weak Metric Regularity and Weak Aubin Property in Nonsmooth Optimization
Considering that metric regularity and the Aubin property in variational analysis are often too restrictive in applications, this paper introduces and studies weak metric regularity and the weak Aubin property. For an optimization problem $\mathcal{P}_A(f)$ with a nonsmooth objective function $f$ and a closed constraint set $A$, it is proved that the weak Aubin property of the solution mapping $x^*\mapsto \text{Argmin}_{x\in A}f_{x^*}(x)$, where $f_{x^*}(x) := f(x)-\langle x^*,x\rangle$, for the tilt-perturbed problems $\mathcal{P}_A(f_{x^*})$ implies its single-valuedness. In the convex case, it is proved that the KKT mapping of $\mathcal{P}_A(f)$ is weakly metrically regular at a reference point $(\bar{x},0)$ if and only if the solution mapping $x^*\mapsto \text{Argmin}_{x\in A}f_{x^*}(x)$ is single-valued and uniformly continuous on some neighborhood of $0$, if and only if the tilt-perturbed problems $\mathcal{P}_A(f_{x^*})$ with $x^*$ in some neighborhood of $0$ are uniformly well-posed solvable with respect to the same admissible function. Moreover, the metric subregularity of the normal cone mapping to $A$ is characterized by the well-posed solvability of constrained convex optimization problems on $A$ in a quantitative manner.
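As a toy illustration of tilt perturbation (not taken from the paper), take $f(x)=\tfrac12\|x\|^2$ and $A$ the nonnegative orthant: the tilt-perturbed minimizer is simply the projection of $x^*$ onto $A$, and the solution mapping is single-valued and 1-Lipschitz, the kind of behavior the (weak) Aubin property is designed to capture.

```python
import numpy as np

def tilt_argmin(x_star):
    # Argmin_{x in A} f(x) - <x_star, x> for f(x) = 0.5 * ||x||^2 and
    # A = the nonnegative orthant; minimizing coordinate-wise shows the
    # unique solution is the projection of x_star onto A.
    return np.maximum(x_star, 0.0)

# The solution mapping x_star -> tilt_argmin(x_star) is single-valued
# and 1-Lipschitz (nonexpansive, since projections onto convex sets are).
u, v = np.array([0.3, -0.2]), np.array([0.1, 0.4])
assert np.linalg.norm(tilt_argmin(u) - tilt_argmin(v)) <= np.linalg.norm(u - v)
```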
09:30-10:00 张立卫
A Quadratic-Approximation-Based Stochastic Approximation Method for Weakly Convex Stochastic Programming
We propose a novel stochastic approximation algorithm, termed PMQSopt, for solving weakly convex stochastic optimization problems defined by expectation-valued functions. The algorithm is constructed by integrating the proximal method of multipliers with quadratic approximations of the original stochastic problem. We analyze the sample complexity of PMQSopt in terms of the total number of stochastic gradient evaluations required. The convergence of the algorithm is characterized by three metrics corresponding to the $\epsilon$-KKT conditions: the average squared norm of the gradient of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation. For each of these metrics, we establish an expected convergence rate of $\mathcal{O}(T^{-1/4})$ after $T$ iterations. Moreover, we show that the gradient of the Lagrangian achieves an $\mathcal{O}(T^{-1/4})$ bound with probability at least $1-5n/T^{2/3}$, the constraint violation attains an $\mathcal{O}(T^{-1/4})$ bound with probability at least $1-2/T^{2/3}$, and the complementarity violation achieves an $\mathcal{O}(T^{-1/4})$ bound with probability at least $1-3/T^{2/3}$. All results are established under two mild conditions: (i) weak convexity of all problem functions, and (ii) the existence of a strictly feasible point. The proposed PMQSopt algorithm is a sequential strongly convex programming method that is readily implementable. Numerical experiments validate its efficiency.
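The Moreau-envelope gradient used as a stationarity measure above can be computed in closed form for simple functions. A minimal sketch (not the talk's algorithm) for $f(t)=|t|$, whose proximal operator is soft-thresholding, using the identity $\nabla e_{\lambda}f(x) = (x - \mathrm{prox}_{\lambda f}(x))/\lambda$:

```python
import numpy as np

def prox_abs(x, lam):
    # Proximal operator of f(t) = |t|: soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_grad(x, lam):
    # Gradient of the Moreau envelope e_lam f at x:
    # (x - prox_{lam f}(x)) / lam.
    return (x - prox_abs(x, lam)) / lam

# e_lam f is smooth even though f(t) = |t| is not, so the norm of this
# gradient is a usable stationarity measure for nonsmooth problems.
assert np.isclose(moreau_grad(2.0, 0.5), 1.0)   # equals sign(x) when |x| > lam
assert np.isclose(moreau_grad(0.2, 0.5), 0.4)   # equals x / lam when |x| <= lam
```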
10:30-11:00 刘歆
Momentum Stability and Adaptive Control in Stochastic Reconfiguration
Variational Monte Carlo (VMC) combined with expressive neural network wavefunctions has become a powerful route to high-accuracy ground-state calculations, yet its practical success hinges on efficient and stable wavefunction optimization. While stochastic reconfiguration (SR) provides a geometry-aware preconditioner motivated by imaginary-time evolution, its Kaczmarz-inspired variant, subsampled projected-increment natural gradient descent (SPRING), achieves state-of-the-art empirical performance. However, the effectiveness of SPRING is highly sensitive to the choice of a momentum-like parameter $\mu$. The origin of this sensitivity and the instability observed at $\mu=1$ have remained unclear. In this work, we clarify the distinct mechanisms governing the regimes $\mu<1$ and $\mu=1$. We establish convergence guarantees for $0\le\mu<1$ under mild assumptions, and construct counterexamples showing that $\mu=1$ can induce divergence via uncontrolled growth along kernel-related directions when the step-size is not summable. Motivated by this theoretical insight and extensive numerical experiments, we further propose a tuning-free adaptive strategy for selecting $\mu$ based on spectral flatness and subspace overlap. This approach achieves performance comparable to optimally tuned SPRING, while significantly improving robustness in VMC optimization.
11:00-11:30 丁超
Stratification for Nonlinear Semidefinite Programming
We develop a stratification framework for nonlinear semidefinite programming (NLSDP) that exposes the geometric structure underlying the generally nonsmooth Karush--Kuhn--Tucker (KKT) system. By exploiting the inertia stratification of the symmetric matrix space $\mathbb{S}^{n}$ and lifting it to the primal--dual space, we derive the corresponding stratum-wise variational analysis. In particular, we introduce stratum-restricted strong metric regularity and provide an exact characterization in terms of two verifiable weak-form conditions, namely the weak second order condition (W-SOC) and the weak strict Robinson constraint qualification (W-SRCQ). Moreover, the W-SRCQ is interpreted geometrically via transversality, which implies its genericity in the ambient space and its stability along individual strata. By combining the stratum-wise theory with an across-strata propagation analysis, we further show that classical strong-form regularity conditions are equivalent to the local uniform validity of their stratum-restricted counterparts. On the algorithmic side, we propose a globalized stratified Gauss--Newton method for minimizing a nonsmooth least-squares merit function, which incorporates stratum-tangential Levenberg--Marquardt steps, normal escape directions, and an eigenvalue-triggered correction mechanism. It is proved that the resulting iterates converge globally to a directionally stationary point. Furthermore, under the W-SOC and the strict Robinson constraint qualification (SRCQ) at a KKT pair, the method identifies the active stratum and attains local quadratic convergence. These regularity requirements are weaker than the standard nondegeneracy assumptions commonly imposed in the NLSDP literature.
14:00-14:30 王晓
Stochastic approximation methods for nonconvex constrained optimization
Nonconvex constrained optimization is a vital research area within the optimization community, encompassing a wide range of applications across various fields. However, addressing nonconvex constrained optimization presents significant challenges due to the large-scale data and inherent uncertainties as well as potentially nonconvex functional constraints in optimization models. In this talk, I will report our recent progress on stochastic approximation methods for nonconvex constrained optimization that include established complexity bounds and/or convergence properties.
14:30-15:00 温金明
Mathematical Principles and Algorithm Design for Integer and Sparse Signal Reconstruction
In applications such as communications, signal processing, global positioning systems, number theory, optimization, medical imaging, and radar localization, we often need to reconstruct an integer signal or a sparse signal from a linear model. Owing to the discreteness of integer signals and the highly underdetermined nature of the observation model for sparse signals, designing fast and efficient reconstruction algorithms is extremely difficult. This talk presents the theory of integer and sparse signal reconstruction, efficient algorithms, and remaining bottlenecks.
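As a small, self-contained illustration of the sparse-recovery setting (orthogonal matching pursuit is a standard greedy baseline, not necessarily among the algorithms covered in the talk): given a highly underdetermined linear model $y = Ax$ with $x$ sparse, OMP greedily builds the support one atom at a time.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x with y ~ A x."""
    m, n = A.shape
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares refit on the chosen support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x

# Noise-free demo: 50 measurements of a 2-sparse signal in dimension 100.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[[3, 17]] = [1.5, -2.0]
x_hat = omp(A, A @ x_true, k=2)
assert np.allclose(x_hat, x_true, atol=1e-6)
```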
15:20-15:50 孙海琳
Statistical Robustness of Kernel Learning Estimator with Respect to Data Perturbation
Inspired by the recent work \cite{guo2023statistical} on the statistical robustness of empirical risks in reproducing kernel Hilbert space (RKHS) where the training data are potentially perturbed or even corrupted, we take a step further in this paper to investigate the statistical robustness of the kernel learning estimator (the regularized empirical risk minimizer). We begin by deriving qualitative statistical robustness of the estimator for a broad class of convex loss functions when all of the training data are potentially perturbed under some topological structures, and then move on to consider the quantitative statistical robustness of the estimator for the specific case where the loss function is twice continuously differentiable and convex. In the latter case, we derive the first-order optimality condition of the regularized expected risk minimization problem, which is essentially a stochastic variational inequality problem (SVIP) in RKHS, and then use the SVIP as a platform to investigate local and global Lipschitz continuity of the regularized risk minimizer against perturbation of the probability distribution under the Fortet-Mourier metric. A crucial assumption in the analysis is that the perturbed data are independent and identically distributed. In some practical applications, this assumption may not be fulfilled if a small proportion of perceived data is seriously perturbed/contaminated. In this case, we use the influence function to investigate the impact of single data perturbation on the regularized risk minimizer. Differing from Steinwart and Christmann~\cite[Chapter 10]{StC08}, we concentrate on constrained expected risk minimization problems. The analysis essentially boils down to deriving an implicit function theorem for the SVIP in RKHS. Finally, we illustrate our theoretical analysis with a couple of academic examples.
15:50-16:20 徐姿
Restart-Free (Accelerated) Gradient Sliding Methods for Strongly Convex Composite Optimization
In this paper, we study a class of composite optimization problems whose objective function is given by the sum of a general smooth component and a nonsmooth component, together with a relatively simple nonsmooth term. While restart strategies are commonly employed in first-order methods to achieve optimal convergence under strong convexity, they introduce structural complexity and practical overhead, making algorithm design and nesting cumbersome. To address this, we propose a **restart-free** stochastic gradient sliding algorithm that eliminates the need for explicit restart phases when the simple nonsmooth component is strongly convex. Through a novel and carefully designed parameter selection strategy, we prove that the proposed algorithm achieves an $\epsilon$-solution with only $\mathcal{O}(\log(\frac{1}{\epsilon}))$ gradient evaluations for the smooth component and $\mathcal{O}(\frac{1}{\epsilon})$ stochastic subgradient evaluations for the nonsmooth component, matching the optimal complexity of existing multi-phase (restart-based) methods. Moreover, for the case where the nonsmooth component is structured, allowing the overall problem to be reformulated as a bilinear saddle-point problem, we develop a restart-free accelerated stochastic gradient sliding algorithm. We show that the resulting method requires only $\mathcal{O}(\log(\frac{1}{\epsilon}))$ gradient computations for the smooth component while preserving an overall iteration complexity of $\mathcal{O}(\frac{1}{\sqrt{\epsilon}})$ for solving the corresponding saddle-point problems. Our work thus provides simpler, restart-free alternatives that retain the optimal convergence guarantees of their more complex, restart-based counterparts.
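For context, a minimal sketch of the classic restart-based scheme that restart-free methods avoid (this is the textbook baseline, not the talk's algorithm): Nesterov's accelerated gradient with the momentum sequence reset every fixed number of iterations, which yields linear convergence under strong convexity.

```python
import numpy as np

def agd_with_restart(grad, x0, L, restart_every, iters):
    """Nesterov's accelerated gradient with fixed-period restarts: the
    multi-phase scheme whose overhead restart-free methods eliminate."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for k in range(iters):
        x_new = y - grad(y) / L
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
        if (k + 1) % restart_every == 0:
            y, t = x.copy(), 1.0  # restart: wipe the momentum sequence
    return x

# Strongly convex quadratic f(x) = 0.5 x^T Q x, condition number 100.
Q = np.diag([1.0, 100.0])
x = agd_with_restart(lambda z: Q @ z, np.array([1.0, 1.0]), L=100.0,
                     restart_every=30, iters=600)
assert np.linalg.norm(x) < 1e-3  # converges to the minimizer x* = 0
```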
16:20-16:50 梁经纬
Adaptive Dimension Reduction for Overlapping Group Sparsity
Typical dimension reduction techniques for sparse optimization involve screening strategies based on a dual certificate derived from the first-order optimality condition, approximating the gradients, or exploiting some inherent low-dimensional structure that an optimization algorithm promotes. Screening rules for the overlapping group lasso are generally less developed because the subgradient structure is more complex and the link between the sparsity pattern and the dual vector is generally indirect. In this talk, I will present a new strategy for certifying the support of the overlapping group lasso and demonstrate how it can be applied to significantly accelerate numerical methods.
09:00-09:30 文再文
Intelligent Mathematical Reasoning: Formalization and Theorem Proving
This talk explores a technical route for advancing intelligent mathematical reasoning through mathematical formalization. Unlike traditional proofs that rely on intuition, formalization requires every step to be rigorously justified, offering advantages such as high verification reliability, reusable theorems, support for modular thinking, and automated checking. We will review progress on formalizing mathematical optimization, and then present M2F, a tool for large-scale auto-formalization of the mathematical literature; the Lean MCP tool; ReasBook, a formalization library of mathematics textbooks; AMBER, a benchmark for evaluating formalization in applied mathematics; premise selection based on tree structures built from Lean kernel expression trees; and a structure-to-instance framework for automatic theorem formalization.
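As a minimal, self-contained illustration (not drawn from M2F or ReasBook) of what a machine-checked statement looks like: a one-line Lean 4 theorem over Mathlib's reals, where the kernel verifies every step rather than trusting intuition.

```lean
import Mathlib

-- Every real square is nonnegative; the proof term is the existing
-- Mathlib lemma `mul_self_nonneg`, checked by the Lean kernel.
theorem square_nonneg (a : ℝ) : 0 ≤ a * a := mul_self_nonneg a
```

Once proved, such a theorem is reusable anywhere in the library, the modularity advantage the talk highlights.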
09:30-10:00 游科友
Reliably Learn to Trim Multiparametric Quadratic Programs
In numerous critical applications, it is necessary to solve a sequence of convex multiparametric quadratic programs (mp-QPs) involving many linear inequalities under limited computational resources. This problem is nontrivial and has remained a central research topic in the control and optimization communities for decades. It is well recognized that handling inequality constraints is typically computationally expensive, even though a large portion of them may be redundant—that is, their removal does not affect the optimal solution. This work proposes an efficient algorithm that leverages knowledge from previously solved mp-QPs to reliably identify and remove redundant inequalities in a new mp-QP instance with a different parameter vector. As a result, the trimmed mp-QP becomes significantly easier to solve. We further extend the approach to linear model predictive control (MPC) problems formulated as mp-QPs. Importantly, the solutions of the mp-QP instances from both the offline phase and the closed-loop system can be exploited to eliminate redundant inequalities. We demonstrate that, under the proposed method, the number of remaining linear inequalities reduces to zero within a finite and explicitly characterized time step, and this convergence rate can be accelerated by increasing offline computational effort. Numerical results are provided to illustrate the effectiveness and computational advantages of the proposed learning-based constraint removal method.
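The textbook redundancy test that such trimming methods build on (a standard LP check, not the talk's learning-based algorithm) declares inequality $a_i^\top x \le b_i$ redundant if maximizing $a_i^\top x$ over the remaining constraints still cannot exceed $b_i$. A sketch using SciPy:

```python
import numpy as np
from scipy.optimize import linprog

def is_redundant(A, b, i):
    """Constraint i of {x : A x <= b} is redundant iff
    max a_i^T x over the other constraints is <= b_i.
    A relaxed copy a_i^T x <= b_i + 1 is kept so the LP stays bounded."""
    mask = np.arange(len(b)) != i
    res = linprog(-A[i],  # linprog minimizes, so negate to maximize a_i^T x
                  A_ub=np.vstack([A[mask], A[i:i + 1]]),
                  b_ub=np.append(b[mask], b[i] + 1.0),
                  bounds=[(None, None)] * A.shape[1])
    return res.status == 0 and -res.fun <= b[i] + 1e-9

# Unit box 0 <= x <= 1 in 2D plus the redundant half-plane x1 + x2 <= 3.
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.], [1., 1.]])
b = np.array([1., 1., 0., 0., 3.])
assert is_redundant(A, b, 4)      # x1 + x2 <= 3 never binds on the box
assert not is_redundant(A, b, 0)  # x1 <= 1 is essential
```

Running this test once per constraint costs one LP each; the point of the learning-based approach above is to avoid paying that cost at every new parameter vector.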
10:20-10:50 袁坤
Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement
Pre-training Large Language Models requires immense computational resources, making optimizer efficiency essential. The optimization landscape is highly anisotropic, with loss reduction driven predominantly by progress along flat directions. While matrix-based optimizers such as Muon and SOAP leverage fine-grained curvature information to outperform AdamW, their updates tend toward isotropy---relatively conservative along flat directions yet potentially aggressive along sharp ones. To address this limitation, we first establish a unified Riemannian Ordinary Differential Equation (ODE) framework that elucidates how common adaptive algorithms operate synergistically: the preconditioner induces a Riemannian geometry that mitigates ill-conditioning, while momentum serves as a Riemannian damping term that promotes convergence. Guided by these insights, we propose \textbf{LITE}, a generalized acceleration strategy that enhances training dynamics by applying larger Hessian damping coefficients and learning rates along flat trajectories. Extensive experiments demonstrate that LITE significantly accelerates both Muon and SOAP across diverse architectures (Dense, MoE), parameter scales (130M--1.3B), datasets (C4, Pile), and learning-rate schedules (cosine, warmup-stable-decay). Theoretical analysis confirms that LITE facilitates faster convergence along flat directions in anisotropic landscapes, providing a principled approach to efficient LLM pre-training.
10:50-11:20 李安琪
From Data Generation to Algorithmic Solution: Learning-Based Methods for Tree-Structured Combinatorial Optimization
This talk presents a systematic study of learning-based methods for tree-structured combinatorial optimization problems within an overall "from data generation to algorithmic solution" framework. To address the scarcity of training data and the lack of control over instances, we first construct a controllable instance generation method based on the singular value decomposition of matrices, enabling precise characterization of problem structure and optimal solutions. Building on this, we take a geometric viewpoint, characterize the feasible region of a linear program as a pseudo-tree structure, and introduce a Monte Carlo tree search mechanism to design a forward-looking optimal pivot rule. We then combine this with imitation learning to develop an efficient end-to-end solution strategy, realizing a direct mapping from data to decisions. Finally, we extend these methods to tree-structured combinatorial optimization problems exemplified by Boolean satisfiability, building a unified tree-search learning framework.
11:20-11:50 孟德宇
Filter Parametrization Methods and Their Applications
The convolution operator is the core of convolutional networks and plays a vital role in modern deep learning. However, the commonly used discrete convolution kernels lack flexible transformability, remaining functionally limited and unsuitable for operations such as rotation and scale invariance modeling and non-grid convolution. Filter parametrization (continuous convolution) methods are a key technique for alleviating these challenges, making them highly valuable for research. Currently, the study of filter parametrization methods is still in its early stages, with many limitations yet to be addressed. This report introduces a new filter parametrization method tailored for low-level vision tasks and demonstrates its applications through the construction of rotation-equivariant convolution operators, among other examples.