Beijing Institute of Mathematical Sciences and Applications (BIMSA)

Optimization Methods for Machine Learning
Stochastic Gradient Descent (SGD), in one form or another, is the workhorse method for training modern machine learning models. The field of SGD variants is vast and growing rapidly, making it difficult even for experts to survey its landscape. This course offers a mathematically rigorous and comprehensive introduction to the field, drawing on the most recent advances. It carefully develops a theory of convergence and complexity for serial, parallel, and distributed variants of SGD in the strongly convex, convex, and nonconvex settings, covering randomness arising from subsampling, compression, and other sources.
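As a running illustration (not part of the course materials), the basic single-sample SGD update can be sketched in a few lines; the least-squares problem, step size, and iteration count below are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: min_x (1/n) * sum_i (a_i^T x - b_i)^2
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

def sgd(A, b, steps=5000, lr=0.005):
    """Plain SGD with uniform single-sample subsampling."""
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        i = rng.integers(A.shape[0])           # sample one data point uniformly
        grad = 2.0 * (A[i] @ x - b[i]) * A[i]  # stochastic gradient of f_i
        x -= lr * grad
    return x

x_hat = sgd(A, b)
```

The per-step gradient is an unbiased estimate of the full gradient, which is exactly the property the course's convergence analyses build on.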

The curriculum also covers advanced techniques such as acceleration via Polyak momentum or Nesterov extrapolation. A substantial portion of the course is devoted to a unified analysis of a large family of SGD variants. Historically, these variants have demanded distinct intuitions, convergence analyses, and applications, evolving separately across different communities. The unified framework covers, among other techniques, variance reduction, data sampling, coordinate sampling, arbitrary sampling, importance sampling, mini-batching, quantization, sketching, dithering, and sparsification, as well as their combinations. This comprehensive treatment aims to equip learners with a deep understanding of SGD's intricate landscape and the ability to apply and build upon these methods in their own work.
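To make the two acceleration schemes mentioned above concrete, here is a minimal sketch on a toy quadratic; the step size and momentum parameter are illustrative choices, not tuned values:

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T H x, on which both updates are easy to compare.
H = np.diag([1.0, 10.0])
grad = lambda x: H @ x

def heavy_ball(x0, lr=0.05, beta=0.9, steps=200):
    """Polyak momentum: x_{k+1} = x_k - lr*grad(x_k) + beta*(x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        x, x_prev = x - lr * grad(x) + beta * (x - x_prev), x
    return x

def nesterov(x0, lr=0.05, beta=0.9, steps=200):
    """Nesterov extrapolation: the gradient is evaluated at a look-ahead point."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        y = x + beta * (x - x_prev)   # extrapolation step
        x, x_prev = y - lr * grad(y), x
    return x

x0 = np.array([5.0, 5.0])
x_hb = heavy_ball(x0)
x_nes = nesterov(x0)
```

The only difference between the two loops is where the gradient is evaluated: at the current iterate (Polyak) versus at the extrapolated point (Nesterov), which is exactly the distinction the course analyzes.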
Lecturer
Yi-Shuai Niu
Date
April 16 – June 11, 2024
Location
Weekday: Tuesday, Thursday
Time: 19:10 - 21:35
Venue: A3-2-303
Online: ZOOM, ID 11 435 529 7909, Password BIMSA
Prerequisites
Linear Algebra, Calculus, Convex Analysis, Probability Theory
Syllabus
1. Introduction
2. Basic Tools from Convex Analysis, Optimization and Probability
3. Gradient Descent
4. Stochastic Gradient Descent (with Sampling and Minibatching)
5. Acceleration (Polyak Momentum and Nesterov Acceleration)
6. Adaptive Learning Rate (AdaGrad, RMSProp, AdaDelta and ADAM)
7. SGD with Gradient Shift
8. SGD with Control
9. Variance Reduction (SVRG and Loopless-SVRG, SAG and SAGA)
10. Distributed Training: Compressed Gradient Descent (CGD)
11. Randomized Coordinate Descent (RCD)
12. Federated Learning and Local Gradient Descent
13. General Convergence Analysis in the Convex Setting
14. General Convergence Analysis in the Nonconvex Setting
15. Stochastic Newton Method
16. Randomized BFGS
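As an illustration of item 9 above, the SVRG variance-reduction idea can be sketched as follows; the problem data, step size, and loop lengths are illustrative assumptions, not the course's reference implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Consistent least-squares problem: f(x) = (1/n) * sum_i (a_i^T x - b_i)^2
n, d = 100, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true

def grad_i(x, i):
    return 2.0 * (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return 2.0 * A.T @ (A @ x - b) / n

def svrg(outer=30, inner=500, lr=0.01):
    """SVRG: each inner step uses grad_i(x) - grad_i(x_ref) + full_grad(x_ref),
    an unbiased gradient estimate whose variance vanishes at the optimum."""
    x = np.zeros(d)
    for _ in range(outer):
        x_ref = x.copy()
        g_ref = full_grad(x_ref)      # full gradient at the snapshot point
        for _ in range(inner):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_ref, i) + g_ref
            x -= lr * v
    return x

x_svrg = svrg()
```

Unlike plain SGD, whose constant step size leaves a variance-induced error floor, the control-variate correction lets SVRG converge linearly to the exact minimizer.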
References
1. Lectures on Convex Optimization – Y. Nesterov
2. Learning Theory from First Principles – F. Bach
3. First-Order Methods in Optimization – A. Beck
4. Large-Scale Convex Optimization: Algorithms and Analyses via Monotone Operators – E.K. Ryu and W.T. Yin
5. First-order and Stochastic Optimization Methods for Machine Learning – G.H. Lan
6. Accelerated Optimization for Machine Learning: First-Order Algorithms – Z.C. Lin, H. Li, C. Fang
Audience
Undergraduate, Advanced Undergraduate, Graduate, Postdoc, Researcher
Video
Public
Lecture Notes
Public
Language
English
Lecturer Introduction
Yi-Shuai Niu is a tenured Associate Professor of Mathematics at the Beijing Institute of Mathematical Sciences and Applications (BIMSA), specializing in optimization, scientific computing, machine learning, and computer science. Before joining BIMSA in October 2023, he was a research fellow at the Hong Kong Polytechnic University (2021-2022) and an associate professor at Shanghai Jiao Tong University (2014-2021), where he led the "Optimization and Interdisciplinary Research Group" and held a double appointment at the ParisTech Elite Institute of Technology and the School of Mathematical Sciences. His earlier roles include a postdoctoral position at the University of Paris 6 (2013-2014) and junior researcher positions at both the French National Center for Scientific Research (CNRS) and Stanford University (2010-2012). He was also a lecturer at the National Institute of Applied Sciences (INSA) of Rouen, France (2007-2010), where he earned a Ph.D. in Mathematics-Optimization in 2010 and double master's degrees in Pure and Applied Mathematics and in Mathematical Engineering (Génie Mathématique) in 2006. His research covers a wide range of applied mathematics, with a focus on optimization theory, machine learning, high-performance computing, and software development. His work spans various interdisciplinary applications, including machine learning, natural language processing, self-driving cars, finance, image processing, turbulent combustion, polymer science, quantum chemistry and computing, and plasma physics. His contributions encompass fundamental research, emphasizing novel algorithms for large-scale nonconvex and nonsmooth problems, as well as practical implementations, focusing on efficient optimization solvers and scientific computing packages built with high-performance computing techniques. He has developed more than 33 pieces of software and published about 30 articles in prestigious journals and conferences, including SIAM Journal on Optimization, Journal of Scientific Computing, Combustion and Flame, and Applied Mathematics and Computation.
He has been the PI of 5 research grants and a member of 5 joint international research projects. He received the Shanghai Teaching Achievement Award (First Prize) in 2017, two Outstanding Teaching Awards (First Prize) at Shanghai Jiao Tong University in 2016 and 2017, and 17 awards in the international mathematical contests MCM/ICM, including the INFORMS best paper award in 2017.
CONTACT

Beijing Institute of Mathematical Sciences and Applications
No. 544, Hefangkou Village, Huaibei Town, Huairou District, Beijing 101408

Tel: 010-60661855
Email: administration@bimsa.cn