Date 
Topic 
Comment 
15.12.2022 
PACBayesian Learning of Optimization Algorithms
We apply the PACBayes theory to the setting of learningtooptimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PACbounds) and explicit tradeoff between a high probability of convergence and a high convergence speed. Even in the limit case, where convergence is guaranteed, our learned optimization algorithms provably outperform related algorithms based on a (deterministic) worstcase analysis. Our results rely on PACBayes bounds for general, unbounded lossfunctions based on exponential families. By generalizing existing ideas, we reformulate the learning procedure into a onedimensional minimization problem and study the possibility to find a global minimum, which enables the algorithmic realization of the learning procedure. As a proofofconcept, we learn hyperparameters of standard optimization algorithms to empirically underline our theory.

Talk by Michael Sucker 
08.12.2022 
SAM as an Optimal Relaxation of Bayes 
Talk by Thomas Möllenhoff 
01.12.2022 
FixedPoint Automatic Differentiation of ForwardBackward Splitting Algorithms for Partly Smooth Functions
FixedPoint Automatic Differentiation of ForwardBackward Splitting Algorithms for Partly Smooth Functions A large class of nonsmooth practical optimization problems can be written as minimization of a sum of smooth and partly smooth functions. We consider such structured problems which also depend on a parameter vector and study the problem of differentiating its solution mapping with respect to the parameter which has far reaching applications in sensitivity analysis and parameter learning optmization problems. We show that under partial smoothness and other mild assumptions, Automatic Differentiation (AD) of the sequence generated by proximal splitting algorithms converges to the derivative of the solution mapping. For a variant of automatic differentiation, which we call FixedPoint Automatic Differentiation (FPAD), we remedy the memory overhead problem of the Reverse Mode AD and moreover provide faster convergence theoretically. We numerically illustrate the convergence and convergence rates of AD and FPAD on Lasso and Group Lasso problems and demonstrate the working of FPAD on prototypical practical image denoising problem by learning the regularization term.

Talk by Sheheryar Mehmood 
24.11.2022 
Inertial QuasiNewton Methods for Monotone Inclusion: Efficient Resolvent Calculus and PrimalDual Methods
We introduce an inertial quasiNewton ForwardBackward Splitting Algorithm to solve a class of monotone inclusion problems. While the inertial step is computationally cheap, in general, the bottleneck is the evaluation of the resolvent operator. A change of the metric makes its computation hard even for (otherwise in the standard metric) simple operators. In order to fully exploit the advantage of adapting the metric, we develop a new efficient resolvent calculus for a lowrank perturbed standard metric, which accounts exactly for quasiNewton metrics. Moreover, we prove the convergence of our algorithms, including linear convergence rates in case one of the two considered operators is strongly monotone. Beyond the general monotone inclusion setup, we instantiate a novel inertial quasiNewton PrimalDual Hybrid Gradient Method for solving saddle point problems. The favourable performance of our inertial quasiNewton PDHG method is demonstrated on several numerical experiments in image processing.

Talk by Shida Wang 
17.11.2022 
Bolte, Combettes, Pauwels: The Iterates of the FrankWolfe Algorithm May Not Converge, 2022. 
reading session 
10.11.2022 
Kunstner, Kumar, Schmidt: HomeomorphicInvariance of EM: NonAsymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent, 2021. 
reading session 
03.11.2022 
Mei, Bai, Montanari: The Landscape of Empirical Risk for Nonconvex Losses, 2017. 
reading session 
27.10.2022 
Aberdam, Beck: An Accelerated Coordinate Gradient Descent Algorithm for Nonseparable Composite Optimization, 2021. 
reading session 
20.10.2022 
Zhang, Sra: Firstorder Methods for Geodesically Convex Optimization, 2016. 
reading session 
13.10.2022 
Du, Jin, Lee, Jordan: Gradient Descent Can Take Exponential Time to Escape Saddle Points, 2017. 
reading session 
06.10.2022 
Subgradient sampling for nonsmooth nonconvex minimization
We focus on a stochastic minimization problem in the nonsmooth and nonconvex setting which applies for instance to the training of deep learning models. A popular way in the machine learning community to deal with this problem is to use stochastic gradient descent (SGD). This method combines both subgradient sampling and backpropagation, which is an efficient implementation of the chainrule formula on nonsmooth compositions. Due to the incompatibility of these operations in the nonsmooth world, the SGD can generate spurious critical points in the optimization landscape which does not guarantee the convergence of the iterates to the critical set. We will explain in this talk how the model of Conservative Gradients is compatible with subgradient sampling and backpropagation, allowing to obtain convergence results for nonsmooth SGD. By means of definable geometry, we will emphasize that functions used in machine learning are locally endown with geometric properties of piecewise affine functions. In this setting, chainruling nonsmooth functions, and sampling subgradients output conservative gradients, but also, spurious critical points are hardly attained when performing SGD in practice.

Talk by Tam Le 
29.09.2022 
Lyu, Li: Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2020. 
reading session 
22.07.2022 
An SDE perspective on stochastic convex optimization
We analyze the global and local behavior of gradientlike flows under stochastic errors towards the aim of solving convex optimization problems with noisy gradient input. We first study the unconstrained differentiable convex case, using a stochastic differential equation where the drift term is minus the gradient of the objective function and the diffusion term is either bounded or squareintegrable. In this context, under Lipschitz continuity of the gradient, our first main result shows almost sure convergence of the objective and the trajectory process towards a minimizer of the objective function. We also provide a comprehensive complexity analysis by establishing several new pointwise and ergodic convergence rates in expectation for the convex, strongly convex, and (local) Lojasiewicz case. The latter, which involves local analysis, is challenging and requires nontrivial arguments from measure theory. Then, we extend our study to the constrained case and more generally to certain nonsmooth situations. We show that several of our results have natural extensions obtained by replacing the gradient of the objective function by a cocoercive monotone operator. This makes it possible to obtain similar convergence results for optimization problems with an additively "smooth + nonsmooth" convex structure. Finally, we consider another extension of our results to nonsmooth optimization which is based on the Moreau envelope.

Talk by Rodrigo Maulen Soto 
15.09.2022 
Bianchi, Hachem, Schechtman: Convergence of constant step stochastic gradient descent for nonsmooth nonconvex functions, 2020. 
reading session 
08.09.2022 
Nonsmooth Implicit Differentiation for MachineLearning and Optimization 
Talk by Tony SilvetiFalls 
01.09.2022 
Jin, Koppel, Rajawat, Mokhtari: Sharpened QuasiNewton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood, 2022. 
reading session 
28.07.  25.08.2022 
Summer Break 
no session 
21.07.2022 
Luo, Chen: From differential equation solvers to accelerated firstorder methods for convex optimization, 2022. 
reading session 
14.07.2022 
Benaim, Hofbauer, Sorin: Stochastic Approximations and Differential Inclusions, 2005. 
reading session 
07.07.2022 
Davis, Drusvyatskiy, Kakade, Lee: Stochastic Subgradient Method Converges on Tame Functions, 2020. 
reading session 
30.06.2022 
Savarino, Schnörr: ContinuousDomain Assignment Flows, 2019. 
reading session 
23.06.2022 
Lewis, Malick: Alternating Projections on Manifolds, 2008. 
reading session 
16.06.2022 
Public Holiday 
no session 
09.06.2022 
Ahn, Sra: From Proximal Point Method to Nesterov's Acceleration, 2020. 
reading session 
02.06.2022 
Luo, Chen: From differential equation solvers to accelerated firstorder methods for convex optimization, 2022. 
reading session 
26.05.2022 
Public Holiday 
no session 
19.05.2022 
Drori, Taylor: Efficient firstorder methods for convex minimization: a constructive approach, 2020. 
reading session 
12.05.2022 
Li, Pong: Calculus of the exponent of KurdykaLojasiewicz inequality and its applications to linear convergence of firstorder methods, 2021. 
reading session 
05.05.2022 
Solving 01 ILPs with Binary Decision Diagrams
We present a Lagrange decomposition method for solving 01 integer linear programs occurring in structured prediction. We propose a sequential and a massively minmarginal averaging schemes for solving the Lagrangean dual and a perturbation technique for decoding primal solutions. For representing subproblems we use binary decision diagrams (BDDs), which support efficient computation of subproblem solutions and update steps. We present experimental results on combinatorial problems from MAP inference for Markov Random Fields, quadratic assignment and cell tracking for developmental biology. Our highly parallel GPU implementation improves, comes close to, or outperform some stateoftheart specialized heuristics while being problem agnostic.

Talk by Paul Swoboda 
28.04.2022 
Bredies, Chenchene, Lorenz, Naldi: Degenerate Preconditioned Proximal Point algorithms, 2021. 
reading session 
21.04.2022 
Frecon, Salzo, Pontil, Gasso: Bregman Neural Networks, 2022. 
reading session 
14.04.2022 
Pedregosa, Scieur: AverageCase Acceleration Through Spectral Density Estimation, 2020. 
reading session 
07.04.2022 
Li, Voroninski: Sparse Signal Recovery from Quadratic Measurements via Convex Programming, 2012. 
reading session 
31.03.2022 
Master Thesis Defense 
Talk by Michael Sucker 
24.03.2022 
Lewis, Tian: The structure of conservative gradient fields, 2021. 
reading session 
17.03.2022 
Progress of current research 
Talk by Sheheryar Mehmood 
10.03.2022 
Progress of current research 
Talk by Rodrigo Maulen 
03.03.2022 
Progress of current research 
Talk by JeanJacques Godeme 
24.02.2022 
Raskutti, Mukherjee: The information geometry of mirror descent, 2013. 
Reading Session 
17.02.2022 
Progress of current research 
Talk by Oskar Adolfson 
10.02.2022 
Muehlebach, Jordan: Continuoustime Lower Bounds for Gradientbased Algorithms, 2020. 
reading session 
03.02.2022 
An Inertial Newton Algorithm for Deep Learning

Talk by Camille Castera 
27.01.2022 
Alvarez, Attouch, Bolte, Redont: A secondorder gradientlike dissipative dynamical system with Hessiandriven damping, 2002. 
Reading Session 
20.01.2022 
Progress of current research 
Talk by Shida Wang 
13.01.2022 
Apidopoulos, Ginatta, Villa: Convergence rates for the HeavyBall continuous dynamics for nonconvex optimization, under PolyakĆojasiewicz conditioning, 2021. 
Reading Session 
23.12.2021 to 06.01.2022 
Break 
no session 
16.12.2021 
Calatroni, Chambolle: Backtracking Strategies for Accelerated Descent Methods with Smooth Composite Objectives, 2019. 
reading session 
09.12.2021 
Sahin, Eftekhari, Alacaoglu, Latorre, Cevher: An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints, 2019. 
reading session 
02.12.2021 
Muehlebach, Jordan: On Constraints in FirstOrder Optimization: A View from NonSmooth Dynamical Systems, 2020. 
reading session 
25.11.2021 
Andersen, Dahl, Vandenberghe: Logarithmic barriers for sparse matrix cones, 2013. 
reading session 
18.11.2021 
Nesterov: Implementable tensor methods in unconstrained convex optimization, 2021. 
reading session 
11.11.2021 
Wang, Fang, Liu: Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values, 2016. 
reading session 
04.11.2021 
Wilson, Recht, Jordan: A Lyapunov Analysis of Momentum Methods in Optimization, 2016. 
reading session 
28.10.2021 
Connor, Vandenberghe: On the equivalence of the primaldual hybrid gradient method and DouglasRachford splitting, 2020. 
reading session 
21.10.2021 
Bilevel Learning for Inverse Problems
Variational regularization techniques are dominant in the field of inverse problems. A drawback of these techniques is that they are dependent on a number of parameters which have to be set by the user. This issue can be approached by machine learning where we estimate these parameters from data. This is known as "Bilevel Learning" and has been successfully applied to many tasks, some as smalldimensional as learning a regularization parameter, others as highdimensional as learning a sampling pattern in MRI. While mathematically appealing this strategy leads to a nested optimization problem which is computationally difficult to handle. In this talk we discuss several applications of bilevel learning for imaging as well as new computational approaches. There are quite a few open problems in this relatively recent field of study, some of which I will highlight along the way.

Talk by Matthias Ehrhardt 
14.10.2021 
Fadili, Malick, Peyre: Sensitivity Analysis for MirrorStratifiable Convex Functions, 2017. 
reading session 
07.10.2021 
cancelled 
no session 
30.09.2021 
Gower, Schmidt, Bach, Richtarik: Variancereduced methods for machine learning, 2020. 
reading session 
23.09.2021 
Sabach, Teboulle: Faster LagrangianBased Methods in Convex Optimization, 2020. 
reading session 
16.09.2021 
Proximal Algorithms for Scalar and Vector Optimization Problems
This presentation will go through several techniques for solving both scalar and vector optimization problems with various objective functions. On scalar version of optimization problems, we intend to discuss how thirdorder differential equations, or equations with an order higher than three, can contribute to development of fast optimization methods. The first problem to consider is to minimize the realvalued, convex, and continuously differentiable function defined on a Hilbert space. The problem will then be altered to minimize non smooth and nonconvex functions. We show that the convergence rate of the continuous dynamical system and the proximal algorithm obtained from discretization of it is the same. At the end of this route, we want to analyze the convergence of trajectories towards optimal solutions in both continuous and discrete versions. In the case of vector optimization problems, we aim to present a model of Alternating Direction Method of Multipliers (ADMM) which is a simple and powerful proximal algorithm, while the problem is minimization of sum of two vectorvalued functions with a linear constraint respect to variables. This research is vital since there is a lack of literature on proximal techniques for tackling vector optimization problems.

Talk by Maede Ramazannejad 
09.09.2021 
Teboulle, Vaisbourd: Novel proximal gradient methods for nonnegative matrix factorization with sparsity constraints, SIIMS 13(1):381421, 2020. 
reading session 
08.2021 
Break 
no session 
29.07.2021 
Reading: Chatterji, Diakonikolas, Jordan, Bartlett: Langevin Monte Carlo without smoothness, 2020. 
reading session 
22.07.2021 
Pokutta, Spiegel, Zimmer: Deep Neural Network Training with FrankWolfe, 2020. 
reading session 
15.07.2021 
Drori, Teboulle: Performance of firstorder methods for smooth convex minimization: a novel approach, 2020. 
reading session 
08.07.2021 
Lin, Jin, Jordan: On Gradient Descent Ascent for NonconvexConcave Minimax Problems, 2020. 
reading session 
01.07.2021 
Nonconvex Minmax Optimization in Machine Learning: Algorithms and Applications.
Abstract: Minmax Optimization has broad applications in machine learning, e.g., Adversarial Robustness, AUC Maximization, Generative Adversarial Networks, etc. However, modern machine learning models are intrinsically nonconvex (e.g., deep learning, matrix factorization), which brings tremendous challenge for minmax optimization. In this talk, I will talk about my work on provably efficient algorithms for minmax optimization in the presence of nonconvexity. This talk will showcase a few results. First, I will talk about nonconvexconcave minmax optimization, with applications in deep AUC optimization. We designed polynomialtime algorithms to converge to stationary point, as well as maximal AUC value, under different assumptions. Second, I will talk about nonconvexnonconcave minmax optimization, with applications in training Generative Adversarial Networks. The key result is that under the Minty Variational Inequality (MVI) condition, we design polynomialtime algorithms to find stationary points.

Talk by Mingrui Liu 
24.06.2021 
Stochastic trustregion method with adaptive sample sizes for finitesum minimization problems.
In this talk, we present SIRTR (Stochastic Inexact Restoration TrustRegion method) for solving finitesum minimization problems. At each iteration, SIRTR approximates both function and gradient by sampling. The function sample size is computed using a deterministic rule inspired by the inexact restoration method, whereas the gradient sample size can be smaller than the sample size employed in function approximations. Notably, our approach may allow the decrease of the sample sizes at some iterations. We show that SIRTR eventually reaches full precision in evaluating the objective function and we provide a worstcase complexity result on the number of iterations required to achieve full precision. Numerical results on nonconvex binary classification problems confirm that SIRTR is able to provide accurate approximations way before the maximum sample size is reached and without requiring a problemdependent tuning of the parameters involved.

Talk by Simone Rebegoldi 
17.06.2021 
Jiang, Vandenberghe: "Bregman primaldual firstorder method and application to sparse semidefinite programming", 2020. 
reading session 
10.06.2021 
Salimans, Zhang, Radford, Metaxas: "Improving GANs Using Optimal Transport", ICLR 2018. 
reading session 
03.06.2021 
Public Holiday 
no session 
27.05.2021 
Cuturi, Teboul, Vert: "Differentiable Ranks and Sorting using Optimal Transport." NeurIPS, 2019. 
reading session 
20.05.2021 
Adaptive Acceleration for Firstorder Methods.
Abstract: Firstorder operator splitting methods are ubiquitous among many fields through science and engineering, such as inverse problems, image processing, statistics, data science and machine learning, to name a few. In this talk, through the fixedpoint sequence, I will first discuss a geometry property of firstorder methods when applying to solve nonsmooth optimization problems. Then I will discuss the limitation of current widely used "inertial acceleration" technique, and propose a trajectory following adaptive acceleration algorithm. Global convergence is established for the proposed acceleration scheme based on the perturbation of fixedpoint iteration. Locally, connections between the acceleration scheme and the well studied "vector extrapolation technique" in the field of numerical analysis will be discussed, followed by acceleration guarantees of the proposed acceleration scheme. Numeric experiments on various firstorder methods are provided to demonstrate the advantage of the proposed adaptive acceleration scheme.

Talk by Jingwei Liang 
13.05.2021 
Public Holiday 
no session 
06.05.2021 
Rockafellar : "Augmented Lagrange Multiplier Functions and Duality in Nonconvex Programming." SIAM Journal on Control, 1974. 
reading session 
29.04.2021 
Attouch, Bolte, Svaiter: "Convergence of descent methods for semialgebraic and tame problems: proximal algorithms, forwardbackward splitting, and regularized GaussSeidel methods." Mathematical Programming, 2013. 
reading session 
22.04.2021 
Implicit differentiation for fast hyperparameter selection in nonsmooth convex learning.
Abstract: Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zeroorder techniques. In this work we study firstorder methods when the inner optimization problem is convex but nonsmooth.
We show that the forwardmode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian. Using implicit differentiation, we show it is possible
to leverage the nonsmoothness of the inner problem to speed up the computation. Finally, we provide a bound on the error made on the hypergradient when the inner optimization problem is solved approximately. Results on regression
and classification problems reveal computational benefits for hyperparameter optimization, especially when multiple hyperparameters are required.

Talk by Quentin Bertrand 
15.04.2021 
Chen, Rubanova, Bettencourt, Duvenaud: "Neural ordinary differential equations." NeurIPS, 2018. 
reading session 
08.04.2021 
Bach: "Submodular functions: from discrete to continuous domains", Mathematical Programming, 2019. 
reading session 
01.04.2021 
Learning to optimize with unrolled algorithms
Abstract: When solving multiple optimization problems sharing the same underlying structure, using iterative algorithms designed for worst case scenario can be considered as inefficient. When one aim at having good solution in
average, it is possible to improve the performances by learning the weights of a neural networked designed to mimic an unfolded optimization algorithm. However, the reason why learning the weights of such a network would accelerate
the problem resolution is not always clear. In this talk, I will first review how one can design unrolled algorithms to solve the linear regression with l1 or TV regularization, with a particular focus on the choice of parametrization
and loss. Then, I will discuss the reasons why such procedure can lead to better results compared to classical optimization, with a particular focus on the choice of step sizes.

Talk by Thomas Moreau 
25.03.2021 
Wright: "Coordinate Descent Algorithms", Mathematical Programming 151:334, 2015. 
reading session 
18.03.2021 
Dragomir, d'Aspremont, Bolte: "Quartic firstorder methods for low rank minimization." Journal of Optimization Theory and Applications, 2021 
reading session 
11.03.2021 
Monsaingeon, Vorotnikov: "The Schrödinger problem on the noncommutative FisherRao space." Calculus of Variations and Partial Differential Equations, 2021. 
reading session 
04.03.2021 
Kolmogorov: "Recursive frankwolfe algorithms", 2020. 
reading session 
25.02.2021 
Bolte, Pauwels: "A mathematical model for automatic differentiation in machine learning". NeurIPS, 2020. 
reading session 
18.02.2021 
Dragomir, Taylor, d'Aspremont, and Bolte: "Optimal complexity and certification of Bregman firstorder methods". arXiv preprint, 2019. 
reading session 
11.02.2021 
Kolmogorov: "Convergent treereweighted message passing for energy minimization". IEEE transactions on pattern analysis and machine intelligence, 2006. 
reading session 
04.02.2021 
Dalalyan: "Theoretical guarantees for approximate sampling from smooth and logconcave densities". Journal of the Royal Statistical Society, 2017. 
reading session 
28.01.2021 
Djolonga, Krause: "Differentiable Learning of Submodular Models". Advances in Neural Information Processing Systems, 2017. 
reading session 
21.01.2021 
"A Stochastic Bregman PrimalDual Splitting Algorithm for Composite Optimization". 
Talk by Tony SilvetiFalls 
14.01.2021 
Van den Berg, Friedlander: "Sparse optimization with leastsquares constraints". SIAM Journal on Optimization, 2011. 
reading session 
17.12.2020 
Stella, Themelis, Patrinos: "Forwardbackward quasiNewton methods for nonsmooth optimization problems". Computational Optimization and Applications 67:443487, 2017. 
reading session 
10.12.2020 
"Lifting the Convex Conjugate in Lagrangian Relaxations". 
Talk by Emanuel Laude 
03.12.2020 
Dlask, Werner: "A Class of Linear Programs Solvable by Coordinatewise Minimization". 2020. 
reading session 
02.12.2020 
S. Wang: "Numerical scheme for Wasserstein distance on Manifold". Master's Thesis Mathematics, University of Bonn, 2020. 
Talk by Shida Wang 
26.11.2020 
Attouch, Chbani, Fadili, Riahi: "Firstorder optimization algorithms via inertial systems with Hessian driven damping". Mathematical Programming, 2020. 
reading session 
19.11.2020 
Ablin, Peyré, Moreau: "Superefficiency of automatic differentiation for functions defined as a minimum". ICML, 2020. 
reading session 
12.11.2020 
Mukkamala, Ochs: "Beyond Alternating Updates for Matrix Factorization with Inertial Bregman Proximal Gradient Algorithms". NeurIPS, 2019. 
Talk by W. Bender 
05.11.2020 
Ghadimi, Lan: Stochastic first and zerothorder methods for nonconvex stochastic programming. 2013. 
reading session 
29.10.2020 
Swoboda, Kolmogorov: "MAP inference via BlockCoordinate FrankWolfe Algorithm". CVPR, 2018. 
reading session 
22.10.2020 
Ahookhosh, Hien, Gillis, Patrinos: A block inertial Bregman proximal algorithm for nonsmooth nonconvex problems, 2020. 
reading session 
15.10.2020 
Pilanci, Ergen: "Neural Networks are Convex Regularizers". 2020. 
reading session 