Structured composite non-smooth optimization has proved to be extremely useful in data science, where a common trend is the deluge of large-scale multidimensional data, which drives the need for more efficient optimization schemes. While, so far, primarily, first-order schemes have been widely used to solve such optimization problems, they are quickly approaching their natural (and provable) limitations. In contrast, higher-order methods, in particular quasi-Newton ones, are hardly used due to their lack of scalability, the complexity in their mathematical analysis and the deployment in the non-smooth case. TRINOM-DS will unlock these bottlenecks and develop theoretical, numerical and algorithmic advances to exploit the great potential of quasi-Newton-type schemes for non-smooth large-scale optimization problems, which are ubiquitous in data science. These algorithms will be developed in a variety of challenging settings, and are expected to have far reaching applications in data science, e.g., machine learning (optimal transport, deep learning, etc), imaging and computer vision. They will be implemented as fast optimization codes, possibly on dedicated architectures, that will be made publicly available following the philosophy of reproducible research. TRINOM-DS's members are very active players in the European school of optimization and data science, which makes this project a unique opportunity to give a scalable, computational and practical embodiment to higher-order non-smooth optimization methods.

Funded by the ANR DFG 2020 NLE.

Open Positions: We have several open PhD and PostDoc positions for this project. Applications are welcome from now on. If you are interested, please send an eMail to J. Fadili (jalal.fadili@ensicaen.fr) or P. Ochs (ochs@math.uni-tuebingen.de) and include your CV as well as a letter describing your interest and proficiency in the field.

Our Preliminary Work:

[BF12] S. Becker, J. Fadili: A quasi-Newton proximal splitting method.
In Advances in Neural Information Processing Systems (NIPS), pages 2618--2626. Curran Associates Inc., 2012.

In our joint work on proximal quasi-Newton methods, we have built the basement for this project. The contribution distinguishes significantly from related work, which either study the theory of variable metric methods or their practical implementation. In contrast, we developed in
[BF12] and [BFO19] an advanced and efficient proximal calculus for a common type of non-diagonal quasi-Newton metrics, which can be written as the sum of a diagonal matrix and a rank-r matrix. In particular, we have studied a proximal SR1- and BFGS-type quasi-Newton method with a favorable performance in practical convex optimization problems. While this contribution is a significant step towards the usage of quasi-Newton methods for non-smooth optimization, their establishment as generic state-of-the-art solvers for applications in large-scale data science requires successful completion of the work program in this project proposal.

The surprising equivalence between types of (proximal) quasi-Newton methods and extrapolated (proximal) gradient based algorithms has been established. A certain choice of extrapolation leads exactly to the proximal SR1-metric used in [BF12] and [BFO19]. This equivalence yields a novel point of view on proximal quasi-Newton methods and serves as a basis for further development for both types of algorithms.

In [VPF15] and [LFP17], we have developed a unifying theory for exact low-complexity model finite-time identification by proximal splitting methods for a large class of regularizers called partly smooth functions. In [FMP18] and [FGMP18], a key limiting assumption of [VPF15], [LFP17] was removed, but for a smaller class of regularizers coined mirror-stratifiable functions, while ensuring finite-time approximate (but not exact) model identification. These results were the first to explain empirical observations made in inverse problems and machine learning. They will provide a good basis for the design and the analysis of scalable quasi-Newton methods when applied to solve problems with low-complexity regularization.