
Disciplined Saddle Programming

Philipp Schiele (Department of Statistics, Ludwig-Maximilians-Universität München), Eric Luxenberg (Department of Electrical Engineering, Stanford University), and Stephen Boyd (Department of Electrical Engineering, Stanford University). Philipp Schiele and Eric Luxenberg contributed equally.
Abstract

We consider convex-concave saddle point problems, and more generally convex optimization problems we refer to as saddle problems, which include the partial supremum or infimum of convex-concave saddle functions. Saddle problems arise in a wide range of applications, including game theory, machine learning, and finance. It is well known that a saddle problem can be reduced to a single convex optimization problem by dualizing either the convex (min) or concave (max) objectives, turning a min-max problem into a min-min (or max-max) problem. Carrying out this conversion by hand can be tedious and error prone. In this paper we introduce disciplined saddle programming (DSP), a domain specific language (DSL) for specifying saddle problems, for which the dualizing trick can be automated. The language and methods are based on recent work by Juditsky and Nemirovski [JN22], who developed the idea of conic-representable saddle point programs, and showed how to carry out the required dualization automatically using conic duality. Juditsky and Nemirovski’s conic representation of saddle problems extends Nesterov and Nemirovski’s earlier development of conic representable convex problems; DSP can be thought of as extending disciplined convex programming (DCP) to saddle problems. Just as DCP makes it easy for users to formulate and solve complex convex problems, DSP allows users to easily formulate and solve saddle problems. Our method is implemented in an open-source package, also called DSP.

1 Introduction

We consider saddle problems, by which we mean convex-concave saddle point problems or, more generally, convex optimization problems that include the partial supremum or infimum of convex-concave saddle functions. Saddle problems arise in various fields such as game theory, robust and minimax optimization, machine learning, and finance.

While there are algorithms specifically designed to solve some types of saddle point or minimax problems, another approach is to convert them into standard convex optimization problems using a trick based on duality that can be traced back to at least the 1920s. The idea is to express the infima or suprema that appear in the saddle problem via their duals, which converts them to suprema or infima, respectively. Roughly speaking, this turns a min-max problem into a min-min (or max-max) problem, which can then be solved by standard methods. Specific cases of this trick are well known; the classical example is converting a matrix game, a specific saddle point problem, into a linear program (LP) [MVN53]. While the dualizing trick has been known and used for almost 100 years, it has always been done by hand, for specific problems. It can only be carried out by those who have a working knowledge of duality in convex optimization, and are aware of the trick.

In this paper we propose an automated method for carrying out the dualizing trick. Our method is based on the theory of conic representation of saddle point problems, developed recently by Juditsky and Nemirovski [JN22]. Based on this development, we have designed a domain specific language (DSL) for describing saddle problems, which we refer to as disciplined saddle programming (DSP). When a problem description complies with the syntax rules, i.e., is DSP-compliant, it is easy to verify that it is a valid saddle problem, and more importantly, automatically carry out the dualizing trick. We have implemented the DSL in an open source software package, also called DSP, which works with CVXPY [DB16], a DSL for specifying and solving convex optimization problems. DSP makes it easy to specify and solve saddle problems, without any expertise in (or even knowledge of) convex duality. Even for those with the required expertise to carry out the dualizing trick by hand, DSP is less tedious and error prone.

DSP is disciplined, meaning it is based on a small number of syntax rules that, if followed, guarantee that the specified problem is a valid saddle problem. It is analogous to disciplined convex programming (DCP) [GBY06], which is a DSL for specifying convex optimization problems. When a problem specification follows these syntax rules, i.e., is DCP-compliant, it is a valid convex optimization problem, and more importantly can be automatically converted to an equivalent cone program, and then solved. As a practical matter, DCP allows a large number of users to specify and solve even complex convex optimization problems, with no knowledge of the reduction to cone form. Indeed, most DCP users are blissfully unaware of how their problems are solved, i.e., a reduction to cone form. DCP was based on the theory of conic representations of convex functions and problems, pioneered by Nesterov and Nemirovski [NN92]. Widely used implementations of DCP include CVXPY [DB16], Convex.jl [Ude+14], CVXR [FNB20], YALMIP [Lof04], and CVX [GB14]. Like DCP did for convex problems, DSP makes it easy to specify and solve saddle problems, with most users unaware of the dualization trick and reduction used to solve their problems.

1.1 Previous and related work

Saddle problems.

Studying saddle problems is a long-standing area of research, resulting in many theoretical insights, numerous algorithms for specific classes of problems, and a large number of applications.

Saddle problems are often studied in the context of minimax or maximin optimization [DM90, DP95], which, while dating back to the 1920s and the work of von Neumann and Morgenstern on game theory [MVN53], continue to be active areas of research, with many recent advancements for example in machine learning [Goo+14]. A variety of methods have been developed for solving saddle point problems, including interior point methods [HT03, Nem99], first-order methods [Kor76, Nem04, Nes07, NO09, CLO13], and second-order methods [NP06, Nes08], where many of these methods are specialized to specific classes of saddle problems. Depending on the class of saddle problem, the methods differ in convergence rate. For example, for the subset of smooth minimax problems, an overview of rates for different curvature assumptions is given in [The+19]. Due to their close relation to Lagrange duality, saddle problems are commonly studied in the context of convex analysis (see, for example, [BV04, §5.4], [Roc70, §33–37], [RW09, §11.J], [BL06, §4.3]), with an analysis via monotone operators given in [RY22].

The practical usefulness of saddle programming in many applications is also increasingly well known. Many applications of saddle programming are robust optimization problems [BBC11, BTEGN09]. For example, in statistics, distributionally robust models can be used when the true distribution of the data generating process is not known [DA19]. Another common area of application is in finance, with [CPT18, §19.3–4] describing a range of financial applications that can be characterized as saddle problems. Similarly, [Boy+17, GI03, LB00] describe variations of the classical portfolio optimization problem as saddle problems.

Disciplined convex programming.

DCP is a grammar for constructing optimization problems that are provably convex, meaning that they can be solved globally, efficiently, and accurately. It is based on the rule that convexity of a function $f$ is preserved under composition if all inner expressions in arguments where $f$ is nondecreasing are convex, all inner expressions in arguments where $f$ is nonincreasing are concave, and all other inner expressions are affine. A detailed description of the composition rule is given in [BV04, §3.2.4]. Using this rule, functions can be composed from a small set of primitives, called atoms, where each atom has known curvature, sign, and monotonicity. Every function that can be constructed from these atoms according to the composition rule is convex, but the converse is not true. The DCP framework has been implemented in many programming languages, including MATLAB [GB14, Lof04], Python [DB16], R [FNB20], and Julia [Ude+14], and is used by researchers and practitioners in a wide range of fields.

Well-structured convex-concave saddle point problems.

As mentioned earlier, disciplined saddle programming is based on Juditsky and Nemirovski’s recent work on well-structured convex-concave saddle point problems [JN22].

1.2 Our contributions

We summarize our contributions as follows:

  • We introduce disciplined saddle programming, a domain specific language for specifying and solving convex-concave saddle problems. To solve the saddle problems, automated dualization is applied to the conic representation of the problem. We extend the existing literature by deriving a procedure that returns both the convex and concave coordinates of the saddle point. This also guarantees that a valid saddle point was found without the need to check for technical conditions (such as compactness). These developments make the theory of conic representable saddle problems practically applicable for the first time.

  • We specify and implement the first DSL that encodes sufficient conditions for conic representability of saddle problems. We develop an open-source Python package, also called DSP, providing a user-friendly interface for specifying and solving saddle problems. Using this implementation, we demonstrate the effectiveness of the framework by solving a variety of saddle problems from different application domains.

1.3 Outline

In §2 we describe saddle programming, which includes the classical saddle point problem, as well as convex problems that include functions described via partial minimization or maximization of a saddle function. We describe some typical applications of saddle programming in §3. In §4 we describe disciplined saddle programming, which is a way to specify saddle programs in such a way that validity is easy to verify, and the reduction to an equivalent cone program can be automated. We describe our implementation in §5, showing how saddle functions, saddle extremum functions, saddle point problems, and saddle problems are specified. We present numerical examples in §6.

2 Saddle programming

2.1 Saddle functions

A saddle function (also referred to as a convex-concave saddle function) $f:\mathcal{X}\times\mathcal{Y}\to\mathbf{R}$ is one for which $f(\cdot,y)$ is convex for any fixed $y\in\mathcal{Y}$, and $f(x,\cdot)$ is concave for any fixed $x\in\mathcal{X}$. The argument domains $\mathcal{X}\subseteq\mathbf{R}^n$ and $\mathcal{Y}\subseteq\mathbf{R}^m$ must be nonempty, closed, and convex. We refer to $x$ as the convex variable, and $y$ as the concave variable, of the saddle function $f$.

Examples.

  • Functions of $x$ or $y$ alone. A convex function of $x$, or a concave function of $y$, is a trivial example of a saddle function.

  • Lagrangian of a convex optimization problem. The convex optimization problem

    \begin{array}{ll}\mbox{minimize} & f_0(x)\\ \mbox{subject to} & Ax=b,\quad f_i(x)\leq 0,\quad i=1,\ldots,m,\end{array}

    with variable $x\in\mathbf{R}^n$, where $f_0,\ldots,f_m$ are convex and $A\in\mathbf{R}^{p\times n}$, has Lagrangian

    L(x,\nu,\lambda)=f_0(x)+\nu^T(Ax-b)+\lambda_1 f_1(x)+\cdots+\lambda_m f_m(x),

    for $\lambda\geq 0$ (elementwise). It is convex in $x$ and affine (and therefore also concave) in $y=(\nu,\lambda)$, so it is a saddle function with

    \mathcal{X}=\bigcap_{i=0,\ldots,m}\mathop{\bf dom}f_i,\qquad \mathcal{Y}=\mathbf{R}^p\times\mathbf{R}_+^m.
  • Bi-affine function. The function $f(x,y)=(Ax+b)^T(Cy+d)$, with $\mathcal{X}=\mathbf{R}^p$ and $\mathcal{Y}=\mathbf{R}^q$, is evidently a saddle function. The inner product $x^Ty$ is a special case of a bi-affine function. For a bi-affine function, either variable can serve as the convex variable, with the other serving as the concave variable.

  • Convex-concave inner product. The function $f(x,y)=F(x)^TG(y)$, where $F:\mathbf{R}^p\to\mathbf{R}^n$ is elementwise nonnegative and convex and $G:\mathbf{R}^q\to\mathbf{R}^n$ is elementwise nonnegative and concave, is a saddle function.

  • Weighted $\ell_2$ norm. The function

    f(x,y)=\left(\sum_{i=1}^n y_i x_i^2\right)^{1/2},

    with $\mathcal{X}=\mathbf{R}^n$ and $\mathcal{Y}=\mathbf{R}_+^n$, is a saddle function.

  • Weighted log-sum-exp. The function

    f(x,y)=\log\left(\sum_{i=1}^n y_i\exp x_i\right),

    with $\mathcal{X}=\mathbf{R}^n$ and $\mathcal{Y}=\mathbf{R}_+^n$, is a saddle function.

  • Weighted geometric mean. The function $f(x,y)=\prod_{i=1}^n y_i^{x_i}$, with $\mathcal{X}=\mathbf{R}_+^n$ and $\mathcal{Y}=\mathbf{R}_+^n$, is a saddle function.

  • Quadratic form with quasi-semidefinite matrix. The function

    f(x,y)=\left[\begin{array}{c}x\\ y\end{array}\right]^T \left[\begin{array}{cc}P & S\\ S^T & Q\end{array}\right] \left[\begin{array}{c}x\\ y\end{array}\right],

    where the matrix is quasi-semidefinite, i.e., $P\in\mathbf{S}_+^n$ (the set of symmetric positive semidefinite matrices) and $-Q\in\mathbf{S}_+^n$, is a saddle function.

  • Quadratic form. The function $f(x,Y)=x^TYx$, with $\mathcal{X}=\mathbf{R}^n$ and $\mathcal{Y}=\mathbf{S}_+^n$ (the set of symmetric positive semidefinite $n\times n$ matrices), is a saddle function.

  • As a more esoteric example, the function $f(x,Y)=x^TY^{1/2}x$, with $\mathcal{X}=\mathbf{R}^n$ and $\mathcal{Y}=\mathbf{S}_+^n$, is a saddle function.

Combination rules.

Saddle functions can be combined in several ways to yield saddle functions. For example, the sum of two saddle functions is a saddle function, provided the domains have nonempty intersection. A saddle function scaled by a nonnegative scalar is a saddle function. Scaling a saddle function by a nonpositive scalar and swapping its arguments yields a saddle function: $g(x,y)=-f(y,x)$ is a saddle function provided $f$ is. Saddle functions are also preserved by pre-composition of the convex and concave variables with affine functions, i.e., if $f$ is a saddle function, so is $f(Ax+b,Cy+d)$. Indeed, a bi-affine function is just the inner product with an affine pre-composition for each of the convex and concave variables.

2.2 Saddle point problems

A saddle point $(x^\star,y^\star)\in\mathcal{X}\times\mathcal{Y}$ is any point that satisfies

f(x^\star,y)\leq f(x^\star,y^\star)\leq f(x,y^\star)\quad\mbox{for all}\quad x\in\mathcal{X},\; y\in\mathcal{Y}. (1)

In other words, $x^\star$ minimizes $f(x,y^\star)$ over $x\in\mathcal{X}$, and $y^\star$ maximizes $f(x^\star,y)$ over $y\in\mathcal{Y}$. The basic saddle point problem is to find such a saddle point,

\mbox{find}\quad x^\star,\; y^\star\quad\mbox{which satisfy (1)}. (2)

The value of the saddle point problem is $f(x^\star,y^\star)$.

Existence of a saddle point for a saddle function is guaranteed, provided some technical conditions hold. For example, Sion's theorem [Sio58] guarantees the existence of a saddle point when $\mathcal{Y}$ is compact. There are many other cases.

Examples.

  • Matrix game. In a matrix game, player one chooses $i\in\{1,\ldots,m\}$, and player two chooses $j\in\{1,\ldots,n\}$, resulting in player one paying player two the amount $C_{ij}$. Player one wants to minimize this payment, while player two wishes to maximize it. In a mixed strategy, player one makes choices at random, from probabilities given by $x$, and player two makes independent choices with probabilities given by $y$. The expected payment from player one to player two is then $f(x,y)=x^TCy$. With $\mathcal{X}=\{x\mid x\geq 0,\;\mathbf{1}^Tx=1\}$, and similarly for $\mathcal{Y}$, a saddle point corresponds to an equilibrium, where no player can improve her position by changing (mixed) strategy. The saddle point problem consists of finding a stable equilibrium, i.e., an optimal mixed strategy for each player.

  • Lagrangian. A saddle point of a Lagrangian of a convex optimization problem is a primal-dual optimal pair for the convex optimization problem.

2.3 Saddle extremum functions

Suppose $f$ is a saddle function. The function $G:\mathcal{X}\to\mathbf{R}\cup\{\infty\}$ defined by

G(x)=\sup_{y\in\mathcal{Y}}f(x,y),\quad x\in\mathcal{X}, (3)

is called a saddle max function. Similarly, the function $H:\mathcal{Y}\to\mathbf{R}\cup\{-\infty\}$ defined by

H(y)=\inf_{x\in\mathcal{X}}f(x,y),\quad y\in\mathcal{Y}, (4)

is called a saddle min function. Saddle max functions are convex, and saddle min functions are concave. We will use the term saddle extremum (SE) function to refer to a saddle max or saddle min function. Which is meant is clear from context, i.e., whether it is defined by minimization (infimum) or maximization (supremum), or its curvature (convex or concave). Note that in SE functions, we always maximize (or take the supremum) over the concave variable, and minimize (or take the infimum) over the convex variable. This means that evaluating $G(x)$ or $H(y)$ involves solving a convex optimization problem.

Examples.

  • Dual function. Minimizing a Lagrangian $L(x,\nu,\lambda)$ over $x$ gives the dual function of the original convex optimization problem.

  • Maximizing a Lagrangian $L(x,\nu,\lambda)$ over $y=(\nu,\lambda)$ gives the objective function restricted to the feasible set.

  • Conjugate of a convex function. Suppose $f$ is convex. Then $g(x,y)=f(x)-x^Ty$ is a saddle function, the Lagrangian of the problem of minimizing $f$ subject to $x=0$. Its saddle min is the negative conjugate function: $\inf_x g(x,y)=-f^*(y)$.

  • Sum of $k$ largest entries. Consider $f(x,y)=x^Ty$, with $\mathcal{Y}=\{y\mid 0\leq y\leq 1,\;\mathbf{1}^Ty=k\}$. The associated saddle max function $G$ is the sum of the $k$ largest entries of $x$ (see the short numerical check below).

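The last example can be checked numerically: evaluating $G(x)=\sup_{y\in\mathcal{Y}}x^Ty$ is a small LP whose optimal value matches the sum of the $k$ largest entries of $x$. The sketch below uses CVXPY with made-up data; all names are illustrative only.

import cvxpy as cp
import numpy as np

np.random.seed(0)
n, k = 6, 3
x = np.random.randn(n)  # fixed value of the convex variable

# Evaluate G(x) = sup_{y in Y} x^T y by solving the LP over Y.
y = cp.Variable(n)
prob = cp.Problem(cp.Maximize(x @ y), [y >= 0, y <= 1, cp.sum(y) == k])
prob.solve()

# Compare with the sum of the k largest entries of x.
print(prob.value, np.sort(x)[-k:].sum())  # the two values agree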
Saddle points via SE functions.

A pair $(x^\star,y^\star)$ is a saddle point of a saddle function $f$ if and only if $x^\star$ minimizes the convex SE function $G$ defined in (3) over $x\in\mathcal{X}$, and $y^\star$ maximizes the concave SE function $H$ defined in (4) over $y\in\mathcal{Y}$. This means that we can find saddle points, i.e., solve the saddle point problem (2), by solving the convex optimization problem

\begin{array}{ll}\mbox{minimize} & G(x)\\ \mbox{subject to} & x\in\mathcal{X},\end{array} (5)

with variable $x$, and the convex optimization problem

\begin{array}{ll}\mbox{maximize} & H(y)\\ \mbox{subject to} & y\in\mathcal{Y},\end{array} (6)

with variable $y$. The problem (5) is called a minimax problem, since we are minimizing a function defined as the maximum over another variable. The problem (6) is called a maximin problem.

While the minimax problem (5) and maximin problem (6) are convex, they cannot be directly solved by conventional methods, since the objectives themselves are defined by maximization and minimization, respectively. There are solution methods specifically designed for minimax and maximin problems [LJJ20, MB09], but as we will see minimax problems involving SE functions can be transformed to equivalent forms that can be directly solved using conventional methods.

2.4 Saddle problems

In this paper we consider convex optimization problems that include SE functions in the objective or constraints, which we refer to as saddle problems. The convex problems (5) and (6) that solve the basic saddle point problem are special cases, where the objective is an SE function. As another example, consider the problem of minimizing a convex function $\phi$ subject to the convex SE constraint $G(x)\leq 0$, which can be expressed as

\begin{array}{ll}\mbox{minimize} & \phi(x)\\ \mbox{subject to} & f(x,y)\leq 0\quad\mbox{for all}\quad y\in\mathcal{Y},\end{array} (7)

with variable $x$. The constraint here is called a semi-infinite constraint, since (when $\mathcal{Y}$ is not a singleton) it can be thought of as an infinite collection of convex constraints, one for each $y\in\mathcal{Y}$ [HK93].

Saddle problems include the minimax and maximin problems (that can be used to solve the saddle point problem), and semi-infinite problems that involve SE functions. There are many other examples of saddle problems, where SE functions can appear in expressions that define the objective and constraints.

Robust cost LP.

As a more specific example of a saddle problem consider the linear program with robust cost,

\begin{array}{ll}\mbox{minimize} & \sup_{c\in\mathcal{C}}c^Tx\\ \mbox{subject to} & Ax=b,\quad x\geq 0,\end{array} (8)

with variable $x\in\mathbf{R}^n$, where $\mathcal{C}=\{c\mid Fc\leq g\}$. This is an LP with worst case cost over the polyhedron $\mathcal{C}$ [BBC11, BTEGN09]. It is a saddle problem with convex variable $x$, concave variable $c$, and an objective which is a saddle max function.

2.5 Solving saddle problems

Special cases with tractable analytical expressions.

There are cases where an SE function can be worked out analytically. An example is the max of a linear function over a box,

\sup_{l\leq y\leq u}y^Tx=(1/2)(u+l)^Tx+(1/2)(u-l)^T|x|,

where the absolute value is elementwise. We will see other cases in our examples.
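As a quick sanity check of this formula, the snippet below (with made-up data) compares the closed form against a direct numerical maximization over the box; all names are illustrative.

import cvxpy as cp
import numpy as np

np.random.seed(1)
n = 5
x = np.random.randn(n)
l, u = -np.ones(n), 2 * np.ones(n)  # box bounds with l <= u

# Closed-form expression for sup_{l <= y <= u} y^T x.
closed_form = 0.5 * (u + l) @ x + 0.5 * (u - l) @ np.abs(x)

# Direct maximization over the box.
y = cp.Variable(n)
prob = cp.Problem(cp.Maximize(x @ y), [y >= l, y <= u])
prob.solve()

print(closed_form, prob.value)  # the two values agree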

Subgradient methods.

We can readily compute a subgradient of a saddle max function (or a supergradient of a saddle min function) at a given input, by simply maximizing over the concave variable (minimizing over the convex variable), which is itself a convex optimization problem, and then obtaining a subgradient (supergradient) at that maximizer (minimizer). We can then use any method to solve the saddle problem using these subgradients, e.g., subgradient-type methods, ellipsoid method, or localization methods such as the analytic center cutting plane method. In [MB09] such an approach is used for general minimax problems.
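To illustrate the idea on the robust cost function $\sup_{Fc\leq g}c^Tx$ from §2.4, a maximizing $c$ is itself a subgradient of the saddle max function at $x$. The sketch below computes one such subgradient with CVXPY; the data $F$, $g$, and the point $x_0$ are made up for illustration.

import cvxpy as cp
import numpy as np

np.random.seed(2)
n = 4
# Polyhedral uncertainty set C = {c | F c <= g}, chosen here so that it is bounded.
F = np.vstack([np.eye(n), -np.eye(n)])
g = np.ones(2 * n)
x0 = np.random.randn(n)  # point at which we evaluate the saddle max

# Evaluate G(x0) = sup_{F c <= g} c^T x0; a maximizer c* is a subgradient of G at x0,
# since G(z) >= c*^T z = G(x0) + c*^T (z - x0) for all z.
c = cp.Variable(n)
prob = cp.Problem(cp.Maximize(x0 @ c), [F @ c <= g])
prob.solve()

G_x0 = prob.value
subgrad = c.value  # subgradient of G at x0
print(G_x0, subgrad)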

Methods for specific forms.

Many methods have been developed for finding saddle points of saddle functions with the special form

f(x,y)=x^TKy+\phi(x)+\psi(y),

where $\phi$ is convex, $\psi$ is concave, and $K$ is a matrix [BS15, Con13, CP11, Nes05, Nes05a, CP16]. Beyond this example, there are many other special forms of saddle functions, with different methods adapted to properties such as smoothness, separability, and strong-convex-strong-concavity.

2.6 Dual reduction

A well-known trick can be used to transform a saddle point problem into an equivalent problem that does not contain SE functions. This method of transforming an inner minimization is not new; it has been used since the 1950s, when von Neumann proved the minimax theorem using strong duality in his work with Morgenstern on game theory [MVN53]. Using this observation, he showed that the minimax problem of a two player game is equivalent to an LP. Duality allows us to express the convex (concave) SE function as an infimum (supremum), which facilitates the use of standard convex optimization methods. We think of this as a reduction to an equivalent problem that removes the SE functions from the objective and constraints.

Robust cost LP.

We illustrate the dualization method for the robust cost LP (8). The key is to express the robust cost, i.e., the saddle max function $\sup_{Fc\leq g}c^Tx$, as an infimum. We first observe that this saddle max function is the optimal value of the LP

\begin{array}{ll}\mbox{maximize} & x^Tc\\ \mbox{subject to} & Fc\leq g,\end{array}

with variable $c$. Its dual is

\begin{array}{ll}\mbox{minimize} & g^T\lambda\\ \mbox{subject to} & F^T\lambda=x,\quad \lambda\geq 0,\end{array}

with variable $\lambda$. With $\mathcal{C}=\{c\mid Fc\leq g\}$, and assuming $\mathcal{C}$ is nonempty, this dual problem has the same optimal value as the primal, i.e.,

\sup_{c\in\mathcal{C}}c^Tx=\inf_{\lambda\geq 0,\;F^T\lambda=x}g^T\lambda.

Substituting this into (8) we obtain the problem

\begin{array}{ll}\mbox{minimize} & g^T\lambda\\ \mbox{subject to} & Ax=b,\quad x\geq 0,\quad F^T\lambda=x,\quad \lambda\geq 0,\end{array} (9)

with variables $x$ and $\lambda$. This simple LP is equivalent to the original robust LP (8), in the sense that if $(x^\star,\lambda^\star)$ is a solution of (9), then $x^\star$ is a solution of the robust LP (8).
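The dualized problem (9) is an ordinary LP and can be written directly in CVXPY. The sketch below carries out this step by hand, which is exactly what DSP automates; the problem data A, b, F, g are made up for illustration.

import cvxpy as cp
import numpy as np

np.random.seed(3)
m, n = 3, 6
A = np.random.randn(m, n)
b = A @ np.abs(np.random.randn(n))       # ensures a feasible x >= 0 exists
F = np.vstack([np.eye(n), -np.eye(n)])   # cost uncertainty set {c | F c <= g}
g = np.hstack([2 * np.ones(n), np.ones(n)])

x = cp.Variable(n)
lam = cp.Variable(2 * n)

# The dualized robust cost LP (9): the worst case cost sup_{Fc<=g} c^T x
# has been replaced by g^T lambda with F^T lambda = x, lambda >= 0.
prob = cp.Problem(
    cp.Minimize(g @ lam),
    [A @ x == b, x >= 0, F.T @ lam == x, lam >= 0],
)
prob.solve()
print(prob.value, x.value)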

We will see this dualization trick in a far more general setting in §4.

3 Applications

In this section we describe a few applications of saddle programming.

3.1 Robust bond portfolio construction

We describe here a simplified version of the problem described in much more detail in [LSB22]. Our goal is to construct a portfolio of $n$ bonds, given by its holdings vector $h\in\mathbf{R}_+^n$, where $h_i$ is the number of bond $i$ held in the portfolio. Each bond produces a cash flow, i.e., a sequence of payments to the portfolio holder, up to some period $T$. Let $c_{i,t}$ be the payment from bond $i$ in time period $t$. Let $y\in\mathbf{R}^T$ be the yield curve, which gives the time value of cash: a payment of one dollar at time $t$ is worth $\exp(-ty_t)$ current dollars, assuming continuously compounded returns. The bond portfolio value, which is the present value of the total cash flow, can be expressed as

V(h,y)=\sum_{i=1}^n\sum_{t=1}^T h_i c_{i,t}\exp(-ty_t).

This function is convex in the yields $y$ and concave (in fact, linear) in the holdings vector $h$.

Now suppose we do not know the yield curve, but instead have a convex set $\mathcal{Y}$ of possible values, with $y\in\mathcal{Y}$. The worst case value of the bond portfolio, over this set of possible yield curves, is

V^{\mathrm{wc}}(h)=\inf_{y\in\mathcal{Y}}V(h,y).

We recognize this as a saddle min function. (In this application, $y$ is the convex variable of the saddle function $V$, whereas elsewhere in this paper we use $y$ to denote the concave variable.)

We consider a robust bond portfolio construction problem of the form

\begin{array}{ll}\mbox{minimize} & \phi(h)\\ \mbox{subject to} & h\in\mathcal{H},\quad V^{\mathrm{wc}}(h)\geq V^{\mathrm{lim}},\end{array} (10)

where $\phi$ is a convex objective, typically a measure of return and risk, $\mathcal{H}$ is a convex set of portfolio constraints (for example, imposing $h\geq 0$ and a total budget), and $V^{\mathrm{lim}}$ is a specified limit on the worst case value of the portfolio over the yield curve set $\mathcal{Y}$. The constraint $V^{\mathrm{wc}}(h)\geq V^{\mathrm{lim}}$ involves a saddle min function.

For some simple choices of $\mathcal{Y}$ the worst case value can be found analytically. One example is when $\mathcal{Y}$ has a maximum element. In this special case, the maximum element is the minimizer of the value over $\mathcal{Y}$ (since $V$ is a monotone decreasing function of $y$). For other cases, however, we need to solve the saddle problem (10).
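As a sketch of how (10) can be written with DSP (using the interface described in §5), the value function $V$ can be expressed with the saddle_inner atom, with the yields as a LocalVariable since they are minimized over. The data below (cash flow matrix, nominal yields, uncertainty radius, value limit, objective) are made up, and this is only an illustrative sketch, not the construction used in [LSB22].

import cvxpy as cp
import numpy as np
from dsp import saddle_inner, saddle_min, LocalVariable

np.random.seed(4)
n, T = 5, 10
C = np.random.uniform(0, 1, (n, T))     # C[i, t]: cash flow of bond i at time t
y_nom = np.full(T, 0.03)                # nominal yield curve
delta = 0.01                            # yield uncertainty radius
V_lim = 0.5 * np.sum(C)                 # required worst case value (illustrative)
t_vec = np.arange(1, T + 1)

h = cp.Variable(n, nonneg=True)         # holdings (concave variable of V)
y = LocalVariable(T)                    # yields (convex variable, minimized over)

# V(h, y) = sum_t (C^T h)_t * exp(-t y_t), a convex-concave inner product.
V = saddle_inner(cp.exp(cp.multiply(-t_vec, y)), C.T @ h)

# Worst case value over the yield uncertainty set, a concave function of h.
V_wc = saddle_min(V, [y >= y_nom - delta, y <= y_nom + delta])

prob = cp.Problem(cp.Minimize(cp.sum_squares(h - 1)),   # placeholder objective phi(h)
                  [cp.sum(h) <= n, V_wc >= V_lim])
prob.solve()
print(prob.value, h.value)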

3.2 Model fitting robust to data weights

We wish to fit a model parametrized by $\theta\in\Theta\subseteq\mathbf{R}^n$ to $m$ observed data points. We do this by minimizing a weighted loss over the observed data, plus a regularizer,

\sum_{i=1}^m w_i\ell_i(\theta)+r(\theta),

where $\ell_i$ is the convex loss function for observed data point $i$, $r$ is a convex regularizer function, and the weights $w_i$ are nonnegative. The weights can be used to adjust a data sample that was not representative, as in [BAB21], or to ignore some of the data points (by taking $w_i=0$), as in [BGM20]. Evidently the weighted loss is a saddle function, with convex variable $\theta$ and concave variable $w$.

We consider the case when the weights are unknown, but lie in a convex set, $w\in\mathcal{W}$. The robust fitting problem is to choose $\theta$ to minimize the worst case loss over the set of possible weights, plus the regularizer,

\max_{w\in\mathcal{W}}\sum_{i=1}^m w_i\ell_i(\theta)+r(\theta).

We recognize the first term, i.e., the worst case loss over the set of possible weights, as a saddle max function.

For some simple choices of $\mathcal{W}$ the worst case loss can be expressed analytically. For example with

\mathcal{W}=\{w\mid 0\leq w\leq 1,\;\mathbf{1}^Tw=k\}

(with $k\in[0,m]$), the worst case loss is given by

\max_{w\in\mathcal{W}}\sum_{i=1}^m w_i\ell_i(\theta)=\phi(\ell_1,\ldots,\ell_m),

where $\phi$ is the sum-of-$k$-largest entries function [BV04, §3.2.3]. (Our choice of symbol $k$ suggests that $k$ is an integer, but it need not be.) In this case we judge the model parameter $\theta$ by its worst loss on any subset of $k$ data points. Put another way, we judge $\theta$ by dropping the $m-k$ data points on which it does best (i.e., has the smallest loss) [BGM20].

CVXPY directly supports the sum-of-$k$-largest function, so the robust fitting problem can be formulated and solved without using DSP. To support this function, CVXPY carries out a transformation very similar to the one that DSP does. The difference is that the transformation in CVXPY is specific to this one function, whereas the one carried out in DSP is general, and works for other convex weight sets. One such case would be to constrain the Wasserstein distance between the weights and a nominal distribution.
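For the particular weight set above, a plain CVXPY formulation of the robust fitting problem looks as follows. This is a sketch with made-up regression data, a squared loss, and a sum-of-squares regularizer; cp.sum_largest is the CVXPY atom for the sum-of-$k$-largest function.

import cvxpy as cp
import numpy as np

np.random.seed(5)
m, n, k = 50, 5, 40
X = np.random.randn(m, n)
y_data = X @ np.random.randn(n) + 0.1 * np.random.randn(m)

theta = cp.Variable(n)
losses = cp.square(X @ theta - y_data)          # per-data-point convex losses
worst_case_loss = cp.sum_largest(losses, k)     # worst case over W, in closed form
reg = 0.1 * cp.sum_squares(theta)

prob = cp.Problem(cp.Minimize(worst_case_loss + reg))
prob.solve()
print(prob.value)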

3.3 Robust production problem with worst case prices

We consider the choice of a vector of quantities $q\in\mathcal{Q}\subseteq\mathbf{R}^n$. Positive entries indicate goods we buy, and negative entries are goods we sell. The set of possible quantities $\mathcal{Q}$ is our production set, which is convex. In addition, we have a manufacturing cost associated with the choice $q$, given by $\phi(q)$, where $\phi$ is a convex function. The total cost is the manufacturing cost plus the cost of goods (which includes revenue), $\phi(q)+p^Tq$, where $p\in\mathbf{R}^n$ is a vector of prices.

We consider the situation when we do not know the prices, but we have a convex set they lie in, $p\in\mathcal{P}$. The worst case cost of the goods is $\max_{p\in\mathcal{P}}p^Tq$. The robust production problem is

\begin{array}{ll}\mbox{minimize} & \phi(q)+\max_{p\in\mathcal{P}}p^Tq\\ \mbox{subject to} & q\in\mathcal{Q},\end{array} (11)

with variable $q$. Here too we can work out analytical expressions for simple choices of $\mathcal{P}$, such as a range for each component, in which case the worst case price is the upper limit for goods we buy, and the lower limit for goods we sell. In other cases, we solve the saddle problem (11).
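Using the DSP interface described in §5, the worst case goods cost can be written as a saddle_max over a LocalVariable for the prices, and used directly in the objective of a CVXPY problem. The sketch below assumes a box price set and simple made-up production constraints; all data and names are illustrative.

import cvxpy as cp
import numpy as np
from dsp import inner, saddle_max, LocalVariable

n = 4
p_lo = np.array([1.0, 1.0, 0.5, 0.5])      # lower price bounds
p_hi = np.array([2.0, 2.0, 1.5, 1.5])      # upper price bounds

q = cp.Variable(n)                          # quantities (convex variable)
p = LocalVariable(n)                        # prices (maximized over)

# Worst case cost of goods, max_{p_lo <= p <= p_hi} p^T q, as a saddle max.
worst_case_goods_cost = saddle_max(inner(q, p), [p >= p_lo, p <= p_hi])

phi = cp.sum_squares(q)                     # placeholder manufacturing cost
prob = cp.Problem(cp.Minimize(phi + worst_case_goods_cost),
                  [q >= -1, q <= 1, cp.sum(q) == 0])
prob.solve()
print(prob.value, q.value)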

3.4 Robust Markowitz portfolio construction

Markowitz portfolio construction  [Mar52] chooses a set of weights (the fraction of the total portfolio value held in each asset) by solving the convex problem

\begin{array}{ll}\mbox{maximize} & \mu^Tw-\gamma w^T\Sigma w\\ \mbox{subject to} & \mathbf{1}^Tw=1,\quad w\in\mathcal{W},\end{array}

where the variable is the vector of portfolio weights $w\in\mathbf{R}^n$, $\mu\in\mathbf{R}^n$ is a forecast of the asset returns, $\gamma>0$ is the risk aversion parameter, $\Sigma\in\mathbf{S}_{++}^n$ is a forecast of the asset return covariance matrix, and $\mathcal{W}$ is a convex set of feasible portfolios. The objective is called the risk adjusted (mean) return.

Markowitz portfolio construction is known to be fairly sensitive to the forecasts $\mu$ and $\Sigma$, which have to be chosen with some care; see, e.g., [BL91]. One approach is to specify a convex uncertainty set $\mathcal{U}$ that $(\mu,\Sigma)$ must lie in, and replace the objective with its worst case (smallest) value over this uncertainty set. This gives the robust Markowitz portfolio construction problem

\begin{array}{ll}\mbox{maximize} & \inf_{(\mu,\Sigma)\in\mathcal{U}}\left(\mu^Tw-\gamma w^T\Sigma w\right)\\ \mbox{subject to} & \mathbf{1}^Tw=1,\quad w\in\mathcal{W},\end{array}

with variable $w$. This is described in, e.g., [Boy+17, GI03, LB00]. We observe that this is directly a saddle problem, with a saddle min objective, i.e., a maximin problem.

For some simple versions of the problem we can work out the saddle min function explicitly. One example, given in [Boy+17], uses $\mathcal{U}=\mathcal{M}\times\mathcal{S}$, where

\mathcal{M}=\{\mu+\delta\mid |\delta_i|\leq\rho_i,\; i=1,\ldots,n\},\qquad \mathcal{S}=\{\Sigma+\Delta\mid \Sigma+\Delta\succeq 0,\; |\Delta_{ij}|\leq\eta(\Sigma_{ii}\Sigma_{jj})^{1/2},\; i,j=1,\ldots,n\},

where $\rho>0$ is a vector of uncertainties in the forecast returns, and $\eta\in(0,1)$ is a parameter that scales the perturbation to the forecast covariance matrix. (We interpret $\delta$ and $\Delta$ as perturbations of the nominal mean and covariance $\mu$ and $\Sigma$, respectively.) We can express the worst case risk adjusted return analytically as

\inf_{(\mu,\Sigma)\in\mathcal{U}}\left(\mu^Tw-\gamma w^T\Sigma w\right)=\mu^Tw-\gamma w^T\Sigma w-\rho^T|w|-\gamma\eta\left(\sum_{i=1}^n\Sigma_{ii}^{1/2}|w_i|\right)^2.

The first two terms are the nominal risk adjusted return; the last two terms (which are nonpositive) represent the cost of uncertainty.
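Since this worst case objective has a closed form for this particular uncertainty set, the robust problem can be written directly in CVXPY, as sketched below with made-up data (for general uncertainty sets one would instead express the objective as a DSP saddle min). All parameter values and the feasible set are illustrative.

import cvxpy as cp
import numpy as np

np.random.seed(7)
n = 6
mu = 0.05 + 0.05 * np.random.rand(n)       # nominal return forecast
A = np.random.randn(n, n)
Sigma = A @ A.T / n + 0.01 * np.eye(n)     # nominal covariance forecast
gamma, eta = 1.0, 0.2
rho = 0.01 * np.ones(n)                    # return forecast uncertainty

w = cp.Variable(n)
nominal = mu @ w - gamma * cp.quad_form(w, Sigma)
penalty = rho @ cp.abs(w) + gamma * eta * cp.square(np.sqrt(np.diag(Sigma)) @ cp.abs(w))

prob = cp.Problem(cp.Maximize(nominal - penalty), [cp.sum(w) == 1, cp.norm1(w) <= 2])
prob.solve()
print(prob.value, w.value)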

4 Disciplined saddle programming

4.1 Saddle function calculus

We use the notation $\phi(x,y):\mathcal{X}\times\mathcal{Y}\to\mathbf{R}$, with $\mathcal{X}\subseteq\mathbf{R}^n$ and $\mathcal{Y}\subseteq\mathbf{R}^m$, to denote a saddle function with convex variable $x$ and concave variable $y$. The set of operations that, when performed on saddle functions, preserve the saddle property is called the saddle function calculus. The calculus is quite simple, and consists of the following operations:

  1. Conic combination of saddle functions. Let $\phi_i(x_i,y_i)$, $i=1,\ldots,k$, be saddle functions, and let $\theta_i\geq 0$ for each $i$. Then the conic combination $\phi(x,y)=\sum_{i=1}^k\theta_i\phi_i(x_i,y_i)$ is a saddle function.

  2. Affine precomposition of saddle functions. Let $\phi(x,y)$ be a saddle function, with $x\in\mathbf{R}^n$ and $y\in\mathbf{R}^m$. Let $A\in\mathbf{R}^{n\times q}$, $b\in\mathbf{R}^n$, $C\in\mathbf{R}^{m\times p}$, and $d\in\mathbf{R}^m$. Then, with $u\in\mathbf{R}^q$ and $v\in\mathbf{R}^p$, the affine precomposition $\phi(Au+b,Cv+d)$ is a saddle function.

  3. Precomposition of saddle functions. Let $\phi(x,y):\mathcal{X}\times\mathcal{Y}\to\mathbf{R}$ be a saddle function, with $x\in\mathbf{R}^n$ and $y\in\mathbf{R}^m$. The precomposition with a function $f:\mathbf{R}^p\to\mathbf{R}^n$, $\phi(f(u),y)$, is a saddle function if for each $i=1,\ldots,n$ one of the following holds:

    • $f_i(u)$ is convex and $\phi$ is nondecreasing in $x_i$ for all $x\in\mathcal{X}$ and all $y\in\mathcal{Y}$.

    • $f_i(u)$ is concave and $\phi$ is nonincreasing in $x_i$ for all $x\in\mathcal{X}$ and all $y\in\mathcal{Y}$.

    Similarly, the precomposition with a function $g:\mathbf{R}^q\to\mathbf{R}^m$, $\phi(x,g(v))$, is a saddle function if for each $j=1,\ldots,m$ one of the following holds:

    • $g_j(v)$ is convex and $\phi$ is nonincreasing in $y_j$ for all $x\in\mathcal{X}$ and all $y\in\mathcal{Y}$.

    • $g_j(v)$ is concave and $\phi$ is nondecreasing in $y_j$ for all $x\in\mathcal{X}$ and all $y\in\mathcal{Y}$.

4.2 Conic representable saddle functions

Nemirovski and Juditsky propose a class of conic representable saddle functions which facilitate the automated dualization of saddle problems [JN22]. We will first introduce some terminology and notation, and then describe the class of conic representable saddle functions.

Notation.

We use the notation $\phi(x,y):\mathcal{X}\times\mathcal{Y}\to\mathbf{R}$, with $\mathcal{X}\subseteq\mathbf{R}^n$ and $\mathcal{Y}\subseteq\mathbf{R}^m$, to denote a saddle function which is convex in $x$ and concave in $y$. Let $K_x$, $K_y$, and $K$ be members of a collection $\mathcal{K}$ of closed, convex, and pointed cones with nonempty interiors in Euclidean spaces, such that $\mathcal{K}$ contains a nonnegative ray, is closed with respect to taking finite direct products of its members, and is closed with respect to passing from a cone to its dual. We denote conic membership $z\in K$ by $z\succeq_K 0$. We call a set $\mathcal{X}\subseteq\mathbf{R}^n$ $\mathcal{K}$-representable if there exist constant matrices $A$ and $B$, a constant vector $c$, and a cone $K\in\mathcal{K}$ such that

\mathcal{X}=\{x\mid \exists u: Ax+Bu\preceq_K c\}.

CVXPY [DB16] can implement a function $f$ exactly when its epigraph $\{(x,u)\mid f(x)\leq u\}$ is $\mathcal{K}$-representable.

Conic representable saddle functions.

Let $\mathcal{X}$ and $\mathcal{Y}$ be nonempty sets with $\mathcal{K}$-representations

\mathcal{X}=\{x\mid \exists u: Ax+Bu\preceq_K c\},\qquad \mathcal{Y}=\{y\mid \exists v: Cy+Dv\preceq_K e\}.

A saddle function $\phi(x,y):\mathcal{X}\times\mathcal{Y}\to\mathbf{R}$ is $\mathcal{K}$-representable if there exist constant matrices $P$, $Q$, $R$, constant vectors $p$ and $s$, and a cone $K\in\mathcal{K}$ such that for each $x\in\mathcal{X}$ and $y\in\mathcal{Y}$,

\phi(x,y)=\inf_{f,t,u}\{f^Ty+t\mid Pf+tp+Qu+Rx\preceq_K s\}.

Here $f$ is a vector of the same dimension as $y$, $t$ is a scalar, and $u$ is a vector. This definition generalizes the simple class of bilinear saddle functions. See [JN22] for much more detail.

Automated dualization.

Suppose we have a $\mathcal{K}$-representable saddle function $\phi$ as above. The conic form allows us to derive a dualized representation of the saddle extremum function

\Phi(x)=\sup_{y\in\mathcal{Y}}\phi(x,y),

which again admits a tractable conic form, meaning that it can be represented in a DSL like CVXPY. Specifically,

\begin{array}{rcl}
\Phi(x) &=& \sup_{y\in\mathcal{Y}}\phi(x,y)\\
&=& \sup_{y\in\mathcal{Y}}\inf_{f,t,u}\left\{f^Ty+t \mid Pf+tp+Qu+Rx\preceq_K s\right\}\\
&=& \inf_{f,t,u}\left\{\sup_{y\in\mathcal{Y}}\left(f^Ty+t\right) \mid Pf+tp+Qu+Rx\preceq_K s\right\}\\
&=& \inf_{f,t,u}\left\{\sup_{y\in\mathcal{Y}}\left(f^Ty\right)+t \mid Pf+tp+Qu+Rx\preceq_K s\right\}\\
&=& \inf_{f,t,u}\left\{\inf_{\lambda}\left\{\lambda^Te \mid C^T\lambda=f,\; D^T\lambda=0,\; \lambda\succeq_{K^*}0\right\}+t \mid Pf+tp+Qu+Rx\preceq_K s\right\},
\end{array} (15)

where the third equality uses Sion's minimax theorem [Sio58] to reverse the inf and sup, and the last equality in (15) invokes strong duality to replace the supremum over $y$ with an infimum over $\lambda$. Concretely, strong duality and the conic structure allow us to equate

\sup_y\left\{f^Ty \mid Cy+Dv\preceq_K e\right\}=\inf_{\lambda}\left\{\lambda^Te \mid C^T\lambda=f,\; D^T\lambda=0,\; \lambda\succeq_{K^*}0\right\},

where $K^*$ is the dual cone of $K$. This is exactly the automated dualization made possible by the conic representable form of $\phi$ (which DSP provides). Given the conic representation of $\phi$, the dualized form is obtained via the explicit formula given in (15).

The final line implies a conic representation of the epigraph of $\Phi(x)$,

\{(x,w)\mid \Phi(x)\leq w\}=\left\{(x,w) \mid \exists\lambda,f,t,u:\;\; \lambda^Te+t\leq w,\;\; C^T\lambda=f,\; D^T\lambda=0,\; \lambda\succeq_{K^*}0,\;\; Pf+tp+Qu+Rx\preceq_K s\right\},

which is tractable and can be implemented in a DSL like CVXPY. This transformation is exact, and so there is no notion of approximation error or optimality gap arising from the dualization procedure.

A mathematical nuance.

Switching the inf and sup in the derivation of (15) requires Sion's theorem to hold. A sufficient condition for Sion's theorem to hold is that the set $\mathcal{Y}$ is compact, though the min and max can often be exchanged even if $\mathcal{Y}$ is not compact. When the exchange is not valid, the max-min inequality

\max_{y\in\mathcal{Y}}\min_{x\in\mathcal{X}}f(x,y)\leq\min_{x\in\mathcal{X}}\max_{y\in\mathcal{Y}}f(x,y)

still holds, so the equality in (15) is replaced with a less than or equal to, and we obtain a convex restriction. Thus, if a user creates a problem involving an SE function (as opposed to a saddle point problem only containing saddle functions in the objective), then DSP guarantees that the problem generated is a restriction. This means that the variables returned are feasible and the returned optimal value is an upper bound on the optimal value for the user's problem.

Obtaining convex and concave saddle point coordinates.

One challenge that arises in transforming the mathematical concept of conic representable saddle functions into a practical implementation is that the automated dualization removes the concave variable from the problem. Additionally, the procedure relies on technical conditions such as compactness, which we believe should not be exposed in a user interface. We now address these points.

In our implementation, a saddle problem with an SE function in the objective is solved by applying the above automatic dualization to both the objective $\phi$ and $-\phi$, and then solving each resulting convex problem. Note that $\phi(x,y)$ is convex in $x$ and concave in $y$, while $-\phi(x,y)$ is concave in $x$ and convex in $y$. We do so in order to obtain both the convex and concave components of the saddle point, since the dualization removes the concave variable. To see this, note that (15) contains $x$ but not $y$ (and the opposite holds for the negated problem). The saddle problem is only reported as solved if the optimal value $u$ of the problem with objective $\phi$ is within a numerical tolerance of $l$, the negated optimal value of the problem with objective $-\phi$. If this holds, it actually implies that

\max_{y\in\mathcal{Y}}\min_{x\in\mathcal{X}}\phi(x,y)=\min_{x\in\mathcal{X}}\max_{y\in\mathcal{Y}}\phi(x,y),

i.e., the exchange of inf and sup in (15) was valid, even if, for example, $\mathcal{Y}$ is not compact. To see this, note that solving the problems for $\phi$ as well as $-\phi$ results in an upper and a lower bound on the optimal value of the saddle point problem,

\max_{y\in\mathcal{Y}}\min_{x\in\mathcal{X}}\phi(x,y)\leq\min_{x\in\mathcal{X}}\max_{y\in\mathcal{Y}}\phi(x,y)\leq u,\quad\text{and}\quad \max_{x\in\mathcal{X}}\min_{y\in\mathcal{Y}}-\phi(x,y)\leq\min_{y\in\mathcal{Y}}\max_{x\in\mathcal{X}}-\phi(x,y)\leq -l.

Using symmetry and combining the above inequalities, we obtain

l\leq\max_{y\in\mathcal{Y}}\min_{x\in\mathcal{X}}\phi(x,y)\leq\min_{x\in\mathcal{X}}\max_{y\in\mathcal{Y}}\phi(x,y)\leq u.

Suppose now that $l=u$. Note that since

\phi(x^\star,y^\star)\leq\max_{y\in\mathcal{Y}}\phi(x^\star,y)=u,\quad\text{and}\quad \phi(x^\star,y^\star)\geq\min_{x\in\mathcal{X}}\phi(x,y^\star)=l,

we have that $\phi(x^\star,y^\star)=l=u$. That is, the pair $(x^\star,y^\star)$ attains the optimal value of the saddle point problem. All that remains is to verify that this pair satisfies the saddle point property. We have that $\phi(x^\star,y)\leq\phi(x^\star,y^\star)$ for all $y\in\mathcal{Y}$, since otherwise $u=\phi(x^\star,y^\star)<\max_{y\in\mathcal{Y}}\phi(x^\star,y)=u$, a contradiction. Similarly, $\phi(x,y^\star)\geq\phi(x^\star,y^\star)$ for all $x\in\mathcal{X}$. Taken together, these inequalities state that $(x^\star,y^\star)$ is a saddle point, since

\phi(x^\star,y)\leq\phi(x^\star,y^\star)\leq\phi(x,y^\star),\quad\forall x\in\mathcal{X},\; y\in\mathcal{Y}.

Thus, a user need not concern themselves with the compactness of $\mathcal{Y}$ (or any other sufficient condition for Sion's theorem) when using DSP to find a saddle point; if a saddle point problem is solved, then the saddle point property is guaranteed to hold. This mathematical insight extends the work of [JN22], which assumes compactness of $\mathcal{Y}$, allowing users who might be unfamiliar with this technical restriction to use DSP.

5 Implementation

In this section we describe our Python implementation of the concepts and methods described in §4, which we also call DSP. It can be accessed online under an open source license at https://github.com/cvxgrp/dsp. DSP works with CVXPY [DB16], an implementation of a DSL for convex optimization based on DCP. We use the term DSP in two different ways. We use it to refer to the mathematical concept of disciplined saddle programming, and also to our specific implementation; which is meant should be clear from the context. The term DSP-compliant refers to a function or expression that is constructed according to the DSP composition rules given in §5.2. It can also refer to a problem that is constructed according to these rules. In the code snippets below, we use the prefix cp to indicate functions and classes from CVXPY. (We give functions and classes from DSP without prefix, whereas they would likely have a prefix such as dsp in real code.)

5.1 Atoms

Saddle functions in DSP are created from fundamental building blocks or atoms. These building blocks extend the atoms from CVXPY [DB16]. In CVXPY, atoms are either jointly convex or concave in all their variables, but in DSP, atoms are (jointly) convex in a subset of the variables and (jointly) concave in the remaining variables. We describe some DSP atoms below. The listing is not exhaustive, and additional atoms can be added as necessary.

Inner product.

The atom inner(x, y) represents the inner product $x^Ty$. Since either $x$ or $y$ could represent the convex variable, we adopt the convention in DSP that the first argument of inner is the convex variable. According to the DSP rules, both arguments to inner must be affine, and the variables they depend on must be disjoint.

Saddle inner product.

The atom saddle_inner(F, G) corresponds to the function $F(x)^TG(y)$, where $F$ and $G$ are vectors of elementwise nonnegative functions that are, respectively, convex and concave. It is DSP-compliant if F is DCP convex and nonnegative and G is DCP concave. If the function G is not DCP nonnegative, then the DCP constraint G >= 0 is attached to the expression. This is analogous to how the DCP constraint x >= 0 is added to the expression cp.log(x). As an example consider

f = saddle_inner(cp.square(x), cp.log(y)).

This represents the saddle function

f(x,y)=x^2\log y - I(y\geq 1),

where $I$ is the $\{0,\infty\}$ indicator function of its argument.

Weighted 2\ell_{2} norm.

The weighted_norm2(x, y) atom represents the saddle function $\left(\sum_{i=1}^n y_i x_i^2\right)^{1/2}$, with $y\geq 0$. It is DSP-compliant if x is either DCP affine or both DCP convex and nonnegative, and y is DCP concave. Here too, the constraint y >= 0 is added if y is not DCP nonnegative.

Weighted log-sum-exp.

The weighted_log_sum_exp(x, y) atom represents the saddle function $\log\left(\sum_{i=1}^n y_i\exp x_i\right)$, with $y\geq 0$. It is DSP-compliant if x is DCP convex, and y is DCP concave. The constraint y >= 0 is added if y is not DCP nonnegative.
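For instance, a small saddle point problem using this atom might be built as follows; this sketch assumes the MinimizeMaximize and SaddlePointProblem constructors described in §5.3 below, with made-up simplex constraints on both variables.

import cvxpy as cp
from dsp import weighted_log_sum_exp, MinimizeMaximize, SaddlePointProblem

x = cp.Variable(3)
y = cp.Variable(3)

f = weighted_log_sum_exp(x, y)          # convex in x, concave in y
obj = MinimizeMaximize(f)
prob = SaddlePointProblem(obj, [cp.sum(x) == 1, y >= 0, cp.sum(y) == 1])
prob.solve()
print(prob.value, x.value, y.value)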

Quasi-semidefinite quadratic form.

The quasidef_quad_form(x, y, P, Q, S) atom represents the function

f(x,y)=\left[\begin{array}{c}x\\ y\end{array}\right]^T \left[\begin{array}{cc}P & S\\ S^T & Q\end{array}\right] \left[\begin{array}{c}x\\ y\end{array}\right],

where the matrix is quasi-semidefinite, i.e., $P\in\mathbf{S}_+^n$ and $-Q\in\mathbf{S}_+^n$. It is DSP-compliant if x and y are DCP affine.

Quadratic form.

The saddle_quad_form(x, Y) atom represents the function $x^TYx$, where $Y$ is a PSD matrix. It is DSP-compliant if x is DCP affine, and Y is DCP PSD.

5.2 Calculus rules

The atoms can be combined according to the calculus described below to form expressions that are DSP-compliant. For example, saddle functions can be added or scaled. DCP-compliant convex and concave expressions are promoted to saddle functions with no concave or convex variables, respectively. For example, with variables x, y, and z, the expression

f = 2.5 * saddle_inner(cp.square(x), cp.log(y)) + cp.minimum(y,1) - z

is DSP-compliant, with convex variable x, concave variable y, and affine variable z.

Calling the is_dsp method of an expression returns True if the expression is DSP-compliant. The methods convex_variables, concave_variables, and affine_variables list the convex, concave, and affine variables, respectively. The convex variables are those that could only be convex, and similarly for concave variables. We refer to the convex variables as the unambiguously convex variables, and similarly for the concave variables. The three lists of variables give a partition of all the variables the expression depends on. For the expression above, f.is_dsp() evaluates as True, f.convex_variables() returns the list [x], f.concave_variables() returns the list [y], and f.affine_variables() returns the list [z]. Note that the role of z is ambiguous in the expression, since it could be either a convex or concave variable.

No mixing variables rule.

The DSP rules prohibit mixing of convex and concave variables. For example if we add two saddle expressions, no variable can appear in both its convex and concave variable lists.

DSP-compliance is sufficient but not necessary to be a saddle function.

Recall that if an expression is DCP convex (concave), then it is convex (concave), but the converse is false. For example, the expression cp.sqrt(1 + cp.square(x)) represents the convex function $\sqrt{1+x^2}$, but is not DCP. But we can express the same function as cp.norm2(cp.hstack([1, x])), which is DCP. The same holds for DSP and saddle functions: if an expression is DSP-compliant, then it represents a saddle function; but it can represent a saddle function and not be DSP-compliant. As with DCP, such an expression would need to be rewritten in DSP-compliant form to use any of the other features of DSP (such as a solution method). As an example, the expression x.T @ C @ y represents the saddle function $x^TCy$, but is not DSP-compliant. The same function can be expressed as inner(x, C @ y), which is DSP-compliant. While this restrictive syntax is an inherent limitation of disciplined convex programming in general, it is required for any parser based on the DSP composition rules.

When there are affine variables in a DSP-compliant expression, it means that those variables could be considered either convex or concave; either way, the function is a saddle function.

Example.

The code below defines the bilinear saddle function $f(x,y)=x^TCy$, the objective of a matrix game, with $x$ the convex variable and $y$ the concave variable.

Creating a saddle function.
 1  from dsp import *  # notational convenience
 2  import cvxpy as cp
 3  import numpy as np
 4
 5  x = cp.Variable(2)
 6  y = cp.Variable(2)
 7  C = np.array([[1, 2], [3, 1]])
 8
 9  f = inner(x, C @ y)
10
11  f.is_dsp()  # True
12
13  f.convex_variables()  # [x]
14  f.concave_variables()  # [y]
15  f.affine_variables()  # []

Lines 1–3 import the necessary packages (which we will use but not show in the sequel). In lines 5–7, we create two CVXPY variables and a constant matrix. In line 9 we construct the saddle function f using the DSP atom inner. Both its arguments are affine, so this matches the DSP rules. In line 11 we check that f is DSP-compliant, which it is. In lines 13–15 we call functions that return lists of the convex, concave, and affine variables, respectively. The results of lines 13–15 might seem odd, but recall that inner marks its first argument as convex and its second as concave.

5.3 Saddle point problems

Saddle point problem objective.

To construct a saddle point problem, we first create an objective using

obj = MinimizeMaximize(f),

where f is a CVXPY expression. The objective obj is DSP-compliant if the expression f is DSP-compliant. This is analogous to the CVXPY constructors cp.Minimize(f) and cp.Maximize(f), which create objectives from expressions.

Saddle point problem.

A saddle point problem is constructed using

prob = SaddlePointProblem(obj, constraints, cvx_vars, ccv_vars)

Here, obj is a MinimizeMaximize objective, constraints is a list of constraints, cvx_vars is a list of convex variables and ccv_vars is a list of concave variables. The objective must be DSP-compliant for the problem to be DSP-compliant. We now describe the remaining conditions under which the constructed problem is DSP-compliant.

Each constraint in the list must be DCP, and can only involve convex variables or concave variables; convex and concave variables cannot both appear in any one constraint. The list of convex and concave variables partitions all the variables that appear in the objective or the constraints. In cases where the role of a variable is unambiguous, it is inferred, and does not need to be in either list. For example with the objective

MinimizeMaximize(weighted_log_sum_exp(x, y) + cp.exp(u) + cp.log(v) + z),

x and u must be convex variables, and y and v must be concave variables, and so do not need to appear in the lists used to construct a saddle point problem. The variable z, however, could be either a convex or concave variable, and so must appear in one of the lists.

The role of a variable can also be inferred from the constraints: Any variable that appears in a constraint with convex (concave) variables must also be convex (concave). With the objective above, the constraint z + v <= 1 would serve to classify z as a concave variable. With this constraint, we could pass empty variable lists to the saddle point constructor, since the roles of all variables can be inferred. When the roles of all variables are unambiguous, the lists are optional.

The roles of the variables in a saddle point problem prob can be found by calling prob.convex_variables() and prob.concave_variables(), which return lists of variables that together partition all the variables appearing in the objective or constraints. This is useful for debugging, to be sure that DSP agrees with you about the roles of all variables. A DSP-compliant saddle point problem must have an empty list of affine variables. (If it did not, the problem would be ambiguous.)

Solving a saddle point problem.

The solve() method of a SaddlePointProblem object canonicalizes and solves the problem. This involves checking the objective and constraints for DSP-compliance. The conic representation of the problem is obtained, which involves setting up an auxiliary problem and compiling it using CVXPY. Then, the dualization is carried out, which results in another CVXPY problem which is then solved to yield the objective value. This has the side effect of setting all convex variables’ .value attribute. To also obtain the values of the concave variables, the saddle point problem is solved again with a negated objective and the roles of the minimization and maximization variables reversed. We emphasize that as DSP acts as a compiler, it does not implement any optimization algorithms itself, but rather relies on the solvers accessible through CVXPY.

Example.

Here we create and solve a matrix game, continuing the example above where f was defined. We do not need to pass in lists of variables since their roles can be inferred.

Creating and solving a matrix game.
obj = MinimizeMaximize(f)
constraints = [x >= 0, cp.sum(x) == 1, y >= 0, cp.sum(y) == 1]
prob = SaddlePointProblem(obj, constraints)

prob.is_dsp() # True
prob.convex_variables() # [x]
prob.concave_variables() # [y]
prob.affine_variables() # []

prob.solve() # solves the problem
prob.value # 1.6666666666666667
x.value # array([0.66666667, 0.33333333])
y.value # array([0.33333333, 0.66666667])

5.4 Saddle extremum functions

Local variables.

An SE function has one of the forms

G(x) = \sup_{y \in \mathcal{Y}} f(x, y) \quad \text{or} \quad H(y) = \inf_{x \in \mathcal{X}} f(x, y),

where $f$ is a saddle function. Note that $y$ in the definition of $G$, and $x$ in the definition of $H$, are local or dummy variables, understood to have no connection to any other variable. Their scope extends only to the definition, and not beyond.

To express this subtlety in DSP, we use the class LocalVariable to represent these dummy variables. The variables that are maximized over (in a saddle max function) or minimized over (in a saddle min function) must be declared using the LocalVariable() constructor. Any LocalVariable in an SE function cannot appear in any other SE function.

Constructing SE functions.

We construct SE functions in DSP using

saddle_max(f, constraints) or saddle_min(f, constraints).

Here, f is a CVXPY scalar expression, and constraints is a list of constraints. We now describe the rules for constructing a DSP-compliant SE function.

If a saddle_max is being constructed, f must be DSP-compliant, and the function’s concave variables, and all variables appearing in the list of constraints, must be LocalVariables, while the function’s convex variables must all be regular Variables. A similar rule applies for saddle_min.

The list of constraints is used to specify the set over which the sup or inf is taken. Each constraint must be DCP-compliant, and can only contain LocalVariables.

With x a Variable, y_loc a LocalVariable, z_loc a LocalVariable, and z a Variable, consider the following two SE functions:

f_1 = saddle_max(inner(x, y_loc) + z, [y_loc <= 1])
f_2 = saddle_max(inner(x, y_loc) + z_loc, [y_loc <= 1, z_loc <= 1])

Both are DSP-compliant. For the first, calling f_1.convex_variables() would return [x, z], and calling f_1.concave_variables() would return [y_loc]. For the second, f_2.convex_variables() would return [x], and f_2.concave_variables() would return [y_loc, z_loc].

Let y be a Variable. Neither of the following is DSP-compliant:

f_3 = saddle_max(inner(x, y_loc) + z, [y_loc <= 1, z <= 1])
f_4 = saddle_max(inner(x, y) + z_loc, [y_loc <= 1, z_loc <= 1])

The first is not DSP-compliant because z is not a LocalVariable, but appears in the constraints. The second is not DSP-compliant because y is not a LocalVariable, but appears as a concave variable in the saddle function.

SE functions are DCP.

When they are DSP-compliant, a saddle_max is a convex function, and a saddle_min is a concave function. They can be used anywhere in CVXPY that a convex or concave function is appropriate: you can add them, compose them (in appropriate ways), and use them in the objective or on either side of constraints (in appropriate ways).
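For instance, a minimal sketch (with x a Variable and C a constant matrix, as in the examples in this section; the names y_tmp and F_se are ours):

y_tmp = LocalVariable(2)   # fresh local variable for this SE function
F_se = saddle_max(saddle_inner(C @ x, y_tmp), [y_tmp >= 0, cp.sum(y_tmp) == 1])

# F_se is a convex CVXPY expression, so it composes like any other:
obj = cp.Minimize(F_se + 0.1 * cp.sum_squares(x))  # add another convex term
cons = [F_se <= 2, x >= 0, cp.sum(x) == 1]         # use it on the convex side of <=
prob = cp.Problem(obj, cons)
prob.is_dsp()  # True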

Examples.

Now we provide full examples demonstrating construction of a saddle_max, which we can use to solve the matrix game described in §5.3 as a saddle problem involving an SE function.

Creating a saddle max.
# Creating variables
x = cp.Variable(2)

# Creating local variables
y_loc = LocalVariable(2)

# Convex in x, concave in y_loc
f = saddle_inner(C @ x, y_loc)

# Maximize over y_loc
G = saddle_max(f, [y_loc >= 0, cp.sum(y_loc) == 1])

Note that G is a CVXPY expression. Constructing a saddle_min works exactly the same way.

5.5 Saddle problems

A saddle problem is a convex problem that uses SE functions. To be DSP-compliant, the problem must be DCP (which implies all SE functions are DSP-compliant). When you call the solve method on a saddle problem involving SE functions, and the solve is successful, then all variables’ .value fields are overwritten with optimal values. This includes LocalVariables that the SE functions maximized or minimized over; they are assigned to the value of a particular maximizer or minimizer of the SE function at the value of the non-local variables, with no further guarantees.

Example.

We continue our example from §5.4 and solve the matrix game using the saddle max G.

Creating and solving a saddle problem using a saddle max to solve the matrix game.
prob = cp.Problem(cp.Minimize(G), [x >= 0, cp.sum(x) == 1])

prob.is_dsp() # True

prob.solve() # solving the problem
prob.value # 1.6666666666666667
x.value # array([0.66666667, 0.33333333])
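As noted above, a successful solve also assigns values to the local variables that were maximized over; continuing this example, we can inspect the inner maximizer:

y_loc.value  # one particular maximizer of f over the probability simplex,
             # evaluated at x.value (no guarantee about which maximizer is returned)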

6 Examples

In this section we give numerical examples, taken from §3, showing how to create DSP-compliant problems. The specific problem instances we take are small, since our main point is to show how easily the problems can be specified in DSP. But DSP will scale to far larger problem instances. Again, code and data for these examples are available at https://github.com/cvxgrp/dsp.

6.1 Robust bond portfolio construction

Our first example is the robust bond portfolio construction problem described in §3.1. We consider portfolios of $n=20$ bonds over a period of $T=60$ half-years, i.e., 30 years. The bonds are taken as representative ones in a global investment grade bond portfolio; for more detail, see [LSB22]. The payments from the bonds are given by $C \in \mathbf{R}^{20 \times 60}$, with the cash flow of bond $i$ in period $t$ denoted $c_{i,t}$. The goal is to choose holdings $h \in \mathbf{R}_{+}^{20}$, with the portfolio constraint set $\mathcal{H}$ given by

\mathcal{H} = \{ h \mid h \geq 0, \; p^T h = B \},

i.e., the investments must be nonnegative and have a total value (budget) $B$, which we take to be $100. Here $p \in \mathbf{R}_{+}^{20}$ denotes the prices of the bonds on September 12, 2022. The portfolio objective is

\phi(h) = \frac{1}{2} \| (h - h^{\mathrm{mkt}}) \circ p \|_1,

where $h^{\mathrm{mkt}} \in \mathbf{R}_{+}^{20}$ is the market portfolio scaled to a value of $100, and $\circ$ denotes Hadamard or elementwise multiplication. This is called the turn-over distance, since it tells us how much we would need to buy and sell to convert our portfolio to the market portfolio.

The yield curve set $\mathcal{Y}$ is described in terms of perturbations to the nominal or current yield curve $y^{\mathrm{nom}} \in \mathbf{R}^{60}$, which is the yield curve on September 12, 2022. We take

\mathcal{Y} = \left\{ y^{\mathrm{nom}} + \delta \;\middle|\; \|\delta\|_\infty \leq \delta^{\mathrm{max}}, \; \|\delta\|_1 \leq \kappa, \; \sum_{t=1}^{T-1} (\delta_{t+1} - \delta_t)^2 \leq \omega \right\}.

We interpret $\delta \in \mathbf{R}^{60}$ as a shock to the yield curve, which we limit elementwise, in absolute sum, and in smoothness. The specific parameter values are given by

\delta^{\mathrm{max}} = 0.02, \quad \kappa = 0.9, \quad \omega = 10^{-6}.

In the robust bond portfolio problem (10) we take $V^{\mathrm{lim}} = 90$, that is, the worst-case value of the portfolio cannot drop below $90 for any $y \in \mathcal{Y}$.
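For reference, and matching the code below (the notation $V^{\mathrm{wc}}$ is ours), the portfolio value under yield curve $y$ and its worst case over $\mathcal{Y}$ are

V(h, y) = \sum_{i=1}^{n} \sum_{t=1}^{T} h_i \, c_{i,t} \, e^{-t y_t}, \qquad V^{\mathrm{wc}}(h) = \inf_{y \in \mathcal{Y}} V(h, y),

so the robust constraint reads $V^{\mathrm{wc}}(h) \geq V^{\mathrm{lim}}$. Since $V$ is convex in $y$ and affine in $h$, the worst-case value $V^{\mathrm{wc}}$ is a concave saddle min function of $h$.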

We solve the problem using the following code, where we assume the cash flow matrix C, the price vector p, the nominal yield curve y_nom, and the market portfolio h_mkt are defined.

Robust bond portfolio construction.
 1  # Constants and parameters
 2  n, T = C.shape
 3  delta_max, kappa, omega = 0.02, 0.9, 1e-6
 4  B = 100
 5  V_lim = 90
 6
 7  # Creating variables
 8  h = cp.Variable(n, nonneg=True)
 9
10  delta = LocalVariable(T)
11  y = y_nom + delta
12
13  # Objective
14  phi = 0.5 * cp.norm1(cp.multiply(h, p) - cp.multiply(h_mkt, p))
15
16  # Creating saddle min function
17  V = 0
18  for i in range(n):
19      t_plus_1 = np.arange(T) + 1  # account for zero-indexing
20      V += saddle_inner(cp.exp(cp.multiply(-t_plus_1, y)), h[i] * C[i])
21
22  Y = [
23      cp.norm_inf(delta) <= delta_max,
24      cp.norm1(delta) <= kappa,
25      cp.sum_squares(delta[1:] - delta[:-1]) <= omega,
26  ]
27
28  V_wc = saddle_min(V, Y)
29
30  # Creating and solving the problem
31  problem = cp.Problem(cp.Minimize(phi), [h @ p == B, V_wc >= V_lim])
32  problem.solve()  # 15.32

We first define the constants and parameters in lines 2–5, before creating the variable for the holdings h in line 8, and the LocalVariable delta, which gives the yield curve perturbation, in line 10. In line 11 we define y as the sum of the current yield curve y_nom and the perturbation delta. The objective function is defined in line 14. Lines 17–20 define the saddle function V via the saddle_inner atom. The yield uncertainty set Y is defined in lines 22–26, and the worst-case portfolio value is defined in line 28 using saddle_min. We use the concave saddle min expression V_wc to create and solve a CVXPY problem in lines 31–32.

Table 1 summarizes the results. The nominal portfolio is the market portfolio, which has zero turn-over distance to the market portfolio, i.e., zero objective value. This nominal portfolio, however, does not satisfy the worst-case portfolio value constraint, since there are yield curves in 𝒴\mathcal{Y} that cause the portfolio value to drop to around $87, less than our limit of $90. The solution of the robust problem has turn-over distance $15.32, and satisfies the constraint that the worst-case value be at least $90.

                     Nominal portfolio   Robust portfolio
Turn-over distance   $0.00               $15.32
Worst-case value     $86.99              $90.00
Table 1: Turn-over distance and worst-case value for the nominal (market) portfolio and the robust portfolio. The nominal portfolio does not meet our requirement that the worst-case value be at least $90.

6.2 Model fitting robust to data weights

We consider an instance of the model fitting problem described in §3.2. We use the well known Titanic data set [HC17], which gives several attributes for each passenger on the ill-fated Titanic voyage, including whether they survived. A classifier is fit to predict survival based on the features sex, age (binned into three groups, 0–26, 26–53, and 53–80), and class (1, 2, or 3). These features are encoded as a Boolean vector $a_i \in \mathbf{R}^7$. The label $y_i = 1$ means passenger $i$ survived, and $y_i = -1$ otherwise. There are 1046 examples, but we fit our model using only the $m = 50$ passengers who embarked from Queenstown, one of three ports of embarkation. This is a somewhat non-representative sample; for example, the survival rate among Queenstown departures is 26%, whereas the overall survival rate is 40.8%. This is a common situation in machine learning, where the distribution of labels in the training data does not match that of the test dataset (known as label shift), for which we seek a robust solution.

We seek a linear classifier $\hat{y}_i = \mathop{\bf sign}(a_i^T \theta + \beta_0)$, where $\theta \in \mathbf{R}^7$ is the classifier parameter vector and $\beta_0 \in \mathbf{R}$ is the bias. The hinge loss and $\ell_2$ regularization are used, given by

\ell_i(\theta) = \max(0, 1 - y_i a_i^T \theta), \qquad r(\theta) = \eta \|\theta\|_2^2,

with $\eta = 0.05$.

The data is weighted to partially correct for the different survival rates for our training set (26%) and the whole data set (40.8%). To do this we set $w_i = z_1$ when $y_i = 1$ and $w_i = z_2$ when $y_i = -1$. We require $w \geq 0$ and $\mathbf{1}^T w = 1$, and

0.408 - 0.05 \leq \sum_{y_i = 1} w_i \leq 0.408 + 0.05.

Thus $\mathcal{W}$ consists of weights on the Queenstown departure samples that correct the survival rate to within 5% of the overall survival rate.
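Collecting these conditions (with $z_1$ and $z_2$ the two common weight values), the weight set is

\mathcal{W} = \left\{ w \;\middle|\; w \geq 0, \; \mathbf{1}^T w = 1, \; w_i = z_1 \text{ if } y_i = 1, \; w_i = z_2 \text{ if } y_i = -1, \; 0.408 - 0.05 \leq \textstyle\sum_{y_i = 1} w_i \leq 0.408 + 0.05 \right\}.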

The code shown below solves this problem, where we assume the data matrix is already defined as A_train (with rows aiTa_{i}^{T}), the survival label vector is defined as y_train, and the indicator of survival in the training set is defined as surv.

Model fitting robust to data weights.
 1  # Constants and parameters
 2  m, n = A_train.shape
 3  inds_0 = surv == 0
 4  inds_1 = surv == 1
 5  eta = 0.05
 6
 7  # Creating variables
 8  theta = cp.Variable(n)
 9  beta_0 = cp.Variable()
10  weights = cp.Variable(m, nonneg=True)
11  surv_weight_0 = cp.Variable()
12  surv_weight_1 = cp.Variable()
13
14  # Defining the loss function and the weight constraints
15  y_hat = A_train @ theta + beta_0
16  loss = cp.pos(1 - cp.multiply(y_train, y_hat))
17  objective = MinimizeMaximize(saddle_inner(loss, weights)
18                               + eta * cp.sum_squares(theta))
19
20  constraints = [
21      cp.sum(weights) == 1,
22      0.408 - 0.05 <= weights @ surv,
23      weights @ surv <= 0.408 + 0.05,
24      weights[inds_0] == surv_weight_0,
25      weights[inds_1] == surv_weight_1,
26  ]
27
28  # Creating and solving the problem
29  problem = SaddlePointProblem(objective, constraints)
30  problem.solve()

After defining the constants and parameters in lines 2–5, we specify the variables for the model coefficients and the weights in lines 8–9 and 10–12, respectively. The loss function and regularizer that make up the objective are defined next in lines 15–18. The weight constraints are defined in lines 20–26. The saddle point problem is created and solved in lines 29 and 30.

                 Nominal classifier   Robust classifier
Train accuracy   82.0%                80.0%
Test accuracy    76.0%                78.6%
Table 2: Train and test accuracy for the nominal and robust classification models.

The results are shown in Table 2. We report the test accuracy on all samples in the dataset with a port of embarkation other than Queenstown (996 samples). We see that while the robust classification model has slightly lower training accuracy than the nominal model, it achieves higher test accuracy, generalizing better from the non-representative training data than the nominal classifier, which uses uniform weights.
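For completeness, a sketch of how the accuracies in Table 2 can be computed from the fitted parameters; A_test and y_test (the held-out features and ±1 labels) are our names and are not defined above.

import numpy as np

theta_hat, beta_0_hat = theta.value, beta_0.value

# Fraction of correctly classified examples on the training and held-out data
train_acc = np.mean(np.sign(A_train @ theta_hat + beta_0_hat) == y_train)
test_acc = np.mean(np.sign(A_test @ theta_hat + beta_0_hat) == y_test)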

6.3 Robust Markowitz portfolio construction

We consider the robust Markowitz portfolio construction problem described in §3.4. We take $n = 6$ assets, which are the (five) Fama-French factors [FF15] plus a risk-free asset. The data is obtained from the Kenneth R. French data library [Fre22], with monthly return data available from July 1963 to October 2022. The nominal return and risk are the empirical mean and covariance of the returns. (These obviously involve look-ahead, but the point of the example is how to specify and solve the problem with DSP, not the construction of a real portfolio.) We take parameters $\rho = 0.02$, $\eta = 0.2$, and risk aversion parameter $\gamma = 1$.
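Written out to match the code below, the worst-case risk-adjusted return that we maximize (the saddle min G constructed there) is

G(w) = \inf \left\{ (\mu + \delta)^T w - \gamma \, w^T (\Sigma + \Delta) w \;\middle|\; |\delta| \leq \rho \mathbf{1}, \; |\Delta_{ij}| \leq \eta \sqrt{\Sigma_{ii} \Sigma_{jj}}, \; \Sigma + \Delta \succeq 0 \right\},

where the infimum is over the perturbations $\delta$ and $\Delta$, and the absolute values are elementwise.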

In the code, we use mu and Sigma for the mean and covariance estimates, respectively, and the parameters are denoted rho, eta, and gamma.

Robust Markowitz portfolio construction.
 1  # Constants and parameters
 2  n = len(mu)
 3  rho, eta, gamma = 0.2, 0.2, 1
 4
 5  # Creating variables
 6  w = cp.Variable(n, nonneg=True)
 7
 8  delta_loc = LocalVariable(n)
 9  Sigma_perturbed = LocalVariable((n, n), PSD=True)
10  Delta_loc = LocalVariable((n, n))
11
12  # Creating saddle min function
13  f = w @ mu + saddle_inner(delta_loc, w) \
14      - gamma * saddle_quad_form(w, Sigma_perturbed)
15
16  Sigma_diag = Sigma.diagonal()
17  local_constraints = [
18      cp.abs(delta_loc) <= rho, Sigma_perturbed == Sigma + Delta_loc,
19      cp.abs(Delta_loc) <= eta * np.sqrt(np.outer(Sigma_diag, Sigma_diag))
20  ]
21
22  G = saddle_min(f, local_constraints)
23
24  # Creating and solving the problem
25  problem = cp.Problem(cp.Maximize(G), [cp.sum(w) == 1])
26  problem.solve()  # 0.076

We first define the constants and parameters, before creating the weights variable in line 6 and the local variables for the perturbations in lines 8–10. The saddle function for the objective is defined in lines 13–14, followed by the constraints on the perturbations in lines 17–20. Both are combined into the concave saddle min function G in line 22, which is maximized over the portfolio constraints in lines 25–26.
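For comparison, the nominal portfolio reported in Table 3 can be obtained by maximizing the nominal risk-adjusted return over the same constraint set; a minimal sketch (the name w_nom is ours):

w_nom = cp.Variable(n, nonneg=True)
nominal_objective = w_nom @ mu - gamma * cp.quad_form(w_nom, Sigma)
cp.Problem(cp.Maximize(nominal_objective), [cp.sum(w_nom) == 1]).solve()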

The results are shown in Table 3. The robust portfolio yields a slightly lower nominal risk-adjusted return of 0.291, compared to 0.295 for the nominal optimal portfolio. But the robust portfolio attains a higher worst-case risk-adjusted return of 0.076, compared to 0.065 for the nominal optimal portfolio.

                    Nominal portfolio   Robust portfolio
Nominal objective   0.295               0.291
Robust objective    0.065               0.076
Table 3: Nominal and worst-case objective for the nominal and robust portfolios.

Acknowledgements

P. Schiele is supported by a fellowship within the IFI program of the German Academic Exchange Service (DAAD). This research was partially supported by ACCESS (AI Chip Center for Emerging Smart Systems), sponsored by InnoHK funding, Hong Kong SAR, and by ONR N000142212121.

References

  • [Mar52] H. Markowitz “Portfolio Selection” In Journal of Finance 7, 1952, pp. 77–91
  • [MVN53] O. Morgenstern and J. Von Neumann “Theory of games and economic behavior” Princeton University Press, 1953
  • [Sio58] M. Sion “On general minimax theorems” In Pacific Journal of Mathematics 8.1, 1958, pp. 171–176
  • [Roc70] R. Rockafellar “Convex analysis” Princeton University Press, 1970
  • [Kor76] G. Korpelevich “The extragradient method for finding saddle points and other problems” In Matecon 12, 1976, pp. 747–756
  • [DM90] V. Dem’yanov and V. Malozemov “Introduction to minimax” Courier Corporation, 1990
  • [BL91] F. Black and R. Litterman “Asset Allocation” In The Journal of Fixed Income 1.2 Institutional Investor Journals Umbrella, 1991, pp. 7–18 DOI: 10.3905/jfi.1991.408013
  • [NN92] Y. Nesterov and A. Nemirovski “Conic formulation of a convex programming problem and duality” In Optimization Methods & Software 1, 1992, pp. 95–115
  • [HK93] R. Hettich and K. Kortanek “Semi-infinite programming: Theory, methods, and applications” In SIAM review 35.3, 1993, pp. 380–429
  • [DP95] D. Du and P. Pardalos “Minimax and applications” Springer Science & Business Media, 1995
  • [Nem99] A. Nemirovski “On self-concordant convex–concave functions” In Optimization Methods and Software 11.1-4 Taylor & Francis, 1999, pp. 303–384 DOI: 10.1080/10556789908805755
  • [LB00] M. Lobo and S. Boyd “The worst-case risk of a portfolio”, 2000 URL: https://web.stanford.edu/~boyd/papers/pdf/risk_bnd.pdf
  • [GI03] D. Goldfarb and G. Iyengar “Robust Portfolio Selection Problems” In Mathematics of Operations Research 28.1 INFORMS, 2003, pp. 1–38 URL: http://www.jstor.org/stable/4126989
  • [HT03] B. Halldórsson and R. Tütüncü “An interior-point method for a class of saddle-point problems” In Journal of Optimization Theory and Applications 116.3 Springer, 2003, pp. 559–590
  • [BV04] S. Boyd and L. Vandenberghe “Convex Optimization” Cambridge University Press, 2004
  • [Lof04] J. Lofberg “YALMIP : a toolbox for modeling and optimization in MATLAB” In 2004 IEEE International Conference on Robotics and Automation (IEEE Cat. No.04CH37508), 2004, pp. 284–289 DOI: 10.1109/CACSD.2004.1393890
  • [Nem04] A. Nemirovski “Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems” In SIAM Journal on Optimization 15.1 SIAM, 2004, pp. 229–251
  • [Nes05] Y. Nesterov “Excessive gap technique in nonsmooth convex minimization” In SIAM Journal on Optimization 16.1 SIAM, 2005, pp. 235–249
  • [Nes05a] Y. Nesterov “Smooth minimization of non-smooth functions” In Mathematical programming 103.1 Springer, 2005, pp. 127–152
  • [BL06] J. Borwein and A. Lewis “Convex Analysis” Springer, 2006
  • [GBY06] M. Grant, S. Boyd and Y. Ye “Disciplined convex programming” In Global optimization Springer, 2006, pp. 155–210
  • [NP06] Y. Nesterov and B. Polyak “Cubic regularization of Newton method and its global performance” In Mathematical Programming 108.1 Springer, 2006, pp. 177–205
  • [Nes07] Y. Nesterov “Dual extrapolation and its applications to solving variational inequalities and related problems” In Mathematical Programming 109.2 Springer, 2007, pp. 319–344
  • [Nes08] Y. Nesterov “Accelerating the cubic regularization of Newton’s method on convex problems” In Mathematical Programming 112.1 Springer, 2008, pp. 159–181
  • [BTEGN09] A. Ben-Tal, L. El Ghaoui and A. Nemirovski “Robust optimization” Princeton University Press, 2009
  • [MB09] A. Mutapcic and S. Boyd “Cutting-set methods for robust convex optimization with pessimizing oracles” In Optimization Methods & Software 24.3 Taylor & Francis, 2009, pp. 381–406
  • [NO09] A. Nedić and A. Ozdaglar “Subgradient methods for saddle-point problems” In Journal of optimization theory and applications 142.1 Springer, 2009, pp. 205–228
  • [RW09] R. Rockafellar and R. Wets “Variational analysis” Springer Science & Business Media, 2009
  • [BBC11] D. Bertsimas, D. Brown and C. Caramanis “Theory and applications of robust optimization” In SIAM review 53.3 SIAM, 2011, pp. 464–501
  • [CP11] A. Chambolle and T. Pock “A first-order primal-dual algorithm for convex problems with applications to imaging” In Journal of mathematical imaging and vision 40.1 Springer, 2011, pp. 120–145
  • [CLO13] Y. Chen, G. Lan and Y. Ouyang “Optimal Primal-Dual Methods for a Class of Saddle Point Problems” arXiv, 2013 DOI: 10.48550/ARXIV.1309.5548
  • [Con13] L. Condat “A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms” In Journal of optimization theory and applications 158.2 Springer, 2013, pp. 460–479
  • [Goo+14] I. Goodfellow et al. “Generative Adversarial Nets” In Advances in Neural Information Processing Systems 27 Curran Associates, Inc., 2014
  • [GB14] M. Grant and S. Boyd “CVX: Matlab software for disciplined convex programming, version 2.1”, 2014
  • [Ude+14] M. Udell et al. “Convex Optimization in Julia” In SC14 Workshop on High Performance Technical Computing in Dynamic Languages, 2014 arXiv:1410.4821 [math-oc]
  • [BS15] K. Bredies and H. Sun “Preconditioned Douglas–Rachford splitting methods for convex-concave saddle-point problems” In SIAM Journal on Numerical Analysis 53.1 SIAM, 2015, pp. 421–444
  • [FF15] E. Fama and K. French “A five-factor asset pricing model” In Journal of Financial Economics 116.1, 2015, pp. 1–22 DOI: https://doi.org/10.1016/j.jfineco.2014.10.010
  • [CP16] A. Chambolle and T. Pock “On the ergodic convergence rates of a first-order primal–dual algorithm” In Mathematical Programming 159.1 Springer, 2016, pp. 253–287
  • [DB16] S. Diamond and S. Boyd “CVXPY: A Python-embedded modeling language for convex optimization” In Journal of Machine Learning Research 17.83, 2016, pp. 1–5
  • [Boy+17] S. Boyd et al. “Multi-Period Trading via Convex Optimization” In Foundations and Trends in Optimization 3.1, 2017, pp. 1–76 DOI: 10.1561/2400000023
  • [HC17] F. Harrell Jr. and T. Cason “Titanic dataset”, 2017 URL: https://www.openml.org/d/40945
  • [CPT18] G. Cornuéjols, J. Peña and R. Tütüncü “Optimization Methods in Finance” Cambridge University Press, 2018 DOI: 10.1017/9781107297340
  • [DA19] X. Dou and M. Anitescu “Distributionally robust optimization with correlated data from vector autoregressive processes” In Operations Research Letters 47.4, 2019, pp. 294–299 DOI: https://doi.org/10.1016/j.orl.2019.04.005
  • [The+19] K. Thekumparampil, P. Jain, P. Netrapalli and S. Oh “Efficient Algorithms for Smooth Minimax Optimization” In Advances in Neural Information Processing Systems 32 Curran Associates, Inc., 2019 URL: https://proceedings.neurips.cc/paper/2019/file/05d0abb9a864ae4981e933685b8b915c-Paper.pdf
  • [BGM20] T. Broderick, R. Giordano and R. Meager “An Automatic Finite-Sample Robustness Metric: When Can Dropping a Little Data Make a Big Difference?” In arXiv preprint arXiv:2011.14999, 2020
  • [FNB20] A. Fu, B. Narasimhan and S. Boyd “CVXR: An R Package for Disciplined Convex Optimization” In Journal of Statistical Software 94.14, 2020, pp. 1–34 DOI: 10.18637/jss.v094.i14
  • [LJJ20] T. Lin, C. Jin and M. Jordan “Near-optimal algorithms for minimax optimization” In Conference on Learning Theory, 2020, pp. 2738–2779 PMLR
  • [BAB21] S. Barratt, G. Angeris and S. Boyd “Optimal representative sample weighting” In Statistics and Computing 31.2 Springer, 2021, pp. 1–14
  • [Fre22] K. French “Kenneth R. French Data Library”, 2022 URL: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
  • [JN22] A. Juditsky and A. Nemirovski “On well-structured convex–concave saddle point problems and variational inequalities with monotone operators” In Optimization Methods and Software 37.5 Taylor & Francis, 2022, pp. 1567–1602 DOI: 10.1080/10556788.2021.1928121
  • [LSB22] E. Luxenberg, P. Schiele and S. Boyd “Robust Bond Portfolio Construction via Convex-Concave Saddle Point Optimization”, 2022 DOI: 10.48550/ARXIV.2212.02570
  • [RY22] E. Ryu and W. Yin “Large-Scale Convex Optimization: Algorithms & Analyses via Monotone Operators” Cambridge University Press, 2022