
Domain-Driven Solver (DDS) Version 2.0:
a MATLAB-based Software Package for Convex Optimization Problems in Domain-Driven Form

Mehdi Karimi and Levent Tunçel
(Date: August 7, 2019 (arXiv: 1908.03075); revised: August 11, 2025)
Abstract.

Domain-Driven Solver (DDS) is a MATLAB-based software package for convex optimization problems in Domain-Driven form [19]. The current version of DDS accepts every combination of the following function/set constraints: (1) symmetric cones (LP, SOCP, and SDP); (2) quadratic constraints that are SOCP representable; (3) direct sums of an arbitrary collection of 2-dimensional convex sets defined as the epigraphs of univariate convex functions (including, as special cases, geometric programming and entropy programming); (4) generalized power cone; (5) epigraphs of matrix norms (including, as a special case, minimization of the nuclear norm over a linear subspace); (6) vector relative entropy; (7) epigraphs of quantum entropy and quantum relative entropy; and (8) constraints involving hyperbolic polynomials. DDS is a practical implementation of the infeasible-start primal-dual algorithm designed and analyzed in [19]. This manuscript contains the users' guide, as well as the theoretical results needed for the implementation of the algorithms. To help users, we have included many examples. We also discuss implementation details, techniques we used to improve efficiency, and the further expansion of the software to cover emerging classes of convex optimization problems.

Mehdi Karimi (m7karimi@uwaterloo.ca) and Levent Tunçel (ltuncel@math.uwaterloo.ca): Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada.
Research of the authors was supported in part by Discovery Grants from NSERC and by U.S. Office of Naval Research under award numbers: N00014-12-1-0049, N00014-15-1-2171 and N00014-18-1-2078.

1. Introduction

The code DDS (Domain-Driven Solver) solves convex optimization problems of the form

(1) $\inf_{x}\{\langle c,x\rangle:Ax\in D\},$

where $x\mapsto Ax:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$ is a linear embedding, $A$ and $c\in\mathbb{R}^{n}$ are given, and $D\subset\mathbb{R}^{m}$ is a closed convex set defined as the closure of the domain of a $\vartheta$-self-concordant (s.c.) barrier $\Phi$ [27, 26]. In practice, the set $D$ is typically formulated as $D=D_{1}\oplus\cdots\oplus D_{\ell}$, where $D_{i}$ is associated with a s.c. barrier $\Phi_{i}$, for $i\in\{1,\ldots,\ell\}$. Every input constraint for DDS may be thought of as either the convex set it defines or the corresponding s.c. barrier.

The current version of DDS accepts many function and set constraints, as we explain in this article. If a user has a nonlinear convex objective function $f(x)$ to minimize, one can introduce a new variable $x_{n+1}$ and minimize the linear function $x_{n+1}$ subject to the convex set constraint $f(x)\leq x_{n+1}$ (and the other convex constraints in the original optimization problem). As a result, in this article we talk about representing functions and convex set constraints interchangeably. The algorithm underlying the code also uses the Legendre-Fenchel (LF) conjugate $\Phi_{*}$ of $\Phi$ if it is computationally efficient to evaluate $\Phi_{*}$ and its first (and ideally second) derivative. Any new discovery of a s.c. barrier expands the classes of convex optimization problems DDS can solve, since any new s.c. barrier $\Phi$ with a computable LF conjugate can easily be added to the code. DDS is a practical implementation of the primal-dual algorithm designed and analyzed in [19], which attains the current best iteration complexity bound available for conic formulations. The stopping criteria of DDS and the way DDS determines the status ("has an approximately optimal solution", "is infeasible", "is unbounded", etc.) are based on the analyses in [20].

Even though there are similarities between DDS and some modeling systems such as CVX [16] (for instance, the variety of input constraints), there are major differences, including:

  • DDS is not just a modeling system; it uses its own algorithm. The algorithm used in DDS is an infeasible-start primal-dual path-following algorithm of predictor-corrector type [19].

  • Modeling systems for convex optimization that rely on SDP solvers have to use approximations for set constraints that are not efficiently representable by spectrahedra (for example, epigraphs of functions involving exp or ln). In contrast, DDS uses a s.c. barrier specifically well-suited to each set constraint, without such approximations. This enables DDS to return proper certificates for all input problems.

  • As far as we know, some set constraints, such as hyperbolic ones, are not accepted by other modeling systems. Some other constraints, such as epigraphs of matrix norms and those involving quantum entropy and quantum relative entropy, are handled more efficiently than by other existing options.

The main part of the article is organized more as a users' guide, and contains the minimum theoretical content required by users. In Section 2, we explain how to install and use DDS. Sections 3-11 explain how to input the different types of set/function constraints into DDS. Section 13 contains several numerical results and tables. Many of the theoretical foundations needed by DDS are included in the appendices. The left-hand-side matrices of the linear systems of equations determining the predictor and corrector steps have a similar form; in Appendix A, we explain how such linear systems are solved in DDS. Some of the constraints accepted by DDS involve matrix functions, and several techniques are used in DDS to efficiently evaluate these functions and their derivatives; see Appendices B and C. DDS heavily relies on efficient evaluation of the LF conjugates of s.c. barrier functions. For some of the functions, we show in the main text how to efficiently calculate the LF conjugate; for others, the calculations are deferred to the appendices, for example Appendix D. DDS uses interior-point methods, and the gradients and Hessians of the s.c. barriers are the building blocks of the underlying linear systems. Explicit formulas for the gradients and Hessians of some s.c. barriers and their LF conjugates are given in Appendix D.2. Appendix F contains a brief comparison of DDS with some popular solvers and modeling systems.

1.1. Installation

The current version of DDS is written in MATLAB. This version is available from the websites:

To use DDS, the user can follow these steps:

  • unzip DDS.zip;

  • run MATLAB in the directory DDS;

  • run the m-file DDS_startup.m.

  • (optional) run the prepared small examples DDS_example_1.m and DDS_example_2.m.

The prepared examples contain many set constraints accepted by DDS and running them without error indicates that DDS is ready to use. There is a directory Text_Examples in the DDS package which includes many examples on different classes of convex optimization problems.

2. How to use the DDS code

In this section, we explain the format of the input for many popular classes of optimization problems. In practice, we typically have $D=\bar{D}-b$, where $\textup{int}\,\bar{D}$ is the domain of a "canonical" s.c. barrier and $b\in\mathbb{R}^{m}$. For example, for LP we typically have $D=\mathbb{R}^{m}_{+}+b$, where $b\in\mathbb{R}^{m}$ is given as part of the input data, and $-\sum_{i=1}^{m}\ln(z_{i})$ is a s.c. barrier for $\mathbb{R}^{m}_{+}$. The command in MATLAB that calls DDS is

[x,y,info]=DDS(c,A,b,cons,OPTIONS);

Input Arguments:
cons: A cell array that contains the information about the type of constraints.
c,A,b: Input data for DDS: $A$ is the coefficient matrix, $c$ is the objective vector, and $b$ is the RHS vector (i.e., the shift in the definition of the convex domain $D$).
OPTIONS (optional): A structure array which contains information about the tolerance and initial points.

Output Arguments:
x: Primal point.
y: Dual point which is a cell array. Each cell contains the dual solution for the constraints in the corresponding cell in A.
info: A structure array containing performance information such as info.time, which returns the CPU time for solving the problem.

Note that in the Domain-Driven setup, the primal problem is the main problem, and the dual problem is implicit for the user. This implicit dual problem is:

(2) $\inf_{y}\{\delta_{*}(y|D):A^{\top}y=-c,\ y\in D_{*}\},$

where $\delta_{*}(y|D):=\sup\{\langle y,z\rangle:z\in D\}$ is the support function of $D$, and $D_{*}$ is defined as

(3) $D_{*}:=\{y:\langle y,h\rangle\leq 0,\ \ \forall h\in\textup{rec}(D)\},$

where $\textup{rec}(D)$ is the recession cone of $D$. For the convenience of users, there is a built-in function in the DDS package to calculate the dual objective value of the returned $y$ vector:

z=dual_obj_value(y,b,cons);

For a primal feasible point $x\in\mathbb{R}^{n}$ satisfying $Ax\in D$ and a dual feasible point $y\in D_{*}$, the duality gap is defined in [19] as

(4) $\langle c,x\rangle+\delta_{*}(y|D).$

It is proved in [19] that this duality gap is well-defined and that zero duality gap implies optimality. If DDS returns status "solved" for a problem (info.status=1), it means (x,y) is a pair of approximately feasible primal and dual points with duality gap close to zero (based on the tolerance). If info.status=2, the problem is suspected to be unbounded, and the returned x is an approximately primal feasible point with very small objective value ($\langle c,x\rangle\leq-1/\texttt{tol}$). If info.status=3, the problem is suspected to be infeasible, and the returned y in $D_{*}$ approximately satisfies $A^{\top}y=0$ with $\delta_{*}(y|D)<0$. If info.status=4, the problem is suspected to be ill-conditioned.

The user is not required to input any part of the OPTIONS array. The default settings are:

  • $\texttt{tol}=10^{-8}$.

  • The initial points $x^{0}$ and $z^{0}$ for the infeasible-start algorithm are chosen such that, assuming $D=D_{1}\oplus\cdots\oplus D_{\ell}$, the $i$th part of $Ax^{0}+z^{0}$ is a canonical point in $\textup{int}\,D_{i}$.

However, if a user chooses to provide OPTIONS as an input, here is how to define the desired parts: OPTIONS.tol may be given as the desired tolerance; otherwise, the default $\texttt{tol}:=10^{-8}$ is used. OPTIONS.x0 and OPTIONS.z0 may be given as initial points: any pair of points $x^{0}\in\mathbb{R}^{n}$ and $z^{0}\in\mathbb{R}^{m}$ that satisfy $Ax^{0}+z^{0}\in\textup{int}\,D$. If only OPTIONS.x0 is given, then $x^{0}$ must satisfy $Ax^{0}\in\textup{int}\,D$; in other words, OPTIONS.x0 is a point that strictly satisfies all the constraints.
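
For concreteness, here is a minimal sketch of a call with OPTIONS (the field names are as documented above; the initial point below is hypothetical and would only be accepted if it satisfies $Ax^{0}\in\textup{int}\,D$):

% Minimal sketch: calling DDS with user-supplied OPTIONS.
% Assumes c, A, b, cons have already been built as in the examples below.
OPTIONS.tol = 1e-10;               % tighter than the default 1e-8
OPTIONS.x0  = ones(length(c),1);   % hypothetical strictly feasible point
[x,y,info] = DDS(c,A,b,cons,OPTIONS);
if info.status == 1                % approximately optimal solution found
    fprintf('solved in %f seconds\n', info.time);
end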

In the following sections, we discuss the format of each input function/set constraint. Table 1 shows the classes of function/set constraints the current version of DDS accepts, plus the abbreviation we use to represent each constraint. From now on, we assume that the objective function is "$\inf\ \langle c,x\rangle$", and we show how to add various function/set constraints. Note that A, b, and cons are cell arrays in MATLAB. cons{k,1} represents the type of the $k$th block of constraints, using the abbreviations of Table 1. For example, cons{2,1}='LP' means that the second block of constraints consists of linear inequalities. It is advisable, but not necessary, to group the constraints of the same type in one block.

Table 1. Function/set constraints the current version of DDS accepts, and their abbreviations.

function/set constraint | abbreviation
LP | LP
SOCP | SOCP
Rotated SOCP | SOCPR
SDP | SDP
Generalized Power Cone | GPC
Quadratic Constraints | QC
Epigraph of a Matrix Norm | MN
Direct sum of 2-dimensional sets (geometric, entropy, and $p$-norm programming) | TD
Quantum Entropy | QE
Quantum Relative Entropy | QRE
Relative Entropy | RE
Hyperbolic Polynomials | HB
Equality Constraints | EQ

3. LP, SOCP, SDP

3.1. Linear programming (LP) and second-order cone programming (SOCP)

Suppose we want to add $\ell$ LP constraints of the form

(5) $A_{L}^{i}x+b_{L}^{i}\geq 0,\ \ \ i\in\{1,\ldots,\ell\},$

where $A_{L}^{i}$ is an $m_{L}^{i}$-by-$n$ matrix, as the $k$th block of constraints. Then, we define

(12) $\texttt{A\{k,1\}}=\left[\begin{array}{c}A_{L}^{1}\\ \vdots\\ A_{L}^{\ell}\end{array}\right],\ \ \ \texttt{b\{k,1\}}=\left[\begin{array}{c}b_{L}^{1}\\ \vdots\\ b_{L}^{\ell}\end{array}\right],$
(13) $\texttt{cons\{k,1\}='LP'},\ \ \ \texttt{cons\{k,2\}}=[m_{L}^{1},\ldots,m_{L}^{\ell}].$

Similarly, to add $\ell$ SOCP constraints of the form

(14) $\|A_{S}^{i}x+b_{S}^{i}\|\leq(g_{S}^{i})^{\top}x+d_{S}^{i},\ \ \ i\in\{1,\ldots,\ell\},$

where $A_{S}^{i}$ is an $m_{S}^{i}$-by-$n$ matrix for $i\in\{1,\ldots,\ell\}$, as the $k$th block, we define

(25) $\texttt{A\{k,1\}}=\left[\begin{array}{c}(g_{S}^{1})^{\top}\\ A_{S}^{1}\\ \vdots\\ (g_{S}^{\ell})^{\top}\\ A_{S}^{\ell}\end{array}\right],\ \ \ \texttt{b\{k,1\}}=\left[\begin{array}{c}d_{S}^{1}\\ b_{S}^{1}\\ \vdots\\ d_{S}^{\ell}\\ b_{S}^{\ell}\end{array}\right],$
(26) $\texttt{cons\{k,1\}='SOCP'},\ \ \ \texttt{cons\{k,2\}}=[m_{S}^{1},\ldots,m_{S}^{\ell}].$

Let us see an example:

Example 3.1.

Suppose we are given the problem:

(31) $\begin{array}{rl}\min & c^{\top}x\\ \textup{s.t.} & [-2,\ 1]x\leq 1,\\ & \left\|\left[\begin{array}{cc}2&1\\ 1&3\end{array}\right]x+\left[\begin{array}{c}3\\ 4\end{array}\right]\right\|_{2}\leq 2.\end{array}$

Then we define

$\texttt{cons\{1,1\}='LP'},\ \ \texttt{cons\{1,2\}=[1]},\ \ \texttt{A\{1,1\}}=\left[\begin{array}{cc}2&-1\end{array}\right],\ \ \ \texttt{b\{1,1\}}=\left[\begin{array}{c}1\end{array}\right],$
$\texttt{cons\{2,1\}='SOCP'},\ \ \texttt{cons\{2,2\}=[2]},\ \ \texttt{A\{2,1\}}=\left[\begin{array}{cc}0&0\\ 2&1\\ 1&3\end{array}\right],\ \ \ \texttt{b\{2,1\}}=\left[\begin{array}{c}2\\ 3\\ 4\end{array}\right].$
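
The same input can be assembled as a short MATLAB script; here is a minimal sketch (the objective vector $c$ is not specified in the example, so the one below is hypothetical):

% Example 3.1 as a runnable DDS call (assumes DDS is on the MATLAB path).
c = [1; 1];                          % hypothetical objective vector
cons{1,1} = 'LP';   cons{1,2} = [1];
A{1,1} = [2 -1];    b{1,1} = 1;      % encodes 2*x1 - x2 + 1 >= 0
cons{2,1} = 'SOCP'; cons{2,2} = [2];
A{2,1} = [0 0; 2 1; 1 3];            % first row is g', then the matrix A_S
b{2,1} = [2; 3; 4];                  % first entry is d, then the vector b_S
[x,y,info] = DDS(c,A,b,cons);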

The s.c. barriers and their LF conjugates used in DDS for these constraints are

$\Phi(z)=-\ln(z),\ \ z\in\mathbb{R}_{+},\ \ \ \Phi_{*}(\eta)=-1-\ln(-\eta),$
(34) $\Phi(t,z)=-\ln(t^{2}-z^{\top}z),\ \ \ \Phi_{*}(\eta,w)=-2+\ln(4)-\ln(\eta^{2}-w^{\top}w).$

DDS also accepts constraints defined by rotated second-order cones:

(35) $\{(z,t,s)\in\mathbb{R}^{n}\oplus\mathbb{R}\oplus\mathbb{R}:\|z\|^{2}\leq ts,\ t\geq 0,\ s\geq 0\},$

which is handled by the s.c. barrier $-\ln(ts-z^{\top}z)$. The abbreviation we use is 'SOCPR'. To add $\ell$ rotated SOCP constraints of the form

$\|A_{S}^{i}x+b_{S}^{i}\|_{2}^{2}\leq((g_{S}^{i})^{\top}x+d_{S}^{i})((\bar{g}_{S}^{i})^{\top}x+\bar{d}_{S}^{i}),\ \ \ i\in\{1,\ldots,\ell\},$
(36) $(g_{S}^{i})^{\top}x+d_{S}^{i}\geq 0,\ \ (\bar{g}_{S}^{i})^{\top}x+\bar{d}_{S}^{i}\geq 0,$

where $A_{S}^{i}$ is an $m_{S}^{i}$-by-$n$ matrix for $i\in\{1,\ldots,\ell\}$, as the $k$th block, we define

(51) $\texttt{A\{k,1\}}=\left[\begin{array}{c}(g_{S}^{1})^{\top}\\ (\bar{g}_{S}^{1})^{\top}\\ A_{S}^{1}\\ \vdots\\ (g_{S}^{\ell})^{\top}\\ (\bar{g}_{S}^{\ell})^{\top}\\ A_{S}^{\ell}\end{array}\right],\ \ \ \texttt{b\{k,1\}}=\left[\begin{array}{c}d_{S}^{1}\\ \bar{d}_{S}^{1}\\ b_{S}^{1}\\ \vdots\\ d_{S}^{\ell}\\ \bar{d}_{S}^{\ell}\\ b_{S}^{\ell}\end{array}\right],$
(52) $\texttt{cons\{k,1\}='SOCPR'},\ \ \ \texttt{cons\{k,2\}}=[m_{S}^{1},\ldots,m_{S}^{\ell}].$

3.2. Semidefinite programming (SDP)

Consider $\ell$ SDP constraints in standard inequality (linear matrix inequality (LMI)) form:

(53) $F^{i}_{0}+x_{1}F^{i}_{1}+\cdots+x_{n}F^{i}_{n}\succeq 0,\ \ \ i\in\{1,\ldots,\ell\},$

where the $F^{i}_{j}$'s are $n_{i}$-by-$n_{i}$ symmetric matrices. The above constraints are in matrix form; to formulate them in our setup, we need to write them in vector form. DDS has two internal functions, sm2vec and vec2sm. sm2vec takes an $n$-by-$n$ symmetric matrix and changes it into a vector in $\mathbb{R}^{n^{2}}$ by stacking its columns on top of one another in order. vec2sm changes such a vector back into a symmetric matrix, so that

(54) $\texttt{vec2sm(sm2vec(X))=X}.$

By this definition, it is easy to check that for any pair of $n$-by-$n$ symmetric matrices $X$ and $Y$ we have

(55) $\langle X,Y\rangle=\texttt{sm2vec(X)}^{\top}\texttt{sm2vec(Y)}.$

To give (53) to DDS as the $k$th input block, we define:

(62) $\texttt{A\{k,1\}}:=\left[\begin{array}{c}\texttt{sm2vec}(F^{1}_{1}),\cdots,\texttt{sm2vec}(F^{1}_{n})\\ \vdots\\ \texttt{sm2vec}(F^{\ell}_{1}),\cdots,\texttt{sm2vec}(F^{\ell}_{n})\end{array}\right],\ \ \ \texttt{b\{k,1\}}:=\left[\begin{array}{c}\texttt{sm2vec}(F^{1}_{0})\\ \vdots\\ \texttt{sm2vec}(F^{\ell}_{0})\end{array}\right],$
(63) $\texttt{cons\{k,1\}='SDP'},\ \ \ \texttt{cons\{k,2\}}=[n_{1},\ldots,n_{\ell}].$

The s.c. barrier used in DDS for SDP is the well-known function $-\ln(\det(X))$ defined on the convex cone of symmetric positive definite matrices.

Example 3.2.

Assume that we want to find scalars $x_{1}$, $x_{2}$, and $x_{3}$ such that $x_{1}+x_{2}+x_{3}\geq 1$ and the maximum eigenvalue of $A_{0}+x_{1}A_{1}+x_{2}A_{2}+x_{3}A_{3}$ is minimized, where

(76) $A_{0}=\left[\begin{array}{ccc}2&-0.5&-0.6\\ -0.5&2&0.4\\ -0.6&0.4&3\end{array}\right],\ A_{1}=\left[\begin{array}{ccc}0&1&0\\ 1&0&0\\ 0&0&0\end{array}\right],\ A_{2}=\left[\begin{array}{ccc}0&0&1\\ 0&0&0\\ 1&0&0\end{array}\right],\ A_{3}=\left[\begin{array}{ccc}0&0&0\\ 0&0&1\\ 0&1&0\end{array}\right].$

We can write this problem as

(77) $\begin{array}{rl}\min & t\\ \textup{s.t.} & -1+x_{1}+x_{2}+x_{3}\geq 0,\\ & tI-(A_{0}+x_{1}A_{1}+x_{2}A_{2}+x_{3}A_{3})\succeq 0.\end{array}$

To solve this problem, we define:

$\texttt{cons\{1,1\}='LP'},\ \ \texttt{cons\{1,2\}}=[1],\ \ \texttt{cons\{2,1\}='SDP'},\ \ \texttt{cons\{2,2\}}=[3],$
$\texttt{A\{1,1\}}=\left[\begin{array}{cccc}1&1&1&0\end{array}\right],\ \ \ \texttt{b\{1,1\}}=\left[\begin{array}{c}-1\end{array}\right],$
$\texttt{A\{2,1\}}=\left[\begin{array}{cccc}0&0&0&1\\ -1&0&0&0\\ 0&-1&0&0\\ -1&0&0&0\\ 0&0&0&1\\ 0&0&-1&0\\ 0&-1&0&0\\ 0&0&-1&0\\ 0&0&0&1\end{array}\right],\ \ \ \texttt{b\{2,1\}}=\left[\begin{array}{c}-2\\ 0.5\\ 0.6\\ 0.5\\ -2\\ -0.4\\ 0.6\\ -0.4\\ -3\end{array}\right],$
$\texttt{c}=(0,0,0,1)^{\top}.$

Then DDS(c,A,b,cons) gives the answer $x=(1.1265,0.6,-0.4,3)^{\top}$, which means the minimum largest eigenvalue is $3$.
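
In MATLAB, the same input can be built with the internal function sm2vec (a sketch, assuming sm2vec is accessible on the path after running DDS_startup):

% Example 3.2: minimizing the maximum eigenvalue; variables (x1,x2,x3,t).
A0 = [2 -0.5 -0.6; -0.5 2 0.4; -0.6 0.4 3];
A1 = [0 1 0; 1 0 0; 0 0 0];
A2 = [0 0 1; 0 0 0; 1 0 0];
A3 = [0 0 0; 0 0 1; 0 1 0];
cons{1,1} = 'LP';  cons{1,2} = [1];
A{1,1} = [1 1 1 0];  b{1,1} = -1;    % x1 + x2 + x3 - 1 >= 0
cons{2,1} = 'SDP'; cons{2,2} = [3];  % t*I - (A0 + x1*A1 + x2*A2 + x3*A3) >= 0
A{2,1} = [-sm2vec(A1), -sm2vec(A2), -sm2vec(A3), sm2vec(eye(3))];
b{2,1} = -sm2vec(A0);
c = [0; 0; 0; 1];
[x,y,info] = DDS(c,A,b,cons);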

4. Quadratic constraints

Suppose we want to add the following constraints to DDS:

(80) $x^{\top}A_{i}^{\top}Q_{i}A_{i}x+b_{i}^{\top}x+d_{i}\leq 0,\ \ \ i\in\{1,\ldots,\ell\},$

where each $A_{i}$ is $m_{i}$-by-$n$ with rank $n$, and $Q_{i}\in\mathbb{S}^{m_{i}}$. In general, this type of constraint may be non-convex and difficult to handle. Currently, DDS handles two cases:

  • $Q_{i}$ is positive semidefinite;

  • $Q_{i}$ has exactly one negative eigenvalue. In this case, DDS considers the intersection of the set of points satisfying (80) and a shifted hyperbolicity cone defined by the quadratic inequality $y^{\top}Q_{i}y\leq 0$.

To give the constraints in (80) as input to DDS as the $k$th block, we define

(91) $\texttt{A\{k,1\}}=\left[\begin{array}{c}b_{1}^{\top}\\ A_{1}\\ \vdots\\ b_{\ell}^{\top}\\ A_{\ell}\end{array}\right],\ \ \ \texttt{b\{k,1\}}=\left[\begin{array}{c}d_{1}\\ 0\\ \vdots\\ d_{\ell}\\ 0\end{array}\right],$
$\texttt{cons\{k,1\}='QC'},\ \ \texttt{cons\{k,2\}}=[m_{1},\ldots,m_{\ell}],$
(92) $\texttt{cons\{k,3,i\}}=Q_{i},\ \ i\in\{1,\ldots,\ell\}.$

If cons{k,3} is not given as part of the input, DDS takes all $Q_{i}$'s to be identity matrices.

If $Q_{i}$ is positive semidefinite, then the corresponding constraint in (80) can be written as

$u^{\top}u+w+d\leq 0,$
(93) $u:=R_{i}A_{i}x,\ \ \ w:=b_{i}^{\top}x,\ \ d:=d_{i},$

where $Q_{i}=R_{i}^{\top}R_{i}$ is a Cholesky factorization of $Q_{i}$. We associate the following s.c. barrier and its LF conjugate to such quadratic constraints:

$\Phi(u,w)=-\ln(-(u^{\top}u+w+d)),$
(94) $\Phi_{*}(y,\eta)=\frac{y^{\top}y}{4\eta}-1-d\eta-\ln(\eta).$

If $Q_{i}$ has exactly one negative eigenvalue with eigenvector $v$, then $-y^{\top}Q_{i}y$ is a hyperbolic polynomial with respect to $v$. The hyperbolicity cone is the connected component of $\{y:y^{\top}Q_{i}y\leq 0\}$ which contains $v$, and $-\ln(-y^{\top}Q_{i}y)$ is a s.c. barrier for this cone.

If, for any of the inequalities in (80), $Q_{i}$ has exactly one negative eigenvalue while $b_{i}=0$ and $d_{i}=0$, DDS considers the hyperbolicity cone defined by that inequality as the set constraint.
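
As an illustration, here is a minimal hypothetical 'QC' block (our construction, not taken from the paper) encoding the single convex quadratic constraint $x_{1}^{2}+x_{2}^{2}+x_{1}-1\leq 0$ with $A_{1}=I$, $Q_{1}=I$, $b_{1}=(1,0)^{\top}$, and $d_{1}=-1$:

% Hypothetical QC example: x1^2 + x2^2 + x1 - 1 <= 0.
cons{1,1} = 'QC';  cons{1,2} = [2];
cons{1,3,1} = eye(2);        % Q_1 (optional here; identity is the default)
A{1,1} = [1 0; 1 0; 0 1];    % first row is b_1', then A_1 = eye(2)
b{1,1} = [-1; 0; 0];         % d_1, followed by zeros
c = [1; 1];                  % hypothetical objective
[x,y,info] = DDS(c,A,b,cons);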

5. Generalized Power Cone

We define the $(m,n)$-generalized power cone with parameter $\alpha$ as

(95) $K^{(m,n)}_{\alpha}:=\left\{(s,u)\in\mathbb{R}_{+}^{m}\oplus\mathbb{R}^{n}:\prod_{i=1}^{m}s_{i}^{\alpha_{i}}\geq\|u\|_{2}\right\},$

where $\alpha$ belongs to the simplex $\{\alpha\in\mathbb{R}^{m}_{+}:\sum_{i=1}^{m}\alpha_{i}=1\}$. Note that the rotated second-order cone is the special case where $m=2$ and $\alpha_{1}=\alpha_{2}=\frac{1}{2}$. Different s.c. barriers for this cone or special cases of it have been considered [8, 35]. Chares conjectured a s.c. barrier for the cone in (95), which was proved in [31]. The function we use in DDS is:

(96) $\Phi(s,u)=-\ln\left(\prod_{i=1}^{m}s_{i}^{2\alpha_{i}}-u^{\top}u\right)-\sum_{i=1}^{m}(1-\alpha_{i})\ln(s_{i}).$

To add generalized power cone constraints to DDS, we use the abbreviation 'GPC'. Therefore, if the $k$th block of constraints is GPC, we define cons{k,1}='GPC'. Assume that we want to input the following $\ell$ constraints to DDS:

(97) $(A_{s}^{i}x+b_{s}^{i},A_{u}^{i}x+b_{u}^{i})\in K^{(m_{i},n_{i})}_{\alpha^{i}},\ \ \ i\in\{1,\ldots,\ell\},$

where $A_{s}^{i}$, $b_{s}^{i}$, $A_{u}^{i}$, and $b_{u}^{i}$ are matrices and vectors of proper size. Then, to input these constraints as the $k$th block, we define cons{k,2} as a MATLAB cell array of size $\ell$-by-$2$, where each row represents one constraint. We then define:

(98) $\texttt{cons\{k,2\}\{i,1\}}=[m_{i}\ \ n_{i}],\ \ \ \texttt{cons\{k,2\}\{i,2\}}=\alpha^{i},\ \ \ \ i\in\{1,\ldots,\ell\}.$

For A{k,1} and b{k,1}, we define:

(109) $\texttt{A\{k,1\}}=\left[\begin{array}{c}A_{s}^{1}\\ A_{u}^{1}\\ \vdots\\ A_{s}^{\ell}\\ A_{u}^{\ell}\end{array}\right],\ \ \ \texttt{b\{k,1\}}=\left[\begin{array}{c}b_{s}^{1}\\ b_{u}^{1}\\ \vdots\\ b_{s}^{\ell}\\ b_{u}^{\ell}\end{array}\right].$
Example 5.1.

Consider the following optimization problem with 'LP' and 'GPC' constraints:

(110) $\begin{array}{rl}\min & -x_{1}-x_{2}-x_{3}\\ \textup{s.t.} & \|x\|\leq(x_{1}+3)^{0.3}(x_{2}+1)^{0.3}(x_{3}+2)^{0.4},\\ & x_{1},x_{2},x_{3}\leq 3.\end{array}$

Then we define:

$\texttt{cons\{1,1\}='GPC'},\ \ \texttt{cons\{1,2\}}=\{[3\ \ 3],\ [0.3;0.3;0.4]\},$
$\texttt{A\{1,1\}}=\left[\begin{array}{c}\texttt{eye(3)}\\ \texttt{eye(3)}\end{array}\right],\ \ \texttt{b\{1,1\}}=[3;1;2;0;0;0],$
$\texttt{cons\{2,1\}='LP'},\ \ \texttt{cons\{2,2\}}=[3],$
$\texttt{A\{2,1\}}=-\texttt{eye(3)},\ \ \texttt{b\{2,1\}}=[3;3;3],$
$\texttt{c}=[-1,-1,-1].$
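
This example is assembled in MATLAB as follows (a sketch of the complete call):

% Example 5.1 with one GPC block and one LP block.
cons{1,1} = 'GPC';
cons{1,2} = {[3 3], [0.3; 0.3; 0.4]};    % (m,n) = (3,3) and alpha
A{1,1} = [eye(3); eye(3)];               % s-part stacked on top of u-part
b{1,1} = [3; 1; 2; 0; 0; 0];
cons{2,1} = 'LP';  cons{2,2} = [3];
A{2,1} = -eye(3);  b{2,1} = [3; 3; 3];   % encodes x <= 3
c = [-1; -1; -1];
[x,y,info] = DDS(c,A,b,cons);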

6. Epigraphs of matrix norms

Assume that we have constraints of the form

$X-UU^{\top}\succeq 0,$
$X=A_{0}+\sum_{i=1}^{\ell}x_{i}A_{i},$
(112) $U=B_{0}+\sum_{i=1}^{\ell}x_{i}B_{i},$

where the $A_{i}$, $i\in\{1,\ldots,\ell\}$, are $m$-by-$m$ symmetric matrices, and the $B_{i}$, $i\in\{1,\ldots,\ell\}$, are $m$-by-$n$ matrices. The set $\{(Z,U)\in\mathbb{S}^{m}\oplus\mathbb{R}^{m\times n}:Z-UU^{\top}\succeq 0\}$ is handled by the following s.c. barrier:

(113) $\Phi(Z,U):=-\ln(\det(Z-UU^{\top})),$

with LF conjugate

(114) $\Phi_{*}(Y,V)=-m-\frac{1}{4}\textup{Tr}(V^{\top}Y^{-1}V)-\ln(\det(-Y)),$

where $Y\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{m\times n}$ [25]. This constraint can be reformulated as an SDP constraint using a Schur complement. However, $\Phi(Z,U)$ is an $m$-s.c. barrier, while the size of the SDP reformulation is $m+n$. For the cases where $m\ll n$, using the Domain-Driven form may be advantageous. A special but very important application is minimizing the nuclear norm of a matrix, which we describe in a separate subsection below.

DDS has two internal functions, m2vec and vec2m, for converting matrices (not necessarily symmetric) to vectors and vice versa. For an $m$-by-$n$ matrix $Z$, m2vec(Z,n) changes the matrix into a vector. vec2m(v,m) reshapes a vector $v$ of proper size into a matrix with $m$ rows. The abbreviation we use for the epigraph of a matrix norm is MN. If the $k$th input block is of this type, cons{k,2} is an $\ell$-by-$2$ matrix, where $\ell$ is the number of constraints of this type, and each row is of the form $[m\ \ n]$. For each constraint of the form (112), the corresponding parts of $A$ and $b$ are defined as

(119) $\texttt{A\{k,1\}}=\left[\begin{array}{ccc}\texttt{m2vec}(B_{1},n)&\cdots&\texttt{m2vec}(B_{\ell},n)\\ \texttt{sm2vec}(A_{1})&\cdots&\texttt{sm2vec}(A_{\ell})\end{array}\right],\ \ \texttt{b\{k,1\}}=\left[\begin{array}{c}\texttt{m2vec}(B_{0},n)\\ \texttt{sm2vec}(A_{0})\end{array}\right].$

For implementation details involving epigraphs of matrix norms, see Appendix B.2. Here is an example:

Example 6.1.

Assume that we have matrices

(126) $U_{0}=\left[\begin{array}{ccc}1&0&0\\ 0&1&1\end{array}\right],\ \ U_{1}=\left[\begin{array}{ccc}-1&-1&1\\ 0&0&1\end{array}\right],\ \ U_{2}=\left[\begin{array}{ccc}1&0&0\\ 0&1&0\end{array}\right],$

and our goal is to solve

(127) $\begin{array}{rl}\min & t\\ \textup{s.t.} & UU^{\top}\preceq tI,\\ & U=U_{0}+x_{1}U_{1}+x_{2}U_{2}.\end{array}$

Then the input to DDS is defined as

$\texttt{cons\{1,1\}='MN'},\ \ \texttt{cons\{1,2\}}=[2\ \ 3],$
$\texttt{A\{1,1\}}=\left[\begin{array}{ccc}\texttt{m2vec}(U_{1},3)&\texttt{m2vec}(U_{2},3)&\texttt{zeros}(6,1)\\ \texttt{zeros}(4,1)&\texttt{zeros}(4,1)&\texttt{sm2vec}(I_{2\times 2})\end{array}\right],\ \ \texttt{b\{1,1\}}=\left[\begin{array}{c}\texttt{m2vec}(U_{0},3)\\ \texttt{zeros}(4,1)\end{array}\right],$
$\texttt{c}=[0,0,1].$
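
In MATLAB, using the internal functions m2vec and sm2vec, this input reads (a sketch):

% Example 6.1: variables (x1, x2, t), constraint U*U' <= t*I.
U0 = [1 0 0; 0 1 1];
U1 = [-1 -1 1; 0 0 1];
U2 = [1 0 0; 0 1 0];
cons{1,1} = 'MN';  cons{1,2} = [2 3];
A{1,1} = [m2vec(U1,3), m2vec(U2,3), zeros(6,1);
          zeros(4,1),  zeros(4,1),  sm2vec(eye(2))];
b{1,1} = [m2vec(U0,3); zeros(4,1)];
c = [0; 0; 1];
[x,y,info] = DDS(c,A,b,cons);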

6.1. Minimizing nuclear norm

The nuclear norm of a matrix $Z$ is $\|Z\|_{*}:=\textup{Tr}\left((ZZ^{\top})^{1/2}\right)$. The dual norm of $\|\cdot\|_{*}$ is the operator 2-norm $\|\cdot\|$ of a matrix. Minimization of the nuclear norm has applications in machine learning and matrix sparsification. The following optimization problems are a primal-dual pair [29]:

(133) $\begin{array}{ccc}(P_{N})&\min_{X}&\|X\|_{*}\\ &\textup{s.t.}&A(X)=b,\end{array}\ \ \ \begin{array}{ccc}(D_{N})&\max_{z}&\langle b,z\rangle\\ &\textup{s.t.}&\|A^{*}(z)\|\leq 1,\end{array}$

where $A$ is a linear transformation on matrices and $A^{*}$ is its adjoint. $(P_{N})$ is a very popular relaxation of the problem of minimizing $\textup{rank}(X)$ subject to $A(X)=b$, with applications in machine learning and compressed sensing. The dual problem $(D_{N})$ is a special case of (112) where $Z=I$ and $U=A^{*}(z)$. As we will show in an example, solving $(D_{N})$ by [x,y]=DDS(c,A,b,cons) gives us $y$, which yields a solution for $(P_{N})$.

More specifically, consider the optimization problem

(134) $\begin{array}{rl}\min & \|X\|_{*}\\ \textup{s.t.} & \textup{Tr}(U_{i}X)=c_{i},\ \ \ i\in\{1,\ldots,\ell\},\end{array}$

where $X$ is $n$-by-$m$. DDS can solve the dual problem by defining

(139) $\texttt{A\{1,1\}}=\left[\begin{array}{ccc}\texttt{m2vec}(U_{1},n)&\cdots&\texttt{m2vec}(U_{\ell},n)\\ \texttt{zeros}(m^{2},1)&\cdots&\texttt{zeros}(m^{2},1)\end{array}\right],\ \ \texttt{b\{1,1\}}=\left[\begin{array}{c}\texttt{zeros}(mn,1)\\ \texttt{sm2vec}(I_{m\times m})\end{array}\right],$
(140) $\texttt{cons\{1,1\}='MN'},\ \ \texttt{cons\{1,2\}}=[m\ \ n].$

Then, if we run [x,y]=DDS(c,A,b,cons) and define V:=(vec2m(y{1}(1:m*n),m))', then $V$ is an optimal solution for (134). In Subsection 13.3, we present numerical results for solving problem (134) and show that for the cases where $n\gg m$, DDS can be more efficient than SDP-based solvers. Here is an example:

Example 6.2.

We consider minimizing the nuclear norm over a subspace. Consider the following optimization problem:

(141) $\begin{array}{rl}\min & \|X\|_{*}\\ \textup{s.t.} & \textup{Tr}(U_{1}X)=1,\\ & \textup{Tr}(U_{2}X)=2,\end{array}$

where

(146) $U_{1}=\left[\begin{array}{cccc}1&0&0&0\\ 0&1&0&0\end{array}\right],\ \ U_{2}=\left[\begin{array}{cccc}0&0&1&0\\ 0&0&0&1\end{array}\right].$

By using (133), the dual of this problem is

(147) $\begin{array}{rl}\min & -u_{1}-2u_{2}\\ \textup{s.t.} & \|u_{1}U_{1}+u_{2}U_{2}\|\leq 1.\end{array}$

To solve this problem with our code, we define

$\texttt{cons\{1,1\}='MN'},\ \ \texttt{cons\{1,2\}}=[2\ \ 4],$
$\texttt{A\{1,1\}}=\left[\begin{array}{cc}\texttt{m2vec}(U_{1},4)&\texttt{m2vec}(U_{2},4)\\ \texttt{zeros}(4,1)&\texttt{zeros}(4,1)\end{array}\right],\ \ \texttt{b\{1,1\}}=\left[\begin{array}{c}\texttt{zeros}(8,1)\\ \texttt{sm2vec}(I_{2\times 2})\end{array}\right],$
$\texttt{c}=[-1,-2].$

If we solve the problem using [x,y]=DDS(c,A,b,cons), the optimal value is $-2.2360$. Now V:=(vec2m(y{1}(1:8),2))' is the solution of (141) with objective value $2.2360$. We have

(153) $X^{*}:=V=\left[\begin{array}{cc}0.5&0\\ 0&0.5\\ 1&0\\ 0&1\end{array}\right].$
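
The complete MATLAB script for this example, including the recovery of $X^{*}$ from the dual point, can be sketched as:

% Example 6.2: nuclear norm minimization via the dual problem (147).
U1 = [1 0 0 0; 0 1 0 0];
U2 = [0 0 1 0; 0 0 0 1];
cons{1,1} = 'MN';  cons{1,2} = [2 4];
A{1,1} = [m2vec(U1,4), m2vec(U2,4);
          zeros(4,1),  zeros(4,1)];
b{1,1} = [zeros(8,1); sm2vec(eye(2))];
c = [-1; -2];
[x,y,info] = DDS(c,A,b,cons);
V = (vec2m(y{1}(1:8),2))';   % optimal X for (141), objective value 2.2360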

7. Epigraphs of convex univariate functions (geometric, entropy, and $p$-norm programming)

DDS accepts constraints of the form

(154) $\sum_{i=1}^{\ell}\alpha_{i}f_{i}(a_{i}^{\top}x+\beta_{i})+g^{\top}x+\gamma\leq 0,\ \ \ a_{i},g\in\mathbb{R}^{n},\ \ \beta_{i},\gamma\in\mathbb{R},\ \ i\in\{1,\ldots,\ell\},$

where $\alpha_{i}\geq 0$ and each $f_{i}$, $i\in\{1,\ldots,\ell\}$, can be any function from Table 2. Note that every univariate convex function can be added to this table in the same fashion. Using this simple structure, we can model many interesting optimization problems. Geometric programming (GP) [2] and entropy programming (EP) [12], with many applications in engineering, are constructed from constraints of the form (154) with $f_{i}(z)=e^{z}$, $i\in\{1,\ldots,\ell\}$, for GP and $f_{i}(z)=z\ln(z)$, $i\in\{1,\ldots,\ell\}$, for EP. The other functions, involving $p$th powers, let us solve optimization problems related to $p$-norm minimization.

Table 2. Some 2-dimensional convex sets and their s.c. barriers.

  | set $(z,t)$ | s.c. barrier $\Phi(z,t)$
1 | $-\ln(z)\leq t,\ z>0$ | $-\ln(t+\ln(z))-\ln(z)$
2 | $e^{z}\leq t$ | $-\ln(\ln(t)-z)-\ln(t)$
3 | $z\ln(z)\leq t,\ z>0$ | $-\ln(t-z\ln(z))-\ln(z)$
4 | $|z|^{p}\leq t,\ p\geq 1$ | $-\ln(t^{\frac{2}{p}}-z^{2})-2\ln(t)$
5 | $-z^{p}\leq t,\ z>0,\ 0\leq p\leq 1$ | $-\ln(z^{p}+t)-\ln(z)$
6 | $\frac{1}{z}\leq t,\ z>0$ | $-\ln(zt-1)$

The corresponding s.c. barriers used in DDS are shown in Table 2. There are closed-form expressions for the LF conjugates of the first two functions. For the last four, the LF conjugate can be calculated efficiently to high accuracy. In Appendix D, we show how to calculate the LF conjugates of the functions in Table 2 and describe the corresponding internal functions we have in DDS.

To represent a constraint of the form (154), for given $\gamma\in\mathbb{R}$ and $\beta_{i}\in\mathbb{R}$, $i\in\{1,\ldots,\ell\}$, we can define the corresponding convex set $D$ as

(155) $D:=\left\{(w,s_{i},u_{i}):w+\gamma\leq 0,\ \ f_{i}(s_{i}+\beta_{i})\leq u_{i},\ \forall i\right\},$

and our matrix $A$ represents $w=\sum_{i=1}^{\ell}\alpha_{i}u_{i}+g^{\top}x$ and $s_{i}=a_{i}^{\top}x$, $i\in\{1,\ldots,\ell\}$. As can be seen, to represent our set in this way, we introduce auxiliary variables $u_{i}$ into our formulation; the DDS code does this internally. Let us assume that we want to add the following $s$ constraints to our model:

(156) $\sum_{type}\sum_{i=1}^{\ell_{type}^{j}}\alpha_{i}^{j,type}f_{type}((a_{i}^{j,type})^{\top}x+\beta_{i}^{j,type})+g_{j}^{\top}x+\gamma_{j}\leq 0,\ \ \ \ j\in\{1,\ldots,s\}.$

From now on, $type$ indexes the rows of Table 2. The abbreviation we use for these constraints is TD. Hence, if the $k$th input block consists of the constraints in (156), we have cons{k,1}='TD'. cons{k,2} is a MATLAB cell array with $s$ rows, where each row represents one constraint. For the $j$th constraint we have:

  • cons{k,2}{j,1} is a matrix with two columns: the first column shows the type of a function from Table 2, and the second column shows the number of functions of that type in the constraint. Say that in the $j$th constraint we have $l_{2}^{j}$ functions of type 2 and $l_{3}^{j}$ functions of type 3; then we have

    $\texttt{cons\{k,2\}\{j,1\}}=\left[\begin{array}{cc}2&l_{2}^{j}\\ 3&l_{3}^{j}\end{array}\right].$

    The types can be in any order, but the functions of the same type are consecutive, and the order must match the rows of $A$ and $b$.

  • cons{k,2}{j,2} is a vector with the coefficients of the functions in the constraint, i.e., the $\alpha_{i}^{j,type}$ in (156). Note that the coefficients must be in the same order as their corresponding rows in $A$ and $b$. If in the $j$th constraint we have 2 functions of type 2 and 1 function of type 3, it starts as

    $\texttt{cons\{k,2\}\{j,2\}}=[\alpha_{1}^{j,2},\alpha_{2}^{j,2},\alpha_{1}^{j,3},\cdots].$

To add the rows to $A$, for each constraint $j$, we first add $g_{j}^{\top}$, then the $a_{i}^{j,type}$'s in the order that matches cons{k,2}. We do the same for the vector $b$ (first $\gamma_{j}$, then the $\beta_{i}^{j,type}$'s). The parts of $A$ and $b$ corresponding to the $j$th constraint are as follows if, for example, we have five types:

(174) $\texttt{A}=\left[\begin{array}{c}g_{j}^{\top}\\ a_{1}^{j,1}\\ \vdots\\ a_{l_{1}^{j}}^{j,1}\\ \vdots\\ a_{1}^{j,5}\\ \vdots\\ a_{l_{5}^{j}}^{j,5}\end{array}\right],\ \ \ \texttt{b}=\left[\begin{array}{c}\gamma_{j}\\ \beta_{1}^{j,1}\\ \vdots\\ \beta_{l_{1}^{j}}^{j,1}\\ \vdots\\ \beta_{1}^{j,5}\\ \vdots\\ \beta_{l_{5}^{j}}^{j,5}\end{array}\right].$

Let us see an example:

Example 7.1.

Assume that we want to solve

(175) $\begin{array}{rl}\min & c^{\top}x\\ \textup{s.t.} & -1.2\ln(x_{2}+2x_{3}+55)+1.8e^{x_{1}+x_{2}+1}+x_{1}-2.1\leq 0,\\ & -3.5\ln(x_{1}+2x_{2}+3x_{3}-30)+0.9e^{-x_{3}-3}-x_{3}+1.2\leq 0,\\ & x\geq 0.\end{array}$

For this problem, we define:

$\texttt{cons\{1,1\}='LP'},\ \ \texttt{cons\{1,2\}}=[3],$
$\texttt{cons\{2,1\}='TD'},\ \ \texttt{cons\{2,2\}}=\left\{\left[\begin{array}{cc}1&1\\ 2&1\end{array}\right],[1.2\ \ 1.8]\ ;\ \ \left[\begin{array}{cc}1&1\\ 2&1\end{array}\right],[3.5\ \ 0.9]\right\},$
$\texttt{A\{1,1\}}=\left[\begin{array}{ccc}1&0&0\\ 0&1&0\\ 0&0&1\end{array}\right],\ \ \ \texttt{b\{1,1\}}=\left[\begin{array}{c}0\\ 0\\ 0\end{array}\right],$
$\texttt{A\{2,1\}}=\left[\begin{array}{ccc}1&0&0\\ 0&1&2\\ 1&1&0\\ 0&0&-1\\ 1&2&3\\ 0&0&-1\end{array}\right],\ \ \ \texttt{b\{2,1\}}=\left[\begin{array}{c}-2.1\\ 55\\ 1\\ 1.2\\ -30\\ -3\end{array}\right].$
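
The corresponding MATLAB script is sketched below (the objective vector $c$ is not specified in the example, so the one here is hypothetical):

% Example 7.1 with one LP block (x >= 0) and one TD block.
c = [1; 1; 1];                      % hypothetical objective vector
cons{1,1} = 'LP';  cons{1,2} = [3];
A{1,1} = eye(3);   b{1,1} = zeros(3,1);
cons{2,1} = 'TD';
cons{2,2} = {[1 1; 2 1], [1.2 1.8]; [1 1; 2 1], [3.5 0.9]};
A{2,1} = [1 0 0; 0 1 2; 1 1 0; 0 0 -1; 1 2 3; 0 0 -1];
b{2,1} = [-2.1; 55; 1; 1.2; -30; -3];
[x,y,info] = DDS(c,A,b,cons);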

Note: As we mentioned, modeling systems for convex optimization that are based on SDP solvers, such as CVX, have to use approximations for functions involving exp and ln. Approximation makes it hard to return proper dual certificates, specifically when the problem is infeasible or unbounded.

7.1. Constraints involving power functions

The difference between these two types (4 and 5) and the others is that we also need to give the value of $p$ for each function. To do that, we add another column to cons{k,2}.

Note: For TD constraints, cons{k,2} can have two or three columns. If we do not use types 4 and 5, it has two columns; otherwise, it has three. cons{k,2}{j,3} is a vector which contains the powers $p$ for the functions of types 4 and 5. The powers are given in the same order as the coefficients in cons{k,2}{j,2}. If the constraint also has functions of other types, we must put 0 in place of the power.

Let us see an example:

Example 7.2.
$\begin{array}{rl}\min & c^{\top}x\\ \textup{s.t.} & 2.2\exp(2x_{1}+3)+|x_{1}+x_{2}+x_{3}|^{2}+4.5|x_{1}+x_{2}|^{2.5}+|x_{2}+2x_{3}|^{3}+1.3x_{1}-1.9\leq 0.\end{array}$

For this problem, we define:

$\texttt{A\{1,1\}}=\left[\begin{array}{ccc}1.3&0&0\\ 2&0&0\\ 1&1&1\\ 1&1&0\\ 0&1&2\end{array}\right],\ \ \ \texttt{b\{1,1\}}=\left[\begin{array}{c}-1.9\\ 3\\ 0\\ 0\\ 0\end{array}\right],$
$\texttt{cons\{1,1\}='TD'},\ \ \texttt{cons\{1,2\}}=\left\{\left[\begin{array}{cc}2&1\\ 4&3\end{array}\right],\ [2.2\ \ 1\ \ 4.5\ \ 1],\ [0\ \ 2\ \ 2.5\ \ 3]\right\}.$
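
As a MATLAB script (again with a hypothetical objective vector $c$), this reads:

% Example 7.2: one TD block with types 2 (exp) and 4 (|.|^p).
c = [1; 1; 1];                      % hypothetical objective vector
cons{1,1} = 'TD';
cons{1,2} = {[2 1; 4 3], [2.2 1 4.5 1], [0 2 2.5 3]};
A{1,1} = [1.3 0 0; 2 0 0; 1 1 1; 1 1 0; 0 1 2];
b{1,1} = [-1.9; 3; 0; 0; 0];
[x,y,info] = DDS(c,A,b,cons);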

8. Vector Relative Entropy

Consider the relative entropy function $f:\mathbb{R}_{++}^{\ell}\oplus\mathbb{R}_{++}^{\ell}\rightarrow\mathbb{R}$ defined as

$f(u,z):=\sum_{i=1}^{\ell}u_{i}\ln(u_{i})-u_{i}\ln(z_{i}).$

The epigraph of this function admits the following $(2\ell+1)$-s.c. barrier (see Appendix E):

(181) $\Phi(t,u,z):=-\ln\left(t-\sum_{i=1}^{\ell}u_{i}\ln(u_{i}/z_{i})\right)-\sum_{i=1}^{\ell}\ln(u_{i})-\sum_{i=1}^{\ell}\ln(z_{i}).$

This function covers many applications, as discussed in [10]. The special case of $\ell=1$ is used for handling the EXP cone in the commercial solver MOSEK. We claim that the LF conjugate of $\Phi$ can be efficiently calculated using $\ell$ one-dimensional minimizations, i.e., using the function $\theta(r)$ defined as the solution of $\frac{1}{\theta}-\ln(\theta)=r$. We consider the case of $\ell=1$; the generalization is straightforward. We have

(182) $\Phi_{*}(\alpha,y_{u},y_{z}):=\min_{t,z,u}\ \alpha t+y_{z}z+y_{u}u+\ln(t-u\ln(u)+u\ln(z))+\ln(z)+\ln(u).$

Writing down the optimality conditions, we get (defining $T:=t-u\ln(u)+u\ln(z)$)

(186) $\begin{array}{lcl}\alpha+\frac{1}{T}&=&0,\\ y_{z}+\frac{u}{zT}+\frac{1}{z}&=&0,\\ y_{u}-\frac{1+\ln(u)-\ln(z)}{T}+\frac{1}{u}&=&0.\end{array}$

We can see that at the optimal solution $T=-1/\alpha$, and once we calculate $u$, we can easily get $z$. $u$ is calculated from the following equation:

(187) $\frac{1}{T}+\frac{1}{u}=\frac{1}{\theta\left(-Ty_{u}+2+\ln(-y_{z})+\ln(T)\right)}.$

Therefore, to calculate $\Phi_{*}$ at a given point, we need to evaluate $\theta$ once. For general $\ell$, we evaluate $\theta$ $\ell$ times.
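
Since $g(\theta)=\frac{1}{\theta}-\ln(\theta)$ is strictly decreasing on $(0,\infty)$ with range all of $\mathbb{R}$, $\theta(r)$ is well-defined for every $r$ and can be computed, for instance, by a safeguarded Newton iteration. The following sketch is our own illustration, not DDS's internal routine:

% Sketch (save as theta_of_r.m): compute theta(r), the positive
% solution of 1/theta - log(theta) = r.
function th = theta_of_r(r)
th = 1;                                  % g(1) = 1; g is strictly decreasing
for iter = 1:100
    g  = 1/th - log(th) - r;             % residual
    dg = -1/th^2 - 1/th;                 % derivative of g, always negative
    th_new = th - g/dg;                  % Newton step
    if th_new <= 0, th_new = th/2; end   % safeguard: stay in (0, inf)
    if abs(th_new - th) <= 1e-14*max(1,abs(th)), th = th_new; return; end
    th = th_new;
end
end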

The abbreviation we use for relative entropy is RE. So, for the $k$th block of $s$ constraints of the form

(188) $f(A_{u}^{i}x+b_{u}^{i},A_{z}^{i}x+b_{z}^{i})+g_{i}^{\top}x+\gamma_{i}\leq 0,\ \ i\in\{1,\ldots,s\},$

we define cons{k,1}='RE', and cons{k,2} is a vector of length $s$ with the $i$th element equal to $2\ell+1$. We also define:

(203) $\texttt{A}=\left[\begin{array}{c}g_{1}^{\top}\\ A_{u}^{1}\\ A_{z}^{1}\\ \vdots\\ g_{s}^{\top}\\ A_{u}^{s}\\ A_{z}^{s}\end{array}\right],\ \ \ \texttt{b}=\left[\begin{array}{c}\gamma_{1}\\ b_{u}^{1}\\ b_{z}^{1}\\ \vdots\\ \gamma_{s}\\ b_{u}^{s}\\ b_{z}^{s}\end{array}\right].$
Example 8.1.

Assume that we want to minimize a relative entropy function subject to a linear constraint:

(204) $\begin{array}{rl}\min & (0.8x_{1}+1.3)\ln\left(\frac{0.8x_{1}+1.3}{2.1x_{1}+1.3x_{2}+1.9}\right)+(1.1x_{1}-1.5x_{2}-3.8)\ln\left(\frac{1.1x_{1}-1.5x_{2}-3.8}{3.9x_{2}}\right)\\ \textup{s.t.} & x_{1}+x_{2}\leq\beta.\end{array}$

We add an auxiliary variable $x_{3}$ to model the objective function as a constraint. For this problem we define:

$\texttt{cons\{1,1\}='RE'},\ \ \texttt{cons\{1,2\}}=[5],$
$\texttt{A\{1,1\}}=\left[\begin{array}{ccc}0&0&-1\\ 0.8&0&0\\ 1.1&-1.5&0\\ 2.1&1.3&0\\ 0&3.9&0\end{array}\right],\ \ \ \texttt{b\{1,1\}}=\left[\begin{array}{c}0\\ 1.3\\ -3.8\\ 1.9\\ 0\end{array}\right],$
$\texttt{cons\{2,1\}='LP'},\ \ \texttt{cons\{2,2\}}=[1],$
$\texttt{A\{2,1\}}=[-1\ \ -1\ \ 0],\ \ \ \texttt{b\{2,1\}}=[\beta].$

If we solve this problem by DDS, for $\beta=2$ the problem is infeasible, and for $\beta=7$ DDS returns an optimal solution $x^{*}:=(5.93,1.06)^{\top}$ with minimum function value $-7.259$.
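
Here is the full MATLAB script for this example (with $c=(0,0,1)^{\top}$, since we minimize the auxiliary variable $x_{3}$):

% Example 8.1: relative entropy minimization, variables (x1, x2, x3).
beta = 7;                         % beta = 2 makes the problem infeasible
cons{1,1} = 'RE';  cons{1,2} = [5];
A{1,1} = [0 0 -1; 0.8 0 0; 1.1 -1.5 0; 2.1 1.3 0; 0 3.9 0];
b{1,1} = [0; 1.3; -3.8; 1.9; 0];
cons{2,1} = 'LP';  cons{2,2} = [1];
A{2,1} = [-1 -1 0];  b{2,1} = beta;
c = [0; 0; 1];                    % minimize the auxiliary variable x3
[x,y,info] = DDS(c,A,b,cons);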

9. Quantum entropy and Quantum relative entropy

Quantum entropy and quantum relative entropy functions are important in quantum information processing. DDS 2.0 accepts constraints involving these two functions. Let us start with the main definitions. Consider a function $f:\mathbb{R}\rightarrow\mathbb{R}\cup\{+\infty\}$ and let $X\in\mathbb{H}^{n}$ be a Hermitian matrix (with entries from $\mathbb{C}$) with a spectral decomposition $X=U\textup{Diag}(\lambda_{1},\ldots,\lambda_{n})U^{*}$, where Diag returns a diagonal matrix with the given entries on its diagonal and $U^{*}$ is the conjugate transpose of a unitary matrix $U$. We define the matrix extension $F$ of $f$ as $F(X):=U\textup{Diag}(f(\lambda_{1}),\ldots,f(\lambda_{n}))U^{*}$. Whenever we write $\phi(X):=\textup{Tr}(F(X))$, we mean a function $\phi:\mathbb{H}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ defined as

(208) $\phi(X):=\left\{\begin{array}{ll}\textup{Tr}(U\textup{Diag}(f(\lambda_{1}),\ldots,f(\lambda_{n}))U^{*})&\text{if }f(\lambda_{i})\in\mathbb{R},\ \forall i,\\ +\infty&\text{otherwise.}\end{array}\right.$

The study of such matrix functions goes back to the work of Löwner as well as von Neumann (see [11], [21], and the references therein). A function $f:(a,b)\rightarrow\mathbb{R}$ is said to be matrix monotone if for any two self-adjoint matrices $X$ and $Y$ with eigenvalues in $(a,b)$ that satisfy $X\succeq Y$, we have $F(X)\succeq F(Y)$. A function $f:(a,b)\rightarrow\mathbb{R}$ is said to be matrix convex if for any pair of self-adjoint matrices $X$ and $Y$ with eigenvalues in $(a,b)$, we have

(209) $F(tX+(1-t)Y)\preceq tF(X)+(1-t)F(Y),\ \ \forall t\in(0,1).$

Faybusovich and Tsuchiya [14] utilized the connection between matrix monotone functions and self-concordant functions. Let $f$ be a continuously differentiable function whose derivative is matrix monotone on the positive semi-axis, and let us define $\phi$ as in (208). Then, the function

(210) $\Phi(t,X):=-\ln(t-\phi(X))-\ln\det(X)$

is an $(n+1)$-s.c. barrier for the epigraph of $\phi(X)$. This convex set has many applications. Many optimization problems arising in quantum information theory and some other areas require dealing with the so-called quantum or von Neumann entropy $qe:\mathbb{H}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$, defined as $qe(X):=\textup{Tr}(X\ln(X))$. If we consider $f(x):=x\ln(x)$, then $f'(x)=1+\ln(x)$ is matrix monotone on $(0,\infty)$ (see, for instance, Example 4.2 in [18]), and so we have a s.c. barrier for the set

$\{(t,X)\in\mathbb{R}\oplus\mathbb{S}^{n}:\textup{Tr}(X\ln(X))\leq t\}.$

To calculate the LF conjugate of (210), we have to solve the optimization problem

(211) $\Phi_{*}(\eta,Y)=\sup_{t,X}\ t\eta+\langle X,Y\rangle+\ln(t-qe(X))+\ln\det(X),$

which is done in Appendix C.

Another interesting function with applications in quantum information theory is the quantum relative entropy $qre:\mathbb{H}^{n}\oplus\mathbb{H}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ defined as

$qre(X,Y):=\textup{Tr}(X\ln(X)-X\ln(Y)).$

This function is convex, as proved in [34]. The epigraph of $qre$ is

$\{(t,X,Y)\in\mathbb{R}\oplus\mathbb{S}^{n}\oplus\mathbb{S}^{n}:\textup{Tr}(X\ln(X)-X\ln(Y))\leq t\}.$

DDS 2.0 uses the following barrier (not yet known to be s.c.) for solving problems involving quantum relative entropy constraints:

$\Phi(t,X,Y):=-\ln(t-qre(X,Y))-\ln\det(X)-\ln\det(Y).$

We explain how to input constraints involving quantum entropy and quantum relative entropy into DDS in the following subsections. Appendix C contains some interesting theoretical results about quantum entropy functions, shows how to calculate the derivatives and Hessians of the above s.c. barriers, and gives the LF conjugate in (211).

9.1. Adding quantum entropy based constraints

Let $qe_{i}:\mathbb{S}^{n_{i}}\rightarrow\mathbb{R}\cup\{+\infty\}$ be quantum entropy functions and consider $\ell$ quantum entropy constraints of the form

(212) $qe_{i}(A^{i}_{0}+x_{1}A^{i}_{1}+\cdots+x_{n}A^{i}_{n})\leq g_{i}^{\top}x+d_{i},\ \ \ i\in\{1,\ldots,\ell\},$

where the $A^{i}_{j}$'s are $n_{i}$-by-$n_{i}$ symmetric matrices. To input (212) to DDS as the $k$th block, we define:

$\texttt{cons\{k,1\}='QE'},\ \ \texttt{cons\{k,2\}}=[n_{1},\ldots,n_{\ell}],$
(223) $\texttt{A\{k,1\}}:=\left[\begin{array}{c}g_{1}^{\top}\\ \texttt{sm2vec}(A^{1}_{1}),\cdots,\texttt{sm2vec}(A^{1}_{n})\\ \vdots\\ g_{\ell}^{\top}\\ \texttt{sm2vec}(A^{\ell}_{1}),\cdots,\texttt{sm2vec}(A^{\ell}_{n})\end{array}\right],\ \ \ \texttt{b\{k,1\}}:=\left[\begin{array}{c}d_{1}\\ \texttt{sm2vec}(A^{1}_{0})\\ \vdots\\ d_{\ell}\\ \texttt{sm2vec}(A^{\ell}_{0})\end{array}\right].$
Example 9.1.

Assume that we want to find scalars $x_{1}$, $x_{2}$, and $x_{3}$ such that $2x_{1}+3x_{2}-x_{3}\leq 5$, all the eigenvalues of $H:=x_{1}A_{1}+x_{2}A_{2}+x_{3}A_{3}$ are at least 3 for

(233) $A_{1}=\left[\begin{array}{ccc}1&0&0\\ 0&1&0\\ 0&0&1\end{array}\right],\ A_{2}=\left[\begin{array}{ccc}0&0&1\\ 0&1&0\\ 1&0&0\end{array}\right],\ A_{3}=\left[\begin{array}{ccc}0&1&0\\ 1&0&0\\ 0&0&0\end{array}\right],$

and the quantum entropy of $H$ is minimized. We can write this problem as

(234) $\begin{array}{rl}\min & t\\ \textup{s.t.} & qe(x_{1}A_{1}+x_{2}A_{2}+x_{3}A_{3})\leq t,\\ & 2x_{1}+3x_{2}-x_{3}\leq 5,\\ & x_{1}A_{1}+x_{2}A_{2}+x_{3}A_{3}\succeq 3I.\end{array}$

For the objective function we have $\texttt{c}=(0,0,0,1)^{\top}$. Assume that the first and second blocks are the LP and SDP constraints, set up as before. We define the third block of constraints as:

$\texttt{cons\{3,1\}='QE'},\ \ \texttt{cons\{3,2\}}=[3],\ \ \texttt{b\{3,1\}}:=\left[\begin{array}{c}0\\ \texttt{zeros}(9,1)\end{array}\right],$
$\texttt{A\{3,1\}}:=\left[\begin{array}{cccc}0&0&0&1\\ \texttt{sm2vec}(A_{1})&\texttt{sm2vec}(A_{2})&\texttt{sm2vec}(A_{3})&\texttt{sm2vec}(\texttt{zeros}(3))\end{array}\right].$

If we run DDS, the answer we get is $(x_{1},x_{2},x_{3})=(4,-1,0)$ with $qe(H)=14.63$.
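
A sketch of the full script, with one possible encoding of the LP and SDP blocks (our construction), is:

% Example 9.1: variables (x1, x2, x3, t).
A1 = eye(3);
A2 = [0 0 1; 0 1 0; 1 0 0];
A3 = [0 1 0; 1 0 0; 0 0 0];
cons{1,1} = 'LP';  cons{1,2} = [1];
A{1,1} = [-2 -3 1 0];  b{1,1} = 5;        % 5 - 2*x1 - 3*x2 + x3 >= 0
cons{2,1} = 'SDP'; cons{2,2} = [3];       % x1*A1 + x2*A2 + x3*A3 - 3*I >= 0
A{2,1} = [sm2vec(A1), sm2vec(A2), sm2vec(A3), zeros(9,1)];
b{2,1} = sm2vec(-3*eye(3));
cons{3,1} = 'QE';  cons{3,2} = [3];
A{3,1} = [0 0 0 1; sm2vec(A1), sm2vec(A2), sm2vec(A3), sm2vec(zeros(3))];
b{3,1} = [0; zeros(9,1)];
c = [0; 0; 0; 1];
[x,y,info] = DDS(c,A,b,cons);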

9.2. Adding quantum relative entropy based constraints

The abbreviation we use for quantum relative entropy is QRE. Let $qre_{i}:\mathbb{S}^{n_{i}}\oplus\mathbb{S}^{n_{i}}\rightarrow\mathbb{R}\cup\{+\infty\}$ be quantum relative entropy functions and consider $\ell$ quantum relative entropy constraints of the form

$qre_{i}(A^{i}_{0}+x_{1}A^{i}_{1}+\cdots+x_{n}A^{i}_{n},B^{i}_{0}+x_{1}B^{i}_{1}+\cdots+x_{n}B^{i}_{n})\leq g_{i}^{\top}x+d_{i},\ \ \ i\in\{1,\ldots,\ell\}.$

The $A^{i}_{j}$'s and $B^{i}_{j}$'s are $n_{i}$-by-$n_{i}$ symmetric matrices. To input these constraints to DDS as the $k$th block, we define:

(251) $\texttt{cons\{k,1\}='QRE'},\ \ \texttt{cons\{k,2\}}=[n_{1},\ldots,n_{\ell}],$
$\texttt{A\{k,1\}}:=\left[\begin{array}{c}g_{1}^{\top}\\ \texttt{sm2vec}(A^{1}_{1}),\cdots,\texttt{sm2vec}(A^{1}_{n})\\ \texttt{sm2vec}(B^{1}_{1}),\cdots,\texttt{sm2vec}(B^{1}_{n})\\ \vdots\\ g_{\ell}^{\top}\\ \texttt{sm2vec}(A^{\ell}_{1}),\cdots,\texttt{sm2vec}(A^{\ell}_{n})\\ \texttt{sm2vec}(B^{\ell}_{1}),\cdots,\texttt{sm2vec}(B^{\ell}_{n})\end{array}\right],\ \ \ \texttt{b\{k,1\}}:=\left[\begin{array}{c}d_{1}\\ \texttt{sm2vec}(A^{1}_{0})\\ \texttt{sm2vec}(B^{1}_{0})\\ \vdots\\ d_{\ell}\\ \texttt{sm2vec}(A^{\ell}_{0})\\ \texttt{sm2vec}(B^{\ell}_{0})\end{array}\right].$

10. Hyperbolic polynomials

Hyperbolic programming problems form one of the largest classes of convex optimization problems that have the kind of special mathematical structure amenable to robust and efficient computational approaches. DDS 2.0 accepts hyperbolic constraints. Let us first define a hyperbolic polynomial. A polynomial $p(x)\in\mathbb{R}[x_{1},\ldots,x_{m}]$ is said to be homogeneous of degree $d$ if every term has the same degree $d$. A homogeneous polynomial $p:\mathbb{R}^{m}\rightarrow\mathbb{R}$ is hyperbolic in direction $e\in\mathbb{R}^{m}$ if

  • $p(e)>0$;

  • for every $x\in\mathbb{R}^{m}$, the univariate polynomial $t\mapsto p(x-te)$ has only real roots.

Let $p(x)\in\mathbb{R}[x_{1},\ldots,x_{m}]$ be hyperbolic of degree $d$ in direction $e$. We define the eigenvalue function $\lambda:\mathbb{R}^{m}\rightarrow\mathbb{R}^{d}$ with respect to $p$ and $e$ such that for every $x\in\mathbb{R}^{m}$, the entries of $\lambda(x)$ are the roots of the univariate polynomial $t\mapsto p(x-te)$. The underlying hyperbolicity cone is defined as

(252) $\Lambda_{+}(p,e):=\{x\in\mathbb{R}^{m}:\lambda(x)\geq 0\}.$
Example 10.1.

The polynomial $p(x)=x_{1}^{2}-x_{2}^{2}-\cdots-x_{m}^{2}$ is hyperbolic in the direction $e:=(1,0,\ldots,0)^{\top}$, and the hyperbolicity cone with respect to $e$ is the second-order cone. The polynomial $p(X)=\det(X)$ defined on $\mathbb{S}^{n}$ is hyperbolic in the direction $I$, and the hyperbolicity cone with respect to $I$ is the positive semidefinite cone.

By the above example, optimization over hyperbolicity cones is an extension of SOCP and SDP. The following theorem by Güler gives a s.c. barrier for the hyperbolicity cone.

Theorem 10.1 (Güler [17]).

Let $p(x)$ be a homogeneous polynomial of degree $d$ that is hyperbolic in direction $e$. Then, the function $-\ln(p(x))$ is a $d$-LH s.c. barrier for $\Lambda_{+}(p,e)$.

DDS handles optimization problems involving hyperbolic polynomials using the above s.c. barrier. A computational difficulty is that, currently, we do not have a practical, efficient algorithm to evaluate the LF conjugate of $-\ln(p(x))$. Therefore, DDS uses a primal-heavy version of the algorithm for these problems, as we explain in Section 12.

10.1. Different formats for inputting multivariate polynomials

To input constraints involving hyperbolic polynomials, we use a matrix named poly. In DDS, there are different options to input a multivariate polynomial:

Using monomials: In this representation, if p(x)p(x) is a polynomial of mm variables, then poly is an kk-by-(m+1)(m+1) matrix, where kk is the number of monomials. In the jjth row, the first mm entries are the powers of the mm variables in the monomial, and the last entry is the coefficient of the monomial in p(x)p(x). For example, if p(x)=x12x22x32p(x)=x_{1}^{2}-x_{2}^{2}-x_{3}^{2}, then

poly:=[200102010021].\displaystyle\texttt{poly}:=\left[\begin{array}[]{cccc}2&0&0&1\\ 0&2&0&-1\\ 0&0&2&-1\end{array}\right].

Note: In many applications, the above matrix is very sparse. DDS recommends that in the monomial format, poly should be defined as a sparse matrix.

Using straight-line program: Another way to represent a polynomial is by a straight-line program, which can be seen as a rooted directed acyclic graph. The leaves represent the variables and constants, each internal node represents a simple binary operation (such as addition or multiplication), and the root holds the value of the polynomial. In this case, poly is a k-by-4 matrix, where each row represents a simple operation. Assume that p(x) has m variables; then we define

f_{0}=1,\ \ f_{i}:=x_{i},\ \ i\in\{1,\ldots,m\}.

The \ellth row of poly is of the form [\alpha_{\ell}\ \ i\ \ j\ \ \Box], which means that

f_{m+\ell}=\alpha_{\ell}(f_{i}\ \Box\ f_{j}).

Operations are indexed by 2-digit numbers as in the following table:

operation  \Box index
++ 11
- 22
×\times 33

For example, if p(x)=x_{1}^{2}-x_{2}^{2}-x_{3}^{2}, we have the following representation:

\left[\begin{array}{cccc}1&1&1&33\\ -1&2&2&33\\ -1&3&3&33\\ 1&4&5&11\\ 1&6&7&11\end{array}\right].

Here the first three rows create f_{4}=x_{1}^{2}, f_{5}=-x_{2}^{2}, and f_{6}=-x_{3}^{2}, the fourth row creates f_{7}=f_{4}+f_{5}, and the last row creates f_{8}=f_{6}+f_{7}=p(x).

Note that the straight-line program representing a given polynomial is not unique.

Determinantal representation: In this case, if possible, the polynomial p(x) is written as

(255) p(x)=\det(H_{0}+x_{1}H_{1}+x_{2}H_{2}+\cdots+x_{m}H_{m}),

where the H_{i}, i\in\{0,1,\ldots,m\}, are symmetric matrices of a common size. We define

\texttt{poly}:=[\texttt{sm2vec}(H_{0})\ \ \texttt{sm2vec}(H_{1})\ \cdots\ \texttt{sm2vec}(H_{m})].

For example, for p(x)=x_{1}^{2}-x_{2}^{2}-x_{3}^{2}, we can take

H_{0}:=\begin{pmatrix}0&0\\ 0&0\end{pmatrix},\ \ H_{1}:=\begin{pmatrix}1&0\\ 0&1\end{pmatrix},\ \ H_{2}:=\begin{pmatrix}1&0\\ 0&-1\end{pmatrix},\ \ H_{3}:=\begin{pmatrix}0&1\\ 1&0\end{pmatrix}.

Note: H1,,HmH_{1},\ldots,H_{m} must be nonzero symmetric matrices.
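As a sanity check, this representation can be verified numerically in a few lines of plain MATLAB:

H1 = [1 0; 0 1];  H2 = [1 0; 0 -1];  H3 = [0 1; 1 0];
x = randn(3,1);
p = det(x(1)*H1 + x(2)*H2 + x(3)*H3);   % should equal x1^2 - x2^2 - x3^2
assert(abs(p - (x(1)^2 - x(2)^2 - x(3)^2)) < 1e-10);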

10.2. Adding constraints involving hyperbolic polynomials

Consider a hyperbolic polynomial constraint of the form

(256) p(Ax+b)\geq 0.

To input this constraint to DDS as the kth block, A and b are defined as before, and the different parts of cons are defined as follows:

cons{k,1}=’HB’,
cons{k,2}= number of variables in p(z).
cons{k,3} is the poly matrix, given in one of the three formats of Subsection 10.1.
cons{k,4} is the format of the polynomial, which can be ’monomial’, ’straight_line’, or ’determinant’.
cons{k,5} is the direction of hyperbolicity or any vector in the interior of the hyperbolicity cone.

Example 10.2.

Assume that we want to give constraint (256) to DDS for p(x)=x_{1}^{2}-x_{2}^{2}-x_{3}^{2}, using the monomial format. Then, the cons fields are defined as

\texttt{cons\{k,1\}='HB'},\ \ \texttt{cons\{k,2\}}=[3],

\texttt{cons\{k,3\}}=\left[\begin{array}{cccc}2&0&0&1\\ 0&2&0&-1\\ 0&0&2&-1\end{array}\right],

\texttt{cons\{k,4\}='monomial'},\ \ \texttt{cons\{k,5\}}=\left[\begin{array}{c}1\\ 0\\ 0\end{array}\right].
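Putting the pieces together, a minimal script for this example might look as follows. We assume the DDS(c,A,b,cons) calling convention described earlier in this guide; the objective, minimizing x_{1} over the second-order cone, is only illustrative.

c = [1; 0; 0];                          % minimize x1
A{1,1} = eye(3);  b{1,1} = zeros(3,1);  % so that p(Ax+b) = p(x)
cons{1,1} = 'HB';
cons{1,2} = 3;
cons{1,3} = [2 0 0 1; 0 2 0 -1; 0 0 2 -1];
cons{1,4} = 'monomial';
cons{1,5} = [1; 0; 0];
[x,y,info] = DDS(c,A,b,cons);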

11. Equality constraints

In the main formulation of the Domain-Driven form (1), equality constraints are not included. However, the current version of DDS accepts equality constraints. In other words, users can input a problem of the form

(259) \inf_{x}\{\langle c,x\rangle:Bx=d,\ Ax\in D\}.

To input a block of equality constraints Bx=d, where B is an m_{EQ}-by-n matrix, as the kth block, we define

\texttt{cons\{k,1\}='EQ'},\ \ \texttt{cons\{k,2\}}=m_{EQ},\ \ \texttt{A\{k,1\}}=B,\ \ \texttt{b\{k,1\}}=d.
Example 11.1.

If for a given problem with x\in\mathbb{R}^{3} we have a constraint x_{2}-x_{3}=2, then we can add it as the kth block by the following definitions:

\texttt{cons\{k,1\}='EQ'},\ \ \texttt{cons\{k,2\}}=1,\ \ \texttt{A\{k,1\}}=[0\ \ 1\ -1],\ \ \texttt{b\{k,1\}}=[2].

We excluded equality constraints from the main Domain-Driven formulation (1) for some theoretical and practical considerations. From a mathematical point of view, this is not a problem. For any matrix Z whose columns form a basis for the null space of B, we may represent the solution set of Bx=d as x^{0}+Zw, where x^{0} is any solution of Bx=d. Then the feasible region in (259) is equivalent to

(260) D_{w}:=\{w:AZw\in(D-Ax^{0})\}.

D-Ax^{0} is a translation of D with the s.c. barrier \Phi(z-Ax^{0}), where \Phi is a s.c. barrier for D. Therefore, theoretically, we can reformulate (259) as (1), where AZ replaces A. Even though this procedure is straightforward in theory, there can be numerical challenges in practice. For example, if A has useful structure, such as sparsity, multiplying A by Z may destroy that structure. In DDS, we use a combination of heuristics and LU and QR factorizations to implement this transition efficiently.
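The following dense MATLAB sketch illustrates the reduction itself (DDS uses the LU/QR-based heuristics mentioned above rather than a dense null-space computation):

x0 = B \ d;      % a particular solution of B*x = d
Z  = null(B);    % orthonormal basis for the null space of B
AZ = A*Z;        % substituting x = x0 + Z*w, the constraint
Ax0 = A*x0;      % Ax in D becomes AZ*w in D - Ax0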

12. Primal-heavy version of the algorithm

For some classes of problems, such as hyperbolic programming, a computable s.c. barrier \Phi is available for the set D, while its LF conjugate is not. For these classes, DDS uses a primal-heavy version of the algorithm, in which we approximate the primal-dual system of equations for computing the search directions by approximating the gradient and Hessian of \Phi_{*}. The approximations are based on the relations between the derivatives of \Phi and \Phi_{*}: for every point z\in\textup{int}D we have

(261) z=\Phi_{*}^{\prime}(\Phi^{\prime}(z)),\ \ \ \Phi_{*}^{\prime\prime}(\Phi^{\prime}(z))=[\Phi^{\prime\prime}(z)]^{-1}.

Instead of the primal-dual proximity measure defined in [19], we use the primal-heavy version

(262) \left\|\frac{\tau y}{\mu}-\Phi^{\prime}(u)\right\|_{[\Phi^{\prime\prime}(u)]^{-1}},

where u:=Ax+\frac{1}{\tau}z^{0}, \tau is an artificial variable we use in the formulation of the central path (see [19] for details), and \mu is the parameter of the central path. By [19]-Corollary 4.1, this proximity measure is “equivalent” to the primal-dual one, but (262) is less efficient computationally.

By using a primal-heavy version, we lose some of the desired properties of the primal-dual setup, such as the ability to move in a wider neighbourhood of the central path. Moreover, in the primal-heavy version, we have to somehow make sure that the dual iterates y are feasible (or at least that the final dual iterate is). Another issue is calculating the duality gap (4). For a general convex domain D, we need \Phi_{*}^{\prime} to accurately calculate \delta_{*}(y|D), as explained in [19]. Note that when D is a shifted cone D=K-b, we have

(263) \delta_{*}(y|D)=-\langle b,y\rangle.

To calculate the duality gap, we can write it as a sum of separate terms over the domains, and if a domain with only a primal barrier is a shifted cone, we can use (263). This covers the current version of DDS, as all the set constraints using the primal-heavy version are based on shifted convex cones. To make sure that the dual iterates are feasible, we choose our neighbourhoods to satisfy

(264) \left\|\frac{\tau y}{\mu}-\Phi^{\prime}(u)\right\|_{\Phi^{\prime\prime}(u)}<1,

and by the Dikin ellipsoid property of s.c. functions, the y iterates stay feasible. We can specify in OPTIONS that we want to use the primal-heavy version of the algorithm in DDS by

OPTIONS.primal=1;
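Assuming the DDS(c,A,b,cons,OPTIONS) calling convention used throughout this guide, the solver is then invoked as:

[x,y,info] = DDS(c,A,b,cons,OPTIONS);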

13. Numerical Results

In this section, we present some numerical examples of running DDS 2.0. We performed computational experiments using MATLAB R2018b, on a 4-core 3.2 GHz Intel Xeon X5672 machine with 96GB of memory. Many of the examples in this section are given as mat-files in the DDS package, in the folder Test_Examples.

DDS pre-processes the input data and returns the statistics before and after the pre-processing. This processing includes checking the structure of the input data and removing obviously redundant constraints. For example, if the size of A given for a block of constraints does not match the information in cons, DDS returns an error. The pre-processing also includes properly scaling the input data for each block of constraints.

As mentioned in Section 2, users have the option to give the starting point of the infeasible-start algorithm as input. Otherwise, DDS chooses an easy-to-find starting point for every input set constraint. The stopping criteria in DDS are based on the status determination results in [20], which studied how to efficiently certify the different statuses of a given problem in Domain-Driven form using duality theory. [20]-Section 9 discusses stopping criteria in a practical setting. We define the following parameters for the current point (x,\tau,y):

(265) gap:=\frac{|\langle c,x\rangle+\frac{1}{\tau}\delta_{*}(y|D)|}{1+|\langle c,x\rangle|+|\frac{1}{\tau}\delta_{*}(y|D)|},\ \ P_{feas}:=\frac{1}{\tau}\|z^{0}\|,\ \ D_{feas}:=\frac{\|\frac{1}{\tau}A^{\top}y+c\|}{1+\|c\|},

where xx and yy are the primal and dual points, and τ\tau is the artificial variable we use (see [19] for details). DDS stops the algorithm and returns the status as “solved” when we have

(266) \max\{gap,P_{feas},D_{feas}\}\leq tol.

DDS returns the status “infeasible” if the returned certificate \bar{y}:=\frac{\tau}{\mu}y satisfies

(267) \|A^{\top}\bar{y}\|\leq tol,\ \ \ \delta_{*}(\bar{y}|D)<0,

and returns the status “unbounded” if

(268) \langle c,x\rangle\leq-\frac{1}{tol}.

DDS has several criteria for returning the “ill-conditioned” status, which is typically caused by numerical inaccuracies.

In the following, we see the numerical performance of DDS for different combinations of set constraints.

13.1. LP-SOCP-SDP

In this subsection, we consider LP-SOCP-SDP instances, mostly from the Dimacs library [28]. Note that the problems in the library are given in the standard equality form, so we solve the duals of these problems. Table 3 shows the results. DDS has a built-in function read_sedumi_DDS to read problems in SeDuMi input format. Depending on the SeDuMi-format file, either of the following commands can be used:

[c,A,b,cons]=read_sedumi_DDS(At,b,c,K);
[c,A,b,cons]=read_sedumi_DDS(A,b,c,K);
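For example, for an instance stored in a SeDuMi-format mat-file (the file name here is illustrative):

load nb.mat;                              % provides At, b, c, K in SeDuMi format
[c,A,b,cons] = read_sedumi_DDS(At,b,c,K);
[x,y,info] = DDS(c,A,b,cons);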
Table 3. Numerical results for some problems from the Dimacs library for tol=10^{-8}.
Problem | size of A | Type of Constraints | Iterations
nb | 2383*123 | SOCP-LP | 42
nb_L1 | 3176*915 | SOCP-LP | 37
nb_L2 | 4195*123 | SOCP-LP | 23
nb_L2_bessel | 2641*123 | SOCP-LP | 27
filter48_socp | 3284*969 | SDP-SOCP-LP | 80
filtinf1 | 3395*983 | SDP-SOCP-LP | 26
truss5 | 3301*208 | SDP | 66
truss8 | 11914*496 | SDP | 76
copo14 | 3108*1275 | SDP-LP | 24
copo23 | 3108*1275 | SDP-LP | 32
toruspm3-8-50 | 262144*512 | SDP | 20
torusg3-8 | 262144*512 | SDP | 24
sched_50_50_scaled | 4977*2526 | SOCP-LP | 81
mater-3 | 39448*1439 | SDP-LP | 124
cnhil8 | 14400*1716 | SDP | 31
cnhil10 | 48400*5005 | SDP | 35
cphil10 | 48400*5005 | SDP | 12
ros_500 | 17944*4988 | SDP | 41
sensor_500 | 245601*3540 | SDP | 65
taha1a | 231672*3002 | SDP | 23
taha1a | 231672*3002 | SDP | 42
G40mc | 4000000*2000 | SDP | 33
1tc.1024 | 1048576*7937 | SDP | 41
yalsdp | 30000*5051 | SDP | 17

13.2. EXP cone optimization problems from CBLIB

The Conic Benchmark Library (CBLIB) [15] is a collection of benchmark problems for conic mixed-integer and continuous optimization. It contains a collection of optimization problems involving the exponential cone, labeled EXP. The exponential cone is a special case of the vector relative entropy constraint we discussed in Section 8, with \ell=1. Table 4 shows the results of running DDS on problems with the EXP cone from CBLIB. The files converted into DDS format can be found in the Test_Examples folder.

Table 4. Numerical results for some EXP cone problems from CBLIB.
Problem | size of A | size of A_{EQ} | Type of Constraints | Iterations
syn30m | 324*121 | 19*121 | EXP-LP | 36
syn30h | 528*249 | 161*249 | EXP-LP | 51
syn30m02h | 1326*617 | 381*617 | EXP-LP | 46
syn05m02m | 177*67 | 11*67 | EXP-LP | 36
syn10h | 171*84 | 55*84 | EXP-LP | 37
syn15m | 177*67 | 12*67 | EXP-LP | 31
syn40h | 715*331 | 213*331 | EXP-LP | 48
synthes1 | 30*13 | 1*13 | EXP-LP | 13
synthes2 | 41*16 | 1*12 | EXP-LP | 26
synthes3 | 71*26 | 3*26 | EXP-LP | 20
ex1223a | 59*17 | 1*17 | EXP-LP | 11
gp_dave_1 | 985*705 | 453*705 | EXP-LP | 37
fiac81a | 163*191 | 126*191 | EXP-LP | 15
fiac81b | 65*87 | 57*87 | EXP-LP | 15
batch | 175*58 | 13*58 | EXP-LP | 160
demb761 | 93*131 | 90*131 | EXP-LP | 24
demb762 | 93*131 | 90*131 | EXP-LP | 27
demb781 | 14*19 | 12*19 | EXP-LP | 8
enpro48 | 506*167 | 33*167 | EXP-LP | 216
isi101 | 467*393 | 261*33 | EXP-LP | 52 (infeas)
jha88 | 866*1131 | 825*1131 | EXP-LP | 33
LogExpCR-n20-m400 | 2023*2022 | 1200*2022 | EXP-LP | 40
LogExpCR-n100-m400 | 2103*2102 | 1200*2102 | EXP-LP | 46
LogExpCR-n20-m1200 | 6023*6022 | 3600*6022 | EXP-LP | 44
LogExpCR-n500-m400 | 2503*2502 | 1200*2502 | EXP-LP | 68
LogExpCR-n500-m1600 | 8503*8502 | 4800*8502 | EXP-LP | 63

13.3. Minimizing Nuclear Norm

In this subsection, we use DDS to solve problem (134): minimizing the nuclear norm of a matrix subject to linear constraints. We are interested in the case where X is an n-by-m matrix with n\gg m, which makes the s.c. barrier for the epigraph of a matrix norm more efficient than the SDP representation. We compare DDS with the convex modeling system CVX, which has a function norm_nuc for the nuclear norm. For the numerical results, we assume (134) has two linear constraints and that U_{1} and U_{2} are randomly generated 0-1 matrices. Let us fix the number of columns m and measure the running time as the number of rows n changes. Figure 1 shows the plots of the running time for both DDS and CVX. For epigraph-of-matrix-norm constraints, DDS does not form matrices of size m+n, and as can be seen in the figure, its running time is almost linear as a function of n. The CVX running time, on the other hand, is clearly super-linear.

Figure 1. Average running time for minimizing nuclear norm versus the number of rows in the matrix, where the number of columns is fixed at m=20m=20.

13.4. Quantum Entropy and Quantum Relative Entropy

In this subsection, we consider optimization problems involving quantum entropy. As far as we know, there is no standard library of problems with quantum entropy and quantum relative entropy functions. We have created examples combining quantum entropy constraints with other types such as LP, SOCP, and p-norm constraints. Problems for quantum entropy are of the form

(269) \min\ t
\ \ \ \ \textup{s.t.}\ \ qe(A_{0}+x_{1}A_{1}+\cdots+x_{n}A_{n})\leq t,
\ \ \ \ \ \ \ \ \ \ \textup{other constraints},

where qe(X):=\textup{Tr}(X\textup{ln}(X)) and the A_{i}’s are sparse symmetric matrices. The problems for qre are in the same format. Our collection contains both feasible and infeasible instances. The results are shown in Tables 5 and 6 for quantum entropy and quantum relative entropy, respectively. The MATLAB files in the DDS format for some of these problems are given in the Test_Examples folder. We compare our results to CVXQUAD, a collection of matrix functions to be used on top of CVX. The package is based on the paper [13], which approximates the matrix logarithm with functions that can be described as feasible regions of SDPs. Note that these approximations increase the dimension of the problem, which is why CVX fails for the larger problems in Table 5.

Table 5. Results for optimization problems involving quantum entropy.
Problem | size of A | Type of Constraints | Itr/time(s) | CVXQUAD
QuanEntr-10 | 101*21 | QE | 11/ 0.6 | 12/ 1.3
QuanEntr-20 | 401*41 | QE | 13/ 0.8 | 13/ 2.4
QuanEntr-50 | 2501*101 | QE | 18/ 1.8 | 12/ 23.2
QuanEntr-100 | 10001*201 | QE | 24/ 6.7 | array size error
QuanEntr-200 | 40001*401 | QE | 32/ 53.9 | array size error
QuanEntr-LP-10 | 111*21 | QE-LP | 19/ 1.1 | 16/ 3.4
QuanEntr-LP-20 | 421*41 | QE-LP | 23/ 1.6 | 20/ 9.1
QuanEntr-LP-50 | 2551*101 | QE-LP | 34/ 2.3 | 20/ 103.4
QuanEntr-LP-100 | 10101*201 | QE-LP | 44/ 14.5 | array size error
QuanEntr-LP-SOCP-10-infea | 122*21 | QE-LP-SOCP | 12/ 0.6 (infea) | 16/ 5.1
QuanEntr-LP-SOCP-10 | 122*21 | QE-LP-SOCP | 26/ 1.0 | 24/ 2.0
QuanEntr-LP-SOCP-20-infea | 441*41 | QE-LP-SOCP | 14/ 1.1 (infea) | 14/ 5.8
QuanEntr-LP-SOCP-20 | 441*41 | QE-LP-SOCP | 33/ 1.8 | 22/ 4.1
QuanEntr-LP-SOCP-100 | 10201*201 | QE-LP-SOCP | 51/ 20.8 | array size error
QuanEntr-LP-SOCP-200 | 40402*401 | QE-LP-SOCP | 69/ 196 | array size error
QuanEntr-LP-3norm-10-infea | 121*21 | QE-LP-pnorm | 16/ 0.6 (infea) | 22/ 2.0
QuanEntr-LP-3norm-10 | 121*21 | QE-LP-pnorm | 21/ 1.2 | 19/ 2.0
QuanEntr-LP-3norm-20-infea | 441*41 | QE-LP-pnorm | 16/ 0.9 (infea) | 18/ 6.5
QuanEntr-LP-3norm-20 | 441*41 | QE-LP-pnorm | 29/ 1.1 | 20/ 7.3
QuanEntr-LP-3norm-100-infea | 10201*201 | QE-LP-pnorm | 25/ 15.3 (infea) | array size error
QuanEntr-LP-3norm-100 | 10201*201 | QE-LP-pnorm | 66/ 40.5 | array size error
Table 6. Results for optimization problems involving quantum relative entropy.
Problem | size of A | Type of Constraints | Itr/time(s) | CVXQUAD
QuanReEntr-6 | 73*13 | QRE | 12/ 6.3 | 12/ 18.1
QuanReEntr-10 | 201*21 | QRE | 12/ 21.2 | 25/ 202
QuanReEntr-20 | 801*41 | QRE | 17/ 282 | array size error
QuanReEntr-LP-6 | 79*13 | QRE-LP | 25/ 23.2 | 21/ 19.6
QuanReEntr-LP-6-infea | 79*13 | QRE-LP | 30/ 11.8 | 28/ 21.5
QuanReEntr-LP-10 | 101*21 | QRE-LP | 26/ 49.3 | 24/ 223

13.5. Hyperbolic Polynomials

We have created a library of hyperbolic programming problems for our experiments. Hyperbolic polynomials were defined in Section 10. In this section, we discuss the three methods we used for creating our library.

13.5.1. Hyperbolic polynomials from matroids

Let us first define a matroid.

Definition 13.1.

A ground set E and a collection of independent sets \mathcal{I}\subseteq 2^{E} form a matroid M=(E,\mathcal{I}) if:

  • \emptyset\in\mathcal{I};

  • if A\in\mathcal{I} and B\subset A, then B\in\mathcal{I};

  • if A,B\in\mathcal{I} and |B|>|A|, then there exists b\in B\setminus A such that A\cup\{b\}\in\mathcal{I}.

The independent sets of maximum cardinality are called the bases of the matroid; we denote the set of bases by \mathcal{B}. We can also assign a rank function r_{M}:2^{E}\rightarrow\mathbb{Z}_{+} as

(270) r_{M}(A):=\max\{|B|:B\subseteq A,\ B\in\mathcal{I}\}.

Naturally, the rank of a matroid is the cardinality of any of its bases. The basis generating polynomial of a matroid is defined as

(271) p_{M}(x):=\sum_{B\in\mathcal{B}}\ \prod_{i\in B}x_{i}.

A class of hyperbolic polynomials is given by the basis generating polynomials of matroids with the half-plane property [9]. A polynomial has the half-plane property if it is nonvanishing whenever all the variables lie in the open right half-plane. We state this as the following lemma:

Lemma 13.1.

Assume that M is a matroid with the half-plane property. Then its basis generating polynomial is hyperbolic in any direction in the positive orthant.

Several classes of matroids have been proven to have the half-plane property [9, 3, 1]. As a first example, we introduce the Vamos matroid. The ground set has size 8 (its elements can be seen as the 8 vertices of a cube) and the rank of the matroid is 4. All subsets of size 4 are independent except 5 of them. The Vamos matroid has the half-plane property [37]. Its basis generating polynomial is

p_{V}(x):=E_{4}(x_{1},\ldots,x_{8})-x_{1}x_{2}x_{3}x_{4}-x_{1}x_{2}x_{5}x_{6}-x_{1}x_{2}x_{6}x_{8}-x_{3}x_{4}x_{5}x_{6}-x_{5}x_{6}x_{7}x_{8},

where E_{d}(x_{1},\ldots,x_{m}) is the elementary symmetric polynomial of degree d in m variables. Note that elementary symmetric polynomials are hyperbolic in the direction of the all-ones vector. Some extensions of the Vamos matroid also satisfy the half-plane property [6]. These matroids give the following Vamos-like basis generating polynomials:

(272) p_{VL}(x):=E_{4}(x_{1},\ldots,x_{2m})-\left(\sum_{k=2}^{m}x_{1}x_{2}x_{2k-1}x_{2k}\right)-\left(\sum_{k=2}^{m-1}x_{2k-1}x_{2k}x_{2k+1}x_{2k+2}\right).

The Vamos-like polynomials in (272) have the interesting property of being counterexamples to a generalization of the Lax conjecture [4, 6]. Explicitly, there is no power k and symmetric matrices H_{2},\ldots,H_{2m} such that

(273) (p_{VL}(x))^{k}=\det(x_{1}I+x_{2}H_{2}+\cdots+x_{2m}H_{2m}).

In the DDS package, we have created a function

poly = vamos(m)

that returns a Vamos-like polynomial as in (272) in the ’monomial’ format, which can be given as cons{k,3} for inputting a hyperbolic constraint. vamos(4) returns the Vamos polynomial with 8 variables.
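For instance, to impose the constraint p_{V}(x)\geq 0 as the kth block (by Lemma 13.1, the all-ones vector is a valid direction of hyperbolicity):

cons{k,1} = 'HB';
cons{k,2} = 8;              % number of variables
cons{k,3} = vamos(4);       % Vamos polynomial in monomial format
cons{k,4} = 'monomial';
cons{k,5} = ones(8,1);      % direction in the positive orthant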

Graphic matroids also have the half-plane property [9]. However, the hyperbolicity cones arising from graphic matroids are spectrahedral, i.e., linear sections of positive semidefinite cones. This can be proved using the classical matrix-tree theorem, by which the basis generating polynomial of a graphic matroid is the determinant of the weighted Laplacian of the underlying graph (with a row and the corresponding column removed) [5]. Consider a graph G=(V,E) and let \mathcal{T} be the set of all spanning trees of G. Then the generating polynomial defined as

(274) T_{G}(x):=\sum_{T\in\mathcal{T}}\ \prod_{e\in T}x_{e}

is a hyperbolic polynomial. Several matroid operations preserve the half-plane property as proved in [9], including taking minors, duals, and direct sums.

We have created a set of problems with combinations of hyperbolic polynomial inequalities and other types of constraints such as those arising from LP, SOCP, and entropy function.

Table 7. Results for problems involving hyperbolic polynomials.
Problem | size of A | var/deg of p(z) | Type of Constraints | Itr/time(s)
Vamos-1 | 8*4 | 8/4 | HB | 7/ 1.5
Vamos-LP-1 | 12*4 | 8/4 | HB-LP | 8/ 2
Vamos-SOCP-1 | 17*5 | 8/4 | HB-LP-SOCP | 12/ 2
Vamos-Entr-1 | 17*5 | 8/4 | HB-LP-Entropy | 9/ 1.2
VL-Entr-1 | 41*11 | 20/4 | HB-LP-Entropy | 9/ 24
VL-Entr-1 | 61*16 | 30/4 | HB-LP-Entropy | 12/ 211

13.5.2. Derivatives of products of linear forms

The hyperbolicity of a polynomial is preserved under the directional derivative operation (see [30]):

Theorem 13.1.

Let p(x)\in\mathbb{R}[x_{1},\ldots,x_{m}] be hyperbolic in direction e and of degree at least 2. Then the polynomial

(275) p^{\prime}_{e}(x):=(\nabla p)(x)[e]

is also hyperbolic in direction e. Moreover,

(276) \Lambda_{+}(p,e)\subseteq\Lambda_{+}(p^{\prime}_{e},e).

Assume that p(x):=l_{1}(x)\cdots l_{\ell}(x), where l_{1}(x),\ldots,l_{\ell}(x) are linear forms. Then p is hyperbolic in the direction of any vector e such that p(e)\neq 0. As an example, consider E_{m}(x_{1},\ldots,x_{m})=x_{1}\cdots x_{m}. Recursively applying Theorem 13.1 to such polynomials p leads to many hyperbolic polynomials, including all elementary symmetric polynomials. For some properties of hyperbolicity cones of elementary symmetric polynomials, see [38]. For some preliminary computational experiments on such hyperbolic programming problems, see [24].

For polynomials constructed as products of linear forms, it is more efficient to use the straight-line program format. For a polynomial p(x):=l_{1}(x)\cdots l_{\ell}(x), let us define an \ell-by-m matrix M whose jth row contains the coefficients of l_{j}. In DDS, we have created a function

[poly,poly_grad] = lin2stl(M,d)

which returns two polynomials in the ’straight_line’ format: the product of the linear forms defined by M, and its directional derivative in the direction d. For example, if we want the polynomial p(x):=(2x_{1}-x_{3})(x_{1}-3x_{2}+4x_{3}), then M is defined as

(279) M:=\left[\begin{array}{ccc}2&0&-1\\ 1&-3&4\end{array}\right].
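A corresponding call is then (the direction d=(1,1,1)^{\top} is a valid direction of hyperbolicity here, since both linear forms are positive at d):

M = [2 0 -1; 1 -3 4];
d = [1; 1; 1];                       % l1(d) = 1 > 0, l2(d) = 2 > 0
[poly, poly_grad] = lin2stl(M, d);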

Table 8 shows some results for problems where the hyperbolic polynomial is either a product of linear forms or a derivative of one.

Table 8. Results for problems involving hyperbolic polynomials as products of linear forms and their derivatives.
Problem | size of A | var/deg of p(z) | Type of Constraints | Itr
HPLin-LP-1 | 55*50 | 5/3 | HB-LP | 8
HPLinDer-LP-1 | 55*50 | 5/3 | HB-LP | 7
HPLin-LP-2 | 55*50 | 5/7 | HB-LP | 19
HPLinDer-LP-2 | 55*50 | 5/7 | HB-LP | 22
HPLin-Entr-1 | 111*51 | 10/15 | HB-LP-Entropy | 19
HPLinDer-Entr-1 | 111*51 | 10/15 | HB-LP-Entropy | 18
HPLin-pn-1 | 111*51 | 10/15 | HB-LP-pnorm | 30
HPLinDer-pn-1 | 111*51 | 10/15 | HB-LP-pnorm | 32
Elem-unb-10 | 10*5 | 10/3 | HB | 12 (unb)
Elem-LP-10 | 15*5 | 10/3 | HB-LP | 9
Elem-LP-10-2 | 1010*1000 | 10/3 | HB-LP | 30
Elem-Entr-10-1 | 211*101 | 10/3 | HB-LP-Entropy | 17
Elem-Entr-10-2 | 221*101 | 10/4 | HB-LP-Entropy | 18

13.5.3. Derivatives of determinant of matrix pencils

Let H_{1},\ldots,H_{m}\in\mathbb{H}^{n} be Hermitian matrices and e\in\mathbb{R}^{m} be such that e_{1}H_{1}+\cdots+e_{m}H_{m} is positive definite. Then p(x):=\det(x_{1}H_{1}+\cdots+x_{m}H_{m}) is hyperbolic in direction e. Now, by using Theorem 13.1, we can generate a set of interesting hyperbolic polynomials by taking the directional derivative of this polynomial in direction e up to n-3 times.

References

  • [1] N. Amini and P. Brändén, Non-representable hyperbolic matroids, Adv. Math., 334 (2018), pp. 417–449.
  • [2] S. Boyd, S.-J. Kim, L. Vandenberghe, and A. Hassibi, A tutorial on geometric programming, Optimization and Engineering, 8 (2007), pp. 67–127.
  • [3] P. Brändén, Polynomials with the half-plane property and matroid theory, Advances in Mathematics, 216 (2007), pp. 302–320.
  • [4]  , Obstructions to determinantal representability, Advances in Mathematics, 226 (2011), pp. 1202–1212.
  • [5]  , Hyperbolicity cones of elementary symmetric polynomials are spectrahedral, Optimization Letters, 8 (2014), pp. 1773–1782.
  • [6] S. Burton, C. Vinzant, and Y. Youm, A real stable extension of the vamos matroid polynomial, arXiv preprint arXiv:1411.2038, (2014).
  • [7] V. Chandrasekaran and P. Shah, Relative entropy optimization and its applications, Mathematical Programming, 161 (2017), pp. 1–32.
  • [8] R. Chares, Cones and Interior-Point Algorithms for Structured Convex Optimization Involving Powers and Exponentials, PhD thesis, Université Catholique de Louvain, Louvain-la-Neuve, 2008.
  • [9] Y.-B. Choe, J. G. Oxley, A. D. Sokal, and D. G. Wagner, Homogeneous multivariate polynomials with the half-plane property, Advances in Applied Mathematics, 32 (2004), pp. 88–187.
  • [10] J. Dahl and E. D. Andersen, A primal-dual interior-point algorithm for nonsymmetric exponential-cone optimization, Optimization Online, (2019).
  • [11] C. Davis, All convex invariant functions of hermitian matrices, Archiv der Mathematik, 8 (1957), pp. 276–278.
  • [12] S.-C. Fang, J. R. Rajasekera, and H.-S. J. Tsao, Entropy optimization and Mathematical Programming, vol. 8, Springer Science & Business Media, 1997.
  • [13] H. Fawzi, J. Saunderson, and P. A. Parrilo, Semidefinite approximations of the matrix logarithm, Foundations of Computational Mathematics, 19 (2019), pp. 259–296.
  • [14] L. Faybusovich and T. Tsuchiya, Matrix monotonicity and self-concordance: how to handle quantum entropy in optimization problems, Optimization Letters, (2017), pp. 1513–1526.
  • [15] H. A. Friberg, Cblib 2014: a benchmark library for conic mixed-integer and continuous optimization, Mathematical Programming Computation, 8 (2016), pp. 191–214.
  • [16] M. Grant, S. Boyd, and Y. Ye, CVX: MATLAB software for disciplined convex programming, 2008.
  • [17] O. Güler, Hyperbolic polynomials and interior point methods for convex programming, Mathematics of Operations Research, 22 (1997), pp. 350–377.
  • [18] F. Hiai and D. Petz, Introduction to matrix analysis and applications, Springer Science & Business Media, 2014.
  • [19] M. Karimi and L. Tunçel, Primal-dual interior-point methods for Domain-Driven formulations, Mathematics of Operations Research (to appear), arXiv preprint arXiv:1804.06925, (2018).
  • [20]  , Status determination by interior-point methods for convex optimization problems in domain-driven form, arXiv preprint arXiv:1901.07084, (2019).
  • [21] A. S. Lewis, The mathematics of eigenvalue optimization, Mathematical Programming, 97 (2003), pp. 155–176.
  • [22] H. D. Mittelmann, The state-of-the-art in conic optimization software, in Handbook on Semidefinite, Conic and Polynomial Optimization, Springer, 2012, pp. 671–686.
  • [23] MOSEK ApS, The MOSEK optimization toolbox for MATLAB manual. Version 9.0., 2019. http://docs.mosek.com/9.0/toolbox/index.html.
  • [24] T. G. J. Myklebust, On primal-dual interior-point algorithms for convex optimisation, PhD thesis, Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo.
  • [25] A. Nemirovski and L. Tunçel, Cone-free primal-dual path-following and potential reduction polynomial time interior-point methods, Mathematical Programming, 102 (2005), pp. 261–294.
  • [26] Y. Nesterov, Lectures on Convex Optimization, Springer, 2018.
  • [27] Y. Nesterov and A. Nemirovski, Interior-Point Polynomial Algorithms in Convex Programming, SIAM Series in Applied Mathematics, SIAM: Philadelphia, 1994.
  • [28] G. Pataki and S. Schmieta, The DIMACS library of semidefinite-quadratic-linear programs, tech. rep., Preliminary draft, Computational Optimization Research Center, Columbia University, New York, 2002.
  • [29] B. Recht, M. Fazel, and P. A. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review, 52 (2010), pp. 471–501.
  • [30] J. Renegar, Hyperbolic programs, and their derivative relaxations, Foundations of Computational Mathematics, 6 (2006), pp. 59–79.
  • [31] S. Roy and L. Xiao, On self-concordant barriers for generalized power cones, Tech. Report MSR-TR-2018-3, Microsoft Research, January 2018, https://www.microsoft.com/en-us ….
  • [32] J. F. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, Optimization Methods and Software, 11 (1999), pp. 625–653.
  • [33] K.-C. Toh, M. J. Todd, and R. H. Tütüncü, On the implementation and usage of SDPT3–a MATLAB software package for semidefinite-quadratic-linear programming, version 4.0, in Handbook on semidefinite, conic and polynomial optimization, Springer, 2012, pp. 715–754.
  • [34] J. Tropp, An introduction to matrix concentration inequalities, arXiv preprint arXiv:1501.01571, (2015).
  • [35] L. Tunçel and A. Nemirovski, Self-concordant barriers for convex approximations of structured convex sets, Foundations of Computational Mathematics, 10 (2010), pp. 485–525.
  • [36] R. H. Tütüncü, K.-C. Toh, and M. J. Todd, Solving semidefinite-quadratic-linear programs using SDPT3, Mathematical programming, 95 (2003), pp. 189–217.
  • [37] D. G. Wagner and Y. Wei, A criterion for the half-plane property, Discrete Mathematics, 309 (2009), pp. 1385–1390.
  • [38] Y. Zinchenko, On hyperbolicity cones associated with elementary symmetric polynomials, Optim. Lett., 2 (2008), pp. 389–402.

Appendix A Calculating the predictor and corrector steps

As discussed in [19], for both the predictor and corrector steps, the theoretical algorithm solves a linear system whose LHS matrix is of the form

(286) U^{\top}\left[\begin{array}{cc}\left[\begin{array}{cc}H&h\\ h^{\top}&\zeta\end{array}\right]&0\\ 0&\left[\begin{array}{cc}G+\eta_{*}h_{*}h_{*}^{\top}&-\eta_{*}h_{*}\\ -\eta_{*}h_{*}^{\top}&\eta_{*}\end{array}\right]\end{array}\right]U,

where UU is a matrix that contains the linear transformations we need:

(291) U=\left[\begin{array}{ccc}A&0&0\\ 0&1&0\\ 0&-c_{A}&-F^{\top}\\ c^{\top}&0&0\end{array}\right].

H and G are positive definite matrices based on the Hessians of the s.c. functions, F is a matrix whose rows form a basis for \textup{null}(A^{\top}), c_{A} is any vector that satisfies A^{\top}c_{A}=c, and \eta_{*}, \zeta, h, and h_{*} are scalars and vectors defined in [19]. A practical issue with this system is that calculating F is computationally expensive, so DDS works around it. If we expand the system given in (286), the linear systems DDS needs to solve take the form:

\left[\begin{array}{ccc}A^{\top}HA+\eta_{*}cc^{\top}&A^{\top}h+\eta_{*}h_{*}^{\top}c_{A}c&\eta_{*}ch_{*}^{\top}F^{\top}\\ h^{\top}A+\eta_{*}c_{A}^{\top}h_{*}c^{\top}&\zeta+c_{A}^{\top}Gc_{A}+\eta_{*}(c_{A}^{\top}h_{*})^{2}&c_{A}^{\top}GF^{\top}+\eta_{*}c_{A}^{\top}h_{*}h_{*}^{\top}F^{\top}\\ \eta_{*}Fh_{*}c^{\top}&FGc_{A}+\eta_{*}Fh_{*}h_{*}^{\top}c_{A}&FGF^{\top}+\eta_{*}Fh_{*}h_{*}^{\top}F^{\top}\end{array}\right]\left[\begin{array}{c}\bar{d}_{x}\\ d_{\tau}\\ d_{v}\end{array}\right]=\left[\begin{array}{c}r_{1}\\ r_{2}\\ Fr_{3}\end{array}\right].

In the end, we are interested in F^{\top}d_{v} for calculating our search directions. Considering the last block equation, we can remove the matrix F that multiplies all the terms from the left, at the cost of introducing a term Aw (an arbitrary element of \textup{null}(F)=\textup{range}(A)) with a new auxiliary variable w:

(293) \eta_{*}h_{*}c^{\top}\bar{d}_{x}+d_{\tau}(G+\eta_{*}h_{*}h_{*}^{\top})c_{A}+(G+\eta_{*}h_{*}h_{*}^{\top})F^{\top}d_{v}=r_{3}+Aw
\ \ \ \Rightarrow\ \eta_{*}\bar{G}^{-1}h_{*}c^{\top}\bar{d}_{x}+d_{\tau}c_{A}+F^{\top}d_{v}=\bar{G}^{-1}r_{3}+\bar{G}^{-1}Aw,

where \bar{G}:=G+\eta_{*}h_{*}h_{*}^{\top}. Now we multiply the last equation by A^{\top} from the left and eliminate d_{v} (using A^{\top}F^{\top}=0 and A^{\top}c_{A}=c) to get

(294) \eta_{*}A^{\top}\bar{G}^{-1}h_{*}c^{\top}\bar{d}_{x}+d_{\tau}c=A^{\top}\bar{G}^{-1}r_{3}+A^{\top}\bar{G}^{-1}Aw.

By using the equations in (293) and (294), we get a linear system without F:

(304) \left[\begin{array}{ccc}A^{\top}HA+(\eta_{*}-\eta_{*}^{2}h_{*}^{\top}\bar{G}^{-1}h_{*})cc^{\top}&\eta_{*}ch_{*}^{\top}\bar{G}^{-1}A&A^{\top}h\\ \eta_{*}A^{\top}\bar{G}^{-1}h_{*}c^{\top}&-A^{\top}\bar{G}^{-1}A&c\\ h^{\top}A&c^{\top}&\zeta\end{array}\right]\left[\begin{array}{c}\bar{d}_{x}\\ w\\ d_{\tau}\end{array}\right]=\left[\begin{array}{c}r_{1}-\eta_{*}ch_{*}^{\top}\bar{G}^{-1}r_{3}\\ A^{\top}\bar{G}^{-1}r_{3}\\ r_{2}-c_{A}^{\top}r_{3}\end{array}\right].

By the Sherman–Morrison formula, we can write

(306) \bar{G}^{-1}=G^{-1}-\eta_{*}\frac{G^{-1}h_{*}h_{*}^{\top}G^{-1}}{1+\eta_{*}h_{*}^{\top}G^{-1}h_{*}}
\ \ \ \Rightarrow\ \bar{G}^{-1}h_{*}=\frac{1}{1+\eta_{*}h_{*}^{\top}G^{-1}h_{*}}G^{-1}h_{*}.

Let us define

(307) \beta:=\frac{1}{1+\eta_{*}h_{*}^{\top}G^{-1}h_{*}}.

Then the LHS matrix in (304) can be written as

(317) \underbrace{\left[\begin{array}{ccc}A^{\top}HA&0&A^{\top}h\\ 0&-A^{\top}G^{-1}A&c\\ h^{\top}A&c^{\top}&\zeta\end{array}\right]}_{\tilde{H}}+\eta_{*}\beta\left[\begin{array}{c}c\\ A^{\top}G^{-1}h_{*}\\ 0\end{array}\right]\left[\begin{array}{c}c\\ A^{\top}G^{-1}h_{*}\\ 0\end{array}\right]^{\top}.

This matrix is a (2n+1)-by-(2n+1) matrix \tilde{H} plus a rank-one update. If we have Cholesky or LU factorizations of A^{\top}HA and A^{\top}G^{-1}A (in the case that G:=\mu^{2}H, we need just one such factorization), then we have such a factorization for the 2n-by-2n leading principal submatrix of \tilde{H}, and we can easily extend it to a factorization of the whole \tilde{H}. To solve the whole system, we can then use the Sherman–Morrison formula.
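A minimal MATLAB sketch of this final rank-one solve follows; solveHtilde stands for a routine that applies \tilde{H}^{-1} using the factorizations above, AtGinvh for the vector A^{\top}G^{-1}h_{*}, and all names are illustrative:

sigma = etastar*beta;            % weight of the rank-one update in (317)
u  = [c; AtGinvh; 0];            % the rank-one direction in (317)
Hr = solveHtilde(r);             % Htilde \ r via the extended factorization
Hu = solveHtilde(u);             % Htilde \ u
x  = Hr - sigma*((u'*Hr)/(1 + sigma*(u'*Hu)))*Hu;   % Sherman-Morrison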

Appendix B Implementation details for SDP and generalized epigraphs of matrix norms

DDS accepts many constraints that involve matrices. These matrices are given to DDS as vectors. Calculating the derivatives and implementing the required linear algebra operations efficiently is critical to the performance of DDS. In DDS 2.0, many ideas and tricks have been used for matrix functions. In this section, we mention some of these techniques for SDP and matrix norm constraints.

B.1. SDP

The primal and dual barrier functions used for SDP constraints (53) are:

\Phi(Z)=-\textup{ln}(\det(F_{0}+Z)),
(318) \Phi_{*}(Y)=-n-\langle F_{0},Y\rangle-\textup{ln}(\det(-Y)).

For the function f=-\textup{ln}(\det(X)), we have:

\langle f^{\prime}(X),H\rangle=-\textup{Tr}(X^{-1}H),
(319) \langle f^{\prime\prime}(X)H,H\rangle=\textup{Tr}(X^{-1}HX^{-1}H).

To implement our algorithm, for each symmetric positive definite matrix X, we need to find the corresponding gradient g_{X} and Hessian H_{X} such that, for every symmetric matrix H, we have:

-\textup{Tr}(X^{-1}H)=-g_{X}^{\top}\texttt{sm2vec}(H),
(320) \textup{Tr}(X^{-1}HX^{-1}H)=\texttt{sm2vec}(H)^{\top}H_{X}\,\texttt{sm2vec}(H).

It can be shown that g_{X}=\texttt{sm2vec}(X^{-1}) and H_{X}=X^{-1}\otimes X^{-1}, where \otimes stands for the Kronecker product of two matrices. Although this representation is theoretically nice, there are two important practical issues with it. First, it is not efficient to calculate the inverse of a matrix explicitly. Second, forming and storing H_{X} is not efficient for large-scale problems. DDS does not explicitly form the inverses of matrices. An important matrix in calculating the search direction is A^{\top}\Phi^{\prime\prime}(u)A, where \Phi is the s.c. barrier for the whole input problem. In DDS, there exists an internal function hessian_A that directly returns A^{\top}\Phi^{\prime\prime}(u)A in an efficient way, optimized for all the set constraints, including SDP. For transitions between matrices and vectors, we use the following properties of the Kronecker product; for matrices A, B, and X of proper sizes, we have

(B^{\top}\otimes A)\,\texttt{sm2vec}(X)=\texttt{sm2vec}(AXB),
(321) (A\otimes B)^{-1}=A^{-1}\otimes B^{-1}.

Similar to hessian_A, there are other internal functions in DDS for efficiently calculating matrix-vector products, such as hessian_V, which evaluates the product of the Hessian with a given vector of proper size.
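As an illustration of the idea, the Hessian quadratic form in (320) can be evaluated without forming X^{-1} or the Kronecker matrix, using a single Cholesky factorization; a minimal sketch in plain MATLAB (illustrative only, not the DDS internal code):

n = 200;  A0 = randn(n);  X = A0*A0' + n*eye(n);   % random SPD matrix
H = randn(n);  H = (H + H')/2;                     % random symmetric direction
R = chol(X);          % X = R'*R
W = R' \ (H / R);     % W = inv(R')*H*inv(R), symmetric
val = sum(W(:).^2);   % = trace(W^2) = Tr(X^{-1}HX^{-1}H)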

B.2. Generalized epigraphs of matrix norms

Let us see how to calculate the first and second derivatives of the functions in (113) and (114).

Proposition B.1.

(a) Consider \Phi(X,U) defined in (113). For simplicity, let \bar{X}:=X-UU^{\top}; then we have

\Phi^{\prime}(X,U)[(d_{X},d_{U})]=\textup{Tr}(-\bar{X}^{-1}d_{X}+\bar{X}^{-1}(d_{U}U^{\top}+Ud_{U}^{\top})),
(322) \Phi^{\prime\prime}(X,U)[(d_{X},d_{U}),(\bar{d}_{X},\bar{d}_{U})]=\textup{Tr}(\bar{X}^{-1}d_{X}\bar{X}^{-1}\bar{d}_{X})
\ \ \ -\textup{Tr}(\bar{X}^{-1}\bar{d}_{X}\bar{X}^{-1}(d_{U}U^{\top}+Ud_{U}^{\top}))
\ \ \ -\textup{Tr}(\bar{X}^{-1}d_{X}\bar{X}^{-1}(\bar{d}_{U}U^{\top}+U\bar{d}_{U}^{\top}))
\ \ \ +\textup{Tr}(\bar{X}^{-1}(d_{U}U^{\top}+Ud_{U}^{\top})\bar{X}^{-1}(\bar{d}_{U}U^{\top}+U\bar{d}_{U}^{\top}))
\ \ \ +2\textup{Tr}(\bar{X}^{-1}d_{U}\bar{d}_{U}^{\top}).

(b) Consider \Phi_{*}(Y,V) defined in (114); we have

\Phi^{\prime}_{*}(Y,V)[(d_{Y},d_{V})]=-\frac{1}{2}\textup{Tr}(V^{\top}Y^{-1}d_{V})+\frac{1}{4}\textup{Tr}(Y^{-1}VV^{\top}Y^{-1}d_{Y})-\textup{Tr}(Y^{-1}d_{Y}),
\Phi^{\prime\prime}_{*}(Y,V)[(d_{Y},d_{V}),(\bar{d}_{Y},\bar{d}_{V})]=-\frac{1}{2}\textup{Tr}(d_{V}^{\top}Y^{-1}\bar{d}_{V})
\ \ \ +\frac{1}{2}\textup{Tr}(Y^{-1}d_{V}V^{\top}Y^{-1}\bar{d}_{Y})+\frac{1}{2}\textup{Tr}(Y^{-1}\bar{d}_{V}V^{\top}Y^{-1}d_{Y})
\ \ \ -\frac{1}{2}\textup{Tr}(Y^{-1}d_{Y}Y^{-1}\bar{d}_{Y}Y^{-1}VV^{\top})+\textup{Tr}(Y^{-1}d_{Y}Y^{-1}\bar{d}_{Y}).
Proof.

For the proof, we use the fact that if g(X)=\textup{ln}(\det(X)), then g^{\prime}(X)[H]=\textup{Tr}(X^{-1}H). Also note that if we define

(323) g(\alpha):=-\textup{ln}(\det(X+\alpha d_{X}-(U+\alpha d_{U})(U+\alpha d_{U})^{\top})),

then

g^{\prime}(0)=\Phi^{\prime}(X,U)[(d_{X},d_{U})],\ \ \ g^{\prime\prime}(0)=\Phi^{\prime\prime}(X,U)[(d_{X},d_{U}),(d_{X},d_{U})],

and similarly for \Phi_{*}(Y,V). We do not provide all the details, but we show how the proof works. For example, let us define

(324) f(\alpha):=\textup{Tr}((Y+\alpha d_{Y})^{-1}VV^{\top}Y^{-1}d_{Y}),

and suppose we want to calculate f^{\prime}(0). We have

(325) f^{\prime}(0):=\lim_{\alpha\rightarrow 0}\frac{f(\alpha)-f(0)}{\alpha}
=\textup{Tr}\left(\lim_{\alpha\rightarrow 0}\frac{(Y+\alpha d_{Y})^{-1}VV^{\top}Y^{-1}d_{Y}-Y^{-1}VV^{\top}Y^{-1}d_{Y}}{\alpha}\right)
=\textup{Tr}\left(\lim_{\alpha\rightarrow 0}\frac{(Y+\alpha d_{Y})^{-1}\left[VV^{\top}Y^{-1}d_{Y}-(I+\alpha d_{Y}Y^{-1})VV^{\top}Y^{-1}d_{Y}\right]}{\alpha}\right)
=-\textup{Tr}\left(\lim_{\alpha\rightarrow 0}(Y+\alpha d_{Y})^{-1}\left[d_{Y}Y^{-1}VV^{\top}Y^{-1}d_{Y}\right]\right)
=-\textup{Tr}\left(Y^{-1}d_{Y}Y^{-1}VV^{\top}Y^{-1}d_{Y}\right).

Note that all the above formulas for the derivatives are in matrix form. Let us briefly explain how to convert them to the vector form used in the code. We explain it for the derivatives of \Phi(X,U); the rest are similar. From (B.1) we have

(326) \Phi^{\prime}(X,U)[(d_{X},d_{U})]=\textup{Tr}(-\bar{X}^{-1}d_{X})+\textup{Tr}(\bar{X}^{-1}d_{U}U^{\top})+\textup{Tr}(\bar{X}^{-1}Ud_{U}^{\top})
=\textup{Tr}(-\bar{X}^{-1}d_{X})+2\textup{Tr}(U^{\top}\bar{X}^{-1}d_{U}).

Hence, if g is the gradient of \Phi(X,U) in vector form, we have

(329) g=\left[\begin{array}{c}2\times\texttt{m2vec}(\bar{X}^{-1}U,n)\\ -\texttt{sm2vec}(\bar{X}^{-1})\end{array}\right].

The second derivatives are trickier. Assume, for example, that we want the vector form h of \Phi^{\prime\prime}(X,U)[(d_{X},d_{U})]. By using (B.1) we can easily get each entry of h: consider the identity matrix of size m^{2}+mn; if we choose (\bar{d}_{X},\bar{d}_{U}) to represent the jth column of this identity matrix, we get h(j). In practice, this could be done with a for loop, which is not efficient; in the code, we instead implement it using matrix multiplications.

Appendix C Quantum entropy and quantum relative entropy

The s.c. barrier DDS uses for quantum entropy is given in (210). To derive the LF conjugate, we solve the optimization problem in (211); let us write the first-order optimality conditions for (211). Section 3.3 of the book [18] covers derivatives of matrix-valued functions. For the first derivative, we have the following theorem:

Theorem C.1.

Let X and H be self-adjoint matrices and f:(a,b)\mapsto\mathbb{R} be a continuously differentiable function defined on an interval. Assume that the eigenvalues of X+\alpha H are in (a,b) for \alpha in an interval around \alpha_{0}\in\mathbb{R}. Then,

(330) \left.\frac{d}{d\alpha}\textup{Tr}\,f(X+\alpha H)\right|_{\alpha=\alpha_{0}}=\textup{Tr}\,Hf^{\prime}(X+\alpha_{0}H).

The first-order optimality conditions for (211) can be written as

(331) \eta+\frac{1}{t-\phi(X)}=0,
\ \ \ Y+\frac{-f^{\prime}(X)}{t-\phi(X)}+X^{-1}=0.

If we substitute the first equation into the second one, we get

(332) \frac{1}{\eta}Y+f^{\prime}(X)+\frac{1}{\eta}X^{-1}=0.

This equation implies that Y and X are simultaneously diagonalizable: if Y=U\textup{Diag}(\lambda_{1}(Y),\ldots,\lambda_{n}(Y))U^{*}, then X=U\textup{Diag}(\lambda_{1}(X),\ldots,\lambda_{n}(X))U^{*}, and so

(333) \frac{1}{\eta}\lambda_{i}(Y)+f^{\prime}(\lambda_{i}(X))+\frac{1}{\eta\lambda_{i}(X)}=0,\ \ \ i\in\{1,\ldots,n\}.

Here, we focus on the case f(x)=x\textup{ln}(x). This matrix function arises in quantum relative entropy and von Neumann entropy optimization problems (see [7] for a review of the applications). In this case, we can use the results for the type-3 univariate function in Table 2 and the \theta function defined in (344). The LF conjugate of (211) is given in the following lemma:

Lemma C.1.

Assume that f(x)=x\textup{ln}(x). For a given \eta<0 and symmetric matrix Y\in\mathbb{S}^{n}, the function defined in (211) becomes

(334) \Phi_{*}(\eta,Y)=-\textup{ln}(-\eta)+\textup{Tr}(\Theta+\Theta^{-1})-\textup{Tr}\left(\frac{1}{\eta}Y\right)-1-2n,

where \Theta:=\Theta(\frac{1}{\eta}Y+(1-\textup{ln}(-\eta))I) and \Theta is the matrix extension of \theta defined in (344), as described in Section 9.

Proof.

Assume that for a given (\eta,Y), (t,X) is the optimal solution of (211). By Theorem C.1, we have f^{\prime}(X)=I+\textup{ln}(X). Substituting this into the first-order optimality condition (332), we get

(335) \eta X=-\Theta\left(\frac{1}{\eta}Y+(1-\textup{ln}(-\eta))I\right).

By the first equation in (C) and using (335), we can write

(336) \eta t=-1+\textup{Tr}(\eta X\textup{ln}(X))=-1+\textup{Tr}(-\Theta\cdot\textup{ln}(X))
=-1+\textup{Tr}\left(\Theta\cdot\left(\frac{1}{\eta}Y+I-\Theta^{-1}\right)\right)
=-1-n+\textup{Tr}(\Theta Y/\eta)+\textup{Tr}(\Theta).

If we substitute t and X in (211), we get the result. ∎
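As an implementation note, the matrix extension \Theta can be evaluated through an eigendecomposition of its symmetric argument. A minimal MATLAB sketch (theta stands for a handle to a scalar routine solving (344), for instance the Newton iteration given in Appendix D; all names are illustrative):

M = Y/eta + (1 - log(-eta))*eye(size(Y,1));   % symmetric argument of Theta
[U, lam] = eig(M, 'vector');                  % M = U*diag(lam)*U'
Theta = U*diag(arrayfun(theta, lam))*U';      % apply theta to the spectrum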

To implement our primal-dual techniques, we need the gradients and Hessians of \Phi(t,X) and \Phi_{*}(\eta,Y). We already saw in Theorem C.1 how to calculate the gradient. The following theorem gives us a tool for calculating the Hessian.

Theorem C.2 ([18]-Theorem 3.25).

Assume that f:(a,b)\mapsto\mathbb{R} is a \mathcal{C}^{1}-function and T=\textup{Diag}(t_{1},\ldots,t_{n}) with t_{i}\in(a,b),\ i\in\{1,\ldots,n\}. Then, for a Hermitian matrix H, we have

(337) \left.\frac{d}{d\alpha}f(T+\alpha H)\right|_{\alpha=0}=T_{f}\odot H,

where \odot is the Hadamard product and T_{f} is the divided difference matrix:

(340) T_{f}:=\left\{\begin{array}{ll}\frac{f(t_{i})-f(t_{j})}{t_{i}-t_{j}}&t_{i}\neq t_{j}\\ f^{\prime}(t_{i})&t_{i}=t_{j}\end{array}\right..

T is diagonal in the statement of the theorem, which is without loss of generality: by the definition of functional calculus in (208), for a Hermitian matrix X and a unitary matrix U, we have

(341) f(UXU^{*})=Uf(X)U^{*}.

Therefore, for a matrix T=U\textup{Diag}(t_{1},\ldots,t_{n})U^{*}, we can update (337) as

(342) \left.\frac{d}{d\alpha}f(T+\alpha H)\right|_{\alpha=0}=U\left(T_{f}\odot(U^{*}HU)\right)U^{*},

where we extend the definition of T_{f} in (340) to non-diagonal matrices. Now we can use Theorems C.1 and C.2 to calculate the Hessian of a matrix function.

Corollary C.1.

Let X, H, and \tilde{H} be self-adjoint matrices and f:(a,b)\mapsto\mathbb{R} be a continuously differentiable function defined on an interval. Assume that the eigenvalues of X+tH and X+t\tilde{H} are in (a,b) for t in an interval around t=0, and that X=U\textup{Diag}(\lambda_{1},\ldots,\lambda_{n})U^{*}. Then, for the second derivative of X\mapsto\textup{Tr}\,f(X),

(343) (\textup{Tr}\,f)^{\prime\prime}(X)[H,\tilde{H}]=\textup{Tr}\left(\left(X_{f^{\prime}}\odot(U^{*}HU)\right)U^{*}\tilde{H}U\right).

Let us calculate the gradient and Hessian of our functions for \phi(x)=x\textup{ln}(x). Let X=U\textup{Diag}(\lambda_{1},\ldots,\lambda_{n})U^{*} in the following. For the gradient, we have

\Phi^{\prime}(t,X)[(h,H)]=-\frac{h}{t-\textup{Tr}(X\textup{ln}X)}+\frac{1}{t-\textup{Tr}(X\textup{ln}X)}\textup{Tr}((I+\textup{ln}(X))H)-\textup{Tr}(X^{-1}H).

For the second derivative, we can use the fact that

\Phi^{\prime\prime}(t,X)[(\tilde{h},\tilde{H}),(h,H)]=\left.\frac{d}{d\alpha}\Phi^{\prime}(t+\alpha\tilde{h},X+\alpha\tilde{H})[(h,H)]\right|_{\alpha=0}.

Using this formula, we have (with \zeta:=\frac{1}{t-\textup{Tr}(X\textup{ln}X)})

\Phi^{\prime\prime}(t,X)[(\tilde{h},\tilde{H}),(h,H)]=\zeta^{2}h\tilde{h}
\ \ \ -\zeta^{2}\tilde{h}\textup{Tr}((I+\textup{ln}(X))H)-\zeta^{2}h\textup{Tr}((I+\textup{ln}(X))\tilde{H})
\ \ \ +\zeta^{2}\textup{Tr}((I+\textup{ln}(X))\tilde{H})\textup{Tr}((I+\textup{ln}(X))H)
\ \ \ +\zeta\textup{Tr}\left(U\left(X_{\textup{ln}}\odot(U^{*}\tilde{H}U)\right)U^{*}H\right)
\ \ \ +\textup{Tr}(X^{-1}\tilde{H}X^{-1}H).

Now let us compute the gradient and Hessian of the conjugate function. Let Y=U\textup{Diag}(\lambda(Y))U^{*}; by using Theorem C.1, the gradient of \Phi_{*}(\eta,Y) is

\Phi^{\prime}_{*}(\eta,Y)[(h,H)]=h\left[-\frac{1}{\eta}+\textup{Tr}\left(\left(-\frac{1}{\eta^{2}}Y-\frac{1}{\eta}I\right)\bar{\Theta}+\frac{1}{\eta^{2}}Y\right)\right]
\ \ \ +\textup{Tr}\left(H\left(\frac{1}{\eta}\bar{\Theta}-\frac{1}{\eta}I\right)\right),

where \bar{\Theta} is the matrix extension of \left(\theta^{\prime}-\frac{\theta^{\prime}}{\theta^{2}}\right). For the second derivative, let us first define \bar{Y} as in (340):

\bar{Y}:=\left(\frac{1}{\eta}Y+(1-\textup{ln}(-\eta))I\right)_{\left(\theta^{\prime}-\frac{\theta^{\prime}}{\theta^{2}}\right)}.

In the expression for the Hessian, we use \bar{\bar{\Theta}} as the matrix extension of \left(\theta^{\prime\prime}-\frac{\theta^{\prime\prime}\theta-2(\theta^{\prime})^{2}}{\theta^{3}}\right). Then, we have

\Phi^{\prime\prime}_{*}(\eta,Y)[(\tilde{h},\tilde{H}),(h,H)]=
h\tilde{h}\left[\frac{1}{\eta^{2}}+\textup{Tr}\left(\left(\frac{2}{\eta^{3}}Y+\frac{1}{\eta^{2}}I\right)\bar{\Theta}+\left(-\frac{1}{\eta^{2}}Y-\frac{1}{\eta}I\right)^{2}\bar{\bar{\Theta}}-\frac{2}{\eta^{3}}Y\right)\right]
+\tilde{h}\textup{Tr}\left(H\left[\frac{-1}{\eta^{2}}\bar{\Theta}+\frac{1}{\eta}U\left(\bar{Y}\odot\left(\frac{-1}{\eta^{2}}\lambda(Y)-\frac{1}{\eta}I\right)\right)U^{*}+\frac{1}{\eta^{2}}I\right]\right)
+h\textup{Tr}\left(\tilde{H}\left[\frac{-1}{\eta^{2}}\bar{\Theta}+\frac{1}{\eta}U\left(\bar{Y}\odot\left(\frac{-1}{\eta^{2}}\lambda(Y)-\frac{1}{\eta}I\right)\right)U^{*}+\frac{1}{\eta^{2}}I\right]\right)
+\frac{1}{\eta^{2}}\textup{Tr}\left(U\left(\bar{Y}\odot(U^{*}\tilde{H}U)\right)U^{*}H\right).

Appendix D Univariate convex functions

DDS 2.0 uses the epigraphs of six univariate convex functions, given in Table 2, in the direct sum format to handle important constraints such as those of geometric and entropy programming. In this section, we show how to calculate the LF conjugates of the corresponding 2-variable s.c. barriers. Some of these functions are evaluated implicitly, but in a computationally efficient way. In the second part of this section, we show that the gradients and Hessians of these functions can also be evaluated efficiently.

D.1. Legendre-Fenchel conjugates of univariate convex functions

The LF conjugates for the first three functions are given in Table 9.

Table 9. LF conjugates for the first three s.c. barriers in Table 2.
| \Phi(z,t) | \Phi_{*}(y,\eta)
1 | -\textup{ln}(t+\textup{ln}(z))-\textup{ln}(z) | -1+(-\eta+1)\left[-1+\textup{ln}\frac{-(-\eta+1)}{y}\right]-\textup{ln}(-\eta)
2 | -\textup{ln}(\textup{ln}(t)-z)-\textup{ln}(t) | -1+(y+1)\left[-1+\textup{ln}\frac{-(y+1)}{\eta}\right]-\textup{ln}(y)
3 | -\textup{ln}(t-z\textup{ln}(z))-\textup{ln}(z) | -\textup{ln}(-\eta)+\theta\left(1+\frac{y}{\eta}-\textup{ln}(-\eta)\right)-\frac{y}{\eta}+\frac{1}{\theta\left(1+\frac{y}{\eta}-\textup{ln}(-\eta)\right)}-3

Finding the LF conjugates of the first two functions is a matter of straightforward calculus. In the third row, \theta(r), defined in [25], is the unique solution of

(344) \frac{1}{\theta}-\textup{ln}(\theta)=r.

It is easy to check by implicit differentiation that

\theta^{\prime}(r)=-\frac{\theta^{2}(r)}{\theta(r)+1},\ \ \theta^{\prime\prime}(r)=-\frac{\theta^{2}(r)+2\theta(r)}{[\theta(r)+1]^{2}}\theta^{\prime}(r).

We can calculate \theta(r) to accuracy 10^{-15} in a few steps with the following Newton iterations:

\theta_{k}=\frac{\theta^{2}_{k-1}}{\theta_{k-1}+1}\left[1+\frac{2}{\theta_{k-1}}-\textup{ln}(\theta_{k-1})-r\right],\ \ \theta_{0}=\left\{\begin{array}[]{ll}\textup{exp}(-r),&r\leq 1\\ \frac{1}{r-\textup{ln}(r-\textup{ln}(r))},&r>1\end{array}\right..
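In MATLAB, this iteration takes only a few lines (the function name theta_newton is illustrative):

function th = theta_newton(r)
% Solves 1/theta - log(theta) = r by the Newton iteration above.
if r <= 1
    th = exp(-r);
else
    th = 1/(r - log(r - log(r)));
end
for k = 1:8          % a handful of steps reaches ~1e-15 accuracy
    th = th^2/(th + 1)*(1 + 2/th - log(th) - r);
end
end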

The s.c. barrier DDS uses for the set |z|^{p}\leq t,\ p\geq 1, is \Phi(z,t)=-\textup{ln}(t^{\frac{2}{p}}-z^{2})-2\textup{ln}(t). To find the LF conjugate, we need to solve the following optimization problem:

(346) \max_{z,t}\left\{yz+\eta t+\textup{ln}(t^{\frac{2}{p}}-z^{2})+2\textup{ln}(t)\right\}.

By writing the first-order optimality conditions, we have:

(347) y=\frac{2z}{t^{\frac{2}{p}}-z^{2}},\ \ \ \eta=-\frac{\frac{2}{p}t^{\frac{2}{p}-1}}{t^{\frac{2}{p}}-z^{2}}-\frac{2}{t}.

Doing some algebra, we can see that z and t satisfy:

(348) y\left(\frac{2(\frac{1}{p}+1)+\frac{1}{p}yz}{-\eta}\right)^{\frac{2}{p}}-yz^{2}-2z=0,
\ \ \ t=\frac{2(\frac{1}{p}+1)+\frac{1}{p}yz}{-\eta}.

Let us define z(y,\eta) as the solution of the first equation in (348). For each pair (y,\eta), we can calculate z(y,\eta) by a few iterations of Newton's method. Then, the first and second derivatives can be calculated in terms of z(y,\eta). In DDS, we have two functions for these derivatives.

p1_TD(y,eta,p)    % returns  z
p1_TD_der(y,eta,p)  % returns [z_y  z_eta  z_y,y  z_y,eta  z_eta,eta]
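As an illustration, z(y,\eta) can also be reproduced outside DDS with a generic root-finder (the parameter values below are arbitrary, and [0 1] happens to be a valid bracketing interval for them; DDS's internal Newton iterations are faster):

p = 3; y = 1; eta = -3;                % an arbitrary dual point with eta < 0
g = @(z) y*((2*(1/p+1) + (1/p)*y*z)/(-eta))^(2/p) - y*z^2 - 2*z;
z = fzero(g, [0 1]);                   % solves the first equation in (348)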

For the set defined by -z^{p}\leq t,\ 0\leq p\leq 1,\ z>0, the corresponding s.c. barrier is \Phi(z,t)=-\textup{ln}(z^{p}+t)-\textup{ln}(z). To calculate the LF conjugate, we need to solve the following optimization problem:

(349) \max_{z,t}\left\{yz+\eta t+\textup{ln}(z^{p}+t)+\textup{ln}(z)\right\}.

By writing the first order optimality conditions we have:

(350) y=pz(p1)zp+t1z,η=1zp+t.\displaystyle y=\frac{-pz^{(p-1)}}{z^{p}+t}-\frac{1}{z},\ \ \ \eta=-\frac{1}{z^{p}+t}.

By doing some algebra, we can see that zz satisfies:

(351) yηpz(p1)+1z=0.\displaystyle y-\eta pz^{(p-1)}+\frac{1}{z}=0.

Similar to the previous case, let us define z(y,\eta) as the solution of equation (351). For each pair (y,\eta), we can calculate z(y,\eta) by a few iterations of Newton's method. Then, the first and second derivatives can be calculated in terms of z(y,\eta). The important point is that once z(y,\eta) has been calculated, the derivatives are given by explicit formulas. In our code, we have two functions

p2_TD(y,eta,p)    % returns  z
p2_TD_der(y,eta,p)  % returns [z_y  z_eta  z_y,y  z_y,eta  z_eta,eta]

The inputs to the above functions can be vectors. Table 10 is the continuation of Table 9.
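For illustration, a vectorized Newton iteration for (351) might look as follows (a sketch with our own function name; the shipped p2_TD may differ in its initialization and safeguards):

function z = p2_TD_sketch(y, eta, p, z0)
% Newton's method for equation (351): y - eta*p*z^(p-1) + 1/z = 0,
% vectorized over pairs (y,eta). A hypothetical sketch only.
z = z0;
for k = 1:50
    F  = y - eta.*p.*z.^(p-1) + 1./z;
    dF = -eta.*p.*(p-1).*z.^(p-2) - 1./z.^2;
    step = F./dF;
    z = z - step;
    if max(abs(step)) < 1e-14, break; end
end
end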

Table 10. s.c. barriers and their LF conjugate for rows 4 and 5 of Table 2
s.c. barrier \Phi(z,t) | \Phi_{*}(y,\eta)
4. -\textup{ln}(t^{\frac{2}{p}}-z^{2})-2\textup{ln}(t) | -\left(\frac{2}{p}+(\frac{1}{p}-1)yz(y,\eta)\right)-2+2\textup{ln}\left(\frac{2(\frac{1}{p}+1)+\frac{1}{p}yz(y,\eta)}{-\eta}\right)+\textup{ln}\left(\left(\frac{2(\frac{1}{p}+1)+\frac{1}{p}yz(y,\eta)}{-\eta}\right)^{\frac{2}{p}}-z^{2}(y,\eta)\right)
5. -\textup{ln}(z^{p}+t)-\textup{ln}(z) | \eta(p-1)z^{p}(y,\eta)-2-\textup{ln}(-\eta)+\textup{ln}(z(y,\eta))

For the set defined by 1zt,z>0\frac{1}{z}\leq t,z>0, the corresponding s.c. barrier is Φ(z,t)=ln(zt1)\Phi(z,t)=-\textup{ln}(zt-1). To calculate the LF conjugate, we need to solve the following optimization problem:

(352) minz,t{yz+ηt+ln(zt1)}.\displaystyle\min_{z,t}\left\{yz+\eta t+\textup{ln}(zt-1)\right\}.

At the optimal solution, we must have

(353) y=tzt1,η=zzt1.\displaystyle y=-\frac{t}{zt-1},\ \ \eta=-\frac{z}{zt-1}.

Since z,t>0, we must have y,\eta<0 at a dual feasible point. Solving this system, we get

(354) t=11+4yη2η,z=11+4yη2y,\displaystyle t=\frac{-1-\sqrt{1+4y\eta}}{2\eta},\ \ z=\frac{-1-\sqrt{1+4y\eta}}{2y},
\displaystyle\Rightarrow Φ(y,η)=11+4yη+ln(1+1+4yη2yη).\displaystyle\Phi_{*}(y,\eta)=-1-\sqrt{1+4y\eta}+\textup{ln}\left(\frac{1+\sqrt{1+4y\eta}}{2y\eta}\right).
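Since this conjugate is available in closed form, evaluating it is a one-line computation; for instance (the function name is ours):

function val = conj_hyperbola(y, eta)
% LF conjugate (354) of Phi(z,t) = -log(z*t - 1); requires y, eta < 0,
% so that 1 + 4*y.*eta > 1. A direct transcription of the formula above.
s   = sqrt(1 + 4*y.*eta);
val = -1 - s + log((1 + s)./(2*y.*eta));
end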

D.2. Gradient and Hessian

In this subsection, we show how to calculate the gradient and Hessian for the s.c. barriers and their LF conjugates. Specifically, for the implicitly defined functions, we show that the derivatives can also be calculated efficiently. First, consider the three pairs of functions in Table 9. Here are the explicit formulas for the first and second derivatives:

\Phi(z,t)=-\textup{ln}(t+\textup{ln}(z))-\textup{ln}(z),\qquad \Phi_{*}(y,\eta)=-1+(-\eta+1)\left[-1+\textup{ln}\frac{-(-\eta+1)}{y}\right]-\textup{ln}(-\eta)

For the primal function we have

Φ=[1z(1t+ln(z)+1)1t+ln(z)],2Φ=[1z2(1t+ln(z)+1(t+ln(z))2+1)1z(t+ln(z))21z(t+ln(z))21(t+ln(z))2],\displaystyle\nabla\Phi=\left[\begin{array}[]{c}-\frac{1}{z}\left(\frac{1}{t+\textup{ln}(z)}+1\right)\\ -\frac{1}{t+\textup{ln}(z)}\end{array}\right],\ \ \nabla^{2}\Phi=\left[\begin{array}[]{cc}\frac{1}{z^{2}}\left(\frac{1}{t+\textup{ln}(z)}+\frac{1}{(t+\textup{ln}(z))^{2}}+1\right)&\frac{1}{z(t+\textup{ln}(z))^{2}}\\ \frac{1}{z(t+\textup{ln}(z))^{2}}&\frac{1}{(t+\textup{ln}(z))^{2}}\end{array}\right],

and for the dual function we have

Φ=[η+1yln(η+1y)1η],2Φ=[η+1y21y1y1η+1+1η2].\displaystyle\nabla\Phi_{*}=\left[\begin{array}[]{c}-\frac{-\eta+1}{y}\\ -\textup{ln}\left(-\frac{-\eta+1}{y}\right)-\frac{1}{\eta}\end{array}\right],\ \ \nabla^{2}\Phi_{*}=\left[\begin{array}[]{cc}\frac{-\eta+1}{y^{2}}&\frac{1}{y}\\ \frac{1}{y}&\frac{1}{-\eta+1}+\frac{1}{\eta^{2}}\end{array}\right].
\Phi(z,t)=-\textup{ln}(\textup{ln}(t)-z)-\textup{ln}(t),\qquad \Phi_{*}(y,\eta)=-1+(y+1)\left[-1+\textup{ln}\frac{-(y+1)}{\eta}\right]-\textup{ln}(y)

For the primal function we have

\nabla\Phi=\left[\begin{array}[]{c}\frac{1}{\textup{ln}(t)-z}\\ -\frac{1}{t}\left(\frac{1}{\textup{ln}(t)-z}+1\right)\end{array}\right],\ \ \nabla^{2}\Phi=\frac{1}{(\textup{ln}(t)-z)^{2}}\left[\begin{array}[]{cc}1&-\frac{1}{t}\\ -\frac{1}{t}&\frac{1+(\textup{ln}(t)-z)+(\textup{ln}(t)-z)^{2}}{t^{2}}\end{array}\right],

and for the dual function we have

Φ=[ln((y+1)η)1yy+1η],2Φ=[1y+1+1y21η1ηy+1η2].\displaystyle\nabla\Phi_{*}=\left[\begin{array}[]{c}\textup{ln}\left(\frac{-(y+1)}{\eta}\right)-\frac{1}{y}\\ -\frac{y+1}{\eta}\end{array}\right],\ \ \nabla^{2}\Phi_{*}=\left[\begin{array}[]{cc}\frac{1}{y+1}+\frac{1}{y^{2}}&-\frac{1}{\eta}\\ -\frac{1}{\eta}&\frac{y+1}{\eta^{2}}\end{array}\right].
\Phi(z,t)=-\textup{ln}(t-z\textup{ln}(z))-\textup{ln}(z),\qquad \Phi_{*}(y,\eta)=-\textup{ln}(-\eta)+\theta\left(1+\frac{y}{\eta}-\textup{ln}(-\eta)\right)-\frac{y}{\eta}+\frac{1}{\theta\left(1+\frac{y}{\eta}-\textup{ln}(-\eta)\right)}-3

For the primal function we have

\nabla\Phi=\left[\begin{array}[]{c}\frac{\textup{ln}(z)+1}{t-z\textup{ln}(z)}-\frac{1}{z}\\ \frac{-1}{t-z\textup{ln}(z)}\end{array}\right],\ \ \nabla^{2}\Phi=\left[\begin{array}[]{cc}\frac{(t-z\textup{ln}(z))+z(\textup{ln}(z)+1)^{2}}{z(t-z\textup{ln}(z))^{2}}+\frac{1}{z^{2}}&\frac{-(\textup{ln}(z)+1)}{(t-z\textup{ln}(z))^{2}}\\ \frac{-(\textup{ln}(z)+1)}{(t-z\textup{ln}(z))^{2}}&\frac{1}{(t-z\textup{ln}(z))^{2}}\end{array}\right].

For the dual function, since the argument of \theta(\cdot) is always 1+\frac{y}{\eta}-\textup{ln}(-\eta), we suppress it in the following formulas and simply write \theta, \theta^{\prime}, and \theta^{\prime\prime} for the function and its derivatives.

\nabla\Phi_{*}=\left[\begin{array}[]{c}\frac{\theta^{\prime}-1}{\eta}-\frac{\theta^{\prime}}{\eta\theta^{2}}\\ -\frac{1}{\eta}+\frac{y}{\eta^{2}}-\left(\frac{y}{\eta^{2}}+\frac{1}{\eta}\right)\theta^{\prime}\left(1-\frac{1}{\theta^{2}}\right)\end{array}\right],\ \ \nabla^{2}\Phi_{*}=\left[\begin{array}[]{cc}f_{11}&f_{12}\\ f_{21}&f_{22}\end{array}\right],

where

f_{11} = \frac{1}{\eta^{2}}\theta^{\prime\prime}-\frac{\theta^{\prime\prime}\theta-2(\theta^{\prime})^{2}}{\eta^{2}\theta^{3}},
f_{21}=f_{12} = -\frac{1}{\eta^{2}}\theta^{\prime}+\frac{1}{\eta}\left(-\frac{y}{\eta^{2}}-\frac{1}{\eta}\right)\theta^{\prime\prime}+\frac{1}{\eta^{2}}-\frac{\left[-\frac{1}{\eta^{2}}\theta^{\prime}+\frac{1}{\eta}\left(-\frac{y}{\eta^{2}}-\frac{1}{\eta}\right)\theta^{\prime\prime}\right]\theta-\frac{2}{\eta}\left(-\frac{y}{\eta^{2}}-\frac{1}{\eta}\right)(\theta^{\prime})^{2}}{\theta^{3}},
f_{22} = \frac{1}{\eta^{2}}-\frac{2y}{\eta^{3}}+\left[\left(\frac{2y}{\eta^{3}}+\frac{1}{\eta^{2}}\right)\theta^{\prime}+\left(-\frac{y}{\eta^{2}}-\frac{1}{\eta}\right)^{2}\theta^{\prime\prime}\right]\left(1-\frac{1}{\theta^{2}}\right)+\left(-\frac{y}{\eta^{2}}-\frac{1}{\eta}\right)^{2}\frac{2(\theta^{\prime})^{2}}{\theta^{3}}.
\Phi(z,t)=-\textup{ln}(t^{\frac{2}{p}}-z^{2})-2\textup{ln}(t),\qquad \Phi_{*}(y,\eta)=-\left(\frac{2}{p}+(\frac{1}{p}-1)yz\right)-2+2\textup{ln}\left(\frac{2(\frac{1}{p}+1)+\frac{1}{p}yz}{-\eta}\right)+\textup{ln}\left(\left(\frac{2(\frac{1}{p}+1)+\frac{1}{p}yz}{-\eta}\right)^{\frac{2}{p}}-z^{2}\right)

where z(y,η)z(y,\eta) is the solution of

(361) y(2(1p+1)+1pyzη)2pyz22z=0.\displaystyle y\left(\frac{2(\frac{1}{p}+1)+\frac{1}{p}yz}{-\eta}\right)^{\frac{2}{p}}-yz^{2}-2z=0.

For simplicity, we drop the arguments of z(y,η)z(y,\eta) and denote it as zz. We denote the first derivatives with respect to yy and η\eta as zyz^{\prime}_{y} and zηz^{\prime}_{\eta}, respectively. Similarly, we use zyy′′z^{\prime\prime}_{yy}, zηy′′z^{\prime\prime}_{\eta y}, and zηη′′z^{\prime\prime}_{\eta\eta} for the second derivatives. We have

For brevity, define

(362) S:=B^{\frac{2}{p}}-\frac{2y}{p^{2}\eta}zB^{\frac{2}{p}-1}-z^{2},\quad M:=\frac{2y^{2}}{p^{2}\eta}B^{\frac{2}{p}-1}+2yz+2,\quad T:=\frac{-2y}{p\eta}B^{\frac{2}{p}},\quad B:=\frac{2(\frac{1}{p}+1)+\frac{1}{p}yz}{-\eta}.

Then, implicit differentiation of (361) gives

z^{\prime}_{y}=\frac{S}{M},\qquad z^{\prime}_{\eta}=\frac{T}{M}.

For calculating the second derivatives of Φ\Phi_{*}, we need the derivatives of BB:

By\displaystyle B^{\prime}_{y} =\displaystyle= z+yzypη,\displaystyle\frac{z+yz^{\prime}_{y}}{-p\eta},
(363) Bη\displaystyle B^{\prime}_{\eta} =\displaystyle= ηpyzη+2(1p+1)+1pyzη2.\displaystyle\frac{-\frac{\eta}{p}yz^{\prime}_{\eta}+2(\frac{1}{p}+1)+\frac{1}{p}yz}{\eta^{2}}.

Then we have

Sy\displaystyle S^{\prime}_{y} =\displaystyle= 2pByB2p1+(2zp2η2yp2ηzy)B2p1+2yzp2η(2p1)ByB2p22zzy,\displaystyle\frac{2}{p}B^{\prime}_{y}B^{\frac{2}{p}-1}+\left(\frac{-2z}{p^{2}\eta}-\frac{2y}{p^{2}\eta}z^{\prime}_{y}\right)B^{\frac{2}{p}-1}+\frac{-2yz}{p^{2}\eta}\left(\frac{2}{p}-1\right)B^{\prime}_{y}B^{\frac{2}{p}-2}-2zz^{\prime}_{y},
Sη\displaystyle S^{\prime}_{\eta} =\displaystyle= 2pBηB2p1+2yp2(ηzηzη2)B2p1+2yzp2η(2p1)BηB2p22zzη,\displaystyle\frac{2}{p}B^{\prime}_{\eta}B^{\frac{2}{p}-1}+\frac{-2y}{p^{2}}\left(\frac{\eta z^{\prime}_{\eta}-z}{\eta^{2}}\right)B^{\frac{2}{p}-1}+\frac{-2yz}{p^{2}\eta}\left(\frac{2}{p}-1\right)B^{\prime}_{\eta}B^{\frac{2}{p}-2}-2zz^{\prime}_{\eta},
My\displaystyle M^{\prime}_{y} =\displaystyle= 4yp2ηB2p1+2y2p2η(2p1)ByB2p2+2z+2yzy,\displaystyle\frac{4y}{p^{2}\eta}B^{\frac{2}{p}-1}+\frac{2y^{2}}{p^{2}\eta}\left(\frac{2}{p}-1\right)B^{\prime}_{y}B^{\frac{2}{p}-2}+2z+2yz^{\prime}_{y},
Mη\displaystyle M^{\prime}_{\eta} =\displaystyle= 2y2p2η2B2p1+2y2p2η(2p1)BηB2p2+2yzη,\displaystyle-\frac{2y^{2}}{p^{2}\eta^{2}}B^{\frac{2}{p}-1}+\frac{2y^{2}}{p^{2}\eta}\left(\frac{2}{p}-1\right)B^{\prime}_{\eta}B^{\frac{2}{p}-2}+2yz^{\prime}_{\eta},
Tη\displaystyle T^{\prime}_{\eta} =\displaystyle= 2ypη2B2p+4yp2ηBηB2p1.\displaystyle\frac{2y}{p\eta^{2}}B^{\frac{2}{p}}+\frac{-4y}{p^{2}\eta}B^{\prime}_{\eta}B^{\frac{2}{p}-1}.

By the above definitions of SS, MM, and TT, we have

zyy′′=SyMMySM2,zηy′′=SηMMηSM2,zηη′′=TηMMηTM2.\displaystyle z^{\prime\prime}_{yy}=\frac{S^{\prime}_{y}M-M^{\prime}_{y}S}{M^{2}},\ \ \ z^{\prime\prime}_{\eta y}=\frac{S^{\prime}_{\eta}M-M^{\prime}_{\eta}S}{M^{2}},\ \ \ z^{\prime\prime}_{\eta\eta}=\frac{T^{\prime}_{\eta}M-M^{\prime}_{\eta}T}{M^{2}}.

The first and second derivatives of Φ\Phi are calculated as follows:

(367) Φ\displaystyle\nabla\Phi =\displaystyle= [2zt2pz22pt2p1t2pz22t],\displaystyle\left[\begin{array}[]{c}\frac{2z}{t^{\frac{2}{p}}-z^{2}}\\ \\ \frac{-\frac{2}{p}t^{\frac{2}{p}-1}}{t^{\frac{2}{p}}-z^{2}}-\frac{2}{t}\end{array}\right],
(371) 2Φ\displaystyle\ \ \nabla^{2}\Phi =\displaystyle= [2(t2pz2)+4z2(t2pz2)24pt2p1z(t2pz2)24pt2p1z(t2pz2)22p(2p1)t2p2(t2pz2)+(2p)2t4p2(t2pz2)2+2t2].\displaystyle\left[\begin{array}[]{cc}\frac{2(t^{\frac{2}{p}}-z^{2})+4z^{2}}{(t^{\frac{2}{p}}-z^{2})^{2}}&\frac{-\frac{4}{p}t^{\frac{2}{p}-1}z}{(t^{\frac{2}{p}}-z^{2})^{2}}\\ &\\ \frac{-\frac{4}{p}t^{\frac{2}{p}-1}z}{(t^{\frac{2}{p}}-z^{2})^{2}}&\frac{-\frac{2}{p}\left(\frac{2}{p}-1\right)t^{\frac{2}{p}-2}(t^{\frac{2}{p}}-z^{2})+\left(\frac{2}{p}\right)^{2}t^{\frac{4}{p}-2}}{(t^{\frac{2}{p}}-z^{2})^{2}}+\frac{2}{t^{2}}\end{array}\right].
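As a sketch, the formulas (367) and (371) can be transcribed directly into MATLAB (the function name is ours):

function [g, H] = power_primal_der(z, t, p)
% Gradient (367) and Hessian (371) of Phi(z,t) = -log(t^(2/p) - z^2) - 2*log(t),
% the s.c. barrier for |z|^p <= t, p >= 1; a direct transcription.
D = t^(2/p) - z^2;                 % positive in the interior of the domain
g = [ 2*z/D;
     -(2/p)*t^(2/p-1)/D - 2/t ];
H12 = -(4/p)*t^(2/p-1)*z/D^2;
H = [ (2*D + 4*z^2)/D^2, H12;
      H12, (-(2/p)*(2/p-1)*t^(2/p-2)*D + (2/p)^2*t^(4/p-2))/D^2 + 2/t^2 ];
end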

The first and second derivatives of Φ\Phi_{*} are messier. For the first derivative we have

(375) Φ\displaystyle\nabla\Phi_{*} =\displaystyle= [(1p1)(z+yzy)+2pByB2p12zzyB2pz2+2ByB(1p1)(yzη)+2pBηB2p12zzηB2pz2+2BηB],\displaystyle\left[\begin{array}[]{c}-\left(\frac{1}{p}-1\right)(z+yz^{\prime}_{y})+\frac{\frac{2}{p}B^{\prime}_{y}B^{\frac{2}{p}-1}-2zz^{\prime}_{y}}{B^{\frac{2}{p}}-z^{2}}+\frac{2B^{\prime}_{y}}{B}\\ \\ -\left(\frac{1}{p}-1\right)(yz^{\prime}_{\eta})+\frac{\frac{2}{p}B^{\prime}_{\eta}B^{\frac{2}{p}-1}-2zz^{\prime}_{\eta}}{B^{\frac{2}{p}}-z^{2}}+\frac{2B^{\prime}_{\eta}}{B}\end{array}\right],

For calculating the second derivative, we also need the second derivatives of BB:

B^{\prime\prime}_{yy} = \frac{2z^{\prime}_{y}+yz^{\prime\prime}_{yy}}{-p\eta},
B^{\prime\prime}_{y\eta} = \frac{-p\eta(z^{\prime}_{\eta}+yz^{\prime\prime}_{\eta y})+p(z+yz^{\prime}_{y})}{(p\eta)^{2}},
B^{\prime\prime}_{\eta\eta} = -\frac{yz^{\prime\prime}_{\eta\eta}}{p\eta}+\frac{2yz^{\prime}_{\eta}}{p\eta^{2}}-\left(\frac{1}{p}+1\right)\frac{4}{\eta^{3}}-\frac{2yz}{p\eta^{3}}.

Using the second derivatives of BB, we have

(379) 2Φ=[f11f12f21f22],\displaystyle\nabla^{2}\Phi_{*}=\left[\begin{array}[]{cc}f_{11}&f_{12}\\ f_{21}&f_{22}\end{array}\right],

where

f11\displaystyle f_{11} =\displaystyle= (1p1)(2zy+yzyy′′)+[2p[Byy′′B2p1+(2p1)(By)2B2p2]2((zy)2+zzyy′′)](B2pz2)(B2pz2)2\displaystyle-\left(\frac{1}{p}-1\right)(2z^{\prime}_{y}+yz^{\prime\prime}_{yy})+\frac{\left[\frac{2}{p}\left[B^{\prime\prime}_{yy}B^{\frac{2}{p}-1}+\left(\frac{2}{p}-1\right)(B^{\prime}_{y})^{2}B^{\frac{2}{p}-2}\right]-2((z^{\prime}_{y})^{2}+zz^{\prime\prime}_{yy})\right](B^{\frac{2}{p}}-z^{2})}{(B^{\frac{2}{p}}-z^{2})^{2}}
[2pByB2p12zzy]2(B2pz2)2+2Byy′′B2(By)2B2.\displaystyle-\frac{\left[\frac{2}{p}B^{\prime}_{y}B^{\frac{2}{p}-1}-2zz^{\prime}_{y}\right]^{2}}{(B^{\frac{2}{p}}-z^{2})^{2}}+\frac{2B^{\prime\prime}_{yy}B-2(B^{\prime}_{y})^{2}}{B^{2}}.

f21=f12f_{21}=f_{12} and f22f_{22} have similar formulations.

\Phi(z,t)=-\textup{ln}(z^{p}+t)-\textup{ln}(z),\qquad \Phi_{*}(y,\eta)=\eta(p-1)z^{p}(y,\eta)-2-\textup{ln}(-\eta)+\textup{ln}(z(y,\eta))

where zz is the solution of

(380) yηpz(p1)+1z=0\displaystyle y-\eta pz^{(p-1)}+\frac{1}{z}=0

Similar to the previous case, for simplicity, we drop the arguments of z(y,η)z(y,\eta) and denote it as zz. We denote the first derivatives with respect to yy and η\eta as zyz^{\prime}_{y} and zηz^{\prime}_{\eta}, respectively. By implicit differentiation, we have

z^{\prime}_{y} = \frac{1}{B},\qquad z^{\prime}_{\eta} = \frac{-pz^{p-1}}{B},\qquad\text{where } B:=\eta p(p-1)z^{p-2}+z^{-2}.

For the second derivatives of zz, by using

B^{\prime}_{y} = \eta p(p-1)(p-2)z^{\prime}_{y}z^{p-3}-2z^{\prime}_{y}z^{-3},
B^{\prime}_{\eta} = p(p-1)z^{p-2}+\eta p(p-1)(p-2)z^{\prime}_{\eta}z^{p-3}-2z^{\prime}_{\eta}z^{-3},

we have

zyy′′=ByB2,zηy′′=BηB2,zηη′′=p(p1)zηzp2B+pzp1BηB2.\displaystyle z^{\prime\prime}_{yy}=\frac{-B^{\prime}_{y}}{B^{2}},\ \ \ z^{\prime\prime}_{\eta y}=\frac{-B^{\prime}_{\eta}}{B^{2}},\ \ \ z^{\prime\prime}_{\eta\eta}=\frac{-p(p-1)z^{\prime}_{\eta}z^{p-2}B+pz^{p-1}B^{\prime}_{\eta}}{B^{2}}.
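Assembled in MATLAB, these formulas give a sketch of what p2_TD_der computes once z = z(y,\eta) is available (the function name is ours; the shipped code may differ):

function d = p2_TD_der_sketch(y, eta, p, z)
% Given z = z(y,eta) solving (380), return [z_y z_eta z_yy z_yeta z_etaeta]
% from the explicit formulas above; vectorized over pairs (y,eta).
B     = eta.*p.*(p-1).*z.^(p-2) + z.^(-2);
zy    = 1./B;
zeta  = -p.*z.^(p-1)./B;
By    = eta.*p.*(p-1).*(p-2).*zy.*z.^(p-3) - 2*zy.*z.^(-3);
Beta  = p.*(p-1).*z.^(p-2) + eta.*p.*(p-1).*(p-2).*zeta.*z.^(p-3) - 2*zeta.*z.^(-3);
zyy   = -By./B.^2;
zyeta = -Beta./B.^2;
zee   = (-p.*(p-1).*zeta.*z.^(p-2).*B + p.*z.^(p-1).*Beta)./B.^2;
d = [zy, zeta, zyy, zyeta, zee];
end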

The first and second derivatives of Φ\Phi are calculated as follows:

\nabla\Phi = \left[\begin{array}[]{c}-\frac{pz^{p-1}}{z^{p}+t}-\frac{1}{z}\\ -\frac{1}{z^{p}+t}\end{array}\right],
\nabla^{2}\Phi = \left[\begin{array}[]{cc}\frac{-p(p-1)z^{p-2}(z^{p}+t)+p^{2}z^{2p-2}}{(z^{p}+t)^{2}}+\frac{1}{z^{2}}&\frac{pz^{p-1}}{(z^{p}+t)^{2}}\\ \frac{pz^{p-1}}{(z^{p}+t)^{2}}&\frac{1}{(z^{p}+t)^{2}}\end{array}\right].

The first derivative of Φ\Phi_{*} is equal to

(385) \nabla\Phi_{*} = \left[\begin{array}[]{c}\eta p(p-1)z^{\prime}_{y}z^{p-1}+\frac{z^{\prime}_{y}}{z}\\ (p-1)z^{p}+\eta p(p-1)z^{\prime}_{\eta}z^{p-1}-\frac{1}{\eta}+\frac{z^{\prime}_{\eta}}{z}\end{array}\right],

and the second derivative of \Phi_{*} is equal to

(389) 2Φ=[f11f12f21f22],\displaystyle\nabla^{2}\Phi_{*}=\left[\begin{array}[]{cc}f_{11}&f_{12}\\ f_{21}&f_{22}\end{array}\right],

where

f_{11} = \eta(p-1)p\left[z^{\prime\prime}_{yy}z^{p-1}+(p-1)(z^{\prime}_{y})^{2}z^{p-2}\right]+\frac{z^{\prime\prime}_{yy}z-(z^{\prime}_{y})^{2}}{z^{2}},
f_{12}=f_{21} = (p-1)p\left[z^{\prime}_{y}z^{p-1}+\eta z^{\prime\prime}_{y\eta}z^{p-1}+\eta(p-1)z^{\prime}_{y}z^{\prime}_{\eta}z^{p-2}\right]+\frac{z^{\prime\prime}_{y\eta}z-z^{\prime}_{\eta}z^{\prime}_{y}}{z^{2}},
f_{22} = (p-1)pz^{\prime}_{\eta}z^{p-1}+(p-1)p\left[z^{\prime}_{\eta}z^{p-1}+\eta z^{\prime\prime}_{\eta\eta}z^{p-1}+\eta(z^{\prime}_{\eta})^{2}(p-1)z^{p-2}\right]+\frac{1}{\eta^{2}}+\frac{z^{\prime\prime}_{\eta\eta}z-(z^{\prime}_{\eta})^{2}}{z^{2}}.
\Phi(z,t)=-\textup{ln}(zt-1),\qquad \Phi_{*}(y,\eta)=-1-\sqrt{1+4y\eta}+\textup{ln}\left(\frac{1+\sqrt{1+4y\eta}}{2y\eta}\right)

For the primal function we have

Φ=[tzt1zzt1],2Φ=[t2(zt1)21(zt1)21(zt1)2z2(zt1)2],\displaystyle\nabla\Phi=\left[\begin{array}[]{c}-\frac{t}{zt-1}\\ -\frac{z}{zt-1}\end{array}\right],\ \ \nabla^{2}\Phi=\left[\begin{array}[]{cc}\frac{t^{2}}{(zt-1)^{2}}&\frac{1}{(zt-1)^{2}}\\ \frac{1}{(zt-1)^{2}}&\frac{z^{2}}{(zt-1)^{2}}\end{array}\right],

and for the dual function we have

Φ=[2η1+1+4yη1y2y1+1+4yη1η],2Φ=[4η21+4yη(1+1+4yη)2+1y22(1+4yη+1+4yη)+4yη1+4yη(1+1+4yη)22(1+4yη+1+4yη)+4yη1+4yη(1+1+4yη)24y21+4yη(1+1+4yη)2+1η2].\displaystyle\nabla\Phi_{*}=\left[\begin{array}[]{c}-\frac{2\eta}{1+\sqrt{1+4y\eta}}-\frac{1}{y}\\ -\frac{2y}{1+\sqrt{1+4y\eta}}-\frac{1}{\eta}\end{array}\right],\ \ \nabla^{2}\Phi_{*}=\left[\begin{array}[]{cc}\frac{4\eta^{2}}{\sqrt{1+4y\eta}(1+\sqrt{1+4y\eta})^{2}}+\frac{1}{y^{2}}&\frac{-2(\sqrt{1+4y\eta}+1+4y\eta)+4y\eta}{\sqrt{1+4y\eta}(1+\sqrt{1+4y\eta})^{2}}\\ \frac{-2(\sqrt{1+4y\eta}+1+4y\eta)+4y\eta}{\sqrt{1+4y\eta}(1+\sqrt{1+4y\eta})^{2}}&\frac{4y^{2}}{\sqrt{1+4y\eta}(1+\sqrt{1+4y\eta})^{2}}+\frac{1}{\eta^{2}}\end{array}\right].
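All of the closed-form derivatives in this subsection can be sanity-checked against finite differences; for example, for the gradient of \Phi(z,t)=-\textup{ln}(zt-1), a throwaway MATLAB test (not part of DDS) could read:

Phi  = @(w) -log(w(1)*w(2) - 1);            % barrier, w = [z; t]
gPhi = @(w) [-w(2)/(w(1)*w(2) - 1); -w(1)/(w(1)*w(2) - 1)];
w0 = [2; 3];  h = 1e-6;  I = eye(2);  ga = gPhi(w0);
for i = 1:2
    fd = (Phi(w0 + h*I(:,i)) - Phi(w0 - h*I(:,i)))/(2*h);
    fprintf('component %d: analytic %.8f, central difference %.8f\n', i, ga(i), fd);
end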

Appendix E A s.c. barrier for vector relative entropy

In this section, we prove that the function

(392) Φ(t,u,z):=ln(ti=1uiln(ui/zi))i=1ln(zi)i=1ln(ui),\displaystyle\Phi(t,u,z):=-\textup{ln}\left(t-\sum_{i=1}^{\ell}u_{i}\textup{ln}(u_{i}/z_{i})\right)-\sum_{i=1}^{\ell}\textup{ln}(z_{i})-\sum_{i=1}^{\ell}\textup{ln}(u_{i}),

is a (2\ell+1)-LH-s.c. barrier for the epigraph of the vector relative entropy function f:\mathbb{R}_{++}^{\ell}\oplus\mathbb{R}_{++}^{\ell}\rightarrow\mathbb{R} defined as

f(z,u):=i=1uiln(ui)uiln(zi).f(z,u):=\sum_{i=1}^{\ell}u_{i}\textup{ln}(u_{i})-u_{i}\textup{ln}(z_{i}).

First note that Φ\Phi is (2+1)(2\ell+1)-LH, so we just need to prove self-concordance.

The proof is based on the compatibility results of [27]. We first consider the case =1\ell=1 and then generalize it. Let us define the map A:++2A:\mathbb{R}_{++}^{2}\rightarrow\mathbb{R} as

(393) A(u,z)=uln(uz).\displaystyle A(u,z)=-u\textup{ln}\left(\frac{u}{z}\right).

For a vector d:=(du,dz)d:=(d_{u},d_{z}), we can verify

A′′[d,d]\displaystyle A^{\prime\prime}[d,d] =\displaystyle= (zduudz)2uz2,\displaystyle-\frac{(zd_{u}-ud_{z})^{2}}{uz^{2}},
(394) A′′′[d,d,d]\displaystyle A^{\prime\prime\prime}[d,d,d] =\displaystyle= (zdu+2udz)(zduudz)2u2z3.\displaystyle\frac{(zd_{u}+2ud_{z})(zd_{u}-ud_{z})^{2}}{u^{2}z^{3}}.

Equation (394) implies that A is concave with respect to \mathbb{R}_{+}. We claim that A is (\mathbb{R}_{+},1)-compatible with the barrier -\textup{ln}(u)-\textup{ln}(z) [27, Definition 5.1.2]. For this, we need to show

(395) A′′′[d,d,d]3A′′[d,d]z2du2+u2dz2u2z2.\displaystyle A^{\prime\prime\prime}[d,d,d]\leq-3A^{\prime\prime}[d,d]\sqrt{\frac{z^{2}d_{u}^{2}+u^{2}d_{z}^{2}}{u^{2}z^{2}}}.

By using (394) and canceling the common terms on both sides, (395) reduces to

(396) (zdu+2udz)3z2du2+u2dz2.\displaystyle(zd_{u}+2ud_{z})\leq 3\sqrt{z^{2}d_{u}^{2}+u^{2}d_{z}^{2}}.

We may assume that the LHS is nonnegative; then, squaring both sides (which gives z^{2}d_{u}^{2}+4uzd_{u}d_{z}+4u^{2}d_{z}^{2}\leq 9(z^{2}d_{u}^{2}+u^{2}d_{z}^{2})) and rearranging, we get the obvious inequality

(397) 8z^{2}d_{u}^{2}-4zd_{u}ud_{z}+5u^{2}d_{z}^{2}=4z^{2}d_{u}^{2}+(2zd_{u}-ud_{z})^{2}+4u^{2}d_{z}^{2}\geq 0.

Therefore, A(u,z) is (\mathbb{R}_{+},1)-compatible with the barrier -\textup{ln}(u)-\textup{ln}(z). Note that adding a linear term does not affect compatibility, so t+A(u,z) is also (\mathbb{R}_{+},1)-compatible with the barrier -\textup{ln}(z)-\textup{ln}(u). Hence, -\textup{ln}(t+A)-\textup{ln}(u)-\textup{ln}(z) is a 3-s.c. barrier by [27, Proposition 5.1.7].

For the general case, consider the map \bar{A}:[\mathbb{R}_{++}^{2}]^{\ell}\rightarrow\mathbb{R} defined as

(398) A¯(u,z):=iA(ui,zi).\displaystyle\bar{A}(u,z):=\sum_{i}^{\ell}A(u_{i},z_{i}).

For a vector dd of proper size, we have

A¯′′(u,z)[d,d]\displaystyle\bar{A}^{\prime\prime}(u,z)[d,d] =\displaystyle= iA′′(ui,zi)[di,di],\displaystyle\sum_{i}^{\ell}A^{\prime\prime}(u_{i},z_{i})[d^{i},d^{i}],
(399) A¯′′′(u,z)[d,d,d]\displaystyle\bar{A}^{\prime\prime\prime}(u,z)[d,d,d] =\displaystyle= iA′′′(ui,zi)[di,di,di].\displaystyle\sum_{i}^{\ell}A^{\prime\prime\prime}(u_{i},z_{i})[d^{i},d^{i},d^{i}].

We claim that \bar{A} is (\mathbb{R}_{+},1)-compatible with the barrier -\sum_{i}\textup{ln}(u_{i})-\sum_{i}\textup{ln}(z_{i}). First note that \bar{A} is concave with respect to \mathbb{R}_{+}. We need to prove an inequality similar to (395) for \bar{A}. Clearly, we have

(400) zj2(duj)2+uj2(dzj)2uj2zj2izi2(dui)2+ui2(dzi)2ui2zi2,j{1,,}.\displaystyle\frac{z_{j}^{2}(d^{j}_{u})^{2}+u_{j}^{2}(d^{j}_{z})^{2}}{u_{j}^{2}z_{j}^{2}}\leq\sum_{i}^{\ell}\frac{z_{i}^{2}(d^{i}_{u})^{2}+u_{i}^{2}(d^{i}_{z})^{2}}{u_{i}^{2}z_{i}^{2}},\ \ \forall j\in\{1,\ldots,\ell\}.

Using inequalities (395) and (400) for all j and adding them together yields the desired inequality for \bar{A}. Therefore, \bar{A} is (\mathbb{R}_{+},1)-compatible with the barrier -\sum_{i}\textup{ln}(u_{i})-\sum_{i}\textup{ln}(z_{i}), and by [27, Proposition 5.1.7], (392) is a (2\ell+1)-s.c. barrier.

Appendix F Comparison with some related solvers and modeling systems

In this section, we take a look at the input formats of some other well-known solvers and modeling systems. [22] is a survey by Mittelmann on solvers for conic optimization, which gives an overview of the major codes available for the solution of linear semidefinite (SDP) and second-order cone (SOCP) programs. Many of these codes also solve linear programs (LP). We mention the leaders MOSEK, SDPT3, and SeDuMi from that list. We also look at CVX, a very user-friendly interface for convex optimization. CVX is not a solver; it is a modeling system that (by enforcing a set of rules) verifies that a given problem is convex and remodels it as suitable input for solvers such as SeDuMi.

F.1. MOSEK [23]

MOSEK is a leading commercial solver, not just for optimization over symmetric cones, but also for many other convex optimization problems. The most recent version, MOSEK 9.0, handles, in a primal-dual framework, many convex cone constraints that arise in applications [10]. The available platforms and interfaces are listed on MOSEK's website [23].

F.2. SDPT3 [36, 33]

SDPT3 is a MATLAB package for optimization over symmetric cones, and it solves a conic optimization problem in the equality form as

(404) minc,xs.t.Ax=b,xK,\displaystyle\begin{array}[]{cc}\min&\langle c,x\rangle\\ \text{s.t.}&Ax=b,\\ &x\in K,\end{array}

where the cone K can be a direct sum of nonnegative rays (leading to LP problems), second-order cones, and semidefinite cones. The input for SDPT3 is given in the cell-array structure of MATLAB. The command to call SDPT3 is of the form

[obj,X,y,Z,info,runhist] = sqlp(blk,At,C,b,OPTIONS,X0,y0,Z0).

The input data is given in blocks, where for the k-th block, blk{k,1} specifies the type of the constraint. The letters 'l', 'q', and 's' represent linear, quadratic, and semidefinite constraints, respectively. For the k-th block, At{k}, C{k}, … contain the parts of the input related to that block.
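For illustration, a small LP with one linear block could be set up as follows (the data here are placeholders, chosen only to make the problem feasible and bounded):

n = 6; m = 3;
blk{1,1} = 'l';  blk{1,2} = n;     % one block of n nonnegative variables
A = rand(m, n);                    % placeholder constraint matrix
At{1} = A';                        % SDPT3 stores the transpose of A
C{1} = ones(n, 1);                 % objective coefficients for this block
b = A*ones(n, 1);                  % makes x = ones(n,1) feasible
[obj, X, y, Z, info, runhist] = sqlp(blk, At, C, b);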

F.3. SeDuMi [32]

SeDuMi is also a MATLAB package for optimization over symmetric cones in the format of (404). As input, we give SeDuMi A, b, and c, together with a structure array K. The vector of variables has a “direct sum” structure; in other words, the set of variables is the direct sum of free, linear, quadratic, and semidefinite variables. The fields of the structure array K contain the number of constraints of each type and their sizes. SeDuMi can be called in MATLAB by the command

[x,y] = sedumi(A,b,c,K);

and the variables are distinguished by KK as follows:

(1) K.f is the number of free variables, i.e., in the variable vector x, x(1:K.f) are free variables.

(2) K.l is the number of nonnegative variables.

(3) K.q lists the dimensions of the Lorentz (second-order cone) constraints.

(4) K.s lists the dimensions of the positive semidefinite constraints.

For example, if K.l=10, K.q=[3 7], and K.s=[4 3], then x(1:10) are nonnegative, x(11) >= norm(x(12:13)) and x(14) >= norm(x(15:20)) are the Lorentz constraints, and mat(x(21:36),4) and mat(x(37:45),3) are positive semidefinite. To pass a problem to SeDuMi, we have to write it in the format of (404). We also have the choice of solving the dual problem, because all of the above cones are self-dual.
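For illustration, a small problem with two nonnegative variables and one 3-dimensional Lorentz cone can be passed to SeDuMi as follows (placeholder data, chosen so that a strictly feasible point exists):

K.l = 2;  K.q = 3;            % x(1:2) >= 0 and x(3) >= norm(x(4:5))
A = [1 1 1 0 0;               % two equality constraints, A*x = b
     0 1 0 1 0];
b = [1; 0.5];
c = [1; 2; 1; 0; 0];
[x, y] = sedumi(A, b, c, K);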

F.4. CVX [16]

CVX is an interface that is more user-friendly than solvers such as SeDuMi. It provides many options for entering a problem, and then translates the problem into a format suitable for a solver such as SeDuMi. We can insert our problem constraint-by-constraint into CVX, but the constraints must follow a protocol called disciplined convex programming (DCP). DCP is a rule-set that the user has to follow, which allows CVX to verify that the problem is convex and to convert it into a solvable form. For example, we can write a <= constraint only when the left side is convex and the right side is concave; to build such expressions, we can use a large class of functions from the CVX library.
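For example, the following DCP-compliant model is accepted by CVX, since the left side of each <= constraint is convex (the data here are placeholders):

m = 20; n = 10;
A = randn(m, n); b = randn(m, 1);
cvx_begin
    variable x(n)
    minimize( norm(A*x - b) )     % convex objective
    subject to
        norm(x, 1) <= 1;          % convex <= constant: DCP-compliant
cvx_end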