
A Quadratically Convergent Sequential Programming Method for Second-Order Cone Programs Capable of Warm Starts

Xinyi Luo Department of Industrial Engineering and Management Sciences, Northwestern University. This author was partially supported by National Science Foundation grant DMS-2012410. E-mail: xinyiluo2023@u.northwestern.edu    Andreas Wächter Department of Industrial Engineering and Management Sciences, Northwestern University. This author was partially supported by National Science Foundation grant DMS-2012410. E-mail: andreas.waechter@northwestern.edu
Abstract

We propose a new method for linear second-order cone programs. It is based on the sequential quadratic programming framework for nonlinear programming. In contrast to interior point methods, it can capitalize on the warm-start capabilities of active-set quadratic programming subproblem solvers and achieve a local quadratic rate of convergence.

In order to overcome the non-differentiability or singularity observed in nonlinear formulations of the conic constraints, the subproblems approximate the cones with polyhedral outer approximations that are refined throughout the iterations. For nondegenerate instances, the algorithm implicitly identifies the set of cones for which the optimal solution lies at the extreme points. As a consequence, the final steps are identical to regular sequential quadratic programming steps for a differentiable nonlinear optimization problem, yielding local quadratic convergence.

We prove the global and local convergence guarantees of the method and present numerical experiments that confirm that the method can take advantage of good starting points and can achieve higher accuracy compared to a state-of-the-art interior point solver.

Keywords: nonlinear optimization, second-order cone programming, sequential quadratic programming

AMS subject classifications: 90C15, 90C30, 90C55

1 Introduction

We are interested in the solution of second-order cone programs (SOCPs) of the form

(1a) \min_{x\in\mathbb{R}^{n}}\ c^{T}x
(1b) \text{s.t.}\quad Ax\leq b,
(1c) x_{j}\in\mathcal{K}_{j},\qquad j\in\mathcal{J}:=\{1,\ldots,p\},

where $c\in\mathbb{R}^{n}$, $b\in\mathbb{R}^{m}$, $A\in\mathbb{R}^{m\times n}$, and $x_{j}$ is a subvector of $x$ of dimension $n_{j}$ with index set $\mathcal{I}_{j}\subseteq\{1,\ldots,n\}$. We assume that the sets $\mathcal{I}_{j}$ are disjoint. The set $\mathcal{K}_{j}$ is the second-order cone of dimension $n_{j}$, i.e.,

(2) \mathcal{K}_{j}:=\{y\in\mathbb{R}^{n_{j}}:\|\bar{y}\|\leq y_{0}\},

where the vector $y$ is partitioned into $y=(y_{0},\bar{y}^{T})^{T}$ with $\bar{y}=(\bar{y}_{1},\ldots,\bar{y}_{n_{j}-1})^{T}$. These problems arise in a number of important applications [1, 2, 20, 27].
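To make the membership condition in (2) concrete, the following small Python/numpy check (purely illustrative, not part of the proposed method) tests whether a given vector lies in a second-order cone:

import numpy as np

def in_second_order_cone(y):
    # y = (y_0, y_bar); membership in K_j means ||y_bar|| <= y_0, cf. (2)
    return np.linalg.norm(y[1:]) <= y[0]

print(in_second_order_cone(np.array([2.0, 1.0, -1.0])))   # True:  sqrt(2) <= 2
print(in_second_order_cone(np.array([1.0, 2.0,  0.0])))   # False: 2 > 1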

Currently, most of the commercial software for solving SOCPs implements interior-point algorithms, which utilize a barrier function for second-order cones; see, e.g., [12, 15, 21]. Interior-point methods have well-established global and local convergence guarantees [25] and are able to solve large-scale instances, but they cannot take as much advantage of a good estimate of the optimal solution as would be desirable in many situations. For example, in certain applications, such as online optimal control, the same optimization problem has to be solved over and over again, with slightly modified data. In such a case, the optimal solution of one problem provides a good approximation for the new instance. Having a solver that is capable of “warm-starts”, i.e., utilizing this knowledge, can be essential when many similar problems have to be solved in a small amount of time.

For some problem classes, including linear programs (LPs), quadratic programs (QPs), and nonlinear programs (NLPs), active-set methods offer suitable alternatives to interior-point methods. They explicitly identify the set of constraints that are active (binding) at the optimal solution. When these methods are started from a guess of the active set that is close to the optimal one, they often converge rapidly in a small number of iterations. An example of this is the simplex method for LPs. Its warm-start capabilities are indispensable for efficient branch-and-bound algorithms for mixed-integer linear programs.

Active-set methods for LPs, QPs, or NLPs are also known to outperform interior-point algorithms for problems that are not too large [11, 14, 17]. Similarly, active-set methods might be preferable when there is a large number of inequality constraints among which only a few are active, since an interior-point method is designed to consider all inequality constraints in every iteration and consequently solves large linear systems, whereas an active-set method can ignore all inactive inequality constraints and encounters potentially much smaller linear systems.

Our goal is to propose an active-set alternative to the interior-point method in the context of SOCP that might provide similar benefits. We introduce a new sequential quadratic programming (SQP) algorithm that, in contrast to interior-point algorithms for SOCPs, has favorable warm-starting capabilities because it can utilize active-set QP solvers. We prove that it is globally convergent, i.e., all limit points of the generated iterates are optimal solutions under mild assumptions, and that it enjoys a quadratic convergence rate for non-degenerate instances. Our preliminary numerical experiments demonstrate that these theoretical properties are indeed observed in practice. They also show that the algorithm is able in some cases to compute a solution to a higher degree of precision than interior point methods. This is expected, again in analogy to the context of LPs, QPs, and NLPs, since an interior point method terminates at a small, but nonzero value of the barrier parameter that cannot be made smaller than some threshold (typically 10610^{-6} or 10810^{-8}) because the arising linear systems become highly ill-conditioned. In contrast, in the final iteration of the active-set method, the linear systems solved correspond directly to the optimality conditions, without any perturbation introduced by a barrier parameter, and are only as degenerate as the optimal solution of the problem.

The paper is structured as follows. Section 2 reviews the sequential quadratic programming method and the optimality conditions of SOCPs. Section 3 describes the algorithm, which is based on an outer approximation of the conic constraints. Section 4 establishes the global and local convergence properties of the method, and numerical experiments are reported in Section 5. Concluding remarks are offered in Section 6.

1.1 Related work

While a large number of interior-point algorithms for SOCP have been proposed, including some that have been implemented in efficient optimization packages [12, 15, 21], there are only a few approaches for solving SOCPs within an active-set framework. The method proposed by Goldberg and Leyffer [10] is a two-phase algorithm that combines a projected-gradient method with equality-constrained SQP. However, it is limited to instances that have only conic constraints (1c) and no additional linear constraints (1b). Hayashi et al. [13] propose a simplex-type method, where they reformulate the SOCP as a linear semi-infinite program to handle the fact that these instances have infinitely many extreme points. The resulting dual-simplex exchange method shows promising practical behavior. However, in contrast to the method proposed here, the authors conjecture that their method has only an R-linear local convergence rate. Zhadan [30] proposes a similar simplex-type method. Another advantage of the method presented in this paper is that the pivoting algorithm does not need to be designed and implemented from scratch. Instead, it can leverage existing implementations of active-set QP solvers, in particular the efficient handling of linear systems.

The proposed algorithm relies on polyhedral outer approximations based on well-known cutting planes for SOCPs. For instance, the methods for mixed-integer SOCP by Drewes and Ulbrich [7] and Coey et al. [4] use these cutting planes to build LP relaxations of the branch-and-bound subproblems. We note that an LP-based cutting plane algorithm for SOCP could be seen as an active-set method, but it is only linearly convergent. As pointed out in [6], it is crucial to consider the curvature of the conic constraint in the subproblem objective to achieve fast convergence.

The term “SQP method for SOCP” has also been used in the literature to refer to methods for solving nonlinear SOCPs [6, 16, 24, 31]. However, in contrast to the method here, in these approaches, the subproblems themselves are SOCPs (1) and include the linearization of the nonlinear objective and constraints. It will be interesting to explore extensions of the proposed method to nonlinear SOCPs in which feasibility is achieved asymptotically not only for the nonlinear constraints but also for the conic constraints.

1.2 Notation

For two vectors $x,y\in\mathbb{R}^{n}$, we denote with $x\circ y$ their component-wise product, and the condition $x\perp y$ stands for $x^{T}y=0$. For $x\in\mathbb{R}^{n}$, we define $[x]^{+}$ as the vector with entries $\max\{x_{i},0\}$. We denote by $\|\cdot\|$, $\|\cdot\|_{1}$, $\|\cdot\|_{\infty}$ the Euclidean norm, the $\ell_{1}$-norm, and the $\ell_{\infty}$-norm, respectively. For a cone $\mathcal{K}_{j}$, $e_{ji}\in\mathbb{R}^{n_{j}}$ is the canonical basis vector with $1$ in the element corresponding to $x_{ji}$ for $i\in\{0,\ldots,n_{j}-1\}$, and $\operatorname{int}(\mathcal{K}_{j})$ and $\operatorname{bd}(\mathcal{K}_{j})$ denote the cone’s interior and boundary, respectively.

2 Preliminaries

The NLP reformulation of the SOCP is introduced in Section 2.1. We review in Section 2.2 the local convergence properties of the SQP method and in Section 2.3 the penalty function as a means to promote convergence from any starting point. In Section 2.4, we briefly state the optimality conditions and our assumption for the SOCP (1).

2.1 Reformulation as a smooth optimization problem

The definition of the second-order cone in (2) suggests that the conic constraint (1c) can be replaced by the nonlinear constraint

r_{j}(x_{j}):=\|\bar{x}_{j}\|-x_{j0}\leq 0

without changing the set of feasible points. Consequently, (1) is equivalent to

(3a) \min_{x\in\mathbb{R}^{n}}\ c^{T}x
(3b) \text{s.t.}\quad Ax\leq b,
(3c) r_{j}(x_{j})\leq 0,\qquad j\in\mathcal{J}.

Unfortunately, (3) cannot be solved directly with standard gradient-based algorithms for nonlinear optimization, such as SQP methods. The reason is that $r_{j}$ is not differentiable whenever $\bar{x}_{j}=0$. This is particularly problematic when the optimal solution $x^{*}$ of the SOCP lies at the extreme point of a cone, $x^{*}_{j}=0\in\mathcal{K}_{j}$. In that case, the Karush-Kuhn-Tucker (KKT) necessary optimality conditions for the NLP formulation, which are expressed in terms of derivatives, cannot be satisfied. Therefore, any optimization algorithm that seeks KKT points cannot succeed. As a remedy, differentiable approximations of $r_{j}$ have been proposed in the past; see, for example, [28]. However, high accuracy comes at the price of high curvature, which can make finding the numerical solution of the NLP difficult.

An alternative equivalent reformulation of the conic constraint is given by

\|\bar{x}_{j}\|^{2}-x_{j0}^{2}\leq 0\quad\text{and}\quad x_{j0}\geq 0.

In this case, the constraint function is differentiable. But if $x_{j}^{*}=0$, its gradient vanishes, and as a consequence, no constraint qualification applies and the KKT conditions do not hold. Therefore, again, a gradient-based method cannot be employed. By using an outer approximation of the cones that is improved in the course of the algorithm, our proposed variation of the SQP method is able to avoid these kinds of degeneracy.

To facilitate the discussion we define a point-wise partition of the cones.

Definition 2.1.

Let $x\in\mathbb{R}^{n}$.

  1. We call a cone $\mathcal{K}_{j}$ extremal-active at $x$, if $x_{j}=0$, and we denote with $\mathcal{E}(x)=\{j\in\mathcal{J}:x_{j}=0\}$ the set of extremal-active cones at $x$.

  2. We define the set $\mathcal{D}(x)=\{j\in\mathcal{J}:\bar{x}_{j}\neq 0\}$ as the set of all cones for which the function $r_{j}$ is differentiable at $x$.

  3. We define the set $\mathcal{N}(x)=\{j\in\mathcal{J}:x_{j}\neq 0\text{ and }\bar{x}_{j}=0\}$ as the set of all cones that are not extremal-active and for which $r_{j}$ is not differentiable at $x$.

If the set $\mathcal{E}(x^{*})$ at an optimal solution $x^{*}$ were known in advance, we could compute $x^{*}$ as a solution of (1) by solving the NLP

(4a) \min_{x\in\mathbb{R}^{n}}\ c^{T}x
(4b) \text{s.t.}\quad Ax\leq b,
(4c) r_{j}(x_{j})\leq 0,\qquad j\in\mathcal{D}(x^{*}),
(4d) x_{j}=0,\hspace{3.9em} j\in\mathcal{E}(x^{*}).

The nonlinear constraints (4c) involving $r_{j}$ are imposed only if $r_{j}$ is differentiable at $x^{*}$, and variables in cones that are extremal-active at $x^{*}$ are explicitly fixed to zero by (4d). With this, locally around $x^{*}$, all functions in (4) are differentiable and we could apply standard second-order algorithms to achieve fast local convergence.

In (4), we omitted the cones in $\mathcal{N}(x^{*})$. If $x^{*}$ is feasible for the SOCP and $j\in\mathcal{N}(x^{*})$, we have $\bar{x}^{*}_{j}=0$ and $x^{*}_{j0}>0$, and so $r_{j}(x^{*}_{j})<0$. This implies that the nonlinear constraint (4c) for this cone is not active and we can omit it from the problem statement without impacting the optimal solution.

2.2 Local convergence of SQP methods

The proposed algorithm is designed to guide the iterates $x^{k}$ into the neighborhood of an optimal solution $x^{*}$. If the optimal solution is not degenerate and the iterates are sufficiently close to $x^{*}$, the steps generated by the algorithm are eventually identical to the steps that the SQP method would take for solving the differentiable optimization problem (4). In this section, we review the mechanisms and convergence results of the basic SQP method [23].

At an iterate $x^{k}$, the basic SQP method, applied to (4), computes a step $d^{k}$ as an optimal solution to the QP subproblem

(5a) \min_{d\in\mathbb{R}^{n}}\ c^{T}d+\tfrac{1}{2}d^{T}H^{k}d
(5b) \text{s.t.}\quad A(x^{k}+d)\leq b,
(5c) r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}\leq 0,\qquad j\in\mathcal{D}(x^{*}),
(5d) x_{j}^{k}+d_{j}=0,\hspace{7.1em} j\in\mathcal{E}(x^{*}).

Here, $H^{k}$ is the Hessian of the Lagrangian function for (4), which in our case is

(6) H^{k}=\sum_{j\in\mathcal{D}(x^{*})}\mu^{k}_{j}\nabla^{2}_{xx}r_{j}(x_{j}^{k}),

where $\mu_{j}^{k}\geq 0$ are estimates of the optimal multipliers for the nonlinear constraint (4c), and where $\nabla^{2}_{xx}r_{j}(x_{j})$ is the $n\times n$ block-diagonal matrix with

(7) \nabla^{2}r_{j}(x_{j})=\begin{bmatrix}0&0\\ 0&\frac{1}{\|\bar{x}_{j}\|}I-\frac{\bar{x}_{j}\bar{x}_{j}^{T}}{\|\bar{x}_{j}\|^{3}}\end{bmatrix}

in the rows and columns corresponding to $x_{j}$ for $j\in\mathcal{J}$. It is easy to see that $\nabla^{2}r_{j}(x_{j})$ is positive semi-definite. The estimates $\mu_{j}^{k}$ are updated based on the optimal multipliers $\hat{\mu}_{j}^{k}\geq 0$ corresponding to (5c).
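For illustration, the derivatives of $r_{j}$ used in (6) and (7) can be evaluated as in the following numpy sketch (the gradient formula is the one given later in (15); this code is an illustration only, valid when $\bar{x}_{j}\neq 0$):

import numpy as np

def grad_r(xj):
    # gradient of r_j(x_j) = ||x_bar|| - x_{j0} for x_bar != 0, cf. (15)
    xbar = xj[1:]
    return np.concatenate(([-1.0], xbar / np.linalg.norm(xbar)))

def hess_r(xj):
    # (7): zero row/column for x_{j0}, and I/||x_bar|| - x_bar x_bar^T/||x_bar||^3 otherwise
    xbar = xj[1:]
    nrm = np.linalg.norm(xbar)
    H = np.zeros((xj.size, xj.size))
    H[1:, 1:] = np.eye(xbar.size) / nrm - np.outer(xbar, xbar) / nrm**3
    return H   # positive semi-definite whenever x_bar != 0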

Algorithm 1 Basic SQP Algorithm
1: Initial iterate $x^{0}$ and multiplier estimates $\lambda^{0}$, $\mu^{0}$, and $\eta^{0}$.
2: for $k=0,1,2,\ldots$ do
3:     Compute $H^{k}$ from (6).
4:     Solve QP (5) to get step $d^{k}$ and multipliers $\hat{\lambda}^{k}$, $\hat{\mu}_{j}^{k}$, and $\hat{\eta}^{k}$.
5:     Set $x^{k+1}\leftarrow x^{k}+d^{k}$ and $\mu_{j}^{k+1}\leftarrow\hat{\mu}_{j}^{k}$ for all $j\in\mathcal{D}(x^{*})$.
6: end for

Algorithm 1 formally states the basic SQP method, where $\hat{\lambda}^{k}\geq 0$ and $\hat{\eta}^{k}$ denote the multipliers corresponding to (4b) and (4d), respectively. Because we are only interested in the behavior of the algorithm when $x^{k}$ is close to $x^{*}$, we assume here that $\bar{x}_{j}^{k}\neq 0$ for all $j\in\mathcal{D}(x^{*})$ and for all $k$, and hence the gradient and Hessian of $r_{j}$ can be computed. Note that the iterates $\hat{\lambda}^{k}$ and $\hat{\eta}^{k}$ are not explicitly needed in Algorithm 1, but they are necessary to measure the optimality error and define the primal-dual iterate sequence that is analyzed in Theorem 2.2.

A fast rate of convergence can be proven under the following sufficient second-order optimality assumptions [3].

Assumption 1.

Suppose that $x^{*}$ is an optimal solution of the NLP (4) with corresponding KKT multipliers $\lambda^{*}$, $\mu^{*}$, and $\eta^{*}$, satisfying the following properties:

  (i) Strict complementarity holds;

  (ii) the linear independence constraint qualification (LICQ) holds at $x^{*}$, i.e., the gradients of the constraints that hold with equality at $x^{*}$ are linearly independent;

  (iii) the projection of the Lagrangian Hessian $H^{*}=\sum_{j\in\mathcal{D}(x^{*})}\mu^{*}_{j}\nabla^{2}_{xx}r_{j}(x_{j}^{*})$ onto the null space of the gradients of the active constraints is positive definite.

Under these assumptions, the basic SQP algorithm reduces to Newton’s method applied to the optimality conditions of (4) and the following result holds [23].

Theorem 2.2.

Suppose that Assumption 1 holds and that the initial iterate $x^{0}$ and multipliers $\mu^{0}$ (used in the Hessian calculation) are sufficiently close to $x^{*}$ and $\mu^{*}$, respectively. Then the iterates $(x^{k+1},\hat{\lambda}^{k},\hat{\mu}^{k},\hat{\eta}^{k})$ generated by the basic SQP algorithm, Algorithm 1, converge to $(x^{*},\lambda^{*},\mu^{*},\eta^{*})$ at a quadratic rate.

2.3 Penalty function

Theorem 2.2 is a local convergence result. Practical SQP algorithms include mechanisms that make sure that the iterates eventually reach such a neighborhood, even if the starting point is far away. To this end, we employ the exact penalty function

(8) \varphi(x;\rho)=c^{T}x+\rho\sum_{j\in\mathcal{J}}[r_{j}(x_{j})]^{+},

in which $\rho>0$ is a penalty parameter. Note that we define $\varphi$ in terms of all conic constraints $\mathcal{J}$, even though $r_{j}$ appears in (4c) only for $j\in\mathcal{D}(x^{*})$. We do this because the proposed algorithm does not know $\mathcal{D}(x^{*})$ in advance and the violation of all cone constraints needs to be taken into account when the original problem (1) is solved. Nevertheless, in this section, we may safely ignore the terms for $j\not\in\mathcal{D}(x^{*})$ because for $j\in\mathcal{E}(x^{*})$ we have $x^{k}_{j}=0$ and hence $[r_{j}(x^{k}_{j})]^{+}=0$ for all $k$ due to (5d), and when $j\in\mathcal{N}(x^{*})$, we have $r_{j}(x_{j}^{k})<0$ when $x^{k}$ is close to $x^{*}$ since $r_{j}(x_{j}^{*})<0$.

It can be shown, under suitable assumptions, that the minimizers of $\varphi(\,\cdot\,;\rho)$ over the set defined by the linear constraints (4b),

(9) X=\{x\in\mathbb{R}^{n}:Ax\leq b\},

coincide with the minimizers of (4) when $\rho$ is chosen sufficiently large. Because it is not known upfront how large $\rho$ needs to be, the algorithm uses an estimate, $\rho^{k}$, in iteration $k$, which might be increased during the course of the algorithm.

To ensure that the iterates eventually reach a minimizer of $\varphi(\,\cdot\,;\rho)$, and therefore a solution of (4), we require that the decrease of $\varphi(\,\cdot\,;\rho)$ is at least a fraction of that achieved in the piece-wise linear model of $\varphi(\,\cdot\,;\rho)$ given by

(10) m^{k}(x^{k}+d;\rho)=c^{T}(x^{k}+d)+\rho\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x^{k}_{j})+\nabla r_{j}(x^{k}_{j})^{T}d_{j}]^{+},

constructed at $x^{k}$. More precisely, the algorithm accepts a trial point $\hat{x}^{k+1}=x^{k}+d$ as a new iterate only if the sufficient decrease condition

(11) \varphi(\hat{x}^{k+1};\rho^{k})-\varphi(x^{k};\rho^{k})\leq c_{\text{dec}}\bigl(m^{k}(x^{k}+d;\rho^{k})-m^{k}(x^{k};\rho^{k})\bigr)\overset{(10)}{=}c_{\text{dec}}\Bigl(c^{T}d-\rho^{k}\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x^{k}_{j})]^{+}\Bigr)

holds with some fixed constant $c_{\text{dec}}\in(0,1)$. The trial iterate $\hat{x}^{k+1}=x^{k}+d^{k}$ with $d^{k}$ computed from (5) might not always satisfy this condition. The proposed algorithm generates a sequence of improved steps of which one is eventually accepted.
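The following numpy sketch illustrates the acceptance test; it assumes that each cone is represented by an index array into $x$ and that the candidate step $d$ satisfies the linearized cone constraints, so that the model decrease reduces to the simplified expression in (11). The helper names (`cones`, `c_dec`) are illustrative choices, not notation from the paper.

import numpy as np

def r(xj):
    return np.linalg.norm(xj[1:]) - xj[0]

def phi(x, rho, c, cones):
    # exact penalty function (8)
    return c @ x + rho * sum(max(r(x[idx]), 0.0) for idx in cones)

def sufficient_decrease(x, d, rho, c, cones, c_dec=0.1):
    # acceptance test (11); model decrease c^T d - rho * sum_{j in D(x)} [r_j(x_j)]^+
    model_decrease = c @ d - rho * sum(
        max(r(x[idx]), 0.0) for idx in cones if np.linalg.norm(x[idx][1:]) > 0.0)
    return phi(x + d, rho, c, cones) - phi(x, rho, c, cones) <= c_dec * model_decrease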

However, to apply Theorem 2.2, it would be necessary that the algorithm take the original step $d^{k}$ computed from (5); see Step 5 of Algorithm 1. Unfortunately, $\hat{x}^{k+1}=x^{k}+d^{k}$ might not be acceptable even when the iterate $x^{k}$ is arbitrarily close to a non-degenerate solution $x^{*}$ satisfying Assumption 1 (a phenomenon called the Maratos effect [19]). Our remedy is to employ the second-order correction step [9], $s^{k}$, which is obtained as an optimal solution of the QP

(12a) \min_{s\in\mathbb{R}^{n}}\ c^{T}(d^{k}+s)+\tfrac{1}{2}(d^{k}+s)^{T}H^{k}(d^{k}+s)
(12b) \text{s.t.}\quad A(x^{k}+d^{k}+s)\leq b,
(12c) r_{j}(x_{j}^{k}+d_{j}^{k})+\nabla r_{j}(x_{j}^{k}+d_{j}^{k})^{T}s_{j}\leq 0,\qquad j\in\mathcal{D}(x^{*}),
(12d) x_{j}^{k}+d^{k}_{j}+s_{j}=0,\hspace{10.1em} j\in\mathcal{E}(x^{*}).

For later reference, let $\hat{\lambda}^{S,k}$, $\hat{\mu}^{S,k}$, and $\hat{\eta}^{S,k}$ denote optimal multiplier vectors corresponding to (12b)–(12d), respectively. The algorithm accepts the trial point $\hat{x}^{k+1}=x^{k}+d^{k}+s^{k}$ if it yields sufficient decrease (11) with respect to the original SQP step $d=d^{k}$. Note that (12) is a variation of the second-order correction that is usually used in SQP methods, for which (12c) reads

r_{j}(x_{j}^{k}+d_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}(d_{j}^{k}+s_{j})\leq 0,\qquad j\in\mathcal{D}(x^{*}),

and avoids the evaluation of $\nabla r_{j}(x_{j}^{k}+d_{j}^{k})$. In our setting, however, evaluating $\nabla r_{j}(x_{j}^{k}+d_{j}^{k})$ takes no extra work and (12c) is equivalent to a supporting hyperplane; see Section 3.1. As the following theorem shows (see, e.g., [9] or [5, Section 15.3.2.3]), this procedure computes steps with sufficient decrease (11) and results in quadratic convergence.

Theorem 2.3.

Let Assumption 1 hold and assume that the initial iterate $x^{0}$ and multipliers $\mu^{0}$ are sufficiently close to $x^{*}$ and $\mu^{*}$, respectively. Further suppose that $\rho^{k}=\rho^{\infty}$ for large $k$, where $\rho^{\infty}>\mu_{j}^{*}$ for all $j\in\mathcal{D}(x^{*})$.

  1. Consider an algorithm that generates a sequence of iterates by setting $(x^{k+1},\lambda^{k+1},\mu^{k+1},\eta^{k+1})=(x^{k}+d^{k},\hat{\lambda}^{k},\hat{\mu}^{k},\hat{\eta}^{k})$ or $(x^{k+1},\lambda^{k+1},\mu^{k+1},\eta^{k+1})=(x^{k}+d^{k}+s^{k},\hat{\lambda}^{S,k},\hat{\mu}^{S,k},\hat{\eta}^{S,k})$ for all $k=0,1,2,\ldots$. Then $(x^{k},\lambda^{k},\mu^{k},\eta^{k})$ converges to $(x^{*},\lambda^{*},\mu^{*},\eta^{*})$ at a quadratic rate.

  2. Further, for all $k$, either $\hat{x}^{k+1}=x^{k}+d^{k}$ or $\hat{x}^{k+1}=x^{k}+d^{k}+s^{k}$ satisfies the acceptance criterion (11).

2.4 Optimality conditions for SOCP

The proposed algorithm aims at finding an optimal solution of the SOCP (1), or equivalently, values of the primal variables, $x^{*}\in\mathbb{R}^{n}$, and the dual variables, $\lambda^{*}\in\mathbb{R}^{m}$ and $z_{j}^{*}\in\mathbb{R}^{n_{j}}$ for $j\in\mathcal{J}$, that satisfy the necessary and sufficient optimality conditions [1, Theorem 16]

(13a) c+A^{T}\lambda^{*}-z^{*}=0,
(13b) Ax^{*}-b\leq 0\ \perp\ \lambda^{*}\geq 0,
(13c) \mathcal{K}_{j}\ni x_{j}^{*}\ \perp\ z_{j}^{*}\in\mathcal{K}_{j},\qquad j\in\mathcal{J}.

A thorough discussion of SOCPs is given in the comprehensive review by Alizadeh and Goldfarb [1]. The authors consider the formulation in which the linear constraints (1b) are equality constraints, but the results in [1] can be easily extended to inequalities.

The primal-dual solution $(x^{*},\lambda^{*},z^{*})$ is unique under the following assumption.

Assumption 2.

$(x^{*},\lambda^{*},z^{*})$ is a non-degenerate primal-dual solution of the SOCP (1) at which strict complementarity holds.

The definition of non-degeneracy for SOCP is somewhat involved and we refer the reader to [1, Theorem 21]. Strict complementarity holds if $x_{j}^{*}+z_{j}^{*}\in\operatorname{int}(\mathcal{K}_{j})$ and implies that: (i) $x^{*}_{j}\in\operatorname{int}(\mathcal{K}_{j})\Longrightarrow z^{*}_{j}=0$; (ii) $z^{*}_{j}\in\operatorname{int}(\mathcal{K}_{j})\Longrightarrow x^{*}_{j}=0$; (iii) $x^{*}_{j}\in\operatorname{bd}(\mathcal{K}_{j})\setminus\{0\}\Longleftrightarrow z^{*}_{j}\in\operatorname{bd}(\mathcal{K}_{j})\setminus\{0\}$; and (iv) not both $x^{*}_{j}$ and $z_{j}^{*}$ are zero.

3 Algorithm

The proposed algorithm solves the NLP formulation (3) using a variation of the SQP method. Since the functional formulation of the cone constraints (3c) might not be differentiable at all iterates or at an optimal solution, the cones are approximated by a polyhedral outer approximation using supporting hyperplanes.

The approximation is done so that the method implicitly identifies the constraints that are extremal-active at an optimal solution $x^{*}$, i.e., $\mathcal{E}(x^{*})=\mathcal{E}(x^{k})$ for large $k$. More precisely, we will show that close to a non-degenerate optimal solution, the steps generated by the proposed algorithm are identical to those computed by the QP subproblem (5) for the basic SQP algorithm for solving (4). Consequently, fast local quadratic convergence is achieved, as discussed in Section 2.2.

3.1 Supporting hyperplanes

In the following, consider a particular cone $\mathcal{K}_{j}$ and let $\mathcal{Y}_{j}$ be a finite subset of $\{y_{j}\in\mathbb{R}^{n_{j}}:\bar{y}_{j}\neq 0,\ y_{j0}\geq 0\}$. We define the cone

(14) \mathcal{C}_{j}(\mathcal{Y}_{j})=\left\{x_{j}\in\mathbb{R}^{n_{j}}:x_{j0}\geq 0\text{ and }\nabla r_{j}(y_{j})^{T}x_{j}\leq 0\text{ for all }y_{j}\in\mathcal{Y}_{j}\right\}

generated by the points in $\mathcal{Y}_{j}$. For each $x_{j}\in\mathcal{K}_{j}$ we have $r_{j}(x_{j})\leq 0$, and using

(15) \nabla r_{j}(x_{j})=\left(-1,\ \frac{\bar{x}_{j}^{T}}{\|\bar{x}_{j}\|}\right)^{T},

we obtain for any $y_{j}\in\mathcal{Y}_{j}$ that

\nabla r_{j}(y_{j})^{T}x_{j}=\frac{1}{\|\bar{y}_{j}\|}\bar{y}_{j}^{T}\bar{x}_{j}-x_{j0}\leq\frac{1}{\|\bar{y}_{j}\|}\|\bar{y}_{j}\|\,\|\bar{x}_{j}\|-x_{j0}=r_{j}(x_{j})\leq 0.

Therefore $\mathcal{C}_{j}(\mathcal{Y}_{j})\supseteq\mathcal{K}_{j}$. Also, for $y_{j}\in\mathcal{Y}_{j}$, consider $x_{j}=(1,\bar{y}_{j}^{T}/\|\bar{y}_{j}\|)^{T}$. Then

\nabla r_{j}(y_{j})^{T}x_{j}=\frac{\bar{y}_{j}^{T}}{\|\bar{y}_{j}\|}\frac{\bar{y}_{j}}{\|\bar{y}_{j}\|}-1=1-1=0,

and also $r_{j}(x_{j})=\|\bar{x}_{j}\|-x_{j0}=\|\bar{y}_{j}\|/\|\bar{y}_{j}\|-1=0$. Hence $x_{j}\in\mathcal{C}_{j}(\mathcal{Y}_{j})\cap\mathcal{K}_{j}$. Therefore, for any $y_{j}\in\mathcal{Y}_{j}$, the inequality

(16) \nabla r_{j}(y_{j})^{T}x_{j}\leq 0

defines a hyperplane that supports $\mathcal{K}_{j}$ at $(1,\bar{y}_{j}^{T}/\|\bar{y}_{j}\|)^{T}$. In summary, $\mathcal{C}_{j}(\mathcal{Y}_{j})$ is a polyhedral outer approximation of $\mathcal{K}_{j}$, defined by supporting hyperplanes.
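The following numpy snippet illustrates (14)–(16) numerically: it samples points of $\mathcal{K}_{j}$ and verifies that each of them satisfies every supporting hyperplane inequality, i.e., that $\mathcal{C}_{j}(\mathcal{Y}_{j})\supseteq\mathcal{K}_{j}$. The sampling scheme and the choice of generators are purely illustrative.

import numpy as np

def grad_r(yj):
    return np.concatenate(([-1.0], yj[1:] / np.linalg.norm(yj[1:])))

def in_outer_approx(xj, Yj, tol=1e-12):
    # membership test for C_j(Y_j) in (14)
    return xj[0] >= 0.0 and all(grad_r(y) @ xj <= tol for y in Yj)

rng = np.random.default_rng(0)
Yj = [np.concatenate(([1.0], v)) for v in rng.normal(size=(5, 3))]    # generators with y_bar != 0
for _ in range(100):
    v = rng.normal(size=3)
    xj = np.concatenate(([np.linalg.norm(v) + rng.random()], v))      # a point of K_j
    assert in_outer_approx(xj, Yj)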

In addition, writing $\mathcal{Y}_{j}=\{y_{j,1},\ldots,y_{j,m}\}$, we also define the cone

(17) \mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j}):=\left\{-\sum_{l=1}^{m}\sigma_{j,l}\nabla r_{j}(y_{j,l})+\eta_{j}e_{j0}:\sigma_{j}\in\mathbb{R}^{m}_{+},\ \eta_{j}\geq 0\right\}.

For all $x_{j}\in\mathcal{C}_{j}(\mathcal{Y}_{j})$ and $z_{j}=-\sum_{l=1}^{m}\sigma_{j,l}\nabla r_{j}(y_{j,l})+\eta_{j}e_{j0}\in\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j})$, we have

x_{j}^{T}z_{j}=-\sum_{l=1}^{m}\sigma_{j,l}\nabla r_{j}(y_{j,l})^{T}x_{j}+\eta_{j}x_{j0}\geq 0

because $\nabla r_{j}(y_{j,l})^{T}x_{j}\leq 0$ and $x_{j0}\geq 0$ from the definition of $\mathcal{C}_{j}(\mathcal{Y}_{j})$. Therefore $\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j})$ is included in the dual of the cone $\mathcal{C}_{j}(\mathcal{Y}_{j})$.

Now define $R=[-\nabla r_{j}(y_{j,1}),\ldots,-\nabla r_{j}(y_{j,m}),e_{j0}]$ and let $z_{j}\in\mathbb{R}^{n_{j}}$ be in the dual of $\mathcal{C}_{j}(\mathcal{Y}_{j})$. Since this implies that $x_{j}^{T}z_{j}\geq 0$ for all $x_{j}\in\mathcal{C}_{j}(\mathcal{Y}_{j})=\{x_{j}\in\mathbb{R}^{n_{j}}:R^{T}x_{j}\geq 0\}$, Farkas’ lemma yields that $z_{j}=R\cdot(\sigma_{j}^{T},\eta_{j})^{T}$ for some $\sigma_{j}\in\mathbb{R}^{m}_{+}$ and $\eta_{j}\geq 0$, i.e., $z_{j}\in\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j})$.

Overall we proved that $\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j})$ defined in (17) is the dual of $\mathcal{C}_{j}(\mathcal{Y}_{j})$, and since $\mathcal{C}_{j}(\mathcal{Y}_{j})\supseteq\mathcal{K}_{j}$, this implies $\mathcal{C}^{\circ}_{j}(\mathcal{Y}_{j})\subseteq\mathcal{K}_{j}$.

3.2 QP subproblem

In each iteration, at an iterate $x^{k}$, the proposed algorithm computes a step $d^{k}$ as an optimal solution of the subproblem

(18a) \min_{d\in\mathbb{R}^{n}}\ c^{T}d+\tfrac{1}{2}d^{T}H^{k}d
(18b) \text{s.t.}\quad A(x^{k}+d)\leq b,
(18c) r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}\leq 0,\qquad j\in\mathcal{D}(x^{k}),
(18d) x_{j}^{k}+d_{j}\in\mathcal{C}_{j}(\mathcal{Y}^{k}_{j}),\hspace{4.4em} j\in\mathcal{J}.

Here, $H^{k}$ is a positive semi-definite matrix that captures the curvature of the nonlinear constraint (3c), and for each cone, $\mathcal{Y}^{k}_{j}$ is the set of hyperplane-generating points that have been accumulated up to this iteration. From (14), we see that (18d) can be replaced by linear constraints. Consequently, (18) is a QP and can be solved as such.
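As an illustration of how (18) can be posed as a standard QP, the sketch below assembles its linear constraints in the form $Gd\leq g$; here `cones[j]` is the index array of $x_{j}$ and `Y[j]` plays the role of $\mathcal{Y}^{k}_{j}$ (these data structures are assumptions of the sketch, not prescribed by the paper). A QP solver would then minimize $c^{T}d+\tfrac{1}{2}d^{T}H^{k}d$ subject to $Gd\leq g$.

import numpy as np

def r(xj):
    return np.linalg.norm(xj[1:]) - xj[0]

def grad_r(xj):
    return np.concatenate(([-1.0], xj[1:] / np.linalg.norm(xj[1:])))

def build_qp_constraints(xk, A, b, cones, Y):
    # linear constraints  G d <= g  of subproblem (18) at x^k
    n = xk.size
    G_rows, g_vals = [A], [b - A @ xk]                     # (18b): A(x^k + d) <= b
    for j, idx in enumerate(cones):
        xj = xk[idx]
        local = []
        if np.linalg.norm(xj[1:]) > 0.0:                   # j in D(x^k): linearization (18c)
            local.append(grad_r(xj))
        local += [grad_r(np.asarray(y)) for y in Y[j]]     # hyperplanes (16) defining (18d)
        e0 = np.zeros(len(idx)); e0[0] = -1.0              # encodes x_{j0}^k + d_{j0} >= 0
        local.append(e0)
        for coeff in local:
            row = np.zeros(n); row[idx] = coeff
            G_rows.append(row); g_vals.append(-coeff @ xj)
    return np.vstack(G_rows), np.hstack(g_vals)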

Algorithm 2 Preliminary SQP Algorithm
1: Initial iterate $x^{0}$ and sets $\mathcal{Y}_{j}^{0}$ for $j\in\mathcal{J}$.
2: for $k=0,1,2,\ldots$ do
3:     Choose $H^{k}$.
4:     Solve subproblem (18) to get step $d^{k}$.
5:     Set $x^{k+1}\leftarrow x^{k}+d^{k}$.
6:     Set $\mathcal{Y}_{j}^{k+1}\leftarrow\mathcal{Y}_{pr,j}^{+}(\mathcal{Y}^{k}_{j},x^{k}_{j})$ for $j\in\mathcal{J}$.
7: end for

Algorithm 2 describes a preliminary version of the proposed SQP method based on this subproblem. Observe that the linearization (18c) can be rewritten as

0\geq r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}=\|\bar{x}^{k}_{j}\|-x^{k}_{j0}-d_{j0}+\frac{(\bar{x}^{k}_{j})^{T}\bar{d}_{j}}{\|\bar{x}^{k}_{j}\|}=\frac{1}{\|\bar{x}^{k}_{j}\|}(\bar{x}^{k}_{j})^{T}(\bar{x}^{k}_{j}+\bar{d}_{j})-(x^{k}_{j0}+d_{j0})=\nabla r_{j}(x^{k}_{j})^{T}(x^{k}_{j}+d_{j})

and is equivalent to the hyperplane constraint generated at $x_{j}^{k}$. Consequently, if $x^{k}_{j}\not\in\mathcal{K}_{j}$, then $r_{j}(x^{k}_{j})>0$ and (18c) acts as a cutting plane that excludes $x^{k}_{j}$. Using the update rule

(19) \mathcal{Y}^{+}_{pr,j}(\mathcal{Y}_{j},x_{j})=\begin{cases}\mathcal{Y}_{j}\cup\{x_{j}\}&\text{ if }\bar{x}_{j}\neq 0\text{ and }r_{j}(x_{j})>0,\\ \mathcal{Y}_{j}&\text{ otherwise,}\end{cases}

in Step 6 makes sure that $x^{k}_{j}$ is excluded in all future iterations.
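A direct transcription of (19) in numpy might look as follows (representing $\mathcal{Y}_{j}$ as a Python list is an assumption of this sketch):

import numpy as np

def Y_pr_update(Yj, xj):
    # (19): add x_j as a generator if x_bar_j != 0 and r_j(x_j) > 0,
    # so that the cut (16) generated at x_j excludes x_j from all future subproblems
    if np.linalg.norm(xj[1:]) > 0.0 and np.linalg.norm(xj[1:]) - xj[0] > 0.0:
        return Yj + [xj.copy()]
    return Yj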

In our algorithm, we initialize $\mathcal{Y}_{j}^{0}$ so that

(20) \mathcal{Y}_{j}^{0}\supseteq\hat{\mathcal{Y}}^{0}_{j}:=\{e_{ji}:i=1,\ldots,n_{j}-1\}\cup\{-e_{ji}:i=1,\ldots,n_{j}-1\}.

In this way, $x_{j}=0$ is an extreme point of $\mathcal{C}_{j}(\mathcal{Y}_{j}^{0})$, as it is for $\mathcal{K}_{j}$, and the challenging aspect of the cone is already captured in the first subproblem. By choosing the coordinate vectors $e_{ji}$ we have $\nabla r_{j}(e_{ji})^{T}x_{j}=x_{ji}-x_{j0}$, and the hyperplane constraint (16) becomes a very sparse linear constraint.

When $H^{k}=0$ in each iteration, this procedure becomes the standard cutting plane algorithm for the SOCP (1). It is well-known that the cutting plane algorithm is convergent in the sense that every limit point of the iterates is an optimal solution of the SOCP (1), but the convergence is typically slow. In the following sections, we describe how Algorithm 2 is augmented to achieve fast local convergence. The full method is stated formally in Algorithm 3.

3.3 Identification of extremal-active cones

We now describe a strategy that enables our algorithm to identify those cones that are extremal-active at a non-degenerate solution $x^{*}$ within a finite number of iterations, i.e., $\mathcal{E}(x^{k})=\mathcal{E}(x^{*})$ for all large $k$. This will make it possible to apply a second-order method and achieve quadratic local convergence.

Consider the optimality conditions for the QP subproblem (18):

(21a) c+H^{k}d^{k}+A^{T}\hat{\lambda}^{k}+\sum_{j\in\mathcal{D}(x^{k})}\hat{\mu}_{j}^{k}\nabla_{x}r_{j}(x^{k})-\hat{\nu}^{k}=0,
(21b) A(x^{k}+d^{k})-b\leq 0\ \perp\ \hat{\lambda}^{k}\geq 0,
(21c) r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}^{k}\leq 0\ \perp\ \hat{\mu}_{j}^{k}\geq 0,\qquad j\in\mathcal{D}(x^{k}),
(21d) \mathcal{C}_{j}(\mathcal{Y}_{j}^{k})\ni x^{k}_{j}+d_{j}^{k}\ \perp\ \hat{\nu}_{j}^{k}\in\mathcal{C}^{\circ}_{j}(\mathcal{Y}_{j}^{k}),\qquad j\in\mathcal{J}.

Here, $\hat{\lambda}^{k}$, $\hat{\mu}^{k}_{j}$, and $\hat{\nu}_{j}^{k}$ are the multipliers corresponding to the constraints in (18); for completeness, we define $\hat{\mu}_{j}^{k}=0$ for $j\in\mathcal{J}\setminus\mathcal{D}(x^{k})$. In (21a), $\nabla_{x}r_{j}(x^{k})$ is the vector in $\mathbb{R}^{n}$ that contains $\nabla r_{j}(x_{j}^{k})$ in the elements corresponding to $x_{j}$ and is zero otherwise. Similarly, $\hat{\nu}^{k}\in\mathbb{R}^{n}$ is equal to $\hat{\nu}^{k}_{j}$ in the elements corresponding to $x_{j}$ for all $j\in\mathcal{J}$ and zero otherwise.

Let us define

(22) \hat{\mathcal{Y}}_{j}^{k}:=\begin{cases}\mathcal{Y}_{j}^{k}\cup\{x^{k}_{j}\},&\text{ if }j\in\mathcal{D}(x^{k}),\\ \mathcal{Y}_{j}^{k},&\text{ if }j\in\mathcal{J}\setminus\mathcal{D}(x^{k}).\end{cases}

It is easy to verify that, for $j\in\mathcal{D}(x^{k})$, $\nabla r_{j}(x_{j}^{k})^{T}x_{j}^{k}=r_{j}(x_{j}^{k})$ and hence $\nabla r_{j}(x_{j}^{k})^{T}(x_{j}^{k}+d_{j}^{k})\leq 0$ from (21c). As a consequence we obtain $x^{k}_{j}+d_{j}^{k}\in\mathcal{C}_{j}(\hat{\mathcal{Y}}_{j}^{k})$ for all $j\in\mathcal{J}$. Furthermore, $\hat{\nu}_{j}^{k}\in\mathcal{C}^{\circ}_{j}(\mathcal{Y}_{j}^{k})$ implies that

\hat{\nu}_{j}^{k}=-\sum_{l=1}^{m}\sigma^{k}_{j,l}\nabla r_{j}(y^{k}_{j,l})+\eta^{k}_{j}e_{j0}

for suitable values of $\sigma^{k}_{j,l}\geq 0$ and $\eta^{k}_{j}\geq 0$. Then $\hat{z}_{j}^{k}:=-\hat{\mu}_{j}^{k}\nabla r_{j}(x_{j}^{k})+\hat{\nu}_{j}^{k}\in\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{k})$ and

(23) \hat{z}^{k}=c+H^{k}d^{k}+A^{T}\hat{\lambda}^{k}

from (21a). In conclusion, if $(d^{k},\hat{\lambda}^{k},\hat{\mu}^{k},\hat{\nu}^{k})$ is a primal-dual solution of the QP subproblem (18), then $(d^{k},\hat{\lambda}^{k},\hat{z}^{k})$ satisfies the conditions

(24a) c+H^{k}d^{k}+A^{T}\hat{\lambda}^{k}-\hat{z}^{k}=0,
(24b) A(x^{k}+d^{k})-b\leq 0\ \perp\ \hat{\lambda}^{k}\geq 0,
(24c) \mathcal{C}_{j}(\hat{\mathcal{Y}}_{j}^{k})\ni x^{k}_{j}+d_{j}^{k}\ \perp\ \hat{z}_{j}^{k}\in\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{k}),\qquad j\in\mathcal{J},

which more closely resembles the SOCP optimality conditions (13). Our algorithm maintains primal-dual iterates $(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k})$ that are updated based on (24).

Suppose that strict complementarity holds at a primal-dual solution $(x^{*},\lambda^{*},z^{*})$ of the SOCP (1) and that $(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k})\to(x^{*},\lambda^{*},z^{*})$. If $j\not\in\mathcal{E}(x^{*})$, then $x_{j}^{*}\in\mathcal{K}_{j}$ implies $x^{*}_{j0}>0$. As $x^{k}_{j}$ converges to $x^{*}_{j}$, we have $x^{k}_{j0}>0$ and therefore $j\not\in\mathcal{E}(x^{k})$ for sufficiently large $k$. This yields $\mathcal{E}(x^{k})\subseteq\mathcal{E}(x^{*})$. We now derive a modification of Algorithm 2 that ensures that $\mathcal{E}(x^{*})\subseteq\mathcal{E}(x^{k})$ for all sufficiently large $k$ under Assumption 2.

Consider any $j\in\mathcal{E}(x^{*})$. We would like to have

(25) \hat{z}_{j}^{k}\in\operatorname{int}(\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{k}))

for all large $k$, since then complementarity in (24c) implies that $x_{j}^{k+1}=x_{j}^{k}+d^{k}_{j}=0$ and hence $j\in\mathcal{E}(x^{k+1})$ for all large $k$. We will later show that Assumption 2 implies that $\hat{z}_{j}^{k}\to z^{*}_{j}$ and that there exists a neighborhood $N_{\epsilon}(z^{*}_{j})=\{z_{j}\in\mathbb{R}^{n_{j}}:\|z_{j}-z^{*}_{j}\|\leq\epsilon\}$ of $z^{*}_{j}$ so that $z_{j}\in\operatorname{int}(\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{0}\cup\{-y_{j}\}))$ if $z_{j},y_{j}\in N_{\epsilon}(z^{*}_{j})$; see Remark 4.18. This suggests that some vector close to $-z_{j}^{*}$ should eventually be included in $\hat{\mathcal{Y}}_{j}^{k}$ because then (25) holds when $\hat{z}_{j}^{k}$ is close enough to $z^{*}_{j}$. For this purpose, the algorithm computes

\check{z}^{k}=c+A^{T}\hat{\lambda}^{k},

which also converges to $z^{*}_{j}$ (see (13a)), and sets $\mathcal{Y}^{k+1}_{j}$ to $\mathcal{Y}^{+}_{du,j}(\mathcal{Y}^{k}_{j},x^{k}_{j},\check{z}^{k}_{j})$, where

(26) \mathcal{Y}^{+}_{du,j}(\mathcal{Y}_{j},x_{j},z_{j})=\begin{cases}\mathcal{Y}_{j}\cup\{-z_{j}\}&\text{ if }x_{j}\neq 0,\ \bar{z}_{j}\neq 0\text{ and }r_{j}(z_{j})<0,\\ \mathcal{Y}_{j}&\text{ otherwise.}\end{cases}

The update is skipped when $x^{k}_{j}=0$ (because then $j$ is already in $\mathcal{E}(x^{k})$ and no additional hyperplane is needed), and when $\bar{\check{z}}^{k}_{j}=0$ or $r_{j}(\check{z}^{k}_{j})\geq 0$, which might indicate that $z^{*}_{j}\not\in\operatorname{int}(\mathcal{K}_{j})$ and $j\not\in\mathcal{E}(x^{*})$.
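A corresponding sketch of the dual update (26), with the same list representation of $\mathcal{Y}_{j}$ as in the earlier snippet for (19):

import numpy as np

def Y_du_update(Yj, xj, zj):
    # (26): add -z_j if x_j != 0, z_bar_j != 0, and r_j(z_j) < 0,
    # i.e., the dual estimate lies in the interior of K_j, suggesting j in E(x*)
    if np.any(xj != 0.0) and np.linalg.norm(zj[1:]) > 0.0 and np.linalg.norm(zj[1:]) - zj[0] < 0.0:
        return Yj + [-zj.copy()]
    return Yj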

3.4 Fast NLP-SQP steps

Now that we have a mechanism in place that makes sure that the extremal-active cones are identified in a finite number of iterations, we present a strategy that emulates the basic SQP Algorithm 1 and automatically takes quadratically convergent SQP steps, i.e., solutions of the SQP subproblem (5), close to $x^{*}$. For the discussion in this section, we again assume that $x^{*}$ is a unique solution at which Assumption 2 holds.

Suppose that $\mathcal{E}(x^{k})=\mathcal{E}(x^{*})$ for large $k$ due to the strategy discussed in Section 3.3. This means that the outer approximation (18d) of $\mathcal{K}_{j}$ for $j\in\mathcal{E}(x^{*})$ is sufficient to fix $x_{j}^{k}$ to zero and is therefore equivalent to the constraint (5d) in the basic SQP subproblem. However, (18) includes the outer approximations for all cones, including those for $j\not\in\mathcal{E}(x^{*})$, which are not present in (5). Consequently, the desired SQP step from (5) might not be feasible for (18).

As a remedy, at the beginning of an iteration, the algorithm first computes an NLP-SQP step as an optimal solution $d^{S,k}$ of a relaxation of (18),

(27a) \min_{d\in\mathbb{R}^{n}}\ c^{T}d+\tfrac{1}{2}d^{T}H^{k}d
(27b) \text{s.t.}\quad A(x^{k}+d)\leq b,
(27c) r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}\leq 0,\qquad j\in\mathcal{D}(x^{k}),
(27d) x_{j0}^{k}+d_{j0}\geq 0,\hspace{6.6em} j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k},
(27e) x_{j}^{k}+d_{j}\in\mathcal{C}_{j}(\mathcal{Y}^{k}_{j}),\hspace{4.8em} j\in\hat{\mathcal{E}}^{k},

where $\hat{\mathcal{E}}^{k}=\mathcal{E}(x^{k})$. In this way, the outer approximations are imposed only for the currently extremal-active cones, while for all other cones only the linearization (27c) is considered, just like in (5), with the additional restriction (27d) that ensures $x^{k+1}_{j0}\geq 0$. Let $\hat{\lambda}^{k}$, $\hat{\mu}_{j}^{k}$, $\hat{\eta}^{k}_{j}$, and $\hat{\nu}^{k}_{j}$ be the optimal multipliers corresponding to the constraints in (27) (set to zero for non-existing constraints) and define $\hat{z}^{k}$ as in (23). Then the optimality conditions (24) hold again, this time with $d^{k}=d^{S,k}$, but instead of (22) we have

(28) \hat{\mathcal{Y}}_{j}^{k}:=\begin{cases}\{x^{k}_{j}\}&\text{ if }j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k},\\ \mathcal{Y}_{j}^{k}\cup\{x^{k}_{j}\}&\text{ if }j\in\hat{\mathcal{E}}^{k}\cap\mathcal{D}(x^{k}),\\ \mathcal{Y}_{j}^{k}&\text{ if }j\in\hat{\mathcal{E}}^{k}\setminus\mathcal{D}(x^{k}).\end{cases}

When $x^{k}$ is not close to $x^{*}$ and $\mathcal{E}(x^{*})\neq\mathcal{E}(x^{k})$, QP (27) might result in poor steps that go far outside of $\mathcal{K}_{j}$ for some $j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k}$ and undermine convergence. Therefore, we iteratively add more cones to $\hat{\mathcal{E}}^{k}$ until

(29) x_{j0}^{k}+d_{j0}^{S,k}>0\quad\text{ for all }j\in\mathcal{J}\setminus\hat{\mathcal{E}}^{k},

i.e., when a cone is approximated only by its linearization (27c), the step does not appear to target its extreme point. This property is necessary to show that $\mathcal{E}(x^{k})=\mathcal{E}(x^{*})$ for all large $k$ also in the case that new iterates are computed from (27) instead of (18). Note that in the extreme case $\hat{\mathcal{E}}^{k}=\mathcal{J}$, (27) is identical to (18). This loop can be found in Steps 7–10 of Algorithm 3, and a sketch is given below.
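The loop can be sketched as follows; `solve_relaxed_qp` is a hypothetical placeholder for solving (27) with the current set $\hat{\mathcal{E}}^{k}$, `E_hat` is a Python set of cone indices, and `cones[j]` again denotes the index array of $x_{j}$ (all of these are assumptions of the sketch).

def nlp_sqp_step(xk, E_hat, cones, solve_relaxed_qp, tol=1e-12):
    # Steps 7-10: enlarge E_hat until the step drives x_{j0} to zero
    # only for cones that are already in E_hat, cf. (29)
    while True:
        d = solve_relaxed_qp(xk, E_hat)
        hits_zero = {j for j, idx in enumerate(cones) if abs(xk[idx[0]] + d[idx[0]]) <= tol}
        if hits_zero <= E_hat:
            return d, E_hat
        E_hat = E_hat | hits_zero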

Since there is no guarantee that (27) yields iterates that converge to $x^{*}$, the algorithm discards the NLP-SQP step in certain situations and falls back to the original method, recomputing a new step from (18). In Section 3.6 we describe how we use the exact penalty function (8) to determine when this is necessary.

3.5 Hessian matrix

Motivated by (6), we compute the Hessian matrix $H^{k}$ in (18) and (27) from

(30) H^{k}=\sum_{j\in\mathcal{D}(x^{k})}\mu^{k}_{j}\nabla^{2}_{xx}r_{j}(x^{k}),

where $\mu_{j}^{k}\geq 0$ are multiplier estimates for the nonlinear constraint (3c). Because $\nabla^{2}r_{j}(x_{j}^{k})$ is positive semi-definite and $\mu_{j}^{k}\geq 0$, also $H^{k}$ is positive semi-definite.

In the final phase, we intend to emulate the basic SQP Algorithm 1. Therefore, when the fast NLP-SQP step is accepted, we set $\mu_{j}^{k+1}=\hat{\mu}^{k}_{j}$ for $j\in\mathcal{D}(x^{k})$, where $\hat{\mu}_{j}^{k}$ are the optimal multipliers for (27c). But we also need to define a value for $\mu_{j}^{k+1}$ when the step is computed from (18), where, in addition to the linearization of $r_{j}$, hyperplanes (18d) are used to approximate all cones. By comparing the optimality conditions of the QPs (18) and (5), we now derive an update for $\mu_{j}^{k+1}$.

Suppose that $j\in\mathcal{D}(x^{k+1})\cap\mathcal{D}(x^{k})$. Then (21a) yields

(31) c_{j}+H_{jj}^{k}d^{k}_{j}+A_{j}^{T}\hat{\lambda}^{k}+\hat{\mu}_{j}^{k}\nabla r_{j}(x_{j}^{k})-\hat{\nu}^{k}_{j}=0,

where $H^{k}_{jj}=\mu_{j}^{k}\nabla^{2}r_{j}(x^{k}_{j})$ because of (30). Here, the dual information for the nonlinear constraint is split into $\hat{\mu}_{j}^{k}$ and $\hat{\nu}^{k}_{j}$ and needs to be condensed into a single number, $\mu_{j}^{k+1}$, so that we can compute $H^{k+1}$ from (30) in the next iteration.

Recall that, in the basic SQP Algorithm 1, the new multipliers $\mu_{j}^{k+1}$ are set to the optimal multipliers of the QP (5), which satisfy

(32) c_{j}+H_{jj}^{k}d_{j}^{k}+A_{j}^{T}\hat{\lambda}^{k}+\mu_{j}^{k+1}\nabla r_{j}(x_{j}^{k})=0.

A comparison with (31) suggests to choose $\mu_{j}^{k+1}$ so that $\mu_{j}^{k+1}\nabla r_{j}(x_{j}^{k})\approx\hat{\mu}_{j}^{k}\nabla r_{j}(x_{j}^{k})-\hat{\nu}^{k}_{j}$. Multiplying both sides with $\nabla r_{j}(x^{k}_{j})^{T}$ and solving for $\mu^{k+1}_{j}$ yields

\mu^{k+1}_{j}=\hat{\mu}_{j}^{k}-\frac{\nabla r_{j}(x_{j}^{k})^{T}\hat{\nu}^{k}_{j}}{\|\nabla r_{j}(x_{j}^{k})\|^{2}}.

Note that $\mu_{j}^{k+1}=\hat{\mu}_{j}^{k}$ if the outer approximation constraint (18d) is not active and therefore $\hat{\nu}_{j}^{k}=0$ for $j$. In this case, we recover the basic SQP update, as desired.

Now suppose that $j\in\mathcal{D}(x^{k+1})\setminus\mathcal{D}(x^{k})$. Again comparing (31) with (32) suggests a choice so that $\mu_{j}^{k+1}\nabla r_{j}(x_{j}^{k+1})\approx-\hat{\nu}^{k}_{j}$, where we substituted $\nabla r_{j}(x^{k}_{j})$ by $\nabla r_{j}(x^{k+1}_{j})$ because the former is not defined for $j\not\in\mathcal{D}(x^{k})$. In this case, multiplying both sides with $\nabla r_{j}(x^{k+1}_{j})^{T}$ and solving for $\mu^{k+1}_{j}$ yields

\mu^{k+1}_{j}=-\frac{\nabla r_{j}(x_{j}^{k+1})^{T}\hat{\nu}^{k}_{j}}{\|\nabla r_{j}(x_{j}^{k+1})\|^{2}}.

In summary, in each iteration in which (18) determines the new iterate, we update

(33) \mu^{k+1}_{j}=\begin{cases}\hat{\mu}_{j}^{k}-\dfrac{\nabla r_{j}(x_{j}^{k})^{T}\hat{\nu}^{k}_{j}}{\|\nabla r_{j}(x_{j}^{k})\|^{2}}&j\in\mathcal{D}(x^{k+1})\cap\mathcal{D}(x^{k}),\\[8pt] -\dfrac{\nabla r_{j}(x_{j}^{k+1})^{T}\hat{\nu}^{k}_{j}}{\|\nabla r_{j}(x_{j}^{k+1})\|^{2}}&j\in\mathcal{D}(x^{k+1})\setminus\mathcal{D}(x^{k}),\\[8pt] 0&\text{otherwise.}\end{cases}
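A per-cone transcription of (33), assuming the QP multipliers $\hat{\mu}_{j}^{k}$ (for (18c)) and $\hat{\nu}_{j}^{k}$ (for (18d)) are available as a scalar and a numpy vector, respectively:

import numpy as np

def grad_r(xj):
    return np.concatenate(([-1.0], xj[1:] / np.linalg.norm(xj[1:])))

def mu_update(xj_new, xj_old, mu_hat_j, nu_hat_j):
    # multiplier update (33) for a single cone j
    in_D_new = np.linalg.norm(xj_new[1:]) > 0.0
    in_D_old = np.linalg.norm(xj_old[1:]) > 0.0
    if in_D_new and in_D_old:
        g = grad_r(xj_old)
        return mu_hat_j - (g @ nu_hat_j) / (g @ g)
    if in_D_new:                      # j in D(x^{k+1}) \ D(x^k)
        g = grad_r(xj_new)
        return -(g @ nu_hat_j) / (g @ g)
    return 0.0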

The choice above leads to quadratic convergence for non-degenerate instances, but it is common for the global convergence analysis of SQP methods to permit any positive semi-definite Hessian matrix $H^{k}$, as long as it is bounded. Since we were not able to exclude the case that $\mu_{j}^{k}$ or $1/x_{j0}^{k}$ is unbounded for some cone $j\in\mathcal{J}$, in which case $H^{k}$ defined in (30) is unbounded, we fix a large threshold $c_{H}>0$ and rescale the Hessian matrix according to

(34) H^{k}\leftarrow H^{k}\cdot\min\{1,c_{H}/\|H^{k}\|\}

so that $\|H^{k}\|\leq c_{H}$ in every iteration. In this way, global convergence is guaranteed, but the fast local convergence rate might be impaired if $c_{H}$ is chosen too small so that $H^{k}$ defined in (30) must be rescaled. Therefore, in practice, we set $c_{H}$ to a very large number and (34) is never actually triggered in our numerical experiments.

3.6 Penalty function

The steps computed from (18) and (27) do not necessarily yield a convergent algorithm, and a safeguard is required to force the iterates into a neighborhood of an optimal solution. Here, we utilize the exact penalty function (8) and accept a new iterate only if the sufficient decrease condition (11) holds.

As discussed in Section 3.4, at the beginning of an iteration, the algorithm first computes an NLP-SQP step $d^{S,k}$ from (27). The penalty function can now help us to decide whether this step makes sufficient progress towards an optimal solution, and we only accept the trial point $\hat{x}^{k+1}=x^{k}+d^{S,k}$ as a new iterate if (11) holds with $d=d^{S,k}$.

If the penalty function does not accept $d^{S,k}$, there is still a chance that $d^{S,k}$ is making rapid progress towards the solution, but, as discussed in Section 2.2, the Maratos effect is preventing the acceptance of $d^{S,k}$. As a remedy, we compute, analogously to (12), a second-order correction step $s^{k}$ for (27) as a solution of

(35) \min_{s\in\mathbb{R}^{n}}\ c^{T}(d^{S,k}+s)+\tfrac{1}{2}(d^{S,k}+s)^{T}H^{k}(d^{S,k}+s)
\text{s.t.}\quad A(x^{k}+d^{S,k}+s)\leq b,
r_{j}(x_{j}^{k}+d_{j}^{S,k})+\nabla r_{j}(x_{j}^{k}+d_{j}^{S,k})^{T}s_{j}\leq 0,\qquad j\in\mathcal{D}(x^{k}),
x_{j0}^{k}+d_{j0}^{S,k}+s_{j0}\geq 0,\hspace{10.2em} j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k},
x_{j}^{k}+d^{S,k}_{j}+s_{j}\in\mathcal{C}_{j}(\mathcal{Y}^{k}_{j}),\hspace{8.4em} j\in\hat{\mathcal{E}}^{k},

and accept the trial point $\hat{x}^{k+1}=x^{k}+d^{S,k}+s^{k}$ if it satisfies (11) with $d=d^{S,k}$. Let again $\hat{\lambda}^{k}$, $\hat{\mu}_{j}^{k}$, $\hat{\eta}^{k}_{j}$, and $\hat{\nu}^{k}_{j}$ denote the optimal multipliers in (35) and define $\hat{z}^{k}$ as in (23). The optimality conditions (24) still hold, this time with $d^{k}=d^{S,k}+s^{k}$ and

(36) \hat{\mathcal{Y}}_{j}^{k}:=\begin{cases}\{x^{k}_{j}+d_{j}^{S,k}\},&\text{ if }j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k},\\ \mathcal{Y}_{j}^{k}\cup\{x^{k}_{j}+d_{j}^{S,k}\},&\text{ if }j\in\mathcal{D}(x^{k})\cap\hat{\mathcal{E}}^{k},\\ \mathcal{Y}_{j}^{k},&\text{ if }j\in\hat{\mathcal{E}}^{k}\setminus\mathcal{D}(x^{k}).\end{cases}

If neither $d^{S,k}$ nor $d^{S,k}+s^{k}$ has been accepted, we give up on fast NLP-SQP steps and instead revert to QP (18), which safely approximates every cone with an outer approximation. However, the trial point $\hat{x}^{k+1}=x^{k}+d^{k}$ with the step $d^{k}$ obtained from (18) does not necessarily satisfy (11). In that case, the algorithm adds $x^{k}+d^{k}$ to $\mathcal{Y}_{j}^{k}$ to cut off $x^{k}+d^{k}$ and resolves (18) to get a new trial step $d^{k}$. In an inner loop (Steps 22–32), this procedure is repeated until, eventually, a trial step is obtained that satisfies (11). We will show that (11) holds after a finite number of iterations of the inner loop.

It remains to discuss the update of the penalty parameter estimate $\rho^{k}$. One can show (see Lemma 4.1) that an optimal solution $x^{*}$ of the SOCP with multipliers $z^{*}$ is a minimizer of $\varphi(\,\cdot\,;\rho)$ over the set $X$ defined in (9) if $\rho>\|z^{*}_{\mathcal{J},0}\|_{\infty}$, where $z^{*}_{\mathcal{J},0}=(z^{*}_{1,0},\ldots,z^{*}_{p,0})^{T}$. Since $z^{*}$ is not known a priori, the algorithm uses the update rule $\rho^{k}=\rho_{\text{new}}(\rho^{k-1},z^{k})$, where

(37) \rho_{\text{new}}(\rho_{\text{old}},z):=\begin{cases}\rho_{\text{old}}&\text{ if }\rho_{\text{old}}>\|z_{\mathcal{J},0}\|_{\infty},\\ c_{\text{inc}}\cdot\|z_{\mathcal{J},0}\|_{\infty}&\text{ otherwise,}\end{cases}

with $c_{\text{inc}}>1$. We will prove in Lemma 4.8 that the sequence $\{z^{k}\}_{k=1}^{\infty}$ is bounded under Slater’s constraint qualification. Therefore, this rule will eventually settle at a final penalty parameter $\rho^{\infty}$ that is not changed after a finite number of iterations.
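The update rule (37) is straightforward to implement; a small sketch (with `cones[j]` the index array of $x_{j}$ as before, and the value of `c_inc` an arbitrary illustrative default):

def rho_new(rho_old, z, cones, c_inc=10.0):
    # (37): keep rho if it already dominates ||z_{J,0}||_inf, otherwise increase it
    z_J0_norm = max(abs(z[idx[0]]) for idx in cones)
    return rho_old if rho_old > z_J0_norm else c_inc * z_J0_norm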

During an iteration of the algorithm, several trial steps may be considered and a preliminary parameter value is computed from (37) for each one. At the end of the iteration, the parameter value corresponding to the accepted trial step is stored. Note that the acceptance test for the second-order correction step from (35) needs to be done with the penalty parameter computed for the regular NLP-SQP step from (27).

3.7 Complete algorithm

Algorithm 3 SQP Algorithm for SOCP.
1:Initial iterate x0Xx^{0}\in X with xj,00x_{j,0}\geq 0, multipliers μj0+\mu_{j}^{0}\in\mathbb{R}_{+}, penalty parameter ρ1>0\rho^{-1}>0; constants cdec(0,1)c_{\text{dec}}\in(0,1), cinc>1c_{\text{inc}}>1, and cH>0c_{H}>0.
2:Initialize 𝒴j0\mathcal{Y}_{j}^{0} so that (20) is satisfied.
3:for k=0,1,2,k=0,1,2,\ldots do
4:     Compute HkH^{k} using (30). Rescale according to (34) if Hk>cH\|H^{k}\|>c_{H}.
5:     Set ^k(xk)\hat{\mathcal{E}}^{k}\leftarrow\mathcal{E}(x^{k}).
6:     Compute dS,kd^{S,k}, λ^k\hat{\lambda}^{k}, μ^k\hat{\mu}^{k}, z^k\hat{z}^{k} from (27) and (23) and set x^k+1=xk+dS,k\hat{x}^{k+1}=x^{k}+d^{S,k}.
7:     while {j𝒥:xj0k+dj0S,k=0}^k\{j\in\mathcal{J}:x_{j0}^{k}+d^{S,k}_{j0}=0\}\not\subseteq\hat{\mathcal{E}}^{k} do
8:         Set ^k^k{j𝒥:xj0k+dj0S,k=0}\hat{\mathcal{E}}^{k}\leftarrow\hat{\mathcal{E}}^{k}\cup\{j\in\mathcal{J}:x_{j0}^{k}+d^{S,k}_{j0}=0\}.
9:         Recompute dS,kd^{S,k}, λ^k\hat{\lambda}^{k}, μ^k\hat{\mu}^{k}, z^k\hat{z}^{k} from (27) and (23) and set x^k+1=xk+dS,k\hat{x}^{k+1}=x^{k}+d^{S,k}.
10:     end while
11:     Compute candidate penalty parameter ρk=ρnew(ρk1,z^k)\rho^{k}=\rho_{\text{new}}(\rho^{k-1},\hat{z}^{k}), see (37).
12:     if (11) holds for d=dS,kd=d^{S,k} then
13:         Set 𝒴jk+1𝒴pr,j+(𝒴jk,xjk)\mathcal{Y}^{k+1}_{j}\leftarrow\mathcal{Y}^{+}_{pr,j}(\mathcal{Y}^{k}_{j},x_{j}^{k}) using (19) and set dk=dS,kd^{k}=d^{S,k}.
14:         Set μk+1=μ^k\mu^{k+1}=\hat{\mu}^{k} and go to Step 34.
15:     end if
16:     Compute sks^{k}, λ^k\hat{\lambda}^{k}, μ^k\hat{\mu}^{k}, z^k\hat{z}^{k} from (35) and (23) and set x^k+1=xk+dS,k+sk\hat{x}^{k+1}=x^{k}+d^{S,k}+s^{k}.
17:     if (11) holds for d=dS,kd=d^{S,k} and {j𝒥:xj0k+dj0S,k+sk=0}^k\{j\in\mathcal{J}:x_{j0}^{k}+d^{S,k}_{j0}+s^{k}=0\}\subseteq\hat{\mathcal{E}}^{k} then
18:         Set 𝒴jk+1𝒴pr,j+(𝒴jk,xjk)\mathcal{Y}^{k+1}_{j}\leftarrow\mathcal{Y}^{+}_{pr,j}(\mathcal{Y}^{k}_{j},x_{j}^{k}) and dk=dS,kd^{k}=d^{S,k}.
19:         Set μk+1=μ^k\mu^{k+1}=\hat{\mu}^{k} and go to Step 34.
20:     end if
21:     Set 𝒴jk,0𝒴jk\mathcal{Y}_{j}^{k,0}\leftarrow\mathcal{Y}^{k}_{j}.
22:     for l=0,1,2,l=0,1,2,\ldots do
23:         Compute dk,ld^{k,l}, λ^k\hat{\lambda}^{k}, μ^k\hat{\mu}^{k}, z^k\hat{z}^{k} from (18) and (23) and set x^k+1=xk+dk,l\hat{x}^{k+1}=x^{k}+d^{k,l}.
24:         Compute candidate penalty parameter ρk=ρnew(ρk1,z^k)\rho^{k}=\rho_{\text{new}}(\rho^{k-1},\hat{z}^{k}).
25:         if (11) holds for d=dk,ld=d^{k,l} then
26:              Set 𝒴jk+1𝒴pr,j+(𝒴jk,l,xjk)\mathcal{Y}_{j}^{k+1}\leftarrow\mathcal{Y}^{+}_{pr,j}(\mathcal{Y}^{k,l}_{j},x_{j}^{k}) and dk=dk,ld^{k}=d^{k,l}.
27:              Go to Step 33.
28:         end if
29:         Set $\mathcal{Y}_{j}^{k,l+1}\leftarrow\mathcal{Y}^{+}_{pr,j}(\mathcal{Y}^{k,l}_{j},\hat{x}_{j}^{k+1})$, see (19).
30:         Compute zˇk=c+ATλ^k\check{z}^{k}=c+A^{T}\hat{\lambda}^{k}.
31:         Update 𝒴jk,l+1𝒴du,j+(𝒴jk,l+1,x^jk+1,zˇjk)\mathcal{Y}_{j}^{k,l+1}\leftarrow\mathcal{Y}^{+}_{du,j}(\mathcal{Y}^{k,l+1}_{j},\hat{x}_{j}^{k+1},\check{z}^{k}_{j}), see (26).
32:     end for
33:     Compute μk+1\mu^{k+1} from (33).
34:     Compute zˇk=c+ATλ^k\check{z}^{k}=c+A^{T}\hat{\lambda}^{k} and update 𝒴jk+1𝒴du,j+(𝒴jk+1,xjk,zˇjk)\mathcal{Y}_{j}^{k+1}\leftarrow\mathcal{Y}^{+}_{du,j}(\mathcal{Y}^{k+1}_{j},x_{j}^{k},\check{z}^{k}_{j}).
35:     Set xk+1x^k+1x^{k+1}\leftarrow\hat{x}^{k+1}.
36:     If (xk+1,λ^k,z^k)(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k}) satisfy (13), stop.
37:end for

The complete method is stated in Algorithm 3. To keep the notation concise, we omit “for all j𝒥j\in\mathcal{J}” whenever the index jj is used. We assume that all QPs in the algorithm are solved exactly.

Each iteration begins with the computation of the fast NLP-SQP step, where an inner loop repeatedly adds cones to $\hat{\mathcal{E}}^k$ until (29) holds. If the step achieves a sufficient decrease in the penalty function, the trial point is accepted. Otherwise, the second-order correction for the NLP-SQP step is computed and accepted if the corrected trial point yields a sufficient decrease with respect to the model decrease of the NLP-SQP step. Note that the second-order correction step is discarded if it does not satisfy (29), since otherwise finite identification of $\mathcal{E}(x^*)$ cannot be guaranteed. If none of the NLP-SQP steps is acceptable, the algorithm proceeds with an inner loop in which hyperplanes cutting off the current trial point are repeatedly added until the penalty function is sufficiently decreased. No matter which step is taken, both $x_j^k$ and $\check{z}_j^k$ are added to $\mathcal{Y}_j^{k+1}$ according to the update rules (19) and (26), and the multiplier $\mu^{k+1}$ for the nonlinear constraints is updated.

In most cases, a new QP is obtained by adding only a few constraints to the most recently solved QP, and a hot-started QP solver will typically compute the new solution quickly. For example, in each inner iteration in Steps 7–10, hyperplanes of the polyhedral outer approximation for the cones newly added to $\hat{\mathcal{E}}^k$ are added to QP (27). Similarly, each inner iteration in Steps 22–32 adds one cutting plane for a violated cone constraint. In Steps 6 and 16, some constraints are removed compared to the most recently solved QP, but this structure could also be utilized.
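To illustrate the kind of cut that is added in these inner loops, recall that the hyperplane generated by a point $y_j$ with $\bar{y}_j\neq 0$ is $\nabla r_j(y_j)^Tx_j\leq 0$, with gradient $\nabla r_j(y_j)=(-1,\,\bar{y}_j^T/\|\bar{y}_j\|)^T$ (cf. the gradient expression used in the proof of Lemma 4.19). The following Python sketch (our own helper names, independent of the implementation in Section 5.1) computes such a cut and checks that it cuts off an infeasible trial point:

import numpy as np

def soc_residual(y):
    # r_j(y) = ||y_bar|| - y_0 for one cone block y = (y_0, y_bar)
    return np.linalg.norm(y[1:]) - y[0]

def soc_cut(y):
    # sketch, not the paper's code: gradient of r_j at y (requires y_bar != 0);
    # the new inequality for the outer approximation is soc_cut(y) @ x <= 0
    return np.concatenate(([-1.0], y[1:] / np.linalg.norm(y[1:])))

y = np.array([1.0, 2.0, 0.0])               # violates the cone: r_j(y) = 1 > 0
print(soc_residual(y), soc_cut(y) @ y)      # both equal 1.0, so y is cut off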

The algorithm might terminate because one of the QPs solved for the step computation is infeasible. Since the feasible regions of these QPs are outer approximations of the feasible region of the SOCP (1), this proves that the SOCP instance is infeasible; see also Remark 4.13.

4 Convergence analysis

4.1 Global convergence

In this section, we prove that, under a standard regularity assumption, all limit points of the sequence of iterates are optimal solutions of the SOCP, if the algorithm does not terminate with an optimal solution in Step 36. We also explore what happens when the SOCP is infeasible.

We make the following assumption throughout this section.

Assumption 3.

The set XX defined in (9) is bounded.

Since x0Xx^{0}\in X by the initialization of Algorithm 3 and any step satisfies (21b), we have xkXx^{k}\in X for all kk. Similarly, (24c) and (14) imply that

(38) xj0k0 for all k0 and j𝒥.x^{k}_{j0}\geq 0\text{ for all }k\geq 0\text{ and }j\in\mathcal{J}.

We start the analysis with some technical results that quantify the decrease in the penalty function model.

Lemma 4.1.

Consider an iteration kk and let dkd^{k} be computed in Step 6 or Step 23 in Algorithm 3. Further let ρk>ρmink\rho^{k}>\rho_{\min}^{k}, where ρmink=z^𝒥,0k\rho_{\min}^{k}=\|\hat{z}_{\mathcal{J},0}^{k}\|_{\infty} with z^k\hat{z}^{k} defined in (23). Then the following statements are true.

  1. (i)

    We have

    mk(xk+dk;ρk)mk(xk;ρk)(dk)THkdk(ρkρmink)j𝒥[rj(xjk)]+0.m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k})\leq-(d^{k})^{T}H^{k}d^{k}-(\rho^{k}-\rho_{\min}^{k})\sum_{j\in\mathcal{J}}[r_{j}(x^{k}_{j})]^{+}\leq 0.
  2. (ii)

    If xkx^{k} is not an optimal solution of the SOCP, then

    (39) mk(xk+dk;ρk)mk(xk;ρk)<0.m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k})<0.

Proof 4.2.

Proof of (i): Because $d^k$ is a solution of (18) or (27), there exist $\hat{\lambda}^k$ and $\hat{z}^k$ so that the optimality conditions (24) hold. Let $j\in\mathcal{J}$. Since $\hat{z}_j^k\in\mathcal{C}^{\circ}_j(\hat{\mathcal{Y}}_j^k)$, the definition (17) implies that

$\hat{z}^{k}_{j}=-\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}\nabla r_{j}(y_{j,l}^{k})+\hat{\eta}^{k}_{j}e_{j0},$

where 𝒴^jk={yj,1k,,yj,mjkk}\hat{\mathcal{Y}}^{k}_{j}=\left\{y^{k}_{j,1},\ldots,y^{k}_{j,m_{j}^{k}}\right\} and σ^l,jk,η^jk+\hat{\sigma}^{k}_{l,j},\hat{\eta}^{k}_{j}\in\mathbb{R}_{+}.

Using (15) we have z^j0k=l=1mjkσ^l,jk+η^jkl=1mjkσ^l,jk.\hat{z}_{j0}^{k}=\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}+\hat{\eta}^{k}_{j}\geq\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}. Together with (xjk+djk)Tz^jk=0(x_{j}^{k}+d^{k}_{j})^{T}\hat{z}_{j}^{k}=0 from (24c) and (x¯jk)Ty¯l,jkx¯jky¯l,jk(\bar{x}^{k}_{j})^{T}\bar{y}^{k}_{l,j}\leq\|\bar{x}^{k}_{j}\|\cdot\|\bar{y}^{k}_{l,j}\| this overall yields

(djk)Tz^jk\displaystyle-(d_{j}^{k})^{T}\hat{z}_{j}^{k} =(xjk)Tz^jk=xj0kz^j0kl=1mjkσ^l,jk(x¯jk)Ty¯l,jky¯l,jkxj0kz^j0kl=1mjkσ^l,jkx¯jk\displaystyle=(x_{j}^{k})^{T}\hat{z}_{j}^{k}=x^{k}_{j0}\hat{z}_{j0}^{k}-\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}(\bar{x}^{k}_{j})^{T}\frac{\bar{y}^{k}_{l,j}}{\|\bar{y}^{k}_{l,j}\|}\geq x^{k}_{j0}\hat{z}_{j0}^{k}-\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}\|\bar{x}_{j}^{k}\|
$\geq x^{k}_{j0}\hat{z}_{j0}^{k}-\hat{z}^{k}_{j0}\|\bar{x}_{j}^{k}\|=-\hat{z}^{k}_{j0}r_{j}(x_{j}^{k})\geq-\hat{z}^{k}_{j0}[r_{j}(x_{j}^{k})]^{+}.$

Further, we have from (24b) that 0=(Axk+Adkb)Tλ^k0=(Ax^{k}+Ad^{k}-b)^{T}\hat{\lambda}^{k} and therefore (dk)TATλ^k=(Axkb)Tλ^k0(d^{k})^{T}A^{T}\hat{\lambda}^{k}=-(Ax^{k}-b)^{T}\hat{\lambda}^{k}\geq 0 since λ^k0\hat{\lambda}^{k}\geq 0 and xkXx^{k}\in X.

Using these inequalities and (24a), the choice of ρmink\rho^{k}_{\min} yields

0\displaystyle 0 =(dk)T(c+Hkdk+ATλ^kz^k)\displaystyle=(d^{k})^{T}\left(c+H^{k}d^{k}+A^{T}\hat{\lambda}^{k}-\hat{z}^{k}\right)
cTdk+(dk)THkdkj𝒥z^j0k[rj(xjk)]+\displaystyle\geq c^{T}d^{k}+(d^{k})^{T}H^{k}d^{k}-\sum_{j\in\mathcal{J}}\hat{z}^{k}_{j0}[r_{j}(x_{j}^{k})]^{+}
cTdk+(dk)THkdkρminkj𝒥[rj(xjk)]+.\displaystyle\geq c^{T}d^{k}+(d^{k})^{T}H^{k}d^{k}-\rho^{k}_{\min}\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}.

Finally, combining this with (10) and (18c) or (27c), respectively, we obtain

mk(xk+dk;ρk)mk(xk;ρk)\displaystyle m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k}) =cTdkρkj𝒟(xk)[rj(xjk)]+\displaystyle=c^{T}d^{k}-\rho^{k}\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x_{j}^{k})]^{+}
=cTdkρkj𝒥[rj(xjk)]+\displaystyle=c^{T}d^{k}-\rho^{k}\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}
(dk)THkdk(ρkρmink)j𝒥[rj(xjk)]+.\displaystyle\leq-(d^{k})^{T}H^{k}d^{k}-(\rho^{k}-\rho^{k}_{\min})\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}.

For the second equality, we used that $r_j(x_j^k)=\|\bar{x}^k_j\|-x^k_{j0}=-x^k_{j0}\leq 0$ for $j\notin\mathcal{D}(x^k)$, by (38) and the definition of $\mathcal{D}(x^k)$. Since $H^k$ is positive semi-definite, $\rho^k>\rho^k_{\min}$, and $[r_j(x^k_j)]^+\geq 0$, the right-hand side must be non-positive.

Proof of (39): Suppose xkXx^{k}\in X is not an optimal solution for the SOCP. If xkx^{k} is not feasible for the SOCP, xkx^{k} must violate a conic constraint and we have [rj(xjk)]+>0[r_{j}(x_{j}^{k})]^{+}>0 for some j𝒥j\in\mathcal{J}. Since HkH^{k} is positive semidefinite and ρkρmink>0\rho^{k}-\rho^{k}_{\min}>0, part (i) yields (39).

It remains to consider the case when xkx^{k} is feasible for the SOCP, i.e., [rj(xjk)]+=0[r_{j}(x^{k}_{j})]^{+}=0 for all jj. To derive a contradiction, suppose that (39) does not hold. Then part (i) yields

0\displaystyle 0 =mk(xk+dk;ρk)mk(xk;ρk)\displaystyle=m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k})
=(dk)THkdk(ρkρmink)j𝒥[rj(xjk)]+=(dk)THkdk0.\displaystyle=-(d^{k})^{T}H^{k}d^{k}-(\rho^{k}-\rho^{k}_{\min})\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}=-(d^{k})^{T}H^{k}d^{k}\leq 0.

Because HkH^{k} is positive semi-definite, this implies Hkdk=0H^{k}d^{k}=0. Further, since also

0=mk(xk+dk;ρk)mk(xk;ρk)=(10)cTdkρkj𝒟(xk)[rj(xjk)]+=cTdk,0=m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k})\stackrel{{\scriptstyle\eqref{eq:merit_model}}}{{=}}c^{T}d^{k}-\rho^{k}\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x^{k}_{j})]^{+}=c^{T}d^{k},

the optimal objective value of (18) or (27), respectively, is zero. At the same time, $d=0$ is also feasible for (18) or (27) and yields the same objective value. Therefore, $d^k=0$ is also an optimal solution of (18) or (27) and the optimality conditions (24) hold for some multipliers. Because $\mathcal{C}^{\circ}_j(\hat{\mathcal{Y}}_j^{k})\subseteq\mathcal{K}_j$, the same multipliers and $d^k=0$ show that the optimality conditions (13) of the SOCP also hold. So, $x^k$ is an optimal solution for the SOCP, contradicting the assumption.

The following lemma shows that the algorithm is well-defined and will not stay in an infinite loop in Steps 22–32.

Lemma 4.3.

Consider an iteration kk and let dkd^{k} be computed in Step 6 or Step 23 in Algorithm 3. Suppose that xkx^{k} is not an optimal solution of the SOCP. Then

(40) φ(xk+dk,l;ρk)φ(xk;ρk)cdec(mk(xk+dk,l;ρk)mk(xk;ρk))\varphi(x^{k}+d^{k,l};\rho^{k})-\varphi(x^{k};\rho^{k})\leq c_{\text{dec}}\Big{(}m^{k}(x^{k}+d^{k,l};\rho^{k})-m^{k}(x^{k};\rho^{k})\Big{)}

after a finite number of iterations in the inner loop in Steps 22–32.

Proof 4.4.

Suppose the claim is not true and let $\{d^{k,l}\}_{l=0}^{\infty}$ be the infinite sequence of trial steps generated in the loop in Steps 22–32 for which the stopping condition in Step 25 is never satisfied, and let $d^{k,\infty}$ be a limit point of $\{d^{k,l}\}_{l=0}^{\infty}$. We will first show that

(41) [rj(xjk+djk,)]+=0 for all j𝒥.[r_{j}(x_{j}^{k}+d_{j}^{k,\infty})]^{+}=0\text{ for all }j\in\mathcal{J}.

Let us first consider the case when x¯jk+d¯jk,=0\bar{x}_{j}^{k}+\bar{d}_{j}^{k,\infty}=0 for some j𝒥j\in\mathcal{J}. Then rj(xjk+djk,)=x¯jk+d¯jk,(xj0k+dj0k,)=(xj0k+dj0k,)0r_{j}(x_{j}^{k}+d_{j}^{k,\infty})=\|\bar{x}_{j}^{k}+\bar{d}_{j}^{k,\infty}\|-(x_{j0}^{k}+d_{j0}^{k,\infty})=-(x_{j0}^{k}+d_{j0}^{k,\infty})\leq 0 and (41) holds.

Now consider the case that $\bar{x}_j^k+\bar{d}_j^{k,\infty}\neq 0$ for $j\in\mathcal{J}$. Since $d^{k,\infty}$ is a limit point of $\{d^{k,l}\}_{l=0}^{\infty}$, there exists a subsequence $\{d^{k,l_t}\}_{t=0}^{\infty}$ that converges to $d^{k,\infty}$. We may assume without loss of generality that $\bar{x}_j^k+\bar{d}_j^{k,l_t}\neq 0$ for all $t$. Then, for any $t$, the update in Step 29 yields $x_j^k+d_j^{k,l_t}\in\mathcal{Y}_j^{k,l_t+1}\subseteq\mathcal{Y}_j^{k,l_{t+1}}$. In inner iteration $l_{t+1}$, the trial step $d^{k,l_{t+1}}$ is computed from (18) and satisfies $x_j^k+d_j^{k,l_{t+1}}\in\mathcal{C}_j(\mathcal{Y}_j^{k,l_{t+1}})$, which by definition (14) implies

$\nabla r_{j}(x_{j}^{k}+d_{j}^{k,l_{t}})^{T}(x_{j}^{k}+d_{j}^{k,l_{t+1}})\leq 0.$

Taking the limit $t\to\infty$ and using the fact that $\nabla r_{j}(v_{j})^{T}v_{j}=r_{j}(v_{j})$ for any $v_{j}$ with $\bar{v}_{j}\neq 0$ yields

$r_{j}(x_{j}^{k}+d_{j}^{k,\infty})=\nabla r_{j}(x_{j}^{k}+d_{j}^{k,\infty})^{T}(x_{j}^{k}+d_{j}^{k,\infty})\leq 0,$

proving (41). In turn (41) implies that the ratio

$\frac{\varphi(x^{k}+d^{k,l};\rho^{k})-\varphi(x^{k};\rho^{k})}{m^{k}(x^{k}+d^{k,l};\rho^{k})-m^{k}(x^{k};\rho^{k})}=\frac{c^{T}d^{k,l}+\rho^{k}\sum_{j\in\mathcal{J}}\left([r_{j}(x_{j}^{k}+d_{j}^{k,l})]^{+}-[r_{j}(x_{j}^{k})]^{+}\right)}{c^{T}d^{k,l}-\rho^{k}\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}}$

converges to 1. Note that the ratio is well-defined since $m^{k}(x^{k}+d^{k,l};\rho^{k})-m^{k}(x^{k};\rho^{k})<0$ by Lemma 4.1(ii) (see (39)). It then follows that (40) is true for sufficiently large $l$.

Lemma 4.5.

Suppose that there exists ρ>0\rho^{\infty}>0 so that ρk=ρ>0\rho^{k}=\rho^{\infty}>0 for all large kk. Then any limit point of {xk}k=0\{x^{k}\}_{k=0}^{\infty} is an optimal solution of the SOCP (1).

Proof 4.6.

From (11) and the updates in the algorithm, we have that

φ(xk+1;ρ)φ(xKρ;ρ)\displaystyle\varphi(x^{k+1};\rho^{\infty})-\varphi(x^{K_{\rho}};\rho^{\infty}) =t=Kρk(φ(xt+1;ρ)φ(xt;ρ))\displaystyle=\sum_{t=K_{\rho}}^{k}\Big{(}\varphi(x^{t+1};\rho^{\infty})-\varphi(x^{t};\rho^{\infty})\Big{)}
cdect=Kρk(mt(xt+dt;ρ)mt(xt;ρ))\displaystyle\leq c_{\text{dec}}\sum_{t=K_{\rho}}^{k}\Big{(}m^{t}(x^{t}+d^{t};\rho^{\infty})-m^{t}(x^{t};\rho^{\infty})\Big{)}

for $k\geq K_{\rho}$, where $K_{\rho}$ is chosen so that $\rho^{t}=\rho^{\infty}$ for all $t\geq K_{\rho}$. Since $X$ is bounded by Assumption 3, the penalty function is bounded below on $X$, and hence the left-hand side is bounded below as $k\to\infty$. Lemma 4.1(i) shows that all summands are non-positive and we obtain

(42) limk(mk(xk+dk;ρ)mk(xk;ρ))=0.\lim_{k\to\infty}\Big{(}m^{k}(x^{k}+d^{k};\rho^{\infty})-m^{k}(x^{k};\rho^{\infty})\Big{)}=0.

Using Lemma 4.1(i), we also have

limk((dk)THkdk+(ρρmink)j𝒥[rj(xjk)]+)=0.\lim_{k\to\infty}\biggl{(}(d^{k})^{T}H^{k}d^{k}+(\rho^{\infty}-\rho_{\min}^{k})\sum_{j\in\mathcal{J}}[r_{j}(x^{k}_{j})]^{+}\biggr{)}=0.

Since HkH^{k} is positive semi-definite and ρρminkρρmin>0\rho^{\infty}-\rho^{k}_{\min}\geq\rho^{\infty}-\rho^{\infty}_{\min}>0, this implies that [rj(xjk)]+0[r_{j}(x^{k}_{j})]^{+}\to 0 for all j𝒥j\in\mathcal{J}, i.e., all limit points of {xk}k=0\{x^{k}\}_{k=0}^{\infty} are feasible. This also yields limk(dk)THkdk=0\lim_{k\to\infty}(d^{k})^{T}H^{k}d^{k}=0, and since HkH^{k} is positive semi-definite and uniformly bounded due to (34), we have

(43) limkHkdk=0.\lim_{k\to\infty}H^{k}d^{k}=0.

Using (42) together with (10) and [rj(xjk)]+0[r_{j}(x^{k}_{j})]^{+}\to 0, we obtain

(44) 0=limk(cTdkρj𝒟(xk)[rj(xjk)]+)=limkcTdk.0=\lim_{k\to\infty}\biggl{(}c^{T}d^{k}-\rho^{\infty}\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x^{k}_{j})]^{+}\biggr{)}=\lim_{k\to\infty}c^{T}d^{k}.

Now let xx^{*} be a limit point of {xk}k=0\{x^{k}\}_{k=0}^{\infty}. Since XX is bounded, dkd^{k} is bounded, and consequently there exists a subsequence {kt}t=0\{k_{t}\}_{t=0}^{\infty} of iterates so that xktx^{k_{t}} and dktd^{k_{t}} converge to xx^{*} and dd^{\infty}, respectively, for some limit point dd^{\infty} of dkd^{k}. Define gkt=Hktdktg^{k_{t}}=H^{k_{t}}d^{k_{t}} for all tt. Then, looking at the QP optimality conditions (24), we see that dktd^{k_{t}} is also an optimal solution of the linear optimization problem

(45) mindn\displaystyle\min_{d\in\mathbb{R}^{n}} (c+gkt)Td\displaystyle(c+g^{k_{t}})^{T}d
s.t. A(xkt+d)b,\displaystyle A(x^{k_{t}}+d)\leq b,
xjkt+dj𝒞j(𝒴^jkt),j𝒥.\displaystyle x_{j}^{k_{t}}+d_{j}\in\mathcal{C}_{j}(\hat{\mathcal{Y}}^{k_{t}}_{j}),\qquad j\in\mathcal{J}.

Now suppose, for the purpose of deriving a contradiction, that xx^{*} is not an optimal solution of the SOCP. Since we showed above that xx^{*} is feasible, there then exists a step d~n\tilde{d}^{*}\in\mathbb{R}^{n} so that x~=x+d~\tilde{x}=x^{*}+\tilde{d}^{*} is feasible for (1) and cTd~<0c^{T}\tilde{d}^{*}<0. Then, because 𝒦j𝒞j(𝒴^jkt)\mathcal{K}_{j}\subseteq\mathcal{C}_{j}(\hat{\mathcal{Y}}^{k_{t}}_{j}), for each tt, d~kt=xxkt+d~\tilde{d}^{k_{t}}=x^{*}-x^{k_{t}}+\tilde{d}^{*} is feasible for (45), and because dktd^{k_{t}} is an optimal solution of (45), we have (c+gkt)Tdkt(c+gkt)Td~kt(c+g^{k_{t}})^{T}d^{k_{t}}\leq(c+g^{k_{t}})^{T}\tilde{d}^{k_{t}}. Taking the limit tt\to\infty, we obtain cTdcTd~<0c^{T}d^{\infty}\leq c^{T}\tilde{d}^{*}<0, where we used limtgkt=limtHktdkt=0\lim_{t\to\infty}g^{k_{t}}=\lim_{t\to\infty}H^{k_{t}}d^{k_{t}}=0, due to the definition of gktg^{k_{t}} and (43). However, this contradicts (44). Therefore, xx^{*} must be a solution of the SOCP.

For later reference, we highlight the limit (43) established in the above proof.

Lemma 4.7.

Suppose that there exists ρ>0\rho^{\infty}>0 so that ρk=ρ>0\rho^{k}=\rho^{\infty}>0 for all large kk. Then limkHkdk=0\lim_{k\to\infty}H^{k}d^{k}=0.

We are now ready to prove that the algorithm is globally convergent under the following standard regularity assumption.

Assumption 4.

The SOCP is feasible and Slater’s constraint qualification holds, i.e., there exists a feasible point x~n\tilde{x}\in\mathbb{R}^{n} and ϵ>0\epsilon>0 so that x~+v\tilde{x}+v is feasible for any vnv\in\mathbb{R}^{n} with vϵ\|v\|\leq\epsilon.

This assumption implies that the multiplier estimates are bounded.

Lemma 4.8.

Suppose that Assumption 4 holds. Then {z^k}\{\hat{z}^{k}\} is bounded.

Proof 4.9.

Let x~\tilde{x} and ϵ\epsilon be the quantities from Assumption 4. Note that the QP corresponding to the optimality conditions (24) is

mindn\displaystyle\min_{d\in\mathbb{R}^{n}}\ cTd+12dTHkd\displaystyle c^{T}d+\tfrac{1}{2}d^{T}H^{k}d
s.t. A(xk+d)b,xjk+dj𝒞j(𝒴^jk),j𝒥.\displaystyle A(x^{k}+d)\leq b,\;x_{j}^{k}+d_{j}\in\mathcal{C}_{j}(\hat{\mathcal{Y}}^{k}_{j}),\;j\in\mathcal{J}.

Since xk+1=xk+dkx^{k+1}=x^{k}+d^{k} when dkd^{k} is the step accepted by the algorithm, it follows that xk+1x^{k+1} is an optimal solution of the QP

$O_{\rm primal}=\min_{x\in\mathbb{R}^{n}}\ (c-H^{k}x^{k})^{T}x+\tfrac{1}{2}x^{T}H^{k}x$
s.t. $Ax\leq b,\;x_{j}\in\mathcal{C}_{j}(\hat{\mathcal{Y}}^{k}_{j}),\;j\in\mathcal{J},$

the Lagrangian dual of which is

(46a) Odual=maxx,zn,λm\displaystyle O_{\rm dual}=\max_{x,z\in\mathbb{R}^{n},\lambda\in\mathbb{R}^{m}}\ bTλ12xTHkx\displaystyle-b^{T}\lambda-\tfrac{1}{2}x^{T}H^{k}x
(46b) s.t. cHkxk+Hkx+ATλz=0,\displaystyle c-H^{k}x^{k}+H^{k}x+A^{T}\lambda-z=0,
(46c) $z_{j}\in\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}^{k}_{j}),\;j\in\mathcal{J},\quad\lambda\geq 0.$

By (24), (xk+1,λ^k(x^{k+1},\hat{\lambda}^{k},z^k\hat{z}^{k}) is a primal-dual optimal solution of these QPs.

Define $v=-\epsilon\frac{\hat{z}^{k}}{\|\hat{z}^{k}\|}$. Then $\|v\|\leq\epsilon$, and Assumption 4 implies that $\tilde{x}_{j}+v_{j}\in\mathcal{K}_{j}\subseteq\mathcal{C}_{j}(\hat{\mathcal{Y}}_{j}^{k})$ for all $j\in\mathcal{J}$. Since $\hat{z}^{k}_{j}\in\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{k})$, we have with (46b) that

(47) 0(x~+v)Tz^k=vTz^k+x~T(cHkx~+Hkxk+1+ATλ^k).0\leq(\tilde{x}+v)^{T}\hat{z}^{k}=v^{T}\hat{z}^{k}+\tilde{x}^{T}(c-H^{k}\tilde{x}+H^{k}x^{k+1}+A^{T}\hat{\lambda}^{k}).

Since $H^{k}$ is positive semi-definite, we have

(48) 0(x~xk+1)THk(x~xk+1)=x~THkx~2x~THkxk+1+(xk+1)THkxk+1.0\leq(\tilde{x}-x^{k+1})^{T}H^{k}(\tilde{x}-x^{k+1})=\tilde{x}^{T}H^{k}\tilde{x}-2\tilde{x}^{T}H^{k}x^{k+1}+(x^{k+1})^{T}H^{k}x^{k+1}.

Furthermore, Slater’s condition implies strong duality, that is

bTλ^k+12(xk+1)THkxk+1\displaystyle b^{T}\hat{\lambda}^{k}+\frac{1}{2}(x^{k+1})^{T}H^{k}x^{k+1} =Odual=Oprimal\displaystyle=-O_{\rm dual}=-O_{\rm primal}
(49) =(cHkxk)Txk+112(xk+1)THkxk+1.\displaystyle=-(c-H^{k}x^{k})^{T}x^{k+1}-\tfrac{1}{2}(x^{k+1})^{T}H^{k}x^{k+1}.

Finally, since $\tilde{x}$ is feasible for the SOCP, (1b) and $\hat{\lambda}^{k}\geq 0$ imply $\tilde{x}^{T}A^{T}\hat{\lambda}^{k}\leq b^{T}\hat{\lambda}^{k}$. Subtracting $v^{T}\hat{z}^{k}$ from both sides of (47), this, together with (48) and (49), yields

ϵz^k\displaystyle\epsilon\|\hat{z}^{k}\| x~Tc12x~THkx~+12(xk+1)THkxk+1+bTλ^k\displaystyle\leq\tilde{x}^{T}c-\frac{1}{2}\tilde{x}^{T}H^{k}\tilde{x}+\frac{1}{2}(x^{k+1})^{T}H^{k}x^{k+1}+b^{T}\hat{\lambda}^{k}
=x~Tc12x~THkx~cTxk+112(xk+1)THkxk+1.\displaystyle=\tilde{x}^{T}c-\frac{1}{2}\tilde{x}^{T}H^{k}\tilde{x}-c^{T}x^{k+1}-\frac{1}{2}(x^{k+1})^{T}H^{k}x^{k+1}.

All terms on the right-hand side are bounded uniformly in $k$: $\tilde{x}$ is fixed, $X$ is bounded by Assumption 3, and $H^{k}$ is uniformly bounded by (34). We can therefore conclude that $\hat{z}^{k}$ is uniformly bounded.

It is easy to see that the penalty parameter update rule (37) and Lemma 4.8 imply the following result.

Lemma 4.10.

Suppose Assumption 4 holds. Then there exist $\rho^{\infty}$ and $K_{\rho}$ so that $\rho^{k}=\rho^{\infty}>\rho^{\infty}_{\min}$ for all $k\geq K_{\rho}$, where $\rho^{\infty}_{\min}\geq\rho^{k}_{\min}=\|\hat{z}^{k}_{\mathcal{J},0}\|_{\infty}$ for all $k$.

We can now state the main convergence theorem of this section.

Theorem 4.11.

Suppose that Assumptions 3 and 4 hold. Then Algorithm 3 either terminates in Step 36 with an optimal solution, or it generates an infinite sequence of iterates {(xk+1,λ^k,z^k)}k=0\{(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k})\}_{k=0}^{\infty}, each limit point of which is a primal-dual solution of the SOCP (1).

Proof 4.12.

Let $\{(x^{k_t+1},\hat{\lambda}^{k_t},\hat{z}^{k_t})\}$ be a subsequence converging to a limit point $(x^*,\lambda^*,z^*)$. No matter whether an iterate is computed from the optimal solution of (18), (27), or (35), the iterates satisfy the optimality conditions (24). In particular, from (24c) we have for any $j\in\mathcal{J}$ that $\hat{z}_j^{k_t}\in\mathcal{C}_j^{\circ}(\hat{\mathcal{Y}}_j^{k_t})\subseteq\mathcal{K}_j$ and $(x_j^{k_t+1})^T\hat{z}_j^{k_t}=0$. In the limit, we obtain $z_j^*\in\mathcal{K}_j$ (since $\mathcal{K}_j$ is closed) and $(x_j^*)^Tz_j^*=0$. Lemma 4.10 yields that $\rho^k=\rho^{\infty}$ for all large $k$, and so Lemma 4.5 implies that $x^*$ is feasible, i.e., $x_j^*\in\mathcal{K}_j$. Therefore, (13c) holds. Using Lemma 4.7, we can take the limit in (24a) and (24b) and deduce that the remaining SOCP optimality conditions (13a) and (13b) also hold at the limit point.

Remark 4.13.

In case the SOCP is infeasible, there are two possible outcomes: either Algorithm 3 terminates in some iteration because one of the QPs is infeasible, or $\lim_{k\to\infty}\rho^k=\infty$ (by the contrapositive of Lemma 4.5).

4.2 Identification of extremal-active cones

We can only expect fast local convergence under some non-degeneracy assumptions. Throughout this section, we assume that Assumption 2 holds. Under this assumption, (x,λ,z)(x^{*},\lambda^{*},z^{*}) is the unique optimal solution [1, Theorem 22], and Theorem 4.11 then implies that

limk(xk+1,λ^k,z^k)=(x,λ,z).\lim_{k\to\infty}(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k})=(x^{*},\lambda^{*},z^{*}).

First, we prove a technical result that describes elements in 𝒞j(𝒴j)\mathcal{C}^{\circ}_{j}(\mathcal{Y}_{j}) in a compact manner. For this characterization to hold, condition (20) for the initialization 𝒴j0\mathcal{Y}_{j}^{0} of the set of hyperplane-generating points is crucial.

Lemma 4.14.

Let yjnjy_{j}\in\mathbb{R}^{n_{j}} with y¯j0\bar{y}_{j}\neq 0 and yj00y_{j0}\geq 0. Further, let Φj(zj,yj):=zj0z¯j+y¯j1y¯j\Phi_{j}(z_{j},y_{j}):=z_{j0}-\|\bar{z}_{j}+\bar{y}_{j}\|_{1}-\|\bar{y}_{j}\|. Then the following statements hold for zj,yjnjz_{j},y_{j}\in\mathbb{R}^{n_{j}}:

  1. (i)

    zj𝒞j(𝒴j0{yj})z_{j}\in\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\}) if Φj(zj,yj)0\Phi_{j}(z_{j},y_{j})\geq 0.

  2. (ii)

    zjint(𝒞j(𝒴j0{yj}))z_{j}\in\operatorname{int}(\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\})) if Φj(zj,yj)>0\Phi_{j}(z_{j},y_{j})>0.

Proof 4.15.

For (i): Suppose Φj(zj,yj)0\Phi_{j}(z_{j},y_{j})\geq 0, then

zj0z¯j+y¯j1+y¯j.z_{j0}\geq\|\bar{z}_{j}+\bar{y}_{j}\|_{1}+\|\bar{y}_{j}\|.

Define s¯j=z¯j+y¯j\bar{s}_{j}=\bar{z}_{j}+\bar{y}_{j} and choose σj++nj1\sigma_{j}^{+}\in\mathbb{R}_{+}^{n_{j}-1} and σj+nj1\sigma_{j}^{-}\in\mathbb{R}_{+}^{n_{j}-1} so that s¯j=σj+σj\bar{s}_{j}=\sigma_{j}^{+}-\sigma_{j}^{-} and |sji|=σji++σji|s_{ji}|=\sigma_{ji}^{+}+\sigma_{ji}^{-} for all i=1,,nj1i=1,\ldots,n_{j}-1. Then we have

zj0\displaystyle z_{j0} =i=1nj1σji++i=1nj1σji+σj+ηj\displaystyle=\sum_{i=1}^{n_{j}-1}\sigma_{ji}^{+}+\sum_{i=1}^{n_{j}-1}\sigma_{ji}^{-}+\sigma_{j}+\eta_{j}
z¯j\displaystyle\bar{z}_{j} =s¯jy¯j=σj+σjσjy¯jy¯j\displaystyle=\bar{s}_{j}-\bar{y}_{j}=\sigma_{j}^{+}-\sigma_{j}^{-}-\sigma_{j}\frac{\bar{y}_{j}}{\|\bar{y}_{j}\|}

with σj=y¯j\sigma_{j}=\|\bar{y}_{j}\| and some ηj+\eta_{j}\in\mathbb{R}_{+}. Using (15), this can be rewritten as

$z_{j}=-\sum_{i=1}^{n_{j}-1}\sigma_{ji}^{+}\nabla r_{j}(-e_{ji})-\sum_{i=1}^{n_{j}-1}\sigma_{ji}^{-}\nabla r_{j}(e_{ji})-\sigma_{j}\nabla r_{j}(y_{j})+\eta_{j}e_{j0}.$

By the definition of 𝒞j\mathcal{C}_{j}^{\circ} in (17), this implies that zj𝒞j(𝒴^j0{yj})z_{j}\in\mathcal{C}_{j}^{\circ}(\hat{\mathcal{Y}}^{0}_{j}\cup\{y_{j}\}) where 𝒴^j0\hat{\mathcal{Y}}_{j}^{0} is defined in (20). Since 𝒴^j0𝒴j0\hat{\mathcal{Y}}_{j}^{0}\subseteq\mathcal{Y}_{j}^{0} from (20), we have 𝒞j(𝒴^j0{yj})𝒞j(𝒴j0{yj})\mathcal{C}_{j}^{\circ}(\hat{\mathcal{Y}}^{0}_{j}\cup\{y_{j}\})\subseteq\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\}), and the claim follows.

For (ii): Suppose Φj(zj,yj)>0\Phi_{j}(z_{j},y_{j})>0. Because Φj\Phi_{j} is a continuous function, there exists a neighborhood Nϵ(zj)N_{\epsilon}(z_{j}) around zjz_{j} so that Φj(z^j,yj)>0\Phi_{j}(\hat{z}_{j},y_{j})>0 for all z^jNϵ(zj)\hat{z}_{j}\in N_{\epsilon}(z_{j}). From part (i) we then have Nϵ(zj)𝒞j(𝒴j0{yj})N_{\epsilon}(z_{j})\subseteq\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\}), and consequently zjint(𝒞j(𝒴j0{yj}))z_{j}\in\operatorname{int}(\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\})).
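The decomposition constructed in this proof can also be checked numerically. The following Python sketch (our own verification script, assuming as in the proof that the coordinate vectors $\pm e_{ji}$ are among the generating points) reassembles $z_j$ from the multipliers $\sigma^+_j$, $\sigma^-_j$, $\sigma_j=\|\bar{y}_j\|$, and $\eta_j$ whenever $\Phi_j(z_j,y_j)\geq 0$:

import numpy as np

def grad_r(v):
    # sketch, not the paper's code: gradient of r(v) = ||v_bar|| - v_0, for v_bar != 0
    return np.concatenate(([-1.0], v[1:] / np.linalg.norm(v[1:])))

def decompose(z, y):
    # multipliers from the proof of Lemma 4.14(i); eta >= 0 iff Phi(z, y) >= 0
    s_bar = z[1:] + y[1:]
    sig_plus, sig_minus = np.maximum(s_bar, 0.0), np.maximum(-s_bar, 0.0)
    sigma = np.linalg.norm(y[1:])
    eta = z[0] - np.sum(np.abs(s_bar)) - sigma
    return sig_plus, sig_minus, sigma, eta

rng = np.random.default_rng(0)
n = 5
y = np.concatenate(([1.0], rng.standard_normal(n - 1)))
z_bar = rng.standard_normal(n - 1)
z = np.concatenate(([np.sum(np.abs(z_bar + y[1:])) + np.linalg.norm(y[1:]) + 0.3], z_bar))
sp, sm, sigma, eta = decompose(z, y)
e = np.eye(n)
rebuilt = (-sum(sp[i] * grad_r(-e[i + 1]) for i in range(n - 1))
           - sum(sm[i] * grad_r(e[i + 1]) for i in range(n - 1))
           - sigma * grad_r(y) + eta * e[0])
print(np.allclose(rebuilt, z), eta >= 0)   # True True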

Theorem 4.16.

For all kk sufficiently large, we have (xk)=(x)\mathcal{E}(x^{k})=\mathcal{E}(x^{*}).

Proof 4.17.

Choose $j\notin\mathcal{E}(x^*)$; then $x^*_j\neq 0$. Because $x_j^k\to x^*_j$, we have $x_j^k\neq 0$ or, equivalently, $j\notin\mathcal{E}(x^k)$ for $k$ sufficiently large. For the remainder of this proof we consider $j\in\mathcal{E}(x^*)$ and show that $j\in\mathcal{E}(x^k)$ for large $k$. Note that strict complementarity in Assumption 2 implies that $z_j^*\in\operatorname{int}(\mathcal{K}_j)$, i.e., $r_j(z_j^*)<0$, and consequently $z_{j0}^*>0$.

First consider the iterations in which fast NLP-SQP steps are accepted in Steps 12 or 17. For the purpose of deriving a contradiction, suppose there exists an infinite subsequence so that $x^{k_t+1}=x^{k_t}+d^{S,k_t}$ or $x^{k_t+1}=x^{k_t}+d^{S,k_t}+s^{k_t}$ and $j\not\in\hat{\mathcal{E}}^{k_t}$. Then $j\not\in\hat{\mathcal{E}}^{k_t}$ implies $x^{k_t+1}_{j0}>0$ (according to the termination condition of the while loop in Step 7). We also have $\hat{\mathcal{Y}}_j^{k_t}=\{\check{x}_j^{k_t}\}$ where $\check{x}_j^{k_t}=x_j^{k_t}$ from (28) or $\check{x}_j^{k_t}=x_j^{k_t}+d_j^{S,k_t}$ from (36). Condition (24c) yields $z_j^{k_t}\in\mathcal{C}^{\circ}_j(\{\check{x}_j^{k_t}\})$, so by (17) we have $z_j^{k_t}=-\sigma_j\nabla r_j(\check{x}_j^{k_t})+\eta_j e_{j0}$ for some $\sigma_j,\eta_j\geq 0$, as well as $x_j^{k_t+1}\in\mathcal{C}_j(\{\check{x}_j^{k_t}\})$, which by (14) implies $\nabla r_j(\check{x}_j^{k_t})^T x_j^{k_t+1}\leq 0$. Then complementarity yields

$0=(z_{j}^{k_{t}})^{T}x_{j}^{k_{t}+1}=-\sigma_{j}\nabla r_{j}(\check{x}_{j}^{k_{t}})^{T}x_{j}^{k_{t}+1}+\eta_{j}x^{k_{t}+1}_{j0}\geq\eta_{j}x^{k_{t}+1}_{j0}.$

Since $x^{k_t+1}_{j0}>0$ and $\eta_j\geq 0$, we must have $\eta_j=0$, and consequently $z_j^{k_t}=-\sigma_j\nabla r_j(\check{x}_j^{k_t})$. It is easy to see that $r_j(-\sigma_j\nabla r_j(\check{x}_j^{k_t}))=0$. Since $z_j^{k_t}\to z_j^*$, continuity of $r_j$ yields $r_j(z_j^*)=0$, in contradiction to $z_j^*\in\operatorname{int}(\mathcal{K}_j)$. We thus showed that $j\in\hat{\mathcal{E}}^k$ for all large iterations $k$ in which the NLP-SQP step was accepted, and consequently (28) and (36) yield $\hat{\mathcal{Y}}_j^k=\mathcal{Y}_j^k$ for such $k$.

In all other iterations (22) holds, and overall we obtain

(50) 𝒴j0𝒴jk𝒴^jk for all sufficiently large k.\mathcal{Y}_{j}^{0}\subseteq\mathcal{Y}_{j}^{k}\subseteq\hat{\mathcal{Y}}_{j}^{k}\text{ for all sufficiently large }k.

Let us first consider the case when $\bar{z}^*_j=0$. Then $\|\bar{z}_j^*\|-z^*_{j0}=r_j(z_j^*)<0$ yields $z_{j0}^*>0$. To apply Lemma 4.14, choose any $i\in\{1,\ldots,n_j-1\}$ and let $y_j=t\,e_{ji}$ with $t\in(0,z^*_{j0}/2)$. Then $\|\bar{y}_j\|_1=\|\bar{y}_j\|=t$ and $\Phi_j(z_j^*,y_j)=z^*_{j0}-2t>0$. Since $\hat{z}_j^k\to z_j^*$ and $\Phi_j$ is continuous, $\Phi_j(\hat{z}_j^k,y_j)>0$ for sufficiently large $k$, and by Lemma 4.14, $\hat{z}^k_j\in\operatorname{int}(\mathcal{C}_j^{\circ}(\mathcal{Y}^0_j\cup\{y_j\}))$. Since $\nabla r_j(y_j)=\nabla r_j(e_{ji})$ and $e_{ji}\in\mathcal{Y}^0_j$, the definition (17) gives $\mathcal{C}_j^{\circ}(\mathcal{Y}^0_j\cup\{y_j\})=\mathcal{C}_j^{\circ}(\mathcal{Y}^0_j)$, and with (50) we also have $\hat{z}^k_j\in\operatorname{int}(\mathcal{C}_j^{\circ}(\hat{\mathcal{Y}}^k_j))$. General conic complementarity in (24c) then implies that $x_j^{k+1}=x_j^k+d_j^k=0$ for all large $k$, or equivalently, $j\in\mathcal{E}(x^k)$ for $k$ sufficiently large, as desired.

Now consider the case z¯j0\bar{z}^{*}_{j}\neq 0. For the purpose of deriving a contradiction, suppose there exists a subsequence {xkt}t=0\{x^{k_{t}}\}_{t=0}^{\infty} so that j(xkt)j\not\in\mathcal{E}(x^{k_{t}}), i.e., xkt0x^{k_{t}}\neq 0, for all tt. Because zˇjkzj\check{z}^{k}_{j}\to z^{*}_{j}, z¯j0\bar{z}^{*}_{j}\neq 0, and rj(zj)<0r_{j}(z_{j}^{*})<0, we may assume without loss of generality that rj(zˇjkt)<0r_{j}(\check{z}_{j}^{k_{t}})<0 and zˇ¯jkt0\bar{\check{z}}_{j}^{k_{t}}\neq 0 for all tt. Using this and xjkt0x_{j}^{k_{t}}\neq 0, we see that the update rule (26) in Step 34 adds zˇjkt-\check{z}^{k_{t}}_{j} to 𝒴jkt+1\mathcal{Y}_{j}^{k_{t}+1}. With (50), we have

(51) zˇjkt𝒴jkt+1𝒴jkt+1𝒴^jkt+1 for all t.-\check{z}^{k_{t}}_{j}\in\mathcal{Y}_{j}^{k_{t}+1}\subseteq\mathcal{Y}_{j}^{k_{t+1}}\subseteq\hat{\mathcal{Y}}_{j}^{k_{t+1}}\text{ for all }t.

Recall the mapping $\Phi_j$ defined in Lemma 4.14 and note that $\Phi_j(z^*_j,-z^*_j)=z^*_{j0}-\|\bar{z}^*_j\|=-r_j(z^*_j)>0$. Since both $\hat{z}^k_j$ and $\check{z}^k_j$ converge to $z_j^*$ and $\Phi_j$ is continuous, we have $\Phi_j(\hat{z}_j^{k_{t+1}-1},-\check{z}^{k_t}_j)>0$ for all large $t$, and therefore, by Lemma 4.14, $\hat{z}_j^{k_{t+1}-1}\in\operatorname{int}(\mathcal{C}_j^{\circ}(\mathcal{Y}^0_j\cup\{-\check{z}^{k_t}_j\}))\subseteq\operatorname{int}(\mathcal{C}_j^{\circ}(\hat{\mathcal{Y}}^{k_{t+1}-1}_j))$ (using (51)) for all large $t$. Conic complementarity in (24c) then implies that $x_j^{k_{t+1}}=x_j^{k_{t+1}-1}+d_j^{k_{t+1}-1}=0$. This contradicts the definition of the subsequence $\{x^{k_t}\}_{t=0}^{\infty}$.

Remark 4.18.

In the proof of Theorem 4.16, we saw that Φj(zj,zj)>0\Phi_{j}(z^{*}_{j},-z^{*}_{j})>0 if j(x)j\in\mathcal{E}(x^{*}) and z¯j0\bar{z}_{j}^{*}\neq 0. Since Φj\Phi_{j} is continuous, this implies that there exists a neighborhood Nϵ(zj)N_{\epsilon}(z^{*}_{j}) so that Φj(zj,yj)>0\Phi_{j}(z_{j},-y_{j})>0, and consequently zjint(𝒞j(𝒴j0{yj}))z_{j}\in\operatorname{int}(\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{-y_{j}\})), for all zj,yjNϵ(zj)z_{j},y_{j}\in N_{\epsilon}(z^{*}_{j}).

4.3 Quadratic local convergence

As discussed in Section 2.1, since xx^{*} is a solution of the SOCP (1), it is also a solution of the nonlinear problem (4). We now show that Algorithm 3 eventually generates steps that are identical to SQP steps for (4). Then Theorem 2.3 implies that the iterates converge locally at a quadratic rate.

We first need to establish that the assumptions for Theorem 2.3 hold.

Lemma 4.19.

Suppose that Assumption 2 holds for the SOCP (1). Then Assumption 1 holds for the NLP (4).

Proof 4.20.

Let λ\lambda^{*} and zz^{*} be the optimal multipliers for the SOCP corresponding to xx^{*}, satisfying (13). Assumption 2 implies that λ\lambda^{*} and zz^{*} are unique [1, Theorem 22].

Let $j\in\mathcal{D}(x^*)$ and define $\mu^*_j=z^*_{j0}\geq 0$. If $0=r_j(x^*_j)=\|\bar{x}^*_j\|-x^*_{j0}$, complementarity (13c) implies, for all $i\in\{1,\ldots,n_j-1\}$, that $0=x^*_{j0}z^*_{ji}+x^*_{ji}z^*_{j0}=\|\bar{x}^*_j\|z^*_{ji}+x^*_{ji}z^*_{j0}$, or equivalently, $z^*_{ji}=-z^*_{j0}\frac{x^*_{ji}}{\|\bar{x}^*_j\|}$; see [1, Lemma 15]. Using (15), this can be written as

(52) $z^{*}_{j}=-z^{*}_{j0}\nabla r_{j}(x^{*}_{j})=-\mu^{*}_{j}\nabla r_{j}(x^{*}_{j}).$

On the other hand, if $r_j(x^*_j)<0$, i.e., the constraint (4c) is inactive, then $x^*_j\in\operatorname{int}(\mathcal{K}_j)$ and complementarity (13c) yields $z_j^*=0$ (see [1, Definition 23]) and therefore $\mu_j^*=0$. Consequently, (52) is also valid in that case. Finally, we define $\nu^*_j=z^*_j$ for all $j\in\mathcal{E}(x^*)$. With these definitions, (13a) can be restated as

(53) $c+A^{T}\lambda^{*}+\sum_{j\in\mathcal{D}(x^{*})}\mu^{*}_{j}\nabla r_{j}(x^{*})-\nu^{*}=0,$

where νn\nu^{*}\in\mathbb{R}^{n} is the vector with the values of νj\nu_{j}^{*} at the components corresponding to j(x)j\in\mathcal{E}(x^{*}) and zero otherwise. We now prove parts (i), (ii), and (iii) of Assumption 1.

Proof of (i): Let $j\in\mathcal{D}(x^*)$. We already established that $r_j(x_j^*)<0$ yields $\mu^*_j=0$. Now suppose that $r_j(x^*_j)=0$. Then $x_j^*\in\operatorname{bd}(\mathcal{K}_j)\setminus\{0\}$. Since strict complementarity is assumed, we have $z_j^*\in\operatorname{bd}(\mathcal{K}_j)\setminus\{0\}$ (see the comment after Assumption 2), which in turn yields $z_{j0}^*\neq 0$ and hence $\mu_j^*\neq 0$.

Proof of (ii): Since we need to prove linear independence only of those constraints that are active at xx^{*}, we consider only those rows A𝒜A_{\mathcal{A}} of AA for which (4b) is binding.

Without loss of generality suppose xx^{*} is partitioned into four parts, (x)T=((x)T(x)T(x)T(x)T)(x^{*})^{T}=((x_{\mathcal{B}}^{*})^{T}\ (x_{\mathcal{I}}^{*})^{T}\ (x_{\mathcal{E}}^{*})^{T}\ (x_{\mathcal{F}}^{*})^{T}), where xx_{\mathcal{B}}^{*}, xx_{\mathcal{I}}^{*}, and xx_{\mathcal{E}}^{*} correspond to the variables in the cones ={j𝒥:rj(xj)=0,xj0}\mathcal{B}=\{j\in\mathcal{J}:r_{j}(x^{*}_{j})=0,x_{j}^{*}\neq 0\}, ={j𝒥:rj(xj)<0}\mathcal{I}=\{j\in\mathcal{J}:r_{j}(x^{*}_{j})<0\}, and =(x)\mathcal{E}=\mathcal{E}(x^{*}), respectively, and xx^{*}_{\mathcal{F}} includes all components of xx^{*} that are not in any of the cones. Further suppose that (x)T=((x1)T(xp)T)(x_{\mathcal{B}}^{*})^{T}=((x^{*}_{1})^{T}\ \ldots(x^{*}_{p_{\mathcal{B}}})^{T}), where ={1,,p}\mathcal{B}=\{1,\ldots,p_{\mathcal{B}}\}, and that A𝒜A_{\mathcal{A}} is partitioned in the same way.

Primal non-degeneracy of the SOCP implies that all matrices of the form

$\begin{pmatrix}[A_{\mathcal{A}}]_{1}&\cdots&[A_{\mathcal{A}}]_{p_{\mathcal{B}}}&[A_{\mathcal{A}}]_{\mathcal{I}}&[A_{\mathcal{A}}]_{\mathcal{E}}&[A_{\mathcal{A}}]_{\mathcal{F}}\\ \alpha_{1}\nabla r_{1}(x_{1}^{*})^{T}&\cdots&\alpha_{p_{\mathcal{B}}}\nabla r_{p_{\mathcal{B}}}(x_{p_{\mathcal{B}}}^{*})^{T}&0^{T}&v^{T}&0^{T}\end{pmatrix}$

have linearly independent rows for all scalars $\alpha_i$ and vectors $v$ that are not all zero [1, Eq. (50)]. This implies that the rows of $A_{\mathcal{A}}$, together with the gradient of any one of the binding constraints in (4c) and (4d), are linearly independent. Because the constraint gradients, which are of the form $\nabla r_j(x^*_j)$ and $e_{ij}$, share no nonzero components when extended to the full space, we conclude that the gradients of all active constraints are linearly independent at $x^*$, i.e., the LICQ holds.

Proof of (iii): For the purpose of deriving a contradiction, suppose that there exists a direction dn{0}d\in\mathbb{R}^{n}\setminus\{0\} that lies in the null space of the constraints of (4) that are binding at xx^{*} and for which dTHd0d^{T}H^{*}d\leq 0.

Since $d$ is in the null space of the binding constraints, we have $A_{\mathcal{A}}d=0$, $\nabla r_{j}(x^{*})^{T}d=0$ for $j\in\mathcal{B}$, and $d_{j}=0$ for all $j\in\mathcal{E}$. Premultiplying (53) by $d^{T}$ gives

(54) $0=c^{T}d+(\lambda^{*})^{T}\underbrace{A_{\mathcal{A}}d}_{0}+\sum_{j\in\mathcal{B}}\mu^{*}_{j}\underbrace{\nabla r_{j}(x^{*})^{T}d}_{0}+\sum_{j\in\mathcal{I}}\underbrace{\mu^{*}_{j}}_{0}\nabla r_{j}(x^{*})^{T}d+\underbrace{(\nu^{*})^{T}d}_{0}=c^{T}d.$

What remains to show is that dd is a feasible direction for the SOCP, i.e., there exists β>0\beta>0 so that x+βdx^{*}+\beta d is feasible for the SOCP. Because of (54), this point has the same objective value as xx^{*} and is therefore also an optimal solution of the SOCP. This contradicts the fact that Assumption 2 implies that the optimal solution is unique [1, Theorem 22].

By the definition of HH^{*} in Assumption 1 and the choice of dd, we have

$0\geq d^{T}H^{*}d=\sum_{j\in\mathcal{D}(x^{*})}\mu^{*}_{j}d_{j}^{T}\nabla^{2}r_{j}(x^{*}_{j})d_{j}=\sum_{j\in\mathcal{B}}\mu^{*}_{j}d_{j}^{T}\nabla^{2}r_{j}(x^{*}_{j})d_{j}.$

Since, for all $j\in\mathcal{B}$, the Hessian $\nabla^{2}r_{j}(x^{*}_{j})$ is positive semi-definite and $\mu^{*}_{j}>0$ from part (i), this yields $d_{j}^{T}\nabla^{2}r_{j}(x^{*}_{j})d_{j}=0$ for all $j\in\mathcal{B}$.

Let jj\in\mathcal{B}. Then from (7)

(55) $0=d_{j}^{T}\nabla^{2}r_{j}(x^{*})d_{j}=\frac{\|\bar{d}_{j}\|^{2}\|\bar{x}^{*}_{j}\|^{2}-(\bar{d}_{j}^{T}\bar{x}^{*}_{j})^{2}}{\|\bar{x}^{*}_{j}\|^{3}}.$

The definition of $\mathcal{B}$ implies $r_{j}(x_{j}^{*})=0$ and so $x^{*}_{j0}=\|\bar{x}^{*}_{j}\|$. Since $d_{j}$ is in the null space of $\nabla r_{j}(x_{j}^{*})$, we have $0=\nabla r_{j}(x^{*}_{j})^{T}d_{j}=-d_{j0}+\frac{\bar{d}_{j}^{T}\bar{x}^{*}_{j}}{\|\bar{x}^{*}_{j}\|}$, which in turn yields $d_{j0}x^{*}_{j0}=\bar{d}_{j}^{T}\bar{x}^{*}_{j}$. Finally, using these relationships together with (55) gives

0=d¯j2x¯j2(d¯jTx¯j)2=d¯j2(xj0)2(dj0xj0)20=\|\bar{d}_{j}\|^{2}\|\bar{x}^{*}_{j}\|^{2}-(\bar{d}_{j}^{T}\bar{x}^{*}_{j})^{2}=\|\bar{d}_{j}\|^{2}\,(x^{*}_{j0})^{2}-(d_{j0}x^{*}_{j0})^{2}

and so dj02=d¯j2d_{j0}^{2}=\|\bar{d}_{j}\|^{2}. All of these facts imply that for any β\beta\in\mathbb{R},

x¯j+βd¯j2(xj0+βdj0)2\displaystyle\|\bar{x}^{*}_{j}+\beta\bar{d}_{j}\|^{2}-(x^{*}_{j0}+\beta d_{j0})^{2}
=\displaystyle= x¯j2+2βd¯jTx¯j+β2d¯j2((xj0)2+2βdj0xj0+β2dj02)=0,\displaystyle\|\bar{x}^{*}_{j}\|^{2}+2\beta\bar{d}_{j}^{T}\bar{x}^{*}_{j}+\beta^{2}\|\bar{d}_{j}\|^{2}-\left((x^{*}_{j0})^{2}+2\beta d_{j0}x^{*}_{j0}+\beta^{2}d_{j0}^{2}\right)=0,

which implies rj(xj+βdj)=0r_{j}(x^{*}_{j}+\beta d_{j})=0 and therefore xj+βdj𝒦jx^{*}_{j}+\beta d_{j}\in\mathcal{K}_{j}.

Further, because dd lies in the null space of the active constraints, we have, for any β\beta\in\mathbb{R}, that xj+βdj=0𝒦jx^{*}_{j}+\beta d_{j}=0\in\mathcal{K}_{j} for all j(x)j\in\mathcal{E}(x^{*}) and A𝒜(x+βd)=b𝒜A_{\mathcal{A}}(x^{*}+\beta d)=b_{\mathcal{A}}. Finally, since rj(xj)<0r_{j}(x^{*}_{j})<0 and hence xjint(𝒦j)x^{*}_{j}\in\operatorname{int}(\mathcal{K}_{j}) for all j𝒥((x))j\in\mathcal{J}\setminus(\mathcal{E}(x^{*})\cup\mathcal{B}), and since xjx^{*}_{j} is strictly feasible for all non-binding constraints in (1b), there exists β>0\beta>0 so that x+βdx^{*}+\beta d satisfies all constraints in (1).

Theorem 4.21.

Suppose that cH>Hc_{H}>\|H^{*}\|. Then the primal-dual iterates (xk+1,λ^k,z^k)(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k}) converge locally to (x,λ,z)(x^{*},\lambda^{*},z^{*}) at a quadratic rate.

Proof 4.22.

We already established in Theorem 4.11 that the iterates converge to the optimal solution, and since $H^k\to H^*$ and $c_H>\|H^*\|$, the Hessian is not rescaled according to (34) in Step 4. Using Theorem 4.16 we know that, once $k$ is sufficiently large, the step $d^{S,k}$ computed in Step 6 of Algorithm 3 is identical to the SQP step from (5) for (4); we can ignore (27d) here because $x^*_{j0}>0$ and $d^{S,k}_{j0}\to 0$, and therefore this constraint is not active for large $k$. This also implies that the condition in Step 7 is never true and thus $\hat{\mathcal{E}}^k=\mathcal{E}(x^*)$. If the decrease condition in Step 12 is not satisfied, a similar argument shows that $s^k$ computed in Step 16 is the second-order correction step from (12) for (4). Due to Lemma 4.19 we can now apply Theorem 2.3 to conclude that either $d^{S,k}$ or $d^{S,k}+s^k$ is accepted to define the next iterate for large $k$ and that the iterates converge at a quadratic rate.

5 Numerical Experiments

In this section, we examine the performance of Algorithm 3. First, using randomly generated instances, we consider three types of starting points: (i) uninformative default starting point (cold start), (ii) solution of a perturbed instance, (iii) solution computed by an interior-point SOCP solver whose accuracy we wish to improve. Then we briefly report results using the test library CBLIB. The numerical experiments were performed on an Ubuntu 22.04 Linux server with a 2.1GHz Xeon Gold 5128 R CPU and 256GB of RAM.

5.1 Implementation

We implemented Algorithm 3 in MATLAB R2021b, with parameters cdec=106c_{\text{dec}}=10^{-6}, cinc=2c_{\text{inc}}=2, cH=1012c_{H}=10^{12}, and ρ1=50\rho^{-1}=50. In each iteration, we identify (xk)={j𝒥:xjk<106}\mathcal{E}(x^{k})=\{j\in\mathcal{J}:\|x^{k}_{j}\|_{\infty}<10^{-6}\} and 𝒟(xk)={j𝒥(xk):x¯jk>108}\mathcal{D}(x^{k})=\{j\in\mathcal{J}\setminus\mathcal{E}(x^{k}):\|\bar{x}^{k}_{j}\|>10^{-8}\}. The set 𝒴j0\mathcal{Y}_{j}^{0} is initialized to 𝒴^j0\hat{\mathcal{Y}}_{j}^{0} (see (20)), and λ0\lambda^{0} is a given starting value for λ\lambda, if provided, and zero otherwise. In addition, since the identification of the optimal extremal-active set (x)\mathcal{E}(x^{*}) requires zj𝒞j(𝒴j)z^{*}_{j}\in\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j}), we add zˇj0-\check{z}^{0}_{j} to 𝒴j0\mathcal{Y}^{0}_{j}, where zˇ0=c+ATλ0\check{z}^{0}=c+A^{T}\lambda^{0}.
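A sketch of this classification step in Python (our own variable names, not the MATLAB code; x_blocks is the list of cone subvectors $x^k_j$) reads:

import numpy as np

def classify_cones(x_blocks, tol_E=1e-6, tol_D=1e-8):
    # sketch, not the paper's code: E(x) are extremal-active cones,
    # D(x) are cones with nonzero "bar" part, using the tolerances above
    E, D = [], []
    for j, xj in enumerate(x_blocks):
        if np.max(np.abs(xj)) < tol_E:            # ||x_j||_inf < 1e-6
            E.append(j)
        elif np.linalg.norm(xj[1:]) > tol_D:      # ||x_bar_j|| > 1e-8
            D.append(j)
    return E, D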

The algorithm terminates when the violation of the SOCP optimality conditions (13) for the current iterate satisfies

(56) $E(x^{k},\lambda^{k},\check{z}^{k})=\max\left\{\|[Ax^{k}-b]^{+}\|_{\infty},\;\|(Ax^{k}-b)\circ\lambda^{k}\|_{\infty},\;\|[-\lambda^{k}]^{+}\|_{\infty},\;\max_{j\in\mathcal{J}}\left\{[r_{j}(x^{k}_{j})]^{+},\,[r_{j}(\check{z}^{k}_{j})]^{+},\,|(x_{j}^{k})^{T}\check{z}^{k}_{j}|\right\}\right\}\leq\epsilon_{\text{tol}}$

with zˇk=c+ATλk\check{z}^{k}=c+A^{T}\lambda^{k}, for some ϵtol>0\epsilon_{\text{tol}}>0.
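A direct transcription of the error measure (56) is shown below (a Python sketch with our own argument names; x_blocks and z_blocks are the cone subvectors of $x^k$ and $\check{z}^k=c+A^T\lambda^k$):

import numpy as np

def cone_residual(v):
    # r_j(v) = ||v_bar|| - v_0
    return np.linalg.norm(v[1:]) - v[0]

def kkt_error(A, b, x, lam, x_blocks, z_blocks):
    # sketch of the optimality error E(x, lambda, z_check) from (56)
    res = A @ x - b
    err = max(np.max(np.maximum(res, 0.0)),       # ||[Ax-b]^+||_inf
              np.max(np.abs(res * lam)),          # ||(Ax-b) o lambda||_inf
              np.max(np.maximum(-lam, 0.0)))      # ||[-lambda]^+||_inf
    for xj, zj in zip(x_blocks, z_blocks):
        err = max(err,
                  max(cone_residual(xj), 0.0),    # [r_j(x_j)]^+
                  max(cone_residual(zj), 0.0),    # [r_j(z_check_j)]^+
                  abs(xj @ zj))                   # |x_j^T z_check_j|
    return err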

As in [29], the sufficient descent condition (11) is slightly relaxed to

φ(x^k+1;ρk)φ(xk;ρk)10ϵmach|φ(xk;ρk)|cdec(mk(xk+d;ρk)mk(xk;ρk))\varphi(\hat{x}^{k+1};\rho^{k})-\varphi(x^{k};\rho^{k})-10\epsilon_{\text{mach}}|\varphi(x^{k};\rho^{k})|\leq c_{\text{dec}}\left(m^{k}(x^{k}+d;\rho^{k})-m^{k}(x^{k};\rho^{k})\right)

to account for cancellation error, where ϵmach\epsilon_{\text{mach}} is the machine precision. Finally, to avoid accumulating very similar hyperplanes that would lead to degenerate QPs, we do not add a new generating point vjv_{j} to 𝒴jk\mathcal{Y}_{j}^{k} if there already exists yj𝒴jky_{j}\in\mathcal{Y}_{j}^{k} such that v¯jv¯jy¯jy¯j1010.\left\|\frac{\bar{v}_{j}}{\|\bar{v}_{j}\|}-\frac{\bar{y}_{j}}{\|\bar{y}_{j}\|}\right\|_{\infty}\leq 10^{-10}.
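The two safeguards just described are simple numerical tests; in Python they might look as follows (our own sketch; phi and m denote precomputed values of the penalty function and its model):

import numpy as np

def accept_step(phi_new, phi_old, m_new, m_old, c_dec=1e-6, eps_mach=np.finfo(float).eps):
    # sketch of the relaxed sufficient-decrease test replacing (11)
    return phi_new - phi_old - 10.0 * eps_mach * abs(phi_old) <= c_dec * (m_new - m_old)

def is_near_duplicate(v, Y, tol=1e-10):
    # sketch: skip a new generating point v if it is (numerically) parallel
    # to some y already in Y, in the sense of the test above
    v_dir = v[1:] / np.linalg.norm(v[1:])
    return any(np.max(np.abs(v_dir - y[1:] / np.linalg.norm(y[1:]))) <= tol for y in Y)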

In these experiments, we disabled the second-order correction step (Steps 16–20) because we noticed that it was never accepted in practice. In a more sophisticated implementation, one would include a heuristic that attempts to detect the Maratos effect and then triggers the second-order correction step in specific situations.

The QPs were solved using ILOG CPLEX V12.10, with optimality and feasibility tolerances set to 10910^{-9} and “dependency checker” and “numerical precision emphasis” enabled, using the primal simplex method. When CPLEX did not report a solution status “optimal” and the QP KKT error was above 10910^{-9}, a small perturbation was added to the Hessian matrix, i.e., we replaced HkH^{k} by Hk+107IH^{k}+10^{-7}\cdot I. This helped in some cases in which CPLEX (incorrectly) reported that HkH^{k} was not positive semi-definite. If CPLEX still did not find a QP solution with KKT error less than 10910^{-9}, we attempted to resolve the QP with the barrier method, the dual simplex method, and the primal simplex method again, until one was able to compute a solution. If all solvers failed for QP (27), the algorithm continued in Step 22. If no solver was able to solve (18), we terminated the main algorithm and declared a failure.

We emphasize that the purpose of our implementation is to assess whether the proposed algorithm exhibits behavior that validates the stated goals: Convergence from any starting point and rapid local convergence to highly accurate solutions. In its current implementation, it requires more computation time than highly sophisticated commercial solvers such as MOSEK or CPLEX, which were developed over decades and have highly specialized linear algebra routines that are tightly integrated into the algorithms. As we observed at the end of Section 3.7, many of the QPs in Algorithm 3 that are solved in succession are similar to each other, and savings in computation times should therefore be achievable. However, our prototype implementation based on the Matlab CPLEX interface does not allow us to utilize callback functions for adding or removing hyperplanes. Achieving these savings in computation time thus requires a more sophisticated implementation, a task that is outside of the scope of this paper. Consequently, we do not report solution times here.

5.2 Randomly generated QCQPs

The experiments were performed on randomly generated SOCP instances of varying sizes, specified by (n,m,K)(n,m,K). Here, n,m1n,m\geq 1 are the number of variables and linear constraints, respectively. K1K\geq 1 specifies the number of cones of each “activity type”: |(x)|=K|\mathcal{E}(x^{*})|=K, |{j𝒥:rj(xj)=0,xj0}|=K|\{j\in\mathcal{J}:r_{j}(x^{*}_{j})=0,x_{j}^{*}\neq 0\}|=K, and |{j𝒥:rj(xj)<0}|=K|\{j\in\mathcal{J}:r_{j}(x^{*}_{j})<0\}|=K, i.e., there are KK cones that are extremal-active, KK that are active at the boundary, and KK that are inactive at the optimal solution xx^{*}. The dimensions of the cones are randomly chosen. In addition, there are variables that are not part of any cone, with bounds chosen in a way so that the non-degeneracy assumption, Assumption 2, holds. A detailed description of the problem generation is stated in Appendix A in [18].

nn mm KK solved total SQP total total
iter iter QP (27) QP (18)
200 60 10 30 6.67 6.67 9.77 0.00
400 120 20 30 7.20 7.20 11.57 0.00
1000 300 50 30 7.23 7.23 12.17 0.00
200 60 4 30 7.53 7.07 11.83 0.90
400 120 8 30 8.27 7.77 14.20 1.00
1000 300 20 30 8.67 7.80 15.93 1.83
200 60 2 30 8.47 7.87 13.90 1.20
400 120 4 30 8.87 8.07 15.30 1.60
1000 300 10 30 9.47 8.43 17.27 1.97
Table 1: Results with x0=0x^{0}=0, ϵtol=107\epsilon_{\text{tol}}=10^{-7}, average per-size statistics taken over 30 random instances. “solved”: number of instances solved (out of 30); “total iter”: total number of iterations in Algorithm 3; “SQP iter”: number of iteration in which NLP-SQP step was accepted in Steps 12 or 17; “total QP (27)” / “total QP (18)”: Total number of QPs of that type solved.

Table 1 summarizes the performance of the algorithm with an uninformative starting point $x^0=0$. Each row lists average statistics for a given problem size ($n,m,K$), taken over 30 random instances. We see that the proposed algorithm is very reliable and solved every instance to the tolerance $\epsilon_{\text{tol}}=10^{-7}$. The average number of iterations is mostly between 7 and 9, and in most of these iterations the fast NLP-SQP step was accepted.

To give an idea of the computational effort, we report the number of times QPs (27) and (18) were solved, from which further conclusions can be drawn. Consider, for example, the last row. At the beginning of each iteration, QP (27) is solved to obtain the NLP-SQP step. The difference with the total number of iterations, i.e., 17.27-9.47=7.80, gives us the total number of times in which the guess $\hat{\mathcal{E}}^k$ of the extremal-active cones needed to be corrected in Steps 7–10. In other words, on average, the loop in Steps 7–10 is executed 7.80/9.47=0.82 times per iteration. Similarly, the last column tells us the total number of iterations of the loop in Steps 22–32. That loop was executed only when the NLP-SQP step was not accepted, i.e., in 9.47-8.43=1.04 iterations on average, taking 1.97/1.04=1.89 loop iterations each time.

The experiments are presented in three groups where the ratio between $n$ and $K$ is kept constant. As the number of cones, $K$, decreases from one group to the next, the average size of the individual cones increases by a factor of 2.5 and 2, respectively. This increase seems to result in slightly more iterations in which the SQP step was rejected, indicating that the simple linearization (27c) of the non-extremal-active cones sometimes becomes insufficiently accurate.

nn mm KK total SQP total total Mosek final
iter iter QP (27) QP (18) error error
200 60 10 1.10 1.07 1.10 0.07 2.33e-06 1.63e-10
400 120 20 1.03 1.00 1.03 0.03 2.67e-06 1.70e-10
1000 300 50 1.07 1.03 1.07 0.03 3.49e-06 1.76e-10
200 60 4 1.03 1.03 1.03 0.00 5.97e-06 1.69e-10
400 120 8 1.00 1.00 1.00 0.00 2.28e-06 1.87e-10
1000 300 20 1.03 0.83 1.03 0.27 5.20e-06 1.72e-10
200 60 2 1.00 1.00 1.00 0.00 2.02e-06 1.53e-10
400 120 4 1.13 1.10 1.13 0.03 4.85e-06 2.03e-10
1000 300 10 1.20 1.10 1.20 0.13 1.22e-05 2.41e-10
Table 2: Result with MOSEK solution as x0x^{0}, ϵtol=109\epsilon_{\text{tol}}=10^{-9}. All instances were solved. “Mosek error”: Optimality error EE (56) at Mosek solution; “final error”: Optimality error EE at final iterate of Algorithm 3.

The remaining experiments in this section investigate to which degree the algorithm is able to achieve our primary goal of taking advantage of a good starting point. We begin with an extreme situation, in which we first solve an instance with the interior-point SOCP solver MOSEK V9.1.9 (called via CVX), using the setting cvx_precision=high corresponding to the MOSEK tolerance ϵ=ϵmach2/3\epsilon=\epsilon_{\rm mach}^{2/3}, and give the resulting primal-dual solution as starting point to Algorithm 3. Choosing any tighter MOSEK tolerances leads to failures in several problems. Table 2 summarizes the results. In all cases, the algorithm converges rapidly to an improved solution, reducing the error by 4 orders of magnitude, most of the time with only a single second-order iteration. The Mosek error was dominated by the violation of complementarity. This demonstrates the ability of the proposed method to improve the accuracy of a solution computed by an interior-point method.

nn mm KK solved total SQP total total
iter iter QP (27) QP (18)
200 60 10 30 1.00 0.97 1.00 0.07
400 120 20 30 1.00 0.97 1.00 0.03
1000 300 50 30 1.00 0.97 1.00 0.07
200 60 4 30 1.00 1.00 1.00 0.00
400 120 8 30 1.00 0.93 1.00 0.07
1000 300 20 30 1.00 0.87 1.00 0.20
200 60 2 30 1.00 1.00 1.00 0.00
400 120 4 30 1.00 0.97 1.00 0.03
1000 300 10 30 1.07 1.00 1.03 0.07
Table 3: Result with 10310^{-3} perturbation, ϵtol=107\epsilon_{\text{tol}}=10^{-7}.
nn mm KK solved total SQP total total
iter iter QP (27) QP (18)
200 60 10 30 1.20 1.07 1.00 0.19
400 120 20 30 1.33 1.17 1.00 0.73
1000 300 50 30 1.60 1.23 1.02 1.29
200 60 4 30 1.27 1.13 1.02 0.71
400 120 8 30 1.67 1.27 1.16 0.48
1000 300 20 30 2.10 1.40 1.27 0.57
200 60 2 30 1.67 1.33 1.24 0.42
400 120 4 30 2.30 1.87 1.30 0.38
1000 300 10 30 3.67 2.53 1.53 0.59
Table 4: Result with 10110^{-1} perturbation, ϵtol=107\epsilon_{\text{tol}}=10^{-7}.

For the final experiments, summarized in Tables 3 and 4, the starting point is the MOSEK solution of a perturbed problem, in which 10% of the objective coefficients cc were perturbed by uniformly distributed random noise of the order of 10310^{-3} and 10110^{-1}, respectively. For the small perturbation, similar to Table 2, Algorithm 3 terminated in one iteration most of the time. More iterations were required for the larger perturbation, but still significantly fewer compared to the uninformative starting point, see Table 1.

5.3 CBLIB instances

Problem # var # con # soc solved/ total SQP iter warm/
subset total iter iter iter cold
10/20* 3024.0 6030.6 336.0 10/11 6.5 6.5 0.57
achtziger_* 2314.0 1856.7 257.0 6/7 21.3 10.2 0.53
as_* 4909.0 5695.7 669.0 20/20 10.4 8.4 0.37
ck_* 2376.0 2375.0 15.0 90/90 5.8 4.6 0.40
classical_* 159.3 200.4 1.0 409/409 7.3 7.3 0.49
clay* 457.2 691.9 82.9 11/12 67.6 13.2 1.31
estein* 103.1 165.2 14.0 9/9 18.3 4.4 0.30
flay* 175.0 408.0 4.0 10/10 7.0 3.6 0.34
fo[7-9]* 208.6 540.2 15.9 19/19 12.3 3.0 0.53
m[3-9]* 150.2 370.5 12.8 8/8 10.9 4.0 0.44
nb* 3098.8 321.0 816.0 4/4 4.8 4.2 3.43
netmod* 989.7 2593.0 4.7 3/3 4.0 4.0 0.33
no7_* 169.0 438.0 14.0 5/5 10.6 4.4 0.46
o[7-9]_* 183.2 466.9 14.7 9/9 8.7 3.2 0.66
pp-* 2960.0 2221.0 370.0 6/6 12.8 12.8 0.17
robust_* 253.9 298.3 2.0 420/420 6.8 6.8 0.67
sched_* 7361.5 3685.0 1.5 4/4 12.5 10.0 0.35
shortfall_* 249.9 294.3 2.0 420/420 7.8 7.8 0.37
slay* 393.0 936.0 14.0 14/14 8.6 7.0 0.27
sssd-* 286.0 314.5 18.0 16/16 7.6 5.6 0.42
stolpe07-* 1483.0 1202.0 164.7 3/3 29.0 18.3 0.71
tls* 521.4 989.8 51.6 5/6 7.6 7.0 0.49
turbine* 201.6 289.3 52.4 7/7 4.9 4.9 0.30
uflquad-* 8011.0 6811.0 1333.3 3/4 15.3 14.7 0.19
wiener_* 2297.5 2508.5 71.3 41/45 12.4 9.9 0.43
misc 1534.1 1374.5 158.9 11/14 10.5 7.6 0.33
Table 5: Results for CBLIB instances, averaged per problem group, ϵtol=105\epsilon_{\text{tol}}=10^{-5}. “Problem subset”: name of problem group; “# var”: number of variables; “# con”: number of linear constraints; “# soc”: number of second-order cone constraints; “solved/total”: number of solved vs. total instances; “total iter”: number of iterations in Algorithm 3; “SQP iter”: number of iterations in which NLP-SQP step was accepted; “iter warm/iter cold”: iterations for warm start divided by iterations for cold start (only for instances solved in both settings).

To demonstrate the robustness of the algorithm we also solved instances from the Conic Benchmark Library CBLIB [32]. Some instances involve rotated second-order cone constraints, and we reformulated them so that they fit into our standard form (1). We chose all 1,575 instances with at most 10,000 variables and 10,000 constraints. Integer variables were relaxed to be continuous.
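For reference, one standard way to express a rotated second-order cone constraint $2uv\geq\|w\|^2$ with $u,v\geq 0$ in the form (2) is the identity

$2uv\geq\|w\|^{2},\;u,v\geq 0\quad\Longleftrightarrow\quad\left\|\begin{pmatrix}u-v\\ \sqrt{2}\,w\end{pmatrix}\right\|\leq u+v,$

so that the linear change of variables $y_0=u+v$, $\bar{y}=(u-v,\sqrt{2}\,w^T)^T$ yields a constraint $y\in\mathcal{K}$ of dimension $\dim(w)+2$. The paper does not spell out which variant of this transformation its preprocessing uses; this is one common choice.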

Table 5 shows the statistics for the starting point x0=0x^{0}=0, where instances were grouped into sets of problems with similar names, and the remaining ones are collected in misc. The method was able to solve 99.2% of the instances, where 10 problems could not be solved due to failures of the QP subproblem solver, and Algorithm 3 exceeded the maximum number of 200 iterations in 2 cases. In comparison, MOSEK, with default settings, failed on 5 instances (those were solved correctly with Algorithm 3), incorrectly declared 3 instances to be infeasible, and labeled 6 instances to be unbounded (of which 3 were solved by Algorithm 3). We observed that some instances, especially those in the clay*, fo[7-9]*, m[3-9]*, no7*, o[7-9]_* subsets, are degenerate, having an optimal objective function value of 0, and the assumption necessary to prove fast local convergence is violated. This matches the observations in the table, where the SQP step was accepted only in a relatively small fraction of the iterations.

To showcase the warm-starting feature of the algorithm, we took the 1,563 previously solved instances, perturbed 10% of the entries of the final primal-dual iterate by random noise chosen uniformly from [-0.1,0.1], and used the result as the starting point for a warm-started run. Here, the QP subproblem solver failed in 3 cases and 2 instances exceeded the iteration limit. As the last column of Table 5 shows, warm starting reduced the number of iterations for most problem groups, often by more than 50%.
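
As an illustrative sketch only (this is not the Matlab prototype used for the experiments, and the function and variable names below are placeholders), the construction of such a perturbed warm-start point from a stacked primal-dual vector can be written as:

```python
import numpy as np

def perturb_iterate(w, fraction=0.1, radius=0.1, seed=0):
    """Perturb a random `fraction` of the entries of the stacked primal-dual
    iterate w by additive uniform noise in [-radius, radius] (sketch only)."""
    rng = np.random.default_rng(seed)
    w_pert = np.asarray(w, dtype=float).copy()
    k = max(1, int(round(fraction * w_pert.size)))
    idx = rng.choice(w_pert.size, size=k, replace=False)
    w_pert[idx] += rng.uniform(-radius, radius, size=k)
    return w_pert

# w0 = np.concatenate([x_opt, lam_opt, z_opt])   # primal-dual solution of the first run
# warm_start = perturb_iterate(w0)               # starting point for the warm-started run
```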

6 Concluding remarks

We presented an SQP algorithm for solving SOCPs and proved that it converges from any starting point and achieves local quadratic convergence for non-degenerate SOCPs. Our numerical experiments indicate that the algorithm is reliable, converges quickly when a good starting point is available, and produces more accurate solutions than a state-of-the-art interior-point solver.

Future research will investigate whether the proposed algorithm is a valuable alternative to interior-point methods for small problems or for the solution of sequences of related SOCPs. An efficient implementation of the algorithm beyond our Matlab prototype would be tightly coupled with a tailored active-set QP solver that efficiently adds or removes cuts instead of solving each QP essentially from scratch. Parametric active-set solvers such as qpOASES [8] or QORE [26] might be suitable options since they do not require primal or dual feasible starting points.

Acknowledgments

We thank Javier Peña for his suggestion for the proof of Lemma 4.8, as well as three referees whose comments helped us to improve the paper.

References

  • [1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Mathematical Programming, 95(1):3–51, 2003.
  • [2] A. Ben-Tal and A. Nemirovski. On polyhedral approximations of the second-order cone. Mathematics of Operations Research, 26(2):193–205, 2001.
  • [3] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, 4:1–51, 1995.
  • [4] C. Coey, M. Lubin, and J. P. Vielma. Outer approximation with conic certificates for mixed-integer convex problems. Mathematical Programming Computation, 12(2):249–293, 2020.
  • [5] A. R. Conn, N. I. Gould, and P. L. Toint. Trust region methods. SIAM, 2000.
  • [6] M. Diehl, F. Jarre, and C. H. Vogelbusch. Loss of superlinear convergence for an SQP-type method with conic constraints. SIAM Journal on Optimization, 16(4):1201–1210, 2006.
  • [7] S. Drewes and S. Ulbrich. Subgradient based outer approximation for mixed integer second order cone programming. In Mixed integer nonlinear programming, pages 41–59. Springer, 2012.
  • [8] H. J. Ferreau, C. Kirches, A. Potschka, H. G. Bock, and M. Diehl. qpOASES: A parametric active-set algorithm for quadratic programming. Mathematical Programming Computation, 6:327–363, 2014.
  • [9] R. Fletcher. Second order corrections for non-differentiable optimization. In Numerical analysis, pages 85–114. Springer, 1982.
  • [10] N. Goldberg and S. Leyffer. An active-set method for second-order conic-constrained quadratic programming. SIAM Journal on Optimization, 25(3):1455–1477, 2015.
  • [11] N. Goswami, S. K. Mondal, and S. Paruya. A comparative study of dual active-set and primal-dual interior-point method. IFAC Proceedings Volumes, 45(15), 2012.
  • [12] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2022.
  • [13] S. Hayashi, T. Okuno, and Y. Ito. Simplex-type algorithm for second-order cone programmes via semi-infinite programming reformulation. Optimization Methods and Software, 31(6):1272–1297, 2016.
  • [14] L. Hei, J. Nocedal, and R. A. Waltz. A numerical study of active-set and interior-point methods for bound constrained optimization. In Modeling, Simulation and Optimization of Complex Processes: Proceedings of the Third International Conference on High Performance Scientific Computing, March 6–10, 2006, Hanoi, Vietnam, pages 273–292. Springer, 2008.
  • [15] IBM ILOG. User’s Manual for CPLEX, 2019.
  • [16] H. Kato and M. Fukushima. An SQP-type algorithm for nonlinear second-order cone programs. Optimization Letters, 1(2):129–144, 2007.
  • [17] M. S. K. Lau, S. P. Yue, K. V. Ling, and J. M. Maciejowski. A comparison of interior point and active set methods for fpga implementation of model predictive control. In 2009 European Control Conference (ECC), pages 156–161, 2009.
  • [18] X. Luo and A. Wächter. A Quadratically Convergent Sequential Programming Method for Second-Order Cone Programs Capable of Warm Starts, 2022. arXiv:2207.03081.
  • [19] N. Maratos. Exact penalty function algorithms for finite dimensional and control optimization problems. PhD thesis, Imperial College London (University of London), 1978.
  • [20] D. K. Molzahn and I. A. Hiskens. A survey of relaxations and approximations of the power flow equations. Foundations and Trends in Electric Energy Systems, 4(1-2):1–221, 2019.
  • [21] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 9.1., 2019.
  • [22] M. Nayakakuppam, M. Overton, and S. Schemita. SDPpack user’s guide-version 0.9 Beta, 1997.
  • [23] J. Nocedal and S. Wright. Numerical optimization. Springer, 2006.
  • [24] T. Okuno, K. Yasuda, and S. Hayashi. Sl1QP based algorithm with trust region technique for solving nonlinear second-order cone programming problems. Interdisciplinary Information Sciences, 21(2):97–107, 2015.
  • [25] F. A. Potra and S. J. Wright. Interior-point methods. Journal of Computational and Applied Mathematics, 124(1-2):281–302, 2000.
  • [26] L. Schork. A parametric active set method for general quadratic programming. Master’s thesis, Heidelberg University, Germany, 2015.
  • [27] P. Tseng. Second-order cone programming relaxation of sensor network localization. SIAM Journal on Optimization, 18(1):156–185, 2007.
  • [28] R. J. Vanderbei and H. Yurttan. Using LOQO to solve second-order cone programming problems. Technical report, Princeton University, 1998.
  • [29] A. Wächter and L. T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.
  • [30] V. Zhadan. The variant of primal simplex-type method for linear second-order cone programming. In Optimization and Applications: 12th International Conference, OPTIMA 2021, Petrovac, Montenegro, September 27–October 1, 2021, Proceedings, pages 64–75. Springer, 2021.
  • [31] X. Zhang, Z. Liu, and S. Liu. A trust region SQP-filter method for nonlinear second-order cone programming. Computers & Mathematics with Applications, 63(12):1569–1576, 2012.
  • [32] Zuse Institute Berlin. CBLIB - The Conic Benchmark Library. https://cblib.zib.de/.

Appendix A Generation of random instances

The test instances used for the numerical experiments have the form

minxn\displaystyle{\displaystyle\min_{x\in\mathbb{R}^{n}}} j=1pcjTxj+cp+1Txp+1\displaystyle\sum_{j=1}^{p}c_{j}^{T}x_{j}+c_{p+1}^{T}x_{p+1}
s.t. j=1pAjxj+Ap+1xp+1=b,\displaystyle\sum_{j=1}^{p}A_{j}x_{j}+A_{p+1}x_{p+1}=b,
xj𝒦j,j[p],\displaystyle x_{j}\in\mathcal{K}_{j},\quad j\in[p],
xj01000,0xp+11000,\displaystyle x_{j0}\leq 1000,\quad 0\leq x_{p+1}\leq 1000,

where (c_{1},\ldots,c_{p},c_{p+1})\in\mathbb{R}^{n_{1}}\times\cdots\times\mathbb{R}^{n_{p}}\times\mathbb{R}^{n_{p+1}} is the partition of the objective vector c, and (A_{1},\ldots,A_{p+1}) is the corresponding partition of the columns of the constraint matrix A\in\mathbb{R}^{m\times n}. The subvector x_{p+1} collects all optimization variables that do not belong to any of the cones.

In contrast to the original SOCP (1), this formulation includes linear equality constraints, but it is straightforward to generalize Algorithm 3 and its convergence proofs to this setting. The formulation also imposes large upper bounds on all variables so that the set X defined in (9) is bounded; these bounds were chosen so large that they are not active at the optimal solution.

Algorithm 4 Random Instance Generation
1: nn (number of variables), mm (number of linear constraints), pp (total number of cones), K0K_{0} (number of extremal-active cones), KIK_{I} (number of inactive cones), KBK_{B} (number of cones active at the boundary, excluding extremal-active cones), dd (density of nonzeros in constraint matrix). Conditions: n,m,p,K0,KI,KB+n,m,p,K_{0},K_{I},K_{B}\in\mathbb{Z}_{+}, d(0,1]d\in(0,1], p=K0+KI+KBp=K_{0}+K_{I}+K_{B}, n>2pn>2p, n>m+KB+(m+KB)/2n>m+K_{B}+\lfloor(m+K_{B})/2\rfloor.
2:Randomly choose positive integers n1,,nK0n_{1},\ldots,n_{K_{0}} so that nj2n_{j}\geq 2 and j=1K0nj=(m+KB)/2\sum_{j=1}^{K_{0}}n_{j}=\lfloor(m+K_{B})/2\rfloor.
3:Randomly choose positive integers nK0+1,,npn_{K_{0}+1},\ldots,n_{p} so that nj2n_{j}\geq 2 and j=K0+1pnj=m+KB\sum_{j=K_{0}+1}^{p}n_{j}=m+K_{B}.
4:for j=1,,K0j=1,\ldots,K_{0} do \triangleright Generate xj=0x_{j}^{*}=0, zjint(𝒦j)z_{j}^{*}\in\operatorname{int}(\mathcal{K}_{j})
5:     Set xj0x_{j}^{*}\leftarrow 0.
6:     Sample \bar{z}_{j}\sim(-10,10)^{n_{j}-1}, z_{j0}^{*}\sim(1,5), and \epsilon_{j}\sim(0,1). Set \bar{z}_{j}^{*}\leftarrow\frac{z^{*}_{j0}}{(2+\epsilon_{j})\|\bar{z}_{j}\|}\bar{z}_{j}, so that \|\bar{z}_{j}^{*}\|<z_{j0}^{*}.
7:end for
8:for j=K0+1,,K0+KIj=K_{0}+1,\ldots,K_{0}+K_{I} do \triangleright Generate zj=0z_{j}^{*}=0, xjint(𝒦j)x_{j}^{*}\in\operatorname{int}(\mathcal{K}_{j})
9:     Set zj0z_{j}^{*}\leftarrow 0.
10:     Sample \bar{x}_{j}\sim(-10,10)^{n_{j}-1}, x_{j0}^{*}\sim(1,5), and \epsilon_{j}\sim(0,1). Set \bar{x}_{j}^{*}\leftarrow\frac{x^{*}_{j0}}{(2+\epsilon_{j})\|\bar{x}_{j}\|}\bar{x}_{j}, so that \|\bar{x}_{j}^{*}\|<x_{j0}^{*}.
11:end for
12:for j=K0+KI+1,,pj=K_{0}+K_{I}+1,\ldots,p do\triangleright Generate zj,xjbd(𝒦j)z_{j}^{*},x_{j}^{*}\in\operatorname{bd}(\mathcal{K}_{j})
13:     Sample x¯j(10,10)nj1\bar{x}_{j}\sim(-10,10)^{n_{j}-1} and xj0(1,5)x_{j0}^{*}\sim(1,5). Set x¯jxj0x¯jx¯j\bar{x}_{j}^{*}\leftarrow\frac{x^{*}_{j0}}{\|\bar{x}_{j}\|}\bar{x}_{j}.
14:     Sample \beta\sim(1,5). Set \bar{z}^{*}_{j}\leftarrow-\beta\bar{x}_{j}^{*} and z^{*}_{j0}\leftarrow\beta x_{j0}^{*}.
15:end for
16:Set nfixnmKBj=1K0njn_{\rm fix}\leftarrow n-m-K_{B}-\sum_{j=1}^{K_{0}}n_{j} and nfreennfixj=1pnjn_{\rm free}\leftarrow n-n_{\rm fix}-\sum_{j=1}^{p}n_{j}.
17:for j=1,,nfixj=1,\ldots,n_{\rm fix} do
18:     Set [x^{*}_{p+1}]_{j}\leftarrow 0 and sample [z^{*}_{p+1}]_{j}\sim(1,5).
19:end for
20:for j=n_{\rm fix}+1,\ldots,n_{\rm fix}+n_{\rm free} do
21:     Set [z^{*}_{p+1}]_{j}\leftarrow 0 and sample [x^{*}_{p+1}]_{j}\sim(1,5).
22:end for
23:Sample linearly independent rows A_{i}\sim(-5,5)^{n} with density d, for i=1,\ldots,m.
24:Call pcond to calculate primal condition number cpc_{p} and call dcond to calculate dual condition number cdc_{d}.
25:if cp>105c_{p}>10^{5} or cd>105c_{d}>10^{5} then go to Step 3 and redo the process.
26:Sample λ(1,10)m\lambda^{*}\sim(1,10)^{m}. Set bAxb\leftarrow Ax^{*} and cATλ+zc\leftarrow-A^{T}\lambda^{*}+z^{*}.
27:return A,b,c,x,λ,zA,b,c,x^{*},\lambda^{*},z^{*}.

Algorithm 4 describes how the data for these instances are generated. The algorithm constructs an optimal primal-dual solution such that, at this solution, K_{0} cones are extremal-active, for K_{I} cones the solution lies in the interior of the cone, and for K_{B} cones the solution lies on the boundary of the cone but not at its extreme point. In our experiments, each type occurs equally often, K=K_{0}=K_{I}=K_{B}. The number of variables fixed at their lower bound, n_{\rm fix}, is chosen so that the optimal solution is non-degenerate: the linear constraints remove m degrees of freedom, each extremal-active cone removes n_{j}, and each otherwise active cone removes 1, so Step 16 determines n_{\rm fix} such that the total number of degrees of freedom fixed by active constraints equals n, i.e., m+\sum_{j=1}^{K_{0}}n_{j}+K_{B}+n_{\rm fix}=n. Lastly, in Step 24, we use the pcond and dcond functions provided by SDPpack [22] to double-check that a generated instance is also numerically non-degenerate. These functions return the primal and dual condition numbers, c_{p} and c_{d}, respectively, and we discard an instance if either number exceeds a threshold. Only about 1% of the generated instances were excluded in this manner.
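
For illustration only (this is not the Matlab implementation used in the experiments; the function name, variable names, and random-number generator are our own choices), the following Python sketch mimics Steps 12-15 of Algorithm 4 and produces a pair of vectors on the boundary of a second-order cone that satisfies complementarity:

```python
import numpy as np

def boundary_pair(nj, rng):
    """Generate x, z in R^nj on the boundary of the second-order cone with
    x^T z = 0, following the pattern of Steps 12-15 of Algorithm 4 (sketch).
    Vectors are ordered as (y0, ybar)."""
    xbar = rng.uniform(-10.0, 10.0, nj - 1)      # random direction for the bar part
    x0 = rng.uniform(1.0, 5.0)                   # first cone component
    xbar = x0 * xbar / np.linalg.norm(xbar)      # rescale so ||xbar|| = x0 (boundary)
    beta = rng.uniform(1.0, 5.0)
    zbar = -beta * xbar                          # bar part points in the opposite direction
    z0 = beta * x0                               # ||zbar|| = z0, so z is on the boundary too
    return np.concatenate(([x0], xbar)), np.concatenate(([z0], zbar))

rng = np.random.default_rng(0)
x, z = boundary_pair(4, rng)
# x^T z = beta*x0^2 - beta*||xbar||^2 = 0 up to rounding
print(np.isclose(x @ z, 0.0), np.isclose(np.linalg.norm(x[1:]), x[0]))
```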