
A Quadratically Convergent Sequential Programming Method for Second-Order Cone Programs Capable of Warm Starts

Xinyi Luo Department of Industrial Engineering and Management Sciences, Northwestern University. This author was partially supported by National Science Foundation grant DMS-2012410. E-mail: xinyiluo2023@u.northwestern.edu    Andreas Wächter Department of Industrial Engineering and Management Sciences, Northwestern University. This author was partially supported by National Science Foundation grant DMS-2012410. E-mail: andreas.waechter@northwestern.edu
Abstract

We propose a new method for linear second-order cone programs. It is based on the sequential quadratic programming framework for nonlinear programming. In contrast to interior point methods, it can capitalize on the warm-start capabilities of active-set quadratic programming subproblem solvers and achieve a local quadratic rate of convergence.

In order to overcome the non-differentiability or singularity observed in nonlinear formulations of the conic constraints, the subproblems approximate the cones with polyhedral outer approximations that are refined throughout the iterations. For nondegenerate instances, the algorithm implicitly identifies the set of cones for which the optimal solution lies at the extreme points. As a consequence, the final steps are identical to regular sequential quadratic programming steps for a differentiable nonlinear optimization problem, yielding local quadratic convergence.

We prove the global and local convergence guarantees of the method and present numerical experiments that confirm that the method can take advantage of good starting points and can achieve higher accuracy compared to a state-of-the-art interior point solver.

Keywords: nonlinear optimization, second-order cone programming, sequential quadratic programming

AMS subject classifications: 90C15, 90C30, 90C55

1 Introduction

We are interested in the solution of second-order cone programs (SOCPs) of the form

(1a) \min_{x\in\mathbb{R}^{n}}\ c^{T}x
(1b) \text{s.t.}\quad Ax\leq b,
(1c) x_{j}\in\mathcal{K}_{j},\qquad j\in\mathcal{J}:=\{1,\ldots,p\},

where $c\in\mathbb{R}^{n}$, $b\in\mathbb{R}^{m}$, $A\in\mathbb{R}^{m\times n}$, and $x_{j}$ is a subvector of $x$ of dimension $n_{j}$ with index set $\mathcal{I}_{j}\subseteq\{1,\ldots,n\}$. We assume that the sets $\mathcal{I}_{j}$ are disjoint. The set $\mathcal{K}_{j}$ is the second-order cone of dimension $n_{j}$, i.e.,

(2) \mathcal{K}_{j}:=\{y\in\mathbb{R}^{n_{j}}:\|\bar{y}\|\leq y_{0}\},

where the vector $y$ is partitioned into $y=(y_{0},\bar{y}^{T})^{T}$ with $\bar{y}=(\bar{y}_{1},\ldots,\bar{y}_{n_{j}-1})^{T}$. These problems arise in a number of important applications [1, 2, 20, 27].
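To make the membership condition in (2) concrete, the following small Python/numpy check (purely illustrative, not part of the proposed method) tests whether a given vector lies in a second-order cone:

import numpy as np

def in_second_order_cone(y):
    # y = (y_0, y_bar); membership in K_j means ||y_bar|| <= y_0, cf. (2)
    return np.linalg.norm(y[1:]) <= y[0]

print(in_second_order_cone(np.array([2.0, 1.0, -1.0])))   # True:  sqrt(2) <= 2
print(in_second_order_cone(np.array([1.0, 2.0,  0.0])))   # False: 2 > 1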

Currently, most of the commercial software for solving SOCPs implements interior-point algorithms, which utilize a barrier function for second-order cones; see, e.g., [12, 15, 21]. Interior-point methods have well-established global and local convergence guarantees [25] and are able to solve large-scale instances, but they cannot take as much advantage of a good estimate of the optimal solution as would be desirable in many situations. For example, in certain applications, such as online optimal control, the same optimization problem has to be solved over and over again, with slightly modified data. In such a case, the optimal solution of one problem provides a good approximation for the new instance. Having a solver that is capable of “warm-starts”, i.e., utilizing this knowledge, can be essential when many similar problems have to be solved in a small amount of time.

For some problem classes, including linear programs (LPs), quadratic programs (QPs), and nonlinear programs (NLPs), active-set methods offer suitable alternatives to interior-point methods. They explicitly identify the set of constraints that are active (binding) at the optimal solution. When these methods are started from a guess of the active set that is close to the optimal one, they often converge rapidly in a small number of iterations. An example of this is the simplex method for LPs. Its warm-start capabilities are indispensable for efficient branch-and-bound algorithms for mixed-integer linear programs.

Active-set methods for LPs, QPs, or NLPs are also known to outperform interior-point algorithms for problems that are not too large [11, 14, 17]. Similarly, active-set methods might be preferable when there is a large number of inequality constraints among which only a few are active, since an interior-point method is designed to consider all inequality constraints in every iteration and consequently solves large linear systems, whereas an active-set method can ignore all inactive inequality constraints and encounters potentially much smaller linear systems.

Our goal is to propose an active-set alternative to the interior-point method in the context of SOCP that might provide similar benefits. We introduce a new sequential quadratic programming (SQP) algorithm that, in contrast to interior-point algorithms for SOCPs, has favorable warm-starting capabilities because it can utilize active-set QP solvers. We prove that it is globally convergent, i.e., all limit points of the generated iterates are optimal solutions under mild assumptions, and that it enjoys a quadratic convergence rate for non-degenerate instances. Our preliminary numerical experiments demonstrate that these theoretical properties are indeed observed in practice. They also show that the algorithm is able in some cases to compute a solution to a higher degree of precision than interior point methods. This is expected, again in analogy to the context of LPs, QPs, and NLPs, since an interior point method terminates at a small, but nonzero value of the barrier parameter that cannot be made smaller than some threshold (typically 10610^{-6} or 10810^{-8}) because the arising linear systems become highly ill-conditioned. In contrast, in the final iteration of the active-set method, the linear systems solved correspond directly to the optimality conditions, without any perturbation introduced by a barrier parameter, and are only as degenerate as the optimal solution of the problem.

The paper is structured as follows. Section 2 reviews the sequential quadratic programming method and the optimality conditions of SOCPs. Section 3 describes the algorithm, which is based on an outer approximation of the conic constraints. Section 4 establishes the global and local convergence properties of the method, and numerical experiments are reported in Section 5. Concluding remarks are offered in Section 6.

1.1 Related work

While a large number of interior-point algorithms for SOCP have been proposed, including some that have been implemented in efficient optimization packages [12, 15, 21], there are only a few approaches for solving SOCPs within an active-set framework. The method proposed by Goldberg and Leyffer [10] is a two-phase algorithm that combines a projected-gradient method with equality-constrained SQP. However, it is limited to instances that have only conic constraints (1c) and no additional linear constraints (1b). Hayashi et al. [13] propose a simplex-type method, where they reformulate the SOCP as a linear semi-infinite program to handle the fact that these instances have infinitely many extreme points. The resulting dual-simplex exchange method shows promising practical behavior. However, in contrast to the method proposed here, the authors conjecture that their method has only an R-linear local convergence rate. Zhadan [30] proposes a similar simplex-type method. Another advantage of the method presented in this paper is that the pivoting algorithm does not need to be designed and implemented from scratch. Instead, it can leverage existing implementations of active-set QP solvers, in particular the efficient handling of linear systems.

The proposed algorithm relies on polyhedral outer approximations based on well-known cutting planes for SOCPs. For instance, the methods for mixed-integer SOCP by Drewes and Ulbrich [7] and Coey et al. [4] use these cutting planes to build LP relaxations of the branch-and-bound subproblems. We note that an LP-based cutting plane algorithm for SOCP could be seen as an active-set method, but it is only linearly convergent. As pointed out in [6], it is crucial to consider the curvature of the conic constraint in the subproblem objective to achieve fast convergence.

The term “SQP method for SOCP” has also been used in the literature to refer to methods for solving nonlinear SOCPs [6, 16, 24, 31]. However, in contrast to the method here, in these approaches, the subproblems themselves are SOCPs (1) and include the linearization of the nonlinear objective and constraints. It will be interesting to explore extensions of the proposed method to nonlinear SOCPs in which feasibility is achieved asymptotically not only for the nonlinear constraints but also for the conic constraints.

1.2 Notation

For two vectors $x,y\in\mathbb{R}^{n}$, we denote with $x\circ y$ their component-wise product, and the condition $x\perp y$ stands for $x^{T}y=0$. For $x\in\mathbb{R}^{n}$, we define $[x]^{+}$ as the vector with entries $\max\{x_{i},0\}$. We denote by $\|\cdot\|$, $\|\cdot\|_{1}$, $\|\cdot\|_{\infty}$ the Euclidean norm, the $\ell_{1}$-norm, and the $\ell_{\infty}$-norm, respectively. For a cone $\mathcal{K}_{j}$, $e_{ji}\in\mathbb{R}^{n_{j}}$ is the canonical basis vector with $1$ in the element corresponding to $x_{ji}$ for $i\in\{0,\ldots,n_{j}-1\}$, and $\operatorname{int}(\mathcal{K}_{j})$ and $\operatorname{bd}(\mathcal{K}_{j})$ denote the cone’s interior and boundary, respectively.

2 Preliminaries

The NLP reformulation of the SOCP is introduced in Section 2.1. We review in Section 2.2 the local convergence properties of the SQP method and in Section 2.3 the penalty function as a means to promote convergence from any starting point. In Section 2.4, we briefly state the optimality conditions and our assumption for the SOCP (1).

2.1 Reformulation as a smooth optimization problem

The definition of the second-order cone in (2) suggests that the conic constraint (1c) can be replaced by the nonlinear constraint

r_{j}(x_{j}):=\|\bar{x}_{j}\|-x_{j0}\leq 0

without changing the set of feasible points. Consequently, (1) is equivalent to

(3a) \min_{x\in\mathbb{R}^{n}}\ c^{T}x
(3b) \text{s.t.}\quad Ax\leq b,
(3c) r_{j}(x_{j})\leq 0,\qquad j\in\mathcal{J}.

Unfortunately, (3) cannot be solved directly with standard gradient-based algorithms for nonlinear optimization, such as SQP methods. The reason is that $r_{j}$ is not differentiable whenever $\bar{x}_{j}=0$. This is particularly problematic when the optimal solution $x^{*}$ of the SOCP lies at the extreme point of a cone, $x^{*}_{j}=0\in\mathcal{K}_{j}$. In that case, the Karush-Kuhn-Tucker (KKT) necessary optimality conditions for the NLP formulation, which are expressed in terms of derivatives, cannot be satisfied. Therefore, any optimization algorithm that seeks KKT points cannot succeed. As a remedy, differentiable approximations of $r_{j}$ have been proposed in the past; see, for example, [28]. However, high accuracy comes at the price of high curvature, which can make finding the numerical solution of the NLP difficult.

An alternative equivalent reformulation of the conic constraint is given by

\|\bar{x}_{j}\|^{2}-x_{j0}^{2}\leq 0\quad\text{and}\quad x_{j0}\geq 0.

In this case, the constraint function is differentiable. But if $x_{j}^{*}=0$, its gradient vanishes, and as a consequence, no constraint qualification applies and the KKT conditions do not hold. Therefore, again, a gradient-based method cannot be employed. By using an outer approximation of the cones that is improved in the course of the algorithm, our proposed variation of the SQP method is able to avoid these kinds of degeneracy.

To facilitate the discussion we define a point-wise partition of the cones.

Definition 2.1.

Let $x\in\mathbb{R}^{n}$.

  1. We call a cone $\mathcal{K}_{j}$ extremal-active at $x$, if $x_{j}=0$, and we denote with $\mathcal{E}(x)=\{j\in\mathcal{J}:x_{j}=0\}$ the set of extremal-active cones at $x$.

  2. We define the set $\mathcal{D}(x)=\{j\in\mathcal{J}:\bar{x}_{j}\neq 0\}$ as the set of all cones for which the function $r_{j}$ is differentiable at $x$.

  3. We define the set $\mathcal{N}(x)=\{j\in\mathcal{J}:x_{j}\neq 0\text{ and }\bar{x}_{j}=0\}$ as the set of all cones that are not extremal-active and for which $r_{j}$ is not differentiable at $x$.

If the set $\mathcal{E}(x^{*})$ at an optimal solution $x^{*}$ were known in advance, we could compute $x^{*}$ as a solution of (1) by solving the NLP

(4a) \min_{x\in\mathbb{R}^{n}}\ c^{T}x
(4b) \text{s.t.}\quad Ax\leq b,
(4c) r_{j}(x_{j})\leq 0,\qquad j\in\mathcal{D}(x^{*}),
(4d) x_{j}=0,\hspace{3.9em} j\in\mathcal{E}(x^{*}).

The nonlinear constraints (4c) involving $r_{j}$ are imposed only if $r_{j}$ is differentiable at $x^{*}$, and variables in cones that are extremal-active at $x^{*}$ are explicitly fixed to zero by (4d). With this, locally around $x^{*}$, all functions in (4) are differentiable and we could apply standard second-order algorithms to achieve fast local convergence.

In (4), we omitted the cones in $\mathcal{N}(x^{*})$. If $x^{*}$ is feasible for the SOCP and $j\in\mathcal{N}(x^{*})$, we have $\bar{x}^{*}_{j}=0$ and $x^{*}_{j0}>0$, and so $r_{j}(x^{*}_{j})<0$. This implies that the nonlinear constraint (4c) for this cone is not active and we can omit it from the problem statement without impacting the optimal solution.

2.2 Local convergence of SQP methods

The proposed algorithm is designed to guide the iterates $x^{k}$ into the neighborhood of an optimal solution $x^{*}$. If the optimal solution is not degenerate and the iterates are sufficiently close to $x^{*}$, the steps generated by the algorithm are eventually identical to the steps that the SQP method would take for solving the differentiable optimization problem (4). In this section, we review the mechanisms and convergence results of the basic SQP method [23].

At an iterate $x^{k}$, the basic SQP method, applied to (4), computes a step $d^{k}$ as an optimal solution to the QP subproblem

(5a) \min_{d\in\mathbb{R}^{n}}\ c^{T}d+\tfrac{1}{2}d^{T}H^{k}d
(5b) \text{s.t.}\quad A(x^{k}+d)\leq b,
(5c) r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}\leq 0,\qquad j\in\mathcal{D}(x^{*}),
(5d) x_{j}^{k}+d_{j}=0,\hspace{7.1em} j\in\mathcal{E}(x^{*}).

Here, $H^{k}$ is the Hessian of the Lagrangian function for (4), which in our case is

(6) H^{k}=\sum_{j\in\mathcal{D}(x^{*})}\mu^{k}_{j}\nabla^{2}_{xx}r_{j}(x_{j}^{k}),

where $\mu_{j}^{k}\geq 0$ are estimates of the optimal multipliers for the nonlinear constraint (4c), and where $\nabla^{2}_{xx}r_{j}(x_{j})$ is the $n\times n$ block-diagonal matrix with

(7) \nabla^{2}r_{j}(x_{j})=\begin{bmatrix}0&0\\ 0&\frac{1}{\|\bar{x}_{j}\|}I-\frac{\bar{x}_{j}\bar{x}_{j}^{T}}{\|\bar{x}_{j}\|^{3}}\end{bmatrix}

in the rows and columns corresponding to $x_{j}$ for $j\in\mathcal{J}$. It is easy to see that $\nabla^{2}r_{j}(x_{j})$ is positive semi-definite. The estimates $\mu_{j}^{k}$ are updated based on the optimal multipliers $\hat{\mu}_{j}^{k}\geq 0$ corresponding to (5c).
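For illustration, the derivatives of $r_{j}$ used in (6) and (7) can be evaluated as in the following numpy sketch (the gradient formula is the one given later in (15); this code is an illustration only, valid when $\bar{x}_{j}\neq 0$):

import numpy as np

def grad_r(xj):
    # gradient of r_j(x_j) = ||x_bar|| - x_{j0} for x_bar != 0, cf. (15)
    xbar = xj[1:]
    return np.concatenate(([-1.0], xbar / np.linalg.norm(xbar)))

def hess_r(xj):
    # (7): zero row/column for x_{j0}, and I/||x_bar|| - x_bar x_bar^T/||x_bar||^3 otherwise
    xbar = xj[1:]
    nrm = np.linalg.norm(xbar)
    H = np.zeros((xj.size, xj.size))
    H[1:, 1:] = np.eye(xbar.size) / nrm - np.outer(xbar, xbar) / nrm**3
    return H   # positive semi-definite whenever x_bar != 0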

Algorithm 1 Basic SQP Algorithm
1: Initial iterate $x^{0}$ and multiplier estimates $\lambda^{0}$, $\mu^{0}$, and $\eta^{0}$.
2: for $k=0,1,2,\ldots$ do
3:     Compute $H^{k}$ from (6).
4:     Solve QP (5) to get step $d^{k}$ and multipliers $\hat{\lambda}^{k}$, $\hat{\mu}_{j}^{k}$, and $\hat{\eta}^{k}$.
5:     Set $x^{k+1}\leftarrow x^{k}+d^{k}$ and $\mu_{j}^{k+1}\leftarrow\hat{\mu}_{j}^{k}$ for all $j\in\mathcal{D}(x^{*})$.
6: end for

Algorithm 1 formally states the basic SQP method, where $\hat{\lambda}^{k}\geq 0$ and $\hat{\eta}^{k}$ denote the multipliers corresponding to (4b) and (4d), respectively. Because we are only interested in the behavior of the algorithm when $x^{k}$ is close to $x^{*}$, we assume here that $\bar{x}_{j}^{k}\neq 0$ for all $j\in\mathcal{D}(x^{*})$ and for all $k$, and hence the gradient and Hessian of $r_{j}$ can be computed. Note that the iterates $\hat{\lambda}^{k}$ and $\hat{\eta}^{k}$ are not explicitly needed in Algorithm 1, but they are necessary to measure the optimality error and define the primal-dual iterate sequence that is analyzed in Theorem 2.2.

A fast rate of convergence can be proven under the following sufficient second-order optimality assumptions [3].

Assumption 1.

Suppose that $x^{*}$ is an optimal solution of the NLP (4) with corresponding KKT multipliers $\lambda^{*}$, $\mu^{*}$, and $\eta^{*}$, satisfying the following properties:

  (i) Strict complementarity holds;

  (ii) the linear independence constraint qualification (LICQ) holds at $x^{*}$, i.e., the gradients of the constraints that hold with equality at $x^{*}$ are linearly independent;

  (iii) the projection of the Lagrangian Hessian $H^{*}=\sum_{j\in\mathcal{D}(x^{*})}\mu^{*}_{j}\nabla^{2}_{xx}r_{j}(x_{j}^{*})$ onto the null space of the gradients of the active constraints is positive definite.

Under these assumptions, the basic SQP algorithm reduces to Newton’s method applied to the optimality conditions of (4) and the following result holds [23].

Theorem 2.2.

Suppose that Assumption 1 holds and that the initial iterate $x^{0}$ and multipliers $\mu^{0}$ (used in the Hessian calculation) are sufficiently close to $x^{*}$ and $\mu^{*}$, respectively. Then the iterates $(x^{k+1},\hat{\lambda}^{k},\hat{\mu}^{k},\hat{\eta}^{k})$ generated by the basic SQP algorithm, Algorithm 1, converge to $(x^{*},\lambda^{*},\mu^{*},\eta^{*})$ at a quadratic rate.

2.3 Penalty function

Theorem 2.2 is a local convergence result. Practical SQP algorithms include mechanisms that make sure that the iterates eventually reach such a neighborhood, even if the starting point is far away. To this end, we employ the exact penalty function

(8) \varphi(x;\rho)=c^{T}x+\rho\sum_{j\in\mathcal{J}}[r_{j}(x_{j})]^{+},

in which $\rho>0$ is a penalty parameter. Note that we define $\varphi$ in terms of all conic constraints $\mathcal{J}$, even though $r_{j}$ appears in (4c) only for $j\in\mathcal{D}(x^{*})$. We do this because the proposed algorithm does not know $\mathcal{D}(x^{*})$ in advance and the violation of all cone constraints needs to be taken into account when the original problem (1) is solved. Nevertheless, in this section, we may safely ignore the terms for $j\not\in\mathcal{D}(x^{*})$ because for $j\in\mathcal{E}(x^{*})$ we have $x^{k}_{j}=0$ and hence $[r_{j}(x^{k}_{j})]^{+}=0$ for all $k$ due to (5d), and when $j\in\mathcal{N}(x^{*})$, we have $r_{j}(x_{j}^{k})<0$ when $x^{k}$ is close to $x^{*}$ since $r_{j}(x_{j}^{*})<0$.

It can be shown, under suitable assumptions, that the minimizers of $\varphi(\,\cdot\,;\rho)$ over the set defined by the linear constraints (4b),

(9) X=\{x\in\mathbb{R}^{n}:Ax\leq b\},

coincide with the minimizers of (4) when $\rho$ is chosen sufficiently large. Because it is not known upfront how large $\rho$ needs to be, the algorithm uses an estimate, $\rho^{k}$, in iteration $k$, which might be increased during the course of the algorithm.

To ensure that the iterates eventually reach a minimizer of $\varphi(\,\cdot\,;\rho)$, and therefore a solution of (4), we require that the decrease of $\varphi(\,\cdot\,;\rho)$ is at least a fraction of that achieved in the piece-wise linear model of $\varphi(\,\cdot\,;\rho)$ given by

(10) m^{k}(x^{k}+d;\rho)=c^{T}(x^{k}+d)+\rho\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x^{k}_{j})+\nabla r_{j}(x^{k}_{j})^{T}d_{j}]^{+},

constructed at $x^{k}$. More precisely, the algorithm accepts a trial point $\hat{x}^{k+1}=x^{k}+d$ as a new iterate only if the sufficient decrease condition

(11) \varphi(\hat{x}^{k+1};\rho^{k})-\varphi(x^{k};\rho^{k})\leq c_{\text{dec}}\bigl(m^{k}(x^{k}+d;\rho^{k})-m^{k}(x^{k};\rho^{k})\bigr)\overset{(10)}{=}c_{\text{dec}}\Bigl(c^{T}d-\rho^{k}\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x^{k}_{j})]^{+}\Bigr)

holds with some fixed constant $c_{\text{dec}}\in(0,1)$. The trial iterate $\hat{x}^{k+1}=x^{k}+d^{k}$ with $d^{k}$ computed from (5) might not always satisfy this condition. The proposed algorithm generates a sequence of improved steps of which one is eventually accepted.
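The following numpy sketch illustrates the acceptance test; it assumes that each cone is represented by an index array into $x$ and that the candidate step $d$ satisfies the linearized cone constraints, so that the model decrease reduces to the simplified expression in (11). The helper names (`cones`, `c_dec`) are illustrative choices, not notation from the paper.

import numpy as np

def r(xj):
    return np.linalg.norm(xj[1:]) - xj[0]

def phi(x, rho, c, cones):
    # exact penalty function (8)
    return c @ x + rho * sum(max(r(x[idx]), 0.0) for idx in cones)

def sufficient_decrease(x, d, rho, c, cones, c_dec=0.1):
    # acceptance test (11); model decrease c^T d - rho * sum_{j in D(x)} [r_j(x_j)]^+
    model_decrease = c @ d - rho * sum(
        max(r(x[idx]), 0.0) for idx in cones if np.linalg.norm(x[idx][1:]) > 0.0)
    return phi(x + d, rho, c, cones) - phi(x, rho, c, cones) <= c_dec * model_decrease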

However, to apply Theorem 2.2, it would be necessary that the algorithm take the original step $d^{k}$ computed from (5); see Step 5 of Algorithm 1. Unfortunately, $\hat{x}^{k+1}=x^{k}+d^{k}$ might not be acceptable even when the iterate $x^{k}$ is arbitrarily close to a non-degenerate solution $x^{*}$ satisfying Assumption 1 (a phenomenon called the Maratos effect [19]). Our remedy is to employ the second-order correction step [9], $s^{k}$, which is obtained as an optimal solution of the QP

(12a) \min_{s\in\mathbb{R}^{n}}\ c^{T}(d^{k}+s)+\tfrac{1}{2}(d^{k}+s)^{T}H^{k}(d^{k}+s)
(12b) \text{s.t.}\quad A(x^{k}+d^{k}+s)\leq b,
(12c) r_{j}(x_{j}^{k}+d_{j}^{k})+\nabla r_{j}(x_{j}^{k}+d_{j}^{k})^{T}s_{j}\leq 0,\qquad j\in\mathcal{D}(x^{*}),
(12d) x_{j}^{k}+d^{k}_{j}+s_{j}=0,\hspace{10.1em} j\in\mathcal{E}(x^{*}).

For later reference, let $\hat{\lambda}^{S,k}$, $\hat{\mu}^{S,k}$, and $\hat{\eta}^{S,k}$ denote optimal multiplier vectors corresponding to (12b)–(12d), respectively. The algorithm accepts the trial point $\hat{x}^{k+1}=x^{k}+d^{k}+s^{k}$ if it yields sufficient decrease (11) with respect to the original SQP step $d=d^{k}$. Note that (12) is a variation of the second-order correction that is usually used in SQP methods, for which (12c) reads

r_{j}(x_{j}^{k}+d_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}(d_{j}^{k}+s_{j})\leq 0,\qquad j\in\mathcal{D}(x^{*}),

and avoids the evaluation of $\nabla r_{j}(x_{j}^{k}+d_{j}^{k})$. In our setting, however, evaluating $\nabla r_{j}(x_{j}^{k}+d_{j}^{k})$ takes no extra work and (12c) is equivalent to a supporting hyperplane; see Section 3.1. As the following theorem shows (see, e.g., [9] or [5, Section 15.3.2.3]), this procedure computes steps with sufficient decrease (11) and results in quadratic convergence.

Theorem 2.3.

Let Assumption 1 hold and assume that the initial iterate $x^{0}$ and multipliers $\mu^{0}$ are sufficiently close to $x^{*}$ and $\mu^{*}$, respectively. Further suppose that $\rho^{k}=\rho^{\infty}$ for large $k$, where $\rho^{\infty}>\mu_{j}^{*}$ for all $j\in\mathcal{D}(x^{*})$.

  1. Consider an algorithm that generates a sequence of iterates by setting $(x^{k+1},\lambda^{k+1},\mu^{k+1},\eta^{k+1})=(x^{k}+d^{k},\hat{\lambda}^{k},\hat{\mu}^{k},\hat{\eta}^{k})$ or $(x^{k+1},\lambda^{k+1},\mu^{k+1},\eta^{k+1})=(x^{k}+d^{k}+s^{k},\hat{\lambda}^{S,k},\hat{\mu}^{S,k},\hat{\eta}^{S,k})$ for all $k=0,1,2,\ldots$. Then $(x^{k},\lambda^{k},\mu^{k},\eta^{k})$ converges to $(x^{*},\lambda^{*},\mu^{*},\eta^{*})$ at a quadratic rate.

  2. Further, for all $k$, either $\hat{x}^{k+1}=x^{k}+d^{k}$ or $\hat{x}^{k+1}=x^{k}+d^{k}+s^{k}$ satisfies the acceptance criterion (11).

2.4 Optimality conditions for SOCP

The proposed algorithm aims at finding an optimal solution of the SOCP (1), or equivalently, values of the primal variables, $x^{*}\in\mathbb{R}^{n}$, and the dual variables, $\lambda^{*}\in\mathbb{R}^{m}$ and $z_{j}^{*}\in\mathbb{R}^{n_{j}}$ for $j\in\mathcal{J}$, that satisfy the necessary and sufficient optimality conditions [1, Theorem 16]

(13a) c+A^{T}\lambda^{*}-z^{*}=0,
(13b) Ax^{*}-b\leq 0\ \perp\ \lambda^{*}\geq 0,
(13c) \mathcal{K}_{j}\ni x_{j}^{*}\ \perp\ z_{j}^{*}\in\mathcal{K}_{j},\qquad j\in\mathcal{J}.

A thorough discussion of SOCPs is given in the comprehensive review by Alizadeh and Goldfarb [1]. The authors consider the formulation in which the linear constraints (1b) are equality constraints, but the results in [1] can be easily extended to inequalities.

The primal-dual solution $(x^{*},\lambda^{*},z^{*})$ is unique under the following assumption.

Assumption 2.

$(x^{*},\lambda^{*},z^{*})$ is a non-degenerate primal-dual solution of the SOCP (1) at which strict complementarity holds.

The definition of non-degeneracy for SOCP is somewhat involved and we refer the reader to [1, Theorem 21]. Strict complementarity holds if $x_{j}^{*}+z_{j}^{*}\in\operatorname{int}(\mathcal{K}_{j})$ and implies that: (i) $x^{*}_{j}\in\operatorname{int}(\mathcal{K}_{j})\Longrightarrow z^{*}_{j}=0$; (ii) $z^{*}_{j}\in\operatorname{int}(\mathcal{K}_{j})\Longrightarrow x^{*}_{j}=0$; (iii) $x^{*}_{j}\in\operatorname{bd}(\mathcal{K}_{j})\setminus\{0\}\Longleftrightarrow z^{*}_{j}\in\operatorname{bd}(\mathcal{K}_{j})\setminus\{0\}$; and (iv) not both $x^{*}_{j}$ and $z_{j}^{*}$ are zero.

3 Algorithm

The proposed algorithm solves the NLP formulation (3) using a variation of the SQP method. Since the functional formulation of the cone constraints (3c) might not be differentiable at all iterates or at an optimal solution, the cones are approximated by a polyhedral outer approximation using supporting hyperplanes.

The approximation is done so that the method implicitly identifies the constraints that are extremal-active at an optimal solution $x^{*}$, i.e., $\mathcal{E}(x^{*})=\mathcal{E}(x^{k})$ for large $k$. More precisely, we will show that close to a non-degenerate optimal solution, the steps generated by the proposed algorithm are identical to those computed by the QP subproblem (5) for the basic SQP algorithm for solving (4). Consequently, fast local quadratic convergence is achieved, as discussed in Section 2.2.

3.1 Supporting hyperplanes

In the following, consider a particular cone $\mathcal{K}_{j}$ and let $\mathcal{Y}_{j}$ be a finite subset of $\{y_{j}\in\mathbb{R}^{n_{j}}:\bar{y}_{j}\neq 0,\ y_{j0}\geq 0\}$. We define the cone

(14) \mathcal{C}_{j}(\mathcal{Y}_{j})=\left\{x_{j}\in\mathbb{R}^{n_{j}}:x_{j0}\geq 0\text{ and }\nabla r_{j}(y_{j})^{T}x_{j}\leq 0\text{ for all }y_{j}\in\mathcal{Y}_{j}\right\}

generated by the points in $\mathcal{Y}_{j}$. For each $x_{j}\in\mathcal{K}_{j}$ we have $r_{j}(x_{j})\leq 0$, and using

(15) \nabla r_{j}(x_{j})=\left(-1,\ \frac{\bar{x}_{j}^{T}}{\|\bar{x}_{j}\|}\right)^{T},

we obtain for any $y_{j}\in\mathcal{Y}_{j}$ that

\nabla r_{j}(y_{j})^{T}x_{j}=\frac{1}{\|\bar{y}_{j}\|}\bar{y}_{j}^{T}\bar{x}_{j}-x_{j0}\leq\frac{1}{\|\bar{y}_{j}\|}\|\bar{y}_{j}\|\,\|\bar{x}_{j}\|-x_{j0}=r_{j}(x_{j})\leq 0.

Therefore $\mathcal{C}_{j}(\mathcal{Y}_{j})\supseteq\mathcal{K}_{j}$. Also, for $y_{j}\in\mathcal{Y}_{j}$, consider $x_{j}=(1,\bar{y}_{j}^{T}/\|\bar{y}_{j}\|)^{T}$. Then

\nabla r_{j}(y_{j})^{T}x_{j}=\frac{\bar{y}_{j}^{T}}{\|\bar{y}_{j}\|}\frac{\bar{y}_{j}}{\|\bar{y}_{j}\|}-1=1-1=0,

and also $r_{j}(x_{j})=\|\bar{x}_{j}\|-x_{j0}=\|\bar{y}_{j}\|/\|\bar{y}_{j}\|-1=0$. Hence $x_{j}\in\mathcal{C}_{j}(\mathcal{Y}_{j})\cap\mathcal{K}_{j}$. Therefore, for any $y_{j}\in\mathcal{Y}_{j}$, the inequality

(16) \nabla r_{j}(y_{j})^{T}x_{j}\leq 0

defines a hyperplane that supports $\mathcal{K}_{j}$ at $(1,\bar{y}_{j}^{T}/\|\bar{y}_{j}\|)^{T}$. In summary, $\mathcal{C}_{j}(\mathcal{Y}_{j})$ is a polyhedral outer approximation of $\mathcal{K}_{j}$, defined by supporting hyperplanes.
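The following numpy snippet illustrates (14)–(16) numerically: it samples points of $\mathcal{K}_{j}$ and verifies that each of them satisfies every supporting hyperplane inequality, i.e., that $\mathcal{C}_{j}(\mathcal{Y}_{j})\supseteq\mathcal{K}_{j}$. The sampling scheme and the choice of generators are purely illustrative.

import numpy as np

def grad_r(yj):
    return np.concatenate(([-1.0], yj[1:] / np.linalg.norm(yj[1:])))

def in_outer_approx(xj, Yj, tol=1e-12):
    # membership test for C_j(Y_j) in (14)
    return xj[0] >= 0.0 and all(grad_r(y) @ xj <= tol for y in Yj)

rng = np.random.default_rng(0)
Yj = [np.concatenate(([1.0], v)) for v in rng.normal(size=(5, 3))]    # generators with y_bar != 0
for _ in range(100):
    v = rng.normal(size=3)
    xj = np.concatenate(([np.linalg.norm(v) + rng.random()], v))      # a point of K_j
    assert in_outer_approx(xj, Yj)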

In addition, writing $\mathcal{Y}_{j}=\{y_{j,1},\ldots,y_{j,m}\}$, we also define the cone

(17) \mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j}):=\left\{-\sum_{l=1}^{m}\sigma_{j,l}\nabla r_{j}(y_{j,l})+\eta_{j}e_{j0}:\sigma_{j}\in\mathbb{R}^{m}_{+},\ \eta_{j}\geq 0\right\}.

For all $x_{j}\in\mathcal{C}_{j}(\mathcal{Y}_{j})$ and $z_{j}=-\sum_{l=1}^{m}\sigma_{j,l}\nabla r_{j}(y_{j,l})+\eta_{j}e_{j0}\in\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j})$, we have

x_{j}^{T}z_{j}=-\sum_{l=1}^{m}\sigma_{j,l}\nabla r_{j}(y_{j,l})^{T}x_{j}+\eta_{j}x_{j0}\geq 0

because $\nabla r_{j}(y_{j,l})^{T}x_{j}\leq 0$ and $x_{j0}\geq 0$ from the definition of $\mathcal{C}_{j}(\mathcal{Y}_{j})$. Therefore $\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j})$ is included in the dual of the cone $\mathcal{C}_{j}(\mathcal{Y}_{j})$.

Now define $R=[-\nabla r_{j}(y_{j,1}),\ldots,-\nabla r_{j}(y_{j,m}),e_{j0}]$ and let $z_{j}\in\mathbb{R}^{n_{j}}$ be in the dual of $\mathcal{C}_{j}(\mathcal{Y}_{j})$. Since this implies that $x_{j}^{T}z_{j}\geq 0$ for all $x_{j}\in\mathcal{C}_{j}(\mathcal{Y}_{j})=\{x_{j}\in\mathbb{R}^{n_{j}}:R^{T}x_{j}\geq 0\}$, Farkas’ lemma yields that $z_{j}=R\cdot(\sigma_{j}^{T},\eta_{j})^{T}$ for some $\sigma_{j}\in\mathbb{R}^{m}_{+}$ and $\eta_{j}\geq 0$, i.e., $z_{j}\in\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j})$.

Overall we proved that $\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j})$ defined in (17) is the dual of $\mathcal{C}_{j}(\mathcal{Y}_{j})$, and since $\mathcal{C}_{j}(\mathcal{Y}_{j})\supseteq\mathcal{K}_{j}$, this implies $\mathcal{C}^{\circ}_{j}(\mathcal{Y}_{j})\subseteq\mathcal{K}_{j}$.

3.2 QP subproblem

In each iteration, at an iterate $x^{k}$, the proposed algorithm computes a step $d^{k}$ as an optimal solution of the subproblem

(18a) \min_{d\in\mathbb{R}^{n}}\ c^{T}d+\tfrac{1}{2}d^{T}H^{k}d
(18b) \text{s.t.}\quad A(x^{k}+d)\leq b,
(18c) r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}\leq 0,\qquad j\in\mathcal{D}(x^{k}),
(18d) x_{j}^{k}+d_{j}\in\mathcal{C}_{j}(\mathcal{Y}^{k}_{j}),\hspace{4.4em} j\in\mathcal{J}.

Here, $H^{k}$ is a positive semi-definite matrix that captures the curvature of the nonlinear constraint (3c), and for each cone, $\mathcal{Y}^{k}_{j}$ is the set of hyperplane-generating points that have been accumulated up to this iteration. From (14), we see that (18d) can be replaced by linear constraints. Consequently, (18) is a QP and can be solved as such.
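As an illustration of how (18) can be posed as a standard QP, the sketch below assembles its linear constraints in the form $Gd\leq g$; here `cones[j]` is the index array of $x_{j}$ and `Y[j]` plays the role of $\mathcal{Y}^{k}_{j}$ (these data structures are assumptions of the sketch, not prescribed by the paper). A QP solver would then minimize $c^{T}d+\tfrac{1}{2}d^{T}H^{k}d$ subject to $Gd\leq g$.

import numpy as np

def r(xj):
    return np.linalg.norm(xj[1:]) - xj[0]

def grad_r(xj):
    return np.concatenate(([-1.0], xj[1:] / np.linalg.norm(xj[1:])))

def build_qp_constraints(xk, A, b, cones, Y):
    # linear constraints  G d <= g  of subproblem (18) at x^k
    n = xk.size
    G_rows, g_vals = [A], [b - A @ xk]                     # (18b): A(x^k + d) <= b
    for j, idx in enumerate(cones):
        xj = xk[idx]
        local = []
        if np.linalg.norm(xj[1:]) > 0.0:                   # j in D(x^k): linearization (18c)
            local.append(grad_r(xj))
        local += [grad_r(np.asarray(y)) for y in Y[j]]     # hyperplanes (16) defining (18d)
        e0 = np.zeros(len(idx)); e0[0] = -1.0              # encodes x_{j0}^k + d_{j0} >= 0
        local.append(e0)
        for coeff in local:
            row = np.zeros(n); row[idx] = coeff
            G_rows.append(row); g_vals.append(-coeff @ xj)
    return np.vstack(G_rows), np.hstack(g_vals)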

Algorithm 2 Preliminary SQP Algorithm
1: Initial iterate $x^{0}$ and sets $\mathcal{Y}_{j}^{0}$ for $j\in\mathcal{J}$.
2: for $k=0,1,2,\ldots$ do
3:     Choose $H^{k}$.
4:     Solve subproblem (18) to get step $d^{k}$.
5:     Set $x^{k+1}\leftarrow x^{k}+d^{k}$.
6:     Set $\mathcal{Y}_{j}^{k+1}\leftarrow\mathcal{Y}_{pr,j}^{+}(\mathcal{Y}^{k}_{j},x^{k}_{j})$ for $j\in\mathcal{J}$.
7: end for

Algorithm 2 describes a preliminary version of the proposed SQP method based on this subproblem. Observe that the linearization (18c) can be rewritten as

0\geq r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}=\|\bar{x}^{k}_{j}\|-x^{k}_{j0}-d_{j0}+\frac{(\bar{x}^{k}_{j})^{T}\bar{d}_{j}}{\|\bar{x}^{k}_{j}\|}=\frac{1}{\|\bar{x}^{k}_{j}\|}(\bar{x}^{k}_{j})^{T}(\bar{x}^{k}_{j}+\bar{d}_{j})-(x^{k}_{j0}+d_{j0})=\nabla r_{j}(x^{k}_{j})^{T}(x^{k}_{j}+d_{j})

and is equivalent to the hyperplane constraint generated at $x_{j}^{k}$. Consequently, if $x^{k}_{j}\not\in\mathcal{K}_{j}$, then $r_{j}(x^{k}_{j})>0$ and (18c) acts as a cutting plane that excludes $x^{k}_{j}$. Using the update rule

(19) \mathcal{Y}^{+}_{pr,j}(\mathcal{Y}_{j},x_{j})=\begin{cases}\mathcal{Y}_{j}\cup\{x_{j}\}&\text{ if }\bar{x}_{j}\neq 0\text{ and }r_{j}(x_{j})>0,\\ \mathcal{Y}_{j}&\text{ otherwise,}\end{cases}

in Step 6 makes sure that $x^{k}_{j}$ is excluded in all future iterations.
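A direct transcription of (19) in numpy might look as follows (representing $\mathcal{Y}_{j}$ as a Python list is an assumption of this sketch):

import numpy as np

def Y_pr_update(Yj, xj):
    # (19): add x_j as a generator if x_bar_j != 0 and r_j(x_j) > 0,
    # so that the cut (16) generated at x_j excludes x_j from all future subproblems
    if np.linalg.norm(xj[1:]) > 0.0 and np.linalg.norm(xj[1:]) - xj[0] > 0.0:
        return Yj + [xj.copy()]
    return Yj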

In our algorithm, we initialize $\mathcal{Y}_{j}^{0}$ so that

(20) \mathcal{Y}_{j}^{0}\supseteq\hat{\mathcal{Y}}^{0}_{j}:=\{e_{ji}:i=1,\ldots,n_{j}-1\}\cup\{-e_{ji}:i=1,\ldots,n_{j}-1\}.

In this way, $x_{j}=0$ is an extreme point of $\mathcal{C}_{j}(\mathcal{Y}_{j}^{0})$, as it is for $\mathcal{K}_{j}$, and the challenging aspect of the cone is already captured in the first subproblem. By choosing the coordinate vectors $e_{ji}$ we have $\nabla r_{j}(e_{ji})^{T}x_{j}=x_{ji}-x_{j0}$, and the hyperplane constraint (16) becomes a very sparse linear constraint.

When $H^{k}=0$ in each iteration, this procedure becomes the standard cutting plane algorithm for the SOCP (1). It is well-known that the cutting plane algorithm is convergent in the sense that every limit point of the iterates is an optimal solution of the SOCP (1), but the convergence is typically slow. In the following sections, we describe how Algorithm 2 is augmented to achieve fast local convergence. The full method is stated formally in Algorithm 3.

3.3 Identification of extremal-active cones

We now describe a strategy that enables our algorithm to identify those cones that are extremal-active at a non-degenerate solution $x^{*}$ within a finite number of iterations, i.e., $\mathcal{E}(x^{k})=\mathcal{E}(x^{*})$ for all large $k$. This will make it possible to apply a second-order method and achieve quadratic local convergence.

Consider the optimality conditions for the QP subproblem (18):

(21a) c+H^{k}d^{k}+A^{T}\hat{\lambda}^{k}+\sum_{j\in\mathcal{D}(x^{k})}\hat{\mu}_{j}^{k}\nabla_{x}r_{j}(x^{k})-\hat{\nu}^{k}=0,
(21b) A(x^{k}+d^{k})-b\leq 0\ \perp\ \hat{\lambda}^{k}\geq 0,
(21c) r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}^{k}\leq 0\ \perp\ \hat{\mu}_{j}^{k}\geq 0,\qquad j\in\mathcal{D}(x^{k}),
(21d) \mathcal{C}_{j}(\mathcal{Y}_{j}^{k})\ni x^{k}_{j}+d_{j}^{k}\ \perp\ \hat{\nu}_{j}^{k}\in\mathcal{C}^{\circ}_{j}(\mathcal{Y}_{j}^{k}),\qquad j\in\mathcal{J}.

Here, $\hat{\lambda}^{k}$, $\hat{\mu}^{k}_{j}$, and $\hat{\nu}_{j}^{k}$ are the multipliers corresponding to the constraints in (18); for completeness, we define $\hat{\mu}_{j}^{k}=0$ for $j\in\mathcal{J}\setminus\mathcal{D}(x^{k})$. In (21a), $\nabla_{x}r_{j}(x^{k})$ is the vector in $\mathbb{R}^{n}$ that contains $\nabla r_{j}(x_{j}^{k})$ in the elements corresponding to $x_{j}$ and is zero otherwise. Similarly, $\hat{\nu}^{k}\in\mathbb{R}^{n}$ is equal to $\hat{\nu}^{k}_{j}$ in the elements corresponding to $x_{j}$ for all $j\in\mathcal{J}$ and zero otherwise.

Let us define

(22) \hat{\mathcal{Y}}_{j}^{k}:=\begin{cases}\mathcal{Y}_{j}^{k}\cup\{x^{k}_{j}\},&\text{ if }j\in\mathcal{D}(x^{k}),\\ \mathcal{Y}_{j}^{k},&\text{ if }j\in\mathcal{J}\setminus\mathcal{D}(x^{k}).\end{cases}

It is easy to verify that, for $j\in\mathcal{D}(x^{k})$, $\nabla r_{j}(x_{j}^{k})^{T}x_{j}^{k}=r_{j}(x_{j}^{k})$ and hence $\nabla r_{j}(x_{j}^{k})^{T}(x_{j}^{k}+d_{j}^{k})\leq 0$ from (21c). As a consequence we obtain $x^{k}_{j}+d_{j}^{k}\in\mathcal{C}_{j}(\hat{\mathcal{Y}}_{j}^{k})$ for all $j\in\mathcal{J}$. Furthermore, $\hat{\nu}_{j}^{k}\in\mathcal{C}^{\circ}_{j}(\mathcal{Y}_{j}^{k})$ implies that

\hat{\nu}_{j}^{k}=-\sum_{l=1}^{m}\sigma^{k}_{j,l}\nabla r_{j}(y^{k}_{j,l})+\eta^{k}_{j}e_{j0}

for suitable values of $\sigma^{k}_{j,l}\geq 0$ and $\eta^{k}_{j}\geq 0$. Then $\hat{z}_{j}^{k}:=-\hat{\mu}_{j}^{k}\nabla r_{j}(x_{j}^{k})+\hat{\nu}_{j}^{k}\in\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{k})$ and

(23) \hat{z}^{k}=c+H^{k}d^{k}+A^{T}\hat{\lambda}^{k}

from (21a). In conclusion, if $(d^{k},\hat{\lambda}^{k},\hat{\mu}^{k},\hat{\nu}^{k})$ is a primal-dual solution of the QP subproblem (18), then $(d^{k},\hat{\lambda}^{k},\hat{z}^{k})$ satisfies the conditions

(24a) c+H^{k}d^{k}+A^{T}\hat{\lambda}^{k}-\hat{z}^{k}=0,
(24b) A(x^{k}+d^{k})-b\leq 0\ \perp\ \hat{\lambda}^{k}\geq 0,
(24c) \mathcal{C}_{j}(\hat{\mathcal{Y}}_{j}^{k})\ni x^{k}_{j}+d_{j}^{k}\ \perp\ \hat{z}_{j}^{k}\in\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{k}),\qquad j\in\mathcal{J},

which more closely resembles the SOCP optimality conditions (13). Our algorithm maintains primal-dual iterates $(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k})$ that are updated based on (24).

Suppose that strict complementarity holds at a primal-dual solution $(x^{*},\lambda^{*},z^{*})$ of the SOCP (1) and that $(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k})\to(x^{*},\lambda^{*},z^{*})$. If $j\not\in\mathcal{E}(x^{*})$, then $x_{j}^{*}\in\mathcal{K}_{j}$ implies $x^{*}_{j0}>0$. As $x^{k}_{j}$ converges to $x^{*}_{j}$, we have $x^{k}_{j0}>0$ and therefore $j\not\in\mathcal{E}(x^{k})$ for sufficiently large $k$. This yields $\mathcal{E}(x^{k})\subseteq\mathcal{E}(x^{*})$. We now derive a modification of Algorithm 2 that ensures that $\mathcal{E}(x^{*})\subseteq\mathcal{E}(x^{k})$ for all sufficiently large $k$ under Assumption 2.

Consider any $j\in\mathcal{E}(x^{*})$. We would like to have

(25) \hat{z}_{j}^{k}\in\operatorname{int}(\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{k}))

for all large $k$, since then complementarity in (24c) implies that $x_{j}^{k+1}=x_{j}^{k}+d^{k}_{j}=0$ and hence $j\in\mathcal{E}(x^{k+1})$ for all large $k$. We will later show that Assumption 2 implies that $\hat{z}_{j}^{k}\to z^{*}_{j}$ and that there exists a neighborhood $N_{\epsilon}(z^{*}_{j})=\{z_{j}\in\mathbb{R}^{n_{j}}:\|z_{j}-z^{*}_{j}\|\leq\epsilon\}$ of $z^{*}_{j}$ so that $z_{j}\in\operatorname{int}(\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{0}\cup\{-y_{j}\}))$ if $z_{j},y_{j}\in N_{\epsilon}(z^{*}_{j})$; see Remark 4.18. This suggests that some vector close to $-z_{j}^{*}$ should eventually be included in $\hat{\mathcal{Y}}_{j}^{k}$ because then (25) holds when $\hat{z}_{j}^{k}$ is close enough to $z^{*}_{j}$. For this purpose, the algorithm computes

\check{z}^{k}=c+A^{T}\hat{\lambda}^{k},

which also converges to $z^{*}_{j}$ (see (13a)), and sets $\mathcal{Y}^{k+1}_{j}$ to $\mathcal{Y}^{+}_{du,j}(\mathcal{Y}^{k}_{j},x^{k}_{j},\check{z}^{k}_{j})$, where

(26) \mathcal{Y}^{+}_{du,j}(\mathcal{Y}_{j},x_{j},z_{j})=\begin{cases}\mathcal{Y}_{j}\cup\{-z_{j}\}&\text{ if }x_{j}\neq 0,\ \bar{z}_{j}\neq 0\text{ and }r_{j}(z_{j})<0,\\ \mathcal{Y}_{j}&\text{ otherwise.}\end{cases}

The update is skipped when $x^{k}_{j}=0$ (because then $j$ is already in $\mathcal{E}(x^{k})$ and no additional hyperplane is needed), and when $\bar{\check{z}}^{k}_{j}=0$ or $r_{j}(\check{z}^{k}_{j})\geq 0$, which might indicate that $z^{*}_{j}\not\in\operatorname{int}(\mathcal{K}_{j})$ and $j\not\in\mathcal{E}(x^{*})$.
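A corresponding sketch of the dual update (26), with the same list representation of $\mathcal{Y}_{j}$ as in the earlier snippet for (19):

import numpy as np

def Y_du_update(Yj, xj, zj):
    # (26): add -z_j if x_j != 0, z_bar_j != 0, and r_j(z_j) < 0,
    # i.e., the dual estimate lies in the interior of K_j, suggesting j in E(x*)
    if np.any(xj != 0.0) and np.linalg.norm(zj[1:]) > 0.0 and np.linalg.norm(zj[1:]) - zj[0] < 0.0:
        return Yj + [-zj.copy()]
    return Yj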

3.4 Fast NLP-SQP steps

Now that we have a mechanism in place that makes sure that the extremal-active cones are identified in a finite number of iterations, we present a strategy that emulates the basic SQP Algorithm 1 and automatically takes quadratically convergent SQP steps, i.e., solutions of the SQP subproblem (5), close to $x^{*}$. For the discussion in this section, we again assume that $x^{*}$ is a unique solution at which Assumption 2 holds.

Suppose that $\mathcal{E}(x^{k})=\mathcal{E}(x^{*})$ for large $k$ due to the strategy discussed in Section 3.3. This means that the outer approximation (18d) of $\mathcal{K}_{j}$ for $j\in\mathcal{E}(x^{*})$ is sufficient to fix $x_{j}^{k}$ to zero and is therefore equivalent to the constraint (5d) in the basic SQP subproblem. However, (18) includes the outer approximations for all cones, including those for $j\not\in\mathcal{E}(x^{*})$, which are not present in (5). Consequently, the desired SQP step from (5) might not be feasible for (18).

As a remedy, at the beginning of an iteration, the algorithm first computes an NLP-SQP step as an optimal solution $d^{S,k}$ of a relaxation of (18),

(27a) \min_{d\in\mathbb{R}^{n}}\ c^{T}d+\tfrac{1}{2}d^{T}H^{k}d
(27b) \text{s.t.}\quad A(x^{k}+d)\leq b,
(27c) r_{j}(x_{j}^{k})+\nabla r_{j}(x_{j}^{k})^{T}d_{j}\leq 0,\qquad j\in\mathcal{D}(x^{k}),
(27d) x_{j0}^{k}+d_{j0}\geq 0,\hspace{6.6em} j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k},
(27e) x_{j}^{k}+d_{j}\in\mathcal{C}_{j}(\mathcal{Y}^{k}_{j}),\hspace{4.8em} j\in\hat{\mathcal{E}}^{k},

where $\hat{\mathcal{E}}^{k}=\mathcal{E}(x^{k})$. In this way, the outer approximations are imposed only for the currently extremal-active cones, while for all other cones only the linearization (27c) is considered, just like in (5), with the additional restriction (27d) that ensures $x^{k+1}_{j0}\geq 0$. Let $\hat{\lambda}^{k}$, $\hat{\mu}_{j}^{k}$, $\hat{\eta}^{k}_{j}$, and $\hat{\nu}^{k}_{j}$ be the optimal multipliers corresponding to the constraints in (27) (set to zero for non-existing constraints) and define $\hat{z}^{k}$ as in (23). Then the optimality conditions (24) hold again, this time with $d^{k}=d^{S,k}$, but instead of (22) we have

(28) \hat{\mathcal{Y}}_{j}^{k}:=\begin{cases}\{x^{k}_{j}\}&\text{ if }j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k},\\ \mathcal{Y}_{j}^{k}\cup\{x^{k}_{j}\}&\text{ if }j\in\hat{\mathcal{E}}^{k}\cap\mathcal{D}(x^{k}),\\ \mathcal{Y}_{j}^{k}&\text{ if }j\in\hat{\mathcal{E}}^{k}\setminus\mathcal{D}(x^{k}).\end{cases}

When $x^{k}$ is not close to $x^{*}$ and $\mathcal{E}(x^{*})\neq\mathcal{E}(x^{k})$, QP (27) might result in poor steps that go far outside of $\mathcal{K}_{j}$ for some $j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k}$ and undermine convergence. Therefore, we iteratively add more cones to $\hat{\mathcal{E}}^{k}$ until

(29) x_{j0}^{k}+d_{j0}^{S,k}>0\quad\text{ for all }j\in\mathcal{J}\setminus\hat{\mathcal{E}}^{k},

i.e., when a cone is approximated only by its linearization (27c), the step does not appear to target its extreme point. This property is necessary to show that $\mathcal{E}(x^{k})=\mathcal{E}(x^{*})$ for all large $k$ also in the case that new iterates are computed from (27) instead of (18). Note that in the extreme case $\hat{\mathcal{E}}^{k}=\mathcal{J}$, (27) is identical to (18). This loop can be found in Steps 7–10 of Algorithm 3, and a sketch is given below.
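The loop can be sketched as follows; `solve_relaxed_qp` is a hypothetical placeholder for solving (27) with the current set $\hat{\mathcal{E}}^{k}$, `E_hat` is a Python set of cone indices, and `cones[j]` again denotes the index array of $x_{j}$ (all of these are assumptions of the sketch).

def nlp_sqp_step(xk, E_hat, cones, solve_relaxed_qp, tol=1e-12):
    # Steps 7-10: enlarge E_hat until the step drives x_{j0} to zero
    # only for cones that are already in E_hat, cf. (29)
    while True:
        d = solve_relaxed_qp(xk, E_hat)
        hits_zero = {j for j, idx in enumerate(cones) if abs(xk[idx[0]] + d[idx[0]]) <= tol}
        if hits_zero <= E_hat:
            return d, E_hat
        E_hat = E_hat | hits_zero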

Since there is no guarantee that (27) yields iterates that converge to $x^{*}$, the algorithm discards the NLP-SQP step in certain situations and falls back to the original method, recomputing a new step from (18). In Section 3.6 we describe how we use the exact penalty function (8) to determine when this is necessary.

3.5 Hessian matrix

Motivated by (6), we compute the Hessian matrix $H^{k}$ in (18) and (27) from

(30) H^{k}=\sum_{j\in\mathcal{D}(x^{k})}\mu^{k}_{j}\nabla^{2}_{xx}r_{j}(x^{k}),

where $\mu_{j}^{k}\geq 0$ are multiplier estimates for the nonlinear constraint (3c). Because $\nabla^{2}r_{j}(x_{j}^{k})$ is positive semi-definite and $\mu_{j}^{k}\geq 0$, also $H^{k}$ is positive semi-definite.

In the final phase, we intend to emulate the basic SQP Algorithm 1. Therefore, when the fast NLP-SQP step is accepted, we set $\mu_{j}^{k+1}=\hat{\mu}^{k}_{j}$ for $j\in\mathcal{D}(x^{k})$, where $\hat{\mu}_{j}^{k}$ are the optimal multipliers for (27c). But we also need to define a value for $\mu_{j}^{k+1}$ when the step is computed from (18), where, in addition to the linearization of $r_{j}$, hyperplanes (18d) are used to approximate all cones. By comparing the optimality conditions of the QPs (18) and (5), we now derive an update for $\mu_{j}^{k+1}$.

Suppose that $j\in\mathcal{D}(x^{k+1})\cap\mathcal{D}(x^{k})$. Then (21a) yields

(31) c_{j}+H_{jj}^{k}d^{k}_{j}+A_{j}^{T}\hat{\lambda}^{k}+\hat{\mu}_{j}^{k}\nabla r_{j}(x_{j}^{k})-\hat{\nu}^{k}_{j}=0,

where $H^{k}_{jj}=\mu_{j}^{k}\nabla^{2}r_{j}(x^{k}_{j})$ because of (30). Here, the dual information for the nonlinear constraint is split into $\hat{\mu}_{j}^{k}$ and $\hat{\nu}^{k}_{j}$ and needs to be condensed into a single number, $\mu_{j}^{k+1}$, so that we can compute $H^{k+1}$ from (30) in the next iteration.

Recall that, in the basic SQP Algorithm 1, the new multipliers $\mu_{j}^{k+1}$ are set to the optimal multipliers of the QP (5), which satisfy

(32) c_{j}+H_{jj}^{k}d_{j}^{k}+A_{j}^{T}\hat{\lambda}^{k}+\mu_{j}^{k+1}\nabla r_{j}(x_{j}^{k})=0.

A comparison with (31) suggests to choose $\mu_{j}^{k+1}$ so that $\mu_{j}^{k+1}\nabla r_{j}(x_{j}^{k})\approx\hat{\mu}_{j}^{k}\nabla r_{j}(x_{j}^{k})-\hat{\nu}^{k}_{j}$. Multiplying both sides with $\nabla r_{j}(x^{k}_{j})^{T}$ and solving for $\mu^{k+1}_{j}$ yields

\mu^{k+1}_{j}=\hat{\mu}_{j}^{k}-\frac{\nabla r_{j}(x_{j}^{k})^{T}\hat{\nu}^{k}_{j}}{\|\nabla r_{j}(x_{j}^{k})\|^{2}}.

Note that $\mu_{j}^{k+1}=\hat{\mu}_{j}^{k}$ if the outer approximation constraint (18d) is not active and therefore $\hat{\nu}_{j}^{k}=0$ for $j$. In this case, we recover the basic SQP update, as desired.

Now suppose that $j\in\mathcal{D}(x^{k+1})\setminus\mathcal{D}(x^{k})$. Again comparing (31) with (32) suggests a choice so that $\mu_{j}^{k+1}\nabla r_{j}(x_{j}^{k+1})\approx-\hat{\nu}^{k}_{j}$, where we substituted $\nabla r_{j}(x^{k}_{j})$ by $\nabla r_{j}(x^{k+1}_{j})$ because the former is not defined for $j\not\in\mathcal{D}(x^{k})$. In this case, multiplying both sides with $\nabla r_{j}(x^{k+1}_{j})^{T}$ and solving for $\mu^{k+1}_{j}$ yields

\mu^{k+1}_{j}=-\frac{\nabla r_{j}(x_{j}^{k+1})^{T}\hat{\nu}^{k}_{j}}{\|\nabla r_{j}(x_{j}^{k+1})\|^{2}}.

In summary, in each iteration in which (18) determines the new iterate, we update

(33) \mu^{k+1}_{j}=\begin{cases}\hat{\mu}_{j}^{k}-\dfrac{\nabla r_{j}(x_{j}^{k})^{T}\hat{\nu}^{k}_{j}}{\|\nabla r_{j}(x_{j}^{k})\|^{2}}&j\in\mathcal{D}(x^{k+1})\cap\mathcal{D}(x^{k}),\\[8pt] -\dfrac{\nabla r_{j}(x_{j}^{k+1})^{T}\hat{\nu}^{k}_{j}}{\|\nabla r_{j}(x_{j}^{k+1})\|^{2}}&j\in\mathcal{D}(x^{k+1})\setminus\mathcal{D}(x^{k}),\\[8pt] 0&\text{otherwise.}\end{cases}
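A per-cone transcription of (33), assuming the QP multipliers $\hat{\mu}_{j}^{k}$ (for (18c)) and $\hat{\nu}_{j}^{k}$ (for (18d)) are available as a scalar and a numpy vector, respectively:

import numpy as np

def grad_r(xj):
    return np.concatenate(([-1.0], xj[1:] / np.linalg.norm(xj[1:])))

def mu_update(xj_new, xj_old, mu_hat_j, nu_hat_j):
    # multiplier update (33) for a single cone j
    in_D_new = np.linalg.norm(xj_new[1:]) > 0.0
    in_D_old = np.linalg.norm(xj_old[1:]) > 0.0
    if in_D_new and in_D_old:
        g = grad_r(xj_old)
        return mu_hat_j - (g @ nu_hat_j) / (g @ g)
    if in_D_new:                      # j in D(x^{k+1}) \ D(x^k)
        g = grad_r(xj_new)
        return -(g @ nu_hat_j) / (g @ g)
    return 0.0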

The choice above leads to quadratic convergence for non-degenerate instances, but it is common for the global convergence analysis of SQP methods to permit any positive semi-definite Hessian matrix $H^{k}$, as long as it is bounded. Since we were not able to exclude the case that $\mu_{j}^{k}$ or $1/x_{j0}^{k}$ is unbounded for some cone $j\in\mathcal{J}$, in which case $H^{k}$ defined in (30) is unbounded, we fix a large threshold $c_{H}>0$ and rescale the Hessian matrix according to

(34) H^{k}\leftarrow H^{k}\cdot\min\{1,c_{H}/\|H^{k}\|\}

so that $\|H^{k}\|\leq c_{H}$ in every iteration. In this way, global convergence is guaranteed, but the fast local convergence rate might be impaired if $c_{H}$ is chosen too small so that $H^{k}$ defined in (30) must be rescaled. Therefore, in practice, we set $c_{H}$ to a very large number and (34) is never actually triggered in our numerical experiments.

3.6 Penalty function

The steps computed from (18) and (27) do not necessarily yield a convergent algorithm, and a safeguard is required to force the iterates into a neighborhood of an optimal solution. Here, we utilize the exact penalty function (8) and accept a new iterate only if the sufficient decrease condition (11) holds.

As discussed in Section 3.4, at the beginning of an iteration, the algorithm first computes an NLP-SQP step $d^{S,k}$ from (27). The penalty function can now help us to decide whether this step makes sufficient progress towards an optimal solution, and we only accept the trial point $\hat{x}^{k+1}=x^{k}+d^{S,k}$ as a new iterate if (11) holds with $d=d^{S,k}$.

If the penalty function does not accept $d^{S,k}$, there is still a chance that $d^{S,k}$ is making rapid progress towards the solution, but, as discussed in Section 2.2, the Maratos effect is preventing the acceptance of $d^{S,k}$. As a remedy, we compute, analogously to (12), a second-order correction step $s^{k}$ for (27) as a solution of

(35) \min_{s\in\mathbb{R}^{n}}\ c^{T}(d^{S,k}+s)+\tfrac{1}{2}(d^{S,k}+s)^{T}H^{k}(d^{S,k}+s)
\text{s.t.}\quad A(x^{k}+d^{S,k}+s)\leq b,
r_{j}(x_{j}^{k}+d_{j}^{S,k})+\nabla r_{j}(x_{j}^{k}+d_{j}^{S,k})^{T}s_{j}\leq 0,\qquad j\in\mathcal{D}(x^{k}),
x_{j0}^{k}+d_{j0}^{S,k}+s_{j0}\geq 0,\hspace{10.2em} j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k},
x_{j}^{k}+d^{S,k}_{j}+s_{j}\in\mathcal{C}_{j}(\mathcal{Y}^{k}_{j}),\hspace{8.4em} j\in\hat{\mathcal{E}}^{k},

and accept the trial point $\hat{x}^{k+1}=x^{k}+d^{S,k}+s^{k}$ if it satisfies (11) with $d=d^{S,k}$. Let again $\hat{\lambda}^{k}$, $\hat{\mu}_{j}^{k}$, $\hat{\eta}^{k}_{j}$, and $\hat{\nu}^{k}_{j}$ denote the optimal multipliers in (35) and define $\hat{z}^{k}$ as in (23). The optimality conditions (24) still hold, this time with $d^{k}=d^{S,k}+s^{k}$ and

(36) \hat{\mathcal{Y}}_{j}^{k}:=\begin{cases}\{x^{k}_{j}+d_{j}^{S,k}\},&\text{ if }j\in\mathcal{D}(x^{k})\setminus\hat{\mathcal{E}}^{k},\\ \mathcal{Y}_{j}^{k}\cup\{x^{k}_{j}+d_{j}^{S,k}\},&\text{ if }j\in\mathcal{D}(x^{k})\cap\hat{\mathcal{E}}^{k},\\ \mathcal{Y}_{j}^{k},&\text{ if }j\in\hat{\mathcal{E}}^{k}\setminus\mathcal{D}(x^{k}).\end{cases}

If neither $d^{S,k}$ nor $d^{S,k}+s^{k}$ has been accepted, we give up on fast NLP-SQP steps and instead revert to QP (18), which safely approximates every cone with an outer approximation. However, the trial point $\hat{x}^{k+1}=x^{k}+d^{k}$ with the step $d^{k}$ obtained from (18) does not necessarily satisfy (11). In that case, the algorithm adds $x^{k}+d^{k}$ to $\mathcal{Y}_{j}^{k}$ to cut off $x^{k}+d^{k}$ and resolves (18) to get a new trial step $d^{k}$. In an inner loop (Steps 22–32), this procedure is repeated until, eventually, a trial step is obtained that satisfies (11). We will show that (11) holds after a finite number of iterations of the inner loop.

It remains to discuss the update of the penalty parameter estimate $\rho^{k}$. One can show (see Lemma 4.1) that an optimal solution $x^{*}$ of the SOCP with multipliers $z^{*}$ is a minimizer of $\varphi(\,\cdot\,;\rho)$ over the set $X$ defined in (9) if $\rho>\|z^{*}_{\mathcal{J},0}\|_{\infty}$, where $z^{*}_{\mathcal{J},0}=(z^{*}_{1,0},\ldots,z^{*}_{p,0})^{T}$. Since $z^{*}$ is not known a priori, the algorithm uses the update rule $\rho^{k}=\rho_{\text{new}}(\rho^{k-1},z^{k})$, where

(37) \rho_{\text{new}}(\rho_{\text{old}},z):=\begin{cases}\rho_{\text{old}}&\text{ if }\rho_{\text{old}}>\|z_{\mathcal{J},0}\|_{\infty},\\ c_{\text{inc}}\cdot\|z_{\mathcal{J},0}\|_{\infty}&\text{ otherwise,}\end{cases}

with $c_{\text{inc}}>1$. We will prove in Lemma 4.8 that the sequence $\{z^{k}\}_{k=1}^{\infty}$ is bounded under Slater’s constraint qualification. Therefore, this rule will eventually settle at a final penalty parameter $\rho^{\infty}$ that is not changed after a finite number of iterations.
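The update rule (37) is straightforward to implement; a small sketch (with `cones[j]` the index array of $x_{j}$ as before, and the value of `c_inc` an arbitrary illustrative default):

def rho_new(rho_old, z, cones, c_inc=10.0):
    # (37): keep rho if it already dominates ||z_{J,0}||_inf, otherwise increase it
    z_J0_norm = max(abs(z[idx[0]]) for idx in cones)
    return rho_old if rho_old > z_J0_norm else c_inc * z_J0_norm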

During an iteration of the algorithm, several trial steps may be considered and a preliminary parameter value is computed from (37) for each one. At the end of the iteration, the parameter value corresponding to the accepted trial step is stored. Note that the acceptance test for the second-order correction step from (35) needs to be done with the penalty parameter computed for the regular NLP-SQP step from (27).

3.7 Complete algorithm

Algorithm 3 SQP Algorithm for SOCP.
1:Initial iterate x0Xx^{0}\in X with xj,00x_{j,0}\geq 0, multipliers μj0+\mu_{j}^{0}\in\mathbb{R}_{+}, penalty parameter ρ1>0\rho^{-1}>0; constants cdec(0,1)c_{\text{dec}}\in(0,1), cinc>1c_{\text{inc}}>1, and cH>0c_{H}>0.
2:Initialize 𝒴j0\mathcal{Y}_{j}^{0} so that (20) is satisfied.
3:for k=0,1,2,k=0,1,2,\ldots do
4:     Compute HkH^{k} using (30). Rescale according to (34) if Hk>cH\|H^{k}\|>c_{H}.
5:     Set ^k(xk)\hat{\mathcal{E}}^{k}\leftarrow\mathcal{E}(x^{k}).
6:     Compute dS,kd^{S,k}, λ^k\hat{\lambda}^{k}, μ^k\hat{\mu}^{k}, z^k\hat{z}^{k} from (27) and (23) and set x^k+1=xk+dS,k\hat{x}^{k+1}=x^{k}+d^{S,k}.
7:     while {j𝒥:xj0k+dj0S,k=0}^k\{j\in\mathcal{J}:x_{j0}^{k}+d^{S,k}_{j0}=0\}\not\subseteq\hat{\mathcal{E}}^{k} do
8:         Set ^k^k{j𝒥:xj0k+dj0S,k=0}\hat{\mathcal{E}}^{k}\leftarrow\hat{\mathcal{E}}^{k}\cup\{j\in\mathcal{J}:x_{j0}^{k}+d^{S,k}_{j0}=0\}.
9:         Recompute dS,kd^{S,k}, λ^k\hat{\lambda}^{k}, μ^k\hat{\mu}^{k}, z^k\hat{z}^{k} from (27) and (23) and set x^k+1=xk+dS,k\hat{x}^{k+1}=x^{k}+d^{S,k}.
10:     end while
11:     Compute candidate penalty parameter ρk=ρnew(ρk1,z^k)\rho^{k}=\rho_{\text{new}}(\rho^{k-1},\hat{z}^{k}), see (37).
12:     if (11) holds for d=dS,kd=d^{S,k} then
13:         Set 𝒴jk+1𝒴pr,j+(𝒴jk,xjk)\mathcal{Y}^{k+1}_{j}\leftarrow\mathcal{Y}^{+}_{pr,j}(\mathcal{Y}^{k}_{j},x_{j}^{k}) using (19) and set dk=dS,kd^{k}=d^{S,k}.
14:         Set μk+1=μ^k\mu^{k+1}=\hat{\mu}^{k} and go to Step 34.
15:     end if
16:     Compute sks^{k}, λ^k\hat{\lambda}^{k}, μ^k\hat{\mu}^{k}, z^k\hat{z}^{k} from (35) and (23) and set x^k+1=xk+dS,k+sk\hat{x}^{k+1}=x^{k}+d^{S,k}+s^{k}.
17:     if (11) holds for d=dS,kd=d^{S,k} and {j𝒥:xj0k+dj0S,k+sk=0}^k\{j\in\mathcal{J}:x_{j0}^{k}+d^{S,k}_{j0}+s^{k}=0\}\subseteq\hat{\mathcal{E}}^{k} then
18:         Set 𝒴jk+1𝒴pr,j+(𝒴jk,xjk)\mathcal{Y}^{k+1}_{j}\leftarrow\mathcal{Y}^{+}_{pr,j}(\mathcal{Y}^{k}_{j},x_{j}^{k}) and dk=dS,kd^{k}=d^{S,k}.
19:         Set μk+1=μ^k\mu^{k+1}=\hat{\mu}^{k} and go to Step 34.
20:     end if
21:     Set 𝒴jk,0𝒴jk\mathcal{Y}_{j}^{k,0}\leftarrow\mathcal{Y}^{k}_{j}.
22:     for l=0,1,2,l=0,1,2,\ldots do
23:         Compute dk,ld^{k,l}, λ^k\hat{\lambda}^{k}, μ^k\hat{\mu}^{k}, z^k\hat{z}^{k} from (18) and (23) and set x^k+1=xk+dk,l\hat{x}^{k+1}=x^{k}+d^{k,l}.
24:         Compute candidate penalty parameter ρk=ρnew(ρk1,z^k)\rho^{k}=\rho_{\text{new}}(\rho^{k-1},\hat{z}^{k}).
25:         if (11) holds for d=dk,ld=d^{k,l} then
26:              Set 𝒴jk+1𝒴pr,j+(𝒴jk,l,xjk)\mathcal{Y}_{j}^{k+1}\leftarrow\mathcal{Y}^{+}_{pr,j}(\mathcal{Y}^{k,l}_{j},x_{j}^{k}) and dk=dk,ld^{k}=d^{k,l}.
27:              Go to Step 33.
28:         end if
29:         Set $\mathcal{Y}_{j}^{k,l+1}\leftarrow\mathcal{Y}^{+}_{pr,j}(\mathcal{Y}^{k,l}_{j},\hat{x}_{j}^{k+1})$, see (19).
30:         Compute zˇk=c+ATλ^k\check{z}^{k}=c+A^{T}\hat{\lambda}^{k}.
31:         Update 𝒴jk,l+1𝒴du,j+(𝒴jk,l+1,x^jk+1,zˇjk)\mathcal{Y}_{j}^{k,l+1}\leftarrow\mathcal{Y}^{+}_{du,j}(\mathcal{Y}^{k,l+1}_{j},\hat{x}_{j}^{k+1},\check{z}^{k}_{j}), see (26).
32:     end for
33:     Compute μk+1\mu^{k+1} from (33).
34:     Compute zˇk=c+ATλ^k\check{z}^{k}=c+A^{T}\hat{\lambda}^{k} and update 𝒴jk+1𝒴du,j+(𝒴jk+1,xjk,zˇjk)\mathcal{Y}_{j}^{k+1}\leftarrow\mathcal{Y}^{+}_{du,j}(\mathcal{Y}^{k+1}_{j},x_{j}^{k},\check{z}^{k}_{j}).
35:     Set xk+1x^k+1x^{k+1}\leftarrow\hat{x}^{k+1}.
36:     If (xk+1,λ^k,z^k)(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k}) satisfy (13), stop.
37:end for

The complete method is stated in Algorithm 3. To keep the notation concise, we omit “for all j𝒥j\in\mathcal{J}” whenever the index jj is used. We assume that all QPs in the algorithm are solved exactly.

Each iteration begins with the computation of the fast NLP-SQP step, where an inner loop repeatedly adds cones to $\hat{\mathcal{E}}^k$ until (29) holds. If the step achieves a sufficient decrease in the penalty function, the trial point is accepted. Otherwise, the second-order correction for the NLP-SQP step is computed and accepted if the corrected trial point yields a sufficient decrease with respect to the model decrease of the NLP-SQP step. Note that the second-order correction step is discarded if it does not satisfy (29), since otherwise finite identification of $\mathcal{E}(x^*)$ cannot be guaranteed. If none of the NLP-SQP steps is acceptable, the algorithm proceeds with an inner loop in which hyperplanes cutting off the current trial point are repeatedly added until the penalty function is sufficiently decreased. No matter which step is taken, both $x_j^k$ and $\check{z}_j^k$ are added to $\mathcal{Y}_j^{k+1}$ according to the update rules (19) and (26), and the multiplier $\mu^{k+1}$ for the nonlinear constraints is updated.

In most cases, a new QP is obtained by adding only a few constraints to the most recently solved QP, and a hot-started QP solver will typically compute the new solution quickly. For example, in each inner iteration in Steps 7–10, hyperplanes of the polyhedral outer approximation for the cones newly added to $\hat{\mathcal{E}}^k$ are added to QP (27). Similarly, each inner iteration in Steps 22–32 adds one cutting plane for a violated cone constraint. In Steps 6 and 16, some constraints are removed compared to the most recently solved QP, but this structure could also be utilized.
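To illustrate the kind of cut that is added in these inner loops, recall that the hyperplane generated by a point $y_j$ with $\bar{y}_j\neq 0$ is $\nabla r_j(y_j)^Tx_j\leq 0$, with gradient $\nabla r_j(y_j)=(-1,\,\bar{y}_j^T/\|\bar{y}_j\|)^T$ (cf. the gradient expression used in the proof of Lemma 4.19). The following Python sketch (our own helper names, independent of the implementation in Section 5.1) computes such a cut and checks that it cuts off an infeasible trial point:

import numpy as np

def soc_residual(y):
    # r_j(y) = ||y_bar|| - y_0 for one cone block y = (y_0, y_bar)
    return np.linalg.norm(y[1:]) - y[0]

def soc_cut(y):
    # sketch, not the paper's code: gradient of r_j at y (requires y_bar != 0);
    # the new inequality for the outer approximation is soc_cut(y) @ x <= 0
    return np.concatenate(([-1.0], y[1:] / np.linalg.norm(y[1:])))

y = np.array([1.0, 2.0, 0.0])               # violates the cone: r_j(y) = 1 > 0
print(soc_residual(y), soc_cut(y) @ y)      # both equal 1.0, so y is cut off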

The algorithm might terminate because one of the QPs solved for the step computation is infeasible. Since the feasible regions of these QPs are outer approximations of the feasible region of the SOCP (1), this proves that the SOCP instance is infeasible; see also Remark 4.13.

4 Convergence analysis

4.1 Global convergence

In this section, we prove that, under a standard regularity assumption, all limit points of the sequence of iterates are optimal solutions of the SOCP, if the algorithm does not terminate with an optimal solution in Step 36. We also explore what happens when the SOCP is infeasible.

We make the following assumption throughout this section.

Assumption 3.

The set XX defined in (9) is bounded.

Since x0Xx^{0}\in X by the initialization of Algorithm 3 and any step satisfies (21b), we have xkXx^{k}\in X for all kk. Similarly, (24c) and (14) imply that

(38) xj0k0 for all k0 and j𝒥.x^{k}_{j0}\geq 0\text{ for all }k\geq 0\text{ and }j\in\mathcal{J}.

We start the analysis with some technical results that quantify the decrease in the penalty function model.

Lemma 4.1.

Consider an iteration kk and let dkd^{k} be computed in Step 6 or Step 23 in Algorithm 3. Further let ρk>ρmink\rho^{k}>\rho_{\min}^{k}, where ρmink=z^𝒥,0k\rho_{\min}^{k}=\|\hat{z}_{\mathcal{J},0}^{k}\|_{\infty} with z^k\hat{z}^{k} defined in (23). Then the following statements are true.

  1. (i)

    We have

    mk(xk+dk;ρk)mk(xk;ρk)(dk)THkdk(ρkρmink)j𝒥[rj(xjk)]+0.m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k})\leq-(d^{k})^{T}H^{k}d^{k}-(\rho^{k}-\rho_{\min}^{k})\sum_{j\in\mathcal{J}}[r_{j}(x^{k}_{j})]^{+}\leq 0.
  2. (ii)

    If xkx^{k} is not an optimal solution of the SOCP, then

    (39) mk(xk+dk;ρk)mk(xk;ρk)<0.m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k})<0.

Proof 4.2.

Proof of (i): Because $d^k$ is a solution of (18) or (27), there exist $\hat{\lambda}^k$ and $\hat{z}^k$ so that the optimality conditions (24) hold. Let $j\in\mathcal{J}$. Since $\hat{z}_j^k\in\mathcal{C}^{\circ}_j(\hat{\mathcal{Y}}_j^k)$, the definition (17) implies that

$\hat{z}^{k}_{j}=-\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}\nabla r_{j}(y_{j,l}^{k})+\hat{\eta}^{k}_{j}e_{j0},$

where 𝒴^jk={yj,1k,,yj,mjkk}\hat{\mathcal{Y}}^{k}_{j}=\left\{y^{k}_{j,1},\ldots,y^{k}_{j,m_{j}^{k}}\right\} and σ^l,jk,η^jk+\hat{\sigma}^{k}_{l,j},\hat{\eta}^{k}_{j}\in\mathbb{R}_{+}.

Using (15) we have z^j0k=l=1mjkσ^l,jk+η^jkl=1mjkσ^l,jk.\hat{z}_{j0}^{k}=\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}+\hat{\eta}^{k}_{j}\geq\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}. Together with (xjk+djk)Tz^jk=0(x_{j}^{k}+d^{k}_{j})^{T}\hat{z}_{j}^{k}=0 from (24c) and (x¯jk)Ty¯l,jkx¯jky¯l,jk(\bar{x}^{k}_{j})^{T}\bar{y}^{k}_{l,j}\leq\|\bar{x}^{k}_{j}\|\cdot\|\bar{y}^{k}_{l,j}\| this overall yields

(djk)Tz^jk\displaystyle-(d_{j}^{k})^{T}\hat{z}_{j}^{k} =(xjk)Tz^jk=xj0kz^j0kl=1mjkσ^l,jk(x¯jk)Ty¯l,jky¯l,jkxj0kz^j0kl=1mjkσ^l,jkx¯jk\displaystyle=(x_{j}^{k})^{T}\hat{z}_{j}^{k}=x^{k}_{j0}\hat{z}_{j0}^{k}-\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}(\bar{x}^{k}_{j})^{T}\frac{\bar{y}^{k}_{l,j}}{\|\bar{y}^{k}_{l,j}\|}\geq x^{k}_{j0}\hat{z}_{j0}^{k}-\sum_{l=1}^{m_{j}^{k}}\hat{\sigma}^{k}_{l,j}\|\bar{x}_{j}^{k}\|
$\geq x^{k}_{j0}\hat{z}_{j0}^{k}-\hat{z}^{k}_{j0}\|\bar{x}_{j}^{k}\|=-\hat{z}^{k}_{j0}r_{j}(x_{j}^{k})\geq-\hat{z}^{k}_{j0}[r_{j}(x_{j}^{k})]^{+}.$

Further, we have from (24b) that 0=(Axk+Adkb)Tλ^k0=(Ax^{k}+Ad^{k}-b)^{T}\hat{\lambda}^{k} and therefore (dk)TATλ^k=(Axkb)Tλ^k0(d^{k})^{T}A^{T}\hat{\lambda}^{k}=-(Ax^{k}-b)^{T}\hat{\lambda}^{k}\geq 0 since λ^k0\hat{\lambda}^{k}\geq 0 and xkXx^{k}\in X.

Using these inequalities and (24a), the choice of ρmink\rho^{k}_{\min} yields

0\displaystyle 0 =(dk)T(c+Hkdk+ATλ^kz^k)\displaystyle=(d^{k})^{T}\left(c+H^{k}d^{k}+A^{T}\hat{\lambda}^{k}-\hat{z}^{k}\right)
cTdk+(dk)THkdkj𝒥z^j0k[rj(xjk)]+\displaystyle\geq c^{T}d^{k}+(d^{k})^{T}H^{k}d^{k}-\sum_{j\in\mathcal{J}}\hat{z}^{k}_{j0}[r_{j}(x_{j}^{k})]^{+}
cTdk+(dk)THkdkρminkj𝒥[rj(xjk)]+.\displaystyle\geq c^{T}d^{k}+(d^{k})^{T}H^{k}d^{k}-\rho^{k}_{\min}\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}.

Finally, combining this with (10) and (18c) or (27c), respectively, we obtain

mk(xk+dk;ρk)mk(xk;ρk)\displaystyle m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k}) =cTdkρkj𝒟(xk)[rj(xjk)]+\displaystyle=c^{T}d^{k}-\rho^{k}\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x_{j}^{k})]^{+}
=cTdkρkj𝒥[rj(xjk)]+\displaystyle=c^{T}d^{k}-\rho^{k}\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}
(dk)THkdk(ρkρmink)j𝒥[rj(xjk)]+.\displaystyle\leq-(d^{k})^{T}H^{k}d^{k}-(\rho^{k}-\rho^{k}_{\min})\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}.

For the second equality, we used that $r_j(x_j^k)=\|\bar{x}^k_j\|-x^k_{j0}=-x^k_{j0}\leq 0$ for $j\notin\mathcal{D}(x^k)$, by (38) and the definition of $\mathcal{D}(x^k)$. Since $H^k$ is positive semi-definite, $\rho^k>\rho^k_{\min}$, and $[r_j(x^k_j)]^+\geq 0$, the right-hand side must be non-positive.

Proof of (39): Suppose xkXx^{k}\in X is not an optimal solution for the SOCP. If xkx^{k} is not feasible for the SOCP, xkx^{k} must violate a conic constraint and we have [rj(xjk)]+>0[r_{j}(x_{j}^{k})]^{+}>0 for some j𝒥j\in\mathcal{J}. Since HkH^{k} is positive semidefinite and ρkρmink>0\rho^{k}-\rho^{k}_{\min}>0, part (i) yields (39).

It remains to consider the case when xkx^{k} is feasible for the SOCP, i.e., [rj(xjk)]+=0[r_{j}(x^{k}_{j})]^{+}=0 for all jj. To derive a contradiction, suppose that (39) does not hold. Then part (i) yields

0\displaystyle 0 =mk(xk+dk;ρk)mk(xk;ρk)\displaystyle=m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k})
=(dk)THkdk(ρkρmink)j𝒥[rj(xjk)]+=(dk)THkdk0.\displaystyle=-(d^{k})^{T}H^{k}d^{k}-(\rho^{k}-\rho^{k}_{\min})\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}=-(d^{k})^{T}H^{k}d^{k}\leq 0.

Because HkH^{k} is positive semi-definite, this implies Hkdk=0H^{k}d^{k}=0. Further, since also

0=mk(xk+dk;ρk)mk(xk;ρk)=(10)cTdkρkj𝒟(xk)[rj(xjk)]+=cTdk,0=m^{k}(x^{k}+d^{k};\rho^{k})-m^{k}(x^{k};\rho^{k})\stackrel{{\scriptstyle\eqref{eq:merit_model}}}{{=}}c^{T}d^{k}-\rho^{k}\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x^{k}_{j})]^{+}=c^{T}d^{k},

the optimal objective value of (18) or (27), respectively, is zero. At the same time, $d=0$ is also feasible for (18) or (27) and yields the same objective value. Therefore, $d^k=0$ is also an optimal solution of (18) or (27) and the optimality conditions (24) hold for some multipliers. Because $\mathcal{C}^{\circ}_j(\hat{\mathcal{Y}}_j^{k})\subseteq\mathcal{K}_j$, the same multipliers and $d^k=0$ show that the optimality conditions (13) of the SOCP also hold. So, $x^k$ is an optimal solution for the SOCP, contradicting the assumption.

The following lemma shows that the algorithm is well-defined and will not stay in an infinite loop in Steps 22–32.

Lemma 4.3.

Consider an iteration kk and let dkd^{k} be computed in Step 6 or Step 23 in Algorithm 3. Suppose that xkx^{k} is not an optimal solution of the SOCP. Then

(40) φ(xk+dk,l;ρk)φ(xk;ρk)cdec(mk(xk+dk,l;ρk)mk(xk;ρk))\varphi(x^{k}+d^{k,l};\rho^{k})-\varphi(x^{k};\rho^{k})\leq c_{\text{dec}}\Big{(}m^{k}(x^{k}+d^{k,l};\rho^{k})-m^{k}(x^{k};\rho^{k})\Big{)}

after a finite number of iterations in the inner loop in Steps 22–32.

Proof 4.4.

Suppose the claim is not true and let $\{d^{k,l}\}_{l=0}^{\infty}$ be the infinite sequence of trial steps generated in the loop in Steps 22–32 for which the stopping condition in Step 25 is never satisfied, and let $d^{k,\infty}$ be a limit point of $\{d^{k,l}\}_{l=0}^{\infty}$. We will first show that

(41) [rj(xjk+djk,)]+=0 for all j𝒥.[r_{j}(x_{j}^{k}+d_{j}^{k,\infty})]^{+}=0\text{ for all }j\in\mathcal{J}.

Let us first consider the case when x¯jk+d¯jk,=0\bar{x}_{j}^{k}+\bar{d}_{j}^{k,\infty}=0 for some j𝒥j\in\mathcal{J}. Then rj(xjk+djk,)=x¯jk+d¯jk,(xj0k+dj0k,)=(xj0k+dj0k,)0r_{j}(x_{j}^{k}+d_{j}^{k,\infty})=\|\bar{x}_{j}^{k}+\bar{d}_{j}^{k,\infty}\|-(x_{j0}^{k}+d_{j0}^{k,\infty})=-(x_{j0}^{k}+d_{j0}^{k,\infty})\leq 0 and (41) holds.

Now consider the case that $\bar{x}_j^k+\bar{d}_j^{k,\infty}\neq 0$ for $j\in\mathcal{J}$. Since $d^{k,\infty}$ is a limit point of $\{d^{k,l}\}_{l=0}^{\infty}$, there exists a subsequence $\{d^{k,l_t}\}_{t=0}^{\infty}$ that converges to $d^{k,\infty}$. We may assume without loss of generality that $\bar{x}_j^k+\bar{d}_j^{k,l_t}\neq 0$ for all $t$. Then, for any $t$, the update in Step 29 yields $x_j^k+d_j^{k,l_t}\in\mathcal{Y}_j^{k,l_t+1}\subseteq\mathcal{Y}_j^{k,l_{t+1}}$. In inner iteration $l_{t+1}$, the trial step $d^{k,l_{t+1}}$ is computed from (18) and satisfies $x_j^k+d_j^{k,l_{t+1}}\in\mathcal{C}_j(\mathcal{Y}_j^{k,l_{t+1}})$, which by definition (14) implies

$\nabla r_{j}(x_{j}^{k}+d_{j}^{k,l_{t}})^{T}(x_{j}^{k}+d_{j}^{k,l_{t+1}})\leq 0.$

Taking the limit $t\to\infty$ and using the fact that $\nabla r_{j}(v_{j})^{T}v_{j}=r_{j}(v_{j})$ for any $v_{j}$ with $\bar{v}_{j}\neq 0$ yields

$r_{j}(x_{j}^{k}+d_{j}^{k,\infty})=\nabla r_{j}(x_{j}^{k}+d_{j}^{k,\infty})^{T}(x_{j}^{k}+d_{j}^{k,\infty})\leq 0,$

proving (41). In turn (41) implies that the ratio

$\frac{\varphi(x^{k}+d^{k,l};\rho^{k})-\varphi(x^{k};\rho^{k})}{m^{k}(x^{k}+d^{k,l};\rho^{k})-m^{k}(x^{k};\rho^{k})}=\frac{c^{T}d^{k,l}+\rho^{k}\sum_{j\in\mathcal{J}}\left([r_{j}(x_{j}^{k}+d_{j}^{k,l})]^{+}-[r_{j}(x_{j}^{k})]^{+}\right)}{c^{T}d^{k,l}-\rho^{k}\sum_{j\in\mathcal{J}}[r_{j}(x_{j}^{k})]^{+}}$

converges to 1. Note that the ratio is well-defined since $m^{k}(x^{k}+d^{k,l};\rho^{k})-m^{k}(x^{k};\rho^{k})<0$ by Lemma 4.1(ii) (see (39)). It then follows that (40) is true for sufficiently large $l$.

Lemma 4.5.

Suppose that there exists ρ>0\rho^{\infty}>0 so that ρk=ρ>0\rho^{k}=\rho^{\infty}>0 for all large kk. Then any limit point of {xk}k=0\{x^{k}\}_{k=0}^{\infty} is an optimal solution of the SOCP (1).

Proof 4.6.

From (11) and the updates in the algorithm, we have that

φ(xk+1;ρ)φ(xKρ;ρ)\displaystyle\varphi(x^{k+1};\rho^{\infty})-\varphi(x^{K_{\rho}};\rho^{\infty}) =t=Kρk(φ(xt+1;ρ)φ(xt;ρ))\displaystyle=\sum_{t=K_{\rho}}^{k}\Big{(}\varphi(x^{t+1};\rho^{\infty})-\varphi(x^{t};\rho^{\infty})\Big{)}
cdect=Kρk(mt(xt+dt;ρ)mt(xt;ρ))\displaystyle\leq c_{\text{dec}}\sum_{t=K_{\rho}}^{k}\Big{(}m^{t}(x^{t}+d^{t};\rho^{\infty})-m^{t}(x^{t};\rho^{\infty})\Big{)}

for $k\geq K_{\rho}$, where $K_{\rho}$ is chosen so that $\rho^{t}=\rho^{\infty}$ for all $t\geq K_{\rho}$. Since $X$ is bounded by Assumption 3, the penalty function is bounded below on $X$, and hence the left-hand side is bounded below as $k\to\infty$. Lemma 4.1(i) shows that all summands are non-positive and we obtain

(42) limk(mk(xk+dk;ρ)mk(xk;ρ))=0.\lim_{k\to\infty}\Big{(}m^{k}(x^{k}+d^{k};\rho^{\infty})-m^{k}(x^{k};\rho^{\infty})\Big{)}=0.

Using Lemma 4.1(i), we also have

limk((dk)THkdk+(ρρmink)j𝒥[rj(xjk)]+)=0.\lim_{k\to\infty}\biggl{(}(d^{k})^{T}H^{k}d^{k}+(\rho^{\infty}-\rho_{\min}^{k})\sum_{j\in\mathcal{J}}[r_{j}(x^{k}_{j})]^{+}\biggr{)}=0.

Since HkH^{k} is positive semi-definite and ρρminkρρmin>0\rho^{\infty}-\rho^{k}_{\min}\geq\rho^{\infty}-\rho^{\infty}_{\min}>0, this implies that [rj(xjk)]+0[r_{j}(x^{k}_{j})]^{+}\to 0 for all j𝒥j\in\mathcal{J}, i.e., all limit points of {xk}k=0\{x^{k}\}_{k=0}^{\infty} are feasible. This also yields limk(dk)THkdk=0\lim_{k\to\infty}(d^{k})^{T}H^{k}d^{k}=0, and since HkH^{k} is positive semi-definite and uniformly bounded due to (34), we have

(43) limkHkdk=0.\lim_{k\to\infty}H^{k}d^{k}=0.

Using (42) together with (10) and [rj(xjk)]+0[r_{j}(x^{k}_{j})]^{+}\to 0, we obtain

(44) 0=limk(cTdkρj𝒟(xk)[rj(xjk)]+)=limkcTdk.0=\lim_{k\to\infty}\biggl{(}c^{T}d^{k}-\rho^{\infty}\sum_{j\in\mathcal{D}(x^{k})}[r_{j}(x^{k}_{j})]^{+}\biggr{)}=\lim_{k\to\infty}c^{T}d^{k}.

Now let xx^{*} be a limit point of {xk}k=0\{x^{k}\}_{k=0}^{\infty}. Since XX is bounded, dkd^{k} is bounded, and consequently there exists a subsequence {kt}t=0\{k_{t}\}_{t=0}^{\infty} of iterates so that xktx^{k_{t}} and dktd^{k_{t}} converge to xx^{*} and dd^{\infty}, respectively, for some limit point dd^{\infty} of dkd^{k}. Define gkt=Hktdktg^{k_{t}}=H^{k_{t}}d^{k_{t}} for all tt. Then, looking at the QP optimality conditions (24), we see that dktd^{k_{t}} is also an optimal solution of the linear optimization problem

(45) mindn\displaystyle\min_{d\in\mathbb{R}^{n}} (c+gkt)Td\displaystyle(c+g^{k_{t}})^{T}d
s.t. A(xkt+d)b,\displaystyle A(x^{k_{t}}+d)\leq b,
xjkt+dj𝒞j(𝒴^jkt),j𝒥.\displaystyle x_{j}^{k_{t}}+d_{j}\in\mathcal{C}_{j}(\hat{\mathcal{Y}}^{k_{t}}_{j}),\qquad j\in\mathcal{J}.

Now suppose, for the purpose of deriving a contradiction, that xx^{*} is not an optimal solution of the SOCP. Since we showed above that xx^{*} is feasible, there then exists a step d~n\tilde{d}^{*}\in\mathbb{R}^{n} so that x~=x+d~\tilde{x}=x^{*}+\tilde{d}^{*} is feasible for (1) and cTd~<0c^{T}\tilde{d}^{*}<0. Then, because 𝒦j𝒞j(𝒴^jkt)\mathcal{K}_{j}\subseteq\mathcal{C}_{j}(\hat{\mathcal{Y}}^{k_{t}}_{j}), for each tt, d~kt=xxkt+d~\tilde{d}^{k_{t}}=x^{*}-x^{k_{t}}+\tilde{d}^{*} is feasible for (45), and because dktd^{k_{t}} is an optimal solution of (45), we have (c+gkt)Tdkt(c+gkt)Td~kt(c+g^{k_{t}})^{T}d^{k_{t}}\leq(c+g^{k_{t}})^{T}\tilde{d}^{k_{t}}. Taking the limit tt\to\infty, we obtain cTdcTd~<0c^{T}d^{\infty}\leq c^{T}\tilde{d}^{*}<0, where we used limtgkt=limtHktdkt=0\lim_{t\to\infty}g^{k_{t}}=\lim_{t\to\infty}H^{k_{t}}d^{k_{t}}=0, due to the definition of gktg^{k_{t}} and (43). However, this contradicts (44). Therefore, xx^{*} must be a solution of the SOCP.

For later reference, we highlight the limit (43) established in the above proof.

Lemma 4.7.

Suppose that there exists ρ>0\rho^{\infty}>0 so that ρk=ρ>0\rho^{k}=\rho^{\infty}>0 for all large kk. Then limkHkdk=0\lim_{k\to\infty}H^{k}d^{k}=0.

We are now ready to prove that the algorithm is globally convergent under the following standard regularity assumption.

Assumption 4.

The SOCP is feasible and Slater’s constraint qualification holds, i.e., there exists a feasible point x~n\tilde{x}\in\mathbb{R}^{n} and ϵ>0\epsilon>0 so that x~+v\tilde{x}+v is feasible for any vnv\in\mathbb{R}^{n} with vϵ\|v\|\leq\epsilon.

This assumption implies that the multiplier estimates are bounded.

Lemma 4.8.

Suppose that Assumption 4 holds. Then {z^k}\{\hat{z}^{k}\} is bounded.

Proof 4.9.

Let x~\tilde{x} and ϵ\epsilon be the quantities from Assumption 4. Note that the QP corresponding to the optimality conditions (24) is

mindn\displaystyle\min_{d\in\mathbb{R}^{n}}\ cTd+12dTHkd\displaystyle c^{T}d+\tfrac{1}{2}d^{T}H^{k}d
s.t. A(xk+d)b,xjk+dj𝒞j(𝒴^jk),j𝒥.\displaystyle A(x^{k}+d)\leq b,\;x_{j}^{k}+d_{j}\in\mathcal{C}_{j}(\hat{\mathcal{Y}}^{k}_{j}),\;j\in\mathcal{J}.

Since xk+1=xk+dkx^{k+1}=x^{k}+d^{k} when dkd^{k} is the step accepted by the algorithm, it follows that xk+1x^{k+1} is an optimal solution of the QP

$O_{\rm primal}=\min_{x\in\mathbb{R}^{n}}\ (c-H^{k}x^{k})^{T}x+\tfrac{1}{2}x^{T}H^{k}x$
s.t. $Ax\leq b,\;x_{j}\in\mathcal{C}_{j}(\hat{\mathcal{Y}}^{k}_{j}),\;j\in\mathcal{J},$

the Lagrangian dual of which is

(46a) Odual=maxx,zn,λm\displaystyle O_{\rm dual}=\max_{x,z\in\mathbb{R}^{n},\lambda\in\mathbb{R}^{m}}\ bTλ12xTHkx\displaystyle-b^{T}\lambda-\tfrac{1}{2}x^{T}H^{k}x
(46b) s.t. cHkxk+Hkx+ATλz=0,\displaystyle c-H^{k}x^{k}+H^{k}x+A^{T}\lambda-z=0,
(46c) $z_{j}\in\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}^{k}_{j}),\;j\in\mathcal{J},\quad\lambda\geq 0.$

By (24), (xk+1,λ^k(x^{k+1},\hat{\lambda}^{k},z^k\hat{z}^{k}) is a primal-dual optimal solution of these QPs.

Define $v=-\epsilon\frac{\hat{z}^{k}}{\|\hat{z}^{k}\|}$. Then $\|v\|\leq\epsilon$, and Assumption 4 implies that $\tilde{x}_{j}+v_{j}\in\mathcal{K}_{j}\subseteq\mathcal{C}_{j}(\hat{\mathcal{Y}}_{j}^{k})$ for all $j\in\mathcal{J}$. Since $\hat{z}^{k}_{j}\in\mathcal{C}^{\circ}_{j}(\hat{\mathcal{Y}}_{j}^{k})$, we have with (46b) that

(47) 0(x~+v)Tz^k=vTz^k+x~T(cHkx~+Hkxk+1+ATλ^k).0\leq(\tilde{x}+v)^{T}\hat{z}^{k}=v^{T}\hat{z}^{k}+\tilde{x}^{T}(c-H^{k}\tilde{x}+H^{k}x^{k+1}+A^{T}\hat{\lambda}^{k}).

Since $H^{k}$ is positive semi-definite, we have

(48) 0(x~xk+1)THk(x~xk+1)=x~THkx~2x~THkxk+1+(xk+1)THkxk+1.0\leq(\tilde{x}-x^{k+1})^{T}H^{k}(\tilde{x}-x^{k+1})=\tilde{x}^{T}H^{k}\tilde{x}-2\tilde{x}^{T}H^{k}x^{k+1}+(x^{k+1})^{T}H^{k}x^{k+1}.

Furthermore, Slater’s condition implies strong duality, that is

bTλ^k+12(xk+1)THkxk+1\displaystyle b^{T}\hat{\lambda}^{k}+\frac{1}{2}(x^{k+1})^{T}H^{k}x^{k+1} =Odual=Oprimal\displaystyle=-O_{\rm dual}=-O_{\rm primal}
(49) =(cHkxk)Txk+112(xk+1)THkxk+1.\displaystyle=-(c-H^{k}x^{k})^{T}x^{k+1}-\tfrac{1}{2}(x^{k+1})^{T}H^{k}x^{k+1}.

Finally, since $\tilde{x}$ is feasible for the SOCP, (1b) and $\hat{\lambda}^{k}\geq 0$ imply $\tilde{x}^{T}A^{T}\hat{\lambda}^{k}\leq b^{T}\hat{\lambda}^{k}$. Subtracting $v^{T}\hat{z}^{k}$ from both sides of (47), this, together with (48) and (49), yields

ϵz^k\displaystyle\epsilon\|\hat{z}^{k}\| x~Tc12x~THkx~+12(xk+1)THkxk+1+bTλ^k\displaystyle\leq\tilde{x}^{T}c-\frac{1}{2}\tilde{x}^{T}H^{k}\tilde{x}+\frac{1}{2}(x^{k+1})^{T}H^{k}x^{k+1}+b^{T}\hat{\lambda}^{k}
=x~Tc12x~THkx~cTxk+112(xk+1)THkxk+1.\displaystyle=\tilde{x}^{T}c-\frac{1}{2}\tilde{x}^{T}H^{k}\tilde{x}-c^{T}x^{k+1}-\frac{1}{2}(x^{k+1})^{T}H^{k}x^{k+1}.

All terms on the right-hand side are bounded uniformly in $k$: $\tilde{x}$ is fixed, $X$ is bounded by Assumption 3, and $H^{k}$ is uniformly bounded by (34). We can therefore conclude that $\hat{z}^{k}$ is uniformly bounded.

It is easy to see that the penalty parameter update rule (37) and Lemma 4.8 imply the following result.

Lemma 4.10.

Suppose Assumption 4 holds. Then there exist $\rho^{\infty}$ and $K_{\rho}$ so that $\rho^{k}=\rho^{\infty}>\rho^{\infty}_{\min}$ for all $k\geq K_{\rho}$, where $\rho^{\infty}_{\min}\geq\rho^{k}_{\min}=\|\hat{z}^{k}_{\mathcal{J},0}\|_{\infty}$ for all $k$.

We can now state the main convergence theorem of this section.

Theorem 4.11.

Suppose that Assumptions 3 and 4 hold. Then Algorithm 3 either terminates in Step 36 with an optimal solution, or it generates an infinite sequence of iterates {(xk+1,λ^k,z^k)}k=0\{(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k})\}_{k=0}^{\infty}, each limit point of which is a primal-dual solution of the SOCP (1).

Proof 4.12.

Let $\{(x^{k_t+1},\hat{\lambda}^{k_t},\hat{z}^{k_t})\}$ be a subsequence converging to a limit point $(x^*,\lambda^*,z^*)$. No matter whether an iterate is computed from the optimal solution of (18), (27), or (35), the iterates satisfy the optimality conditions (24). In particular, from (24c) we have for any $j\in\mathcal{J}$ that $\hat{z}_j^{k_t}\in\mathcal{C}_j^{\circ}(\hat{\mathcal{Y}}_j^{k_t})\subseteq\mathcal{K}_j$ and $(x_j^{k_t+1})^T\hat{z}_j^{k_t}=0$. In the limit, we obtain $z_j^*\in\mathcal{K}_j$ (since $\mathcal{K}_j$ is closed) and $(x_j^*)^Tz_j^*=0$. Lemma 4.10 yields that $\rho^k=\rho^{\infty}$ for all large $k$, and so Lemma 4.5 implies that $x^*$ is feasible, i.e., $x_j^*\in\mathcal{K}_j$. Therefore, (13c) holds. Using Lemma 4.7, we can take the limit in (24a) and (24b) and deduce that the remaining SOCP optimality conditions (13a) and (13b) also hold at the limit point.

Remark 4.13.

In case the SOCP is infeasible, there are two possible outcomes: either Algorithm 3 terminates in some iteration because one of the QPs is infeasible, or $\lim_{k\to\infty}\rho^k=\infty$ (by the contrapositive of Lemma 4.5).

4.2 Identification of extremal-active cones

We can only expect fast local convergence under some non-degeneracy assumptions. Throughout this section, we assume that Assumption 2 holds. Under this assumption, (x,λ,z)(x^{*},\lambda^{*},z^{*}) is the unique optimal solution [1, Theorem 22], and Theorem 4.11 then implies that

limk(xk+1,λ^k,z^k)=(x,λ,z).\lim_{k\to\infty}(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k})=(x^{*},\lambda^{*},z^{*}).

First, we prove a technical result that describes elements in 𝒞j(𝒴j)\mathcal{C}^{\circ}_{j}(\mathcal{Y}_{j}) in a compact manner. For this characterization to hold, condition (20) for the initialization 𝒴j0\mathcal{Y}_{j}^{0} of the set of hyperplane-generating points is crucial.

Lemma 4.14.

Let yjnjy_{j}\in\mathbb{R}^{n_{j}} with y¯j0\bar{y}_{j}\neq 0 and yj00y_{j0}\geq 0. Further, let Φj(zj,yj):=zj0z¯j+y¯j1y¯j\Phi_{j}(z_{j},y_{j}):=z_{j0}-\|\bar{z}_{j}+\bar{y}_{j}\|_{1}-\|\bar{y}_{j}\|. Then the following statements hold for zj,yjnjz_{j},y_{j}\in\mathbb{R}^{n_{j}}:

  1. (i)

    zj𝒞j(𝒴j0{yj})z_{j}\in\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\}) if Φj(zj,yj)0\Phi_{j}(z_{j},y_{j})\geq 0.

  2. (ii)

    zjint(𝒞j(𝒴j0{yj}))z_{j}\in\operatorname{int}(\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\})) if Φj(zj,yj)>0\Phi_{j}(z_{j},y_{j})>0.

Proof 4.15.

For (i): Suppose Φj(zj,yj)0\Phi_{j}(z_{j},y_{j})\geq 0, then

zj0z¯j+y¯j1+y¯j.z_{j0}\geq\|\bar{z}_{j}+\bar{y}_{j}\|_{1}+\|\bar{y}_{j}\|.

Define s¯j=z¯j+y¯j\bar{s}_{j}=\bar{z}_{j}+\bar{y}_{j} and choose σj++nj1\sigma_{j}^{+}\in\mathbb{R}_{+}^{n_{j}-1} and σj+nj1\sigma_{j}^{-}\in\mathbb{R}_{+}^{n_{j}-1} so that s¯j=σj+σj\bar{s}_{j}=\sigma_{j}^{+}-\sigma_{j}^{-} and |sji|=σji++σji|s_{ji}|=\sigma_{ji}^{+}+\sigma_{ji}^{-} for all i=1,,nj1i=1,\ldots,n_{j}-1. Then we have

zj0\displaystyle z_{j0} =i=1nj1σji++i=1nj1σji+σj+ηj\displaystyle=\sum_{i=1}^{n_{j}-1}\sigma_{ji}^{+}+\sum_{i=1}^{n_{j}-1}\sigma_{ji}^{-}+\sigma_{j}+\eta_{j}
z¯j\displaystyle\bar{z}_{j} =s¯jy¯j=σj+σjσjy¯jy¯j\displaystyle=\bar{s}_{j}-\bar{y}_{j}=\sigma_{j}^{+}-\sigma_{j}^{-}-\sigma_{j}\frac{\bar{y}_{j}}{\|\bar{y}_{j}\|}

with σj=y¯j\sigma_{j}=\|\bar{y}_{j}\| and some ηj+\eta_{j}\in\mathbb{R}_{+}. Using (15), this can be rewritten as

$z_{j}=-\sum_{i=1}^{n_{j}-1}\sigma_{ji}^{+}\nabla r_{j}(-e_{ji})-\sum_{i=1}^{n_{j}-1}\sigma_{ji}^{-}\nabla r_{j}(e_{ji})-\sigma_{j}\nabla r_{j}(y_{j})+\eta_{j}e_{j0}.$

By the definition of 𝒞j\mathcal{C}_{j}^{\circ} in (17), this implies that zj𝒞j(𝒴^j0{yj})z_{j}\in\mathcal{C}_{j}^{\circ}(\hat{\mathcal{Y}}^{0}_{j}\cup\{y_{j}\}) where 𝒴^j0\hat{\mathcal{Y}}_{j}^{0} is defined in (20). Since 𝒴^j0𝒴j0\hat{\mathcal{Y}}_{j}^{0}\subseteq\mathcal{Y}_{j}^{0} from (20), we have 𝒞j(𝒴^j0{yj})𝒞j(𝒴j0{yj})\mathcal{C}_{j}^{\circ}(\hat{\mathcal{Y}}^{0}_{j}\cup\{y_{j}\})\subseteq\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\}), and the claim follows.

For (ii): Suppose Φj(zj,yj)>0\Phi_{j}(z_{j},y_{j})>0. Because Φj\Phi_{j} is a continuous function, there exists a neighborhood Nϵ(zj)N_{\epsilon}(z_{j}) around zjz_{j} so that Φj(z^j,yj)>0\Phi_{j}(\hat{z}_{j},y_{j})>0 for all z^jNϵ(zj)\hat{z}_{j}\in N_{\epsilon}(z_{j}). From part (i) we then have Nϵ(zj)𝒞j(𝒴j0{yj})N_{\epsilon}(z_{j})\subseteq\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\}), and consequently zjint(𝒞j(𝒴j0{yj}))z_{j}\in\operatorname{int}(\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{y_{j}\})).
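The decomposition constructed in this proof can also be checked numerically. The following Python sketch (our own verification script, assuming as in the proof that the coordinate vectors $\pm e_{ji}$ are among the generating points) reassembles $z_j$ from the multipliers $\sigma^+_j$, $\sigma^-_j$, $\sigma_j=\|\bar{y}_j\|$, and $\eta_j$ whenever $\Phi_j(z_j,y_j)\geq 0$:

import numpy as np

def grad_r(v):
    # sketch, not the paper's code: gradient of r(v) = ||v_bar|| - v_0, for v_bar != 0
    return np.concatenate(([-1.0], v[1:] / np.linalg.norm(v[1:])))

def decompose(z, y):
    # multipliers from the proof of Lemma 4.14(i); eta >= 0 iff Phi(z, y) >= 0
    s_bar = z[1:] + y[1:]
    sig_plus, sig_minus = np.maximum(s_bar, 0.0), np.maximum(-s_bar, 0.0)
    sigma = np.linalg.norm(y[1:])
    eta = z[0] - np.sum(np.abs(s_bar)) - sigma
    return sig_plus, sig_minus, sigma, eta

rng = np.random.default_rng(0)
n = 5
y = np.concatenate(([1.0], rng.standard_normal(n - 1)))
z_bar = rng.standard_normal(n - 1)
z = np.concatenate(([np.sum(np.abs(z_bar + y[1:])) + np.linalg.norm(y[1:]) + 0.3], z_bar))
sp, sm, sigma, eta = decompose(z, y)
e = np.eye(n)
rebuilt = (-sum(sp[i] * grad_r(-e[i + 1]) for i in range(n - 1))
           - sum(sm[i] * grad_r(e[i + 1]) for i in range(n - 1))
           - sigma * grad_r(y) + eta * e[0])
print(np.allclose(rebuilt, z), eta >= 0)   # True True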

Theorem 4.16.

For all kk sufficiently large, we have (xk)=(x)\mathcal{E}(x^{k})=\mathcal{E}(x^{*}).

Proof 4.17.

Choose $j\notin\mathcal{E}(x^*)$; then $x^*_j\neq 0$. Because $x_j^k\to x^*_j$, we have $x_j^k\neq 0$ or, equivalently, $j\notin\mathcal{E}(x^k)$ for $k$ sufficiently large. For the remainder of this proof we consider $j\in\mathcal{E}(x^*)$ and show that $j\in\mathcal{E}(x^k)$ for large $k$. Note that strict complementarity in Assumption 2 implies that $z_j^*\in\operatorname{int}(\mathcal{K}_j)$, i.e., $r_j(z_j^*)<0$, and consequently $z_{j0}^*>0$.

First consider the iterations in which fast NLP-SQP steps are accepted in Steps 12 or 17. For the purpose of deriving a contradiction, suppose there exists an infinite subsequence so that $x^{k_t+1}=x^{k_t}+d^{S,k_t}$ or $x^{k_t+1}=x^{k_t}+d^{S,k_t}+s^{k_t}$ and $j\not\in\hat{\mathcal{E}}^{k_t}$. Then $j\not\in\hat{\mathcal{E}}^{k_t}$ implies $x^{k_t+1}_{j0}>0$ (according to the termination condition of the while loop in Step 7). We also have $\hat{\mathcal{Y}}_j^{k_t}=\{\check{x}_j^{k_t}\}$ where $\check{x}_j^{k_t}=x_j^{k_t}$ from (28) or $\check{x}_j^{k_t}=x_j^{k_t}+d_j^{S,k_t}$ from (36). Condition (24c) yields $z_j^{k_t}\in\mathcal{C}^{\circ}_j(\{\check{x}_j^{k_t}\})$, so by (17) we have $z_j^{k_t}=-\sigma_j\nabla r_j(\check{x}_j^{k_t})+\eta_j e_{j0}$ for some $\sigma_j,\eta_j\geq 0$, as well as $x_j^{k_t+1}\in\mathcal{C}_j(\{\check{x}_j^{k_t}\})$, which by (14) implies $\nabla r_j(\check{x}_j^{k_t})^T x_j^{k_t+1}\leq 0$. Then complementarity yields

$0=(z_{j}^{k_{t}})^{T}x_{j}^{k_{t}+1}=-\sigma_{j}\nabla r_{j}(\check{x}_{j}^{k_{t}})^{T}x_{j}^{k_{t}+1}+\eta_{j}x^{k_{t}+1}_{j0}\geq\eta_{j}x^{k_{t}+1}_{j0}.$

Since $x^{k_t+1}_{j0}>0$ and $\eta_j\geq 0$, we must have $\eta_j=0$, and consequently $z_j^{k_t}=-\sigma_j\nabla r_j(\check{x}_j^{k_t})$. It is easy to see that $r_j(-\sigma_j\nabla r_j(\check{x}_j^{k_t}))=0$. Since $z_j^{k_t}\to z_j^*$, continuity of $r_j$ yields $r_j(z_j^*)=0$, in contradiction to $z_j^*\in\operatorname{int}(\mathcal{K}_j)$. We thus showed that $j\in\hat{\mathcal{E}}^k$ for all large iterations $k$ in which the NLP-SQP step was accepted, and consequently (28) and (36) yield $\hat{\mathcal{Y}}_j^k=\mathcal{Y}_j^k$ for such $k$.

In all other iterations (22) holds, and overall we obtain

(50) 𝒴j0𝒴jk𝒴^jk for all sufficiently large k.\mathcal{Y}_{j}^{0}\subseteq\mathcal{Y}_{j}^{k}\subseteq\hat{\mathcal{Y}}_{j}^{k}\text{ for all sufficiently large }k.

Let us first consider the case when $\bar{z}^*_j=0$. Then $\|\bar{z}_j^*\|-z^*_{j0}=r_j(z_j^*)<0$ yields $z_{j0}^*>0$. To apply Lemma 4.14, choose any $i\in\{1,\ldots,n_j-1\}$ and let $y_j=t\,e_{ji}$ with $t\in(0,z^*_{j0}/2)$. Then $\|\bar{y}_j\|_1=\|\bar{y}_j\|=t$ and $\Phi_j(z_j^*,y_j)=z^*_{j0}-2t>0$. Since $\hat{z}_j^k\to z_j^*$ and $\Phi_j$ is continuous, $\Phi_j(\hat{z}_j^k,y_j)>0$ for sufficiently large $k$, and by Lemma 4.14, $\hat{z}^k_j\in\operatorname{int}(\mathcal{C}_j^{\circ}(\mathcal{Y}^0_j\cup\{y_j\}))$. Since $\nabla r_j(y_j)=\nabla r_j(e_{ji})$ and $e_{ji}\in\mathcal{Y}^0_j$, the definition (17) gives $\mathcal{C}_j^{\circ}(\mathcal{Y}^0_j\cup\{y_j\})=\mathcal{C}_j^{\circ}(\mathcal{Y}^0_j)$, and with (50) we also have $\hat{z}^k_j\in\operatorname{int}(\mathcal{C}_j^{\circ}(\hat{\mathcal{Y}}^k_j))$. General conic complementarity in (24c) then implies that $x_j^{k+1}=x_j^k+d_j^k=0$ for all large $k$, or equivalently, $j\in\mathcal{E}(x^k)$ for $k$ sufficiently large, as desired.

Now consider the case z¯j0\bar{z}^{*}_{j}\neq 0. For the purpose of deriving a contradiction, suppose there exists a subsequence {xkt}t=0\{x^{k_{t}}\}_{t=0}^{\infty} so that j(xkt)j\not\in\mathcal{E}(x^{k_{t}}), i.e., xkt0x^{k_{t}}\neq 0, for all tt. Because zˇjkzj\check{z}^{k}_{j}\to z^{*}_{j}, z¯j0\bar{z}^{*}_{j}\neq 0, and rj(zj)<0r_{j}(z_{j}^{*})<0, we may assume without loss of generality that rj(zˇjkt)<0r_{j}(\check{z}_{j}^{k_{t}})<0 and zˇ¯jkt0\bar{\check{z}}_{j}^{k_{t}}\neq 0 for all tt. Using this and xjkt0x_{j}^{k_{t}}\neq 0, we see that the update rule (26) in Step 34 adds zˇjkt-\check{z}^{k_{t}}_{j} to 𝒴jkt+1\mathcal{Y}_{j}^{k_{t}+1}. With (50), we have

(51) zˇjkt𝒴jkt+1𝒴jkt+1𝒴^jkt+1 for all t.-\check{z}^{k_{t}}_{j}\in\mathcal{Y}_{j}^{k_{t}+1}\subseteq\mathcal{Y}_{j}^{k_{t+1}}\subseteq\hat{\mathcal{Y}}_{j}^{k_{t+1}}\text{ for all }t.

Recall the mapping $\Phi_j$ defined in Lemma 4.14 and note that $\Phi_j(z^*_j,-z^*_j)=z^*_{j0}-\|\bar{z}^*_j\|=-r_j(z^*_j)>0$. Since both $\hat{z}^k_j$ and $\check{z}^k_j$ converge to $z_j^*$ and $\Phi_j$ is continuous, we have $\Phi_j(\hat{z}_j^{k_{t+1}-1},-\check{z}^{k_t}_j)>0$ for all large $t$, and therefore, by Lemma 4.14, $\hat{z}_j^{k_{t+1}-1}\in\operatorname{int}(\mathcal{C}_j^{\circ}(\mathcal{Y}^0_j\cup\{-\check{z}^{k_t}_j\}))\subseteq\operatorname{int}(\mathcal{C}_j^{\circ}(\hat{\mathcal{Y}}^{k_{t+1}-1}_j))$ (using (51)) for all large $t$. Conic complementarity in (24c) then implies that $x_j^{k_{t+1}}=x_j^{k_{t+1}-1}+d_j^{k_{t+1}-1}=0$. This contradicts the definition of the subsequence $\{x^{k_t}\}_{t=0}^{\infty}$.

Remark 4.18.

In the proof of Theorem 4.16, we saw that Φj(zj,zj)>0\Phi_{j}(z^{*}_{j},-z^{*}_{j})>0 if j(x)j\in\mathcal{E}(x^{*}) and z¯j0\bar{z}_{j}^{*}\neq 0. Since Φj\Phi_{j} is continuous, this implies that there exists a neighborhood Nϵ(zj)N_{\epsilon}(z^{*}_{j}) so that Φj(zj,yj)>0\Phi_{j}(z_{j},-y_{j})>0, and consequently zjint(𝒞j(𝒴j0{yj}))z_{j}\in\operatorname{int}(\mathcal{C}_{j}^{\circ}(\mathcal{Y}^{0}_{j}\cup\{-y_{j}\})), for all zj,yjNϵ(zj)z_{j},y_{j}\in N_{\epsilon}(z^{*}_{j}).

4.3 Quadratic local convergence

As discussed in Section 2.1, since xx^{*} is a solution of the SOCP (1), it is also a solution of the nonlinear problem (4). We now show that Algorithm 3 eventually generates steps that are identical to SQP steps for (4). Then Theorem 2.3 implies that the iterates converge locally at a quadratic rate.

We first need to establish that the assumptions for Theorem 2.3 hold.

Lemma 4.19.

Suppose that Assumption 2 holds for the SOCP (1). Then Assumption 1 holds for the NLP (4).

Proof 4.20.

Let λ\lambda^{*} and zz^{*} be the optimal multipliers for the SOCP corresponding to xx^{*}, satisfying (13). Assumption 2 implies that λ\lambda^{*} and zz^{*} are unique [1, Theorem 22].

Let $j\in\mathcal{D}(x^*)$ and define $\mu^*_j=z^*_{j0}\geq 0$. If $0=r_j(x^*_j)=\|\bar{x}^*_j\|-x^*_{j0}$, complementarity (13c) implies, for all $i\in\{1,\ldots,n_j-1\}$, that $0=x^*_{j0}z^*_{ji}+x^*_{ji}z^*_{j0}=\|\bar{x}^*_j\|z^*_{ji}+x^*_{ji}z^*_{j0}$, or equivalently, $z^*_{ji}=-z^*_{j0}\frac{x^*_{ji}}{\|\bar{x}^*_j\|}$; see [1, Lemma 15]. Using (15), this can be written as

(52) $z^{*}_{j}=-z^{*}_{j0}\nabla r_{j}(x^{*}_{j})=-\mu^{*}_{j}\nabla r_{j}(x^{*}_{j}).$

On the other hand, if $r_j(x^*_j)<0$, i.e., the constraint (4c) is inactive, then $x^*_j\in\operatorname{int}(\mathcal{K}_j)$ and complementarity (13c) yields $z_j^*=0$ (see [1, Definition 23]) and therefore $\mu_j^*=0$. Consequently, (52) is also valid in that case. Finally, we define $\nu^*_j=z^*_j$ for all $j\in\mathcal{E}(x^*)$. With these definitions, (13a) can be restated as

(53) $c+A^{T}\lambda^{*}+\sum_{j\in\mathcal{D}(x^{*})}\mu^{*}_{j}\nabla r_{j}(x^{*})-\nu^{*}=0,$

where νn\nu^{*}\in\mathbb{R}^{n} is the vector with the values of νj\nu_{j}^{*} at the components corresponding to j(x)j\in\mathcal{E}(x^{*}) and zero otherwise. We now prove parts (i), (ii), and (iii) of Assumption 1.

Proof of (i): Let $j\in\mathcal{D}(x^*)$. We already established that $r_j(x_j^*)<0$ yields $\mu^*_j=0$. Now suppose that $r_j(x^*_j)=0$. Then $x_j^*\in\operatorname{bd}(\mathcal{K}_j)\setminus\{0\}$. Since strict complementarity is assumed, we have $z_j^*\in\operatorname{bd}(\mathcal{K}_j)\setminus\{0\}$ (see the comment after Assumption 2), which in turn yields $z_{j0}^*\neq 0$ and hence $\mu_j^*\neq 0$.

Proof of (ii): Since we need to prove linear independence only of those constraints that are active at xx^{*}, we consider only those rows A𝒜A_{\mathcal{A}} of AA for which (4b) is binding.

Without loss of generality suppose xx^{*} is partitioned into four parts, (x)T=((x)T(x)T(x)T(x)T)(x^{*})^{T}=((x_{\mathcal{B}}^{*})^{T}\ (x_{\mathcal{I}}^{*})^{T}\ (x_{\mathcal{E}}^{*})^{T}\ (x_{\mathcal{F}}^{*})^{T}), where xx_{\mathcal{B}}^{*}, xx_{\mathcal{I}}^{*}, and xx_{\mathcal{E}}^{*} correspond to the variables in the cones ={j𝒥:rj(xj)=0,xj0}\mathcal{B}=\{j\in\mathcal{J}:r_{j}(x^{*}_{j})=0,x_{j}^{*}\neq 0\}, ={j𝒥:rj(xj)<0}\mathcal{I}=\{j\in\mathcal{J}:r_{j}(x^{*}_{j})<0\}, and =(x)\mathcal{E}=\mathcal{E}(x^{*}), respectively, and xx^{*}_{\mathcal{F}} includes all components of xx^{*} that are not in any of the cones. Further suppose that (x)T=((x1)T(xp)T)(x_{\mathcal{B}}^{*})^{T}=((x^{*}_{1})^{T}\ \ldots(x^{*}_{p_{\mathcal{B}}})^{T}), where ={1,,p}\mathcal{B}=\{1,\ldots,p_{\mathcal{B}}\}, and that A𝒜A_{\mathcal{A}} is partitioned in the same way.

Primal non-degeneracy of the SOCP implies that all matrices of the form

$\begin{pmatrix}[A_{\mathcal{A}}]_{1}&\cdots&[A_{\mathcal{A}}]_{p_{\mathcal{B}}}&[A_{\mathcal{A}}]_{\mathcal{I}}&[A_{\mathcal{A}}]_{\mathcal{E}}&[A_{\mathcal{A}}]_{\mathcal{F}}\\ \alpha_{1}\nabla r_{1}(x_{1}^{*})^{T}&\cdots&\alpha_{p_{\mathcal{B}}}\nabla r_{p_{\mathcal{B}}}(x_{p_{\mathcal{B}}}^{*})^{T}&0^{T}&v^{T}&0^{T}\end{pmatrix}$

have linearly independent rows for all scalars $\alpha_i$ and vectors $v$ that are not all zero [1, Eq. (50)]. This implies that the rows of $A_{\mathcal{A}}$, together with the gradient of any one of the binding constraints in (4c) and (4d), are linearly independent. Because the constraint gradients, which are of the form $\nabla r_j(x^*_j)$ and $e_{ij}$, share no nonzero components when extended to the full space, we conclude that the gradients of all active constraints are linearly independent at $x^*$, i.e., the LICQ holds.

Proof of (iii): For the purpose of deriving a contradiction, suppose that there exists a direction dn{0}d\in\mathbb{R}^{n}\setminus\{0\} that lies in the null space of the constraints of (4) that are binding at xx^{*} and for which dTHd0d^{T}H^{*}d\leq 0.

Since $d$ is in the null space of the binding constraints, we have $A_{\mathcal{A}}d=0$, $\nabla r_{j}(x^{*})^{T}d=0$ for $j\in\mathcal{B}$, and $d_{j}=0$ for all $j\in\mathcal{E}$. Premultiplying (53) by $d^{T}$ gives

(54) $0=c^{T}d+(\lambda^{*})^{T}\underbrace{A_{\mathcal{A}}d}_{0}+\sum_{j\in\mathcal{B}}\mu^{*}_{j}\underbrace{\nabla r_{j}(x^{*})^{T}d}_{0}+\sum_{j\in\mathcal{I}}\underbrace{\mu^{*}_{j}}_{0}\nabla r_{j}(x^{*})^{T}d+\underbrace{(\nu^{*})^{T}d}_{0}=c^{T}d.$

What remains to show is that dd is a feasible direction for the SOCP, i.e., there exists β>0\beta>0 so that x+βdx^{*}+\beta d is feasible for the SOCP. Because of (54), this point has the same objective value as xx^{*} and is therefore also an optimal solution of the SOCP. This contradicts the fact that Assumption 2 implies that the optimal solution is unique [1, Theorem 22].

By the definition of HH^{*} in Assumption 1 and the choice of dd, we have

$0\geq d^{T}H^{*}d=\sum_{j\in\mathcal{D}(x^{*})}\mu^{*}_{j}d_{j}^{T}\nabla^{2}r_{j}(x^{*}_{j})d_{j}=\sum_{j\in\mathcal{B}}\mu^{*}_{j}d_{j}^{T}\nabla^{2}r_{j}(x^{*}_{j})d_{j}.$

Since, for all $j\in\mathcal{B}$, the Hessian $\nabla^{2}r_{j}(x^{*}_{j})$ is positive semi-definite and $\mu^{*}_{j}>0$ from part (i), this yields $d_{j}^{T}\nabla^{2}r_{j}(x^{*}_{j})d_{j}=0$ for all $j\in\mathcal{B}$.

Let jj\in\mathcal{B}. Then from (7)

(55) $0=d_{j}^{T}\nabla^{2}r_{j}(x^{*})d_{j}=\frac{\|\bar{d}_{j}\|^{2}\|\bar{x}^{*}_{j}\|^{2}-(\bar{d}_{j}^{T}\bar{x}^{*}_{j})^{2}}{\|\bar{x}^{*}_{j}\|^{3}}.$

The definition of $\mathcal{B}$ implies $r_{j}(x_{j}^{*})=0$ and so $x^{*}_{j0}=\|\bar{x}^{*}_{j}\|$. Since $d_{j}$ is in the null space of $\nabla r_{j}(x_{j}^{*})$, we have $0=\nabla r_{j}(x^{*}_{j})^{T}d_{j}=-d_{j0}+\frac{\bar{d}_{j}^{T}\bar{x}^{*}_{j}}{\|\bar{x}^{*}_{j}\|}$, which in turn yields $d_{j0}x^{*}_{j0}=\bar{d}_{j}^{T}\bar{x}^{*}_{j}$. Finally, using these relationships together with (55) gives

0=d¯j2x¯j2(d¯jTx¯j)2=d¯j2(xj0)2(dj0xj0)20=\|\bar{d}_{j}\|^{2}\|\bar{x}^{*}_{j}\|^{2}-(\bar{d}_{j}^{T}\bar{x}^{*}_{j})^{2}=\|\bar{d}_{j}\|^{2}\,(x^{*}_{j0})^{2}-(d_{j0}x^{*}_{j0})^{2}

and so dj02=d¯j2d_{j0}^{2}=\|\bar{d}_{j}\|^{2}. All of these facts imply that for any β\beta\in\mathbb{R},

x¯j+βd¯j2(xj0+βdj0)2\displaystyle\|\bar{x}^{*}_{j}+\beta\bar{d}_{j}\|^{2}-(x^{*}_{j0}+\beta d_{j0})^{2}
=\displaystyle= x¯j2+2βd¯jTx¯j+β2d¯j2((xj0)2+2βdj0xj0+β2dj02)=0,\displaystyle\|\bar{x}^{*}_{j}\|^{2}+2\beta\bar{d}_{j}^{T}\bar{x}^{*}_{j}+\beta^{2}\|\bar{d}_{j}\|^{2}-\left((x^{*}_{j0})^{2}+2\beta d_{j0}x^{*}_{j0}+\beta^{2}d_{j0}^{2}\right)=0,

which implies rj(xj+βdj)=0r_{j}(x^{*}_{j}+\beta d_{j})=0 and therefore xj+βdj𝒦jx^{*}_{j}+\beta d_{j}\in\mathcal{K}_{j}.

Further, because dd lies in the null space of the active constraints, we have, for any β\beta\in\mathbb{R}, that xj+βdj=0𝒦jx^{*}_{j}+\beta d_{j}=0\in\mathcal{K}_{j} for all j(x)j\in\mathcal{E}(x^{*}) and A𝒜(x+βd)=b𝒜A_{\mathcal{A}}(x^{*}+\beta d)=b_{\mathcal{A}}. Finally, since rj(xj)<0r_{j}(x^{*}_{j})<0 and hence xjint(𝒦j)x^{*}_{j}\in\operatorname{int}(\mathcal{K}_{j}) for all j𝒥((x))j\in\mathcal{J}\setminus(\mathcal{E}(x^{*})\cup\mathcal{B}), and since xjx^{*}_{j} is strictly feasible for all non-binding constraints in (1b), there exists β>0\beta>0 so that x+βdx^{*}+\beta d satisfies all constraints in (1).

Theorem 4.21.

Suppose that cH>Hc_{H}>\|H^{*}\|. Then the primal-dual iterates (xk+1,λ^k,z^k)(x^{k+1},\hat{\lambda}^{k},\hat{z}^{k}) converge locally to (x,λ,z)(x^{*},\lambda^{*},z^{*}) at a quadratic rate.

Proof 4.22.

We already established in Theorem 4.11 that the iterates converge to the optimal solution, and since $H^k\to H^*$ and $c_H>\|H^*\|$, the Hessian is not rescaled according to (34) in Step 4. Using Theorem 4.16 we know that, once $k$ is sufficiently large, the step $d^{S,k}$ computed in Step 6 of Algorithm 3 is identical to the SQP step from (5) for (4); we can ignore (27d) here because $x^*_{j0}>0$ and $d^{S,k}_{j0}\to 0$, and therefore this constraint is not active for large $k$. This also implies that the condition in Step 7 is never true and thus $\hat{\mathcal{E}}^k=\mathcal{E}(x^*)$. If the decrease condition in Step 12 is not satisfied, a similar argument shows that $s^k$ computed in Step 16 is the second-order correction step from (12) for (4). Due to Lemma 4.19 we can now apply Theorem 2.3 to conclude that either $d^{S,k}$ or $d^{S,k}+s^k$ is accepted to define the next iterate for large $k$ and that the iterates converge at a quadratic rate.

5 Numerical Experiments

In this section, we examine the performance of Algorithm 3. First, using randomly generated instances, we consider three types of starting points: (i) uninformative default starting point (cold start), (ii) solution of a perturbed instance, (iii) solution computed by an interior-point SOCP solver whose accuracy we wish to improve. Then we briefly report results using the test library CBLIB. The numerical experiments were performed on an Ubuntu 22.04 Linux server with a 2.1GHz Xeon Gold 5128 R CPU and 256GB of RAM.

5.1 Implementation

We implemented Algorithm 3 in MATLAB R2021b, with parameters cdec=106c_{\text{dec}}=10^{-6}, cinc=2c_{\text{inc}}=2, cH=1012c_{H}=10^{12}, and ρ1=50\rho^{-1}=50. In each iteration, we identify (xk)={j𝒥:xjk<106}\mathcal{E}(x^{k})=\{j\in\mathcal{J}:\|x^{k}_{j}\|_{\infty}<10^{-6}\} and 𝒟(xk)={j𝒥(xk):x¯jk>108}\mathcal{D}(x^{k})=\{j\in\mathcal{J}\setminus\mathcal{E}(x^{k}):\|\bar{x}^{k}_{j}\|>10^{-8}\}. The set 𝒴j0\mathcal{Y}_{j}^{0} is initialized to 𝒴^j0\hat{\mathcal{Y}}_{j}^{0} (see (20)), and λ0\lambda^{0} is a given starting value for λ\lambda, if provided, and zero otherwise. In addition, since the identification of the optimal extremal-active set (x)\mathcal{E}(x^{*}) requires zj𝒞j(𝒴j)z^{*}_{j}\in\mathcal{C}_{j}^{\circ}(\mathcal{Y}_{j}), we add zˇj0-\check{z}^{0}_{j} to 𝒴j0\mathcal{Y}^{0}_{j}, where zˇ0=c+ATλ0\check{z}^{0}=c+A^{T}\lambda^{0}.
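A sketch of this classification step in Python (our own variable names, not the MATLAB code; x_blocks is the list of cone subvectors $x^k_j$) reads:

import numpy as np

def classify_cones(x_blocks, tol_E=1e-6, tol_D=1e-8):
    # sketch, not the paper's code: E(x) are extremal-active cones,
    # D(x) are cones with nonzero "bar" part, using the tolerances above
    E, D = [], []
    for j, xj in enumerate(x_blocks):
        if np.max(np.abs(xj)) < tol_E:            # ||x_j||_inf < 1e-6
            E.append(j)
        elif np.linalg.norm(xj[1:]) > tol_D:      # ||x_bar_j|| > 1e-8
            D.append(j)
    return E, D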

The algorithm terminates when the violation of the SOCP optimality conditions (13) for the current iterate satisfies

(56) $E(x^{k},\lambda^{k},\check{z}^{k})=\max\left\{\|[Ax^{k}-b]^{+}\|_{\infty},\;\|(Ax^{k}-b)\circ\lambda^{k}\|_{\infty},\;\|[-\lambda^{k}]^{+}\|_{\infty},\;\max_{j\in\mathcal{J}}\left\{[r_{j}(x^{k}_{j})]^{+},\,[r_{j}(\check{z}^{k}_{j})]^{+},\,|(x_{j}^{k})^{T}\check{z}^{k}_{j}|\right\}\right\}\leq\epsilon_{\text{tol}}$

with zˇk=c+ATλk\check{z}^{k}=c+A^{T}\lambda^{k}, for some ϵtol>0\epsilon_{\text{tol}}>0.
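A direct transcription of the error measure (56) is shown below (a Python sketch with our own argument names; x_blocks and z_blocks are the cone subvectors of $x^k$ and $\check{z}^k=c+A^T\lambda^k$):

import numpy as np

def cone_residual(v):
    # r_j(v) = ||v_bar|| - v_0
    return np.linalg.norm(v[1:]) - v[0]

def kkt_error(A, b, x, lam, x_blocks, z_blocks):
    # sketch of the optimality error E(x, lambda, z_check) from (56)
    res = A @ x - b
    err = max(np.max(np.maximum(res, 0.0)),       # ||[Ax-b]^+||_inf
              np.max(np.abs(res * lam)),          # ||(Ax-b) o lambda||_inf
              np.max(np.maximum(-lam, 0.0)))      # ||[-lambda]^+||_inf
    for xj, zj in zip(x_blocks, z_blocks):
        err = max(err,
                  max(cone_residual(xj), 0.0),    # [r_j(x_j)]^+
                  max(cone_residual(zj), 0.0),    # [r_j(z_check_j)]^+
                  abs(xj @ zj))                   # |x_j^T z_check_j|
    return err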

As in [29], the sufficient descent condition (11) is slightly relaxed to

φ(x^k+1;ρk)φ(xk;ρk)10ϵmach|φ(xk;ρk)|cdec(mk(xk+d;ρk)mk(xk;ρk))\varphi(\hat{x}^{k+1};\rho^{k})-\varphi(x^{k};\rho^{k})-10\epsilon_{\text{mach}}|\varphi(x^{k};\rho^{k})|\leq c_{\text{dec}}\left(m^{k}(x^{k}+d;\rho^{k})-m^{k}(x^{k};\rho^{k})\right)

to account for cancellation error, where ϵmach\epsilon_{\text{mach}} is the machine precision. Finally, to avoid accumulating very similar hyperplanes that would lead to degenerate QPs, we do not add a new generating point vjv_{j} to 𝒴jk\mathcal{Y}_{j}^{k} if there already exists yj𝒴jky_{j}\in\mathcal{Y}_{j}^{k} such that v¯jv¯jy¯jy¯j1010.\left\|\frac{\bar{v}_{j}}{\|\bar{v}_{j}\|}-\frac{\bar{y}_{j}}{\|\bar{y}_{j}\|}\right\|_{\infty}\leq 10^{-10}.
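The two safeguards just described are simple numerical tests; in Python they might look as follows (our own sketch; phi and m denote precomputed values of the penalty function and its model):

import numpy as np

def accept_step(phi_new, phi_old, m_new, m_old, c_dec=1e-6, eps_mach=np.finfo(float).eps):
    # sketch of the relaxed sufficient-decrease test replacing (11)
    return phi_new - phi_old - 10.0 * eps_mach * abs(phi_old) <= c_dec * (m_new - m_old)

def is_near_duplicate(v, Y, tol=1e-10):
    # sketch: skip a new generating point v if it is (numerically) parallel
    # to some y already in Y, in the sense of the test above
    v_dir = v[1:] / np.linalg.norm(v[1:])
    return any(np.max(np.abs(v_dir - y[1:] / np.linalg.norm(y[1:]))) <= tol for y in Y)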

In these experiments, we disabled the second-order correction step (Steps 16–20) because we noticed that it was never accepted in practice. In a more sophisticated implementation, one would include a heuristic that attempts to detect the Maratos effect and then triggers the second-order correction step in specific situations.

The QPs were solved using ILOG CPLEX V12.10, with optimality and feasibility tolerances set to 10910^{-9} and “dependency checker” and “numerical precision emphasis” enabled, using the primal simplex method. When CPLEX did not report a solution status “optimal” and the QP KKT error was above 10910^{-9}, a small perturbation was added to the Hessian matrix, i.e., we replaced HkH^{k} by Hk+107IH^{k}+10^{-7}\cdot I. This helped in some cases in which CPLEX (incorrectly) reported that HkH^{k} was not positive semi-definite. If CPLEX still did not find a QP solution with KKT error less than 10910^{-9}, we attempted to resolve the QP with the barrier method, the dual simplex method, and the primal simplex method again, until one was able to compute a solution. If all solvers failed for QP (27), the algorithm continued in Step 22. If no solver was able to solve (18), we terminated the main algorithm and declared a failure.

We emphasize that the purpose of our implementation is to assess whether the proposed algorithm exhibits behavior that validates the stated goals: Convergence from any starting point and rapid local convergence to highly accurate solutions. In its current implementation, it requires more computation time than highly sophisticated commercial solvers such as MOSEK or CPLEX, which were developed over decades and have highly specialized linear algebra routines that are tightly integrated into the algorithms. As we observed at the end of Section 3.7, many of the QPs in Algorithm 3 that are solved in succession are similar to each other, and savings in computation times should therefore be achievable. However, our prototype implementation based on the Matlab CPLEX interface does not allow us to utilize callback functions for adding or removing hyperplanes. Achieving these savings in computation time thus requires a more sophisticated implementation, a task that is outside of the scope of this paper. Consequently, we do not report solution times here.

5.2 Randomly generated QCQPs

The experiments were performed on randomly generated SOCP instances of varying sizes, specified by (n,m,K)(n,m,K). Here, n,m1n,m\geq 1 are the number of variables and linear constraints, respectively. K1K\geq 1 specifies the number of cones of each “activity type”: |(x)|=K|\mathcal{E}(x^{*})|=K, |{j𝒥:rj(xj)=0,xj0}|=K|\{j\in\mathcal{J}:r_{j}(x^{*}_{j})=0,x_{j}^{*}\neq 0\}|=K, and |{j𝒥:rj(xj)<0}|=K|\{j\in\mathcal{J}:r_{j}(x^{*}_{j})<0\}|=K, i.e., there are KK cones that are extremal-active, KK that are active at the boundary, and KK that are inactive at the optimal solution xx^{*}. The dimensions of the cones are randomly chosen. In addition, there are variables that are not part of any cone, with bounds chosen in a way so that the non-degeneracy assumption, Assumption 2, holds. A detailed description of the problem generation is stated in Appendix A in [18].

nn mm KK solved total SQP total total
iter iter QP (27) QP (18)
200 60 10 30 6.67 6.67 9.77 0.00
400 120 20 30 7.20 7.20 11.57 0.00
1000 300 50 30 7.23 7.23 12.17 0.00
200 60 4 30 7.53 7.07 11.83 0.90
400 120 8 30 8.27 7.77 14.20 1.00
1000 300 20 30 8.67 7.80 15.93 1.83
200 60 2 30 8.47 7.87 13.90 1.20
400 120 4 30 8.87 8.07 15.30 1.60
1000 300 10 30 9.47 8.43 17.27 1.97
Table 1: Results with x0=0x^{0}=0, ϵtol=107\epsilon_{\text{tol}}=10^{-7}, average per-size statistics taken over 30 random instances. “solved”: number of instances solved (out of 30); “total iter”: total number of iterations in Algorithm 3; “SQP iter”: number of iteration in which NLP-SQP step was accepted in Steps 12 or 17; “total QP (27)” / “total QP (18)”: Total number of QPs of that type solved.

Table 1 summarizes the performance of the algorithm with an uninformative starting point $x^0=0$. Each row lists average statistics for a given problem size ($n,m,K$), taken over 30 random instances. We see that the proposed algorithm is very reliable and solved every instance to the tolerance $\epsilon_{\text{tol}}=10^{-7}$. The average number of iterations is mostly between 7 and 9, and in most of these iterations the fast NLP-SQP step was accepted.

To give an idea of the computational effort, we report the number of times QPs (27) and (18) were solved, from which further conclusions can be drawn. Consider, for example, the last row. At the beginning of each iteration, QP (27) is solved to obtain the NLP-SQP step. The difference with the total number of iterations, i.e., 17.27-9.47=7.80, gives us the total number of times in which the guess $\hat{\mathcal{E}}^k$ of the extremal-active cones needed to be corrected in Steps 7–10. In other words, on average, the loop in Steps 7–10 is executed 7.80/9.47=0.82 times per iteration. Similarly, the last column tells us the total number of iterations of the loop in Steps 22–32. That loop was executed only when the NLP-SQP step was not accepted, i.e., in 9.47-8.43=1.04 iterations on average, taking 1.97/1.04=1.89 loop iterations each time.

The experiments are presented in three groups where the ratio between $n$ and $K$ is kept constant. As the number of cones, $K$, decreases from one group to the next, the average size of the individual cones increases by a factor of 2.5 and 2, respectively. This increase seems to result in slightly more iterations in which the SQP step was rejected, indicating that the simple linearization (27c) of the non-extremal-active cones sometimes becomes insufficiently accurate.

nn mm KK total SQP total total Mosek final
iter iter QP (27) QP (18) error error
200 60 10 1.10 1.07 1.10 0.07 2.33e-06 1.63e-10
400 120 20 1.03 1.00 1.03 0.03 2.67e-06 1.70e-10
1000 300 50 1.07 1.03 1.07 0.03 3.49e-06 1.76e-10
200 60 4 1.03 1.03 1.03 0.00 5.97e-06 1.69e-10
400 120 8 1.00 1.00 1.00 0.00 2.28e-06 1.87e-10
1000 300 20 1.03 0.83 1.03 0.27 5.20e-06 1.72e-10
200 60 2 1.00 1.00 1.00 0.00 2.02e-06 1.53e-10
400 120 4 1.13 1.10 1.13 0.03 4.85e-06 2.03e-10
1000 300 10 1.20 1.10 1.20 0.13 1.22e-05 2.41e-10
Table 2: Result with MOSEK solution as x0x^{0}, ϵtol=109\epsilon_{\text{tol}}=10^{-9}. All instances were solved. “Mosek error”: Optimality error EE (56) at Mosek solution; “final error”: Optimality error EE at final iterate of Algorithm 3.

The remaining experiments in this section investigate to which degree the algorithm is able to achieve our primary goal of taking advantage of a good starting point. We begin with an extreme situation, in which we first solve an instance with the interior-point SOCP solver MOSEK V9.1.9 (called via CVX), using the setting cvx_precision=high corresponding to the MOSEK tolerance ϵ=ϵmach2/3\epsilon=\epsilon_{\rm mach}^{2/3}, and give the resulting primal-dual solution as starting point to Algorithm 3. Choosing any tighter MOSEK tolerances leads to failures in several problems. Table 2 summarizes the results. In all cases, the algorithm converges rapidly to an improved solution, reducing the error by 4 orders of magnitude, most of the time with only a single second-order iteration. The Mosek error was dominated by the violation of complementarity. This demonstrates the ability of the proposed method to improve the accuracy of a solution computed by an interior-point method.

nn mm KK solved total SQP total total
iter iter QP (27) QP (18)
200 60 10 30 1.00 0.97 1.00 0.07
400 120 20 30 1.00 0.97 1.00 0.03
1000 300 50 30 1.00 0.97 1.00 0.07
200 60 4 30 1.00 1.00 1.00 0.00
400 120 8 30 1.00 0.93 1.00 0.07
1000 300 20 30 1.00 0.87 1.00 0.20
200 60 2 30 1.00 1.00 1.00 0.00
400 120 4 30 1.00 0.97 1.00 0.03
1000 300 10 30 1.07 1.00 1.03 0.07
Table 3: Result with 10310^{-3} perturbation, ϵtol=107\epsilon_{\text{tol}}=10^{-7}.
nn mm KK solved total SQP total total
iter iter QP (27) QP (18)
200 60 10 30 1.20 1.07 1.00 0.19
400 120 20 30 1.33 1.17 1.00 0.73
1000 300 50 30 1.60 1.23 1.02 1.29
200 60 4 30 1.27 1.13 1.02 0.71
400 120 8 30 1.67 1.27 1.16 0.48
1000 300 20 30 2.10 1.40 1.27 0.57
200 60 2 30 1.67 1.33 1.24 0.42
400 120 4 30 2.30 1.87 1.30 0.38
1000 300 10 30 3.67 2.53 1.53 0.59
Table 4: Result with 10110^{-1} perturbation, ϵtol=107\epsilon_{\text{tol}}=10^{-7}.

For the final experiments, summarized in Tables 3 and 4, the starting point is the MOSEK solution of a perturbed problem, in which 10% of the objective coefficients cc were perturbed by uniformly distributed random noise of the order of 10310^{-3} and 10110^{-1}, respectively. For the small perturbation, similar to Table 2, Algorithm 3 terminated in one iteration most of the time. More iterations were required for the larger perturbation, but still significantly fewer compared to the uninformative starting point, see Table 1.

5.3 CBLIB instances

Problem # var # con # soc solved/ total SQP iter warm/
subset total iter iter iter cold
10/20* 3024.0 6030.6 336.0 10/11 6.5 6.5 0.57
achtziger_* 2314.0 1856.7 257.0 6/7 21.3 10.2 0.53
as_* 4909.0 5695.7 669.0 20/20 10.4 8.4 0.37
ck_* 2376.0 2375.0 15.0 90/90 5.8 4.6 0.40
classical_* 159.3 200.4 1.0 409/409 7.3 7.3 0.49
clay* 457.2 691.9 82.9 11/12 67.6 13.2 1.31
estein* 103.1 165.2 14.0 9/9 18.3 4.4 0.30
flay* 175.0 408.0 4.0 10/10 7.0 3.6 0.34
fo[7-9]* 208.6 540.2 15.9 19/19 12.3 3.0 0.53
m[3-9]* 150.2 370.5 12.8 8/8 10.9 4.0 0.44
nb* 3098.8 321.0 816.0 4/4 4.8 4.2 3.43
netmod* 989.7 2593.0 4.7 3/3 4.0 4.0 0.33
no7_* 169.0 438.0 14.0 5/5 10.6 4.4 0.46
o[7-9]_* 183.2 466.9 14.7 9/9 8.7 3.2 0.66
pp-* 2960.0 2221.0 370.0 6/6 12.8 12.8 0.17
robust_* 253.9 298.3 2.0 420/420 6.8 6.8 0.67
sched_* 7361.5 3685.0 1.5 4/4 12.5 10.0 0.35
shortfall_* 249.9 294.3 2.0 420/420 7.8 7.8 0.37
slay* 393.0 936.0 14.0 14/14 8.6 7.0 0.27
sssd-* 286.0 314.5 18.0 16/16 7.6 5.6 0.42
stolpe07-* 1483.0 1202.0 164.7 3/3 29.0 18.3 0.71
tls* 521.4 989.8 51.6 5/6 7.6 7.0 0.49
turbine* 201.6 289.3 52.4 7/7 4.9 4.9 0.30
uflquad-* 8011.0 6811.0 1333.3 3/4 15.3 14.7 0.19
wiener_* 2297.5 2508.5 71.3 41/45 12.4 9.9 0.43
misc 1534.1 1374.5 158.9 11/14 10.5 7.6 0.33
Table 5: Results for CBLIB instances, averaged per problem group, ϵtol=105\epsilon_{\text{tol}}=10^{-5}. “Problem subset”: name of problem group; “# var”: number of variables; “# con”: number of linear constraints; “# soc”: number of second-order cone constraints; “solved/total”: number of solved vs. total instances; “total iter”: number of iterations in Algorithm 3; “SQP iter”: number of iterations in which NLP-SQP step was accepted; “iter warm/iter cold”: iterations for warm start divided by iterations for cold start (only for instances solved in both settings).

To demonstrate the robustness of the algorithm we also solved instances from the Conic Benchmark Library CBLIB [32]. Some instances involve rotated second-order cone constraints, and we reformulated them so that they fit into our standard form (1). We chose all 1,575 instances with at most 10,000 variables and 10,000 constraints. Integer variables were relaxed to be continuous.
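For reference, one standard way to express a rotated second-order cone constraint $2uv\geq\|w\|^2$ with $u,v\geq 0$ in the form (2) is the identity

$2uv\geq\|w\|^{2},\;u,v\geq 0\quad\Longleftrightarrow\quad\left\|\begin{pmatrix}u-v\\ \sqrt{2}\,w\end{pmatrix}\right\|\leq u+v,$

so that the linear change of variables $y_0=u+v$, $\bar{y}=(u-v,\sqrt{2}\,w^T)^T$ yields a constraint $y\in\mathcal{K}$ of dimension $\dim(w)+2$. The paper does not spell out which variant of this transformation its preprocessing uses; this is one common choice.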

Table 5 shows the statistics for the starting point x0=0x^{0}=0, where instances were grouped into sets of problems with similar names, and the remaining ones are collected in misc. The method was able to solve 99.2% of the instances, where 10 problems could not be solved due to failures of the QP subproblem solver, and Algorithm 3 exceeded the maximum number of 200 iterations in 2 cases. In comparison, MOSEK, with default settings, failed on 5 instances (those were solved correctly with Algorithm 3), incorrectly declared 3 instances to be infeasible, and labeled 6 instances to be unbounded (of which 3 were solved by Algorithm 3). We observed that some instances, especially those in the clay*, fo[7-9]*, m[3-9]*, no7*, o[7-9]_* subsets, are degenerate, having an optimal objective function value of 0, and the assumption necessary to prove fast local convergence is violated. This matches the observations in the table, where the SQP step was accepted only in a relatively small fraction of the iterations.

To showcase the warm-starting feature of the algorithm, we took the 1,563 previously solved instances, perturbed 10% of the entries of the final primal-dual iterate by random noise chosen uniformly from [-0.1,0.1], and used the result as the starting point for a warm-started run. Here, the QP subproblem solver failed in 3 cases and 2 instances exceeded the iteration limit. As the last column of Table 5 shows, warm starting reduced the number of iterations for most problem groups, often by more than 50%.
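
As an illustrative sketch only (this is not the Matlab prototype used for the experiments, and the function and variable names below are placeholders), the construction of such a perturbed warm-start point from a stacked primal-dual vector can be written as:

```python
import numpy as np

def perturb_iterate(w, fraction=0.1, radius=0.1, seed=0):
    """Perturb a random `fraction` of the entries of the stacked primal-dual
    iterate w by additive uniform noise in [-radius, radius] (sketch only)."""
    rng = np.random.default_rng(seed)
    w_pert = np.asarray(w, dtype=float).copy()
    k = max(1, int(round(fraction * w_pert.size)))
    idx = rng.choice(w_pert.size, size=k, replace=False)
    w_pert[idx] += rng.uniform(-radius, radius, size=k)
    return w_pert

# w0 = np.concatenate([x_opt, lam_opt, z_opt])   # primal-dual solution of the first run
# warm_start = perturb_iterate(w0)               # starting point for the warm-started run
```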

6 Concluding remarks

We presented an SQP algorithm for solving SOCPs and proved that it converges from any starting point and achieves local quadratic convergence for non-degenerate SOCPs. Our numerical experiments indicate that the algorithm is reliable, converges quickly when a good starting point is available, and produces more accurate solutions than a state-of-the-art interior-point solver.

Future research will investigate whether the proposed algorithm is a valuable alternative to interior-point methods for small problems or for the solution of sequences of related SOCPs. An efficient implementation of the algorithm beyond our Matlab prototype would be tightly coupled with a tailored active-set QP solver that efficiently adds or removes cuts instead of solving each QP essentially from scratch. Parametric active-set solvers such as qpOASES [8] or QORE [26] might be suitable options since they do not require primal or dual feasible starting points.

Acknowledgments

We thank Javier Peña for his suggestion for the proof of Lemma 4.8, as well as three referees whose comments helped us to improve the paper.

References

  • [1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Mathematical Programming, 95(1):3–51, 2003.
  • [2] A. Ben-Tal and A. Nemirovski. On polyhedral approximations of the second-order cone. Mathematics of Operations Research, 26(2):193–205, 2001.
  • [3] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, 4:1–51, 1995.
  • [4] C. Coey, M. Lubin, and J. P. Vielma. Outer approximation with conic certificates for mixed-integer convex problems. Mathematical Programming Computation, 12(2):249–293, 2020.
  • [5] A. R. Conn, N. I. Gould, and P. L. Toint. Trust region methods. SIAM, 2000.
  • [6] M. Diehl, F. Jarre, and C. H. Vogelbusch. Loss of superlinear convergence for an SQP-type method with conic constraints. SIAM Journal on Optimization, 16(4):1201–1210, 2006.
  • [7] S. Drewes and S. Ulbrich. Subgradient based outer approximation for mixed integer second order cone programming. In Mixed integer nonlinear programming, pages 41–59. Springer, 2012.
  • [8] H. J. Ferreau, C. Kirches, A. Potschka, H. G. Bock, and M. Diehl. qpOASES: A parametric active-set algorithm for quadratic programming. Mathematical Programming Computation, 6:327–363, 2014.
  • [9] R. Fletcher. Second order corrections for non-differentiable optimization. In Numerical analysis, pages 85–114. Springer, 1982.
  • [10] N. Goldberg and S. Leyffer. An active-set method for second-order conic-constrained quadratic programming. SIAM Journal on Optimization, 25(3):1455–1477, 2015.
  • [11] N. Goswami, S. K. Mondal, and S. Paruya. A comparative study of dual active-set and primal-dual interior-point method. IFAC Proceedings Volumes, 45(15), 2012.
  • [12] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2022.
  • [13] S. Hayashi, T. Okuno, and Y. Ito. Simplex-type algorithm for second-order cone programmes via semi-infinite programming reformulation. Optimization Methods and Software, 31(6):1272–1297, 2016.
  • [14] L. Hei, J. Nocedal, and R. A. Waltz. A numerical study of active-set and interior-point methods for bound constrained optimization. In Modeling, Simulation and Optimization of Complex Processes: Proceedings of the Third International Conference on High Performance Scientific Computing, March 6–10, 2006, Hanoi, Vietnam, pages 273–292. Springer, 2008.
  • [15] IBM ILOG. User’s Manual for CPLEX, 2019.
  • [16] H. Kato and M. Fukushima. An SQP-type algorithm for nonlinear second-order cone programs. Optimization Letters, 1(2):129–144, 2007.
  • [17] M. S. K. Lau, S. P. Yue, K. V. Ling, and J. M. Maciejowski. A comparison of interior point and active set methods for fpga implementation of model predictive control. In 2009 European Control Conference (ECC), pages 156–161, 2009.
  • [18] X. Luo and A. Wächter. A Quadratically Convergent Sequential Programming Method for Second-Order Cone Programs Capable of Warm Starts, 2022. arXiv:2207.03081.
  • [19] N. Maratos. Exact penalty function algorithms for finite dimensional and control optimization problems. PhD thesis, Imperial College London (University of London), 1978.
  • [20] D. K. Molzahn and I. A. Hiskens. A survey of relaxations and approximations of the power flow equations. Foundations and Trends in Electric Energy Systems, 4(1-2):1–221, 2019.
  • [21] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 9.1., 2019.
  • [22] M. Nayakakuppam, M. Overton, and S. Schemita. SDPpack user’s guide-version 0.9 Beta, 1997.
  • [23] J. Nocedal and S. Wright. Numerical optimization. Springer, 2006.
  • [24] T. Okuno, K. Yasuda, and S. Hayashi. Sl1QP based algorithm with trust region technique for solving nonlinear second-order cone programming problems. Interdisciplinary Information Sciences, 21(2):97–107, 2015.
  • [25] F. A. Potra and S. J. Wright. Interior-point methods. Journal of Computational and Applied Mathematics, 124(1-2):281–302, 2000.
  • [26] L. Schork. A parametric active set method for general quadratic programming. Master’s thesis, Heidelberg University, Germany, 2015.
  • [27] P. Tseng. Second-order cone programming relaxation of sensor network localization. SIAM Journal on Optimization, 18(1):156–185, 2007.
  • [28] R. J. Vanderbei and H. Yurttan. Using LOQO to solve second-order cone programming problems. Technical report, Princeton University, 1998.
  • [29] A. Wächter and L. T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.
  • [30] V. Zhadan. The variant of primal simplex-type method for linear second-order cone programming. In Optimization and Applications: 12th International Conference, OPTIMA 2021, Petrovac, Montenegro, September 27–October 1, 2021, Proceedings, pages 64–75. Springer, 2021.
  • [31] X. Zhang, Z. Liu, and S. Liu. A trust region SQP-filter method for nonlinear second-order cone programming. Computers & Mathematics with Applications, 63(12):1569–1576, 2012.
  • [32] Zuse Institute Berlin. CBLIB - The Conic Benchmark Library. https://cblib.zib.de/.

Appendix A Generation of random instances

The test instances used for the numerical experiments have the form

minxn\displaystyle{\displaystyle\min_{x\in\mathbb{R}^{n}}} j=1pcjTxj+cp+1Txp+1\displaystyle\sum_{j=1}^{p}c_{j}^{T}x_{j}+c_{p+1}^{T}x_{p+1}
s.t. j=1pAjxj+Ap+1xp+1=b,\displaystyle\sum_{j=1}^{p}A_{j}x_{j}+A_{p+1}x_{p+1}=b,
xj𝒦j,j[p],\displaystyle x_{j}\in\mathcal{K}_{j},\quad j\in[p],
xj01000,0xp+11000,\displaystyle x_{j0}\leq 1000,\quad 0\leq x_{p+1}\leq 1000,

where (c_{1},\ldots,c_{p},c_{p+1})\in\mathbb{R}^{n_{1}}\times\cdots\times\mathbb{R}^{n_{p}}\times\mathbb{R}^{n_{p+1}} is the partition of the objective vector c, and (A_{1},\ldots,A_{p+1}) is the corresponding partition of the columns of the constraint matrix A\in\mathbb{R}^{m\times n}. The subvector x_{p+1} collects all optimization variables that do not belong to any of the cones.

In contrast to the original SOCP (1), this formulation includes linear equality constraints, but it is straightforward to generalize Algorithm 3 and its convergence proofs to this setting. The formulation also imposes large upper bounds on all variables so that the set X defined in (9) is bounded; these bounds were chosen so large that they are not active at the optimal solution.

Algorithm 4 Random Instance Generation
1: nn (number of variables), mm (number of linear constraints), pp (total number of cones), K0K_{0} (number of extremal-active cones), KIK_{I} (number of inactive cones), KBK_{B} (number of cones active at the boundary, excluding extremal-active cones), dd (density of nonzeros in constraint matrix). Conditions: n,m,p,K0,KI,KB+n,m,p,K_{0},K_{I},K_{B}\in\mathbb{Z}_{+}, d(0,1]d\in(0,1], p=K0+KI+KBp=K_{0}+K_{I}+K_{B}, n>2pn>2p, n>m+KB+(m+KB)/2n>m+K_{B}+\lfloor(m+K_{B})/2\rfloor.
2:Randomly choose positive integers n1,,nK0n_{1},\ldots,n_{K_{0}} so that nj2n_{j}\geq 2 and j=1K0nj=(m+KB)/2\sum_{j=1}^{K_{0}}n_{j}=\lfloor(m+K_{B})/2\rfloor.
3:Randomly choose positive integers nK0+1,,npn_{K_{0}+1},\ldots,n_{p} so that nj2n_{j}\geq 2 and j=K0+1pnj=m+KB\sum_{j=K_{0}+1}^{p}n_{j}=m+K_{B}.
4:for j=1,,K0j=1,\ldots,K_{0} do \triangleright Generate xj=0x_{j}^{*}=0, zjint(𝒦j)z_{j}^{*}\in\operatorname{int}(\mathcal{K}_{j})
5:     Set xj0x_{j}^{*}\leftarrow 0.
6:     Sample \bar{z}_{j}\sim(-10,10)^{n_{j}-1}, z_{j0}^{*}\sim(1,5), and \epsilon_{j}\sim(0,1). Set \bar{z}_{j}^{*}\leftarrow\frac{z^{*}_{j0}}{(2+\epsilon_{j})\|\bar{z}_{j}\|}\bar{z}_{j}, so that \|\bar{z}_{j}^{*}\|<z_{j0}^{*}.
7:end for
8:for j=K0+1,,K0+KIj=K_{0}+1,\ldots,K_{0}+K_{I} do \triangleright Generate zj=0z_{j}^{*}=0, xjint(𝒦j)x_{j}^{*}\in\operatorname{int}(\mathcal{K}_{j})
9:     Set zj0z_{j}^{*}\leftarrow 0.
10:     Sample \bar{x}_{j}\sim(-10,10)^{n_{j}-1}, x_{j0}^{*}\sim(1,5), and \epsilon_{j}\sim(0,1). Set \bar{x}_{j}^{*}\leftarrow\frac{x^{*}_{j0}}{(2+\epsilon_{j})\|\bar{x}_{j}\|}\bar{x}_{j}, so that \|\bar{x}_{j}^{*}\|<x_{j0}^{*}.
11:end for
12:for j=K0+KI+1,,pj=K_{0}+K_{I}+1,\ldots,p do\triangleright Generate zj,xjbd(𝒦j)z_{j}^{*},x_{j}^{*}\in\operatorname{bd}(\mathcal{K}_{j})
13:     Sample x¯j(10,10)nj1\bar{x}_{j}\sim(-10,10)^{n_{j}-1} and xj0(1,5)x_{j0}^{*}\sim(1,5). Set x¯jxj0x¯jx¯j\bar{x}_{j}^{*}\leftarrow\frac{x^{*}_{j0}}{\|\bar{x}_{j}\|}\bar{x}_{j}.
14:     Sample \beta\sim(1,5). Set \bar{z}^{*}_{j}\leftarrow-\beta\bar{x}_{j}^{*} and z^{*}_{j0}\leftarrow\beta x_{j0}^{*}.
15:end for
16:Set nfixnmKBj=1K0njn_{\rm fix}\leftarrow n-m-K_{B}-\sum_{j=1}^{K_{0}}n_{j} and nfreennfixj=1pnjn_{\rm free}\leftarrow n-n_{\rm fix}-\sum_{j=1}^{p}n_{j}.
17:for j=1,,nfixj=1,\ldots,n_{\rm fix} do
18:     Set [x^{*}_{p+1}]_{j}\leftarrow 0 and sample [z^{*}_{p+1}]_{j}\sim(1,5).
19:end for
20:for j=n_{\rm fix}+1,\ldots,n_{\rm fix}+n_{\rm free} do
21:     Set [z^{*}_{p+1}]_{j}\leftarrow 0 and sample [x^{*}_{p+1}]_{j}\sim(1,5).
22:end for
23:Sample linearly independent rows A_{i}\sim(-5,5)^{n} with density d, for i=1,\ldots,m.
24:Call pcond to calculate primal condition number cpc_{p} and call dcond to calculate dual condition number cdc_{d}.
25:if cp>105c_{p}>10^{5} or cd>105c_{d}>10^{5} then go to Step 3 and redo the process.
26:Sample λ(1,10)m\lambda^{*}\sim(1,10)^{m}. Set bAxb\leftarrow Ax^{*} and cATλ+zc\leftarrow-A^{T}\lambda^{*}+z^{*}.
27:return A,b,c,x,λ,zA,b,c,x^{*},\lambda^{*},z^{*}.

Algorithm 4 describes how the data for these instances are generated. The algorithm constructs an optimal primal-dual solution such that, at this solution, K_{0} cones are extremal-active, for K_{I} cones the solution lies in the interior of the cone, and for K_{B} cones the solution lies on the boundary of the cone but not at its extreme point. In our experiments, each type occurs equally often, K=K_{0}=K_{I}=K_{B}. The number of variables fixed at their lower bound, n_{\rm fix}, is chosen so that the optimal solution is non-degenerate: the linear constraints remove m degrees of freedom, each extremal-active cone removes n_{j}, and each otherwise active cone removes 1, so Step 16 determines n_{\rm fix} such that the total number of degrees of freedom fixed by active constraints equals n, i.e., m+\sum_{j=1}^{K_{0}}n_{j}+K_{B}+n_{\rm fix}=n. Lastly, in Step 24, we use the pcond and dcond functions provided by SDPpack [22] to double-check that a generated instance is also numerically non-degenerate. These functions return the primal and dual condition numbers, c_{p} and c_{d}, respectively, and we discard an instance if either number exceeds a threshold. Only about 1% of the generated instances were excluded in this manner.
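
For illustration only (this is not the Matlab implementation used in the experiments; the function name, variable names, and random-number generator are our own choices), the following Python sketch mimics Steps 12-15 of Algorithm 4 and produces a pair of vectors on the boundary of a second-order cone that satisfies complementarity:

```python
import numpy as np

def boundary_pair(nj, rng):
    """Generate x, z in R^nj on the boundary of the second-order cone with
    x^T z = 0, following the pattern of Steps 12-15 of Algorithm 4 (sketch).
    Vectors are ordered as (y0, ybar)."""
    xbar = rng.uniform(-10.0, 10.0, nj - 1)      # random direction for the bar part
    x0 = rng.uniform(1.0, 5.0)                   # first cone component
    xbar = x0 * xbar / np.linalg.norm(xbar)      # rescale so ||xbar|| = x0 (boundary)
    beta = rng.uniform(1.0, 5.0)
    zbar = -beta * xbar                          # bar part points in the opposite direction
    z0 = beta * x0                               # ||zbar|| = z0, so z is on the boundary too
    return np.concatenate(([x0], xbar)), np.concatenate(([z0], zbar))

rng = np.random.default_rng(0)
x, z = boundary_pair(4, rng)
# x^T z = beta*x0^2 - beta*||xbar||^2 = 0 up to rounding
print(np.isclose(x @ z, 0.0), np.isclose(np.linalg.norm(x[1:]), x[0]))
```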