A Quadratically Convergent Sequential Programming Method for Second-Order Cone Programs Capable of Warm Starts
Abstract
We propose a new method for linear second-order cone programs. It is based on the sequential quadratic programming framework for nonlinear programming. In contrast to interior point methods, it can capitalize on the warm-start capabilities of active-set quadratic programming subproblem solvers and achieve a local quadratic rate of convergence.
In order to overcome the non-differentiability or singularity observed in nonlinear formulations of the conic constraints, the subproblems approximate the cones with polyhedral outer approximations that are refined throughout the iterations. For nondegenerate instances, the algorithm implicitly identifies the set of cones for which the optimal solution lies at the extreme points. As a consequence, the final steps are identical to regular sequential quadratic programming steps for a differentiable nonlinear optimization problem, yielding local quadratic convergence.
We prove the global and local convergence guarantees of the method and present numerical experiments that confirm that the method can take advantage of good starting points and can achieve higher accuracy compared to a state-of-the-art interior point solver.
keywords:
nonlinear optimization, second-order cone programming, sequential quadratic programming90C15, 90C30, 90C55
1 Introduction
We are interested in the solution of second-order cone programs (SOCPs) of the form
(1a) | ||||
(1b) | s.t. | |||
(1c) |
where , , , and is a subvector of of dimension with index set . We assume that the sets are disjoint. The set is the second-order cone of dimension , i.e.,
(2) |
where the vector is partitioned into with . These problems arise in a number of important applications [1, 2, 20, 27]
Currently, most of the commercial software for solving SOCPs implements interior-point algorithms which utilize a barrier function for second-order cones, see, e.g. [12, 15, 21]. Interior-point methods have well-established global and local convergence guarantees [25] and are able to solve large-scale instances, but they cannot take as much of an advantage of a good estimate of the optimal solution as it would be desirable in many situations. For example, in certain applications, such as online optimal control, the same optimization problem has to be solved over and over again, with slightly modified data. In such a case, the optimal solution of one problem provides a good approximation for the new instance. Having a solver that is capable of “warm-starts”, i.e., utilizing this knowledge, can be essential when many similar problems have to be solved in a small amount of time.
For some problem classes, including linear programs (LPs), quadratic programs (QPs), or nonlinear programming (NLP), active-set methods offer suitable alternatives to interior-point methods. They explicitly identify the set of constraints that are active (binding) at the optimal solution. When these methods are started from a guess of the active set that is close to the optimal one, they often converge rapidly in a small number of iterations. An example of this is the simplex method for LPs. Its warm-start capabilities are indispensable for efficient branch-and-bound algorithms for mixed-integer linear programs.
Active-set methods for LPs, QPs, or NLPs are also known to outperform interior-point algorithms for problems that are not too large [11, 14, 17]. Similarly, active-set methods might be preferable when there are a large number of inequality constraints among which only a few are active, since an interior-point method is designed to consider all inequality constraints in every iteration and consequently solves large linear systems, whereas an active set method can ignore all inactive inequality constraints and encounters potentially much smaller linear systems.
Our goal is to propose an active-set alternative to the interior-point method in the context of SOCP that might provide similar benefits. We introduce a new sequential quadratic programming (SQP) algorithm that, in contrast to interior-point algorithms for SOCPs, has favorable warm-starting capabilities because it can utilize active-set QP solvers. We prove that it is globally convergent, i.e., all limit points of the generated iterates are optimal solutions under mild assumptions, and that it enjoys a quadratic convergence rate for non-degenerate instances. Our preliminary numerical experiments demonstrate that these theoretical properties are indeed observed in practice. They also show that the algorithm is able in some cases to compute a solution to a higher degree of precision than interior point methods. This is expected, again in analogy to the context of LPs, QPs, and NLPs, since an interior point method terminates at a small, but nonzero value of the barrier parameter that cannot be made smaller than some threshold (typically or ) because the arising linear systems become highly ill-conditioned. In contrast, in the final iteration of the active-set method, the linear systems solved correspond directly to the optimality conditions, without any perturbation introduced by a barrier parameter, and are only as degenerate as the optimal solution of the problem.
The paper is structured as follows. Section 2 reviews the sequential quadratic programming method and the optimality conditions of SOCPs. Section 3 describes the algorithm, which is based on an outer approximation of the conic constraints. Section 4 establishes the global and local convergence properties of the method, and numerical experiments are reported in Section 5. Concluding remarks are offered in Section 6.
1.1 Related work
While a large number of interior-point algorithms for SOCP have been proposed, including some that have been implemented in efficient optimization packages [12, 15, 21], there are only very few approaches for solving SOCPs with an active-set framework. The method proposed by Goldberg and Leyffer [10] is a two-phase algorithm that combines a projected-gradient method with equality-constrained SQP. However, it is limited to instances that have only conic constraints (1c) and no additional linear constraints (1b). Hayashi et al. [13] propose a simplex-type method, where they reformulate the SOCP as a linear semi-infinite program to handle the fact that these instances have infinitely many extreme points. The resulting dual-simplex exchange method shows promising practical behavior. However, in contrast to the method proposed here, the authors conjecture that their method has only an R-linear local convergence rate. Zhadan [30] proposes a similar simplex-type method. Another advantage of the method presented in this paper is that the pivoting algorithm does not need to be designed and implemented from scratch. Instead, it can leverage existing implementations of active-set QP solvers, in particular the efficient handling of linear systems.
The proposed algorithm relies on polyhedral outer approximations based on well-known cutting planes for SOCPs. For instance, the methods for mixed-integer SOCP by Drewes and Ulbrich [7] and Coey et al. [4] use these cutting planes to build LP relaxations of the branch-and-bound subproblems. We note that an LP-based cutting plane algorithm for SOCP could be seen as an active-set method, but it is only linearly convergent. As pointed out in [6], it is crucial to consider the curvature of the conic constraint in the subproblem objective to achieve fast convergence.
The term “SQP method for SOCP” has also been used in the literature to refer to methods for solving nonlinear SOCPs [6, 16, 24, 31]. However, in contrast to the method here, in these approaches, the subproblems themselves are SOCPs (1) and include the linearization of the nonlinear objective and constraints. It will be interesting to explore extensions of the proposed method to nonlinear SOCPs in which feasibility is achieved asymptotically not only for the nonlinear constraints but also for the conic constraints.
1.2 Notation
For two vectors , we denote with their component-wise product, and the condition stands for . For , we define as the vector with entries . We denote by , , the Euclidean norm, the -norm, and the -norm, respectively. For a cone , is the canonical basis vector with in the element corresponding to for , and and denote the cone’s interior and boundary, respectively.
2 Preliminaries
The NLP reformulation of the SOCP is introduced in Section 2.1. We review in Section 2.2 the local convergence properties of the SQP method and in Section 2.3 the penalty function as a means to promote convergence from any starting point. In Section 2.4, we briefly state the optimality conditions and our assumption for the SOCP (1).
2.1 Reformulation as a smooth optimization problem
The definition of the second-order cone in (2) suggests that the conic constraint (1c) can be replaced by the nonlinear constraint
without changing the set of feasible points. Consequently, (1) is equivalent to
(3a) | ||||
(3b) | s.t. | |||
(3c) |
Unfortunately, (3) cannot be solved directly with standard gradient-based algorithms for nonlinear optimization, such as SQP methods. The reason is that is not differentiable whenever . This is particularly problematic when the optimal solution of the SOCP lies at the extreme point of a cone, . In that case, the Karush-Kuhn-Tucker (KKT) necessary optimality conditions for the NLP formulation, which are expressed in terms of derivatives, cannot be satisfied. Therefore, any optimization algorithm that seeks KKT points cannot succeed. As a remedy, differentiable approximations of have been proposed in the past; see, for example, [28]. However, high accuracy comes at the price of high curvature, which can make finding the numerical solution of the NLP difficult.
An alternative equivalent reformulation of the conic constraint is given by
In this case, the constraint function is differentiable. But if , its gradient vanishes, and as a consequence, no constraint qualification applies and the KKT conditions do not hold. Therefore, again, a gradient-based method cannot be employed. By using an outer approximation of the cones that is improved in the course of the algorithm, our proposed variation of the SQP method is able to avoid these kinds of degeneracy.
To facilitate the discussion we define a point-wise partition of the cones.
Definition 2.1.
Let .
-
1.
We call a cone extremal-active at , if , and we denote with the set of extremal-active cones at .
-
2.
We define the set as the set of all cones for which the function is differentiable at .
-
3.
We define the set as the set of all cones that are not extremal-active and for which is not differentiable .
If the set at an optimal solution were known in advance, we could compute as a solution of (1) by solving the NLP
(4a) | ||||
(4b) | s.t. | |||
(4c) | ||||
(4d) |
The constraints involving the linearization of are imposed only if is differentiable at , and variables in cones that are extremal-active at are explicitly fixed to zero. With this, locally around , all functions in (4) are differentiable and we could apply standard second-order algorithms to achieve fast local convergence.
2.2 Local convergence of SQP methods
The proposed algorithm is designed to guide the iterates into the neighborhood of an optimal solution . If the optimal solution is not degenerate and the iterates are sufficiently close to , the steps generated by the algorithm are eventually identical to the steps that the SQP method would take for solving the differentiable optimization problem (4). In this section, we review the mechanisms and convergence results of the basic SQP method [23].
At an iterate , the basic SQP method, applied to (4), computes a step as an optimal solution to the QP subproblem
(5a) | ||||
(5b) | s.t. | |||
(5c) | ||||
(5d) |
Here, is the Hessian of the Lagrangian function for (4), which in our case is
(6) |
where are estimates of the optimal multipliers for the nonlinear constraint (4c), and where is the block-diagonal matrix with
(7) |
in the rows and columns corresponding to for . It is easy to see that is positive semi-definite. The estimates are updated based on the optimal multipliers corresponding to (5c).
Algorithm 1 formally states the basic SQP method where and denote the multipliers corresponding to (4b) and (4d), respectively. Because we are only interested in the behavior of the algorithm when is close to , we assume here that for all and for all , and hence the gradient and Hessian of can be computed. Note that the iterates and are not explicitly needed in Algorithm 1, but they are necessary to measure the optimality error and define the primal-dual iterate sequence that is analyzed in Theorem 2.2.
A fast rate of convergence can be proven under the following sufficient second-order optimality assumptions [3].
Assumption 1.
Suppose that is an optimal solution of the NLP (4) with corresponding KKT multipliers , , and , satisfying the following properties:
-
(i)
Strict complementarity holds;
-
(ii)
the linear independence constraint qualification (LICQ) holds at , i.e., the gradients of the constraints that hold with equality at are linearly independent;
-
(iii)
the projection of the Lagrangian Hessian into the null space of the gradients of the active constraints is positive definite.
2.3 Penalty function
Theorem 2.2 is a local convergence result. Practical SQP algorithms include mechanisms that make sure that the iterates eventually reach such a neighborhood, even if the starting point is far away. To this end, we employ the exact penalty function
(8) |
in which is a penalty parameter. Note that we define in terms of all conic constraints , even though appears in (4c) only for . We do this because the proposed algorithm does not know in advance and the violation of all cone constraints needs to be taken into account when the original problem (1) is solved. Nevertheless, in this section, we may safely ignore the terms for because for we have and hence for all due to (5d), and when , we have when is close to since .
It can be shown, under suitable assumptions, that the minimizers of over the set defined by the linear constraints (4b),
(9) |
coincide with the minimizers of (4) when is chosen sufficiently large. Because it is not known upfront how large needs to be, the algorithm uses an estimate, , in iteration , which might be increased during the course of the algorithm.
To ensure that the iterates eventually reach a minimizer of , and therefore a solution of (4), we require that the decrease of is at least a fraction of that achieved in the piece-wise linear model of given by
(10) |
constructed at . More precisely, the algorithm accepts a trial point as a new iterate only if the sufficient decrease condition
(11) | ||||
holds with some fixed constant . The trial iterate with computed from (5) might not always satisfy this condition. The proposed algorithm generates a sequence of improved steps of which one is eventually accepted.
However, to apply Theorem 2.2, it would be necessary that the algorithm take the original step computed from (5); see Step 5 of Algorithm 1. Unfortunately, might not be acceptable even when the iterate is arbitrarily close to a non-degenerate solution satisfying Assumption 1 (a phenomenon called the Maratos effect [19]). Our remedy is to employ the second-order correction step [9], , which is obtained as an optimal solution of the QP
(12a) | ||||
(12b) | s.t. | |||
(12c) | ||||
(12d) |
For later reference, let , and denote optimal multiplier vectors corresponding to (12b)–(12d), respectively. The algorithm accepts the trial point if it yields sufficient decrease (11) with respect to the original SQP step . Note that (12) is a variation of the second-order correction that is usually used in SQP methods, for which (12c) reads
and avoids the evaluation of . In our setting, however, evaluating takes no extra work and (12c) is equivalent to a supporting hyperplane, see Section 3.1. As the following theorem shows (see, e.g., [9] or [5, Section 15.3.2.3]), this procedure computes steps with sufficient decrease (11) and results in quadratic convergence.
Theorem 2.3.
Let Assumption 1 hold and assume that the initial iterate and multipliers are sufficiently close to and , respectively. Further suppose that for large where for all .
-
1.
Consider an algorithm that generates a sequence of iterates by setting or for all . Then converges to at a quadratic rate.
-
2.
Further, for all , either or satisfies the acceptance criterion (11).
2.4 Optimality conditions for SOCP
The proposed algorithm aims at finding an optimal solution of the SOCP (1), or equivalently, values of the primal variables, , and the dual variables, and for , that satisfy the necessary and sufficient optimality conditions [1, Theorem 16]
(13a) | ||||
(13b) | ||||
(13c) |
A thorough discussion of SOCPs is given in the comprehensive review by Alizadeh and Goldfarb [1]. The authors consider the formulation in which the linear constraints (1b) are equality constraints, but the results in [1] can be easily extended to inequalities.
The primal-dual solution is unique under the following assumption.
Assumption 2.
is a non-degenerate primal-dual solution of the SOCP (1) at which strict complementarity holds.
The definition of non-degeneracy for SOCP is somewhat involved and we refer the reader to [1, Theorem 21]. Strict complementarity holds if and implies that: (i) ; (ii) ; (iii) ; and (iv) not both and are zero.
3 Algorithm
The proposed algorithm solves the NLP formulation (3) using a variation of the SQP method. Since the functional formulation of the cone constraints (3c) might not be differentiable at all iterates or at an optimal solution, the cones are approximated by a polyhedral outer approximation using supporting hyperplanes.
The approximation is done so that the method implicitly identifies the constraints that are extremal-active at an optimal solution , i.e., for large . More precisely, we will show that close to a non-degenerate optimal solution, the steps generated by the proposed algorithm are identical to those computed by the QP subproblem (5) for the basic SQP algorithm for solving (4). Consequently, fast local quadratic convergence is achieved, as discussed in Section 2.2.
3.1 Supporting hyperplanes
In the following, consider a particular cone and let be a finite subset of . We define the cone
(14) |
generated by the points in . For each we have , and using
(15) |
we obtain for any that
Therefore . Also, for , consider Then
and also Hence . Therefore, for any , the inequality
(16) |
defines a hyperplane that supports at . In summary, is a polyhedral outer approximation of , defined by supporting hyperplanes.
In addition, writing , we also define the cone
(17) |
For all and , we have
because and from the definition of . Therefore is included in the dual of the cone .
Now define and let be in the dual of . Since this implies that for all , Farkas’ lemma yields that for some and , i.e., .
Overall we proved that defined in (17) is the dual of , and since , this implies .
3.2 QP subproblem
In each iteration, at an iterate , the proposed algorithm computes a step as an optimal solution of the subproblem
(18a) | ||||
(18b) | s.t. | |||
(18c) | ||||
(18d) |
Here, is a positive semi-definite matrix that captures the curvature of the nonlinear constraint (3c), and for each cone, is the set of hyperplane-generating points that have been accumulated up to this iteration. From (14), we see that (18d) can be replaced by linear constraints. Consequently, (18) is a QP and can be solved as such.
Algorithm 2 describes a preliminary version of the proposed SQP method based on this subproblem. Observe that the linearization (18c) can be rewritten as
and is equivalent to the hyperplane constraint generated at . Consequently, if , then and (18c) acts as a cutting plane that excludes . Using the update rule
(19) |
in Step 6 makes sure that is excluded in all future iterations.
In our algorithm, we initialize so that
(20) |
In this way, is an extreme point of , as it is for , and the challenging aspect of the cone is already captured in the first subproblem. By choosing the coordinate vectors we have and the hyperplane constraint (16) becomes a very sparse linear constraint.
When in each iteration, this procedure becomes the standard cutting plane algorithm for the SOCP (1). It is well-known that the cutting plane algorithm is convergent in the sense that every limit point of the iterates is an optimal solution of the SOCP (1), but the convergence is typically slow. In the following sections, we describe how Algorithm 2 is augmented to achieve fast local convergence. The full method is stated formally in Algorithm 3.
3.3 Identification of extremal-active cones
We now describe a strategy that enables our algorithm to identify those cones that are extreme-active at a non-degenerate solution within a finite number of iterations, i.e., for all large . This will make it possible to apply a second-order method and achieve quadratic local convergence.
Consider the optimality conditions for the QP subproblem (18):
(21a) | ||||
(21b) | ||||
(21c) | ||||
(21d) |
Here, , , and are the multipliers corresponding to the constraints in (18); for completeness, we define for . In (21a), is the vector in that contains in the elements corresponding to and is zero otherwise. Similarly, is equal to in the elements corresponding to for all and zero otherwise.
Let us define
(22) |
It is easy to verify that, for , and hence from (21c). As a consequence we obtain for all . Furthermore, implies that
for suitable values of and . Then and
(23) |
from (21a). In conclusion, if is a primal-dual solution of the QP subproblem (18), then satisfies the conditions
(24a) | ||||
(24b) | ||||
(24c) |
which more closely resembles the SOCP optimality conditions (13). Our algorithm maintains primal-dual iterates that are updated based on (24).
Suppose that strict-complementarity holds at a primal-dual solution of the SOCP (1) and that . If then implies . As converges to , we have and therefore for sufficiently large . This yields . We now derive a modification of Algorithm 2 that ensures that for all sufficiently large under Assumption 2.
Consider any . We would like to have
(25) |
for all large , since then complementarity in (24c) implies that and hence for all large . We will later show that Assumption 2 implies that and that there exists a neighborhood of so that if ; see Remark 4.18. This suggests that some vector close to should eventually be included in because then (25) holds when is close enough to . For this purpose, the algorithm computes
which also converges to (see (13a)), and sets to , where
(26) |
The update is skipped when (because then is already in and no additional hyperplane is needed), and when or , which might indicate that and .
3.4 Fast NLP-SQP steps
Now that we have a mechanism in place that makes sure that the extremal-active cones are identified in a finite number of iterations, we present a strategy that emulates the basic SQP Algorithm 1 and automatically takes quadratically convergent SQP steps, i.e., solutions of the SQP subproblem (5), close to . For the discussion in this section, we again assume that is a unique solution at which Assumption 2 holds.
Suppose that for large due to the strategy discussed in Section 3.3. This means that the outer approximation (18d) of for is sufficient to fix to zero and is therefore equivalent to the constraint (5d) in the basic SQP subproblem. However, (18) includes the outer approximations for all cones, including those for , which are not present in (5). Consequently, the desired SQP step from (5) might not be feasible for (18).
As a remedy, at the beginning of an iteration, the algorithm first computes an NLP-SQP step as an optimal solution of a relaxation of (18),
(27a) | ||||
(27b) | s.t. | |||
(27c) | ||||
(27d) | ||||
(27e) |
where . In this way, the outer approximations are imposed only for the currently extremal-active cones, while for all other cones only the linearization (27c) is considered, just like in (5), with the additional restriction (27d) that ensure . Let , , , and be the optimal corresponding to the constraints in (27) (set to zero for non-existing constraints) and define as in (23). Then the optimality conditions (24) hold again, this time with , but instead of (22) we have
(28) |
When is not close to and , QP (27) might result in poor steps that go far outside of for some and undermine convergence. Therefore, we iteratively add more cones to until
(29) |
i.e., when a cone is approximated only by its linearization (27c), the step does not appear to target its extreme point. This property is necessary to show that for all large also for the case that new iterates are computed from (27) instead of (18). Note that in the extreme case and (27) is identical to (18). This loop can be found in Steps 7–10 in Algorithm 3.
Since there is no guarantee that (27) yields iterates that converge to , the algorithm discards the NLP-SQP step in certain situations and falls back to the original method to recompute a new step from (18), as in the original method. In Section 3.6 we describe how we use the exact penalty function (8) to determine when this is necessary.
3.5 Hessian matrix
Motivated by (6), we compute the Hessian matrix in (18) and (27) from
(30) |
where are multiplier estimates for the nonlinear constraint (3c). Because is positive semi-definite and , also is positive semi-definite.
In the final phase, when we intend to emulate the basic SQP Algorithm 1. Therefore, we set for , where are the optimal multipliers for (27c), when the fast NLP-SQP step was accepted. But we also need to define a value for when the step is computed from (18) where, in addition to the linearization of , hyperplanes (18d) are used to approximate all cones. By comparing the optimality conditions of the QPs (18) and (5) we now derive an update for .
Suppose that . Then (21a) yields
(31) |
where because of (30). Here, the dual information for the nonlinear constraint is split into and and needs to be condensed into a single number, , so that we can compute from (30) in the next iteration.
Recall that, in the basic SQP Algorithm 1, the new multipliers are set to the optimal multipliers of the QP (5), which satisfy
(32) |
A comparison with (31) suggests to choose so that Multiplying both sides with and solving for yields
Note that if the outer approximation constraint (18d) is not active and therefore for . In this case, we recover the basic SQP update, as desired.
Now suppose that . Again comparing (31) with (32) suggests a choice so that where we substituted by because the former is not defined for . In this case, multiplying both sides with and solving for yields
In summary, in each iteration in which (18) determines the new iterate, we update
(33) |
The choice above leads to quadratic convergence for non-degenerate instances, but it is common for the global convergence analysis of SQP methods to permit any positive semi-definite Hessian matrix , as long as it is bounded. Since we were not able to exclude the case that or are unbounded for some cone , in which case defined in (30) is unbounded, we fix a large threshold and rescale the Hessian matrix according to
(34) |
so that in every iteration. In this way, global convergence is guaranteed, but the fast local convergence rate might be impaired if is chosen too small so that defined in (30) must be rescaled. Therefore, in practice, we set to a very large number and (34) is never actually triggered in our numerical experiments.
3.6 Penalty function
The steps computed from (18) and (27) do not necessarily yield a convergent algorithm and a safeguard is required to force the iterates into a neighborhood of an optimal solution. Here, we utilized the exact penalty function (8) and accept a new iterate only if the sufficient decrease condition (11) holds.
As discussed in Section 3.4, at the beginning of an iteration, the algorithm first computes an NLP-SQP step from (27). The penalty function can now help us to decide whether this step makes sufficient progress towards an optimal solution, and we only accept the trial point as a new iterate if (11) holds with .
If the penalty function does not accept , there is still a chance that is making rapid progress towards the solution, but, as discussed in Section 2.2, the Maratos effect is preventing the acceptance of . As a remedy, we compute, analogously to (12), a second-order correction step for (27) as a solution of
(35) | ||||
s.t. | ||||
and accept the trial point if it satisfies (11) with . Let again , , , and denote the optimal multipliers in (35) and define as in (23). The optimality conditions (24) still hold, this time with and
(36) |
If neither nor has been accepted, we give up on fast NLP-SQP steps and instead revert to QP (18) which safely approximates every cone with an outer approximation. However, the trial point with the step obtained from (18) does not necessarily satisfy (11). In that case, the algorithm adds to to cut off and resolves (18) to get a new trial step . In an inner loop (Steps 22–32), this procedure is repeated until, eventually, a trial step is obtained that satisfies (11). We will show that (11) holds after a finite number of iterations of the inner loop.
It remains to discuss the update of the penalty parameter estimate . One can show (see Lemma 4.1) that an optimal solution of of the SOCP with multipliers is a minimizer of over the set defined in (9) if , where . Since is not known a priori, the algorithm uses the update rule where
(37) |
with . We will prove in Lemma 4.8 that the sequence is bounded under Slater’s constraint qualification. Therefore, this rule will eventually settle at a final penalty parameter that is not changed after a finite number of iterations.
During an iteration of the algorithm, several trial steps may be considered and a preliminary parameter value is computed from (37) for each one. At the end of the iteration, the parameter value corresponding to the accepted trial step is stored. Note that the acceptance test for the second-order correction step from (35) needs to be done with the penalty parameter computed for the regular NLP-SQP step from (27).
3.7 Complete algorithm
The complete method is stated in Algorithm 3. To keep the notation concise, we omit “for all ” whenever the index is used. We assume that all QPs in the algorithm are solved exactly.
Each iteration begins with the computation of the fast NLP-SQP step where an inner loop repeatedly adds cones to until (29) holds. If the step achieves a sufficient decrease in the penalty function, the trial point is accepted. Otherwise, the second-order correction for the NLP-SQP step is computed and accepted if it yields a sufficient decrease for the NLP-SQP step. Note that the second-order correction step is discarded if it does not satisfy (29) since otherwise finite identification of cannot be guaranteed. If none of the NLP-SQP steps was acceptable, the algorithm proceeds with an inner loop in which hyperplanes cutting off the current trial point are repeatedly added until the penalty function is sufficiently decreased. No matter which step is taken, both and are added to according to the update rules (19) and (26) and the multiplier for the nonlinear constraints is updated.
In most cases, a new QP is obtained by adding only a few constraints to the most recently solved QP, and a hot-started QP solver will typically compute the new solution quickly. For example, in each inner iteration in Steps 7–10, hyperplanes for the polyhedral outer approximation for cones augmenting are added to QP (27). Similarly, each inner iteration in Steps 22–32 adds one cutting plane for a violated cone constraint. In Steps 6 and 16, some constraints are removed compared to the most recently solved QP, but also this structure could be utilized.
4 Convergence analysis
4.1 Global convergence
In this section, we prove that, under a standard regularity assumption, all limit points of the sequence of iterates are optimal solutions of the SOCP, if the algorithm does not terminate with an optimal solution in Step 36. We also explore what happens when the SOCP is infeasible.
We make the following assumption throughout this section.
Assumption 3.
The set defined in (9) is bounded.
Since by the initialization of Algorithm 3 and any step satisfies (21b), we have for all . Similarly, (24c) and (14) imply that
(38) |
We start the analysis with some technical results that quantify the decrease in the penalty function model.
Lemma 4.1.
Proof 4.2.
Proof of (i): Consider any . Because is a solution of (18) or (27), there exist and so that the optimality conditions (24) hold. Let . Since , the definition (17) implies that
where and .
Using (15) we have Together with from (24c) and this overall yields
Further, we have from (24b) that and therefore since and .
Using these inequalities and (24a), the choice of yields
Finally, combining this with (10) and (18c) or (27c), respectively, we obtain
For the second equality, we used that for by (38) and the definition of . Since is positive semi-definite, , and , the right-hand side must be non-positive.
Proof of (39): Suppose is not an optimal solution for the SOCP. If is not feasible for the SOCP, must violate a conic constraint and we have for some . Since is positive semidefinite and , part (i) yields (39).
It remains to consider the case when is feasible for the SOCP, i.e., for all . To derive a contradiction, suppose that (39) does not hold. Then part (i) yields
Because is positive semi-definite, this implies . Further, since also
the optimal objective value of (18) or (27), respectively, is zero. At the same time, choosing is also feasible for (18) or (27) and yields the same objective value. Therefore, also is an optimal solution of (18) or (27) and the optimality conditions (24) hold for some multipliers. Because , the same multipliers and show that the optimality conditions of the SOCP (13) also hold. So, is an optimal solution for the SOCP, contradicting the assumption.
The following lemma shows that the algorithm is well-defined and will not stay in an infinite loop in Steps 22–32.
Lemma 4.3.
Proof 4.4.
Suppose the claim is not true and let be the infinite sequence of trial steps generated in the loop in Steps 22–32 for which the stopping condition in Step 25 is never satisfied, and let be a limit point of . We will first show that
(41) |
Let us first consider the case when for some . Then and (41) holds.
Now consider the case that for . Since is a limit point of , there exists a subsequence that converges to . We may assume without loss of generality that for all . Then, for any , by Step 31, . In the inner iteration , the trial step is computed from (18) and satisfies , which by definition (14) implies
Taking the limit and using the fact that for any yields
proving (41). In turn (41) implies that the ratio
converges to 1. Note that the ratio is well-defined since due to Lemma 4.3(39). It then follows that (40) is true for sufficiently large .
Lemma 4.5.
Suppose that there exists so that for all large . Then any limit point of is an optimal solution of the SOCP (1).
Proof 4.6.
From (11) and the updates in the algorithm, we have that
for . Since the SOCP cannot be unbounded below by Assumption 3, the left-hand side is bounded below as . Lemma 4.1(i) shows that all summands are non-positive and we obtain
(42) |
Using Lemma 4.1(i), we also have
Since is positive semi-definite and , this implies that for all , i.e., all limit points of are feasible. This also yields , and since is positive semi-definite and uniformly bounded due to (34), we have
(43) |
Now let be a limit point of . Since is bounded, is bounded, and consequently there exists a subsequence of iterates so that and converge to and , respectively, for some limit point of . Define for all . Then, looking at the QP optimality conditions (24), we see that is also an optimal solution of the linear optimization problem
(45) | ||||
s.t. | ||||
Now suppose, for the purpose of deriving a contradiction, that is not an optimal solution of the SOCP. Since we showed above that is feasible, there then exists a step so that is feasible for (1) and . Then, because , for each , is feasible for (45), and because is an optimal solution of (45), we have . Taking the limit , we obtain , where we used , due to the definition of and (43). However, this contradicts (44). Therefore, must be a solution of the SOCP.
For later reference, we highlight the limit (43) established in the above proof.
Lemma 4.7.
Suppose that there exists so that for all large . Then .
We are now ready to prove that the algorithm is globally convergent under the following standard regularity assumption.
Assumption 4.
The SOCP is feasible and Slater’s constraint qualification holds, i.e., there exists a feasible point and so that is feasible for any with .
This assumption implies that the multiplier estimates are bounded.
Lemma 4.8.
Suppose that Assumption 4 holds. Then is bounded.
Proof 4.9.
Let and be the quantities from Assumption 4. Note that the QP corresponding to the optimality conditions (24) is
s.t. |
Since when is the step accepted by the algorithm, it follows that is an optimal solution of the QP
s.t. |
the Lagrangian dual of which is
(46a) | ||||
(46b) | s.t. | |||
(46c) |
By (24), ,) is a primal-dual optimal solution of these QPs.
Define . Then , and Assumption 4 implies that . Since , we have with (46b) that
(47) |
Since is positive definite, it is
(48) |
Furthermore, Slater’s condition implies strong duality, that is
(49) |
Finally, since is feasible for the SOCP, (1b) and imply . Subtracting on both sides of (47), this, together with (48) and (49), yields
The first two terms are independent of , and since is bounded by Assumption 4 and is uniformly bounded by (34), we can conclude that is uniformly bounded.
It is easy to see that the penalty parameter update rule (37) and Lemma 4.8 imply the following result.
Lemma 4.10.
Suppose Assumption 4 holds. Then there exists and so that , where for all .
We can now state the main convergence theorem of this section.
Theorem 4.11.
Proof 4.12.
Let be a subsequence converging to a limit point . No matter whether an iterate is computed from the optimal solution of (18), (27), or (35), the iterates satisfy the optimality conditions (24). In particular, from (24c) we have for any that and . In the limit, we obtain (since is closed) and . Lemma 4.10 yields that for all large , and so Lemma 4.5 implies that is feasible, i.e., . Therefore, (13c) holds. Using Lemma 4.7 we can take the limit in (24a) and (24b) and deduce also the remaining SOCP optimality conditions (13a) and (13b) hold at the limit point.
4.2 Identification of extremal-active cones
We can only expect fast local convergence under some non-degeneracy assumptions. Throughout this section, we assume that Assumption 2 holds. Under this assumption, is the unique optimal solution [1, Theorem 22], and Theorem 4.11 then implies that
First, we prove a technical result that describes elements in in a compact manner. For this characterization to hold, condition (20) for the initialization of the set of hyperplane-generating points is crucial.
Lemma 4.14.
Let with and . Further, let . Then the following statements hold for :
-
(i)
if .
-
(ii)
if .
Proof 4.15.
Theorem 4.16.
For all sufficiently large, we have .
Proof 4.17.
Choose , then . Because , it is or, equivalently, for sufficiently large. For the remainder of this proof we consider and show that for large . Note that strict complementarity in Assumption 2 implies that , i.e., , and consequently .
First consider the iterations in which fast NLP-SQP steps are accepted in Steps 12 or 17. For the purpose of deriving a contradiction, suppose there exists an infinite subsequence so that or and . Then implies (according to the termination condition in the while loop in Step 7). We also have where from (28) or from (36). Condition (24c) yields , so by (17) it is for some , as well as , which by (14) implies . Then complementarity yields
Since and , we must have , and consequently . It is easy to see that . Since , continuity of yields , in contradiction to . We thus showed that for all large iterations in which the NLP-SQP step was accepted, and consequently (28) and (36) yield for such .
In all other iterations (22) holds, and overall we obtain
(50) |
Let us first consider the case when . Then yields . To apply Lemma 4.14 choose any and let . Then and . Since and is continuous, for sufficiently large , and by Lemma 4.14, . Since and (50) holds, we also have . General conic complementarity in (24c) then implies that for all large , or equivalently, for sufficiently large, as desired.
Now consider the case . For the purpose of deriving a contradiction, suppose there exists a subsequence so that , i.e., , for all . Because , , and , we may assume without loss of generality that and for all . Using this and , we see that the update rule (26) in Step 34 adds to . With (50), we have
(51) |
Remark 4.18.
In the proof of Theorem 4.16, we saw that if and . Since is continuous, this implies that there exists a neighborhood so that , and consequently , for all .
4.3 Quadratic local convergence
As discussed in Section 2.1, since is a solution of the SOCP (1), it is also a solution of the nonlinear problem (4). We now show that Algorithm 3 eventually generates steps that are identical to SQP steps for (4). Then Theorem 2.3 implies that the iterates converge locally at a quadratic rate.
We first need to establish that the assumptions for Theorem 2.3 hold.
Lemma 4.19.
Proof 4.20.
Let and be the optimal multipliers for the SOCP corresponding to , satisfying (13). Assumption 2 implies that and are unique [1, Theorem 22].
Let and define . If , complementarity (13c) implies, for all , that , or equivalently, ; see [1, Lemma 15]. Using (15), this can be written as
(52) |
On the other hand, if , i.e., the constraint (4c) is inactive, then and complementarity (13c) yields (see [1, Definition 23]) and therefore . Consequently, (52) is also valid in that case. Finally, we define for all . With these definitions, (13a) can be restated as
(53) |
where is the vector with the values of at the components corresponding to and zero otherwise. We now prove parts (i), (ii), and (iii) of Assumption 1.
Proof of (i): Let . We already established that yields . Now suppose that . Then . Since strict complementarity is assumed, we have (see the comment after Assumption 2), which in turn yields and hence .
Proof of (ii): Since we need to prove linear independence only of those constraints that are active at , we consider only those rows of for which (4b) is binding.
Without loss of generality suppose is partitioned into four parts, , where , , and correspond to the variables in the cones , , and , respectively, and includes all components of that are not in any of the cones. Further suppose that , where , and that is partitioned in the same way.
Primal non-degeneracy of the SOCP implies all that matrices of the form
have linear independent rows for all scalars and vectors , not all zero [1, Eq. (50)]. This implies that the rows of , together with the gradient of any one of the binding constraints in (4c) and (4d) are linearly independent. Because the constraint gradients, which are of the form and , share no nonzero components when extended to the full space, we conclude that the gradients of all active constraints are linearly independent at , i.e., the LICQ holds.
Proof of (iii): For the purpose of deriving a contradiction, suppose that there exists a direction that lies in the null space of the constraints of (4) that are binding at and for which .
Since is in the null space of the binding constraints, we have , for , and for all . Premultiplying (53) by gives
(54) |
What remains to show is that is a feasible direction for the SOCP, i.e., there exists so that is feasible for the SOCP. Because of (54), this point has the same objective value as and is therefore also an optimal solution of the SOCP. This contradicts the fact that Assumption 2 implies that the optimal solution is unique [1, Theorem 22].
By the definition of in Assumption 1 and the choice of , we have
Since for all , the Hessian is positive semi-definite and from Part (i), this yields for all .
Theorem 4.21.
Suppose that . Then the primal-dual iterates converge locally to at a quadratic rate.
Proof 4.22.
We already established in Theorem 4.11 that the iterates converge to the optimal solution, and since and , the Hessian is not rescaled according to (34) in Step 4. Using Theorem 4.16 we know that, once is sufficiently large, the step computed in Step 6 of Algorithm 3 is identical with the SQP step from (5) for (4); we can ignore (27d) here because and and therefore this constraints is not active for large . This also implies that the condition in Step 7 is never true and thus . If the decrease condition in Step 12 is not satisfied, by a similar argument we have that computed in Step 16 is the second-order correction step from (12) for (4). Due to Lemma 4.19 we can now apply Theorem 2.3 to conclude that either or is accepted to define the next iterate for large and that the iterates converge at a quadratic rate.
5 Numerical Experiments
In this section, we examine the performance of Algorithm 3. First, using randomly generated instances, we consider three types of starting points: (i) uninformative default starting point (cold start), (ii) solution of a perturbed instance, (iii) solution computed by an interior-point SOCP solver whose accuracy we wish to improve. Then we briefly report results using the test library CBLIB. The numerical experiments were performed on an Ubuntu 22.04 Linux server with a 2.1GHz Xeon Gold 5128 R CPU and 256GB of RAM.
5.1 Implementation
We implemented Algorithm 3 in MATLAB R2021b, with parameters , , , and . In each iteration, we identify and . The set is initialized to (see (20)), and is a given starting value for , if provided, and zero otherwise. In addition, since the identification of the optimal extremal-active set requires , we add to , where .
The algorithm terminates when the violation of the SOCP optimality conditions (13) for the current iterate satisfies
(56) |
with , for some .
As in [29], the sufficient descent condition (11) is slightly relaxed by
to account for cancellation error, where is the machine precision. Finally, to avoid accumulating very similar hyperplanes that would lead to degenerate QPs, we do not add a new generating point to if there already exists such that
In these experiments, we disabled the second-order correction step (Steps 16–20) because we noticed that it was never accepted in practice. In a more sophisticated implementation, one would include a heuristic that attempts to detect the Maratos effect and then triggers the second-order correction step in specific situations.
The QPs were solved using ILOG CPLEX V12.10, with optimality and feasibility tolerances set to and “dependency checker” and “numerical precision emphasis” enabled, using the primal simplex method. When CPLEX did not report a solution status “optimal” and the QP KKT error was above , a small perturbation was added to the Hessian matrix, i.e., we replaced by . This helped in some cases in which CPLEX (incorrectly) reported that was not positive semi-definite. If CPLEX still did not find a QP solution with KKT error less than , we attempted to resolve the QP with the barrier method, the dual simplex method, and the primal simplex method again, until one was able to compute a solution. If all solvers failed for QP (27), the algorithm continued in Step 22. If no solver was able to solve (18), we terminated the main algorithm and declared a failure.
We emphasize that the purpose of our implementation is to assess whether the proposed algorithm exhibits behavior that validates the stated goals: Convergence from any starting point and rapid local convergence to highly accurate solutions. In its current implementation, it requires more computation time than highly sophisticated commercial solvers such as MOSEK or CPLEX, which were developed over decades and have highly specialized linear algebra routines that are tightly integrated into the algorithms. As we observed at the end of Section 3.7, many of the QPs in Algorithm 3 that are solved in succession are similar to each other, and savings in computation times should therefore be achievable. However, our prototype implementation based on the Matlab CPLEX interface does not allow us to utilize callback functions for adding or removing hyperplanes. Achieving these savings in computation time thus requires a more sophisticated implementation, a task that is outside of the scope of this paper. Consequently, we do not report solution times here.
5.2 Randomly generated QCQPs
The experiments were performed on randomly generated SOCP instances of varying sizes, specified by . Here, are the number of variables and linear constraints, respectively. specifies the number of cones of each “activity type”: , , and , i.e., there are cones that are extremal-active, that are active at the boundary, and that are inactive at the optimal solution . The dimensions of the cones are randomly chosen. In addition, there are variables that are not part of any cone, with bounds chosen in a way so that the non-degeneracy assumption, Assumption 2, holds. A detailed description of the problem generation is stated in Appendix A in [18].
solved | total | SQP | total | total | |||
---|---|---|---|---|---|---|---|
iter | iter | QP (27) | QP (18) | ||||
200 | 60 | 10 | 30 | 6.67 | 6.67 | 9.77 | 0.00 |
400 | 120 | 20 | 30 | 7.20 | 7.20 | 11.57 | 0.00 |
1000 | 300 | 50 | 30 | 7.23 | 7.23 | 12.17 | 0.00 |
200 | 60 | 4 | 30 | 7.53 | 7.07 | 11.83 | 0.90 |
400 | 120 | 8 | 30 | 8.27 | 7.77 | 14.20 | 1.00 |
1000 | 300 | 20 | 30 | 8.67 | 7.80 | 15.93 | 1.83 |
200 | 60 | 2 | 30 | 8.47 | 7.87 | 13.90 | 1.20 |
400 | 120 | 4 | 30 | 8.87 | 8.07 | 15.30 | 1.60 |
1000 | 300 | 10 | 30 | 9.47 | 8.43 | 17.27 | 1.97 |
Table 1 summarizes the performance of the algorithm with an uninformative starting point . Each row lists average statistics for a given problem size (), taken over 30 random instances. We see that the proposed algorithm is very reliable and solved every instance to the tolerance . The average number of iterations is mostly between 7–9, during most of which the second-order NLP-SQP step was accepted.
To give an idea of the computational effort, we report the number of times QPs (27) and (18) were solved. And we can draw further conclusions from this data: Consider, for example, the last row. At the beginning of each iteration, QP (27) is solved to obtain the NLP-SQP step. The difference with the total number of iterations, i.e., 17.27-9.47=7.80, gives us the total number of times in which the guess of the extremal-active cones needed to be corrected in Steps 7–10. In other words, on average, the loop Steps 7–10 is executed 7.80/9.47=0.82 times per iteration. Similarly, the last column tells us the total number of iterations of the loop in Steps 22–32. The loop was only executed when the NLP-SQP step was not accepted, so in 9.47-8.43=1.04 iterations, taking 1.97/1.04=1.89 loop iterations on average.
The experiments are presented in three groups where the ratio between and is kept constant. As the number of cones, , decreases from one group to the next, the average size of the individual cones increases by a factor of and , respectively. This increase seems to result in slightly more iterations in which the SQP step was rejected, indicating that the simple linearization (27c) of the non-extremal-active cones becomes sometimes insufficiently accurate.
total | SQP | total | total | Mosek | final | |||
iter | iter | QP (27) | QP (18) | error | error | |||
200 | 60 | 10 | 1.10 | 1.07 | 1.10 | 0.07 | 2.33e-06 | 1.63e-10 |
400 | 120 | 20 | 1.03 | 1.00 | 1.03 | 0.03 | 2.67e-06 | 1.70e-10 |
1000 | 300 | 50 | 1.07 | 1.03 | 1.07 | 0.03 | 3.49e-06 | 1.76e-10 |
200 | 60 | 4 | 1.03 | 1.03 | 1.03 | 0.00 | 5.97e-06 | 1.69e-10 |
400 | 120 | 8 | 1.00 | 1.00 | 1.00 | 0.00 | 2.28e-06 | 1.87e-10 |
1000 | 300 | 20 | 1.03 | 0.83 | 1.03 | 0.27 | 5.20e-06 | 1.72e-10 |
200 | 60 | 2 | 1.00 | 1.00 | 1.00 | 0.00 | 2.02e-06 | 1.53e-10 |
400 | 120 | 4 | 1.13 | 1.10 | 1.13 | 0.03 | 4.85e-06 | 2.03e-10 |
1000 | 300 | 10 | 1.20 | 1.10 | 1.20 | 0.13 | 1.22e-05 | 2.41e-10 |
The remaining experiments in this section investigate to which degree the algorithm is able to achieve our primary goal of taking advantage of a good starting point. We begin with an extreme situation, in which we first solve an instance with the interior-point SOCP solver MOSEK V9.1.9 (called via CVX), using the setting cvx_precision=high corresponding to the MOSEK tolerance , and give the resulting primal-dual solution as starting point to Algorithm 3. Choosing any tighter MOSEK tolerances leads to failures in several problems. Table 2 summarizes the results. In all cases, the algorithm converges rapidly to an improved solution, reducing the error by 4 orders of magnitude, most of the time with only a single second-order iteration. The Mosek error was dominated by the violation of complementarity. This demonstrates the ability of the proposed method to improve the accuracy of a solution computed by an interior-point method.
solved | total | SQP | total | total | |||
iter | iter | QP (27) | QP (18) | ||||
200 | 60 | 10 | 30 | 1.00 | 0.97 | 1.00 | 0.07 |
400 | 120 | 20 | 30 | 1.00 | 0.97 | 1.00 | 0.03 |
1000 | 300 | 50 | 30 | 1.00 | 0.97 | 1.00 | 0.07 |
200 | 60 | 4 | 30 | 1.00 | 1.00 | 1.00 | 0.00 |
400 | 120 | 8 | 30 | 1.00 | 0.93 | 1.00 | 0.07 |
1000 | 300 | 20 | 30 | 1.00 | 0.87 | 1.00 | 0.20 |
200 | 60 | 2 | 30 | 1.00 | 1.00 | 1.00 | 0.00 |
400 | 120 | 4 | 30 | 1.00 | 0.97 | 1.00 | 0.03 |
1000 | 300 | 10 | 30 | 1.07 | 1.00 | 1.03 | 0.07 |
solved | total | SQP | total | total | |||
iter | iter | QP (27) | QP (18) | ||||
200 | 60 | 10 | 30 | 1.20 | 1.07 | 1.00 | 0.19 |
400 | 120 | 20 | 30 | 1.33 | 1.17 | 1.00 | 0.73 |
1000 | 300 | 50 | 30 | 1.60 | 1.23 | 1.02 | 1.29 |
200 | 60 | 4 | 30 | 1.27 | 1.13 | 1.02 | 0.71 |
400 | 120 | 8 | 30 | 1.67 | 1.27 | 1.16 | 0.48 |
1000 | 300 | 20 | 30 | 2.10 | 1.40 | 1.27 | 0.57 |
200 | 60 | 2 | 30 | 1.67 | 1.33 | 1.24 | 0.42 |
400 | 120 | 4 | 30 | 2.30 | 1.87 | 1.30 | 0.38 |
1000 | 300 | 10 | 30 | 3.67 | 2.53 | 1.53 | 0.59 |
For the final experiments, summarized in Tables 3 and 4, the starting point is the MOSEK solution of a perturbed problem, in which 10% of the objective coefficients were perturbed by uniformly distributed random noise of the order of and , respectively. For the small perturbation, similar to Table 2, Algorithm 3 terminated in one iteration most of the time. More iterations were required for the larger perturbation, but still significantly fewer compared to the uninformative starting point, see Table 1.
5.3 CBLIB instances
Problem | # var | # con | # soc | solved/ | total | SQP | iter warm/ |
---|---|---|---|---|---|---|---|
subset | total | iter | iter | iter cold | |||
10/20* | 3024.0 | 6030.6 | 336.0 | 10/11 | 6.5 | 6.5 | 0.57 |
achtziger_* | 2314.0 | 1856.7 | 257.0 | 6/7 | 21.3 | 10.2 | 0.53 |
as_* | 4909.0 | 5695.7 | 669.0 | 20/20 | 10.4 | 8.4 | 0.37 |
ck_* | 2376.0 | 2375.0 | 15.0 | 90/90 | 5.8 | 4.6 | 0.40 |
classical_* | 159.3 | 200.4 | 1.0 | 409/409 | 7.3 | 7.3 | 0.49 |
clay* | 457.2 | 691.9 | 82.9 | 11/12 | 67.6 | 13.2 | 1.31 |
estein* | 103.1 | 165.2 | 14.0 | 9/9 | 18.3 | 4.4 | 0.30 |
flay* | 175.0 | 408.0 | 4.0 | 10/10 | 7.0 | 3.6 | 0.34 |
fo[7-9]* | 208.6 | 540.2 | 15.9 | 19/19 | 12.3 | 3.0 | 0.53 |
m[3-9]* | 150.2 | 370.5 | 12.8 | 8/8 | 10.9 | 4.0 | 0.44 |
nb* | 3098.8 | 321.0 | 816.0 | 4/4 | 4.8 | 4.2 | 3.43 |
netmod* | 989.7 | 2593.0 | 4.7 | 3/3 | 4.0 | 4.0 | 0.33 |
no7_* | 169.0 | 438.0 | 14.0 | 5/5 | 10.6 | 4.4 | 0.46 |
o[7-9]_* | 183.2 | 466.9 | 14.7 | 9/9 | 8.7 | 3.2 | 0.66 |
pp-* | 2960.0 | 2221.0 | 370.0 | 6/6 | 12.8 | 12.8 | 0.17 |
robust_* | 253.9 | 298.3 | 2.0 | 420/420 | 6.8 | 6.8 | 0.67 |
sched_* | 7361.5 | 3685.0 | 1.5 | 4/4 | 12.5 | 10.0 | 0.35 |
shortfall_* | 249.9 | 294.3 | 2.0 | 420/420 | 7.8 | 7.8 | 0.37 |
slay* | 393.0 | 936.0 | 14.0 | 14/14 | 8.6 | 7.0 | 0.27 |
sssd-* | 286.0 | 314.5 | 18.0 | 16/16 | 7.6 | 5.6 | 0.42 |
stolpe07-* | 1483.0 | 1202.0 | 164.7 | 3/3 | 29.0 | 18.3 | 0.71 |
tls* | 521.4 | 989.8 | 51.6 | 5/6 | 7.6 | 7.0 | 0.49 |
turbine* | 201.6 | 289.3 | 52.4 | 7/7 | 4.9 | 4.9 | 0.30 |
uflquad-* | 8011.0 | 6811.0 | 1333.3 | 3/4 | 15.3 | 14.7 | 0.19 |
wiener_* | 2297.5 | 2508.5 | 71.3 | 41/45 | 12.4 | 9.9 | 0.43 |
misc | 1534.1 | 1374.5 | 158.9 | 11/14 | 10.5 | 7.6 | 0.33 |
To demonstrate the robustness of the algorithm we also solved instances from the Conic Benchmark Library CBLIB [32]. Some instances involve rotated second-order cone constraints, and we reformulated them so that they fit into our standard form (1). We chose all 1,575 instances with at most 10,000 variables and 10,000 constraints. Integer variables were relaxed to be continuous.
Table 5 shows the statistics for the starting point , where instances were grouped into sets of problems with similar names, and the remaining ones are collected in misc. The method was able to solve 99.2% of the instances, where 10 problems could not be solved due to failures of the QP subproblem solver, and Algorithm 3 exceeded the maximum number of 200 iterations in 2 cases. In comparison, MOSEK, with default settings, failed on 5 instances (those were solved correctly with Algorithm 3), incorrectly declared 3 instances to be infeasible, and labeled 6 instances to be unbounded (of which 3 were solved by Algorithm 3). We observed that some instances, especially those in the clay*, fo[7-9]*, m[3-9]*, no7*, o[7-9]_* subsets, are degenerate, having an optimal objective function value of 0, and the assumption necessary to prove fast local convergence is violated. This matches the observations in the table, where the SQP step was accepted only in a relatively small fraction of the iterations.
To showcase the warm-starting feature of the algorithm, we took the 1,563 previously successfully solved instances, perturbed 10% of the entries of the final primal-dual iterate by a random perturbation, uniformly chosen in , and used this as the starting point for a warm-started run. Here, QP subproblem failure occurred in 3 cases and 2 instances exceeded the iteration limit. As we can see from the last column in Table 5, on average, the number of iterations was reduced in most cases, often by more than 50%.
6 Concluding remarks
We presented an SQP algorithm for solving SOCPs and proved that it converges from any starting point and achieves local quadratic convergence for non-degenerate SOCPs. Our numerical experiments indicate that the algorithm is reliable, converges quickly when a good starting point is available, and produces more accurate solutions than a state-of-the-art interior-point solver.
Future research would investigate whether the proposed algorithm is a valuable alternative for interior-point methods for small problems or for the solution of a sequence of related SOCPs. An efficient implementation of the algorithm beyond our Matlab prototype would be tightly coupled with a tailored active-set QP solver that efficiently adds or removes cuts instead of solving each QP essentially from scratch. Parametric active-set solvers such as qpOASES [8] or QORE [26] might be suitable options since they do not require primal or dual feasible starting points.
Acknowledgments
We thank Javier Peña for his suggestion for the proof of Lemma 4.8, as well as three referees whose comments helped us to improve the paper.
References
- [1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Mathematical Programming, 95(1):3–51, 2003.
- [2] A. Ben-Tal and A. Nemirovski. On polyhedral approximations of the second-order cone. Mathematics of Operations Research, 26(2):193–205, 2001.
- [3] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta numerica, 4:1–51, 1995.
- [4] C. Coey, M. Lubin, and J. P. Vielma. Outer approximation with conic certificates for mixed-integer convex problems. Mathematical Programming Computation, 12(2):249–293, 2020.
- [5] A. R. Conn, N. I. Gould, and P. L. Toint. Trust region methods. SIAM, 2000.
- [6] M. Diehl, F. Jarre, and C. H. Vogelbusch. Loss of superlinear convergence for an SQP-type method with conic constraints. SIAM Journal on Optimization, 16(4):1201–1210, 2006.
- [7] S. Drewes and S. Ulbrich. Subgradient based outer approximation for mixed integer second order cone programming. In Mixed integer nonlinear programming, pages 41–59. Springer, 2012.
- [8] H. J. Ferreau, C. Kirches, A. Potschka, H. G. Bock, and M. Diehl. qpOASES: A parametric active-set algorithm for quadratic programming. Mathematical Programming Computation, 6:327–363, 2014.
- [9] R. Fletcher. Second order corrections for non-differentiable optimization. In Numerical analysis, pages 85–114. Springer, 1982.
- [10] N. Goldberg and S. Leyffer. An active-set method for second-order conic-constrained quadratic programming. SIAM Journal on Optimization, 25(3):1455–1477, 2015.
- [11] N. Goswami, S. K. Mondal, and S. Paruya. A comparative study of dual active-set and primal-dual interior-point method. IFAC Proceedings Volumes, 45(15), 2012.
- [12] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2022.
- [13] S. Hayashi, T. Okuno, and Y. Ito. Simplex-type algorithm for second-order cone programmes via semi-infinite programming reformulation. Optimization Methods and Software, 31(6):1272–1297, 2016.
- [14] L. Hei, J. Nocedal, and R. A. Waltz. A numerical study of active-set and interior-point methods for bound constrained optimization. In Modeling, Simulation and Optimization of Complex Processes: Proceedings of the Third International Conference on High Performance Scientific Computing, March 6–10, 2006, Hanoi, Vietnam, pages 273–292. Springer, 2008.
- [15] IBM ILOG. User’s Manual for CPLEX, 2019.
- [16] H. Kato and M. Fukushima. An SQP-type algorithm for nonlinear second-order cone programs. Optimization Letters, 1(2):129–144, 2007.
- [17] M. S. K. Lau, S. P. Yue, K. V. Ling, and J. M. Maciejowski. A comparison of interior point and active set methods for fpga implementation of model predictive control. In 2009 European Control Conference (ECC), pages 156–161, 2009.
- [18] X. Luo and A. Wächter. A Quadratically Convergent Sequential Programming Method for Second-Order Cone Programs Capable of Warm Starts, 2022. arXiv:2207.03081.
- [19] N. Maratos. Exact penalty function algorithms for finite dimensional and control optimization problems. PhD thesis, Imperial College London (University of London), 1978.
- [20] D. K. Molzahn and I. A. Hiskens. A survey of relaxations and approximations of the power flow equations. Foundations and Trends in Electric Energy Systems, 4(1-2):1–221, 2019.
- [21] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 9.1., 2019.
- [22] M. Nayakakuppam, M. Overton, and S. Schemita. SDPpack user’s guide-version 0.9 Beta, 1997.
- [23] J. Nocedal and S. Wright. Numerical optimization. Springer, 2006.
- [24] T. Okuno, K. Yasuda, and S. Hayashi. Sl1QP based algorithm with trust region technique for solving nonlinear second-order cone programming problems. Interdisciplinary Information Sciences, 21(2):97–107, 2015.
- [25] F. A. Potra and S. J. Wright. Interior-point methods. Journal of Computational and Applied Mathematics, 124(1-2):281–302, 2000.
- [26] L. Schork. A parametric active set method for general quadratic programming. Master’s thesis, Heidelberg University, Germany, 2015.
- [27] P. Tseng. Second‐order cone programming relaxation of sensor network localization. SIAM Journal on Optimization, 18(1):156–185, 2007.
- [28] R. J. Vanderbei and H. Yurttan. Using LOQO to solve second-order cone programming problems. Technical report, Priceton University, 1998.
- [29] A. Wächter and L. T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.
- [30] V. Zhadan. The variant of primal simplex-type method for linear second-order cone programming. In Optimization and Applications: 12th International Conference, OPTIMA 2021, Petrovac, Montenegro, September 27–October 1, 2021, Proceedings, pages 64–75. Springer, 2021.
- [31] X. Zhang, Z. Liu, and S. Liu. A trust region SQP-filter method for nonlinear second-order cone programming. Computers & Mathematics with Applications, 63(12):1569–1576, 2012.
- [32] Zuse Institute Berlin. CBLIB - The Conic Benchmark Library. https://cblib.zib.de/.
Appendix A Generation of random instances
The test instances used for the numerical experiments have the form
s.t. | |||
where is the partition of the objective vector , and is the partition of the column vectors of constraint matrix . The subvector includes all optimization variables that are not in any of the cones.
In contrast to the original SOCP (1), this formulation includes linear equality constraints, but it is straightforward to generalize Algorithm 3 and its convergence proofs for this setting. The formulation above also includes large upper bounds on all variables so that the set defined in (9) is bounded. However, the upper bounds were chosen so large that they are not active at the optimal solution.
Algorithm 4 describes how the data for these instances are generated. The algorithm generates an optimal primal-dual solution in a way so that, at the solution, cones are extremal-active, for cones the optimal solution lies in the interior of the cone, and for the optimal solution is at the boundary but not the extreme point of the cone. In our experiments, there is an equal number of of each type. The number of variables that are active at the lower bound, , is chosen in a way so that the optimal solution is non-degenerate. The linear constraints reduce the number of degrees of freedom by , each extremal active cone by , and each otherwise active cone by . Step 16 calculates the number of constraints that are active at zero so that the total number of degrees of freedom fixed by constraints equals . Lastly, in Step 24, we use pcond and dcond functions provided in SDPPack [22] to double-check if a generated instance is also numerically non-degenerate. These functions return the primal and dual conditions numbers, and , respectively, and we discard an instance if either number is above a threshold. Only about 1% of instances created were excluded in this manner.