
Iteration Complexity of an Infeasible Interior Point Method for Second-order Cone Programming and its Warmstarting

Yushu Chen and Guangwen Yang
Department of Computer Science and Technology
Tsinghua University
Beijing, China, 100084
Lu Wang, Qingzhong Gan, and Haipeng Chen
Shanghai Aerospace Control Technology Institute
Shanghai, China, 201108
Yushu Chen: also at National Supercomputing Center in Wuxi, Jiangsu, China. chenyushu@mail.tsinghua.edu.cn
Guangwen Yang (corresponding author): also at National Supercomputing Center in Wuxi, Jiangsu, China. ygw@mail.tsinghua.edu.cn
Abstract

This paper studies the worst-case iteration complexity of an infeasible interior point method (IPM) for second-order cone programming (SOCP), which is more convenient for warmstarting than feasible IPMs. The method studied is based on the homogeneous and self-dual model and the Monteiro-Zhang family of search directions. Its worst-case iteration complexity is $O\left(k^{1/2}\log\left(\epsilon^{-1}\right)\right)$, to reduce the primal residual, dual residual, and complementarity gap by a factor of $\epsilon$, where $k$ is the number of cone constraints. This matches the best known result for feasible IPMs. The condition under which warmstarting improves the complexity bound is also studied.

1 Introduction

Warm starting of IPMs is widely perceived to be difficult [Potra and Wright, 2000]. When the initial value is close to the boundary and not well-centered, IPMs usually converge very slowly. Since infeasible IPMs can be initialized trivially inside the cone constraints without satisfying the equality constraints, they are more convenient for warm starting than feasible IPMs [Skajaa et al., 2013]. In order to study the convergence of warm starting schemes for SOCP, a better understanding of the iteration complexity of infeasible IPMs is required. This paper studies the iteration complexity of an infeasible interior point method for SOCP, and presents the condition under which warmstarting improves the complexity.

Most existing work on the worst-case iteration complexity of IPMs for SOCP focuses on feasible IPMs, which require the iterates to satisfy the equality constraints. Among these works, Nesterov and Todd [1997, 1998] present the Nesterov-Todd (NT) search direction and show that the short-step path-following algorithm [Kojima et al., 1989] using this direction has $O\left(k^{1/2}\log\left(\epsilon^{-1}\right)\right)$ iteration complexity, where $k$ is the number of cones. Tsuchiya [1999] further studies the convergence of path-following algorithms for SOCP and shows that: (1) using the HRVW/KSH/M directions [Helmberg et al., 1996, Kojima et al., 1997, Monteiro, 1997], the iteration complexities of the short-step, semi-long-step, and long-step methods are $O\left(k^{1/2}\log\left(\epsilon^{-1}\right)\right)$, $O\left(k\log\left(\epsilon^{-1}\right)\right)$, and $O\left(k^{3/2}\log\left(\epsilon^{-1}\right)\right)$, respectively; (2) using the NT directions, the iteration complexities of these methods are $O\left(k^{1/2}\log\left(\epsilon^{-1}\right)\right)$, $O\left(k\log\left(\epsilon^{-1}\right)\right)$, and $O\left(k\log\left(\epsilon^{-1}\right)\right)$, respectively. Monteiro and Tsuchiya [2000] extend the $O\left(k^{1/2}\log\left(\epsilon^{-1}\right)\right)$ complexity bound to the short-step method and the Mizuno-Todd-Ye predictor-corrector method using the Monteiro-Zhang (MZ) family of directions [Monteiro, 1997, Monteiro and Zhang, 1998, Zhang, 1998], which includes the HRVW/KSH/M and NT directions as special cases. Mohammad-Nezhad and Terlaky [2019] show that a Newton method converges quadratically when initialized with an IPM solution with a sufficiently small complementarity gap.

Compared with feasible IPMs, the iteration complexity of infeasible IPMs for SOCP is not thoroughly investigated. Rangarajan [2006] studies the convergence of infeasible IPMs for conic programming, which includes SOCP as a special case, and shows that using the $\mathcal{N}_{-\infty}$ central path neighborhood, the method using the NT directions takes $O\left(k^{2}\log\left(\epsilon^{-1}\right)\right)$ iterations, and the method using the XS or SX directions [Zhang, 1998] takes $O\left(k^{5/2}\log\left(\epsilon^{-1}\right)\right)$ iterations.

In this paper, we show that an infeasible IPM for SOCP has $O\left(k^{1/2}\log\left(\epsilon^{-1}\right)\right)$ iteration complexity, which is the same as the best known result for feasible IPMs. We also present the condition under which warmstarting improves the bound. The method studied is an extension of the well-known short-step path-following algorithm for linear programming, carried over to the context of SOCP. The modifications are as follows: firstly, the iterates are not required to be feasible with respect to the equality constraints, but only to lie in the interior of the cone constraints, so that the method is convenient for warmstarting; secondly, a homogeneous and self-dual (HSD) model [Nemirovskii and Nesterov, 1993, Peng et al., 2003] is applied, so that an infeasibility certificate can be obtained when either the primal or the dual problem is infeasible; thirdly, the MZ family of search directions is employed for acceleration. The warmstarting scheme analyzed is presented in a companion paper [Chen et al., 2022], which uses initial values modified from inexact solutions of previous problems to save computation.

The paper is organized as follows. Section 2 introduces the SOCP problem and the IPM for solving it. Section 3 studies the iteration complexity for the IPM to reduce the primal residual, dual residual, and complementarity gap. Section 4 studies the conditions under which warm starting improves the worst-case complexity compared with cold starting.

1.1 Notation and terminology

The following notations are used throughout the paper. $\mathcal{R}_{+}$ denotes the set of positive real numbers, $\mathcal{R}^{n}$ denotes the $n$-dimensional Euclidean space, and $\mathcal{R}^{m\times n}$ denotes the set of all $m\times n$ matrices with real entries. $\mathbf{I}_{n}$ denotes the identity matrix of order $n$. $\mathbf{G}\succ 0$ means $\mathbf{G}$ is symmetric and positive definite, and $\mathbf{G}\succeq 0$ means $\mathbf{G}$ is symmetric and positive semidefinite. For a square matrix $\mathbf{M}$ with all real eigenvalues, its largest and smallest eigenvalues are denoted by $\lambda_{\max}(\mathbf{M})$ and $\lambda_{\min}(\mathbf{M})$. For a set $\mathcal{K}$, $\text{int}(\mathcal{K})$ denotes its interior. $\text{diag}(\cdots)$ denotes a diagonal matrix with a given vector as its diagonal, and $\text{blkdiag}(\cdots)$ denotes a block diagonal matrix with given matrix blocks. $\left\|\cdot\right\|$ without specification is used instead of $\left\|\cdot\right\|_{2}$ for simplicity. With a slight abuse of notation, when concatenating column vectors $\mathbf{x}$ and $\mathbf{y}$, we write $(\mathbf{x},\mathbf{y})$ instead of $(\mathbf{x}^{T},\mathbf{y}^{T})^{T}$ for simplicity.

2 The SOCP problem and the infeasible IPM

In this section, we introduce the second-order cone programming (SOCP) problem, some important concepts related to it, and the infeasible IPM for SOCP studied in this paper.

2.1 The SOCP problem

SOCP minimizes a linear function over the intersection of an affine set and the direct product of linear cones (LCs) and second-order cones (SOCs).

The linear cones $\mathcal{K}_{L}^{n}$ and second-order cones $\mathcal{K}_{S}^{n}$ are defined as

\mathcal{K}_{L}^{n}=\{\mathbf{v}\in\mathcal{R}^{n}:\mathbf{v}\geq 0\},\quad n\geq 1, \qquad (1)
\mathcal{K}_{S}^{n}=\{\mathbf{v}\in\mathcal{R}^{n}:\mathbf{v}_{1}\geq\left\|\mathbf{v}_{2:n}\right\|\},\quad n\geq 2.

The 1-dimensional SOC is defined as $\mathcal{K}_{S}^{1}=\mathcal{K}_{L}^{1}$.
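As a concrete illustration (our own sketch, not part of the original text), membership in the two cone types of Eq. (1) can be tested in a few lines of Python/NumPy:

```python
import numpy as np

def in_linear_cone(v):
    # K_L^n of Eq. (1): all components nonnegative
    return bool(np.all(v >= 0))

def in_second_order_cone(v):
    # K_S^n of Eq. (1): v_1 >= ||v_{2:n}||; for n = 1 this reduces to K_L^1
    if v.size == 1:
        return v[0] >= 0
    return v[0] >= np.linalg.norm(v[1:])

print(in_second_order_cone(np.array([2.0, 1.0, 1.0])))  # True:  2 >= sqrt(2)
print(in_second_order_cone(np.array([1.0, 1.0, 1.0])))  # False: 1 <  sqrt(2)
```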

Then, the standard form of SOCP is defined as

\min_{\mathbf{x}}\ \mathbf{c}^{T}\mathbf{x}, \qquad (2)
s.t.
\mathbf{A}\mathbf{x}=\mathbf{b},
\mathbf{x}\in\mathcal{K},
\mathcal{K}=\mathcal{K}_{L}^{l}\times\mathcal{K}_{S}^{n_{l+1}}\times\mathcal{K}_{S}^{n_{l+2}}\times\cdots\times\mathcal{K}_{S}^{n_{l+m}},

where $\mathbf{A}\in\mathcal{R}^{p\times n}$ and $\text{rank}(\mathbf{A})=p\leq n$. Without loss of generality, we assume that in the solution variable the linear cone is arranged first, followed by the SOCs. The number of cones is

k=l+m, \qquad (3)

where the $l$-dimensional LC $\mathcal{K}_{L}^{l}$ is viewed as $l$ 1-dimensional LCs, so that $n_{i}=1$ for $i\in\left\{1,\cdots,l\right\}$. The $i$-th cone is also denoted by $\mathcal{K}^{i}$ for simplicity, and correspondingly $\mathcal{K}=\mathcal{K}^{1}\times\cdots\times\mathcal{K}^{k}$. For a vector $\mathbf{v}\in\mathcal{K}$, we use $\mathbf{v}^{(i)}$ to denote the subvector in the cone $\mathcal{K}^{i}$. For $\mathbf{v}\in\mathcal{R}^{n}\setminus\mathcal{K}$, $\mathbf{v}^{(i)}$ also denotes the subvector corresponding to the position of $\mathcal{K}^{i}$.

The dual problem of SOCP Eq. (2) is

\max_{\mathbf{y}}\ \mathbf{b}^{T}\mathbf{y} \qquad (4)
s.t.
\mathbf{A}^{T}\mathbf{y}+\mathbf{s}=\mathbf{c},
\mathbf{s}\in\mathcal{K}.

2.2 Important concepts about SOCP

This subsection introduces some important concepts about SOCP for later use.

Firstly, we introduce the Euclidean Jordan algebra associated with the SOC (see e.g. [Faraut and Koranyi, 1994, Faybusovich, 1997a, b]), which is crucial in IPMs for SOCP. The algebra product for the cone $\mathcal{K}^{i}$ is defined as

\mathbf{x}^{(i)}\circ\mathbf{s}^{(i)}=\left(\left(\mathbf{x}^{(i)}\right)^{T}\mathbf{s}^{(i)},\ \mathbf{x}_{1}^{(i)}\mathbf{s}_{2:n_{i}}^{(i)}+\mathbf{s}_{1}^{(i)}\mathbf{x}_{2:n_{i}}^{(i)}\right),\quad\forall\mathbf{x}^{(i)},\mathbf{s}^{(i)}\in\mathcal{R}^{n_{i}}. \qquad (5)

The Euclidean Jordan algebra for $\mathcal{K}$ is defined as

\mathbf{x}\circ\mathbf{s}=\left(\mathbf{x}^{(1)}\circ\mathbf{s}^{(1)},\cdots,\mathbf{x}^{(k)}\circ\mathbf{s}^{(k)}\right),\quad\forall\mathbf{x},\mathbf{s}\in\mathcal{R}^{n}. \qquad (6)

From now on, we assume that the space $\mathcal{R}^{n}$ is endowed with the Euclidean Jordan algebra.

The unit element of the algebra is $\mathbf{e}=\left(\mathbf{e}^{(1)},\cdots,\mathbf{e}^{(k)}\right)$, where $\mathbf{e}^{(i)}=(1,0,\cdots,0)^{T}$, $i\in\{1,\cdots,k\}$, since $\mathbf{v}\circ\mathbf{e}=\mathbf{v}$. The algebra product can then be written as a multiplication of arrowhead matrices and the unit element:

\mathbf{x}\circ\mathbf{s}=\text{mat}(\mathbf{x})\,\text{mat}(\mathbf{s})\,\mathbf{e}, \qquad (7)

where the arrowhead matrix for $\mathbf{v}\in\mathcal{R}^{n}$ is defined as

\text{mat}(\mathbf{v})=\text{blkdiag}\left(\text{mat}\left(\mathbf{v}^{(1)}\right),\cdots,\text{mat}\left(\mathbf{v}^{(k)}\right)\right), \qquad (8)
\text{mat}\left(\mathbf{v}^{(i)}\right)\triangleq\begin{pmatrix}\mathbf{v}^{(i)}_{1}&\left(\mathbf{v}^{(i)}_{2:n_{i}}\right)^{T}\\ \mathbf{v}^{(i)}_{2:n_{i}}&\mathbf{v}^{(i)}_{1}\mathbf{I}_{n_{i}-1}\end{pmatrix},\quad i\in\{1,\cdots,k\}.

It is easy to verify that the extreme eigenvalues of the arrowhead matrix of a single-cone vector $\mathbf{v}$ satisfy

\lambda_{\min}(\text{mat}(\mathbf{v}))=\mathbf{v}_{1}-\|\mathbf{v}_{2:n}\|,\quad\lambda_{\max}(\text{mat}(\mathbf{v}))=\mathbf{v}_{1}+\|\mathbf{v}_{2:n}\|. \qquad (9)

Consequently, we define the notation

\lambda_{\min}(\mathbf{v})\triangleq\mathbf{v}_{1}-\|\mathbf{v}_{2:n}\|,\quad\lambda_{\max}(\mathbf{v})\triangleq\mathbf{v}_{1}+\|\mathbf{v}_{2:n}\|. \qquad (10)

For a vector $\mathbf{v}\in\mathcal{K}$, $\lambda_{i}^{1}(\mathbf{v})$ and $\lambda_{i}^{2}(\mathbf{v})$ are used as abbreviations of $\lambda_{\min}(\mathbf{v}^{(i)})$ and $\lambda_{\max}(\mathbf{v}^{(i)})$, where $\mathbf{v}^{(i)}$ denotes the part of $\mathbf{v}$ in the $i$-th cone.
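To make these definitions concrete, the following sketch (ours, not part of the original paper) implements the single-cone Jordan product, the arrowhead matrix, and the eigenvalue notation, and checks Eq. (7) and Eq. (9) numerically:

```python
import numpy as np

def jordan_product(x, s):
    # Single-cone Jordan product, Eq. (5): x o s = (x^T s, x_1 s_{2:n} + s_1 x_{2:n})
    return np.concatenate(([x @ s], x[0] * s[1:] + s[0] * x[1:]))

def arrowhead(v):
    # mat(v) for a single cone, Eq. (8)
    M = v[0] * np.eye(v.size)
    M[0, 1:] = v[1:]
    M[1:, 0] = v[1:]
    return M

def lam_min(v):  # Eq. (10)
    return v[0] - np.linalg.norm(v[1:])

def lam_max(v):  # Eq. (10)
    return v[0] + np.linalg.norm(v[1:])

rng = np.random.default_rng(0)
x, s = rng.normal(size=4), rng.normal(size=4)
e = np.array([1.0, 0.0, 0.0, 0.0])
assert np.allclose(jordan_product(x, s), arrowhead(x) @ arrowhead(s) @ e)         # Eq. (7)
eigs = np.linalg.eigvalsh(arrowhead(x))
assert np.isclose(eigs.min(), lam_min(x)) and np.isclose(eigs.max(), lam_max(x))  # Eq. (9)
```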

Secondly, we introduce the homogeneous and self-dual (HSD) model and the central path. The HSD model combines the primal and dual problems and enables them to be solved simultaneously. The model is given by

\min\ \tilde{\beta}\nu \qquad (11)
s.t.
\mathbf{A}\mathbf{x}-\mathbf{b}\tau-\tilde{\mathbf{r}}_{p}\nu=0,
-\mathbf{A}^{T}\mathbf{y}+\mathbf{c}\tau-\mathbf{s}-\tilde{\mathbf{r}}_{d}\nu=0,
\mathbf{b}^{T}\mathbf{y}-\mathbf{c}^{T}\mathbf{x}-\kappa-\tilde{r}_{g}\nu=0,
\tilde{\mathbf{r}}_{p}^{T}\mathbf{y}+\tilde{\mathbf{r}}_{d}^{T}\mathbf{x}+\tilde{r}_{g}\tau=-\tilde{\beta},
(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\in\mathcal{C},\quad\mathcal{C}=\mathcal{K}\times\mathcal{R}^{p}\times\mathcal{K}\times\mathcal{K}_{L}^{1}\times\mathcal{K}_{L}^{1},

where $\nu\in\mathcal{R}_{+}$ is a free variable, and $\tilde{\mathbf{r}}_{p}\in\mathcal{R}^{p}$, $\tilde{\mathbf{r}}_{d}\in\mathcal{R}^{n}$, $\tilde{r}_{g}\in\mathcal{R}$, $\tilde{\beta}\in\mathcal{R}$ are residuals defined by

\tilde{\mathbf{r}}_{p}\triangleq\left(\mathbf{A}\mathbf{x}_{0}-\mathbf{b}\tau_{0}\right)/\nu_{0}, \qquad (12)
\tilde{\mathbf{r}}_{d}\triangleq\left(-\mathbf{A}^{T}\mathbf{y}_{0}+\mathbf{c}\tau_{0}-\mathbf{s}_{0}\right)/\nu_{0},
\tilde{r}_{g}\triangleq\left(\mathbf{b}^{T}\mathbf{y}_{0}-\mathbf{c}^{T}\mathbf{x}_{0}-\kappa_{0}\right)/\nu_{0},
\tilde{\beta}\triangleq-\left(\tilde{\mathbf{r}}_{p}^{T}\mathbf{y}_{0}+\tilde{\mathbf{r}}_{d}^{T}\mathbf{x}_{0}+\tilde{r}_{g}\tau_{0}\right),

and $\nu_{0}$ is typically set to 1.

The solution of the HSD model fully characterizes the primal and dual problems, as summarized by the following theorem.

Theorem 1. Let $\left(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau\right)$ be an optimal solution of the HSD model Eq. (11). Then the following relations hold [Terlaky et al., 2002, Wang, 2003]:

a) $\mathbf{x}^{T}\mathbf{s}=0$ and $\kappa\tau=0$;

b) If $\tau>0$, then $\left(\mathbf{x},\mathbf{y},\mathbf{s}\right)/\tau$ is the optimal solution of the primal problem Eq. (2) and the dual problem Eq. (4);

c) If $\kappa>0$, then at least one of the inequalities $\mathbf{b}^{T}\mathbf{y}>0$ and $\mathbf{c}^{T}\mathbf{x}<0$ holds. If the first inequality holds, then the primal problem Eq. (2) is infeasible. If the second inequality holds, then the dual problem Eq. (4) is infeasible.

Proof. See Corollary 1.3.1 of [Wang, 2003].

The central path is defined by

\mathbf{A}\mathbf{x}-\mathbf{b}\tau-\tilde{\mathbf{r}}_{p}\nu=0, \qquad (13)
-\mathbf{A}^{T}\mathbf{y}+\mathbf{c}\tau-\mathbf{s}-\tilde{\mathbf{r}}_{d}\nu=0,
\mathbf{b}^{T}\mathbf{y}-\mathbf{c}^{T}\mathbf{x}-\kappa-\tilde{r}_{g}\nu=0,
\mathbf{x}\circ\mathbf{s}=\mu\mathbf{e},
\kappa\tau=\mu.

The central path approaches the optimal solution of the HSD model as $\mu\to 0$, which corresponds to the solution or an infeasibility certificate of the SOCP problem.

Thirdly, we introduce the scaling matrices, which can be applied to scale the search directions. Monteiro and Tsuchiya [2000] introduce a group of scaling automorphisms that map the cone $\mathcal{K}^{i}$ onto itself, namely the following group of matrices

\mathcal{G}^{i}\triangleq\left\{\theta^{i}\mathbf{G}^{i}:\theta^{i}>0,\ \mathbf{G}^{i}\in\mathcal{R}^{n_{i}\times n_{i}},\ (\mathbf{G}^{i})^{T}\mathbf{Q}^{i}\mathbf{G}^{i}=\mathbf{Q}^{i}\right\}, \qquad (14)

where $\theta^{i}$ is a positive scaling value, and

\mathbf{Q}^{i}=\begin{pmatrix}1&0\\ 0&-\mathbf{I}_{n_{i}-1}\end{pmatrix}. \qquad (15)

The set of scaling matrices for a vector in $\mathcal{K}$ is defined as

\mathcal{G}\triangleq\left\{\Theta\mathbf{G}:\mathbf{G}=\text{blkdiag}\left(\mathbf{G}^{1},\cdots,\mathbf{G}^{k}\right),\ \mathbf{G}^{i}\in\mathcal{R}^{n_{i}\times n_{i}},\ (\mathbf{G}^{i})^{T}\mathbf{Q}^{i}\mathbf{G}^{i}=\mathbf{Q}^{i}\right\}, \qquad (16)

where $i\in\left\{1,\cdots,k\right\}$ and

\Theta=\text{blkdiag}\left(\theta^{1}\mathbf{I}_{n_{1}},\cdots,\theta^{k}\mathbf{I}_{n_{k}}\right). \qquad (17)

The solution variables $\mathbf{x}$ and $\mathbf{s}$ are scaled by

\bar{\mathbf{x}}=(\Theta\mathbf{G})^{T}\mathbf{x},\quad\bar{\mathbf{s}}=(\Theta\mathbf{G})^{-1}\mathbf{s}. \qquad (18)

For convenience, we denote a scaling matrix by

\mathbf{D}=(\mathbf{\Theta}\mathbf{G})^{-1}. \qquad (19)

The following proposition gives several important properties of the scaling matrix.

Proposition 1 [Andersen et al., 2003]. Let $\mathbf{D}\in\mathcal{G}$. Then

(\mathbf{x}^{(i)})^{T}\mathbf{s}^{(i)}=(\bar{\mathbf{x}}^{(i)})^{T}\bar{\mathbf{s}}^{(i)}, \qquad (20)
(\theta^{i})^{2}(\mathbf{x}^{(i)})^{T}\mathbf{Q}^{i}\mathbf{x}^{(i)}=(\bar{\mathbf{x}}^{(i)})^{T}\mathbf{Q}^{i}\bar{\mathbf{x}}^{(i)},
(\theta^{i})^{-2}(\mathbf{s}^{(i)})^{T}\mathbf{Q}^{i}\mathbf{s}^{(i)}=(\bar{\mathbf{s}}^{(i)})^{T}\mathbf{Q}^{i}\bar{\mathbf{s}}^{(i)},
\mathbf{x}\in\mathcal{K}\Leftrightarrow\bar{\mathbf{x}}\in\mathcal{K},\text{ and }\mathbf{x}\in\text{int}(\mathcal{K})\Leftrightarrow\bar{\mathbf{x}}\in\text{int}(\mathcal{K}).

Proof. See Lemma 3 of [Andersen et al., 2003].
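As a quick numerical illustration of Proposition 1, the following sketch (ours) uses a hyperbolic rotation, chosen by us as a nontrivial automorphism satisfying $(\mathbf{G}^{i})^{T}\mathbf{Q}^{i}\mathbf{G}^{i}=\mathbf{Q}^{i}$:

```python
import numpy as np

def hyperbolic_rotation(n, t):
    # A nontrivial automorphism of K_S^n: G^T Q G = Q (rotation in the (1,2)-plane)
    G = np.eye(n)
    G[0, 0] = G[1, 1] = np.cosh(t)
    G[0, 1] = G[1, 0] = np.sinh(t)
    return G

n, theta = 4, 1.7
Q = np.diag([1.0] + [-1.0] * (n - 1))
G = hyperbolic_rotation(n, 0.3)
assert np.allclose(G.T @ Q @ G, Q)

x = np.array([3.0, 1.0, -0.5, 0.2])      # interior point of K_S^4
s = np.array([2.0, 0.3, 0.4, -1.0])
xb = (theta * G).T @ x                    # scaling of Eq. (18)
sb = np.linalg.inv(theta * G) @ s

assert np.isclose(x @ s, xb @ sb)                      # gap invariance, Eq. (20)
assert np.isclose(theta**2 * x @ Q @ x, xb @ Q @ xb)   # second relation of Eq. (20)
assert x[0] > np.linalg.norm(x[1:]) and xb[0] > np.linalg.norm(xb[1:])  # cone preserved
```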

For a vector $\mathbf{v}\in\text{int}(\mathcal{K})$, Tsuchiya [1999] introduces a special scaling matrix $\mathbf{T}_{v}$ defined as

\mathbf{T}_{v}\triangleq\text{blkdiag}\left(\mathbf{T}_{v^{(1)}},\cdots,\mathbf{T}_{v^{(k)}}\right), \qquad (21)

where

\mathbf{T}_{u}\triangleq\begin{pmatrix}\mathbf{u}_{1}&\mathbf{u}_{2:n}^{T}\\ \mathbf{u}_{2:n}&\beta_{u}\mathbf{I}+\dfrac{\mathbf{u}_{2:n}\mathbf{u}_{2:n}^{T}}{\beta_{u}+\mathbf{u}_{1}}\end{pmatrix},\quad\beta_{u}=\sqrt{\mathbf{u}_{1}^{2}-\|\mathbf{u}_{2:n}\|^{2}}, \qquad (22)

for $\mathbf{u}\in\mathcal{R}^{n}$.
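A minimal sketch of $\mathbf{T}_{u}$ for a single cone (ours; the checks anticipate Proposition 2 below, plus the identity $\mathbf{T}_{v}^{T}\mathbf{Q}\mathbf{T}_{v}=(\mathbf{v}^{T}\mathbf{Q}\mathbf{v})\mathbf{Q}$, which we use only as a numerical sanity check that $\mathbf{T}_{v}$ is indeed a scaling matrix):

```python
import numpy as np

def T(u):
    # Scaling matrix T_u of Eq. (22); requires u in int(K_S^n), i.e. u_1 > ||u_{2:n}||
    beta = np.sqrt(u[0]**2 - np.linalg.norm(u[1:])**2)
    n = u.size
    M = np.empty((n, n))
    M[0, 0] = u[0]
    M[0, 1:] = M[1:, 0] = u[1:]
    M[1:, 1:] = beta * np.eye(n - 1) + np.outer(u[1:], u[1:]) / (beta + u[0])
    return M

v = np.array([2.0, 0.5, -1.0, 0.3])
Tv = T(v)
e = np.array([1.0, 0.0, 0.0, 0.0])
Q = np.diag([1.0, -1.0, -1.0, -1.0])
assert np.allclose(Tv @ e, v)                       # T_v maps e to v (Proposition 2)
assert np.all(np.linalg.eigvalsh(Tv) > 0)           # T_v is symmetric positive definite
assert np.allclose(Tv.T @ Q @ Tv, (v @ Q @ v) * Q)  # T_v is a scaling matrix in G
```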

Proposition 2 [Tsuchiya, 1999]. For a vector $\mathbf{v}\in\text{int}(\mathcal{K})$, $\mathbf{T}_{v}$ is the unique symmetric scaling matrix which maps $\mathbf{e}$ to $\mathbf{v}$. Moreover, $\mathbf{T}_{v}\succ 0$.

Proof. See Proposition 2.1 of [Tsuchiya, 1999].

Finally, we are ready to define the scaling-invariant neighborhoods of the central path. For $(\mathbf{x},\mathbf{s},\kappa,\tau)\in\text{int}(\mathcal{K})\times\text{int}(\mathcal{K})\times\mathcal{R}_{+}\times\mathcal{R}_{+}$, define the central path coefficient

\mu(\mathbf{x},\mathbf{s},\kappa,\tau)=(\mathbf{x}^{T}\mathbf{s}+\kappa\tau)/(k+1). \qquad (23)

With a slight abuse of notation, we also write $\mu$ instead of $\mu(\mathbf{x},\mathbf{s},\kappa,\tau)$ for simplicity. The central path distances are defined as

d_{2}(\mathbf{x},\mathbf{s},\kappa,\tau)=\sqrt{2}\left\|\mathbf{w}_{xs\kappa\tau}-\mu\mathbf{e}\right\|, \qquad (24)
d_{\infty}(\mathbf{x},\mathbf{s},\kappa,\tau)=\max\left(d_{\infty}(\mathbf{x},\mathbf{s}),\left|\kappa\tau-\mu\right|\right),

where

\mathbf{w}_{xs}=\mathbf{T}_{x}\mathbf{s},\quad\mathbf{w}_{xs\kappa\tau}=\left(\mathbf{w}_{xs},\kappa\tau\right), \qquad (25)
d_{\infty}(\mathbf{x},\mathbf{s})=\max_{i=1,\cdots,k}\left(\left|(\mathbf{w}_{xs})_{1}^{(i)}+\|(\mathbf{w}_{xs})_{2:n_{i}}^{(i)}\|-\mu\right|,\left|(\mathbf{w}_{xs})_{1}^{(i)}-\|(\mathbf{w}_{xs})_{2:n_{i}}^{(i)}\|-\mu\right|\right).

The neighborhoods of the central path determined by the above distances are

\mathcal{N}_{2}(\gamma)=\left\{(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\in\text{int}(\mathcal{C}):d_{2}(\mathbf{x},\mathbf{s},\kappa,\tau)\leq\gamma\mu(\mathbf{x},\mathbf{s},\kappa,\tau)\right\}, \qquad (26)
\mathcal{N}_{\infty}(\gamma)=\left\{(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\in\text{int}(\mathcal{C}):d_{\infty}(\mathbf{x},\mathbf{s},\kappa,\tau)\leq\gamma\mu(\mathbf{x},\mathbf{s},\kappa,\tau)\right\},

where the parameter $\gamma\in(0,1)$. For $(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\in\text{int}(\mathcal{C})$, it is easy to show that $d_{\infty}(\mathbf{x},\mathbf{s},\kappa,\tau)\leq d_{2}(\mathbf{x},\mathbf{s},\kappa,\tau)$, so that $\mathcal{N}_{2}(\gamma)\subseteq\mathcal{N}_{\infty}(\gamma)$.
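The following sketch (ours) evaluates $\mu$, $d_{2}$, and $d_{\infty}$ for cones given as lists of subvectors, reusing the single-cone T() helper from the previous sketch; on a point of the central path both distances vanish:

```python
import numpy as np

def central_path_distances(x_blocks, s_blocks, kappa, tau):
    # mu, d_2, d_inf of Eq. (23)-(25); T() is the scaling-matrix helper defined above
    k = len(x_blocks)
    mu = (sum(xi @ si for xi, si in zip(x_blocks, s_blocks)) + kappa * tau) / (k + 1)
    w = [T(xi) @ si for xi, si in zip(x_blocks, s_blocks)]       # w_xs, cone by cone
    sq = sum((wi[0] - mu)**2 + wi[1:] @ wi[1:] for wi in w) + (kappa * tau - mu)**2
    d2 = np.sqrt(2.0 * sq)                                       # Eq. (24), d_2
    dinf = max(max(abs(wi[0] + np.linalg.norm(wi[1:]) - mu),
                   abs(wi[0] - np.linalg.norm(wi[1:]) - mu)) for wi in w)
    dinf = max(dinf, abs(kappa * tau - mu))                      # Eq. (24), d_inf
    return mu, d2, dinf

# x1 o s1 = 3e and kappa*tau = 3, so this point lies on the central path with mu = 3:
x1, s1 = np.array([2.0, 1.0, 0.0]), np.array([2.0, -1.0, 0.0])
print(central_path_distances([x1], [s1], kappa=np.sqrt(3.0), tau=np.sqrt(3.0)))
# -> (3.0, ~0.0, ~0.0)
```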

The following proposition shows the invariance property of $\mathbf{w}_{xs}$.

Proposition 3 [Tsuchiya, 1999]. Suppose that $(\mathbf{x},\mathbf{s})\in\text{int}(\mathcal{K})\times\text{int}(\mathcal{K})$. Let $\mathbf{w}\triangleq\mathbf{w}_{xs}$ and $\bar{\mathbf{w}}\triangleq\mathbf{w}_{\bar{x}\bar{s}}$, and let $i\in\{1,\cdots,k\}$. Then:

a) $\bar{\mathbf{w}}_{1}^{(i)}=\mathbf{w}_{1}^{(i)}$ and $\|\bar{\mathbf{w}}_{2:n_{i}}^{(i)}\|=\|\mathbf{w}_{2:n_{i}}^{(i)}\|$;

b) $\lambda_{\min}(\bar{\mathbf{w}}^{(i)})=\lambda_{\min}(\mathbf{w}^{(i)})$ and $\lambda_{\max}(\bar{\mathbf{w}}^{(i)})=\lambda_{\max}(\mathbf{w}^{(i)})$;

c) $d_{2}(\bar{\mathbf{x}},\bar{\mathbf{s}})=d_{2}(\mathbf{x},\mathbf{s})$ and $d_{\infty}(\bar{\mathbf{x}},\bar{\mathbf{s}})=d_{\infty}(\mathbf{x},\mathbf{s})$.

Proof. See Proposition 2.4 of Tsuchiya [1999].

Using Proposition 3, it is easy to verify that for a point $(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)$ on the central path defined by Eq. (13), the distances satisfy $d_{2}(\mathbf{x},\mathbf{s},\kappa,\tau)=d_{\infty}(\mathbf{x},\mathbf{s},\kappa,\tau)=0$.

2.3 The infeasible IPM for SOCP

In this subsection, we briefly introduce the infeasible IPM for SOCP studied in this paper.

The SOCP problem can be solved by loosely tracking the central path towards the optimal solution of the HSD model. In this approach, the updating direction in each iteration is computed by solving

\mathbf{A}\Delta\mathbf{x}-\mathbf{b}\Delta\tau=-\left(1-\nu\right)\left(\mathbf{A}\mathbf{x}-\mathbf{b}\tau\right), \qquad (27)
-\mathbf{A}^{T}\Delta\mathbf{y}+\mathbf{c}\Delta\tau-\Delta\mathbf{s}=-\left(1-\nu\right)\left(-\mathbf{A}^{T}\mathbf{y}+\mathbf{c}\tau-\mathbf{s}\right),
\mathbf{b}^{T}\Delta\mathbf{y}-\mathbf{c}^{T}\Delta\mathbf{x}-\Delta\kappa=-\left(1-\nu\right)\left(\mathbf{b}^{T}\mathbf{y}-\mathbf{c}^{T}\mathbf{x}-\kappa\right),
\mathbf{X}\Delta\mathbf{s}+\mathbf{S}\Delta\mathbf{x}=\nu\mu\mathbf{e}-\mathbf{X}\mathbf{S}\mathbf{e},
\kappa\Delta\tau+\tau\Delta\kappa=\nu\mu-\kappa\tau,

where $\mathbf{X}=\text{mat}(\mathbf{x})$, $\mathbf{S}=\text{mat}(\mathbf{s})$, and $\mu=\mu(\mathbf{x},\mathbf{s},\kappa,\tau)$ as defined in Eq. (23), which is proportional to the complementarity gap $\mathbf{x}^{T}\mathbf{s}+\kappa\tau$.

Instead of using the search direction obtained by solving Eq. (27) directly, the MZ family of directions is applied to speed up convergence. The directions are calculated by solving a scaled linear system:

\mathbf{A}\Delta\mathbf{x}-\mathbf{b}\Delta\tau=-\left(1-\nu\right)\left(\mathbf{A}\mathbf{x}-\mathbf{b}\tau\right), \qquad (28)
-\mathbf{A}^{T}\Delta\mathbf{y}+\mathbf{c}\Delta\tau-\Delta\mathbf{s}=-\left(1-\nu\right)\left(-\mathbf{A}^{T}\mathbf{y}+\mathbf{c}\tau-\mathbf{s}\right),
\mathbf{b}^{T}\Delta\mathbf{y}-\mathbf{c}^{T}\Delta\mathbf{x}-\Delta\kappa=-\left(1-\nu\right)\left(\mathbf{b}^{T}\mathbf{y}-\mathbf{c}^{T}\mathbf{x}-\kappa\right),
\bar{\mathbf{X}}\mathbf{D}\Delta\mathbf{s}+\bar{\mathbf{S}}\mathbf{D}^{-T}\Delta\mathbf{x}=\nu\mu\mathbf{e}-\bar{\mathbf{X}}\bar{\mathbf{S}}\mathbf{e},
\kappa\Delta\tau+\tau\Delta\kappa=\nu\mu-\kappa\tau,

where

\bar{\mathbf{X}}=\text{mat}(\bar{\mathbf{x}}),\quad\bar{\mathbf{S}}=\text{mat}(\bar{\mathbf{s}}),\quad\bar{\mathbf{x}}=\mathbf{D}^{-T}\mathbf{x},\quad\bar{\mathbf{s}}=\mathbf{D}\mathbf{s},\quad\mathbf{D}\in\mathcal{G}. \qquad (29)

The scaled linear system Eq. (28) is obtained by using the scaled variables $\bar{\mathbf{x}},\bar{\mathbf{s}}$ instead of $\mathbf{x}$ and $\mathbf{s}$ in the HSD model Eq. (11).

The MZ family of directions includes the HRVW/KSH/M directions and NT directions as special cases. For example, in the NT scaling, the scaling matrix $\mathbf{D}$ is symmetric and satisfies $\mathbf{x}=\mathbf{D}^{2}\mathbf{s}$, and multiplication with $\mathbf{D}$ or $\mathbf{D}^{2}$ can be carried out in $O(n)$ operations.

Then, the infeasible IPM for SOCP is presented as Algorithm 1.

Algorithm 1 An Infeasible IPM for SOCP

Input: constant coefficients $\gamma,\delta\in(0,1)$; a small positive threshold $\epsilon\in(0,1)$; the SOCP problem parameters $\mathbf{A},\mathbf{b},\mathbf{c},\mathcal{K}$; an initial value $(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})\in\mathcal{N}_{2}(\gamma)$.
Output: the solution $(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)$ and the final status.

1:  Set $\nu=1-\delta/\sqrt{2(k+1)}$;
2:  Set $(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)=(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})$;
3:  while the stop criteria are not satisfied do
4:     Set $\mu=(\mathbf{x}^{T}\mathbf{s}+\kappa\tau)/(k+1)$;
5:     Compute $(\Delta\mathbf{x},\Delta\mathbf{y},\Delta\mathbf{s},\Delta\kappa,\Delta\tau)$ by solving the linear system Eq. (28);
6:     Set $(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)=(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)+(\Delta\mathbf{x},\Delta\mathbf{y},\Delta\mathbf{s},\Delta\kappa,\Delta\tau)$;
7:  end while
8:  Select the final status according to the stop criteria.

In Algorithm 1, the stop criteria are as follows:

\left\|\mathbf{r}_{p}(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\right\|\leq\epsilon\left\|\mathbf{r}_{p}(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})\right\|, \qquad (30)
\left\|\mathbf{r}_{d}(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\right\|\leq\epsilon\left\|\mathbf{r}_{d}(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})\right\|,
\mu\left(\mathbf{x},\mathbf{s},\kappa,\tau\right)\leq\epsilon\,\mu\left(\mathbf{x}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0}\right),

where the primal residual $\mathbf{r}_{p}$ and the dual residual $\mathbf{r}_{d}$ are defined as

\mathbf{r}_{p}(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)=\mathbf{A}\mathbf{x}-\tau\mathbf{b},\quad\mathbf{r}_{d}(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)=\mathbf{A}^{T}\mathbf{y}+\mathbf{s}-\tau\mathbf{c}. \qquad (31)

For convenience, we also define

r_{g}(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)=\mathbf{b}^{T}\mathbf{y}-\mathbf{c}^{T}\mathbf{x}-\kappa. \qquad (32)

The criteria indicate that the primal residual, dual residual, and complementarity gap are reduced by a certain ratio $\epsilon$. When the criteria are satisfied, the final status is decided using an approximate version of Theorem 1: if $\tau\geq\epsilon\max(1,\kappa)$, then the problem is solved; if $\tau<\epsilon\max(1,\kappa)$, the primal problem is infeasible if $\mathbf{b}^{T}\mathbf{y}>0$, and the dual problem is infeasible if $\mathbf{c}^{T}\mathbf{x}<0$. Some important implementation issues, such as the maximum iteration number limit and rounding errors, are not considered in the theoretical analysis of this paper.

Algorithm 1 obtains a solution of the HSD model. If the problem is solved, the solution of the SOCP problem is $(\mathbf{x}/\tau,\mathbf{y}/\tau,\mathbf{s}/\tau)$. In order to guarantee the convergence of Algorithm 1, the parameters $\gamma,\delta$ should be chosen to satisfy certain conditions, which will be presented in Theorem 4 in the next section.
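For concreteness, a minimal Python rendering of Algorithm 1 and its termination logic is sketched below (our own, with `solve_scaled_newton` as an assumed helper that returns the solution of Eq. (28); it is not part of the paper):

```python
import numpy as np

def infeasible_ipm(A, b, c, x0, y0, s0, kappa0, tau0, k, delta, eps, solve_scaled_newton):
    # Sketch of Algorithm 1; `solve_scaled_newton` is an assumed (hypothetical) helper.
    nu = 1.0 - delta / np.sqrt(2.0 * (k + 1))                              # step 1
    x, y, s, kappa, tau = x0, y0, s0, kappa0, tau0                         # step 2
    res = lambda x, y, s, tau: (A @ x - tau * b, A.T @ y + s - tau * c)    # Eq. (31)
    mu = lambda x, s, kappa, tau: (x @ s + kappa * tau) / (k + 1)          # Eq. (23)
    rp0, rd0 = res(x0, y0, s0, tau0)
    mu0 = mu(x0, s0, kappa0, tau0)
    while True:   # iteration limits are not modeled, matching the analysis in the text
        rp, rd = res(x, y, s, tau)
        # stop criteria of Eq. (30): all three measures reduced by the factor eps
        if (np.linalg.norm(rp) <= eps * np.linalg.norm(rp0) and
                np.linalg.norm(rd) <= eps * np.linalg.norm(rd0) and
                mu(x, s, kappa, tau) <= eps * mu0):
            break
        dx, dy, ds, dk, dt = solve_scaled_newton(x, y, s, kappa, tau, nu)  # steps 4-5
        x, y, s, kappa, tau = x + dx, y + dy, s + ds, kappa + dk, tau + dt # step 6
    # final status: an approximate version of Theorem 1
    if tau >= eps * max(1.0, kappa):
        return "solved", (x / tau, y / tau, s / tau)
    if b @ y > 0:
        return "primal infeasible", None
    if c @ x < 0:
        return "dual infeasible", None
    return "undetermined", None   # neither certificate detected at this tolerance
```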

3 Analysis of the iteration complexity

In order to prove polynomial convergence in the worst case, it is crucial to show two points: (1) the update keeps the solution in a central path neighborhood, which is a subset of the interior of the cone constraints; (2) when the solution is bounded in the central path neighborhood, the primal residual, dual residual, and complementarity gap are reduced by a certain ratio in each iteration.

This section is organized as follows: firstly, an equivalent form of the HSD model is proposed to simplify the subsequent analysis; secondly, the well-definedness of the search directions is proved; thirdly, the improvement of the primal residual, dual residual, and complementarity gap is investigated; fourthly, several important results are presented to bound the update within the central path neighborhood for the unscaled search directions; fifthly, polynomial convergence for the unscaled search directions is proved; finally, the proof is extended to the MZ family of search directions.

3.1 An equivalent form of the HSD model

In the HSD model, $\kappa$ and $\tau$ are variables in $\mathcal{K}_{L}^{1}$, but they act differently from the other variables in LCs. Although we find it theoretically not difficult to treat them as additional terms, the analysis in that way would be long and tedious. In order to simplify the analysis, we propose an equivalent form of the HSD model in this subsection.

Consider the following change of variables

\hat{\mathbf{A}}=[\mathbf{A},-\mathbf{b}],\quad\hat{\mathbf{C}}=\begin{pmatrix}0&-\mathbf{c}\\ \mathbf{c}^{T}&0\end{pmatrix},\quad\hat{\mathbf{x}}=\begin{pmatrix}\mathbf{x}\\ \tau\end{pmatrix},\quad\hat{\mathbf{s}}=\begin{pmatrix}\mathbf{s}\\ \kappa\end{pmatrix},\quad\hat{\mathbf{e}}=\begin{pmatrix}\mathbf{e}\\ 1\end{pmatrix}, \qquad (33)

where $\hat{\mathbf{x}},\hat{\mathbf{s}}\in\mathcal{K}\times\mathcal{K}_{L}^{1}$ and $\mathbf{y}\in\mathcal{R}^{p}$.
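The embedding is mechanical; a small sketch (ours) builds $\hat{\mathbf{A}}$ and $\hat{\mathbf{C}}$ and checks the skew-symmetry required later in Eq. (39):

```python
import numpy as np

def hsd_embedding(A, b, c):
    # Change of variables of Eq. (33): A_hat = [A, -b], C_hat = [[0, -c], [c^T, 0]]
    p, n = A.shape
    A_hat = np.hstack([A, -b.reshape(-1, 1)])
    C_hat = np.zeros((n + 1, n + 1))
    C_hat[:n, n] = -c
    C_hat[n, :n] = c
    return A_hat, C_hat

A = np.arange(6.0).reshape(2, 3)
b, c = np.ones(2), np.array([1.0, -2.0, 0.5])
A_hat, C_hat = hsd_embedding(A, b, c)
assert np.allclose(C_hat, -C_hat.T)   # C = -C^T, as required in Eq. (39)
```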

It is easy to verify that the linear system Eq. (27) can be written as

\hat{\mathbf{A}}\Delta\hat{\mathbf{x}}=-\left(1-\nu\right)\hat{\mathbf{A}}\hat{\mathbf{x}}, \qquad (34)
\hat{\mathbf{A}}^{T}\Delta\mathbf{y}+\hat{\mathbf{C}}\Delta\hat{\mathbf{x}}+\Delta\hat{\mathbf{s}}=-(1-\nu)\left(\hat{\mathbf{A}}^{T}\mathbf{y}+\hat{\mathbf{C}}\hat{\mathbf{x}}+\hat{\mathbf{s}}\right),
\hat{\mathbf{X}}\Delta\hat{\mathbf{s}}+\hat{\mathbf{S}}\Delta\hat{\mathbf{x}}=\nu\mu\hat{\mathbf{e}}-\hat{\mathbf{X}}\hat{\mathbf{S}}\hat{\mathbf{e}},

where $\hat{\mathbf{X}}=\text{mat}(\hat{\mathbf{x}})$, $\hat{\mathbf{S}}=\text{mat}(\hat{\mathbf{s}})$. It is worth noting that the second and third equations in Eq. (27) are combined into the second equation in Eq. (34).

The scaled linear system Eq. (28) is written as

\hat{\mathbf{A}}\Delta\hat{\mathbf{x}}=-\left(1-\nu\right)\hat{\mathbf{A}}\hat{\mathbf{x}}, \qquad (35)
\hat{\mathbf{A}}^{T}\Delta\mathbf{y}+\hat{\mathbf{C}}\Delta\hat{\mathbf{x}}+\Delta\hat{\mathbf{s}}=-(1-\nu)\left(\hat{\mathbf{A}}^{T}\mathbf{y}+\hat{\mathbf{C}}\hat{\mathbf{x}}+\hat{\mathbf{s}}\right),
\text{mat}(\hat{\mathbf{D}}^{-T}\hat{\mathbf{x}})\hat{\mathbf{D}}\Delta\hat{\mathbf{s}}+\text{mat}(\hat{\mathbf{D}}\hat{\mathbf{s}})\hat{\mathbf{D}}^{-T}\Delta\hat{\mathbf{x}}=\nu\mu\hat{\mathbf{e}}-(\hat{\mathbf{D}}^{-T}\hat{\mathbf{x}})\circ(\hat{\mathbf{D}}\hat{\mathbf{s}}),

where the scaling matrix

\hat{\mathbf{D}}=\begin{pmatrix}\mathbf{D}&0\\ 0&1\end{pmatrix}. \qquad (36)

The central path coefficient is also simplified as

\mu=(\mathbf{x}^{T}\mathbf{s}+\kappa\tau)/(k+1)=\hat{\mathbf{x}}^{T}\hat{\mathbf{s}}/(k+1). \qquad (37)

Because the two linear systems are identical, solving the HSD model by Eq. (28) is identical to solving it by Eq. (35). Without loss of generality, the last LC $\mathcal{K}_{L}^{1}$ can be viewed as an SOC $\mathcal{K}_{S}^{1}$. Then, with a slight abuse of notation, Eq. (35) is a special case of the following generalized linear system

\mathbf{A}\Delta\mathbf{x}=-\left(1-\nu\right)\mathbf{A}\mathbf{x}, \qquad (38)
\mathbf{A}^{T}\Delta\mathbf{y}+\mathbf{C}\Delta\mathbf{x}+\Delta\mathbf{s}=-(1-\nu)\left(\mathbf{A}^{T}\mathbf{y}+\mathbf{C}\mathbf{x}+\mathbf{s}\right),
\bar{\mathbf{X}}\mathbf{D}\Delta\mathbf{s}+\bar{\mathbf{S}}\mathbf{D}^{-T}\Delta\mathbf{x}=\nu\mu\mathbf{e}-\bar{\mathbf{x}}\circ\bar{\mathbf{s}},

where the variables are redefined for simplicity, as

\mathbf{A}\in\mathcal{R}^{p\times n},\quad\text{rank}(\mathbf{A})=p\leq n,\quad\mathbf{C}\in\mathcal{R}^{n\times n},\quad\mathbf{C}=-\mathbf{C}^{T}, \qquad (39)
\mathcal{K}=\mathcal{K}_{L}^{l}\times\mathcal{K}_{S}^{n_{l+1}}\times\mathcal{K}_{S}^{n_{l+2}}\times\cdots\times\mathcal{K}_{S}^{n_{l+m}},\quad k=l+m,
\mathbf{x}\in\mathcal{K},\quad\mathbf{y}\in\mathcal{R}^{p},\quad\mathbf{s}\in\mathcal{K},\quad\mathcal{C}=\mathcal{K}\times\mathcal{R}^{p}\times\mathcal{K},
\mathbf{D}\in\mathcal{G},\quad\bar{\mathbf{x}}=\mathbf{D}^{-T}\mathbf{x},\quad\bar{\mathbf{s}}=\mathbf{D}\mathbf{s},\quad\bar{\mathbf{X}}=\text{mat}(\bar{\mathbf{x}}),\quad\bar{\mathbf{S}}=\text{mat}(\bar{\mathbf{s}}).

The central path coefficient for the generalized linear system Eq. (38) is defined as

\mu=\mu(\mathbf{x},\mathbf{s})=\mathbf{x}^{T}\mathbf{s}/k, \qquad (40)

where $\mathbf{x}^{T}\mathbf{s}$ is the complementarity gap. The residuals are defined as

\hat{\mathbf{r}}_{p}=\mathbf{A}\mathbf{x},\quad\hat{\mathbf{r}}_{d}=\mathbf{A}^{T}\mathbf{y}+\mathbf{C}\mathbf{x}+\mathbf{s}. \qquad (41)

For the special case Eq. (35), the generalized linear system Eq. (38) has one additional dimension in both $\mathbf{x}$ and $\mathbf{s}$ compared with the original form Eq. (28), substituting for $\kappa$ and $\tau$; correspondingly, its cone constraint $\mathcal{K}$ is one dimension higher. The scaling matrix $\mathbf{D}\in\mathcal{G}$ generates the MZ family of search directions. The residual $\hat{\mathbf{r}}_{p}$ is identical to the primal residual $\mathbf{r}_{p}$ of the HSD model defined in Eq. (31), $\hat{\mathbf{r}}_{d}$ is the combination $(\mathbf{r}_{d},r_{g})$ of the dual residual and the gap residual, and the central path coefficient $\mu(\mathbf{x},\mathbf{s})$ is identical to $\mu(\mathbf{x},\mathbf{s},\kappa,\tau)$ defined in Eq. (23).

In the following, it is sufficient to study the convergence properties of reducing the residuals $\hat{\mathbf{r}}_{p}$, $\hat{\mathbf{r}}_{d}$, and the complementarity gap (which is proportional to $\mu$) by solving the linear system Eq. (38); the results then apply to the process of solving Eq. (28).

For the generalized linear system Eq. (38), the central path distances are

d_{2}(\mathbf{x},\mathbf{s})=\sqrt{2}\left\|\mathbf{w}_{xs}-\mu\mathbf{e}\right\|, \qquad (42)
d_{\infty}(\mathbf{x},\mathbf{s})=\max_{i=1,\cdots,k}\left(\left|(\mathbf{w}_{xs})_{1}^{(i)}+\|(\mathbf{w}_{xs})_{2:n_{i}}^{(i)}\|-\mu\right|,\left|(\mathbf{w}_{xs})_{1}^{(i)}-\|(\mathbf{w}_{xs})_{2:n_{i}}^{(i)}\|-\mu\right|\right).

The central path neighborhoods are

\mathcal{N}_{2}(\gamma)=\left\{(\mathbf{x},\mathbf{y},\mathbf{s})\in\text{int}(\mathcal{C}):d_{2}(\mathbf{x},\mathbf{s})\leq\gamma\mu(\mathbf{x},\mathbf{s})\right\}, \qquad (43)
\mathcal{N}_{\infty}(\gamma)=\left\{(\mathbf{x},\mathbf{y},\mathbf{s})\in\text{int}(\mathcal{C}):d_{\infty}(\mathbf{x},\mathbf{s})\leq\gamma\mu(\mathbf{x},\mathbf{s})\right\},

where the parameter $\gamma\in(0,1)$. It is easy to verify that these central path distances and neighborhoods are identical to the definitions Eq. (24) and Eq. (26) for the special case Eq. (35). We also have $d_{\infty}(\mathbf{x},\mathbf{s})\leq d_{2}(\mathbf{x},\mathbf{s})$, and $\mathcal{N}_{2}(\gamma)\subseteq\mathcal{N}_{\infty}(\gamma)$.

3.2 The well-definedness of the search directions

Tsuchiya [1999] and Monteiro and Tsuchiya [2000] provide valuable tools for our study. For the convenience of analysis, we restate Lemmas 1-4 of [Monteiro and Tsuchiya, 2000] as Lemmas 1-4 below (see that paper for the proofs).

Define the following notations:

\mathbf{W}_{xs}\triangleq\text{mat}(\mathbf{w}_{xs}),\quad\mathbf{R}_{xs}\triangleq\mathbf{T}_{x}\mathbf{X}^{-1}\mathbf{S}\mathbf{T}_{x}. \qquad (44)

Lemma 1 [Monteiro and Tsuchiya, 2000]. For any $\mathbf{x}\in\text{int}(\mathcal{K})$, the matrices $\mathbf{X}$ and $\mathbf{T}_{x}$ satisfy:

a)

\mathbf{X}-\mathbf{T}_{x}=\mathbf{U}_{x}=\text{blkdiag}\left(\mathbf{U}_{x^{(i)}}:i=1,\cdots,k\right), \qquad (45)

where

\mathbf{U}_{v}=\begin{pmatrix}0&0\\ 0&(\mathbf{v}_{1}-\beta_{v})\mathbf{P}_{v}\end{pmatrix} \qquad (46)

for $\mathbf{v}\in\mathcal{R}^{n}$, and $\mathbf{P}_{v}$ is the orthogonal projection matrix onto the subspace orthogonal to $\mathbf{v}_{2:n}$:

\mathbf{P}_{v}=\mathbf{I}_{n-1}-\mathbf{v}_{2:n}(\mathbf{v}_{2:n})^{T}/\left\|\mathbf{v}_{2:n}\right\|^{2}. \qquad (47)

We also use the notation

\mathbf{P}_{x}=\text{blkdiag}\left(\begin{pmatrix}0&0\\ 0&\mathbf{P}_{x^{(i)}}\end{pmatrix}:i=1,\cdots,k\right). \qquad (48)

b)

\mathbf{T}_{x}\mathbf{X}^{-1}=\mathbf{X}^{-1}\mathbf{T}_{x}=\text{blkdiag}\left(\mathbf{I}-\mathbf{U}_{x^{(i)}}/\mathbf{x}^{(i)}_{1}:i=1,\cdots,k\right), \qquad (49)
\mathbf{T}_{x}\mathbf{X}^{-1}\mathbf{e}=\mathbf{e};

c) $\mathbf{X}$ and $\mathbf{T}_{x}$ commute and $\mathbf{X}\succeq\mathbf{T}_{x}$.

Lemma 2 [Monteiro and Tsuchiya, 2000]. $\mathbf{R}_{xs}$ satisfies

\mathbf{R}_{xs}=\text{blkdiag}\left(\mathbf{R}_{xs}^{i}:i=1,\cdots,k\right), \qquad (50)

where

\mathbf{R}_{xs}^{i}\triangleq\begin{pmatrix}\dot{\mathbf{w}}_{1}&\dot{\mathbf{w}}_{2:n_{i}}^{T}\\ \dot{\mathbf{w}}_{2:n_{i}}&\tilde{\mathbf{R}}_{xs}^{i}\end{pmatrix} \qquad (51)

where $\dot{\mathbf{w}}$ denotes $\mathbf{w}_{xs}^{(i)}$ and

\tilde{\mathbf{R}}_{xs}^{i}\triangleq\frac{1}{\mathbf{x}^{(i)}_{1}}\left(\dot{\mathbf{w}}_{2:n_{i}}(\mathbf{x}^{(i)}_{2:n_{i}})^{T}+\beta_{x^{(i)}}^{2}\mathbf{s}^{(i)}_{1}\mathbf{I}\right)=\frac{\dot{\mathbf{w}}_{2:n_{i}}(\mathbf{x}^{(i)}_{2:n_{i}})^{T}}{\mathbf{x}^{(i)}_{1}}+\left(\dot{\mathbf{w}}_{1}-\frac{\dot{\mathbf{w}}_{2:n_{i}}^{T}\mathbf{x}^{(i)}_{2:n_{i}}}{\mathbf{x}^{(i)}_{1}}\right)\mathbf{I}. \qquad (52)

Lemma 3 [Monteiro and Tsuchiya, 2000]. Let $(\mathbf{x},\mathbf{s})\in\text{int}(\mathcal{K})\times\text{int}(\mathcal{K})$ satisfy

\max_{i,j}\left|\lambda_{i}^{j}(\mathbf{w}_{xs})-\dot{\nu}\right|\leq\gamma\dot{\nu} \qquad (53)

for some scalars $\gamma>0$ and $\dot{\nu}>0$. Then

\left\|\mathbf{R}_{xs}-\mathbf{W}_{xs}\right\|\leq 2\gamma\dot{\nu}, \qquad (54)
\left\|\mathbf{W}_{xs}-\dot{\nu}\mathbf{I}\right\|\leq\gamma\dot{\nu},
\left\|\mathbf{R}_{xs}-\dot{\nu}\mathbf{I}\right\|\leq 3\gamma\dot{\nu}.

Lemma 4 [Monteiro and Tsuchiya, 2000]. Let $(\mathbf{x},\mathbf{s})\in\text{int}(\mathcal{K})\times\text{int}(\mathcal{K})$ satisfy

\left\|\mathbf{R}_{xs}-\dot{\nu}\mathbf{I}\right\|\leq\dot{\tau}\dot{\nu}, \qquad (55)

for some scalars $\dot{\tau}\in(0,1)$ and $\dot{\nu}>0$. Assume that $(\mathbf{u},\mathbf{v})\in\mathcal{R}^{n}\times\mathcal{R}^{n}$ and $\mathbf{h}\in\mathcal{R}^{n}$ satisfy

\mathbf{S}\mathbf{u}+\mathbf{X}\mathbf{v}=\mathbf{h},\quad\mathbf{u}^{T}\mathbf{v}\geq 0, \qquad (56)

and define $\delta_{u}\triangleq\left\|\mathbf{T}_{x}^{-1}\mathbf{u}\right\|$ and $\delta_{v}\triangleq\left\|\mathbf{T}_{x}\mathbf{v}\right\|$. Then,

\delta_{u}\leq\frac{\left\|\mathbf{T}_{x}\mathbf{X}^{-1}\mathbf{h}\right\|}{(1-\dot{\tau})\dot{\nu}},\quad\delta_{v}\leq\frac{2\left\|\mathbf{T}_{x}\mathbf{X}^{-1}\mathbf{h}\right\|}{1-\dot{\tau}}. \qquad (57)

Using the above lemmas, we obtain the following result, which shows the well-definedness of the search directions.

Theorem 2. Let $(\mathbf{x},\mathbf{y},\mathbf{s})\in\text{int}(\mathcal{C})$ satisfy

\max_{i,j}\left|\lambda_{i}^{j}(\mathbf{w}_{xs})-\nu\right|\leq\gamma\nu \qquad (58)

for $\gamma\in(0,1/3)$, and let $\mathbf{D}\in\mathcal{G}$. Then the linear system Eq. (38) has a unique solution.

Proof. To prove that the linear system Eq. (38) has a unique solution, it is equivalent to show that the associated homogeneous system has only the zero solution. The homogeneous system is

\mathbf{A}\mathbf{d}_{hx}=0, \qquad (59)
\mathbf{A}^{T}\mathbf{d}_{hy}+\mathbf{C}\mathbf{d}_{hx}+\mathbf{d}_{hs}=0,
\bar{\mathbf{X}}\mathbf{D}\mathbf{d}_{hs}+\bar{\mathbf{S}}\mathbf{D}^{-T}\mathbf{d}_{hx}=0,

where the solution is denoted by $(\mathbf{d}_{hx},\mathbf{d}_{hy},\mathbf{d}_{hs})$.

Because $\mathbf{D}\in\mathcal{G}$, it is invertible. Since $\mathbf{C}=-\mathbf{C}^{T}$ by the definition Eq. (39), multiplying the first and second equations in Eq. (59) by $\mathbf{d}_{hy}^{T}$ and $\mathbf{d}_{hx}^{T}$, respectively, and combining the results, we obtain

(\mathbf{D}^{-T}\mathbf{d}_{hx})^{T}(\mathbf{D}\mathbf{d}_{hs})=\mathbf{d}_{hx}^{T}\mathbf{d}_{hs}=0. \qquad (60)

By Proposition 3 b),

\max_{i,j}\left|\lambda_{i}^{j}(\mathbf{w}_{\bar{x}\bar{s}})-\nu\right|=\max_{i,j}\left|\lambda_{i}^{j}(\mathbf{w}_{xs})-\nu\right|\leq\gamma\nu. \qquad (61)

By Lemma 3, Eq. (55) holds with $\dot{\tau}=3\gamma<1$ and $\dot{\nu}=\nu$ at the scaled point $(\bar{\mathbf{x}},\bar{\mathbf{s}})$. By the last equation in Eq. (59), using $\mathbf{D}^{-T}\mathbf{d}_{hx}$ and $\mathbf{D}\mathbf{d}_{hs}$ as $\mathbf{u}$ and $\mathbf{v}$ in Lemma 4 with $\mathbf{h}=0$, we obtain

\mathbf{d}_{hx}=0,\quad\mathbf{d}_{hs}=0. \qquad (62)

Then, by the second equation in Eq. (59),

\mathbf{A}^{T}\mathbf{d}_{hy}=0. \qquad (63)

Since $\mathbf{A}$ has full row rank, we obtain

\mathbf{d}_{hy}=0. \qquad (64)

Consequently, the homogeneous system Eq. (59) has only the zero solution, and hence the linear system Eq. (38) has a unique solution. Q.E.D.

3.3 The improvements in each iteration

Now we investigate the improvement of the primal residual, dual residual and complementarity gap in each iteration. For a step size $\alpha$, define

\mathbf{x}(\alpha)\triangleq\mathbf{x}+\alpha\Delta\mathbf{x},\quad\mathbf{y}(\alpha)\triangleq\mathbf{y}+\alpha\Delta\mathbf{y},\quad\mathbf{s}(\alpha)\triangleq\mathbf{s}+\alpha\Delta\mathbf{s}. \qquad (65)

The following theorem states the improvement in each iteration when solving the linear system Eq. (38). It also shows that the updating directions of $\mathbf{x}$ and $\mathbf{s}$ are orthogonal.

Theorem 3. Assume the updating direction $\left(\Delta\mathbf{x},\Delta\mathbf{y},\Delta\mathbf{s}\right)$ satisfies the linear system Eq. (38). Then

\mathbf{A}\mathbf{x}\left(\alpha\right)=\left(1-\left(1-\nu\right)\alpha\right)\mathbf{A}\mathbf{x}, \qquad (66)
\mathbf{A}^{T}\mathbf{y}(\alpha)+\mathbf{C}\mathbf{x}\left(\alpha\right)+\mathbf{s}(\alpha)=\left(1-\left(1-\nu\right)\alpha\right)\left(\mathbf{A}^{T}\mathbf{y}+\mathbf{C}\mathbf{x}+\mathbf{s}\right),
\mathbf{x}\left(\alpha\right)^{T}\mathbf{s}\left(\alpha\right)=\left(1-\left(1-\nu\right)\alpha\right)\mathbf{x}^{T}\mathbf{s},
\Delta\mathbf{x}^{T}\Delta\mathbf{s}=0.

Proof. The first two equations in Eq. (66) can be proved easily using elementary linear algebra.

Then, we prove the last two equations in Eq. (66).

Since $\mathbf{C}=-\mathbf{C}^{T}$ by the definition Eq. (39), multiplying the first two equations in Eq. (38) by $-\Delta\mathbf{y}^{T}$ and $\Delta\mathbf{x}^{T}$, respectively, and adding them together, we obtain

\Delta\mathbf{x}^{T}\Delta\mathbf{s}=\left(1-\nu\right)\left(\Delta\mathbf{y}^{T}\mathbf{A}\mathbf{x}-\Delta\mathbf{x}^{T}\left(\mathbf{A}^{T}\mathbf{y}+\mathbf{C}\mathbf{x}+\mathbf{s}\right)\right). \qquad (67)

Multiplying the first two equations in Eq. (38) by $-\mathbf{y}^{T}$ and $\mathbf{x}^{T}$, respectively, and adding them together, we get

\left(\mathbf{x}+\Delta\mathbf{x}\right)^{T}\left(\mathbf{s}+\Delta\mathbf{s}\right)+\Delta\mathbf{y}^{T}\mathbf{A}\mathbf{x}-\Delta\mathbf{x}^{T}\left(\mathbf{A}^{T}\mathbf{y}+\mathbf{C}\mathbf{x}+\mathbf{s}\right)=\nu\mathbf{x}^{T}\mathbf{s}+\Delta\mathbf{x}^{T}\Delta\mathbf{s}. \qquad (68)

Using the definitions Eq. (39) and Eq. (40), summing the first elements of each cone in the third equation in Eq. (38) yields

\left(\mathbf{x}+\Delta\mathbf{x}\right)^{T}\left(\mathbf{s}+\Delta\mathbf{s}\right)=\nu\mathbf{x}^{T}\mathbf{s}+\Delta\mathbf{x}^{T}\Delta\mathbf{s}. \qquad (69)

Subtracting Eq. (69) from Eq. (68), we obtain

\Delta\mathbf{y}^{T}\mathbf{A}\mathbf{x}-\Delta\mathbf{x}^{T}\left(\mathbf{A}^{T}\mathbf{y}+\mathbf{C}\mathbf{x}+\mathbf{s}\right)=0. \qquad (70)

From Eq. (67) and Eq. (70), we get

\Delta\mathbf{x}^{T}\Delta\mathbf{s}=0. \qquad (71)

Combining Eq. (71) with Eq. (69) and the definition Eq. (65) yields

\mathbf{x}\left(\alpha\right)^{T}\mathbf{s}\left(\alpha\right)=\left(1-\left(1-\nu\right)\alpha\right)\mathbf{x}^{T}\mathbf{s}. \qquad (72)

Q.E.D.

The first three equations in Eq. (66) can also be written as

\hat{\mathbf{r}}_{p}(\alpha)=(1-(1-\nu)\alpha)\hat{\mathbf{r}}_{p}, \qquad (73)
\hat{\mathbf{r}}_{d}(\alpha)=(1-(1-\nu)\alpha)\hat{\mathbf{r}}_{d},
\mu(\alpha)=(1-(1-\nu)\alpha)\mu,

where $\hat{\mathbf{r}}_{p}(\alpha)$, $\hat{\mathbf{r}}_{d}(\alpha)$, and $\mu(\alpha)$ denote $\hat{\mathbf{r}}_{p}$, $\hat{\mathbf{r}}_{d}$, and $\mu$ evaluated at the point $(\mathbf{x}(\alpha),\mathbf{y}(\alpha),\mathbf{s}(\alpha))$. The relation shows that the improvement in each iteration depends on $\alpha$ and $\nu$. If there are constants $\alpha\in(0,1]$ and $\nu\in[0,1)$ that keep $(\mathbf{x}(\alpha),\mathbf{y}(\alpha),\mathbf{s}(\alpha))$ in a central path neighborhood $\mathcal{N}_{2}(\gamma)$ or $\mathcal{N}_{\infty}(\gamma)$, where $\gamma\in(0,1)$ is also constant, then polynomial convergence is established.
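Theorem 3 is easy to check numerically; the sketch below (ours) assembles and solves Eq. (38) with $\mathbf{D}=\mathbf{I}$ for a random single-cone instance and verifies the orthogonality and the uniform reduction in Eq. (73):

```python
import numpy as np

def arrowhead(v):
    # mat(v) for a single second-order cone
    M = v[0] * np.eye(v.size)
    M[0, 1:] = v[1:]
    M[1:, 0] = v[1:]
    return M

rng = np.random.default_rng(2)
n, p, k = 5, 2, 1                                   # one SOC, so mu = x^T s / k
A = rng.normal(size=(p, n))
C = rng.normal(size=(n, n)); C = C - C.T            # C = -C^T, as in Eq. (39)
x = np.concatenate(([3.0], 0.3 * rng.normal(size=n - 1)))   # interior points
s = np.concatenate(([3.0], 0.3 * rng.normal(size=n - 1)))
y = rng.normal(size=p)
nu, alpha = 0.9, 1.0
mu = x @ s / k                                      # Eq. (40)
e = np.concatenate(([1.0], np.zeros(n - 1)))
X, S = arrowhead(x), arrowhead(s)

# Block system for (dx, dy, ds) from Eq. (38) with D = I
K = np.block([[A, np.zeros((p, p)), np.zeros((p, n))],
              [C, A.T, np.eye(n)],
              [S, np.zeros((n, p)), X]])
rhs = np.concatenate([-(1 - nu) * (A @ x),
                      -(1 - nu) * (A.T @ y + C @ x + s),
                      nu * mu * e - X @ S @ e])
dx, dy, ds = np.split(np.linalg.solve(K, rhs), [n, n + p])

rho = 1 - (1 - nu) * alpha
assert abs(dx @ ds) < 1e-8                                            # orthogonality
assert np.allclose(A @ (x + alpha * dx), rho * (A @ x))               # Eq. (73), r_p
assert np.allclose(A.T @ (y + alpha * dy) + C @ (x + alpha * dx)
                   + (s + alpha * ds), rho * (A.T @ y + C @ x + s))   # Eq. (73), r_d
assert np.isclose((x + alpha * dx) @ (s + alpha * ds), rho * (x @ s)) # Eq. (73), mu
```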

3.4 Results to bound the updating in the central path neighborhood

This subsection presents results to bound the updated solution $(\mathbf{x}(\alpha),\mathbf{y}(\alpha),\mathbf{s}(\alpha))$ within a central path neighborhood $\mathcal{N}_{2}(\gamma)$ for some $\gamma\in(0,1)$, when the updating directions are unscaled.

By the definition Eq. (43), a point lying in $\mathcal{N}_{2}(\gamma)$ should satisfy two conditions: (1) the distance $d_{2}$ is bounded by $\gamma\mu$; (2) the point is in the interior of the cone constraints. The following lemmas provide means to guarantee these two conditions. They are modifications of the results for feasible IPMs [Monteiro and Tsuchiya, 2000], obtained by including the HSD model and removing the equality constraints, so as to be applicable to infeasible IPMs and to hold in cases where the primal or the dual problem is infeasible. The equivalent form of the HSD model and the orthogonality of the search directions provided by Theorem 3 allow them to be proved in roughly the same way as their counterparts for feasible IPMs.

Firstly, we study the conditions to bound the distance $d_{2}$.

The following lemma simplifies the expression of $\mathbf{T}_{x}^{-1}\mathbf{x}(\alpha)\circ\mathbf{T}_{x}\mathbf{s}(\alpha)-\mu(\alpha)\mathbf{e}$, which is useful for analyzing $d_{2}(\mathbf{x}(\alpha),\mathbf{s}(\alpha))$.

Lemma 5. Let $(\mathbf{x},\mathbf{y},\mathbf{s})\in\text{int}(\mathcal{C})$, and let the updating direction $\left(\Delta\mathbf{x},\Delta\mathbf{y},\Delta\mathbf{s}\right)$ satisfy the linear system Eq. (38) with $\alpha\in\mathcal{R}$ and

\mathbf{D}=\mathbf{I}. \qquad (74)

Then

\mathbf{z}(\alpha)=(1-\alpha)(\mathbf{w}_{xs}-\mu\mathbf{e})+\alpha(\mathbf{W}_{xs}-\mathbf{R}_{xs})\widehat{\Delta\mathbf{x}}+\alpha^{2}\widehat{\Delta\mathbf{x}}\circ\widehat{\Delta\mathbf{s}}, \qquad (75)

where

\mathbf{z}(\alpha)\triangleq\mathbf{T}_{x}^{-1}\mathbf{x}(\alpha)\circ\mathbf{T}_{x}\mathbf{s}(\alpha)-\mu(\alpha)\mathbf{e},\quad\widehat{\Delta\mathbf{x}}\triangleq\mathbf{T}_{x}^{-1}\Delta\mathbf{x},\quad\widehat{\Delta\mathbf{s}}\triangleq\mathbf{T}_{x}\Delta\mathbf{s}. \qquad (76)

Proof. By the assumption, the definitions Eq. (25) and Eq. (44), and Lemma 1 b), multiplying the third equation of the linear system Eq. (38) by $\mathbf{T}_{x}\mathbf{X}^{-1}$ yields

\widehat{\Delta\mathbf{s}}=\mathbf{T}_{x}\Delta\mathbf{s}=\mathbf{T}_{x}\mathbf{X}^{-1}(\nu\mu\mathbf{e}-\mathbf{X}\mathbf{s}-\mathbf{S}\Delta\mathbf{x})=\nu\mu\mathbf{e}-\mathbf{w}_{xs}-\mathbf{R}_{xs}\widehat{\Delta\mathbf{x}}. \qquad (77)

Using the above result and the definition Eq. (65), we have

\mathbf{T}_{x}^{-1}\mathbf{x}(\alpha)\circ\mathbf{T}_{x}\mathbf{s}(\alpha)=\left(\mathbf{T}_{x}^{-1}\mathbf{x}+\alpha\widehat{\Delta\mathbf{x}}\right)\circ\left(\mathbf{T}_{x}\mathbf{s}+\alpha\widehat{\Delta\mathbf{s}}\right) \qquad (78)
=\left(\mathbf{e}+\alpha\widehat{\Delta\mathbf{x}}\right)\circ\left(\mathbf{w}_{xs}+\alpha\widehat{\Delta\mathbf{s}}\right)
=\mathbf{w}_{xs}+\alpha(\mathbf{w}_{xs}\circ\widehat{\Delta\mathbf{x}}+\widehat{\Delta\mathbf{s}})+\alpha^{2}\widehat{\Delta\mathbf{x}}\circ\widehat{\Delta\mathbf{s}}
=\mathbf{w}_{xs}+\alpha\left(\nu\mu\mathbf{e}-\mathbf{w}_{xs}+(\mathbf{W}_{xs}-\mathbf{R}_{xs})\widehat{\Delta\mathbf{x}}\right)+\alpha^{2}\widehat{\Delta\mathbf{x}}\circ\widehat{\Delta\mathbf{s}}.

Combining Eq. (78) and Eq. (73), we obtain Eq. (75). Q.E.D.

Lemmas 6 and 7 are introduced to bound the norm of $\widehat{\Delta\mathbf{x}}\circ\widehat{\Delta\mathbf{s}}$ in Eq. (75).

Lemma 6. Assume that $\gamma\in(0,1/3)$, $(\mathbf{x},\mathbf{y},\mathbf{s})\in\mathcal{N}_{2}(\gamma)$, and $(\Delta\mathbf{x},\Delta\mathbf{y},\Delta\mathbf{s})$ is the solution of the linear system Eq. (38) with $\mathbf{D}=\mathbf{I}$. Then the scaled increments satisfy

\left\|\widehat{\Delta\mathbf{x}}\right\|\leq\Gamma_{p}/2,\quad\left\|\widehat{\Delta\mathbf{s}}\right\|\leq\Gamma_{p}\mu, \qquad (79)

where

\Gamma_{p}=2\left(\gamma^{2}/2+(1-\nu)^{2}k\right)^{1/2}/(1-3\gamma). \qquad (80)

Proof.

The assumption and the definition of the neighborhood Eq. (43) show that

\|\mathbf{w}_{xs}-\mu\mathbf{e}\|\leq\gamma\mu/\sqrt{2}. \qquad (81)

Because

\mathbf{w}_{xs}^{T}\mathbf{e}=\mathbf{s}^{T}\mathbf{T}_{x}\mathbf{e}=\mathbf{s}^{T}\mathbf{x}=k\mu, \qquad (82)

we obtain

(\mathbf{w}_{xs}-\mu\mathbf{e})^{T}\mathbf{e}=0. \qquad (83)

Then,

\|\mathbf{w}_{xs}-\nu\mu\mathbf{e}\|^{2}=\|\mathbf{w}_{xs}-\mu\mathbf{e}\|^{2}+\|\mu\mathbf{e}-\nu\mu\mathbf{e}\|^{2}+2(1-\nu)\mu(\mathbf{w}_{xs}-\mu\mathbf{e})^{T}\mathbf{e} \qquad (84)
\leq\left(\gamma^{2}/2+(1-\nu)^{2}k\right)\mu^{2}.

Because $d_{\infty}(\mathbf{x},\mathbf{s})\leq\gamma\mu$, Lemma 3 applies with $\dot{\nu}=\mu$, and Eq. (55) holds with $\dot{\tau}=3\gamma<1$ and $\dot{\nu}=\mu$.

By Theorem 3, we also have $\Delta\mathbf{x}^{T}\Delta\mathbf{s}=0$. Hence, using Lemma 4 with $\dot{\nu}=\mu$, $\mathbf{u}=\Delta\mathbf{x}$, $\mathbf{v}=\Delta\mathbf{s}$, $\mathbf{h}=\nu\mu\mathbf{e}-\mathbf{x}\circ\mathbf{s}$, and $\dot{\tau}=3\gamma$, we have

\left\|\widehat{\Delta\mathbf{x}}\right\|\leq\frac{\|\mathbf{T}_{x}\mathbf{X}^{-1}\left(\nu\mu\mathbf{e}-\mathbf{X}\mathbf{s}\right)\|}{\mu(1-3\gamma)}=\frac{\|\nu\mu\mathbf{e}-\mathbf{w}_{xs}\|}{\mu(1-3\gamma)}, \qquad (85)
\left\|\widehat{\Delta\mathbf{s}}\right\|\leq\frac{2\|\mathbf{T}_{x}\mathbf{X}^{-1}\left(\nu\mu\mathbf{e}-\mathbf{X}\mathbf{s}\right)\|}{1-3\gamma}=\frac{2\|\nu\mu\mathbf{e}-\mathbf{w}_{xs}\|}{1-3\gamma}.

Combining this result with Eq. (84) and the definition Eq. (80), we obtain Eq. (79). Q.E.D.

Lemma 7 [Tsuchiya, 1999]. Let $\mathbf{u},\mathbf{v}\in\mathcal{R}^{n}$. Then

\left\|\mathbf{u}\circ\mathbf{v}\right\|\leq\sqrt{2}\left\|\mathbf{u}\right\|\left\|\mathbf{v}\right\|. \qquad (86)

Proof. See Lemma 2.12 of [Tsuchiya, 1999].

Now we are ready to bound $\|\mathbf{z}(\alpha)\|$, as in the following lemma.

Lemma 8. Assume that $(\mathbf{x},\mathbf{y},\mathbf{s})\in\mathcal{N}_{2}(\gamma)$ for some scalar $\gamma\in(0,1/3)$, and let $(\Delta\mathbf{x},\Delta\mathbf{y},\Delta\mathbf{s})$ be the unique solution of the linear system Eq. (38) with $\mathbf{D}=\mathbf{I}$ for some $\nu\in\mathcal{R}$ and any $\alpha\in[0,1]$. Then

\sqrt{2}\|\mathbf{z}(\alpha)\|\leq\left((1-\alpha)\gamma+\sqrt{2}\alpha\gamma\Gamma_{p}+\alpha^{2}\Gamma_{p}^{2}\right)\mu. \qquad (87)

Proof. Due to Lemma 6 and Lemma 7, we have

\left\|\widehat{\Delta\mathbf{x}}\circ\widehat{\Delta\mathbf{s}}\right\|\leq\frac{\sqrt{2}}{2}\Gamma_{p}^{2}\mu. \qquad (88)

By the assumption, Lemma 5, Eq. (43), and Eq. (88), we obtain

\sqrt{2}\|\mathbf{z}(\alpha)\|=\sqrt{2}\left\|\mathbf{T}_{x}^{-1}\mathbf{x}(\alpha)\circ\mathbf{T}_{x}\mathbf{s}(\alpha)-\mu(\alpha)\mathbf{e}\right\| \qquad (89)
\leq\sqrt{2}\left((1-\alpha)\|\mathbf{w}_{xs}-\mu\mathbf{e}\|+\alpha\|\mathbf{R}_{xs}-\mathbf{W}_{xs}\|\left\|\widehat{\Delta\mathbf{x}}\right\|+\alpha^{2}\left\|\widehat{\Delta\mathbf{x}}\circ\widehat{\Delta\mathbf{s}}\right\|\right)
\leq\left((1-\alpha)\gamma+\sqrt{2}\alpha\gamma\Gamma_{p}+\alpha^{2}\Gamma_{p}^{2}\right)\mu.

Q.E.D.

The following lemma shows that $d_{2}(\mathbf{x}(\alpha),\mathbf{s}(\alpha))$ is bounded above by its approximation $\sqrt{2}\|\mathbf{z}(\alpha)\|$.

Lemma 9 [Tsuchiya, 1999]. Let $(\mathbf{x},\mathbf{s})\in\text{int}(\mathcal{K})\times\text{int}(\mathcal{K})$. Then

d_{2}(\mathbf{x},\mathbf{s})=\sqrt{2}\|\mathbf{w}_{xs}-\mu\mathbf{e}\|=\min_{\mathbf{D}\in\mathcal{G}}\sqrt{2}\|(\mathbf{D}^{-T}\mathbf{x})\circ(\mathbf{D}\mathbf{s})-\mu\mathbf{e}\|. \qquad (90)

Proof. See Lemma 2.10 of [Tsuchiya, 1999].

Next, we introduce a lemma that facilitates the proof of the interior-point condition.

Lemma 10 [Monteiro and Tsuchiya, 2000]. Let (𝐱,𝐬)𝒦×𝒦(\mathbf{x},\mathbf{s})\in\mathcal{K}\times\mathcal{K}. If 𝐱𝐬int(𝒦)\mathbf{x}\circ\mathbf{s}\in\text{int}(\mathcal{K}), then (𝐱,𝐬)int(𝒦)×int(𝒦)(\mathbf{x},\mathbf{s})\in\text{int}(\mathcal{K})\times\text{int}(\mathcal{K}). In particular, if 2𝐱𝐬ν𝐞γν\sqrt{2}\left\|\mathbf{x}\circ\mathbf{s}-\nu\mathbf{e}\right\|\leq\gamma\nu for some γ(0,1)\gamma\in(0,1), and ν>0\nu>0, then (𝐱,𝐬)int(𝒦)×int(𝒦)(\mathbf{x},\mathbf{s})\in\text{int}(\mathcal{K})\times\text{int}(\mathcal{K}).

Proof. See Lemma 10 of [Monteiro and Tsuchiya, 2000].

3.5 Polynomial convergence for unscaled searching directions

This subsection establishes iteration-complexity bounds for the infeasible IPM Algorithm 1 to reduce the residuals \hat{\mathbf{r}}_{p}, \hat{\mathbf{r}}_{d}, and the complementarity gap \mu, when the linear system Eq. (38) is solved with the unscaled searching directions.

Theorem 4. Let \gamma\in(0,1/3), \alpha\in[0,1], and \delta\in(0,1) satisfy

Γ~4(γ2+δ2)(13γ)2(1δ2k)1<γ,\displaystyle\tilde{\Gamma}\triangleq\frac{4\left(\gamma^{2}+\delta^{2}\right)}{\left(1-3\gamma\right)^{2}}\left(1-\frac{\delta}{\sqrt{2k}}\right)^{-1}<\gamma, (91)
ν=1δ/2k.\displaystyle\nu=1-\delta/\sqrt{2k}.

Assume that (𝐱,𝐲,𝐬)𝒩2(γ)(\mathbf{x},\mathbf{y},\mathbf{s})\in\mathcal{N}_{2}(\gamma), 𝐃=𝐈\mathbf{D}=\mathbf{I} and (Δ𝐱,Δ𝐲,Δ𝐬)(\Delta\mathbf{x},\Delta\mathbf{y},\Delta\mathbf{s}) is the solution of the linear system Eq. (38). Then,

(𝐱(α),𝐲(α),𝐬(α))𝒩2(γ),(\mathbf{x}(\alpha),\mathbf{y}(\alpha),\mathbf{s}(\alpha))\in\mathcal{N}_{2}(\gamma), (92)

and

𝐫^p(α)=(1αδ/2k)𝐫^p,\displaystyle\hat{\mathbf{r}}_{p}(\alpha)=(1-\alpha\delta/\sqrt{2k})\hat{\mathbf{r}}_{p}, (93)
𝐫^d(α)=(1αδ/2k)𝐫^d,\displaystyle\hat{\mathbf{r}}_{d}(\alpha)=(1-\alpha\delta/\sqrt{2k})\hat{\mathbf{r}}_{d},
μ(α)=(1αδ/2k)μ.\displaystyle\mu(\alpha)=(1-\alpha\delta/\sqrt{2k})\mu.

Proof.

Due to the assumption, Lemma 8, Eq. (73), and Γp2γ\Gamma_{p}\geq\sqrt{2}\gamma, we obtain

\displaystyle\sqrt{2}\|\mathbf{z}(\alpha)\|\leq\left((1-\alpha)\gamma+2\alpha\Gamma_{p}^{2}\right)\mu (94)
((1α)γ+8α(γ2/2+(1ν)2k)/(13γ)2)μ\displaystyle\leq\left((1-\alpha)\gamma+8\alpha(\gamma^{2}/2+(1-\nu)^{2}k)/(1-3\gamma)^{2}\right)\mu
=((1α)γ+4α(γ2+δ2)/(13γ)2)μ\displaystyle=\left((1-\alpha)\gamma+4\alpha(\gamma^{2}+\delta^{2})/(1-3\gamma)^{2}\right)\mu
=((1α)γ+αΓ~(1δ/2k))μ\displaystyle=\left((1-\alpha)\gamma+\alpha\tilde{\Gamma}\left(1-\delta/\sqrt{2k}\right)\right)\mu
=((1α)γ+ανΓ~)μ\displaystyle=\left((1-\alpha)\gamma+\alpha\nu\tilde{\Gamma}\right)\mu
γ(1α+αν)μ\displaystyle\leq\gamma\left(1-\alpha+\alpha\nu\right)\mu
=γμ(α).\displaystyle=\gamma\mu(\alpha).

Denote \dot{\mathbf{u}}(\alpha)=(\mathbf{x}(\alpha),\mathbf{y}(\alpha),\mathbf{s}(\alpha)). In order to prove Eq. (92), we must show \dot{\mathbf{u}}(\alpha)\in\text{int}(\mathcal{C}). Suppose that \dot{\mathbf{u}}(\dot{\alpha})\notin\mathcal{C} for some step size \dot{\alpha}\in[0,1]. Then at least one of the cone constraints is violated. Denote the part of \dot{\mathbf{u}}(\alpha) corresponding to a violated cone constraint by \dot{\mathbf{v}}(\alpha). Then \lambda_{\min}(\dot{\mathbf{v}}(0))>0 and \lambda_{\min}(\dot{\mathbf{v}}(\dot{\alpha}))<0. Since \lambda_{\min}(\dot{\mathbf{v}}(\alpha)) is continuous in \alpha, there exists \ddot{\alpha}\in(0,\dot{\alpha}) satisfying \lambda_{\min}(\dot{\mathbf{v}}(\ddot{\alpha}))=0. We use \check{\alpha} to denote the minimum such \ddot{\alpha} over all the cone constraints violated at \dot{\mathbf{u}}(\dot{\alpha}). Then \dot{\mathbf{u}}(\check{\alpha})\in\mathcal{C}\setminus\text{int}(\mathcal{C}). However, by Lemma 10 and Eq. (94), \dot{\mathbf{u}}(\check{\alpha})\in\text{int}(\mathcal{C}). The contradiction shows that \dot{\alpha} does not exist; consequently, \dot{\mathbf{u}}(\alpha)\in\mathcal{C}. Using Lemma 10 and Eq. (94) again, we conclude that \dot{\mathbf{u}}(\alpha)\in\text{int}(\mathcal{C}) for all \alpha\in[0,1].

Due to Lemma 9 and Eq. (94), we obtain

d2(𝐱(α),𝐬(α))=2𝐓x(α)1𝐱(α)𝐓x(α)𝐬(α)μ(α)𝐞\displaystyle d_{2}(\mathbf{x}(\alpha),\mathbf{s}(\alpha))=\sqrt{2}\left\|\mathbf{T}_{x(\alpha)}^{-1}\mathbf{x}(\alpha)\circ\mathbf{T}_{x(\alpha)}\mathbf{s}(\alpha)-\mu(\alpha)\mathbf{e}\right\| (95)
2𝐓x1𝐱(α)𝐓x𝐬(α)μ(α)𝐞\displaystyle\leq\sqrt{2}\left\|\mathbf{T}_{x}^{-1}\mathbf{x}(\alpha)\circ\mathbf{T}_{x}\mathbf{s}(\alpha)-\mu(\alpha)\mathbf{e}\right\|
γμ(α).\displaystyle\leq\gamma\mu(\alpha).

Consequently, Eq. (92) holds.

From Eq. (73) and the assumption, we also obtain Eq. (93). Q.E.D.

Algorithm 1 computes the search direction (\Delta\mathbf{x},\Delta\mathbf{y},\Delta\mathbf{s},\Delta\kappa,\Delta\tau) by solving the linear system Eq. (28), which is identical to Eq. (35), and updates the iterate with step size \alpha=1. Because Eq. (35) is a special case of Eq. (38), \alpha=1 satisfies the condition of Theorem 4, the residuals \mathbf{r}_{p},\mathbf{r}_{d} are subvectors of \hat{\mathbf{r}}_{p},\hat{\mathbf{r}}_{d}, and \mu is unchanged, we obtain the following proposition.

Proposition 4. Let the parameters satisfy

4(γ2+δ2)(13γ)2(1δ2(k+1))1<γ,γ(0,1/3),δ(0,1).\frac{4\left(\gamma^{2}+\delta^{2}\right)}{\left(1-3\gamma\right)^{2}}\left(1-\frac{\delta}{\sqrt{2(k+1)}}\right)^{-1}<\gamma,\gamma\in(0,1/3),\delta\in(0,1). (96)

Assume that (𝐱0,𝐲0,𝐬0,κ0,τ0)𝒩2(γ)(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})\in\mathcal{N}_{2}(\gamma), 𝐃=𝐈\mathbf{D}=\mathbf{I}. Then, in Algorithm 1,

(𝐱,𝐲,𝐬,κ,τ)𝒩2(γ),\displaystyle(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\in\mathcal{N}_{2}(\gamma), (97)
𝐫p(𝐱,𝐲,𝐬,κ,τ)=(1δ/2(k+1))j𝐫p(𝐱0,𝐲0,𝐬0,κ0,τ0),\displaystyle\left\|\mathbf{r}_{p}(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\right\|=\left(1-\delta/\sqrt{2(k+1)}\right)^{j}\left\|\mathbf{r}_{p}(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})\right\|,
𝐫d(𝐱,𝐲,𝐬,κ,τ)=(1δ/2(k+1))j𝐫d(𝐱0,𝐲0,𝐬0,κ0,τ0),\displaystyle\left\|\mathbf{r}_{d}(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\right\|=\left(1-\delta/\sqrt{2(k+1)}\right)^{j}\left\|\mathbf{r}_{d}(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})\right\|,
μ(𝐱,𝐬,κ,τ)=(1δ/2(k+1))jμ(𝐱0,𝐬0,κ0,τ0),\displaystyle\mu\left(\mathbf{x},\mathbf{s},\kappa,\tau\right)=\left(1-\delta/\sqrt{2(k+1)}\right)^{j}\mu\left({{\mathbf{x}}_{0}},{{\mathbf{s}}_{0}},{{\kappa}_{0}},{{\tau}_{0}}\right),

where jj is the iteration number.

Proposition 4 shows the worst case iteration complexity of Algorithm 1 is O(k1/2log(ϵ1))O\left(k^{1/2}\log\left(\epsilon^{-1}\right)\right) when 𝐃=𝐈\mathbf{D}=\mathbf{I} and the parameters γ\gamma and δ\delta satisfy Eq. (96).
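For concreteness, the following sketch (illustrative only, not part of the algorithm) scans a grid of (\gamma,\delta) pairs for a given number of cones k and reports the largest \delta on the grid for which the parameter condition Eq. (96) holds.

```python
import numpy as np

def satisfies_condition(gamma, delta, k):
    """Check the parameter condition of Eq. (96):
    4(gamma^2 + delta^2)/(1 - 3 gamma)^2 * (1 - delta/sqrt(2(k+1)))^{-1} < gamma."""
    lhs = 4.0 * (gamma**2 + delta**2) / (1.0 - 3.0 * gamma)**2
    lhs /= 1.0 - delta / np.sqrt(2.0 * (k + 1))
    return lhs < gamma

k = 100
for gamma in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
    feasible = [d for d in np.linspace(0.01, 0.99, 99)
                if satisfies_condition(gamma, d, k)]
    if feasible:
        print(f"gamma = {gamma:.2f}: condition holds for delta up to ~{max(feasible):.2f}")
    else:
        print(f"gamma = {gamma:.2f}: no feasible delta on the grid")
```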

At the end of this subsection, we show that the centrality is improved when the parameters are set properly. Setting \alpha=1 in Eq. (94) and using Lemma 9, we also obtain

d_{2}(\mathbf{x}(1),\mathbf{s}(1))\leq\sqrt{2}\|\mathbf{z}(1)\|\leq\nu\tilde{\Gamma}\mu=\tilde{\Gamma}\mu(1). (98)

Combining the result with Eq. (92),

(𝐱(1),𝐲(1),𝐬(1))𝒩2(Γ~),(\mathbf{x}(1),\mathbf{y}(1),\mathbf{s}(1))\in\mathcal{N}_{2}(\tilde{\Gamma}), (99)

which shows that the centrality is improved, since \tilde{\Gamma}<\gamma. For example, in Algorithm 1, if the parameters satisfy {4\left(\gamma^{2}+\delta^{2}\right)}/{\left(1-3\gamma\right)^{2}}<\gamma, the central-path neighborhood coefficient improves by a factor of 1-\delta/\sqrt{2(k+1)} in each iteration.

3.6 Polynomial convergence for the MZ family of directions

The MZ family of searching directions is generated by choosing \mathbf{D}\in\mathcal{G} in the linear system Eq. (38).

For a scaling matrix 𝐃𝒢\mathbf{D}\in\mathcal{G}, consider the following change of variables

𝐀¯=𝐀𝐃T,𝐂¯=𝐃𝐂𝐃T.\bar{\mathbf{A}}=\mathbf{A}\mathbf{D}^{T},\bar{\mathbf{C}}=\mathbf{D}\mathbf{C}\mathbf{D}^{T}. (100)

Linear system Eq. (38) is transformed to

𝐀¯Δ𝐱¯=(1ν)𝐀¯𝐱¯,\displaystyle\bar{\mathbf{A}}\Delta\bar{\mathbf{x}}=-\left(1-\nu\right)\bar{\mathbf{A}}\bar{\mathbf{x}}, (101)
𝐀¯TΔ𝐲+𝐂¯Δ𝐱¯+Δ𝐬¯=(1ν)(𝐀¯T𝐲+𝐂¯𝐱¯+𝐬¯),\displaystyle\bar{\mathbf{A}}^{T}\Delta\mathbf{y}+\bar{\mathbf{C}}\Delta\bar{\mathbf{x}}+\Delta\bar{\mathbf{s}}=-(1-\nu)\left(\bar{\mathbf{A}}^{T}\mathbf{y}+\bar{\mathbf{C}}\bar{\mathbf{x}}+\bar{\mathbf{s}}\right),
𝐗¯Δ𝐬¯+𝐒¯Δ𝐱¯=νμ𝐞𝐱¯𝐬¯,\displaystyle\bar{\mathbf{X}}\Delta\bar{\mathbf{s}}+\bar{\mathbf{S}}\Delta\bar{\mathbf{x}}=\nu\mu\mathbf{e}-\bar{\mathbf{x}}\circ\bar{\mathbf{s}},

which is Eq. (38) with \mathbf{D}=\mathbf{I} and with \bar{\mathbf{x}}=\mathbf{D}^{-T}\mathbf{x},\bar{\mathbf{s}}=\mathbf{D}\mathbf{s} in place of \mathbf{x} and \mathbf{s}. Using Theorem 4 together with Proposition 3 c), which shows that the central path neighborhood \mathcal{N}_{2}(\gamma) is invariant under the scaling, we conclude that Theorem 4 also holds with \mathbf{D}\in\mathcal{G} in place of \mathbf{D}=\mathbf{I}.

Then, similar to Proposition 4, we obtain the next proposition immediately.

Proposition 5. Let the parameters satisfy

4(γ2+δ2)(13γ)2(1δ2(k+1))1<γ,γ(0,1/3),δ(0,1).\frac{4\left(\gamma^{2}+\delta^{2}\right)}{\left(1-3\gamma\right)^{2}}\left(1-\frac{\delta}{\sqrt{2(k+1)}}\right)^{-1}<\gamma,\gamma\in(0,1/3),\delta\in(0,1). (102)

Assume that (𝐱0,𝐲0,𝐬0,κ0,τ0)𝒩2(γ)(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})\in\mathcal{N}_{2}(\gamma), 𝐃𝒢\mathbf{D}\in\mathcal{G}. Then, in Algorithm 1,

(𝐱,𝐲,𝐬,κ,τ)𝒩2(γ),\displaystyle(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\in\mathcal{N}_{2}(\gamma), (103)
𝐫p(𝐱,𝐲,𝐬,κ,τ)=(1δ/2(k+1))j𝐫p(𝐱0,𝐲0,𝐬0,κ0,τ0),\displaystyle\left\|\mathbf{r}_{p}(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\right\|=\left(1-\delta/\sqrt{2(k+1)}\right)^{j}\left\|\mathbf{r}_{p}(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})\right\|,
𝐫d(𝐱,𝐲,𝐬,κ,τ)=(1δ/2(k+1))j𝐫d(𝐱0,𝐲0,𝐬0,κ0,τ0),\displaystyle\left\|\mathbf{r}_{d}(\mathbf{x},\mathbf{y},\mathbf{s},\kappa,\tau)\right\|=\left(1-\delta/\sqrt{2(k+1)}\right)^{j}\left\|\mathbf{r}_{d}(\mathbf{x}_{0},\mathbf{y}_{0},\mathbf{s}_{0},\kappa_{0},\tau_{0})\right\|,
μ(𝐱,𝐬,κ,τ)=(1δ/2(k+1))jμ(𝐱0,𝐬0,κ0,τ0),\displaystyle\mu\left(\mathbf{x},\mathbf{s},\kappa,\tau\right)=\left(1-\delta/\sqrt{2(k+1)}\right)^{j}\mu\left({{\mathbf{x}}_{0}},{{\mathbf{s}}_{0}},{{\kappa}_{0}},{{\tau}_{0}}\right),

where jj is the iteration number.

Proposition 5 shows that the worst case iteration complexity of Algorithm 1 is O\left(k^{1/2}\log\left(\epsilon^{-1}\right)\right) for the MZ family of searching directions, when the parameters \gamma and \delta satisfy Eq. (102). This equals the best known complexity of feasible IPMs. More precisely, the algorithm takes \log(\epsilon)/\log\left(1-\delta/\sqrt{2(k+1)}\right) iterations to reduce the primal residual, dual residual, and complementary gap by a factor of \epsilon.

4 Analysis of warm starting

In this section, we study the conditions under which warm starting improves the worst case iteration bound compared with cold starting for IPMs for SOCP.

The warm starting scheme investigated is modified from a scheme presented by Skajaa et al. [2013], which initializes infeasible IPMs for SOCP with a linear combination of the optimal solution of a similar, previously solved problem and the cold starting point. They show that the scheme can improve the worst case iteration complexity under certain conditions in linear programming, and generalize the scheme to SOCP. We extend the scheme by using an inexact solution of the previous problem instead of the optimal solution, which allows the previous problem to be solved in only a few iterations, saving computation in scenarios such as successive convexification, e.g., [Szmuk et al., 2020, Miao et al., 2021].

We first introduce the warm starting scheme.

Consider two related SOCP problems defined by Eq. (2), where the problem dimensions and cone constraints are unchanged. Denote the current SOCP problem by \mathcal{P} and its parameters by \mathbf{A},\mathbf{b},\mathbf{c}. The previous problem is \mathcal{P}_{o} with parameters \mathbf{A}_{o},\mathbf{b}_{o},\mathbf{c}_{o}, an inexact solution of which is (\mathbf{x}_{o},\mathbf{y}_{o},\mathbf{s}_{o}). The differences of the problem parameters are defined as

Δ𝐀=𝐀𝐀o,Δ𝐛=𝐛𝐛o,Δ𝐜=𝐜𝐜o.\Delta\mathbf{A}=\mathbf{A}-\mathbf{A}_{o},\ \Delta\mathbf{b}=\mathbf{b}-\mathbf{b}_{o},\ \Delta\mathbf{c}=\mathbf{c}-\mathbf{c}_{o}. (104)

The cold starting point for infeasible IPMs based on the HSD model (e.g., Algorithm 1) is defined as

\mathbf{q}_{c}=(\mathbf{x}_{c},\mathbf{y}_{c},\mathbf{s}_{c},\kappa_{c},\tau_{c})=(\mathbf{e},0,\mathbf{e},1,1), (105)

where 𝐪\mathbf{q} denotes the solution as a whole for simplicity. We also have μ(𝐪c)=1\mu(\mathbf{q}_{c})=1.

Then, the warm starting point 𝐪w=(𝐱w,𝐲w,𝐬w,κw,τw)\mathbf{q}_{w}=(\mathbf{x}_{w},\mathbf{y}_{w},\mathbf{s}_{w},\kappa_{w},\tau_{w}) is computed as

𝐱w=ω𝐱o+(1ω)𝐞\displaystyle{{\mathbf{x}}_{w}}=\omega{{\mathbf{x}}_{o}}+\left(1-\omega\right)\mathbf{e} (106)
𝐲w=ω𝐲o\displaystyle{{\mathbf{y}}_{w}}=\omega{{\mathbf{y}}_{o}}
𝐬w=ω𝐬o+(1ω)𝐞\displaystyle{{\mathbf{s}}_{w}}=\omega{{\mathbf{s}}_{o}}+\left(1-\omega\right)\mathbf{e}
κw=(𝐱w)T𝐬w/k\displaystyle{{\kappa}_{w}}={{\left({{\mathbf{x}}_{w}}\right)}^{T}}{{\mathbf{s}}_{w}}/k
τw=1,\displaystyle{{\tau}_{w}}=1,

where the coefficient ω[0,1]\omega\in[0,1].
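A minimal sketch of the warm starting rule Eq. (105)-(106) is given below, assuming \mathbf{x}_{o} and \mathbf{s}_{o} are stored as stacked cone blocks; the cone dimensions and the previous inexact solution used here are placeholder inputs.

```python
import numpy as np

def identity_element(dims):
    """Stack the Jordan identity e = (1, 0, ..., 0) of each second-order cone."""
    e = np.zeros(sum(dims))
    offset = 0
    for n in dims:
        e[offset] = 1.0
        offset += n
    return e

def warm_start(x_o, y_o, s_o, omega, dims):
    """Warm starting point of Eq. (106), combining the previous inexact
    solution with the cold start q_c = (e, 0, e, 1, 1) of Eq. (105)."""
    k = len(dims)                      # number of cones
    e = identity_element(dims)
    x_w = omega * x_o + (1.0 - omega) * e
    y_w = omega * y_o
    s_w = omega * s_o + (1.0 - omega) * e
    kappa_w = x_w @ s_w / k
    tau_w = 1.0
    return x_w, y_w, s_w, kappa_w, tau_w

# Toy usage with two 3-dimensional cones (placeholder data):
dims = [3, 3]
x_o = np.array([2.0, 0.5, 0.3, 1.5, 0.2, -0.1])
s_o = np.array([1.0, -0.2, 0.1, 0.8, 0.1, 0.3])
y_o = np.zeros(4)
print(warm_start(x_o, y_o, s_o, omega=0.9, dims=dims))
```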

Then, we investigate the number of iterations required when the warm starting scheme is used in Algorithm 1, which has the best known worst case iteration complexity among IPMs for SOCP. Considering that the initial primal residual, dual residual, and complementary gap differ between cold and warm starting, we use a unified stopping criterion:

𝐫p(𝐪)ϵu,𝐫d(𝐪)ϵu,μ(𝐪)ϵu.\left\|\mathbf{r}_{p}(\mathbf{q})\right\|\leq\epsilon_{u},\left\|\mathbf{r}_{d}(\mathbf{q})\right\|\leq\epsilon_{u},\mu\left(\mathbf{q}\right)\leq\epsilon_{u}. (107)

Then, according to Proposition 5, when the parameters \gamma and \delta satisfy Eq. (102), the number of iterations required is

N_{it}=-\log\left(\max\left(\left\|\mathbf{r}_{p}(\mathbf{q}_{w})\right\|,\left\|\mathbf{r}_{d}(\mathbf{q}_{w})\right\|,\mu(\mathbf{q}_{w})\right)/\epsilon_{u}\right)/\log\left(1-\delta/\sqrt{2(k+1)}\right). (108)

Obviously, a sufficient condition for the warm starting to improve the iteration bound is

𝐫p(𝐪w)cw𝐫p(𝐪c),𝐫d(𝐪w)cw𝐫d(𝐪c),μ(𝐪w)cw,𝐪w𝒩2(γ),\left\|\mathbf{r}_{p}(\mathbf{q}_{w})\right\|\leq c_{w}\left\|\mathbf{r}_{p}(\mathbf{q}_{c})\right\|,\left\|\mathbf{r}_{d}(\mathbf{q}_{w})\right\|\leq c_{w}\left\|\mathbf{r}_{d}(\mathbf{q}_{c})\right\|,\mu(\mathbf{q}_{w})\leq c_{w},\mathbf{q}_{w}\in\mathcal{N}_{2}(\gamma), (109)

where c_{w}\in(0,1) is a constant. Under this condition, the number of iterations is reduced by \log(c_{w})/\log\left(1-\delta/\sqrt{2(k+1)}\right) compared with cold starting.
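The following sketch evaluates Eq. (108) for illustrative parameter values; the residuals, \delta, k, \epsilon_{u}, and c_{w} below are placeholders, and unit initial residuals are assumed for the cold start.

```python
import numpy as np

def iterations_required(r_p, r_d, mu, eps_u, delta, k):
    """Worst-case iteration count of Eq. (108) to meet the stopping
    criterion Eq. (107)."""
    worst = max(r_p, r_d, mu)
    return -np.log(worst / eps_u) / np.log(1.0 - delta / np.sqrt(2.0 * (k + 1)))

k, delta, eps_u = 100, 0.1, 1e-8
n_cold = iterations_required(1.0, 1.0, 1.0, eps_u, delta, k)  # cold start: residuals and gap equal 1
c_w = 1e-2                                                    # warm start improves all three by c_w
n_warm = iterations_required(c_w, c_w, c_w, eps_u, delta, k)
print(f"cold: {n_cold:.0f} iterations, warm: {n_warm:.0f}, "
      f"saved: {n_cold - n_warm:.0f} = log(c_w)/log(1 - delta/sqrt(2(k+1)))")
```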

In the rest of this section, we study the conditions for warm starting to improve the initial primal residual, dual residual, and complementary gap by at least a constant factor c_{w}\in(0,1), and to keep the initial point in the central path neighborhood.

4.1 The conditions to improve the primal residual

In this subsection, we analyze the primal residual 𝐫p(𝐪w)\mathbf{r}_{p}(\mathbf{q}_{w}).

Assume that

\displaystyle\left\|\Delta\mathbf{A}\right\|\left\|\mathbf{x}_{o}\right\|\leq c_{A}\left\|\mathbf{r}_{p}(\mathbf{q}_{c})\right\|, (110)
\displaystyle\left\|\Delta\mathbf{b}\right\|\leq c_{b}\left\|\mathbf{r}_{p}(\mathbf{q}_{c})\right\|,
\displaystyle\left\|\mathbf{r}_{p}^{o}(\mathbf{q}_{o})\right\|\leq c_{p}\left\|\mathbf{r}_{p}(\mathbf{q}_{c})\right\|,
\displaystyle c_{A}+c_{b}+c_{p}\leq 1.

Combining the assumptions with the definition Eq. (106), the primal residual satisfies

𝐫p(𝐪w)=𝐀𝐱w𝐛τw=ω(𝐀𝐱o𝐛)+(1ω)𝐫p(𝐪c)\displaystyle{{\mathbf{r}}_{p}}({{\mathbf{q}}_{w}})=\mathbf{A}{{\mathbf{x}}_{w}}-\mathbf{b}{{\tau}_{w}}=\omega\left(\mathbf{A}{{\mathbf{x}}_{o}}-\mathbf{b}\right)+\left(1-\omega\right){{\mathbf{r}}_{p}}({{\mathbf{q}}_{c}}) (111)
=ω((𝐀o+Δ𝐀)𝐱o(𝐛o+Δ𝐛))+(1ω)𝐫p(𝐪c)\displaystyle=\omega\left(\left({{\mathbf{A}}_{o}}+\Delta\mathbf{A}\right){{\mathbf{x}}_{o}}-\left({{\mathbf{b}}_{o}}+\Delta\mathbf{b}\right)\right)+\left(1-\omega\right){{\mathbf{r}}_{p}}({{\mathbf{q}}_{c}})
\displaystyle=\omega\mathbf{r}_{p}^{o}({{\mathbf{q}}_{o}})+\omega\left(\Delta\mathbf{A}{{\mathbf{x}}_{o}}-\Delta\mathbf{b}\right)+\left(1-\omega\right){{\mathbf{r}}_{p}}({{\mathbf{q}}_{c}}).

Consequently,

𝐫p(𝐪w)(1ω(1cAcbcp))𝐫p(𝐪c).\left\|{{\mathbf{r}}_{p}}({{\mathbf{q}}_{w}})\right\|\leq\left(1-\omega\left(1-{{c}_{A}}-{{c}_{b}}-{c_{p}}\right)\right)\|{{\mathbf{r}}_{p}}({\mathbf{q}}_{c})\|. (112)

Then, we obtain

𝐫p(𝐪w)cw𝐫p(𝐪c), if cw(1ω(1cAcbcp)).\left\|\mathbf{r}_{p}(\mathbf{q}_{w})\right\|\leq c_{w}\left\|\mathbf{r}_{p}(\mathbf{q}_{c})\right\|,\text{ if }c_{w}\geq\left(1-\omega\left(1-{{c}_{A}}-{{c}_{b}}-{c_{p}}\right)\right). (113)

The result shows that when the previous primal residual and the differences of the problem parameters are small, warm starting with the coefficient \omega close to 1 reduces the primal residual compared with cold starting.
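As a numerical illustration of Eq. (111) and the bound Eq. (112) (random placeholder data, one dense block treated as a single cone), the sketch below forms the warm-start primal residual and compares it against the triangle-inequality bound.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, omega = 5, 8, 0.9

# Previous problem and an inexact primal point x_o (placeholder data).
A_o = rng.standard_normal((m, n))
x_o = rng.standard_normal(n)
b_o = A_o @ x_o + 1e-3 * rng.standard_normal(m)   # small previous residual r_p^o

# Current problem: a small perturbation of the previous one, Eq. (104).
dA = 1e-3 * rng.standard_normal((m, n))
db = 1e-3 * rng.standard_normal(m)
A, b = A_o + dA, b_o + db

e = np.zeros(n)
e[0] = 1.0                                        # single-cone identity (assumption)
x_w = omega * x_o + (1.0 - omega) * e             # Eq. (106), tau_w = 1
r_p_w = A @ x_w - b                               # warm-start primal residual
r_p_c = A @ e - b                                 # cold-start primal residual

# Triangle-inequality bound underlying Eq. (112), with the constants of Eq. (110).
bound = (omega * (np.linalg.norm(A_o @ x_o - b_o)
                  + np.linalg.norm(dA, 2) * np.linalg.norm(x_o)
                  + np.linalg.norm(db))
         + (1.0 - omega) * np.linalg.norm(r_p_c))
print(np.linalg.norm(r_p_w), "<=", bound)
```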

4.2 The conditions to improve the dual residual

In this subsection, we analyze the dual residual 𝐫d(𝐪w)\mathbf{r}_{d}(\mathbf{q}_{w}).

Assume that

\displaystyle\left\|\Delta\mathbf{A}^{T}\right\|\left\|\mathbf{y}_{o}\right\|\leq c_{AT}\left\|\mathbf{r}_{d}(\mathbf{q}_{c})\right\|, (114)
\displaystyle\left\|\Delta\mathbf{c}\right\|\leq c_{c}\left\|\mathbf{r}_{d}(\mathbf{q}_{c})\right\|,
\displaystyle\left\|\mathbf{r}_{d}^{o}(\mathbf{q}_{o})\right\|\leq c_{d}\left\|\mathbf{r}_{d}(\mathbf{q}_{c})\right\|,
\displaystyle c_{d}+c_{AT}+c_{c}\leq 1.

Then, using the definition Eq. (106),

𝐫d(𝐪w)=𝐀T𝐲w𝐬w+τw𝐜\displaystyle{{\mathbf{r}}_{d}}({{\mathbf{q}}_{w}})=-{{\mathbf{A}}^{T}}{{\mathbf{y}}_{w}}-{{\mathbf{s}}_{w}}+{{\tau}_{w}}\mathbf{c} (115)
=ω(𝐀o+Δ𝐀)T𝐲oω𝐬o(1ω)𝐞+ω(𝐜o+Δ𝐜)+(1ω)𝐜,\displaystyle=-\omega{{\left({{\mathbf{A}}_{o}}+\Delta\mathbf{A}\right)}^{T}}{{\mathbf{y}}_{o}}-\omega{{\mathbf{s}}_{o}}-\left(1-\omega\right)\mathbf{e}+\omega\left({{\mathbf{c}}_{o}}+\Delta\mathbf{c}\right)+\left(1-\omega\right)\mathbf{c},
=ω𝐫do(𝐪o)+ω(Δ𝐀T𝐲o+Δ𝐜)+(1ω)𝐫d(𝐪c),\displaystyle=\omega\mathbf{r}_{d}^{o}({{\mathbf{q}}_{o}})+\omega\left(-\Delta{{\mathbf{A}}^{T}}{{\mathbf{y}}_{o}}+\Delta\mathbf{c}\right)+\left(1-\omega\right){{\mathbf{r}}_{d}}({{\mathbf{q}}_{c}}),
\displaystyle\left\|{{\mathbf{r}}_{d}}({{\mathbf{q}}_{w}})\right\|\leq\left(1-\omega\left(1-{{c}_{d}}-{{c}_{AT}}-{{c}_{c}}\right)\right)\|{{\mathbf{r}}_{d}}({{\mathbf{q}}_{c}})\|.

Then, we obtain

𝐫d(𝐪w)cw𝐫d(𝐪c), if cw(1ω(1cdcATcc)).\left\|\mathbf{r}_{d}(\mathbf{q}_{w})\right\|\leq c_{w}\left\|\mathbf{r}_{d}(\mathbf{q}_{c})\right\|,\text{ if }c_{w}\geq\left(1-\omega\left(1-{{c}_{d}}-{{c}_{AT}}-{{c}_{c}}\right)\right). (116)

The result shows that when the previous dual residual and the differences of the problem parameters are small, warm starting with the coefficient \omega close to 1 reduces the dual residual compared with cold starting.

4.3 The conditions to improve the complementary gap

In this subsection, we analyze the central path coefficient \mu(\mathbf{q}_{w}), which is proportional to the complementary gap.

Assume that

\displaystyle\mu({{\mathbf{q}}_{o}})\leq{{c}_{\mu}}\mu({{\mathbf{q}}_{c}}), (117)
\displaystyle\left(1-\omega\right)\left(\mathbf{e}^{T}({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}})/k+1\right)\leq{{c}_{xs}},
\displaystyle{{c}_{\mu}}+{{c}_{xs}}\leq 1.

Then, we have

(𝐱w)T𝐬w=(ω𝐱o+(1ω)𝐞)T(ω𝐬o+(1ω)𝐞)\displaystyle{{\left({{\mathbf{x}}_{w}}\right)}^{T}}{{\mathbf{s}}_{w}}={{\left(\omega{{\mathbf{x}}_{o}}+\left(1-\omega\right)\mathbf{e}\right)}^{T}}\left(\omega{{\mathbf{s}}_{o}}+\left(1-\omega\right)\mathbf{e}\right) (118)
=ω2(𝐱o)T𝐬o+ω(1ω)𝐞T(𝐱o+𝐬o)+(1ω)2k.\displaystyle={{\omega}^{2}}{{\left({{\mathbf{x}}_{o}}\right)}^{T}}{{\mathbf{s}}_{o}}+\omega\left(1-\omega\right){{\mathbf{e}}^{T}}\left({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}}\right)+{{\left(1-\omega\right)}^{2}}k.

Due to the definition Eq. (106),

μ(𝐪w)=(𝐱w)T𝐬w+κwτwk+1=(𝐱w)T𝐬wk\displaystyle\mu({{\mathbf{q}}_{w}})=\frac{{{\left({{\mathbf{x}}_{w}}\right)}^{T}}{{\mathbf{s}}_{w}}+{{\kappa}_{w}}{{\tau}_{w}}}{k+1}=\frac{{{\left({{\mathbf{x}}_{w}}\right)}^{T}}{{\mathbf{s}}_{w}}}{k} (119)
=ω2μ(𝐪o)+ω(1ω)k𝐞T(𝐱o+𝐬o)+(1ω)2\displaystyle={{\omega}^{2}}\mu({{\mathbf{q}}_{o}})+\frac{\omega\left(1-\omega\right)}{k}{{\mathbf{e}}^{T}}\left({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}}\right)+{{\left(1-\omega\right)}^{2}}
cμ+(1ω)(𝐞T(𝐱o+𝐬o)/k+1)\displaystyle\leq{{c}_{\mu}}+\left(1-\omega\right)(\mathbf{e}^{T}({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}})/k+1)
\displaystyle\leq{{c}_{\mu}}+{{c}_{xs}}=\left({{c}_{\mu}}+{{c}_{xs}}\right)\mu({{\mathbf{q}}_{c}}).

Then, we obtain

μ(𝐪w)cwμ(𝐪c), if cwcμ+cxs.\mu(\mathbf{q}_{w})\leq c_{w}\mu(\mathbf{q}_{c}),\text{ if }c_{w}\geq{{c}_{\mu}}+{{c}_{xs}}. (120)

The second inequality in Eq. (117) requires

ω1cxs/(𝐞T(𝐱o+𝐬o)/k+1),\omega\geq 1-c_{xs}/(\mathbf{e}^{T}({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}})/k+1), (121)

that show ω\omega should be close to 1.

The result shows that when the previous complementary gap is small, warm starting with the coefficient ω\omega close to 1 can reduce the complementary gap compared with cold starting.
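The expansion of \mu(\mathbf{q}_{w}) in Eq. (119) is an exact identity, which can be checked numerically; the sketch below uses three cones with placeholder vectors, where \mathbf{e}^{T}(\mathbf{x}_{o}+\mathbf{s}_{o}) sums the first component of each cone block.

```python
import numpy as np

def identity_element(dims):
    e = np.zeros(sum(dims))
    e[np.cumsum([0] + dims[:-1])] = 1.0   # first entry of each cone block
    return e

rng = np.random.default_rng(2)
dims, omega = [3, 4, 2], 0.8
k, e = len(dims), identity_element(dims)
x_o = rng.standard_normal(sum(dims))
s_o = rng.standard_normal(sum(dims))

x_w = omega * x_o + (1 - omega) * e
s_w = omega * s_o + (1 - omega) * e
mu_w = x_w @ s_w / k                      # Eq. (119), using kappa_w * tau_w = x_w^T s_w / k

rhs = (omega**2 * (x_o @ s_o) / k
       + omega * (1 - omega) / k * (e @ (x_o + s_o))
       + (1 - omega)**2)
print(np.isclose(mu_w, rhs))              # True: the expansion in Eq. (119) is exact
```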

4.4 The conditions to maintain centrality

Finally, we study the condition ensuring \mathbf{q}_{w}\in\mathcal{N}_{2}(\gamma), which is required for the iteration count Eq. (108) to hold.

Because Eq. (99) shows that the centrality is improved during the solution process of Algorithm 1, we can assume that \mathbf{q}_{o}\in\mathcal{N}_{2}(\gamma) and d_{2}(\mathbf{q}_{o})\leq\gamma_{o}\mu(\mathbf{q}_{o})<\gamma\mu(\mathbf{q}_{o}). Besides, the cold starting point is perfectly centered: \mathbf{q}_{c}\in\mathcal{N}_{2}(\gamma) and d_{2}(\mathbf{q}_{c})=0. Since \text{int}(\mathcal{K}) is convex, by the definition Eq. (106), \mathbf{q}_{w}\in\text{int}(\mathcal{K}).

Then, in order to ensure \mathbf{q}_{w}\in\mathcal{N}_{2}(\gamma), we must ensure d_{2}(\mathbf{q}_{w})<\gamma\mu(\mathbf{q}_{w}). To investigate d_{2}(\mathbf{q}_{w}), we first study

𝐰xw,sw=𝐓xw𝐬w.\mathbf{w}_{x_{w},s_{w}}=\mathbf{T}_{x_{w}}\mathbf{s}_{w}. (122)

Due to Eq. (45),

𝐓x=mat(𝐱)𝐔x,𝐔x𝐞=0.\mathbf{T}_{x}=\text{mat}(\mathbf{x})-\mathbf{U}_{x},\mathbf{U}_{x}\mathbf{e}=0. (123)

Denote

𝐔~=𝐔xwω𝐔xo.\tilde{\mathbf{U}}=\mathbf{U}_{x_{w}}-\omega\mathbf{U}_{x_{o}}. (124)

Due to Eq. (22), Eq. (45), and Eq. (106), we have

𝐔xw(i)ω𝐔xo(i)=(000ρ~i𝐏xo(i)),\mathbf{U}_{x_{w}^{(i)}}-\omega\mathbf{U}_{x_{o}^{(i)}}=\begin{pmatrix}0&0\\ 0&\tilde{\rho}_{i}\mathbf{P}_{x_{o}^{(i)}}\end{pmatrix}, (125)

where

ρ~i=(xw(i))1ω(xo(i))1βxw(i)+ωβxo(i)\displaystyle\tilde{\rho}_{i}=(x_{w}^{(i)})_{1}-\omega(x_{o}^{(i)})_{1}-\beta_{x_{w}^{(i)}}+\omega\beta_{x_{o}^{(i)}} (126)
=(xw(i))1ω(xo(i))1(xw(i))12(xw(i))2:n2+ω(xo(i))12(xo(i))2:n2\displaystyle=(x_{w}^{(i)})_{1}-\omega(x_{o}^{(i)})_{1}-\sqrt{(x_{w}^{(i)})_{1}^{2}-\|(x_{w}^{(i)})_{2:n}\|^{2}}+\omega\sqrt{(x_{o}^{(i)})_{1}^{2}-\|(x_{o}^{(i)})_{2:n}\|^{2}}
=(1ω)(12ω(xo(i))1+1ωβxw(i)+ωβxo(i)).\displaystyle=(1-\omega)\left(1-\frac{2\omega(x_{o}^{(i)})_{1}+1-\omega}{\beta_{x_{w}^{(i)}}+\omega\beta_{x_{o}^{(i)}}}\right).

Because Eq. (47) shows 𝐏xo1\left\|\mathbf{P}_{x_{o}}\right\|\leq 1, we obtain

\left\|\tilde{\mathbf{U}}\right\|\leq(1-\omega)\rho,\ \rho=\max_{i}\left|1-\frac{2\omega(x_{o}^{(i)})_{1}+1-\omega}{\beta_{x_{w}^{(i)}}+\omega\beta_{x_{o}^{(i)}}}\right|. (127)

Then, using the definition Eq. (106),

𝐰xw,sw=𝐓xw𝐬w=𝐱w𝐬w𝐔xw𝐬w\displaystyle\mathbf{w}_{x_{w},s_{w}}=\mathbf{T}_{x_{w}}\mathbf{s}_{w}=\mathbf{x}_{w}\circ\mathbf{s}_{w}-\mathbf{U}_{x_{w}}\mathbf{s}_{w} (128)
=ω2𝐱o𝐬o+ω(1ω)(𝐱o+𝐬o)+(1ω)2𝐞𝐔xw(ω𝐬o+(1ω)𝐞)\displaystyle=\omega^{2}\mathbf{x}_{o}\circ\mathbf{s}_{o}+\omega(1-\omega)(\mathbf{x}_{o}+\mathbf{s}_{o})+(1-\omega)^{2}\mathbf{e}-\mathbf{U}_{x_{w}}(\omega\mathbf{s}_{o}+(1-\omega)\mathbf{e})
=ω2𝐱o𝐬o+ω(1ω)(𝐱o+𝐬o)+(1ω)2𝐞(ω𝐔xo+𝐔~)ω𝐬o\displaystyle=\omega^{2}\mathbf{x}_{o}\circ\mathbf{s}_{o}+\omega(1-\omega)(\mathbf{x}_{o}+\mathbf{s}_{o})+(1-\omega)^{2}\mathbf{e}-(\omega\mathbf{U}_{x_{o}}+\tilde{\mathbf{U}})\omega\mathbf{s}_{o}
=ω2(mat(𝐱o)𝐔xo)𝐬o+ω(1ω)(𝐱o+𝐬o)+(1ω)2𝐞ω𝐔~𝐬o\displaystyle=\omega^{2}(\text{mat}(\mathbf{x}_{o})-\mathbf{U}_{x_{o}})\mathbf{s}_{o}+\omega(1-\omega)(\mathbf{x}_{o}+\mathbf{s}_{o})+(1-\omega)^{2}\mathbf{e}-\omega\tilde{\mathbf{U}}\mathbf{s}_{o}
=ω2𝐰xo,so+ω(1ω)(𝐱o+𝐬o)+(1ω)2𝐞ω𝐔~𝐬o.\displaystyle=\omega^{2}\mathbf{w}_{x_{o},s_{o}}+\omega(1-\omega)(\mathbf{x}_{o}+\mathbf{s}_{o})+(1-\omega)^{2}\mathbf{e}-\omega\tilde{\mathbf{U}}\mathbf{s}_{o}.

Combining the result with Eq. (119), we obtain

\displaystyle{{d}_{2}}\left({{\mathbf{q}}_{w}}\right)=\sqrt{2}\left\|{{\mathbf{w}}_{{{x}_{w}},{{s}_{w}}}}-\mu({{\mathbf{q}}_{w}})\mathbf{e}\right\| (129)
\displaystyle\leq\sqrt{2}{{\omega}^{2}}\left\|{{\mathbf{w}}_{{{x}_{o}},{{s}_{o}}}}-\mu({{\mathbf{q}}_{o}})\mathbf{e}\right\|+\sqrt{2}\left\|\omega(1-\omega)\left(({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}})-\frac{{{\mathbf{e}}^{T}}}{k}\left({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}}\right)\mathbf{e}\right)-\omega\widetilde{\mathbf{U}}{{\mathbf{s}}_{o}}\right\|
\displaystyle\leq{{\omega}^{2}}{{d}_{2}}\left({{\mathbf{q}}_{o}}\right)+\sqrt{2}\omega(1-\omega)\left(\left\|({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}})-\psi_{o}\mathbf{e}\right\|+\rho\left\|{{\mathbf{s}}_{o}}\right\|\right),

where

ψo=𝐞Tk(𝐱o+𝐬o).\psi_{o}=\frac{{{\mathbf{e}}^{T}}}{k}\left({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}}\right). (130)

Then, we obtain a sufficient condition for d2(𝐪w)<γμ(𝐪w)d_{2}(\mathbf{q}_{w})<\gamma\mu(\mathbf{q}_{w}) by the following derivation:

\displaystyle{{d}_{2}}\left({{\mathbf{q}}_{w}}\right)\leq\gamma\mu({{\mathbf{q}}_{w}}) (131)
\displaystyle\Leftarrow{{\omega}^{2}}{{d}_{2}}\left({{\mathbf{q}}_{o}}\right)+\sqrt{2}\omega(1-\omega)\left(\left\|({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}})-{{\psi}_{o}}\mathbf{e}\right\|+\rho\left\|{{\mathbf{s}}_{o}}\right\|\right)
\displaystyle\leq\gamma\left({{\omega}^{2}}\mu({{\mathbf{q}}_{o}})+\omega\left(1-\omega\right){{\psi}_{o}}+{{\left(1-\omega\right)}^{2}}\right)
\displaystyle\Leftarrow\omega{{\gamma}_{o}}\mu({{\mathbf{q}}_{o}})+\sqrt{2}(1-\omega)\left(\left\|({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}})-{{\psi}_{o}}\mathbf{e}\right\|+\rho\left\|{{\mathbf{s}}_{o}}\right\|\right)\leq\gamma\left(\omega\mu({{\mathbf{q}}_{o}})+\left(1-\omega\right){{\psi}_{o}}\right)
\displaystyle\Leftarrow{{\xi}_{o}}\leq\omega\left({{\xi}_{o}}+\left(\gamma-{{\gamma}_{o}}\right)\mu({{\mathbf{q}}_{o}})\right),

where

{{\xi}_{o}}=\sqrt{2}\left(\left\|({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}})-{{\psi}_{o}}\mathbf{e}\right\|+\rho\left\|{{\mathbf{s}}_{o}}\right\|\right)-\gamma{{\psi}_{o}}. (132)

Then, a sufficient condition for \mathbf{q}_{w}\in\mathcal{N}_{2}(\gamma) is

ω[0,1] if ξo0\displaystyle\omega\in[0,1]\text{ if }{{\xi}_{o}}\leq 0 (133)
ω[ξoξo+(γγo)μ(𝐪o),1], if ξo>0.\displaystyle\omega\in\left[\frac{{{\xi}_{o}}}{{{\xi}_{o}}+\left(\gamma-{{\gamma}_{o}}\right)\mu({{\mathbf{q}}_{o}})},1\right],\text{ if }{{\xi}_{o}}>0.

The result shows that when \mathbf{q}_{o} is close to the boundary of the cone constraints, the coefficient \omega required to ensure \mathbf{q}_{w}\in\mathcal{N}_{2}(\gamma) may be very close to 1. The reason is twofold: firstly, near the boundary, the corresponding elements of \mathbf{x}_{o} and \mathbf{s}_{o} may differ by many orders of magnitude, making \left\|({{\mathbf{x}}_{o}}+{{\mathbf{s}}_{o}})-{{\psi}_{o}}\mathbf{e}\right\|\gg 0; secondly, by Eq. (127), when \mathbf{x}_{o} is close to the boundary, \rho may be very large, as illustrated numerically below. In that case, only a small correction to the previous solution can be added. Typically, if the exact solution lies on the boundary, the iterates approach the boundary as the number of iterations increases. Consequently, an early-stage inexact solution of the previous problem is preferred in the warm starting scheme: less computation is spent on solving the previous problem, and it is easier to maintain centrality. This is similar to [Yildirim and Wright, 2002], which also prefers early-stage inexact solutions for warm starting of feasible IPMs for linear programming.
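The growth of \rho near the boundary can be illustrated with a single cone (placeholder values): as \beta_{x_{o}}\to 0 with the first component fixed, the magnitude of the bracketed expression in Eq. (127) blows up.

```python
import numpy as np

def beta(x):
    """beta_x = sqrt(x_1^2 - ||x_{2:n}||^2) for one second-order cone."""
    return np.sqrt(x[0]**2 - np.linalg.norm(x[1:])**2)

omega = 0.99
for margin in (0.5, 1e-2, 1e-4, 1e-6):
    # x_o inside the cone, at distance `margin` from the boundary in beta
    x_o = np.array([1.0, np.sqrt(1.0 - margin**2)])
    e = np.array([1.0, 0.0])
    x_w = omega * x_o + (1.0 - omega) * e
    rho = abs(1.0 - (2.0 * omega * x_o[0] + 1.0 - omega)
              / (beta(x_w) + omega * beta(x_o)))
    print(f"beta_xo = {beta(x_o):.1e}, rho = {rho:.2f}")
```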

However, when the differences of the problem parameters defined in Eq. (104) are very small and the previous solution is well centered, little correction is required, so the warm starting scheme with \omega very close to 1 is applicable. For example, in some successive convexification applications, the increments of \mathbf{A}, \mathbf{b}, and \mathbf{c} between problems depend on the increment of \mathbf{x}, and the Jacobian matrix is bounded. In the late stage of these applications, the differences of the problem parameters are small since the increment of \mathbf{x} is small. As a result, warm starting with \omega very close to 1 may also work well.

In summary, when the differences of the problem parameters are small, and the previous solution is well centered with small primal residual, dual residual, and complementary gap, the warm starting scheme with \omega close to 1 can improve the worst case iteration complexity compared with cold starting.

5 Conclusion

This paper shows that an infeasible IPM for SOCP has O\left(k^{1/2}\log\left(\epsilon^{-1}\right)\right) iteration complexity, which is the same as the best known result for feasible IPMs. Compared with cold starting, a warm starting scheme can reduce the number of iterations required when the differences of the problem parameters are small and the previous solution is well centered with small primal residual, dual residual, and complementary gap.

References

  • Andersen et al. [2003] E. D. Andersen, C. Roos, and T. Terlaky. On implementing a primal-dual interior-point method for conic quadratic optimization. Mathematical Programming, 95(2):249–277, 2003.
  • Chen et al. [2022] Yushu Chen, Guangwen Yang, Lu Wang, Qingzhong Gan, Haipeng Chen, and Quanyong Xu. A fast algorithm for onboard atmospheric powered descent guidance. ArXiv, abs/2209.04157, 2022.
  • Faraut and Koranyi [1994] J. Faraut and A. Koranyi. Analysis on symmetric cones. Oxford Mathematical Monographs, 1994.
  • Faybusovich [1997a] Leonid Faybusovich. Linear systems in Jordan algebras and primal-dual interior-point algorithms. Journal of Computational and Applied Mathematics, 86(1):149–175, 1997a. ISSN 0377-0427. doi: 10.1016/S0377-0427(97)00153-2.
  • Faybusovich [1997b] Leonid Faybusovich. Euclidean Jordan algebras and interior-point algorithms. Positivity, 1:331–357, 12 1997b. doi: 10.1023/A:1009701824047.
  • Helmberg et al. [1996] Christoph Helmberg, Franz Rendl, Robert J. Vanderbei, and Henry Wolkowicz. An interior-point method for semidefinite programming. SIAM Journal on Optimization, 6(2):342–361, 1996. doi: 10.1137/0806020. URL https://doi.org/10.1137/0806020.
  • Kojima et al. [1989] Masakazu Kojima, Shinji Mizuno, and Akiko Yoshise. A polynomial-time algorithm for a class of linear complementarity problems. Mathematical programming, 44(1):1–26, 1989.
  • Kojima et al. [1997] Masakazu Kojima, Susumu Shindoh, and Shinji Hara. Interior-point methods for the monotone semidefinite linear complementarity problem in symmetric matrices. SIAM Journal on Optimization, 7(1):86–125, 1997. doi: 10.1137/S1052623494269035. URL https://doi.org/10.1137/S1052623494269035.
  • Miao et al. [2021] Xinyuan Miao, Yu Song, Zhiguo Zhang, and Shengping Gong. Successive convexification for ascent trajectory replanning of a multi-stage launch vehicle experiencing non-fatal dynamic faults. IEEE Transactions on Aerospace and Electronic Systems, pages 1–1, 2021. doi: 10.1109/TAES.2021.3133310.
  • Mohammad-Nezhad and Terlaky [2019] Ali Mohammad-Nezhad and Tamás Terlaky. Quadratic convergence to the optimal solution of second-order conic optimization without strict complementarity. Optimization Methods and Software, 34(5):960–990, 2019. doi: 10.1080/10556788.2018.1528249. URL https://doi.org/10.1080/10556788.2018.1528249.
  • Monteiro [1997] Renato D. C. Monteiro. Primal–dual path-following algorithms for semidefinite programming. SIAM Journal on Optimization, 7(3):663–678, 1997. doi: 10.1137/S1052623495293056.
  • Monteiro and Tsuchiya [2000] Renato D. C. Monteiro and T. Tsuchiya. Polynomial convergence of primal-dual algorithms for the second-order cone program based on the MZ-family of directions. Mathematical Programming, 88(1):61–83, 2000.
  • Monteiro and Zhang [1998] Renato DC Monteiro and Yin Zhang. A unified analysis for a class of long-step primal-dual path-following interior-point algorithms for semidefinite programming. Mathematical Programming, 81(3):281–299, 1998. doi: 10.1007/BF01580085.
  • Nemirovskii and Nesterov [1993] A. S. Nemirovskii and Y. E. Nesterov. Interior-point polynomial algorithms in convex programming. SIAM Studies in Applied Mathematics, Philadelphia, 1993.
  • Nesterov and Todd [1998] Yu. E. Nesterov and M. J. Todd. Primal-dual interior-point methods for self-scaled cones. SIAM Journal on Optimization, 8(2):324–364, 1998. doi: 10.1137/S1052623495290209. URL https://doi.org/10.1137/S1052623495290209.
  • Nesterov and Todd [1997] Yurii E Nesterov and Michael Jeremy Todd. Self-scaled barriers and interior-point methods for convex programming. Mathematics of Operations Research, 1997.
  • Peng et al. [2003] Jiming Peng, Cornelis Roos, and Tamás Terlaky. Self-Regularity: A New Paradigm for Primal-Dual Interior-Point Algorithms. Princeton University Press, 01 2003. ISBN 9781400825134. doi: 10.1515/9781400825134.
  • Potra and Wright [2000] Florian A. Potra and Stephen J. Wright. Interior-point methods. Journal of Computational and Applied Mathematics, 124(1):281–302, 2000. ISSN 0377-0427. doi: https://doi.org/10.1016/S0377-0427(00)00433-7. URL https://www.sciencedirect.com/science/article/pii/S0377042700004337. Numerical Analysis 2000. Vol. IV: Optimization and Nonlinear Equations.
  • Rangarajan [2006] Bharath Kumar Rangarajan. Polynomial convergence of infeasible-interior-point methods over symmetric cones. SIAM Journal on Optimization, 16(4):1211–1229, 2006. doi: 10.1137/040606557. URL https://doi.org/10.1137/040606557.
  • Skajaa et al. [2013] Anders Skajaa, Erling D Andersen, and Yinyu Ye. Warmstarting the homogeneous and self-dual interior point method for linear and conic quadratic problems. Mathematical Programming Computation, 5(1):1–25, 2013.
  • Szmuk et al. [2020] Michael Szmuk, Taylor P. Reynolds, and Behçet Açıkmeşe. Successive convexification for real-time six-degree-of-freedom powered descent guidance with state-triggered constraints. Journal of Guidance, Control, and Dynamics, 43(8):1399–1413, 2020. doi: 10.2514/1.G004549. URL https://doi.org/10.2514/1.G004549.
  • Terlaky et al. [2002] Tamás Terlaky, Cornelis Roos, and Jiming Peng. Self-regularity: A New Paradigm for Primal-dual Interior-point Algorithms (Princeton Series in Applied Mathematics). Princeton University Press, 2002.
  • Tsuchiya [1999] Takashi Tsuchiya. A convergence analysis of the scaling-invariant primal-dual path-following algorithms for second-order cone programming. Optimization Methods and Software, 11(1-4):141–182, 1999. doi: 10.1080/10556789908805750.
  • Wang [2003] Bixiang Wang. Implementation of interior point methods for second order conic optimization. PhD thesis, McMaster University, 2003.
  • Yildirim and Wright [2002] E. Alper Yildirim and Stephen J. Wright. Warm-start strategies in interior-point methods for linear programming. SIAM Journal on Optimization, 12(3):782–810, 2002. doi: 10.1137/S1052623400369235.
  • Zhang [1998] Yin Zhang. On extending some primal–dual interior-point algorithms from linear programming to semidefinite programming. SIAM Journal on Optimization, 8(2):365–386, 1998. doi: 10.1137/S1052623495296115. URL https://doi.org/10.1137/S1052623495296115.