
Rapid heuristic projection on simplicial cones

1991 AMS Subject Classification: Primary 90C33; Secondary 15A48. Key words and phrases: metric projection on simplicial cones.

A. Ekárt
Computer Science, Aston University
Aston Triangle, Birmingham B4 7ET
United Kingdom
email: a.ekart@aston.ac.uk
   A. B. Németh
Faculty of Mathematics and Computer Science
Babeş Bolyai University, Str. Kogălniceanu nr. 1-3
RO-400084 Cluj-Napoca, Romania
email: nemab@math.ubbcluj.ro
   S. Z. Németh
School of Mathematics, The University of Birmingham
The Watson Building, Edgbaston
Birmingham B15 2TT, United Kingdom
email: nemeths@for.mat.bham.ac.uk
Abstract

A very fast heuristic iterative method of projection onto simplicial cones is presented. It consists of solving two linear systems at each step of the iteration. Extensive experiments indicate that the method furnishes the exact solution in more than 99.7 percent of the cases. The average number of steps is 5.67 (we have not found any example that required more than 13 steps), and the relative number of steps with respect to the dimension decreases dramatically. Roughly speaking, for high enough dimensions the absolute number of steps is independent of the dimension.

1 Introduction

Projection on polyhedral cones is one of the important problems applied optimization is confronted with. Many applied numerical optimization methods use projection on polyhedral cones as a main tool.

In most of them, projection is part of an iterative process which involves its repeated application (see e.g. problems of image reconstruction [1], nonlinear complementarity [4, 9], etc.). Hence, it is important to have fast projection algorithms.

The main streams of the current methods in use rely on the classical von Neumann algorithm (see e.g. the Dykstra algorithm [3, 2, 10]), but they are rather expensive numerically (see the numerical results in [7] and the remark preceding Section 6.3 in [5]).

Finite projection methods are of a combinatorial nature, which restricts their applicability to low-dimensional ambient spaces.

Recently, in note [8], we have given a simple method for projecting onto so-called isotone projection cones. Isotone projection cones are special simplicial cones, and due to their good properties we can project onto them in n steps, where n is the dimension of the ambient space. In the first part of that note we explained our approach by considering the problem of projection onto simplicial cones and giving an exact method based on duality. This method has a combinatorial character and is therefore inefficient. More recently we observed that a heuristic method based on the same ideas gives surprisingly good results. This note describes the theoretical foundation of the heuristic method and draws conclusions based on millions of numerical experiments.

Projection on polyhedral cones is a problem of high impact on the scientific community (see the popularity of the Wikimization page Projection on Polyhedral Cone at http://www.convexoptimization.com/wikimization/index.php/Special:Popularpages).

2 The simplicial cone and its polar

Let n\mathbb{R}^{n} be an nn-dimensional Euclidean space endowed with a Cartesian reference system. We assume that each point of n\mathbb{R}^{n} is a column vector.

We shall use the term cone in the sense of closed convex cone. That is, the nonempty closed subset KnK\subset\mathbb{R}^{n} in our terminology is a cone, if K+KKK+K\subset K, and tKKtK\subset K whenever t,t0t\in\mathbb{R},\;t\geq 0.

Let mnm\leq n and 𝐞1,,𝐞m\mathbf{e}_{1},\dots,\mathbf{e}_{m} be mm elements in n\mathbb{R}^{n}. Denote

cone{𝐞1,,𝐞m}={λ1𝐞1++λm𝐞m:λi0,i=1,,m},\operatorname{cone}\{\mathbf{e}_{1},\dots,\mathbf{e}_{m}\}=\{\lambda^{1}\mathbf{e}_{1}+\dots+\lambda^{m}\mathbf{e}_{m}:\lambda^{i}\geq 0,\,i=1,\dots,m\},

the cone engendered by 𝐞1,,𝐞m\mathbf{e}_{1},\dots,\mathbf{e}_{m}. Then,

cone{𝐞1,,𝐞m}={𝐄𝐯:𝐯+m},\operatorname{cone}\{\mathbf{e}_{1},\dots,\mathbf{e}_{m}\}=\{\mathbf{E}\mathbf{v}:\mathbf{v}\in\mathbb{R}^{m}_{+}\}, (1)

where 𝐄=(𝐞1,,𝐞m)\mathbf{E}=(\mathbf{e}_{1},\dots,\mathbf{e}_{m}) is the matrix with columns 𝐞1,,𝐞m\mathbf{e}_{1},\dots,\mathbf{e}_{m} and +m\mathbb{R}^{m}_{+} is the non-negative orthant in m\mathbb{R}^{m}.

Suppose that 𝐞1,,𝐞nn\mathbf{e}_{1},\dots,\mathbf{e}_{n}\in\mathbb{R}^{n} are linearly independent elements. Then, the cone

K=cone{𝐞1,,𝐞n}={λ1𝐞1++λn𝐞n:λi0,i=1,,n}={𝐄𝐯:𝐯+n},\begin{array}[]{rcl}K&=&\operatorname{cone}\{\mathbf{e}_{1},\dots,\mathbf{e}_{n}\}\\ &=&\{\lambda^{1}\mathbf{e}_{1}+\dots+\lambda^{n}\mathbf{e}_{n}:\lambda^{i}\geq 0,\,i=1,\dots,n\}=\{\mathbf{E}\mathbf{v}:\mathbf{v}\in\mathbb{R}^{n}_{+}\},\end{array} (2)

with 𝐄\mathbf{E} the matrix from (1) for m=nm=n, is called simplicial cone. Denote N={1,2,,n}N=\{1,2,\dots,n\}.
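As a concrete illustration (not taken from the paper), membership in a simplicial cone can be tested by inverting 𝐄\mathbf{E}: since the generators are linearly independent, 𝐱K\mathbf{x}\in K exactly when 𝐄1𝐱+n\mathbf{E}^{-1}\mathbf{x}\in\mathbb{R}^{n}_{+}. A minimal Python sketch in two dimensions; the generators below are hypothetical, chosen only for illustration.

```python
# Membership test for a simplicial cone K = cone{e_1, e_2}: since E is
# invertible, x = E v with v >= 0 iff v = E^{-1} x >= 0.

def solve2(a, b, c, d, p, q):
    """Solve the 2x2 system [[a, b], [c, d]] [v1, v2]^T = [p, q]^T by Cramer's rule."""
    det = a * d - b * c
    return ((p * d - q * b) / det, (a * q - c * p) / det)

e1, e2 = (1.0, 0.0), (1.0, 1.0)  # illustrative columns of E (linearly independent)

def in_cone(x):
    # v = E^{-1} x; x lies in K exactly when both coordinates are non-negative
    v1, v2 = solve2(e1[0], e2[0], e1[1], e2[1], x[0], x[1])
    return v1 >= 0 and v2 >= 0

print(in_cone((2.0, 1.0)))   # (2,1) = 1*e1 + 1*e2, so True
print(in_cone((-1.0, 2.0)))  # needs a negative coefficient, so False
```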

The polar of KK is the set

K={𝐱n:𝐱𝐲0,𝐲K}.K^{\circ}=\{\mathbf{x}\in\mathbb{R}^{n}:\,\mathbf{x}^{\top}\mathbf{y}\leq 0,\,\forall\mathbf{y}\in K\}. (3)

K=KK^{*}=-K^{\circ} is called the dual of KK. KK is called subdual, if KKK\subset K^{*}. This is equivalent to the condition 𝐞𝐞k0,,kN.\mathbf{e}_{\ell}^{\top}\mathbf{e}_{k}\geq 0,\,\ell,k\in N.

Lemma 1

The polar of the simplicial cone (2) can be represented in the form

K={μ1𝐮1++μn𝐮n:μi0,i=1,,n},K^{\circ}=\{\mu^{1}\mathbf{u}_{1}+\dots+\mu^{n}\mathbf{u}_{n}:\mu^{i}\geq 0,\,i=1,\dots,n\}, (4)

where 𝐮i(i=1,,n)\mathbf{u}_{i}(i=1,\dots,n) is a solution of the system

𝐞j𝐮i=0,j=1,,n,ji,\mathbf{e}^{\top}_{j}\mathbf{u}_{i}=0,\;j=1,\dots,n,\;j\neq i,
𝐞i𝐮i=1\mathbf{e}^{\top}_{i}\mathbf{u}_{i}=-1

(𝐮i\mathbf{u}_{i} is normal to the hyperplane span{𝐞1,,𝐞i1,𝐞i+1,,𝐞n}\operatorname{span}\{\mathbf{e}_{1},\dots,\mathbf{e}_{i-1},\mathbf{e}_{i+1},\dots,\mathbf{e}_{n}\} in the opposite direction to the halfspace that contains 𝐞i\mathbf{e}_{i}). Thus,

K={𝐔𝐱:𝐱+n}K^{\circ}=\{\mathbf{U}\mathbf{x}:\mathbf{x}\in\mathbb{R}^{n}_{+}\}

with +n={𝐱=(x1,,xn):xi0,i=1n}\mathbb{R}^{n}_{+}=\{\mathbf{x}=(x_{1},\dots,x_{n})^{\top}:x_{i}\geq 0,\;i=1\dots n\} and

𝐔=(𝐄1).\mathbf{U}=-(\mathbf{E}^{-1})^{\top}. (5)

For simplicity we shall call 𝐔\mathbf{U} the polar matrix of 𝐄\mathbf{E}. The columns of 𝐔\mathbf{U} are {𝐮i\{\mathbf{u}_{i}: i=1,,n}i=1,\dots,n\}.
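The polar matrix of Lemma 1 can be computed directly from formula (5). The following sketch does so for a hypothetical 2-D cone (the generators are illustrative only) and checks the defining property 𝐞i𝐮j=δji\mathbf{e}_{i}^{\top}\mathbf{u}_{j}=-\delta^{i}_{j}.

```python
# Compute the polar matrix U = -(E^{-1})^T of Lemma 1 for a 2-D example
# and verify e_i^T u_j = -delta_ij.

e1, e2 = (1.0, 0.0), (1.0, 1.0)  # illustrative columns of E

# E = [[1, 1], [0, 1]]; for 2x2 matrices the inverse has a closed form
a, b, c, d = e1[0], e2[0], e1[1], e2[1]
det = a * d - b * c
inv = ((d / det, -b / det), (-c / det, a / det))          # E^{-1}, row-major
U = ((-inv[0][0], -inv[1][0]), (-inv[0][1], -inv[1][1]))  # -(E^{-1})^T, row-major

u1 = (U[0][0], U[1][0])  # columns of U generate the polar cone
u2 = (U[0][1], U[1][1])

dot = lambda x, y: x[0] * y[0] + x[1] * y[1]
print(dot(e1, u1), dot(e1, u2), dot(e2, u1), dot(e2, u2))  # -1.0 0.0 0.0 -1.0
```

Each u_i is indeed orthogonal to the other generator and makes an obtuse angle with its own, as the lemma states.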

Proof. Let 𝐲=j=1nμj𝐮j\mathbf{y}=\sum_{j=1}^{n}\mu^{j}\mathbf{u}_{j} and 𝐳=i=1nαi𝐞i\mathbf{z}=\sum_{i=1}^{n}\alpha^{i}\mathbf{e}_{i} for any non-negative real numbers αi\alpha^{i} and μj\mu^{j}. The inner product of 𝐲\mathbf{y} and 𝐳\mathbf{z} is non-positive because

𝐲𝐳=i=1nj=1nαiμj𝐞i𝐮j=i=1nαiμi0\mathbf{y}^{\top}\mathbf{z}=\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha^{i}\mu^{j}\mathbf{e}_{i}^{\top}\mathbf{u}_{j}=-\sum_{i=1}^{n}\alpha^{i}\mu^{i}\leq 0

But 𝐲\mathbf{y} is an arbitrary element of the right hand side of (4) and 𝐳\mathbf{z} is an arbitrary element of KK, thus we can conclude that the right hand side of (4) is a subset of KK^{\circ}.

The vectors 𝐮1,,𝐮n\mathbf{u}_{1},\dots,\mathbf{u}_{n} are linearly independent. (This can be verified by assuming the contrary, and by multiplying the subsequent relation by 𝐞j\mathbf{e}_{j} to get a contradiction.) Hence for 𝐲K\mathbf{y}\in K^{\circ} we have the representation

𝐲=β1𝐮1++βn𝐮n.\mathbf{y}=\beta^{1}\mathbf{u}_{1}+\dots+\beta^{n}\mathbf{u}_{n}.

By (3),

𝐞k𝐲=βk0\mathbf{e}^{\top}_{k}\mathbf{y}=-\beta^{k}\leq 0

so βk0\beta^{k}\geq 0, which proves that 𝐲\mathbf{y} is an element of the right hand side of (4). Thus we can conclude that KK^{\circ} is a subset of the right hand side of (4). \Box

The formula (5) of Lemma 1 is equivalent to the formula (380) of [1].

Corollary 1

For each subset II of indices in NN, the vectors 𝐞i,iI,𝐮j,jIc\mathbf{e}_{i},i\in I,\mathbf{u}_{j},j\in I^{c} (where IcI^{c} the complement of II with respect to NN) are linearly independent.

Proof. Assume that

iIαi𝐞i+jIcβj𝐮j=0\sum_{i\in I}\alpha^{i}\mathbf{e}_{i}+\sum_{j\in I^{c}}\beta^{j}\mathbf{u}_{j}=0 (6)

for some reals αi\alpha^{i} and βj.\beta^{j}. By the mutual orthogonality of the vectors 𝐞i,iI\mathbf{e}_{i},i\in I and 𝐮j,jIc\mathbf{u}_{j},j\in I^{c} it follows, by multiplication of the relation (6) with iIαi𝐞i\sum_{i\in I}\alpha^{i}\mathbf{e}_{i}^{\top} and respectively with jIcβj𝐮j\sum_{j\in I^{c}}\beta^{j}\mathbf{u}_{j}^{\top}, that

iIαi𝐞i=0\sum_{i\in I}\alpha^{i}\mathbf{e}_{i}=0

and

jIcβj𝐮j=0.\sum_{j\in I^{c}}\beta^{j}\mathbf{u}_{j}=0.

Hence, αi=βj=0\alpha^{i}=\beta^{j}=0 must hold. \Box

The cone K0KK_{0}\subset K is called a face of KK if from 𝐱K0,𝐲K\mathbf{x}\in K_{0},\mathbf{y}\in K and 𝐱𝐲K\mathbf{x}-\mathbf{y}\in K it follows that 𝐲K0.\mathbf{y}\in K_{0}. The face K0K_{0} is called a proper face of KK, if K0K.K_{0}\not=K.

Lemma 2

If KK is the cone (2) and KK^{\circ} is the cone (4), then for every subset of indices I={i1,,ik}NI=\{i_{1},...,i_{k}\}\subset N the set

FI=cone{𝐞i:iI}={𝐱K:𝐱𝐮j=0,jIc}F_{I}=\operatorname{cone}\{\mathbf{e}_{i}:\;i\in I\}=\{\mathbf{x}\in K:\;\mathbf{x}^{\top}\mathbf{u}_{j}=0,\;j\in I^{c}\} (7)

(with FI={𝟎}F_{I}=\{\mathbf{0}\} if I=I=\emptyset) is a face of KK. If ihili_{h}\not=i_{l} whenever hlh\not=l, then FIF_{I} is for k>0k>0 a nonempty set in n\mathbb{R}^{n} of dimension kk. (In the sense that FIF_{I} spans a subspace of n\mathbb{R}^{n} of dimension kk.)

Every face of KK is equal to FIF_{I} for some IN.I\subset N. If INI\not=N then FIF_{I} is a proper face.

Proof.

The relation in (7) follows from the definition of the vectors 𝐮j\mathbf{u}_{j} in Lemma 1, while the assertion on the dimension of FIF_{I} is obvious.

Suppose that 𝐱FI\mathbf{x}\in F_{I} and 𝐲K\mathbf{y}\in K with 𝐲𝐱\mathbf{y}\leq\mathbf{x}.

Then (𝐱𝐲)𝐮j=𝐲𝐮j0(\mathbf{x}-\mathbf{y})^{\top}\mathbf{u}_{j}=-\mathbf{y}^{\top}\mathbf{u}_{j}\leq 0, jIc\forall j\in I^{c} and 𝐲𝐮j0,jN\mathbf{y}^{\top}\mathbf{u}_{j}\leq 0,\;\forall j\in N, because 𝐲K\mathbf{y}\in K. Thus 𝐲𝐮j=0,jIc\mathbf{y}^{\top}\mathbf{u}_{j}=0,\;\forall j\in I^{c}, hence yFI,y\in F_{I}, showing that FIF_{I} is a face.

Suppose that 𝐱F\mathbf{x}\in F for FF an arbitrary proper face of KK. Since 𝐱K\mathbf{x}\in K, by the definition of the vectors 𝐮j,\mathbf{u}_{j}, 𝐱𝐮j0\mathbf{x}^{\top}\mathbf{u}_{j}\leq 0 for jN.j\in N.

If 𝐱𝐮j<0,jN\mathbf{x}^{\top}\mathbf{u}_{j}<0,\;\forall j\in N, then for any 𝐲K\mathbf{y}\in K there exists a positive scalar tt with (𝐱t𝐲)𝐮j0,jN(\mathbf{x}-t\mathbf{y})^{\top}\mathbf{u}_{j}\leq 0,\;\forall j\in N. Hence, 𝐱t𝐲K\mathbf{x}-t\mathbf{y}\in K and thus t𝐲𝐱t\mathbf{y}\leq\mathbf{x}. But then t𝐲Ft\mathbf{y}\in F and since FF is a cone, 𝐲F\mathbf{y}\in F. This means that KFK\subset F, that is, FF cannot be a proper face.

We have to show that FF has a representation like (7). By the above reasoning, for each 𝐱F\mathbf{x}\in F there exists some index iNi\in N with 𝐱𝐮i=0.\mathbf{x}^{\top}\mathbf{u}_{i}=0.

If F={𝟎}F=\{\mathbf{0}\} we have the representation (7) with I=.I=\emptyset.

If F{𝟎}F\not=\{\mathbf{0}\}, take 𝐱\mathbf{x} in the relative interior of FF and let II be the complement in NN of the maximal set of indices jj with 𝐱𝐮j=0.\mathbf{x}^{\top}\mathbf{u}_{j}=0. (II must be a nonempty, proper subset of NN since 𝐱𝟎.\mathbf{x}\not=\mathbf{0}.)

Take 𝐲F\mathbf{y}\in F arbitrarily. By the definition of 𝐱\mathbf{x}, 𝐱t𝐲F\mathbf{x}-t\mathbf{y}\in F for some sufficiently small t>0t>0. Hence,

(𝐱t𝐲)𝐮i0,iN.(\mathbf{x}-t\mathbf{y})^{\top}\mathbf{u}_{i}\leq 0,\;\;\forall i\in N. (8)

By 𝐲FK\mathbf{y}\in F\subset K we also have 𝐲𝐮i0,iN.\mathbf{y}^{\top}\mathbf{u}_{i}\leq 0,\;\forall i\in N. If 𝐲𝐮j<0\mathbf{y}^{\top}\mathbf{u}_{j}<0 for some jIcj\in I^{c}, then (8) would imply

𝐱𝐮jt𝐲𝐮j<0,\mathbf{x}^{\top}\mathbf{u}_{j}\leq t\mathbf{y}^{\top}\mathbf{u}_{j}<0,

which is a contradiction. Hence, we must have 𝐲𝐮j=0,jIc\mathbf{y}^{\top}\mathbf{u}_{j}=0,\;\forall j\in I^{c}; and accordingly

F{𝐳K:𝐳𝐮j=0,jIc}.F\subset\{\mathbf{z}\in K:\;\mathbf{z}^{\top}\mathbf{u}_{j}=0,\;\forall j\in I^{c}\}. (9)

Suppose that 𝐲K\mathbf{y}\in K and 𝐲𝐮j=0,jIc\mathbf{y}^{\top}\mathbf{u}_{j}=0,\;\forall j\in I^{c}. From the definition of II we have 𝐱𝐮i<0\mathbf{x}^{\top}\mathbf{u}_{i}<0 for each iIi\in I, whereby for a sufficiently large t>0t>0,

(t𝐱𝐲)𝐮i0,iN.(t\mathbf{x}-\mathbf{y})^{\top}\mathbf{u}_{i}\leq 0,\;\forall i\in N.

Hence, t𝐱𝐲t\mathbf{x}-\mathbf{y} is in the polar of KK^{\circ}, which by Farkas’ lemma is KK. (This follows in fact, in our case, also from the symmetry of the vectors 𝐞i\mathbf{e}_{i} and 𝐮j\mathbf{u}_{j} in the formulae of Lemma 1.) Thus 𝟎𝐲t𝐱\mathbf{0}\leq\mathbf{y}\leq t\mathbf{x}, whereby 𝟎(1/t)𝐲𝐱.\mathbf{0}\leq(1/t)\mathbf{y}\leq\mathbf{x}. Since FF is a face of KK, we have (1/t)𝐲F(1/t)\mathbf{y}\in F and since it is also a cone, 𝐲F.\mathbf{y}\in F. This proves the converse of the inclusion in (9) and completes the proof.

\Box

Thus a maximal proper face of KK is of the form

Ki0=cone{𝐞i:iN{i0}}=cone{𝐞i:iN,𝐞i𝐮i0=0},K_{i_{0}}=\operatorname{cone}\{\mathbf{e}_{i}:i\in N\setminus\{i_{0}\}\}=\operatorname{cone}\{\mathbf{e}_{i}:i\in N,\mathbf{e}^{\top}_{i}\mathbf{u}_{i_{0}}=0\},

hence it is also called the face of KK orthogonal to 𝐮i0\mathbf{u}_{i_{0}}. Similarly, we have a maximal proper face of KK^{\circ} orthogonal to some 𝐞j0\mathbf{e}_{j_{0}}.

An equivalent result to the one presented in Lemma 2 is given by the Cone Table 1 on page 179 of [1].


Let F=cone{𝐞i:iI}F=\operatorname{cone}\{\mathbf{e}_{i}:\;i\in I\} and F=cone{𝐮j:jIc}.F^{\perp}=\operatorname{cone}\{\mathbf{u}_{j}:\;j\in I^{c}\}. Then, from the above results it follows that

F={𝐱K:𝐱𝐮j=0,jIc}F=\{\mathbf{x}\in K:\;\mathbf{x}^{\top}\mathbf{u}_{j}=0,\,j\in I^{c}\}

and

F={𝐲K:𝐲𝐞i=0,iI}.F^{\perp}=\{\mathbf{y}\in K^{\circ}:\;\mathbf{y}^{\top}\mathbf{e}_{i}=0,\,i\in I\}.

The faces FKF\subset K and FKF^{\perp}\subset K^{\circ} of the above form are called a pair of orthogonal faces where FF^{\perp} is called the orthogonal face of FF and FF is called the orthogonal face of FF^{\perp}.

3 Finite method of projection on a simplicial cone

For an arbitrary 𝐮n\mathbf{u}\in\mathbb{R}^{n} denote 𝐮=𝐮𝐮\|\mathbf{u}\|=\sqrt{\mathbf{u}^{\top}\mathbf{u}}. Let KnK\subset\mathbb{R}^{n} be an arbitrary cone and KK^{\circ} its polar, and CnC\subset\mathbb{R}^{n} an arbitrary closed convex set. Recall that the projection mapping 𝐏C:nn\mathbf{P}_{C}:\mathbb{R}^{n}\to\mathbb{R}^{n} onto CC is well defined by 𝐏C𝐱C\mathbf{P}_{C}\mathbf{x}\in C and

𝐱𝐏C𝐱=min{𝐱𝐲:𝐲C}.\|\mathbf{x}-\mathbf{P}_{C}\mathbf{x}\|=\min\{\|\mathbf{x}-\mathbf{y}\|:\mathbf{y}\in C\}.

Then, Moreau’s decomposition theorem asserts:

Theorem 1

(Moreau, [6]) For 𝐱,𝐲,𝐳n\mathbf{x},\,\mathbf{y},\,\mathbf{z}\in\mathbb{R}^{n} the following statements are equivalent:

  1. (i)

    𝐳=𝐱+𝐲,𝐱K,𝐲K\mathbf{z}=\mathbf{x}+\mathbf{y},\mathbf{x}\in K,\mathbf{y}\in K^{\circ} and 𝐱𝐲=0.\mathbf{x}^{\top}\mathbf{y}=0.

  2. (ii)

    𝐱=𝐏K𝐳\mathbf{x}=\mathbf{P}_{K}\mathbf{z} and 𝐲=𝐏K𝐳\mathbf{y}=\mathbf{P}_{K^{\circ}}\mathbf{z}.
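Moreau's theorem is easy to check numerically in the special case K=+nK=\mathbb{R}^{n}_{+} (not the general simplicial cone of the paper), whose polar is the nonpositive orthant: projecting onto KK clamps negative coordinates to zero, projecting onto KK^{\circ} clamps positive ones. A minimal sketch under this assumption:

```python
# Moreau's decomposition for K = R^n_+ (polar: the nonpositive orthant).
# P_K clamps negatives to 0; P_{K°} clamps positives to 0.

def P_K(z):      # projection onto R^n_+
    return [max(t, 0.0) for t in z]

def P_polar(z):  # projection onto the polar cone -R^n_+
    return [min(t, 0.0) for t in z]

z = [2.0, -1.5, 0.0]
x, y = P_K(z), P_polar(z)

# (i) z = x + y with x in K, y in K°, x^T y = 0, which by the theorem is
# equivalent to (ii) x = P_K z and y = P_{K°} z.
assert all(abs(zi - (xi + yi)) < 1e-12 for zi, xi, yi in zip(z, x, y))
assert sum(xi * yi for xi, yi in zip(x, y)) == 0.0
print(x, y)
```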

Suppose now, that KK is a simplicial cone in n\mathbb{R}^{n}. We shall use the representation (2) for KK and the representation (4) for KK^{\circ}. Hence,

𝐞i𝐮j=δji,i,j=1,,n\mathbf{e}^{\top}_{i}\mathbf{u}_{j}=-\delta^{i}_{j},i,j=1,\dots,n

where δji\delta^{i}_{j} is the Kronecker symbol. As a direct implication of Moreau’s decomposition theorem and the constructions in the preceding section we have:

Theorem 2

Let 𝐱n\mathbf{x}\in\mathbb{R}^{n}. For each subset of indices INI\subset N, 𝐱\mathbf{x} can be represented in the form

𝐱=iIαi𝐞i+jIcβj𝐮j\mathbf{x}=\sum_{i\in I}\alpha^{i}\mathbf{e}_{i}+\sum_{j\in I^{c}}\beta^{j}\mathbf{u}_{j} (10)

with IcI^{c} the complement of II with respect to NN, and with αi\alpha^{i} and βj\beta^{j} real numbers. Among the subsets II of indices, there exists exactly one (the cases I=I=\emptyset and I=NI=N are not excluded) with the property that for the coefficients in (10) one has βj>0,jIc\beta^{j}>0,j\in I^{c} and αi0,iI.\alpha^{i}\geq 0,i\in I. For this representation it holds that

𝐏K𝐱=iIαi𝐞i,αi0,\mathbf{P}_{K}\mathbf{x}=\sum_{i\in I}\alpha^{i}\mathbf{e}_{i},\quad\alpha^{i}\geq 0, (11)

and

𝐏K𝐱=jIcβj𝐮j,βj>0.\mathbf{P}_{K^{\circ}}\mathbf{x}=\sum_{j\in I^{c}}\beta^{j}\mathbf{u}_{j},\quad\beta^{j}>0. (12)

Proof. The first assertion is the consequence of Corollary 1.

The projections 𝐏K𝐱\mathbf{P}_{K}\mathbf{x} and 𝐏K𝐱\mathbf{P}_{K^{\circ}}\mathbf{x} as elements of KK and KK^{\circ}, respectively can be represented as

𝐏K𝐱=i=1nαi𝐞i,αi0\mathbf{P}_{K}\mathbf{x}=\sum_{i=1}^{n}\alpha^{i}\mathbf{e}_{i},\quad\alpha^{i}\geq 0 (13)

and

𝐏K𝐱=j=1nβj𝐮j,βj0.\mathbf{P}_{K^{\circ}}\mathbf{x}=\sum_{j=1}^{n}\beta^{j}\mathbf{u}_{j},\quad\beta^{j}\geq 0. (14)

To prove existence, let Ic={jN:βj>0}I^{c}=\{j\in N:\beta^{j}>0\} and let II be the complement of IcI^{c} in the set NN of indices. For an arbitrary element 𝐳n\mathbf{z}\in\mathbb{R}^{n}, denote 𝐏K𝐳=(𝐏K𝐳)\mathbf{P}_{K}^{\top}\mathbf{z}=(\mathbf{P}_{K}\mathbf{z})^{\top}. If we had αj>0\alpha^{j}>0 in (13) for some jIcj\in I^{c}, then by Lemma 1 it would follow that 𝐏K𝐱𝐏K𝐱<0,\mathbf{P}_{K}^{\top}\mathbf{x}\cdot\mathbf{P}_{K^{\circ}}\mathbf{x}<0, which contradicts Moreau’s theorem. Hence, (13) can be written in the form (11) and (14) can be written in the form (12). Therefore, Theorem 1 implies

𝐱=𝐏K𝐱+𝐏K𝐱=iIαi𝐞i+jIcβj𝐮j,\mathbf{x}=\mathbf{P}_{K}\mathbf{x}+\mathbf{P}_{K^{\circ}}\mathbf{x}=\sum_{i\in I}\alpha^{i}\mathbf{e}_{i}+\sum_{j\in I^{c}}\beta^{j}\mathbf{u}_{j},

where αi0\alpha^{i}\geq 0, iI\forall i\in I and βj>0\beta^{j}>0, jIc\forall j\in I^{c}.

To prove uniqueness, suppose that in the representation (10) of 𝐱\mathbf{x} we have αi0,βi=0\alpha^{i}\geq 0,\;\beta^{i}=0 for iIi\in I and βj>0,αj=0\beta^{j}>0,\;\alpha^{j}=0 for jIcj\in I^{c}, where II is a subset of NN, and IcI^{c} is the complement of II in NN (the cases I=I=\emptyset and I=NI=N are not excluded). Then representations (11) and (12) follow from Theorem 1 by using the mutual orthogonality of the vectors 𝐞i,iI\mathbf{e}_{i},\;i\in I and 𝐮j,jIc\mathbf{u}_{j},\;j\in I^{c}. From (11) and the uniqueness of the projection 𝐏K𝐱\mathbf{P}_{K}\mathbf{x} it follows that II is unique.

\Box

From this theorem it follows that a given simplicial cone KnK\subset\mathbb{R}^{n} determines a partition of the space n\mathbb{R}^{n} in 2n2^{n} cones in the sense that

n=INcone{𝐞i,𝐮j:iI,jIc}\mathbb{R}^{n}=\bigcup_{I\subset N}\operatorname{cone}\{\mathbf{e}_{i},\mathbf{u}_{j}:\;i\in I,\;j\in I^{c}\}

and for two different sets II of indices the respective cones do not contain common interior points. The cones in the above union are exactly the sums of orthogonal faces.

This theorem suggests the following algorithm for finding the projection 𝐏K𝐱\mathbf{P}_{K}\mathbf{x}:

Step 1. For the subset INI\subset N we solve the following linear system in αi\alpha^{i}

𝐱𝐞=iIαi𝐞i𝐞,lI.\mathbf{x}^{\top}\mathbf{e}_{\ell}=\sum_{i\in I}\alpha^{i}\mathbf{e}^{\top}_{i}\mathbf{e}_{\ell},\;l\in I. (15)

Step 2. Then, we select from the family of all subsets in NN the subfamily Δ\Delta of subsets II for which the system possesses non-negative solutions.

Step 3. For each IΔI\in\Delta we solve the linear system in βj\beta^{j}

𝐱𝐮k=jIcβj𝐮j𝐮k,kIc.\mathbf{x}^{\top}\mathbf{u}_{k}=\sum_{j\in I^{c}}\beta^{j}\mathbf{u}_{j}^{\top}\mathbf{u}_{k},\;k\in I^{c}. (16)

By Theorem 2 among these systems there exists exactly one with non-negative solutions. By this theorem, for corresponding II and for the solution of the system (15), we must have

𝐏K𝐱=iIαi𝐞i.\mathbf{P}_{K}\mathbf{x}=\sum_{i\in I}\alpha^{i}\mathbf{e}_{i}.

This algorithm requires that we solve 2n2^{n} linear systems of at most nn equations in Step 1 (15) and another |Δ||\Delta| systems in Step 3 (16). (Observe that all these systems are given by Gram matrices, hence they have unique solutions.) Perhaps this great number of systems can be substantially reduced, but it still remains considerable.
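Steps 1-3 can be sketched compactly in the special case where the 𝐞i\mathbf{e}_{i} are the standard basis of n\mathbb{R}^{n} (an assumption made here only to keep the example short: then the Gram matrices are the identity, 𝐮j=𝐞j\mathbf{u}_{j}=-\mathbf{e}_{j}, and systems (15)-(16) reduce to reading off coordinates):

```python
from itertools import chain, combinations

# Exact combinatorial method, sketched for e_i = standard basis (K = R^n_+,
# u_j = -e_j). Enumerate all subsets I of N; by Theorem 2 exactly one yields
# alpha_i >= 0 (i in I) and beta_j > 0 (j not in I).

def project_exact(x):
    n = len(x)
    N = range(n)
    subsets = chain.from_iterable(combinations(N, k) for k in range(n + 1))
    for I in map(set, subsets):
        alphas = {i: x[i] for i in I}                 # system (15): alpha_i = x^T e_i
        betas = {j: -x[j] for j in N if j not in I}   # system (16): beta_j = x^T u_j
        if all(a >= 0 for a in alphas.values()) and all(b > 0 for b in betas.values()):
            return [x[i] if i in I else 0.0 for i in N]
    return None  # unreachable by Theorem 2

print(project_exact([2.0, -1.0, 0.5]))  # -> [2.0, 0.0, 0.5]
```

Even in this trivial setting the loop visits up to 2^n subsets, which illustrates why the exact method does not scale.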

Remark 1

If KK is subdual; that is, if 𝐞k𝐞0,k,lN\mathbf{e}^{\top}_{k}\mathbf{e}_{\ell}\geq 0,\;k,\;l\in N, the above algorithm can be reduced as follows: Supposing that we have the representation (10) of 𝐱\mathbf{x} with non-negative coefficients, we multiply both sides of (10) by an arbitrary 𝐞l\mathbf{e}_{l}^{\top}. If 𝐱𝐞l<0\mathbf{x}^{\top}\mathbf{e}_{l}<0, then ll cannot be in II, otherwise the relations 𝐮j𝐞l=0,jIc\mathbf{u}_{j}^{\top}\mathbf{e}_{l}=0,\;j\in I^{c} and 𝐞i𝐞l0,iI\mathbf{e}^{\top}_{i}\mathbf{e}_{l}\geq 0,\;i\in I would furnish a contradiction. Thus, we have to look for the set II of indices (for which we have to solve the system (15)) among the subsets of {iN:𝐱𝐞i0}.\{i\in N:\mathbf{x}^{\top}\mathbf{e}_{i}\geq 0\}. (Arguments like this can be used, as was done e.g. in [7] for the Dykstra algorithm, to eliminate some hyperplanes while computing successive approximations of the solution.)
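The pre-filtering of Remark 1 amounts to a one-line scan. A sketch, using the standard basis as an illustrative subdual cone (the function name is hypothetical):

```python
# Remark 1 reduction for a subdual cone: indices l with x^T e_l < 0 cannot
# belong to I, so candidate sets I need only be sought among the rest.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def candidate_indices(x, generators):
    return [i for i, e in enumerate(generators) if dot(x, e) >= 0]

basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # subdual: e_k . e_l = delta_kl >= 0
print(candidate_indices((2.0, -1.0, 0.5), basis))  # -> [0, 2]
```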

Obviously, the proposed method is inefficient. It was presented by A. B. Németh and S. Z. Németh in [8] as preparatory material for an efficient algorithm for so-called isotone projection cones only. For isotone projection cones we can obtain the projection of a point in at most nn steps, where nn is the dimension of the space. Isotone projection cones are special simplicial cones. Although there are important isotone projection cones in applications, they are rather particular within the family of simplicial cones.

4 Heuristic method for projection onto a simplicial cone

Despite the inconveniences of the exact method presented above, which follow from its combinatorial character, it suggests an interesting heuristic algorithm. To explain its intuitive background we consider again the simplicial cone

K=cone{𝐞1,,𝐞n}K=\operatorname{cone}\{\mathbf{e}_{1},\dots,\mathbf{e}_{n}\}

and its polar

K=cone{𝐮1,,𝐮n}K^{\circ}=\operatorname{cone}\{\mathbf{u}_{1},\dots,\mathbf{u}_{n}\}

given by Lemma 1.

Take an arbitrary 𝐱n\mathbf{x}\in\mathbb{R}^{n}. We seek the projection 𝐏K𝐱.\mathbf{P}_{K}\mathbf{x}.

If 𝐞i𝐱0,iN\mathbf{e}_{i}^{\top}\mathbf{x}\leq 0,\;\forall\;i\in N, then 𝐱K=ker𝐏K\mathbf{x}\in K^{\circ}=\ker\mathbf{P}_{K}, hence 𝐏K𝐱=0.\mathbf{P}_{K}\mathbf{x}=0.

If 𝐮j𝐱0,jN\mathbf{u}_{j}^{\top}\mathbf{x}\leq 0,\;\forall\;j\in N, then 𝐱K\mathbf{x}\in K, and hence 𝐏K𝐱=𝐱\mathbf{P}_{K}\mathbf{x}=\mathbf{x}.

We can assume that 𝐱KK\mathbf{x}\notin K\cup K^{\circ}. Hence, 𝐱\mathbf{x} projects onto a proper face of KK and onto a proper face of K.K^{\circ}.

Take an arbitrary family INI\subset N of indices. Then, the vectors

𝐞i,𝐮j:iI,jIc\mathbf{e}_{i},\,\mathbf{u}_{j}:\;i\in I,\,j\in I^{c}

engender, by Corollary 1, a reference system in n\mathbb{R}^{n}. Then,

𝐱=iIαi𝐞i+jIcβj𝐮j\mathbf{x}=\sum_{i\in I}\alpha^{i}\mathbf{e}_{i}+\sum_{j\in I^{c}}\beta^{j}\mathbf{u}_{j} (17)

with some αi,βj.\alpha^{i},\;\beta^{j}\in\mathbb{R}. (As far as the family INI\subset N of indices is given, we can determine the coefficients αi\alpha^{i} and βj\beta^{j}, according to Theorem 2, by solving the systems (15) and (16).)

If we have αi0,βj0:iI,jIc\alpha^{i}\geq 0,\;\beta^{j}\geq 0:\;i\in I,\;j\in I^{c}, then from Theorem 2 we obtain

𝐏K𝐱=iIαi𝐞i\mathbf{P}_{K}\mathbf{x}=\sum_{i\in I}\alpha^{i}\mathbf{e}_{i}

and

𝐏K𝐱=jIcβj𝐮j.\mathbf{P}_{K^{\circ}}\mathbf{x}=\sum_{j\in I^{c}}\beta^{j}\mathbf{u}_{j}.

In this case 𝐱\mathbf{x} is projected onto the face F=cone{𝐞i:iI}F=\operatorname{cone}\{\mathbf{e}_{i}:\;i\in I\} orthogonally along the subspace engendered by the elements {𝐮j:jIc}\{\mathbf{u}_{j}:\;j\in I^{c}\}; roughly speaking, along the orthogonal face FF^{\perp} of FF.

Suppose that βj<0\beta^{j}<0 for some jIcj\in I^{c}. Then, considering the reference system engendered by 𝐞i,𝐮j,iI,jIc\mathbf{e}_{i},\;\mathbf{u}_{j},\;i\in I,\;j\in I^{c}, 𝐱\mathbf{x} lies in its orthant with negative jthj^{th} coordinate, that is, in the direction of the vector 𝐮j-\mathbf{u}_{j}. By construction, 𝐞j\mathbf{e}_{j} and 𝐮j\mathbf{u}_{j} form an obtuse angle, hence the angle of 𝐞j\mathbf{e}_{j} and 𝐮j-\mathbf{u}_{j} is an acute one. Thus there is a real chance that in a new reference system in which 𝐞j\mathbf{e}_{j} replaces 𝐮j\mathbf{u}_{j}, the coordinate of 𝐱\mathbf{x} with respect to 𝐞j\mathbf{e}_{j} has the same sign as its coordinate with respect to 𝐮j-\mathbf{u}_{j}, that is, positive (or at least non-negative).

If we have αi<0\alpha^{i}<0 for some iIi\in I, then by similar reasoning it seems to be advantageous to replace 𝐞i\mathbf{e}_{i} with 𝐮i\mathbf{u}_{i}, and so on.

Thus, we arrive at the following step in our algorithm:

Substitute 𝐮j\mathbf{u}_{j} with 𝐞j\mathbf{e}_{j} if βj<0\beta^{j}<0 and substitute 𝐞i\mathbf{e}_{i} with 𝐮i\mathbf{u}_{i} if αi<0\alpha^{i}<0 and solve the systems (15) and (16) for the new configuration of indices II. We shall call this step an iteration of the heuristic algorithm.

Then, repeat the procedure for the new configuration of II and so on, until we obtain a representation (17) of 𝐱\mathbf{x} with all the coefficients non-negative.
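The iteration above can be sketched as follows. This is not the authors' Scilab program but one possible Python reading of it, for a hypothetical 2-D subdual cone; since the vectors 𝐞i,iI\mathbf{e}_{i},\,i\in I and 𝐮j,jIc\mathbf{u}_{j},\,j\in I^{c} are mutually orthogonal (Corollary 1), solving the single 2x2 system in the mixed basis is equivalent to solving (15) and (16) separately.

```python
# Heuristic iteration in 2-D: decompose x in the mixed basis {e_i : i in I} u
# {u_j : j in I^c}, then swap e_i <-> u_i wherever a coefficient is negative.

def solve2(col1, col2, rhs):
    """Solve the 2x2 system with the given columns by Cramer's rule."""
    a, b, c, d = col1[0], col2[0], col1[1], col2[1]
    det = a * d - b * c
    return ((rhs[0] * d - rhs[1] * b) / det, (a * rhs[1] - c * rhs[0]) / det)

e = [(1.0, 0.0), (1.0, 1.0)]    # illustrative columns of E (subdual: e1 . e2 = 1)
u = [(-1.0, 1.0), (0.0, -1.0)]  # columns of U = -(E^{-1})^T

def project_heuristic(x, max_iter=50):
    use_e = [True, True]        # start from the e_1, e_2 basis (I = N)
    for _ in range(max_iter):
        cols = [e[i] if use_e[i] else u[i] for i in range(2)]
        coeff = solve2(cols[0], cols[1], x)
        if all(c >= 0 for c in coeff):
            # P_K x is the part of the decomposition lying in K (Theorem 2)
            px = [0.0, 0.0]
            for i in range(2):
                if use_e[i]:
                    px[0] += coeff[i] * e[i][0]
                    px[1] += coeff[i] * e[i][1]
            return px
        for i in range(2):      # one iteration of the heuristic algorithm
            if coeff[i] < 0:
                use_e[i] = not use_e[i]
    return None  # gave up: a loop (as discussed in Section 5) or no convergence

print(project_heuristic((-1.0, 2.0)))  # -> [0.5, 0.5]
```

For x = (-1, 2) the first iteration gives alpha = (-3, 2), so e_1 is swapped for u_1; the second iteration yields non-negative coefficients and the projection 0.5 e_2 = (0.5, 0.5), matching the two-system description above.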


5 Experimental results

The heuristic algorithm was programmed in Scilab, an open source platform for numerical computation (http://www.scilab.org/). Experiments were performed on numerical examples for 2, 3, 5, 10, 15, 20, 25, 30, 50, 75, 100, 200, 300, 500 dimensional cones. The algorithm was run on 100000 random examples for each of the problem sizes 2, ..., 100. Statistical analysis on a subset of 10000 examples from the set of 100000 examples for size 100 indicates no significant difference in overall results and performance; therefore we subsequently reduced the number of experiments on larger problem sizes: 10000 random examples were used for sizes 200 and 300, and 1000 examples for size 500, as the time needed by the algorithm increases with size.

Table 1: Number of changes, iterations, iterations where the number of changes increased, and loops for the various cone dimensions

Size   Changes   Iterations   Iterations with increases [%]   Loops [%]
2      1         1            0                               4.382
3      2         1            3.9±0.1                         4.278
5      4         2            13±0.2                          1.396
10     11        3            26±0.3                          0.273
15     17        4            30.3±0.3                        0.029
20     24        4            31.6±0.3                        0.007
25     30        4            31.2±0.3                        0.003
30     37        4            29.9±0.3                        -
50     64        5            26.8±0.3                        -
75     97        5            24.2±0.3                        -
100    131       5            23.8±0.3                        -
200    267±1     6            19.9±0.8                        -
300    409±1     6            26.2±0.9                        -
500    700±5     7            25.7±2.7                        -

Table 1 shows the experimental results. For each problem size, the averages of all runs are shown, together with a confidence interval at confidence level 95% where appropriate. (Any difference less than ±0.5 for integers and ±0.1 for percentages, respectively, is not shown, as deemed irrelevant for the analysis.) The Changes column indicates the total number of swaps of 𝐮j\mathbf{u}_{j} for 𝐞j\mathbf{e}_{j} and of 𝐞j\mathbf{e}_{j} for 𝐮j\mathbf{u}_{j}, respectively, before reaching the solution. The Iterations column indicates the number of iterations (as defined in Section 4) the algorithm performed before reaching a solution. The Iterations with increases column shows the percentage of iterations where the number of changes increased from the previous iteration. We noticed that in the majority of iterations the number of changes decreased, which led to the quick convergence of the algorithm in the vast majority of cases. In all examples the starting point for the search was the 𝐞1,,𝐞n\mathbf{e}_{1},\dots,\,\mathbf{e}_{n} base. The final column shows the percentage of problems where the algorithm was aborted due to entering a loop, that is, allocating in some iteration a set of 𝐞j\mathbf{e}_{j}s and 𝐮j\mathbf{u}_{j}s that had been encountered in a previous iteration. The percentage of loops decreased exponentially as the size increased, and we did not observe any loops in experiments on problem sizes of 30 or above. Overall, loops were observed in 0.1% of the experiments, so the heuristic algorithm was successful 99.9% of the time. A possible remedy for the problems that lead to a loop is to restart the algorithm from a different initial set of 𝐞j\mathbf{e}_{j}s and 𝐮j\mathbf{u}_{j}s. The problems ending in a loop were excluded from the detailed analysis that follows.

More detailed analysis of the three main performance indicators (changes, iterations, and iterations with increases) was performed using boxplots, as shown in Figures 1, 2 and 3. Although the total number of changes increases linearly with problem size (at a rate of less than 2n, even considering the maximum number of changes), this does not affect performance substantially (see Figure 1, where the results are split into two parts for a clearer view).

Refer to caption
Refer to caption
Figure 1: Boxplots of number of changes performed for the various cone dimensions

The number of iterations is the crucial indicator of performance. As shown in Figure 2, the number of iterations reaches at most 11 (for sizes 15 and 20), but in 75% of cases is 7 or below. We ran a few experiments on larger sizes, up to 1750, and the largest number of iterations we observed was 13. Running experiments on very large problem sizes is problematic due to computer memory limitations and Scilab's built-in solving of linear systems. (Note that the time needed by one iteration increases substantially with problem size n, as one iteration involves solving a linear system of n equations in n variables.) The major benefit of this heuristic algorithm is the small number of iterations even for very large cone dimensions.

Refer to caption
Figure 2: Boxplots of number of iterations needed for the various cone dimensions

As our heuristic algorithm seems to converge quickly, we wanted to know how frequently it deviates from the optimal path. An optimal path would consist of iterations with a decreasing number of changes. Figure 3 shows boxplots of the number of iterations where an increase in the number of changes took place. The maximum number of such iterations over all experiments is 4, but 75% of examples involved at most one such iteration. This explains the fast convergence: the algorithm very rarely deviates from the optimal path.

Refer to caption
Figure 3: Boxplots of number of iterations with increases in number of changes needed for the various cone dimensions

6 Conclusion

We presented a heuristic method of projection onto simplicial cones based on Moreau's decomposition theorem. The heuristic algorithm presented in this note iteratively finds the projection onto a simplicial cone in a surprisingly small number of steps, even for large cone dimensions, in 99.9% of the cases. We attribute this success to the fact that the algorithm rarely deviates from the optimal path: in almost every iteration it has to swap fewer basis vectors than in the previous one. We plan to extend the algorithm with random restarts, hoping to achieve a 100% success rate.

Acknowledgements

S. Z. Németh was supported by the Hungarian Research Grant OTKA 60480. The authors are grateful to J. Dattorro for many helpful conversations.

References

  • [1] J. Dattorro. Convex Optimization & Euclidean Distance Geometry. Meboo Publishing, 2005, v2009.04.11.
  • [2] F. Deutsch and H. Hundal. The rate of convergence of Dykstra’s cyclic projections algorithm: the polyhedral case. Numer. Funct. Anal. Optim., 15(5-6):537–565, 1994.
  • [3] R. L. Dykstra. An algorithm for restricted least squares regression. J. Amer. Stat. Assoc., 78(384):837–842, 1983.
  • [4] G. Isac and A. B. Németh. Projection methods, isotone projection cones, and the complementarity problem. J. Math. Anal. Appl., 153(1):258–275, 1990.
  • [5] M. Tan, G.-L. Tian, H.-B. Fang, and K. W. Ng. A fast EM algorithm for quadratic optimization subject to convex constraints. Statist. Sinica, 17(3):945–964, 2007.
  • [6] J. J. Moreau. Décomposition orthogonale d’un espace hilbertien selon deux cônes mutuellement polaires. C. R. Acad. Sci., 255:238–240, 1962.
  • [7] P. M. Morillas. Dykstra’s algorithm with strategies for projecting onto certain polyhedral cones. Applied Mathematics and Computation, 167(1):635–649, 2005.
  • [8] A. B. Németh and S. Z. Németh. How to project onto an isotone projection cone. Linear Algebra Appl., submitted.
  • [9] S. Z. Németh. Iterative methods for nonlinear complementarity problems on isotone projection cones. J. Math. Anal. Appl., 350(1):340–347, 2009.
  • [10] X. Shusheng. Estimation of the convergence rate of Dykstra’s cyclic projections algorithm in polyhedral case. Acta Math. Appl. Sinica (English Ser.), 16(2):217–220, 2000.