
On the Bredies-Chenchene-Lorenz-Naldi algorithm:
linear relations and strong convergence

Heinz H. Bauschke, Walaa M. Moursi, Shambhavi Singh, and Xianfu Wang
Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: heinz.bauschke@ubc.ca.
Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. E-mail: walaa.moursi@uwaterloo.ca.
Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: sambha@mail.ubc.ca.
Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: shawn.wang@ubc.ca.
(April 23, 2025)
Abstract

Monotone inclusion problems occur in many areas of optimization and variational analysis. Splitting methods, which utilize resolvents or proximal mappings of the underlying operators, are often applied to solve these problems. In 2022, Bredies, Chenchene, Lorenz, and Naldi introduced a new elegant algorithmic framework that encompasses various well known algorithms including Douglas-Rachford and Chambolle-Pock. They obtained powerful weak and strong convergence results, where the latter type relies on additional strong monotonicity assumptions.

In this paper, we complement the analysis by Bredies et al. by relating the projections onto the fixed point sets of the underlying operators that generate the (reduced and original) preconditioned proximal point sequences. We obtain a new strong convergence result when the underlying operator is a linear relation. We note that, without assumptions such as linearity or strong monotonicity, one may encounter weak convergence without strong convergence. In the case of the Chambolle-Pock algorithm, we obtain a new result that yields strong convergence to the projection onto the intersection of a linear subspace and the preimage of a linear subspace. Splitting algorithms by Ryu and by Malitsky and Tam are also considered. Various examples are provided to illustrate the applicability of our results.

2020 Mathematics Subject Classification: Primary 47H05, 47H09; Secondary 47N10, 65K05, 90C25.

Keywords: Bredies-Chenchene-Lorenz-Naldi algorithm, Chambolle-Pock algorithm, Douglas-Rachford algorithm, Malitsky-Tam algorithm, maximally monotone operator, proximal point algorithm, Ryu’s algorithm, splitting algorithm.

1 Introduction

Throughout, we assume that

H is a real Hilbert space with inner product \left\langle{\cdot},{\cdot}\right\rangle\colon H\times H\to\mathbb{R}, (1)

and induced norm \|\cdot\|. We are given a set-valued operator A\colon H\rightrightarrows H. Our goal is to find a point x in \operatorname{zer}A, which we assume to be nonempty:

find x\in H such that 0\in Ax. (2)

This is a very general and powerful problem in modern optimization and variational analysis; see, e.g., [6]. Following the framework proposed recently by Bredies, Chenchene, Lorenz, and Naldi [8] (see also the follow-up [9] and the related works [26], [19], and [18]; we would like to thank Dr. Panos Patrinos for bringing these to our attention), we also assume that we are given another (possibly different) real Hilbert space D, as well as a continuous linear operator

C\colon D\to H, (3)

with adjoint operator C^{*}\colon H\to D. We shall (usually) assume that C^{*} is surjective. Because D={\operatorname{ran}}\,C^{*}=\overline{\operatorname{ran}}\,C^{*}=(\ker C)^{\perp}, it is clear that \ker C=\{0\} and C is injective. We thus think of D as a space that is “smaller” than H. Now set

M:=CC^{*}\colon H\to H, (4)

and one thinks of M as a preconditioner. (The viewpoint taken in [8] is to start with a preconditioner M and then to construct the operator C; see [8, Proposition 2.3] for details.) It follows from [15, Lemma 8.40] that {\operatorname{ran}}\,(M)={\operatorname{ran}}\,(C) is closed. We shall assume that the set-valued map

H\rightrightarrows H\colon x\mapsto Ax\cap{\operatorname{ran}}\,M is monotone, (5)

which is clearly true if A itself is monotone, and that

(A+M)^{-1} is single-valued, full-domain, and Lipschitz continuous. (6)

This guarantees that the resolvent of M^{-1}A (we use the notation J_{B}:=(\operatorname{Id}+B)^{-1} for the classical resolvent and R_{B}:=2J_{B}-\operatorname{Id} for the reflected resolvent),

T:=(M+A)^{-1}M=(\operatorname{Id}+M^{-1}A)^{-1}=J_{M^{-1}A}\colon H\to H (7)

is also single-valued, full-domain, and Lipschitz continuous. (M^{-1}A is not necessarily monotone and hence T need not be firmly nonexpansive; however, it is “M-monotone” by 5 and [8, Proposition 2.5]. See also Section 5.2.4 below.) The importance of T for the problem 2 is the relationship

\operatorname{Fix}T=\operatorname{zer}A. (8)

Associated with T, which operates in H, is the operator (see [8, Theorem 2.13] or [10])

\widetilde{T}:=C^{*}(M+A)^{-1}C=J_{C^{*}\triangleright A}\colon D\to D, (9)

which operates in D; here C^{*}\triangleright A:=(C^{*}A^{-1}C)^{-1} is a maximally monotone operator on D and hence \widetilde{T} is firmly nonexpansive. Furthermore, we always assume that

(\lambda_{k})_{k\in{\mathbb{N}}} is a sequence of real parameters in [0,2] with \sum_{k\in{\mathbb{N}}}\lambda_{k}(2-\lambda_{k})=+\infty, (10)

and sometimes we will impose the more restrictive condition that

0<\inf_{k\in{\mathbb{N}}}\lambda_{k}\leq\sup_{k\in{\mathbb{N}}}\lambda_{k}<2. (11)

In order to solve 2, Bredies et al. [8] consider two sequences which are tightly intertwined: the first sequence, generated by the so-called preconditioned proximal point (PPP) algorithm, resides in H and is given by

u_{k+1}:=(1-\lambda_{k})u_{k}+\lambda_{k}Tu_{k}, (12)

where u_{0}\in H is the starting point (see [8, equation (2.2)]). The second sequence, generated by the so-called reduced preconditioned proximal point (rPPP) algorithm, resides in D and is given by (see [8, equation (2.15)])

w_{k+1}:=(1-\lambda_{k})w_{k}+\lambda_{k}\widetilde{T}w_{k}, (13)

where w_{0}=C^{*}u_{0}.
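To make 12 and 13 concrete, here is a minimal numerical sketch (in Python, assuming the operator is available as a callable; the function name relaxed_iteration and the toy choice A=B, M=\operatorname{Id} below are ours, not from [8]). It runs the relaxed fixed-point recursion common to both PPP and rPPP:

```python
import numpy as np

def relaxed_iteration(T, u0, lam, n_iter=200):
    """Run u_{k+1} = (1 - lam(k)) u_k + lam(k) T(u_k), i.e., the common
    recursion behind the PPP scheme (12) and the rPPP scheme (13)."""
    u = u0
    for k in range(n_iter):
        u = (1 - lam(k)) * u + lam(k) * T(u)
    return u

# Toy instance: A = B a positive definite (hence maximally monotone) matrix
# and M = Id, so T = (Id + B)^{-1} is the classical resolvent J_B and the
# recursion is the relaxed proximal point method; here zer B = {0}.
B = np.array([[2.0, -1.0], [-1.0, 2.0]])
J_B = np.linalg.inv(np.eye(2) + B)
u_star = relaxed_iteration(lambda u: J_B @ u, np.array([1.0, -3.0]),
                           lam=lambda k: 1.5)  # constant lambda in ]0,2[, cf. (11)
print(u_star)  # close to (0, 0)
```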

With these sequences in place, we now state the main result on the PPP and rPPP algorithms from Bredies et al.

Fact 1.1 (Bredies-Chenchene-Lorenz-Naldi, 2022).

For every starting point u_{0}\in H, set w_{0}:=C^{*}u_{0}. Then there exist u^{*}\in\operatorname{zer}A=\operatorname{Fix}T and w^{*}\in\operatorname{Fix}\widetilde{T} such that the following hold:

  1. (i)

    The sequence (Tu_{k})_{k\in{\mathbb{N}}} converges weakly to u^{*}.

  2. (ii)

    The sequence (w_{k})_{k\in{\mathbb{N}}} converges weakly to w^{*}.

  3. (iii)

    The sequence ((M+A)^{-1}Cw_{k})_{k\in{\mathbb{N}}} coincides with (Tu_{k})_{k\in{\mathbb{N}}}.

  4. (iv)

    The sequence (C^{*}u_{k})_{k\in{\mathbb{N}}} coincides with (w_{k})_{k\in{\mathbb{N}}}.

  5. (v)

    u^{*}=(M+A)^{-1}Cw^{*}.

  6. (vi)

    w^{*}=C^{*}u^{*}.

  7. (vii)

    If 11 holds, then (u_{k})_{k\in{\mathbb{N}}} also converges weakly to u^{*}.

Proof. (i): See [8, Corollary 2.10]. (ii): See [8, Corollary 2.15]. (iii): This is buried in [8, Proof of Corollary 2.15]. (iv): See [8, Theorem 2.14]. (v): Combine [8, Corollary 2.15], which states that ((M+A)^{-1}Cw_{k})_{k\in{\mathbb{N}}} converges weakly to (M+A)^{-1}Cw^{*}, with (iii) and (i). (vi): This is buried in [8, Proof of Corollary 2.15]. (vii): See [8, Corollary 2.10]. \hfill\quad\blacksquare

In [8, Section 2.3] Bredies et al. also obtain linear convergence under the usual assumptions on strong monotonicity. Their framework encompasses an impressive number of other algorithms (see [8, Section 3] for details) including Douglas-Rachford, Chambolle-Pock, and others.

The goal of this paper is to complement the work by Bredies et al. [8]. Specifically, our main results are:

  1. R1

    We introduce the notion of the MM-projection onto FixT\operatorname{Fix}T (Definition 3.2) and we show how it is obtained from the (regular) projection onto FixT~\operatorname{Fix}\widetilde{T} (Theorem 3.6).

  2. R2

    We obtain strong convergence results for PPP and rPPP sequences in the case when AA is a linear relation (Theorem 4.2).

Assume A is a linear relation. Then T and \widetilde{T} are linear operators and so trivially 0\in\operatorname{Fix}T and 0\in\operatorname{Fix}\widetilde{T}. R1 yields the exact limits, which are closest to the starting points (with respect to the seminorm induced by M) and which thus may be different from 0, while R2 guarantees strong convergence of the sequences generated by PPP and rPPP.

We also provide: (i) various applications to algorithms in the context of normal cone operators of closed linear subspaces and (ii) an example where the range of the preconditioner M is not closed (the existence of such an example was proclaimed in [8, Remark 3.1]).

Finally, let us stress that in the absence of strong monotonicity and linearity, merely weak (but not strong) convergence may occur! We shall demonstrate this for Douglas-Rachford and for Chambolle-Pock (see Section 5.1.2 and Section 5.2.2 below), by making use of the following classical example due to Genel and Lindenstrauss [20]:

Fact 1.2 (Genel-Lindenstrauss).

In \ell^{2}, there exists a nonempty bounded closed convex set S, a nonexpansive mapping N\colon S\to S, and a starting point s_{0}\in S such that the sequence generated by s_{k+1}:=\tfrac{1}{2}s_{k}+\tfrac{1}{2}Ns_{k} converges weakly — but not strongly — to a fixed point of N.

The remainder of this paper is organized as follows. In Section 2, we collect some auxiliary results which are required later. Our main result R1, along with the limits of the sequences generated by the PPP and rPPP algorithms, is discussed in Section 3. In Section 4, we present our main result R2. Various algorithms are analyzed in Section 5, with an emphasis on the case when normal cone operators of closed linear subspaces are involved. Some additional results concerning the Chambolle-Pock algorithm are provided in Section 6, including an example where the range of M is not closed.

The notation we employ is fairly standard and follows largely [6].

2 Auxiliary results

In this section, we shall collect various results that will make subsequent proofs more structured.

2.1 More results from Bredies et al. [8]

The next result is essentially due to Bredies et al.; however, it is somewhat buried in [8, Proof of Theorem 2.14]:

Proposition 2.1 (FixT\operatorname{Fix}T vs FixT~\operatorname{Fix}\widetilde{T}).

Recall the definitions of TT and T~\widetilde{T} from 7 and 9. Then:

  1. (i)

    If uFixTu\in\operatorname{Fix}T, then CuFixT~C^{*}u\in\operatorname{Fix}\widetilde{T}.

  2. (ii)

    If wFixT~w\in\operatorname{Fix}\widetilde{T}, then w=Cuw=C^{*}u, where u:=(M+A)1CwFixTu:=(M+A)^{-1}Cw\in\operatorname{Fix}T.

  3. (iii)

    If uFixTu\in\operatorname{Fix}T, then u=(M+A)1Cwu=(M+A)^{-1}Cw, where w:=CuFixT~w:=C^{*}u\in\operatorname{Fix}\widetilde{T}.

Consequently, the functions C^{*}\colon\operatorname{Fix}T\to\operatorname{Fix}\widetilde{T} and (M+A)^{-1}C\colon\operatorname{Fix}\widetilde{T}\to\operatorname{Fix}T are bijective and inverses of each other, and

C(FixT)=FixT~and(M+A)1C(FixT~)=FixT.C^{*}\big{(}\operatorname{Fix}T\big{)}=\operatorname{Fix}\widetilde{T}\;\;\text{and}\;\;(M+A)^{-1}C\big{(}\operatorname{Fix}\widetilde{T}\big{)}=\operatorname{Fix}T. (14)

Proof. (i): Let uFixTu\in\operatorname{Fix}T and set w:=Cuw:=C^{*}u. Recalling 4, we thus have the implications u=Tu=(M+A)1Mu=(M+A)1CCuu=Tu=(M+A)^{-1}Mu=(M+A)^{-1}CC^{*}u \Rightarrow w=Cu=(C(M+A)1C)(Cu)=T~ww=C^{*}u=(C^{*}(M+A)^{-1}C)(C^{*}u)=\widetilde{T}w \Leftrightarrow wFixT~w\in\operatorname{Fix}\widetilde{T}.

(ii): Let wFixT~w\in\operatorname{Fix}\widetilde{T}. Clearly, wranT~ranCw\in{\operatorname{ran}}\,\widetilde{T}\subseteq{\operatorname{ran}}\,C^{*}. Hence CwranCC=ranMCw\in{\operatorname{ran}}\,CC^{*}={\operatorname{ran}}\,M and so (A+M)1Cw(A+M)^{-1}Cw is single-valued. Next,

Tu\displaystyle Tu =T(M+A)1Cw\displaystyle=T(M+A)^{-1}Cw (by definition of uu)
=(M+A)1M(M+A)1Cw\displaystyle=(M+A)^{-1}M(M+A)^{-1}Cw (by 7)
=(M+A)1CC(M+A)1Cw\displaystyle=(M+A)^{-1}CC^{*}(M+A)^{-1}Cw (by 4)
=(M+A)1CT~w\displaystyle=(M+A)^{-1}C\widetilde{T}w (by 9)
=(M+A)1Cw\displaystyle=(M+A)^{-1}Cw (because wFixT~w\in\operatorname{Fix}\widetilde{T})
=u\displaystyle=u (by definition of uu)

and so uFixTu\in\operatorname{Fix}T. Moreover, Cu=C(M+A)1Cw=T~w=wC^{*}u=C^{*}(M+A)^{-1}Cw=\widetilde{T}w=w as claimed.

(iii): Let uFixTu\in\operatorname{Fix}T. By (i), wFixT~w\in\operatorname{Fix}\widetilde{T} and (M+A)1Cw=(M+A)1CCu=(M+A)1Mu=Tu=u(M+A)^{-1}Cw=(M+A)^{-1}CC^{*}u=(M+A)^{-1}Mu=Tu=u as announced.

Finally, (i)(iii) yield the “Consequently” part. \hfill\quad\blacksquare

Fact 2.2.

Consider the PPP algorithm. If the parameter sequence (λk)k(\lambda_{k})_{k\in{\mathbb{N}}} satisfies 11, then ukTuk0u_{k}-Tu_{k}\to 0.

Proof. See the second half of [8, Proof of Theorem 2.9]. \hfill\quad\blacksquare

Utilized in the work by Bredies et al. is the seminorm

\|x\|_{M}=\sqrt{\left\langle{x},{Mx}\right\rangle}=\|C^{*}x\| (15)

on HH which allows for some elegant formulations; see in particular [8, Section 2].

2.2 A strong convergence result

Fact 2.3 (Baillon-Bruck-Reich, 1978).

Let N:DDN\colon D\to D be a linear nonexpansive mapping such that Nkx0Nk+1x00N^{k}x_{0}-N^{k+1}x_{0}\to 0 for every x0Dx_{0}\in D. Then (Nkx0)k(N^{k}x_{0})_{k\in{\mathbb{N}}} converges strongly to a fixed point of NN.

Proof. See the original [1, Theorem 1.1], or [6, Theorem 5.14(ii)]. \hfill\quad\blacksquare
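A quick numerical illustration of 2.3 (a sketch of ours, not from [1]): take N to be the average of the identity and a planar rotation. Then N is linear, nonexpansive, and asymptotically regular, so its iterates converge strongly (here even linearly) to the unique fixed point 0:

```python
import numpy as np

theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation: linear, nonexpansive
N = 0.5 * (np.eye(2) + R)                        # averaged, so N^k x - N^{k+1} x -> 0

x = np.array([1.0, 2.0])
for _ in range(200):
    x = N @ x
print(x)  # ~ (0, 0), the unique fixed point of N, reached in norm
```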

2.3 From Genel-Lindenstrauss to maximally monotone operators

It will be convenient to extend and repackage 1.2 as follows:

Proposition 2.4.

There exists a maximally monotone operator BB on 2\ell^{2}, with bounded domain, and a starting point s02s_{0}\in\ell^{2} such that the sequence (JBks0)k(J_{B}^{k}s_{0})_{k\in{\mathbb{N}}} converges weakly — but not strongly — to some point s¯zerB\bar{s}\in\operatorname{zer}B.

Proof. Let SS, NN, and s0s_{0} be as in 1.2. (We recall that FixN\operatorname{Fix}N\neq\varnothing by the fixed point theorem of Browder-Göhde-Kirk; see, e.g., [6, Theorem 4.29].) The operator F:=12Id+12N:SSF:=\tfrac{1}{2}\operatorname{Id}+\tfrac{1}{2}N\colon S\to S is firmly nonexpansive, and (Fks0)k(F^{k}s_{0})_{k\in{\mathbb{N}}} converges weakly — but not strongly — to some point in FixN=FixF\operatorname{Fix}N=\operatorname{Fix}F. By [2, Corollary 5], there exists an extension F~:2S\widetilde{F}\colon\ell^{2}\to S such that F~\widetilde{F} is firmly nonexpansive and F~|S=F\widetilde{F}|_{S}=F. On the one hand, FixFFixF~\operatorname{Fix}F\subseteq\operatorname{Fix}\widetilde{F}. On the other hand, if xFixF~x\in\operatorname{Fix}\widetilde{F}, then x=F~xranF~Sx=\widetilde{F}x\in{\operatorname{ran}}\,\widetilde{F}\subseteq S and so x=F~|Sx=Fxx=\widetilde{F}|_{S}x=Fx, i.e., xFixFx\in\operatorname{Fix}F. Altogether, FixF=FixF~S\varnothing\neq\operatorname{Fix}F=\operatorname{Fix}\widetilde{F}\subseteq S. Finally, let B:=F~1IdB:=\widetilde{F}^{-1}-\operatorname{Id}. Then domB=ranF~S\operatorname{dom}\,B={\operatorname{ran}}\,\widetilde{F}\subseteq S is bounded, BB is maximally monotone on 2\ell^{2}, zerB=FixF~\varnothing\neq\operatorname{zer}B=\operatorname{Fix}\widetilde{F}, and JB=F~J_{B}=\widetilde{F}. The result thus follows from 1.2. \hfill\quad\blacksquare

3 The limits and the MM-projection

3.1 The limits

In this section, we identify the limits ww^{*} and uu^{*} of the rPPP and PPP algorithms (see 1.1) provided that additional assumptions are satisfied. Our result rests on the following relationship between PFixT~P_{\operatorname{Fix}\widetilde{T}} and the projection onto FixT\operatorname{Fix}T with respect to the seminorm from 15.

Theorem 3.1 (projecting onto FixT~\operatorname{Fix}\widetilde{T} and FixT\operatorname{Fix}T).

Let u0Hu_{0}\in H and assume that w0=Cu0w_{0}=C^{*}u_{0}. Then the following hold:

  1. (i)

    If w^:=PFixT~(w0)\widehat{w}:=P_{\operatorname{Fix}\widetilde{T}}(w_{0}), then u^:=(M+A)1Cw^\widehat{u}:=(M+A)^{-1}C\widehat{w} is an MM-projection of u0u_{0} onto FixT\operatorname{Fix}T in the sense that

    (xFixT)u0u^Mu0xM.(\forall x\in\operatorname{Fix}T)\quad\|u_{0}-\widehat{u}\|_{M}\leq\|u_{0}-x\|_{M}. (16)
  2. (ii)

    If u^FixT\widehat{u}\in\operatorname{Fix}T is an MM-projection of u0u_{0} onto FixT\operatorname{Fix}T in the sense that (xFixT)(\forall x\in\operatorname{Fix}T) u0u^Mu0xM\|u_{0}-\widehat{u}\|_{M}\leq\|u_{0}-x\|_{M}, then w^:=Cu^\widehat{w}:=C^{*}\widehat{u} is equal to PFixT~(w0)P_{\operatorname{Fix}\widetilde{T}}(w_{0}).

  3. (iii)

    If u1^,u2^\widehat{u_{1}},\widehat{u_{2}} belong to FixT\operatorname{Fix}T and are both MM-projections of u0u_{0} onto FixT\operatorname{Fix}T, then u1^=u2^\widehat{u_{1}}=\widehat{u_{2}}.

Proof. (i): Let xFixTx\in\operatorname{Fix}T and set y:=Cxy:=C^{*}x. Then yFixT~y\in\operatorname{Fix}\widetilde{T} by Proposition 2.1(i). Hence w0w^w0y\|w_{0}-\widehat{w}\|\leq\|w_{0}-y\|. On the other hand, w0=Cu0w_{0}=C^{*}u_{0} (by assumption), w^=Cu^\widehat{w}=C^{*}\widehat{u} and u^FixT\widehat{u}\in\operatorname{Fix}T (by Proposition 2.1(ii)). Altogether, we obtain Cu0Cu^Cu0Cx\|C^{*}u_{0}-C^{*}\widehat{u}\|\leq\|C^{*}u_{0}-C^{*}x\| or C(u0u^)C(u0x)\|C^{*}(u_{0}-\widehat{u})\|\leq\|C^{*}(u_{0}-x)\|. The conclusion now follows from 15.

(ii): By Proposition 2.1(i), we have w^FixT~\widehat{w}\in\operatorname{Fix}\widetilde{T}. Let yFixT~y\in\operatorname{Fix}\widetilde{T} and set x:=(M+A)1Cyx:=(M+A)^{-1}Cy. By Proposition 2.1(ii), we have xFixTx\in\operatorname{Fix}T and y=Cxy=C^{*}x. Moreover, C(u0x)=Cu0Cx=w0yC^{*}(u_{0}-x)=C^{*}u_{0}-C^{*}x=w_{0}-y and C(u0u^)=Cu0Cu^=w0w^C^{*}(u_{0}-\widehat{u})=C^{*}u_{0}-C^{*}\widehat{u}=w_{0}-\widehat{w}. The assumption that u0u^Mu0xM\|u_{0}-\widehat{u}\|_{M}\leq\|u_{0}-x\|_{M} turns, in view of 15 and the above, into w0w^w0y\|w_{0}-\widehat{w}\|\leq\|w_{0}-y\|. Hence w^=PFixT~(w0)\widehat{w}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}).

(iii): By (ii), Cu1^=PFixT~(w0)=Cu2^C^{*}\widehat{u_{1}}=P_{\operatorname{Fix}\widetilde{T}}(w_{0})=C^{*}\widehat{u_{2}}. Hence u2^=u1^+k\widehat{u_{2}}=\widehat{u_{1}}+k, where kkerCk\in\ker C^{*}. It follows that

u2^\displaystyle\widehat{u_{2}} =Tu2^\displaystyle=T\widehat{u_{2}} (because u2^FixT\widehat{u_{2}}\in\operatorname{Fix}T)
=(A+M)1(CC)(u1^+k)\displaystyle=(A+M)^{-1}(CC^{*})(\widehat{u_{1}}+k) (by 7 and 4)
=(A+M)1(CCu1^+CCk)\displaystyle=(A+M)^{-1}(CC^{*}\widehat{u_{1}}+CC^{*}k) (CCCC^{*} is linear)
=(A+M)1(Mu1^+0)\displaystyle=(A+M)^{-1}(M\widehat{u_{1}}+0) (by 4 and because kkerCk\in\ker C^{*})
=(A+M)1Mu1^\displaystyle=(A+M)^{-1}M\widehat{u_{1}}
=Tu1^\displaystyle=T\widehat{u_{1}} (by 7)
=u1^\displaystyle=\widehat{u_{1}} (because u1^FixT\widehat{u_{1}}\in\operatorname{Fix}T)

and we are done. \hfill\quad\blacksquare

In view of Theorem 3.1, the following notion is now well defined:

Definition 3.2 (MM-projection onto FixT\operatorname{Fix}T).

For every u0Hu_{0}\in H, there exists a unique point u^FixT\widehat{u}\in\operatorname{Fix}T such that (xFixT)(\forall x\in\operatorname{Fix}T) u0u^Mu0xM\|u_{0}-\widehat{u}\|_{M}\leq\|u_{0}-x\|_{M}. The point u^\widehat{u} is called the MM-projection of u0u_{0} onto FixT\operatorname{Fix}T, written also as PFixTM(u0)P_{\operatorname{Fix}T}^{M}(u_{0}).

Remark 3.3.

It is the special structure of FixT\operatorname{Fix}T that allows us to introduce the single-valued and full-domain operator PFixTMP_{\operatorname{Fix}T}^{M}. For more general sets, MM-projections either may not exist or they may not be a singleton — see Section 5.1.4 below.

We are now ready for a nice sufficient condition that allows for the identification of the weak limit ww^{*} of the sequence generated by the rPPP algorithm:

Theorem 3.4 (when FixT~\operatorname{Fix}\widetilde{T} is affine).

Suppose that FixT~\operatorname{Fix}\widetilde{T} is an affine subspace of DD. Then

w=PFixT~(w0).w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}). (17)

Proof. By [6, Corollary 5.17(i)], the sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} is Fejér monotone with respect to FixT~\operatorname{Fix}\widetilde{T}. From 1.1(ii), we know that (wk)k(w_{k})_{k\in{\mathbb{N}}} converges weakly to wFixT~w^{*}\in\operatorname{Fix}\widetilde{T}. Therefore, [6, Proposition 5.9(ii)] yields that (wk)k(w_{k})_{k\in{\mathbb{N}}} converges weakly to PFixT~(w0)P_{\operatorname{Fix}\widetilde{T}}(w_{0}). Altogether, we deduce 17. \hfill\quad\blacksquare

Remark 3.5 (linear relations).

Suppose that our given operator AA is actually a linear relation, i.e., its graph is a linear subspace of H×HH\times H. (For more on linear relations, we recommend Cross’s monograph [14] and Yao’s doctoral thesis [30].) Then T~\widetilde{T} (see 9) is actually a firmly nonexpansive linear operator; consequently, its fixed point set FixT~\operatorname{Fix}\widetilde{T} is a linear subspace and Theorem 3.4 is applicable. Similar comments hold for the case when AA is an affine relation, i.e., its graph is an affine subspace of H×HH\times H.

When the weak limit ww^{*} of the rPPP sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} is PFixT~(w0)P_{\operatorname{Fix}\widetilde{T}}(w_{0}), then we are able to identify the weak limit uu^{*} of the PPP sequence (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} as PFixTM(u0)P_{\operatorname{Fix}T}^{M}(u_{0}):

Theorem 3.6 (PFixTMP_{\operatorname{Fix}T}^{M} from PFixT~P_{\operatorname{Fix}\widetilde{T}}).

Suppose that w=PFixT~(w0)w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}). Then

u=PFixTM(u0).u^{*}=P_{\operatorname{Fix}T}^{M}(u_{0}). (18)

Proof. On the one hand, (M+A)1Cw=(M+A)1CPFixT~(w0)=PFixTM(u0)(M+A)^{-1}Cw^{*}=(M+A)^{-1}CP_{\operatorname{Fix}\widetilde{T}}(w_{0})=P_{\operatorname{Fix}T}^{M}(u_{0}) by Theorem 3.1(i). On the other hand, u=(M+A)1Cwu^{*}=(M+A)^{-1}Cw^{*} by 1.1(v). Altogether, we obtain the announced identity 18. \hfill\quad\blacksquare

3.2 On the notion of a general MM-projection

Proposition 3.7.

Let S be a nonempty subset of H, and let h\in H. Then (here we set \Pi_{S}^{M}(h)=\operatorname*{argmin}_{s\in S}\|h-s\|_{M} and \Pi_{R}(d)=\operatorname*{argmin}_{r\in R}\|d-r\|)

ΠSM(h)=S(C)1(ΠC(S)(Ch)).\Pi_{S}^{M}(h)=S\cap(C^{*})^{-1}\big{(}\Pi_{C^{*}(S)}(C^{*}h)\big{)}. (19)

Proof. Suppose that s^H\hat{s}\in H. Set d:=Chd:=C^{*}h, r^:=Cs^\hat{r}:=C^{*}\hat{s}, and R:=C(S)R:=C^{*}(S). We have the following equivalences:

s^ΠSM(h)\displaystyle\hat{s}\in\Pi_{S}^{M}(h) s^S(sS)hs^MhsM\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;(\forall s\in S)\;\|h-\hat{s}\|_{M}\leq\|h-s\|_{M} (20a)
s^S(sS)ChCs^ChCs\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;(\forall s\in S)\;\|C^{*}h-C^{*}\hat{s}\|\leq\|C^{*}h-C^{*}s\| (20b)
s^Sr^R(rR)dr^dr\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;\hat{r}\in R\;\land\;(\forall r\in R)\;\|d-\hat{r}\|\leq\|d-r\| (20c)
s^Sr^ΠR(d)\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;\hat{r}\in\Pi_{R}(d) (20d)
s^SCs^ΠC(S)(Ch)\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;C^{*}\hat{s}\in\Pi_{C^{*}(S)}(C^{*}h) (20e)
s^Ss^(C)1ΠC(S)(Ch)\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;\hat{s}\in(C^{*})^{-1}\Pi_{C^{*}(S)}(C^{*}h) (20f)
s^S(C)1(ΠC(S)(Ch)),\displaystyle\Leftrightarrow\hat{s}\in S\cap(C^{*})^{-1}\big{(}\Pi_{C^{*}(S)}(C^{*}h)\big{)}, (20g)

and we are done! \hfill\quad\blacksquare
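The identity 19 is easy to test numerically. In the Douglas-Rachford setup of Section 5.1.1 below, H=X^{2}, D=X, and C^{*}(x,y)=x-y, so \|h\|_{M}=|x-y| when X=\mathbb{R}. Here is a small sketch (the choice S=[0,1]^{2} is our toy example, not from the paper):

```python
import numpy as np

# H = R^2, D = R, C*(x, y) = x - y, so ||h||_M = |C* h| (cf. Section 5.1.1).
Cstar = lambda h: h[0] - h[1]

# S = [0,1]^2, hence C*(S) = [-1, 1] and Pi_{C*(S)} is the clip to [-1, 1].
h = np.array([3.0, 0.0])
r_hat = np.clip(Cstar(h), -1.0, 1.0)   # Pi_{C*(S)}(C* h) = 1

# By (19), Pi_S^M(h) = S ∩ (C*)^{-1}(r_hat) = {(x, y) in [0,1]^2 : x - y = 1},
# which is the single point (1, 0); brute-force check on a grid:
g = np.linspace(0.0, 1.0, 101)
pts = [(x, y) for x in g for y in g if abs(x - y - r_hat) < 1e-12]
print(pts)                              # [(1.0, 0.0)]
```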

Proposition 3.8.

The following hold:

  1. (i)

    If ΠSM\Pi_{S}^{M} is at most singleton-valued, then

    (sS)(s+kerC)S={s}.(\forall s\in S)\quad(s+\ker C^{*})\cap S=\{s\}. (21)
  2. (ii)

    If ΠC(S)\Pi_{C^{*}(S)} is at most singleton-valued and 21 holds, then ΠSM\Pi_{S}^{M} is at most singleton-valued.

Proof. Clearly, (s+kerC)S{s}(s+\ker C^{*})\cap S\supseteq\{s\}.

(i): We prove the contrapositive. Suppose 21 does not hold. Then there exists s\in S such that (s+\ker C^{*})\cap S\supsetneqq\{s\}. Take k\in\ker C^{*}\smallsetminus\{0\} such that s+k=s^{\prime}\in S. Then \|s-s\|_{M}=\|0\|_{M}=\|C^{*}0\|=0 and \|s-s^{\prime}\|_{M}=\|{-k}\|_{M}=\|C^{*}(-k)\|=0. Hence \{s,s^{\prime}\}\subseteq\Pi_{S}^{M}(s) and \Pi_{S}^{M} is not at most singleton-valued.

(ii): Let hHh\in H and suppose that s1s_{1} and s2s_{2} both belong to ΠSM(h)\Pi_{S}^{M}(h). By Proposition 3.7, Cs1ΠC(S)(Ch)C^{*}s_{1}\in\Pi_{C^{*}(S)}(C^{*}h) and Cs2ΠC(S)(Ch)C^{*}s_{2}\in\Pi_{C^{*}(S)}(C^{*}h). Because ΠC(S)\Pi_{C^{*}(S)} is at most singleton-valued, we deduce that Cs1=Cs2C^{*}s_{1}=C^{*}s_{2}. Hence s2s1kerCs_{2}-s_{1}\in\ker C^{*}. Combining with 21, we deduce that s2(s1+kerC)S={s1}s_{2}\in(s_{1}+\ker C^{*})\cap S=\{s_{1}\} and so s2=s1s_{2}=s_{1}. \hfill\quad\blacksquare

Remark 3.9.

The general results above give another view on why PFixTMP_{\operatorname{Fix}T}^{M} is not multi-valued. We sketch here why. Suppose that S=FixTS=\operatorname{Fix}T. We know that C(S)=FixT~C^{*}(S)=\operatorname{Fix}\widetilde{T} is a nonempty closed convex set because T~\widetilde{T} is (firmly) nonexpansive. Hence ΠC(S)\Pi_{C^{*}(S)} is actually singleton-valued. Let sS=FixTs\in S=\operatorname{Fix}T, kkerCk\in\ker C^{*}, and assume that s+kSs+k\in S. Then s+k=T(s+k)=(A+M)1(Ms+Mk)=(A+M)1(Ms)=T(s)=ss+k=T(s+k)=(A+M)^{-1}(Ms+Mk)=(A+M)^{-1}(Ms)=T(s)=s and so k=0k=0. This verifies 21. In view of Proposition 3.8(ii), we see that ΠSM\Pi^{M}_{S} is (at most) singleton-valued.

For an example that illustrates that ΠSM\Pi_{S}^{M} may be empty-valued or multi-valued, see Section 5.1.4 below.

4 Convergence

Theorem 4.1 (weak convergence and limits).

Suppose that FixT~\operatorname{Fix}\widetilde{T} is an affine subspace of DD. Let u0Hu_{0}\in H and set w0=Cu0w_{0}=C^{*}u_{0}. Then the following hold for the rPPP and PPP sequences (wk)k(w_{k})_{k\in{\mathbb{N}}} and (uk)k(u_{k})_{k\in{\mathbb{N}}} generated by 13 and 12:

  1. (i)

    The sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} converges weakly to w=PFixT~(w0)w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}).

  2. (ii)

    The sequence (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} converges weakly to u=PFixTM(u0)=(M+A)1Cwu^{*}=P^{M}_{\operatorname{Fix}T}(u_{0})=(M+A)^{-1}Cw^{*}.

  3. (iii)

    If 11 holds, then the sequence (uk)k(u_{k})_{k\in{\mathbb{N}}} also converges weakly to u=PFixTM(u0)=(M+A)1Cwu^{*}=P^{M}_{\operatorname{Fix}T}(u_{0})=(M+A)^{-1}Cw^{*}.

Proof. (i): Combine 1.1(ii) and Theorem 3.4. (ii): Combine 1.1(i), Theorem 3.4, Theorem 3.6, and Theorem 3.1(i). (iii): Combine (ii) with 1.1(vii). \hfill\quad\blacksquare

Theorem 4.2 (strong convergence and limits).

Suppose that AA is a linear relation. Let u0Hu_{0}\in H and set w0=Cu0w_{0}=C^{*}u_{0}. Suppose that the parameter sequence (λk)k(\lambda_{k})_{k\in{\mathbb{N}}} is identical to some constant λ]0,2[\lambda\in\left]0,2\right[. Then the following hold for the rPPP and PPP sequences (wk)k(w_{k})_{k\in{\mathbb{N}}} and (uk)k(u_{k})_{k\in{\mathbb{N}}} generated by 13 and 12:

  1. (i)

    The sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} converges strongly to w=PFixT~(w0)w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}).

  2. (ii)

    The sequences (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} and (uk)k(u_{k})_{k\in{\mathbb{N}}} converge strongly to u=PFixTM(u0)=(M+A)1Cwu^{*}=P^{M}_{\operatorname{Fix}T}(u_{0})=(M+A)^{-1}Cw^{*}.

Proof. Because A is a linear relation on H, it follows that \widetilde{T} is a linear operator and that \operatorname{Fix}\widetilde{T} is a linear subspace of D. Note that (\forall{k\in{\mathbb{N}}}) w_{k+1}=Nw_{k}, where N:=(1-\lambda)\operatorname{Id}+\lambda\widetilde{T}=(1-(\lambda/2))\operatorname{Id}+(\lambda/2)\widetilde{N} and \widetilde{N}:=2\widetilde{T}-\operatorname{Id} are both linear and nonexpansive, with \operatorname{Fix}\widetilde{T}=\operatorname{Fix}N=\operatorname{Fix}\widetilde{N}. By [6, Theorem 5.15(ii)], w_{k}-\widetilde{N}w_{k}\to 0; equivalently, w_{k}-Nw_{k}\to 0.

(i): Indeed, it follows from 2.3 that (wk)k(w_{k})_{k\in{\mathbb{N}}} converges strongly. The result now follows from Theorem 4.1(i) (or [6, Proposition 5.28]).

(ii): On the one hand, using 6, we see that (M+A)1C(M+A)^{-1}C is Lipschitz continuous. On the other hand, (Tuk)k=((M+A)1Cwk)k(Tu_{k})_{k\in{\mathbb{N}}}=((M+A)^{-1}Cw_{k})_{k\in{\mathbb{N}}} by 1.1(iii). Altogether, because (wk)k(w_{k})_{k\in{\mathbb{N}}} strongly converges by (i), we deduce that (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} must converge strongly as well. By Theorem 4.1(ii), TukPFixTM(u0)Tu_{k}\to P_{\operatorname{Fix}T}^{M}(u_{0}). Finally, piggybacking on 2.2, we obtain that (uk)k(u_{k})_{k\in{\mathbb{N}}} converges strongly to PFixTM(u0)P_{\operatorname{Fix}T}^{M}(u_{0}) as well. \hfill\quad\blacksquare

Remark 4.3.

Using a translation argument similar to what was done in [7, Section 4.4], one can generalize Theorem 4.2 to the case when AA is an affine relation, i.e., graA\operatorname{gra}A is an affine subspace of H×HH\times H, and such that zerA\operatorname{zer}A\neq\varnothing.

5 Examples

In this section, we consider the Douglas-Rachford, Chambolle-Pock, Ryu, and the Malitsky-Tam minimal lifting (which we will refer to as Malitsky-Tam) splitting algorithms.

5.1 Douglas-Rachford

5.1.1 General setup

Following [8, Sections 1 and 3], we suppose that X is a real Hilbert space, and A_{1},A_{2} are two maximally monotone operators on X. The goal is to find a point in \operatorname{zer}(A_{1}+A_{2}). Now assume (we shall frequently employ “block operator” notation for convenience) that

H=X2,D=X,andC=[IdId].H=X^{2},\;\;D=X,\;\;\text{and}\;\;C=\begin{bmatrix}\operatorname{Id}\\ -\operatorname{Id}\end{bmatrix}. (22)

Then

C=[IdId]:X2X:[xy]xyC^{*}=\begin{bmatrix}\operatorname{Id}\;\;-\operatorname{Id}\end{bmatrix}\colon X^{2}\to X\colon\begin{bmatrix}x\\ y\end{bmatrix}\mapsto x-y (23)

is clearly surjective. Now we compute

M=CC=[IdIdIdId]M=CC^{\ast}=\begin{bmatrix}\operatorname{Id}&-\operatorname{Id}\\ -\operatorname{Id}&\operatorname{Id}\end{bmatrix} (24)

and we set

A=[A1IdIdA21]A=\begin{bmatrix}A_{1}&\operatorname{Id}\\ -\operatorname{Id}&A_{2}^{-1}\end{bmatrix} (25)

which is maximally monotone as the sum of a maximally monotone operator (x,y)\mapsto A_{1}x\times A_{2}^{-1}y and a skew linear operator. Next (we occasionally use the notation B^{\ovee}=(-\operatorname{Id})\circ B\circ(-\operatorname{Id}) and B^{-\ovee}={(B^{-1})}^{\ovee}),

zerA\displaystyle\operatorname{zer}A ={(x,y)|0A1x+y0x+A21y}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{0\in A_{1}x+y\land 0\in-x+A_{2}^{-1}y}\big{\}} (26a)
={(x,y)|0A1x+yyA2x}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{0\in A_{1}x+y\land y\in A_{2}x}\big{\}} (26b)
={(x,y)|xzer(A1+A2)yA1x(A2x)}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{x\in\operatorname{zer}(A_{1}+A_{2})\land-y\in A_{1}x\cap(-A_{2}x)}\big{\}} (26c)
={(x,y)|yzer(A11+A2∨⃝)xA11(y)(A2∨⃝(y))}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{-y\in\operatorname{zer}(A_{1}^{-1}+A_{2}^{-\ovee})\land x\in A_{1}^{-1}(-y)\cap(-A_{2}^{-\ovee}(-y))}\big{\}} (26d)
=gra(A2)gra(A1)\displaystyle=\operatorname{gra}(A_{2})\cap\operatorname{gra}(-A_{1}) (26e)

where 26d follows from [5, Proposition 2.4] and also relates to Attouch-Théra duality and associated operations (see [5, Section 3] for details). One verifies that

(M+A)1:X2X2:[xy][JA1(x)JA21(y+2JA1(x))](M+A)^{-1}\colon X^{2}\to X^{2}\colon\begin{bmatrix}x\\ y\end{bmatrix}\mapsto\begin{bmatrix}J_{A_{1}}(x)\\ J_{A_{2}^{-1}}(y+2J_{A_{1}}(x))\end{bmatrix} (27)

is indeed single-valued, full-domain, and Lipschitz continuous; hence,

(M+A)1C:XX2:w[JA1(w)JA21(w+2JA1(w))]=[JA1(w)JA21RA1(w)].(M+A)^{-1}C\colon X\to X^{2}\colon w\mapsto\begin{bmatrix}J_{A_{1}}(w)\\ J_{A_{2}^{-1}}(-w+2J_{A_{1}}(w))\end{bmatrix}=\begin{bmatrix}J_{A_{1}}(w)\\ J_{A_{2}^{-1}}R_{A_{1}}(w)\end{bmatrix}. (28)

We now compute

T(x,y)\displaystyle T(x,y) =(M+A)1M(x,y)=(M+A)1(xy,yx)\displaystyle=(M+A)^{-1}M(x,y)=(M+A)^{-1}(x-y,y-x) (29a)
=[JA1(xy)JA21RA1(xy)].\displaystyle=\begin{bmatrix}J_{A_{1}}(x-y)\\ J_{A_{2}^{-1}}R_{A_{1}}(x-y)\end{bmatrix}. (29b)

Furthermore, recalling 23 and 28, we obtain

T~(w)\displaystyle\widetilde{T}(w) =C(M+A)1Cw=[IdId][JA1(w)w+2JA1(w)JA2RA1(w)]\displaystyle=C^{*}(M+A)^{-1}Cw=[\operatorname{Id}\;\;-\operatorname{Id}]\begin{bmatrix}J_{A_{1}}(w)\\ -w+2J_{A_{1}}(w)-J_{A_{2}}R_{A_{1}}(w)\end{bmatrix} (30a)
=wJA1(w)+JA2RA1(w),\displaystyle=w-J_{A_{1}}(w)+J_{A_{2}}R_{A_{1}}(w), (30b)

which is the familiar Douglas-Rachford splitting operator. Using the fact that FixT~=C(FixT)\operatorname{Fix}\widetilde{T}=C^{*}(\operatorname{Fix}T) (see 14) and 26, we obtain

FixT~=xzer(A1+A2)x+(A1x(A2x)),\operatorname{Fix}\widetilde{T}=\bigcup_{x\in\operatorname{zer}(A_{1}+A_{2})}x+\big{(}A_{1}x\cap(-A_{2}x)\big{)}, (31)

an identity that also follows from [5, Theorem 4.5].
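As a concrete instance of 30 (a sketch under our own toy choices, not an example from [8]): take A_{1}=N_{C_{1}} and A_{2}=N_{C_{2}} for two intersecting convex sets, so that J_{A_{1}} and J_{A_{2}} are projections; the shadows J_{A_{1}}(w_{k}) of the governing sequence then approach C_{1}\cap C_{2}=\operatorname{zer}(A_{1}+A_{2}):

```python
import numpy as np

def P_line(w):                  # projection onto C1 = {(x, y) : y = x}
    t = (w[0] + w[1]) / 2.0
    return np.array([t, t])

def P_ball(w):                  # projection onto C2 = closed unit ball at (1, 1)
    c = np.array([1.0, 1.0])
    d = w - c
    n = np.linalg.norm(d)
    return c + (d / n if n > 1.0 else d)

def T_tilde(w):                 # Douglas-Rachford operator (30b)
    R1 = 2.0 * P_line(w) - w    # reflected resolvent R_{A_1}
    return w - P_line(w) + P_ball(R1)

w = np.array([5.0, -4.0])
for _ in range(500):            # w_{k+1} = T~ w_k, i.e., (13) with lambda_k = 1
    w = T_tilde(w)
print(P_line(w))                # a point of C1 ∩ C2 (the shadow limit J_{A_1} w*)
```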

5.1.2 An example without strong convergence

Now suppose temporarily that X=2X=\ell^{2}, that BB, s0s_{0} and s¯\bar{s} are as in Proposition 2.4, that A1=BA_{1}=B, and that A2=0A_{2}=0. Then 26, 29, and 30 turn into

zerA=zer(B)×{0},T(x,y)=[JB(xy)0],andT~(w)=JB(w).\operatorname{zer}A=\operatorname{zer}(B)\times\{0\},\;\;T(x,y)=\begin{bmatrix}J_{B}(x-y)\\ 0\end{bmatrix},\;\;\text{and}\;\;\widetilde{T}(w)=J_{B}(w). (32)

Suppose furthermore that (\forall{k\in{\mathbb{N}}}) \lambda_{k}=1 and u_{0}=(s_{0},0). It then follows that w_{0}=C^{*}u_{0}=s_{0}, u_{k}=T^{k}(s_{0},0)=(J_{B}^{k}(s_{0}),0)\>{\rightharpoonup}\>(\bar{s},0) and w_{k}=\widetilde{T}^{k}(s_{0})=J^{k}_{B}(s_{0})\>{\rightharpoonup}\>\bar{s}, and neither convergence is strong.

5.1.3 Specialization to normal cones of linear subspaces

We now assume that U_{1},U_{2} are two closed linear subspaces of X and that A_{1}=N_{U_{1}}, A_{2}=N_{U_{2}}. (Given a nonempty closed convex subset U of X, recall that N_{U}(x):=\big{\{}{x^{*}\in X}~{}\big{|}~{}{\max\left\langle{U-x},{x^{*}}\right\rangle=0}\big{\}} if x\in U, and N_{U}(x):=\emptyset otherwise; this is the normal cone operator of U at x\in X.) Then \operatorname{zer}(A_{1}+A_{2})=U_{1}\cap U_{2}, \operatorname{gra}A_{1}=U_{1}\times U_{1}^{\perp}=U_{1}\times(-U_{1}^{\perp})=\operatorname{gra}(-A_{1}) and \operatorname{gra}A_{2}=U_{2}\times U_{2}^{\perp}; hence, 26 yields

FixT=zerA=(U1×U1)(U2×U2)=(U1U2)×(U1U2).\operatorname{Fix}T=\operatorname{zer}A=(U_{1}\times U_{1}^{\perp})\cap(U_{2}\times U_{2}^{\perp})=(U_{1}\cap U_{2})\times(U_{1}^{\perp}\cap U_{2}^{\perp}). (33)

Next, we see that 29, 30, 31 turn into

T(x,y)=[PU1(xy)PU2(PU1PU1)(xy)],\displaystyle T(x,y)=\begin{bmatrix}P_{U_{1}}(x-y)\\ P_{U_{2}^{\perp}}(P_{U_{1}}-P_{U_{1}^{\perp}})(x-y)\end{bmatrix}, (34)
T~=IdPU1+PU2(PU1PU1)=PU2PU1+PU2PU1,\displaystyle\widetilde{T}=\operatorname{Id}-P_{U_{1}}+P_{U_{2}}(P_{U_{1}}-P_{U_{1}^{\perp}})=P_{U_{2}}P_{U_{1}}+P_{U_{2}^{\perp}}P_{U_{1}^{\perp}}, (35)
FixT~=xU1U2x+(U1(U2))=(U1U2)+(U1U2)\displaystyle\operatorname{Fix}\widetilde{T}=\bigcup_{x\in U_{1}\cap U_{2}}x+(U_{1}^{\perp}\cap(-U_{2}^{\perp}))=(U_{1}\cap U_{2})+(U_{1}^{\perp}\cap U_{2}^{\perp}) (36)

respectively. (The latter two identities were also derived in [3].) It follows from Theorem 4.1(i) that the rPPP sequence, i.e., the governing sequence of the Douglas-Rachford algorithm, satisfies

wkw=PFixT~(w0)=PU1U2(w0)+PU1U2(w0).w_{k}\>{\rightharpoonup}\>w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0})=P_{U_{1}\cap U_{2}}(w_{0})+P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0}). (37)

Combining 1.1(iii), 28, Theorem 4.1(ii), Theorem 3.1(i), and 37, we have (given a nonempty closed convex subset U of X, its reflected projection is R_{U}:=R_{N_{U}}:=2P_{U}-\operatorname{Id})

Tuk\displaystyle Tu_{k} =(M+A)1Cwk=[PU1wkPU2RU1wk]\displaystyle=(M+A)^{-1}Cw_{k}=\begin{bmatrix}P_{U_{1}}w_{k}\\ P_{U_{2}^{\perp}}R_{U_{1}}w_{k}\\ \end{bmatrix} (38)

and

u\displaystyle u^{*} =PFixTM(u0)=(M+A)1Cw=[PU1wPU2RU1w]=[PU1(PU1U2(w0)+PU1U2(w0))PU2RU1(PU1U2(w0)+PU1U2(w0))]\displaystyle=P^{M}_{\operatorname{Fix}T}(u_{0})=(M+A)^{-1}Cw^{*}=\begin{bmatrix}P_{U_{1}}w^{*}\\ P_{U_{2}^{\perp}}R_{U_{1}}w^{*}\end{bmatrix}=\begin{bmatrix}P_{U_{1}}\big{(}P_{U_{1}\cap U_{2}}(w_{0})+P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0})\big{)}\\ P_{U_{2}^{\perp}}R_{U_{1}}\big{(}P_{U_{1}\cap U_{2}}(w_{0})+P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0})\big{)}\end{bmatrix} (39a)
=[PU1U2(w0)PU1U2(w0)]=[PU1U2(x0y0)PU1U2(y0x0)].\displaystyle=\begin{bmatrix}P_{U_{1}\cap U_{2}}(w_{0})\\ -P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0})\end{bmatrix}=\begin{bmatrix}P_{U_{1}\cap U_{2}}(x_{0}-y_{0})\\ P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(y_{0}-x_{0})\end{bmatrix}. (39b)

Note that the last description of uu^{*} clearly differs from PFixT(u0)=(PU1U2(x0),PU1U2(y0))P_{\operatorname{Fix}T}(u_{0})=(P_{U_{1}\cap U_{2}}(x_{0}),P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(y_{0})) in general! Returning to 39, we deduce in particular that the shadow sequence (PU1wk)(P_{U_{1}}w_{k}) satisfies

PU1wkPU1U2(w0).P_{U_{1}}w_{k}\>{\rightharpoonup}\>P_{U_{1}\cap U_{2}}(w_{0}). (40)

Moreover, combining with Theorem 4.2, we have the following strong convergence result (this was also derived in [3] when \lambda=1):

wkPU1U2(w0)+PU1U2(w0)andPU1wkPU1U2(w0)w_{k}\to P_{U_{1}\cap U_{2}}(w_{0})+P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0})\;\;\text{and}\;\;P_{U_{1}}w_{k}\to P_{U_{1}\cap U_{2}}(w_{0}) (41)

provided that λk=λ\lambda_{k}=\lambda for all k{k\in{\mathbb{N}}}.
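The limit formulas 37 and 41 can be verified numerically; here is a sketch of ours (the helper proj_meet and the random subspaces are assumptions for the test, not part of [8] or [3]):

```python
import numpy as np
rng = np.random.default_rng(0)

def proj(B):                          # orthogonal projector onto ran B
    Q, _ = np.linalg.qr(B)
    return Q @ Q.T

def proj_meet(P, Q):                  # projector onto the intersection of the
    n = P.shape[0]                    # two ranges: the nullspace of [I-P; I-Q]
    Mstack = np.vstack([np.eye(n) - P, np.eye(n) - Q])
    _, s, Vt = np.linalg.svd(Mstack)
    Ns = Vt[np.sum(s > 1e-10):].T     # orthonormal basis of the nullspace
    return Ns @ Ns.T if Ns.size else np.zeros((n, n))

n = 6
P1, P2 = proj(rng.standard_normal((n, 4))), proj(rng.standard_normal((n, 4)))
Q1, Q2 = np.eye(n) - P1, np.eye(n) - P2   # projectors onto U1^perp, U2^perp

T_tilde = P2 @ P1 + Q2 @ Q1               # the rPPP operator from (35)
w0 = rng.standard_normal(n)
w = w0.copy()
for _ in range(2000):                     # constant lambda_k = 1
    w = T_tilde @ w
w_star = proj_meet(P1, P2) @ w0 + proj_meet(Q1, Q2) @ w0   # right side of (37)
print(np.linalg.norm(w - w_star),                          # ~ 0, cf. (41)
      np.linalg.norm(P1 @ w - proj_meet(P1, P2) @ w0))     # shadow limit, cf. (40)
```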

5.1.4 When ΠSM\Pi_{S}^{M} is bizarre

We return to the general setup of Section 5.1.1. We shall show that \Pi_{S}^{M} may be empty-valued or multi-valued when S is a certain closed convex subset of H=X^{2}. To this end, assume that S\subseteq H is the Cartesian product

S=S1×S2,S=S_{1}\times S_{2}, (42)

where S1,S2S_{1},S_{2} are nonempty closed convex subsets of XX. Then C(S)=S1S2C^{*}(S)=S_{1}-S_{2} may fail to be closed even when each SiS_{i} is a closed linear subspace of XX (see, e.g., [3]). Consider such a scenario, let x¯S1S2¯(S1S2)\bar{x}\in\overline{S_{1}-S_{2}}\smallsetminus(S_{1}-S_{2}), and set h¯:=(x¯,0)\bar{h}:=(\bar{x},0) and let s=(s1,s2)Ss=(s_{1},s_{2})\in S. Then h¯sM=x¯(s1s2)>0\|\bar{h}-s\|_{M}=\|\bar{x}-(s_{1}-s_{2})\|>0 while clearly infx¯(S1S2)=0\inf\|\bar{x}-(S_{1}-S_{2})\|=0. In other words,

ΠSM(h¯)=.\Pi_{S}^{M}(\bar{h})=\varnothing. (43)

On the other hand, if S1=S2=XS_{1}=S_{2}=X and h=(h1,h2)S=Hh=(h_{1},h_{2})\in S=H, then S=S1×S2=X×XS=S_{1}\times S_{2}=X\times X and Proposition 3.7 yields

ΠSM(h)=(h+kerC)S=((h1,h2)+{(x,x)|xX})\Pi_{S}^{M}({h})=\big{(}h+\ker C^{*}\big{)}\cap S=\big{(}(h_{1},h_{2})+\big{\{}{(x,x)}~{}\big{|}~{}{x\in X}\big{\}}\big{)} (44)

is clearly multi-valued provided X{0}X\neq\{0\}. In summary, ΠSM\Pi_{S}^{M} may be empty-valued or multi-valued.

5.2 Chambolle-Pock

5.2.1 General setup

Following the presentation of the algorithm by Chambolle-Pock (see [11]) in Bredies et al.’s [8] (see also [12]), we suppose that X and Y are two real Hilbert spaces, A_{1} is maximally monotone on X and A_{2} is maximally monotone on Y. We also have a continuous linear operator L\colon X\to Y, as well as \sigma>0 and \tau>0 such that \sigma\tau\|L\|^{2}\leq 1. (The operator L^{*}A_{2}L is not necessarily maximally monotone; for a sufficient condition, see [6, Corollary 25.6].) The goal is to find a point in

zer(A1+LA2L),\operatorname{zer}(A_{1}+L^{*}A_{2}L), (45)

which we assume to exist. We have

H=X×YandM=[1σIdXLL1τIdY];H=X\times Y\;\;\text{and}\;\;M=\begin{bmatrix}\frac{1}{\sigma}\operatorname{Id}_{X}&-L^{*}\\ -L&\frac{1}{\tau}\operatorname{Id}_{Y}\end{bmatrix}; (46)

hence,

(x,y)M2=1σx22Lx,y+1τy2.\|(x,y)\|_{M}^{2}=\frac{1}{\sigma}\|x\|^{2}-2\left\langle{Lx},{y}\right\rangle+\frac{1}{\tau}\|y\|^{2}. (47)

Unfortunately, for Chambolle-Pock the operator C in the factorization CC^{*} is typically not explicitly available (however, see Section 6.1 and Section 6.2 below for some factorizations). Moreover,

A=[A1LLA21].A=\begin{bmatrix}A_{1}&L^{*}\\ -L&A_{2}^{-1}\end{bmatrix}. (48)

Then AA is maximally monotone on X×YX\times Y, because it is the sum of the maximally monotone operator (x,y)A1x×A21y(x,y)\mapsto A_{1}x\times A_{2}^{-1}y and a skew linear operator. Next,

zerA\displaystyle\operatorname{zer}A ={(x,y)|0A1x+Ly0Lx+A21y}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{0\in A_{1}x+L^{*}y\land 0\in-Lx+A_{2}^{-1}y}\big{\}} (49a)
={(x,y)|0A1x+LyyA2(Lx)}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{0\in A_{1}x+L^{*}y\land y\in A_{2}(Lx)}\big{\}} (49b)
={(x,y)|xzer(A1+LA2L)y(LA1x)(A2Lx)}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{x\in\operatorname{zer}(A_{1}+L^{*}A_{2}L)\land y\in(-L^{-*}A_{1}x)\cap(A_{2}Lx)}\big{\}} (49c)
=\big{\{}{(x,y)}~{}\big{|}~{}{y\in\operatorname{zer}(-LA_{1}^{-1}(-L^{*})+A_{2}^{-1})\land x\in(A_{1}^{-1}(-L^{*}y))\cap(L^{-1}A_{2}^{-1}y)}\big{\}} (49d)

where 49d follows from elementary algebraic manipulations (see also [17, Proposition 1]). This shows that when (x,y)\in\operatorname{zer}A, then x is a primal solution while y corresponds to a dual solution, i.e., y satisfies 0\in LA_{1}^{-\ovee}L^{*}y+A_{2}^{-1}y. One verifies (see [8, equation (3.4)]) that

(M+A)1:X×YX×Y:[xy][JσA1(σx)JτA21(2τLJσA1(σx)+τy)](M+A)^{-1}\colon X\times Y\to X\times Y\colon\begin{bmatrix}x\\ y\end{bmatrix}\mapsto\begin{bmatrix}J_{\sigma A_{1}}(\sigma x)\\ J_{\tau A_{2}^{-1}}\big{(}2\tau LJ_{\sigma A_{1}}(\sigma x)+\tau y\big{)}\end{bmatrix} (50)

is indeed single-valued, full-domain, and Lipschitz continuous. Hence (see [8, equation (3.3)])

T(x,y)\displaystyle T(x,y) =(M+A)1M(x,y)=[JσA1(xσLy)JτA21(y+τL(2JσA1(xσLy)x))].\displaystyle=(M+A)^{-1}M(x,y)=\begin{bmatrix}J_{\sigma A_{1}}(x-\sigma L^{*}y)\\ J_{\tau A_{2}^{-1}}\big{(}y+\tau L(2J_{\sigma A_{1}}(x-\sigma L^{*}y)-x)\big{)}\end{bmatrix}. (51)

Using the general inverse resolvent identity (see, e.g., [6, Proposition 23.20]), we know that J_{\tau A_{2}^{-1}}=\operatorname{Id}-\tau J_{\frac{1}{\tau}A_{2}}\circ\frac{1}{\tau}\operatorname{Id}. Therefore, we can express 51 also as

T(x,y)=[JσA1(xσLy)y+τL(2JσA1(xσLy)x)τJ1τA2(1τy+L(2JσA1(xσLy)x))].T(x,y)=\begin{bmatrix}J_{\sigma A_{1}}(x-\sigma L^{*}y)\\[5.0pt] y+\tau L(2J_{\sigma A_{1}}(x-\sigma L^{*}y)-x)-\tau J_{\frac{1}{\tau}A_{2}}\big{(}\frac{1}{\tau}y+L(2J_{\sigma A_{1}}(x-\sigma L^{*}y)-x)\big{)}\end{bmatrix}. (52)
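For illustration, here is a minimal sketch of the PPP iteration u_{k+1}=Tu_{k} with T from 51 (our toy instantiation, not an example from [8]: A_{1}=\partial\|\cdot\|_{1}, whose resolvent is soft thresholding, and A_{2}=N_{\{b\}}, for which J_{\tau A_{2}^{-1}}(v)=v-\tau b, cf. 70 below; this amounts to basis pursuit):

```python
import numpy as np
rng = np.random.default_rng(1)

def soft(z, t):                               # J_{t A_1} for A_1 = subdiff of ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

m, n = 8, 20
L = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[[2, 7, 11]] = [1.0, -2.0, 0.5]
b = L @ x_true

sigma = tau = 0.99 / np.linalg.norm(L, 2)     # so sigma * tau * ||L||^2 <= 1
x, y = np.zeros(n), np.zeros(m)
for _ in range(20000):                        # (x, y) <- T(x, y) as in (51)
    x_new = soft(x - sigma * L.T @ y, sigma)  # J_{sigma A_1}(x - sigma L* y)
    y = y + tau * (L @ (2.0 * x_new - x) - b) # J_{tau A_2^{-1}}(y + tau L(2 x_new - x))
    x = x_new
print(np.linalg.norm(L @ x - b), np.linalg.norm(x, 1))  # feasibility and l1-norm
```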

5.2.2 An example without strong convergence

Now suppose temporarily that X=2X=\ell^{2}, that BB, s0s_{0} and s¯\bar{s} are as in Proposition 2.4, that A1=1σBA_{1}=\tfrac{1}{\sigma}B, and that A2=0A_{2}=0. Then 49 and 51 turn into

zerA=zer(B)×{0}andT(x,y)=[JB(xσLy)0].\operatorname{zer}A=\operatorname{zer}(B)\times\{0\}\;\;\text{and}\;\;T(x,y)=\begin{bmatrix}J_{B}(x-\sigma L^{*}y)\\ 0\end{bmatrix}. (53)

Suppose furthermore that (\forall{k\in{\mathbb{N}}}) \lambda_{k}=1 and u_{0}=(s_{0},0). It then follows that u_{k}=T^{k}(s_{0},0)=(J_{B}^{k}(s_{0}),0)\>{\rightharpoonup}\>(\bar{s},0) but the convergence is not strong.

5.2.3 Specialization to normal cones of linear subspaces

We now assume that U is a closed linear subspace of X, that V is a closed linear subspace of Y, and that A_{1}=N_{U}, A_{2}=N_{V}. Let (x,y)\in X\times Y. Then A_{1}x=U^{\perp} if x\in U, and A_{1}x=\varnothing if x\notin U, and similarly for N_{V}y. Note that 0=0+0\in U^{\perp}+L^{*}V^{\perp}. It follows that x\in\operatorname{zer}(A_{1}+L^{*}A_{2}L) \Leftrightarrow 0\in A_{1}x+L^{*}A_{2}Lx \Leftrightarrow [x\in U\land Lx\in V] \Leftrightarrow x\in U\cap L^{-1}(V). We have shown \operatorname{zer}(A_{1}+L^{*}A_{2}L)=U\cap L^{-1}(V). Now assume that x\in\operatorname{zer}(A_{1}+L^{*}A_{2}L)=U\cap L^{-1}(V). Then y\in(-L^{-*}A_{1}x)\cap(A_{2}Lx) \Leftrightarrow [L^{*}(-y)\in A_{1}x\land y\in A_{2}Lx] \Leftrightarrow [-L^{*}y\in U^{\perp}\land y\in V^{\perp}] \Leftrightarrow [L^{*}y\in-U^{\perp}=U^{\perp}\land y\in V^{\perp}] \Leftrightarrow y\in L^{-*}(U^{\perp})\cap V^{\perp}. Combining this with 49 yields

FixT=zerA=(UL1(V))×(VL(U)).\operatorname{Fix}T=\operatorname{zer}A=\big{(}U\cap L^{-1}(V)\big{)}\times\big{(}V^{\perp}\cap L^{-*}(U^{\perp})\big{)}. (54)

Now 51 and 52 particularize to

T(x,y)\displaystyle T(x,y) =[PU(xσLy)PV(y+τL(2PU(xσLy)x))]\displaystyle=\begin{bmatrix}P_{U}(x-\sigma L^{*}y)\\ P_{V^{\perp}}\big{(}y+\tau L(2P_{U}(x-\sigma L^{*}y)-x)\big{)}\end{bmatrix} (55)

and

T(x,y)=[PU(xσLy)y+τL(2PU(xσLy)x)PV(y+τL(2PU(xσLy)x))]T(x,y)=\begin{bmatrix}P_{U}(x-\sigma L^{*}y)\\[5.0pt] y+\tau L(2P_{U}(x-\sigma L^{*}y)-x)-P_{V}\big{(}y+\tau L(2P_{U}(x-\sigma L^{*}y)-x)\big{)}\end{bmatrix} (56)

respectively.

Lemma 5.1.

Given (x0,y0)X×Y(x_{0},y_{0})\in X\times Y, we have

PFixTM(x0,y0)=(PUL1(V)(x0σLy0),PVL(U)(y0τLx0)).P_{\operatorname{Fix}T}^{M}(x_{0},y_{0})=\big{(}P_{U\cap L^{-1}(V)}(x_{0}-\sigma L^{*}y_{0}),P_{V^{\perp}\cap L^{-*}(U^{\perp})}(y_{0}-\tau Lx_{0})\big{)}. (57)

Proof. Using 54, we want to find the (unique by Theorem 3.1(iii)) minimizer of the function (x,y)(x,y)(x0,y0)M(x,y)\mapsto\|(x,y)-(x_{0},y_{0})\|_{M} subject to xUL1(V)x\in U\cap L^{-1}(V) and yVL(U)y\in V^{\perp}\cap L^{-*}(U^{\perp}). First, squaring (x,y)(x0,y0)M\|(x,y)-(x_{0},y_{0})\|_{M} and recalling 47 yields 1σxx022L(xx0),yy0+1τyy02\frac{1}{\sigma}\|x-x_{0}\|^{2}-2\left\langle{L(x-x_{0})},{y-y_{0}}\right\rangle+\frac{1}{\tau}\|y-y_{0}\|^{2}. Second, expanding, discarding constant terms, and using LxyLx\perp y results in 1σx22σx,x0+2Lx0,y+2Lx,y0+1τy22τy,y0=1σ(x22x,x0σLy0)+1τ(y22y,y0τLx0)\frac{1}{\sigma}\|x\|^{2}-\frac{2}{\sigma}\left\langle{x},{x_{0}}\right\rangle+2\left\langle{Lx_{0}},{y}\right\rangle+2\left\langle{Lx},{y_{0}}\right\rangle+\frac{1}{\tau}\|y\|^{2}-\frac{2}{\tau}\left\langle{y},{y_{0}}\right\rangle=\frac{1}{\sigma}\big{(}\|x\|^{2}-2\left\langle{x},{x_{0}-\sigma L^{*}y_{0}}\right\rangle\big{)}+\frac{1}{\tau}\big{(}\|y\|^{2}-2\left\langle{y},{y_{0}-\tau Lx_{0}}\right\rangle\big{)}. Thirdly, completing the squares (which adds only constant terms here), we obtain 1σx(x0σLy0)2+1τy(y0τLx0)2\frac{1}{\sigma}\|x-(x_{0}-\sigma L^{*}y_{0})\|^{2}+\frac{1}{\tau}\|y-(y_{0}-\tau Lx_{0})\|^{2} and the proclaimed identity follows. \hfill\quad\blacksquare

If (uk)k=(xk,yk)k(u_{k})_{k\in{\mathbb{N}}}=(x_{k},y_{k})_{k\in{\mathbb{N}}} is the PPP sequence for Chambolle-Pock, then the weak limit u=(x,y)u^{*}=(x^{*},y^{*}) of (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} is given by

(x,y)=(PUL1(V)(x0σLy0),PVL(U)(y0τLx0))(x^{*},y^{*})=\big{(}P_{U\cap L^{-1}(V)}(x_{0}-\sigma L^{*}y_{0}),P_{V^{\perp}\cap L^{-*}(U^{\perp})}(y_{0}-\tau Lx_{0})\big{)} (58)

because of Theorem 4.1(ii) and Lemma 5.1. Moreover, combining with Theorem 4.2, we obtain the strong convergence result

PU(xkσLyk)PUL1(V)(x0)when y0=0 and (λk)kλ.P_{U}(x_{k}-\sigma L^{*}y_{k})\to P_{U\cap L^{-1}(V)}(x_{0})\quad\text{when $y_{0}=0$ and $(\lambda_{k})_{k\in{\mathbb{N}}}\equiv\lambda$.} (59)
Remark 5.2.

We note that 58 beautifully generalizes 39, where Y=XY=X, L=IdL=\operatorname{Id} and σ=τ=1\sigma=\tau=1. To the best of our knowledge, the limit formula 58 appears to be new, even in the classical case where MM is positive definite. Moreover, 59 is a nice way to compute algorithmically PUL1(V)P_{U\cap L^{-1}(V)} in the case when only PVP_{V} is available but PL1(V)P_{L^{-1}(V)} is not.
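To illustrate the last point of Remark 5.2 numerically, here is a sketch (with our own random test data; the nullspace-based helper computes the true projector only for verification and is not part of the algorithm):

```python
import numpy as np
rng = np.random.default_rng(3)

def proj(B):                                   # projector onto ran B
    Q, _ = np.linalg.qr(B)
    return Q @ Q.T

def proj_null(Mstack):                         # projector onto null(Mstack)
    _, s, Vt = np.linalg.svd(Mstack)
    Ns = Vt[np.sum(s > 1e-10):].T
    return Ns @ Ns.T

n, m = 10, 6
PU = proj(rng.standard_normal((n, 7)))         # U, a 7-dimensional subspace of R^10
PV = proj(rng.standard_normal((m, 3)))         # V, a 3-dimensional subspace of R^6
L = rng.standard_normal((m, n))

sigma = tau = 0.99 / np.linalg.norm(L, 2)
x0 = rng.standard_normal(n)
x, y = x0.copy(), np.zeros(m)                  # y0 = 0, as required in (59)
for _ in range(20000):                         # iterate (55) with lambda_k = 1
    x_new = PU @ (x - sigma * L.T @ y)
    v = y + tau * (L @ (2.0 * x_new - x))
    y = v - PV @ v                             # P_{V^perp} v
    x = x_new

# ground truth: U ∩ L^{-1}(V) = null([(I - PU); (I - PV) L]); compare with (59):
P_true = proj_null(np.vstack([np.eye(n) - PU, (np.eye(m) - PV) @ L]))
print(np.linalg.norm(PU @ (x - sigma * L.T @ y) - P_true @ x0))   # ~ 0
```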

5.2.4 Specializing Section 5.2.3 even further to two lines in 2\mathbb{R}^{2}

We now turn to a pleasing special case that allows for the explicit computation of the spectral radius and the operator norm of TT. Keeping the setup of Section 5.2.3, we assume additionally that Y=XY=X, L=IdL=\operatorname{Id}, and σ=1/τ\sigma=1/\tau. Then the operator TT from 55 turns into

T\displaystyle T =[PU1τPUτPV(PUPU)PV(PUPU)].\displaystyle=\begin{bmatrix}P_{U}&-\frac{1}{\tau}P_{U}\\ \tau P_{V^{\perp}}(P_{U}-P_{U^{\perp}})&P_{V^{\perp}}(P_{U^{\perp}}-P_{U})\end{bmatrix}. (60)

We now specialize even further to X=2X=\mathbb{R}^{2} and θ[0,π[\theta\in\left[0,\pi\right[, where we consider the two lines

U=\mathbb{R}[1,0]^{\mkern-1.5mu\mathsf{T}}\;\;\text{and}\;\;V=\mathbb{R}[\cos(\theta),\sin(\theta)]^{\mkern-1.5mu\mathsf{T}} (61)

which form an angle of \theta and which have projection matrices (see, e.g., [3, Section 5])

PU=[1000]andPV=[cos2(θ)cos(θ)sin(θ)cos(θ)sin(θ)sin2(θ)].P_{U}=\begin{bmatrix}1&0\\ 0&0\end{bmatrix}\;\;\text{and}\;\;P_{V}=\begin{bmatrix}\cos^{2}(\theta)&\cos(\theta)\sin(\theta)\\ \cos(\theta)\sin(\theta)&\sin^{2}(\theta)\end{bmatrix}. (62)

It follows that 60 turns into

T=[101τ00000τsin2(θ)τcos(θ)sin(θ)sin2(θ)cos(θ)sin(θ)τcos(θ)sin(θ)τcos2(θ)cos(θ)sin(θ)cos2(θ)].T=\begin{bmatrix}1&0&-\frac{1}{\tau}&0\\ 0&0&0&0\\ \tau\,\sin^{2}(\theta)&\tau\,\cos\left(\theta\right)\,\sin\left(\theta\right)&-\sin^{2}(\theta)&-\cos\left(\theta\right)\,\sin\left(\theta\right)\\ -\tau\,\cos(\theta)\,\sin(\theta)&-\tau\cos^{2}(\theta)&\cos\left(\theta\right)\,\sin(\theta)&\cos^{2}(\theta)\end{bmatrix}. (63)

We know from the main results of Bredies et al. that TT has excellent properties with respect to the PPP algorithm. For this particular TT, we can quantify the key notions:

Lemma 5.3.

The operator TT defined in 63 satisfies the following:

  1. (i)

(spectral radius) \rho(T)=|\cos(\theta)|, which is independent of \tau, and minimized with minimum value of 0 when \theta=\pi/2 and maximized with maximum value of 1 when \theta=0.

  2. (ii)

    (operator norm) We have

    T=1+1+τ4+(1+τ2)1+τ42τ2cos(2θ)2τ2.\|T\|=\sqrt{1+\frac{1+\tau^{4}+(1+\tau^{2})\sqrt{1+\tau^{4}-2\tau^{2}\cos(2\theta)}}{2\tau^{2}}}. (64)
  3. (iii)

(bounds) \sqrt{2}\leq\sqrt{1+\max\{\tau^{2},1/\tau^{2}\}}\leq\|T\|\leq\tau+{1}/{\tau}, and the lower bound is attained when \theta=0 and \tau=1 while the upper bound is attained when \theta=\pi/2.

  4. (iv)

(Douglas-Rachford case) When \tau=1, we have \|T\|=\sqrt{2+2\sin(\theta)}, which is minimized with minimum value \sqrt{2} when \theta=0, and maximized with maximum value 2 when \theta=\pi/2.

Proof. The matrix TT has the characteristic polynomial (in the variable ζ\zeta):

ζ2(ζ22cos2(θ)ζ+cos2(θ));\zeta^{2}\big{(}\zeta^{2}-2\cos^{2}(\theta)\zeta+\cos^{2}(\theta)\big{)}; (65)

hence, the eigenvalues of T are 0,0,|\cos(\theta)|\big{(}|\cos(\theta)|\pm\mathrm{i}|\sin(\theta)|\big{)}, and the latter two eigenvalues have absolute value |\cos(\theta)|\sqrt{\cos^{2}(\theta)+\sin^{2}(\theta)}=|\cos(\theta)|. Thus the spectral radius of T is

ρ(T)=|cos(θ)|.\rho(T)=|\cos(\theta)|. (66)

This verifies (i).

Next, the matrix T^{\mkern-1.5mu\mathsf{T}}T has the characteristic polynomial (in the variable \zeta)

ζ2τ2(τ2ζ2(1+τ2)2ζ+(1+τ2)2cos2(θ))\frac{\zeta^{2}}{\tau^{2}}\Big{(}\tau^{2}\zeta^{2}-(1+\tau^{2})^{2}\zeta+(1+\tau^{2})^{2}\cos^{2}(\theta)\Big{)} (67)

and thus it has eigenvalues 0,00,0 and

(1+τ2)2±(1+τ2)44τ2(1+τ2)2cos2(θ)2τ2\displaystyle\frac{(1+\tau^{2})^{2}\pm\sqrt{(1+\tau^{2})^{4}-4\tau^{2}(1+\tau^{2})^{2}\cos^{2}(\theta)}}{2\tau^{2}} (68a)
=(1+τ2)2±(1+τ2)1+τ4+2τ2(12cos2(θ))2τ2\displaystyle=\frac{(1+\tau^{2})^{2}\pm(1+\tau^{2})\sqrt{1+\tau^{4}+2\tau^{2}(1-2\cos^{2}(\theta))}}{2\tau^{2}} (68b)
=(1+τ2)2±(1+τ2)1+τ42τ2cos(2θ)2τ2\displaystyle=\frac{(1+\tau^{2})^{2}\pm(1+\tau^{2})\sqrt{1+\tau^{4}-2\tau^{2}\cos(2\theta)}}{2\tau^{2}} (68c)
=1+1+τ4±(1+τ2)1+τ42τ2cos(2θ)2τ2.\displaystyle=1+\frac{1+\tau^{4}\pm(1+\tau^{2})\sqrt{1+\tau^{4}-2\tau^{2}\cos(2\theta)}}{2\tau^{2}}. (68d)

Therefore,

T\displaystyle\|T\| =1+1+τ4+(1+τ2)1+τ42τ2cos(2θ)2τ2\displaystyle=\sqrt{1+\frac{1+\tau^{4}+(1+\tau^{2})\sqrt{1+\tau^{4}-2\tau^{2}\cos(2\theta)}}{2\tau^{2}}} (69)

and we have verified (ii).

Turning to (iii), we see that, for fixed \tau, \|T\| is smallest (resp. largest) when \theta=0 (resp. \theta=\pi/2), in which case \|T\| becomes \sqrt{1+({1+\tau^{4}+(1+\tau^{2})|1-\tau^{2}|})/({2\tau^{2}})} (resp. \sqrt{1+({1+\tau^{4}+(1+\tau^{2})(1+\tau^{2})})/({2\tau^{2}})}\big{)}, which further simplifies to \sqrt{1+\max\{\tau^{2},1/\tau^{2}\}} (resp. \tau+1/\tau).

Finally, (iv) follows by simplifying (ii) with τ=1\tau=1. \hfill\quad\blacksquare
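Lemma 5.3(i) and (ii) are straightforward to confirm numerically (a verification sketch of ours; the values of \theta and \tau are arbitrary):

```python
import numpy as np

def T_mat(theta, tau):                  # the matrix from (63)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0,        0.0,       -1.0/tau, 0.0],
                     [0.0,        0.0,        0.0,     0.0],
                     [tau*s*s,    tau*c*s,   -s*s,    -c*s],
                     [-tau*c*s,  -tau*c*c,    c*s,     c*c]])

theta, tau = 0.7, 1.3
T = T_mat(theta, tau)
rho = max(abs(np.linalg.eigvals(T)))    # spectral radius, cf. Lemma 5.3(i)
norm = np.linalg.norm(T, 2)             # operator norm, cf. Lemma 5.3(ii)
formula = np.sqrt(1.0 + (1.0 + tau**4 + (1.0 + tau**2)
          * np.sqrt(1.0 + tau**4 - 2.0*tau**2*np.cos(2.0*theta))) / (2.0*tau**2))
print(rho, abs(np.cos(theta)))          # both ~ 0.7648, confirming (66)
print(norm, formula)                    # agree to machine precision, cf. (64)
```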

Remark 5.4.

Lemma 5.3(iii) impressively shows that studying the operator TT from 63 is outside the realm of fixed point theory of classical nonexpansive mappings. In such cases, the spectral radius is helpful (for more results in this direction, see [4]). We also point out that the operator TT from 63 is firmly nonexpansive — not with respect to the standard Hilbert space norm on X2X^{2}, but rather with respect to the seminorm induced by the preconditioner MM (see [8, Lemma 2.6] for details).

5.2.5 Numerical experiment

In this section, let UU be a closed affine subspace of XX and let bYb\in Y. Let (x,y)X×Y(x,y)\in X\times Y. We assume that A1=NUA_{1}=N_{U} and A2=N{b}A_{2}=N_{\{b\}}. Then 52 turns into

T(x,y)=[PU(xσLy)y+τL(2PU(xσLy)x)τb].T(x,y)=\begin{bmatrix}P_{U}(x-\sigma L^{*}y)\\[5.0pt] y+\tau L(2P_{U}(x-\sigma L^{*}y)-x)-\tau b\end{bmatrix}. (70)

Now assume that xzer(A1+LA2L)=UL1(b)x\in\operatorname{zer}(A_{1}+L^{*}A_{2}L)=U\cap L^{-1}(b). Then y(LA1x)(A2Lx)y\in(-L^{-*}A_{1}x)\cap(A_{2}Lx) \Leftrightarrow [L(y)A1xyA2Lx][L^{*}(-y)\in A_{1}x\land y\in A_{2}Lx] \Leftrightarrow [Ly(UU)y({b}{b})][-L^{*}y\in(U-U)^{\perp}\land y\in(\{b\}-\{b\})^{\perp}] \Leftrightarrow [Ly(UU)=(UU)yY][L^{*}y\in-(U-U)^{\perp}=(U-U)^{\perp}\land y\in Y] \Leftrightarrow yL((UU))y\in L^{-*}((U-U)^{\perp}). It follows from 49 that

zerA=FixT=(UL1(b))×L((UU)).\operatorname{zer}A=\operatorname{Fix}T=\big{(}U\cap L^{-1}(b)\big{)}\times L^{-*}\big{(}(U-U)^{\perp}\big{)}. (71)

Arguing similarly to the derivation of 58 and 59, we obtain that the weak limit u=(x,y)u^{*}=(x^{*},y^{*}) of the PPP sequence (uk)k=(xk,yk)k(u_{k})_{k\in{\mathbb{N}}}=(x_{k},y_{k})_{k\in{\mathbb{N}}} of Chambolle-Pock is given by

(x,y)=(PUL1(b)(x0σLy0),PL((UU))(y0τLx0))(x^{*},y^{*})=\big{(}P_{U\cap L^{-1}(b)}(x_{0}-\sigma L^{*}y_{0}),P_{L^{-*}((U-U)^{\perp})}(y_{0}-\tau Lx_{0})\big{)} (72)

and that

PU(xkσLyk)PUL1(b)(x0)when y0=0 and (λk)kλ.P_{U}(x_{k}-\sigma L^{*}y_{k})\to P_{U\cap L^{-1}(b)}(x_{0})\quad\text{when $y_{0}=0$ and $(\lambda_{k})_{k\in{\mathbb{N}}}\equiv\lambda$.} (73)

We now illustrate 73 numerically using a setup motivated by Computed Tomography (CT), which uses X-ray measurements to reconstruct cross-sectional body images (see, e.g., [22]). The resulting inverse problem we consider here amounts to solving Lx=b where L\in\mathbb{R}^{6750\times 2500} and b\in\mathbb{R}^{6750}: indeed, the matrix L and the vector b were generated from the 50\times 50 Shepp-Logan phantom image [29], reshaped to a vector x^{*}\in\mathbb{R}^{2500}, using the Matlab AIR Tools II package [21] (we used the command paralleltomo(50,0:2:178,75); in fact, the experiments were performed in GNU Octave [16]). In turn, the closed linear subspace U was obtained by using the a priori information that the first and last two columns of the phantom image must be black, i.e., they contain only zeros. Because the matrix L^{*}L is smaller than LL^{*}, we compute \|L\| via \|L\|^{2}=\|L^{*}L\| and then set \sigma:=\tau:=0.99/\|L\|. In Figure 1 we present the reconstructed phantom images generated after 100 and 10000 iterations of Chambolle-Pock, with starting point (x_{0},y_{0})=(0,0) and \lambda_{k}\equiv 1, along with the exact phantom.

Figure 1: The exact phantom image (a), and the reconstructions generated by x_{100} (b) and by x_{10000} (c).
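The phantom experiment itself relies on Matlab/Octave and AIR Tools II, but the iteration 70 and the limit 73 are easy to reproduce on a small synthetic instance (a sketch with our own random data standing in for the tomography matrix; the "black column" mask mimics the subspace U above):

```python
import numpy as np
rng = np.random.default_rng(2)

m, n = 8, 14
mask = np.ones(n); mask[:2] = mask[-2:] = 0.0    # U = {x : first/last two entries 0}
P_U = lambda x: mask * x

L = rng.standard_normal((m, n))
b = L @ (mask * rng.standard_normal(n))          # consistent: U ∩ L^{-1}(b) nonempty

sigma = tau = 0.99 / np.linalg.norm(L, 2)
x, y = np.zeros(n), np.zeros(m)                  # (x0, y0) = (0, 0), lambda_k = 1
for _ in range(20000):                           # iterate (70)
    x_new = P_U(x - sigma * L.T @ y)
    y = y + tau * (L @ (2.0 * x_new - x) - b)
    x = x_new

# For x0 = 0, (73) predicts the least-norm point of U ∩ L^{-1}(b):
cols = np.where(mask == 1.0)[0]
z = np.zeros(n); z[cols] = np.linalg.lstsq(L[:, cols], b, rcond=None)[0]
print(np.linalg.norm(P_U(x - sigma * L.T @ y) - z))   # ~ 0
```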

5.3 Ryu

5.3.1 General setup

Let XX be a real Hilbert space, and A1,A2,A3A_{1},A_{2},A_{3} be maximally monotone on XX. The goal is to find a point in

Z:=zer(A1+A2+A3)Z:=\operatorname{zer}(A_{1}+A_{2}+A_{3}) (74)

which we assume to be nonempty. The three-operator splitting algorithm by Ryu [28] is designed to solve this problem. It was pointed out in [9] that this fits the framework by Bredies et al. Indeed, assume that

H=X5,D=X2,andC=[Id00IdIdIdId00Id].\displaystyle H=X^{5},\quad D=X^{2},\quad\text{and}\quad C=\begin{bmatrix}\operatorname{Id}&0\\ 0&\operatorname{Id}\\ -\operatorname{Id}&-\operatorname{Id}\\ \operatorname{Id}&0\\ 0&\operatorname{Id}\end{bmatrix}. (75)

Then

C=[Id0IdId00IdId0Id]C^{*}=\begin{bmatrix}\operatorname{Id}&0&-\operatorname{Id}&\operatorname{Id}&0\\ 0&\operatorname{Id}&-\operatorname{Id}&0&\operatorname{Id}\end{bmatrix} (76)

is clearly surjective and

M=CC=[Id0IdId00IdId0IdIdId2IdIdIdId0IdId00IdId0Id].M=CC^{*}=\begin{bmatrix}\operatorname{Id}&0&-\operatorname{Id}&\operatorname{Id}&0\\ 0&\operatorname{Id}&-\operatorname{Id}&0&\operatorname{Id}\\ -\operatorname{Id}&-\operatorname{Id}&2\operatorname{Id}&-\operatorname{Id}&-\operatorname{Id}\\ \operatorname{Id}&0&-\operatorname{Id}&\operatorname{Id}&0\\ 0&\operatorname{Id}&-\operatorname{Id}&0&\operatorname{Id}\end{bmatrix}. (77)

Now assume that

A=[2A1+Id0IdId02Id2A2+IdId0IdIdId2A3IdIdId0Id000IdId00],A=\begin{bmatrix}2A_{1}+\operatorname{Id}&0&\operatorname{Id}&-\operatorname{Id}&0\\ -2\operatorname{Id}&2A_{2}+\operatorname{Id}&\operatorname{Id}&0&-\operatorname{Id}\\ -\operatorname{Id}&-\operatorname{Id}&2A_{3}&\operatorname{Id}&\operatorname{Id}\\ \operatorname{Id}&0&-\operatorname{Id}&0&0\\ 0&\operatorname{Id}&-\operatorname{Id}&0&0\end{bmatrix}, (78)

which is maximally monotone151515Indeed, AA is the sum of the operator (x1,,x5)2A1x1×2A2x2×2A3x3×{0}×{0}(x_{1},\ldots,x_{5})\mapsto 2A_{1}x_{1}\times 2A_{2}x_{2}\times 2A_{3}x_{3}\times\{0\}\times\{0\}, the gradient of (x1,,x5)12x1x22(x_{1},\ldots,x_{5})\mapsto\frac{1}{2}\lVert x_{1}-x_{2}\rVert^{2}, and a skew linear operator. on HH. Next, given (x1,,x5)H(x_{1},\ldots,x_{5})\in H, we have

(x1,,x5)zerA\displaystyle(x_{1},\ldots,x_{5})\in\operatorname{zer}A {0(2A1+Id)x1+x3x402x1+(2A2+Id)x2+x3x50x1x2+2A3x3+x4+x50=x1x30=x2x3\displaystyle\Leftrightarrow\begin{cases}0\in(2A_{1}+\operatorname{Id})x_{1}+x_{3}-x_{4}\\ 0\in-2x_{1}+(2A_{2}+\operatorname{Id})x_{2}+x_{3}-x_{5}\\ 0\in-x_{1}-x_{2}+2A_{3}x_{3}+x_{4}+x_{5}\\ 0=x_{1}-x_{3}\\ 0=x_{2}-x_{3}\end{cases} (79a)
{x=x1=x2=x3x42A1x+2xx52A2xx4x52A3x2x.\displaystyle\Leftrightarrow\begin{cases}x=x_{1}=x_{2}=x_{3}\\ x_{4}\in 2A_{1}x+2x\\ x_{5}\in 2A_{2}x\\ -x_{4}-x_{5}\in 2A_{3}x-2x.\end{cases} (79b)

This gives the description

zerA={(z,z,z,x4,x5)H|zZ12x4A1z+z12x5A2z12x412x5A3zz}.\operatorname{zer}A=\big{\{}{(z,z,z,x_{4},x_{5})\in H}~{}\big{|}~{}{z\in Z\land\tfrac{1}{2}x_{4}\in A_{1}z+z\land\tfrac{1}{2}x_{5}\in A_{2}z\land-\tfrac{1}{2}x_{4}-\tfrac{1}{2}x_{5}\in A_{3}z-z}\big{\}}. (80)

Next, one verifies that

(M+A)1:X5X5:[x1x2x3x4x5][y1y2y3y4y5]=[JA1(12x1)JA2(12x2+y1)JA3(12x3+y1+y2)x42y1+2y3x52y2+2y3],(M+A)^{-1}:X^{5}\to X^{5}:\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix}\mapsto\begin{bmatrix}y_{1}\\ y_{2}\\ y_{3}\\ y_{4}\\ y_{5}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{(}\tfrac{1}{2}x_{1}\big{)}\\[2.84526pt] J_{A_{2}}\big{(}\tfrac{1}{2}x_{2}+y_{1}\big{)}\\[2.84526pt] J_{A_{3}}\big{(}\tfrac{1}{2}x_{3}+y_{1}+y_{2}\big{)}\\[2.84526pt] x_{4}-2y_{1}+2y_{3}\\[2.84526pt] x_{5}-2y_{2}+2y_{3}\end{bmatrix}, (81)

which is clearly single-valued with full domain, and Lipschitz continuous. Combining 81 and 77, we end up with

T=(M+A)1M:X5\displaystyle T=(M+A)^{-1}M:X^{5} X5\displaystyle\to X^{5} (82a)
[x1x2x3x4x5]\displaystyle\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} [y1y2y3y4y5]=[JA1(12(x1x3+x4))JA2(12(x2x3+x5)+y1)JA3(12(x1x2+2x3x4x5)+y1+y2)x1x3+x42y1+2y3x2x3+x52y2+2y3].\displaystyle\mapsto\begin{bmatrix}y_{1}\\ y_{2}\\ y_{3}\\ y_{4}\\ y_{5}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\tfrac{1}{2}(x_{1}-x_{3}+x_{4})\big{\rparen}\\[2.84526pt] J_{A_{2}}\big{\lparen}\tfrac{1}{2}(x_{2}-x_{3}+x_{5})+y_{1}\big{\rparen}\\[2.84526pt] J_{A_{3}}\big{\lparen}\tfrac{1}{2}(-x_{1}-x_{2}+2x_{3}-x_{4}-x_{5})+y_{1}+y_{2}\big{\rparen}\\[2.84526pt] x_{1}-x_{3}+x_{4}-2y_{1}+2y_{3}\\[2.84526pt] x_{2}-x_{3}+x_{5}-2y_{2}+2y_{3}\end{bmatrix}. (82b)

Turning to T~\widetilde{T}, we verify that T~=C(M+A)1C:DD\widetilde{T}=C^{*}(M+A)^{-1}C:D\to D is given by

T~:[w1w2][w1w2]+[y3y1y3y2], where[y1y2y3]=[JA1(12w1)JA2(12w2+y1)JA3(12(w1w2)+y1+y2)].\widetilde{T}\colon\begin{bmatrix}w_{1}\\ w_{2}\end{bmatrix}\mapsto\begin{bmatrix}w_{1}\\ w_{2}\end{bmatrix}+\begin{bmatrix}y_{3}-y_{1}\\ y_{3}-y_{2}\end{bmatrix},\text{~{}~{}where}\begin{bmatrix}y_{1}\\ y_{2}\\ y_{3}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\tfrac{1}{2}w_{1}\big{\rparen}\\[2.84526pt] J_{A_{2}}\big{\lparen}\tfrac{1}{2}w_{2}+y_{1}\big{\rparen}\\[2.84526pt] J_{A_{3}}\big{\lparen}\tfrac{1}{2}(-w_{1}-w_{2})+y_{1}+y_{2}\big{\rparen}\end{bmatrix}. (83)

The operator T~\widetilde{T} is a scaled version161616If we denote the original Ryu operator by TRT_{\rm R}, then T~(w)=2TR(w/2)\widetilde{T}(w)=2T_{\rm R}(w/2). So, FixT~=2FixTR\operatorname{Fix}\widetilde{T}=2\operatorname{Fix}T_{\rm R} and T~k(w)=2TRk(w/2)\widetilde{T}^{k}(w)=2T_{\rm R}^{k}(w/2). of the original Ryu operator. Combining 80 and 14, we obtain

FixT~={(w1,w2)D|(zZ)12w1A1z+z12w2A2z12w112w2A3zz}.\operatorname{Fix}\widetilde{T}=\\ \big{\{}{(w_{1},w_{2})\in D}~{}\big{|}~{}{(\exists\,z\in Z)\;\tfrac{1}{2}w_{1}\in A_{1}z+z\land\tfrac{1}{2}w_{2}\in A_{2}z\land-\tfrac{1}{2}w_{1}-\tfrac{1}{2}w_{2}\in A_{3}z-z}\big{\}}. (84)

5.3.2 Specialization to normal cones of linear subspaces

We now assume that each Ai=NUiA_{i}=N_{U_{i}}, where each UiU_{i} is a closed linear subspace of XX. Then Z=zer(A1+A2+A3)=U1U2U3Z=\operatorname{zer}(A_{1}+A_{2}+A_{3})=U_{1}\cap U_{2}\cap U_{3}, and 80 turns into

zerA=FixT={(z,z,z,x4,x5)H|zZx4U1+2zx5U2x4+x5U3+2z},\operatorname{zer}A=\operatorname{Fix}T=\big{\{}{(z,z,z,x_{4},x_{5})\in H}~{}\big{|}~{}{z\in Z\land x_{4}\in U_{1}^{\perp}+2z\land x_{5}\in U_{2}^{\perp}\land x_{4}+x_{5}\in U_{3}^{\perp}+2z}\big{\}}, (85)

while 84 becomes

FixT~={(w1,w2)D|(zZ)w1U1+2zw2U2w1+w2U3+2z}.\operatorname{Fix}\widetilde{T}=\big{\{}{(w_{1},w_{2})\in D}~{}\big{|}~{}{(\exists\,z\in Z)\;w_{1}\in U_{1}^{\perp}+2z\land w_{2}\in U_{2}^{\perp}\land w_{1}+w_{2}\in U_{3}^{\perp}+2z}\big{\}}. (86)

We now provide an alternative description of the two fixed point sets.

Lemma 5.5 (fixed point sets).

Set S:={(z,z,z,2z,0)|zZ=U1U2U3}X5S:=\big{\{}{(z,z,z,2z,0)}~{}\big{|}~{}{z\in Z=U_{1}\cap U_{2}\cap U_{3}}\big{\}}\subseteq X^{5} and Δ2:={(x,x)|xX}X2\Delta_{2}:=\big{\{}{(x,x)}~{}\big{|}~{}{x\in X}\big{\}}\subseteq X^{2}. Then171717When S1S2S_{1}\perp S_{2}, we also write S1S2S_{1}\oplus S_{2} for S1+S2S_{1}+S_{2} to stress the orthogonality.

FixT=S({0}3×((U1×U2)(Δ2+({0}×U3))))\operatorname{Fix}T=S\oplus\big{(}\{0\}^{3}\times\big{(}(U_{1}^{\perp}\times U_{2}^{\perp})\cap(\Delta_{2}^{\perp}+(\{0\}\times U_{3}^{\perp}))\big{)}\big{)} (87)

and

FixT~=(Z×{0})((U1×U2)(Δ2+({0}×U3))).\operatorname{Fix}\widetilde{T}=(Z\times\{0\})\oplus\big{(}(U_{1}^{\perp}\times U_{2}^{\perp})\cap(\Delta_{2}^{\perp}+(\{0\}\times U_{3}^{\perp}))\big{)}. (88)

Proof. Recall that Δ2={(x,x)|xX}\Delta_{2}^{\perp}=\big{\{}{(x,-x)}~{}\big{|}~{}{x\in X}\big{\}}. The identity 87 is a reformulation of 85. To obtain 88, combine 86 with the fact that 2Z=Z2Z=Z. The orthogonality statements are a consequence of ZU1Z\subseteq U_{1} and U1U1U_{1}\perp U_{1}^{\perp}. \hfill\quad\blacksquare

Corollary 5.6 (projections).

Set E:=(U1×U2)(Δ2+({0}×U3))E:=(U_{1}^{\perp}\times U_{2}^{\perp})\cap(\Delta_{2}^{\perp}+(\{0\}\times U_{3}^{\perp})), where Δ2\Delta_{2} is as in Lemma 5.5. Then

(w=(w1,w2)D=X2)PFixT~w=(PZ(w1),0)+PE(w).(\forall w=(w_{1},w_{2})\in D=X^{2})\quad P_{\operatorname{Fix}\widetilde{T}}w=\big{(}P_{Z}(w_{1}),0\big{)}+P_{E}(w). (89)

Now let u=(u1,u2,u3,u4,u5)H=X5u=(u_{1},u_{2},u_{3},u_{4},u_{5})\in H=X^{5}, set w:=Cuw:=C^{*}u, and w:=PFixT~(w)w^{*}:=P_{\operatorname{Fix}\widetilde{T}}(w). Then

PFixTM(u)=(12PZw1,12PZw1,12PZw1,w1,w2).P_{\operatorname{Fix}T}^{M}(u)=\big{(}\tfrac{1}{2}P_{Z}w_{1},\tfrac{1}{2}P_{Z}w_{1},\tfrac{1}{2}P_{Z}w_{1},w_{1}^{\ast},w_{2}^{\ast}\big{)}. (90)

Proof. The identity 89 follows directly from 88. We now tackle 90. From Theorem 3.1(i), we get

PFixTM(u)=(M+A)1CPFixT~(Cu)=(M+A)1CPFixT~(w)=(M+A)1Cw.P^{M}_{\operatorname{Fix}T}(u)=(M+A)^{-1}CP_{\operatorname{Fix}\widetilde{T}}(C^{*}u)=(M+A)^{-1}CP_{\operatorname{Fix}\widetilde{T}}(w)=(M+A)^{-1}Cw^{*}. (91)

By 89, w=(z+v1,v2)=(z+v1,v1+v3)w^{*}=(z+v_{1},v_{2})=(z+v_{1},-v_{1}+v_{3}), where z:=PZ(w1)z:=P_{Z}(w_{1}) and each viUiv_{i}\in U_{i}^{\perp}. Hence Cw=(z+v1,v2,zv3,z+v1,v2)Cw^{*}=(z+v_{1},v_{2},-z-v_{3},z+v_{1},v_{2}) using 75. In view of 81, we obtain

(M+A)1Cw\displaystyle(M+A)^{-1}Cw^{*} =(M+A)1[z+v1v2zv3z+v1v2]=[[1.3]PU1(12(z+v1))=12zPU2(12v2+12z)=12zPU3(12(z+v3)+z)=12zz+v1v2]\displaystyle=(M+A)^{-1}\begin{bmatrix}z+v_{1}\\ v_{2}\\ -z-v_{3}\\ z+v_{1}\\ v_{2}\end{bmatrix}=\begin{bmatrix}[1.3]P_{U_{1}}\big{(}\frac{1}{2}(z+v_{1})\big{)}=\frac{1}{2}z\\ P_{U_{2}}\big{(}\frac{1}{2}v_{2}+\frac{1}{2}z\big{)}=\frac{1}{2}z\\ P_{U_{3}}\big{(}-\frac{1}{2}(z+v_{3})+z\big{)}=\frac{1}{2}z\\ z+v_{1}\\ v_{2}\end{bmatrix} (92)

and we are done. \hfill\quad\blacksquare

Of course, if (uk)k(u_{k})_{k\in{\mathbb{N}}} is the PPP sequence for Ryu, then the weak limit of (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} is PFixTM(u0)P^{M}_{\operatorname{Fix}T}(u_{0}) by item (iii) of the weak convergence theorem. This limit is given by 90 with uu replaced by the starting point u0u_{0}. The weak limit of the rPPP sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} is given by 89 (with ww replaced by the starting point w0w_{0}); this formula was observed already in [7, Lemma 4.1].
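
To make the preceding formulas concrete, here is a minimal Python sketch of the reduced Ryu iteration 83 in the normal cone setting of Section 5.3.2; the three subspaces are illustrative random choices that share a common line, so that ZZ is nontrivial. These data are assumptions for illustration only.

```python
import numpy as np

# Hedged sketch of the reduced Ryu operator (83) for A_i = N_{U_i}, so that
# each resolvent J_{A_i} is the orthogonal projection P_{U_i}.  The subspaces
# are illustrative: each contains the common vector c, so Z ⊇ R c.
rng = np.random.default_rng(2)
n = 8
c = rng.standard_normal(n)

def proj(*cols):                    # orthogonal projector onto span(cols)
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q @ Q.T

P = [proj(c, rng.standard_normal(n), rng.standard_normal(n)) for _ in range(3)]

w1, w2 = rng.standard_normal(n), rng.standard_normal(n)
for _ in range(5000):               # w ← T~(w), cf. (83)
    y1 = P[0] @ (0.5 * w1)
    y2 = P[1] @ (0.5 * w2 + y1)
    y3 = P[2] @ (0.5 * (-w1 - w2) + y1 + y2)
    w1, w2 = w1 + y3 - y1, w2 + y3 - y2
# in the limit, y1 = y2 = y3 =: z is a point of Z = U1 ∩ U2 ∩ U3; cf. (84)
print(np.linalg.norm(y1 - y2), np.linalg.norm(y2 - y3))   # both ≈ 0
print(max(np.linalg.norm(Pi @ y1 - y1) for Pi in P))      # ≈ 0, i.e., y1 ∈ Z
```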

5.4 Malitsky-Tam

5.4.1 General setup

For n3n\geq 3, we consider nn maximally monotone operators A1,,AnA_{1},\dots,A_{n} on the Hilbert space XX. The goal is to find a point in

Z:=zer(A1++An)Z:=\operatorname{zer}(A_{1}+\dots+A_{n}) (93)

which we assume to be nonempty. The splitting algorithm by Malitsky and Tam [27] can deal with this problem and also fits the structure described in [9]. Assume that

H=X2n1,D=Xn1,andC=[IdIdIdIdIdIdIdIdId]:DH.H=X^{2n-1},\;\;D=X^{n-1},\;\;\text{and}\;\;C=\left[\begin{array}[]{rrrrr}\operatorname{Id}\\ -\operatorname{Id}&\operatorname{Id}\\ &-\operatorname{Id}&\ddots&\\ &&\ddots&\operatorname{Id}\\ &&&-\operatorname{Id}\\ \hline\cr\\[-8.53581pt] \operatorname{Id}\\ &\operatorname{Id}\\ &&\ddots\\ &&&\operatorname{Id}\end{array}\right]\colon D\to H. (94)

Then

C=[IdIdIdIdIdId]:HD:[x1xnv1vn1][x1x2+v1xn1xn+vn1]C^{\ast}=\left[\begin{array}[]{rrlr|rrr}\operatorname{Id}&-\operatorname{Id}&&&\operatorname{Id}\\ &\ddots&\ddots&&&\ddots\\ &&\operatorname{Id}&-\operatorname{Id}&&&\operatorname{Id}\end{array}\right]\colon H\to D\colon\begin{bmatrix}x_{1}\\ \vdots\\ x_{n}\\ v_{1}\\ \vdots\\ v_{n-1}\end{bmatrix}\mapsto\begin{bmatrix}x_{1}-x_{2}+v_{1}\\ \vdots\\ x_{n-1}-x_{n}+v_{n-1}\end{bmatrix} (95)

is surjective. Using CC and CC^{\ast}, we compute

M=[IdIdIdId2IdIdIdIdIdId2IdIdIdIdIdIdIdIdIdIdIdIdIdIdId].M=\left[\begin{array}[]{rrrrrr|rrrrrrrr}\operatorname{Id}&-\operatorname{Id}&&&&&\operatorname{Id}&&\\ -\operatorname{Id}&2\operatorname{Id}&-\operatorname{Id}&&&&-\operatorname{Id}&\operatorname{Id}&\\ &\rotatebox[origin={lT}]{10.0}{$\ddots$}&\rotatebox[origin={lT}]{10.0}{$\ddots$}&\rotatebox[origin={lT}]{10.0}{$\ddots$}&&&&-\operatorname{Id}&\ddots\\ &&-\operatorname{Id}&2\operatorname{Id}&-\operatorname{Id}&&&&\ddots&\operatorname{Id}\\[5.0pt] &&&-\operatorname{Id}&\operatorname{Id}&&&&&-\operatorname{Id}\\ \hline\cr\operatorname{Id}&-\operatorname{Id}&&&&&\operatorname{Id}\\ &\operatorname{Id}&-\operatorname{Id}&&&&&\phantom{-}\operatorname{Id}\\ &&\rotatebox[origin={lT}]{10.0}{$\ddots$}&\rotatebox[origin={lT}]{10.0}{$\ddots$}&&&&&\ddots\\ &&&\operatorname{Id}&-\operatorname{Id}&&&&&\operatorname{Id}\end{array}\right]. (96)

Finally, we assume that

A=[2A1+IdIdIdId2A2IdIdIdIdId2An1IdId2IdId2An+IdIdIdIdIdId0IdId],A=\left[\begin{array}[]{rrrcl|rrrlr}2A_{1}+\operatorname{Id}&\operatorname{Id}&&&&-\operatorname{Id}&&\\ -\operatorname{Id}&2A_{2}&\ddots&&&\operatorname{Id}&-\operatorname{Id}&\\ &-\operatorname{Id}&\ddots&\operatorname{Id}&&&\operatorname{Id}&\ddots\\ &&\ddots&2A_{n-1}&\operatorname{Id}&&&\ddots&&-\operatorname{Id}\\[5.0pt] -2\operatorname{Id}&&&-\operatorname{Id}&2A_{n}+\operatorname{Id}&&&&&\operatorname{Id}\\ \hline\cr\operatorname{Id}&-\operatorname{Id}&&&&&\\ &\operatorname{Id}&-\operatorname{Id}&&&&&\makebox(-30.0,-20.0)[]{0}\\ &&\rotatebox[origin={lT}]{10.0}{$\ddots$}&\rotatebox[origin={lT}]{10.0}{$\ddots$}&&&&\\ &&&\operatorname{Id}\phantom{-}&-\operatorname{Id}&&&&&\end{array}\right], (97)

which is maximally monotone because it can be written as the sum of the maximally monotone operator (x1,,xn,v1,,vn1)2A1x1××2Anxn×{0}n1(x_{1},\ldots,x_{n},v_{1},\ldots,v_{n-1})\mapsto 2A_{1}x_{1}\times\cdots\times 2A_{n}x_{n}\times\{0\}^{n-1}, a linear monotone operator that is the gradient of a convex function, and a skew linear operator. Next, given u:=(x1,,xn,v1,,vn1)Hu:=(x_{1},\dots,x_{n},v_{1},\dots,v_{n-1})\in H, we have

uzerA\displaystyle u\in\operatorname{zer}A {0(2A1+Id)x1+x2v10xi1+2Aixi+xi+1+vi1vi for all i{2,,n1}02x1xn1+(2An+Id)xn+vn10=xjxj+1 for all j{1,,n1}\displaystyle\Leftrightarrow\left\{\begin{array}[]{rlr}0&\in(2A_{1}+\operatorname{Id})x_{1}+x_{2}-v_{1}\\ 0&\in-x_{i-1}+2A_{i}x_{i}+x_{i+1}+v_{i-1}-v_{i}&\text{ for all }i\in\{2,\dots,n-1\}\\ 0&\in-2x_{1}-x_{n-1}+(2A_{n}+\operatorname{Id})x_{n}+v_{n-1}\\ 0&=x_{j}-x_{j+1}&\text{ for all }j\in\{1,\dots,n-1\}\end{array}\right. (98e)
{z=x1==xnv12(A1+Id)zvi2Aiz+vi1 for all i{2,,n1}vn12(IdAn)z(2An1z+vn2).\displaystyle\Leftrightarrow\begin{cases}z=x_{1}=\dots=x_{n}\\ v_{1}\in 2(A_{1}+\operatorname{Id})z\\ v_{i}\in 2A_{i}z+v_{i-1}&\text{ for all }i\in\{2,\dots,n-1\}\\ v_{n-1}\in 2(\operatorname{Id}-A_{n})z\cap(2A_{n-1}z+v_{n-2}).\end{cases} (98f)

This gives us

zerA=\displaystyle\operatorname{zer}A= {(z,,z,v1,,vn1)H\displaystyle\big{\{}(z,\dots,z,v_{1},\dots,v_{n-1})\in H\mid zzer(A1++An)v12(A1+Id)z\displaystyle z\in\operatorname{zer}(A_{1}+\dots+A_{n})\land v_{1}\in 2(A_{1}+\operatorname{Id})z
(i{2,,n2})vivi12Aiz\displaystyle(\forall i\in\{2,\dots,n-2\})\;v_{i}-v_{i-1}\in 2A_{i}z
vn12(IdAn)z(2An1z+vn2)}.\displaystyle\land v_{n-1}\in 2(\operatorname{Id}-A_{n})z\cap(2A_{n-1}z+v_{n-2})\big{\}}. (99)

Next, we verify that

(M+A)1:X2n1X2n1:[x1xixnv1vn1][y1yiynw1wn1]=[JA1(12x1)JAi(12xi+yi1)JAn(12xn+y1+yn1)v12y1+2y2vn12yn1+2yn](M+A)^{-1}:X^{2n-1}\to X^{2n-1}:\begin{bmatrix}x_{1}\\ \vdots\\ x_{i}\\ \vdots\\ x_{n}\\ v_{1}\\ \vdots\\ v_{n-1}\end{bmatrix}\mapsto\begin{bmatrix}y_{1}\\ \vdots\\ y_{i}\\ \vdots\\ y_{n}\\ w_{1}\\ \vdots\\ w_{n-1}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\frac{1}{2}x_{1}\big{\rparen}\\ \vdots\\ J_{A_{i}}\big{\lparen}\frac{1}{2}x_{i}+y_{i-1}\big{\rparen}\\ \vdots\\ J_{A_{n}}\big{\lparen}\frac{1}{2}x_{n}+y_{1}+y_{n-1}\big{\rparen}\\ v_{1}-2y_{1}+2y_{2}\\ \vdots\\ v_{n-1}-2y_{n-1}+2y_{n}\end{bmatrix} (100)

which is clearly single-valued with full domain, and Lipschitz continuous. Using the fact that T=(M+A)1MT=(M+A)^{-1}M along with 100 and 96, we get

T:HH:[x1xixnv1vn1][y1yiynw1wn1]=[JA1(12(x1x2+v1))JAi(12(xi1+2xixi+1vi1+vi)+yi1)JAn(12(xn1+xnvn1)+y1+yn1)x1x2+v12y1+2y2xn1xn+vn12yn1+2yn]T\colon H\to H\colon\begin{bmatrix}x_{1}\\ \vdots\\ x_{i}\\ \vdots\\ x_{n}\\ v_{1}\\ \vdots\\ v_{n-1}\end{bmatrix}\mapsto\begin{bmatrix}y_{1}\\ \vdots\\ y_{i}\\ \vdots\\ y_{n}\\ w_{1}\\ \vdots\\ w_{n-1}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\frac{1}{2}(x_{1}-x_{2}+v_{1})\big{\rparen}\\ \vdots\\ J_{A_{i}}\big{\lparen}\frac{1}{2}(-x_{i-1}+2x_{i}-x_{i+1}-v_{i-1}+v_{i})+y_{i-1}\big{\rparen}\\ \vdots\\ J_{A_{n}}\big{\lparen}\frac{1}{2}(-x_{n-1}+x_{n}-v_{n-1})+y_{1}+y_{n-1}\big{\rparen}\\ x_{1}-x_{2}+v_{1}-2y_{1}+2y_{2}\\ \vdots\\ x_{n-1}-x_{n}+v_{n-1}-2y_{n-1}+2y_{n}\end{bmatrix} (101)

while

T~(w)\displaystyle\widetilde{T}(w) =[w1wn1]+[z2z1znzn1] where [z1zizn]=[JA1(12w1)JAi(12(wi1+wi)+zi1)JAn(12(wn1)+z1+zn1)].\displaystyle=\begin{bmatrix}w_{1}\\ \vdots\\ w_{n-1}\end{bmatrix}+\begin{bmatrix}z_{2}-z_{1}\\ \vdots\\ z_{n}-z_{n-1}\end{bmatrix}\text{ where }\begin{bmatrix}z_{1}\\ \vdots\\ z_{i}\\ \vdots\\ z_{n}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\frac{1}{2}w_{1}\big{\rparen}\\ \vdots\\ J_{A_{i}}\big{\lparen}\frac{1}{2}(-w_{i-1}+w_{i})+z_{i-1}\big{\rparen}\\ \vdots\\ J_{A_{n}}\big{\lparen}\frac{1}{2}(-w_{n-1})+z_{1}+z_{n-1}\big{\rparen}\end{bmatrix}. (102)

Here, T~\widetilde{T} becomes a scaled version of the Malitsky-Tam operator from [27, Algorithm 1]. Finally, for w=(w1,,wn1)w=(w_{1},\dots,w_{n-1}), we have

FixT~\displaystyle\operatorname{Fix}\widetilde{T} ={wXn1(zX)w12(Id+A1)z(i{2,,n2})wiwi12Aizwn12(IdAn)z(2An1z+wn2)}\displaystyle=\big{\{}w\in X^{n-1}\mid\;\begin{aligned} &(\exists\,z\in X)\;w_{1}\in 2(\operatorname{Id}+A_{1})z\\ &\land(\forall i\in\{2,\dots,n-2\})w_{i}-w_{i-1}\in 2A_{i}z\\ &\land w_{n-1}\in 2(\operatorname{Id}-A_{n})z\cap(2A_{n-1}z+w_{n-2})\big{\}}\end{aligned} (103a)
={wXn1(zzer(A1++An))w12(Id+A1)z(i{2,,n2})wiwi12Aizwn12(IdAn)z(2An1z+wn2)}.\displaystyle=\big{\{}w\in X^{n-1}\mid\begin{aligned} &(\exists\,z\in\operatorname{zer}(A_{1}+\dots+A_{n}))\;w_{1}\in 2(\operatorname{Id}+A_{1})z\\ &\land(\forall i\in\{2,\dots,n-2\})w_{i}-w_{i-1}\in 2A_{i}z\\ &\land w_{n-1}\in 2(\operatorname{Id}-A_{n})z\cap(2A_{n-1}z+w_{n-2})\big{\}}.\end{aligned} (103b)

To obtain in 103b that zzer(A1++An)z\in\operatorname{zer}(A_{1}+\dots+A_{n}), we telescope the conditions on the wiw_{i} in 103a: they yield wn12z+2(A1++An1)zw_{n-1}\in 2z+2(A_{1}+\dots+A_{n-1})z, which combined with wn12(IdAn)zw_{n-1}\in 2(\operatorname{Id}-A_{n})z gives 02(A1++An)z0\in 2(A_{1}+\dots+A_{n})z.
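
Analogously to the sketch for Ryu's method in Section 5.3, the following minimal Python sketch iterates 102 for n=4n=4 with Ai=NUiA_{i}=N_{U_{i}}, anticipating the specialization of the next subsection; the subspaces are again illustrative random choices sharing a common line.

```python
import numpy as np

# Hedged sketch of the reduced Malitsky-Tam operator (102) with n = 4 and
# A_i = N_{U_i}, i.e., J_{A_i} = P_{U_i}.  The subspaces are illustrative
# random choices sharing the common vector c, so Z ⊇ R c.
rng = np.random.default_rng(3)
dim, n = 8, 4
c = rng.standard_normal(dim)

def proj(*cols):                    # orthogonal projector onto span(cols)
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q @ Q.T

P = [proj(c, rng.standard_normal(dim), rng.standard_normal(dim)) for _ in range(n)]

w = [rng.standard_normal(dim) for _ in range(n - 1)]
for _ in range(5000):               # w ← T~(w), cf. (102)
    z = [P[0] @ (0.5 * w[0])]
    for i in range(1, n - 1):
        z.append(P[i] @ (0.5 * (-w[i - 1] + w[i]) + z[i - 1]))
    z.append(P[n - 1] @ (-0.5 * w[n - 2] + z[0] + z[n - 2]))
    w = [w[i] + z[i + 1] - z[i] for i in range(n - 1)]
# in the limit, z_1 = ... = z_n is a common point of the U_i; cf. (103b)
print(max(np.linalg.norm(z[i + 1] - z[i]) for i in range(n - 1)))   # ≈ 0
print(max(np.linalg.norm(Pi @ z[0] - z[0]) for Pi in P))            # ≈ 0
```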

5.4.2 Specialization to normal cones of linear subspaces

We now assume that UiU_{i} is a closed linear subspace of XX and that Ai=NUiA_{i}=N_{U_{i}} for each i{1,,n}i\in\{1,\dots,n\}. Then Z=zer(A1++An)=U1UnZ=\operatorname{zer}(A_{1}+\dots+A_{n})=U_{1}\cap\dots\cap U_{n} and 99 yields

FixT\displaystyle\operatorname{Fix}T ={(x,,x,v1,,vn1)\displaystyle=\big{\{}(x,\dots,x,v_{1},\dots,v_{n-1})\mid\; xZv12xU1\displaystyle x\in Z\land v_{1}-2x\in U_{1}^{\perp}
(i{2,,n2})viUi+vi1\displaystyle\land(\forall i\in\{2,\dots,n-2\})v_{i}\in U_{i}^{\perp}+v_{i-1}
vn1(Un+2x)(Un1+vn2)}\displaystyle\land v_{n-1}\in(U_{n}^{\perp}+2x)\cap(U_{n-1}^{\perp}+v_{n-2})\big{\}} (104)

and

FixT~\displaystyle\operatorname{Fix}\widetilde{T} ={wXn1\displaystyle=\big{\{}w\in X^{n-1}\mid (zZ)w1U1+2z(i{2,,n2})wiUi+wi1\displaystyle\;(\exists\,z\in Z)\;w_{1}\in U_{1}^{\perp}+2z\land(\forall i\in\{2,\dots,n-2\})w_{i}\in U_{i}^{\perp}+w_{i-1}
wn1(Un+2z)(Un1+wn2)}.\displaystyle\land w_{n-1}\in(U_{n}^{\perp}+2z)\cap(U_{n-1}^{\perp}+w_{n-2})\big{\}}. (105)
Lemma 5.7.

Set Δ={(x,,x)Xn1|xX}\Delta=\big{\{}{(x,\dots,x)\in X^{n-1}}~{}\big{|}~{}{x\in X}\big{\}} and define

E\displaystyle E :=ranΨ(Xn2×Un)\displaystyle:={\operatorname{ran}}\,\Psi\cap(X^{n-2}\times U_{n}^{\perp}) (106a)
U1××(U1++Un2)×((U1++Un1)Un)\displaystyle\subseteq U_{1}^{\perp}\times\dots\times(U_{1}^{\perp}+\cdots+U_{n-2}^{\perp})\times((U_{1}^{\perp}+\cdots+U_{n-1}^{\perp})\cap U_{n}^{\perp}) (106b)

where

Ψ:\displaystyle\Psi: =U1××UnXn1:(y1,,yn)(y1,y1+y2,,y1++yn1).\displaystyle=U_{1}^{\perp}\times\dots\times U_{n}^{\perp}\to X^{n-1}\colon(y_{1},\ldots,y_{n})\mapsto(y_{1},y_{1}+y_{2},\dots,y_{1}+\dots+y_{n-1}). (107)

Then

FixT={(x,,xn copies,2x,,2x)X2n1|xZ}({0}n×E),\operatorname{Fix}T=\big{\{}{(\underbrace{x,\dots,x}_{n\text{\;copies}},2x,\dots,2x)\in X^{2n-1}}~{}\big{|}~{}{x\in Z}\big{\}}\oplus(\{0\}^{n}\times E), (108)

and

FixT~=ΔZn1E,\operatorname{Fix}\widetilde{T}=\Delta_{Z^{n-1}}\oplus E, (109)

where ΔZn1:=ΔZn1\Delta_{Z^{n-1}}:=\Delta\cap Z^{n-1}.

Proof. Using 104, we know that a generic point in FixT\operatorname{Fix}T is represented by

[xx2x+y12x+j=1n2yj2x+j=1n1yj]=[xx2x+y12x+j=1n2yj2x+yn],\begin{bmatrix}x\\ \vdots\\ x\\ 2x+y_{1}\\ \vdots\\ 2x+\sum_{j=1}^{n-2}y_{j}\\ 2x+\sum_{j=1}^{n-1}y_{j}\end{bmatrix}=\begin{bmatrix}x\\ \vdots\\ x\\ 2x+y_{1}\\ \vdots\\ 2x+\sum_{j=1}^{n-2}y_{j}\\ 2x+y_{n}\end{bmatrix}, (110)

where yiUiy_{i}\in U_{i}^{\perp} for each i{1,,n}i\in\{1,\dots,n\}. This yields 108. The orthogonality in 108 follows from 106b and the fact that Z=(U1++Un)¯Z^{\perp}=\overline{(U_{1}^{\perp}+\dots+U_{n}^{\perp})}. The representation in 109 is obtained similarly from 105. Finally, the orthogonality in 109 follows because ΔZn1Zn1\Delta_{Z^{n-1}}\subseteq Z^{n-1}, EU1++Un¯n1E\subseteq\overline{U_{1}^{\perp}+\dots+U_{n}^{\perp}}^{\,n-1}, and Zn1U1++Un¯n1Z^{n-1}\perp\overline{U_{1}^{\perp}+\dots+U_{n}^{\perp}}^{\,n-1}. \hfill\quad\blacksquare

Corollary 5.8.

Using the definition of EE from Lemma 5.7, for wXn1w\in X^{n-1},

PFixT~w=(PZ(w¯),,PZ(w¯))+PE(w)P_{\operatorname{Fix}\widetilde{T}}w=\big{(}P_{Z}(\bar{w}),\dots,P_{Z}(\bar{w})\big{)}+P_{E}(w) (111)

where w¯:=1n1i=1n1wi\bar{w}:=\frac{1}{n-1}\sum_{i=1}^{n-1}w_{i}. Now let uH=X2n1u\in H=X^{2n-1}, set w=Cuw=C^{\ast}u and w=PFixT~ww^{\ast}=P_{\operatorname{Fix}\widetilde{T}}w. Then

PFixTM(u)\displaystyle P_{\operatorname{Fix}T}^{M}(u) =(12PZw¯,,12PZw¯,w1,,wn1).\displaystyle=\big{(}\tfrac{1}{2}P_{Z}\bar{w},\ldots,\tfrac{1}{2}P_{Z}\bar{w},w_{1}^{\ast},\ldots,w_{n-1}^{\ast}\big{)}. (112)

Proof. The identity 111 can be obtained from 109 after using the fact that PZn1Δ=PZn1PΔP_{Z^{n-1}\cap\Delta}=P_{Z^{n-1}}P_{\Delta} (see [7, Proof of Lemma 4.3]). Note that the projection onto Δ\Delta replicates the average of the coordinates of its argument the appropriate number of times. For 112, we observe via 111 that wk=z+j=1kvjw^{\ast}_{k}=z+\sum_{j=1}^{k}v_{j} for k{1,,n1}k\in\{1,\dots,n-1\}, where each viUiv_{i}\in U_{i}^{\perp} and z=PZ(w¯)z=P_{Z}(\bar{w}); moreover, vn:=v1++vn1Unv_{n}:=v_{1}+\dots+v_{n-1}\in U_{n}^{\perp} by the definition of EE. Therefore, using 91 again, but this time with 100, we get

PFixTM(u)=(M+A)1[z+v1v2vn1vnzw1wn1]=[PU1(12(z+v1))=12zPUi(12vi+12z)=12zPUn(12(vn+z)+z)=12zw1wn1]P_{\operatorname{Fix}T}^{M}(u)=(M+A)^{-1}\begin{bmatrix}z+v_{1}\\ v_{2}\\ \vdots\\ v_{n-1}\\ -v_{n}-z\\ w_{1}^{\ast}\\ \vdots\\ w_{n-1}^{\ast}\end{bmatrix}=\begin{bmatrix}P_{U_{1}}(\frac{1}{2}(z+v_{1}))=\frac{1}{2}z\\ \vdots\\ P_{U_{i}}(\frac{1}{2}v_{i}+\frac{1}{2}z)=\frac{1}{2}z\\ \vdots\\ P_{U_{n}}(-\frac{1}{2}(v_{n}+z)+z)=\frac{1}{2}z\\ w_{1}^{\ast}\\ \vdots\\ w_{n-1}^{\ast}\end{bmatrix} (113)

and we are done. \hfill\quad\blacksquare

6 More on Chambolle-Pock

In this section, we provide some additional insights about the Chambolle-Pock framework discussed in Section 5.2.

6.1 Factoring MM

In this section, we discuss how to find a factorization of MM into CCCC^{*} using linear algebra techniques. Suppose that X=nX=\mathbb{R}^{n}, Y=mY=\mathbb{R}^{m}, and Lm×nL\in\mathbb{R}^{m\times n}. Recall from 46 that

M=[1σIdXLL1τIdY](n+m)×(n+m).M=\begin{bmatrix}\frac{1}{\sigma}\operatorname{Id}_{X}&-L^{*}\\ -L&\frac{1}{\tau}\operatorname{Id}_{Y}\end{bmatrix}\in\mathbb{R}^{(n+m)\times(n+m)}. (114)

Then M0M\succeq 0 \Leftrightarrow στL21\sigma\tau\|L\|^{2}\leq 1, which we assume henceforth. The following result, which is easily verified directly, provides a factorization of MM into CCCC^{*}:

Lemma 6.1.

Suppose Zm×mZ\in\mathbb{R}^{m\times m} satisfies181818For instance, ZZ could arise from a Cholesky factorization of (or as the square root of) IdYστLL\operatorname{Id}_{Y}-\sigma\tau LL^{*}. A referee pointed out that the result can be generalized to Hilbert spaces. Indeed, the proof of the operator version works exactly the same: use the form given in 116 to compute CCCC^{*}, then simplify the result with 115 to finally obtain MM from 114.

ZZ=IdYστLL.ZZ^{*}=\operatorname{Id}_{Y}-\sigma\tau LL^{*}. (115)

Then

C:=[1σIdX0σL1τZ]C:=\begin{bmatrix}\frac{1}{\sqrt{\sigma}}\operatorname{Id}_{X}&0\\ -\sqrt{\sigma}L&\frac{1}{\sqrt{\tau}}Z\end{bmatrix} (116)

factors MM into CCCC^{*}.
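
A quick numerical sanity check of Lemma 6.1, with illustrative random sizes and data, reads as follows:

```python
import numpy as np

# Numerical sanity check of Lemma 6.1 on random illustrative data.
rng = np.random.default_rng(4)
m, n = 4, 6
L = rng.standard_normal((m, n))
sigma = tau = 0.99 / np.linalg.norm(L, 2)   # so that sigma*tau*||L||^2 < 1

# Z from a Cholesky factorization of Id_Y - sigma*tau*L L^T, cf. (115)
Z = np.linalg.cholesky(np.eye(m) - sigma * tau * (L @ L.T))

C = np.block([[np.eye(n) / np.sqrt(sigma), np.zeros((n, m))],
              [-np.sqrt(sigma) * L, Z / np.sqrt(tau)]])     # cf. (116)
M = np.block([[np.eye(n) / sigma, -L.T],
              [-L, np.eye(m) / tau]])                       # cf. (114)
print(np.allclose(C @ C.T, M))                              # expected: True
```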

Example 6.2.

If m=n=1m=n=1, σ=τ=1\sigma=\tau=1, and L=[λ]L=[\lambda], where |λ|1|\lambda|\leq 1, then

C=[10λ1λ2]C=\begin{bmatrix}1&0\\ -\lambda&\sqrt{1-\lambda^{2}}\end{bmatrix} (117)

factors MM into CCCC^{*}.

Example 6.3.

If στLL=IdY\sigma\tau LL^{*}=\operatorname{Id}_{Y}, then Z=0Z=0 and 116 reduces to

C:=[1σIdXσL](n+m)×n,C:=\begin{bmatrix}\frac{1}{\sqrt{\sigma}}\operatorname{Id}_{X}\\ -\sqrt{\sigma}L\end{bmatrix}\in\mathbb{R}^{(n+m)\times n}, (118)

which factors MM into CCCC^{*}.
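
In this case DD genuinely has smaller dimension than HH. The following snippet illustrates 118 with σ=τ=1\sigma=\tau=1 and an LL with orthonormal rows (an illustrative choice guaranteeing στLL=IdY\sigma\tau LL^{*}=\operatorname{Id}_{Y}):

```python
import numpy as np

# Illustrative check of (118): sigma = tau = 1 and L with orthonormal rows,
# so that sigma*tau*L L^T = Id_Y.
rng = np.random.default_rng(6)
m, n = 2, 5
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
L = Q.T                                   # m x n with orthonormal rows
C = np.vstack([np.eye(n), -L])            # cf. (118), here (n+m) x n
M = np.block([[np.eye(n), -L.T], [-L, np.eye(m)]])
print(np.allclose(C @ C.T, M))            # True, with D = R^n smaller than H
```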

6.2 An example on the real line

Consider the general setup in Section 5.2.1. We now specialize this further to the case when X=Y=X=Y=\mathbb{R} and L=λIdL=\lambda\operatorname{Id}, where |λ|1|\lambda|\leq 1. Clearly, L=|λ|\|L\|=|\lambda|. We also assume that σ=τ=1\sigma=\tau=1. Then στL21\sigma\tau\|L\|^{2}\leq 1 and the preconditioner matrix (see 46) is

Mλ:=[1λλ1].M_{\lambda}:=\begin{bmatrix}1&-\lambda\\ -\lambda&1\end{bmatrix}. (119)

The matrix MλM_{\lambda} acts on H=2H=\mathbb{R}^{2}, and it is indeed positive semidefinite, with eigenvalues 1±λ1\pm\lambda and

Mλ=1+|λ|.\|M_{\lambda}\|=1+|\lambda|. (120)

If |λ|<1|\lambda|<1, then

Mλ1=11λ2[1λλ1]has eigenvalues 11±λ,   and so Mλ1=11|λ|.M_{\lambda}^{-1}=\frac{1}{1-\lambda^{2}}\begin{bmatrix}1&\lambda\\ \lambda&1\end{bmatrix}\;\;\text{has eigenvalues $\frac{1}{1\pm\lambda}$, \;\;and so \;\; $\|M_{\lambda}^{-1}\|=\frac{1}{1-|\lambda|}$}. (121)

To find a factorization Mλ=CλCλM_{\lambda}=C_{\lambda}C_{\lambda}^{*}, we follow the recipe given in [8, Proof of Proposition 2.3], which starts by determining the principal square root of MλM_{\lambda}. Indeed, one directly verifies that

Sλ:=12[1λ+1+λ1λ1+λ1λ1+λ1λ+1+λ]=MλS_{\lambda}:=\frac{1}{2}\begin{bmatrix}\sqrt{1-\lambda}+\sqrt{1+\lambda}&\sqrt{1-\lambda}-\sqrt{1+\lambda}\\[5.69054pt] \sqrt{1-\lambda}-\sqrt{1+\lambda}&\sqrt{1-\lambda}+\sqrt{1+\lambda}\end{bmatrix}=\sqrt{M_{\lambda}} (122)

and that SλS_{\lambda} has eigenvalues 1+λ,1λ\sqrt{1+\lambda},\sqrt{1-\lambda}, and hence Sλ=1+|λ|\|S_{\lambda}\|=\sqrt{1+|\lambda|}. Note that we have the equivalences SλS_{\lambda} is invertible \Leftrightarrow MλM_{\lambda} is invertible \Leftrightarrow |λ|<1|\lambda|<1, in which case

Sλ1=12[11λ+11+λ11λ11+λ11λ11+λ11λ+11+λ],S_{\lambda}^{-1}=\frac{1}{2}\begin{bmatrix}\frac{1}{\sqrt{1-\lambda}}+\frac{1}{\sqrt{1+\lambda}}&\frac{1}{\sqrt{1-\lambda}}-\frac{1}{\sqrt{1+\lambda}}\\[5.69054pt] \frac{1}{\sqrt{1-\lambda}}-\frac{1}{\sqrt{1+\lambda}}&\frac{1}{\sqrt{1-\lambda}}+\frac{1}{\sqrt{1+\lambda}}\end{bmatrix}, (123)

D=ran¯Sλ=2=HD=\overline{\operatorname{ran}}\,S_{\lambda}=\mathbb{R}^{2}=H, Sλ1S_{\lambda}^{-1} has eigenvalues 1/1+λ,1/1λ1/\sqrt{1+\lambda},1/\sqrt{1-\lambda}, and Sλ1=1/1|λ|\|S_{\lambda}^{-1}\|=1/\sqrt{1-|\lambda|}. In addition,

S±1=12[1111],S_{\pm 1}=\frac{1}{\sqrt{2}}\begin{bmatrix}1&\mp 1\\[5.69054pt] \mp 1&1\end{bmatrix}, (124)

and here D=ran¯S±1=[1,1]𝖳D=\overline{\operatorname{ran}}\,S_{\pm 1}=\mathbb{R}[1,\mp 1]{{}^{\mkern-1.5mu\mathsf{T}}} is a proper subspace of HH.
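
A quick numerical confirmation of 122, at the illustrative value λ=0.7\lambda=0.7, reads:

```python
import numpy as np

# Check that S_lam from (122) is the principal square root of M_lam from
# (119); lam = 0.7 is an arbitrary illustrative value.
lam = 0.7
M = np.array([[1.0, -lam], [-lam, 1.0]])
a, b = np.sqrt(1 - lam), np.sqrt(1 + lam)
S = 0.5 * np.array([[a + b, a - b], [a - b, a + b]])
print(np.allclose(S @ S, M))                 # True
print(np.linalg.eigvalsh(S))                 # sqrt(1 - lam), sqrt(1 + lam)
```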

Let us sum up the discussion as follows.

Lemma 6.4.

Using the square root approach, we may factor MλM_{\lambda} into Mλ=CλCλM_{\lambda}=C_{\lambda}C_{\lambda}^{*} with

Cλ:=12[1λ+1+λ1λ1+λ1λ1+λ1λ+1+λ],C_{\lambda}:=\frac{1}{2}\begin{bmatrix}\sqrt{1-\lambda}+\sqrt{1+\lambda}&\sqrt{1-\lambda}-\sqrt{1+\lambda}\\[5.69054pt] \sqrt{1-\lambda}-\sqrt{1+\lambda}&\sqrt{1-\lambda}+\sqrt{1+\lambda}\end{bmatrix}, (125)

where Cλ=CλC_{\lambda}=C_{\lambda}^{*}. If |λ|=1|\lambda|=1, then we may also factor M±1M_{\pm 1} into C±1C±1C_{\pm 1}C_{\pm 1}^{*}, where

C±1=[11].C_{\pm 1}=\begin{bmatrix}1\\ \mp 1\end{bmatrix}. (126)

In fact, the verification of Lemma 6.4 is algebraic in nature, and one may also directly verify the following operator variants of 122 and 123:

Lemma 6.5.

Suppose that Y=XY=X and let L:XXL\colon X\to X be symmetric and positive semidefinite with L1\|L\|\leq 1. Then the principal square root of the preconditioning operator

M=[IdLLId]M=\begin{bmatrix}\operatorname{Id}&-L\\ -L&\operatorname{Id}\end{bmatrix} (127)

is

M=12[IdL+Id+LIdLId+LIdLId+LIdL+Id+L].\sqrt{M}=\frac{1}{2}\begin{bmatrix}\sqrt{\operatorname{Id}-L}+\sqrt{\operatorname{Id}+L}&\sqrt{\operatorname{Id}-L}-\sqrt{\operatorname{Id}+L}\\[5.69054pt] \sqrt{\operatorname{Id}-L}-\sqrt{\operatorname{Id}+L}&\sqrt{\operatorname{Id}-L}+\sqrt{\operatorname{Id}+L}\end{bmatrix}. (128)

If L<1\|L\|<1, then Id±L\operatorname{Id}\pm L are invertible and positive definite; moreover,

M1=12[(IdL)1+(Id+L)1(IdL)1(Id+L)1(IdL)1(Id+L)1(IdL)1+(Id+L)1].\sqrt{M}^{-1}=\frac{1}{2}\begin{bmatrix}\sqrt{(\operatorname{Id}-L)^{-1}}+\sqrt{(\operatorname{Id}+L)^{-1}}&\sqrt{(\operatorname{Id}-L)^{-1}}-\sqrt{(\operatorname{Id}+L)^{-1}}\\[5.69054pt] \sqrt{(\operatorname{Id}-L)^{-1}}-\sqrt{(\operatorname{Id}+L)^{-1}}&\sqrt{(\operatorname{Id}-L)^{-1}}+\sqrt{(\operatorname{Id}+L)^{-1}}\end{bmatrix}. (129)

The following generalization of Lemma 6.5 was prompted by comments of a reviewer.

Lemma 6.6.

Suppose that X=nX=\mathbb{R}^{n}, Y=mY=\mathbb{R}^{m}, and L:XYL\colon X\to Y, i.e., Lm×nL\in\mathbb{R}^{m\times n}, satisfies L1\|L\|\leq 1. Set S:=LLS:=\sqrt{L^{*}L}, U:=LSU:=LS^{\dagger}, and T:=LLT:=\sqrt{LL^{*}}, which give rise to the canonical polar decompositions of LL and LL^{*}:

L=USandL=UT.L=US\quad\text{and}\quad L^{*}=U^{*}T. (130)

Set A:=arcsin(S)A:=\arcsin(S) and B:=arcsin(T)B:=\arcsin(T). Then the principal square root of

M:=[IdXLLIdY]M:=\begin{bmatrix}\operatorname{Id}_{X}&-L^{*}\\ -L&\operatorname{Id}_{Y}\end{bmatrix} (131)

is

M\displaystyle\sqrt{M} =[cos(A/2)Usin(B/2)Usin(A/2)cos(B/2)]\displaystyle=\begin{bmatrix}\cos(A/2)&-U^{*}\sin(B/2)\\[5.69054pt] -U\sin(A/2)&\cos(B/2)\end{bmatrix} (132a)
=12[IdXLL+IdX+LLU(IdYLLIdY+LL)U(IdXLLIdX+LL)IdYLL+IdY+LL].\displaystyle=\frac{1}{2}\begin{bmatrix}\sqrt{\operatorname{Id}_{X}-\sqrt{L^{*}L}}+\sqrt{\operatorname{Id}_{X}+\sqrt{L^{*}L}}&\;\;U^{*}\bigg{(}\sqrt{\operatorname{Id}_{Y}-\sqrt{LL^{*}}}-\sqrt{\operatorname{Id}_{Y}+\sqrt{LL^{*}}}\bigg{)}\\[11.38109pt] U\bigg{(}\sqrt{\operatorname{Id}_{X}-\sqrt{L^{*}L}}-\sqrt{\operatorname{Id}_{X}+\sqrt{L^{*}L}}\bigg{)}&\;\;\sqrt{\operatorname{Id}_{Y}-\sqrt{LL^{*}}}+\sqrt{\operatorname{Id}_{Y}+\sqrt{LL^{*}}}\end{bmatrix}. (132b)

Proof. Because this result is not needed elsewhere, we only sketch the proof, which relies on some more advanced matrix analysis (see [23] and [24, 25] for background material). (In fact, the reviewer suggested this will work for general operators using results from [13].) Note that SS and TT are symmetric and positive semidefinite. We have PranSS=S=SSS=SPranS=SPranSP_{{\operatorname{ran}}\,S}S=S=SS^{\dagger}S=SP_{{\operatorname{ran}}\,S^{*}}=SP_{{\operatorname{ran}}\,S}. For the statement on the canonical polar decomposition, see [23, Theorem 8.3 and remarks on page 195] which also has

UU=SS=PranS.\displaystyle U^{*}U=SS^{\dagger}=P_{{\operatorname{ran}}\,S}. (133)

This implies LLU=(US)(US)U=USSUU=US2PranS=US2=ULLLL^{*}U=(US)(US)^{*}U=USSU^{*}U=US^{2}P_{{\operatorname{ran}}\,S}=US^{2}=UL^{*}L and so

T2U=(LL)U=U(LL)=US2T^{2}U=(LL^{*})U=U(L^{*}L)=US^{2} (134)

and similarly

S2U=(LL)U=U(LL)=UT2.S^{2}U^{*}=(L^{*}L)U^{*}=U^{*}(LL^{*})=U^{*}T^{2}. (135)

The last identity extends to monomials in the form (LL)kU=U(LL)k(L^{*}L)^{k}U^{*}=U^{*}(LL^{*})^{k} and further to polynomials. For ff suitably defined on the spectra of LLLL^{*} and LLL^{*}L, we have, using [23, Theorem 1.33],

f(LL)U=Uf(LL)f(L^{*}L)U^{*}=U^{*}f(LL^{*}) (136)

and similarly f(LL)U=Uf(LL)f(LL^{*})U=Uf(L^{*}L).

The spectra of SS and TT lie in [0,1][0,1], which makes AA and BB well defined, with spectra in [0,π/2][0,\pi/2]. It follows that the spectra of A/2A/2 and B/2B/2 lie in [0,π/4][0,\pi/4], which yields spectra of cos(A/2)\cos(A/2) and cos(B/2)\cos(B/2) in [1/2,1][1/\sqrt{2},1]. Hence cos(A/2)\cos(A/2) and cos(B/2)\cos(B/2) are invertible.

Now set

W:=[cos(A/2)Usin(B/2)Usin(A/2)cos(B/2)].W:=\begin{bmatrix}\cos(A/2)&-U^{*}\sin(B/2)\\[5.69054pt] -U\sin(A/2)&\cos(B/2)\end{bmatrix}. (137)

The matrices AA and BB are symmetric, and so is WW because (Usin(A/2))=sin(A/2)U=Usin(B/2)(U\sin(A/2))^{*}=\sin(A/2)U^{*}=U^{*}\sin(B/2) follows from 136 with f(t)=sin(12arcsin(t))f(t)=\sin(\tfrac{1}{2}\arcsin(\sqrt{t})). Moreover, since cos(A/2)0\cos(A/2)\succ 0, sin(A/2)=12sin(A)(cos(A/2))1\sin(A/2)=\frac{1}{2}\sin(A)(\cos(A/2))^{-1} and PranS=UUP_{{\operatorname{ran}}\,S}=U^{*}U, we have ransin(A/2)ransin(A)=ran(S)=FixPranS=FixUU{\operatorname{ran}}\,\sin(A/2)\subseteq{\operatorname{ran}}\,\sin(A)={\operatorname{ran}}\,(S)=\operatorname{Fix}P_{{\operatorname{ran}}\,S}=\operatorname{Fix}U^{*}U. Because cos(B/2)0\cos(B/2)\succ 0, we obtain altogether

cos(A/2)Usin(B/2)(cos(B/2))1Usin(A/2)\displaystyle\cos(A/2)-U^{*}\sin(B/2)\big{(}\cos(B/2)\big{)}^{-1}U\sin(A/2) (138a)
=cos(A/2)U(cos(B/2))1sin(B/2)Usin(A/2)\displaystyle=\cos(A/2)-U^{*}\big{(}\cos(B/2)\big{)}^{-1}\sin(B/2)U\sin(A/2) (138b)
=cos(A/2)(cos(A/2))1UUsin(A/2)sin(A/2)\displaystyle=\cos(A/2)-\big{(}\cos(A/2)\big{)}^{-1}U^{*}U\sin(A/2)\sin(A/2) (138c)
=cos(A/2)(cos(A/2))1sin2(A/2)\displaystyle=\cos(A/2)-\big{(}\cos(A/2)\big{)}^{-1}\sin^{2}(A/2) (138d)
=(cos(A/2))1(cos2(A/2)sin2(A/2))\displaystyle=\big{(}\cos(A/2)\big{)}^{-1}\Big{(}\cos^{2}(A/2)-\sin^{2}(A/2)\Big{)} (138e)
=(cos(A/2))1cos(A)\displaystyle=\big{(}\cos(A/2)\big{)}^{-1}\cos(A) (138f)
0,\displaystyle\succeq 0, (138g)

and a Schur complement argument shows that WW is positive semidefinite. We now show that W2=MW^{2}=M. We start with (W2)1,1(W^{2})_{1,1}, the upper left block of W2W^{2}:

(W2)1,1\displaystyle(W^{2})_{1,1} =cos2(A/2)+Usin(B/2)Usin(A/2)\displaystyle=\cos^{2}(A/2)+U^{*}\sin(B/2)U\sin(A/2) (139a)
=cos2(A/2)+sin(A/2)UUsin(A/2)\displaystyle=\cos^{2}(A/2)+\sin(A/2)U^{*}U\sin(A/2) (139b)
=cos2(A/2)+sin(A/2)sin(A/2)\displaystyle=\cos^{2}(A/2)+\sin(A/2)\sin(A/2) (139c)
=IdX.\displaystyle=\operatorname{Id}_{X}. (139d)

Similarly, (W2)2,2=IdY(W^{2})_{2,2}=\operatorname{Id}_{Y}. Next,

(W2)2,1\displaystyle(W^{2})_{2,1} =Usin(A/2)cos(A/2)+cos(B/2)(Usin(A/2))\displaystyle=-U\sin(A/2)\cos(A/2)+\cos(B/2)\big{(}-U\sin(A/2)\big{)} (140a)
=Usin(A/2)cos(A/2)Ucos(A/2)sin(A/2)\displaystyle=-U\sin(A/2)\cos(A/2)-U\cos(A/2)\sin(A/2) (140b)
=Usin(A)\displaystyle=-U\sin(A) (140c)
=L.\displaystyle=-L. (140d)

Similarly, (W2)1,2=L(W^{2})_{1,2}=-L^{*}. We have shown that W2=MW^{2}=M, and so 132a is verified. Finally, for θ[1,1]\theta\in[-1,1] we have

cos(12arcsin(θ))\displaystyle\cos\big{(}\tfrac{1}{2}\arcsin(\theta)\big{)} =1+θ+1θ2,\displaystyle=\frac{\sqrt{1+\theta}+\sqrt{1-\theta}}{2}, (141a)
sin(12arcsin(θ))\displaystyle\sin\big{(}\tfrac{1}{2}\arcsin(\theta)\big{)} =1+θ1θ2,\displaystyle=\frac{\sqrt{1+\theta}-\sqrt{1-\theta}}{2}, (141b)

and this gives 132b. \hfill\quad\blacksquare
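
The identity 132b is also easy to test numerically. The following sketch does so for an illustrative random contraction LL; the helper psd_sqrt (our own, based on an eigendecomposition) computes principal square roots of symmetric positive semidefinite matrices.

```python
import numpy as np

# Numerical check of (132b) for an illustrative random contraction L.
def psd_sqrt(X):            # principal square root of a symmetric PSD matrix
    vals, vecs = np.linalg.eigh(X)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

rng = np.random.default_rng(5)
m, n = 3, 4
L = rng.standard_normal((m, n))
L /= 1.1 * np.linalg.norm(L, 2)      # enforce ||L|| < 1
S = psd_sqrt(L.T @ L)                # S = sqrt(L* L)
T = psd_sqrt(L @ L.T)                # T = sqrt(L L*)
U = L @ np.linalg.pinv(S)            # U = L S^†, the polar factor
I_n, I_m = np.eye(n), np.eye(m)
W = 0.5 * np.block(
    [[psd_sqrt(I_n - S) + psd_sqrt(I_n + S),
      U.T @ (psd_sqrt(I_m - T) - psd_sqrt(I_m + T))],
     [U @ (psd_sqrt(I_n - S) - psd_sqrt(I_n + S)),
      psd_sqrt(I_m - T) + psd_sqrt(I_m + T)]])
M = np.block([[I_n, -L.T], [-L, I_m]])
print(np.allclose(W @ W, M))         # True: W is the principal square root of M
```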

Corollary 6.7.

Suppose that X=nX=\mathbb{R}^{n}, Y=mY=\mathbb{R}^{m}, and Lm×nL\in\mathbb{R}^{m\times n} satisfies σL1\sigma\|L\|\leq 1, where σ>0\sigma>0. Set U:=L(LL)U:=L(\sqrt{L^{*}L})^{\dagger}. Then the principal square root of

M:=[1σIdXLL1σIdY]M:=\begin{bmatrix}\tfrac{1}{\sigma}\operatorname{Id}_{X}&-L^{*}\\ -L&\tfrac{1}{\sigma}\operatorname{Id}_{Y}\end{bmatrix} (142)

is

12[1σIdXLL+1σIdX+LLU(1σIdYLL1σIdY+LL)U(1σIdXLL1σIdX+LL)1σIdYLL+1σIdY+LL].\frac{1}{2}\begin{bmatrix}\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{X}-\sqrt{L^{*}L}}+\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{X}+\sqrt{L^{*}L}}&\;\;U^{*}\bigg{(}\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{Y}-\sqrt{LL^{*}}}-\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{Y}+\sqrt{LL^{*}}}\bigg{)}\\[11.38109pt] U\bigg{(}\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{X}-\sqrt{L^{*}L}}-\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{X}+\sqrt{L^{*}L}}\bigg{)}&\;\;\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{Y}-\sqrt{LL^{*}}}+\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{Y}+\sqrt{LL^{*}}}\end{bmatrix}. (143)

Proof. Set S:=LLS:=\sqrt{L^{*}L}, T:=LLT:=\sqrt{LL^{*}}, and L1:=σLL_{1}:=\sigma L. Then L11\|L_{1}\|\leq 1. Next, set S1:=L1L1=σLL=σS{S}_{1}:=\sqrt{{L_{1}^{*}}{L_{1}}}=\sigma\sqrt{L^{*}L}=\sigma S, U1:=L1(S1)=LS=UU_{1}:={L_{1}}({S_{1}})^{\dagger}=LS^{\dagger}=U, and T1:=L1L1=σLL=σTT_{1}:=\sqrt{{L_{1}}{L_{1}^{*}}}=\sigma\sqrt{LL^{*}}=\sigma T. By Lemma 6.6, the canonical polar decomposition of L1L_{1} is L1=U1S1=U(σS)L_{1}=U_{1}S_{1}=U(\sigma S), and the principal square root of

M1:=[IdXL1L1IdY]=σMM_{1}:=\begin{bmatrix}\operatorname{Id}_{X}&-L_{1}^{*}\\ -L_{1}&\operatorname{Id}_{Y}\end{bmatrix}={\sigma}M (144)

is

M1=12[IdXS1+IdX+S1U(IdYT1IdY+T1)U(IdXS1IdX+S1)IdYT1+IdY+T1].\sqrt{M_{1}}=\frac{1}{2}\begin{bmatrix}\sqrt{\operatorname{Id}_{X}-S_{1}}+\sqrt{\operatorname{Id}_{X}+S_{1}}&\;\;U^{*}\big{(}\sqrt{\operatorname{Id}_{Y}-T_{1}}-\sqrt{\operatorname{Id}_{Y}+T_{1}}\big{)}\\[11.38109pt] U\big{(}\sqrt{\operatorname{Id}_{X}-S_{1}}-\sqrt{\operatorname{Id}_{X}+S_{1}}\big{)}&\;\;\sqrt{\operatorname{Id}_{Y}-T_{1}}+\sqrt{\operatorname{Id}_{Y}+T_{1}}\end{bmatrix}. (145)

It follows that 1σM1=M\tfrac{1}{\sqrt{\sigma}}\sqrt{M_{1}}=\sqrt{M}, i.e.,

M=12σ[IdXσS+IdX+σSU(IdYσTIdY+σT)U(IdXσSIdX+σS)IdYσT+IdY+σT],\sqrt{M}=\frac{1}{2\sqrt{\sigma}}\begin{bmatrix}\sqrt{\operatorname{Id}_{X}-\sigma S}+\sqrt{\operatorname{Id}_{X}+\sigma S}&\;\;U^{*}\big{(}\sqrt{\operatorname{Id}_{Y}-\sigma T}-\sqrt{\operatorname{Id}_{Y}+\sigma T}\big{)}\\[11.38109pt] U\big{(}\sqrt{\operatorname{Id}_{X}-\sigma S}-\sqrt{\operatorname{Id}_{X}+\sigma S}\big{)}&\;\;\sqrt{\operatorname{Id}_{Y}-\sigma T}+\sqrt{\operatorname{Id}_{Y}+\sigma T}\end{bmatrix}, (146)

which yields the conclusion. \hfill\quad\blacksquare

6.2.1 An example where ranM{\operatorname{ran}}\,M is not closed

Bredies et al. hinted in [8, Remark 3.1] at the existence of an operator MM that may fail to have closed range. We now provide a setting where MM, DD, and CC are explicit and where indeed DD is not smaller than HH. To this end, we assume that X=Y=2X=Y=\ell^{2} and

L:22:𝐱=(xn)n(λnxn)n,where (λn)n is a sequence in ]0,1[.L\colon\ell^{2}\to\ell^{2}\colon\mathbf{x}=(x_{n})_{n\in{\mathbb{N}}}\mapsto(\lambda_{n}x_{n})_{n\in{\mathbb{N}}},\quad\text{where $(\lambda_{n})_{n\in{\mathbb{N}}}$ is a sequence in $\left]0,1\right[$.} (147)

It is clear that LL is a one-to-one continuous linear operator with L=supnλn1\|L\|=\sup_{n\in{\mathbb{N}}}\lambda_{n}\leq 1, and ran¯L=Y\overline{\operatorname{ran}}\,L=Y. Moreover, LL is onto \Leftrightarrow infnλn>0\inf_{n\in{\mathbb{N}}}\lambda_{n}>0. We think of LL as a Cartesian product of operators :ξλnξ\mathbb{R}\to\mathbb{R}\colon\xi\mapsto\lambda_{n}\xi. Now assume that σ=τ=1\sigma=\tau=1. In view of our work in Section 6.2, we can identify the associated preconditioner MM with

M:(x0,x1,)Mλ0x0×Mλ1x1×,M\colon(x_{0},x_{1},\ldots)\mapsto M_{\lambda_{0}}x_{0}\times M_{\lambda_{1}}x_{1}\times\cdots, (148)

where Mλ=CλCλM_{\lambda}=C_{\lambda}C_{\lambda}^{*}, using 119 and Lemma 6.4. Set C:(x0,x1,)Cλ0x0×Cλ1x1×C\colon(x_{0},x_{1},\ldots)\mapsto C_{\lambda_{0}}x_{0}\times C_{\lambda_{1}}x_{1}\times\cdots. Then H=D=ran¯CH=D=\overline{\operatorname{ran}}\,C^{*}. Note that MM is a continuous linear operator, with

M=supnMλn=1+supnλn.\|M\|=\sup_{n\in{\mathbb{N}}}\|M_{\lambda_{n}}\|=1+\sup_{n\in{\mathbb{N}}}\lambda_{n}. (149)

using 120. It is clear that MM is injective and has dense range because each MλnM_{\lambda_{n}} is bijective (recall that each λn<1\lambda_{n}<1). However, the smallest eigenvalue of MλnM_{\lambda_{n}} is 1λn1-\lambda_{n}; thus,

M is bijectivesupnλn<1,\text{$M$ is bijective}\;\;\Leftrightarrow\;\;\sup_{n\in{\mathbb{N}}}\lambda_{n}<1, (150)

in which case M1=(1supnλn)1\|M^{-1}\|=(1-\sup_{n\in{\mathbb{N}}}\lambda_{n})^{-1} by 121. As ranM{\operatorname{ran}}\,M is dense, we have ran¯M=H\overline{\operatorname{ran}}\,M=H. It follows from 150 that ranM=Hsupnλn<1{\operatorname{ran}}\,M=H\Leftrightarrow\sup_{{n\in{\mathbb{N}}}}\lambda_{n}<1.
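
A small finite-dimensional illustration, with the assumed sample sequence λn=n/(n+1)\lambda_{n}=n/(n+1) (for which supnλn=1\sup_{n\in{\mathbb{N}}}\lambda_{n}=1), shows how bounded invertibility, and hence closedness of ranM{\operatorname{ran}}\,M, fails:

```python
import numpy as np

# Illustration for Section 6.2.1 with the assumed sequence λ_n = n/(n+1),
# so that sup_n λ_n = 1: the smallest eigenvalue 1 - λ_n of the block
# M_{λ_n} from (119) tends to 0, so M is injective with dense but
# non-closed range.
for n in (1, 10, 100, 1000):
    lam = n / (n + 1)
    M_n = np.array([[1.0, -lam], [-lam, 1.0]])
    print(n, np.linalg.eigvalsh(M_n).min())   # equals 1 - λ_n → 0
```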

It is time for a summary.

Proposition 6.8.

Suppose that supnλn=1\sup_{n\in{\mathbb{N}}}\lambda_{n}=1. Then MM is not surjective and ranM{\operatorname{ran}}\,M is dense, but not closed. If C:DHC\colon D\to H is any factorization of MM in the sense that M=CCM=CC^{*}, then ranC{\operatorname{ran}}\,C^{*} is not closed either.

Proof. We already observed the statement on MM. If ranC{\operatorname{ran}}\,C^{*} were closed, then [15, Lemma 8.40] would imply that ran(C)=ran(CC)=ran(M){\operatorname{ran}}\,(C)={\operatorname{ran}}\,(CC^{*})={\operatorname{ran}}\,(M) is closed, which is absurd. \hfill\quad\blacksquare

Remark 6.9.

Suppose that supnλn=1\sup_{n\in{\mathbb{N}}}\lambda_{n}=1. Then [8, Proof of Proposition 2.3] suggests finding CC by first computing the square root SS of MM. Indeed, using 122, we see that SS is given by

S:(x0,x1,)Sλ0x0×Sλ1x1×,S\colon(x_{0},x_{1},\ldots)\mapsto S_{\lambda_{0}}x_{0}\times S_{\lambda_{1}}x_{1}\times\cdots, (151)

restricted to 2\ell^{2}. Once again, we have that ranS{\operatorname{ran}}\,S is dense but not closed. Hence C=SC=S and D=HD=H in this case.

Acknowledgments

We thank the Editor-in-Chief, Dr. Jong-Shi Pang, for his guidance and support. We also would like to thank two anonymous reviewers for their unusually careful reading and constructive comments which helped us to improve this manuscript significantly. The research of the authors was partially supported by Discovery Grants of the Natural Sciences and Engineering Research Council of Canada.

References