
On the Bredies-Chenchene-Lorenz-Naldi algorithm:
linear relations and strong convergence

Heinz H. Bauschke, Walaa M. Moursi, Shambhavi Singh, and Xianfu Wang
Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: heinz.bauschke@ubc.ca.
Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. E-mail: walaa.moursi@uwaterloo.ca.
Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: sambha@mail.ubc.ca.
Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: shawn.wang@ubc.ca.
(April 23, 2025)
Abstract

Monotone inclusion problems occur in many areas of optimization and variational analysis. Splitting methods, which utilize resolvents or proximal mappings of the underlying operators, are often applied to solve these problems. In 2022, Bredies, Chenchene, Lorenz, and Naldi introduced a new elegant algorithmic framework that encompasses various well known algorithms including Douglas-Rachford and Chambolle-Pock. They obtained powerful weak and strong convergence results, where the latter type relies on additional strong monotonicity assumptions.

In this paper, we complement the analysis by Bredies et al. by relating the projections onto the fixed point sets of the underlying operators that generate the (reduced and original) preconditioned proximal point sequences. We obtain a new strong convergence result when the underlying operator is a linear relation. We note that, without assumptions such as linearity or strong monotonicity, one may encounter weak convergence without strong convergence. In the case of the Chambolle-Pock algorithm, we obtain a new result that yields strong convergence to the projection onto the intersection of a linear subspace and the preimage of a linear subspace. Splitting algorithms by Ryu and by Malitsky and Tam are also considered. Various examples are provided to illustrate the applicability of our results.

2020 Mathematics Subject Classification: Primary 47H05, 47H09; Secondary 47N10, 65K05, 90C25.

Keywords: Bredies-Chenchene-Lorenz-Naldi algorithm, Chambolle-Pock algorithm, Douglas-Rachford algorithm, Malitsky-Tam algorithm, maximally monotone operator, proximal point algorithm, Ryu’s algorithm, splitting algorithm.

1 Introduction

Throughout, we assume that

H is a real Hilbert space with inner product \left\langle{\cdot},{\cdot}\right\rangle\colon H\times H\to\mathbb{R}, (1)

and induced norm \|\cdot\|. We are given a set-valued operator A\colon H\rightrightarrows H. Our goal is to find a point x in \operatorname{zer}A, which we assume to be nonempty:

find x\in H such that 0\in Ax. (2)

This is a very general and powerful problem in modern optimization and variational analysis; see, e.g., [6]. Following the framework proposed recently by Bredies, Chenchene, Lorenz, and Naldi [8] (see also the follow-up [9] and the related works [26], [19], and [18]; we would like to thank Dr. Panos Patrinos for bringing these to our attention), we also assume that we are given another (possibly different) real Hilbert space D, as well as a continuous linear operator

C\colon D\to H, (3)

with adjoint operator C^{*}\colon H\to D. We shall (usually) assume that C^{*} is surjective. Because D={\operatorname{ran}}\,C^{*}=\overline{\operatorname{ran}}\,C^{*}=(\ker C)^{\perp}, it is clear that \ker C=\{0\} and C is injective. We thus think of D as a space that is “smaller” than H. Now set

M:=CC^{*}\colon H\to H, (4)

and one thinks of M as a preconditioner. (The viewpoint taken in [8] is to start with a preconditioner M and then to construct the operator C; see [8, Proposition 2.3] for details.) It follows from [15, Lemma 8.40] that {\operatorname{ran}}\,(M)={\operatorname{ran}}\,(C) is closed. We shall assume that the set-valued map

H\rightrightarrows H\colon x\mapsto Ax\cap{\operatorname{ran}}\,M is monotone, (5)

which is clearly true if A itself is monotone, and that

(A+M)^{-1} is single-valued, full-domain, and Lipschitz continuous. (6)

This guarantees that the resolvent of M^{-1}A (we use the notation J_{B}:=(\operatorname{Id}+B)^{-1} for the classical resolvent and R_{B}:=2J_{B}-\operatorname{Id} for the reflected resolvent),

T:=(M+A)^{-1}M=(\operatorname{Id}+M^{-1}A)^{-1}=J_{M^{-1}A}\colon H\to H (7)

is also single-valued, full-domain, and Lipschitz continuous. (M^{-1}A is not necessarily monotone and hence T need not be firmly nonexpansive; however, it is “M-monotone” by 5 and [8, Proposition 2.5]. See also Section 5.2.4 below.) The importance of T for the problem 2 is the relationship

\operatorname{Fix}T=\operatorname{zer}A. (8)

Associated with T, which operates in H, is the operator (see [8, Theorem 2.13] or [10])

\widetilde{T}:=C^{*}(M+A)^{-1}C=J_{C^{*}\triangleright A}\colon D\to D, (9)

which operates in D; here C^{*}\triangleright A:=(C^{*}A^{-1}C)^{-1} is a maximally monotone operator on D and hence \widetilde{T} is firmly nonexpansive. Furthermore, we always assume that

(\lambda_{k})_{k\in{\mathbb{N}}} is a sequence of real parameters in [0,2] with \sum_{k\in{\mathbb{N}}}\lambda_{k}(2-\lambda_{k})=+\infty, (10)

and sometimes we will impose the more restrictive condition that

0<\inf_{k\in{\mathbb{N}}}\lambda_{k}\leq\sup_{k\in{\mathbb{N}}}\lambda_{k}<2. (11)

In order to solve 2, Bredies et al. [8] consider two sequences which are tightly intertwined: the first sequence, generated by the so-called preconditioned proximal point (PPP) algorithm, resides in H and is given by

u_{k+1}:=(1-\lambda_{k})u_{k}+\lambda_{k}Tu_{k}, (12)

where u_{0}\in H is the starting point (see [8, equation (2.2)]). The second sequence, generated by the so-called reduced preconditioned proximal point (rPPP) algorithm, resides in D and is given by (see [8, equation (2.15)])

w_{k+1}:=(1-\lambda_{k})w_{k}+\lambda_{k}\widetilde{T}w_{k}, (13)

where w_{0}=C^{*}u_{0}.
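To make 12 and 13 concrete, here is a minimal numerical sketch (in Python, assuming the operator is available as a callable; the function name relaxed_iteration and the toy choice A=B, M=\operatorname{Id} below are ours, not from [8]). It runs the relaxed fixed-point recursion common to both PPP and rPPP:

```python
import numpy as np

def relaxed_iteration(T, u0, lam, n_iter=200):
    """Run u_{k+1} = (1 - lam(k)) u_k + lam(k) T(u_k), i.e., the common
    recursion behind the PPP scheme (12) and the rPPP scheme (13)."""
    u = u0
    for k in range(n_iter):
        u = (1 - lam(k)) * u + lam(k) * T(u)
    return u

# Toy instance: A = B a positive definite (hence maximally monotone) matrix
# and M = Id, so T = (Id + B)^{-1} is the classical resolvent J_B and the
# recursion is the relaxed proximal point method; here zer B = {0}.
B = np.array([[2.0, -1.0], [-1.0, 2.0]])
J_B = np.linalg.inv(np.eye(2) + B)
u_star = relaxed_iteration(lambda u: J_B @ u, np.array([1.0, -3.0]),
                           lam=lambda k: 1.5)  # constant lambda in ]0,2[, cf. (11)
print(u_star)  # close to (0, 0)
```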

With these sequences in place, we now state the main result on the PPP and rPPP algorithms from Bredies et al.

Fact 1.1 (Bredies-Chenchene-Lorenz-Naldi, 2022).

For every starting point u_{0}\in H, set w_{0}:=C^{*}u_{0}. Then there exist u^{*}\in\operatorname{zer}A=\operatorname{Fix}T and w^{*}\in\operatorname{Fix}\widetilde{T} such that the following hold:

  1. (i)

    The sequence (Tu_{k})_{k\in{\mathbb{N}}} converges weakly to u^{*}.

  2. (ii)

    The sequence (w_{k})_{k\in{\mathbb{N}}} converges weakly to w^{*}.

  3. (iii)

    The sequence ((M+A)^{-1}Cw_{k})_{k\in{\mathbb{N}}} coincides with (Tu_{k})_{k\in{\mathbb{N}}}.

  4. (iv)

    The sequence (C^{*}u_{k})_{k\in{\mathbb{N}}} coincides with (w_{k})_{k\in{\mathbb{N}}}.

  5. (v)

    u^{*}=(M+A)^{-1}Cw^{*}.

  6. (vi)

    w^{*}=C^{*}u^{*}.

  7. (vii)

    If 11 holds, then (u_{k})_{k\in{\mathbb{N}}} also converges weakly to u^{*}.

Proof. (i): See [8, Corollary 2.10]. (ii): See [8, Corollary 2.15]. (iii): This is buried in [8, Proof of Corollary 2.15]. (iv): See [8, Theorem 2.14]. (v): Combine [8, Corollary 2.15], which states that ((M+A)^{-1}Cw_{k})_{k\in{\mathbb{N}}} converges weakly to (M+A)^{-1}Cw^{*}, with (iii) and (i). (vi): This is buried in [8, Proof of Corollary 2.15]. (vii): See [8, Corollary 2.10]. \hfill\quad\blacksquare

In [8, Section 2.3] Bredies et al. also obtain linear convergence under the usual assumptions on strong monotonicity. Their framework encompasses an impressive number of other algorithms (see [8, Section 3] for details) including Douglas-Rachford, Chambolle-Pock, and others.

The goal of this paper is to complement the work by Bredies et al. [8]. Specifically, our main results are:

  1. R1

    We introduce the notion of the MM-projection onto FixT\operatorname{Fix}T (Definition 3.2) and we show how it is obtained from the (regular) projection onto FixT~\operatorname{Fix}\widetilde{T} (Theorem 3.6).

  2. R2

    We obtain strong convergence results for PPP and rPPP sequences in the case when AA is a linear relation (Theorem 4.2).

Assume A is a linear relation. Then T and \widetilde{T} are linear operators and so trivially 0\in\operatorname{Fix}T and 0\in\operatorname{Fix}\widetilde{T}. R1 yields the exact limits, which are closest to the starting points (with respect to the seminorm induced by M) and which thus may be different from 0, while R2 guarantees strong convergence of the sequences generated by PPP and rPPP.

We also provide: (i) various applications to algorithms in the context of normal cone operators of closed linear subspaces and (ii) an example where the range of the preconditioner M is not closed (the existence of such an example was proclaimed in [8, Remark 3.1]).

Finally, let us stress that in the absence of strong monotonicity and linearity, merely weak (but not strong) convergence may occur! We shall demonstrate this for Douglas-Rachford and for Chambolle-Pock (see Section 5.1.2 and Section 5.2.2 below), by making use of the following classical example due to Genel and Lindenstrauss [20]:

Fact 1.2 (Genel-Lindenstrauss).

In \ell^{2}, there exists a nonempty bounded closed convex set S, a nonexpansive mapping N\colon S\to S, and a starting point s_{0}\in S such that the sequence generated by s_{k+1}:=\tfrac{1}{2}s_{k}+\tfrac{1}{2}Ns_{k} converges weakly — but not strongly — to a fixed point of N.

The remainder of this paper is organized as follows. In Section 2, we collect some auxiliary results which are required later. Our main result R1, along with the limits of the sequences generated by the PPP and rPPP algorithms, is discussed in Section 3. In Section 4, we present our main result R2. Various algorithms are analyzed in Section 5, with an emphasis on the case when normal cone operators of closed linear subspaces are involved. Some additional results concerning the Chambolle-Pock algorithm are provided in Section 6, including an example where the range of M is not closed.

The notation we employ is fairly standard and follows largely [6].

2 Auxiliary results

In this section, we shall collect various results that will make subsequent proofs more structured.

2.1 More results from Bredies et al. [8]

The next result is essentially due to Bredies et al.; however, it is somewhat buried in [8, Proof of Theorem 2.14]:

Proposition 2.1 (FixT\operatorname{Fix}T vs FixT~\operatorname{Fix}\widetilde{T}).

Recall the definitions of TT and T~\widetilde{T} from 7 and 9. Then:

  1. (i)

    If uFixTu\in\operatorname{Fix}T, then CuFixT~C^{*}u\in\operatorname{Fix}\widetilde{T}.

  2. (ii)

    If wFixT~w\in\operatorname{Fix}\widetilde{T}, then w=Cuw=C^{*}u, where u:=(M+A)1CwFixTu:=(M+A)^{-1}Cw\in\operatorname{Fix}T.

  3. (iii)

    If uFixTu\in\operatorname{Fix}T, then u=(M+A)1Cwu=(M+A)^{-1}Cw, where w:=CuFixT~w:=C^{*}u\in\operatorname{Fix}\widetilde{T}.

Consequently, the functions C^{*}\colon\operatorname{Fix}T\to\operatorname{Fix}\widetilde{T} and (M+A)^{-1}C\colon\operatorname{Fix}\widetilde{T}\to\operatorname{Fix}T are bijective and inverses of each other, and

C(FixT)=FixT~and(M+A)1C(FixT~)=FixT.C^{*}\big{(}\operatorname{Fix}T\big{)}=\operatorname{Fix}\widetilde{T}\;\;\text{and}\;\;(M+A)^{-1}C\big{(}\operatorname{Fix}\widetilde{T}\big{)}=\operatorname{Fix}T. (14)

Proof. (i): Let uFixTu\in\operatorname{Fix}T and set w:=Cuw:=C^{*}u. Recalling 4, we thus have the implications u=Tu=(M+A)1Mu=(M+A)1CCuu=Tu=(M+A)^{-1}Mu=(M+A)^{-1}CC^{*}u \Rightarrow w=Cu=(C(M+A)1C)(Cu)=T~ww=C^{*}u=(C^{*}(M+A)^{-1}C)(C^{*}u)=\widetilde{T}w \Leftrightarrow wFixT~w\in\operatorname{Fix}\widetilde{T}.

(ii): Let wFixT~w\in\operatorname{Fix}\widetilde{T}. Clearly, wranT~ranCw\in{\operatorname{ran}}\,\widetilde{T}\subseteq{\operatorname{ran}}\,C^{*}. Hence CwranCC=ranMCw\in{\operatorname{ran}}\,CC^{*}={\operatorname{ran}}\,M and so (A+M)1Cw(A+M)^{-1}Cw is single-valued. Next,

Tu\displaystyle Tu =T(M+A)1Cw\displaystyle=T(M+A)^{-1}Cw (by definition of uu)
=(M+A)1M(M+A)1Cw\displaystyle=(M+A)^{-1}M(M+A)^{-1}Cw (by 7)
=(M+A)1CC(M+A)1Cw\displaystyle=(M+A)^{-1}CC^{*}(M+A)^{-1}Cw (by 4)
=(M+A)1CT~w\displaystyle=(M+A)^{-1}C\widetilde{T}w (by 9)
=(M+A)1Cw\displaystyle=(M+A)^{-1}Cw (because wFixT~w\in\operatorname{Fix}\widetilde{T})
=u\displaystyle=u (by definition of uu)

and so uFixTu\in\operatorname{Fix}T. Moreover, Cu=C(M+A)1Cw=T~w=wC^{*}u=C^{*}(M+A)^{-1}Cw=\widetilde{T}w=w as claimed.

(iii): Let uFixTu\in\operatorname{Fix}T. By (i), wFixT~w\in\operatorname{Fix}\widetilde{T} and (M+A)1Cw=(M+A)1CCu=(M+A)1Mu=Tu=u(M+A)^{-1}Cw=(M+A)^{-1}CC^{*}u=(M+A)^{-1}Mu=Tu=u as announced.

Finally, (i)(iii) yield the “Consequently” part. \hfill\quad\blacksquare

Fact 2.2.

Consider the PPP algorithm. If the parameter sequence (λk)k(\lambda_{k})_{k\in{\mathbb{N}}} satisfies 11, then ukTuk0u_{k}-Tu_{k}\to 0.

Proof. See the second half of [8, Proof of Theorem 2.9]. \hfill\quad\blacksquare

Utilized in the work by Bredies et al. is the seminorm

\|x\|_{M}=\sqrt{\left\langle{x},{Mx}\right\rangle}=\|C^{*}x\| (15)

on HH which allows for some elegant formulations; see in particular [8, Section 2].

2.2 A strong convergence result

Fact 2.3 (Baillon-Bruck-Reich, 1978).

Let N:DDN\colon D\to D be a linear nonexpansive mapping such that Nkx0Nk+1x00N^{k}x_{0}-N^{k+1}x_{0}\to 0 for every x0Dx_{0}\in D. Then (Nkx0)k(N^{k}x_{0})_{k\in{\mathbb{N}}} converges strongly to a fixed point of NN.

Proof. See the original [1, Theorem 1.1], or [6, Theorem 5.14(ii)]. \hfill\quad\blacksquare
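A quick numerical illustration of 2.3 (a sketch of ours, not from [1]): take N to be the average of the identity and a planar rotation. Then N is linear, nonexpansive, and asymptotically regular, so its iterates converge strongly (here even linearly) to the unique fixed point 0:

```python
import numpy as np

theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation: linear, nonexpansive
N = 0.5 * (np.eye(2) + R)                        # averaged, so N^k x - N^{k+1} x -> 0

x = np.array([1.0, 2.0])
for _ in range(200):
    x = N @ x
print(x)  # ~ (0, 0), the unique fixed point of N, reached in norm
```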

2.3 From Genel-Lindenstrauss to maximally monotone operators

It will be convenient to extend and repackage 1.2 as follows:

Proposition 2.4.

There exists a maximally monotone operator BB on 2\ell^{2}, with bounded domain, and a starting point s02s_{0}\in\ell^{2} such that the sequence (JBks0)k(J_{B}^{k}s_{0})_{k\in{\mathbb{N}}} converges weakly — but not strongly — to some point s¯zerB\bar{s}\in\operatorname{zer}B.

Proof. Let SS, NN, and s0s_{0} be as in 1.2. (We recall that FixN\operatorname{Fix}N\neq\varnothing by the fixed point theorem of Browder-Göhde-Kirk; see, e.g., [6, Theorem 4.29].) The operator F:=12Id+12N:SSF:=\tfrac{1}{2}\operatorname{Id}+\tfrac{1}{2}N\colon S\to S is firmly nonexpansive, and (Fks0)k(F^{k}s_{0})_{k\in{\mathbb{N}}} converges weakly — but not strongly — to some point in FixN=FixF\operatorname{Fix}N=\operatorname{Fix}F. By [2, Corollary 5], there exists an extension F~:2S\widetilde{F}\colon\ell^{2}\to S such that F~\widetilde{F} is firmly nonexpansive and F~|S=F\widetilde{F}|_{S}=F. On the one hand, FixFFixF~\operatorname{Fix}F\subseteq\operatorname{Fix}\widetilde{F}. On the other hand, if xFixF~x\in\operatorname{Fix}\widetilde{F}, then x=F~xranF~Sx=\widetilde{F}x\in{\operatorname{ran}}\,\widetilde{F}\subseteq S and so x=F~|Sx=Fxx=\widetilde{F}|_{S}x=Fx, i.e., xFixFx\in\operatorname{Fix}F. Altogether, FixF=FixF~S\varnothing\neq\operatorname{Fix}F=\operatorname{Fix}\widetilde{F}\subseteq S. Finally, let B:=F~1IdB:=\widetilde{F}^{-1}-\operatorname{Id}. Then domB=ranF~S\operatorname{dom}\,B={\operatorname{ran}}\,\widetilde{F}\subseteq S is bounded, BB is maximally monotone on 2\ell^{2}, zerB=FixF~\varnothing\neq\operatorname{zer}B=\operatorname{Fix}\widetilde{F}, and JB=F~J_{B}=\widetilde{F}. The result thus follows from 1.2. \hfill\quad\blacksquare

3 The limits and the MM-projection

3.1 The limits

In this section, we identify the limits ww^{*} and uu^{*} of the rPPP and PPP algorithms (see 1.1) provided that additional assumptions are satisfied. Our result rests on the following relationship between PFixT~P_{\operatorname{Fix}\widetilde{T}} and the projection onto FixT\operatorname{Fix}T with respect to the seminorm from 15.

Theorem 3.1 (projecting onto FixT~\operatorname{Fix}\widetilde{T} and FixT\operatorname{Fix}T).

Let u0Hu_{0}\in H and assume that w0=Cu0w_{0}=C^{*}u_{0}. Then the following hold:

  1. (i)

    If w^:=PFixT~(w0)\widehat{w}:=P_{\operatorname{Fix}\widetilde{T}}(w_{0}), then u^:=(M+A)1Cw^\widehat{u}:=(M+A)^{-1}C\widehat{w} is an MM-projection of u0u_{0} onto FixT\operatorname{Fix}T in the sense that

    (xFixT)u0u^Mu0xM.(\forall x\in\operatorname{Fix}T)\quad\|u_{0}-\widehat{u}\|_{M}\leq\|u_{0}-x\|_{M}. (16)
  2. (ii)

    If u^FixT\widehat{u}\in\operatorname{Fix}T is an MM-projection of u0u_{0} onto FixT\operatorname{Fix}T in the sense that (xFixT)(\forall x\in\operatorname{Fix}T) u0u^Mu0xM\|u_{0}-\widehat{u}\|_{M}\leq\|u_{0}-x\|_{M}, then w^:=Cu^\widehat{w}:=C^{*}\widehat{u} is equal to PFixT~(w0)P_{\operatorname{Fix}\widetilde{T}}(w_{0}).

  3. (iii)

    If u1^,u2^\widehat{u_{1}},\widehat{u_{2}} belong to FixT\operatorname{Fix}T and are both MM-projections of u0u_{0} onto FixT\operatorname{Fix}T, then u1^=u2^\widehat{u_{1}}=\widehat{u_{2}}.

Proof. (i): Let xFixTx\in\operatorname{Fix}T and set y:=Cxy:=C^{*}x. Then yFixT~y\in\operatorname{Fix}\widetilde{T} by Proposition 2.1(i). Hence w0w^w0y\|w_{0}-\widehat{w}\|\leq\|w_{0}-y\|. On the other hand, w0=Cu0w_{0}=C^{*}u_{0} (by assumption), w^=Cu^\widehat{w}=C^{*}\widehat{u} and u^FixT\widehat{u}\in\operatorname{Fix}T (by Proposition 2.1(ii)). Altogether, we obtain Cu0Cu^Cu0Cx\|C^{*}u_{0}-C^{*}\widehat{u}\|\leq\|C^{*}u_{0}-C^{*}x\| or C(u0u^)C(u0x)\|C^{*}(u_{0}-\widehat{u})\|\leq\|C^{*}(u_{0}-x)\|. The conclusion now follows from 15.

(ii): By Proposition 2.1(i), we have w^FixT~\widehat{w}\in\operatorname{Fix}\widetilde{T}. Let yFixT~y\in\operatorname{Fix}\widetilde{T} and set x:=(M+A)1Cyx:=(M+A)^{-1}Cy. By Proposition 2.1(ii), we have xFixTx\in\operatorname{Fix}T and y=Cxy=C^{*}x. Moreover, C(u0x)=Cu0Cx=w0yC^{*}(u_{0}-x)=C^{*}u_{0}-C^{*}x=w_{0}-y and C(u0u^)=Cu0Cu^=w0w^C^{*}(u_{0}-\widehat{u})=C^{*}u_{0}-C^{*}\widehat{u}=w_{0}-\widehat{w}. The assumption that u0u^Mu0xM\|u_{0}-\widehat{u}\|_{M}\leq\|u_{0}-x\|_{M} turns, in view of 15 and the above, into w0w^w0y\|w_{0}-\widehat{w}\|\leq\|w_{0}-y\|. Hence w^=PFixT~(w0)\widehat{w}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}).

(iii): By (ii), Cu1^=PFixT~(w0)=Cu2^C^{*}\widehat{u_{1}}=P_{\operatorname{Fix}\widetilde{T}}(w_{0})=C^{*}\widehat{u_{2}}. Hence u2^=u1^+k\widehat{u_{2}}=\widehat{u_{1}}+k, where kkerCk\in\ker C^{*}. It follows that

u2^\displaystyle\widehat{u_{2}} =Tu2^\displaystyle=T\widehat{u_{2}} (because u2^FixT\widehat{u_{2}}\in\operatorname{Fix}T)
=(A+M)1(CC)(u1^+k)\displaystyle=(A+M)^{-1}(CC^{*})(\widehat{u_{1}}+k) (by 7 and 4)
=(A+M)1(CCu1^+CCk)\displaystyle=(A+M)^{-1}(CC^{*}\widehat{u_{1}}+CC^{*}k) (CCCC^{*} is linear)
=(A+M)1(Mu1^+0)\displaystyle=(A+M)^{-1}(M\widehat{u_{1}}+0) (by 4 and because kkerCk\in\ker C^{*})
=(A+M)1Mu1^\displaystyle=(A+M)^{-1}M\widehat{u_{1}}
=Tu1^\displaystyle=T\widehat{u_{1}} (by 7)
=u1^\displaystyle=\widehat{u_{1}} (because u1^FixT\widehat{u_{1}}\in\operatorname{Fix}T)

and we are done. \hfill\quad\blacksquare

In view of Theorem 3.1, the following notion is now well defined:

Definition 3.2 (MM-projection onto FixT\operatorname{Fix}T).

For every u0Hu_{0}\in H, there exists a unique point u^FixT\widehat{u}\in\operatorname{Fix}T such that (xFixT)(\forall x\in\operatorname{Fix}T) u0u^Mu0xM\|u_{0}-\widehat{u}\|_{M}\leq\|u_{0}-x\|_{M}. The point u^\widehat{u} is called the MM-projection of u0u_{0} onto FixT\operatorname{Fix}T, written also as PFixTM(u0)P_{\operatorname{Fix}T}^{M}(u_{0}).

Remark 3.3.

It is the special structure of FixT\operatorname{Fix}T that allows us to introduce the single-valued and full-domain operator PFixTMP_{\operatorname{Fix}T}^{M}. For more general sets, MM-projections either may not exist or they may not be a singleton — see Section 5.1.4 below.

We are now ready for a nice sufficient condition that allows for the identification of the weak limit ww^{*} of the sequence generated by the rPPP algorithm:

Theorem 3.4 (when FixT~\operatorname{Fix}\widetilde{T} is affine).

Suppose that FixT~\operatorname{Fix}\widetilde{T} is an affine subspace of DD. Then

w=PFixT~(w0).w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}). (17)

Proof. By [6, Corollary 5.17(i)], the sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} is Fejér monotone with respect to FixT~\operatorname{Fix}\widetilde{T}. From 1.1(ii), we know that (wk)k(w_{k})_{k\in{\mathbb{N}}} converges weakly to wFixT~w^{*}\in\operatorname{Fix}\widetilde{T}. Therefore, [6, Proposition 5.9(ii)] yields that (wk)k(w_{k})_{k\in{\mathbb{N}}} converges weakly to PFixT~(w0)P_{\operatorname{Fix}\widetilde{T}}(w_{0}). Altogether, we deduce 17. \hfill\quad\blacksquare

Remark 3.5 (linear relations).

Suppose that our given operator AA is actually a linear relation, i.e., its graph is a linear subspace of H×HH\times H. (For more on linear relations, we recommend Cross’s monograph [14] and Yao’s doctoral thesis [30].) Then T~\widetilde{T} (see 9) is actually a firmly nonexpansive linear operator; consequently, its fixed point set FixT~\operatorname{Fix}\widetilde{T} is a linear subspace and Theorem 3.4 is applicable. Similar comments hold for the case when AA is an affine relation, i.e., its graph is an affine subspace of H×HH\times H.

When the weak limit ww^{*} of the rPPP sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} is PFixT~(w0)P_{\operatorname{Fix}\widetilde{T}}(w_{0}), then we are able to identify the weak limit uu^{*} of the PPP sequence (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} as PFixTM(u0)P_{\operatorname{Fix}T}^{M}(u_{0}):

Theorem 3.6 (PFixTMP_{\operatorname{Fix}T}^{M} from PFixT~P_{\operatorname{Fix}\widetilde{T}}).

Suppose that w=PFixT~(w0)w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}). Then

u=PFixTM(u0).u^{*}=P_{\operatorname{Fix}T}^{M}(u_{0}). (18)

Proof. On the one hand, (M+A)1Cw=(M+A)1CPFixT~(w0)=PFixTM(u0)(M+A)^{-1}Cw^{*}=(M+A)^{-1}CP_{\operatorname{Fix}\widetilde{T}}(w_{0})=P_{\operatorname{Fix}T}^{M}(u_{0}) by Theorem 3.1(i). On the other hand, u=(M+A)1Cwu^{*}=(M+A)^{-1}Cw^{*} by 1.1(v). Altogether, we obtain the announced identity 18. \hfill\quad\blacksquare

3.2 On the notion of a general MM-projection

Proposition 3.7.

Let S be a nonempty subset of H, and let h\in H. Then (here we set \Pi_{S}^{M}(h)=\operatorname*{argmin}_{s\in S}\|h-s\|_{M} and \Pi_{R}(d)=\operatorname*{argmin}_{r\in R}\|d-r\|)

ΠSM(h)=S(C)1(ΠC(S)(Ch)).\Pi_{S}^{M}(h)=S\cap(C^{*})^{-1}\big{(}\Pi_{C^{*}(S)}(C^{*}h)\big{)}. (19)

Proof. Suppose that s^H\hat{s}\in H. Set d:=Chd:=C^{*}h, r^:=Cs^\hat{r}:=C^{*}\hat{s}, and R:=C(S)R:=C^{*}(S). We have the following equivalences:

s^ΠSM(h)\displaystyle\hat{s}\in\Pi_{S}^{M}(h) s^S(sS)hs^MhsM\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;(\forall s\in S)\;\|h-\hat{s}\|_{M}\leq\|h-s\|_{M} (20a)
s^S(sS)ChCs^ChCs\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;(\forall s\in S)\;\|C^{*}h-C^{*}\hat{s}\|\leq\|C^{*}h-C^{*}s\| (20b)
s^Sr^R(rR)dr^dr\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;\hat{r}\in R\;\land\;(\forall r\in R)\;\|d-\hat{r}\|\leq\|d-r\| (20c)
s^Sr^ΠR(d)\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;\hat{r}\in\Pi_{R}(d) (20d)
s^SCs^ΠC(S)(Ch)\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;C^{*}\hat{s}\in\Pi_{C^{*}(S)}(C^{*}h) (20e)
s^Ss^(C)1ΠC(S)(Ch)\displaystyle\Leftrightarrow\hat{s}\in S\;\land\;\hat{s}\in(C^{*})^{-1}\Pi_{C^{*}(S)}(C^{*}h) (20f)
s^S(C)1(ΠC(S)(Ch)),\displaystyle\Leftrightarrow\hat{s}\in S\cap(C^{*})^{-1}\big{(}\Pi_{C^{*}(S)}(C^{*}h)\big{)}, (20g)

and we are done! \hfill\quad\blacksquare
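The identity 19 is easy to test numerically. In the Douglas-Rachford setup of Section 5.1.1 below, H=X^{2}, D=X, and C^{*}(x,y)=x-y, so \|h\|_{M}=|x-y| when X=\mathbb{R}. Here is a small sketch (the choice S=[0,1]^{2} is our toy example, not from the paper):

```python
import numpy as np

# H = R^2, D = R, C*(x, y) = x - y, so ||h||_M = |C* h| (cf. Section 5.1.1).
Cstar = lambda h: h[0] - h[1]

# S = [0,1]^2, hence C*(S) = [-1, 1] and Pi_{C*(S)} is the clip to [-1, 1].
h = np.array([3.0, 0.0])
r_hat = np.clip(Cstar(h), -1.0, 1.0)   # Pi_{C*(S)}(C* h) = 1

# By (19), Pi_S^M(h) = S ∩ (C*)^{-1}(r_hat) = {(x, y) in [0,1]^2 : x - y = 1},
# which is the single point (1, 0); brute-force check on a grid:
g = np.linspace(0.0, 1.0, 101)
pts = [(x, y) for x in g for y in g if abs(x - y - r_hat) < 1e-12]
print(pts)                              # [(1.0, 0.0)]
```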

Proposition 3.8.

The following hold:

  1. (i)

    If ΠSM\Pi_{S}^{M} is at most singleton-valued, then

    (sS)(s+kerC)S={s}.(\forall s\in S)\quad(s+\ker C^{*})\cap S=\{s\}. (21)
  2. (ii)

    If ΠC(S)\Pi_{C^{*}(S)} is at most singleton-valued and 21 holds, then ΠSM\Pi_{S}^{M} is at most singleton-valued.

Proof. Clearly, (s+kerC)S{s}(s+\ker C^{*})\cap S\supseteq\{s\}.

(i): We prove the contrapositive. Suppose 21 does not hold. Then there exists s\in S such that (s+\ker C^{*})\cap S\supsetneqq\{s\}. Take k\in\ker C^{*}\smallsetminus\{0\} such that s+k=s^{\prime}\in S. Then \|s-s\|_{M}=\|0\|_{M}=\|C^{*}0\|=0 and \|s-s^{\prime}\|_{M}=\|{-k}\|_{M}=\|C^{*}(-k)\|=0. Hence \{s,s^{\prime}\}\subseteq\Pi_{S}^{M}(s) and \Pi_{S}^{M} is not at most singleton-valued.

(ii): Let hHh\in H and suppose that s1s_{1} and s2s_{2} both belong to ΠSM(h)\Pi_{S}^{M}(h). By Proposition 3.7, Cs1ΠC(S)(Ch)C^{*}s_{1}\in\Pi_{C^{*}(S)}(C^{*}h) and Cs2ΠC(S)(Ch)C^{*}s_{2}\in\Pi_{C^{*}(S)}(C^{*}h). Because ΠC(S)\Pi_{C^{*}(S)} is at most singleton-valued, we deduce that Cs1=Cs2C^{*}s_{1}=C^{*}s_{2}. Hence s2s1kerCs_{2}-s_{1}\in\ker C^{*}. Combining with 21, we deduce that s2(s1+kerC)S={s1}s_{2}\in(s_{1}+\ker C^{*})\cap S=\{s_{1}\} and so s2=s1s_{2}=s_{1}. \hfill\quad\blacksquare

Remark 3.9.

The general results above give another view on why PFixTMP_{\operatorname{Fix}T}^{M} is not multi-valued. We sketch here why. Suppose that S=FixTS=\operatorname{Fix}T. We know that C(S)=FixT~C^{*}(S)=\operatorname{Fix}\widetilde{T} is a nonempty closed convex set because T~\widetilde{T} is (firmly) nonexpansive. Hence ΠC(S)\Pi_{C^{*}(S)} is actually singleton-valued. Let sS=FixTs\in S=\operatorname{Fix}T, kkerCk\in\ker C^{*}, and assume that s+kSs+k\in S. Then s+k=T(s+k)=(A+M)1(Ms+Mk)=(A+M)1(Ms)=T(s)=ss+k=T(s+k)=(A+M)^{-1}(Ms+Mk)=(A+M)^{-1}(Ms)=T(s)=s and so k=0k=0. This verifies 21. In view of Proposition 3.8(ii), we see that ΠSM\Pi^{M}_{S} is (at most) singleton-valued.

For an example that illustrates that ΠSM\Pi_{S}^{M} may be empty-valued or multi-valued, see Section 5.1.4 below.

4 Convergence

Theorem 4.1 (weak convergence and limits).

Suppose that FixT~\operatorname{Fix}\widetilde{T} is an affine subspace of DD. Let u0Hu_{0}\in H and set w0=Cu0w_{0}=C^{*}u_{0}. Then the following hold for the rPPP and PPP sequences (wk)k(w_{k})_{k\in{\mathbb{N}}} and (uk)k(u_{k})_{k\in{\mathbb{N}}} generated by 13 and 12:

  1. (i)

    The sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} converges weakly to w=PFixT~(w0)w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}).

  2. (ii)

    The sequence (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} converges weakly to u=PFixTM(u0)=(M+A)1Cwu^{*}=P^{M}_{\operatorname{Fix}T}(u_{0})=(M+A)^{-1}Cw^{*}.

  3. (iii)

    If 11 holds, then the sequence (uk)k(u_{k})_{k\in{\mathbb{N}}} also converges weakly to u=PFixTM(u0)=(M+A)1Cwu^{*}=P^{M}_{\operatorname{Fix}T}(u_{0})=(M+A)^{-1}Cw^{*}.

Proof. (i): Combine 1.1(ii) and Theorem 3.4. (ii): Combine 1.1(i), Theorem 3.4, Theorem 3.6, and Theorem 3.1(i). (iii): Combine (ii) with 1.1(vii). \hfill\quad\blacksquare

Theorem 4.2 (strong convergence and limits).

Suppose that AA is a linear relation. Let u0Hu_{0}\in H and set w0=Cu0w_{0}=C^{*}u_{0}. Suppose that the parameter sequence (λk)k(\lambda_{k})_{k\in{\mathbb{N}}} is identical to some constant λ]0,2[\lambda\in\left]0,2\right[. Then the following hold for the rPPP and PPP sequences (wk)k(w_{k})_{k\in{\mathbb{N}}} and (uk)k(u_{k})_{k\in{\mathbb{N}}} generated by 13 and 12:

  1. (i)

    The sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} converges strongly to w=PFixT~(w0)w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0}).

  2. (ii)

    The sequences (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} and (uk)k(u_{k})_{k\in{\mathbb{N}}} converge strongly to u=PFixTM(u0)=(M+A)1Cwu^{*}=P^{M}_{\operatorname{Fix}T}(u_{0})=(M+A)^{-1}Cw^{*}.

Proof. Because A is a linear relation on H, it follows that \widetilde{T} is a linear operator and that \operatorname{Fix}\widetilde{T} is a linear subspace of D. Note that (\forall{k\in{\mathbb{N}}}) w_{k+1}=Nw_{k}, where N:=(1-\lambda)\operatorname{Id}+\lambda\widetilde{T}=(1-(\lambda/2))\operatorname{Id}+(\lambda/2)\widetilde{N} and \widetilde{N}:=2\widetilde{T}-\operatorname{Id} are both linear and nonexpansive, with \operatorname{Fix}\widetilde{T}=\operatorname{Fix}N=\operatorname{Fix}\widetilde{N}. By [6, Theorem 5.15(ii)], w_{k}-\widetilde{N}w_{k}\to 0; equivalently, w_{k}-Nw_{k}\to 0.

(i): Indeed, it follows from 2.3 that (wk)k(w_{k})_{k\in{\mathbb{N}}} converges strongly. The result now follows from Theorem 4.1(i) (or [6, Proposition 5.28]).

(ii): On the one hand, using 6, we see that (M+A)1C(M+A)^{-1}C is Lipschitz continuous. On the other hand, (Tuk)k=((M+A)1Cwk)k(Tu_{k})_{k\in{\mathbb{N}}}=((M+A)^{-1}Cw_{k})_{k\in{\mathbb{N}}} by 1.1(iii). Altogether, because (wk)k(w_{k})_{k\in{\mathbb{N}}} strongly converges by (i), we deduce that (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} must converge strongly as well. By Theorem 4.1(ii), TukPFixTM(u0)Tu_{k}\to P_{\operatorname{Fix}T}^{M}(u_{0}). Finally, piggybacking on 2.2, we obtain that (uk)k(u_{k})_{k\in{\mathbb{N}}} converges strongly to PFixTM(u0)P_{\operatorname{Fix}T}^{M}(u_{0}) as well. \hfill\quad\blacksquare

Remark 4.3.

Using a translation argument similar to what was done in [7, Section 4.4], one can generalize Theorem 4.2 to the case when AA is an affine relation, i.e., graA\operatorname{gra}A is an affine subspace of H×HH\times H, and such that zerA\operatorname{zer}A\neq\varnothing.

5 Examples

In this section, we consider the Douglas-Rachford, Chambolle-Pock, Ryu, and the Malitsky-Tam minimal lifting (which we will refer to as Malitsky-Tam) splitting algorithms.

5.1 Douglas-Rachford

5.1.1 General setup

Following [8, Sections 1 and 3], we suppose that X is a real Hilbert space, and A_{1},A_{2} are two maximally monotone operators on X. The goal is to find a point in \operatorname{zer}(A_{1}+A_{2}). Now assume (we shall frequently employ “block operator” notation for convenience) that

H=X2,D=X,andC=[IdId].H=X^{2},\;\;D=X,\;\;\text{and}\;\;C=\begin{bmatrix}\operatorname{Id}\\ -\operatorname{Id}\end{bmatrix}. (22)

Then

C=[IdId]:X2X:[xy]xyC^{*}=\begin{bmatrix}\operatorname{Id}\;\;-\operatorname{Id}\end{bmatrix}\colon X^{2}\to X\colon\begin{bmatrix}x\\ y\end{bmatrix}\mapsto x-y (23)

is clearly surjective. Now we compute

M=CC=[IdIdIdId]M=CC^{\ast}=\begin{bmatrix}\operatorname{Id}&-\operatorname{Id}\\ -\operatorname{Id}&\operatorname{Id}\end{bmatrix} (24)

and we set

A=[A1IdIdA21]A=\begin{bmatrix}A_{1}&\operatorname{Id}\\ -\operatorname{Id}&A_{2}^{-1}\end{bmatrix} (25)

which is maximally monotone as the sum of a maximally monotone operator (x,y)\mapsto A_{1}x\times A_{2}^{-1}y and a skew linear operator. Next (we occasionally use the notation B^{\ovee}=(-\operatorname{Id})\circ B\circ(-\operatorname{Id}) and B^{-\ovee}={(B^{-1})}^{\ovee}),

zerA\displaystyle\operatorname{zer}A ={(x,y)|0A1x+y0x+A21y}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{0\in A_{1}x+y\land 0\in-x+A_{2}^{-1}y}\big{\}} (26a)
={(x,y)|0A1x+yyA2x}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{0\in A_{1}x+y\land y\in A_{2}x}\big{\}} (26b)
={(x,y)|xzer(A1+A2)yA1x(A2x)}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{x\in\operatorname{zer}(A_{1}+A_{2})\land-y\in A_{1}x\cap(-A_{2}x)}\big{\}} (26c)
={(x,y)|yzer(A11+A2∨⃝)xA11(y)(A2∨⃝(y))}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{-y\in\operatorname{zer}(A_{1}^{-1}+A_{2}^{-\ovee})\land x\in A_{1}^{-1}(-y)\cap(-A_{2}^{-\ovee}(-y))}\big{\}} (26d)
=gra(A2)gra(A1)\displaystyle=\operatorname{gra}(A_{2})\cap\operatorname{gra}(-A_{1}) (26e)

where 26d follows from [5, Proposition 2.4] and also relates to Attouch-Théra duality and associated operations (see [5, Section 3] for details). One verifies that

(M+A)1:X2X2:[xy][JA1(x)JA21(y+2JA1(x))](M+A)^{-1}\colon X^{2}\to X^{2}\colon\begin{bmatrix}x\\ y\end{bmatrix}\mapsto\begin{bmatrix}J_{A_{1}}(x)\\ J_{A_{2}^{-1}}(y+2J_{A_{1}}(x))\end{bmatrix} (27)

is indeed single-valued, full-domain, and Lipschitz continuous; hence,

(M+A)1C:XX2:w[JA1(w)JA21(w+2JA1(w))]=[JA1(w)JA21RA1(w)].(M+A)^{-1}C\colon X\to X^{2}\colon w\mapsto\begin{bmatrix}J_{A_{1}}(w)\\ J_{A_{2}^{-1}}(-w+2J_{A_{1}}(w))\end{bmatrix}=\begin{bmatrix}J_{A_{1}}(w)\\ J_{A_{2}^{-1}}R_{A_{1}}(w)\end{bmatrix}. (28)

We now compute

T(x,y)\displaystyle T(x,y) =(M+A)1M(x,y)=(M+A)1(xy,yx)\displaystyle=(M+A)^{-1}M(x,y)=(M+A)^{-1}(x-y,y-x) (29a)
=[JA1(xy)JA21RA1(xy)].\displaystyle=\begin{bmatrix}J_{A_{1}}(x-y)\\ J_{A_{2}^{-1}}R_{A_{1}}(x-y)\end{bmatrix}. (29b)

Furthermore, recalling 23 and 28, we obtain

T~(w)\displaystyle\widetilde{T}(w) =C(M+A)1Cw=[IdId][JA1(w)w+2JA1(w)JA2RA1(w)]\displaystyle=C^{*}(M+A)^{-1}Cw=[\operatorname{Id}\;\;-\operatorname{Id}]\begin{bmatrix}J_{A_{1}}(w)\\ -w+2J_{A_{1}}(w)-J_{A_{2}}R_{A_{1}}(w)\end{bmatrix} (30a)
=wJA1(w)+JA2RA1(w),\displaystyle=w-J_{A_{1}}(w)+J_{A_{2}}R_{A_{1}}(w), (30b)

which is the familiar Douglas-Rachford splitting operator. Using the fact that FixT~=C(FixT)\operatorname{Fix}\widetilde{T}=C^{*}(\operatorname{Fix}T) (see 14) and 26, we obtain

FixT~=xzer(A1+A2)x+(A1x(A2x)),\operatorname{Fix}\widetilde{T}=\bigcup_{x\in\operatorname{zer}(A_{1}+A_{2})}x+\big{(}A_{1}x\cap(-A_{2}x)\big{)}, (31)

an identity that also follows from [5, Theorem 4.5].
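As a concrete instance of 30 (a sketch under our own toy choices, not an example from [8]): take A_{1}=N_{C_{1}} and A_{2}=N_{C_{2}} for two intersecting convex sets, so that J_{A_{1}} and J_{A_{2}} are projections; the shadows J_{A_{1}}(w_{k}) of the governing sequence then approach C_{1}\cap C_{2}=\operatorname{zer}(A_{1}+A_{2}):

```python
import numpy as np

def P_line(w):                  # projection onto C1 = {(x, y) : y = x}
    t = (w[0] + w[1]) / 2.0
    return np.array([t, t])

def P_ball(w):                  # projection onto C2 = closed unit ball at (1, 1)
    c = np.array([1.0, 1.0])
    d = w - c
    n = np.linalg.norm(d)
    return c + (d / n if n > 1.0 else d)

def T_tilde(w):                 # Douglas-Rachford operator (30b)
    R1 = 2.0 * P_line(w) - w    # reflected resolvent R_{A_1}
    return w - P_line(w) + P_ball(R1)

w = np.array([5.0, -4.0])
for _ in range(500):            # w_{k+1} = T~ w_k, i.e., (13) with lambda_k = 1
    w = T_tilde(w)
print(P_line(w))                # a point of C1 ∩ C2 (the shadow limit J_{A_1} w*)
```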

5.1.2 An example without strong convergence

Now suppose temporarily that X=2X=\ell^{2}, that BB, s0s_{0} and s¯\bar{s} are as in Proposition 2.4, that A1=BA_{1}=B, and that A2=0A_{2}=0. Then 26, 29, and 30 turn into

zerA=zer(B)×{0},T(x,y)=[JB(xy)0],andT~(w)=JB(w).\operatorname{zer}A=\operatorname{zer}(B)\times\{0\},\;\;T(x,y)=\begin{bmatrix}J_{B}(x-y)\\ 0\end{bmatrix},\;\;\text{and}\;\;\widetilde{T}(w)=J_{B}(w). (32)

Suppose furthermore that (\forall{k\in{\mathbb{N}}}) \lambda_{k}=1 and u_{0}=(s_{0},0). It then follows that w_{0}=C^{*}u_{0}=s_{0}, u_{k}=T^{k}(s_{0},0)=(J_{B}^{k}(s_{0}),0)\>{\rightharpoonup}\>(\bar{s},0) and w_{k}=\widetilde{T}^{k}(s_{0})=J^{k}_{B}(s_{0})\>{\rightharpoonup}\>\bar{s}, and neither convergence is strong.

5.1.3 Specialization to normal cones of linear subspaces

We now assume that U_{1},U_{2} are two closed linear subspaces of X and that A_{1}=N_{U_{1}}, A_{2}=N_{U_{2}}. (Given a nonempty closed convex subset U of X, recall that N_{U}(x):=\big{\{}{x^{*}\in X}~{}\big{|}~{}{\max\left\langle{U-x},{x^{*}}\right\rangle=0}\big{\}} if x\in U, and N_{U}(x):=\emptyset otherwise; this is the normal cone operator of U at x\in X.) Then \operatorname{zer}(A_{1}+A_{2})=U_{1}\cap U_{2}, \operatorname{gra}A_{1}=U_{1}\times U_{1}^{\perp}=U_{1}\times(-U_{1}^{\perp})=\operatorname{gra}(-A_{1}) and \operatorname{gra}A_{2}=U_{2}\times U_{2}^{\perp}; hence, 26 yields

FixT=zerA=(U1×U1)(U2×U2)=(U1U2)×(U1U2).\operatorname{Fix}T=\operatorname{zer}A=(U_{1}\times U_{1}^{\perp})\cap(U_{2}\times U_{2}^{\perp})=(U_{1}\cap U_{2})\times(U_{1}^{\perp}\cap U_{2}^{\perp}). (33)

Next, we see that 29, 30, 31 turn into

T(x,y)=[PU1(xy)PU2(PU1PU1)(xy)],\displaystyle T(x,y)=\begin{bmatrix}P_{U_{1}}(x-y)\\ P_{U_{2}^{\perp}}(P_{U_{1}}-P_{U_{1}^{\perp}})(x-y)\end{bmatrix}, (34)
T~=IdPU1+PU2(PU1PU1)=PU2PU1+PU2PU1,\displaystyle\widetilde{T}=\operatorname{Id}-P_{U_{1}}+P_{U_{2}}(P_{U_{1}}-P_{U_{1}^{\perp}})=P_{U_{2}}P_{U_{1}}+P_{U_{2}^{\perp}}P_{U_{1}^{\perp}}, (35)
FixT~=xU1U2x+(U1(U2))=(U1U2)+(U1U2)\displaystyle\operatorname{Fix}\widetilde{T}=\bigcup_{x\in U_{1}\cap U_{2}}x+(U_{1}^{\perp}\cap(-U_{2}^{\perp}))=(U_{1}\cap U_{2})+(U_{1}^{\perp}\cap U_{2}^{\perp}) (36)

respectively. (The latter two identities were also derived in [3].) It follows from Theorem 4.1(i) that the rPPP sequence, i.e., the governing sequence of the Douglas-Rachford algorithm, satisfies

wkw=PFixT~(w0)=PU1U2(w0)+PU1U2(w0).w_{k}\>{\rightharpoonup}\>w^{*}=P_{\operatorname{Fix}\widetilde{T}}(w_{0})=P_{U_{1}\cap U_{2}}(w_{0})+P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0}). (37)

Combining 1.1(iii), 28, Theorem 4.1(ii), Theorem 3.1(i), and 37, we have (given a nonempty closed convex subset U of X, its reflected projection is R_{U}:=R_{N_{U}}:=2P_{U}-\operatorname{Id})

Tuk\displaystyle Tu_{k} =(M+A)1Cwk=[PU1wkPU2RU1wk]\displaystyle=(M+A)^{-1}Cw_{k}=\begin{bmatrix}P_{U_{1}}w_{k}\\ P_{U_{2}^{\perp}}R_{U_{1}}w_{k}\\ \end{bmatrix} (38)

and

u\displaystyle u^{*} =PFixTM(u0)=(M+A)1Cw=[PU1wPU2RU1w]=[PU1(PU1U2(w0)+PU1U2(w0))PU2RU1(PU1U2(w0)+PU1U2(w0))]\displaystyle=P^{M}_{\operatorname{Fix}T}(u_{0})=(M+A)^{-1}Cw^{*}=\begin{bmatrix}P_{U_{1}}w^{*}\\ P_{U_{2}^{\perp}}R_{U_{1}}w^{*}\end{bmatrix}=\begin{bmatrix}P_{U_{1}}\big{(}P_{U_{1}\cap U_{2}}(w_{0})+P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0})\big{)}\\ P_{U_{2}^{\perp}}R_{U_{1}}\big{(}P_{U_{1}\cap U_{2}}(w_{0})+P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0})\big{)}\end{bmatrix} (39a)
=[PU1U2(w0)PU1U2(w0)]=[PU1U2(x0y0)PU1U2(y0x0)].\displaystyle=\begin{bmatrix}P_{U_{1}\cap U_{2}}(w_{0})\\ -P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0})\end{bmatrix}=\begin{bmatrix}P_{U_{1}\cap U_{2}}(x_{0}-y_{0})\\ P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(y_{0}-x_{0})\end{bmatrix}. (39b)

Note that the last description of uu^{*} clearly differs from PFixT(u0)=(PU1U2(x0),PU1U2(y0))P_{\operatorname{Fix}T}(u_{0})=(P_{U_{1}\cap U_{2}}(x_{0}),P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(y_{0})) in general! Returning to 39, we deduce in particular that the shadow sequence (PU1wk)(P_{U_{1}}w_{k}) satisfies

PU1wkPU1U2(w0).P_{U_{1}}w_{k}\>{\rightharpoonup}\>P_{U_{1}\cap U_{2}}(w_{0}). (40)

Moreover, combining with Theorem 4.2, we have the following strong convergence result (this was also derived in [3] when \lambda=1):

wkPU1U2(w0)+PU1U2(w0)andPU1wkPU1U2(w0)w_{k}\to P_{U_{1}\cap U_{2}}(w_{0})+P_{U_{1}^{\perp}\cap U_{2}^{\perp}}(w_{0})\;\;\text{and}\;\;P_{U_{1}}w_{k}\to P_{U_{1}\cap U_{2}}(w_{0}) (41)

provided that λk=λ\lambda_{k}=\lambda for all k{k\in{\mathbb{N}}}.
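The limit formulas 37 and 41 can be verified numerically; here is a sketch of ours (the helper proj_meet and the random subspaces are assumptions for the test, not part of [8] or [3]):

```python
import numpy as np
rng = np.random.default_rng(0)

def proj(B):                          # orthogonal projector onto ran B
    Q, _ = np.linalg.qr(B)
    return Q @ Q.T

def proj_meet(P, Q):                  # projector onto the intersection of the
    n = P.shape[0]                    # two ranges: the nullspace of [I-P; I-Q]
    Mstack = np.vstack([np.eye(n) - P, np.eye(n) - Q])
    _, s, Vt = np.linalg.svd(Mstack)
    Ns = Vt[np.sum(s > 1e-10):].T     # orthonormal basis of the nullspace
    return Ns @ Ns.T if Ns.size else np.zeros((n, n))

n = 6
P1, P2 = proj(rng.standard_normal((n, 4))), proj(rng.standard_normal((n, 4)))
Q1, Q2 = np.eye(n) - P1, np.eye(n) - P2   # projectors onto U1^perp, U2^perp

T_tilde = P2 @ P1 + Q2 @ Q1               # the rPPP operator from (35)
w0 = rng.standard_normal(n)
w = w0.copy()
for _ in range(2000):                     # constant lambda_k = 1
    w = T_tilde @ w
w_star = proj_meet(P1, P2) @ w0 + proj_meet(Q1, Q2) @ w0   # right side of (37)
print(np.linalg.norm(w - w_star),                          # ~ 0, cf. (41)
      np.linalg.norm(P1 @ w - proj_meet(P1, P2) @ w0))     # shadow limit, cf. (40)
```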

5.1.4 When ΠSM\Pi_{S}^{M} is bizarre

We return to the general setup of Section 5.1.1. We shall show that \Pi_{S}^{M} may be empty-valued or multi-valued when S is a certain closed convex subset of H=X^{2}. To this end, assume that S\subseteq H is the Cartesian product

S=S1×S2,S=S_{1}\times S_{2}, (42)

where S1,S2S_{1},S_{2} are nonempty closed convex subsets of XX. Then C(S)=S1S2C^{*}(S)=S_{1}-S_{2} may fail to be closed even when each SiS_{i} is a closed linear subspace of XX (see, e.g., [3]). Consider such a scenario, let x¯S1S2¯(S1S2)\bar{x}\in\overline{S_{1}-S_{2}}\smallsetminus(S_{1}-S_{2}), and set h¯:=(x¯,0)\bar{h}:=(\bar{x},0) and let s=(s1,s2)Ss=(s_{1},s_{2})\in S. Then h¯sM=x¯(s1s2)>0\|\bar{h}-s\|_{M}=\|\bar{x}-(s_{1}-s_{2})\|>0 while clearly infx¯(S1S2)=0\inf\|\bar{x}-(S_{1}-S_{2})\|=0. In other words,

ΠSM(h¯)=.\Pi_{S}^{M}(\bar{h})=\varnothing. (43)

On the other hand, if S1=S2=XS_{1}=S_{2}=X and h=(h1,h2)S=Hh=(h_{1},h_{2})\in S=H, then S=S1×S2=X×XS=S_{1}\times S_{2}=X\times X and Proposition 3.7 yields

ΠSM(h)=(h+kerC)S=((h1,h2)+{(x,x)|xX})\Pi_{S}^{M}({h})=\big{(}h+\ker C^{*}\big{)}\cap S=\big{(}(h_{1},h_{2})+\big{\{}{(x,x)}~{}\big{|}~{}{x\in X}\big{\}}\big{)} (44)

is clearly multi-valued provided X{0}X\neq\{0\}. In summary, ΠSM\Pi_{S}^{M} may be empty-valued or multi-valued.

5.2 Chambolle-Pock

5.2.1 General setup

Following the presentation of the algorithm by Chambolle-Pock (see [11]) in Bredies et al.’s [8] (see also [12]), we suppose that X and Y are two real Hilbert spaces, A_{1} is maximally monotone on X and A_{2} is maximally monotone on Y. We also have a continuous linear operator L\colon X\to Y, as well as \sigma>0 and \tau>0 such that \sigma\tau\|L\|^{2}\leq 1. (The operator L^{*}A_{2}L is not necessarily maximally monotone; for a sufficient condition, see [6, Corollary 25.6].) The goal is to find a point in

zer(A1+LA2L),\operatorname{zer}(A_{1}+L^{*}A_{2}L), (45)

which we assume to exist. We have

H=X×YandM=[1σIdXLL1τIdY];H=X\times Y\;\;\text{and}\;\;M=\begin{bmatrix}\frac{1}{\sigma}\operatorname{Id}_{X}&-L^{*}\\ -L&\frac{1}{\tau}\operatorname{Id}_{Y}\end{bmatrix}; (46)

hence,

(x,y)M2=1σx22Lx,y+1τy2.\|(x,y)\|_{M}^{2}=\frac{1}{\sigma}\|x\|^{2}-2\left\langle{Lx},{y}\right\rangle+\frac{1}{\tau}\|y\|^{2}. (47)

Unfortunately, for Chambolle-Pock the operator C in the factorization CC^{*} is typically not explicitly available (however, see Section 6.1 and Section 6.2 below for some factorizations). Moreover,

A=[A1LLA21].A=\begin{bmatrix}A_{1}&L^{*}\\ -L&A_{2}^{-1}\end{bmatrix}. (48)

Then AA is maximally monotone on X×YX\times Y, because it is the sum of the maximally monotone operator (x,y)A1x×A21y(x,y)\mapsto A_{1}x\times A_{2}^{-1}y and a skew linear operator. Next,

zerA\displaystyle\operatorname{zer}A ={(x,y)|0A1x+Ly0Lx+A21y}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{0\in A_{1}x+L^{*}y\land 0\in-Lx+A_{2}^{-1}y}\big{\}} (49a)
={(x,y)|0A1x+LyyA2(Lx)}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{0\in A_{1}x+L^{*}y\land y\in A_{2}(Lx)}\big{\}} (49b)
={(x,y)|xzer(A1+LA2L)y(LA1x)(A2Lx)}\displaystyle=\big{\{}{(x,y)}~{}\big{|}~{}{x\in\operatorname{zer}(A_{1}+L^{*}A_{2}L)\land y\in(-L^{-*}A_{1}x)\cap(A_{2}Lx)}\big{\}} (49c)
=\big{\{}{(x,y)}~{}\big{|}~{}{y\in\operatorname{zer}(-LA_{1}^{-1}(-L^{*})+A_{2}^{-1})\land x\in(A_{1}^{-1}(-L^{*}y))\cap(L^{-1}A_{2}^{-1}y)}\big{\}} (49d)

where 49d follows from elementary algebraic manipulations (see also [17, Proposition 1]). This shows that when (x,y)\in\operatorname{zer}A, then x is a primal solution while y corresponds to a dual solution, i.e., y satisfies 0\in LA_{1}^{-\ovee}L^{*}y+A_{2}^{-1}y. One verifies (see [8, equation (3.4)]) that

(M+A)1:X×YX×Y:[xy][JσA1(σx)JτA21(2τLJσA1(σx)+τy)](M+A)^{-1}\colon X\times Y\to X\times Y\colon\begin{bmatrix}x\\ y\end{bmatrix}\mapsto\begin{bmatrix}J_{\sigma A_{1}}(\sigma x)\\ J_{\tau A_{2}^{-1}}\big{(}2\tau LJ_{\sigma A_{1}}(\sigma x)+\tau y\big{)}\end{bmatrix} (50)

is indeed single-valued, full-domain, and Lipschitz continuous. Hence (see [8, equation (3.3)])

T(x,y)\displaystyle T(x,y) =(M+A)1M(x,y)=[JσA1(xσLy)JτA21(y+τL(2JσA1(xσLy)x))].\displaystyle=(M+A)^{-1}M(x,y)=\begin{bmatrix}J_{\sigma A_{1}}(x-\sigma L^{*}y)\\ J_{\tau A_{2}^{-1}}\big{(}y+\tau L(2J_{\sigma A_{1}}(x-\sigma L^{*}y)-x)\big{)}\end{bmatrix}. (51)

Using the general inverse resolvent identity (see, e.g., [6, Proposition 23.20]), we know that J_{\tau A_{2}^{-1}}=\operatorname{Id}-\tau J_{\frac{1}{\tau}A_{2}}\circ\frac{1}{\tau}\operatorname{Id}. Therefore, we can express 51 also as

T(x,y)=[JσA1(xσLy)y+τL(2JσA1(xσLy)x)τJ1τA2(1τy+L(2JσA1(xσLy)x))].T(x,y)=\begin{bmatrix}J_{\sigma A_{1}}(x-\sigma L^{*}y)\\[5.0pt] y+\tau L(2J_{\sigma A_{1}}(x-\sigma L^{*}y)-x)-\tau J_{\frac{1}{\tau}A_{2}}\big{(}\frac{1}{\tau}y+L(2J_{\sigma A_{1}}(x-\sigma L^{*}y)-x)\big{)}\end{bmatrix}. (52)
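For illustration, here is a minimal sketch of the PPP iteration u_{k+1}=Tu_{k} with T from 51 (our toy instantiation, not an example from [8]: A_{1}=\partial\|\cdot\|_{1}, whose resolvent is soft thresholding, and A_{2}=N_{\{b\}}, for which J_{\tau A_{2}^{-1}}(v)=v-\tau b, cf. 70 below; this amounts to basis pursuit):

```python
import numpy as np
rng = np.random.default_rng(1)

def soft(z, t):                               # J_{t A_1} for A_1 = subdiff of ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

m, n = 8, 20
L = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[[2, 7, 11]] = [1.0, -2.0, 0.5]
b = L @ x_true

sigma = tau = 0.99 / np.linalg.norm(L, 2)     # so sigma * tau * ||L||^2 <= 1
x, y = np.zeros(n), np.zeros(m)
for _ in range(20000):                        # (x, y) <- T(x, y) as in (51)
    x_new = soft(x - sigma * L.T @ y, sigma)  # J_{sigma A_1}(x - sigma L* y)
    y = y + tau * (L @ (2.0 * x_new - x) - b) # J_{tau A_2^{-1}}(y + tau L(2 x_new - x))
    x = x_new
print(np.linalg.norm(L @ x - b), np.linalg.norm(x, 1))  # feasibility and l1-norm
```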

5.2.2 An example without strong convergence

Now suppose temporarily that X=2X=\ell^{2}, that BB, s0s_{0} and s¯\bar{s} are as in Proposition 2.4, that A1=1σBA_{1}=\tfrac{1}{\sigma}B, and that A2=0A_{2}=0. Then 49 and 51 turn into

zerA=zer(B)×{0}andT(x,y)=[JB(xσLy)0].\operatorname{zer}A=\operatorname{zer}(B)\times\{0\}\;\;\text{and}\;\;T(x,y)=\begin{bmatrix}J_{B}(x-\sigma L^{*}y)\\ 0\end{bmatrix}. (53)

Suppose furthermore that (\forall{k\in{\mathbb{N}}}) \lambda_{k}=1 and u_{0}=(s_{0},0). It then follows that u_{k}=T^{k}(s_{0},0)=(J_{B}^{k}(s_{0}),0)\>{\rightharpoonup}\>(\bar{s},0) but the convergence is not strong.

5.2.3 Specialization to normal cones of linear subspaces

We now assume that U is a closed linear subspace of X, that V is a closed linear subspace of Y, and that A_{1}=N_{U}, A_{2}=N_{V}. Let (x,y)\in X\times Y. Then A_{1}x=U^{\perp} if x\in U, and A_{1}x=\varnothing if x\notin U, and similarly for N_{V}y. Note that 0=0+0\in U^{\perp}+L^{*}V^{\perp}. It follows that x\in\operatorname{zer}(A_{1}+L^{*}A_{2}L) \Leftrightarrow 0\in A_{1}x+L^{*}A_{2}Lx \Leftrightarrow [x\in U\land Lx\in V] \Leftrightarrow x\in U\cap L^{-1}(V). We have shown \operatorname{zer}(A_{1}+L^{*}A_{2}L)=U\cap L^{-1}(V). Now assume that x\in\operatorname{zer}(A_{1}+L^{*}A_{2}L)=U\cap L^{-1}(V). Then y\in(-L^{-*}A_{1}x)\cap(A_{2}Lx) \Leftrightarrow [L^{*}(-y)\in A_{1}x\land y\in A_{2}Lx] \Leftrightarrow [-L^{*}y\in U^{\perp}\land y\in V^{\perp}] \Leftrightarrow [L^{*}y\in-U^{\perp}=U^{\perp}\land y\in V^{\perp}] \Leftrightarrow y\in L^{-*}(U^{\perp})\cap V^{\perp}. Combining this with 49 yields

FixT=zerA=(UL1(V))×(VL(U)).\operatorname{Fix}T=\operatorname{zer}A=\big{(}U\cap L^{-1}(V)\big{)}\times\big{(}V^{\perp}\cap L^{-*}(U^{\perp})\big{)}. (54)

Now 51 and 52 particularize to

T(x,y)\displaystyle T(x,y) =[PU(xσLy)PV(y+τL(2PU(xσLy)x))]\displaystyle=\begin{bmatrix}P_{U}(x-\sigma L^{*}y)\\ P_{V^{\perp}}\big{(}y+\tau L(2P_{U}(x-\sigma L^{*}y)-x)\big{)}\end{bmatrix} (55)

and

T(x,y)=[PU(xσLy)y+τL(2PU(xσLy)x)PV(y+τL(2PU(xσLy)x))]T(x,y)=\begin{bmatrix}P_{U}(x-\sigma L^{*}y)\\[5.0pt] y+\tau L(2P_{U}(x-\sigma L^{*}y)-x)-P_{V}\big{(}y+\tau L(2P_{U}(x-\sigma L^{*}y)-x)\big{)}\end{bmatrix} (56)

respectively.

Lemma 5.1.

Given (x0,y0)X×Y(x_{0},y_{0})\in X\times Y, we have

PFixTM(x0,y0)=(PUL1(V)(x0σLy0),PVL(U)(y0τLx0)).P_{\operatorname{Fix}T}^{M}(x_{0},y_{0})=\big{(}P_{U\cap L^{-1}(V)}(x_{0}-\sigma L^{*}y_{0}),P_{V^{\perp}\cap L^{-*}(U^{\perp})}(y_{0}-\tau Lx_{0})\big{)}. (57)

Proof. Using 54, we want to find the (unique by Theorem 3.1(iii)) minimizer of the function (x,y)(x,y)(x0,y0)M(x,y)\mapsto\|(x,y)-(x_{0},y_{0})\|_{M} subject to xUL1(V)x\in U\cap L^{-1}(V) and yVL(U)y\in V^{\perp}\cap L^{-*}(U^{\perp}). First, squaring (x,y)(x0,y0)M\|(x,y)-(x_{0},y_{0})\|_{M} and recalling 47 yields 1σxx022L(xx0),yy0+1τyy02\frac{1}{\sigma}\|x-x_{0}\|^{2}-2\left\langle{L(x-x_{0})},{y-y_{0}}\right\rangle+\frac{1}{\tau}\|y-y_{0}\|^{2}. Second, expanding, discarding constant terms, and using LxyLx\perp y results in 1σx22σx,x0+2Lx0,y+2Lx,y0+1τy22τy,y0=1σ(x22x,x0σLy0)+1τ(y22y,y0τLx0)\frac{1}{\sigma}\|x\|^{2}-\frac{2}{\sigma}\left\langle{x},{x_{0}}\right\rangle+2\left\langle{Lx_{0}},{y}\right\rangle+2\left\langle{Lx},{y_{0}}\right\rangle+\frac{1}{\tau}\|y\|^{2}-\frac{2}{\tau}\left\langle{y},{y_{0}}\right\rangle=\frac{1}{\sigma}\big{(}\|x\|^{2}-2\left\langle{x},{x_{0}-\sigma L^{*}y_{0}}\right\rangle\big{)}+\frac{1}{\tau}\big{(}\|y\|^{2}-2\left\langle{y},{y_{0}-\tau Lx_{0}}\right\rangle\big{)}. Thirdly, completing the squares (which adds only constant terms here), we obtain 1σx(x0σLy0)2+1τy(y0τLx0)2\frac{1}{\sigma}\|x-(x_{0}-\sigma L^{*}y_{0})\|^{2}+\frac{1}{\tau}\|y-(y_{0}-\tau Lx_{0})\|^{2} and the proclaimed identity follows. \hfill\quad\blacksquare

If (uk)k=(xk,yk)k(u_{k})_{k\in{\mathbb{N}}}=(x_{k},y_{k})_{k\in{\mathbb{N}}} is the PPP sequence for Chambolle-Pock, then the weak limit u=(x,y)u^{*}=(x^{*},y^{*}) of (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} is given by

(x,y)=(PUL1(V)(x0σLy0),PVL(U)(y0τLx0))(x^{*},y^{*})=\big{(}P_{U\cap L^{-1}(V)}(x_{0}-\sigma L^{*}y_{0}),P_{V^{\perp}\cap L^{-*}(U^{\perp})}(y_{0}-\tau Lx_{0})\big{)} (58)

because of Theorem 4.1(ii) and Lemma 5.1. Moreover, combining with Theorem 4.2, we obtain the strong convergence result

PU(xkσLyk)PUL1(V)(x0)when y0=0 and (λk)kλ.P_{U}(x_{k}-\sigma L^{*}y_{k})\to P_{U\cap L^{-1}(V)}(x_{0})\quad\text{when $y_{0}=0$ and $(\lambda_{k})_{k\in{\mathbb{N}}}\equiv\lambda$.} (59)
Remark 5.2.

We note that 58 beautifully generalizes 39, where Y=XY=X, L=IdL=\operatorname{Id} and σ=τ=1\sigma=\tau=1. To the best of our knowledge, the limit formula 58 appears to be new, even in the classical case where MM is positive definite. Moreover, 59 is a nice way to compute algorithmically PUL1(V)P_{U\cap L^{-1}(V)} in the case when only PVP_{V} is available but PL1(V)P_{L^{-1}(V)} is not.
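To illustrate the last point of Remark 5.2 numerically, here is a sketch (with our own random test data; the nullspace-based helper computes the true projector only for verification and is not part of the algorithm):

```python
import numpy as np
rng = np.random.default_rng(3)

def proj(B):                                   # projector onto ran B
    Q, _ = np.linalg.qr(B)
    return Q @ Q.T

def proj_null(Mstack):                         # projector onto null(Mstack)
    _, s, Vt = np.linalg.svd(Mstack)
    Ns = Vt[np.sum(s > 1e-10):].T
    return Ns @ Ns.T

n, m = 10, 6
PU = proj(rng.standard_normal((n, 7)))         # U, a 7-dimensional subspace of R^10
PV = proj(rng.standard_normal((m, 3)))         # V, a 3-dimensional subspace of R^6
L = rng.standard_normal((m, n))

sigma = tau = 0.99 / np.linalg.norm(L, 2)
x0 = rng.standard_normal(n)
x, y = x0.copy(), np.zeros(m)                  # y0 = 0, as required in (59)
for _ in range(20000):                         # iterate (55) with lambda_k = 1
    x_new = PU @ (x - sigma * L.T @ y)
    v = y + tau * (L @ (2.0 * x_new - x))
    y = v - PV @ v                             # P_{V^perp} v
    x = x_new

# ground truth: U ∩ L^{-1}(V) = null([(I - PU); (I - PV) L]); compare with (59):
P_true = proj_null(np.vstack([np.eye(n) - PU, (np.eye(m) - PV) @ L]))
print(np.linalg.norm(PU @ (x - sigma * L.T @ y) - P_true @ x0))   # ~ 0
```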

5.2.4 Specializing Section 5.2.3 even further to two lines in 2\mathbb{R}^{2}

We now turn to a pleasing special case that allows for the explicit computation of the spectral radius and the operator norm of TT. Keeping the setup of Section 5.2.3, we assume additionally that Y=XY=X, L=IdL=\operatorname{Id}, and σ=1/τ\sigma=1/\tau. Then the operator TT from 55 turns into

T\displaystyle T =[PU1τPUτPV(PUPU)PV(PUPU)].\displaystyle=\begin{bmatrix}P_{U}&-\frac{1}{\tau}P_{U}\\ \tau P_{V^{\perp}}(P_{U}-P_{U^{\perp}})&P_{V^{\perp}}(P_{U^{\perp}}-P_{U})\end{bmatrix}. (60)

We now specialize even further to X=2X=\mathbb{R}^{2} and θ[0,π[\theta\in\left[0,\pi\right[, where we consider the two lines

U=\mathbb{R}[1,0]^{\mkern-1.5mu\mathsf{T}}\;\;\text{and}\;\;V=\mathbb{R}[\cos(\theta),\sin(\theta)]^{\mkern-1.5mu\mathsf{T}} (61)

which form an angle of \theta and which have projection matrices (see, e.g., [3, Section 5])

PU=[1000]andPV=[cos2(θ)cos(θ)sin(θ)cos(θ)sin(θ)sin2(θ)].P_{U}=\begin{bmatrix}1&0\\ 0&0\end{bmatrix}\;\;\text{and}\;\;P_{V}=\begin{bmatrix}\cos^{2}(\theta)&\cos(\theta)\sin(\theta)\\ \cos(\theta)\sin(\theta)&\sin^{2}(\theta)\end{bmatrix}. (62)

It follows that 60 turns into

T=[101τ00000τsin2(θ)τcos(θ)sin(θ)sin2(θ)cos(θ)sin(θ)τcos(θ)sin(θ)τcos2(θ)cos(θ)sin(θ)cos2(θ)].T=\begin{bmatrix}1&0&-\frac{1}{\tau}&0\\ 0&0&0&0\\ \tau\,\sin^{2}(\theta)&\tau\,\cos\left(\theta\right)\,\sin\left(\theta\right)&-\sin^{2}(\theta)&-\cos\left(\theta\right)\,\sin\left(\theta\right)\\ -\tau\,\cos(\theta)\,\sin(\theta)&-\tau\cos^{2}(\theta)&\cos\left(\theta\right)\,\sin(\theta)&\cos^{2}(\theta)\end{bmatrix}. (63)

We know from the main results of Bredies et al. that TT has excellent properties with respect to the PPP algorithm. For this particular TT, we can quantify the key notions:

Lemma 5.3.

The operator TT defined in 63 satisfies the following:

  1. (i)

(spectral radius) \rho(T)=|\cos(\theta)|, which is independent of \tau, and minimized with minimum value of 0 when \theta=\pi/2 and maximized with maximum value of 1 when \theta=0.

  2. (ii)

    (operator norm) We have

    T=1+1+τ4+(1+τ2)1+τ42τ2cos(2θ)2τ2.\|T\|=\sqrt{1+\frac{1+\tau^{4}+(1+\tau^{2})\sqrt{1+\tau^{4}-2\tau^{2}\cos(2\theta)}}{2\tau^{2}}}. (64)
  3. (iii)

(bounds) \sqrt{2}\leq\sqrt{1+\max\{\tau^{2},1/\tau^{2}\}}\leq\|T\|\leq\tau+{1}/{\tau}, and the lower bound is attained when \theta=0 and \tau=1 while the upper bound is attained when \theta=\pi/2.

  4. (iv)

(Douglas-Rachford case) When \tau=1, we have \|T\|=\sqrt{2+2\sin(\theta)}, which is minimized with minimum value \sqrt{2} when \theta=0, and maximized with maximum value 2 when \theta=\pi/2.

Proof. The matrix TT has the characteristic polynomial (in the variable ζ\zeta):

ζ2(ζ22cos2(θ)ζ+cos2(θ));\zeta^{2}\big{(}\zeta^{2}-2\cos^{2}(\theta)\zeta+\cos^{2}(\theta)\big{)}; (65)

hence, the eigenvalues of T are 0,0,|\cos(\theta)|\big{(}|\cos(\theta)|\pm\mathrm{i}|\sin(\theta)|\big{)}, and the latter two eigenvalues have absolute value |\cos(\theta)|\sqrt{\cos^{2}(\theta)+\sin^{2}(\theta)}=|\cos(\theta)|. Thus the spectral radius of T is

ρ(T)=|cos(θ)|.\rho(T)=|\cos(\theta)|. (66)

This verifies (i).

Next, the matrix T^{\mkern-1.5mu\mathsf{T}}T has the characteristic polynomial (in the variable \zeta)

ζ2τ2(τ2ζ2(1+τ2)2ζ+(1+τ2)2cos2(θ))\frac{\zeta^{2}}{\tau^{2}}\Big{(}\tau^{2}\zeta^{2}-(1+\tau^{2})^{2}\zeta+(1+\tau^{2})^{2}\cos^{2}(\theta)\Big{)} (67)

and thus it has eigenvalues 0,00,0 and

(1+τ2)2±(1+τ2)44τ2(1+τ2)2cos2(θ)2τ2\displaystyle\frac{(1+\tau^{2})^{2}\pm\sqrt{(1+\tau^{2})^{4}-4\tau^{2}(1+\tau^{2})^{2}\cos^{2}(\theta)}}{2\tau^{2}} (68a)
=(1+τ2)2±(1+τ2)1+τ4+2τ2(12cos2(θ))2τ2\displaystyle=\frac{(1+\tau^{2})^{2}\pm(1+\tau^{2})\sqrt{1+\tau^{4}+2\tau^{2}(1-2\cos^{2}(\theta))}}{2\tau^{2}} (68b)
=(1+τ2)2±(1+τ2)1+τ42τ2cos(2θ)2τ2\displaystyle=\frac{(1+\tau^{2})^{2}\pm(1+\tau^{2})\sqrt{1+\tau^{4}-2\tau^{2}\cos(2\theta)}}{2\tau^{2}} (68c)
=1+1+τ4±(1+τ2)1+τ42τ2cos(2θ)2τ2.\displaystyle=1+\frac{1+\tau^{4}\pm(1+\tau^{2})\sqrt{1+\tau^{4}-2\tau^{2}\cos(2\theta)}}{2\tau^{2}}. (68d)

Therefore,

T\displaystyle\|T\| =1+1+τ4+(1+τ2)1+τ42τ2cos(2θ)2τ2\displaystyle=\sqrt{1+\frac{1+\tau^{4}+(1+\tau^{2})\sqrt{1+\tau^{4}-2\tau^{2}\cos(2\theta)}}{2\tau^{2}}} (69)

and we have verified (ii).

Turning to (iii), we see that, for fixed \tau, \|T\| is smallest (resp. largest) when \theta=0 (resp. \theta=\pi/2), in which case \|T\| becomes \sqrt{1+({1+\tau^{4}+(1+\tau^{2})|1-\tau^{2}|})/({2\tau^{2}})} (resp. \sqrt{1+({1+\tau^{4}+(1+\tau^{2})(1+\tau^{2})})/({2\tau^{2}})}\big{)}, which further simplifies to \sqrt{1+\max\{\tau^{2},1/\tau^{2}\}} (resp. \tau+1/\tau).

Finally, (iv) follows by simplifying (ii) with τ=1\tau=1. \hfill\quad\blacksquare
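Lemma 5.3(i) and (ii) are straightforward to confirm numerically (a verification sketch of ours; the values of \theta and \tau are arbitrary):

```python
import numpy as np

def T_mat(theta, tau):                  # the matrix from (63)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0,        0.0,       -1.0/tau, 0.0],
                     [0.0,        0.0,        0.0,     0.0],
                     [tau*s*s,    tau*c*s,   -s*s,    -c*s],
                     [-tau*c*s,  -tau*c*c,    c*s,     c*c]])

theta, tau = 0.7, 1.3
T = T_mat(theta, tau)
rho = max(abs(np.linalg.eigvals(T)))    # spectral radius, cf. Lemma 5.3(i)
norm = np.linalg.norm(T, 2)             # operator norm, cf. Lemma 5.3(ii)
formula = np.sqrt(1.0 + (1.0 + tau**4 + (1.0 + tau**2)
          * np.sqrt(1.0 + tau**4 - 2.0*tau**2*np.cos(2.0*theta))) / (2.0*tau**2))
print(rho, abs(np.cos(theta)))          # both ~ 0.7648, confirming (66)
print(norm, formula)                    # agree to machine precision, cf. (64)
```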

Remark 5.4.

Lemma 5.3(iii) impressively shows that studying the operator TT from 63 is outside the realm of fixed point theory of classical nonexpansive mappings. In such cases, the spectral radius is helpful (for more results in this direction, see [4]). We also point out that the operator TT from 63 is firmly nonexpansive — not with respect to the standard Hilbert space norm on X2X^{2}, but rather with respect to the seminorm induced by the preconditioner MM (see [8, Lemma 2.6] for details).

5.2.5 Numerical experiment

In this section, let UU be a closed affine subspace of XX and let bYb\in Y. Let (x,y)X×Y(x,y)\in X\times Y. We assume that A1=NUA_{1}=N_{U} and A2=N{b}A_{2}=N_{\{b\}}. Then 52 turns into

T(x,y)=[PU(xσLy)y+τL(2PU(xσLy)x)τb].T(x,y)=\begin{bmatrix}P_{U}(x-\sigma L^{*}y)\\[5.0pt] y+\tau L(2P_{U}(x-\sigma L^{*}y)-x)-\tau b\end{bmatrix}. (70)

Now assume that xzer(A1+LA2L)=UL1(b)x\in\operatorname{zer}(A_{1}+L^{*}A_{2}L)=U\cap L^{-1}(b). Then y(LA1x)(A2Lx)y\in(-L^{-*}A_{1}x)\cap(A_{2}Lx) \Leftrightarrow [L(y)A1xyA2Lx][L^{*}(-y)\in A_{1}x\land y\in A_{2}Lx] \Leftrightarrow [Ly(UU)y({b}{b})][-L^{*}y\in(U-U)^{\perp}\land y\in(\{b\}-\{b\})^{\perp}] \Leftrightarrow [Ly(UU)=(UU)yY][L^{*}y\in-(U-U)^{\perp}=(U-U)^{\perp}\land y\in Y] \Leftrightarrow yL((UU))y\in L^{-*}((U-U)^{\perp}). It follows from 49 that

zerA=FixT=(UL1(b))×L((UU)).\operatorname{zer}A=\operatorname{Fix}T=\big{(}U\cap L^{-1}(b)\big{)}\times L^{-*}\big{(}(U-U)^{\perp}\big{)}. (71)

Arguing similarly to the derivation of 58 and 59, we obtain that the weak limit u=(x,y)u^{*}=(x^{*},y^{*}) of the PPP sequence (uk)k=(xk,yk)k(u_{k})_{k\in{\mathbb{N}}}=(x_{k},y_{k})_{k\in{\mathbb{N}}} of Chambolle-Pock is given by

(x,y)=(PUL1(b)(x0σLy0),PL((UU))(y0τLx0))(x^{*},y^{*})=\big{(}P_{U\cap L^{-1}(b)}(x_{0}-\sigma L^{*}y_{0}),P_{L^{-*}((U-U)^{\perp})}(y_{0}-\tau Lx_{0})\big{)} (72)

and that

PU(xkσLyk)PUL1(b)(x0)when y0=0 and (λk)kλ.P_{U}(x_{k}-\sigma L^{*}y_{k})\to P_{U\cap L^{-1}(b)}(x_{0})\quad\text{when $y_{0}=0$ and $(\lambda_{k})_{k\in{\mathbb{N}}}\equiv\lambda$.} (73)

We now illustrate 73 numerically using a setup motivated by Computed Tomography (CT), which uses X-ray measurements to reconstruct cross-sectional body images (see, e.g., [22]). The resulting inverse problem we consider here amounts to solving Lx=b where L\in\mathbb{R}^{6750\times 2500} and b\in\mathbb{R}^{6750}: indeed, the matrix L and the vector b were generated from the 50\times 50 Shepp-Logan phantom image [29], reshaped to a vector x^{*}\in\mathbb{R}^{2500}, using the Matlab AIR Tools II package [21] (we used the command paralleltomo(50,0:2:178,75); in fact, the experiments were performed in GNU Octave [16]). In turn, the closed linear subspace U was obtained by using the a priori information that the first and last two columns of the phantom image must be black, i.e., they contain only zeros. Because the matrix L^{*}L is smaller than LL^{*}, we compute \|L\| via \|L\|^{2}=\|L^{*}L\| and then set \sigma:=\tau:=0.99/\|L\|. In Figure 1 we present the reconstructed phantom images generated after 100 and 10000 iterations of Chambolle-Pock, with starting point (x_{0},y_{0})=(0,0) and \lambda_{k}\equiv 1, along with the exact phantom.

Figure 1: The exact phantom image (a), and the reconstructions generated by x_{100} (b) and by x_{10000} (c).
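The phantom experiment itself relies on Matlab/Octave and AIR Tools II, but the iteration 70 and the limit 73 are easy to reproduce on a small synthetic instance (a sketch with our own random data standing in for the tomography matrix; the "black column" mask mimics the subspace U above):

```python
import numpy as np
rng = np.random.default_rng(2)

m, n = 8, 14
mask = np.ones(n); mask[:2] = mask[-2:] = 0.0    # U = {x : first/last two entries 0}
P_U = lambda x: mask * x

L = rng.standard_normal((m, n))
b = L @ (mask * rng.standard_normal(n))          # consistent: U ∩ L^{-1}(b) nonempty

sigma = tau = 0.99 / np.linalg.norm(L, 2)
x, y = np.zeros(n), np.zeros(m)                  # (x0, y0) = (0, 0), lambda_k = 1
for _ in range(20000):                           # iterate (70)
    x_new = P_U(x - sigma * L.T @ y)
    y = y + tau * (L @ (2.0 * x_new - x) - b)
    x = x_new

# For x0 = 0, (73) predicts the least-norm point of U ∩ L^{-1}(b):
cols = np.where(mask == 1.0)[0]
z = np.zeros(n); z[cols] = np.linalg.lstsq(L[:, cols], b, rcond=None)[0]
print(np.linalg.norm(P_U(x - sigma * L.T @ y) - z))   # ~ 0
```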

5.3 Ryu

5.3.1 General setup

Let XX be a real Hilbert space, and A1,A2,A3A_{1},A_{2},A_{3} be maximally monotone on XX. The goal is to find a point in

Z:=zer(A1+A2+A3)Z:=\operatorname{zer}(A_{1}+A_{2}+A_{3}) (74)

which we assume to be nonempty. The three-operator splitting algorithm by Ryu [28] is designed to solve this problem. It was pointed out in [9] that this fits the framework by Bredies et al. Indeed, assume that

H=X5,D=X2,andC=[Id00IdIdIdId00Id].\displaystyle H=X^{5},\quad D=X^{2},\quad\text{and}\quad C=\begin{bmatrix}\operatorname{Id}&0\\ 0&\operatorname{Id}\\ -\operatorname{Id}&-\operatorname{Id}\\ \operatorname{Id}&0\\ 0&\operatorname{Id}\end{bmatrix}. (75)

Then

C=[Id0IdId00IdId0Id]C^{*}=\begin{bmatrix}\operatorname{Id}&0&-\operatorname{Id}&\operatorname{Id}&0\\ 0&\operatorname{Id}&-\operatorname{Id}&0&\operatorname{Id}\end{bmatrix} (76)

is clearly surjective and

M=CC=[Id0IdId00IdId0IdIdId2IdIdIdId0IdId00IdId0Id].M=CC^{*}=\begin{bmatrix}\operatorname{Id}&0&-\operatorname{Id}&\operatorname{Id}&0\\ 0&\operatorname{Id}&-\operatorname{Id}&0&\operatorname{Id}\\ -\operatorname{Id}&-\operatorname{Id}&2\operatorname{Id}&-\operatorname{Id}&-\operatorname{Id}\\ \operatorname{Id}&0&-\operatorname{Id}&\operatorname{Id}&0\\ 0&\operatorname{Id}&-\operatorname{Id}&0&\operatorname{Id}\end{bmatrix}. (77)

Now assume that

A=[2A1+Id0IdId02Id2A2+IdId0IdIdId2A3IdIdId0Id000IdId00],A=\begin{bmatrix}2A_{1}+\operatorname{Id}&0&\operatorname{Id}&-\operatorname{Id}&0\\ -2\operatorname{Id}&2A_{2}+\operatorname{Id}&\operatorname{Id}&0&-\operatorname{Id}\\ -\operatorname{Id}&-\operatorname{Id}&2A_{3}&\operatorname{Id}&\operatorname{Id}\\ \operatorname{Id}&0&-\operatorname{Id}&0&0\\ 0&\operatorname{Id}&-\operatorname{Id}&0&0\end{bmatrix}, (78)

which is maximally monotone151515Indeed, AA is the sum of the operator (x1,,x5)2A1x1×2A2x2×2A3x3×{0}×{0}(x_{1},\ldots,x_{5})\mapsto 2A_{1}x_{1}\times 2A_{2}x_{2}\times 2A_{3}x_{3}\times\{0\}\times\{0\}, the gradient of (x1,,x5)12x1x22(x_{1},\ldots,x_{5})\mapsto\frac{1}{2}\lVert x_{1}-x_{2}\rVert^{2}, and a skew linear operator. on HH. Next, given (x1,,x5)H(x_{1},\ldots,x_{5})\in H, we have

(x1,,x5)zerA\displaystyle(x_{1},\ldots,x_{5})\in\operatorname{zer}A {0(2A1+Id)x1+x3x402x1+(2A2+Id)x2+x3x50x1x2+2A3x3+x4+x50=x1x30=x2x3\displaystyle\Leftrightarrow\begin{cases}0\in(2A_{1}+\operatorname{Id})x_{1}+x_{3}-x_{4}\\ 0\in-2x_{1}+(2A_{2}+\operatorname{Id})x_{2}+x_{3}-x_{5}\\ 0\in-x_{1}-x_{2}+2A_{3}x_{3}+x_{4}+x_{5}\\ 0=x_{1}-x_{3}\\ 0=x_{2}-x_{3}\end{cases} (79a)
{x=x1=x2=x3x42A1x+2xx52A2xx4x52A3x2x.\displaystyle\Leftrightarrow\begin{cases}x=x_{1}=x_{2}=x_{3}\\ x_{4}\in 2A_{1}x+2x\\ x_{5}\in 2A_{2}x\\ -x_{4}-x_{5}\in 2A_{3}x-2x.\end{cases} (79b)

This gives the description

zerA={(z,z,z,x4,x5)H|zZ12x4A1z+z12x5A2z12x412x5A3zz}.\operatorname{zer}A=\big{\{}{(z,z,z,x_{4},x_{5})\in H}~{}\big{|}~{}{z\in Z\land\tfrac{1}{2}x_{4}\in A_{1}z+z\land\tfrac{1}{2}x_{5}\in A_{2}z\land-\tfrac{1}{2}x_{4}-\tfrac{1}{2}x_{5}\in A_{3}z-z}\big{\}}. (80)

Next, one verifies that

(M+A)1:X5X5:[x1x2x3x4x5][y1y2y3y4y5]=[JA1(12x1)JA2(12x2+y1)JA3(12x3+y1+y2)x42y1+2y3x52y2+2y3],(M+A)^{-1}:X^{5}\to X^{5}:\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix}\mapsto\begin{bmatrix}y_{1}\\ y_{2}\\ y_{3}\\ y_{4}\\ y_{5}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{(}\tfrac{1}{2}x_{1}\big{)}\\[2.84526pt] J_{A_{2}}\big{(}\tfrac{1}{2}x_{2}+y_{1}\big{)}\\[2.84526pt] J_{A_{3}}\big{(}\tfrac{1}{2}x_{3}+y_{1}+y_{2}\big{)}\\[2.84526pt] x_{4}-2y_{1}+2y_{3}\\[2.84526pt] x_{5}-2y_{2}+2y_{3}\end{bmatrix}, (81)

which is clearly single-valued with full domain, and Lipschitz continuous. Combining 81 and 77, we end up with

T=(M+A)1M:X5\displaystyle T=(M+A)^{-1}M:X^{5} X5\displaystyle\to X^{5} (82a)
[x1x2x3x4x5]\displaystyle\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} [y1y2y3y4y5]=[JA1(12(x1x3+x4))JA2(12(x2x3+x5)+y1)JA3(12(x1x2+2x3x4x5)+y1+y2)x1x3+x42y1+2y3x2x3+x52y2+2y3].\displaystyle\mapsto\begin{bmatrix}y_{1}\\ y_{2}\\ y_{3}\\ y_{4}\\ y_{5}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\tfrac{1}{2}(x_{1}-x_{3}+x_{4})\big{\rparen}\\[2.84526pt] J_{A_{2}}\big{\lparen}\tfrac{1}{2}(x_{2}-x_{3}+x_{5})+y_{1}\big{\rparen}\\[2.84526pt] J_{A_{3}}\big{\lparen}\tfrac{1}{2}(-x_{1}-x_{2}+2x_{3}-x_{4}-x_{5})+y_{1}+y_{2}\big{\rparen}\\[2.84526pt] x_{1}-x_{3}+x_{4}-2y_{1}+2y_{3}\\[2.84526pt] x_{2}-x_{3}+x_{5}-2y_{2}+2y_{3}\end{bmatrix}. (82b)

Turning to T~\widetilde{T}, we verify that T~=C(M+A)1C:DD\widetilde{T}=C^{*}(M+A)^{-1}C:D\to D is given by

T~:[w1w2][w1w2]+[y3y1y3y2], where[y1y2y3]=[JA1(12w1)JA2(12w2+y1)JA3(12(w1w2)+y1+y2)].\widetilde{T}\colon\begin{bmatrix}w_{1}\\ w_{2}\end{bmatrix}\mapsto\begin{bmatrix}w_{1}\\ w_{2}\end{bmatrix}+\begin{bmatrix}y_{3}-y_{1}\\ y_{3}-y_{2}\end{bmatrix},\text{~{}~{}where}\begin{bmatrix}y_{1}\\ y_{2}\\ y_{3}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\tfrac{1}{2}w_{1}\big{\rparen}\\[2.84526pt] J_{A_{2}}\big{\lparen}\tfrac{1}{2}w_{2}+y_{1}\big{\rparen}\\[2.84526pt] J_{A_{3}}\big{\lparen}\tfrac{1}{2}(-w_{1}-w_{2})+y_{1}+y_{2}\big{\rparen}\end{bmatrix}. (83)

The operator T~\widetilde{T} is a scaled version161616If we denote the original Ryu operator by TRT_{\rm R}, then T~(w)=2TR(w/2)\widetilde{T}(w)=2T_{\rm R}(w/2). So, FixT~=2FixTR\operatorname{Fix}\widetilde{T}=2\operatorname{Fix}T_{\rm R} and T~k(w)=2TRk(w/2)\widetilde{T}^{k}(w)=2T_{\rm R}^{k}(w/2). of the original Ryu operator. Combining 80 and 14, we obtain

FixT~={(w1,w2)D|(zZ)12w1A1z+z12w2A2z12w112w2A3zz}.\operatorname{Fix}\widetilde{T}=\\ \big{\{}{(w_{1},w_{2})\in D}~{}\big{|}~{}{(\exists\,z\in Z)\;\tfrac{1}{2}w_{1}\in A_{1}z+z\land\tfrac{1}{2}w_{2}\in A_{2}z\land-\tfrac{1}{2}w_{1}-\tfrac{1}{2}w_{2}\in A_{3}z-z}\big{\}}. (84)

5.3.2 Specialization to normal cones of linear subspaces

We now assume that each Ai=NUiA_{i}=N_{U_{i}}, where each UiU_{i} is a closed linear subspace of XX. Then Z=zer(A1+A2+A3)=U1U2U3Z=\operatorname{zer}(A_{1}+A_{2}+A_{3})=U_{1}\cap U_{2}\cap U_{3}, and 80 turns into

zerA=FixT={(z,z,z,x4,x5)H|zZx4U1+2zx5U2x4+x5U3+2z},\operatorname{zer}A=\operatorname{Fix}T=\big{\{}{(z,z,z,x_{4},x_{5})\in H}~{}\big{|}~{}{z\in Z\land x_{4}\in U_{1}^{\perp}+2z\land x_{5}\in U_{2}^{\perp}\land x_{4}+x_{5}\in U_{3}^{\perp}+2z}\big{\}}, (85)

while 84 becomes

FixT~={(w1,w2)D|(zZ)w1U1+2zw2U2w1+w2U3+2z}.\operatorname{Fix}\widetilde{T}=\big{\{}{(w_{1},w_{2})\in D}~{}\big{|}~{}{(\exists\,z\in Z)\;w_{1}\in U_{1}^{\perp}+2z\land w_{2}\in U_{2}^{\perp}\land w_{1}+w_{2}\in U_{3}^{\perp}+2z}\big{\}}. (86)

We now provide an alternative description of the two fixed point sets.

Lemma 5.5 (fixed point sets).

Set S:={(z,z,z,2z,0)|zZ=U1U2U3}X5S:=\big{\{}{(z,z,z,2z,0)}~{}\big{|}~{}{z\in Z=U_{1}\cap U_{2}\cap U_{3}}\big{\}}\subseteq X^{5} and Δ2:={(x,x)|xX}X2\Delta_{2}:=\big{\{}{(x,x)}~{}\big{|}~{}{x\in X}\big{\}}\subseteq X^{2}. Then171717When S1S2S_{1}\perp S_{2}, we also write S1S2S_{1}\oplus S_{2} for S1+S2S_{1}+S_{2} to stress the orthogonality.

FixT=S({0}3×((U1×U2)(Δ2+({0}×U3))))\operatorname{Fix}T=S\oplus\big{(}\{0\}^{3}\times\big{(}(U_{1}^{\perp}\times U_{2}^{\perp})\cap(\Delta_{2}^{\perp}+(\{0\}\times U_{3}^{\perp}))\big{)}\big{)} (87)

and

FixT~=(Z×{0})((U1×U2)(Δ2+({0}×U3))).\operatorname{Fix}\widetilde{T}=(Z\times\{0\})\oplus\big{(}(U_{1}^{\perp}\times U_{2}^{\perp})\cap(\Delta_{2}^{\perp}+(\{0\}\times U_{3}^{\perp}))\big{)}. (88)

Proof. Recall that Δ2={(x,x)|xX}\Delta_{2}^{\perp}=\big{\{}{(x,-x)}~{}\big{|}~{}{x\in X}\big{\}}. The identity 87 is a reformulation of 85. To obtain 88, combine 86 with the fact that 2Z=Z2Z=Z. The orthogonality statements are a consequence of ZU1Z\subseteq U_{1} and U1U1U_{1}\perp U_{1}^{\perp}. \hfill\quad\blacksquare

Corollary 5.6 (projections).

Set E:=(U1×U2)(Δ2+({0}×U3))E:=(U_{1}^{\perp}\times U_{2}^{\perp})\cap(\Delta_{2}^{\perp}+(\{0\}\times U_{3}^{\perp})), where Δ2\Delta_{2} is as in Lemma 5.5. Then

(w=(w1,w2)D=X2)PFixT~w=(PZ(w1),0)+PE(w).(\forall w=(w_{1},w_{2})\in D=X^{2})\quad P_{\operatorname{Fix}\widetilde{T}}w=\big{(}P_{Z}(w_{1}),0\big{)}+P_{E}(w). (89)

Now let u=(u1,u2,u3,u4,u5)H=X5u=(u_{1},u_{2},u_{3},u_{4},u_{5})\in H=X^{5}, set w:=Cuw:=C^{*}u, and w:=PFixT~(w)w^{*}:=P_{\operatorname{Fix}\widetilde{T}}(w). Then

PFixTM(u)=(12PZw1,12PZw1,12PZw1,w1,w2).P_{\operatorname{Fix}T}^{M}(u)=\big{(}\tfrac{1}{2}P_{Z}w_{1},\tfrac{1}{2}P_{Z}w_{1},\tfrac{1}{2}P_{Z}w_{1},w_{1}^{\ast},w_{2}^{\ast}\big{)}. (90)

Proof. The identity 89 follows directly from 88. We now tackle 90. From Theorem 3.1(i), we get

PFixTM(u)=(M+A)1CPFixT~(Cu)=(M+A)1CPFixT~(w)=(M+A)1Cw.P^{M}_{\operatorname{Fix}T}(u)=(M+A)^{-1}CP_{\operatorname{Fix}\widetilde{T}}(C^{*}u)=(M+A)^{-1}CP_{\operatorname{Fix}\widetilde{T}}(w)=(M+A)^{-1}Cw^{*}. (91)

By 89, w=(z+v1,v2)=(z+v1,v1+v3)w^{*}=(z+v_{1},v_{2})=(z+v_{1},-v_{1}+v_{3}), where z:=PZ(w1)z:=P_{Z}(w_{1}) and each viUiv_{i}\in U_{i}^{\perp}. Hence Cw=(z+v1,v2,zv3,z+v1,v2)Cw^{*}=(z+v_{1},v_{2},-z-v_{3},z+v_{1},v_{2}) using 75. In view of 81, we obtain

(M+A)1Cw\displaystyle(M+A)^{-1}Cw^{*} =(M+A)1[z+v1v2zv3z+v1v2]=[[1.3]PU1(12(z+v1))=12zPU2(12v2+12z)=12zPU3(12(z+v3)+z)=12zz+v1v2]\displaystyle=(M+A)^{-1}\begin{bmatrix}z+v_{1}\\ v_{2}\\ -z-v_{3}\\ z+v_{1}\\ v_{2}\end{bmatrix}=\begin{bmatrix}[1.3]P_{U_{1}}\big{(}\frac{1}{2}(z+v_{1})\big{)}=\frac{1}{2}z\\ P_{U_{2}}\big{(}\frac{1}{2}v_{2}+\frac{1}{2}z\big{)}=\frac{1}{2}z\\ P_{U_{3}}\big{(}-\frac{1}{2}(z+v_{3})+z\big{)}=\frac{1}{2}z\\ z+v_{1}\\ v_{2}\end{bmatrix} (92)

and we are done. \hfill\quad\blacksquare

Of course, if (uk)k(u_{k})_{k\in{\mathbb{N}}} is the PPP sequence for Ryu, then the weak limit of (Tuk)k(Tu_{k})_{k\in{\mathbb{N}}} is PFixTM(u0)P^{M}_{\operatorname{Fix}T}(u_{0}) by item (iii) of the weak convergence theorem. This limit is given by 90 with uu replaced by the starting point u0u_{0}. The weak limit of the rPPP sequence (wk)k(w_{k})_{k\in{\mathbb{N}}} is given by 89 (with ww replaced by the starting point w0w_{0}); this formula was observed already in [7, Lemma 4.1].
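
To make the preceding formulas concrete, here is a minimal Python sketch of the reduced Ryu iteration 83 in the normal cone setting of Section 5.3.2; the three subspaces are illustrative random choices that share a common line, so that ZZ is nontrivial. These data are assumptions for illustration only.

```python
import numpy as np

# Hedged sketch of the reduced Ryu operator (83) for A_i = N_{U_i}, so that
# each resolvent J_{A_i} is the orthogonal projection P_{U_i}.  The subspaces
# are illustrative: each contains the common vector c, so Z ⊇ R c.
rng = np.random.default_rng(2)
n = 8
c = rng.standard_normal(n)

def proj(*cols):                    # orthogonal projector onto span(cols)
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q @ Q.T

P = [proj(c, rng.standard_normal(n), rng.standard_normal(n)) for _ in range(3)]

w1, w2 = rng.standard_normal(n), rng.standard_normal(n)
for _ in range(5000):               # w ← T~(w), cf. (83)
    y1 = P[0] @ (0.5 * w1)
    y2 = P[1] @ (0.5 * w2 + y1)
    y3 = P[2] @ (0.5 * (-w1 - w2) + y1 + y2)
    w1, w2 = w1 + y3 - y1, w2 + y3 - y2
# in the limit, y1 = y2 = y3 =: z is a point of Z = U1 ∩ U2 ∩ U3; cf. (84)
print(np.linalg.norm(y1 - y2), np.linalg.norm(y2 - y3))   # both ≈ 0
print(max(np.linalg.norm(Pi @ y1 - y1) for Pi in P))      # ≈ 0, i.e., y1 ∈ Z
```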

5.4 Malitsky-Tam

5.4.1 General setup

For n3n\geq 3, we consider nn maximally monotone operators A1,,AnA_{1},\dots,A_{n} on the Hilbert space XX. The goal is to find a point in

Z:=zer(A1++An)Z:=\operatorname{zer}(A_{1}+\dots+A_{n}) (93)

which we assume to be nonempty. The splitting algorithm by Malitsky and Tam [27] can deal with this problem and also fits the structure described in [9]. Assume that

H=X2n1,D=Xn1,andC=[IdIdIdIdIdIdIdIdId]:DH.H=X^{2n-1},\;\;D=X^{n-1},\;\;\text{and}\;\;C=\left[\begin{array}[]{rrrrr}\operatorname{Id}\\ -\operatorname{Id}&\operatorname{Id}\\ &-\operatorname{Id}&\ddots&\\ &&\ddots&\operatorname{Id}\\ &&&-\operatorname{Id}\\ \hline\cr\\[-8.53581pt] \operatorname{Id}\\ &\operatorname{Id}\\ &&\ddots\\ &&&\operatorname{Id}\end{array}\right]\colon D\to H. (94)

Then

C=[IdIdIdIdIdId]:HD:[x1xnv1vn1][x1x2+v1xn1xn+vn1]C^{\ast}=\left[\begin{array}[]{rrlr|rrr}\operatorname{Id}&-\operatorname{Id}&&&\operatorname{Id}\\ &\ddots&\ddots&&&\ddots\\ &&\operatorname{Id}&-\operatorname{Id}&&&\operatorname{Id}\end{array}\right]\colon H\to D\colon\begin{bmatrix}x_{1}\\ \vdots\\ x_{n}\\ v_{1}\\ \vdots\\ v_{n-1}\end{bmatrix}\mapsto\begin{bmatrix}x_{1}-x_{2}+v_{1}\\ \vdots\\ x_{n-1}-x_{n}+v_{n-1}\end{bmatrix} (95)

is surjective. Using CC and CC^{\ast}, we compute

M=[IdIdIdId2IdIdIdIdIdId2IdIdIdIdIdIdIdIdIdIdIdIdIdIdId].M=\left[\begin{array}[]{rrrrrr|rrrrrrrr}\operatorname{Id}&-\operatorname{Id}&&&&&\operatorname{Id}&&\\ -\operatorname{Id}&2\operatorname{Id}&-\operatorname{Id}&&&&-\operatorname{Id}&\operatorname{Id}&\\ &\rotatebox[origin={lT}]{10.0}{$\ddots$}&\rotatebox[origin={lT}]{10.0}{$\ddots$}&\rotatebox[origin={lT}]{10.0}{$\ddots$}&&&&-\operatorname{Id}&\ddots\\ &&-\operatorname{Id}&2\operatorname{Id}&-\operatorname{Id}&&&&\ddots&\operatorname{Id}\\[5.0pt] &&&-\operatorname{Id}&\operatorname{Id}&&&&&-\operatorname{Id}\\ \hline\cr\operatorname{Id}&-\operatorname{Id}&&&&&\operatorname{Id}\\ &\operatorname{Id}&-\operatorname{Id}&&&&&\phantom{-}\operatorname{Id}\\ &&\rotatebox[origin={lT}]{10.0}{$\ddots$}&\rotatebox[origin={lT}]{10.0}{$\ddots$}&&&&&\ddots\\ &&&\operatorname{Id}&-\operatorname{Id}&&&&&\operatorname{Id}\end{array}\right]. (96)

Finally, we assume that

A=[2A1+IdIdIdId2A2IdIdIdIdId2An1IdId2IdId2An+IdIdIdIdIdId0IdId],A=\left[\begin{array}[]{rrrcl|rrrlr}2A_{1}+\operatorname{Id}&\operatorname{Id}&&&&-\operatorname{Id}&&\\ -\operatorname{Id}&2A_{2}&\ddots&&&\operatorname{Id}&-\operatorname{Id}&\\ &-\operatorname{Id}&\ddots&\operatorname{Id}&&&\operatorname{Id}&\ddots\\ &&\ddots&2A_{n-1}&\operatorname{Id}&&&\ddots&&-\operatorname{Id}\\[5.0pt] -2\operatorname{Id}&&&-\operatorname{Id}&2A_{n}+\operatorname{Id}&&&&&\operatorname{Id}\\ \hline\cr\operatorname{Id}&-\operatorname{Id}&&&&&\\ &\operatorname{Id}&-\operatorname{Id}&&&&&\makebox(-30.0,-20.0)[]{0}\\ &&\rotatebox[origin={lT}]{10.0}{$\ddots$}&\rotatebox[origin={lT}]{10.0}{$\ddots$}&&&&\\ &&&\operatorname{Id}\phantom{-}&-\operatorname{Id}&&&&&\end{array}\right], (97)

which is maximally monotone because it can be written as the sum of the maximally monotone operator (x1,,xn,v1,,vn1)2A1x1××2Anxn×{0}n1(x_{1},\ldots,x_{n},v_{1},\ldots,v_{n-1})\mapsto 2A_{1}x_{1}\times\cdots\times 2A_{n}x_{n}\times\{0\}^{n-1}, a linear monotone operator that is the gradient of a convex function, and a skew linear operator. Next, given u:=(x1,,xn,v1,,vn1)Hu:=(x_{1},\dots,x_{n},v_{1},\dots,v_{n-1})\in H, we have

uzerA\displaystyle u\in\operatorname{zer}A {0(2A1+Id)x1+x2v10xi1+2Aixi+xi+1+vi1vi for all i{2,,n1}02x1xn1+(2An+Id)xn+vn10=xjxj+1 for all j{1,,n1}\displaystyle\Leftrightarrow\left\{\begin{array}[]{rlr}0&\in(2A_{1}+\operatorname{Id})x_{1}+x_{2}-v_{1}\\ 0&\in-x_{i-1}+2A_{i}x_{i}+x_{i+1}+v_{i-1}-v_{i}&\text{ for all }i\in\{2,\dots,n-1\}\\ 0&\in-2x_{1}-x_{n-1}+(2A_{n}+\operatorname{Id})x_{n}+v_{n-1}\\ 0&=x_{j}-x_{j+1}&\text{ for all }j\in\{1,\dots,n-1\}\end{array}\right. (98e)
{z=x1==xnv12(A1+Id)zvi2Aiz+vi1 for all i{2,,n1}vn12(IdAn)z(2An1z+vn2).\displaystyle\Leftrightarrow\begin{cases}z=x_{1}=\dots=x_{n}\\ v_{1}\in 2(A_{1}+\operatorname{Id})z\\ v_{i}\in 2A_{i}z+v_{i-1}&\text{ for all }i\in\{2,\dots,n-1\}\\ v_{n-1}\in 2(\operatorname{Id}-A_{n})z\cap(2A_{n-1}z+v_{n-2}).\end{cases} (98f)

This gives us

zerA=\displaystyle\operatorname{zer}A= {(z,,z,v1,,vn1)H\displaystyle\big{\{}(z,\dots,z,v_{1},\dots,v_{n-1})\in H\mid zzer(A1++An)v12(A1+Id)z\displaystyle z\in\operatorname{zer}(A_{1}+\dots+A_{n})\land v_{1}\in 2(A_{1}+\operatorname{Id})z
(i{2,,n2})vivi12Aiz\displaystyle(\forall i\in\{2,\dots,n-2\})\;v_{i}-v_{i-1}\in 2A_{i}z
vn12(IdAn)z(2An1z+vn2)}.\displaystyle\land v_{n-1}\in 2(\operatorname{Id}-A_{n})z\cap(2A_{n-1}z+v_{n-2})\big{\}}. (99)

Next, we verify that

(M+A)1:X2n1X2n1:[x1xixnv1vn1][y1yiynw1wn1]=[JA1(12x1)JAi(12xi+yi1)JAn(12xn+y1+yn1)v12y1+2y2vn12yn1+2yn](M+A)^{-1}:X^{2n-1}\to X^{2n-1}:\begin{bmatrix}x_{1}\\ \vdots\\ x_{i}\\ \vdots\\ x_{n}\\ v_{1}\\ \vdots\\ v_{n-1}\end{bmatrix}\mapsto\begin{bmatrix}y_{1}\\ \vdots\\ y_{i}\\ \vdots\\ y_{n}\\ w_{1}\\ \vdots\\ w_{n-1}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\frac{1}{2}x_{1}\big{\rparen}\\ \vdots\\ J_{A_{i}}\big{\lparen}\frac{1}{2}x_{i}+y_{i-1}\big{\rparen}\\ \vdots\\ J_{A_{n}}\big{\lparen}\frac{1}{2}x_{n}+y_{1}+y_{n-1}\big{\rparen}\\ v_{1}-2y_{1}+2y_{2}\\ \vdots\\ v_{n-1}-2y_{n-1}+2y_{n}\end{bmatrix} (100)

which is clearly single-valued with full domain, and Lipschitz continuous. Using the fact that T=(M+A)1MT=(M+A)^{-1}M along with 100 and 96, we get

T:HH:[x1xixnv1vn1][y1yiynw1wn1]=[JA1(12(x1x2+v1))JAi(12(xi1+2xixi+1vi1+vi)+yi1)JAn(12(xn1+xnvn1)+y1+yn1)x1x2+v12y1+2y2xn1xn+vn12yn1+2yn]T\colon H\to H\colon\begin{bmatrix}x_{1}\\ \vdots\\ x_{i}\\ \vdots\\ x_{n}\\ v_{1}\\ \vdots\\ v_{n-1}\end{bmatrix}\mapsto\begin{bmatrix}y_{1}\\ \vdots\\ y_{i}\\ \vdots\\ y_{n}\\ w_{1}\\ \vdots\\ w_{n-1}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\frac{1}{2}(x_{1}-x_{2}+v_{1})\big{\rparen}\\ \vdots\\ J_{A_{i}}\big{\lparen}\frac{1}{2}(-x_{i-1}+2x_{i}-x_{i+1}-v_{i-1}+v_{i})+y_{i-1}\big{\rparen}\\ \vdots\\ J_{A_{n}}\big{\lparen}\frac{1}{2}(-x_{n-1}+x_{n}-v_{n-1})+y_{1}+y_{n-1}\big{\rparen}\\ x_{1}-x_{2}+v_{1}-2y_{1}+2y_{2}\\ \vdots\\ x_{n-1}-x_{n}+v_{n-1}-2y_{n-1}+2y_{n}\end{bmatrix} (101)

while

T~(w)\displaystyle\widetilde{T}(w) =[w1wn1]+[z2z1znzn1] where [z1zizn]=[JA1(12w1)JAi(12(wi1+wi)+zi1)JAn(12(wn1)+z1+zn1)].\displaystyle=\begin{bmatrix}w_{1}\\ \vdots\\ w_{n-1}\end{bmatrix}+\begin{bmatrix}z_{2}-z_{1}\\ \vdots\\ z_{n}-z_{n-1}\end{bmatrix}\text{ where }\begin{bmatrix}z_{1}\\ \vdots\\ z_{i}\\ \vdots\\ z_{n}\end{bmatrix}=\begin{bmatrix}J_{A_{1}}\big{\lparen}\frac{1}{2}w_{1}\big{\rparen}\\ \vdots\\ J_{A_{i}}\big{\lparen}\frac{1}{2}(-w_{i-1}+w_{i})+z_{i-1}\big{\rparen}\\ \vdots\\ J_{A_{n}}\big{\lparen}\frac{1}{2}(-w_{n-1})+z_{1}+z_{n-1}\big{\rparen}\end{bmatrix}. (102)

Here, T~\widetilde{T} becomes a scaled version of the Malitsky-Tam operator from [27, Algorithm 1]. Finally, for w=(w1,,wn1)w=(w_{1},\dots,w_{n-1}), we have

FixT~\displaystyle\operatorname{Fix}\widetilde{T} ={wXn1(zX)w12(Id+A1)z(i{2,,n2})wiwi12Aizwn12(IdAn)z(2An1z+wn2)}\displaystyle=\big{\{}w\in X^{n-1}\mid\;\begin{aligned} &(\exists\,z\in X)\;w_{1}\in 2(\operatorname{Id}+A_{1})z\\ &\land(\forall i\in\{2,\dots,n-2\})w_{i}-w_{i-1}\in 2A_{i}z\\ &\land w_{n-1}\in 2(\operatorname{Id}-A_{n})z\cap(2A_{n-1}z+w_{n-2})\big{\}}\end{aligned} (103a)
={wXn1(zzer(A1++An))w12(Id+A1)z(i{2,,n2})wiwi12Aizwn12(IdAn)z(2An1z+wn2)}.\displaystyle=\big{\{}w\in X^{n-1}\mid\begin{aligned} &(\exists\,z\in\operatorname{zer}(A_{1}+\dots+A_{n}))\;w_{1}\in 2(\operatorname{Id}+A_{1})z\\ &\land(\forall i\in\{2,\dots,n-2\})w_{i}-w_{i-1}\in 2A_{i}z\\ &\land w_{n-1}\in 2(\operatorname{Id}-A_{n})z\cap(2A_{n-1}z+w_{n-2})\big{\}}.\end{aligned} (103b)

To obtain in 103b that zzer(A1++An)z\in\operatorname{zer}(A_{1}+\dots+A_{n}), we telescope the conditions on the wiw_{i} in 103a: they yield wn12z+2(A1++An1)zw_{n-1}\in 2z+2(A_{1}+\dots+A_{n-1})z, which combined with wn12(IdAn)zw_{n-1}\in 2(\operatorname{Id}-A_{n})z gives 02(A1++An)z0\in 2(A_{1}+\dots+A_{n})z.
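
Analogously to the sketch for Ryu's method in Section 5.3, the following minimal Python sketch iterates 102 for n=4n=4 with Ai=NUiA_{i}=N_{U_{i}}, anticipating the specialization of the next subsection; the subspaces are again illustrative random choices sharing a common line.

```python
import numpy as np

# Hedged sketch of the reduced Malitsky-Tam operator (102) with n = 4 and
# A_i = N_{U_i}, i.e., J_{A_i} = P_{U_i}.  The subspaces are illustrative
# random choices sharing the common vector c, so Z ⊇ R c.
rng = np.random.default_rng(3)
dim, n = 8, 4
c = rng.standard_normal(dim)

def proj(*cols):                    # orthogonal projector onto span(cols)
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q @ Q.T

P = [proj(c, rng.standard_normal(dim), rng.standard_normal(dim)) for _ in range(n)]

w = [rng.standard_normal(dim) for _ in range(n - 1)]
for _ in range(5000):               # w ← T~(w), cf. (102)
    z = [P[0] @ (0.5 * w[0])]
    for i in range(1, n - 1):
        z.append(P[i] @ (0.5 * (-w[i - 1] + w[i]) + z[i - 1]))
    z.append(P[n - 1] @ (-0.5 * w[n - 2] + z[0] + z[n - 2]))
    w = [w[i] + z[i + 1] - z[i] for i in range(n - 1)]
# in the limit, z_1 = ... = z_n is a common point of the U_i; cf. (103b)
print(max(np.linalg.norm(z[i + 1] - z[i]) for i in range(n - 1)))   # ≈ 0
print(max(np.linalg.norm(Pi @ z[0] - z[0]) for Pi in P))            # ≈ 0
```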

5.4.2 Specialization to normal cones of linear subspaces

We now assume that UiU_{i} is a closed linear subspace of XX and that Ai=NUiA_{i}=N_{U_{i}} for each i{1,,n}i\in\{1,\dots,n\}. Then Z=zer(A1++An)=U1UnZ=\operatorname{zer}(A_{1}+\dots+A_{n})=U_{1}\cap\dots\cap U_{n} and 99 yields

FixT\displaystyle\operatorname{Fix}T ={(x,,x,v1,,vn1)\displaystyle=\big{\{}(x,\dots,x,v_{1},\dots,v_{n-1})\mid\; xZv12xU1\displaystyle x\in Z\land v_{1}-2x\in U_{1}^{\perp}
(i{2,,n2})viUi+vi1\displaystyle\land(\forall i\in\{2,\dots,n-2\})v_{i}\in U_{i}^{\perp}+v_{i-1}
vn1(Un+2x)(Un1+vn2)}\displaystyle\land v_{n-1}\in(U_{n}^{\perp}+2x)\cap(U_{n-1}^{\perp}+v_{n-2})\big{\}} (104)

and

FixT~\displaystyle\operatorname{Fix}\widetilde{T} ={wXn1\displaystyle=\big{\{}w\in X^{n-1}\mid (zZ)w1U1+2z(i{2,,n2})wiUi+wi1\displaystyle\;(\exists\,z\in Z)\;w_{1}\in U_{1}^{\perp}+2z\land(\forall i\in\{2,\dots,n-2\})w_{i}\in U_{i}^{\perp}+w_{i-1}
wn1(Un+2z)(Un1+wn2)}.\displaystyle\land w_{n-1}\in(U_{n}^{\perp}+2z)\cap(U_{n-1}^{\perp}+w_{n-2})\big{\}}. (105)
Lemma 5.7.

Set Δ={(x,,x)Xn1|xX}\Delta=\big{\{}{(x,\dots,x)\in X^{n-1}}~{}\big{|}~{}{x\in X}\big{\}} and define

E\displaystyle E :=ranΨ(Xn2×Un)\displaystyle:={\operatorname{ran}}\,\Psi\cap(X^{n-2}\times U_{n}^{\perp}) (106a)
U1××(U1++Un2)×((U1++Un1)Un)\displaystyle\subseteq U_{1}^{\perp}\times\dots\times(U_{1}^{\perp}+\cdots+U_{n-2}^{\perp})\times((U_{1}^{\perp}+\cdots+U_{n-1}^{\perp})\cap U_{n}^{\perp}) (106b)

where

Ψ:\displaystyle\Psi: =U1××UnXn1:(y1,,yn)(y1,y1+y2,,y1++yn1).\displaystyle=U_{1}^{\perp}\times\dots\times U_{n}^{\perp}\to X^{n-1}\colon(y_{1},\ldots,y_{n})\mapsto(y_{1},y_{1}+y_{2},\dots,y_{1}+\dots+y_{n-1}). (107)

Then

FixT={(x,,xn copies,2x,,2x)X2n1|xZ}({0}n×E),\operatorname{Fix}T=\big{\{}{(\underbrace{x,\dots,x}_{n\text{\;copies}},2x,\dots,2x)\in X^{2n-1}}~{}\big{|}~{}{x\in Z}\big{\}}\oplus(\{0\}^{n}\times E), (108)

and

FixT~=ΔZn1E,\operatorname{Fix}\widetilde{T}=\Delta_{Z^{n-1}}\oplus E, (109)

where ΔZn1:=ΔZn1\Delta_{Z^{n-1}}:=\Delta\cap Z^{n-1}.

Proof. Using 104, we know that a generic point in FixT\operatorname{Fix}T is represented by

[xx2x+y12x+j=1n2yj2x+j=1n1yj]=[xx2x+y12x+j=1n2yj2x+yn],\begin{bmatrix}x\\ \vdots\\ x\\ 2x+y_{1}\\ \vdots\\ 2x+\sum_{j=1}^{n-2}y_{j}\\ 2x+\sum_{j=1}^{n-1}y_{j}\end{bmatrix}=\begin{bmatrix}x\\ \vdots\\ x\\ 2x+y_{1}\\ \vdots\\ 2x+\sum_{j=1}^{n-2}y_{j}\\ 2x+y_{n}\end{bmatrix}, (110)

where yiUiy_{i}\in U_{i}^{\perp} for each i{1,,n}i\in\{1,\dots,n\}. This yields 108. The orthogonality in 108 follows from 106b and the fact that Z=(U1++Un)¯Z^{\perp}=\overline{(U_{1}^{\perp}+\dots+U_{n}^{\perp})}. The representation in 109 is obtained similarly from 105. Finally, the orthogonality in 109 follows because ΔZn1Zn1\Delta_{Z^{n-1}}\subseteq Z^{n-1}, EU1++Un¯n1E\subseteq\overline{U_{1}^{\perp}+\dots+U_{n}^{\perp}}^{\,n-1}, and Zn1U1++Un¯n1Z^{n-1}\perp\overline{U_{1}^{\perp}+\dots+U_{n}^{\perp}}^{\,n-1}. \hfill\quad\blacksquare

Corollary 5.8.

Using the definition of EE from Lemma 5.7, for wXn1w\in X^{n-1},

PFixT~w=(PZ(w¯),,PZ(w¯))+PE(w)P_{\operatorname{Fix}\widetilde{T}}w=\big{(}P_{Z}(\bar{w}),\dots,P_{Z}(\bar{w})\big{)}+P_{E}(w) (111)

where w¯:=1n1i=1n1wi\bar{w}:=\frac{1}{n-1}\sum_{i=1}^{n-1}w_{i}. Now let uH=X2n1u\in H=X^{2n-1}, set w=Cuw=C^{\ast}u and w=PFixT~ww^{\ast}=P_{\operatorname{Fix}\widetilde{T}}w. Then

PFixTM(u)\displaystyle P_{\operatorname{Fix}T}^{M}(u) =(12PZw¯,,12PZw¯,w1,,wn1).\displaystyle=\big{(}\tfrac{1}{2}P_{Z}\bar{w},\ldots,\tfrac{1}{2}P_{Z}\bar{w},w_{1}^{\ast},\ldots,w_{n-1}^{\ast}\big{)}. (112)

Proof. The identity 111 can be obtained from 109 after using the fact that PZn1Δ=PZn1PΔP_{Z^{n-1}\cap\Delta}=P_{Z^{n-1}}P_{\Delta} (see [7, Proof of Lemma 4.3]). Note that the projection onto Δ\Delta replicates the average of the coordinates of its argument the appropriate number of times. For 112, we observe via 111 that wk=z+j=1kvjw^{\ast}_{k}=z+\sum_{j=1}^{k}v_{j} for k{1,,n1}k\in\{1,\dots,n-1\}, where each viUiv_{i}\in U_{i}^{\perp} and z=PZ(w¯)z=P_{Z}(\bar{w}); moreover, vn:=v1++vn1Unv_{n}:=v_{1}+\dots+v_{n-1}\in U_{n}^{\perp} by the definition of EE. Therefore, using 91 again, but this time with 100, we get

PFixTM(u)=(M+A)1[z+v1v2vn1vnzw1wn1]=[PU1(12(z+v1))=12zPUi(12vi+12z)=12zPUn(12(vn+z)+z)=12zw1wn1]P_{\operatorname{Fix}T}^{M}(u)=(M+A)^{-1}\begin{bmatrix}z+v_{1}\\ v_{2}\\ \vdots\\ v_{n-1}\\ -v_{n}-z\\ w_{1}^{\ast}\\ \vdots\\ w_{n-1}^{\ast}\end{bmatrix}=\begin{bmatrix}P_{U_{1}}(\frac{1}{2}(z+v_{1}))=\frac{1}{2}z\\ \vdots\\ P_{U_{i}}(\frac{1}{2}v_{i}+\frac{1}{2}z)=\frac{1}{2}z\\ \vdots\\ P_{U_{n}}(-\frac{1}{2}(v_{n}+z)+z)=\frac{1}{2}z\\ w_{1}^{\ast}\\ \vdots\\ w_{n-1}^{\ast}\end{bmatrix} (113)

and we are done. \hfill\quad\blacksquare

6 More on Chambolle-Pock

In this section, we provide some additional insights about the Chambolle-Pock framework discussed in Section 5.2.

6.1 Factoring MM

In this section, we discuss how to find a factorization of MM into CCCC^{*} using linear algebra techniques. Suppose that X=nX=\mathbb{R}^{n}, Y=mY=\mathbb{R}^{m}, and Lm×nL\in\mathbb{R}^{m\times n}. Recall from 46 that

M=[1σIdXLL1τIdY](n+m)×(n+m).M=\begin{bmatrix}\frac{1}{\sigma}\operatorname{Id}_{X}&-L^{*}\\ -L&\frac{1}{\tau}\operatorname{Id}_{Y}\end{bmatrix}\in\mathbb{R}^{(n+m)\times(n+m)}. (114)

Then M0M\succeq 0 \Leftrightarrow στL21\sigma\tau\|L\|^{2}\leq 1, which we assume henceforth. The following result, which is easily verified directly, provides a factorization of MM into CCCC^{*}:

Lemma 6.1.

Suppose Zm×mZ\in\mathbb{R}^{m\times m} satisfies181818For instance, ZZ could arise from a Cholesky factorization of (or as the square root of) IdYστLL\operatorname{Id}_{Y}-\sigma\tau LL^{*}. A referee pointed out that the result can be generalized to Hilbert spaces. Indeed, the proof of the operator version works exactly the same: use the form given in 116 to compute CCCC^{*}, then simplify the result with 115 to finally obtain MM from 114.

ZZ=IdYστLL.ZZ^{*}=\operatorname{Id}_{Y}-\sigma\tau LL^{*}. (115)

Then

C:=[1σIdX0σL1τZ]C:=\begin{bmatrix}\frac{1}{\sqrt{\sigma}}\operatorname{Id}_{X}&0\\ -\sqrt{\sigma}L&\frac{1}{\sqrt{\tau}}Z\end{bmatrix} (116)

factors MM into CCCC^{*}.
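
A quick numerical sanity check of Lemma 6.1, with illustrative random sizes and data, reads as follows:

```python
import numpy as np

# Numerical sanity check of Lemma 6.1 on random illustrative data.
rng = np.random.default_rng(4)
m, n = 4, 6
L = rng.standard_normal((m, n))
sigma = tau = 0.99 / np.linalg.norm(L, 2)   # so that sigma*tau*||L||^2 < 1

# Z from a Cholesky factorization of Id_Y - sigma*tau*L L^T, cf. (115)
Z = np.linalg.cholesky(np.eye(m) - sigma * tau * (L @ L.T))

C = np.block([[np.eye(n) / np.sqrt(sigma), np.zeros((n, m))],
              [-np.sqrt(sigma) * L, Z / np.sqrt(tau)]])     # cf. (116)
M = np.block([[np.eye(n) / sigma, -L.T],
              [-L, np.eye(m) / tau]])                       # cf. (114)
print(np.allclose(C @ C.T, M))                              # expected: True
```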

Example 6.2.

If m=n=1m=n=1, σ=τ=1\sigma=\tau=1, and L=[λ]L=[\lambda], where |λ|1|\lambda|\leq 1, then

C=[10λ1λ2]C=\begin{bmatrix}1&0\\ -\lambda&\sqrt{1-\lambda^{2}}\end{bmatrix} (117)

factors MM into CCCC^{*}.

Example 6.3.

If στLL=IdY\sigma\tau LL^{*}=\operatorname{Id}_{Y}, then Z=0Z=0 and 116 reduces to

C:=[1σIdXσL](n+m)×n,C:=\begin{bmatrix}\frac{1}{\sqrt{\sigma}}\operatorname{Id}_{X}\\ -\sqrt{\sigma}L\end{bmatrix}\in\mathbb{R}^{(n+m)\times n}, (118)

which factors MM into CCCC^{*}.
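
In this case DD genuinely has smaller dimension than HH. The following snippet illustrates 118 with σ=τ=1\sigma=\tau=1 and an LL with orthonormal rows (an illustrative choice guaranteeing στLL=IdY\sigma\tau LL^{*}=\operatorname{Id}_{Y}):

```python
import numpy as np

# Illustrative check of (118): sigma = tau = 1 and L with orthonormal rows,
# so that sigma*tau*L L^T = Id_Y.
rng = np.random.default_rng(6)
m, n = 2, 5
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
L = Q.T                                   # m x n with orthonormal rows
C = np.vstack([np.eye(n), -L])            # cf. (118), here (n+m) x n
M = np.block([[np.eye(n), -L.T], [-L, np.eye(m)]])
print(np.allclose(C @ C.T, M))            # True, with D = R^n smaller than H
```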

6.2 An example on the real line

Consider the general setup in Section 5.2.1. We now specialize this further to the case when X=Y=X=Y=\mathbb{R} and L=λIdL=\lambda\operatorname{Id}, where |λ|1|\lambda|\leq 1. Clearly, L=|λ|\|L\|=|\lambda|. We also assume that σ=τ=1\sigma=\tau=1. Then στL21\sigma\tau\|L\|^{2}\leq 1 and the preconditioner matrix (see 46) is

Mλ:=[1λλ1].M_{\lambda}:=\begin{bmatrix}1&-\lambda\\ -\lambda&1\end{bmatrix}. (119)

The matrix MλM_{\lambda} acts on H=2H=\mathbb{R}^{2}, and it is indeed positive semidefinite, with eigenvalues 1±λ1\pm\lambda and

Mλ=1+|λ|.\|M_{\lambda}\|=1+|\lambda|. (120)

If |λ|<1|\lambda|<1, then

Mλ1=11λ2[1λλ1]has eigenvalues 11±λ,   and so Mλ1=11|λ|.M_{\lambda}^{-1}=\frac{1}{1-\lambda^{2}}\begin{bmatrix}1&\lambda\\ \lambda&1\end{bmatrix}\;\;\text{has eigenvalues $\frac{1}{1\pm\lambda}$, \;\;and so \;\; $\|M_{\lambda}^{-1}\|=\frac{1}{1-|\lambda|}$}. (121)

To find a factorization Mλ=CλCλM_{\lambda}=C_{\lambda}C_{\lambda}^{*}, we follow the recipe given in [8, Proof of Proposition 2.3], which starts by determining the principal square root of MλM_{\lambda}. Indeed, one directly verifies that

Sλ:=12[1λ+1+λ1λ1+λ1λ1+λ1λ+1+λ]=MλS_{\lambda}:=\frac{1}{2}\begin{bmatrix}\sqrt{1-\lambda}+\sqrt{1+\lambda}&\sqrt{1-\lambda}-\sqrt{1+\lambda}\\[5.69054pt] \sqrt{1-\lambda}-\sqrt{1+\lambda}&\sqrt{1-\lambda}+\sqrt{1+\lambda}\end{bmatrix}=\sqrt{M_{\lambda}} (122)

and that SλS_{\lambda} has eigenvalues 1+λ,1λ\sqrt{1+\lambda},\sqrt{1-\lambda}, and hence Sλ=1+|λ|\|S_{\lambda}\|=\sqrt{1+|\lambda|}. Note that we have the equivalences SλS_{\lambda} is invertible \Leftrightarrow MλM_{\lambda} is invertible \Leftrightarrow |λ|<1|\lambda|<1, in which case

Sλ1=12[11λ+11+λ11λ11+λ11λ11+λ11λ+11+λ],S_{\lambda}^{-1}=\frac{1}{2}\begin{bmatrix}\frac{1}{\sqrt{1-\lambda}}+\frac{1}{\sqrt{1+\lambda}}&\frac{1}{\sqrt{1-\lambda}}-\frac{1}{\sqrt{1+\lambda}}\\[5.69054pt] \frac{1}{\sqrt{1-\lambda}}-\frac{1}{\sqrt{1+\lambda}}&\frac{1}{\sqrt{1-\lambda}}+\frac{1}{\sqrt{1+\lambda}}\end{bmatrix}, (123)

D=ran¯Sλ=2=HD=\overline{\operatorname{ran}}\,S_{\lambda}=\mathbb{R}^{2}=H, Sλ1S_{\lambda}^{-1} has eigenvalues 1/1+λ,1/1λ1/\sqrt{1+\lambda},1/\sqrt{1-\lambda}, and Sλ1=1/1|λ|\|S_{\lambda}^{-1}\|=1/\sqrt{1-|\lambda|}. In addition,

S±1=12[1111],S_{\pm 1}=\frac{1}{\sqrt{2}}\begin{bmatrix}1&\mp 1\\[5.69054pt] \mp 1&1\end{bmatrix}, (124)

and here D=ran¯S±1=[1,1]𝖳D=\overline{\operatorname{ran}}\,S_{\pm 1}=\mathbb{R}[1,\mp 1]{{}^{\mkern-1.5mu\mathsf{T}}} is a proper subspace of HH.
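
A quick numerical confirmation of 122, at the illustrative value λ=0.7\lambda=0.7, reads:

```python
import numpy as np

# Check that S_lam from (122) is the principal square root of M_lam from
# (119); lam = 0.7 is an arbitrary illustrative value.
lam = 0.7
M = np.array([[1.0, -lam], [-lam, 1.0]])
a, b = np.sqrt(1 - lam), np.sqrt(1 + lam)
S = 0.5 * np.array([[a + b, a - b], [a - b, a + b]])
print(np.allclose(S @ S, M))                 # True
print(np.linalg.eigvalsh(S))                 # sqrt(1 - lam), sqrt(1 + lam)
```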

Let us sum up the discussion as follows.

Lemma 6.4.

Using the square root approach, we may factor MλM_{\lambda} into Mλ=CλCλM_{\lambda}=C_{\lambda}C_{\lambda}^{*} with

Cλ:=12[1λ+1+λ1λ1+λ1λ1+λ1λ+1+λ],C_{\lambda}:=\frac{1}{2}\begin{bmatrix}\sqrt{1-\lambda}+\sqrt{1+\lambda}&\sqrt{1-\lambda}-\sqrt{1+\lambda}\\[5.69054pt] \sqrt{1-\lambda}-\sqrt{1+\lambda}&\sqrt{1-\lambda}+\sqrt{1+\lambda}\end{bmatrix}, (125)

where Cλ=CλC_{\lambda}=C_{\lambda}^{*}. If |λ|=1|\lambda|=1, then we may also factor M±1M_{\pm 1} into C±1C±1C_{\pm 1}C_{\pm 1}^{*}, where

C±1=[11].C_{\pm 1}=\begin{bmatrix}1\\ \mp 1\end{bmatrix}. (126)

In fact, the verification of Lemma 6.4 is algebraic in nature, and one may also directly verify the following operator variants of 122 and 123:

Lemma 6.5.

Suppose that Y=XY=X and let L:XXL\colon X\to X be symmetric and positive semidefinite with L1\|L\|\leq 1. Then the principal square root of the preconditioning operator

M=[IdLLId]M=\begin{bmatrix}\operatorname{Id}&-L\\ -L&\operatorname{Id}\end{bmatrix} (127)

is

M=12[IdL+Id+LIdLId+LIdLId+LIdL+Id+L].\sqrt{M}=\frac{1}{2}\begin{bmatrix}\sqrt{\operatorname{Id}-L}+\sqrt{\operatorname{Id}+L}&\sqrt{\operatorname{Id}-L}-\sqrt{\operatorname{Id}+L}\\[5.69054pt] \sqrt{\operatorname{Id}-L}-\sqrt{\operatorname{Id}+L}&\sqrt{\operatorname{Id}-L}+\sqrt{\operatorname{Id}+L}\end{bmatrix}. (128)

If L<1\|L\|<1, then Id±L\operatorname{Id}\pm L are invertible and positive definite; moreover,

M1=12[(IdL)1+(Id+L)1(IdL)1(Id+L)1(IdL)1(Id+L)1(IdL)1+(Id+L)1].\sqrt{M}^{-1}=\frac{1}{2}\begin{bmatrix}\sqrt{(\operatorname{Id}-L)^{-1}}+\sqrt{(\operatorname{Id}+L)^{-1}}&\sqrt{(\operatorname{Id}-L)^{-1}}-\sqrt{(\operatorname{Id}+L)^{-1}}\\[5.69054pt] \sqrt{(\operatorname{Id}-L)^{-1}}-\sqrt{(\operatorname{Id}+L)^{-1}}&\sqrt{(\operatorname{Id}-L)^{-1}}+\sqrt{(\operatorname{Id}+L)^{-1}}\end{bmatrix}. (129)

The following generalization of Lemma 6.5 was prompted by comments of a reviewer.

Lemma 6.6.

Suppose that X=nX=\mathbb{R}^{n}, Y=mY=\mathbb{R}^{m}, and L:XYL\colon X\to Y, i.e., Lm×nL\in\mathbb{R}^{m\times n}, satisfies L1\|L\|\leq 1. Set S:=LLS:=\sqrt{L^{*}L}, U:=LSU:=LS^{\dagger}, and T:=LLT:=\sqrt{LL^{*}}, which give rise to the canonical polar decompositions of LL and LL^{*}:

L=USandL=UT.L=US\quad\text{and}\quad L^{*}=U^{*}T. (130)

Set A:=arcsin(S)A:=\arcsin(S) and B:=arcsin(T)B:=\arcsin(T). Then the principal square root of

M:=[IdXLLIdY]M:=\begin{bmatrix}\operatorname{Id}_{X}&-L^{*}\\ -L&\operatorname{Id}_{Y}\end{bmatrix} (131)

is

M\displaystyle\sqrt{M} =[cos(A/2)Usin(B/2)Usin(A/2)cos(B/2)]\displaystyle=\begin{bmatrix}\cos(A/2)&-U^{*}\sin(B/2)\\[5.69054pt] -U\sin(A/2)&\cos(B/2)\end{bmatrix} (132a)
=12[IdXLL+IdX+LLU(IdYLLIdY+LL)U(IdXLLIdX+LL)IdYLL+IdY+LL].\displaystyle=\frac{1}{2}\begin{bmatrix}\sqrt{\operatorname{Id}_{X}-\sqrt{L^{*}L}}+\sqrt{\operatorname{Id}_{X}+\sqrt{L^{*}L}}&\;\;U^{*}\bigg{(}\sqrt{\operatorname{Id}_{Y}-\sqrt{LL^{*}}}-\sqrt{\operatorname{Id}_{Y}+\sqrt{LL^{*}}}\bigg{)}\\[11.38109pt] U\bigg{(}\sqrt{\operatorname{Id}_{X}-\sqrt{L^{*}L}}-\sqrt{\operatorname{Id}_{X}+\sqrt{L^{*}L}}\bigg{)}&\;\;\sqrt{\operatorname{Id}_{Y}-\sqrt{LL^{*}}}+\sqrt{\operatorname{Id}_{Y}+\sqrt{LL^{*}}}\end{bmatrix}. (132b)

Proof. Because this result is not needed elsewhere, we only sketch the proof, which relies on some more advanced matrix analysis (see [23] and [24, 25] for background material). (In fact, the reviewer suggested this will work for general operators using results from [13].) Note that SS and TT are symmetric and positive semidefinite. We have PranSS=S=SSS=SPranS=SPranSP_{{\operatorname{ran}}\,S}S=S=SS^{\dagger}S=SP_{{\operatorname{ran}}\,S^{*}}=SP_{{\operatorname{ran}}\,S}. For the statement on the canonical polar decomposition, see [23, Theorem 8.3 and remarks on page 195] which also has

UU=SS=PranS.\displaystyle U^{*}U=SS^{\dagger}=P_{{\operatorname{ran}}\,S}. (133)

This implies LLU=(US)(US)U=USSUU=US2PranS=US2=ULLLL^{*}U=(US)(US)^{*}U=USSU^{*}U=US^{2}P_{{\operatorname{ran}}\,S}=US^{2}=UL^{*}L and so

T2U=(LL)U=U(LL)=US2T^{2}U=(LL^{*})U=U(L^{*}L)=US^{2} (134)

and similarly

S2U=(LL)U=U(LL)=UT2.S^{2}U^{*}=(L^{*}L)U^{*}=U^{*}(LL^{*})=U^{*}T^{2}. (135)

The last identity extends to monomials in the form (LL)kU=U(LL)k(L^{*}L)^{k}U^{*}=U^{*}(LL^{*})^{k} and further to polynomials. For ff suitably defined on the spectra of LLLL^{*} and LLL^{*}L, we have, using [23, Theorem 1.33],

f(LL)U=Uf(LL)f(L^{*}L)U^{*}=U^{*}f(LL^{*}) (136)

and similarly f(LL)U=Uf(LL)f(LL^{*})U=Uf(L^{*}L).

The spectra of SS and TT lie in [0,1][0,1], which makes AA and BB well defined, with spectra in [0,π/2][0,\pi/2]. It follows that the spectra of A/2A/2 and B/2B/2 lie in [0,π/4][0,\pi/4], which yields spectra of cos(A/2)\cos(A/2) and cos(B/2)\cos(B/2) in [1/2,1][1/\sqrt{2},1]. Hence cos(A/2)\cos(A/2) and cos(B/2)\cos(B/2) are invertible.

Now set

W:=[cos(A/2)Usin(B/2)Usin(A/2)cos(B/2)].W:=\begin{bmatrix}\cos(A/2)&-U^{*}\sin(B/2)\\[5.69054pt] -U\sin(A/2)&\cos(B/2)\end{bmatrix}. (137)

The matrices AA and BB are symmetric, and so is WW because (Usin(A/2))=sin(A/2)U=Usin(B/2)(U\sin(A/2))^{*}=\sin(A/2)U^{*}=U^{*}\sin(B/2) follows from 136 with f(t)=sin(12arcsin(t))f(t)=\sin(\tfrac{1}{2}\arcsin(\sqrt{t})). Moreover, since cos(A/2)0\cos(A/2)\succ 0, sin(A/2)=12sin(A)(cos(A/2))1\sin(A/2)=\frac{1}{2}\sin(A)(\cos(A/2))^{-1} and PranS=UUP_{{\operatorname{ran}}\,S}=U^{*}U, we have ransin(A/2)ransin(A)=ran(S)=FixPranS=FixUU{\operatorname{ran}}\,\sin(A/2)\subseteq{\operatorname{ran}}\,\sin(A)={\operatorname{ran}}\,(S)=\operatorname{Fix}P_{{\operatorname{ran}}\,S}=\operatorname{Fix}U^{*}U. Because cos(B/2)0\cos(B/2)\succ 0, we obtain altogether

cos(A/2)Usin(B/2)(cos(B/2))1Usin(A/2)\displaystyle\cos(A/2)-U^{*}\sin(B/2)\big{(}\cos(B/2)\big{)}^{-1}U\sin(A/2) (138a)
=cos(A/2)U(cos(B/2))1sin(B/2)Usin(A/2)\displaystyle=\cos(A/2)-U^{*}\big{(}\cos(B/2)\big{)}^{-1}\sin(B/2)U\sin(A/2) (138b)
=cos(A/2)(cos(A/2))1UUsin(A/2)sin(A/2)\displaystyle=\cos(A/2)-\big{(}\cos(A/2)\big{)}^{-1}U^{*}U\sin(A/2)\sin(A/2) (138c)
=cos(A/2)(cos(A/2))1sin2(A/2)\displaystyle=\cos(A/2)-\big{(}\cos(A/2)\big{)}^{-1}\sin^{2}(A/2) (138d)
=(cos(A/2))1(cos2(A/2)sin2(A/2))\displaystyle=\big{(}\cos(A/2)\big{)}^{-1}\Big{(}\cos^{2}(A/2)-\sin^{2}(A/2)\Big{)} (138e)
=(cos(A/2))1cos(A)\displaystyle=\big{(}\cos(A/2)\big{)}^{-1}\cos(A) (138f)
0,\displaystyle\succeq 0, (138g)

and a Schur complement argument shows that WW is positive semidefinite. We now show that W2=MW^{2}=M. We start with (W2)1,1(W^{2})_{1,1}, the upper left block of W2W^{2}:

(W2)1,1\displaystyle(W^{2})_{1,1} =cos2(A/2)+Usin(B/2)Usin(A/2)\displaystyle=\cos^{2}(A/2)+U^{*}\sin(B/2)U\sin(A/2) (139a)
=cos2(A/2)+sin(A/2)UUsin(A/2)\displaystyle=\cos^{2}(A/2)+\sin(A/2)U^{*}U\sin(A/2) (139b)
=cos2(A/2)+sin(A/2)sin(A/2)\displaystyle=\cos^{2}(A/2)+\sin(A/2)\sin(A/2) (139c)
=IdX.\displaystyle=\operatorname{Id}_{X}. (139d)

Similarly, (W2)2,2=IdY(W^{2})_{2,2}=\operatorname{Id}_{Y}. Next,

(W2)2,1\displaystyle(W^{2})_{2,1} =Usin(A/2)cos(A/2)+cos(B/2)(Usin(A/2))\displaystyle=-U\sin(A/2)\cos(A/2)+\cos(B/2)\big{(}-U\sin(A/2)\big{)} (140a)
=Usin(A/2)cos(A/2)Ucos(A/2)sin(A/2)\displaystyle=-U\sin(A/2)\cos(A/2)-U\cos(A/2)\sin(A/2) (140b)
=Usin(A)\displaystyle=-U\sin(A) (140c)
=L.\displaystyle=-L. (140d)

Similarly, (W2)1,2=L(W^{2})_{1,2}=-L^{*}. We have shown that W2=MW^{2}=M, and so 132a is verified. Finally, for θ[1,1]\theta\in[-1,1] we have

cos(12arcsin(θ))\displaystyle\cos\big{(}\tfrac{1}{2}\arcsin(\theta)\big{)} =1+θ+1θ2,\displaystyle=\frac{\sqrt{1+\theta}+\sqrt{1-\theta}}{2}, (141a)
sin(12arcsin(θ))\displaystyle\sin\big{(}\tfrac{1}{2}\arcsin(\theta)\big{)} =1+θ1θ2,\displaystyle=\frac{\sqrt{1+\theta}-\sqrt{1-\theta}}{2}, (141b)

and this gives 132b. \hfill\quad\blacksquare
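
The identity 132b is also easy to test numerically. The following sketch does so for an illustrative random contraction LL; the helper psd_sqrt (our own, based on an eigendecomposition) computes principal square roots of symmetric positive semidefinite matrices.

```python
import numpy as np

# Numerical check of (132b) for an illustrative random contraction L.
def psd_sqrt(X):            # principal square root of a symmetric PSD matrix
    vals, vecs = np.linalg.eigh(X)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

rng = np.random.default_rng(5)
m, n = 3, 4
L = rng.standard_normal((m, n))
L /= 1.1 * np.linalg.norm(L, 2)      # enforce ||L|| < 1
S = psd_sqrt(L.T @ L)                # S = sqrt(L* L)
T = psd_sqrt(L @ L.T)                # T = sqrt(L L*)
U = L @ np.linalg.pinv(S)            # U = L S^†, the polar factor
I_n, I_m = np.eye(n), np.eye(m)
W = 0.5 * np.block(
    [[psd_sqrt(I_n - S) + psd_sqrt(I_n + S),
      U.T @ (psd_sqrt(I_m - T) - psd_sqrt(I_m + T))],
     [U @ (psd_sqrt(I_n - S) - psd_sqrt(I_n + S)),
      psd_sqrt(I_m - T) + psd_sqrt(I_m + T)]])
M = np.block([[I_n, -L.T], [-L, I_m]])
print(np.allclose(W @ W, M))         # True: W is the principal square root of M
```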

Corollary 6.7.

Suppose that X=nX=\mathbb{R}^{n}, Y=mY=\mathbb{R}^{m}, and Lm×nL\in\mathbb{R}^{m\times n} satisfies σL1\sigma\|L\|\leq 1, where σ>0\sigma>0. Set U:=L(LL)U:=L(\sqrt{L^{*}L})^{\dagger}. Then the principal square root of

M:=[1σIdXLL1σIdY]M:=\begin{bmatrix}\tfrac{1}{\sigma}\operatorname{Id}_{X}&-L^{*}\\ -L&\tfrac{1}{\sigma}\operatorname{Id}_{Y}\end{bmatrix} (142)

is

12[1σIdXLL+1σIdX+LLU(1σIdYLL1σIdY+LL)U(1σIdXLL1σIdX+LL)1σIdYLL+1σIdY+LL].\frac{1}{2}\begin{bmatrix}\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{X}-\sqrt{L^{*}L}}+\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{X}+\sqrt{L^{*}L}}&\;\;U^{*}\bigg{(}\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{Y}-\sqrt{LL^{*}}}-\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{Y}+\sqrt{LL^{*}}}\bigg{)}\\[11.38109pt] U\bigg{(}\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{X}-\sqrt{L^{*}L}}-\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{X}+\sqrt{L^{*}L}}\bigg{)}&\;\;\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{Y}-\sqrt{LL^{*}}}+\sqrt{\tfrac{1}{\sigma}\operatorname{Id}_{Y}+\sqrt{LL^{*}}}\end{bmatrix}. (143)

Proof. Set S:=LLS:=\sqrt{L^{*}L}, T:=LLT:=\sqrt{LL^{*}}, and L1:=σLL_{1}:=\sigma L. Then L11\|L_{1}\|\leq 1. Next, set S1:=L1L1=σLL=σS{S}_{1}:=\sqrt{{L_{1}^{*}}{L_{1}}}=\sigma\sqrt{L^{*}L}=\sigma S, U1:=L1(S1)=LS=UU_{1}:={L_{1}}({S_{1}})^{\dagger}=LS^{\dagger}=U, and T1:=L1L1=σLL=σTT_{1}:=\sqrt{{L_{1}}{L_{1}^{*}}}=\sigma\sqrt{LL^{*}}=\sigma T. By Lemma 6.6, the canonical polar decomposition of L1L_{1} is L1=U1S1=U(σS)L_{1}=U_{1}S_{1}=U(\sigma S), and the principal square root of

M1:=[IdXL1L1IdY]=σMM_{1}:=\begin{bmatrix}\operatorname{Id}_{X}&-L_{1}^{*}\\ -L_{1}&\operatorname{Id}_{Y}\end{bmatrix}={\sigma}M (144)

is

M1=12[IdXS1+IdX+S1U(IdYT1IdY+T1)U(IdXS1IdX+S1)IdYT1+IdY+T1].\sqrt{M_{1}}=\frac{1}{2}\begin{bmatrix}\sqrt{\operatorname{Id}_{X}-S_{1}}+\sqrt{\operatorname{Id}_{X}+S_{1}}&\;\;U^{*}\big{(}\sqrt{\operatorname{Id}_{Y}-T_{1}}-\sqrt{\operatorname{Id}_{Y}+T_{1}}\big{)}\\[11.38109pt] U\big{(}\sqrt{\operatorname{Id}_{X}-S_{1}}-\sqrt{\operatorname{Id}_{X}+S_{1}}\big{)}&\;\;\sqrt{\operatorname{Id}_{Y}-T_{1}}+\sqrt{\operatorname{Id}_{Y}+T_{1}}\end{bmatrix}. (145)

It follows that 1σM1=M\tfrac{1}{\sqrt{\sigma}}\sqrt{M_{1}}=\sqrt{M}, i.e.,

M=12σ[IdXσS+IdX+σSU(IdYσTIdY+σT)U(IdXσSIdX+σS)IdYσT+IdY+σT],\sqrt{M}=\frac{1}{2\sqrt{\sigma}}\begin{bmatrix}\sqrt{\operatorname{Id}_{X}-\sigma S}+\sqrt{\operatorname{Id}_{X}+\sigma S}&\;\;U^{*}\big{(}\sqrt{\operatorname{Id}_{Y}-\sigma T}-\sqrt{\operatorname{Id}_{Y}+\sigma T}\big{)}\\[11.38109pt] U\big{(}\sqrt{\operatorname{Id}_{X}-\sigma S}-\sqrt{\operatorname{Id}_{X}+\sigma S}\big{)}&\;\;\sqrt{\operatorname{Id}_{Y}-\sigma T}+\sqrt{\operatorname{Id}_{Y}+\sigma T}\end{bmatrix}, (146)

which yields the conclusion. \hfill\quad\blacksquare

6.2.1 An example where ranM{\operatorname{ran}}\,M is not closed

Bredies et al. hinted in [8, Remark 3.1] at the existence of an operator MM that may fail to have closed range. We now provide a setting where MM, DD, and CC are explicit and where indeed DD is not smaller than HH. To this end, we assume that X=Y=2X=Y=\ell^{2} and

L:22:𝐱=(xn)n(λnxn)n,where (λn)n is a sequence in ]0,1[.L\colon\ell^{2}\to\ell^{2}\colon\mathbf{x}=(x_{n})_{n\in{\mathbb{N}}}\mapsto(\lambda_{n}x_{n})_{n\in{\mathbb{N}}},\quad\text{where $(\lambda_{n})_{n\in{\mathbb{N}}}$ is a sequence in $\left]0,1\right[$.} (147)

It is clear that LL is a one-to-one continuous linear operator with L=supnλn1\|L\|=\sup_{n\in{\mathbb{N}}}\lambda_{n}\leq 1, and ran¯L=Y\overline{\operatorname{ran}}\,L=Y. Moreover, LL is onto \Leftrightarrow infnλn>0\inf_{n\in{\mathbb{N}}}\lambda_{n}>0. We think of LL as a Cartesian product of operators :ξλnξ\mathbb{R}\to\mathbb{R}\colon\xi\mapsto\lambda_{n}\xi. Now assume that σ=τ=1\sigma=\tau=1. In view of our work in Section 6.2, we can identify the associated preconditioner MM with

M:(x0,x1,)Mλ0x0×Mλ1x1×,M\colon(x_{0},x_{1},\ldots)\mapsto M_{\lambda_{0}}x_{0}\times M_{\lambda_{1}}x_{1}\times\cdots, (148)

where Mλ=CλCλM_{\lambda}=C_{\lambda}C_{\lambda}^{*}, using 119 and Lemma 6.4. Set C:(x0,x1,)Cλ0x0×Cλ1x1×C\colon(x_{0},x_{1},\ldots)\mapsto C_{\lambda_{0}}x_{0}\times C_{\lambda_{1}}x_{1}\times\cdots. Then H=D=ran¯CH=D=\overline{\operatorname{ran}}\,C^{*}. Note that MM is a continuous linear operator, with

M=supnMλn=1+supnλn.\|M\|=\sup_{n\in{\mathbb{N}}}\|M_{\lambda_{n}}\|=1+\sup_{n\in{\mathbb{N}}}\lambda_{n}. (149)

using 120. It is clear that MM is injective and has dense range because each MλnM_{\lambda_{n}} is bijective (recall that each λn<1\lambda_{n}<1). However, the smallest eigenvalue of MλnM_{\lambda_{n}} is 1λn1-\lambda_{n}; thus,

M is bijectivesupnλn<1,\text{$M$ is bijective}\;\;\Leftrightarrow\;\;\sup_{n\in{\mathbb{N}}}\lambda_{n}<1, (150)

in which case M1=(1supnλn)1\|M^{-1}\|=(1-\sup_{n\in{\mathbb{N}}}\lambda_{n})^{-1} by 121. As ranM{\operatorname{ran}}\,M is dense, we have ran¯M=H\overline{\operatorname{ran}}\,M=H. It follows from 150 that ranM=Hsupnλn<1{\operatorname{ran}}\,M=H\Leftrightarrow\sup_{{n\in{\mathbb{N}}}}\lambda_{n}<1.
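
A small finite-dimensional illustration, with the assumed sample sequence λn=n/(n+1)\lambda_{n}=n/(n+1) (for which supnλn=1\sup_{n\in{\mathbb{N}}}\lambda_{n}=1), shows how bounded invertibility, and hence closedness of ranM{\operatorname{ran}}\,M, fails:

```python
import numpy as np

# Illustration for Section 6.2.1 with the assumed sequence λ_n = n/(n+1),
# so that sup_n λ_n = 1: the smallest eigenvalue 1 - λ_n of the block
# M_{λ_n} from (119) tends to 0, so M is injective with dense but
# non-closed range.
for n in (1, 10, 100, 1000):
    lam = n / (n + 1)
    M_n = np.array([[1.0, -lam], [-lam, 1.0]])
    print(n, np.linalg.eigvalsh(M_n).min())   # equals 1 - λ_n → 0
```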

It is time for a summary.

Proposition 6.8.

Suppose that supnλn=1\sup_{n\in{\mathbb{N}}}\lambda_{n}=1. Then MM is not surjective and ranM{\operatorname{ran}}\,M is dense, but not closed. If C:DHC\colon D\to H is any factorization of MM in the sense that M=CCM=CC^{*}, then ranC{\operatorname{ran}}\,C^{*} is not closed either.

Proof. We already observed the statement on MM. If ranC{\operatorname{ran}}\,C^{*} were closed, then [15, Lemma 8.40] would imply that ran(C)=ran(CC)=ran(M){\operatorname{ran}}\,(C)={\operatorname{ran}}\,(CC^{*})={\operatorname{ran}}\,(M) is closed, which is absurd. \hfill\quad\blacksquare

Remark 6.9.

Suppose that supnλn=1\sup_{n\in{\mathbb{N}}}\lambda_{n}=1. Then [8, Proof of Proposition 2.3] suggests finding CC by first computing the square root SS of MM. Indeed, using 122, we see that SS is given by

S:(x0,x1,)Sλ0x0×Sλ1x1×,S\colon(x_{0},x_{1},\ldots)\mapsto S_{\lambda_{0}}x_{0}\times S_{\lambda_{1}}x_{1}\times\cdots, (151)

restricted to 2\ell^{2}. Once again, we have that ranS{\operatorname{ran}}\,S is dense but not closed. Hence C=SC=S and D=HD=H in this case.

Acknowledgments

We thank the Editor-in-Chief, Dr. Jong-Shi Pang, for his guidance and support. We also would like to thank two anonymous reviewers for their unusually careful reading and constructive comments which helped us to improve this manuscript significantly. The research of the authors was partially supported by Discovery Grants of the Natural Sciences and Engineering Research Council of Canada.

References