
Convergence Results of Two-Step Inertial Proximal Point Algorithm

Olaniyi S. Iyiola (Department of Mathematics, Clarkson University, Potsdam, NY, USA; e-mail: niyi4oau@gmail.com; oiyiola@clarkson.edu), Yekini Shehu (College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, People's Republic of China; e-mail: yekini.shehu@zjnu.edu.cn)
Abstract

This paper proposes a two-step inertial proximal point algorithm to find zeros of maximal monotone operators in Hilbert spaces. We obtain weak convergence results and a non-asymptotic O(1/n) convergence rate of our proposed algorithm in the non-ergodic sense. Applications of our results to well-known convex optimization methods, such as the proximal method of multipliers and the alternating direction method of multipliers, are given. Numerical results demonstrate the accelerating behavior of our method over related methods in the literature.

Keywords: Proximal point algorithm; Two-step inertia; Maximal monotone operators; Weak and non-asymptotic convergence; Hilbert spaces.

2010 MSC classification: 90C25, 90C30, 90C60, 68Q25, 49M25, 90C22

1 Introduction

Suppose H is a real Hilbert space with inner product \langle\cdot,\cdot\rangle and induced norm \|\cdot\|. Given a maximal monotone set-valued operator A:H\to 2^{H}, we consider the following inclusion problem:

find x\in H such that \textbf{0}\in A(x). (1)

Throughout this paper, we shall denote by A^{-1}(\textbf{0}) the set of solutions to (1). It is well known that the inclusion problem (1) serves as a unifying model for many problems of fundamental importance, including fixed point problems, variational inequality problems, minimization of closed proper convex functions, and their variants and extensions. Therefore, its efficient solution is of practical interest in many situations.

Related works. The proximal point algorithm (PPA), which was first studied by Martinet and further developed by Rockafellar and others (see, for example, [23, 41, 43, 44, 50]), has been used for many years to study the inclusion problem (1). Starting from an arbitrary point x_{0}\in H, the PPA iteratively generates its sequence \{x_{n}\} by

x_{n+1}=J_{\lambda}^{A}(x_{n}), (2)

(where the operator J_{\lambda}^{A}:=(I+\lambda A)^{-1} is the so-called resolvent operator, introduced by Moreau in [43]), which is equivalent to

\textbf{0}\in\lambda A(x_{n+1})+x_{n+1}-x_{n}, (3)

where \lambda>0 is called the proximal parameter. The PPA (2) is a very powerful algorithmic tool and contains many well-known algorithms as special cases, including the classical augmented Lagrangian method [33, 49], the Douglas-Rachford splitting method [27, 37] and the alternating direction method of multipliers [30, 31]. Interesting results on the weak convergence and rate of convergence of the PPA have been obtained in [28, 32, 42]. The equivalent representation (3) of the PPA can be written as

\textbf{0}\in\frac{x_{n+1}-x_{n}}{\lambda}+A(x_{n+1}). (4)

This can be viewed as an implicit discretization of the evolution differential inclusion problem

\textbf{0}\in\frac{dx}{dt}+A(x(t)). (5)

It has been shown that the solution trajectory of (5) converges to a solution of (1) provided that A satisfies certain conditions (see, for example, [13]).

To speed up convergence of PPA  (2), the following second order evolution differential inclusion problem was introduced in the literature:

0d2xdt2+βdxdt+A(x(t)),\textbf{0}\in\frac{d^{2}x}{dt^{2}}+\beta\frac{dx}{dt}+A(x(t)), (6)

where \beta>0 is a friction parameter. If A=\nabla f, where f:\mathbb{R}^{2}\to\mathbb{R} is a differentiable convex function with attainable minimum, the system (6) roughly characterizes the motion of a heavy ball which rolls under its own inertia over the graph of f until friction stops it at a stationary point of f. In this case, the three terms in (6) denote, respectively, the inertial force, the friction force and the gravity force. Consequently, the system (6) is usually referred to as the heavy-ball with friction (HBF) system (see [45]). In theory, the convergence of the solution trajectories of the HBF system to a solution of (1) can be faster than that of the first-order system (5), while in practice the second order inertial term \frac{d^{2}x}{dt^{2}} can be exploited to design faster algorithms (see, e.g., [1, 5]). Owing to these properties of (6), an implicit discretization method was proposed in [2, 3] as follows: given x_{n-1} and x_{n}, the next point x_{n+1} is determined via

0xn+12xn+xn1h2+βxn+1xnh+A(xn+1),\textbf{0}\in\frac{x_{n+1}-2x_{n}+x_{n-1}}{h^{2}}+\beta\frac{x_{n+1}-x_{n}}{h}+A(x_{n+1}), (7)

which results in an iterative algorithm of the form

xn+1=JλA(xn+θ(xnxn1)),x_{n+1}=J_{\lambda}^{A}(x_{n}+\theta(x_{n}-x_{n-1})), (8)

where \lambda=\frac{h^{2}}{1+\beta h} and \theta=\frac{1}{1+\beta h}. In fact, if we replace x_{n+1}-2x_{n}+x_{n-1} in (7) with x_{n+1}-x_{n}+\rho(x_{n-1}-x_{n}), \rho\in[0,1], we obtain a generalization of (8) with \lambda=\frac{h^{2}}{1+\beta h} and \theta=\frac{\rho}{1+\beta h}, \rho\in[0,1]. Note also that (8) is the proximal point step applied to the extrapolated point x_{n}+\theta(x_{n}-x_{n-1}) rather than to x_{n} as in the classical PPA (2). We call the iterative method in (8) the one-step inertial PPA. Convergence properties of (8) have been studied in [2, 4, 3, 38, 40, 44] under some assumptions on the parameters \theta and \lambda. The inertial PPA (8) has been adapted to the study of the inertial Douglas-Rachford splitting method [7, 17, 14, 26, 27, 37] and the inertial alternating direction method of multipliers (ADMM) [15, 24, 26, 31, 32], and its performance has been demonstrated numerically on some imaging and data analysis problems. In all the references mentioned above, the inertial PPA (8) (that is, the PPA (2) with one-step inertial extrapolation) has been studied. We take a different approach in this paper and consider the PPA (2) with two-step inertial extrapolation. Our result is motivated by the results given in [35], where an accelerated proximal point algorithm (which involves both a one-step inertial term and a correction term) for maximal monotone operators is studied. In contrast to the method of Kim [35], we replace the iterate y_{n-1} in the correction term of [35] with x_{n-1} to obtain two-step inertial extrapolation and investigate the resulting convergence properties.
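
For concreteness, the following minimal Python sketch contrasts the classical PPA (2) with the one-step inertial PPA (8). The resolvent is supplied as a callable; the particular operator A(x)=x (whose resolvent is x/(1+\lambda)) and the parameter values are illustrative assumptions only, not taken from the references above.

import numpy as np

def resolvent_identity(x, lam):
    # Resolvent of A(x) = x:  J_lambda^A(x) = (I + lam*A)^{-1}(x) = x / (1 + lam)
    return x / (1.0 + lam)

def ppa(resolvent, x0, lam=1.0, iters=50):
    # Classical PPA (2): x_{n+1} = J_lambda^A(x_n)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = resolvent(x, lam)
    return x

def one_step_inertial_ppa(resolvent, x0, lam=1.0, theta=0.3, iters=50):
    # One-step inertial PPA (8): x_{n+1} = J_lambda^A(x_n + theta*(x_n - x_{n-1}))
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(iters):
        y = x + theta * (x - x_prev)          # inertial extrapolation
        x_prev, x = x, resolvent(y, lam)
    return x

x0 = np.array([5.0, -3.0])
print(ppa(resolvent_identity, x0))                   # approaches 0, the zero of A
print(one_step_inertial_ppa(resolvent_identity, x0))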

From another point of view, our proposed two-step inertial PPA can be regarded as a general parametrized proximal point algorithm. Recent and interesting results on parametrized proximal point algorithm can be found in [9, 39], where parametrized proximal point algorithm is developed for solving a class of separable convex programming problems subject to linear and convex constraints. In these papers [9, 39], it was shown numerically that parametrized proximal point algorithm could perform significantly better for solving sparse optimization problems than ADMM and relaxed proximal point algorithm.

Advantages of two-step proximal point algorithms. In [47, 48], Poon and Liang discussed some limitations of inertial Douglas-Rachford splitting method and inertial ADMM. For example, consider the following feasibility problem in 2\mathbb{R}^{2}.

Example 1.1.

Let T1,T22T_{1},T_{2}\subset\mathbb{R}^{2} be two subspaces such that T1T2T_{1}\cap T_{2}\neq\emptyset. Find x2x\in\mathbb{R}^{2} such that xT1T2x\in T_{1}\cap T_{2}.

It was shown in [48, Section 4] that two-step inertial Douglas-Rachford splitting method, where

xn+1=FDR(xn+θ(xnxn1)+δ(xn1xn2))x_{n+1}=F_{DR}(x_{n}+\theta(x_{n}-x_{n-1})+\delta(x_{n-1}-x_{n-2}))

converges faster than one-step inertial Douglas-Rachford splitting method

xn+1=FDR(xn+θ(xnxn1))x_{n+1}=F_{DR}(x_{n}+\theta(x_{n}-x_{n-1}))

for Example 1.1. In fact, it was shown using this Example 1.1 that one-step inertial Douglas-Rachford splitting method

xn+1=FDR(xn+θ(xnxn1))x_{n+1}=F_{DR}(x_{n}+\theta(x_{n}-x_{n-1}))

converges slower than the Douglas-Rachford splitting method

xn+1=FDR(xn),x_{n+1}=F_{DR}(x_{n}),

where

FDR:=12(I+(2PT1I)(2PT2I))F_{DR}:=\frac{1}{2}\Big{(}I+(2P_{T_{1}}-I)(2P_{T_{2}}-I)\Big{)}

is the Douglas-Rachford splitting operator. This example therefore shows that the one-step inertial Douglas-Rachford splitting method may fail to provide acceleration. Therefore, in certain cases, the use of inertia based on more than two points could be beneficial. It was remarked in [36, Chapter 4] that using more than the two points x_{n},x_{n-1} could provide acceleration. For example, the following two-step inertial extrapolation

yn=xn+θ(xnxn1)+δ(xn1xn2)y_{n}=x_{n}+\theta(x_{n}-x_{n-1})+\delta(x_{n-1}-x_{n-2}) (9)

with \theta>0 and \delta<0 can provide acceleration. The failure of one-step inertial acceleration of ADMM was also discussed in [47, Section 3], and adaptive acceleration for ADMM was proposed instead. Polyak [46] also argued that multi-step inertial methods can boost the speed of optimization methods, even though neither convergence nor rate results for such multi-step inertial methods are established in [46]. Some results on multi-step inertial methods have recently been obtained in [22, 25].
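
To make Example 1.1 concrete, the following minimal Python sketch runs the Douglas-Rachford operator F_{DR} for two lines through the origin in \mathbb{R}^{2}, with and without one-step and two-step inertial extrapolation. The subspaces, starting point and the (small) inertial parameters are arbitrary choices for illustration; the comparison reported in [48, Section 4] remains the authoritative account of the relative behavior.

import numpy as np

def proj_line(u):
    # Orthogonal projector onto the line (one-dimensional subspace) spanned by u
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    return lambda x: np.dot(u, x) * u

def F_DR(P1, P2, x):
    # Douglas-Rachford operator: F_DR(x) = 0.5*(x + (2*P1 - I)(2*P2 - I)x)
    r2 = 2.0 * P2(x) - x
    return 0.5 * (x + 2.0 * P1(r2) - r2)

def inertial_dr(P1, P2, x0, theta=0.0, delta=0.0, iters=60):
    # x_{n+1} = F_DR(x_n + theta*(x_n - x_{n-1}) + delta*(x_{n-1} - x_{n-2}))
    x_nm2 = x_nm1 = x_n = np.asarray(x0, dtype=float)
    for _ in range(iters):
        y = x_n + theta * (x_n - x_nm1) + delta * (x_nm1 - x_nm2)
        x_nm2, x_nm1, x_n = x_nm1, x_n, F_DR(P1, P2, y)
    return x_n

P1 = proj_line([1.0, 0.0])      # T_1: the horizontal axis
P2 = proj_line([1.0, 1.0])      # T_2: the line y = x, so T_1 and T_2 meet only at 0
x0 = np.array([4.0, -2.0])
for theta, delta, name in [(0.0, 0.0, "DR"),
                           (0.1, 0.0, "one-step inertial DR"),
                           (0.1, -0.02, "two-step inertial DR")]:
    x = inertial_dr(P1, P2, x0, theta, delta)
    print(name, np.linalg.norm(x))   # distance of the iterate from the solution 0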

Our contribution. In this paper, we propose an inertial proximal point algorithm with a two-step inertial extrapolation step. We obtain weak convergence results and give a non-asymptotic O(1/n) convergence rate of our proposed algorithm in the non-ergodic sense. The summability conditions on the inertial parameters and the sequence of iterates imposed in [22, Algorithm 1.2], [25, Theorem 4.2 (35)], and [36, Chapter 4, (4.2.5)] are dispensed with in our results. We apply our results to the proximal method of multipliers and the alternating direction method of multipliers. We support our theoretical analysis with some preliminary computational experiments, which confirm the superiority of our method over other related methods in the literature.

Outline. In Section 2, we give some basic definitions and results needed in subsequent sections. In Section 3, we derive our method from a dynamical system and then introduce our proposed method; we also give both the weak convergence and the non-asymptotic O(1/n) convergence rate of our method there. We give applications of our results to convex-concave saddle-point problems, the proximal method of multipliers, ADMM, the primal–dual hybrid gradient method and the Douglas–Rachford splitting method in Section 4. We give some numerical illustrations in Section 5, and concluding remarks are given in Section 6.

2 Preliminaries

In this section, we give some definitions and basic results that will be used in our subsequent analysis. The weak and strong convergence of \{x_{n}\}\subset H to x\in H are denoted by x_{n}\rightharpoonup x and x_{n}\to x as n\to\infty, respectively.

Definition 2.1.

A mapping T:HHT:H\to H is called

  • (i)

    nonexpansive if \|Tx-Ty\|\leq\|x-y\| for all x,y\in H;

  • (ii)

    firmly nonexpansive if \|Tx-Ty\|^{2}\leq\|x-y\|^{2}-\|(I-T)x-(I-T)y\|^{2} for all x,y\in H. Equivalently, T is firmly nonexpansive if \|Tx-Ty\|^{2}\leq\langle x-y,Tx-Ty\rangle for all x,y\in H;

  • (iii)

    averaged if T can be expressed as the average of the identity mapping I and a nonexpansive mapping S, i.e., T=(1-\alpha)I+\alpha S with \alpha\in(0,1). Alternatively, T is \alpha-averaged if

    \|Tx-Ty\|^{2}\leq\|x-y\|^{2}-\frac{1-\alpha}{\alpha}\|(I-T)x-(I-T)y\|^{2},\quad\forall x,y\in H.
Definition 2.2.

A multivalued mapping A:H\to 2^{H} is said to be monotone if for any x,y\in H,

\langle x-y,f-g\rangle\geq 0,

whenever f\in Ax and g\in Ay. The graph of A is defined by

Gr(A):=\{(x,f)\in H\times H:f\in Ax\}.

If Gr(A) is not properly contained in the graph of any other monotone mapping, then we say that A is maximal. It is well known that for each x\in H and \lambda>0, there is a unique z\in H such that x\in(I+\lambda A)z. The single-valued operator J_{\lambda}^{A} defined by J_{\lambda}^{A}(x):=z is called the resolvent of A (see [28]).
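
As an illustration (these particular operators are not used elsewhere in the paper), the resolvent has a closed form for many simple choices of A: for A=\partial f with f(x)=\|x\|_{1} it is the componentwise soft-thresholding map, and for A(x)=x it is x/(1+\lambda). A short Python sketch:

import numpy as np

def resolvent_l1(x, lam):
    # Resolvent of A = subdifferential of f(x) = ||x||_1:
    # J_lambda^A(x) = argmin_z ||z||_1 + (1/(2*lam))*||z - x||^2  (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def resolvent_linear(x, lam):
    # Resolvent of the maximal monotone operator A(x) = x
    return x / (1.0 + lam)

x = np.array([2.5, -0.3, 0.7])
print(resolvent_l1(x, lam=1.0))      # ~ [1.5, 0., 0.]
print(resolvent_linear(x, lam=1.0))  # [1.25, -0.15, 0.35]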

Lemma 2.3.

The following identities hold for all u,v\in H:

2\langle u,v\rangle=\|u\|^{2}+\|v\|^{2}-\|u-v\|^{2}=\|u+v\|^{2}-\|u\|^{2}-\|v\|^{2}.
Lemma 2.4.

Let x,y,z\in H and a,b\in\mathbb{R}. Then

\|(1+a)x-(a-b)y-bz\|^{2}=(1+a)\|x\|^{2}-(a-b)\|y\|^{2}-b\|z\|^{2}+(1+a)(a-b)\|x-y\|^{2}+b(1+a)\|x-z\|^{2}-b(a-b)\|y-z\|^{2}.
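
Lemma 2.4 is the three-point identity used repeatedly in the convergence analysis of Section 3. As a quick sanity check (purely illustrative, with randomly chosen vectors and coefficients), the two sides can be compared numerically:

import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 5))    # three random vectors in R^5
a, b = 0.7, -0.2                         # arbitrary real coefficients

lhs = np.linalg.norm((1 + a) * x - (a - b) * y - b * z) ** 2
rhs = ((1 + a) * np.linalg.norm(x) ** 2
       - (a - b) * np.linalg.norm(y) ** 2
       - b * np.linalg.norm(z) ** 2
       + (1 + a) * (a - b) * np.linalg.norm(x - y) ** 2
       + b * (1 + a) * np.linalg.norm(x - z) ** 2
       - b * (a - b) * np.linalg.norm(y - z) ** 2)
print(abs(lhs - rhs))   # ~1e-15: both sides of Lemma 2.4 agree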

3 Main Results

3.1 Motivations from Dynamical Systems

Consider the following second order dynamical system

x¨(t)+α(t)x˙(t)+β(t)(x(t)JλA(x(t)))=0,x(0)=x0,x˙(0)=v0,\ddot{x}(t)+\alpha(t)\dot{x}(t)+\beta(t)(x(t)-J_{\lambda}^{A}(x(t)))=0,\leavevmode\nobreak\ \leavevmode\nobreak\ x(0)=x_{0},\dot{x}(0)=v_{0}, (10)

where \alpha,\beta:[0,\infty)\rightarrow[0,\infty) are Lebesgue measurable functions and \lambda>0. Let 0<\omega_{2}<\omega_{1} be two weighting parameters such that \omega_{1}+\omega_{2}=1, let h>0 be the time step-size, and set t_{n}=nh and x_{n}=x(t_{n}). Using an explicit forward Euler discretization with respect to J_{\lambda}^{A}, an explicit discretization of \dot{x}(t), and a weighted sum of explicit and implicit discretizations of \ddot{x}(t), we have

ω1h2(xn+12xn+xn1)+ω2h2(xn2xn1+xn2)\displaystyle\frac{\omega_{1}}{h^{2}}(x_{n+1}-2x_{n}+x_{n-1})+\frac{\omega_{2}}{h^{2}}(x_{n}-2x_{n-1}+x_{n-2})
+αnh(xnxn1)+βn(ynJλA(yn))=0,\displaystyle+\frac{\alpha_{n}}{h}(x_{n}-x_{n-1})+\beta_{n}(y_{n}-J_{\lambda}^{A}(y_{n}))=0, (11)

where y_{n} performs an "extrapolation" onto the points x_{n},x_{n-1} and x_{n-2}, and will be chosen later. We observe that since T:=I-J_{\lambda}^{A} is Lipschitz continuous, there is some flexibility in this choice. Therefore, (11) becomes

xn+12xn+xn1+ω2ω1(xn2xn1+xn2)\displaystyle x_{n+1}-2x_{n}+x_{n-1}+\frac{\omega_{2}}{\omega_{1}}(x_{n}-2x_{n-1}+x_{n-2})
+αnhω1(xnxn1)+βnh2ω1(ynJλA(yn))=0.\displaystyle+\frac{\alpha_{n}h}{\omega_{1}}(x_{n}-x_{n-1})+\frac{\beta_{n}h^{2}}{\omega_{1}}(y_{n}-J_{\lambda}^{A}(y_{n}))=0. (12)

This implies that

xn+1=2xnxn1ω2ω1(xn2xn1+xn2)\displaystyle x_{n+1}=2x_{n}-x_{n-1}-\frac{\omega_{2}}{\omega_{1}}(x_{n}-2x_{n-1}+x_{n-2}) (13)
αnhω1(xnxn1)βnh2ω1(ynJλA(yn))\displaystyle-\frac{\alpha_{n}h}{\omega_{1}}(x_{n}-x_{n-1})-\frac{\beta_{n}h^{2}}{\omega_{1}}(y_{n}-J_{\lambda}^{A}(y_{n}))
=\displaystyle= xn+(xnxn1)ω2ω1(xnxn1)+ω2ω1(xn1xn2)\displaystyle x_{n}+(x_{n}-x_{n-1})-\frac{\omega_{2}}{\omega_{1}}(x_{n}-x_{n-1})+\frac{\omega_{2}}{\omega_{1}}(x_{n-1}-x_{n-2})
αnhω1(xnxn1)βnh2ω1(ynJλA(yn))\displaystyle-\frac{\alpha_{n}h}{\omega_{1}}(x_{n}-x_{n-1})-\frac{\beta_{n}h^{2}}{\omega_{1}}(y_{n}-J_{\lambda}^{A}(y_{n}))
=\displaystyle= xn+(1ω2ω1αnhω1)(xnxn1)+ω2ω1(xn1xn2)\displaystyle x_{n}+\Big{(}1-\frac{\omega_{2}}{\omega_{1}}-\frac{\alpha_{n}h}{\omega_{1}}\Big{)}(x_{n}-x_{n-1})+\frac{\omega_{2}}{\omega_{1}}(x_{n-1}-x_{n-2})
βnh2ω1(ynJλA(yn)).\displaystyle-\frac{\beta_{n}h^{2}}{\omega_{1}}(y_{n}-J_{\lambda}^{A}(y_{n})).

Set

θn:=1ω2ω1αnhω1,δn:=ω2ω1,ρn:=βnh2ω1.\theta_{n}:=1-\frac{\omega_{2}}{\omega_{1}}-\frac{\alpha_{n}h}{\omega_{1}},\leavevmode\nobreak\ \leavevmode\nobreak\ \delta_{n}:=\frac{\omega_{2}}{\omega_{1}},\leavevmode\nobreak\ \leavevmode\nobreak\ \rho_{n}:=\frac{\beta_{n}h^{2}}{\omega_{1}}.

Then we have from  (13) that

xn+1\displaystyle x_{n+1} =\displaystyle= xn+θn(xnxn1)+δn(xn1xn2)\displaystyle x_{n}+\theta_{n}(x_{n}-x_{n-1})+\delta_{n}(x_{n-1}-x_{n-2}) (14)
ρnyn+ρnJλA(yn).\displaystyle-\rho_{n}y_{n}+\rho_{n}J_{\lambda}^{A}(y_{n}).

Choosing y_{n}=x_{n}+\theta_{n}(x_{n}-x_{n-1})+\delta_{n}(x_{n-1}-x_{n-2}), (14) becomes

{yn=xn+θn(xnxn1)+δn(xn1xn2),xn+1=(1ρn)yn+ρnJλA(yn)\displaystyle\left\{\begin{array}[]{ll}&y_{n}=x_{n}+\theta_{n}(x_{n}-x_{n-1})+\delta_{n}(x_{n-1}-x_{n-2}),\\ &x_{n+1}=(1-\rho_{n})y_{n}+\rho_{n}J_{\lambda}^{A}(y_{n})\end{array}\right. (17)

This is the two-step inertial proximal point algorithm that we study in the next section of this paper.

3.2 Proposed Method

In this subsection, we consider the two-step inertial proximal point algorithm given in (17) with \theta_{n}=\theta, \delta_{n}=\delta and \rho_{n}=1 for the sake of simplicity.

In our convergence analysis, we assume that parameters δ\delta and θ\theta lie in the following region:

\mathcal{G}:=\Big\{(\delta,\theta):0\leq\theta<\frac{1}{3},\ \frac{3\theta-1}{3+4\theta}<\delta\leq 0\Big\}. (18)

One can see clearly from (18) that |\delta|<1-3\theta, a bound that will be used repeatedly in the convergence analysis below.
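
For convenience, a small helper (written only for this exposition, not part of the analysis) can check whether a given pair (\delta,\theta) lies in the region \mathcal{G} of (18); the sample pair \theta=0.1, \delta=-0.14412 is the one later used in the experiments of Section 5 (Table 1).

def in_region_G(delta, theta):
    # Region G in (18): 0 <= theta < 1/3 and (3*theta - 1)/(3 + 4*theta) < delta <= 0
    return (0.0 <= theta < 1.0 / 3.0
            and (3.0 * theta - 1.0) / (3.0 + 4.0 * theta) < delta <= 0.0)

theta, delta = 0.1, -0.14412           # the pair used in Table 1 of Section 5
print(in_region_G(delta, theta))       # True
print(abs(delta) < 1.0 - 3.0 * theta)  # True: the bound |delta| < 1 - 3*theta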

We now present our proposed method as follows:

Algorithm 1 (2-Step Inertial PPA)
1:Choose parameters δ\delta and θ\theta satisfying condition  (18). Choose x1,x0,y0Hx_{-1},x_{0},y_{0}\in H arbitrarily, λ>0\lambda>0 and set n=0.n=0.
2:Given xn1,xnx_{n-1},x_{n} and yny_{n}, compute xn+1x_{n+1} as follows:
{xn+1=JλA(yn),yn+1=xn+1+θ(xn+1xn)+δ(xnxn1)\displaystyle\left\{\begin{array}[]{ll}&x_{n+1}=J_{\lambda}^{A}(y_{n}),\\ &y_{n+1}=x_{n+1}+\theta(x_{n+1}-x_{n})+\delta(x_{n}-x_{n-1})\end{array}\right. (21)
3:Set nn+1n\leftarrow n+1, and go to Step 2.
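
For reference, a minimal Python sketch of Algorithm 1 with the resolvent supplied as a callable. The test operator A(x)=x-p (whose unique zero is p and whose resolvent is (x+\lambda p)/(1+\lambda)), the initialization x_{-1}=x_{0}=y_{0}, and the parameter values are illustrative assumptions only.

import numpy as np

def two_step_inertial_ppa(resolvent, x0, lam=1.0, theta=0.1, delta=-0.05, iters=200):
    # Algorithm 1:  x_{n+1} = J_lambda^A(y_n),
    #               y_{n+1} = x_{n+1} + theta*(x_{n+1} - x_n) + delta*(x_n - x_{n-1})
    x_nm1 = x_n = np.asarray(x0, dtype=float)     # convenient choice x_{-1} = x_0
    y_n = x_n.copy()                              # y_0 = x_0
    for _ in range(iters):
        x_np1 = resolvent(y_n, lam)
        y_n = x_np1 + theta * (x_np1 - x_n) + delta * (x_n - x_nm1)
        x_nm1, x_n = x_n, x_np1
    return x_n

p = np.array([1.0, -2.0, 0.5])                            # A(x) = x - p, so A^{-1}(0) = {p}
resolvent = lambda x, lam: (x + lam * p) / (1.0 + lam)
x = two_step_inertial_ppa(resolvent, x0=np.zeros(3))
print(np.linalg.norm(x - p))                              # close to 0
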
Remark 3.1.

When \delta=0 in our proposed Algorithm 1, our method reduces to the inertial proximal point algorithm studied in [3, 4, 6, 7, 8, 14, 20, 38, 40], to mention but a few; our method is therefore an extension of the inertial proximal point algorithm studied in those references. We will show the advantage gained by the introduction of the second inertial parameter \delta\leq 0 in the numerical experiments in Section 5.

3.3 Convergence Analysis

We present the weak convergence analysis of sequence of iterates generated by our proposed Algorithm 1 in this subsection.

Theorem 3.2.

Let A:H\to 2^{H} be a maximal monotone operator. Suppose A^{-1}(\textbf{0})\neq\emptyset and let \{x_{n}\} be generated by Algorithm 1. Then \{x_{n}\} converges weakly to a point in A^{-1}(\textbf{0}).

Proof.

Let xA1(0)x^{*}\in A^{-1}(\textbf{0}). Then

ynx\displaystyle y_{n}-x^{*} =\displaystyle= xn+θ(xnxn1)+δ(xn1xn2)x\displaystyle x_{n}+\theta(x_{n}-x_{n-1})+\delta(x_{n-1}-x_{n-2})-x^{*}
=\displaystyle= (1+θ)(xnx)(θδ)(xn1x)δ(xn2x)\displaystyle(1+\theta)(x_{n}-x^{*})-(\theta-\delta)(x_{n-1}-x^{*})-\delta(x_{n-2}-x^{*})

Therefore, by Lemma 2.4, we obtain

ynx2\displaystyle\|y_{n}-x^{*}\|^{2} =\displaystyle= (1+θ)(xnx)(θδ)(xn1x)δ(xn2x)2\displaystyle\|(1+\theta)(x_{n}-x^{*})-(\theta-\delta)(x_{n-1}-x^{*})-\delta(x_{n-2}-x^{*})\|^{2} (22)
=\displaystyle= (1+θ)xnx2(θδ)xn1x2δxn2x2\displaystyle(1+\theta)\|x_{n}-x^{*}\|^{2}-(\theta-\delta)\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}
+(1+θ)(θδ)xnxn12+δ(1+θ)xnxn22\displaystyle+(1+\theta)(\theta-\delta)\|x_{n}-x_{n-1}\|^{2}+\delta(1+\theta)\|x_{n}-x_{n-2}\|^{2}
δ(θδ)xn1xn22.\displaystyle-\delta(\theta-\delta)\|x_{n-1}-x_{n-2}\|^{2}.

By the Cauchy-Schwarz inequality, we obtain

2θxn+1xn,xnxn12θxn+1xnxnxn1,-2\theta\langle x_{n+1}-x_{n},x_{n}-x_{n-1}\rangle\geq-2\theta\|x_{n+1}-x_{n}\|\|x_{n}-x_{n-1}\|, (23)
2δxn+1xn,xn1xn22|δ|xn+1xnxn1xn2,-2\delta\langle x_{n+1}-x_{n},x_{n-1}-x_{n-2}\rangle\geq-2|\delta|\|x_{n+1}-x_{n}\|\|x_{n-1}-x_{n-2}\|, (24)

and

2δθxnxn1,xn1xn2\displaystyle 2\delta\theta\langle x_{n}-x_{n-1},x_{n-1}-x_{n-2}\rangle =\displaystyle= 2δθxn1xn,xn1xn2\displaystyle-2\delta\theta\langle x_{n-1}-x_{n},x_{n-1}-x_{n-2}\rangle (25)
\displaystyle\geq 2|δ|θxnxn1xn1xn2.\displaystyle-2|\delta|\theta\|x_{n}-x_{n-1}\|\|x_{n-1}-x_{n-2}\|.

By  (23),  (24) and  (25), we obtain

xn+1yn2\displaystyle\|x_{n+1}-y_{n}\|^{2} =\displaystyle= xn+1(xn+θ(xnxn1)+δ(xn1xn2))2\displaystyle\|x_{n+1}-(x_{n}+\theta(x_{n}-x_{n-1})+\delta(x_{n-1}-x_{n-2}))\|^{2} (26)
=\displaystyle= xn+1xnθ(xnxn1)δ(xn1xn2)2\displaystyle\|x_{n+1}-x_{n}-\theta(x_{n}-x_{n-1})-\delta(x_{n-1}-x_{n-2})\|^{2}
=\displaystyle= xn+1xn22θxn+1xn,xnxn1\displaystyle\|x_{n+1}-x_{n}\|^{2}-2\theta\langle x_{n+1}-x_{n},x_{n}-x_{n-1}\rangle
2δxn+1xn,xn1xn2+θ2xnxn12\displaystyle-2\delta\langle x_{n+1}-x_{n},x_{n-1}-x_{n-2}\rangle+\theta^{2}\|x_{n}-x_{n-1}\|^{2}
+2δθxnxn1,xn1xn2+δ2xn1xn22\displaystyle+2\delta\theta\langle x_{n}-x_{n-1},x_{n-1}-x_{n-2}\rangle+\delta^{2}\|x_{n-1}-x_{n-2}\|^{2}
\displaystyle\geq xn+1xn22θxn+1xnxnxn1\displaystyle\|x_{n+1}-x_{n}\|^{2}-2\theta\|x_{n+1}-x_{n}\|\|x_{n}-x_{n-1}\|
2|δ|xn+1xnxn1xn2+θ2xnxn12\displaystyle-2|\delta|\|x_{n+1}-x_{n}\|\|x_{n-1}-x_{n-2}\|+\theta^{2}\|x_{n}-x_{n-1}\|^{2}
2|δ|θxnxn1xn1xn2+δ2xn1xn22\displaystyle-2|\delta|\theta\|x_{n}-x_{n-1}\|\|x_{n-1}-x_{n-2}\|+\delta^{2}\|x_{n-1}-x_{n-2}\|^{2}
\displaystyle\geq xn+1xn2θxn+1xn2θxnxn12\displaystyle\|x_{n+1}-x_{n}\|^{2}-\theta\|x_{n+1}-x_{n}\|^{2}-\theta\|x_{n}-x_{n-1}\|^{2}
|δ|xn+1xn2|δ|xn1xn22+θ2xnxn12\displaystyle-|\delta|\|x_{n+1}-x_{n}\|^{2}-|\delta|\|x_{n-1}-x_{n-2}\|^{2}+\theta^{2}\|x_{n}-x_{n-1}\|^{2}
|δ|θxnxn12|δ|θxn1xn22+δ2xn1xn22\displaystyle-|\delta|\theta\|x_{n}-x_{n-1}\|^{2}-|\delta|\theta\|x_{n-1}-x_{n-2}\|^{2}+\delta^{2}\|x_{n-1}-x_{n-2}\|^{2}
=\displaystyle= (1|δ|θ)xn+1xn2+(θ2θ|δ|θ)xnxn12\displaystyle(1-|\delta|-\theta)\|x_{n+1}-x_{n}\|^{2}+(\theta^{2}-\theta-|\delta|\theta)\|x_{n}-x_{n-1}\|^{2}
+(δ2|δ||δ|θ)xn1xn22.\displaystyle+(\delta^{2}-|\delta|-|\delta|\theta)\|x_{n-1}-x_{n-2}\|^{2}.

Using  (22) and  (26) in  (21), we have (noting that JλAJ_{\lambda}^{A} is firmly nonexpansive and δ0\delta\leq 0)

xn+1x2\displaystyle\|x_{n+1}-x^{*}\|^{2} =\displaystyle= JλA(yn)x2\displaystyle\|J_{\lambda}^{A}(y_{n})-x^{*}\|^{2} (27)
=\displaystyle= JλA(yn)JλA(x)2\displaystyle\|J_{\lambda}^{A}(y_{n})-J_{\lambda}^{A}(x^{*})\|^{2}
\displaystyle\leq ynx2JλA(yn)yn2\displaystyle\|y_{n}-x^{*}\|^{2}-\|J_{\lambda}^{A}(y_{n})-y_{n}\|^{2}
=\displaystyle= ynx2xn+1yn2\displaystyle\|y_{n}-x^{*}\|^{2}-\|x_{n+1}-y_{n}\|^{2}
\displaystyle\leq (1+θ)xnx2(θδ)xn1x2δxn2x2\displaystyle(1+\theta)\|x_{n}-x^{*}\|^{2}-(\theta-\delta)\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}
+(1+θ)(θδ)xnxn12+δ(1+θ)xnxn22\displaystyle+(1+\theta)(\theta-\delta)\|x_{n}-x_{n-1}\|^{2}+\delta(1+\theta)\|x_{n}-x_{n-2}\|^{2}
δ(θδ)xn1xn22(1|δ|θ)xn+1xn2\displaystyle-\delta(\theta-\delta)\|x_{n-1}-x_{n-2}\|^{2}-(1-|\delta|-\theta)\|x_{n+1}-x_{n}\|^{2}
(θ2θ|δ|θ)xnxn12(δ2|δ||δ|θ)xn1xn22\displaystyle-(\theta^{2}-\theta-|\delta|\theta)\|x_{n}-x_{n-1}\|^{2}-(\delta^{2}-|\delta|-|\delta|\theta)\|x_{n-1}-x_{n-2}\|^{2}
=\displaystyle= (1+θ)xnx2(θδ)xn1x2δxn2x2\displaystyle(1+\theta)\|x_{n}-x^{*}\|^{2}-(\theta-\delta)\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}
+((1+θ)(θδ)(θ2θ|δ|θ))xnxn12\displaystyle+\Big{(}(1+\theta)(\theta-\delta)-(\theta^{2}-\theta-|\delta|\theta)\Big{)}\|x_{n}-x_{n-1}\|^{2}
+δ(1+θ)xnxn22(1|δ|θ)xn+1xn2\displaystyle+\delta(1+\theta)\|x_{n}-x_{n-2}\|^{2}-(1-|\delta|-\theta)\|x_{n+1}-x_{n}\|^{2}
(δ(θδ)+(δ2|δ||δ|θ))xn1xn22\displaystyle-\Big{(}\delta(\theta-\delta)+(\delta^{2}-|\delta|-|\delta|\theta)\Big{)}\|x_{n-1}-x_{n-2}\|^{2}
=\displaystyle= (1+θ)xnx2(θδ)xn1x2δxn2x2\displaystyle(1+\theta)\|x_{n}-x^{*}\|^{2}-(\theta-\delta)\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}
+(2θδδθ+|δ|θ)xnxn12+δ(1+θ)xnxn22\displaystyle+(2\theta-\delta-\delta\theta+|\delta|\theta)\|x_{n}-x_{n-1}\|^{2}+\delta(1+\theta)\|x_{n}-x_{n-2}\|^{2}
(1|δ|θ)xn+1xn2+(|δ|+|δ|θδθ)xn1xn22\displaystyle-(1-|\delta|-\theta)\|x_{n+1}-x_{n}\|^{2}+(|\delta|+|\delta|\theta-\delta\theta)\|x_{n-1}-x_{n-2}\|^{2}
\displaystyle\leq (1+θ)xnx2(θδ)xn1x2δxn2x2\displaystyle(1+\theta)\|x_{n}-x^{*}\|^{2}-(\theta-\delta)\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}
+(2θδδθ+|δ|θ)xnxn12(1|δ|θ)xn+1xn2\displaystyle+(2\theta-\delta-\delta\theta+|\delta|\theta)\|x_{n}-x_{n-1}\|^{2}-(1-|\delta|-\theta)\|x_{n+1}-x_{n}\|^{2}
+(|δ|+|δ|θδθ)xn1xn22.\displaystyle+(|\delta|+|\delta|\theta-\delta\theta)\|x_{n-1}-x_{n-2}\|^{2}.

Therefore,

xn+1x2θxnx2δxn1x2+(1|δ|θ)xn+1xn2\displaystyle\|x_{n+1}-x^{*}\|^{2}-\theta\|x_{n}-x^{*}\|^{2}-\delta\|x_{n-1}-x^{*}\|^{2}+(1-|\delta|-\theta)\|x_{n+1}-x_{n}\|^{2} (28)
\displaystyle\leq xnx2θxn1x2δxn2x2\displaystyle\|x_{n}-x^{*}\|^{2}-\theta\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}
+(1|δ|θ)xnxn12+(3θ1+(1+θ)(|δ|δ))xnxn12\displaystyle+(1-|\delta|-\theta)\|x_{n}-x_{n-1}\|^{2}+(3\theta-1+(1+\theta)(|\delta|-\delta))\|x_{n}-x_{n-1}\|^{2}
+(|δ|+|δ|θδθ)xn1xn22.\displaystyle+(|\delta|+|\delta|\theta-\delta\theta)\|x_{n-1}-x_{n-2}\|^{2}.

Now, define

Γn:=xnx2θxn1x2δxn2x2+(1|δ|θ)xnxn12,n1.\Gamma_{n}:=\|x_{n}-x^{*}\|^{2}-\theta\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}+(1-|\delta|-\theta)\|x_{n}-x_{n-1}\|^{2},\leavevmode\nobreak\ \leavevmode\nobreak\ n\geq 1.

We show that Γn0,n1.\Gamma_{n}\geq 0,\leavevmode\nobreak\ \leavevmode\nobreak\ \forall n\geq 1. Observe that

xn1x22xnxn12+2xnx2.\displaystyle\|x_{n-1}-x^{*}\|^{2}\leq 2\|x_{n}-x_{n-1}\|^{2}+2\|x_{n}-x^{*}\|^{2}. (29)

So,

Γn\displaystyle\Gamma_{n} =\displaystyle= xnx2θxn1x2δxn2x2\displaystyle\|x_{n}-x^{*}\|^{2}-\theta\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2} (30)
+(1|δ|θ)xnxn12\displaystyle+(1-|\delta|-\theta)\|x_{n}-x_{n-1}\|^{2}
\displaystyle\geq xnx22θxnxn122θxnx2\displaystyle\|x_{n}-x^{*}\|^{2}-2\theta\|x_{n}-x_{n-1}\|^{2}-2\theta\|x_{n}-x^{*}\|^{2}
δxn2x2+(1|δ|θ)xnxn12\displaystyle-\delta\|x_{n-2}-x^{*}\|^{2}+(1-|\delta|-\theta)\|x_{n}-x_{n-1}\|^{2}
=\displaystyle= (12θ)xnx2+(1|δ|3θ)xnxn12\displaystyle(1-2\theta)\|x_{n}-x^{*}\|^{2}+(1-|\delta|-3\theta)\|x_{n}-x_{n-1}\|^{2}
δxn2x2\displaystyle-\delta\|x_{n-2}-x^{*}\|^{2}
\displaystyle\geq 0,\displaystyle 0,

since |δ|<13θ3θ1<δ<13θ|\delta|<1-3\theta\Leftrightarrow 3\theta-1<\delta<1-3\theta and δ0\delta\leq 0. Furthermore, we obtain from  (28) that

Γn+1Γn\displaystyle\Gamma_{n+1}-\Gamma_{n} \displaystyle\leq (3θ1+(1+θ)(|δ|δ))(xn1xn22xnxn12)\displaystyle-(3\theta-1+(1+\theta)(|\delta|-\delta))(\|x_{n-1}-x_{n-2}\|^{2}-\|x_{n}-x_{n-1}\|^{2}) (31)
[(3θ1+(1+θ)(|δ|δ))(|δ|+|δ|θδθ)]xn1xn22\displaystyle-\Big{[}-(3\theta-1+(1+\theta)(|\delta|-\delta))-(|\delta|+|\delta|\theta-\delta\theta)\Big{]}\|x_{n-1}-x_{n-2}\|^{2}
=\displaystyle= (3θ1+(1+θ)(|δ|δ))(xn1xn22xnxn12)\displaystyle-(3\theta-1+(1+\theta)(|\delta|-\delta))(\|x_{n-1}-x_{n-2}\|^{2}-\|x_{n}-x_{n-1}\|^{2})
(13θ2|δ|2θ|δ|+2θδ+δ)xn1xn22\displaystyle-\Big{(}1-3\theta-2|\delta|-2\theta|\delta|+2\theta\delta+\delta\Big{)}\|x_{n-1}-x_{n-2}\|^{2}
=\displaystyle= c1(xn1xn22xnxn12)c2xn1xn22,\displaystyle c_{1}(\|x_{n-1}-x_{n-2}\|^{2}-\|x_{n}-x_{n-1}\|^{2})-c_{2}\|x_{n-1}-x_{n-2}\|^{2},

where c1:=(3θ1+(1+θ)(|δ|δ))c_{1}:=-(3\theta-1+(1+\theta)(|\delta|-\delta)) and c2:=13θ2|δ|2θ|δ|+2θδ+δ.c_{2}:=1-3\theta-2|\delta|-2\theta|\delta|+2\theta\delta+\delta. Noting that |δ|=δ|\delta|=-\delta since δ0\delta\leq 0, we then have that

c1=(3θ1+(1+θ)(|δ|δ))>03θ12(1+θ)<δ.c_{1}=-(3\theta-1+(1+\theta)(|\delta|-\delta))>0\Leftrightarrow\frac{3\theta-1}{2(1+\theta)}<\delta.

Furthermore,

c2=13θ2|δ|2θ|δ|+2θδ+δ>03θ13+4θ<δ.c_{2}=1-3\theta-2|\delta|-2\theta|\delta|+2\theta\delta+\delta>0\Leftrightarrow\frac{3\theta-1}{3+4\theta}<\delta.

Observe that if 0\leq\theta<\frac{1}{3}, then, since 3\theta-1<0 and 1<2(1+\theta)<3+4\theta,

3\theta-1<\frac{3\theta-1}{2(1+\theta)}<\frac{3\theta-1}{3+4\theta}.

This implies by  (18) that both c1:=(3θ1+(1+θ)(|δ|δ))>0c_{1}:=-(3\theta-1+(1+\theta)(|\delta|-\delta))>0 and c2:=13θ2|δ|2θ|δ|+2θδ+δ>0c_{2}:=1-3\theta-2|\delta|-2\theta|\delta|+2\theta\delta+\delta>0 if

3θ13+4θ<δ0.\frac{3\theta-1}{3+4\theta}<\delta\leq 0. (32)

By  (31), we obtain

Γn+1+c1xnxn12\displaystyle\Gamma_{n+1}+c_{1}\|x_{n}-x_{n-1}\|^{2} \displaystyle\leq Γn+c1xn1xn22\displaystyle\Gamma_{n}+c_{1}\|x_{n-1}-x_{n-2}\|^{2} (33)
c2xn1xn22.\displaystyle-c_{2}\|x_{n-1}-x_{n-2}\|^{2}.

Letting Γ¯n:=Γn+c1xn1xn22\bar{\Gamma}_{n}:=\Gamma_{n}+c_{1}\|x_{n-1}-x_{n-2}\|^{2}, we obtain from  (33) that

Γ¯n+1Γ¯n.\bar{\Gamma}_{n+1}\leq\bar{\Gamma}_{n}. (34)

This implies from (34) that the sequence \{\bar{\Gamma}_{n}\} is nonincreasing; since \bar{\Gamma}_{n}\geq\Gamma_{n}\geq 0, it is also bounded below, and thus \lim_{n\rightarrow\infty}\bar{\Gamma}_{n} exists. Consequently, we have from (33) that

limnc2xn1xn22=0.\displaystyle\underset{n\rightarrow\infty}{\lim}c_{2}\|x_{n-1}-x_{n-2}\|^{2}=0. (35)

Hence,

limnxn1xn2=0.\displaystyle\underset{n\rightarrow\infty}{\lim}\|x_{n-1}-x_{n-2}\|=0. (36)

Using (36) and the existence of the limit of \{\bar{\Gamma}_{n}\}, we have that \lim_{n\rightarrow\infty}\Gamma_{n} exists. Furthermore,

xn+1yn\displaystyle\|x_{n+1}-y_{n}\| =\displaystyle= xn+1xnθ(xnxn1)δ(xn1xn2)\displaystyle\|x_{n+1}-x_{n}-\theta(x_{n}-x_{n-1})-\delta(x_{n-1}-x_{n-2})\| (37)
\displaystyle\leq xn+1xn+θxnxn1+|δ|xn1xn20\displaystyle\|x_{n+1}-x_{n}\|+\theta\|x_{n}-x_{n-1}\|+|\delta|\|x_{n-1}-x_{n-2}\|\rightarrow 0

as n.n\rightarrow\infty. Therefore,

limnJλA(yn)yn=0.\displaystyle\underset{n\rightarrow\infty}{\lim}\|J_{\lambda}^{A}(y_{n})-y_{n}\|=0. (38)

Also,

ynxnθxnxn1+|δ|xn1xn20,n.\displaystyle\|y_{n}-x_{n}\|\leq\theta\|x_{n}-x_{n-1}\|+|\delta|\|x_{n-1}-x_{n-2}\|\rightarrow 0,n\rightarrow\infty. (39)

Since \lim_{n\rightarrow\infty}\Gamma_{n} exists and \lim_{n\rightarrow\infty}\|x_{n}-x_{n-1}\|=0, we obtain from (30) that the sequence \{x_{n}\} is bounded. Hence \{x_{n}\} has at least one weak accumulation point v^{*}\in H. Assume that \{x_{n_{k}}\}\subset\{x_{n}\} is such that x_{n_{k}}\rightharpoonup v^{*} as k\to\infty. Since y_{n}-x_{n}\rightarrow 0 as n\rightarrow\infty, we have that y_{n_{k}}\rightharpoonup v^{*} as k\to\infty. Also, by (38), we have that J_{\lambda}^{A}(y_{n_{k}})\rightharpoonup v^{*} as k\to\infty. Since J_{\lambda}^{A}(y_{n})=(I+\lambda A)^{-1}y_{n}, we obtain

1λ(ynkJλA(ynk))A(JλA(ynk)).\frac{1}{\lambda}\Big{(}y_{n_{k}}-J_{\lambda}^{A}(y_{n_{k}})\Big{)}\in A(J_{\lambda}^{A}(y_{n_{k}})).

By the monotonicity of AA, we have nk\forall n_{k},

xJλA(ynk),w1λ(ynkJλA(ynk))0,x,wsatisfyingwA(x).\langle x-J_{\lambda}^{A}(y_{n_{k}}),w-\frac{1}{\lambda}\Big{(}y_{n_{k}}-J_{\lambda}^{A}(y_{n_{k}})\Big{)}\rangle\geq 0,\ \forall\ x,w\ \mbox{satisfying}\ w\in A(x). (40)

Using  (38) in  (40) we get xv,w0,x,w\langle x-v^{*},w\rangle\geq 0,\forall x,w satisfying wA(x)w\in A(x). Since AA is maximal monotone, we conclude that vA1(0)v^{*}\in A^{-1}(\textbf{0}) (see, for example, [51]).

Since limnΓn\underset{n\rightarrow\infty}{\lim}\Gamma_{n} exists and limnxn+1xn=0\underset{n\rightarrow\infty}{\lim}\|x_{n+1}-x_{n}\|=0, we have that

limn[xnx2θxn1x2δxn2x2]\displaystyle\underset{n\rightarrow\infty}{\lim}\Big{[}\|x_{n}-x^{*}\|^{2}-\theta\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}\Big{]} (41)

exists.

We now show that xnxA1(0)x_{n}\rightharpoonup x^{*}\in A^{-1}(\textbf{0}). Let us assume that there exist {xnk}{xn}\{x_{n_{k}}\}\subset\{x_{n}\} and {xnj}{xn}\{x_{n_{j}}\}\subset\{x_{n}\} such that xnkv,kx_{n_{k}}\rightharpoonup v^{*},k\rightarrow\infty and xnjx,jx_{n_{j}}\rightharpoonup x^{*},j\rightarrow\infty. We show that v=xv^{*}=x^{*}.

Observe that

2xn,xv=xnv2xnx2v2+x2,2\langle x_{n},x^{*}-v^{*}\rangle=\|x_{n}-v^{*}\|^{2}-\|x_{n}-x^{*}\|^{2}-\|v^{*}\|^{2}+\|x^{*}\|^{2}, (42)
2xn1,xv=xn1v2xn1x2v2+x2,2\langle x_{n-1},x^{*}-v^{*}\rangle=\|x_{n-1}-v^{*}\|^{2}-\|x_{n-1}-x^{*}\|^{2}-\|v^{*}\|^{2}+\|x^{*}\|^{2}, (43)

and

2xn2,xv=xn2v2xn2x2v2+x2,2\langle x_{n-2},x^{*}-v^{*}\rangle=\|x_{n-2}-v^{*}\|^{2}-\|x_{n-2}-x^{*}\|^{2}-\|v^{*}\|^{2}+\|x^{*}\|^{2}, (44)

Therefore,

2θxn1,xv\displaystyle 2\langle-\theta x_{n-1},x^{*}-v^{*}\rangle =\displaystyle= θxn1v2+θxn1x2\displaystyle-\theta\|x_{n-1}-v^{*}\|^{2}+\theta\|x_{n-1}-x^{*}\|^{2} (45)
+θv2θx2\displaystyle+\theta\|v^{*}\|^{2}-\theta\|x^{*}\|^{2}

and

2δxn2,xv\displaystyle 2\langle-\delta x_{n-2},x^{*}-v^{*}\rangle =\displaystyle= δxn2v2+δxn2x2\displaystyle-\delta\|x_{n-2}-v^{*}\|^{2}+\delta\|x_{n-2}-x^{*}\|^{2} (46)
+δv2δx2\displaystyle+\delta\|v^{*}\|^{2}-\delta\|x^{*}\|^{2}

Addition of  (42),  (45) and  (46) gives

2xnθxn1δxn2,xv\displaystyle 2\langle x_{n}-\theta x_{n-1}-\delta x_{n-2},x^{*}-v^{*}\rangle =\displaystyle= (xnv2θxn1v2δxn2v2)\displaystyle\Big{(}\|x_{n}-v^{*}\|^{2}-\theta\|x_{n-1}-v^{*}\|^{2}-\delta\|x_{n-2}-v^{*}\|^{2}\Big{)}
(xnx2θxn1x2δxn2x2)\displaystyle-\Big{(}\|x_{n}-x^{*}\|^{2}-\theta\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}\Big{)}
+(1θδ)(x2v2).\displaystyle+(1-\theta-\delta)(\|x^{*}\|^{2}-\|v^{*}\|^{2}).

According to  (41), we have

limn[xnx2θxn1x2δxn2x2]\displaystyle\underset{n\rightarrow\infty}{\lim}\Big{[}\|x_{n}-x^{*}\|^{2}-\theta\|x_{n-1}-x^{*}\|^{2}-\delta\|x_{n-2}-x^{*}\|^{2}\Big{]}

exists and

limn[xnv2θxn1v2δxn2v2]\displaystyle\underset{n\rightarrow\infty}{\lim}\Big{[}\|x_{n}-v^{*}\|^{2}-\theta\|x_{n-1}-v^{*}\|^{2}-\delta\|x_{n-2}-v^{*}\|^{2}\Big{]}

exists. This implies that

limnxnθxn1δxn2,xv\underset{n\rightarrow\infty}{\lim}\langle x_{n}-\theta x_{n-1}-\delta x_{n-2},x^{*}-v^{*}\rangle

exists. Now,

vθvδv,xv\displaystyle\langle v^{*}-\theta v^{*}-\delta v^{*},x^{*}-v^{*}\rangle =\displaystyle= limkxnkθxnk1δxnk2,xv\displaystyle\underset{k\rightarrow\infty}{\lim}\langle x_{n_{k}}-\theta x_{n_{k}-1}-\delta x_{n_{k}-2},x^{*}-v^{*}\rangle
=\displaystyle= limnxnθxn1δxn2,xv\displaystyle\underset{n\rightarrow\infty}{\lim}\langle x_{n}-\theta x_{n-1}-\delta x_{n-2},x^{*}-v^{*}\rangle
=\displaystyle= limjxnjθxnj1δxnj2,xv\displaystyle\underset{j\rightarrow\infty}{\lim}\langle x_{n_{j}}-\theta x_{n_{j}-1}-\delta x_{n_{j}-2},x^{*}-v^{*}\rangle
=\displaystyle= xθxδx,xv,\displaystyle\langle x^{*}-\theta x^{*}-\delta x^{*},x^{*}-v^{*}\rangle,

and this yields

(1θδ)xv2=0.(1-\theta-\delta)\|x^{*}-v^{*}\|^{2}=0.

Since δ0<1θ\delta\leq 0<1-\theta, we obtain that x=vx^{*}=v^{*}. Hence, {xn}\{x_{n}\} converges weakly to a point in A1(0)A^{-1}(0). This completes the proof.

Remark 3.3.

(a) The conditions imposed on the parameters in our proposed Algorithm 1 are weaker than the ones imposed in [25, 4.1 Algorithms] and [36, Chapter 4, (4.2.5)]. For example, we do not impose the summability conditions in [25, Theorem 4.1 (32)], [25, Theorem 4.2 (35)] and [36, Chapter 4 (4.2.5)] in our Algorithm 1. Therefore, our results are improvements over the results obtained in [25].

(b) The assumptions on the parameters in Algorithm 1 are also different from the conditions imposed on the iterative parameters in [22, Algorithm 1.2]. For example, conditions (b) and (c) of [22, Algorithm 1.2] are not needed in our convergence analysis. More importantly, we do not assume that \theta+\delta=1 (as imposed in [22, Algorithm 1.2]) in our convergence analysis. \Diamond

In the next result, we give a non-asymptotic O(1/n)O(1/n) convergence rate of our proposed Algorithm 1.

Theorem 3.4.

Let A:H\to 2^{H} be a maximal monotone operator. Assume that A^{-1}(\textbf{0})\neq\emptyset and x_{0}=x_{-1}=x_{-2}. Let \{x_{n}\} be generated by Algorithm 1. Then, for any x^{*}\in A^{-1}(\textbf{0}) and n\geq 2, it holds that

min0jn2xj+1yj23(1+θ2+δ2)1c2(1θδ)x0x2n1\underset{0\leq j\leq n-2}{\min}\|x_{j+1}-y_{j}\|^{2}\leq 3\Big{(}1+\theta^{2}+\delta^{2}\Big{)}\frac{1}{c_{2}}\frac{(1-\theta-\delta)\|x_{0}-x^{*}\|^{2}}{n-1} (47)

where c2=13θ2|δ|2θ|δ|+2θδ+δc_{2}=1-3\theta-2|\delta|-2\theta|\delta|+2\theta\delta+\delta.

Proof.

Let xA1(0)x^{*}\in A^{-1}(0). Observe that

Γ¯0\displaystyle\bar{\Gamma}_{0} =\displaystyle= Γ0=x0x2θx1x2δx2x2+(1|δ|θ)x0x12\displaystyle\Gamma_{0}=\|x_{0}-x^{*}\|^{2}-\theta\|x_{-1}-x^{*}\|^{2}-\delta\|x_{-2}-x^{*}\|^{2}+(1-|\delta|-\theta)\|x_{0}-x_{-1}\|^{2} (48)
=\displaystyle= x0x2θx1x2δx2x2\displaystyle\|x_{0}-x^{*}\|^{2}-\theta\|x_{-1}-x^{*}\|^{2}-\delta\|x_{-2}-x^{*}\|^{2}
=\displaystyle= (1θδ)x0x2.\displaystyle(1-\theta-\delta)\|x_{0}-x^{*}\|^{2}.

From  (33), we obtain

c2j=0nxj1xj22Γ¯0Γ¯n+1.\displaystyle c_{2}\sum_{j=0}^{n}\|x_{j-1}-x_{j-2}\|^{2}\leq\bar{\Gamma}_{0}-\bar{\Gamma}_{n+1}. (49)

This implies that

j=0nxj1xj22\displaystyle\sum_{j=0}^{n}\|x_{j-1}-x_{j-2}\|^{2} \displaystyle\leq 1c2Γ¯0=1c2Γ0\displaystyle\frac{1}{c_{2}}\bar{\Gamma}_{0}=\frac{1}{c_{2}}\Gamma_{0} (50)
=\displaystyle= 1c2(1θδ)x0x2.\displaystyle\frac{1}{c_{2}}(1-\theta-\delta)\|x_{0}-x^{*}\|^{2}.

This implies that

min0jnxj1xj221c2(1θδ)x0x2n+1.\displaystyle\underset{0\leq j\leq n}{\min}\|x_{j-1}-x_{j-2}\|^{2}\leq\frac{1}{c_{2}}\frac{(1-\theta-\delta)\|x_{0}-x^{*}\|^{2}}{n+1}. (51)

Consequently,

min0jn2xj+1xj21c2(1θδ)x0x2n1\displaystyle\underset{0\leq j\leq n-2}{\min}\|x_{j+1}-x_{j}\|^{2}\leq\frac{1}{c_{2}}\frac{(1-\theta-\delta)\|x_{0}-x^{*}\|^{2}}{n-1} (52)

and

min0jn1xjxj121c2(1θδ)x0x2n.\displaystyle\underset{0\leq j\leq n-1}{\min}\|x_{j}-x_{j-1}\|^{2}\leq\frac{1}{c_{2}}\frac{(1-\theta-\delta)\|x_{0}-x^{*}\|^{2}}{n}. (53)

From  (37), we obtain

xn+1yn2\displaystyle\|x_{n+1}-y_{n}\|^{2} \displaystyle\leq (xn+1xn+θxnxn1+|δ|xn1xn2)2\displaystyle\Big{(}\|x_{n+1}-x_{n}\|+\theta\|x_{n}-x_{n-1}\|+|\delta|\|x_{n-1}-x_{n-2}\|\Big{)}^{2} (54)
\displaystyle\leq 3(xn+1xn2+θ2xnxn12+δ2xn1xn22).\displaystyle 3\Big{(}\|x_{n+1}-x_{n}\|^{2}+\theta^{2}\|x_{n}-x_{n-1}\|^{2}+\delta^{2}\|x_{n-1}-x_{n-2}\|^{2}\Big{)}.

Therefore, since the minimum of finitely many nonnegative numbers is bounded above by their average, summing (54) over j=0,\dots,n-2 and using (50) together with its index-shifted analogues (which underlie (52) and (53)), we obtain

\underset{0\leq j\leq n-2}{\min}\|x_{j+1}-y_{j}\|^{2}\leq\frac{1}{n-1}\sum_{j=0}^{n-2}\|x_{j+1}-y_{j}\|^{2}\leq\Big{(}3+3\theta^{2}+3\delta^{2}\Big{)}\frac{1}{c_{2}}\frac{(1-\theta-\delta)\|x_{0}-x^{*}\|^{2}}{n-1}. (55)

This completes the proof. ∎

Remark 3.5.

The non-asymptotic O(1/n) convergence rate of Algorithm 1 given in Theorem 3.4 extends the convergence rate given in [20, Theorem 2, (2.13)], which corresponds to the case \delta=0 in Algorithm 1.
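
As a small sanity check of Theorem 3.4 (purely illustrative and not part of the analysis), one can track the running minimum of \|x_{j+1}-y_{j}\|^{2} along Algorithm 1 for the simple operator A(x)=x (so x^{*}=0) and compare it with the right-hand side of (47); the running minimum in the script below is taken over all residuals computed so far, which only makes the check more conservative.

import numpy as np

theta, delta, lam = 0.1, -0.05, 1.0          # (delta, theta) lies in the region G of (18)
resolvent = lambda y: y / (1.0 + lam)        # resolvent of A(x) = x, whose zero is x* = 0
c2 = 1 - 3*theta - 2*abs(delta) - 2*theta*abs(delta) + 2*theta*delta + delta

x0 = np.array([3.0, -1.0])
x_nm1 = x_n = x0.copy()                      # x_{-2} = x_{-1} = x_0, as in Theorem 3.4
y_n = x0.copy()
running_min = np.inf
for n in range(1, 101):
    x_np1 = resolvent(y_n)
    running_min = min(running_min, np.linalg.norm(x_np1 - y_n) ** 2)
    y_n = x_np1 + theta * (x_np1 - x_n) + delta * (x_n - x_nm1)
    x_nm1, x_n = x_n, x_np1
    if n >= 2:
        bound = (3 * (1 + theta**2 + delta**2) * (1 - theta - delta)
                 * np.linalg.norm(x0) ** 2 / (c2 * (n - 1)))   # right-hand side of (47)
        assert running_min <= bound
print(running_min, bound)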

4 Applications

We give some applications of our proposed two-step inertial proximal point Algorithm 1 to solving convex-concave saddle-point problems via two-step inertial versions of the augmented Lagrangian method, the proximal method of multipliers, and ADMM. We also consider two-step inertial versions of the primal–dual hybrid gradient method and the Douglas–Rachford splitting method.

4.1 Convex–Concave Saddle-Point Problem

Let us consider the following convex-concave saddle-point problem:

minuH1maxvH2ϕ(u,v),\min_{u\in H_{1}}\max_{v\in H_{2}}\phi(u,v), (56)

where H1H_{1} and H2H_{2} are real Hilbert spaces equipped with inner product .,.\langle.,.\rangle and ϕ(.,v)(H1),ϕ(u,.)(H2)\phi(.,v)\in\mathcal{F}(H_{1}),-\phi(u,.)\in\mathcal{F}(H_{2}), where the associated saddle subdifferential operator

(uϕ(u,v)v(ϕ(u,v))\left(\begin{array}[]{c}\partial_{u}\phi(u,v)\\ \partial_{v}(-\phi(u,v)\\ \end{array}\right) (57)

is monotone. Using the ideas in [51], we apply our proposed Algorithm 1 to solve problem  (56) below.

Algorithm 2 2-Step Inertial PPA for Saddle-Point Problem
1:Choose parameters δ\delta and θ\theta satisfying condition  (18). Choose ϕ(.,v)(H1),ϕ(u,.)(H2),u^0H1,v^0H2,x1=x0=y0=(u^0,v^0)\phi(.,v)\in\mathcal{F}(H_{1}),-\phi(u,.)\in\mathcal{F}(H_{2}),\hat{u}_{0}\in H_{1},\hat{v}_{0}\in H_{2},x_{-1}=x_{0}=y_{0}=(\hat{u}_{0},\hat{v}_{0}) and λ>0\lambda>0. Set n=0.n=0.
2:Compute as follows:
{xn+1=(un+1,vn+1)=argminuH1maxvH2{ϕ(u,v)+12λuu^n212λvv^n2},yn+1=(u^n+1,v^n+1)=xn+1+θ(xn+1xn)+δ(xnxn1)\displaystyle\left\{\begin{array}[]{ll}&x_{n+1}=(u_{n+1},v_{n+1})={\rm arg}\min\limits_{u\in H_{1}}\max\limits_{v\in H_{2}}\Big{\{}\phi(u,v)+\frac{1}{2\lambda}\|u-\hat{u}_{n}\|^{2}-\frac{1}{2\lambda}\|v-\hat{v}_{n}\|^{2}\Big{\}},\\ &y_{n+1}=(\hat{u}_{n+1},\hat{v}_{n+1})=x_{n+1}+\theta(x_{n+1}-x_{n})+\delta(x_{n}-x_{n-1})\end{array}\right. (60)
3:Set nn+1n\leftarrow n+1, and go to 2.

Furthermore, let us consider the following convex–concave Lagrangian problem:

minuH1maxvH2{L(u,v):f(u)+v,Aub},\min_{u\in H_{1}}\max_{v\in H_{2}}\{L(u,v):f(u)+\langle v,Au-b\rangle\}, (61)

which is associated with the following linearly constrained problem

{minuH1f(u)s.t.Au=b,\displaystyle\left\{\begin{array}[]{ll}&\min\limits_{u\in H_{1}}f(u)\\ &s.t.Au=b,\end{array}\right. (64)

where AB(H1,H2)A\in B(H_{1},H_{2}) and bH2b\in H_{2}. Applying our proposed Algorithm 1 to solving problem  (61) gives

Algorithm 3 2-Step Inertial Proximal Method of Multipliers
1:Choose parameters δ\delta and θ\theta satisfying condition  (18). Choose ϕ(.,v)(H1),ϕ(u,.)(H2),u^0H1,v^0H2,x1=x0=y0=(u^0,v^0)\phi(.,v)\in\mathcal{F}(H_{1}),-\phi(u,.)\in\mathcal{F}(H_{2}),\hat{u}_{0}\in H_{1},\hat{v}_{0}\in H_{2},x_{-1}=x_{0}=y_{0}=(\hat{u}_{0},\hat{v}_{0}) and λ>0\lambda>0. Set n=0.n=0.
2:Compute as follows:
{un+1=argminuH1{L(u,v^n)+λ2Aub2+12λuu^n2},xn+1=(un+1,v^n+λ(Aun+1b)),yn+1=(u^n+1,v^n+1)=xn+1+θ(xn+1xn)+δ(xnxn1)\displaystyle\left\{\begin{array}[]{ll}&u_{n+1}={\rm arg}\min\limits_{u\in H_{1}}\Big{\{}L(u,\hat{v}_{n})+\frac{\lambda}{2}\|Au-b\|^{2}+\frac{1}{2\lambda}\|u-\hat{u}_{n}\|^{2}\Big{\}},\\ &x_{n+1}=(u_{n+1},\hat{v}_{n}+\lambda(Au_{n+1}-b)),\\ &y_{n+1}=(\hat{u}_{n+1},\hat{v}_{n+1})=x_{n+1}+\theta(x_{n+1}-x_{n})+\delta(x_{n}-x_{n-1})\end{array}\right. (68)
3:Set nn+1n\leftarrow n+1, and go to 2.
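
For illustration, a minimal Python sketch of Algorithm 3 for the particular choice f(u)=\frac{1}{2}\|u\|^{2}, so that the u-update is a linear solve. The matrix A, the vector b and all parameter values below are made-up assumptions for this demo, not data from the paper.

import numpy as np

rng = np.random.default_rng(1)
m, d = 3, 6
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)
lam, theta, delta = 1.0, 0.1, -0.05

u_hat, v_hat = np.zeros(d), np.zeros(m)                 # y_0 = x_0 = (u_hat, v_hat)
x_prev2 = x_prev1 = np.concatenate([u_hat, v_hat])      # x_{-1} = x_0
M = (1.0 + 1.0 / lam) * np.eye(d) + lam * A.T @ A       # u-update matrix for f = 0.5*||u||^2
for _ in range(100):
    u = np.linalg.solve(M, -A.T @ v_hat + lam * A.T @ b + u_hat / lam)   # u_{n+1}
    v = v_hat + lam * (A @ u - b)                                        # dual part of x_{n+1}
    x = np.concatenate([u, v])
    y = x + theta * (x - x_prev1) + delta * (x_prev1 - x_prev2)          # y_{n+1}
    u_hat, v_hat = y[:d], y[d:]
    x_prev2, x_prev1 = x_prev1, x
print("||Au - b|| =", np.linalg.norm(A @ u - b))        # constraint residual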

4.2 Version of Primal–Dual Hybrid Gradient Method

In this section, let us consider the following coupled convex-concave saddle-point problem

minuH1maxvH2{ϕ(u,v)f(u)+Ku,vg(v)},\min_{u\in H_{1}}\max_{v\in H_{2}}\{\phi(u,v)\equiv f(u)+\langle Ku,v\rangle-g(v)\}, (69)

where f(H1),g(H2)f\in\mathcal{F}(H_{1}),g\in\mathcal{F}(H_{2}) and K(H1,H2)K\in\mathcal{B}(H_{1},H_{2}). The primal-dual hybrid gradient (PDHG) method (see, for example, [18, 29]) is one of the popular methods for solving problem  (69) and it is a preconditioned proximal point algorithm with λ=1\lambda=1 for the saddle subdifferential operator of ϕ\phi given in  (57) (see, [19, 34]). Given the associated preconditioner as

P=(1τIKK1σI)P=\left(\begin{array}[]{cc}\frac{1}{\tau}I&-K^{*}\\ -K&\frac{1}{\sigma}I\\ \end{array}\right) (70)

which is positive definite when τσK2<1\tau\sigma\|K\|^{2}<1, K:=supx1Kx\|K\|:=\sup_{\|x\|\leq 1}\|Kx\|; we adapt our proposed Algorithm 1 to solve problem  (69) as given below:

Algorithm 4 2-Step Inertial PDHG Method
1:Choose parameters δ\delta and θ\theta satisfying condition  (18). Choose f(H1),g(H2),K(H1,H2),u^0H1,v^0H2,τσK2<1,x1=x0=y0=(u^0,v^0)f\in\mathcal{F}(H_{1}),g\in\mathcal{F}(H_{2}),K\in\mathcal{B}(H_{1},H_{2}),\hat{u}_{0}\in H_{1},\hat{v}_{0}\in H_{2},\tau\sigma\|K\|^{2}<1,x_{-1}=x_{0}=y_{0}=(\hat{u}_{0},\hat{v}_{0}). Set n=0.n=0.
2:Compute as follows:
{un+1=argminuH1{f(u)+Ku,v^n+12τuu^n2},vn+1=argminvH2{g(v)K(2un+1u^n),v+12σvv^n2},xn+1=(un+1,vn+1),yn+1=(u^n+1,v^n+1)=xn+1+θ(xn+1xn)+δ(xnxn1)\displaystyle\left\{\begin{array}[]{ll}&u_{n+1}={\rm arg}\min\limits_{u\in H_{1}}\Big{\{}f(u)+\langle Ku,\hat{v}_{n}\rangle+\frac{1}{2\tau}\|u-\hat{u}_{n}\|^{2}\Big{\}},\\ &v_{n+1}={\rm arg}\min\limits_{v\in H_{2}}\Big{\{}g(v)-\langle K(2u_{n+1}-\hat{u}_{n}),v\rangle+\frac{1}{2\sigma}\|v-\hat{v}_{n}\|^{2}\Big{\}},\\ &x_{n+1}=(u_{n+1},v_{n+1}),\\ &y_{n+1}=(\hat{u}_{n+1},\hat{v}_{n+1})=x_{n+1}+\theta(x_{n+1}-x_{n})+\delta(x_{n}-x_{n-1})\end{array}\right. (75)
3:Set nn+1n\leftarrow n+1, and go to 2.
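
A minimal Python sketch of Algorithm 4 with the prox maps supplied as callables, for the illustrative saddle-point problem \min_{u}\max_{v}\ \frac{1}{2}\|u-a\|^{2}+\langle Ku,v\rangle-\frac{1}{2}\|v\|^{2} (both prox maps then have closed form). The operator K, the vector a and the parameters are made-up assumptions for this demo.

import numpy as np

rng = np.random.default_rng(2)
K = rng.standard_normal((4, 5))
a = rng.standard_normal(5)
tau = sigma = 0.9 / np.linalg.norm(K, 2)           # so that tau*sigma*||K||^2 = 0.81 < 1
theta, delta = 0.1, -0.05

prox_f = lambda x, t: (x + t * a) / (1.0 + t)      # prox of t*f with f(u) = 0.5*||u - a||^2
prox_g = lambda x, s: x / (1.0 + s)                # prox of s*g with g(v) = 0.5*||v||^2

u_hat, v_hat = np.zeros(5), np.zeros(4)
x_prev2 = x_prev1 = np.concatenate([u_hat, v_hat])
for _ in range(300):
    u = prox_f(u_hat - tau * K.T @ v_hat, tau)                 # u_{n+1}
    v = prox_g(v_hat + sigma * K @ (2 * u - u_hat), sigma)     # v_{n+1}
    x = np.concatenate([u, v])
    y = x + theta * (x - x_prev1) + delta * (x_prev1 - x_prev2)
    u_hat, v_hat = y[:5], y[5:]
    x_prev2, x_prev1 = x_prev1, x
print(np.linalg.norm(u - a + K.T @ v), np.linalg.norm(K @ u - v))   # optimality residuals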

4.3 Version of Douglas–Rachford Splitting Method

This section presents an application of Algorithm 1 to the Douglas-Rachford splitting method for finding zeros of an operator T that is the sum of two maximal monotone operators, i.e., T=A+B with A,B:H\to 2^{H} maximal monotone multi-functions on a Hilbert space H. The method was originally introduced in [27] in a finite-dimensional setting; its extension to maximal monotone mappings in Hilbert spaces can be found in [37].

The Douglas–Rachford splitting method [27, 37] iteratively applies the operator

G_{\lambda,A,B}:=J_{\lambda}^{A}\circ(2J_{\lambda}^{B}-I)+(I-J_{\lambda}^{B}) (76)

and has been found to be effective in many applications, including ADMM. In [28, Theorem 4], the Douglas–Rachford operator (76) was shown to be the resolvent J^{M}_{\lambda,A,B} of the maximal monotone operator

Mλ,A,B:=Gλ,A,B1I.M_{\lambda,A,B}:=G^{-1}_{\lambda,A,B}-I. (77)

Therefore, the Douglas–Rachford splitting method is a special case of the proximal point method (with λ=1\lambda=1), written as

vn+1=Jλ,A,BM(vn)=Gλ,A,B(vn).v_{n+1}=J^{M}_{\lambda,A,B}(v_{n})=G_{\lambda,A,B}(v_{n}). (78)

Hence, we can apply our Algorithm 1 to the Douglas-Rachford splitting method as given below:

Algorithm 5 2-Step Inertial Douglas-Rachford Splitting Method
1:Choose parameters δ\delta and θ\theta satisfying condition  (18). Choose u0=v0=v1H,λ>0u_{0}=v_{0}=v_{-1}\in H,\lambda>0. Set n=0.n=0.
2:Compute as follows:
{vn+1=Gλ,A,B(un),un+1=vn+1+θ(vn+1vn)+δ(vnvn1)\displaystyle\left\{\begin{array}[]{ll}&v_{n+1}=G_{\lambda,A,B}(u_{n}),\\ &u_{n+1}=v_{n+1}+\theta(v_{n+1}-v_{n})+\delta(v_{n}-v_{n-1})\end{array}\right. (81)
3:Set nn+1n\leftarrow n+1, and go to 2.

4.4 Version of Alternating Direction Method of Multipliers (ADMM)

Suppose H_{1},H_{2} and G are real Hilbert spaces, and let us consider the following linearly constrained convex problem

{minxH1,zH2f(x)+g(z)s.t.Ax+Bz=c,\displaystyle\left\{\begin{array}[]{ll}&\min\limits_{x\in H_{1},z\in H_{2}}f(x)+g(z)\\ &s.t.Ax+Bz=c,\end{array}\right. (84)

where f(H1),g(H2),A(H1,G),B(H2,G)f\in\mathcal{F}(H_{1}),g\in\mathcal{F}(H_{2}),A\in\mathcal{B}(H_{1},G),B\in\mathcal{B}(H_{2},G) and cGc\in G. The dual of problem  (84) is given by

maxvG{f(Av)g(Bv)+c,v},\max_{v\in G}\{-f^{*}(-A^{*}v)-g^{*}(-B^{*}v)+\langle c,v\rangle\}, (85)

where the conjugate of f is given by f^{*}(y):=\sup_{x\in H_{1}}\{\langle y,x\rangle-f(x)\} and the conjugate of g is given by g^{*}(y):=\sup_{z\in H_{2}}\{\langle y,z\rangle-g(z)\}. Solving the dual problem (85) is equivalent to solving the inclusion problem: find v\in G such that

0Af(Av)Bg(Bv)c.0\in-A\partial f^{*}(-A^{*}v)-B\partial g^{*}(-B^{*}v)-c. (86)

Using ideas similar to those in [35, Section 6.4], we adapt the 2-Step Inertial Douglas-Rachford Splitting Method in Algorithm 5 to the following 2-Step Inertial ADMM (where Algorithm 5 is applied to the monotone inclusion problem (86)):

Algorithm 6 2-Step Inertial ADMM
1:Choose parameters δ\delta and θ\theta satisfying condition  (18). Choose f(H1),g(H2),A(H1,G),B(H2,G),x0H1,z0H2,v^0G,λ>0f\in\mathcal{F}(H_{1}),g\in\mathcal{F}(H_{2}),A\in\mathcal{B}(H_{1},G),B\in\mathcal{B}(H_{2},G),x_{0}\in H_{1},z_{0}\in H_{2},\hat{v}_{0}\in G,\lambda>0. Set n=0.n=0.
2:Compute as follows:
{xn+1=argminxH1{f(x)+v^n,Ax+Bznc+λ2Ax+Bznc2},η^n=v^n,n=0,1,η^n=v^n+θ(v^nv^n1+λA(xn+1xn))+δ(v^n1v^n2+λA(xnxn1))n=2,3,zn+1=argminzH2{g(z)+η^n,Axn+1+Bzc+λ2Axn+1+Bzc2},v^n+1=η^n+λ(Axn+1+Bzn+1c)\displaystyle\left\{\begin{array}[]{ll}&x_{n+1}={\rm arg}\min\limits_{x\in H_{1}}\Big{\{}f(x)+\langle\hat{v}_{n},Ax+Bz_{n}-c\rangle+\frac{\lambda}{2}\|Ax+Bz_{n}-c\|^{2}\Big{\}},\\ &\hat{\eta}_{n}=\hat{v}_{n},n=0,1,\\ &\hat{\eta}_{n}=\hat{v}_{n}+\theta(\hat{v}_{n}-\hat{v}_{n-1}+\lambda A(x_{n+1}-x_{n}))+\delta(\hat{v}_{n-1}-\hat{v}_{n-2}+\lambda A(x_{n}-x_{n-1}))\\ &n=2,3,\ldots\\ &z_{n+1}={\rm arg}\min\limits_{z\in H_{2}}\Big{\{}g(z)+\langle\hat{\eta}_{n},Ax_{n+1}+Bz-c\rangle+\frac{\lambda}{2}\|Ax_{n+1}+Bz-c\|^{2}\Big{\}},\\ &\hat{v}_{n+1}=\hat{\eta}_{n}+\lambda(Ax_{n+1}+Bz_{n+1}-c)\end{array}\right. (93)
3:Set nn+1n\leftarrow n+1, and go to 2.

We remark that the idea of Algorithm 6 can be applied to stochastic ADMM or its multi-block extensions in the recent works in [10, 11] for solving problems arising in machine learning because the basic model therein is a two-block separable convex optimization problem.

5 Numerical Illustrations

The focus of this section is to provide some computational experiments to demonstrate the effectiveness, accuracy and ease of implementation of our proposed algorithms. We further compare our proposed schemes with some existing methods in the literature. All codes were written in MATLAB R2020b and run on a desktop PC with an Intel(R) Core(TM) i7-6600U CPU @ 3.00 GHz and 32.00 GB of RAM.

Numerical comparisons are made with the algorithm proposed in [20], denoted Chen et al. Alg. 1, the algorithm proposed in [35], denoted Kim Alg., and the algorithm proposed in [9], denoted P-PPA Alg.

Example 5.1.

First, we compare our proposed 2-Step Inertial Proximal Method of Multipliers in Algorithm 3 with the methods in [20, 35] using the following basis pursuit problem:

\min\limits_{u\in\mathbb{R}^{N}}\|u\|_{1}\quad{\rm subject\ to}\quad Au=b, (96)

where A\in\mathbb{R}^{M\times N} and b\in\mathbb{R}^{M}. In our experiments, we consider N=200,500 and M=50,100, and A is randomly generated. A true sparse u^{*} is randomly generated, followed by thresholding to sparsify the nonzero elements, and b is then given by Au^{*}. We run 100 iterations of the proximal method of multipliers and its variants with initial point x_{0}=\textbf{0}. Since the u_{n+1}-update does not have a closed form, we use a sufficient number of iterations of the strongly convex version of FISTA [12] (see [16, Theorem 4.10]) to solve it. The stopping criterion for this example is D_{n}:=\|y_{n}-x_{n+1}\|_{2}\leq\epsilon, where \epsilon=10^{-4}. The parameters used are provided in Table 1.

Table 1: Methods Parameters for Example 5.1
Proposed Alg. 3 λ_1=10^{-4} θ=0.1 δ=-0.14412
Chen et al. Alg. λ_1=10^{-4} θ=0.1 δ=0
Kim Alg. λ_1=10^{-4}
Table 2: Example 5.1 comparison for different N and M
(N,M)=(200,50) (N,M)=(200,100)
No. of Iter. CPU Time No. of Iter. CPU Time
Proposed Alg. 3 3 4.3561×10^{-3} 3 4.1955×10^{-3}
Chen et al. Alg. 1 3 5.8181×10^{-3} 3 4.7572×10^{-3}
Kim Alg. 3 5.0265×10^{-3} 3 5.5091×10^{-3}
(N,M)=(500,50) (N,M)=(500,100)
No. of Iter. CPU Time No. of Iter. CPU Time
Proposed Alg. 3 3 2.7343×10^{-2} 3 3.0797×10^{-2}
Chen et al. Alg. 1 3 3.0817×10^{-2} 3 3.1057×10^{-2}
Kim Alg. 3 2.8120×10^{-2} 3 3.8054×10^{-2}
Figure 1: Example 5.1: (N,M)=(200,50)
Figure 2: Example 5.1: (N,M)=(200,100)
Figure 3: Example 5.1: (N,M)=(500,50)
Figure 4: Example 5.1: (N,M)=(500,100)
Example 5.2.

In this example, we apply our proposed 2-Step Inertial ADMM (Algorithm 6) to the problem

\min\limits_{x\in\mathbb{R}^{N},\,z\in\mathbb{R}^{M}}\ \frac{1}{2}\|Fx-b\|^{2}+\gamma\|z\|_{1}\quad{\rm s.t.}\quad Dx-z=0, (99)

where γ>0\gamma>0 and compare with the methods in [9, 20, 35]. The problem  (99) is associated with the total-variation-regularized least squares problem

minxN12Fxb2+γDx1,\min_{x\in\mathbb{R}^{N}}\frac{1}{2}\|Fx-b\|^{2}+\gamma\|Dx\|_{1}, (100)

where Fp×N,bpF\in\mathbb{R}^{p\times N},b\in\mathbb{R}^{p}, and a matrix DM×ND\in\mathbb{R}^{M\times N} is given as

D=(110000110001100011)\displaystyle D=\left(\begin{array}[]{ccccccc}-1&1&0&0&\ldots&0&\\ 0&1&-1&0&\ldots&0&\\ \vdots&\ddots&\ddots&\ddots&\ddots&\vdots&\\ \vdots&\ddots&0&1&-1&0&\\ 0&\ldots&\ldots&0&1&-1&\end{array}\right)

Taking f(x):=\frac{1}{2}\|Fx-b\|^{2}, g(z):=\gamma\|z\|_{1}, A=D, B=-I and c=0 in (84), our 2-Step Inertial ADMM (Algorithm 6) reduces to:

Algorithm 7 2-Step Inertial ADMM
1:Choose parameters δ\delta and θ\theta satisfying condition  (18). Choose x0N,z0M,v^0M,λ>0x_{0}\in\mathbb{R}^{N},z_{0}\in\mathbb{R}^{M},\hat{v}_{0}\in\mathbb{R}^{M},\lambda>0. Set n=0.n=0.
2:Compute as follows:
{xn+1=argminxN{12Fxb2+v^n,Dxzn+λ2Dxzn2}=(FTF+λDTD)1(DT(λznv^n)+FTb),η^n=v^n,n=0,1,η^n=v^n+θ(v^nv^n1+λD(xn+1xn))+δ(v^n1v^n2+λD(xnxn1))n=2,3,zn+1=argminzM{γz1+η^n,Dxn+1z+λ2Dxn+1z2}=Sγλ(Dxn+1+1λη^n)v^n+1=η^n+λ(Dxn+1zn+1),\displaystyle\left\{\begin{array}[]{ll}&x_{n+1}={\rm arg}\min\limits_{x\in\mathbb{R}^{N}}\Big{\{}\frac{1}{2}\|Fx-b\|^{2}+\langle\hat{v}_{n},Dx-z_{n}\rangle+\frac{\lambda}{2}\|Dx-z_{n}\|^{2}\Big{\}}\\ &=(F^{T}F+\lambda D^{T}D)^{-1}(D^{T}(\lambda z_{n}-\hat{v}_{n})+F^{T}b),\\ &\hat{\eta}_{n}=\hat{v}_{n},n=0,1,\\ &\hat{\eta}_{n}=\hat{v}_{n}+\theta(\hat{v}_{n}-\hat{v}_{n-1}+\lambda D(x_{n+1}-x_{n}))+\delta(\hat{v}_{n-1}-\hat{v}_{n-2}+\lambda D(x_{n}-x_{n-1}))\\ &n=2,3,\ldots\\ &z_{n+1}={\rm arg}\min\limits_{z\in\mathbb{R}^{M}}\Big{\{}\gamma\|z\|_{1}+\langle\hat{\eta}_{n},Dx_{n+1}-z\rangle+\frac{\lambda}{2}\|Dx_{n+1}-z\|^{2}\Big{\}}\\ &=S_{\frac{\gamma}{\lambda}}\Big{(}Dx_{n+1}+\frac{1}{\lambda}\hat{\eta}_{n}\Big{)}\\ &\hat{v}_{n+1}=\hat{\eta}_{n}+\lambda(Dx_{n+1}-z_{n+1}),\end{array}\right. (110)
3:Set nn+1n\leftarrow n+1, and go to 2.

where S_{\tau}(z):=\max\{|z|-\tau,0\}\odot{\rm sign}(z) is the soft-thresholding operator, with |\cdot|, \max\{\cdot,\cdot\} and \odot denoting the element-wise absolute value, maximum and multiplication operators, respectively.
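
For illustration, a minimal Python sketch of Algorithm 7 on a tiny randomly generated instance; the problem sizes, the first-difference matrix D, and all parameter values below are assumptions made only for this demo.

import numpy as np

def soft_threshold(z, tau):
    # S_tau(z) = max(|z| - tau, 0) * sign(z), applied elementwise
    return np.maximum(np.abs(z) - tau, 0.0) * np.sign(z)

rng = np.random.default_rng(3)
N, M, p = 20, 19, 5
F = rng.standard_normal((p, N))
b = rng.standard_normal(p)
D = np.zeros((M, N))
for i in range(M):                       # simple first-difference matrix
    D[i, i], D[i, i + 1] = -1.0, 1.0
lam, gamma, theta, delta = 0.1, 0.01, 0.1, -1e-3

x_n = x_nm1 = np.zeros(N)
z = np.zeros(M)
v_n = v_nm1 = v_nm2 = np.zeros(M)
Mat = F.T @ F + lam * D.T @ D            # system matrix of the x-update
for n in range(200):
    x_np1 = np.linalg.solve(Mat, D.T @ (lam * z - v_n) + F.T @ b)
    if n < 2:
        eta = v_n.copy()
    else:
        eta = (v_n + theta * (v_n - v_nm1 + lam * D @ (x_np1 - x_n))
               + delta * (v_nm1 - v_nm2 + lam * D @ (x_n - x_nm1)))
    z = soft_threshold(D @ x_np1 + eta / lam, gamma / lam)
    v_np1 = eta + lam * (D @ x_np1 - z)
    x_nm1, x_n = x_n, x_np1
    v_nm2, v_nm1, v_n = v_nm1, v_n, v_np1
print("||Dx - z|| =", np.linalg.norm(D @ x_n - z))   # primal residual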

In this numerical test, we consider different cases for the choices of N, M, and p. A true vector x^{*} is constructed such that the vector Dx^{*} has few nonzero elements. A matrix F is randomly generated, and a noisy vector b is generated by adding a randomly generated (noise) vector to Fx^{*}.

Case 1: N=100N=100, M=99M=99, and p=5p=5
Case 2: N=200N=200, M=199M=199, and p=10p=10
Case 3: N=300N=300, M=299M=299, and p=20p=20
Case 4: N=400N=400, M=399M=399, and p=40p=40

The stopping criterion for this example is D_{n}\leq\epsilon, where \epsilon=10^{-5}. The parameters used are provided in Table 3.

Table 3: Methods Parameters for Example 5.2
Proposed Alg. 6 λ_1=0.1 θ=0.1 δ=-10^{-3} γ=0.01
D_n:=||Dx_n-z_n||_2^2
Chen et al. Alg. λ_1=0.1 θ=0.1 δ=0 γ=0.01
D_n:=||Dx_n-z_n||_2^2
Kim Alg. λ_1=0.1 γ=0.01 D_n:=||Dx_n-z_n||_2^2
P-PPA Alg. s=10 ρ=6 ε=1.5 τ=1.5
σ=1.1/s μ=0.1||F^{T}b||_∞ D_n:=||y_{n+1}-y_n||_2
Table 4: Example 5.2 comparison for different cases
Case 1 Case 2
No. of Iter. CPU Time No. of Iter. CPU Time
Proposed Alg. 6 7 1.9990×10^{-4} 21 1.2536×10^{-3}
Chen et al. Alg. 1 8 1.9240×10^{-4} 43 2.6423×10^{-3}
Kim Alg. 11 2.9860×10^{-4} 41 2.3246×10^{-3}
P-PPA Alg. 39 1.2939×10^{-3} 40 2.6799×10^{-3}
Case 3 Case 4
No. of Iter. CPU Time No. of Iter. CPU Time
Proposed Alg. 6 21 2.2213×10^{-3} 24 7.2828×10^{-3}
Chen et al. Alg. 1 52 6.4371×10^{-3} 60 2.3097×10^{-2}
Kim Alg. 24 2.7002×10^{-3} 43 1.5199×10^{-2}
P-PPA Alg. 41 3.5834×10^{-3} 44 1.2796×10^{-2}
Figure 5: Example 5.2: Case 1
Figure 6: Example 5.2: Case 2
Figure 7: Example 5.2: Case 3
Figure 8: Example 5.2: Case 4
Remark 5.3.


From the above numerical Examples 5.1 and 5.2, we give the following remarks.

  • (1).

    As can be seen from Figures 2–8 and Tables 2 and 4, our proposed algorithms are clearly easy to implement, efficient and accurate.

  • (2).

    Clearly, in the numerical Example 5.2, our proposed Algorithm 6 outperforms the algorithm proposed by Chen et al. in [20], the algorithm proposed by Kim in [35], and the algorithm proposed by Bai et al. in [9] with respect to both the number of iterations and the CPU time. The proposed Algorithm 3 also competes favorably with Chen et al. Alg. 1 in [20] and the algorithm of Kim in [35] in Example 5.1.

6 Conclusion

We have introduced in this paper a two-step inertial proximal point algorithm for the monotone inclusion problem in Hilbert spaces. Weak convergence of the sequence of iterates is obtained under standard conditions, and a non-asymptotic O(1/n) rate of convergence in the non-ergodic sense is given. We support the theoretical analysis of our method with numerical illustrations derived from the basis pursuit problem and a numerical implementation of the two-step inertial ADMM. Preliminary numerical results show that our proposed method is competitive and outperforms some related and recent proximal point algorithms in the literature. Part of our future work is to study the two-step inertial proximal point algorithm with a correction term for the monotone inclusion problem considered in this paper.

Acknowledgements

The authors are grateful to the associate editor and the two anonymous referees for their insightful comments and suggestions, which have greatly improved the earlier version of this paper.

References

  • [1] Aluffi-Pentini, F., Parisi, V., Zirilli, F. Algorithm 617. DAFNE: a differential-equations algorithm for nonlinear equations. ACM Trans. Math. Software 10 (1984), 317-324.
  • [2] Alvarez F. On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38 (2000), 1102–1119.
  • [3] Alvarez F., Attouch H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9 (2001), 3–11.
  • [4] Alvarez F. Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14 (2004), 773–782.
  • [5] Antipin A. S. Minimization of convex functions on convex sets by means of differential equations. Differentsial’nye Uravneniya 30 (1994), 1475-1486, 1652; translation in Differential Equations 30 (1994), 1365–1375.
  • [6] Attouch H., Cabot, A. Convergence rate of a relaxed inertial proximal algorithm for convex minimization. Optimization 69 (2020), 1281-1312.
  • [7] Attouch H., Peypouquet J., Redont P. A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J. Optim. 24 (2014), 232-256.
  • [8] Aujol J.-F. , Dossal C., Rondepierre A. Optimal Convergence Rates for Nesterov Acceleration. SIAM J. Optim. 29 (2019), 3131-3153.
  • [9] Bai, J., Zhang, H., Li, J. A parameterized proximal point algorithm for separable convex optimization. Optim. Lett. 12 (2018), 1589-1608.
  • [10] Bai, J., Hager, W., Zhang, H. An inexact accelerated stochastic ADMM for separable composite convex optimization. Comput. Optim. Appl. 81 (2022), 479-518.
  • [11] Bai, J., Han, D., Sun, H., Zhang, H. Convergence analysis of an inexact accelerated stochastic ADMM with larger stepsizes. CSIAM Trans. Appl. Math. To appear, arXiv:2103.16154v2, (2022).
  • [12] A. Beck and M. Teboulle; A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. 2 (1) (2009), 183-202.
  • [13] Bruck R. E., Jr. Asymptotic convergence of nonlinear contraction semigroups in Hilbert space. J. Functional Analysis 18 (1975), 15–26.
  • [14] Bot R. I., Csetnek E. R., Hendrich C. Inertial Douglas-Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256 (2015), 472–487.
  • [15] Bot R. I., Csetnek E. R. An inertial alternating direction method of multipliers. Minimax Theory Appl. 1 (2016), 29-49.
  • [16] Chambolle, A., Pock, T. An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016).
  • [17] Chambolle, A., Pock, T. On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159 (2016), Ser. A, 253–287.
  • [18] A. Chambolle, T. Pock; A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging and Vision 40(1) (2011), 120-145.
  • [19] A. Chambolle and T. Pock; On the ergodic convergence rates of a first-order primal–dual algorithm, Math. Program. 159 (2016), 253-287.
  • [20] Chen, C., Ma, S., Yang, J. A general inertial proximal point algorithm for mixed variational inequality problem. SIAM J. Optim. 25 (2015), 2120–2142.
  • [21] Cholamjiak P., Thong D.V., Cho Y.J. A novel inertial projection and contraction method for solving pseudomonotone variational inequality problems. Acta Appl. Math. 169 (2020), 217-245.
  • [22] Combettes, P. L., Glaudin, L. E. Quasi-nonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27 (2017), 2356-2380.
  • [23] Corman, E., Yuan, X. M. A generalized proximal point algorithm and its convergence rate. SIAM J. Optim. 24 (2014), 1614-1638.
  • [24] Davis, D., Yin W. Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42 (2017), 783-805.
  • [25] Dong, Q.L., Huang, J.Z., Li, X.H., Cho, Y. J., Rassias, Th. M. MiKM: multi-step inertial Krasnosel’skii–Mann algorithm and its applications. J. Global Optim. 73, 801–824 (2019).
  • [26] Dong Q. L., Jiang D., Cholamjiak P., Shehu Y. A strong convergence result involving an inertial forward-backward algorithm for monotone inclusions. J. Fixed Point Theory Appl. 19 (2017), 3097-3118.
  • [27] J. Douglas and H.H. Rachford; On the numerical solution of heat conduction problems in two or three space variables, Trans. Amer. Math. Soc. 82 (1956), 421-439.
  • [28] Eckstein, J., Bertsekas, D.P. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55 (1992), Ser. A, 293-318.
  • [29] Esser, E., Zhang, X., Chan, T. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3(4), 1015–46 (2010).
  • [30] Gabay, D., Mercier, B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2 (1976), 17-40.
  • [31] Glowinski, R., Marrocco, A. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. (French) Rev. Française Automat. Informat. Recherche Opérationnelle Sér. Rouge Anal. Numér. 9 (1975), no. R-2, 41-76.
  • [32] Güler, O. New proximal point algorithms for convex minimization. SIAM J. Optim. 2 (1992), 649-664.
  • [33] Hestenes, M. R. Multiplier and gradient methods. J. Optim. Theory Appl. 4 (1969), 303-320.
  • [34] He, B., Yuan, X. Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–49 (2012).
  • [35] Kim, D., Accelerated proximal point method for maximally monotone operators. Math. Program. 190 (2021), 57–87.
  • [36] J. Liang, Convergence rates of first-order operator splitting methods. PhD thesis, Normandie Université; GREYC CNRS UMR 6072, 2016.
  • [37] P.L. Lions and B. Mercier; Splitting algorithms for the sum of two nonlinear operators, SIAM J. Numer. Anal. 16 (1979), 964-979.
  • [38] Lorenz, D. A., Pock, T. An inertial forward-backward algorithm for monotone inclusions. J. Math. Imaging Vision 51 (2015), 311-325.
  • [39] Ma, F., Ni, M. A class of customized proximal point algorithms for linearly constrained convex optimization. Comp. Appl. Math. 37 (2018), 896-911.
  • [40] Maingé, P. E., Merabet, N. A new inertial-type hybrid projection-proximal algorithm for monotone inclusions. Appl. Math. Comput. 215 (2010), 3149-3162.
  • [41] Martinet, B. Régularisation d’inéquations variationnelles par approximations successives. (French) Rev. Française Informat. Recherche Opérationnelle 4 (1970), Sér. R-3, 154–158.
  • [42] Tao, M., Yuan, X. On the optimal linear convergence rate of a generalized proximal point algorithm. J. Sci. Comput. 74 (2018), 826-850.
  • [43] Moreau, J. J. Proximité et dualité dans un espace Hilbertien. (French) Bull. Soc. Math. France 93 (1965), 273-299.
  • [44] Moudafi, A., Elisabeth, E. An approximate inertial proximal method using the enlargement of a maximal monotone operator. Int. J. Pure Appl. Math. 5 (2003), 283-299.
  • [45] Polyak, B. T. Some methods of speeding up the convergence of iteration methods. U.S.S.R. Comput. Math. Math. Phys. 4 (5) (1964), 1-17.
  • [46] B. T. Polyak, Introduction to optimization. New York, Optimization Software, Publications Division, 1987.
  • [47] Poon, C., Liang, J. Trajectory of alternating direction method of multipliers and adaptive acceleration. In Advances in Neural Information Processing Systems (2019).
  • [48] C. Poon, J. Liang, Geometry of First-Order Methods and Adaptive Acceleration, arXiv:2003.03910.
  • [49] Powell, M. J. D. A method for nonlinear constraints in minimization problems. 1969 Optimization (Sympos., Univ. Keele, Keele, 1968) pp. 283–298 Academic Press, London.
  • [50] Rockafellar, R. T. Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14 (1976), 877–898.
  • [51] R. T. Rockafellar; Monotone operators and the proximal point algorithm, SIAM J. Control. Optim. 14 (1976) 877-898.