

Convergence Rate of Inertial Forward-Backward Splitting Algorithms Based on the Local Error Bound Condition

Hongwei Liu, Ting Wang
School of Mathematics and Statistics, Xidian University, Xi'an 710126, China
Email: hwliuxidian@163.com; corresponding author email: wangting_7640@163.com

and

Zexian Liu
State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, AMSS, Chinese Academy of Sciences, Beijing 100190, China;
School of Mathematics and Statistics, Guizhou University, Guiyang 550025, China
Email: liuzexian2008@163.com
Abstract

The Inertial Forward-Backward algorithm (IFB) is a powerful tool for convex nonsmooth minimization problems. Under the local error bound condition, $R$-linear convergence rates of the sequences of objective values and iterates have been proved when the inertial parameter $\gamma_k$ satisfies $\sup_k \gamma_k < 1$. However, the convergence behaviour when $\sup_k \gamma_k = 1$ is not known. In this paper, based on the local error bound condition, we introduce a new assumption on the key parameter $t_k$ in IFB, which implies $\lim_{k\to\infty}\gamma_k = 1$, and we establish the convergence rate of the function values and the strong convergence of the iterates generated by the IFB algorithms with six choices of $t_k$ satisfying this assumption in Hilbert space. Remarkably, under the local error bound condition, we show that the IFB algorithms with some choices of $t_k$ achieve a sublinear convergence rate of $o\left(\frac{1}{k^{p}}\right)$ for any constant $p>1$. In addition, we propose a class of Inertial Forward-Backward algorithms with an adaptive modification and show that they enjoy the same convergence results as IFB under the error bound condition. Numerical experiments are conducted to illustrate our results.

Keywords: Inertial Forward-Backward algorithm; local error bound condition; rate of convergence.

1 Introduction

Let $H$ be a real Hilbert space, let $f:H\to\mathbb{R}$ be a convex, continuously differentiable function with $L_f$-Lipschitz continuous gradient, and let $g:H\to\mathbb{R}\cup\{+\infty\}$ be a proper lower semi-continuous convex function. We also assume that the proximal operator of $\lambda g$, i.e.,

$$\mathrm{prox}_{\lambda g}(y) := \arg\min_{x\in H}\left\{ g(x)+\frac{1}{2\lambda}\|x-y\|^{2}\right\}, \quad (1)$$

can be easily computed for all $\lambda>0$. In this paper, we consider the following problem:

$$(P)\qquad \min_{x\in H} F(x) := f(x)+g(x).$$

We assume that problem $(P)$ is solvable, i.e., $X^{*}:=\arg\min F\neq\emptyset$, and for $x_{*}\in X^{*}$ we set $F_{*}:=F(x_{*})$.

In order to solve problem $(P)$, several algorithms based on the proximal operator have been proposed to handle the non-differentiable part. One can consult Johnstone & Moulin (2017), Moudafi & Oliny (2003) and Villa & Salzo (2013) for a recent account of proximal-based algorithms, which play a central role in nonsmooth optimization. A typical strategy for solving problem $(P)$ is the Inertial Forward-Backward algorithm (IFB), which consists in iteratively applying the non-expansive operator $T_{\lambda}:H\to H$ defined as

$$T_{\lambda}(x) := \mathrm{prox}_{\lambda g}\left(x-\lambda\nabla f(x)\right) \quad \forall x\in H.$$
Algorithm 1 Inertial Forward-Backward algorithm (IFB)

Step 0. Take $y_1=x_0\in\mathbb{R}^{n}$, $t_1=1$. Input $\lambda=\frac{\mu}{L_f}$, where $\mu\in\left]0,1\right[$.
Step k. Compute
        $x_k=T_{\lambda}(y_k)=\mathrm{prox}_{\lambda g}\left(y_k-\lambda\nabla f(y_k)\right)$
        $y_{k+1}=x_k+\gamma_k(x_k-x_{k-1})$, where $\gamma_k=\frac{t_k-1}{t_{k+1}}$.
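For concreteness, a minimal Python sketch of one way Algorithm 1 could be implemented for the LASSO instance used later in Section 5 is given below; the problem data `A`, `b`, the weight `delta`, the iteration budget and the callable `t_seq` are assumptions of this sketch, not part of the paper's code.

```python
import numpy as np

def soft_threshold(z, tau):
    # proximal operator of tau * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ifb(A, b, delta, t_seq, mu=0.98, n_iter=500):
    """Algorithm 1 for min 0.5*||Ax - b||^2 + delta*||x||_1.
    t_seq(k) returns t_k (1-indexed); gamma_k = (t_k - 1) / t_{k+1}."""
    Lf = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    lam = mu / Lf                             # step size lambda = mu / Lf
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()
    for k in range(1, n_iter + 1):
        grad = A.T @ (A @ y - b)              # gradient of the smooth part at y_k
        x = soft_threshold(y - lam * grad, lam * delta)   # x_k = T_lambda(y_k)
        gamma = (t_seq(k) - 1.0) / t_seq(k + 1)
        y = x + gamma * (x - x_prev)          # inertial step
        x_prev = x
    return x

# e.g. the FISTA_CD choice t_k = (k - 1 + a)/a with a = 4
x_sol = ifb(np.random.randn(20, 50), np.random.randn(20), 1.0,
            lambda k: (k - 1 + 4.0) / 4.0)
```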

In view of the structure of IFB, it is easy to see that the inertial term $\gamma_k$ plays an important role in improving the speed of convergence of IFB. Based on Nesterov's extrapolation technique (see Nesterov, 2019), Beck and Teboulle proposed a "fast iterative shrinkage-thresholding algorithm" (FISTA) with $t_1=1$ and $t_{k+1}=\frac{1+\sqrt{1+4t_k^{2}}}{2}$ for solving $(P)$ (see Beck & Teboulle, 2009). The remarkable properties of this algorithm are its computational simplicity and the significantly better global rate of convergence of the function values, namely $F(x_k)-F(x_{*})=O\left(\frac{1}{k^{2}}\right)$. Several variants of FISTA have been considered in works such as Apidopoulos & Aujol (2020), Chambolle & Dossal (2015), Calatroni & Chambolle (2019), Donghwan & Jeffrey (2018), Mridula & Shukla (2020), Su & Boyd (2016) and Tao & Boley (2016), where properties such as the convergence of the iterates and the rate of convergence of the function values have also been studied.

Chambolle and Dossal (see Chambolle & Dossal, 2015) pointed out that FISTA satisfies a better worst-case estimate; however, the convergence of its iterates is not known. They proposed a new choice $t_k=\frac{k-1+a}{a}$ $(a>2)$ and showed that the iterates generated by the corresponding IFB, named "FISTA_CD", converge weakly to a minimizer of $F$. Attouch and Peypouquet (see Attouch & Peypouquet, 2016) further proved that the sequence of function values generated by FISTA_CD approximates the optimal value of the problem at a rate strictly faster than $O\left(\frac{1}{k^{2}}\right)$, namely $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{2}}\right)$. Apidopoulos et al. (see Apidopoulos & Aujol, 2020) noticed that the basic idea behind the choices of $t_k$ in Attouch & Cabot (2018), Beck & Teboulle (2009) and Chambolle & Dossal (2015) is Nesterov's rule $t_k^{2}-t_{k+1}^{2}+t_{k+1}\geq 0$, and they focused on the case where Nesterov's rule is not satisfied. They studied $\gamma_k=\frac{k}{k+b}$ with $0<b<3$ and found the exact estimate $F(x_k)-F(x_{*})=O\left(\frac{1}{k^{\frac{2b}{3}}}\right)$. Attouch and Cabot (see Attouch & Cabot, 2018) considered various options of $\gamma_k$ and analyzed the convergence rate of the function values and the weak convergence of the iterates under suitable assumptions; they further showed that strong convergence of the iterates holds for special choices of $f$. Wen, Chen and Pong (see Wen & Chen, 2017) showed that for the nonsmooth convex minimization problem $(P)$, under the local error bound condition (see Tseng & Yun, 2009), $R$-linear convergence of both the sequence $\{x_k\}$ and the corresponding sequence of objective values $\{F(x_k)\}$ holds if $\sup_k\gamma_k<1$; they also pointed out that the sequences $\{x_k\}$ and $\{F(x_k)\}$ generated by FISTA with fixed restart, or with both fixed and adaptive restart schemes (see O'Donoghue & Candès, 2015), are $R$-linearly convergent under the error bound condition. However, the local convergence rate of the iterates generated by FISTA for solving $(P)$ is still unknown, even under the local error bound condition.

The local error bound condition, which estimates the distance from $x$ to $X^{*}$ by the norm of the proximal residual at $x$, has proved to be extremely useful in analyzing the convergence rates of a host of iterative methods for solving optimization problems (see Zhou & So, 2017). Major contributions on developing and using the error bound condition to derive convergence results of iterative algorithms can be found in a series of papers (see, e.g., Hai, 2020; Luo & Tseng, 1992; Necoara & Nesterov, 2019; Tseng & Yun, 2009, 2010; Tseng, 2010; Zhou & So, 2017). Zhou and So (see Zhou & So, 2017) established error bounds for minimizing the sum of a smooth convex function and a general closed proper convex function; such problems include general constrained minimization problems and various regularized loss minimization formulations in machine learning, signal processing, and statistics. Many choices of $f$ and $g$ satisfy the local error bound condition, including:

  • (Pang, 1987, Theorem 3.1) $f$ is strongly convex and $g$ is arbitrary.

  • (Luo & Tseng, 1992a, Theorem 2.3) $f$ is a quadratic function and $g$ is a polyhedral function.

  • (Luo & Tseng, 1992, Theorem 2.1) $g$ is a polyhedral function and $f=h(Ax)+\langle c,x\rangle$, where $A\in\mathbb{R}^{m\times n}$, $c\in\mathbb{R}^{n}$ and $h$ is a continuously differentiable function with Lipschitz continuous gradient that is strongly convex on any compact convex set. This covers the well-known LASSO.

  • (Luo & Tseng, 1993, Theorem 4.1) $g$ is a polyhedral function and $f(x)=\max_{y\in Y}\left\{(Ax)^{T}y-h(y)\right\}+q^{T}x$, where $Y$ is a polyhedral set and $h$ is a strongly convex differentiable function with Lipschitz continuous gradient.

  • (Tseng, 2010, Theorem 2) $f$ takes the form $f(x)=h(Ax)$, where $h$ is as in the third item above and $g$ is the grouped LASSO regularizer.

More examples satisfying the error bound condition can be found in Tseng (2010), Zhou & So (2017), Tseng & Yun (2009), Pang (1987) and Luo & Tseng (1992).

It has been observed numerically that first-order methods for solving such structured instances of problem $(P)$ converge at a much faster rate than suggested by the theory in Tao & Boley (2016), Xiao & Zhang (2013) and Zhou & So (2017). A very powerful approach to analyzing this phenomenon is the local error bound condition. Hence, the first focus of this work is the improved convergence rate of IFB with some special choices of $t_k$ under the local error bound condition.

We also pay attention to Nesterov's rule $t_k^{2}-t_{k+1}^{2}+t_{k+1}\geq 0$. For $t_k$ satisfying this rule, one can derive that $t_{k+1}-t_k<1$ and that $\sum_{k=1}^{+\infty}\frac{1}{t_k}$ diverges, which greatly limits the choice of $t_k$. Our question is whether a more suitable $t_k$, and improved theoretical results, can be obtained if Nesterov's rule is replaced by a new rule that we propose.
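A short verification of these two consequences (a routine calculation, added here for completeness): Nesterov's rule is equivalent to $t_{k+1}^{2}-t_{k+1}\leq t_k^{2}$, hence

$$t_{k+1}\leq\frac{1}{2}+\sqrt{\frac{1}{4}+t_k^{2}}\leq\frac{1}{2}+\left(\frac{1}{2}+t_k\right)=t_k+1,$$

with strict inequality whenever $t_k>0$; by induction $t_k\leq t_1+k-1$, so $\frac{1}{t_k}\geq\frac{1}{t_1+k-1}$ and $\sum_{k=1}^{+\infty}\frac{1}{t_k}$ diverges by comparison with the harmonic series.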

Contributions.

In this paper, based on the local error bound condition, we introduce an assumption on the key parameter $t_k$ in IFB and prove convergence results, including the convergence rate of the function values and the strong convergence of the iterates generated by the corresponding IFB. The assumption imposed on $t_k$ provides a theoretical basis for choosing a new $t_k$ in IFB for problems satisfying the local error bound condition, such as LASSO. We use a "comparison method" to discuss six choices of $t_k$, which include the ones in the original FISTA (see Beck & Teboulle, 2009) and FISTA_CD (see Chambolle & Dossal, 2015) and satisfy our assumption, and we separately establish the improved convergence rates of the function values and the sublinear convergence of the iterates generated by the corresponding IFBs. We also establish the same convergence results for IFB with an adaptive modification (IFB_AdapM), which performs well in numerical experiments. Remarkably, under the local error bound condition, the strong convergence of the iterates generated by the original FISTA is established, the convergence rate of the function values for FISTA_CD is improved to $o\left(\frac{1}{k^{2(a+1)}}\right)$, and the IFB algorithms with some choices of $t_k$ achieve a sublinear convergence rate of $o\left(\frac{1}{k^{p}}\right)$ for any constant $p>1$.

2 A new assumption condition on $t_k$ and the convergence of the corresponding IFB algorithms

In this section, we derive a new assumption condition on the parameter $t_k$ in IFB, and analyze the convergence of the corresponding IFB under the local error bound condition.

We start by recalling a key result, which plays an important role in our theoretical analysis.

Lemma 2.1.

(Chambolle & Pock, 2016, ineq. (4.36)) For any $y\in\mathbb{R}^{n}$ and $\lambda=\frac{\mu}{L_f}$ with $\mu\in\left(0,1\right]$, we have

$$\forall x\in\mathbb{R}^{n}:\quad F\left(T_{\lambda}(y)\right)\leq F(x)+\frac{1}{2\lambda}\|x-y\|^{2}-\frac{1-\mu}{2\lambda}\|T_{\lambda}(y)-y\|^{2}-\frac{1}{2\lambda}\|T_{\lambda}(y)-x\|^{2}. \quad (2)$$

Next, we give a very weak assumption under which the sequence $\{F(x_k)\}$ generated by Algorithm 1 with $0\leq\gamma_k\leq 1$ for all sufficiently large $k$ converges to $F(x_{*})$, independently of $t_k$.

Assumption $A_0$: For any $\xi_0\geq F_{*}$, there exist $\epsilon_0>0$ and $\tau_0>0$ such that

$$\mathrm{dist}\left(x,X^{*}\right)\leq\tau_0 \quad (3)$$

whenever $\left\|T_{\frac{1}{L_f}}(x)-x\right\|<\epsilon_0$ and $F(x)\leq\xi_0$.

Remark 2. Note that Assumption $A_0$ is implied by the assumption that $F$ has bounded level sets.

Lemma 2.2.

(Nesterov, 2013, Lemma 2) For $\lambda_1\geq\lambda_2>0$, we have

$$\forall x\in\mathbb{R}^{n}:\quad \left\|T_{\lambda_1}(x)-x\right\|\geq\left\|T_{\lambda_2}(x)-x\right\| \quad\mathrm{and}\quad \frac{\left\|T_{\lambda_1}(x)-x\right\|}{\lambda_1}\leq\frac{\left\|T_{\lambda_2}(x)-x\right\|}{\lambda_2}. \quad (4)$$
Proof 2.3.

The above lemma follows from Lemma 2 of Nesterov (2013) with $B:=I$ and $L:=\frac{1}{\lambda}$.

Theorem 2.4.

Let $\{x_k\}$ and $\{y_k\}$ be generated by Algorithm 1. Suppose that Assumption $A_0$ holds and that there exists a positive integer $n_0$ such that $0\leq\gamma_k\leq 1$ for all $k\geq n_0$. Then,
1) $\sum_{k=1}^{\infty}\|x_{k+1}-y_{k+1}\|^{2}$ is convergent;
2) $\lim_{k\to\infty}F(x_k)=F(x_{*})$.

Proof 2.5.

Applying inequality (2) at the points $x=x_k$ and $y=y_{k+1}$ (note that $T_{\lambda}(y_{k+1})=x_{k+1}$ and $x_k-y_{k+1}=-\gamma_k(x_k-x_{k-1})$), we obtain

$$\forall k\geq 1:\quad\frac{1-\mu}{2\lambda}\|x_{k+1}-y_{k+1}\|^{2}\leq\left(F(x_k)+\frac{\gamma_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}\right)-\left(F(x_{k+1})+\frac{1}{2\lambda}\|x_{k+1}-x_k\|^{2}\right). \quad (5)$$

Since $0\leq\gamma_k\leq 1$ holds for every $k\geq n_0$, we easily obtain $\sum_{k=n_0}^{\infty}\|x_{k+1}-y_{k+1}\|^{2}<+\infty$; result 1) then follows, since adding finitely many terms does not change the convergence of the series. Moreover, for any $\bar{\epsilon}>0$, there exists a sufficiently large $n_1$ such that $\|x_k-y_k\|<\bar{\epsilon}$ for all $k\geq\bar{n}:=\max(n_0,n_1)$. Set $\xi_0=F(x_{\bar{n}+1})+\frac{1}{2\lambda}\|x_{\bar{n}+1}-x_{\bar{n}}\|^{2}$. From Lemma 2.2 with $\lambda<\frac{1}{L_f}$ and the nonexpansiveness of the proximal operator, we obtain

$$\left\|T_{\frac{1}{L_f}}(x_k)-x_k\right\|\leq\frac{1}{\lambda L_f}\left\|T_{\lambda}(x_k)-x_k\right\|=\frac{1}{\lambda L_f}\left\|T_{\lambda}(x_k)-T_{\lambda}(y_k)\right\|\leq\left(1+\frac{1}{\lambda L_f}\right)\|x_k-y_k\|, \quad (6)$$

hence $\left\|T_{\frac{1}{L_f}}(x_k)-x_k\right\|<\left(1+\frac{1}{\lambda L_f}\right)\bar{\epsilon}$ for all $k\geq\bar{n}$. Also, it follows from (5) that $\left\{F(x_{k+1})+\frac{1}{2\lambda}\|x_{k+1}-x_k\|^{2}\right\}$ is non-increasing for $k\geq\bar{n}$, and therefore $F(x_k)\leq\xi_0$. Hence, combining this with Assumption $A_0$, for $\xi_0=F(x_{\bar{n}+1})+\frac{1}{2\lambda}\|x_{\bar{n}+1}-x_{\bar{n}}\|^{2}$ there exist $\epsilon_0:=\left(1+\frac{1}{\lambda L_f}\right)\bar{\epsilon}$ and $\tau_0>0$ such that

$$\forall k\geq\bar{n}:\quad\mathrm{dist}\left(x_k,X^{*}\right)\leq\tau_0. \quad (7)$$

In addition, applying inequality (2) at the point $y=y_{k+1}$ with $x=x_{k+1}^{*}\in X^{*}$ chosen such that $\mathrm{dist}(x_{k+1},X^{*})=\|x_{k+1}-x_{k+1}^{*}\|$, we obtain

$$\begin{aligned}F(x_{k+1})-F(x_{*})&\leq\frac{1}{2\lambda}\|y_{k+1}-x_{k+1}^{*}\|^{2}-\frac{1}{2\lambda}\|x_{k+1}-x_{k+1}^{*}\|^{2}\\&=\frac{1}{2\lambda}\|y_{k+1}-x_{k+1}\|^{2}+\frac{1}{\lambda}\left\langle y_{k+1}-x_{k+1},\,x_{k+1}-x_{k+1}^{*}\right\rangle\\&\leq\frac{1}{2\lambda}\|y_{k+1}-x_{k+1}\|^{2}+\frac{1}{\lambda}\|y_{k+1}-x_{k+1}\|\,\mathrm{dist}\left(x_{k+1},X^{*}\right).\end{aligned} \quad (8)$$

Then, since $\|y_{k+1}-x_{k+1}\|\to 0$ by result 1), combining (8) with (7) gives $\lim_{k\to\infty}F(x_k)=F(x_{*})$.

The rest of this paper is based on the following assumption.

Assumption $A_1$ ("Local error bound condition", Tseng & Yun (2009)): For any $\xi\geq F_{*}$, there exist $\varepsilon>0$ and $\bar{\tau}>0$ such that

$$\mathrm{dist}\left(x,X^{*}\right)\leq\bar{\tau}\left\|T_{\frac{1}{L_f}}(x)-x\right\| \quad (9)$$

whenever $\left\|T_{\frac{1}{L_f}}(x)-x\right\|<\varepsilon$ and $F(x)\leq\xi$.
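As a concrete illustration (not needed for the theory), consider $f(x)=\frac{1}{2}\|Ax-b\|^{2}$ with a full-column-rank matrix $A$ and $g\equiv 0$: then $T_{\frac{1}{L_f}}(x)-x=-\frac{1}{L_f}\nabla f(x)$ and (9) holds globally with $\bar{\tau}=L_f/\lambda_{\min}(A^{T}A)$. A small numerical sanity check of this bound, with randomly generated data that are purely an assumption of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 30))                 # full column rank with probability 1
b = rng.standard_normal(80)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]     # the unique minimizer of f

Lf = np.linalg.norm(A, 2) ** 2                    # lambda_max(A^T A)
mu = np.linalg.svd(A, compute_uv=False)[-1] ** 2  # lambda_min(A^T A)

for _ in range(5):
    x = x_star + rng.standard_normal(30)
    residual = np.linalg.norm(A.T @ (A @ x - b)) / Lf   # ||T_{1/Lf}(x) - x|| since g = 0
    dist = np.linalg.norm(x - x_star)
    print(dist / residual, "<=", Lf / mu)               # error bound with tau_bar = Lf / mu
```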

As mentioned in Section 1, the choice of $t_k$ in FISTA accelerates the convergence rate of the function values from $O\left(\frac{1}{k}\right)$ to $O\left(\frac{1}{k^{2}}\right)$, and the $t_k$ in FISTA_CD improves the rate to $o\left(\frac{1}{k^{2}}\right)$. Other options for $t_k$ are considered in Attouch & Cabot (2018) and Apidopoulos & Aujol (2020). Hence, $t_k$ is the crucial factor in guaranteeing the convergence of the iterates and in improving the rate of convergence of the function values. Apidopoulos et al. (Apidopoulos & Aujol, 2020) point out that if $t_k$ satisfies Nesterov's rule, then one can obtain a better convergence rate. However, we notice that Nesterov's rule greatly limits the choice of $t_k$. In the following, we present a new Assumption $A_2$ on $t_k$, which allows some new options of $t_k$, and we analyze the convergence of the iterates and the convergence rate of the function values for Algorithm 1 with a class of abstract $t_k$ satisfying Assumption $A_2$ under the local error bound condition.

Assumption $A_2$: There exists a constant $0<\sigma\leq 1$ such that $\lim_{k\to\infty}k^{\sigma}\left(\frac{t_{k+1}}{t_k}-1\right)=c$, where $c>0$.

Remark 3. It follows from Assumption $A_2$ that $\gamma_k\in\left]0,1\right[$ for all sufficiently large $k$, $\lim_{k\to\infty}t_k=+\infty$ and $\lim_{k\to\infty}\frac{t_{k+1}}{t_k}=1$. It is easy to verify that the $t_k$ in FISTA and the $t_k$ in FISTA_CD both satisfy Assumption $A_2$ with $\sigma=1$; moreover, there exist choices of $t_k$ satisfying Assumption $A_2$ that satisfy Nesterov's rule, and others that do not (see Section 3).
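For instance, a quick numerical check (a sketch only) that the FISTA and FISTA_CD choices of $t_k$ satisfy Assumption $A_2$ with $\sigma=1$ and $c=1$:

```python
import numpy as np

# FISTA (Case 5): t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, t_1 = 1
t = np.empty(10_000)
t[0] = 1.0
for i in range(1, t.size):
    t[i] = (1.0 + np.sqrt(1.0 + 4.0 * t[i - 1] ** 2)) / 2.0
k = t.size - 1                                   # k = 9999, t[k] = t_{k+1}
print(k * (t[k] / t[k - 1] - 1.0))               # -> approaches c = 1

# FISTA_CD (Case 6): t_k = (k - 1 + a) / a, here a = 4
a = 4.0
t_cd = lambda k: (k - 1 + a) / a
print(k * (t_cd(k + 1) / t_cd(k) - 1.0))         # -> approaches c = 1
```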

Lemma 2.6.

Suppose that Assumptions $A_1$ and $A_2$ hold. Let $\{x_k\}$ be generated by Algorithm 1 and let $x_{*}\in X^{*}$. Then, there exists a constant $\tau_1>0$ such that

$$\forall k\geq 1:\quad F(x_{k+1})-F(x_{*})\leq\frac{\tau_1}{\lambda}\|y_{k+1}-x_{k+1}\|^{2}.$$
Proof 2.7.

Since $\gamma_k\in\left]0,1\right[$ holds for all $k\geq n_0$ by Assumption $A_2$, arguing as in the proof of Theorem 2.4 (with $\bar{\epsilon}$ chosen so small that $\left(1+\frac{1}{\lambda L_f}\right)\bar{\epsilon}<\varepsilon$), for $\xi_0=F(x_{\bar{n}+1})+\frac{1}{2\lambda}\|x_{\bar{n}+1}-x_{\bar{n}}\|^{2}$ we have $F(x_k)\leq\xi_0$ and $\left\|T_{\frac{1}{L_f}}(x_k)-x_k\right\|<\varepsilon$ for all $k\geq\bar{n}$, so Assumption $A_1$ yields

$$\mathrm{dist}\left(x_k,X^{*}\right)\leq\bar{\tau}\left\|T_{\frac{1}{L_f}}(x_k)-x_k\right\|\leq\bar{\tau}\left(1+\frac{1}{\lambda L_f}\right)\|x_k-y_k\|\leq\frac{2\bar{\tau}}{\mu}\|x_k-y_k\|, \quad (10)$$

where the second inequality of (10) follows from (6) and the third follows from the fact that $\lambda=\frac{\mu}{L_f}$ with $\mu\in(0,1)$. In addition, it follows from (8) that

$$F(x_{k+1})-F(x_{*})\leq\frac{1}{2\lambda}\|y_{k+1}-x_{k+1}\|^{2}+\frac{1}{\lambda}\|y_{k+1}-x_{k+1}\|\,\mathrm{dist}\left(x_{k+1},X^{*}\right)\leq\frac{1}{\lambda}\left(\frac{2\bar{\tau}}{\mu}+\frac{1}{2}\right)\|y_{k+1}-x_{k+1}\|^{2},\quad\forall k\geq\bar{n}. \quad (11)$$

Also, we can find a constant $c>0$ such that $F(x_{k+1})-F(x_{*})\leq\frac{c}{\lambda}\|y_{k+1}-x_{k+1}\|^{2}$ for all $1\leq k\leq\bar{n}-1$. Therefore, any $\tau_1\geq\max\left\{\frac{2\bar{\tau}}{\mu}+\frac{1}{2},c\right\}$ yields the conclusion.

Here, we introduce a new approach, which we call the "comparison method": we consider a sequence $\{\alpha_k\}$ with $\alpha_k=\frac{s_k-1}{s_{k+1}}\geq\gamma_k$, where $\{s_k\}$ is a nonnegative sequence, and use it to estimate the bounds on the objective values and the local variation of the iterates.

Lemma 2.8.

Suppose that there exists a nonnegative sequence $\{s_k\}$ such that $\alpha_k=\frac{s_k-1}{s_{k+1}}\geq\gamma_k$ for all sufficiently large $k$, where $\gamma_k=\frac{t_k-1}{t_{k+1}}$ and $t_k$ satisfies Assumption $A_2$. Then $\lim_{k\to\infty}s_k=+\infty$ and $\limsup_{k\to\infty}\frac{s_{k+1}^{2}-s_k^{2}}{s_k^{2}}\leq 0$.

Proof 2.9.

See the detailed proof in Appendix A.

Theorem 2.10.

Suppose that Assumptions $A_1$ and $A_2$ hold and that there exists a nonnegative sequence $\{s_k\}$ such that $\alpha_k=\frac{s_k-1}{s_{k+1}}\geq\gamma_k$ for all sufficiently large $k$. Then $F(x_{k+1})-F(x_{*})=o\left(\frac{1}{s_{k+1}^{2}}\right)$ and $\|x_{k+1}-x_k\|=O\left(\frac{1}{s_{k+1}}\right)$. Further, if $\sum_{k=1}^{\infty}\frac{1}{s_{k+1}}$ is convergent, then the sequence $\{x_k\}$ converges strongly to a minimizer of $F$.

Proof 2.11.

Denote $E_k=s_{k+1}^{2}\left(F(x_k)-F(x_{*})\right)+\frac{s_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}$. Applying (5), we have

$$F(x_{k+1})-F(x_{*})+\frac{1-\mu}{2\lambda}\|x_{k+1}-y_{k+1}\|^{2}+\frac{1}{2\lambda}\|x_{k+1}-x_k\|^{2}\leq F(x_k)-F(x_{*})+\frac{\gamma_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}.$$

By assumption, $\gamma_k^{2}\leq\alpha_k^{2}$ for all sufficiently large $k$; hence

$$F(x_{k+1})-F(x_{*})+\frac{1-\mu}{2\lambda}\|x_{k+1}-y_{k+1}\|^{2}+\frac{1}{2\lambda}\|x_{k+1}-x_k\|^{2}\leq F(x_k)-F(x_{*})+\frac{\alpha_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}.$$

Multiplying by $s_{k+1}^{2}$ and using $\alpha_k s_{k+1}=s_k-1$, we have

$$\begin{aligned}&E_{k+1}+\left(s_{k+1}^{2}-s_{k+2}^{2}\right)\left(F(x_{k+1})-F(x_{*})\right)+\frac{1-\mu}{2\lambda}s_{k+1}^{2}\|x_{k+1}-y_{k+1}\|^{2}\\&\leq s_{k+1}^{2}\left(F(x_k)-F(x_{*})\right)+\frac{(s_k-1)^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}\\&=s_{k+1}^{2}\left(F(x_k)-F(x_{*})\right)+\frac{s_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}-\frac{2s_k-1}{2\lambda}\|x_k-x_{k-1}\|^{2}\leq E_k.\end{aligned}$$

Then, combining with Lemma 2.6, we have

$$E_{k+1}+\left(\frac{s_{k+1}^{2}-s_{k+2}^{2}}{s_{k+1}^{2}}+\frac{1-\mu}{2\tau_1}\right)s_{k+1}^{2}\left(F(x_{k+1})-F(x_{*})\right)\leq E_k. \quad (12)$$

Since $\limsup_{k\to\infty}\frac{s_{k+2}^{2}-s_{k+1}^{2}}{s_{k+1}^{2}}\leq 0$ by Lemma 2.8, we have $\frac{s_{k+1}^{2}-s_{k+2}^{2}}{s_{k+1}^{2}}\geq-\frac{1-\mu}{4\tau_1}$ for all sufficiently large $k$; thus (12) implies that, for any $k\geq k_0$ with $k_0$ sufficiently large,

$$E_{k+1}+\frac{1-\mu}{4\tau_1}s_{k+1}^{2}\left(F(x_{k+1})-F(x_{*})\right)\leq E_k, \quad (13)$$

i.e., $\sum_{k=k_0}^{+\infty}s_{k+1}^{2}\left(F(x_{k+1})-F(x_{*})\right)<+\infty$. Since adding finitely many terms does not change the convergence of the series, $\sum_{k=1}^{\infty}s_{k+1}^{2}\left(F(x_{k+1})-F(x_{*})\right)$ is convergent. Hence, $F(x_{k+1})-F(x_{*})=o\left(\frac{1}{s_{k+1}^{2}}\right)$ holds true.

Further, since $\{E_k\}$ is convergent by (13), the sequence $\left\{s_{k+1}^{2}\|x_{k+1}-x_k\|^{2}\right\}$ is bounded, which means that $\|x_{k+1}-x_k\|=O\left(\frac{1}{s_{k+1}}\right)$, i.e., there exists a constant $c_1>0$ such that $\|x_{k+1}-x_k\|\leq\frac{c_1}{s_{k+1}}$. Recalling the assumption that $\sum_{k=1}^{\infty}\frac{1}{s_{k+1}}$ is convergent, we deduce that $\{x_k\}$ is a Cauchy sequence. Letting $\lim_{k\to\infty}x_k=\bar{x}$, we conclude that $\{x_k\}$ converges strongly to $\bar{x}\in X^{*}$ since $F$ is lower semi-continuous and convex.

3 The sublinear convergence rates of IFB algorithms with special $t_k$

In the following, we show the improved convergence rates for the IFB algorithms with six special choices of $t_k$ satisfying Assumption $A_2$; for reference, these six choices are summarized in the sketch below.
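The following Python snippet lists the six choices compactly (a sketch with purely illustrative parameter values; the names are assumptions of the sketch):

```python
import numpy as np

# illustrative parameter values (assumptions for this sketch)
alpha, r1, r2, a, theta = 0.5, 2.0, 0.5, 4.0, 1.0

def t_fista(k, _cache={1: 1.0}):
    # Case 5: t_1 = 1, t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2
    for j in range(max(_cache) + 1, k + 1):
        _cache[j] = (1.0 + np.sqrt(1.0 + 4.0 * _cache[j - 1] ** 2)) / 2.0
    return _cache[k]

t_cases = {
    1: lambda k: np.exp((k - 1) ** alpha),                  # t_k = e^{(k-1)^alpha}, 0 < alpha < 1
    2: lambda k: (k ** r1 - 1 + a) / a,                     # (k^r - 1 + a)/a with r > 1
    3: lambda k: (k ** r2 - 1 + a) / a,                     # (k^r - 1 + a)/a with r < 1
    4: lambda k: 1.0 if k == 1 else k / np.log(k) ** theta, # k / ln^theta(k), t_1 = 1
    5: t_fista,                                             # original FISTA recursion
    6: lambda k: (k - 1 + a) / a,                           # FISTA_CD choice
}
```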

Case 1. $t_k=e^{(k-1)^{\alpha}}$, $0<\alpha<1$.

Corollary 3.1.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 1 and let $x_{*}\in X^{*}$. Then,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{e^{2(k-1)^{\alpha}}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{e^{(k-1)^{\alpha}}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left((k-1)^{\alpha\left\lceil\frac{1}{\alpha}-1\right\rceil}e^{-(k-1)^{\alpha}}\right)$.

Proof 3.2.

We can easily verify that $\lim_{k\to\infty}k^{1-\alpha}\left(\frac{t_{k+1}}{t_k}-1\right)=\alpha$, which means that Assumption $A_2$ holds. Setting $s_k=t_k$, we conclude from Theorem 2.10 that result 1) holds. It follows from result 1) that there exists a positive constant $c^{\prime}$ such that $\|x_k-x_{k-1}\|\leq\frac{c^{\prime}}{e^{(k-1)^{\alpha}}}$, and we deduce that

$$\forall p>1:\quad\|x_{k+p}-x_k\|\leq\sum_{i=k+1}^{k+p}\|x_i-x_{i-1}\|\leq\sum_{i=k+1}^{k+p}\frac{c^{\prime}}{e^{(i-1)^{\alpha}}}\leq c^{\prime}\int_{k}^{k+p}e^{-(x-1)^{\alpha}}dx.$$

Since $\int_{1}^{+\infty}e^{-(x-1)^{\alpha}}dx$ is convergent, $\sum_{i=1}^{+\infty}\|x_i-x_{i-1}\|$ is convergent, which means that $\{x_k\}$ is a Cauchy sequence and converges strongly to some $\bar{x}\in X^{*}$. Then, letting $p\to\infty$, we have

$$\|x_k-\bar{x}\|\leq c^{\prime}\int_{k}^{+\infty}e^{-(x-1)^{\alpha}}dx=c^{\prime}\int_{(k-1)^{\alpha}}^{+\infty}\frac{1}{\alpha}y^{\frac{1}{\alpha}-1}e^{-y}dy\leq\frac{c^{\prime}}{\alpha}\int_{(k-1)^{\alpha}}^{+\infty}y^{\left\lceil\frac{1}{\alpha}-1\right\rceil}e^{-y}dy.$$

Denote $\omega=\left\lceil\frac{1}{\alpha}-1\right\rceil$ and $A=(k-1)^{\alpha}$. We deduce that

$$\|x_k-\bar{x}\|\leq\frac{c^{\prime}}{\alpha}\int_{A}^{+\infty}y^{\omega}e^{-y}dy=\frac{c^{\prime}}{\alpha}A^{\omega}e^{-A}+\frac{c^{\prime}}{\alpha}\sum_{j=0}^{\omega-1}\left(\left(\prod_{i=0}^{j}(\omega-i)\right)A^{\omega-j-1}e^{-A}\right)=O\left(A^{\omega}e^{-A}\right),$$

which means that

$$\|x_k-\bar{x}\|=O\left((k-1)^{\alpha\left\lceil\frac{1}{\alpha}-1\right\rceil}e^{-(k-1)^{\alpha}}\right).$$

Hence, result 2) holds.

Remark 4. Notice that $(k-1)^{\alpha\left\lceil\frac{1}{\alpha}-1\right\rceil}e^{-(k-1)^{\alpha}}=o\left(\frac{1}{k^{p}}\right)$ for every $p>1$, which means that the sublinear convergence rate of the IFB with the $t_k$ in Case 1 is faster than any polynomial order.

Case 2. $t_k=\frac{k^{r}-1+a}{a}$ $(r>1,\ a>0)$.

Corollary 3.3.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 2 and let $x_{*}\in X^{*}$. Then,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{2r}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{r}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{r-1}}\right)$.

Proof 3.4.

It is easy to verify that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=r$, which means that Assumption $A_2$ holds. Setting $s_k=t_k$ and combining $\lim_{k\to\infty}\frac{s_k}{k^{r}}=\frac{1}{a}$ with the convergence of $\sum_{k=1}^{\infty}\frac{1}{s_k}$, we deduce from Theorem 2.10 that result 1) holds and that $\{x_k\}$ converges strongly to some $\bar{x}\in X^{*}$. It follows from result 1) that there exists a positive constant $c^{\prime\prime}$ such that $\|x_k-x_{k-1}\|\leq\frac{c^{\prime\prime}}{k^{r}}$. Then,

$$\forall p>1:\quad\|x_{k+p}-x_k\|\leq\sum_{i=k+1}^{k+p}\|x_i-x_{i-1}\|\leq c^{\prime\prime}\sum_{i=k+1}^{k+p}\frac{1}{i^{r}}\leq c^{\prime\prime}\int_{k}^{k+p}\frac{1}{x^{r}}dx.$$

Letting $p\to\infty$, we obtain

$$\|x_k-\bar{x}\|\leq\frac{c^{\prime\prime}}{r-1}\frac{1}{k^{r-1}}.$$

Hence, result 2) holds.

Remark 5. For the $t_k$ in Case 2, the convergence rates of the function values and of the iterates are related to the value of $r$: the larger $r$, the better the convergence rate Algorithm 1 achieves.

Case 3. $t_k=\frac{k^{r}-1+a}{a}$ $(r<1,\ a>0)$.

Corollary 3.5.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 3 and let $x_{*}\in X^{*}$. Then, for any constant $p>1$,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{p}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{p}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{p-1}}\right)$.

Proof 3.6.

It is easy to verify that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=r$, which means that Assumption $A_2$ holds. Since $\lim_{k\to\infty}k^{r}\left(\gamma_k-1\right)=\lim_{k\to\infty}\frac{-k^{r}\left(t_{k+1}-t_k+1\right)}{t_{k+1}}=\lim_{k\to\infty}\frac{-k^{r}\left((k+1)^{r}-k^{r}+a\right)}{(k+1)^{r}-1+a}=-a$, we have

$$\gamma_k=1-\frac{a}{k^{r}}+o\left(\frac{1}{k^{r}}\right),\quad\text{as }k\to+\infty. \quad (14)$$

For any constant $p>1$, set $s_1=1$ and $s_k=(k-1)^{p}$ for $k\geq 2$. Then, we have

$$\alpha_k=\frac{s_k-1}{s_{k+1}}=\left(1-\frac{1}{k}\right)^{p}-\frac{1}{k^{p}}=1-\frac{p}{k}+o\left(\frac{1}{k}\right),\quad\text{as }k\to+\infty. \quad (15)$$

By (14) and (15), we obtain

$$\lim_{k\to\infty}k^{r}\left(\alpha_k-\gamma_k\right)=\lim_{k\to\infty}k^{r}\left(-\frac{p}{k}+\frac{a}{k^{r}}+o\left(\frac{1}{k^{r}}\right)+o\left(\frac{1}{k}\right)\right)=a>0, \quad (16)$$

which implies that $\alpha_k\geq\gamma_k$ for all sufficiently large $k$. Hence, using $s_k\sim k^{p}$ and Theorem 2.10, result 1) holds and $\{x_k\}$ converges strongly to some $\bar{x}\in X^{*}$. Similarly to the proof of Corollary 3.3, we conclude result 2).

Case 4. $t_1=1$ and $t_k=\frac{k}{\ln^{\theta}k}$ $(\theta>0)$ for $k\geq 2$.

Corollary 3.7.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 4 and let $x_{*}\in X^{*}$. Then, for any constant $p\geq 2$,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{2p}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{p}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{p-1}}\right)$.

Proof 3.8.

We can prove that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=1$, which means that Assumption $A_2$ holds. Observe that

$$\begin{aligned}\gamma_k&=\frac{t_k}{t_{k+1}}-\frac{1}{t_{k+1}}=\left(1-\frac{1}{k+1}\right)\left(\frac{\ln(k+1)}{\ln k}\right)^{\theta}-\frac{\ln^{\theta}(k+1)}{k+1}\\&=\left(1-\frac{1}{k+1}\right)\left(1+\frac{\ln\left(1+\frac{1}{k}\right)}{\ln k}\right)^{\theta}-\frac{\ln^{\theta}(k+1)}{k+1}\\&=\left(1-\frac{1}{k+1}\right)\left(1+\frac{\theta\ln\left(1+\frac{1}{k}\right)}{\ln k}+o\left(\frac{\ln\left(1+\frac{1}{k}\right)}{\ln k}\right)\right)-\frac{\ln^{\theta}(k+1)}{k+1}\\&=\left(1-\frac{1}{k+1}\right)\left(1+o\left(\frac{1}{k+1}\right)\right)-\frac{\ln^{\theta}(k+1)}{k+1}\\&=1-\frac{1}{k+1}-\frac{\ln^{\theta}(k+1)}{k+1}+o\left(\frac{1}{k+1}\right).\end{aligned}$$

Set $s_k=k^{p}$, where $p\geq 2$. Then, we easily obtain

$$\alpha_k=\frac{s_k-1}{s_{k+1}}=\left(1-\frac{1}{k+1}\right)^{p}-\frac{1}{(k+1)^{p}}=1-\frac{p}{k+1}+o\left(\frac{1}{k+1}\right).$$

Hence, the condition $\alpha_k\geq\gamma_k$ holds for all sufficiently large $k$. Using Theorem 2.10, result 1) holds, and, similarly to the proof of Corollary 3.3, result 2) holds.

Case 5. $t_1=1$ and $t_{k+1}=\frac{1+\sqrt{1+4t_k^{2}}}{2}$ for $k\geq 1$.

Corollary 3.9.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 5 and let $x_{*}\in X^{*}$. Then,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{6}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{3}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{2}}\right)$.

Proof 3.10.

We can easily obtain that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=1$, which means that Assumption $A_2$ holds. Observe that

$$\gamma_k=\frac{t_k-1}{t_{k+1}}=1-\frac{t_{k+1}-t_k-\frac{1}{2}}{t_{k+1}}-\frac{3}{2t_{k+1}}. \quad (17)$$

Since $\lim_{k\to\infty}t_k\left(t_{k+1}-t_k-\frac{1}{2}\right)=\lim_{k\to\infty}t_k\left(\frac{1+\sqrt{1+4t_k^{2}}}{2}-t_k-\frac{1}{2}\right)=\frac{1}{8}$ and $\lim_{k\to\infty}\frac{t_k}{k}=\frac{1}{2}$, we deduce that

$$\lim_{k\to\infty}k^{2}\left(\frac{t_{k+1}-t_k-\frac{1}{2}}{t_{k+1}}\right)=\frac{1}{2},$$

which means that

$$\frac{t_{k+1}-t_k-\frac{1}{2}}{t_{k+1}}=\frac{1}{2k^{2}}+o\left(\frac{1}{k^{2}}\right),\quad\text{as }k\to+\infty. \quad (18)$$

By the Stolz theorem, we obtain

$$\lim_{k\to\infty}\frac{\frac{1}{2}k-t_{k+1}}{\ln k}=\lim_{k\to\infty}\frac{\frac{1}{2}-\left(t_{k+2}-t_{k+1}\right)}{\ln\left(1+\frac{1}{k}\right)}=\lim_{k\to\infty}\frac{-k}{t_{k+1}}\,t_{k+1}\left(t_{k+2}-t_{k+1}-\frac{1}{2}\right)=-\frac{1}{4},$$

and then

$$\lim_{k\to\infty}\frac{\frac{3}{2t_{k+1}}-\frac{3}{k}}{\frac{\ln k}{k^{2}}}=\lim_{k\to\infty}3\,\frac{k}{t_{k+1}}\left(\frac{\frac{1}{2}k-t_{k+1}}{\ln k}\right)=-\frac{3}{2},$$

which means that

$$\frac{3}{2t_{k+1}}=\frac{3}{k}-\frac{3}{2}\frac{\ln k}{k^{2}}+o\left(\frac{\ln k}{k^{2}}\right),\quad\text{as }k\to+\infty. \quad (19)$$

Hence, by (17)–(19), we have

$$\gamma_k=1-\frac{3}{k}+\frac{3\ln k}{2k^{2}}+o\left(\frac{\ln k}{k^{2}}\right),\quad\text{as }k\to+\infty. \quad (20)$$

Set $s_1=s_2=1$ and $s_k=\frac{(k-1)^{3}}{\left(\int_{1}^{k-1}\frac{\ln x}{x^{2}}dx\right)^{2}}$ for $k\geq 3$. Then, since $\int_{1}^{+\infty}\frac{\ln x}{x^{2}}dx=1$ and $\frac{\left(\int_{1}^{k}\frac{\ln x}{x^{2}}dx\right)^{2}}{k^{3}}\leq\frac{1}{k^{3}}=o\left(\frac{\ln k}{k^{2}}\right)$, we have

$$\begin{aligned}\alpha_k&=\frac{s_k-1}{s_{k+1}}=\left(1-\frac{1}{k}\right)^{3}\left(1+\frac{\int_{k-1}^{k}\frac{\ln x}{x^{2}}dx}{\int_{1}^{k-1}\frac{\ln x}{x^{2}}dx}\right)^{2}-\frac{\left(\int_{1}^{k}\frac{\ln x}{x^{2}}dx\right)^{2}}{k^{3}}\\&\geq\left(1-\frac{1}{k}\right)^{3}\left(1+\frac{\frac{\ln k}{k^{2}}}{\int_{1}^{+\infty}\frac{\ln x}{x^{2}}dx}\right)^{2}-\frac{\left(\int_{1}^{k}\frac{\ln x}{x^{2}}dx\right)^{2}}{k^{3}}\\&=\left(1-\frac{1}{k}\right)^{3}\left(1+\frac{\ln k}{k^{2}}\right)^{2}-\frac{\left(\int_{1}^{k}\frac{\ln x}{x^{2}}dx\right)^{2}}{k^{3}}\\&=1-\frac{3}{k}+\frac{2\ln k}{k^{2}}+o\left(\frac{\ln k}{k^{2}}\right),\quad\text{as }k\to+\infty.\end{aligned} \quad (21)$$

By (20) and (21), $\lim_{k\to\infty}\frac{\alpha_k-\gamma_k}{\frac{\ln k}{k^{2}}}\geq\frac{1}{2}$, and therefore $\alpha_k\geq\gamma_k$ for all sufficiently large $k$. Using the fact that $s_k\sim k^{3}$ and Theorem 2.10, we conclude result 1), and $\{x_k\}$ converges strongly to some $\bar{x}\in X^{*}$. Further, similarly to the proof of Corollary 3.3, result 2) holds.

Case 6. $t_k=\frac{k-1+a}{a}$ $(a>0)$.

Corollary 3.11.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 6 and let $x_{*}\in X^{*}$. Then,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{2(a+1)}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{a+1}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{a}}\right)$.

Proof 3.12.

It is easy to verify that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=1$, which means that Assumption $A_2$ holds. Observe that $\gamma_k=\frac{t_k-1}{t_{k+1}}=\frac{k-1}{k+a}$. For $a\geq 1$, set $s_k=(k+a-1)^{a+1}$. Otherwise, set $s_1=s_2=1$ and $s_k=\frac{(k+a-1)^{a+1}}{\int_{1}^{k-1}\frac{\ln x}{x^{1+a}}dx}$ for $k\geq 3$.

1) For the case $a>1$, we have

$$\begin{aligned}\alpha_k&=\frac{(k+a-1)^{a+1}-1}{(k+a)^{a+1}}=\left(1-\frac{1}{k+a}\right)^{a+1}-\frac{1}{(k+a)^{a+1}}\\&=1-\frac{a+1}{k+a}+\frac{a(a+1)}{2}\frac{1}{(k+a)^{2}}+o\left(\frac{1}{(k+a)^{2}}\right)-\frac{1}{(k+a)^{a+1}}\\&=\gamma_k+\frac{a(a+1)}{2}\frac{1}{(k+a)^{2}}+o\left(\frac{1}{(k+a)^{2}}\right),\quad\text{as }k\to+\infty,\end{aligned} \quad (25)$$

which means that $\lim_{k\to\infty}(k+a)^{2}\left(\alpha_k-\gamma_k\right)=\frac{a(a+1)}{2}>0$, i.e., $\alpha_k>\gamma_k$ for all sufficiently large $k$.

2) For the case $a=1$, we have

$$\alpha_k=\frac{k^{2}-1}{(k+1)^{2}}=\left(1-\frac{1}{k+1}\right)^{2}-\frac{1}{(k+1)^{2}}=\gamma_k. \quad (26)$$

Obviously, $\alpha_k\geq\gamma_k$ for any $k\geq 1$.

3) For the case $a<1$, we have

$$\begin{aligned}\alpha_k&=\frac{s_k-1}{s_{k+1}}=\left(1-\frac{1}{k+a}\right)^{a+1}\left(1+\frac{\int_{k-1}^{k}\frac{\ln x}{x^{1+a}}dx}{\int_{1}^{k-1}\frac{\ln x}{x^{1+a}}dx}\right)-\frac{\int_{1}^{k}\frac{\ln x}{x^{1+a}}dx}{(k+a)^{a+1}}\\&\geq\left(1-\frac{1}{k+a}\right)^{a+1}\left(1+\frac{\frac{\ln k}{k^{1+a}}}{\int_{1}^{+\infty}\frac{\ln x}{x^{1+a}}dx}\right)-\frac{\int_{1}^{k}\frac{\ln x}{x^{1+a}}dx}{(k+a)^{a+1}}\\&=\left(1-\frac{1}{k+a}\right)^{a+1}\left(1+\frac{a^{2}\ln k}{k^{1+a}}\right)-\frac{\int_{1}^{k}\frac{\ln x}{x^{1+a}}dx}{(k+a)^{a+1}},\quad\text{as }k\to+\infty. \end{aligned} \quad (27)$$

Since $\left(1-\frac{1}{k+a}\right)^{a+1}=1-\frac{a+1}{k+a}+\frac{a(a+1)}{2}\frac{1}{(k+a)^{2}}+o\left(\frac{1}{(k+a)^{2}}\right)$, $\frac{1}{(k+a)^{2}}=o\left(\frac{\ln k}{k^{1+a}}\right)$ and $\frac{\int_{1}^{k}\frac{\ln x}{x^{1+a}}dx}{(k+a)^{a+1}}=o\left(\frac{\ln k}{k^{1+a}}\right)$, it follows from (27) that

$$\alpha_k\geq\gamma_k+\frac{k-1}{k+a}\frac{a^{2}\ln k}{k^{1+a}}+o\left(\frac{\ln k}{k^{1+a}}\right),$$

which implies that $\lim_{k\to\infty}\frac{\alpha_k-\gamma_k}{\frac{\ln k}{k^{1+a}}}\geq a^{2}>0$, i.e., $\alpha_k\geq\gamma_k$ for all sufficiently large $k$.

Hence, $\gamma_k\leq\alpha_k$ holds for all sufficiently large $k$. Since $s_k=O\left(k^{a+1}\right)$, we conclude result 1) from Theorem 2.10, and $\{x_k\}$ converges strongly to some $\bar{x}\in X^{*}$. Further, similarly to the proof of Corollary 3.3, result 2) holds.

Remark 6. Note that the $t_k$ in Case 6 is the $t_k$ proposed in FISTA_CD (see, Chambolle & Dossal, 2015) but with a wider range of $a$. Corollary 3.11 shows that the convergence rates of IFB with the $t_k$ in Case 6 are related to the value of $a$.

Notice that the convergence results in both Corollary 3.1 and Corollary 3.5 enjoy a sublinear convergence rate of $o\left(\frac{1}{k^{p}}\right)$ for any $p>1$. Here, we give a further analysis of the convergence rate of IFB from another viewpoint. From Assumption $A_2$, we can derive that

$$\gamma_k=1-\frac{c}{k^{\sigma}}+o\left(\frac{1}{k^{\sigma}}\right)-\frac{1}{t_{k+1}}.$$

For the $t_k$ in Case 1, we have $\gamma_k=1-\frac{\alpha}{k^{1-\alpha}}+o\left(\frac{1}{k^{1-\alpha}}\right)$; for the $t_k$ in Case 3, we have $\gamma_k=1-\frac{a}{k^{r}}+o\left(\frac{1}{k^{r}}\right)$. These two $\gamma_k$ are of similar magnitude; in particular, they are of the same order if we choose $r=0.5$, $a=0.5$ and $\alpha=0.5$. Thus, it is reasonable to expect that the corresponding IFBs behave similarly in practice; the numerical results in Section 5 confirm this.
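A quick numerical comparison of $1-\gamma_k$ for these two choices under this parameter setting (a sketch only; both quantities behave like $\frac{0.5}{\sqrt{k}}$):

```python
import numpy as np

alpha = a = r = 0.5
t_case1 = lambda k: np.exp((k - 1) ** alpha)      # Case 1
t_case3 = lambda k: (k ** r - 1 + a) / a          # Case 3

gamma = lambda t, k: (t(k) - 1.0) / t(k + 1)

for k in (10, 100, 1000, 10000):
    # both 1 - gamma_k behave like 0.5 / sqrt(k)
    print(k, 1 - gamma(t_case1, k), 1 - gamma(t_case3, k), 0.5 / np.sqrt(k))
```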

4 Inertial Forward-Backward Algorithm with an Adaptive Modification

For solving problem $(P)$, the authors of Wen & Chen (2017) showed that, under the error bound condition, the sequences $\{x_k\}$ and $\{F(x_k)\}$ generated by FISTA with fixed restart are $R$-linearly convergent. In O'Donoghue & Candès (2015), an adaptive restart scheme for FISTA was proposed and shown to enjoy global linear convergence of the objective values when applied to problem $(P)$ with $f$ strongly convex and $g=0$. The authors also stated that, after a certain number of iterations, adaptive restarting may provide linear convergence for LASSO, but they did not prove similar results for the general nonsmooth convex problem $(P)$. In this section, we explain that the Inertial Forward-Backward algorithm with an adaptive modification enjoys the same convergence results as those proved in Sections 2 and 3. The adaptive modification scheme is described below:

Algorithm 2 Inertial forward-backward algorithm with an adaptive modification (IFB_AdapM)

Step 0. Take $y_1=x_0\in\mathbb{R}^{n}$, $t_1=1$. Input $\lambda=\frac{\mu}{L_f}$, where $\mu\in\left]0,1\right[$.
Step k. Compute
        $x_k=T_{\lambda}(y_k)=\mathrm{prox}_{\lambda g}\left(y_k-\lambda\nabla f(y_k)\right)$
        $y_{k+1}=x_k+\gamma_k(x_k-x_{k-1})$,
where $\gamma_k=0$ if $(y_k-x_k)^{T}(x_k-x_{k-1})>0$ or $F(x_k)>F(x_{k-1})$, and $\gamma_k=\frac{t_k-1}{t_{k+1}}$ otherwise.

Note that the adaptive modification condition is the same as the adaptive restart scheme in O'Donoghue & Candès (2015). Here, we call the condition $(y_k-x_k)^{T}(x_k-x_{k-1})>0$ the gradient modification scheme and the condition $F(x_k)>F(x_{k-1})$ the function modification scheme. However, unlike the restart strategy, which resets $t_k$ every time the restart condition holds so that the momentum restarts from 0, Algorithm 2 sets the momentum to zero at the current iteration (the adaptive modification step) but does not interrupt the update of $t_k$. Based on Theorem 2.10 and the fact that $\gamma_k=0\leq\alpha_k=\frac{s_k-1}{s_{k+1}}$, we obtain the same convergence rates for the function values and iterates of Algorithm 2. Specifically, Algorithm 2 with $t_k=e^{(k-1)^{\alpha}}$ $(0<\alpha<1)$ or $t_k=\frac{k^{r}-1+a}{a}$ $(r<1,\ a>0)$ converges with a sublinear rate of type $\frac{1}{k^{p}}$ for any $p>1$, and the corresponding numerical performance compares favourably with FISTA equipped with the fixed restart scheme, or with both the fixed and adaptive restart schemes, which is $R$-linearly convergent (see the numerical experiments in Section 5).
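A minimal sketch of the adaptive modification step, again for the LASSO instance of Section 5 (the problem data and the callable `t_seq` are assumptions of the sketch):

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ifb_adapm(A, b, delta, t_seq, mu=0.98, n_iter=500):
    """Algorithm 2 (IFB_AdapM) for min 0.5*||Ax-b||^2 + delta*||x||_1: the momentum is
    set to zero whenever a modification condition fires, but t_k keeps being updated."""
    Lf = np.linalg.norm(A, 2) ** 2
    lam = mu / Lf
    F = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2 + delta * np.abs(x).sum()
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()
    for k in range(1, n_iter + 1):
        grad = A.T @ (A @ y - b)
        x = soft_threshold(y - lam * grad, lam * delta)      # x_k = T_lambda(y_k)
        if (y - x) @ (x - x_prev) > 0 or F(x) > F(x_prev):   # gradient / function scheme
            gamma = 0.0                                      # adaptive modification step
        else:
            gamma = (t_seq(k) - 1.0) / t_seq(k + 1)
        y = x + gamma * (x - x_prev)
        x_prev = x
    return x
```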

5 Numerical Experiments

In this section, we conduct numerical experiments to study the numerical performance of IFB with different options of $t_k$ and to verify our theoretical results. The codes are available at https://github.com/TingWang7640/Paper_EB.git.

LASSO. We first consider the LASSO problem

$$\min_{x\in\mathbb{R}^{n}}F(x)=\frac{1}{2}\|Ax-b\|^{2}+\delta\|x\|_{1}. \quad (28)$$

We generate $A\in\mathbb{R}^{m\times n}$ as a Gaussian matrix, randomly generate an $s$-sparse vector $\hat{x}$, and set $b=A\hat{x}+0.5\varepsilon$, where $\varepsilon$ has standard i.i.d. Gaussian entries, and $\delta=1$. We observe that (28) is of the form of problem $(P)$ with $f(x)=\frac{1}{2}\|Ax-b\|^{2}$ and $g(x)=\delta\|x\|_{1}$. It is clear that $f$ has a Lipschitz continuous gradient with $L_f=\lambda_{\max}(A^{T}A)$. Moreover, (28) satisfies the local error bound condition by the third example in the Introduction with $h(x)=\frac{1}{2}\|x\|^{2}$ and $c=-A^{T}b$. We terminate the algorithms once $\|\partial F(x_k)\|<10^{-8}$.
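For reproducibility, the synthetic data and the stopping quantity can be generated, for example, as follows (the sizes `m`, `n`, `s` are illustrative assumptions; $\|\partial F(x_k)\|$ is measured here by the minimum-norm subgradient, one common choice):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s, delta = 200, 1000, 20, 1.0          # illustrative sizes (assumptions)

A = rng.standard_normal((m, n))              # Gaussian matrix
x_hat = np.zeros(n)
support = rng.choice(n, s, replace=False)
x_hat[support] = rng.standard_normal(s)      # s-sparse ground truth
b = A @ x_hat + 0.5 * rng.standard_normal(m)

Lf = np.linalg.norm(A, 2) ** 2               # lambda_max(A^T A)
lam = 0.98 / Lf                              # stepsize used in the experiments

def subgrad_norm(x):
    # minimum-norm element of the subdifferential of F = f + delta*||.||_1 at x
    g = A.T @ (A @ x - b)
    return np.linalg.norm(np.where(x != 0, g + delta * np.sign(x),
                                   np.sign(g) * np.maximum(np.abs(g) - delta, 0.0)))
# terminate once subgrad_norm(x_k) < 1e-8
```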

Consider Corollary 3.3. In theory, the rate of convergence of IFB with the $t_k$ in Case 2 should keep improving as $r$ increases. In Fig. 1, we test four choices of $r$, namely $r=2$, $r=4$, $r=6$ and $r=8$, to show that the experiments agree with the theory. We denote the IFB with $t_k=\frac{k^{r}-1+a}{a}$ by "FISTA_pow(r)". Here we set $a=4$ and use the constant stepsize $\lambda=\frac{0.98}{L_f}$.

Figure 1: Computational results for the convergence of $\|\psi_k\|$ and $F(x_k)-F_{*}$ (panels (a) and (b)).

Corollary 3.11 shows that the convergence rate of the corresponding IFB is strongly related to the value of $a$. In Fig. 2, we test four choices of $a$, namely $a=4$, $a=6$, $a=8$ and $a=10$, to verify our theoretical results. We set $\lambda=\frac{0.98}{L_f}$.

Figure 2: Computational results for the convergence of $\|\psi_k\|$ and $F(x_k)-F_{*}$ (panels (a) and (b)).

Now, we perform numerical experiments to study the IFB with five choices of tk.t_{k}. Notice that the IFBs with tkt_{k} discussed in Case 1 and Case 3 enjoy the rates of convergence better than any order of convergence rate, and in the end of last section, we emphasize that these two IFBs should achieve almost the same numerical experiments if we set the related parameters as r=0.5,r=0.5, a=0.5,a=0.5, and α=0.5.\alpha=0.5. Hence, we consider the following algorithms:
1) FISTA;
2) FISTA_CD with a=4a=4;
3) FISTA_pow(8), i.e., the IFB with tk=kr1+aa(r=8anda=4){t_{k}}=\frac{{{k^{r}}-1+a}}{a}\;\left({r=8\;{\rm{and}}\;a=4}\right).
4) FISTA_pow(0.5), i.e., the IFB with tk=kr1+aa(r=0.5anda=0.5){t_{k}}=\frac{{{k^{r}}-1+a}}{a}\;\left({r=0.5\;{\rm{and}}\;a=0.5}\right).
5) FISTA_exp, i.e., the IFB with tk=e(k1)α,0<α<1.{t_{k}}={e^{{{\left({k-1}\right)}^{\alpha}}}},0<\alpha<1. And set α=0.5.\alpha=0.5.

Set λ=0.98Lf.\lambda=\frac{{0.98}}{{{L_{f}}}}.
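For reference, the momentum schedules behind the five algorithms above can be generated as in the following sketch. The classical FISTA recursion and the Chambolle–Dossal rule tk = (k + a − 1)/a are the standard choices and are stated here as assumptions, as is the FISTA-type inertial parameter γk = (tk − 1)/t_{k+1}.

import numpy as np

def next_t_fista(t_prev):
    # classical FISTA recursion: t_1 = 1, t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2
    return 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2))

def t_cd(k, a=4.0):
    # assumed Chambolle-Dossal rule t_k = (k + a - 1) / a, used by FISTA_CD
    return (k + a - 1.0) / a

def t_pow(k, r, a):
    # t_k = (k^r - 1 + a) / a, used by FISTA_pow(8) (r=8, a=4)
    # and FISTA_pow(0.5) (r=0.5, a=0.5)
    return (k ** r - 1.0 + a) / a

def t_exp(k, alpha=0.5):
    # t_k = exp((k - 1)^alpha), 0 < alpha < 1, used by FISTA_exp
    return np.exp((k - 1.0) ** alpha)

def gamma_from_t(t_k, t_kp1):
    # assumed inertial parameter gamma_k = (t_k - 1) / t_{k+1}
    return (t_k - 1.0) / t_kp1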

Figure 3: Computational results for the convergence of ψk\left\|{{\psi_{k}}}\right\| and (F(xk)F).\left({F\left({{x_{k}}}\right)-F^{*}}\right).

Our computational results are presented in Fig. 3. We see that FISTA_exp and FISTA_pow(0.5) require many fewer iterations than FISTA_CD and FISTA, and are also faster than FISTA_pow(8). These results agree with the theoretical analysis in Section 3. We also see that the curves of FISTA_exp and FISTA_pow(0.5) almost coincide; in detail, FISTA_exp takes 3948 iterations and FISTA_pow(0.5) takes 3964, which confirms our theoretical analysis.

Sparse Logistic Regression. We also consider the l1l_{1}-regularized sparse logistic regression problem, that is

minx1ni=1nlog(1+exp(lihi,x))+δx1,\displaystyle\mathop{\min}\limits_{x}\frac{1}{n}\sum\limits_{i=1}^{n}{\log\left({1+\exp\left({-{l_{i}}\left\langle{{h_{i}},x}\right\rangle}\right)}\right)}+\delta\left\|{{x}}\right\|_{1}, (29)

where hiRm,{h_{i}}\in{R^{m}}, li{1,1},i=1,,n.{l_{i}}\in\left\{{-1,1}\right\},i=1,\cdots,n. Define Kij=lihij{K_{ij}}=-{l_{i}}{h_{ij}} and Lf=4nKTK.{L_{f}}=\frac{4}{n}\left\|{{K^{T}}K}\right\|. Problem (29) satisfies the local error bound condition by the third example in the Introduction with h(x)=1ni=1nlog(1+exp(xi))h\left(x\right)=\frac{1}{n}\sum\limits_{i=1}^{n}{\log\left({1+\exp\left({{x_{i}}}\right)}\right)} and A=K,A=K, c=0.c=0. We set δ=102.\delta={10^{-2}}. We take three datasets “w4a”, “a9a” and “sonar” from LIBSVM (see Chang & Lin, 2011), and the computational results in terms of the number of iterations are reported in Table 1.
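A minimal sketch of this objective, its gradient and the Lipschitz constant used in the text is given below (illustrative Python code under our own indexing of the data matrix, not the released scripts).

import numpy as np

def make_logreg_instance(H, l, delta=1e-2):
    """l1-regularized logistic regression as in (29).
    H: n x m feature matrix with rows h_i; l: labels in {-1, +1}.
    Illustrative sketch; the Lipschitz constant below is the one stated in the text."""
    n = H.shape[0]
    K = -l[:, None] * H                                   # K_ij = -l_i * h_ij
    def grad_f(x):
        z = K @ x
        return K.T @ (1.0 / (1.0 + np.exp(-z))) / n       # (1/n) K^T sigma(Kx)
    f = lambda x: np.logaddexp(0.0, K @ x).sum() / n      # stable log(1 + exp(.))
    F = lambda x: f(x) + delta * np.abs(x).sum()
    Lf = 4.0 / n * np.linalg.norm(K.T @ K, 2)             # constant used in the text
    return grad_f, F, Lf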

Table 1: Comparison of the number of iterations
Dataset FISTA FISTA_CD FISTA_pow(8) FISTA_pow(0.5) FISTA_exp
“w4a” 1147 760 544 510 548
“a9a” 2049 1289 757 623 714
“sonar” 8405 3406 1586 922 980

We see from Table 1 that FISTA_exp, FISTA_pow(0.5) and FISTA_pow(8) outperform FISTA and FISTA_CD, and the numerical results are consistent with the theoretical ones.

Strongly convex quadratic programming with box constraints.

minx[sl,su]12xTAx+bTx,\mathop{\min}\limits_{x\in\left[{sl,su}\right]}\frac{1}{2}{x^{T}}Ax+{b^{T}}x,

where ARm×mA\in{R^{m\times m}} is a symmetric positive definite matrix generated by A=BTB+sIA={B^{T}}B+sI where BRm2×mB\in{R^{\frac{m}{2}\times m}} with i.i.d. standard Gaussian entries and ss chosen uniformly at random from [0,1]\left[{0,1}\right]. The vector bRmb\in{R^{m}} is generated with i.i.d. standard Gaussian entries. Set susu = ones(m,1) and slsl = -ones(m,1). Notice that f(x)=12xTAx+bTxf\left(x\right)=\frac{1}{2}{x^{T}}Ax+{b^{T}}x with μf=λmin(A){\mu_{f}}={\lambda_{\min}}\left(A\right) and Lf=λmax(A),{L_{f}}={\lambda_{\max}}\left(A\right), and g(x)=δ[sl,su](x).g\left(x\right)={\delta_{\left[{sl,su}\right]}}\left(x\right). Here, we terminate the algorithms once F(xk)<106.\left\|{\partial F\left({{x_{k}}}\right)}\right\|<{10^{-6}}.
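Since gg is the indicator of the box [sl,su],\left[{sl,su}\right], its proximal operator reduces to the projection onto the box. A minimal sketch of the instance generation follows (illustrative code with a hypothetical seed argument, not the released scripts).

import numpy as np

def make_box_qp_instance(m, seed=0):
    """Strongly convex quadratic over the box [-1, 1]^m, as described above.
    Illustrative sketch; prox of the indicator of [sl, su] is the projection."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((m // 2, m))
    s = rng.uniform(0.0, 1.0)
    A = B.T @ B + s * np.eye(m)                     # symmetric positive definite
    b = rng.standard_normal(m)
    sl, su = -np.ones(m), np.ones(m)
    grad_f = lambda x: A @ x + b                    # gradient of the quadratic
    prox_g = lambda v, lam: np.clip(v, sl, su)      # projection onto [sl, su]
    eigs = np.linalg.eigvalsh(A)
    mu_f, Lf = eigs[0], eigs[-1]                    # strong convexity / Lipschitz constants
    return grad_f, prox_g, mu_f, Lf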

Now we perform numerical experiments with (1) the Forward-Backward method without inertia (FB); (2) FISTA with fixed and adaptive restart schemes (FISTA_R); (3) IFB with β:=LfμfLf+μf{\beta^{*}}:=\frac{{\sqrt{{L_{f}}}-\sqrt{{\mu_{f}}}}}{{\sqrt{{L_{f}}}+\sqrt{{\mu_{f}}}}} (IFB_β{\beta^{*}}); (4) FISTA_exp; (5) Algorithm 2 (gradient scheme) with tk=ek1{t_{k}}={e^{\sqrt{k-1}}} (IFB_AdapM_exp).

According to Corollary 3.1, the sequences {xk}\left\{{{x_{k}}}\right\} and {F(xk)}\left\{{F\left({{x_{k}}}\right)}\right\} generated by FISTA_exp and IFB_AdapM_exp converge with the sublinear rate o(1kp),o\left({\frac{1}{{{k^{p}}}}}\right), which is slower, from a theoretical point of view, than the RR-linear convergence of FB, FISTA_R and IFB_β{\beta^{*}}. However, we can see from Fig. 4 and Fig. 5 that IFB_AdapM_exp always performs better than FISTA_exp and sometimes better than the other four algorithms, while FISTA_exp performs similarly to IFB_β{\beta^{*}} without requiring the strong convexity parameter. Consequently, although the linear convergence rate is not reached, FISTA_exp and IFB_AdapM_exp still show good numerical performance, and the adaptive modification scheme can significantly improve the convergence speed of IFB.

Figure 4: Computational results for the convergence of ψk\left\|{{\psi_{k}}}\right\| and (F(xk)F).\left({F\left({{x_{k}}}\right)-F^{*}}\right).

Figure 5: Computational results for the convergence of ψk\left\|{{\psi_{k}}}\right\| and (F(xk)F).\left({F\left({{x_{k}}}\right)-F^{*}}\right).

6 Conclusion

In this paper, under the local error bound condition, we study the convergence results of IFBs with a class of abstract tkt_{k} satisfying the assumption A2A_{2} for solving the problem (PP). We use a new technique called the “comparison method” to establish improved convergence rates of the function values and sublinear convergence rates of the iterates generated by the IFBs with six choices of tk.t_{k}. In particular, we show that, under the local error bound condition, the strong convergence of the iterates generated by the original FISTA can be established, the convergence rate of FISTA_CD is actually related to the value of a,a, and the sublinear convergence rates of both the function values and the iterates generated by IFBs with tkt_{k} in Case 1 and Case 3 can achieve o(1kp)o\left({\frac{1}{{{k^{p}}}}}\right) for any positive integer p>1.p>1. Moreover, our results still hold for IFBs with an adaptive modification scheme.

Acknowledgements

The work was supported by the National Natural Science Foundation of China (No.11901561), the Natural Science Foundation of Guangxi (No.2018GXNSFBA281180) and the Postdoctoral Fund Project of China (Grant No.2019M660833).

References

  • Attouch & Peypouquet (2016) Attouch, H. & Peypouquet, J. (2016) The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1k2{\frac{1}{{{k^{2}}}}}. SIAM J. Optim., 26, 1824–1834.
  • Attouch & Cabot (2018) Attouch, H. & Cabot, A. (2018) Convergence rates of inertial forward-backward algorithms. SIAM J. Optim., 28, 849–874.
  • Apidopoulos & Aujol (2020) Apidopoulos, V., Aujol, J. & Dossal, C. (2020) Convergence rate of inertial Forward-Backward algorithm beyond Nesterov’s rule. Math. Program., 180, 137–156.
  • Beck & Teboulle (2009) Beck, A. & Teboulle, M. (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2, 183–202.
  • Chambolle & Dossal (2015) Chambolle, A. & Dossal, C. (2015) On the convergence of the iterates of the “fast iterative shrinkage-thresholding algorithm”. J. Optim. Theory Appl., 166, 968–982.
  • Chambolle & Pock (2016) Chambolle, A. & Pock, T. (2016) An introduction to continuous optimization for imaging. Acta Numerica, 25, 161–319.
  • Chang & Lin (2011) Chang, C. C. & Lin, C. J. (2011) LIBSVM: a library for support vector machines. ACM. Trans. Intell. Syst. Technol., 2, 1–27.
  • Calatroni & Chambolle (2019) Calatroni, L. & Chambolle, A. (2019) Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM J. Optim., 29, 1772–1798.
  • Donghwan & Jeffrey (2018) Donghwan, K. & Jeffrey, A. F. (2018) Another look at the fast iterative shrinkage/thresholding algorithm (FISTA). SIAM J. Optim., 28, 223–250.
  • Hai (2020) Hai, T. N. (2020) Error bounds and stability of the projection method for strongly pseudomonotone equilibrium problems. Int. J. Comput. Math., also available online from https://doi.org/10.1080/00207160.2019.1711374.html.
  • Johnstone & Moulin (2017) Johnstone, P. R. & Moulin, P. (2017) Local and global convergence of a general inertial proximal splitting scheme for minimizing composite functions. Comput. Optim. Appl., 67, 259–292.
  • Luo & Tseng (1992) Luo, Z. & Tseng, P. (1992) On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim., 30, 408–425.
  • Luo & Tseng (1992a) Luo, Z. & Tseng, P. (1992a) Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim., 2, 43–54.
  • Luo & Tseng (1993) Luo, Z. & Tseng, P. (1993) On the convergence rate of dual ascent methods for linearly constrained convex minimization. Math. Oper. Res., 18, 846–867.
  • Moudafi & Oliny (2003) Moudafi, A. & Oliny, M. (2003) Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math., 155, 447–454.
  • Mridula & Shukla (2020) Mridula, V. & Shukla, K. K. (2020) Convergence analysis of accelerated proximal extra-gradient method with applications. Neurocomputing, 388, 288–300.
  • Necoara & Nesterov (2019) Necoara, I., Nesterov, Y. & Glineur, F. (2019) Linear convergence of first order methods for non-strongly convex optimization. Math. Program., 175, 69–107.
  • Nesterov (2019) Nesterov, Y. (2019) A method for solving the convex programming problem with convergence rate O(1k2)O\left({\frac{1}{{{k^{2}}}}}\right). Dokl. Akad. Nauk SSSR., 269, 543–547.
  • Nesterov (2013) Nesterov, Y. (2013) Gradient methods for minimizing composite functions. Math. Program., 140, 125–161.
  • O’Donoghue & Candès (2015) O’Donoghue, B. & Candès, E. (2015) Adaptive restart for accelerated gradient schemes. Found. Comput. Math., 15, 715–732.
  • Pang (1987) Pang, J. S. (1987) A posteriori error bounds for the linearly-constrained variational inequality problem. Math. Oper. Res., 12, 474–484.
  • Su & Boyd (2016) Su, W., Boyd, S. & Candès, E. J. (2016) A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights. J. Mach. Learn. Res., 17, 1–43.
  • Tseng & Yun (2009) Tseng, P. & Yun, S. (2009) A coordinate gradient descent method for nonsmooth separable minimization. Math. Program., 117, 387–423.
  • Tseng & Yun (2010) Tseng, P. & Yun, S. (2010) A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training. Comput. Optim. Appl., 47, 179–206.
  • Tseng (2010) Tseng, P. (2010) Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program., 125, 263–295.
  • Tao & Boley (2016) Tao, S. Z., Boley, D. & Zhang, S. Z. (2016) Local linear convergence of ISTA and FISTA on the LASSO problem. SIAM J. Optim., 26, 313–336.
  • Villa & Salzo (2013) Villa, S., Salzo, S., Baldassarre, L. & Verri, A. (2013) Accelerated and Inexact Forward-Backward Algorithms. SIAM J. Optim., 23, 1607–1633.
  • Wen & Chen (2017) Wen, B., Chen, X. J., & Pong, T. K. (2017) Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim., 27, 124–145.
  • Xiao & Zhang (2013) Xiao, L. & Zhang, T. (2013) A Proximal-gradient homotopy method for the sparse least-squares problem. SIAM J. Optim., 23, 1062–1091.
  • Zhou & So (2017) Zhou, Z. & So, A. M. (2017) A unified approach to error bounds for structured convex optimization problems. Math. Program., 165, 689–728.
  • Nesterov (2003) Nesterov, Y. (2003) Introductory lectures on convex optimization: A basic course. Springer Science & Business Media.

Appendix A Proof of Lemma 2.4

Proof A.1.

Assume by contradiction that liminfksk=l<+.\mathop{\lim\inf}\limits_{k\to\infty}{s_{k}}=l<+\infty. Notice that l0l\geq 0 since {sk}\left\{{{s_{k}}}\right\} is a nonnegative sequence. Then, there exists a subsequence {skj}\left\{{{s_{{k_{j}}}}}\right\} such that limjskj=l.\mathop{\lim}\limits_{j\to\infty}{s_{{k_{j}}}}=l. By the condition αk=sk1sk+1γk,{\alpha_{k}}=\frac{{{s_{k}}-1}}{{{s_{k+1}}}}\geq{\gamma_{k}}, we have skj1skj+1γkj.\frac{{{s_{{k_{j}}}}-1}}{{{s_{{k_{j}}+1}}}}\geq{\gamma_{{k_{j}}}}. Then, combining this with the fact that limkγk=1\mathop{\lim}\limits_{k\to\infty}{\gamma_{k}}=1 from Remark 3, we deduce that

limsupjskj+1limjskj1γkj=l1,\mathop{\lim\sup}\limits_{j\to\infty}{s_{{k_{j}}+1}}\leq\mathop{\lim}\limits_{j\to\infty}\frac{{{s_{{k_{j}}}}-1}}{{{\gamma_{{k_{j}}}}}}=l-1,

which leads to a contradiction that l=liminfkskliminfjskj+1limsupjskj+1l1.l=\mathop{\lim\inf}\limits_{k\to\infty}{s_{k}}\leq\mathop{\lim\inf}\limits_{j\to\infty}{s_{{k_{j}}+1}}\leq\mathop{\lim\sup}\limits_{j\to\infty}{s_{{k_{j}}+1}}\leq l-1. Hence, limksk=+.\mathop{\lim}\limits_{k\to\infty}{s_{k}}=+\infty.

Further, by the condition αk=sk1sk+1γk,{\alpha_{k}}=\frac{{{s_{k}}-1}}{{{s_{k+1}}}}\geq{\gamma_{k}}, we get sk+1sk1γk+1sk+1.\frac{{{s_{k+1}}}}{{{s_{k}}}}\leq\frac{1}{{{\gamma_{k}}+\frac{1}{{{s_{k+1}}}}}}. Combining this with limkγk=1\mathop{\lim}\limits_{k\to\infty}{\gamma_{k}}=1 and limksk=+,\mathop{\lim}\limits_{k\to\infty}{s_{k}}=+\infty, we obtain that limsupksk+1sk1.\mathop{\lim\sup}\limits_{k\to\infty}\frac{{{s_{k+1}}}}{{{s_{k}}}}\leq 1. Since {sk}\left\{{{s_{k}}}\right\} is a nonnegative sequence, we have limsupk(sk+1sk)21,\mathop{\lim\sup}\limits_{k\to\infty}{\left({\frac{{{s_{k+1}}}}{{{s_{k}}}}}\right)^{2}}\leq 1, which leads to the result that

limsupksk+12sk2sk2=limsupk(sk+1sk)210.\mathop{\lim\sup}\limits_{k\to\infty}\frac{{s_{k+1}^{2}-s_{k}^{2}}}{{s_{k}^{2}}}=\mathop{\lim\sup}\limits_{k\to\infty}{\left({\frac{{{s_{k+1}}}}{{{s_{k}}}}}\right)^{2}}-1\leq 0.