

Convergence Rate of Inertial Forward-Backward Splitting Algorithms Based on the Local Error Bound Condition

Hongwei Liu, Ting Wang
School of Mathematics and Statistics, Xidian University, Xi'an 710126, China
Email: hwliuxidian@163.com; corresponding author email: wangting_7640@163.com

and

Zexian Liu
State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, AMSS, Chinese Academy of Sciences, Beijing 100190, China;
School of Mathematics and Statistics, Guizhou University, Guiyang 550025, China
Email: liuzexian2008@163.com
Abstract

The Inertial Forward-Backward algorithm (IFB) is a powerful tool for convex nonsmooth minimization problems. Under the local error bound condition, $R$-linear convergence rates of the sequences of objective values and iterates have been proved when the inertial parameter $\gamma_k$ satisfies $\sup_k \gamma_k < 1$. However, the convergence behaviour when $\sup_k \gamma_k = 1$ is not known. In this paper, based on the local error bound condition, we introduce a new assumption on the key parameter $t_k$ in IFB, which implies $\lim_{k\to\infty}\gamma_k = 1$, and we establish the convergence rate of the function values and the strong convergence of the iterates generated by the IFB algorithms with six choices of $t_k$ satisfying this assumption in Hilbert space. Remarkably, under the local error bound condition, we show that the IFB algorithms with some choices of $t_k$ achieve a sublinear convergence rate of $o\left(\frac{1}{k^{p}}\right)$ for any constant $p>1$. In addition, we propose a class of Inertial Forward-Backward algorithms with an adaptive modification and show that they enjoy the same convergence results as IFB under the error bound condition. Numerical experiments are conducted to illustrate our results.

Keywords: Inertial Forward-Backward algorithm; local error bound condition; rate of convergence.

1 Introduction

Let $H$ be a real Hilbert space, let $f:H\to\mathbb{R}$ be a convex, continuously differentiable function with $L_f$-Lipschitz continuous gradient, and let $g:H\to\mathbb{R}\cup\{+\infty\}$ be a proper lower semi-continuous convex function. We also assume that the proximal operator of $\lambda g$, i.e.,

$$\mathrm{prox}_{\lambda g}(y) := \arg\min_{x\in H}\left\{ g(x)+\frac{1}{2\lambda}\|x-y\|^{2}\right\}, \quad (1)$$

can be easily computed for all $\lambda>0$. In this paper, we consider the following problem:

$$(P)\qquad \min_{x\in H} F(x) := f(x)+g(x).$$

We assume that problem $(P)$ is solvable, i.e., $X^{*}:=\arg\min F\neq\emptyset$, and for $x_{*}\in X^{*}$ we set $F_{*}:=F(x_{*})$.

In order to solve problem $(P)$, several algorithms based on the proximal operator have been proposed to handle the non-differentiable part. One can consult Johnstone & Moulin (2017), Moudafi & Oliny (2003) and Villa & Salzo (2013) for a recent account of proximal-based algorithms, which play a central role in nonsmooth optimization. A typical strategy for solving problem $(P)$ is the Inertial Forward-Backward algorithm (IFB), which consists in iteratively applying the non-expansive operator $T_{\lambda}:H\to H$ defined as

$$T_{\lambda}(x) := \mathrm{prox}_{\lambda g}\left(x-\lambda\nabla f(x)\right) \quad \forall x\in H.$$
Algorithm 1 Inertial Forward-Backward algorithm (IFB)

Step 0. Take $y_1=x_0\in\mathbb{R}^{n}$, $t_1=1$. Input $\lambda=\frac{\mu}{L_f}$, where $\mu\in\left]0,1\right[$.
Step k. Compute
        $x_k=T_{\lambda}(y_k)=\mathrm{prox}_{\lambda g}\left(y_k-\lambda\nabla f(y_k)\right)$
        $y_{k+1}=x_k+\gamma_k(x_k-x_{k-1})$, where $\gamma_k=\frac{t_k-1}{t_{k+1}}$.
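For concreteness, a minimal Python sketch of one way Algorithm 1 could be implemented for the LASSO instance used later in Section 5 is given below; the problem data `A`, `b`, the weight `delta`, the iteration budget and the callable `t_seq` are assumptions of this sketch, not part of the paper's code.

```python
import numpy as np

def soft_threshold(z, tau):
    # proximal operator of tau * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ifb(A, b, delta, t_seq, mu=0.98, n_iter=500):
    """Algorithm 1 for min 0.5*||Ax - b||^2 + delta*||x||_1.
    t_seq(k) returns t_k (1-indexed); gamma_k = (t_k - 1) / t_{k+1}."""
    Lf = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    lam = mu / Lf                             # step size lambda = mu / Lf
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()
    for k in range(1, n_iter + 1):
        grad = A.T @ (A @ y - b)              # gradient of the smooth part at y_k
        x = soft_threshold(y - lam * grad, lam * delta)   # x_k = T_lambda(y_k)
        gamma = (t_seq(k) - 1.0) / t_seq(k + 1)
        y = x + gamma * (x - x_prev)          # inertial step
        x_prev = x
    return x

# e.g. the FISTA_CD choice t_k = (k - 1 + a)/a with a = 4
x_sol = ifb(np.random.randn(20, 50), np.random.randn(20), 1.0,
            lambda k: (k - 1 + 4.0) / 4.0)
```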

In view of the structure of IFB, it is easy to see that the inertial term $\gamma_k$ plays an important role in improving the speed of convergence of IFB. Based on Nesterov's extrapolation technique (see Nesterov, 2019), Beck and Teboulle proposed a "fast iterative shrinkage-thresholding algorithm" (FISTA) with $t_1=1$ and $t_{k+1}=\frac{1+\sqrt{1+4t_k^{2}}}{2}$ for solving $(P)$ (see Beck & Teboulle, 2009). The remarkable properties of this algorithm are its computational simplicity and the significantly better global rate of convergence of the function values, namely $F(x_k)-F(x_{*})=O\left(\frac{1}{k^{2}}\right)$. Several variants of FISTA have been considered in works such as Apidopoulos & Aujol (2020), Chambolle & Dossal (2015), Calatroni & Chambolle (2019), Donghwan & Jeffrey (2018), Mridula & Shukla (2020), Su & Boyd (2016) and Tao & Boley (2016), where properties such as the convergence of the iterates and the rate of convergence of the function values have also been studied.

Chambolle and Dossal (see Chambolle & Dossal, 2015) pointed out that FISTA satisfies a better worst-case estimate; however, the convergence of its iterates is not known. They proposed a new choice $t_k=\frac{k-1+a}{a}$ $(a>2)$ and showed that the iterates generated by the corresponding IFB, named "FISTA_CD", converge weakly to a minimizer of $F$. Attouch and Peypouquet (see Attouch & Peypouquet, 2016) further proved that the sequence of function values generated by FISTA_CD approximates the optimal value of the problem at a rate strictly faster than $O\left(\frac{1}{k^{2}}\right)$, namely $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{2}}\right)$. Apidopoulos et al. (see Apidopoulos & Aujol, 2020) noticed that the basic idea behind the choices of $t_k$ in Attouch & Cabot (2018), Beck & Teboulle (2009) and Chambolle & Dossal (2015) is Nesterov's rule $t_k^{2}-t_{k+1}^{2}+t_{k+1}\geq 0$, and they focused on the case where Nesterov's rule is not satisfied. They studied $\gamma_k=\frac{k}{k+b}$ with $0<b<3$ and found the exact estimate $F(x_k)-F(x_{*})=O\left(\frac{1}{k^{\frac{2b}{3}}}\right)$. Attouch and Cabot (see Attouch & Cabot, 2018) considered various options of $\gamma_k$ and analyzed the convergence rate of the function values and the weak convergence of the iterates under suitable assumptions; they further showed that strong convergence of the iterates holds for special choices of $f$. Wen, Chen and Pong (see Wen & Chen, 2017) showed that for the nonsmooth convex minimization problem $(P)$, under the local error bound condition (see Tseng & Yun, 2009), $R$-linear convergence of both the sequence $\{x_k\}$ and the corresponding sequence of objective values $\{F(x_k)\}$ holds if $\sup_k\gamma_k<1$; they also pointed out that the sequences $\{x_k\}$ and $\{F(x_k)\}$ generated by FISTA with fixed restart, or with both fixed and adaptive restart schemes (see O'Donoghue & Candès, 2015), are $R$-linearly convergent under the error bound condition. However, the local convergence rate of the iterates generated by FISTA for solving $(P)$ is still unknown, even under the local error bound condition.

The local error bound condition, which estimates the distance from $x$ to $X^{*}$ by the norm of the proximal residual at $x$, has proved to be extremely useful in analyzing the convergence rates of a host of iterative methods for solving optimization problems (see Zhou & So, 2017). Major contributions on developing and using the error bound condition to derive convergence results of iterative algorithms can be found in a series of papers (see, e.g., Hai, 2020; Luo & Tseng, 1992; Necoara & Nesterov, 2019; Tseng & Yun, 2009, 2010; Tseng, 2010; Zhou & So, 2017). Zhou and So (see Zhou & So, 2017) established error bounds for minimizing the sum of a smooth convex function and a general closed proper convex function; such problems include general constrained minimization problems and various regularized loss minimization formulations in machine learning, signal processing, and statistics. Many choices of $f$ and $g$ satisfy the local error bound condition, including:

  • (Pang, 1987, Theorem 3.1) $f$ is strongly convex and $g$ is arbitrary.

  • (Luo & Tseng, 1992a, Theorem 2.3) $f$ is a quadratic function and $g$ is a polyhedral function.

  • (Luo & Tseng, 1992, Theorem 2.1) $g$ is a polyhedral function and $f=h(Ax)+\langle c,x\rangle$, where $A\in\mathbb{R}^{m\times n}$, $c\in\mathbb{R}^{n}$ and $h$ is a continuously differentiable function with Lipschitz continuous gradient that is strongly convex on any compact convex set. This covers the well-known LASSO.

  • (Luo & Tseng, 1993, Theorem 4.1) $g$ is a polyhedral function and $f(x)=\max_{y\in Y}\left\{(Ax)^{T}y-h(y)\right\}+q^{T}x$, where $Y$ is a polyhedral set and $h$ is a strongly convex differentiable function with Lipschitz continuous gradient.

  • (Tseng, 2010, Theorem 2) $f$ takes the form $f(x)=h(Ax)$, where $h$ is as in the third item above and $g$ is the grouped LASSO regularizer.

More examples satisfying the error bound condition can be found in Tseng (2010), Zhou & So (2017), Tseng & Yun (2009), Pang (1987) and Luo & Tseng (1992).

It has been observed numerically that first-order methods for solving such structured instances of problem $(P)$ converge at a much faster rate than suggested by the theory in Tao & Boley (2016), Xiao & Zhang (2013) and Zhou & So (2017). A very powerful approach to analyzing this phenomenon is the local error bound condition. Hence, the first focus of this work is the improved convergence rate of IFB with some special choices of $t_k$ under the local error bound condition.

We also pay attention to Nesterov's rule $t_k^{2}-t_{k+1}^{2}+t_{k+1}\geq 0$. For $t_k$ satisfying this rule, one can derive that $t_{k+1}-t_k<1$ and that $\sum_{k=1}^{+\infty}\frac{1}{t_k}$ diverges, which greatly limits the choice of $t_k$. Our question is whether a more suitable $t_k$, and improved theoretical results, can be obtained if Nesterov's rule is replaced by a new rule that we propose.
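A short verification of these two consequences (a routine calculation, added here for completeness): Nesterov's rule is equivalent to $t_{k+1}^{2}-t_{k+1}\leq t_k^{2}$, hence

$$t_{k+1}\leq\frac{1}{2}+\sqrt{\frac{1}{4}+t_k^{2}}\leq\frac{1}{2}+\left(\frac{1}{2}+t_k\right)=t_k+1,$$

with strict inequality whenever $t_k>0$; by induction $t_k\leq t_1+k-1$, so $\frac{1}{t_k}\geq\frac{1}{t_1+k-1}$ and $\sum_{k=1}^{+\infty}\frac{1}{t_k}$ diverges by comparison with the harmonic series.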

Contributions.

In this paper, based on the local error bound condition, we introduce an assumption on the key parameter $t_k$ in IFB and prove convergence results, including the convergence rate of the function values and the strong convergence of the iterates generated by the corresponding IFB. The assumption imposed on $t_k$ provides a theoretical basis for choosing a new $t_k$ in IFB for problems satisfying the local error bound condition, such as LASSO. We use a "comparison method" to discuss six choices of $t_k$, which include the ones in the original FISTA (see Beck & Teboulle, 2009) and FISTA_CD (see Chambolle & Dossal, 2015) and satisfy our assumption, and we separately establish the improved convergence rates of the function values and the sublinear convergence of the iterates generated by the corresponding IFBs. We also establish the same convergence results for IFB with an adaptive modification (IFB_AdapM), which performs well in numerical experiments. Remarkably, under the local error bound condition, the strong convergence of the iterates generated by the original FISTA is established, the convergence rate of the function values for FISTA_CD is improved to $o\left(\frac{1}{k^{2(a+1)}}\right)$, and the IFB algorithms with some choices of $t_k$ achieve a sublinear convergence rate of $o\left(\frac{1}{k^{p}}\right)$ for any constant $p>1$.

2 A new assumption condition on $t_k$ and the convergence of the corresponding IFB algorithms

In this section, we derive a new assumption condition on the parameter $t_k$ in IFB, and analyze the convergence of the corresponding IFB under the local error bound condition.

We start by recalling a key result, which plays an important role in our theoretical analysis.

Lemma 2.1.

(Chambolle & Pock, 2016, ineq. (4.36)) For any $y\in\mathbb{R}^{n}$ and $\lambda=\frac{\mu}{L_f}$ with $\mu\in\left(0,1\right]$, we have

$$\forall x\in\mathbb{R}^{n}:\quad F\left(T_{\lambda}(y)\right)\leq F(x)+\frac{1}{2\lambda}\|x-y\|^{2}-\frac{1-\mu}{2\lambda}\|T_{\lambda}(y)-y\|^{2}-\frac{1}{2\lambda}\|T_{\lambda}(y)-x\|^{2}. \quad (2)$$

Next, we give a very weak assumption under which the sequence $\{F(x_k)\}$ generated by Algorithm 1 with $0\leq\gamma_k\leq 1$ for all sufficiently large $k$ converges to $F(x_{*})$, independently of $t_k$.

Assumption $A_0$: For any $\xi_0\geq F_{*}$, there exist $\epsilon_0>0$ and $\tau_0>0$ such that

$$\mathrm{dist}\left(x,X^{*}\right)\leq\tau_0 \quad (3)$$

whenever $\left\|T_{\frac{1}{L_f}}(x)-x\right\|<\epsilon_0$ and $F(x)\leq\xi_0$.

Remark 2. Note that Assumption $A_0$ is implied by the assumption that $F$ has bounded level sets.

Lemma 2.2.

(Nesterov, 2013, Lemma 2) For $\lambda_1\geq\lambda_2>0$, we have

$$\forall x\in\mathbb{R}^{n}:\quad \left\|T_{\lambda_1}(x)-x\right\|\geq\left\|T_{\lambda_2}(x)-x\right\| \quad\mathrm{and}\quad \frac{\left\|T_{\lambda_1}(x)-x\right\|}{\lambda_1}\leq\frac{\left\|T_{\lambda_2}(x)-x\right\|}{\lambda_2}. \quad (4)$$
Proof 2.3.

The above lemma follows from Lemma 2 of Nesterov (2013) with $B:=I$ and $L:=\frac{1}{\lambda}$.

Theorem 2.4.

Let $\{x_k\}$ and $\{y_k\}$ be generated by Algorithm 1. Suppose that Assumption $A_0$ holds and that there exists a positive integer $n_0$ such that $0\leq\gamma_k\leq 1$ for all $k\geq n_0$. Then,
1) $\sum_{k=1}^{\infty}\|x_{k+1}-y_{k+1}\|^{2}$ is convergent;
2) $\lim_{k\to\infty}F(x_k)=F(x_{*})$.

Proof 2.5.

Applying inequality (2) at the points $x=x_k$ and $y=y_{k+1}$ (note that $T_{\lambda}(y_{k+1})=x_{k+1}$ and $x_k-y_{k+1}=-\gamma_k(x_k-x_{k-1})$), we obtain

$$\forall k\geq 1:\quad\frac{1-\mu}{2\lambda}\|x_{k+1}-y_{k+1}\|^{2}\leq\left(F(x_k)+\frac{\gamma_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}\right)-\left(F(x_{k+1})+\frac{1}{2\lambda}\|x_{k+1}-x_k\|^{2}\right). \quad (5)$$

Since $0\leq\gamma_k\leq 1$ holds for every $k\geq n_0$, we easily obtain $\sum_{k=n_0}^{\infty}\|x_{k+1}-y_{k+1}\|^{2}<+\infty$; result 1) then follows, since adding finitely many terms does not change the convergence of the series. Moreover, for any $\bar{\epsilon}>0$, there exists a sufficiently large $n_1$ such that $\|x_k-y_k\|<\bar{\epsilon}$ for all $k\geq\bar{n}:=\max(n_0,n_1)$. Set $\xi_0=F(x_{\bar{n}+1})+\frac{1}{2\lambda}\|x_{\bar{n}+1}-x_{\bar{n}}\|^{2}$. From Lemma 2.2 with $\lambda<\frac{1}{L_f}$ and the nonexpansiveness of the proximal operator, we obtain

$$\left\|T_{\frac{1}{L_f}}(x_k)-x_k\right\|\leq\frac{1}{\lambda L_f}\left\|T_{\lambda}(x_k)-x_k\right\|=\frac{1}{\lambda L_f}\left\|T_{\lambda}(x_k)-T_{\lambda}(y_k)\right\|\leq\left(1+\frac{1}{\lambda L_f}\right)\|x_k-y_k\|, \quad (6)$$

hence $\left\|T_{\frac{1}{L_f}}(x_k)-x_k\right\|<\left(1+\frac{1}{\lambda L_f}\right)\bar{\epsilon}$ for all $k\geq\bar{n}$. Also, it follows from (5) that $\left\{F(x_{k+1})+\frac{1}{2\lambda}\|x_{k+1}-x_k\|^{2}\right\}$ is non-increasing for $k\geq\bar{n}$, and therefore $F(x_k)\leq\xi_0$. Hence, combining this with Assumption $A_0$, for $\xi_0=F(x_{\bar{n}+1})+\frac{1}{2\lambda}\|x_{\bar{n}+1}-x_{\bar{n}}\|^{2}$ there exist $\epsilon_0:=\left(1+\frac{1}{\lambda L_f}\right)\bar{\epsilon}$ and $\tau_0>0$ such that

$$\forall k\geq\bar{n}:\quad\mathrm{dist}\left(x_k,X^{*}\right)\leq\tau_0. \quad (7)$$

In addition, applying inequality (2) at the point $y=y_{k+1}$ with $x=x_{k+1}^{*}\in X^{*}$ chosen such that $\mathrm{dist}(x_{k+1},X^{*})=\|x_{k+1}-x_{k+1}^{*}\|$, we obtain

$$\begin{aligned}F(x_{k+1})-F(x_{*})&\leq\frac{1}{2\lambda}\|y_{k+1}-x_{k+1}^{*}\|^{2}-\frac{1}{2\lambda}\|x_{k+1}-x_{k+1}^{*}\|^{2}\\&=\frac{1}{2\lambda}\|y_{k+1}-x_{k+1}\|^{2}+\frac{1}{\lambda}\left\langle y_{k+1}-x_{k+1},\,x_{k+1}-x_{k+1}^{*}\right\rangle\\&\leq\frac{1}{2\lambda}\|y_{k+1}-x_{k+1}\|^{2}+\frac{1}{\lambda}\|y_{k+1}-x_{k+1}\|\,\mathrm{dist}\left(x_{k+1},X^{*}\right).\end{aligned} \quad (8)$$

Then, since $\|y_{k+1}-x_{k+1}\|\to 0$ by result 1), combining (8) with (7) gives $\lim_{k\to\infty}F(x_k)=F(x_{*})$.

The rest of this paper is based on the following assumption.

Assumption $A_1$ ("Local error bound condition", Tseng & Yun (2009)): For any $\xi\geq F_{*}$, there exist $\varepsilon>0$ and $\bar{\tau}>0$ such that

$$\mathrm{dist}\left(x,X^{*}\right)\leq\bar{\tau}\left\|T_{\frac{1}{L_f}}(x)-x\right\| \quad (9)$$

whenever $\left\|T_{\frac{1}{L_f}}(x)-x\right\|<\varepsilon$ and $F(x)\leq\xi$.
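As a concrete illustration (not needed for the theory), consider $f(x)=\frac{1}{2}\|Ax-b\|^{2}$ with a full-column-rank matrix $A$ and $g\equiv 0$: then $T_{\frac{1}{L_f}}(x)-x=-\frac{1}{L_f}\nabla f(x)$ and (9) holds globally with $\bar{\tau}=L_f/\lambda_{\min}(A^{T}A)$. A small numerical sanity check of this bound, with randomly generated data that are purely an assumption of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 30))                 # full column rank with probability 1
b = rng.standard_normal(80)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]     # the unique minimizer of f

Lf = np.linalg.norm(A, 2) ** 2                    # lambda_max(A^T A)
mu = np.linalg.svd(A, compute_uv=False)[-1] ** 2  # lambda_min(A^T A)

for _ in range(5):
    x = x_star + rng.standard_normal(30)
    residual = np.linalg.norm(A.T @ (A @ x - b)) / Lf   # ||T_{1/Lf}(x) - x|| since g = 0
    dist = np.linalg.norm(x - x_star)
    print(dist / residual, "<=", Lf / mu)               # error bound with tau_bar = Lf / mu
```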

As mentioned in Section 1, the choice of $t_k$ in FISTA accelerates the convergence rate of the function values from $O\left(\frac{1}{k}\right)$ to $O\left(\frac{1}{k^{2}}\right)$, and the $t_k$ in FISTA_CD improves the rate to $o\left(\frac{1}{k^{2}}\right)$. Other options for $t_k$ are considered in Attouch & Cabot (2018) and Apidopoulos & Aujol (2020). Hence, $t_k$ is the crucial factor in guaranteeing the convergence of the iterates and in improving the rate of convergence of the function values. Apidopoulos et al. (Apidopoulos & Aujol, 2020) point out that if $t_k$ satisfies Nesterov's rule, then one can obtain a better convergence rate. However, we notice that Nesterov's rule greatly limits the choice of $t_k$. In the following, we present a new Assumption $A_2$ on $t_k$, which allows some new options of $t_k$, and we analyze the convergence of the iterates and the convergence rate of the function values for Algorithm 1 with a class of abstract $t_k$ satisfying Assumption $A_2$ under the local error bound condition.

Assumption $A_2$: There exists a constant $0<\sigma\leq 1$ such that $\lim_{k\to\infty}k^{\sigma}\left(\frac{t_{k+1}}{t_k}-1\right)=c$, where $c>0$.

Remark 3. It follows from Assumption $A_2$ that $\gamma_k\in\left]0,1\right[$ for all sufficiently large $k$, $\lim_{k\to\infty}t_k=+\infty$ and $\lim_{k\to\infty}\frac{t_{k+1}}{t_k}=1$. It is easy to verify that the $t_k$ in FISTA and the $t_k$ in FISTA_CD both satisfy Assumption $A_2$ with $\sigma=1$; moreover, there exist choices of $t_k$ satisfying Assumption $A_2$ that satisfy Nesterov's rule, and others that do not (see Section 3).
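For instance, a quick numerical check (a sketch only) that the FISTA and FISTA_CD choices of $t_k$ satisfy Assumption $A_2$ with $\sigma=1$ and $c=1$:

```python
import numpy as np

# FISTA (Case 5): t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, t_1 = 1
t = np.empty(10_000)
t[0] = 1.0
for i in range(1, t.size):
    t[i] = (1.0 + np.sqrt(1.0 + 4.0 * t[i - 1] ** 2)) / 2.0
k = t.size - 1                                   # k = 9999, t[k] = t_{k+1}
print(k * (t[k] / t[k - 1] - 1.0))               # -> approaches c = 1

# FISTA_CD (Case 6): t_k = (k - 1 + a) / a, here a = 4
a = 4.0
t_cd = lambda k: (k - 1 + a) / a
print(k * (t_cd(k + 1) / t_cd(k) - 1.0))         # -> approaches c = 1
```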

Lemma 2.6.

Suppose that Assumptions $A_1$ and $A_2$ hold. Let $\{x_k\}$ be generated by Algorithm 1 and let $x_{*}\in X^{*}$. Then, there exists a constant $\tau_1>0$ such that

$$\forall k\geq 1:\quad F(x_{k+1})-F(x_{*})\leq\frac{\tau_1}{\lambda}\|y_{k+1}-x_{k+1}\|^{2}.$$
Proof 2.7.

Since $\gamma_k\in\left]0,1\right[$ holds for all $k\geq n_0$ by Assumption $A_2$, arguing as in the proof of Theorem 2.4 (with $\bar{\epsilon}$ chosen so small that $\left(1+\frac{1}{\lambda L_f}\right)\bar{\epsilon}<\varepsilon$), for $\xi_0=F(x_{\bar{n}+1})+\frac{1}{2\lambda}\|x_{\bar{n}+1}-x_{\bar{n}}\|^{2}$ we have $F(x_k)\leq\xi_0$ and $\left\|T_{\frac{1}{L_f}}(x_k)-x_k\right\|<\varepsilon$ for all $k\geq\bar{n}$, so Assumption $A_1$ yields

$$\mathrm{dist}\left(x_k,X^{*}\right)\leq\bar{\tau}\left\|T_{\frac{1}{L_f}}(x_k)-x_k\right\|\leq\bar{\tau}\left(1+\frac{1}{\lambda L_f}\right)\|x_k-y_k\|\leq\frac{2\bar{\tau}}{\mu}\|x_k-y_k\|, \quad (10)$$

where the second inequality of (10) follows from (6) and the third follows from the fact that $\lambda=\frac{\mu}{L_f}$ with $\mu\in(0,1)$. In addition, it follows from (8) that

$$F(x_{k+1})-F(x_{*})\leq\frac{1}{2\lambda}\|y_{k+1}-x_{k+1}\|^{2}+\frac{1}{\lambda}\|y_{k+1}-x_{k+1}\|\,\mathrm{dist}\left(x_{k+1},X^{*}\right)\leq\frac{1}{\lambda}\left(\frac{2\bar{\tau}}{\mu}+\frac{1}{2}\right)\|y_{k+1}-x_{k+1}\|^{2},\quad\forall k\geq\bar{n}. \quad (11)$$

Also, we can find a constant $c>0$ such that $F(x_{k+1})-F(x_{*})\leq\frac{c}{\lambda}\|y_{k+1}-x_{k+1}\|^{2}$ for all $1\leq k\leq\bar{n}-1$. Therefore, any $\tau_1\geq\max\left\{\frac{2\bar{\tau}}{\mu}+\frac{1}{2},c\right\}$ yields the conclusion.

Here, we introduce a new approach, which we call the "comparison method": we consider a sequence $\{\alpha_k\}$ with $\alpha_k=\frac{s_k-1}{s_{k+1}}\geq\gamma_k$, where $\{s_k\}$ is a nonnegative sequence, and use it to estimate the bounds on the objective values and the local variation of the iterates.

Lemma 2.8.

Suppose that there exists a nonnegative sequence $\{s_k\}$ such that $\alpha_k=\frac{s_k-1}{s_{k+1}}\geq\gamma_k$ for all sufficiently large $k$, where $\gamma_k=\frac{t_k-1}{t_{k+1}}$ and $t_k$ satisfies Assumption $A_2$. Then $\lim_{k\to\infty}s_k=+\infty$ and $\limsup_{k\to\infty}\frac{s_{k+1}^{2}-s_k^{2}}{s_k^{2}}\leq 0$.

Proof 2.9.

See the detailed proof in Appendix A.

Theorem 2.10.

Suppose that Assumptions $A_1$ and $A_2$ hold and that there exists a nonnegative sequence $\{s_k\}$ such that $\alpha_k=\frac{s_k-1}{s_{k+1}}\geq\gamma_k$ for all sufficiently large $k$. Then $F(x_{k+1})-F(x_{*})=o\left(\frac{1}{s_{k+1}^{2}}\right)$ and $\|x_{k+1}-x_k\|=O\left(\frac{1}{s_{k+1}}\right)$. Further, if $\sum_{k=1}^{\infty}\frac{1}{s_{k+1}}$ is convergent, then the sequence $\{x_k\}$ converges strongly to a minimizer of $F$.

Proof 2.11.

Denote $E_k=s_{k+1}^{2}\left(F(x_k)-F(x_{*})\right)+\frac{s_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}$. Applying (5), we have

$$F(x_{k+1})-F(x_{*})+\frac{1-\mu}{2\lambda}\|x_{k+1}-y_{k+1}\|^{2}+\frac{1}{2\lambda}\|x_{k+1}-x_k\|^{2}\leq F(x_k)-F(x_{*})+\frac{\gamma_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}.$$

By assumption, $\gamma_k^{2}\leq\alpha_k^{2}$ for all sufficiently large $k$; hence

$$F(x_{k+1})-F(x_{*})+\frac{1-\mu}{2\lambda}\|x_{k+1}-y_{k+1}\|^{2}+\frac{1}{2\lambda}\|x_{k+1}-x_k\|^{2}\leq F(x_k)-F(x_{*})+\frac{\alpha_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}.$$

Multiplying by $s_{k+1}^{2}$ and using $\alpha_k s_{k+1}=s_k-1$, we have

$$\begin{aligned}&E_{k+1}+\left(s_{k+1}^{2}-s_{k+2}^{2}\right)\left(F(x_{k+1})-F(x_{*})\right)+\frac{1-\mu}{2\lambda}s_{k+1}^{2}\|x_{k+1}-y_{k+1}\|^{2}\\&\leq s_{k+1}^{2}\left(F(x_k)-F(x_{*})\right)+\frac{(s_k-1)^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}\\&=s_{k+1}^{2}\left(F(x_k)-F(x_{*})\right)+\frac{s_k^{2}}{2\lambda}\|x_k-x_{k-1}\|^{2}-\frac{2s_k-1}{2\lambda}\|x_k-x_{k-1}\|^{2}\leq E_k.\end{aligned}$$

Then, combining with Lemma 2.6, we have

$$E_{k+1}+\left(\frac{s_{k+1}^{2}-s_{k+2}^{2}}{s_{k+1}^{2}}+\frac{1-\mu}{2\tau_1}\right)s_{k+1}^{2}\left(F(x_{k+1})-F(x_{*})\right)\leq E_k. \quad (12)$$

Since $\limsup_{k\to\infty}\frac{s_{k+2}^{2}-s_{k+1}^{2}}{s_{k+1}^{2}}\leq 0$ by Lemma 2.8, we have $\frac{s_{k+1}^{2}-s_{k+2}^{2}}{s_{k+1}^{2}}\geq-\frac{1-\mu}{4\tau_1}$ for all sufficiently large $k$; thus (12) implies that, for any $k\geq k_0$ with $k_0$ sufficiently large,

$$E_{k+1}+\frac{1-\mu}{4\tau_1}s_{k+1}^{2}\left(F(x_{k+1})-F(x_{*})\right)\leq E_k, \quad (13)$$

i.e., $\sum_{k=k_0}^{+\infty}s_{k+1}^{2}\left(F(x_{k+1})-F(x_{*})\right)<+\infty$. Since adding finitely many terms does not change the convergence of the series, $\sum_{k=1}^{\infty}s_{k+1}^{2}\left(F(x_{k+1})-F(x_{*})\right)$ is convergent. Hence, $F(x_{k+1})-F(x_{*})=o\left(\frac{1}{s_{k+1}^{2}}\right)$ holds true.

Further, since $\{E_k\}$ is convergent by (13), the sequence $\left\{s_{k+1}^{2}\|x_{k+1}-x_k\|^{2}\right\}$ is bounded, which means that $\|x_{k+1}-x_k\|=O\left(\frac{1}{s_{k+1}}\right)$, i.e., there exists a constant $c_1>0$ such that $\|x_{k+1}-x_k\|\leq\frac{c_1}{s_{k+1}}$. Recalling the assumption that $\sum_{k=1}^{\infty}\frac{1}{s_{k+1}}$ is convergent, we deduce that $\{x_k\}$ is a Cauchy sequence. Letting $\lim_{k\to\infty}x_k=\bar{x}$, we conclude that $\{x_k\}$ converges strongly to $\bar{x}\in X^{*}$ since $F$ is lower semi-continuous and convex.

3 The sublinear convergence rates of IFB algorithms with special $t_k$

In the following, we show the improved convergence rates for the IFB algorithms with six special choices of $t_k$ satisfying Assumption $A_2$; for reference, these six choices are summarized in the sketch below.
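The following Python snippet lists the six choices compactly (a sketch with purely illustrative parameter values; the names are assumptions of the sketch):

```python
import numpy as np

# illustrative parameter values (assumptions for this sketch)
alpha, r1, r2, a, theta = 0.5, 2.0, 0.5, 4.0, 1.0

def t_fista(k, _cache={1: 1.0}):
    # Case 5: t_1 = 1, t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2
    for j in range(max(_cache) + 1, k + 1):
        _cache[j] = (1.0 + np.sqrt(1.0 + 4.0 * _cache[j - 1] ** 2)) / 2.0
    return _cache[k]

t_cases = {
    1: lambda k: np.exp((k - 1) ** alpha),                  # t_k = e^{(k-1)^alpha}, 0 < alpha < 1
    2: lambda k: (k ** r1 - 1 + a) / a,                     # (k^r - 1 + a)/a with r > 1
    3: lambda k: (k ** r2 - 1 + a) / a,                     # (k^r - 1 + a)/a with r < 1
    4: lambda k: 1.0 if k == 1 else k / np.log(k) ** theta, # k / ln^theta(k), t_1 = 1
    5: t_fista,                                             # original FISTA recursion
    6: lambda k: (k - 1 + a) / a,                           # FISTA_CD choice
}
```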

Case 1. $t_k=e^{(k-1)^{\alpha}}$, $0<\alpha<1$.

Corollary 3.1.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 1 and let $x_{*}\in X^{*}$. Then,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{e^{2(k-1)^{\alpha}}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{e^{(k-1)^{\alpha}}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left((k-1)^{\alpha\left\lceil\frac{1}{\alpha}-1\right\rceil}e^{-(k-1)^{\alpha}}\right)$.

Proof 3.2.

We can easily verify that $\lim_{k\to\infty}k^{1-\alpha}\left(\frac{t_{k+1}}{t_k}-1\right)=\alpha$, which means that Assumption $A_2$ holds. Setting $s_k=t_k$, we conclude from Theorem 2.10 that result 1) holds. It follows from result 1) that there exists a positive constant $c^{\prime}$ such that $\|x_k-x_{k-1}\|\leq\frac{c^{\prime}}{e^{(k-1)^{\alpha}}}$, and we deduce that

$$\forall p>1:\quad\|x_{k+p}-x_k\|\leq\sum_{i=k+1}^{k+p}\|x_i-x_{i-1}\|\leq\sum_{i=k+1}^{k+p}\frac{c^{\prime}}{e^{(i-1)^{\alpha}}}\leq c^{\prime}\int_{k}^{k+p}e^{-(x-1)^{\alpha}}dx.$$

Since $\int_{1}^{+\infty}e^{-(x-1)^{\alpha}}dx$ is convergent, $\sum_{i=1}^{+\infty}\|x_i-x_{i-1}\|$ is convergent, which means that $\{x_k\}$ is a Cauchy sequence and converges strongly to some $\bar{x}\in X^{*}$. Then, letting $p\to\infty$, we have

$$\|x_k-\bar{x}\|\leq c^{\prime}\int_{k}^{+\infty}e^{-(x-1)^{\alpha}}dx=c^{\prime}\int_{(k-1)^{\alpha}}^{+\infty}\frac{1}{\alpha}y^{\frac{1}{\alpha}-1}e^{-y}dy\leq\frac{c^{\prime}}{\alpha}\int_{(k-1)^{\alpha}}^{+\infty}y^{\left\lceil\frac{1}{\alpha}-1\right\rceil}e^{-y}dy.$$

Denote $\omega=\left\lceil\frac{1}{\alpha}-1\right\rceil$ and $A=(k-1)^{\alpha}$. We deduce that

$$\|x_k-\bar{x}\|\leq\frac{c^{\prime}}{\alpha}\int_{A}^{+\infty}y^{\omega}e^{-y}dy=\frac{c^{\prime}}{\alpha}A^{\omega}e^{-A}+\frac{c^{\prime}}{\alpha}\sum_{j=0}^{\omega-1}\left(\left(\prod_{i=0}^{j}(\omega-i)\right)A^{\omega-j-1}e^{-A}\right)=O\left(A^{\omega}e^{-A}\right),$$

which means that

$$\|x_k-\bar{x}\|=O\left((k-1)^{\alpha\left\lceil\frac{1}{\alpha}-1\right\rceil}e^{-(k-1)^{\alpha}}\right).$$

Hence, result 2) holds.

Remark 4. Notice that $(k-1)^{\alpha\left\lceil\frac{1}{\alpha}-1\right\rceil}e^{-(k-1)^{\alpha}}=o\left(\frac{1}{k^{p}}\right)$ for every $p>1$, which means that the sublinear convergence rate of the IFB with the $t_k$ in Case 1 is faster than any polynomial order.

Case 2. $t_k=\frac{k^{r}-1+a}{a}$ $(r>1,\ a>0)$.

Corollary 3.3.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 2 and let $x_{*}\in X^{*}$. Then,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{2r}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{r}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{r-1}}\right)$.

Proof 3.4.

It is easy to verify that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=r$, which means that Assumption $A_2$ holds. Setting $s_k=t_k$ and combining $\lim_{k\to\infty}\frac{s_k}{k^{r}}=\frac{1}{a}$ with the convergence of $\sum_{k=1}^{\infty}\frac{1}{s_k}$, we deduce from Theorem 2.10 that result 1) holds and that $\{x_k\}$ converges strongly to some $\bar{x}\in X^{*}$. It follows from result 1) that there exists a positive constant $c^{\prime\prime}$ such that $\|x_k-x_{k-1}\|\leq\frac{c^{\prime\prime}}{k^{r}}$. Then,

$$\forall p>1:\quad\|x_{k+p}-x_k\|\leq\sum_{i=k+1}^{k+p}\|x_i-x_{i-1}\|\leq c^{\prime\prime}\sum_{i=k+1}^{k+p}\frac{1}{i^{r}}\leq c^{\prime\prime}\int_{k}^{k+p}\frac{1}{x^{r}}dx.$$

Letting $p\to\infty$, we obtain

$$\|x_k-\bar{x}\|\leq\frac{c^{\prime\prime}}{r-1}\frac{1}{k^{r-1}}.$$

Hence, result 2) holds.

Remark 5. For the $t_k$ in Case 2, the convergence rates of the function values and of the iterates are related to the value of $r$: the larger $r$, the better the convergence rate Algorithm 1 achieves.

Case 3. $t_k=\frac{k^{r}-1+a}{a}$ $(r<1,\ a>0)$.

Corollary 3.5.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 3 and let $x_{*}\in X^{*}$. Then, for any constant $p>1$,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{p}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{p}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{p-1}}\right)$.

Proof 3.6.

It is easy to verify that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=r$, which means that Assumption $A_2$ holds. Since $\lim_{k\to\infty}k^{r}\left(\gamma_k-1\right)=\lim_{k\to\infty}\frac{-k^{r}\left(t_{k+1}-t_k+1\right)}{t_{k+1}}=\lim_{k\to\infty}\frac{-k^{r}\left((k+1)^{r}-k^{r}+a\right)}{(k+1)^{r}-1+a}=-a$, we have

$$\gamma_k=1-\frac{a}{k^{r}}+o\left(\frac{1}{k^{r}}\right),\quad\text{as }k\to+\infty. \quad (14)$$

For any constant $p>1$, set $s_1=1$ and $s_k=(k-1)^{p}$ for $k\geq 2$. Then, we have

$$\alpha_k=\frac{s_k-1}{s_{k+1}}=\left(1-\frac{1}{k}\right)^{p}-\frac{1}{k^{p}}=1-\frac{p}{k}+o\left(\frac{1}{k}\right),\quad\text{as }k\to+\infty. \quad (15)$$

By (14) and (15), we obtain

$$\lim_{k\to\infty}k^{r}\left(\alpha_k-\gamma_k\right)=\lim_{k\to\infty}k^{r}\left(-\frac{p}{k}+\frac{a}{k^{r}}+o\left(\frac{1}{k^{r}}\right)+o\left(\frac{1}{k}\right)\right)=a>0, \quad (16)$$

which implies that $\alpha_k\geq\gamma_k$ for all sufficiently large $k$. Hence, using $s_k\sim k^{p}$ and Theorem 2.10, result 1) holds and $\{x_k\}$ converges strongly to some $\bar{x}\in X^{*}$. Similarly to the proof of Corollary 3.3, we conclude result 2).

Case 4. $t_1=1$ and $t_k=\frac{k}{\ln^{\theta}k}$ $(\theta>0)$ for $k\geq 2$.

Corollary 3.7.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 4 and let $x_{*}\in X^{*}$. Then, for any constant $p\geq 2$,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{2p}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{p}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{p-1}}\right)$.

Proof 3.8.

We can prove that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=1$, which means that Assumption $A_2$ holds. Observe that

$$\begin{aligned}\gamma_k&=\frac{t_k}{t_{k+1}}-\frac{1}{t_{k+1}}=\left(1-\frac{1}{k+1}\right)\left(\frac{\ln(k+1)}{\ln k}\right)^{\theta}-\frac{\ln^{\theta}(k+1)}{k+1}\\&=\left(1-\frac{1}{k+1}\right)\left(1+\frac{\ln\left(1+\frac{1}{k}\right)}{\ln k}\right)^{\theta}-\frac{\ln^{\theta}(k+1)}{k+1}\\&=\left(1-\frac{1}{k+1}\right)\left(1+\frac{\theta\ln\left(1+\frac{1}{k}\right)}{\ln k}+o\left(\frac{\ln\left(1+\frac{1}{k}\right)}{\ln k}\right)\right)-\frac{\ln^{\theta}(k+1)}{k+1}\\&=\left(1-\frac{1}{k+1}\right)\left(1+o\left(\frac{1}{k+1}\right)\right)-\frac{\ln^{\theta}(k+1)}{k+1}\\&=1-\frac{1}{k+1}-\frac{\ln^{\theta}(k+1)}{k+1}+o\left(\frac{1}{k+1}\right).\end{aligned}$$

Set $s_k=k^{p}$, where $p\geq 2$. Then, we easily obtain

$$\alpha_k=\frac{s_k-1}{s_{k+1}}=\left(1-\frac{1}{k+1}\right)^{p}-\frac{1}{(k+1)^{p}}=1-\frac{p}{k+1}+o\left(\frac{1}{k+1}\right).$$

Hence, the condition $\alpha_k\geq\gamma_k$ holds for all sufficiently large $k$. Using Theorem 2.10, result 1) holds, and, similarly to the proof of Corollary 3.3, result 2) holds.

Case 5. $t_1=1$ and $t_{k+1}=\frac{1+\sqrt{1+4t_k^{2}}}{2}$ for $k\geq 1$.

Corollary 3.9.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 5 and let $x_{*}\in X^{*}$. Then,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{6}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{3}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{2}}\right)$.

Proof 3.10.

We can easily obtain that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=1$, which means that Assumption $A_2$ holds. Observe that

$$\gamma_k=\frac{t_k-1}{t_{k+1}}=1-\frac{t_{k+1}-t_k-\frac{1}{2}}{t_{k+1}}-\frac{3}{2t_{k+1}}. \quad (17)$$

Since $\lim_{k\to\infty}t_k\left(t_{k+1}-t_k-\frac{1}{2}\right)=\lim_{k\to\infty}t_k\left(\frac{1+\sqrt{1+4t_k^{2}}}{2}-t_k-\frac{1}{2}\right)=\frac{1}{8}$ and $\lim_{k\to\infty}\frac{t_k}{k}=\frac{1}{2}$, we deduce that

$$\lim_{k\to\infty}k^{2}\left(\frac{t_{k+1}-t_k-\frac{1}{2}}{t_{k+1}}\right)=\frac{1}{2},$$

which means that

$$\frac{t_{k+1}-t_k-\frac{1}{2}}{t_{k+1}}=\frac{1}{2k^{2}}+o\left(\frac{1}{k^{2}}\right),\quad\text{as }k\to+\infty. \quad (18)$$

By the Stolz theorem, we obtain

$$\lim_{k\to\infty}\frac{\frac{1}{2}k-t_{k+1}}{\ln k}=\lim_{k\to\infty}\frac{\frac{1}{2}-\left(t_{k+2}-t_{k+1}\right)}{\ln\left(1+\frac{1}{k}\right)}=\lim_{k\to\infty}\frac{-k}{t_{k+1}}\,t_{k+1}\left(t_{k+2}-t_{k+1}-\frac{1}{2}\right)=-\frac{1}{4},$$

and then

$$\lim_{k\to\infty}\frac{\frac{3}{2t_{k+1}}-\frac{3}{k}}{\frac{\ln k}{k^{2}}}=\lim_{k\to\infty}3\,\frac{k}{t_{k+1}}\left(\frac{\frac{1}{2}k-t_{k+1}}{\ln k}\right)=-\frac{3}{2},$$

which means that

$$\frac{3}{2t_{k+1}}=\frac{3}{k}-\frac{3}{2}\frac{\ln k}{k^{2}}+o\left(\frac{\ln k}{k^{2}}\right),\quad\text{as }k\to+\infty. \quad (19)$$

Hence, by (17)–(19), we have

$$\gamma_k=1-\frac{3}{k}+\frac{3\ln k}{2k^{2}}+o\left(\frac{\ln k}{k^{2}}\right),\quad\text{as }k\to+\infty. \quad (20)$$

Set $s_1=s_2=1$ and $s_k=\frac{(k-1)^{3}}{\left(\int_{1}^{k-1}\frac{\ln x}{x^{2}}dx\right)^{2}}$ for $k\geq 3$. Then, since $\int_{1}^{+\infty}\frac{\ln x}{x^{2}}dx=1$ and $\frac{\left(\int_{1}^{k}\frac{\ln x}{x^{2}}dx\right)^{2}}{k^{3}}\leq\frac{1}{k^{3}}=o\left(\frac{\ln k}{k^{2}}\right)$, we have

$$\begin{aligned}\alpha_k&=\frac{s_k-1}{s_{k+1}}=\left(1-\frac{1}{k}\right)^{3}\left(1+\frac{\int_{k-1}^{k}\frac{\ln x}{x^{2}}dx}{\int_{1}^{k-1}\frac{\ln x}{x^{2}}dx}\right)^{2}-\frac{\left(\int_{1}^{k}\frac{\ln x}{x^{2}}dx\right)^{2}}{k^{3}}\\&\geq\left(1-\frac{1}{k}\right)^{3}\left(1+\frac{\frac{\ln k}{k^{2}}}{\int_{1}^{+\infty}\frac{\ln x}{x^{2}}dx}\right)^{2}-\frac{\left(\int_{1}^{k}\frac{\ln x}{x^{2}}dx\right)^{2}}{k^{3}}\\&=\left(1-\frac{1}{k}\right)^{3}\left(1+\frac{\ln k}{k^{2}}\right)^{2}-\frac{\left(\int_{1}^{k}\frac{\ln x}{x^{2}}dx\right)^{2}}{k^{3}}\\&=1-\frac{3}{k}+\frac{2\ln k}{k^{2}}+o\left(\frac{\ln k}{k^{2}}\right),\quad\text{as }k\to+\infty.\end{aligned} \quad (21)$$

By (20) and (21), $\lim_{k\to\infty}\frac{\alpha_k-\gamma_k}{\frac{\ln k}{k^{2}}}\geq\frac{1}{2}$, and therefore $\alpha_k\geq\gamma_k$ for all sufficiently large $k$. Using the fact that $s_k\sim k^{3}$ and Theorem 2.10, we conclude result 1), and $\{x_k\}$ converges strongly to some $\bar{x}\in X^{*}$. Further, similarly to the proof of Corollary 3.3, result 2) holds.

Case 6. $t_k=\frac{k-1+a}{a}$ $(a>0)$.

Corollary 3.11.

Suppose that Assumption $A_1$ holds. Let $\{x_k\}$ be generated by Algorithm 1 with $t_k$ as in Case 6 and let $x_{*}\in X^{*}$. Then,
1) $F(x_k)-F(x_{*})=o\left(\frac{1}{k^{2(a+1)}}\right)$ and $\|x_k-x_{k-1}\|=O\left(\frac{1}{k^{a+1}}\right)$;
2) $\{x_k\}$ converges sublinearly to some $\bar{x}\in X^{*}$ at the rate $O\left(\frac{1}{k^{a}}\right)$.

Proof 3.12.

It is easy to verify that $\lim_{k\to\infty}k\left(\frac{t_{k+1}}{t_k}-1\right)=1$, which means that Assumption $A_2$ holds. Observe that $\gamma_k=\frac{t_k-1}{t_{k+1}}=\frac{k-1}{k+a}$. For $a\geq 1$, set $s_k=(k+a-1)^{a+1}$. Otherwise, set $s_1=s_2=1$ and $s_k=\frac{(k+a-1)^{a+1}}{\int_{1}^{k-1}\frac{\ln x}{x^{1+a}}dx}$ for $k\geq 3$.

1) For the case $a>1$, we have

$$\begin{aligned}\alpha_k&=\frac{(k+a-1)^{a+1}-1}{(k+a)^{a+1}}=\left(1-\frac{1}{k+a}\right)^{a+1}-\frac{1}{(k+a)^{a+1}}\\&=1-\frac{a+1}{k+a}+\frac{a(a+1)}{2}\frac{1}{(k+a)^{2}}+o\left(\frac{1}{(k+a)^{2}}\right)-\frac{1}{(k+a)^{a+1}}\\&=\gamma_k+\frac{a(a+1)}{2}\frac{1}{(k+a)^{2}}+o\left(\frac{1}{(k+a)^{2}}\right),\quad\text{as }k\to+\infty,\end{aligned} \quad (25)$$

which means that $\lim_{k\to\infty}(k+a)^{2}\left(\alpha_k-\gamma_k\right)=\frac{a(a+1)}{2}>0$, i.e., $\alpha_k>\gamma_k$ for all sufficiently large $k$.

2) For the case $a=1$, we have

$$\alpha_k=\frac{k^{2}-1}{(k+1)^{2}}=\left(1-\frac{1}{k+1}\right)^{2}-\frac{1}{(k+1)^{2}}=\gamma_k. \quad (26)$$

Obviously, $\alpha_k\geq\gamma_k$ for any $k\geq 1$.

3) For the case $a<1$, we have

$$\begin{aligned}\alpha_k&=\frac{s_k-1}{s_{k+1}}=\left(1-\frac{1}{k+a}\right)^{a+1}\left(1+\frac{\int_{k-1}^{k}\frac{\ln x}{x^{1+a}}dx}{\int_{1}^{k-1}\frac{\ln x}{x^{1+a}}dx}\right)-\frac{\int_{1}^{k}\frac{\ln x}{x^{1+a}}dx}{(k+a)^{a+1}}\\&\geq\left(1-\frac{1}{k+a}\right)^{a+1}\left(1+\frac{\frac{\ln k}{k^{1+a}}}{\int_{1}^{+\infty}\frac{\ln x}{x^{1+a}}dx}\right)-\frac{\int_{1}^{k}\frac{\ln x}{x^{1+a}}dx}{(k+a)^{a+1}}\\&=\left(1-\frac{1}{k+a}\right)^{a+1}\left(1+\frac{a^{2}\ln k}{k^{1+a}}\right)-\frac{\int_{1}^{k}\frac{\ln x}{x^{1+a}}dx}{(k+a)^{a+1}},\quad\text{as }k\to+\infty. \end{aligned} \quad (27)$$

Since $\left(1-\frac{1}{k+a}\right)^{a+1}=1-\frac{a+1}{k+a}+\frac{a(a+1)}{2}\frac{1}{(k+a)^{2}}+o\left(\frac{1}{(k+a)^{2}}\right)$, $\frac{1}{(k+a)^{2}}=o\left(\frac{\ln k}{k^{1+a}}\right)$ and $\frac{\int_{1}^{k}\frac{\ln x}{x^{1+a}}dx}{(k+a)^{a+1}}=o\left(\frac{\ln k}{k^{1+a}}\right)$, it follows from (27) that

$$\alpha_k\geq\gamma_k+\frac{k-1}{k+a}\frac{a^{2}\ln k}{k^{1+a}}+o\left(\frac{\ln k}{k^{1+a}}\right),$$

which implies that $\lim_{k\to\infty}\frac{\alpha_k-\gamma_k}{\frac{\ln k}{k^{1+a}}}\geq a^{2}>0$, i.e., $\alpha_k\geq\gamma_k$ for all sufficiently large $k$.

Hence, $\gamma_k\leq\alpha_k$ holds for all sufficiently large $k$. Since $s_k=O\left(k^{a+1}\right)$, we conclude result 1) from Theorem 2.10, and $\{x_k\}$ converges strongly to some $\bar{x}\in X^{*}$. Further, similarly to the proof of Corollary 3.3, result 2) holds.

Remark 6. Note that the $t_k$ in Case 6 is the $t_k$ proposed in FISTA_CD (see, Chambolle & Dossal, 2015) but with a wider range of $a$. Corollary 3.11 shows that the convergence rates of IFB with the $t_k$ in Case 6 are related to the value of $a$.

Notice that the convergence results in both Corollary 3.1 and Corollary 3.5 enjoy a sublinear convergence rate of $o\left(\frac{1}{k^{p}}\right)$ for any $p>1$. Here, we give a further analysis of the convergence rate of IFB from another viewpoint. From Assumption $A_2$, we can derive that

$$\gamma_k=1-\frac{c}{k^{\sigma}}+o\left(\frac{1}{k^{\sigma}}\right)-\frac{1}{t_{k+1}}.$$

For the $t_k$ in Case 1, we have $\gamma_k=1-\frac{\alpha}{k^{1-\alpha}}+o\left(\frac{1}{k^{1-\alpha}}\right)$; for the $t_k$ in Case 3, we have $\gamma_k=1-\frac{a}{k^{r}}+o\left(\frac{1}{k^{r}}\right)$. These two $\gamma_k$ are of similar magnitude; in particular, they are of the same order if we choose $r=0.5$, $a=0.5$ and $\alpha=0.5$. Thus, it is reasonable to expect that the corresponding IFBs behave similarly in practice; the numerical results in Section 5 confirm this.
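A quick numerical comparison of $1-\gamma_k$ for these two choices under this parameter setting (a sketch only; both quantities behave like $\frac{0.5}{\sqrt{k}}$):

```python
import numpy as np

alpha = a = r = 0.5
t_case1 = lambda k: np.exp((k - 1) ** alpha)      # Case 1
t_case3 = lambda k: (k ** r - 1 + a) / a          # Case 3

gamma = lambda t, k: (t(k) - 1.0) / t(k + 1)

for k in (10, 100, 1000, 10000):
    # both 1 - gamma_k behave like 0.5 / sqrt(k)
    print(k, 1 - gamma(t_case1, k), 1 - gamma(t_case3, k), 0.5 / np.sqrt(k))
```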

4 Inertial Forward-Backward Algorithm with an Adaptive Modification

For solving problem $(P)$, the authors of Wen & Chen (2017) showed that, under the error bound condition, the sequences $\{x_k\}$ and $\{F(x_k)\}$ generated by FISTA with fixed restart are $R$-linearly convergent. In O'Donoghue & Candès (2015), an adaptive restart scheme for FISTA was proposed and shown to enjoy global linear convergence of the objective values when applied to problem $(P)$ with $f$ strongly convex and $g=0$. The authors also stated that, after a certain number of iterations, adaptive restarting may provide linear convergence for LASSO, but they did not prove similar results for the general nonsmooth convex problem $(P)$. In this section, we explain that the Inertial Forward-Backward algorithm with an adaptive modification enjoys the same convergence results as those proved in Sections 2 and 3. The adaptive modification scheme is described below:

Algorithm 2 Inertial forward-backward algorithm with an adaptive modification (IFB_AdapM)

Step 0. Take $y_1=x_0\in\mathbb{R}^{n}$, $t_1=1$. Input $\lambda=\frac{\mu}{L_f}$, where $\mu\in\left]0,1\right[$.
Step k. Compute
        $x_k=T_{\lambda}(y_k)=\mathrm{prox}_{\lambda g}\left(y_k-\lambda\nabla f(y_k)\right)$
        $y_{k+1}=x_k+\gamma_k(x_k-x_{k-1})$,
where $\gamma_k=0$ if $(y_k-x_k)^{T}(x_k-x_{k-1})>0$ or $F(x_k)>F(x_{k-1})$, and $\gamma_k=\frac{t_k-1}{t_{k+1}}$ otherwise.

Note that the adaptive modification condition is the same as the adaptive restart scheme in O'Donoghue & Candès (2015). Here, we call the condition $(y_k-x_k)^{T}(x_k-x_{k-1})>0$ the gradient modification scheme and the condition $F(x_k)>F(x_{k-1})$ the function modification scheme. However, unlike the restart strategy, which resets $t_k$ every time the restart condition holds so that the momentum restarts from 0, Algorithm 2 sets the momentum to zero at the current iteration (the adaptive modification step) but does not interrupt the update of $t_k$. Based on Theorem 2.10 and the fact that $\gamma_k=0\leq\alpha_k=\frac{s_k-1}{s_{k+1}}$, we obtain the same convergence rates for the function values and iterates of Algorithm 2. Specifically, Algorithm 2 with $t_k=e^{(k-1)^{\alpha}}$ $(0<\alpha<1)$ or $t_k=\frac{k^{r}-1+a}{a}$ $(r<1,\ a>0)$ converges with a sublinear rate of type $\frac{1}{k^{p}}$ for any $p>1$, and the corresponding numerical performance compares favourably with FISTA equipped with the fixed restart scheme, or with both the fixed and adaptive restart schemes, which is $R$-linearly convergent (see the numerical experiments in Section 5).
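A minimal sketch of the adaptive modification step, again for the LASSO instance of Section 5 (the problem data and the callable `t_seq` are assumptions of the sketch):

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ifb_adapm(A, b, delta, t_seq, mu=0.98, n_iter=500):
    """Algorithm 2 (IFB_AdapM) for min 0.5*||Ax-b||^2 + delta*||x||_1: the momentum is
    set to zero whenever a modification condition fires, but t_k keeps being updated."""
    Lf = np.linalg.norm(A, 2) ** 2
    lam = mu / Lf
    F = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2 + delta * np.abs(x).sum()
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()
    for k in range(1, n_iter + 1):
        grad = A.T @ (A @ y - b)
        x = soft_threshold(y - lam * grad, lam * delta)      # x_k = T_lambda(y_k)
        if (y - x) @ (x - x_prev) > 0 or F(x) > F(x_prev):   # gradient / function scheme
            gamma = 0.0                                      # adaptive modification step
        else:
            gamma = (t_seq(k) - 1.0) / t_seq(k + 1)
        y = x + gamma * (x - x_prev)
        x_prev = x
    return x
```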

5 Numerical Experiments

In this section, we conduct numerical experiments to study the numerical performance of IFB with different options of $t_k$ and to verify our theoretical results. The codes are available at https://github.com/TingWang7640/Paper_EB.git.

LASSO. We first consider the LASSO problem

$$\min_{x\in\mathbb{R}^{n}}F(x)=\frac{1}{2}\|Ax-b\|^{2}+\delta\|x\|_{1}. \quad (28)$$

We generate $A\in\mathbb{R}^{m\times n}$ as a Gaussian matrix, randomly generate an $s$-sparse vector $\hat{x}$, and set $b=A\hat{x}+0.5\varepsilon$, where $\varepsilon$ has standard i.i.d. Gaussian entries, and $\delta=1$. We observe that (28) is of the form of problem $(P)$ with $f(x)=\frac{1}{2}\|Ax-b\|^{2}$ and $g(x)=\delta\|x\|_{1}$. It is clear that $f$ has a Lipschitz continuous gradient with $L_f=\lambda_{\max}(A^{T}A)$. Moreover, (28) satisfies the local error bound condition by the third example in the Introduction with $h(x)=\frac{1}{2}\|x\|^{2}$ and $c=-A^{T}b$. We terminate the algorithms once $\|\partial F(x_k)\|<10^{-8}$.
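For reproducibility, the synthetic data and the stopping quantity can be generated, for example, as follows (the sizes `m`, `n`, `s` are illustrative assumptions; $\|\partial F(x_k)\|$ is measured here by the minimum-norm subgradient, one common choice):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s, delta = 200, 1000, 20, 1.0          # illustrative sizes (assumptions)

A = rng.standard_normal((m, n))              # Gaussian matrix
x_hat = np.zeros(n)
support = rng.choice(n, s, replace=False)
x_hat[support] = rng.standard_normal(s)      # s-sparse ground truth
b = A @ x_hat + 0.5 * rng.standard_normal(m)

Lf = np.linalg.norm(A, 2) ** 2               # lambda_max(A^T A)
lam = 0.98 / Lf                              # stepsize used in the experiments

def subgrad_norm(x):
    # minimum-norm element of the subdifferential of F = f + delta*||.||_1 at x
    g = A.T @ (A @ x - b)
    return np.linalg.norm(np.where(x != 0, g + delta * np.sign(x),
                                   np.sign(g) * np.maximum(np.abs(g) - delta, 0.0)))
# terminate once subgrad_norm(x_k) < 1e-8
```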

Consider Corollary 3.3. In theory, the rate of convergence of IFB with the $t_k$ in Case 2 should keep improving as $r$ increases. In Fig. 1, we test four choices of $r$, namely $r=2$, $r=4$, $r=6$ and $r=8$, to show that the experiments agree with the theory. We denote the IFB with $t_k=\frac{k^{r}-1+a}{a}$ by "FISTA_pow(r)". Here we set $a=4$ and use the constant stepsize $\lambda=\frac{0.98}{L_f}$.

Figure 1: Computational results for the convergence of $\|\psi_k\|$ and $F(x_k)-F_{*}$ (panels (a) and (b)).

Corollary 3.11 shows that the convergence rate of the corresponding IFB is strongly related to the value of $a$. In Fig. 2, we test four choices of $a$, namely $a=4$, $a=6$, $a=8$ and $a=10$, to verify our theoretical results. We set $\lambda=\frac{0.98}{L_f}$.

Figure 2: Computational results for the convergence of $\|\psi_k\|$ and $F(x_k)-F_{*}$ (panels (a) and (b)).

Now, we perform numerical experiments to study the IFB with five choices of tk.t_{k}. Notice that the IFBs with tkt_{k} discussed in Case 1 and Case 3 enjoy the rates of convergence better than any order of convergence rate, and in the end of last section, we emphasize that these two IFBs should achieve almost the same numerical experiments if we set the related parameters as r=0.5,r=0.5, a=0.5,a=0.5, and α=0.5.\alpha=0.5. Hence, we consider the following algorithms:
1) FISTA;
2) FISTA_CD with a=4a=4;
3) FISTA_pow(8), i.e., the IFB with tk=kr1+aa(r=8anda=4){t_{k}}=\frac{{{k^{r}}-1+a}}{a}\;\left({r=8\;{\rm{and}}\;a=4}\right).
4) FISTA_pow(0.5), i.e., the IFB with tk=kr1+aa(r=0.5anda=0.5){t_{k}}=\frac{{{k^{r}}-1+a}}{a}\;\left({r=0.5\;{\rm{and}}\;a=0.5}\right).
5) FISTA_exp, i.e., the IFB with tk=e(k1)α,0<α<1.{t_{k}}={e^{{{\left({k-1}\right)}^{\alpha}}}},0<\alpha<1. And set α=0.5.\alpha=0.5.

Set λ=0.98Lf.\lambda=\frac{{0.98}}{{{L_{f}}}}.
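For reference, the momentum schedules behind the five algorithms above can be generated as in the following sketch. The classical FISTA recursion and the Chambolle–Dossal rule tk = (k + a − 1)/a are the standard choices and are stated here as assumptions, as is the FISTA-type inertial parameter γk = (tk − 1)/t_{k+1}.

import numpy as np

def next_t_fista(t_prev):
    # classical FISTA recursion: t_1 = 1, t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2
    return 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2))

def t_cd(k, a=4.0):
    # assumed Chambolle-Dossal rule t_k = (k + a - 1) / a, used by FISTA_CD
    return (k + a - 1.0) / a

def t_pow(k, r, a):
    # t_k = (k^r - 1 + a) / a, used by FISTA_pow(8) (r=8, a=4)
    # and FISTA_pow(0.5) (r=0.5, a=0.5)
    return (k ** r - 1.0 + a) / a

def t_exp(k, alpha=0.5):
    # t_k = exp((k - 1)^alpha), 0 < alpha < 1, used by FISTA_exp
    return np.exp((k - 1.0) ** alpha)

def gamma_from_t(t_k, t_kp1):
    # assumed inertial parameter gamma_k = (t_k - 1) / t_{k+1}
    return (t_k - 1.0) / t_kp1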

Figure 3: Computational results for the convergence of ψk\left\|{{\psi_{k}}}\right\| and (F(xk)F).\left({F\left({{x_{k}}}\right)-F^{*}}\right).

Our computational results are presented in Fig. 3. We see that FISTA_exp and FISTA_pow(0.5) require many fewer iterations than FISTA_CD and FISTA, and are also faster than FISTA_pow(8). These results agree with the theoretical analysis in Section 3. We also see that the curves of FISTA_exp and FISTA_pow(0.5) almost coincide; in detail, FISTA_exp takes 3948 iterations and FISTA_pow(0.5) takes 3964, which confirms our theoretical analysis.

Sparse Logistic Regression. We also consider the l1l_{1}-regularized sparse logistic regression problem, that is

minx1ni=1nlog(1+exp(lihi,x))+δx1,\displaystyle\mathop{\min}\limits_{x}\frac{1}{n}\sum\limits_{i=1}^{n}{\log\left({1+\exp\left({-{l_{i}}\left\langle{{h_{i}},x}\right\rangle}\right)}\right)}+\delta\left\|{{x}}\right\|_{1}, (29)

where hiRm,{h_{i}}\in{R^{m}}, li{1,1},i=1,,n.{l_{i}}\in\left\{{-1,1}\right\},i=1,\cdots,n. Define Kij=lihij{K_{ij}}=-{l_{i}}{h_{ij}} and Lf=4nKTK.{L_{f}}=\frac{4}{n}\left\|{{K^{T}}K}\right\|. Problem (29) satisfies the local error bound condition by the third example in the Introduction with h(x)=1ni=1nlog(1+exp(xi))h\left(x\right)=\frac{1}{n}\sum\limits_{i=1}^{n}{\log\left({1+\exp\left({{x_{i}}}\right)}\right)} and A=K,A=K, c=0.c=0. We set δ=102.\delta={10^{-2}}. We take three datasets “w4a”, “a9a” and “sonar” from LIBSVM (see Chang & Lin, 2011), and the computational results in terms of the number of iterations are reported in Table 1.
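A minimal sketch of this objective, its gradient and the Lipschitz constant used in the text is given below (illustrative Python code under our own indexing of the data matrix, not the released scripts).

import numpy as np

def make_logreg_instance(H, l, delta=1e-2):
    """l1-regularized logistic regression as in (29).
    H: n x m feature matrix with rows h_i; l: labels in {-1, +1}.
    Illustrative sketch; the Lipschitz constant below is the one stated in the text."""
    n = H.shape[0]
    K = -l[:, None] * H                                   # K_ij = -l_i * h_ij
    def grad_f(x):
        z = K @ x
        return K.T @ (1.0 / (1.0 + np.exp(-z))) / n       # (1/n) K^T sigma(Kx)
    f = lambda x: np.logaddexp(0.0, K @ x).sum() / n      # stable log(1 + exp(.))
    F = lambda x: f(x) + delta * np.abs(x).sum()
    Lf = 4.0 / n * np.linalg.norm(K.T @ K, 2)             # constant used in the text
    return grad_f, F, Lf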

Table 1: Comparison of the number of iterations
Dataset FISTA FISTA_CD FISTA_pow(8) FISTA_pow(0.5) FISTA_exp
“w4a” 1147 760 544 510 548
“a9a” 2049 1289 757 623 714
“sonar” 8405 3406 1586 922 980

We see from Table 1 that FISTA_exp, FISTA_pow(0.5) and FISTA_pow(8) outperform FISTA and FISTA_CD, and the numerical results are consistent with the theoretical ones.

Strongly convex quadratic programming with box constraints.

minx[sl,su]12xTAx+bTx,\mathop{\min}\limits_{x\in\left[{sl,su}\right]}\frac{1}{2}{x^{T}}Ax+{b^{T}}x,

where ARm×mA\in{R^{m\times m}} is a symmetric positive definite matrix generated by A=BTB+sIA={B^{T}}B+sI where BRm2×mB\in{R^{\frac{m}{2}\times m}} with i.i.d. standard Gaussian entries and ss chosen uniformly at random from [0,1]\left[{0,1}\right]. The vector bRmb\in{R^{m}} is generated with i.i.d. standard Gaussian entries. Set susu = ones(m,1) and slsl = -ones(m,1). Notice that f(x)=12xTAx+bTxf\left(x\right)=\frac{1}{2}{x^{T}}Ax+{b^{T}}x with μf=λmin(A){\mu_{f}}={\lambda_{\min}}\left(A\right) and Lf=λmax(A),{L_{f}}={\lambda_{\max}}\left(A\right), and g(x)=δ[sl,su](x).g\left(x\right)={\delta_{\left[{sl,su}\right]}}\left(x\right). Here, we terminate the algorithms once F(xk)<106.\left\|{\partial F\left({{x_{k}}}\right)}\right\|<{10^{-6}}.
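Since gg is the indicator of the box [sl,su],\left[{sl,su}\right], its proximal operator reduces to the projection onto the box. A minimal sketch of the instance generation follows (illustrative code with a hypothetical seed argument, not the released scripts).

import numpy as np

def make_box_qp_instance(m, seed=0):
    """Strongly convex quadratic over the box [-1, 1]^m, as described above.
    Illustrative sketch; prox of the indicator of [sl, su] is the projection."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((m // 2, m))
    s = rng.uniform(0.0, 1.0)
    A = B.T @ B + s * np.eye(m)                     # symmetric positive definite
    b = rng.standard_normal(m)
    sl, su = -np.ones(m), np.ones(m)
    grad_f = lambda x: A @ x + b                    # gradient of the quadratic
    prox_g = lambda v, lam: np.clip(v, sl, su)      # projection onto [sl, su]
    eigs = np.linalg.eigvalsh(A)
    mu_f, Lf = eigs[0], eigs[-1]                    # strong convexity / Lipschitz constants
    return grad_f, prox_g, mu_f, Lf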

Now we perform numerical experiments with (1) the Forward-Backward method without inertia (FB); (2) FISTA with fixed and adaptive restart schemes (FISTA_R); (3) IFB with β:=LfμfLf+μf{\beta^{*}}:=\frac{{\sqrt{{L_{f}}}-\sqrt{{\mu_{f}}}}}{{\sqrt{{L_{f}}}+\sqrt{{\mu_{f}}}}} (IFB_β{\beta^{*}}); (4) FISTA_exp; (5) Algorithm 2 (gradient scheme) with tk=ek1{t_{k}}={e^{\sqrt{k-1}}} (IFB_AdapM_exp).

According to Corollary 3.1, the sequences {xk}\left\{{{x_{k}}}\right\} and {F(xk)}\left\{{F\left({{x_{k}}}\right)}\right\} generated by FISTA_exp and IFB_AdapM_exp converge with the sublinear rate o(1kp),o\left({\frac{1}{{{k^{p}}}}}\right), which is slower, from a theoretical point of view, than the RR-linear convergence of FB, FISTA_R and IFB_β{\beta^{*}}. However, we can see from Fig. 4 and Fig. 5 that IFB_AdapM_exp always performs better than FISTA_exp and sometimes better than the other four algorithms, while FISTA_exp performs similarly to IFB_β{\beta^{*}} without requiring the strong convexity parameter. Consequently, although the linear convergence rate is not reached, FISTA_exp and IFB_AdapM_exp still show good numerical performance, and the adaptive modification scheme can significantly improve the convergence speed of IFB.

Figure 4: Computational results for the convergence of ψk\left\|{{\psi_{k}}}\right\| and (F(xk)F).\left({F\left({{x_{k}}}\right)-F^{*}}\right).

Figure 5: Computational results for the convergence of ψk\left\|{{\psi_{k}}}\right\| and (F(xk)F).\left({F\left({{x_{k}}}\right)-F^{*}}\right).

6 Conclusion

In this paper, under the local error bound condition, we study the convergence results of IFBs with a class of abstract tkt_{k} satisfying the assumption A2A_{2} for solving the problem (PP). We use a new technique called the “comparison method” to establish improved convergence rates of the function values and sublinear convergence rates of the iterates generated by the IFBs with six choices of tk.t_{k}. In particular, we show that, under the local error bound condition, the strong convergence of the iterates generated by the original FISTA can be established, the convergence rate of FISTA_CD is actually related to the value of a,a, and the sublinear convergence rates of both the function values and the iterates generated by IFBs with tkt_{k} in Case 1 and Case 3 can achieve o(1kp)o\left({\frac{1}{{{k^{p}}}}}\right) for any positive integer p>1.p>1. Moreover, our results still hold for IFBs with an adaptive modification scheme.

Acknowledgements

The work was supported by the National Natural Science Foundation of China (No.11901561), the Natural Science Foundation of Guangxi (No.2018GXNSFBA281180) and the Postdoctoral Fund Project of China (Grant No.2019M660833).

References

  • Attouch & Peypouquet (2016) Attouch, H. & Peypouquet, J. (2016) The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1k2{\frac{1}{{{k^{2}}}}}. SIAM J. Optim., 26, 1824–1834.
  • Attouch & Cabot (2018) Attouch, H. & Cabot, A. (2018) Convergence rates of inertial forward-backward algorithms. SIAM J. Optim., 28, 849–874.
  • Apidopoulos & Aujol (2020) Apidopoulos, V., Aujol, J. & Dossal, C. (2020) Convergence rate of inertial Forward-Backward algorithm beyond Nesterov’s rule. Math. Program., 180, 137–156.
  • Beck & Teboulle (2009) Beck, A. & Teboulle, M. (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2, 183–202.
  • Chambolle & Dossal (2015) Chambolle, A. & Dossal, C. (2015) On the convergence of the iterates of the “fast iterative shrinkage-thresholding algorithm”. J. Optim. Theory Appl., 166, 968–982.
  • Chambolle & Pock (2016) Chambolle, A. & Pock, T. (2016) An introduction to continuous optimization for imaging. Acta Numerica, 25, 161–319.
  • Chang & Lin (2011) Chang, C. C. & Lin, C. J. (2011) LIBSVM: a library for support vector machines. ACM. Trans. Intell. Syst. Technol., 2, 1–27.
  • Calatroni & Chambolle (2019) Calatroni, L. & Chambolle, A. (2019) Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM J. Optim., 29, 1772–1798.
  • Donghwan & Jeffrey (2018) Donghwan, K. & Jeffrey, A. F. (2018) Another look at the fast iterative shrinkage/thresholding algorithm (FISTA). SIAM J. Optim., 28, 223–250.
  • Hai (2020) Hai, T. N. (2020) Error bounds and stability of the projection method for strongly pseudomonotone equilibrium problems. Int. J. Comput. Math., also available online from https://doi.org/10.1080/00207160.2019.1711374.html.
  • Johnstone & Moulin (2017) Johnstone, P. R. & Moulin, P. (2017) Local and global convergence of a general inertial proximal splitting scheme for minimizing composite functions. Comput. Optim. Appl., 67, 259–292.
  • Luo & Tseng (1992) Luo, Z. & Tseng, P. (1992) On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim., 30, 408–425.
  • Luo & Tseng (1992a) Luo, Z. & Tseng, P. (1992a) Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim., 2, 43–54.
  • Luo & Tseng (1993) Luo, Z. & Tseng, P. (1993) On the convergence rate of dual ascent methods for linearly constrained convex minimization. Math. Oper. Res., 18, 846–867.
  • Moudafi & Oliny (2003) Moudafi, A. & Oliny, M. (2003) Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math., 155, 447–454.
  • Mridula & Shukla (2020) Mridula, V. & Shukla, K. K. (2020) Convergence analysis of accelerated proximal extra-gradient method with applications. Neurocomputing, 388, 288–300.
  • Necoara & Nesterov (2019) Necoara, I., Nesterov, Y. & Glineur, F. (2019) Linear convergence of first order methods for non-strongly convex optimization. Math. Program., 175, 69–107.
  • Nesterov (2019) Nesterov, Y. (2019) A method for solving the convex programming problem with convergence rate O(1k2)O\left({\frac{1}{{{k^{2}}}}}\right). Dokl. Akad. Nauk SSSR., 269, 543–547.
  • Nesterov (2013) Nesterov, Y. (2013) Gradient methods for minimizing composite functions. Math. Program., 140, 125–161.
  • O’Donoghue & Candès (2015) O’Donoghue, B. & Candès, E. (2015) Adaptive restart for accelerated gradient schemes. Found. Comput. Math., 15, 715–732.
  • Pang (1987) Pang, J. S. (1987) A posteriori error bounds for the linearly-constrained variational inequality problem. Math. Oper. Res., 12, 474–484.
  • Su & Boyd (2016) Su, W., Boyd, S. & Candès, E. J. (2016) A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights. J. Mach. Learn. Res., 17, 1–43.
  • Tseng & Yun (2009) Tseng, P. & Yun, S. (2009) A coordinate gradient descent method for nonsmooth separable minimization. Math. Program., 117, 387–423.
  • Tseng & Yun (2010) Tseng, P. & Yun, S. (2010) A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training. Comput. Optim. Appl., 47, 179–206.
  • Tseng (2010) Tseng, P. (2010) Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program., 125, 263–295.
  • Tao & Boley (2016) Tao, S. Z., Boley, D. & Zhang, S. Z. (2016) Local linear convergence of ISTA and FISTA on the LASSO problem. SIAM J. Optim., 26, 313–336.
  • Villa & Salzo (2013) Villa, S., Salzo, S., Baldassarre, L. & Verri, A. (2013) Accelerated and Inexact Forward-Backward Algorithms. SIAM J. Optim., 23, 1607–1633.
  • Wen & Chen (2017) Wen, B., Chen, X. J., & Pong, T. K. (2017) Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim., 27, 124–145.
  • Xiao & Zhang (2013) Xiao, L. & Zhang, T. (2013) A Proximal-gradient homotopy method for the sparse least-squares problem. SIAM J. Optim., 23, 1062–1091.
  • Zhou & So (2017) Zhou, Z. & So, A. M. (2017) A unified approach to error bounds for structured convex optimization problems. Math. Program., 165, 689–728.
  • Nesterov (2003) Nesterov, Y. (2003) Introductory lectures on convex optimization: A basic course. Springer Science & Business Media.

Appendix A Proof of Lemma 2.4

Proof A.1.

Assume by contradiction that liminfksk=l<+.\mathop{\lim\inf}\limits_{k\to\infty}{s_{k}}=l<+\infty. Notice that l0l\geq 0 since {sk}\left\{{{s_{k}}}\right\} is a nonnegative sequence. Then, there exists a subsequence {skj}\left\{{{s_{{k_{j}}}}}\right\} such that limjskj=l.\mathop{\lim}\limits_{j\to\infty}{s_{{k_{j}}}}=l. By the condition αk=sk1sk+1γk,{\alpha_{k}}=\frac{{{s_{k}}-1}}{{{s_{k+1}}}}\geq{\gamma_{k}}, we have skj1skj+1γkj.\frac{{{s_{{k_{j}}}}-1}}{{{s_{{k_{j}}+1}}}}\geq{\gamma_{{k_{j}}}}. Then, combining this with the fact that limkγk=1\mathop{\lim}\limits_{k\to\infty}{\gamma_{k}}=1 from Remark 3, we deduce that

limsupjskj+1limjskj1γkj=l1,\mathop{\lim\sup}\limits_{j\to\infty}{s_{{k_{j}}+1}}\leq\mathop{\lim}\limits_{j\to\infty}\frac{{{s_{{k_{j}}}}-1}}{{{\gamma_{{k_{j}}}}}}=l-1,

which leads to a contradiction that l=liminfkskliminfjskj+1limsupjskj+1l1.l=\mathop{\lim\inf}\limits_{k\to\infty}{s_{k}}\leq\mathop{\lim\inf}\limits_{j\to\infty}{s_{{k_{j}}+1}}\leq\mathop{\lim\sup}\limits_{j\to\infty}{s_{{k_{j}}+1}}\leq l-1. Hence, limksk=+.\mathop{\lim}\limits_{k\to\infty}{s_{k}}=+\infty.

Further, by the condition αk=sk1sk+1γk,{\alpha_{k}}=\frac{{{s_{k}}-1}}{{{s_{k+1}}}}\geq{\gamma_{k}}, we get sk+1sk1γk+1sk+1.\frac{{{s_{k+1}}}}{{{s_{k}}}}\leq\frac{1}{{{\gamma_{k}}+\frac{1}{{{s_{k+1}}}}}}. Combining this with limkγk=1\mathop{\lim}\limits_{k\to\infty}{\gamma_{k}}=1 and limksk=+,\mathop{\lim}\limits_{k\to\infty}{s_{k}}=+\infty, we obtain that limsupksk+1sk1.\mathop{\lim\sup}\limits_{k\to\infty}\frac{{{s_{k+1}}}}{{{s_{k}}}}\leq 1. Since {sk}\left\{{{s_{k}}}\right\} is a nonnegative sequence, we have limsupk(sk+1sk)21,\mathop{\lim\sup}\limits_{k\to\infty}{\left({\frac{{{s_{k+1}}}}{{{s_{k}}}}}\right)^{2}}\leq 1, which leads to the result that

limsupksk+12sk2sk2=limsupk(sk+1sk)210.\mathop{\lim\sup}\limits_{k\to\infty}\frac{{s_{k+1}^{2}-s_{k}^{2}}}{{s_{k}^{2}}}=\mathop{\lim\sup}\limits_{k\to\infty}{\left({\frac{{{s_{k+1}}}}{{{s_{k}}}}}\right)^{2}}-1\leq 0.