
Smoothing Accelerated Proximal Gradient Method with Fast Convergence Rate for Nonsmooth Multi-objective Optimization

Huang Chengzhi
Abstract.

This paper proposes a smoothing accelerated proximal gradient method with extrapolation term (SAPGM) for nonsmooth multiobjective optimization. By combining smoothing methods with the accelerated algorithm for multiobjective optimization of Tanabe et al., our method achieves a fast convergence rate. Specifically, we establish that the convergence rate of the proposed method can be enhanced to o(\ln^{\sigma}k/k) by incorporating the extrapolation term \frac{k-1}{k+\alpha-1} with \alpha>3. Moreover, we prove that the sequence of iterates converges to a Pareto optimal solution of the primal problem. Furthermore, we present an effective strategy for solving the subproblem through its dual representation and validate the efficacy of the proposed method through a series of numerical experiments.

Key words and phrases:
Nonsmooth multiobjective optimization, Smoothing method, Accelerated algorithm with extrapolation, Convergence rate, Sequential convergence.
1991 Mathematics Subject Classification:
Primary: 49J52, 65K05; Secondary: 90C25, 90C29.
The first author is supported by [insert grant information here]
Corresponding author: First-name1 Last-name1

Huang Chengzhi^{*1} (2022110518015@cqnu.com) and First-name2 Last-name2^{1,2} (author2@domain.com)

1Chongqing Normal University, China

2Affiliation, Country


(Communicated by Handling Editor)

1. Introduction

Multiobjective optimization involves the simultaneous minimization (or maximization) of multiple objective functions while considering relevant constraints. The concept of Pareto optimality becomes crucial, as finding a single point that minimizes all objective functions concurrently is challenging. A point is deemed Pareto optimal if there exists no other point with the same or smaller objective function values and at least one strictly smaller objective function value. Applications of multiobjective optimization are pervasive, spanning economics[9], engineering[21], mechanics[37], statistics[41], internet routing[12], and location problems[3].

This paper focuses predominantly on composite nonsmooth multiobjective optimization, expressed as:

minxnF(x)\min_{x\in\mathbb{R}^{n}}F(x) (1)

with F:n({})mF:\mathbb{R}^{n}\to(\mathbb{R}\cup\{\infty\})^{m} and F:=(F1,,Fm)TF:=(F_{1},\dotsb,F_{m})^{T} taking the form

Fi(x):=fi(x)+gi(x),i=1,2,,m,F_{i}(x):=f_{i}(x)+g_{i}(x),i=1,2,\dotsb,m, (2)

where f_{i}:\mathbb{R}^{n}\to\mathbb{R} is a convex but possibly nonsmooth function, and g_{i}:\mathbb{R}^{n}\to\mathbb{R} is a closed, proper, and convex function, which is not necessarily smooth.

The composite optimization problem is a significant class of optimization problems, not only because it encompasses various practical challenges—such as minimax problems [49] and penalty methods for constrained optimization [48]—but also due to its wide range of applications. For instance, as discussed in [44], the separable structure in (1) can be used to model robust multi-objective optimization problems, which involve uncertain parameters and optimize for the worst-case scenario. Additionally, this structure is applicable in machine learning[25], particularly for solving multi-objective clustering problems.

Naturally, we are interested in methods for solving multi-objective optimization problems. Common approaches include scalarization methods, evolutionary methods, and gradient-based methods.

Scalarization is a fundamental approach to solve multiobjective optimization problems, transforming them into single-objective ones. Various procedures, such as optimizing one objective while treating others as constraints[36], or aggregating all objectives[39], are commonly applied. Evolution algorithms[46] provide another avenue, but proving their convergence rate poses challenges. Consequently, traditional methods for solving the problem directly are also employed.

In response to these limitations, descent methods for multiobjective optimization problems have gained significant attention. These algorithms, which reduce all objective functions at each iteration, offer advantages such as not requiring prior parameter selection and providing convergence guarantees under reasonable assumptions. Noteworthy methods include the steepest descent [15], projected gradient [17], proximal point [4], Newton [16], trust region [5], and conjugate gradient [27] methods. Among these, first-order methods, which use only first-order derivatives of the objective functions, are of particular interest, such as the steepest descent, projected gradient, and proximal gradient methods. The proximal gradient method converges to Pareto solutions with a rate of O(1/k).

To enhance the convergence efficiency of the proximal gradient method, numerous scholars have endeavored to introduce acceleration techniques from single-objective first-order methods. Detailed works can be found in [33, 6, 7, 1].

The success of acceleration algorithms in the single-objective setting prompted a significant surge of interest in exploring their efficacy for multi-objective optimization problems. A recent noteworthy development by Tanabe et al. [42] extends the highly regarded Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) to the multi-objective context. The ensuing convergence rate, O(1/k^{2}) measured by a merit function [43], represents a substantial improvement over the proximal gradient method for multi-objective problems (MOPs) [44]. Moreover, Nishimura et al. [35] established a monotone version of the multiobjective FISTA, adding to the methodological advancements in this domain. Furthermore, Tanabe et al. [45] expanded the applicability of the multiobjective FISTA by introducing hyperparameters, offering a generalization applicable even in single-objective scenarios. Importantly, this extended framework preserves the O(1/k^{2}) convergence rate of the multiobjective FISTA, and the convergence of the iterate sequence is also proved. Inspired by the impact of the extrapolation parameters in the single-objective case [1], we introduce the extrapolation parameter \frac{k-1}{k+\alpha-1} with \alpha>3 into the multiobjective proximal gradient algorithm.

After solving the problem of algorithm acceleration, another problem follows: how to deal with non-smooth multi-objective optimization efficiently?

For non-smooth multi-objective optimization problems, current research mainly includes the proximal bundle method of Mäkelä et al. [31, 32, 20, 30] and subgradient methods. Gebken et al. [18] proposed a subgradient descent algorithm for non-smooth multi-objective optimization by combining the descent direction from [29] with the approximation based on Goldstein's \epsilon-subdifferential from [19]. Besides, Sonntag et al. [26] proposed a new subgradient method for non-smooth vector optimization problems, which includes regularization and interior point variants of Newton's method. However, all of these methods require complex calculations and numerous subgradient evaluations, resulting in a significant increase in computation time. Fortunately, Chen [8] proposed a smoothing construction that uses a sequence of smooth functions to approximate the objective functions of the primal problem. This construction avoids computing subgradients and directly uses the gradient of the smoothing function. Inspired by this idea, we construct a fast algorithm for non-smooth multi-objective optimization within the smoothing framework, combined with the previously mentioned accelerated proximal gradient method with extrapolation term.

Moreover, with practical computational efficiency in mind, we derive a convex and differentiable dual of the subproblem, simplifying its solution, particularly when the number of objective functions is fewer than the decision variable dimension. The entire algorithm is implemented using this dual problem, and its effectiveness is confirmed through numerical experiments.

The structure of this paper unfolds as follows: Section 2 introduces notations and concepts, Section 3 presents the smoothing accelerated proximal gradient method with extrapolation for nonsmooth multiobjective optimization, and Section 4 analyzes its o(lnσk/k)o(\ln^{\sigma}k/k) convergence rate. Section 5 outlines an efficient method to solve the subproblem through its dual form, and Section 6 reports numerical results for test problems.

2. Preliminary Results

In this paper, for any natural number n, the symbol \mathbb{R}^{n} denotes the n-dimensional Euclidean space. The notation \mathbb{R}^{n}_{+}\subseteq\mathbb{R}^{n} signifies the non-negative orthant of \mathbb{R}^{n}, that is, \mathbb{R}^{n}_{+}:=\{v\in\mathbb{R}^{n}\,|\,v_{i}\geq 0,\ i=1,2,\dotsb,n\}. Additionally, \Delta^{n} represents the standard simplex in \mathbb{R}^{n}, defined as

\Delta^{n}:=\{\lambda\in\mathbb{R}^{n}_{+}\,|\,\lambda_{i}\geq 0,\ \sum_{i=1}^{n}\lambda_{i}=1\}.

Subsequently, the partial orders induced by +n\mathbb{R}^{n}_{+} are considered, where for any v1,v2nv^{1},v^{2}\in\mathbb{R}^{n}, v1v2v^{1}\leq v^{2} (alternatively, v1v2v^{1}\geq v^{2}) holds if v2v1+nv^{2}-v^{1}\in\mathbb{R}^{n}_{+}, and v1<v2v^{1}<v^{2} (alternatively, v1>v2v^{1}>v^{2}) if v2v1int+nv^{2}-v^{1}\in\text{int}\,\mathbb{R}^{n}_{+}. Moreover, let ,\left\langle\cdot,\cdot\right\rangle denote the Euclidean inner product in n\mathbb{R}^{n}, specifically defined as u,v:=i=1nuivi\left\langle u,v\right\rangle:=\sum_{i=1}^{n}u_{i}v_{i}. The Euclidean norm \left\|\cdot\right\| is introduced as u:=u,u\left\|u\right\|:=\sqrt{\left\langle u,u\right\rangle}. Furthermore, the 1\ell_{1}-norm and the \ell_{\infty}-norm are defined by u1:=i=1n|ui|\left\|u\right\|_{1}:=\sum_{i=1}^{n}|u_{i}| and u:=maxi=1,,n|ui|\left\|u\right\|_{\infty}:=\max_{i=1,\dotsb,n}|u_{i}|, respectively.

Because of the construction of the proximal gradient algorithm, we introduce some basic definitions for the following discussion. For a closed, proper and convex function h:\mathbb{R}^{n}\to\mathbb{R}\cup\{\infty\}, the Moreau envelope of h is defined by

\mathcal{M}_{h}(x):=\min_{y\in\mathbb{R}^{n}}\{h(y)+\frac{1}{2}\left\|x-y\right\|^{2}\}.

The unique solution of the above problem is called the proximal operator of h, written as

prox_{h}(x):=\arg\min_{y\in\mathbb{R}^{n}}\{h(y)+\frac{1}{2}\left\|x-y\right\|^{2}\}.

Next, we state a relation between the Moreau envelope and the proximal operator in the following lemma.

Lemma 2.1 ([38]).

If h is a proper, closed and convex function, then the Moreau envelope \mathcal{M}_{h} is continuously differentiable with Lipschitz continuous gradient, given by

\nabla\mathcal{M}_{h}(x)=x-prox_{h}(x).
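As a small, self-contained illustration of Lemma 2.1 (our example; the choice h=\left\|\cdot\right\|_{1} and all numerical values are ours, not from the paper), the following Python sketch evaluates prox_h by soft-thresholding and compares x-prox_h(x) with a finite-difference gradient of the Moreau envelope.

import numpy as np

def prox_l1(x):
    # proximal operator of h(y) = ||y||_1, i.e. soft-thresholding with threshold 1
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def moreau_l1(x):
    # Moreau envelope M_h(x) = h(prox_h(x)) + 0.5*||x - prox_h(x)||^2
    p = prox_l1(x)
    return np.sum(np.abs(p)) + 0.5 * np.sum((x - p) ** 2)

x = np.array([1.7, -0.3, 0.0, 2.4])
grad = x - prox_l1(x)                         # Lemma 2.1: grad M_h(x) = x - prox_h(x)
eps = 1e-6
fd = np.array([(moreau_l1(x + eps * e) - moreau_l1(x - eps * e)) / (2 * eps)
               for e in np.eye(len(x))])      # central finite differences
print(np.max(np.abs(grad - fd)))              # close to machine precision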

As explained in the Introduction, the principal challenge in addressing problem (1) through the proximal gradient (PG) and accelerated proximal gradient (APG) methods comes from the nonsmooth nature of the objective function f. Specifically, when f is nonsmooth or its gradient \nabla f lacks global Lipschitz continuity, a natural remedy is the smoothing method, a pivotal aspect of our analytical framework. In this study, we introduce an algorithm utilizing the smoothing function delineated in [8]. This smoothing function approximates the nonsmooth convex function f by a family of smooth convex functions, thereby facilitating the application of gradient-based optimization techniques.

Definition 2.2 ([8]).

For convex function ff in (2), we call f~:n×+\tilde{f}:\mathbb{R}^{n}\times\mathbb{R}_{+}\to\mathbb{R} a smoothing function of ff, if f~\tilde{f} satisfies the following conditions:

(i) for any fixed μ>0\mu>0,f~(,μ)\tilde{f}(\cdot,\mu) is continuously differentiable on n\mathbb{R}^{n};

(ii) limzx,μ0f~(z,μ)=f(x),xn\lim_{z\to x,\mu\downarrow 0}\tilde{f}(z,\mu)=f(x),\forall x\in\mathbb{R}^{n};

(iii) (gradient consistency) \{\lim_{z\to x,\mu\downarrow 0}\nabla_{z}\tilde{f}(z,\mu)\}\subseteq\partial f(x),\ \forall x\in\mathbb{R}^{n};

(iv) for any fixed μ>0\mu>0, f~(z,μ)\tilde{f}(z,\mu) is convex on n\mathbb{R}^{n};

(v) there exists a \kappa>0 such that

|\tilde{f}(x,\mu_{2})-\tilde{f}(x,\mu_{1})|\leq\kappa|\mu_{1}-\mu_{2}|,\ \forall x\in\mathbb{R}^{n},\ \mu_{1},\mu_{2}\in\mathbb{R}_{++};

(vi) there exists an L>0L>0 such that xf~(,μ)\nabla_{x}\tilde{f}(\cdot,\mu) is Lipschitz continuous on n\mathbb{R}^{n} with factor Lμ1L\mu^{-1} for any fixed μ++\mu\in\mathbb{R}_{++}.

Combining properties (ii) and (v) in Definition 2.2, we have

|\tilde{f}(x,\mu)-f(x)|\leq\kappa\mu,\ \forall x\in\mathbb{R}^{n},\ \mu\in\mathbb{R}_{++}.

The exploration of smooth approximations for diverse specialized nonsmooth functions has a venerable lineage, yielding a wealth of theoretical insights [8], [13], [34], [40], [22]. The foundational conditions (i)–(iii) articulated herein are integral elements in the characterization of a smoothing function, as delineated in [8]. These conditions are imperative for ensuring the efficacy of smoothing methods when applied to the resolution of corresponding nonsmooth problems. Condition (iv) stipulates that the smoothing function f~(,μ)\tilde{f}(\cdot,\mu) preserves the convexity of ff for any fixed μ++\mu\in\mathbb{R}_{++}. Conditions (v) and (vi) serve to guarantee the global Lipschitz continuity of f~(x,)\tilde{f}(x,\cdot) for any fixed xnx\in\mathbb{R}^{n} and the global Lipschitz continuity of xf~(,μ)\nabla_{x}\tilde{f}(\cdot,\mu) for any fixed μ++\mu\in\mathbb{R}_{++}, respectively. These conditions collectively establish a foundation for the utility and effectiveness of the smoothing function in the context of nonsmooth optimization problems.

We now revisit the optimality criteria for the multiobjective optimization problem denoted as (1). An element xnx^{*}\in\mathbb{R}^{n} is deemed weakly Pareto optimal if there does not exist xnx\in\mathbb{R}^{n} such that F(x)<F(x)F(x)<F(x^{*}), where F:nmF:\mathbb{R}^{n}\to\mathbb{R}^{m} represents the vector-valued objective function. The ensemble of weakly Pareto optimal solutions is denoted as XX^{*}. The merit function u0:n{}u_{0}:\mathbb{R}^{n}\to\mathbb{R}\cup\{\infty\}, as introduced in [43], is expressed in the following manner:

u0(x):=supznmini=1,,m[Fi(x)Fi(z)].u_{0}(x):=\sup_{z\in\mathbb{R}^{n}}\min_{i=1,\dotsb,m}[F_{i}(x)-F_{i}(z)]. (3)

The following lemma proves that u0u_{0} is a merit function in the Pareto sense.

Lemma 2.3 ([42]).

Let u_{0} be given as (3). Then u_{0}(x)\geq 0 for all x\in\mathbb{R}^{n}, and x is weakly Pareto optimal for (1) if and only if u_{0}(x)=0.
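To make the merit function (3) concrete, the following grid-based Python sketch (a toy example of ours, not from the paper) approximates u_0 for the one-dimensional bi-objective problem F_1(x)=x^{2}, F_2(x)=(x-1)^{2}; in line with Lemma 2.3, the approximation is numerically zero on the Pareto set [0,1] and positive outside it.

import numpy as np

def u0_approx(x, zs):
    # grid approximation of u_0(x) = sup_z min_i [F_i(x) - F_i(z)]
    F = lambda t: np.array([t ** 2, (t - 1.0) ** 2])
    return max(np.min(F(x) - F(z)) for z in zs)

zs = np.linspace(-2.0, 3.0, 2001)             # candidate points z
for x in [0.0, 0.5, 1.0, 2.0, -1.0]:
    print(x, u0_approx(x, zs))
# roughly 0 for x in {0, 0.5, 1} (weakly Pareto optimal), strictly positive for x in {2, -1}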

3. The Smoothing Accelerated Proximal Gradient Method with Extrapolation term for Non-smooth Multi-objective Optimization

This section introduces an accelerated variant of the proximal gradient method tailored for multiobjective optimization. Drawing inspiration from the results reported in [1], we incorporate extrapolation with parameters \beta_{k}=\frac{k-1}{k+\alpha-1}, where \alpha>3. Choosing a smoothing function \tilde{f} as in Definition 2.2, we formulate an accelerated proximal gradient algorithm to solve the multiobjective optimization problem (1). The algorithm achieves a faster convergence rate while also guaranteeing sequential convergence.

Subsequently, we present the methodology employed to address the optimization problem denoted as (1). Similar to the exposition in [42], a subproblem is delineated and resolved in each iteration. Using the descent lemma, the proposed approach tackles the ensuing subproblem for prescribed values of xdom(F)x\in\text{dom}(F), yny\in\mathbb{R}^{n}, and L\ell\geq L:

minznφ(z;x,y,μ),\min_{z\in\mathbb{R}^{n}}\varphi_{\ell}(z;x,y,\mu), (4)

where

\varphi_{\ell}(z;x,y,\mu):=\max_{i=1,\dotsb,m}\left[\left\langle\nabla\tilde{f}_{i}(y,\mu),z-y\right\rangle+g_{i}(z)+\tilde{f}_{i}(y,\mu)-\tilde{F}_{i}(x,\mu)\right]+\frac{\ell}{2}\left\|z-y\right\|^{2}. (5)

Since g_{i} is convex for all i=1,\dotsb,m, the map z\mapsto\varphi_{\ell}(z;x,y,\mu) is strongly convex. Thus, the subproblem (4) has a unique optimal solution p_{\ell}(x,y,\mu) and attains the optimal function value \theta_{\ell}(x,y,\mu), i.e.,

p(x,y,μ):=argminznφ(z,x,y,μ)andθ(x,y,μ):=minznφ(z,x,y,μ).p_{\ell}(x,y,\mu):=\arg\min_{z\in\mathbb{R}^{n}}\varphi_{\ell}(z,x,y,\mu)\ \text{and}\ \theta_{\ell}(x,y,\mu):=\min_{z\in\mathbb{R}^{n}}\varphi_{\ell}(z,x,y,\mu). (6)

Furthermore, the optimality condition associated with problem (4) implies that, for all x\in\text{dom}\,F and y\in\mathbb{R}^{n}, there exist \eta_{i}(x,y,\mu)\in\partial g_{i}(p_{\ell}(x,y,\mu)), i=1,\dotsb,m, and a Lagrange multiplier \lambda(x,y)\in\mathbb{R}^{m} such that

i=1mλi(x,y)[f~i(y,μ)+ηi(x,y,μ)]=[p(x,y)y]\sum_{i=1}^{m}\lambda_{i}(x,y)[\nabla\tilde{f}_{i}(y,\mu)+\eta_{i}(x,y,\mu)]=-\ell[p_{\ell}(x,y)-y] (7)
λ(x,y)Δm,λj(x,y)=0j(x,y),\lambda(x,y)\in\Delta^{m},\quad\lambda_{j}(x,y)=0\quad\forall j\notin\mathcal{I}(x,y), (8)

where Δm\Delta^{m} denotes the standard simplex and

(x,y):=argmaxi=1,,m[f~i(y,μ),p(x,y,μ)y+gi(p(x,y,μ))+f~i(y,μ)F~i(x,μ)].\mathcal{I}(x,y):=\arg\max_{i=1,\dotsb,m}[\left\langle\nabla\tilde{f}_{i}(y,\mu),p_{\ell}(x,y,\mu)-y\right\rangle+g_{i}(p_{\ell}(x,y,\mu))+\tilde{f}_{i}(y,\mu)-\tilde{F}_{i}(x,\mu)]. (9)
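For small instances, subproblem (4) can be solved through the standard epigraph reformulation \min_{z,t}\ t+\frac{\ell}{2}\left\|z-y\right\|^{2} subject to \left\langle\nabla\tilde{f}_{i}(y,\mu),z-y\right\rangle+g_{i}(z)+\tilde{f}_{i}(y,\mu)-\tilde{F}_{i}(x,\mu)\leq t. The Python sketch below is only an illustration with placeholder data and smooth g_{i} (nonsmooth g_{i} are better handled through the dual of Section 5); it uses scipy.optimize.minimize with SLSQP.

import numpy as np
from scipy.optimize import minimize

y = np.array([0.5, -0.2]); ell = 2.0                      # placeholder y and ell >= L
grads = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]      # stand-ins for grad f_i~(y, mu)
consts = [0.1, -0.3]                                      # stand-ins for f_i~(y, mu) - F_i~(x, mu)
g = [lambda z: 0.5 * z @ z, lambda z: np.sum((z - 1.0) ** 2)]  # smooth convex g_i for this check

def solve_subproblem(y, ell, grads, consts, g):
    # epigraph form of (4): minimize t + (ell/2)||z - y||^2  s.t.  psi_i(z) <= t for all i
    obj = lambda w: w[-1] + 0.5 * ell * np.sum((w[:-1] - y) ** 2)
    cons = [{'type': 'ineq',
             'fun': (lambda w, i=i: w[-1] - (grads[i] @ (w[:-1] - y) + g[i](w[:-1]) + consts[i]))}
            for i in range(len(grads))]
    res = minimize(obj, np.append(y, 0.0), constraints=cons, method='SLSQP')
    return res.x[:-1]                                     # p_ell(x, y, mu)

print(solve_subproblem(y, ell, grads, consts, g))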

Before we present the algorithm framework, we first give the following assumption.

Assumption 3.1.

Suppose X^{*} is the set of weakly Pareto optimal points and \mathcal{L}_{\tilde{F}}(c):=\{x\in\mathbb{R}^{n}\,|\,\tilde{F}(x)\leq c\}. Then for any x\in\mathcal{L}_{\tilde{F}}(\tilde{F}(x^{0})) there exists x^{*}\in X^{*} such that \tilde{F}(x^{*})\leq\tilde{F}(x) and

R:=\sup_{\tilde{F}^{*}\in\tilde{F}(X^{*}\cap\mathcal{L}_{\tilde{F}}(\tilde{F}(x^{0})))}\inf_{z\in\tilde{F}^{-1}(\{\tilde{F}^{*}\})}\left\|z-x^{0}\right\|^{2}<+\infty.

For ease of reference and in accordance with its structure, we call the proposed algorithm the smoothing accelerated proximal gradient method with extrapolation term for nonsmooth multiobjective optimization (SAPGM). The algorithm takes the following form.

Algorithm 1 The Smoothing Accelerated Proximal Gradient Method with Extrapolation term for Non-smooth Multi-objective Optimization
0:  Take initial points x^{-1}=x^{0}\in\text{dom}\,F, y^{0}=x^{0}, \ell\geq\tilde{L}, \varepsilon>0, \mu_{0}\in\mathbb{R}_{++}, \gamma_{0}\in\mathbb{R}_{++}. Choose parameters \eta\in(0,1), \alpha>3, \sigma\in(\frac{1}{2},1]. Set k=0.
1:  loop
2:     Compute
yk=xk+k1k+α1(xkxk1)y^{k}=x^{k}+\frac{k-1}{k+\alpha-1}(x^{k}-x^{k-1})
μk+1=μ0(k+α1)lnσ(k+α1)\mu_{k+1}=\frac{\mu_{0}}{(k+\alpha-1)\ln^{\sigma}(k+\alpha-1)}
3:     Set \bar{\gamma}_{k+1}=\gamma_{k}, \ell=(\bar{\gamma}_{k+1}\mu_{k+1})^{-1}, and compute
\hat{x}^{k+1}=p_{\ell}(x^{k},y^{k},\mu_{k+1})
4:     if 2\min_{i}\big(\tilde{f}_{i}(\hat{x}^{k+1},\mu_{k+1})-\tilde{f}_{i}(y^{k},\mu_{k+1})-\langle\nabla\tilde{f}_{i}(y^{k},\mu_{k+1}),\hat{x}^{k+1}-y^{k}\rangle\big)>\frac{1}{\bar{\gamma}_{k+1}\mu_{k+1}}\|\hat{x}^{k+1}-y^{k}\|^{2} then
5:        
\bar{\gamma}_{k+1}=\eta\bar{\gamma}_{k+1} and go to Step 3
6:     else
7:        
\gamma_{k+1}=\bar{\gamma}_{k+1},\ x^{k+1}=\hat{x}^{k+1}, set k=k+1 and go to Step 2
8:     end if
9:     if \left\|x^{k}-x^{k+1}\right\|<\varepsilon and \mu_{k+1}<\varepsilon then
10:        return  xk+1x^{k+1}
11:     end if
12:  end loop
Output: x^{*}, a weakly Pareto optimal point
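To fix ideas, the following Python skeleton restates Algorithm 1 (a sketch under our assumptions: the oracles smooth_vals, smooth_grads and solve_subproblem are supplied by the user, e.g. built from the smoothing functions of Section 6 and the dual of Section 5, and \ell is taken as (\bar{\gamma}_{k+1}\mu_{k+1})^{-1} as in Step 3).

import numpy as np

def sapgm(x0, smooth_vals, smooth_grads, solve_subproblem,
          mu0=0.5, gamma0=1.0, alpha=4.0, sigma=0.75, eta=0.5, eps=1e-3, max_iter=10**5):
    # smooth_vals(x, mu)  -> array of f_i~(x, mu), shape (m,)
    # smooth_grads(x, mu) -> array of grad_x f_i~(x, mu), shape (m, n)
    # solve_subproblem(x, y, mu, step) -> p_ell(x, y, mu) with ell = 1/step
    x_prev, x, gamma = x0.copy(), x0.copy(), gamma0
    for k in range(max_iter):
        y = x + (k - 1.0) / (k + alpha - 1.0) * (x - x_prev)               # Step 2: extrapolation
        mu = mu0 / ((k + alpha - 1.0) * np.log(k + alpha - 1.0) ** sigma)  # Step 2: smoothing parameter
        gamma_bar = gamma
        while True:                                                        # Steps 3-5: backtracking
            x_hat = solve_subproblem(x, y, mu, gamma_bar * mu)
            lhs = 2.0 * np.min(smooth_vals(x_hat, mu) - smooth_vals(y, mu)
                               - smooth_grads(y, mu) @ (x_hat - y))
            if lhs > np.sum((x_hat - y) ** 2) / (gamma_bar * mu):          # Step 4: descent test
                gamma_bar *= eta
            else:
                break
        gamma, x_prev, x = gamma_bar, x, x_hat                             # Step 7: accept
        if np.linalg.norm(x - x_prev) < eps and mu < eps:                  # Step 9: stopping rule
            break
    return x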

4. The convergence rate analysis of SAPGM

4.1. Some Basic Estimation

This section establishes the o(\ln^{\sigma}k/k) convergence rate of SAPGM under Assumption 3.1. For the convenience of the complexity analysis, we use some functions defined in [42]. For k\geq 0, let W_{k}:\mathbb{R}^{n}\to\mathbb{R}\cup\{-\infty\} and the point u^{k}\in\mathbb{R}^{n} be defined by

Wk(z,μk)\displaystyle W_{k}(z,\mu_{k}) :=mini=1,,m[F~i(xk,μk)Fi(z)]+κμk,\displaystyle:=\min_{i=1,\dotsb,m}[\tilde{F}_{i}(x^{k},\mu_{k})-F_{i}(z)]+\kappa\mu_{k}, (10)
uk\displaystyle u_{k} :=k+α1α1xk+1kα1xk.\displaystyle:=\frac{k+\alpha-1}{\alpha-1}x^{k+1}-\frac{k}{\alpha-1}x^{k}.

Given a fixed weakly Pareto solution xnx^{*}\in\mathbb{R}^{n}, define the global energy function which serves for Lyapunov analysis:

k+1:=2γμkα1(k+α1)2Wk\displaystyle\mathscr{E}_{k+1}:=\frac{2\gamma\mu_{k}}{\alpha-1}(k+\alpha-1)^{2}W_{k} +(α1)ukx2\displaystyle+(\alpha-1)\left\|u^{k}-x^{*}\right\|^{2} (11)
+(4κγ0μ02σ1)μk(k+α2)ln1σ(k+α2),\displaystyle+(\frac{4\kappa\gamma_{0}\mu_{0}}{2\sigma-1})\mu_{k}(k+\alpha-2)\ln^{1-\sigma}(k+\alpha-2),

where Wk:=Wk(x,μk)W_{k}:=W_{k}(x^{*},\mu_{k}).

Following the properties outlined in [47], we present the following properties regarding the sequence {k}\{\mathscr{E}_{k}\}.

Proposition 4.1.

Let \mathscr{E}_{k} be the sequence defined in (11). Then, for any k\geq 1, we have

k+1+2(α3)γk+1μk+1α1(k+α1)Wkk.{\mathscr{E}}_{k+1}+\frac{2(\alpha-3)\gamma_{k+1}\mu_{k+1}}{\alpha-1}(k+\alpha-1)\>W_{k}\leq{\mathscr{E}}_{k}. (12)

Moreover,

(i) the sequence \mathscr{E}_{k} is non-increasing for all k\geq 1, and \lim_{k\to\infty}\mathscr{E}_{k} exists;

(ii) for every k1k\geq 1

k(α1)zx02+4(α1)κμ02+4κγ0μ022σ1(α1)ln1σ(α1).\mathscr{E}_{k}\leq(\alpha-1)\|z-x^{0}\|^{2}+4(\alpha-1)\kappa\mu_{0}^{2}+\frac{4\kappa\gamma_{0}\mu_{0}^{2}}{2\sigma-1}(\alpha-1)\ln^{1-\sigma}(\alpha-1).
Proof.

Before proving the proposition, we establish the following inequality, which is crucial for the proof: for any x\in\mathcal{X} and k\in\mathbb{N}, it holds for all i=1,\dotsb,m that

F~i(xk+1,μk+1)\displaystyle\tilde{F}_{i}(x^{k+1},\mu_{k+1}) F~i(x,μk+1)+(γk+1μk+1)1ykxk+1,ykx\displaystyle\leq\tilde{F}_{i}(x,\mu_{k+1})+(\gamma_{k+1}\mu_{k+1})^{-1}\left\langle y^{k}-x^{k+1},y^{k}-x\right\rangle (13)
12(γk+1μk+1)1xk+1yk2.\displaystyle\quad-\frac{1}{2}(\gamma_{k+1}\mu_{k+1})^{-1}\parallel x^{k+1}-y^{k}\parallel^{2}.

From step 4 and step 7 of the algorithm, we can see that for all i=1,,mi=1,\dotsb,m, the following inequality holds

f~i(xk+1,μk+1)\displaystyle\tilde{f}_{i}(x^{k+1},\mu_{k+1})\leq f~i(yk,μk+1)+f~i(yk,μk+1),xk+1yk\displaystyle\tilde{f}_{i}(y^{k},\mu_{k+1})+\left\langle\nabla\tilde{f}_{i}(y^{k},\mu_{k+1}),x^{k+1}-y^{k}\right\rangle (14)
+12γk+1μk+1xk+1yk2.\displaystyle+\frac{1}{2\gamma_{k+1}\mu_{k+1}}\parallel x^{k+1}-y^{k}\parallel^{2}.

Set

Q(x,y,μ,γ):=f~i(y,μ)+f~i(y,μ),xy+12γμxy2+gi(x).\displaystyle Q(x,y,\mu,\gamma):=\tilde{f}_{i}(y,\mu)+\left\langle\nabla\tilde{f}_{i}(y,\mu),x-y\right\rangle+\frac{1}{2\gamma\mu}\parallel x-y\parallel^{2}+g_{i}(x).

We note that, for fixed y, \mu and \gamma, the function x\mapsto Q(x,y,\mu,\gamma) is strongly convex with modulus (\gamma\mu)^{-1}. Therefore, Q(\cdot,y,\mu,\gamma) has a unique global minimizer on \mathcal{X}, which we denote by p(y,\mu,\gamma), namely:

p(y,μ,γ):=argminx𝒳Q(x,y,μ,γ).p(y,\mu,\gamma):=\arg\min_{x\in\mathcal{X}}Q(x,y,\mu,\gamma).

Since p(y,\mu,\gamma) is the minimizer of the (\gamma\mu)^{-1}-strongly convex function Q(\cdot,y,\mu,\gamma), we infer that

Q(x,y,μ,γ)Q(p(y,μ,γ),y,μ,γ)+12(γμ)1xp(y,μ,γ)2,x𝒳.Q(x,y,\mu,\gamma)\geq Q(p(y,\mu,\gamma),y,\mu,\gamma)+\frac{1}{2}(\gamma\mu)^{-1}\parallel x-p(y,\mu,\gamma)\parallel^{2},\forall x\in\mathcal{X}. (15)

Combining Step 3 and Step 7 with the definition of the proximal operator, we get

xk+1=p(yk,μk+1,γk+1).x^{k+1}=p(y^{k},\mu_{k+1},\gamma_{k+1}). (16)

Taking y=yk,μ=μk+1y=y^{k},\mu=\mu_{k+1} and γ=γk+1\gamma=\gamma_{k+1} in inequality (15), we have

Q(x,yk,μk+1,γk+1)Q(xk+1,yk,μk+1,γk+1)+12(γk+1μk+1)1xxk+12,x𝒳.Q(x,y^{k},\mu_{k+1},\gamma_{k+1})\geq Q(x^{k+1},y^{k},\mu_{k+1},\gamma_{k+1})+\frac{1}{2}(\gamma_{k+1}\mu_{k+1})^{-1}\parallel x-x^{k+1}\parallel^{2},\forall x\in\mathcal{X}.

After some rearrangement, we deduce that, for any x𝒳x\in\mathcal{X},

gi(xk+1)gi(x)+f~i(yk,μk+1),xxk+1+12(γk+1μk+1)1xyk2\displaystyle g_{i}(x^{k+1})\leq g_{i}(x)+\left\langle\nabla\tilde{f}_{i}(y^{k},\mu_{k+1}),x-x^{k+1}\right\rangle+\frac{1}{2}(\gamma_{k+1}\mu_{k+1})^{-1}\parallel x-y^{k}\parallel^{2} (17)
12(γk+1μk+1)1xxk+1212(γk+1μk+1)1xk+1yk2.\displaystyle-\frac{1}{2}(\gamma_{k+1}\mu_{k+1})^{-1}\parallel x-x^{k+1}\parallel^{2}-\frac{1}{2}(\gamma_{k+1}\mu_{k+1})^{-1}\parallel x^{k+1}-y^{k}\parallel^{2}.

Adding (17) and (14), and using the convexity of \tilde{f}_{i}, we deduce that, for any x\in\mathcal{X},

F~i(xk+1,μk+1)\displaystyle\tilde{F}_{i}(x^{k+1},\mu_{k+1}) =f~i(xk+1,μk+1)+gi(xk+1)\displaystyle=\tilde{f}_{i}(x^{k+1},\mu_{k+1})+g_{i}(x^{k+1}) (18)
F~i(x,μk+1)+12(γk+1μk+1)1xyk2\displaystyle\leq\tilde{F}_{i}(x,\mu_{k+1})+\frac{1}{2}(\gamma_{k+1}\mu_{k+1})^{-1}\parallel x-y^{k}\parallel^{2}
12(γk+1μk+1)1xxk+12.\displaystyle\quad-\frac{1}{2}(\gamma_{k+1}\mu_{k+1})^{-1}\parallel x-x^{k+1}\parallel^{2}.

Applying the identity \|x-y^{k}\|^{2}-\|x-x^{k+1}\|^{2}=2\langle y^{k}-x^{k+1},y^{k}-x\rangle-\|x^{k+1}-y^{k}\|^{2} to (18) yields (13), as claimed.

Recall the quantity W_{k} defined in [47]; to avoid confusion with our notation, we denote it by W^{\prime}_{k}:

Wk=F~(xk,μk)+κμkF(x),W^{{}^{\prime}}_{k}=\tilde{F}(x^{k},\mu_{k})+\kappa\mu_{k}-F(x^{*}),

where x^{*}\in\arg\min F(x). Replacing F by F_{i} and letting x^{*} be a weak Pareto solution of \min F(x), we get

Wk,i=F~i(xk,μk)+κμkFi(x).W^{{}^{\prime}}_{k,i}=\tilde{F}_{i}(x^{k},\mu_{k})+\kappa\mu_{k}-F_{i}(x^{*}).

Combining (13) with this definition, the W_{k} defined in (10) is related to W^{\prime}_{k,i} as follows:

Wk=mini=1,,mWk,iWk,i,i=1,,m.W_{k}=\min_{i=1,\dotsb,m}W^{{}^{\prime}}_{k,i}\leq W^{{}^{\prime}}_{k,i},\forall i=1,\dotsb,m.

So we can get the following two inequalities of WkW_{k} and Wk+1W_{k+1}, which are basic for this discussion:

Wk+1Wk+(μk+1γk+1)1ykxk+1,ykxk12(γk+1μk+1)1xk+1yk2,W_{k+1}\leq W_{k}+(\mu_{k+1}\gamma_{k+1})^{-1}\langle y^{k}-x^{k+1},y^{k}-x^{k}\rangle-\frac{1}{2}(\gamma_{k+1}\mu_{k+1})^{-1}\|x^{k+1}-y^{k}\|^{2}, (19)

and

Wk+1\displaystyle W_{k+1}\leq (μk+1γk+1)1ykxk+1,ykz\displaystyle(\mu_{k+1}\gamma_{k+1})^{-1}\langle y^{k}-x^{k+1},y^{k}-z\rangle (20)
12(μk+1γk+1)1xk+1yk2+2κμk+1.\displaystyle-\frac{1}{2}(\mu_{k+1}\gamma_{k+1})^{-1}\|x^{k+1}-y^{k}\|^{2}+2\kappa\mu_{k+1}.

The rest of the proof is similar to that of Proposition 3.1 in [47], so we omit the details. ∎

As a consequence of Proposition 4.1, we obtain some important properties of W_{k}, as shown below. For this purpose, we first recall a lemma on the convergence of sequences.

Lemma 4.2 ([47] Lemma 3.3).

Let {ak}\{a_{k}\} be a sequence of nonnegative numbers, and satisfy

k=1(ak+1ak)+<.\sum_{k=1}^{\infty}(a_{k+1}-a_{k})_{+}<\infty.

Then, limkak\lim_{k\to\infty}a_{k} exists.

Lemma 4.3.

{γk}\{\gamma_{k}\} is non-increasing.

Proof.

The update rule of \gamma_{k} in Algorithm 1 immediately shows that it is non-increasing. ∎

Theorem 4.4.

Let \left\{x^{k}\right\} and \left\{y^{k}\right\} be the sequences generated by SAPGM. For any z\in\mathbb{R}^{n} and \alpha>3, it holds that

(i) k=1μkγk(k+α2)Wk<\sum_{k=1}^{\infty}\mu_{k}\gamma_{k}(k+\alpha-2)W_{k}<\infty;

(ii)limk[(k1)2xkxk12+2μkγk(k+α2)2Wk]\lim\limits_{k\to\infty}[(k-1)^{2}\left\|x^{k}-x^{k-1}\right\|^{2}+2\mu_{k}\gamma_{k}(k+\alpha-2)^{2}W_{k}] exists;

(iii)k=1(k1)xkxk12<\sum_{k=1}^{\infty}(k-1)\left\|x^{k}-x^{k-1}\right\|^{2}<\infty;

(iv)u0(xk)=o(lnσk/k).u_{0}(x^{k})=o(\ln^{\sigma}k/k).

Proof.

Before proceeding, we note that the proofs of (i), (ii), and (iii) are similar to the proof of Proposition 3.2 in [47].

(i) By summing inequality (12) from k=1k=1 to KK, we obtain:

\mathscr{E}_{K+1}+\frac{2(\alpha-3)}{\alpha-1}\sum_{k=1}^{K}\gamma_{k+1}\mu_{k+1}(k+\alpha-1)W_{k}\leq\mathscr{E}_{1}.

Now, letting K\to\infty in the above inequality and using Proposition 4.1 (ii), since \alpha>3 and \mathscr{E}_{k}\geq 0 for all k\geq 0, we infer that

k=1μk+1γk+1(k+α1)Wk(α1)12(α3)<.\sum_{k=1}^{\infty}\mu_{k+1}\gamma_{k+1}(k+\alpha-1)W_{k}\leq\frac{(\alpha-1)\mathscr{E}_{1}}{2(\alpha-3)}<\infty. (21)

Since for all k1k\geq 1, it holds that

μk+1γk+1(k+α1)=μ0γ0lnσ(k+α1)μkγk(k+α2)lnσ(k+α2)lnσ(k+α1).\mu_{k+1}\gamma_{k+1}(k+\alpha-1)=\frac{\mu_{0}\gamma_{0}}{\ln^{\sigma}(k+\alpha-1)}\geq\mu_{k}\gamma_{k}\frac{(k+\alpha-2)\ln^{\sigma}(k+\alpha-2)}{\ln^{\sigma}(k+\alpha-1)}.

We further obtain:

μk+1γk+1(k+α1)lnσ(α1)lnσαμkγk(k+α2).\mu_{k+1}\gamma_{k+1}(k+\alpha-1)\geq\frac{\ln^{\sigma}(\alpha-1)}{\ln^{\sigma}\alpha}\mu_{k}\gamma_{k}(k+\alpha-2).

This inequality follows from the fact that \frac{\ln^{\sigma}(k+\alpha-2)}{\ln^{\sigma}(k+\alpha-1)} is increasing in k for all k\geq 1.

Therefore, inequality (21) implies:

k=1μkγk(k+α2)Wk<.\sum_{k=1}^{\infty}\mu_{k}\gamma_{k}(k+\alpha-2)W_{k}<\infty.

(ii) Returning to inequality (19) and using the identity

ab2+2ba,bc=ac2+bc2-\|a-b\|^{2}+2\langle b-a,b-c\rangle=-\|a-c\|^{2}+\|b-c\|^{2}

with a=xk+1,b=yka=x^{k+1},b=y^{k}, and c=xkc=x^{k}, we deduce:

Wk+1Wk12(μk+1γk+1)1xk+1xk2+12(μk+1γk+1)1ykxk2.W_{k+1}\leq W_{k}-\frac{1}{2}(\mu_{k+1}\gamma_{k+1})^{-1}\|x^{k+1}-x^{k}\|^{2}+\frac{1}{2}(\mu_{k+1}\gamma_{k+1})^{-1}\|y^{k}-x^{k}\|^{2}.

By the definition of yky^{k} in SAPGM and multiplying the inequality by μk+1γk+1(k+α1)2\mu_{k+1}\gamma_{k+1}(k+\alpha-1)^{2}, we get:

2μk+1γk+1\displaystyle 2\mu_{k+1}\gamma_{k+1} (k+α1)2Wk+1+(k+α1)2xk+1xk2\displaystyle(k+\alpha-1)^{2}W_{k+1}+(k+\alpha-1)^{2}\|x^{k+1}-x^{k}\|^{2}
2μk+1γk+1(k+α1)2Wk+(k1)2xkxk12.\displaystyle\leq 2\mu_{k+1}\gamma_{k+1}(k+\alpha-1)^{2}W_{k}+(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}.

Since k+α1kk+\alpha-1\geq k, we can rearrange terms to obtain:

0k2xk+1xk2(k1)2xkxk12+2μk+1γk+1(k+α1)2(Wk+1Wk).0\geq k^{2}\|x^{k+1}-x^{k}\|^{2}-(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}+2\mu_{k+1}\gamma_{k+1}(k+\alpha-1)^{2}(W_{k+1}-W_{k}). (22)

Next, observe that

μk+1(k+α1)2μk(k+α2)2\displaystyle\mu_{k+1}(k+\alpha-1)^{2}-\mu_{k}(k+\alpha-2)^{2}
=\displaystyle= μk+1(k+α1)(k+α1(k+α2)lnσ(k+α1)lnσ(k+α2))\displaystyle\mu_{k+1}(k+\alpha-1)\left(k+\alpha-1-\frac{(k+\alpha-2)\ln^{\sigma}(k+\alpha-1)}{\ln^{\sigma}(k+\alpha-2)}\right)
\displaystyle\leq μk+1(k+α1),\displaystyle\mu_{k+1}(k+\alpha-1),

which leads to:

μk+1(k+α1)2Wk+1μk(k+α2)2Wk\displaystyle\mu_{k+1}(k+\alpha-1)^{2}W_{k+1}-\mu_{k}(k+\alpha-2)^{2}W_{k} (23)
=\displaystyle= μk+1(k+α1)2(Wk+1Wk)+(μk+1(k+α1)2μk(k+α2)2)Wk\displaystyle\mu_{k+1}(k+\alpha-1)^{2}(W_{k+1}-W_{k})+\left(\mu_{k+1}(k+\alpha-1)^{2}-\mu_{k}(k+\alpha-2)^{2}\right)W_{k}
\displaystyle\leq μk+1(k+α1)2(Wk+1Wk)+μk+1(k+α1)Wk.\displaystyle\mu_{k+1}(k+\alpha-1)^{2}(W_{k+1}-W_{k})+\mu_{k+1}(k+\alpha-1)W_{k}.

For simplicity, define:

ζk:=(k1)2xkxk12+2μkγk(k+α2)2Wk.\zeta_{k}:=(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}+2\mu_{k}\gamma_{k}(k+\alpha-2)^{2}W_{k}.

Substituting (23) into (22), we obtain:

ζk+1ζk2γk+1μk+1(k+α1)Wk.\zeta_{k+1}-\zeta_{k}\leq 2\gamma_{k+1}\mu_{k+1}(k+\alpha-1)W_{k}. (24)

Taking the positive part of the left-hand side and using inequality (21), we find:

k=1(ζk+1ζk)+<.\sum_{k=1}^{\infty}(\zeta_{k+1}-\zeta_{k})_{+}<\infty.

Since ζk0\zeta_{k}\geq 0, by Lemma 4.2, we infer that limkζk\lim_{k\to\infty}\zeta_{k} exists.

(iii) In view of α>3\alpha>3, we observe that

(k+α1)2xk+1xk2(k1)2xkxk12\displaystyle(k+\alpha-1)^{2}\|x^{k+1}-x^{k}\|^{2}-(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}
\displaystyle\geq (k+2)2xk+1xk2(k1)2xkxk12\displaystyle(k+2)^{2}\|x^{k+1}-x^{k}\|^{2}-(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}
\displaystyle\geq k2xk+1xk2(k1)2xkxk12+4kxk+1xk2,\displaystyle k^{2}\|x^{k+1}-x^{k}\|^{2}-(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}+4k\|x^{k+1}-x^{k}\|^{2},

combining which with (24), then we obtain

k2xk+1xk2(k1)2xkxk12+4kxk+1xk2\displaystyle k^{2}\|x^{k+1}-x^{k}\|^{2}-(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}+4k\|x^{k+1}-x^{k}\|^{2}
\displaystyle\leq 2μkγk(k+α2)2Wk2μk+1γk+1(k+α1)2Wk+1+2μk+1γk+1(k+α1)Wk.\displaystyle 2\mu_{k}\gamma_{k}(k+\alpha-2)^{2}W_{k}-2\mu_{k+1}\gamma_{k+1}(k+\alpha-1)^{2}W_{k+1}+2\mu_{k+1}\gamma_{k+1}(k+\alpha-1)W_{k}.

Summing up the above inequality for k=1,2,,Kk=1,2,\ldots,K, we obtain

K2xK+1xK2+4k=1Kkxk+1xk2\displaystyle K^{2}\|x^{K+1}-x^{K}\|^{2}+4\sum_{k=1}^{K}k\|x^{k+1}-x^{k}\|^{2} (25)
\displaystyle\leq 2μ1γ1(α1)2W1+2k=1Kμk+1γk+1(k+α1)Wk.\displaystyle 2\mu_{1}\gamma_{1}(\alpha-1)^{2}W_{1}+2\sum_{k=1}^{K}\mu_{k+1}\gamma_{k+1}(k+\alpha-1)W_{k}.

Since W11W_{1}\leq\mathscr{E}_{1}, letting KK tend to infinity in the above inequality, by (21) and (25), we have

k=1kxk+1xk2<.\sum_{k=1}^{\infty}k\|x^{k+1}-x^{k}\|^{2}<\infty.

(iv) Combining (i) and (ii), we have

k=1[(k1)xkxk12+2γkμk(k+α2)Wk]<,\sum_{k=1}^{\infty}\left[(k-1)\|x^{k}-x^{k-1}\|^{2}+2\gamma_{k}\mu_{k}(k+\alpha-2)W_{k}\right]<\infty,

which implies

k=1ln(k+α2)(k+α2)ln(k+α2)[(k1)2xkxk12+2γkμk(k+α2)2Wk]\displaystyle\sum_{k=1}^{\infty}\frac{\ln(k+\alpha-2)}{(k+\alpha-2)\ln(k+\alpha-2)}\left[(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}+2\gamma_{k}\mu_{k}(k+\alpha-2)^{2}W_{k}\right]
k=11k+α2[(k1)(k+α2)xkxk12+2γkμk(k+α2)2Wk]\displaystyle\leq\sum_{k=1}^{\infty}\frac{1}{k+\alpha-2}\left[(k-1)(k+\alpha-2)\|x^{k}-x^{k-1}\|^{2}+2\gamma_{k}\mu_{k}(k+\alpha-2)^{2}W_{k}\right]
=k=1[(k1)xkxk12+2γkμk(k+α2)Wk]<.\displaystyle=\sum_{k=1}^{\infty}\left[(k-1)\|x^{k}-x^{k-1}\|^{2}+2\gamma_{k}\mu_{k}(k+\alpha-2)W_{k}\right]<\infty.

Observe that k=11(k+α1)ln(k+α1)=\sum_{k=1}^{\infty}\frac{1}{(k+\alpha-1)\ln(k+\alpha-1)}=\infty, then

limkinfln(k+α2)((k1)2xkxk12+2γkμk(k+α2)2Wk)=0.\lim_{k\to\infty}\inf\ln(k+\alpha-2)\left((k-1)^{2}\|x^{k}-x^{k-1}\|^{2}+2\gamma_{k}\mu_{k}(k+\alpha-2)^{2}W_{k}\right)=0. (26)

Combining this with (ii), we obtain

limk(k1)2xkxk12+2γkμk(k+α2)2Wk=0,\lim_{k\to\infty}(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}+2\gamma_{k}\mu_{k}(k+\alpha-2)^{2}W_{k}=0, (27)

by Wk0W_{k}\geq 0, which further implies

limk(k1)2xkxk12=0andlimkγkμk(k+α2)2Wk=0.\lim_{k\to\infty}(k-1)^{2}\|x^{k}-x^{k-1}\|^{2}=0\quad\mathrm{and}\quad\lim_{k\to\infty}\gamma_{k}\mu_{k}(k+\alpha-2)^{2}W_{k}=0. (28)

Recalling the definition of \mu_{k} in Step 2 of the algorithm, the non-increase of \gamma_{k}, and (10), the second equation in (28) implies

limk(k+α2)lnσ(k+α1)Wk=0\lim\limits_{k\to\infty}(k+\alpha-2)\ln^{\sigma}(k+\alpha-1)W_{k}=0

By the definition of Wk=Wk(x,μk)W_{k}=W_{k}(x^{*},\mu_{k}),we get that

limk(k+α2)lnσ(k+α1)(mini=1,,m[F~i(xk,μk)Fi(x)]+κμk)=0\lim\limits_{k\to\infty}(k+\alpha-2)\ln^{\sigma}(k+\alpha-1)\left(\min_{i=1,\dotsb,m}[\tilde{F}_{i}(x^{k},\mu_{k})-F_{i}(x^{*})]+\kappa\mu_{k}\right)=0

From the definition of u0u_{0} and the fact that xx^{*} is a weak Pareto point, we infer that

mini=1,,m[Fi(x)Fi(z)]\displaystyle\min_{i=1,\dotsb,m}[F_{i}(x^{*})-F_{i}(z)] mini=1,,m[Fi(xk)Fi(z)]\displaystyle\leq\min_{i=1,\dotsb,m}[F_{i}(x_{k})-F_{i}(z)]
=mini=1,,m[Fi(xk)F~i(xk,μk)+\displaystyle=\min_{i=1,\dotsb,m}[F_{i}(x_{k})-\tilde{F}_{i}(x_{k},\mu_{k})+
F~i(xk,μk)Fi(x)+Fi(x)Fi(z)]\displaystyle\quad\tilde{F}_{i}(x_{k},\mu_{k})-F_{i}(x^{*})+F_{i}(x^{*})-F_{i}(z)]
κμk+mini=1,,m[F~i(xk,μk)Fi(x)].\displaystyle\leq\kappa\mu_{k}+\min_{i=1,\dotsb,m}[\tilde{F}_{i}(x_{k},\mu_{k})-F_{i}(x^{*})].

So we have

supznmini=1,,m[Fi(x)Fi(z)]κμk+mini=1,,m[F~i(xk,μk)Fi(x)].\sup_{z\in\mathbb{R}^{n}}\min_{i=1,\dotsb,m}[F_{i}(x^{*})-F_{i}(z)]\leq\kappa\mu_{k}+\min_{i=1,\dotsb,m}[\tilde{F}_{i}(x_{k},\mu_{k})-F_{i}(x^{*})].

Because (k+α2)lnσ(k+α1)>0(k+\alpha-2)\ln^{\sigma}(k+\alpha-1)>0, we get that

0\displaystyle 0 (k+α2)lnσ(k+α1)supznmini=1,,m[Fi(x)Fi(z)]\displaystyle\leq(k+\alpha-2)\ln^{\sigma}(k+\alpha-1)\sup_{z\in\mathbb{R}^{n}}\min_{i=1,\dotsb,m}[F_{i}(x^{*})-F_{i}(z)]
κμk+mini=1,,m[F~i(xk,μk)Fi(x)].\displaystyle\leq\kappa\mu_{k}+\min_{i=1,\dotsb,m}[\tilde{F}_{i}(x_{k},\mu_{k})-F_{i}(x^{*})].

So we know that

limk(k+α2)lnσ(k+α1)(supznmini=1,,m[Fi(x)Fi(z)])=0.\lim\limits_{k\to\infty}(k+\alpha-2)\ln^{\sigma}(k+\alpha-1)\left(\sup_{z\in\mathbb{R}^{n}}\min_{i=1,\dotsb,m}[F_{i}(x^{*})-F_{i}(z)]\right)=0.

This result shows that, for any \sigma\in(\frac{1}{2},1] in the SAPGM algorithm, it holds that u_{0}(x^{k})=o(\ln^{\sigma}k/k). ∎

4.2. Sequential Convergence

In this subsection, we analyze the convergence of the iterates generated by SAPGM. To this end, we state the discrete version of Opial's lemma, which underlies the convergence analysis of the sequence \{x^{k}\}.

Lemma 4.5 ([47] Lemma 3.4).

Let SS be a nonempty subset of n\mathbb{R}^{n} and {zk}\{z_{k}\} be a sequence of n.\mathbb{R}^{n}.

Assume that

(i) limkzkz\lim_{k\to\infty}\|z_{k}-z\| exists for every zS;z\in S;

(ii) every sequential limit point of sequence {zk}\{z_{k}\} as kk\to\infty belongs to S.

Then, as k\to\infty, \{z_{k}\} converges to a point in S.

To prove the sequential convergence, we must recall the following inequality on nonnegative sequences, which will be used in the forthcoming sequential convergence result.

Lemma 4.6 ([47] Lemma 3.5).

Assume \alpha\geq 3. Let \{a_{k}\} and \{\omega_{k}\} be two sequences of nonnegative numbers such that

a_{k+1}\leq{\frac{k-1}{k+\alpha-1}}a_{k}+\omega_{k}

for all k\geq 1. If \sum_{k=1}^{\infty}k\omega_{k}<\infty, then \sum_{k=1}^{\infty}a_{k}<\infty.
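As a quick numerical sanity check of Lemma 4.6 (our illustration with \alpha=4 and \omega_{k}=k^{-3}, so that \sum_{k}k\omega_{k}<\infty), the recursively generated a_{k} indeed have convergent partial sums:

alpha, K = 4.0, 200000
a, total = 1.0, 0.0
for k in range(1, K + 1):
    total += a
    a = (k - 1.0) / (k + alpha - 1.0) * a + k ** -3.0   # equality case of a_{k+1} <= ((k-1)/(k+alpha-1)) a_k + w_k
print(total)   # the partial sums of sum_k a_k stabilize, consistent with Lemma 4.6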

Theorem 4.7.

Let {xk}\{x_{k}\} be the sequence generated by the algorithm. Then, as kk\to\infty, the sequence {xk}\{x_{k}\} converges to a weak Pareto solution of the original problem.

Proof.

Let \{x_{k}\} be the sequence generated by SAPGM, and let \overline{x} be one of its cluster points. If we can prove that u_{0}(\overline{x})=0, then \overline{x} is a weakly Pareto optimal solution of the original problem.

Because u_{0}(\overline{x})=0 holds if and only if \max_{i=1,\dots,m}[F_{i}(z)-F_{i}(\overline{x})]\geq 0 for all z\in\mathbb{R}^{n}, we only need to prove

maxi=1,,m[Fi(z)Fi(x¯)]0,zn.\max_{i=1,\dots,m}[F_{i}(z)-F_{i}(\overline{x})]\geq 0,\forall z\in\mathbb{R}^{n}.

Therefore, we can reformulate the problem using some properties of the smoothing function in [8]:

maxi=1,,m[Fi(z)Fi(x¯)]=\displaystyle\max_{i=1,\dots,m}[F_{i}(z)-F_{i}(\overline{x})]= maxi=1,,m[Fi(z)F~i(z,μk)+F~i(z,μk)F~i(x¯,μk)\displaystyle\max_{i=1,\dots,m}[F_{i}(z)-\tilde{F}_{i}(z,\mu_{k})+\tilde{F}_{i}(z,\mu_{k})-\tilde{F}_{i}(\overline{x},\mu_{k})
+F~i(x¯,μk)Fi(x¯)]\displaystyle+\tilde{F}_{i}(\overline{x},\mu_{k})-F_{i}(\overline{x})]
maxi=1,,m[F~i(z,μk)F~i(x¯,μk)]κμk.\displaystyle\geq\max_{i=1,\dots,m}[\tilde{F}_{i}(z,\mu_{k})-\tilde{F}_{i}(\overline{x},\mu_{k})]-\kappa\mu_{k}.

Through the subproblem φ\varphi_{\ell}, we can get

maxi=1,,m[F~i(x¯,μk)F~i(x,μk)]φ(x¯+α(zx¯);x,x¯,μk)\displaystyle\max_{i=1,\dots,m}[\tilde{F}_{i}(\overline{x},\mu_{k})-\tilde{F}_{i}(x,\mu_{k})]\leq\varphi_{\ell}(\overline{x}+\alpha(z-\overline{x});x,\overline{x},\mu_{k})
=maxi=1,,m[f~i(x¯,μk),α(zx¯)+gi(x¯+α(zx¯))+f~i(x¯,μk)F~i(x,μk)]\displaystyle=\max_{i=1,\dots,m}\left[\left\langle\nabla\tilde{f}_{i}(\overline{x},\mu_{k}),\alpha(z-\overline{x})\right\rangle+g_{i}(\overline{x}+\alpha(z-\overline{x}))+\tilde{f}_{i}(\overline{x},\mu_{k})-\tilde{F}_{i}(x,\mu_{k})\right]
+(γμk)12α(zx¯)2.\displaystyle\quad+\frac{(\gamma\mu_{k})^{-1}}{2}\left\|\alpha(z-\overline{x})\right\|^{2}.

Due to the convexity of f~i\tilde{f}_{i}, we have

maxi=1,,m[F~i(x¯,μk)F~i(x,μk)]maxi=1,,m[F~i(x¯+α(zx¯),μk)F~i(x,μk)]+(γμk)12α(zx¯)2.\max_{i=1,\dots,m}[\tilde{F}_{i}(\overline{x},\mu_{k})-\tilde{F}_{i}(x,\mu_{k})]\leq\max_{i=1,\dots,m}[\tilde{F}_{i}(\overline{x}+\alpha(z-\overline{x}),\mu_{k})-\tilde{F}_{i}(x,\mu_{k})]+\frac{(\gamma\mu_{k})^{-1}}{2}\left\|\alpha(z-\overline{x})\right\|^{2}.

Furthermore, the convexity of F~i\tilde{F}_{i} leads to

maxi=1,,m[F~i(x¯,μk)F~i(x,μk)]\displaystyle\max_{i=1,\dots,m}[\tilde{F}_{i}(\overline{x},\mu_{k})-\tilde{F}_{i}(x,\mu_{k})] maxi=1,,m[αF~i(z,μk)+(1α)F~i(x¯,μk)F~i(x,μk)]\displaystyle\leq\max_{i=1,\dots,m}[\alpha\tilde{F}_{i}(z,\mu_{k})+(1-\alpha)\tilde{F}_{i}(\overline{x},\mu_{k})-\tilde{F}_{i}(x,\mu_{k})]
+(γμk)12α(zx¯)2\displaystyle\quad+\frac{(\gamma\mu_{k})^{-1}}{2}\left\|\alpha(z-\overline{x})\right\|^{2}
αmaxi=1,,m[F~i(z,μk)F~i(x¯,μk)]+maxi=1,,m[F~i(x¯,μk)F~i(x,μk)]\displaystyle\leq\alpha\max_{i=1,\dots,m}[\tilde{F}_{i}(z,\mu_{k})-\tilde{F}_{i}(\overline{x},\mu_{k})]+\max_{i=1,\dots,m}[\tilde{F}_{i}(\overline{x},\mu_{k})-\tilde{F}_{i}(x,\mu_{k})]
+(γμk)12α(zx¯)2.\displaystyle\quad+\frac{(\gamma\mu_{k})^{-1}}{2}\left\|\alpha(z-\overline{x})\right\|^{2}.

Therefore, we obtain

maxi=1,,m[F~i(z,μk)F~i(x¯,μk)]α(γμk)12zx¯2.\max_{i=1,\dots,m}[\tilde{F}_{i}(z,\mu_{k})-\tilde{F}_{i}(\overline{x},\mu_{k})]\geq-\frac{\alpha(\gamma\mu_{k})^{-1}}{2}\left\|z-\overline{x}\right\|^{2}.

Letting α\alpha tend to 0 monotonically, we get maxi=1,,m[F~i(z,μk)F~i(x¯,μk)]0.\max_{i=1,\dots,m}[\tilde{F}_{i}(z,\mu_{k})-\tilde{F}_{i}(\overline{x},\mu_{k})]\geq 0.

At the same time, letting kk\to\infty, we have

maxi=1,,m[Fi(z)Fi(x¯)]0,zn.\max_{i=1,\dots,m}[F_{i}(z)-F_{i}(\overline{x})]\geq 0,\forall z\in\mathbb{R}^{n}.

Thus, x¯\overline{x} is a weak Pareto optimal solution of the original problem.

Next, if we prove that \lim_{k\to\infty}\|x_{k}-\overline{x}\| exists for every weakly Pareto optimal solution \overline{x}, then the convergence of the sequence \{x_{k}\} follows from Lemma 4.5.

Because Wk+1(x¯)0W_{k+1}(\overline{x})\geq 0 and following inequality

Wk+1(z,μk+1)\displaystyle W_{k+1}(z,\mu_{k+1}) μk+1γk+12{2xk+1yk+1,yk+1z+xk+1yk+12}+2κμk+1,\displaystyle\leq\frac{-\mu_{k+1}\gamma_{k+1}}{2}\{2\langle x^{k+1}-y^{k+1},y^{k+1}-z\rangle+\|x^{k+1}-y^{k+1}\|^{2}\}+2\kappa\mu_{k+1},

we get

2yk+1xk+1,yk+1x¯xk+1yk+12+2κμk+10,2\langle y^{k+1}-x^{k+1},y^{k+1}-\overline{x}\rangle-\|x^{k+1}-y^{k+1}\|^{2}+2\kappa\mu_{k+1}\geq 0,

which implies

yk+1x¯2xk+1x¯2+2κμk+10.\|y^{k+1}-\overline{x}\|^{2}-\|x^{k+1}-\overline{x}\|^{2}+2\kappa\mu_{k+1}\geq 0.

Then,

xk+1x¯2\displaystyle\|x^{k+1}-\overline{x}\|^{2} yk+1x¯2\displaystyle\leq\|y^{k+1}-\overline{x}\|^{2}
=xk+k1k+α1(xkxk1)x¯2+2κμk+1\displaystyle=\|x^{k}+\frac{k-1}{k+\alpha-1}(x^{k}-x^{k-1})-\overline{x}\|^{2}+2\kappa\mu_{k+1}
=xkx¯2+(k1k+α1)2xkxk12\displaystyle=\|x^{k}-\overline{x}\|^{2}+\left(\frac{k-1}{k+\alpha-1}\right)^{2}\|x^{k}-x^{k-1}\|^{2}
+2(k1k+α1)xkx¯,xkxk1+2κμk+1\displaystyle\quad+2\left(\frac{k-1}{k+\alpha-1}\right)\langle x^{k}-\overline{x},x^{k}-x^{k-1}\rangle+2\kappa\mu_{k+1}
=xkx¯2+((k1k+α1)2+k1k+α1)xkxk12\displaystyle=\|x^{k}-\overline{x}\|^{2}+\left(\left(\frac{k-1}{k+\alpha-1}\right)^{2}+\frac{k-1}{k+\alpha-1}\right)\|x^{k}-x^{k-1}\|^{2}
+k1k+α1(xkx¯2xk1x¯2)+2κμk+1\displaystyle\quad+\frac{k-1}{k+\alpha-1}(\|x^{k}-\overline{x}\|^{2}-\|x^{k-1}-\overline{x}\|^{2})+2\kappa\mu_{k+1}
xkx¯2+2xkxk12\displaystyle\leq\|x^{k}-\overline{x}\|^{2}+2\|x^{k}-x^{k-1}\|^{2}
+k1k+α1(xkx¯2xk1x¯2)+2κμk+1.\displaystyle\quad+\frac{k-1}{k+\alpha-1}(\|x^{k}-\overline{x}\|^{2}-\|x^{k-1}-\overline{x}\|^{2})+2\kappa\mu_{k+1}.

Letting h_{k}:=\|x^{k}-\overline{x}\|^{2}, we have

(hk+1hk)+(k1k+α1)(hkhk1)++2xkxk12+2κμk+1.(h_{k+1}-h_{k})_{+}\leq\left(\frac{k-1}{k+\alpha-1}\right)(h_{k}-h_{k-1})_{+}+2\|x^{k}-x^{k-1}\|^{2}+2\kappa\mu_{k+1}.

From Lemma 4.6, we obtain \sum_{k=1}^{\infty}(h_{k+1}-h_{k})_{+}<\infty; then, by Lemma 4.2 and the non-negativity of \{h_{k}\}, \lim_{k\to\infty}h_{k} exists. Combining this with the first part of the proof, Lemma 4.5 implies that \{x^{k}\} converges to a weak Pareto solution. ∎

Remark 4.8.

Based on the sequential convergence established above, we adopt

\|x^{k}-x^{k+1}\|_{\infty}<\varepsilon\quad\text{and}\quad\mu_{k+1}<\varepsilon

as the stopping criterion of the algorithm. The preceding proofs make the rationale for this choice clear.

5. Efficient computation of the subproblem via its dual

In the previous section, we proved the global convergence and complexity results of SAPGM. Subsequently, our focus shifts to empirically assessing the method’s practical efficacy. Specifically, we elucidate a methodology for computing the subproblem. To commence, let us introduce a formal definition.

ψi(z;x,y,μ):=f~i(y,μ),zy+gi(z)+f~i(y,μ)F~i(x,μ)+2zy2\psi_{i}(z;x,y,\mu):=\left\langle\nabla\tilde{f}_{i}(y,\mu),z-y\right\rangle+g_{i}(z)+\tilde{f}_{i}(y,\mu)-\tilde{F}_{i}(x,\mu)+\frac{\ell}{2}\left\|z-y\right\|^{2} (29)

for all i=1,\dotsb,m. Then, fixing some \ell\geq L, we can rewrite the objective function \varphi_{\ell}(z;x,y,\mu) as

φ(z;x,y,μ)=maxi=1,,mψi(z;x,y,μ).\varphi_{\ell}(z;x,y,\mu)=\max_{i=1,\dotsb,m}\psi_{i}(z;x,y,\mu).

Based on the discussion in [42], we obtain the dual problem as follows:

maxλm\displaystyle\max_{\lambda\in\mathbb{R}^{m}} ω(λ)\displaystyle\quad\omega(\lambda) (30)
s.t.\displaystyle\mathrm{s.t.} λ0andi=1mλi=1,\displaystyle\quad\lambda\geq 0\quad\mathrm{and}\quad\sum_{i=1}^{m}\lambda_{i}=1,

where

\omega(\lambda):=\ell{\mathcal{M}_{\frac{1}{\ell}\sum_{i=1}^{m}\lambda_{i}g_{i}}}\left(y-\frac{1}{\ell}\sum_{i=1}^{m}\lambda_{i}\nabla\tilde{f}_{i}(y,\mu)\right)-\frac{1}{2\ell}\left\|\sum_{i=1}^{m}\lambda_{i}\nabla\tilde{f}_{i}(y,\mu)\right\|^{2}+\sum_{i=1}^{m}\lambda_{i}\{\tilde{f}_{i}(y,\mu)-\tilde{F}_{i}(x,\mu)\}. (31)

Given the identification of the global optimal solution λ\lambda^{*} for the dual problem (30), it becomes feasible to construct the optimal solution zz^{*} for the original subproblem as follows:

z=𝐩𝐫𝐨𝐱1i=1mλigi(y1i=1mλif~i(y,μ)),z^{*}=\mathbf{prox}_{\frac{1}{\ell}\sum_{i=1}^{m}\lambda_{i}^{*}g_{i}}\left(y-\frac{1}{\ell}\sum_{i=1}^{m}\lambda_{i}^{*}\nabla\tilde{f}_{i}(y,\mu)\right), (32)

where prox denotes the proximal operator.
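As an illustration of (32) (our sketch, with placeholder data), suppose every g_{i} is a weighted \ell_{1}-norm, g_{i}=w_{i}\left\|\cdot\right\|_{1}; then \sum_{i}\lambda_{i}g_{i} is again a scaled \ell_{1}-norm and the proximal operator in (32) reduces to soft-thresholding:

import numpy as np

def soft_threshold(v, tau):
    # prox of tau*||.||_1 evaluated at v
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def recover_primal(lam, grads, y, ell, weights):
    # z* = prox_{(1/ell) sum_i lam_i g_i}( y - (1/ell) sum_i lam_i grad f_i~(y,mu) ), with g_i = weights[i]*||.||_1
    v = y - (grads.T @ lam) / ell
    return soft_threshold(v, (weights @ lam) / ell)

lam = np.array([0.3, 0.7])                      # dual solution lambda* (placeholder)
grads = np.array([[1.0, -2.0], [0.5, 0.0]])     # rows: grad f_i~(y, mu) (placeholder)
y = np.array([0.2, -0.4]); ell = 2.0; weights = np.array([0.01, 0.03])
print(recover_primal(lam, grads, y, ell, weights))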

So we can choose the Frank-Wolfe method [24] to solve the above dual problem (30).

Algorithm 2 Frank-Wolfe algorithm
0:  x^{0}\in D, where D is the feasible set of the problem; K is the maximum iteration number; \mu\in\mathbb{R} is the smoothing parameter.
1:  for k=0,1,,Kk=0,1,\dotsb,K do
2:     Compute s=argminsDs,F~(xk,μ)s=\arg\min_{s\in D}\left\langle s,\nabla\tilde{F}(x^{k},\mu)\right\rangle
3:     Update xk+1:=(12k+2)xk+2k+2sx^{k+1}:=(1-\frac{2}{k+2})x^{k}+\frac{2}{k+2}s
4:  end for

Additionally, \omega is differentiable, which makes the Frank-Wolfe method easy to apply, as the following lemma shows.

Lemma 5.1 ([42],Theorem 6.1).

The function \omega:\mathbb{R}^{m}\to\mathbb{R} defined by (31) is continuously differentiable at every \lambda\in\mathbb{R}^{m} and

ω(λ)=\displaystyle\nabla\omega(\lambda)= g(prox1i=1mλigi(y1i=1mλif~i(y,μ)))\displaystyle g\left(\textbf{prox}_{\frac{1}{\ell}\sum\limits_{i=1}^{m}\lambda_{i}g_{i}}\left(y-\frac{1}{\ell}\sum\limits_{i=1}^{m}\lambda_{i}\nabla\tilde{f}_{i}(y,\mu)\right)\right)
+\displaystyle+ Jf~(y)(prox1i=1mλigi(y1i=1mλif~i(y,μ))y)+f~(y,μ)F~(x,μ),\displaystyle J_{\tilde{f}}(y)\left(\textbf{prox}_{\frac{1}{\ell}\sum\limits_{i=1}^{m}\lambda_{i}g_{i}}\left(y-\frac{1}{\ell}\sum\limits_{i=1}^{m}\lambda_{i}\nabla\tilde{f}_{i}(y,\mu)\right)-y\right)+\tilde{f}(y,\mu)-\tilde{F}(x,\mu),

where prox is the proximal operator, and J_{\tilde{f}}(y) is the Jacobian matrix at y given by

Jf~(y):=(f~1(y,μ),,f~m(y,μ)).J_{\tilde{f}}(y):=\left(\nabla\tilde{f}_{1}(y,\mu),\ldots,\nabla\tilde{f}_{m}(y,\mu)\right)^{\top}.

The proof is similar to that in [42]. This theorem establishes that the dual problem denoted as (30) constitutes an mm-dimensional differentiable convex optimization problem. Consequently, the effective computation of the proximal operator for the summation i=1mλigi\sum_{i=1}^{m}\lambda_{i}g_{i} in a rapid manner would enable the resolution of (30) through the application of convex optimization techniques.
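Since the feasible set of (30) is the standard simplex \Delta^{m}, the linear minimization step of Algorithm 2 has the closed-form solution s=e_{j} with j minimizing the corresponding gradient component (maximizing \omega is the same as minimizing -\omega). A minimal Python sketch under this observation, assuming a user-supplied gradient oracle grad_omega implementing Lemma 5.1 (the quadratic stand-in below is only for demonstration):

import numpy as np

def frank_wolfe_simplex(grad_omega, m, K=100):
    # maximize omega over the simplex by applying Frank-Wolfe to -omega
    lam = np.full(m, 1.0 / m)                       # start at the barycenter of the simplex
    for k in range(K):
        g = -grad_omega(lam)                        # gradient of the objective -omega at lam
        s = np.zeros(m); s[np.argmin(g)] = 1.0      # vertex e_j minimizing <s, g> over the simplex
        lam = (1.0 - 2.0 / (k + 2.0)) * lam + 2.0 / (k + 2.0) * s
    return lam

# concave quadratic stand-in for omega (placeholder): omega(l) = -0.5*||l - c||^2, c in the simplex
c = np.array([0.2, 0.5, 0.3])
print(frank_wolfe_simplex(lambda l: -(l - c), m=3))   # approaches c, the maximizer over the simplex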

6. Numerical experiments

In this section, we present numerical results to show the good performance of the SAPGM algorithm for solving (1). The numerical experiments are performed in Python 3.10 on a 64-bit Lenovo PC with a 12th Gen Intel(R) Core(TM) i7-12700H CPU @ 2.70 GHz and 16GB RAM. To compare with the SAPGM, we use DNNM [18], the descent method for local Lipschitz multi-objective optimization problems, to conduct controlled experiments on the same test problems. For simplicity, we use Iter to represent the number of iterations and Time to represent the amount of time a program takes to run.

For convenience, we introduce some smoothing functions as follows. For the function \max(z,0), we use the smoothing function of [14]:

\tilde{\phi}(z,\mu)=\begin{cases}0,&z<-\mu\\ \frac{(z+\mu)^{3}}{6\mu^{2}},&-\mu\leq z<0\\ z+\frac{(\mu-z)^{3}}{6\mu^{2}},&0\leq z\leq\mu\\ z,&z>\mu\end{cases}

The maximum function \max(z_{1},\dotsb,z_{n}) can then be handled via \max\{z,0\}, because \max\{a,b\}=a+\max\{b-a,0\}.

For the \ell_{1}-norm function \left\|z\right\|_{1}, we define its smoothing function componentwise via the following smoothing of |z|:

\tilde{\theta}(z,\mu)=\begin{cases}|z|,&\text{if}\ |z|>\mu,\\ \frac{z^{2}}{2\mu}+\frac{\mu}{2},&\text{if}\ |z|\leq\mu.\end{cases}
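The two smoothing functions above can be implemented directly; the sketch below (our code) vectorizes them with numpy and numerically checks the uniform bounds |\tilde{\phi}(z,\mu)-\max(z,0)|\leq\mu/6 and |\tilde{\theta}(z,\mu)-|z||\leq\mu/2, in line with the estimate |\tilde{f}(x,\mu)-f(x)|\leq\kappa\mu of Section 2.

import numpy as np

def phi_tilde(z, mu):
    # piecewise-cubic smoothing of max(z, 0)
    z = np.asarray(z, dtype=float)
    return np.where(z < -mu, 0.0,
           np.where(z < 0.0, (z + mu) ** 3 / (6.0 * mu ** 2),
           np.where(z <= mu, z + (mu - z) ** 3 / (6.0 * mu ** 2), z)))

def theta_tilde(z, mu):
    # smoothing of |z|, applied componentwise to obtain a smoothing of ||z||_1
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) > mu, np.abs(z), z ** 2 / (2.0 * mu) + mu / 2.0)

z = np.linspace(-2.0, 2.0, 5001); mu = 1e-3
print(np.max(np.abs(phi_tilde(z, mu) - np.maximum(z, 0.0))))   # about mu/6
print(np.max(np.abs(theta_tilde(z, mu) - np.abs(z))))          # about mu/2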

To demonstrate the performance of SAPGM, we selected the DNNM algorithm for comparison and chose three types of problems as our benchmark tests: small-scale bi-objective optimization problems, large-scale bi-objective optimization problems with sparse structures, and tri-objective optimization problems. The objective functions in the test problems are selected from [23], [28], [47]. We list them in Table 1:

Table 1. Test Problems
Problem Functions 𝐱\mathbf{x}
Large scale problem f1(𝐱)=max{A𝐱,0}b1+0.01𝐱1f2(𝐱)=max{A𝐱b1ϵ^,0}0.03𝐱1\begin{aligned} f_{1}(\mathbf{x})&=\left\|\max\{A\mathbf{x},0\}-b\right\|_{1}+0.01\left\|\mathbf{x}\right\|_{1}\\ f_{2}(\mathbf{x})&=-\max\{\left\|A\mathbf{x}-b\right\|_{1}-\hat{\epsilon},0\}-0.03\left\|\mathbf{x}\right\|_{1}\end{aligned} 𝟎𝐱𝟏\mathbf{0}\leq\mathbf{x}\leq\mathbf{1}
CR & MF2 f1(𝐱)=max{x12+(x21)2+x21,x12(x21)2+x2+1}f2(𝐱)=x1+2(x12+x221)+1.75|x12+x221|\begin{aligned} f_{1}(\mathbf{x})&=\max\{x_{1}^{2}+(x_{2}-1)^{2}+x_{2}-1,-x_{1}^{2}-(x_{2}-1)^{2}+x_{2}+1\}\\ f_{2}(\mathbf{x})&=-x_{1}+2(x_{1}^{2}+x_{2}^{2}-1)+1.75|x_{1}^{2}+x_{2}^{2}-1|\end{aligned} 1.5𝐱𝟐\mathbf{1.5}\leq\mathbf{x}\leq\mathbf{2}
CB3 & LQ f1(𝐱)=max{x14+x22,(2x1)2+(2x2)2,2ex2x1}f2(𝐱)=max{x1x2,x1x2+x12+x221}\begin{aligned} f_{1}(\mathbf{x})&=\max\{x_{1}^{4}+x_{2}^{2},(2-x_{1})^{2}+(2-x_{2})^{2},2e^{x_{2}-x_{1}}\}\\ f_{2}(\mathbf{x})&=\max\{-x_{1}-x_{2},-x_{1}-x_{2}+x_{1}^{2}+x_{2}^{2}-1\}\end{aligned} 1.5𝐱𝟐\mathbf{1.5}\leq\mathbf{x}\leq\mathbf{2}
CB3 & MF1 f1(𝐱)=max{x14+x22,(2x1)2+(2x2)2,2ex2x1}f2(𝐱)=x1+20max{x12+x221,0}\begin{aligned} f_{1}(\mathbf{x})&=\max\{x_{1}^{4}+x_{2}^{2},(2-x_{1})^{2}+(2-x_{2})^{2},2e^{x_{2}-x_{1}}\}\\ f_{2}(\mathbf{x})&=-x_{1}+20\max\{x_{1}^{2}+x_{2}^{2}-1,0\}\end{aligned} 𝟎𝐱𝟏\mathbf{0}\leq\mathbf{x}\leq\mathbf{1}
JOS1 & 1\ell_{1} f1(𝐱)=1ni=1nxi2f2(𝐱)=1ni=1n(xi2)2f3(𝐱)=𝐱1\begin{aligned} f_{1}(\mathbf{x})&=\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}\\ f_{2}(\mathbf{x})&=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-2)^{2}\\ f_{3}(\mathbf{x})&=\parallel\mathbf{x}\parallel_{1}\end{aligned} 𝟏𝐱𝟐\mathbf{1}\leq\mathbf{x}\leq\mathbf{2}
BK1 & 1\ell_{1} f1(𝐱)=x12+x22f2(𝐱)=(x15)2+(x25)2f3(𝐱)=𝐱1\begin{aligned} f_{1}(\mathbf{x})&=x_{1}^{2}+x_{2}^{2}\\ f_{2}(\mathbf{x})&=(x_{1}-5)^{2}+(x_{2}-5)^{2}\\ f_{3}(\mathbf{x})&=\parallel\mathbf{x}\parallel_{1}\end{aligned} 𝟓𝐱𝟏𝟎\mathbf{-5}\leq\mathbf{x}\leq\mathbf{10}
SP1 & 1\ell_{1} f1(𝐱)=(x11)2+(x1x2)2f2(𝐱)=(x23)2+(x1x2)2f3(𝐱)=𝐱1\begin{aligned} f_{1}(\mathbf{x})&=(x_{1}-1)^{2}+(x_{1}-x_{2})^{2}\\ f_{2}(\mathbf{x})&=(x_{2}-3)^{2}+(x_{1}-x_{2})^{2}\\ f_{3}(\mathbf{x})&=\parallel\mathbf{x}\parallel_{1}\end{aligned} 𝟓𝐱𝟏𝟎\mathbf{5}\leq\mathbf{x}\leq\mathbf{10}

For the large-scale bi-objective optimization problems with sparse structures, we selected three sparsity levels: 10%, 20%, and 50%. For a given group of (m, n, Spar), the data in a large-scale problem is generated as follows:

A = np.random.randn(m, n);      s = Spar * n;
x = np.random.uniform(0, 1, (200, 1));      x[:n - int(s)] = 0;
np.random.shuffle(x);      x[x > 1] = 1;
bb = A.dot(x);      b = np.maximum(bb, np.zeros(bb.shape)).

The parameter settings for the SAPGM algorithm are as follows:

σ=0.75,α=4,μ0=0.5,ϵ^=0.001,itermax=1e3.\sigma=0.75,\alpha=4,\mu_{0}=0.5,\hat{\epsilon}=0.001,iter_{max}=1e3.

The parameter settings for the DNNM algorithm can be referenced below:

ε=1e3,δ=1e3,c=0.25,t0=1,itermax=1e3.\varepsilon=1e-3,\delta=1e-3,c=0.25,t_{0}=1,iter_{max}=1e3.

To demonstrate that using objective functions like JOS1 in three-objective test problems is reasonable, we compare them with the fast proximal gradient algorithm for multi-objective optimization [50], referred to as FPGA for convenience. This comparison shows that the SAPGM algorithm can degenerate into FPGA, thereby confirming that the composition of the three-dimensional test problems is appropriate. The results are listed in Table 2 and Figure 1. Although the smoothing step causes SAPGM to be slower than FPGA on smooth problems, both obtain similar Pareto fronts, which indicates that SAPGM can degenerate into FPGA.

Table 2. Comparison between SAPGM and FPGA (Purity, Gamma, Delta, and HVS)
Problem SAPGM FPGA
purity Γ\Gamma Δ\Delta hvs purity Γ\Gamma Δ\Delta hvs
JOS1 0.9155 0.0787 0.8684 0.1163 0.9155 0.0787 0.8684 0.1163
BK1 0.9670 0.4703 0.9989 0.0920 0.9520 0.1259 0.6884 0.0221
SP1 0.9437 0.1070 0.6819 0.0838 0.7370 0.3318 1.4252 0.0187
Figure 1. The Pareto fronts for smooth problems: (a) JOS1, (b) BK1, (c) SP1.

We use the following metrics to evaluate the performance of the algorithms:

Number of Iterations: The total number of iterations required to meet the stopping criteria.

Time: The time taken to satisfy the stopping criteria.

Purity [2]: This metric represents the proportion of solutions obtained by a given solver that lie within the approximated Pareto frontier.

Hypervolume (hvs) [51]: This metric quantifies the volume of the objective space dominated by the obtained Pareto frontier.

Spread Metrics (Γ\Gamma and Δ\Delta) [10]: These metrics assess the distribution of solutions across the Pareto frontier.

Additionally, we constructed performance profiles [11] for each evaluation metric to facilitate a comprehensive comparison of the algorithms.
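Performance profiles [11] are straightforward to compute from the per-problem costs; the sketch below (our code) follows the Dolan-More construction \rho_{s}(\tau)=|\{p:r_{p,s}\leq\tau\}|/n_{p} with ratios r_{p,s}=t_{p,s}/\min_{s}t_{p,s}, illustrated with the Time column of the three small-scale two-objective problems from Table 3.

import numpy as np

def performance_profile(T, taus):
    # T[p, s]: cost (e.g. time or iterations) of solver s on problem p; lower is better
    ratios = T / T.min(axis=1, keepdims=True)                    # r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for s in range(T.shape[1])]
                     for tau in taus])                           # rho_s(tau), one row per tau

T = np.array([[97.6322, 244.7978],     # CR&MF2:  (SAPGM, DNNM) times from Table 3
              [128.4447, 471.4855],    # CB3&LQ
              [96.1398, 348.6537]])    # CB3&MF1
print(performance_profile(T, np.linspace(1.0, 5.0, 9)))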

We now examine the performance of the algorithms. For each problem above, we run the algorithms from 200 different initial points; Figures 3 to 5 show the Pareto fronts of the large-scale bi-objective optimization problems, Figure 6 shows the fronts of the tri-objective optimization problems, and Figure 2 shows the fronts of the small-scale bi-objective optimization problems. In general, SAPGM captures the Pareto front of each problem well, while the DNNM algorithm does not accurately recover the front for some problems. Table 3 reports the average computational time and iteration counts for each problem. From the table, it can be seen that the acceleration is in general more efficient in terms of time. In fact, the performance profiles in Figure 7(a) and Figure 7(b) show that SAPGM performs better in terms of both iteration counts and time.

Besides computational performance, it is also important to assess the quality of the obtained Pareto frontier. Thus, we again show performance profiles, this time for the spread metric \Gamma (Figure 7(c)), the spread metric \Delta (Figure 7(d)), the hypervolume (Figure 7(e)), and the purity (Figure 7(f)). SAPGM outperforms DNNM, obtaining better Pareto frontiers. We can thus conclude that, at least among the test problems considered, SAPGM is promising both in terms of performance and in producing uniform Pareto frontiers.

The SAPGM iteration counts in the table coincide across problems because the number of iterations is governed by the bound imposed on the smoothing parameter μ: relaxing this bound changes the iteration count, but the characterization of the Pareto front is not significantly affected. To explore the influence of μ on iteration counts and on the quality of the Pareto front, we select three test problems: CR&MF2, JOS1&ℓ1, and the large-scale problem with (m,n) = (500,100) and Spar = 10%. The results are listed in Table 4 to Table 6. In Table 6, the hypervolume is zero because of the low sparsity of the initial point, which does not materially affect the comparison. We find that as μ decreases, the number of iterations and the running time increase. However, judging from the performance metrics used above, decreasing μ does not improve the characterization of the Pareto front and even yields slightly worse results on some problems. We therefore confirm that μ = 1e-3 is a reasonable choice.
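For intuition about the role of μ, the sketch below shows a common Huber-type smoothing of the absolute value, with the smoothed ℓ1 norm obtained by summation. The approximation error is at most μ/2, so a smaller μ gives a tighter but less smooth approximation, which is consistent with the larger iteration counts observed above. This is a generic illustration, not necessarily the exact smoothing function used by SAPGM.

```python
import numpy as np

def smooth_abs(x, mu):
    """Huber-type smoothing of |x| with parameter mu > 0.

    For |x| > mu it coincides with |x|; for |x| <= mu it equals
    x^2 / (2*mu) + mu/2, which is differentiable everywhere.
    The approximation error is at most mu/2.
    """
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > mu,
                    np.abs(x),
                    x ** 2 / (2.0 * mu) + mu / 2.0)

def smooth_l1(x, mu):
    """Smoothed l1 norm: sum of the smoothed absolute values."""
    return smooth_abs(x, mu).sum()

# sanity check of the error bound |smooth_abs(t, mu) - |t|| <= mu / 2
t = np.linspace(-1.0, 1.0, 1001)
assert np.all(np.abs(smooth_abs(t, 1e-3) - np.abs(t)) <= 1e-3 / 2 + 1e-12)
```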

Table 3. Performance of SAPGM and DNNM (average iteration counts and time)
Class | Problem | SAPGM: iter / time | DNNM: iter / time
Two obj | CR&MF2 | 43600 / 97.6322 | 200000 / 244.7978
Two obj | CB3&LQ | 43600 / 128.4447 | 223294 / 471.4855
Two obj | CB3&MF1 | 43600 / 96.1398 | 150972 / 348.6537
Large scale (Spar = 0.1) | 500*100 | 2200 / 27.4354 | 95498 / 10941.3457
Large scale (Spar = 0.1) | 1000*200 | 2200 / 50.6058 | 31635 / 2558.4360
Large scale (Spar = 0.1) | 2000*400 | 2200 / 208.9523 | 199316 / 82865.6740
Large scale (Spar = 0.2) | 500*100 | 2200 / 32.7338 | 165255 / 11572.2238
Large scale (Spar = 0.2) | 1000*200 | 2200 / 65.8336 | 173010 / 21074.3461
Large scale (Spar = 0.2) | 2000*400 | 2200 / 322.2478 | 198621 / 41724.1614
Large scale (Spar = 0.5) | 500*100 | 2200 / 28.2064 | 194611 / 12932.9436
Large scale (Spar = 0.5) | 1000*200 | 2200 / 80.7623 | 113755 / 14413.0340
Large scale (Spar = 0.5) | 2000*400 | 2200 / 171.9536 | 200000 / 48799.6603
Three obj | JOS1&ℓ1 | 43600 / 129.6857 | 48344 / 132.5368
Three obj | BK1&ℓ1 | 43600 / 346.8673 | 761652 / 1311.6791
Three obj | SP1&ℓ1 | 43600 / 345.6614 | 196950 / 405.0506
Table 4. Results for different values of μ in CR&MF2.
Metric | μ=1e-1 | μ=1e-2 | μ=1e-3 | μ=1e-5 | μ=1e-7 | μ=1e-10
Iter | 800 | 6200 | 43600 | 200000 | 200000 | 200000
Time | 1.1963 | 8.1796 | 93.1940 | 629.3381 | 633.3625 | 2046.4206
Purity | 0 | 0.0693 | 0.8713 | 0.8713 | 0.8713 | 0.8713
Γ | / | / | 6.9795 | 6.9795 | 6.9795 | 6.9795
Δ | / | / | 0.8068 | 0.8068 | 0.8068 | 0.8068
hvs | 0 | 0 | 128.0904 | 128.0904 | 128.0904 | 128.0904
Table 5. Results for different values of μ in JOS1&ℓ1.
Metric | μ=1e-1 | μ=1e-2 | μ=1e-3 | μ=1e-5 | μ=1e-7 | μ=1e-10
Iter | 800 | 6200 | 43600 | 200000 | 200000 | 200000
Time | 1.9047 | 17.9726 | 129.6857 | 1644.7979 | 1645.9337 | 2146.4206
Purity | 0 | 0 | 0.9559 | 0.9559 | 0.9559 | 0.9559
Γ | / | / | 0.1591 | 0.1591 | 0.1591 | 0.1591
Δ | / | / | 0.8635 | 0.8635 | 0.8635 | 0.8635
hvs | 0 | 0 | 0.5866 | 0.5866 | 0.5866 | 0.5866
Table 6. Results for different values of μ in the large-scale problem with (m, n, Spar) = (500, 100, 10%).
Metric | μ=1e-1 | μ=1e-2 | μ=1e-3 | μ=1e-5 | μ=1e-7 | μ=1e-10
Iter | 800 | 2200 | 2200 | 2200 | 2200 | 2200
Time | 14.7241 | 50.6055 | 27.4354 | 36.1575 | 44.6184 | 40.6530
Purity | 1.0000 | 1.0000 | 0.9600 | 0.9570 | 0.9570 | 0.9570
Γ | 0.3364 | 0.3256 | 0.3256 | 0.3256 | 0.3256 | 0.3256
Δ | 1.9225 | 2.3421 | 2.3420 | 2.3421 | 2.3421 | 2.3421
hvs | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Figure 2. The Pareto fronts for small-scale two-objective optimization problems: (a) CR&MF2, (b) CB3&LQ, (c) CB3&MF1.
Figure 3. The Pareto fronts for large-scale problems with Spar = 10%: (a) (m,n) = (500,100), (b) (m,n) = (1000,200), (c) (m,n) = (2000,400).
Figure 4. The Pareto fronts for large-scale problems with Spar = 20%: (a) (m,n) = (500,100), (b) (m,n) = (1000,200), (c) (m,n) = (2000,400).
Figure 5. The Pareto fronts for large-scale problems with Spar = 50%: (a) (m,n) = (500,100), (b) (m,n) = (1000,200), (c) (m,n) = (2000,400).
Figure 6. The Pareto fronts for tri-objective optimization problems: (a) BK1&ℓ1, (b) JOS1&ℓ1, (c) SP1&ℓ1.
Figure 7. Performance profiles: (a) iterations, (b) time, (c) spread metric Γ, (d) spread metric Δ, (e) hypervolume, (f) purity.

7. Conclusions

In this paper, we propose a Smoothing Accelerated Proximal Gradient Method with extrapolation term (SAPGM) for nonsmooth convex multiobjective optimization problems. Each iteration applies an accelerated proximal gradient step with the extrapolation coefficient (k-1)/(k+α-1) to problem (1) with a fixed smoothing parameter, followed by an update of the smoothing parameter. Using a global energy function, we prove that the convergence rate improves to o(ln^σ k / k). Additionally, we prove that the sequence of iterates converges to a Pareto optimal solution of the problem. An effective strategy for solving the subproblem is presented through its dual representation. The numerical results underscore the superior performance of SAPGM and the importance of extrapolation in achieving faster convergence rates.
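As a schematic illustration of the extrapolation step, the sketch below shows the single-objective analogue of the accelerated proximal gradient iteration with coefficient (k-1)/(k+α-1), using an ℓ1 regularizer so that the proximal step reduces to soft-thresholding. The multiobjective subproblem of SAPGM, which is solved via its dual, is not reproduced here, and the problem data in the usage example are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accelerated_prox_grad(grad_f, L, x0, alpha=4.0, lam=1.0, max_iter=1000):
    """Single-objective accelerated proximal gradient method with the
    extrapolation coefficient (k - 1) / (k + alpha - 1), alpha > 3.

    grad_f: gradient of the smooth part f
    L:      Lipschitz constant of grad_f (step size 1/L)
    lam:    weight of the l1 term g(x) = lam * ||x||_1
    """
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, max_iter + 1):
        beta = (k - 1) / (k + alpha - 1)                 # extrapolation coefficient
        y = x + beta * (x - x_prev)                      # extrapolated point
        x_prev = x
        x = soft_threshold(y - grad_f(y) / L, lam / L)   # proximal gradient step
    return x

# toy usage: f(x) = 0.5 * ||A x - b||^2, g(x) = 0.1 * ||x||_1
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
L = np.linalg.norm(A, 2) ** 2                            # spectral norm squared
x_star = accelerated_prox_grad(lambda x: A.T @ (A @ x - b), L,
                               np.zeros(10), alpha=4.0, lam=0.1)
```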

For future work, we plan to study the influence of the parameters ℓ, 1/μ, and α on the convergence speed of the algorithm and to give more general parameter selection criteria, which would make the algorithm easier to apply to specific problems and enhance its practical value. In addition, we will investigate whether SAPGM remains effective on problems with higher dimensions and more objective functions, and we hope to replace the ℓ1 norm with the ℓ0 norm in the target problem, which would benefit applications to large-scale sparse optimization; this, however, will require further theoretical support.


References

  • [1] H. Attouch and J. Peypouquet, The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than O(1/k2)O(1/k^{2}), SIAM Journal on Optimization, 2016, 26(3): 1824-1834.
  • [2] S. Bandyopadhyay, S. K. Pal and A. B. Aruna, Multiobjective GAs, quantitative indices, and pattern classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004, 34(5): 2088-2099.
  • [3] N. Belgasmi, L. Ben Saïd and K. Ghédira, Evolutionary multiobjective optimization of the multi-location transshipment problem, Operational Research, 2008, 8: 167-183.
  • [4] H. Bonnel, A. N. Iusem and B. F. Svaiter, Proximal methods in vector optimization, SIAM Journal on Optimization, 2005, 15(4): 953-970.
  • [5] G. A. Carrizo, P. A. Lotito and M. C. Maciel, Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem, Mathematical Programming, 2016, 159: 339-369.
  • [6] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2009, 2(1): 183-202.
  • [7] A. Chambolle and C. Dossal, On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”, Journal of Optimization Theory and Applications, 2015, 166: 968-982.
  • [8] X. Chen, Smoothing methods for nonsmooth, nonconvex minimization, Mathematical Programming, 2012, 134: 71-99.
  • [9] A. Chinchuluun and P. M. Pardalos, A survey of recent developments in multiobjective optimization, Annals of Operations Research, 2007, 154(1): 29-50.
  • [10] A. L. Custódio, J. F. A. Madeira, A. I. F. Vaz and L. N. Vicente, Direct multisearch for multiobjective optimization, SIAM Journal on Optimization, 2011, 21(3): 1109-1140.
  • [11] E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, 2002, 91: 201-213.
  • [12] E. K. Doolittle, H. L. M. Kerivin and M. M. Wiecek, Robust multiobjective optimization with application to internet routing, Annals of Operations Research, 2018, 271: 487-525.
  • [13] F. Facchinei and J.-S. Pang, Finite-dimensional variational inequalities and complementarity problems, Springer New York, 2003.
  • [14] Y. Feng, L. Hongwei, Z. Shuisheng, et al., A smoothing trust-region Newton-CG method for minimax problem, Applied Mathematics and Computation, 2008, 199(2): 581-589.
  • [15] J. Fliege and B. F. Svaiter, Steepest descent methods for multicriteria optimization, Mathematical Methods of Operations Research, 2000, 51: 479-494.
  • [16] J. Fliege, L. M. G. Drummond and B. F. Svaiter, Newton’s method for multiobjective optimization, SIAM Journal on Optimization, 2009, 20(2): 602-626.
  • [17] E. H. Fukuda and L. M. Graça Drummond, Inexact projected gradient method for vector optimization, Computational Optimization and Applications, 2013, 54: 473-493.
  • [18] B. Gebken and S. Peitz, An efficient descent method for locally Lipschitz multiobjective optimization problems, Journal of Optimization Theory and Applications, 2021, 188: 696-723.
  • [19] A. A. Goldstein, Optimization of Lipschitz continuous functions, Mathematical Programming, 1977, 13: 14-22.
  • [20] M. Haarala, K. Miettinen and M. M. Mäkela, New limited memory bundle method for large-scale nonsmooth optimization, Optimization Methods and Software, 2004, 19(6): 673-692.
  • [21] J. Hakanen and R. Allmendinger, Multiobjective optimization and decision making in engineering sciences, Optimization and Engineering, 2021, 22: 1031-1037.
  • [22] J. B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms I: Fundamentals, Springer Science & Business Media, 1996.
  • [23] S. Huband, P. Hingston, L. Barone and L. While, A review of multiobjective test problems and a scalable test problem toolkit, IEEE Transactions on Evolutionary Computation, 2006, 10(5): 477-506.
  • [24] M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, in International Conference on Machine Learning, 2013: 427-435.
  • [25] Y. Jin (Ed.), Multi-objective machine learning, Springer Science & Business Media, 2006.
  • [26] A. Kumari, Subgradient methods for non-smooth vector optimization problems, International Journal of Pure and Applied Mathematics, 2015, 102(3): 563-578.
  • [27] L. R. Lucambio Pérez and L. F. Prudente, Nonlinear conjugate gradient methods for vector optimization, SIAM Journal on Optimization, 2018, 28(3): 2690-2720.
  • [28] L. Lukšan and J. Vlcek, Test problems for nonsmooth unconstrained and linearly constrained optimization, Technical report, 2000.
  • [29] N. Mahdavi-Amiri and R. Yousefpour, An effective nonsmooth multiobjective optimization method for finding a weakly Pareto optimal solution of nonsmooth problems, International Journal of Applied and Computational Mathematics, 2012, 1(1): 1-21.
  • [30] M. M. Mäkela and P. Neittaanmäki, Nonsmooth optimization: Analysis and algorithms with applications to optimal control, World Scientific, 1992.
  • [31] M. M. Mäkelä, Multiobjective proximal bundle method for nonconvex nonsmooth optimization: Fortran subroutine MPBNGC 2.0, Reports of the Department of Mathematical Information Technology, Series B. Scientific Computing, 2003, 13.
  • [32] O. Montonen, N. Karmitsa, and M. M. Mäkelä, Multiple subgradient descent bundle method for convex nonsmooth multiobjective optimization, Optimization, 2018, 67(1), 139-158.
  • [33] Y. Nesterov, A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)O(1/k^{2}), Dokl. Akad. Nauk SSSR, 1983, 269(3): 543.
  • [34] Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Programming, 2005, 103: 127-152.
  • [35] Y. Nishimura, E. H. Fukuda and N. Yamashita, Monotonicity for Multiobjective Accelerated Proximal Gradient Methods, arXiv preprint arXiv:2206.04412, 2022.
  • [36] Y. Nikulin, K. Miettinen, and M. M. Mäkelä, A new achievement scalarizing function based on parameterization in multiobjective optimization, OR Spectrum, 2012, 34, 69-87.
  • [37] P. Ren, Z. Zuo, and W. Huang, Effects of axial profile on the main bearing performance of internal combustion engine and its optimization using multiobjective optimization algorithms, Journal of Mechanical Science and Technology, 2021, 35, 3519-3531.
  • [38] R. T. Rockafellar, Convex analysis, Princeton University Press, 1997.
  • [39] M. Rocca, Sensitivity to uncertainty and scalarization in robust multiobjective optimization: An overview with application to mean-variance portfolio optimization, Annals of Operations Research, 2022: 1-16.
  • [40] R. T. Rockafellar and R. J. B. Wets, Variational analysis, Springer, 1998.
  • [41] E. E. Rosinger, Interactive algorithm for multiobjective optimization, Journal of Optimization Theory and Applications, 1981, 35, 339-365.
  • [42] H. Tanabe, E. H. Fukuda, and N. Yamashita, An accelerated proximal gradient method for multiobjective optimization, Computational Optimization and Applications, 2023, 1-35.
  • [43] H. Tanabe, E. H. Fukuda, and N. Yamashita, New merit functions for multiobjective optimization and their properties, Optimization, 2023, 1-38.
  • [44] H. Tanabe, E. H. Fukuda, and N. Yamashita, Proximal gradient methods for multiobjective optimization and their applications, Computational Optimization and Applications, 2019, 72, 339-361.
  • [45] H. Tanabe, E. H. Fukuda, and N. Yamashita, A globally convergent fast iterative shrinkage-thresholding algorithm with a new momentum factor for single and multi-objective convex optimization, arXiv preprint arXiv:2205.05262, 2022.
  • [46] P. Wang and Y. Ma, A dynamic multiobjective evolutionary algorithm based on fine prediction strategy and nondominated solutions-guided evolution, Applied Intelligence, 2023, 1-22.
  • [47] F. Wu and W. Bian, Smoothing Accelerated Proximal Gradient Method with Fast Convergence Rate for Nonsmooth Convex Optimization Beyond Differentiability, Journal of Optimization Theory and Applications, 2023, 197(2), 539-572.
  • [48] Z. Xia, Y. Liu, J. Lu, et al., Penalty method for constrained distributed quaternion-variable optimization, IEEE Transactions on Cybernetics, 2020, 51(11), 5631-5636.
  • [49] W. Xian, F. Huang, Y. Zhang, et al., A faster decentralized algorithm for nonconvex minimax problems, Advances in Neural Information Processing Systems, 2021, 34, 25865-25877.
  • [50] J. Zhang and X. Yang, The convergence rate of the accelerated proximal gradient algorithm for Multiobjective Optimization is faster than O(1/k2)O(1/k^{2}), arXiv preprint arXiv:2312.06913, 2023.
  • [51] E. Zitzler, Evolutionary algorithms for multiobjective optimization: Methods and applications, Shaker, Ithaca, 1999.
