
José Luis Romero · Damián Fernández
Facultad de Matemática, Astronomía, Física y Computación
Universidad Nacional de Córdoba (CIEM–CONICET)
Av. Medina Allende s/n, Ciudad Universitaria, CP X5000HUA, Córdoba, Argentina
e-mail: joseluisromero@unc.edu.ar, dfernandez@unc.edu.ar

Germán Ariel Torres (corresponding author)
Facultad de Ciencias Exactas y Naturales y Agrimensura
Universidad Nacional del Nordeste (IMIT–CONICET)
Av. Libertad 5470, CP 3400, Corrientes, Argentina
e-mail: german.torres@comunidad.unne.edu.ar

Enhancing sharp augmented Lagrangian methods with smoothing techniques for nonlinear programming

José Luis Romero    Damián Fernández    Germán Ariel Torres
(Received: date / Accepted: date)
Abstract

This paper proposes a novel approach to solving nonlinear programming problems using a sharp augmented Lagrangian method with a smoothing technique. Traditional sharp augmented Lagrangian methods are known for their effectiveness but are often hindered by the need for global minimization of nonconvex, nondifferentiable functions at each iteration. To address this challenge, we introduce a smoothing function that approximates the sharp augmented Lagrangian, enabling the use of primal minimization strategies similar to those in Powell–Hestenes–Rockafellar (PHR) methods. Our approach retains the theoretical rigor of classical duality schemes while allowing for the use of stationary points in the primal optimization process. We present two algorithms based on this method: one utilizing standard descent, the other employing coordinate descent. Numerical experiments demonstrate that our smoothing-based method compares favorably with the PHR augmented Lagrangian approach, offering both robustness and practical efficiency. The proposed method is particularly advantageous in scenarios where exact minimization is computationally infeasible, providing a balance between theoretical precision and computational tractability.

Keywords:
Sharp augmented Lagrangian · Continuous optimization · Nonlinear programming
MSC:
49J53 · 49K99 · 90C30

1 Introduction

We want to solve the following nonlinear programming problem:

\begin{array}{cl}\textrm{minimize} & f(x)\\ \textrm{subject to} & h(x)=0,\end{array} \qquad (1)

where $f:\mathbbm{R}^{n}\to\mathbbm{R}$ and $h:\mathbbm{R}^{n}\to\mathbbm{R}^{m}$ are twice continuously differentiable. Stationary points $x$ of (1), along with the associated Lagrange multipliers $\lambda$, are characterized by the following system of equations

\nabla_{x}L(x,\lambda)=0,\qquad h(x)=0, \qquad (2)

where $L:\mathbbm{R}^{n}\times\mathbbm{R}^{m}\to\mathbbm{R}$, defined by $L(x,\lambda)=f(x)+\langle\lambda,h(x)\rangle$, is the Lagrangian function associated with problem (1).

Augmented Lagrangian methods have been extensively used to solve nonlinear programming problems. These methods involve the iterative minimization of an augmented Lagrangian function with respect to the primal variables, followed by appropriate updates to the dual variables. Among the various methods, the most studied are the so-called Powell–Hestenes–Rockafellar (PHR) augmented Lagrangian methods hestenes1969multiplier ; powell1969method ; rockafellar1974augmented , which are based on the function $\bar{L}_{2}:\mathbbm{R}^{n}\times\mathbbm{R}^{m}\times(0,\infty)\to\mathbbm{R}$ such that

\bar{L}_{2}(x;\lambda,r)=f(x)+\langle\lambda,h(x)\rangle+\frac{r}{2}\|h(x)\|^{2}. \qquad (3)

This method has been studied extensively in the literature bertsekas1982constrained ; conn1991globally ; conn1996convergence ; pennanen2002local ; birgin2005numerical ; andreani2008augmented ; fernandez2012local ; birgin2014practical , and implemented successfully in software packages such as LANCELOT conn1992lancelot and ALGENCAN andreani2007augmented . We emphasize that at each iteration, the function to be minimized, $x\mapsto\bar{L}_{2}(x;\lambda,r)$, is continuously differentiable. Furthermore, convergence results can be obtained even if, instead of a global minimizer, an approximate stationary point is found at each iteration.

Another group of methods is based on the sharp augmented Lagrangian function $\bar{L}_{1}:\mathbbm{R}^{n}\times\mathbbm{R}^{m}\times(0,\infty)\to\mathbbm{R}$ defined as follows:

\bar{L}_{1}(x;\lambda,r)=f(x)+\langle\lambda,h(x)\rangle+r\|h(x)\|. \qquad (4)

Methods based on this function have been studied in gasimov2002augmented ; burachik2006modified ; jefferson2009thesis ; burachik2010primal ; kasimbeyli2009modified ; bagirov2019sharp . These studies propose a duality scheme that preserves the main idea of the modified subgradient algorithm, where at each iteration, a function of the dual variables is obtained by globally minimizing $x\mapsto\bar{L}_{1}(x;\lambda,r)$, and the dual variables are updated in the direction of a subgradient of the dual function. A practical drawback of these methods is that at each iteration, one must find a global minimizer, or a good approximation, of a nonlinear, nonconvex, and nondifferentiable function.
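For concreteness, a minimal NumPy sketch of the two functions (3) and (4); here f and h are assumed to be user-supplied callables, with h returning a vector in $\mathbbm{R}^{m}$:

    import numpy as np

    def phr_lagrangian(x, lam, r, f, h):
        """PHR augmented Lagrangian (3): smooth in x whenever f and h are."""
        hx = h(x)
        return f(x) + lam @ hx + 0.5 * r * (hx @ hx)

    def sharp_lagrangian(x, lam, r, f, h):
        """Sharp augmented Lagrangian (4): nonsmooth where h(x) = 0."""
        hx = h(x)
        return f(x) + lam @ hx + r * np.linalg.norm(hx)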

In this work, we propose a method based on the sharp augmented Lagrangian with a primal approach similar to that used in PHR augmented Lagrangian methods. To perform the primal minimization, we shall use a suitable smoothing technique. Among all possible approaches to smoothing out the kinks of the sharp augmented Lagrangian function, we choose the one introduced in fernandez2022augmented . Specifically, it was shown that

\bar{L}_{1}(x;\lambda,r)=\inf_{t>0}\left\{\bar{L}_{2}\left(x;\lambda,\tfrac{r}{t}\right)+\tfrac{r}{2}t\right\}.

This relation establishes the sharp augmented Lagrangian as a scalarization of the penalty parameter in the PHR augmented Lagrangian (3). By adopting this approach, we aim to inherit some of the desirable properties of the PHR augmented Lagrangian.
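For completeness, a short verification of this identity for fixed $x$, using the arithmetic–geometric mean inequality on the two $t$-terms:

\bar{L}_{2}\left(x;\lambda,\tfrac{r}{t}\right)+\tfrac{r}{2}t=f(x)+\langle\lambda,h(x)\rangle+\frac{r}{2t}\|h(x)\|^{2}+\frac{r}{2}t\geq f(x)+\langle\lambda,h(x)\rangle+r\|h(x)\|=\bar{L}_{1}(x;\lambda,r),

with equality at $t=\|h(x)\|$ when $h(x)\neq 0$, and in the limit $t\downarrow 0$ when $h(x)=0$.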

For parameters $(\lambda,r)\in\mathbbm{R}^{m}\times(0,\infty)$, we define the extended-real-valued smoothing function as follows:

\tilde{L}_{\lambda,r}(x,t)=\begin{cases}f(x)+\langle\lambda,h(x)\rangle+\frac{r}{2t}\|h(x)\|^{2}+\frac{r}{2}t, & t>0,\\ f(x), & t=0,\ h(x)=0,\\ \infty, & \text{otherwise}.\end{cases}

It can be observed that $\tilde{L}_{\lambda,r}$ is a lower semicontinuous function on $\mathbbm{R}^{n}\times\mathbbm{R}$ and is continuously differentiable on $\mathbbm{R}^{n}\times(0,\infty)$. Clearly, for this function we have $\bar{L}_{1}(x;\lambda,r)=\min_{t\geq 0}\{\tilde{L}_{\lambda,r}(x,t)\}$.

The goal is to modify the sharp augmented Lagrangian method by replacing the minimization of the function $x\mapsto\bar{L}_{1}(x;\lambda,r)$ with the minimization of the function $(x,t)\mapsto\tilde{L}_{\lambda,r}(x,t)$. The nondifferentiability of $\bar{L}_{1}$ at $x$ where $h(x)=0$ is addressed by introducing a singularity in $\tilde{L}_{\lambda,r}$ at $t=0$. We will demonstrate that employing this smoothing technique does not compromise the classical duality scheme, which requires exact minimizers at each step. Additionally, we will present a primal approach where stationary points at each step are acceptable.

The rest of the paper is organized as follows. In Section 2, we examine the classical duality approach based on exact minimization for the smoothing function. Section 3 introduces the primal approach using stationary points, detailing two algorithms: one employing standard descent and the other using coordinate descent. Section 4 studies the boundedness of the penalty parameter. Section 5 provides a comparison between the PHR augmented Lagrangian method and our two primal algorithms. Section 6 is dedicated to conclusions, and finally, an Appendix is included, describing a set of test problems chosen by the authors.

We conclude this section by defining our notation. We use $\langle\cdot,\cdot\rangle$ to denote the Euclidean inner product and $\|\cdot\|$ to represent the associated norm.

2 Exact algorithm

We will prove that finding the global minimizer of $\tilde{L}_{\lambda,r}$ instead of the global minimizer of $\bar{L}_{1}(\cdot;\lambda,r)$ allows us to recover the main results from (jefferson2009thesis, Chapter 2). To achieve this, we propose a method using a dual approach based on a modified subgradient algorithm. To this end, we refer to (1) as the primal problem and define its associated augmented dual problem:

\mathop{\textrm{maximize}}_{(\lambda,r)\in\mathbbm{R}^{m}\times(0,\infty)}\ \tilde{q}(\lambda,r),

where

\tilde{q}(\lambda,r)=\inf_{(x,t)\in\mathbbm{R}^{n}\times\mathbbm{R}}\tilde{L}_{\lambda,r}(x,t).

Let us denote the set of minimizers of $\tilde{L}_{\lambda,r}$ by $A$, that is,

A(\lambda,r)=\left\{(x,t)\in\mathbbm{R}^{n}\times\mathbbm{R}\ \middle|\ \tilde{L}_{\lambda,r}(x,t)=\tilde{q}(\lambda,r)\right\}.

For the exact case, we will use Algorithm 1, which follows the classical dual approach based on the modified subgradient algorithm.

Step 0: Initialization
Choose $(\lambda^{0},r_{0})\in\mathbbm{R}^{m}\times(0,\infty)$ and a sequence of exogenous parameters $\{\alpha_{k}\}\subset(0,\infty)$.
Set $k:=0$.
Step 1: Solving the $k$-th subproblem

  (a) Find $(x^{k+1},t_{k+1})\in A(\lambda^{k},r_{k})$.

  (b) If $t_{k+1}=0$, STOP.

  (c) If $t_{k+1}\neq 0$, go to Step 2.

Step 2: Updating dual variables
Set
\lambda^{k+1}=\lambda^{k}+\frac{r_{k}}{t_{k+1}}h(x^{k+1}),\qquad r_{k+1}=2r_{k}.
Set $k:=k+1$ and go to Step 1.
Algorithm 1: Exact sharp Lagrangian (modified subgradient algorithm)
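The following Python sketch illustrates the overall loop; it is not the authors' implementation. The global minimization in Step 1 is delegated to a hypothetical oracle global_argmin, which is precisely the expensive ingredient that the inexact methods of Section 3 avoid:

    import numpy as np

    def exact_sharp_lagrangian(f, h, global_argmin, lam0, r0, max_iter=50):
        """Algorithm 1: dual updates driven by exact global minimization.

        global_argmin(L) must return a global minimizer (x, t) of L over
        R^n x [0, inf) -- an idealized oracle assumed for illustration only.
        """
        lam, r = np.asarray(lam0, dtype=float), float(r0)
        for _ in range(max_iter):
            def L_tilde(x, t, lam=lam, r=r):  # smoothing function for current (lam, r)
                hx = h(x)
                if t > 0:
                    return f(x) + lam @ hx + (r / (2 * t)) * (hx @ hx) + 0.5 * r * t
                return f(x) if np.linalg.norm(hx) == 0 else np.inf
            x, t = global_argmin(L_tilde)       # Step 1(a)
            if t == 0:                          # Step 1(b): primal and dual solutions found
                return x, lam, r
            lam = lam + (r / t) * h(x)          # Step 2: move along the subgradient direction
            r = 2.0 * r                         # Step 2: double the penalty parameter
        return x, lam, r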

In the next result we show some properties of the elements within the solution set $A$. One of these properties reveals a relationship between the minimizers of $\tilde{L}_{\lambda,r}$ and the subgradients of $-\tilde{q}$. To establish this, note first that $-\tilde{q}$ is a convex function, as it is the supremum of affine functions. Therefore, its subdifferential is given by

\partial(-\tilde{q})(\hat{\lambda},\hat{r})=\left\{(\xi,\sigma)\ \middle|\ -\tilde{q}(\lambda,r)\geq-\tilde{q}(\hat{\lambda},\hat{r})+\left\langle(\xi,\sigma),(\lambda,r)-(\hat{\lambda},\hat{r})\right\rangle,\ \forall(\lambda,r)\right\}.
Lemma 1

Let $(\hat{x},\hat{t})\in A(\hat{\lambda},\hat{r})$; then it holds that

  (a) $\hat{t}=\|h(\hat{x})\|$, and

  (b) $-(h(\hat{x}),\hat{t})\in\partial(-\tilde{q})(\hat{\lambda},\hat{r})$.

Proof

(a) If $\hat{t}>0$, since $\tilde{L}_{\hat{\lambda},\hat{r}}$ is continuously differentiable on $\mathbbm{R}^{n}\times(0,\infty)$, then

0=\frac{\partial\tilde{L}_{\hat{\lambda},\hat{r}}}{\partial t}(\hat{x},\hat{t})=\frac{\hat{r}}{2}\left(1-\frac{\|h(\hat{x})\|^{2}}{\hat{t}^{2}}\right).

Thus, $\hat{t}=\|h(\hat{x})\|$. In the case where $\hat{t}=0$, it cannot occur that $h(\hat{x})\neq 0$, because in that situation $\tilde{q}(\hat{\lambda},\hat{r})=\tilde{L}_{\hat{\lambda},\hat{r}}(\hat{x},\hat{t})=\infty$. However, we know that $\tilde{q}(\hat{\lambda},\hat{r})\leq\tilde{L}_{\hat{\lambda},\hat{r}}(x',t')<\infty$ for any $(x',t')$ with $t'>0$.

(b) Take $(\lambda,r)\in\mathbbm{R}^{m}\times(0,\infty)$. If $\hat{t}>0$, then we have:

\begin{aligned}\tilde{q}(\lambda,r)\leq{}&\tilde{L}_{\lambda,r}(\hat{x},\hat{t})=f(\hat{x})+\langle\lambda,h(\hat{x})\rangle+\frac{r}{2\hat{t}}\|h(\hat{x})\|^{2}+\frac{r}{2}\hat{t}\\ ={}&f(\hat{x})+\langle\hat{\lambda},h(\hat{x})\rangle+\frac{\hat{r}}{2\hat{t}}\|h(\hat{x})\|^{2}+\frac{\hat{r}}{2}\hat{t}+\langle h(\hat{x}),\lambda-\hat{\lambda}\rangle+\frac{r-\hat{r}}{2\hat{t}}\|h(\hat{x})\|^{2}+(r-\hat{r})\frac{\hat{t}}{2}\\ ={}&\tilde{q}(\hat{\lambda},\hat{r})+\left\langle(h(\hat{x}),\hat{t}),(\lambda,r)-(\hat{\lambda},\hat{r})\right\rangle,\end{aligned}

where in the last equation we use $\|h(\hat{x})\|=\hat{t}$ (item (a)). For the case when $\hat{t}=0$, by item (a) we have $h(\hat{x})=0$. Thus:

\begin{aligned}\tilde{q}(\lambda,r)\leq{}&\tilde{L}_{\lambda,r}(\hat{x},\hat{t})=f(\hat{x})=\tilde{L}_{\hat{\lambda},\hat{r}}(\hat{x},\hat{t})=\tilde{q}(\hat{\lambda},\hat{r})\\ ={}&\tilde{q}(\hat{\lambda},\hat{r})+\left\langle(h(\hat{x}),\hat{t}),(\lambda,r)-(\hat{\lambda},\hat{r})\right\rangle.\end{aligned}

Hence, $-(h(\hat{x}),\hat{t})\in\partial(-\tilde{q})(\hat{\lambda},\hat{r})$.

Now we will study the sequences generated by Algorithm 1. First, note that if the smoothing parameter is zero, the primal point is feasible, since by Lemma 1(a) we have $\|h(x^{k})\|=t_{k}=0$. Also, the dual updates follow a subgradient direction of $-\tilde{q}$; that is, they can be viewed as a proximal point iteration on a linearization of $-\tilde{q}$. Note that the problem

\mathop{\textrm{minimize}}_{(\lambda,r)}\ -\tilde{q}(\lambda^{k},r_{k})-\left\langle(h(x^{k+1}),t_{k+1}),(\lambda,r)-(\lambda^{k},r_{k})\right\rangle+\frac{t_{k+1}}{2r_{k}}\left\|(\lambda,r)-(\lambda^{k},r_{k})\right\|^{2},

has the unique solution

(\lambda^{k+1},r_{k+1})=(\lambda^{k},r_{k})+\frac{r_{k}}{t_{k+1}}(h(x^{k+1}),t_{k+1}).
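Indeed, since the objective above is strongly convex, setting its gradient with respect to $(\lambda,r)$ to zero gives

-(h(x^{k+1}),t_{k+1})+\frac{t_{k+1}}{r_{k}}\left[(\lambda,r)-(\lambda^{k},r_{k})\right]=0,

whose unique solution is the update displayed above; in particular, its $r$-component reads $r_{k+1}=r_{k}+r_{k}=2r_{k}$, in agreement with Step 2 of Algorithm 1.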

Clearly, Algorithm 1 will generate a sequence if, at Step 1, a global minimizer of $\tilde{L}_{\lambda^{k},r_{k}}$ is found. The existence of such a minimizer can be ensured under various sets of assumptions. Since our result is independent of such assumptions, we will simply assume that $A(\lambda^{k},r_{k})\neq\emptyset$.

The next result states that if the algorithm stops, we obtain both a primal and a dual solution; otherwise, the generated dual variables approach the maximum of $\tilde{q}$.

Theorem 2.1

Suppose that $A(\lambda,r)\neq\emptyset$ for every $(\lambda,r)\in\mathbbm{R}^{m}\times(0,\infty)$. Then, the following holds:

  (a) If Algorithm 1 stops at the $k$-th iteration, then $x^{k+1}$ is an optimal primal solution, and $(\lambda^{k},r_{k})$ is an optimal dual solution. Moreover, the optimal values of the primal and augmented dual problems are the same.

  (b) If $(\lambda^{k},r_{k})$ is not a dual solution, then $\tilde{q}(\lambda^{k+1},r_{k+1})\geq\tilde{q}(\lambda^{k},r_{k})$.

Proof

(a) If the algorithm terminates at the $k$-th iteration, then $t_{k+1}=0$, and therefore $h(x^{k+1})=0$ (see Lemma 1(a)). Thus, $(0,0)\in\partial(-\tilde{q})(\lambda^{k},r_{k})$. By the convexity of $-\tilde{q}$, for any $(\lambda,r)\in\mathbbm{R}^{m}\times(0,\infty)$ it holds that

\tilde{q}(\lambda,r)\leq\tilde{q}(\lambda^{k},r_{k}),

which implies that $(\lambda^{k},r_{k})$ is an optimal dual solution. On the other hand, for any $x$ with $h(x)=0$ we have

f(x^{k+1})=\tilde{L}_{\lambda^{k},r_{k}}(x^{k+1},t_{k+1})\leq\tilde{L}_{\lambda^{k},r_{k}}(x,0)=f(x),

which means $x^{k+1}$ is an optimal primal solution. Moreover, there is no duality gap, since

\tilde{q}(\lambda^{k},r_{k})=\tilde{L}_{\lambda^{k},r_{k}}(x^{k+1},t_{k+1})=f(x^{k+1}).

(b) If $(\lambda^{k},r_{k})$ is not a dual solution, Algorithm 1 will generate $(x^{k+2},t_{k+2})\in A(\lambda^{k+1},r_{k+1})$. First, consider the case when $t_{k+2}>0$. Given that $t_{k+2}=\|h(x^{k+2})\|$ and using the update formulas for $\lambda^{k+1}$ and $r_{k+1}$, we have:

\begin{aligned}\tilde{q}(\lambda^{k+1},r_{k+1})={}&f(x^{k+2})+\langle\lambda^{k+1},h(x^{k+2})\rangle+\frac{r_{k+1}}{2t_{k+2}}\|h(x^{k+2})\|^{2}+\frac{r_{k+1}}{2}t_{k+2}\\ ={}&f(x^{k+2})+\langle\lambda^{k},h(x^{k+2})\rangle+\frac{r_{k}}{2t_{k+2}}\|h(x^{k+2})\|^{2}+\frac{r_{k}}{2}t_{k+2}+\frac{r_{k}}{t_{k+1}}\langle h(x^{k+1}),h(x^{k+2})\rangle+r_{k}t_{k+2}\\ \geq{}&\tilde{q}(\lambda^{k},r_{k})+r_{k}t_{k+2}-\frac{r_{k}}{t_{k+1}}\|h(x^{k+1})\|\,\|h(x^{k+2})\|\\ ={}&\tilde{q}(\lambda^{k},r_{k}).\end{aligned}

Now, consider the case when $t_{k+2}=0$. In this situation, the algorithm stops, which, by item (a), implies that $(\lambda^{k+1},r_{k+1})$ is an optimal dual solution. Since $(\lambda^{k},r_{k})$ is not a dual solution, it must be that $\tilde{q}(\lambda^{k+1},r_{k+1})>\tilde{q}(\lambda^{k},r_{k})$.

Boundedness of the sequences generated by an algorithm is crucial for analyzing convergence. The following lemma describes what happens when the sequence of multipliers or the sequence of penalty parameters is bounded.

Lemma 2

Suppose that $A(\lambda,r)\neq\emptyset$ for every $(\lambda,r)\in\mathbbm{R}^{m}\times(0,\infty)$, and consider that Algorithm 1 generates an infinite sequence $\{(\lambda^{k},r_{k})\}$.

  (a) If $\{r_{k}\}$ is bounded, then $\{\lambda^{k}\}$ is bounded.

  (b) If $\{\lambda^{k}\}$ is bounded and the set of optimal dual solutions is nonempty, then $\{r_{k}\}$ is bounded.

Proof

If an infinite sequence is generated, then $t_{k}>0$ for all $k$. Hence $t_{k}=\|h(x^{k})\|$, and the first part follows from the following expressions:

\|\lambda^{k+1}-\lambda^{0}\|\leq\sum_{j=0}^{k}\|\lambda^{j+1}-\lambda^{j}\|=\sum_{j=0}^{k}\frac{r_{j}}{t_{j+1}}\|h(x^{j+1})\|=\sum_{j=0}^{k}r_{j},\qquad r_{k+1}-r_{0}=\sum_{j=0}^{k}(r_{j+1}-r_{j})=\sum_{j=0}^{k}r_{j}.

To prove the second statement, suppose that $\{\lambda^{k}\}$ is bounded and take a dual solution $(\bar{\lambda},\bar{r})$. Since $-(h(x^{k+1}),t_{k+1})\in\partial(-\tilde{q})(\lambda^{k},r_{k})$, we have that

\tilde{q}(\bar{\lambda},\bar{r})\leq\tilde{q}(\lambda^{k},r_{k})+\langle h(x^{k+1}),\bar{\lambda}-\lambda^{k}\rangle+t_{k+1}(\bar{r}-r_{k}).

Therefore,

r_{k}\leq\frac{\tilde{q}(\lambda^{k},r_{k})-\tilde{q}(\bar{\lambda},\bar{r})}{t_{k+1}}+\frac{\langle h(x^{k+1}),\bar{\lambda}-\lambda^{k}\rangle}{t_{k+1}}+\bar{r}\leq\|\bar{\lambda}-\lambda^{k}\|+\bar{r},

where we use the fact that $(\bar{\lambda},\bar{r})$ is a dual solution, the Cauchy–Schwarz inequality, and that $t_{k+1}=\|h(x^{k+1})\|$. Consequently, if $\{\lambda^{k}\}$ is bounded, we conclude that $\{r_{k}\}$ must also be bounded.

The next lemma provides a sufficient condition for the boundedness of the sequence of penalty parameters.

Lemma 3

Under the assumptions of Lemma 2, if the set of optimal dual solutions is nonempty, then the sequence $\{r_{k}\}$ is bounded.

Proof

Let $(\bar{\lambda},\bar{r})$ be a dual solution. To arrive at a contradiction, assume that $\{r_{k}\}$ is unbounded. Since $\{r_{k}\}$ is increasing, there exists $k_{0}$ such that $2\bar{r}<r_{k}$ for all $k\geq k_{0}$. For $k\geq k_{0}$, using the concavity of $\tilde{q}$, we have:

\begin{aligned}\|\bar{\lambda}-\lambda^{k+1}\|^{2}&=\left\|\bar{\lambda}-\left(\lambda^{k}+\frac{r_{k}}{t_{k+1}}h(x^{k+1})\right)\right\|^{2}\\ &=\|\bar{\lambda}-\lambda^{k}\|^{2}+\frac{r_{k}^{2}}{t_{k+1}^{2}}\|h(x^{k+1})\|^{2}-2\frac{r_{k}}{t_{k+1}}\langle\bar{\lambda}-\lambda^{k},h(x^{k+1})\rangle\\ &\leq\|\bar{\lambda}-\lambda^{k}\|^{2}+r_{k}^{2}+2\frac{r_{k}}{t_{k+1}}\left(\tilde{q}(\lambda^{k},r_{k})-\tilde{q}(\bar{\lambda},\bar{r})+t_{k+1}(\bar{r}-r_{k})\right)\\ &\leq\|\bar{\lambda}-\lambda^{k}\|^{2}+r_{k}(2\bar{r}-r_{k})\\ &\leq\|\bar{\lambda}-\lambda^{k}\|^{2},\end{aligned}

where we used the fact that $\tilde{q}(\lambda^{k},r_{k})\leq\tilde{q}(\bar{\lambda},\bar{r})$. Thus, $\{\lambda^{k}\}$ is bounded, and by Lemma 2, $\{r_{k}\}$ must also be bounded, leading to a contradiction.

The following theorem guarantees the finite termination of Algorithm 1, provided that dual solutions exist.

Theorem 2.2

Suppose that $A(\lambda,r)\neq\emptyset$ for every $(\lambda,r)\in\mathbbm{R}^{m}\times(0,\infty)$ and that the set of dual solutions is nonempty. Then, there exists $\bar{k}>0$ such that Algorithm 1 stops at the $\bar{k}$-th iteration. In particular, $x^{\bar{k}+1}$ is a primal solution, and $(\lambda^{\bar{k}},r_{\bar{k}})$ is a dual solution.

Proof

To derive a contradiction, suppose that Algorithm 1 does not have finite termination. Then, for all $k$, we obtain

r_{k}-r_{0}=\sum_{j=0}^{k-1}(r_{j+1}-r_{j})=\sum_{j=0}^{k-1}r_{j}\geq\sum_{j=0}^{k-1}r_{0}=kr_{0},

which contradicts the boundedness of $\{r_{k}\}$ established by Lemma 3. Therefore, there exists $\bar{k}$ such that $t_{\bar{k}+1}=0$, and, by Theorem 2.1(a), $x^{\bar{k}+1}$ is a primal solution and $(\lambda^{\bar{k}},r_{\bar{k}})$ is a dual solution.

It is important to emphasize that in the case of no duality gap, i.e., when $\tilde{q}(\lambda,r)=f(x)$ for feasible dual and primal points $(\lambda,r)$ and $x$, respectively, the vector $\lambda$ is considered a Lagrange multiplier in an extended sense. The structure of our augmented dual function enables us to show the existence of a Lagrange multiplier that satisfies (2).

Proposition 1

Suppose that $A(\lambda,r)\neq\emptyset$ for every $(\lambda,r)\in\mathbbm{R}^{m}\times(0,\infty)$. If Algorithm 1 terminates at the $k$-th iteration, then $x^{k+1}$ is a stationary point of (1).

Proof

From Theorem 2.1(a), we know that $h(x^{k+1})=0$ and $\tilde{q}(\lambda^{k},r_{k})=f(x^{k+1})$. Therefore,

\bar{L}_{1}(x^{k+1};\lambda^{k},r_{k})=f(x^{k+1})=\tilde{q}(\lambda^{k},r_{k})\leq\tilde{L}_{\lambda^{k},r_{k}}(x,t),

for all $(x,t)$. Taking the minimum over $t\geq 0$, we obtain $\bar{L}_{1}(x^{k+1};\lambda^{k},r_{k})\leq\bar{L}_{1}(x;\lambda^{k},r_{k})$. Hence, $x^{k+1}$ is a minimizer of $\bar{L}_{1}$ and thus

0\in\partial\bar{L}_{1}(x^{k+1};\lambda^{k},r_{k})=\left\{\nabla f(x^{k+1})+\nabla h(x^{k+1})(\lambda^{k}+r_{k}\xi)\ \middle|\ \|\xi\|\leq 1\right\}.

Therefore, $0=\nabla f(x^{k+1})+\nabla h(x^{k+1})(\lambda^{k}+r_{k}\xi^{k})$ for some $\xi^{k}$ with $\|\xi^{k}\|\leq 1$. In summary, $x^{k+1}$ is a stationary point with associated Lagrange multiplier $\lambda^{k}+r_{k}\xi^{k}$.

Note that Step 1(a) of Algorithm 1 requires tackling two issues:

  • dealing with a nondifferentiable objective function when $t=0$,

  • finding an exact global minimizer.

Regarding the nondifferentiability, it is possible that all solutions are of the form $(x,0)$, where $\tilde{L}_{\lambda,r}$ is not continuous and hence not differentiable. Additionally, since most optimization solvers assume some degree of smoothness in the objective function and constraints, numerical issues may arise.

Furthermore, global optimization algorithms tend to be slow or unreliable for large- or medium-scale problems birgin2014practical . Moreover, exact solutions are unattainable due to the limitations of finite arithmetic in computers.

In preliminary implementations of Algorithm 1, we observed that Step 1(a) could not finish satisfactorily when $t_{k}$ was close to zero and the penalty parameter $r_{k}$ became excessively large.

In the next section we propose a strategy to address the nondifferentiability at t=0t=0. We also relax the requirement of finding an exact solution to the subproblem, allowing for an inexact stationary point.

3 Inexact Methods

In this section, we present a primal approach to the previous algorithm. The main advantage of this approach is that it allows us to replace exact minimizers with inexact stationary points. It is well known that for nonsmooth functions, such as $x\mapsto|x|$, there may be no points in a neighborhood of the minimizer where the derivative of the function is close to zero. This behavior can also be observed in $\tilde{L}_{\lambda,r}$ when applied to simple problems, as illustrated in the next example.

Example 1

Consider the following problem:

\begin{array}{cl}\textrm{minimize} & \frac{1}{2}x^{2}\\ \textrm{subject to} & x=0.\end{array} \qquad (5)

The solution is $\bar{x}=0$ with associated Lagrange multiplier $\bar{\lambda}=0$. For this problem, for each pair $(\lambda,r)$, we have

\tilde{L}_{\lambda,r}(x,t)=\begin{cases}\frac{1}{2}x^{2}+\lambda x+\frac{r}{2t}x^{2}+\frac{r}{2}t, & t>0,\\ 0, & x=t=0,\\ \infty, & \text{otherwise}.\end{cases}

Let $r\geq|\lambda|+c$ for some $c>0$, and suppose there exists $(x,t)$ with $t>0$ such that $\|\nabla\tilde{L}_{\lambda,r}(x,t)\|\leq\varepsilon$. Then, there exist $p$ and $q$ such that $p^{2}+q^{2}\leq 1$ and

\varepsilon q=\frac{\partial\tilde{L}_{\lambda,r}}{\partial x}(x,t)=x+\lambda+\frac{r}{t}x, \qquad (6)
\varepsilon p=\frac{\partial\tilde{L}_{\lambda,r}}{\partial t}(x,t)=\frac{r}{2}\left(1-\frac{x^{2}}{t^{2}}\right). \qquad (7)

From (6), we have that $x=(\varepsilon q-\lambda)t/(t+r)$, and from (7), we get $0=(1-2\varepsilon p/r)t^{2}-x^{2}$. Combining these two equations, we obtain

0=\left(1-\frac{2\varepsilon p}{r}\right)(t+r)^{2}t^{2}-(\varepsilon q-\lambda)^{2}t^{2}=\left[\left(1-\frac{2\varepsilon p}{r}\right)(t+r)^{2}-(\varepsilon q-\lambda)^{2}\right]t^{2}.

Since $t>0$, for $\varepsilon$ small enough such that $1-2\varepsilon p/r>0$, we get

t=\frac{|\varepsilon q-\lambda|}{\sqrt{1-2\varepsilon p/r}}-r\leq\frac{\varepsilon+|\lambda|}{\sqrt{1-2\varepsilon p/r}}-r\leq\frac{\varepsilon+r-c}{\sqrt{1-2\varepsilon p/r}}-r.

This leads to a contradiction, as the rightmost term is negative for sufficiently small $\varepsilon$. Hence, there is no $(x,t)$ with $t>0$ such that $\|\nabla\tilde{L}_{\lambda,r}(x,t)\|\leq\varepsilon$ for sufficiently small $\varepsilon$.
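A quick numerical illustration of this phenomenon (a sketch assuming NumPy; the choices $\lambda=0$, $r=1$ satisfy $r\geq|\lambda|+c$ with $c=1$):

    import numpy as np

    lam, r = 0.0, 1.0  # r >= |lam| + c with c = 1

    def grad_norm(x, t):
        gx = x + lam + (r / t) * x              # (6): derivative in x
        gt = 0.5 * r * (1.0 - x**2 / t**2)      # (7): derivative in t
        return np.sqrt(gx**2 + gt**2)

    xs = np.linspace(-1.0, 1.0, 2001)
    ts = np.logspace(-8, 1, 200)                # only t > 0 is sampled
    X, T = np.meshgrid(xs, ts)
    # The minimum stays bounded away from zero, as the example predicts:
    print(grad_norm(X, T).min())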

To ensure the existence of inexact stationary points for a continuously differentiable function, we introduce a barrier function associated with the condition $t\geq 0$. For computational simplicity, we employ the inverse barrier function. The function to be minimized is defined as:

\hat{L}^{s}_{\lambda,r}(x,t)=\tilde{L}_{\lambda,r}(x,t)+\frac{r}{2t}s^{2}, \qquad (8)

where $s>0$ is the barrier parameter. This function is clearly lower semicontinuous and continuously differentiable for $t>0$. The existence of inexact stationary points is shown in the following proposition.

Proposition 2

Assume that $f$ is bounded from below and $h$ is bounded. Then, for any $s>0$ and $\tilde{\varepsilon}>0$, there exists a pair $(x(s),t(s))$ such that $\|\nabla\hat{L}^{s}_{\lambda,r}(x(s),t(s))\|\leq\tilde{\varepsilon}$.

Proof

Let $f(x)\geq\kappa_{f}$ and $\|h(x)\|\leq\kappa_{h}$ hold for all $x$. Then, the following holds:

  • If $r\geq\|\lambda\|$, we have $(r-\|\lambda\|)\|h(x)\|\geq 0$.

  • If $r<\|\lambda\|$, then $(r-\|\lambda\|)\|h(x)\|\geq(r-\|\lambda\|)\kappa_{h}$.

Thus, for all $s>0$ and $(x,t)\in\mathbbm{R}^{n}\times(0,\infty)$ we have

\hat{L}^{s}_{\lambda,r}(x,t)\geq\tilde{L}_{\lambda,r}(x,t)\geq\bar{L}_{1}(x;\lambda,r)\geq f(x)+(r-\|\lambda\|)\|h(x)\|\geq\kappa_{f}+\min\{0,(r-\|\lambda\|)\kappa_{h}\}.

Hence, $\hat{L}^{s}_{\lambda,r}$ is bounded from below. Now, take $\eta>0$ and $(\tilde{x},\tilde{t})$ such that $\hat{L}^{s}_{\lambda,r}(\tilde{x},\tilde{t})<\inf_{(x,t)}\hat{L}^{s}_{\lambda,r}(x,t)+\eta$. Then, by (rockafellar2009variational, Proposition 10.44), there exists $(x(s),t(s))$ such that $\hat{L}^{s}_{\lambda,r}(x(s),t(s))\leq\hat{L}^{s}_{\lambda,r}(\tilde{x},\tilde{t})$ and a vector $v\in\partial\hat{L}^{s}_{\lambda,r}(x(s),t(s))$ with $\|v\|\leq\tilde{\varepsilon}$. Notice that, by the definition of $\hat{L}^{s}_{\lambda,r}$, it follows that $t(s)>0$, implying that $\hat{L}^{s}_{\lambda,r}$ is continuously differentiable at $(x(s),t(s))$. Consequently, we have

\left\|\nabla\hat{L}^{s}_{\lambda,r}(x(s),t(s))\right\|=\|v\|\leq\tilde{\varepsilon}.

Remark 1

It is not hard to see that the objective and constraint functions can be redefined to meet the conditions of Proposition 2. For instance, given some $M>0$, we could define $\tilde{f}(x)=e^{f(x)}$ and $\tilde{h}_{i}(x)=\max\{-M+\tanh(h_{i}(x)+M),\min\{h_{i}(x),M+\tanh(h_{i}(x)-M)\}\}$. Consequently, $\tilde{f}$ is bounded from below, $\tilde{h}$ is bounded, and both functions are twice continuously differentiable.
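A sketch of these redefinitions in Python (M is an arbitrary cutoff level chosen for illustration):

    import numpy as np

    M = 10.0  # illustrative cutoff level; any M > 0 works

    def f_tilde(f):
        """Bounded-below surrogate of the objective: x -> exp(f(x))."""
        return lambda x: np.exp(f(x))

    def h_tilde_i(hi):
        """Bounded surrogate of one constraint component, with smooth cutoffs at +/- M."""
        def wrapped(x):
            v = hi(x)
            return np.maximum(-M + np.tanh(v + M),
                              np.minimum(v, M + np.tanh(v - M)))
        return wrapped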

Recall that $\nabla\hat{L}^{s}_{\lambda,r}(x,t)=\left(\nabla_{x}\hat{L}^{s}_{\lambda,r}(x,t),\frac{\partial\hat{L}^{s}_{\lambda,r}}{\partial t}(x,t)\right)$, where

\nabla_{x}\hat{L}^{s}_{\lambda,r}(x,t)=\nabla_{x}\tilde{L}_{\lambda,r}(x,t)=\nabla f(x)+\nabla h(x)\left(\lambda+\frac{r}{t}h(x)\right), \qquad (9)
\frac{\partial\hat{L}^{s}_{\lambda,r}}{\partial t}(x,t)=\frac{r}{2}\left(1-\frac{\|h(x)\|^{2}+s^{2}}{t^{2}}\right). \qquad (10)

These expressions will be useful in the forthcoming calculations.
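In code, these two formulas can be evaluated as follows (a sketch assuming callables grad_f, h and jac_h, where jac_h returns the $m\times n$ Jacobian of $h$, so that jac_h(x).T plays the role of $\nabla h(x)$):

    import numpy as np

    def grad_L_hat(x, t, lam, r, s, grad_f, h, jac_h):
        """Gradient of the barrier-smoothed function, following (9) and (10)."""
        hx = h(x)
        gx = grad_f(x) + jac_h(x).T @ (lam + (r / t) * hx)    # (9)
        gt = 0.5 * r * (1.0 - (hx @ hx + s**2) / t**2)        # (10)
        return gx, gt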

3.1 Fixing the smoothing parameter

According to Proposition 2, there exists $(x(s),t(s))$, an inexact stationary point of $\hat{L}^{s}_{\lambda,r}$. To compute such a point, note that, as indicated by (10), the stationary point of the function $t\mapsto\hat{L}^{s}_{\lambda,r}(x,t)$, and hence a global minimizer due to convexity, must satisfy

t=\sqrt{\|h(x)\|^{2}+s^{2}}.

We propose a coordinate descent-like strategy to find this point. In the $k$-th iteration, we define $t_{k+1}$ as the minimizer of $t\mapsto\hat{L}^{s_{k}}_{\lambda^{k},r_{k}}(x^{k},t)$, and $x^{k+1}$ as an inexact stationary point of the problem of minimizing $x\mapsto\hat{L}^{s_{k}}_{\lambda^{k},r_{k}}(x,t_{k+1})$. The existence of such an $x^{k+1}$ is guaranteed by arguments similar to those in Proposition 2. Based on this approach, we define Algorithm 2.

Step 0: Initialization
Take $\lambda_{\min}<\lambda_{\max}$, $0<\tau<1$, $\gamma>1$, and $tol>0$. Choose $x^{0}\in\mathbbm{R}^{n}$, $t_{0}>0$, $\bar{\lambda}^{0}\in[\lambda_{\min},\lambda_{\max}]^{m}$, and $r_{0}>0$. Consider a sequence of exogenous parameters $\varepsilon_{k}\searrow 0$ and a bounded sequence $\{s_{k}\}\subset(0,\infty)$.
Set $k:=0$.
Step 1: Checking stopping criterion
If $\sqrt{\|\nabla_{x}\tilde{L}_{\bar{\lambda}^{k},r_{k}}(x^{k},t_{k})\|^{2}+\|h(x^{k})\|^{2}}\leq tol$, then STOP.
Step 2: Updating primal variables
Define
t_{k+1}=\sqrt{\|h(x^{k})\|^{2}+s_{k}^{2}}.
Find $x^{k+1}\in\mathbbm{R}^{n}$ such that
\|\nabla_{x}\tilde{L}_{\bar{\lambda}^{k},r_{k}}(x^{k+1},t_{k+1})\|\leq\varepsilon_{k}.
Step 3: Updating Lagrange multipliers
Define
\lambda^{k+1}=\bar{\lambda}^{k}+r_{k}\frac{h(x^{k+1})}{t_{k+1}}.
Step 4: Updating penalty parameter
If $\|h(x^{k+1})\|\leq\tau\|h(x^{k})\|$, choose $r_{k+1}=r_{k}$. Otherwise, define $r_{k+1}=\gamma r_{k}$.
Step 5: Projecting Lagrange multipliers
Compute $\bar{\lambda}^{k+1}\in[\lambda_{\min},\lambda_{\max}]^{m}$.
Set $k:=k+1$ and go to Step 1.
Algorithm 2: Inexact sharp Lagrangian (fixed smoothing parameter)
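A compact Python sketch of Algorithm 2, not the authors' Fortran implementation: the inner solver GENCAN is replaced here by scipy.optimize.minimize with gradient tolerance $\varepsilon_{k}$, and the exogenous sequences $\varepsilon_{k}$, $s_{k}$ are simple illustrative choices:

    import numpy as np
    from scipy.optimize import minimize

    def inexact_sharp_fixed_t(f, grad_f, h, jac_h, x0, m,
                              r0=10.0, tau=0.9, gamma=10.0, tol=1e-8,
                              lam_box=1e20, max_outer=100):
        x, lam_bar, r, t = np.asarray(x0, float), np.zeros(m), r0, 1.0
        for k in range(max_outer):
            eps_k = s_k = 10.0 ** (-(k + 1))         # illustrative exogenous sequences
            hx = h(x)
            gx = grad_f(x) + jac_h(x).T @ (lam_bar + (r / t) * hx)
            if np.sqrt(gx @ gx + hx @ hx) <= tol:    # Step 1: stopping criterion
                break
            t = np.sqrt(hx @ hx + s_k**2)            # Step 2: closed-form t update
            obj = lambda z: f(z) + lam_bar @ h(z) + (r / (2 * t)) * (h(z) @ h(z))
            grd = lambda z: grad_f(z) + jac_h(z).T @ (lam_bar + (r / t) * h(z))
            x_new = minimize(obj, x, jac=grd, method="BFGS",
                             options={"gtol": eps_k}).x
            lam = lam_bar + (r / t) * h(x_new)       # Step 3: multiplier update
            if np.linalg.norm(h(x_new)) > tau * np.linalg.norm(hx):
                r *= gamma                           # Step 4: increase penalty
            lam_bar = np.clip(lam, -lam_box, lam_box)  # Step 5: projection onto the box
            x = x_new
        return x, lam_bar, r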

The next proposition shows that every limit point of the sequence generated by Algorithm 2 is either a stationary point of problem (1) or a stationary point of the problem that minimizes infeasibility.

Proposition 3

Let $\bar{x}\in\mathbbm{R}^{n}$ be a limit point of the sequence $\{x^{k}\}$ generated by Algorithm 2. Then:

  (a) If $\{r_{k}\}$ is bounded, then $\bar{x}$ is a stationary point of problem (1).

  (b) If $\{r_{k}\}$ is unbounded and $\{h(x^{k})\}$ is bounded, then $\bar{x}$ is a stationary point of the unconstrained problem of minimizing $\|h(x)\|^{2}$.

Proof

Let $\bar{x}$ be a limit point of the sequence $\{x^{k}\}$ generated by Algorithm 2. Then there exists an infinite subset $\mathcal{K}\subset\mathbbm{N}$ such that $\lim_{k\to\infty,\,k\in\mathcal{K}}x^{k+1}=\bar{x}$. With this in mind, we consider two cases.

(a) If $\{r_{k}\}$ is bounded, then by Step 4 of Algorithm 2, there exists $k_{0}\in\mathbbm{N}$ such that $r_{k}=r_{k_{0}}$ and $\|h(x^{k+1})\|\leq\tau\|h(x^{k})\|$ for all $k\geq k_{0}$. Therefore, for all $k\geq k_{0}$,

\frac{\|h(x^{k+1})\|}{t_{k+1}}=\frac{\|h(x^{k+1})\|}{\sqrt{\|h(x^{k})\|^{2}+s_{k}^{2}}}\leq\frac{\tau\|h(x^{k})\|}{\sqrt{\|h(x^{k})\|^{2}+s_{k}^{2}}}\leq\tau.

Taking the limit over a suitable subsequence, from the inequality in Step 2 of Algorithm 2, we have

\nabla f(\bar{x})+\nabla h(\bar{x})(\bar{\lambda}+r_{k_{0}}\xi)=0,

for some $\xi$, which is a limit point of $\{h(x^{k+1})/t_{k+1}\}_{k\in\mathcal{K}}$, and for some $\bar{\lambda}$, a limit point of $\{\bar{\lambda}^{k}\}_{k\in\mathcal{K}}$.

Furthermore, by Step 4 of Algorithm 2, we have that $h(\bar{x})=0$, indicating that $\bar{x}$ is a feasible point for problem (1). Therefore, $\bar{x}$ is a stationary point of this problem.

(b) If $\{r_{k}\}$ is unbounded and $\{h(x^{k})\}$ is bounded, then $1/r_{k}\to 0$ and $t_{k+1}$ is bounded (since $\{s_{k}\}$ is bounded).

On the other hand, if we multiply both sides of the inequality in Step 2 of Algorithm 2 by $t_{k+1}/r_{k}$, we obtain

\left\|\frac{t_{k+1}}{r_{k}}\nabla f(x^{k+1})+\nabla h(x^{k+1})\left[\frac{t_{k+1}}{r_{k}}\bar{\lambda}^{k}+h(x^{k+1})\right]\right\|\leq\frac{t_{k+1}}{r_{k}}\varepsilon_{k}.

Taking the limit as $k\to\infty$ with $k\in\mathcal{K}$, we get

\nabla h(\bar{x})h(\bar{x})=0.

Consequently, $\bar{x}$ is a stationary point of the problem of minimizing $\|h(x)\|^{2}$.

The boundedness condition on $\{h(x^{k})\}$ in item (b) can be ensured by using a cutoff function, as described in Remark 1.

3.2 Varying the smoothing parameter

In the previous section, we decoupled the computation of $(x^{k+1},t_{k+1})$. As a result, this pair is not necessarily an inexact stationary point of $\hat{L}^{s}_{\bar{\lambda}^{k},r_{k}}$. In this section, however, we will treat this point as an inexact stationary point of $\hat{L}^{s}_{\bar{\lambda}^{k},r_{k}}$. By doing so, we will obtain results similar to those of the previous section.

Step 0: Initialization
Take $\lambda_{\min}<\lambda_{\max}$, $0<\tau<1$, $\gamma>1$, and $tol>0$. Choose $x^{0}\in\mathbbm{R}^{n}$, $t_{0}>0$, $\bar{\lambda}^{0}\in[\lambda_{\min},\lambda_{\max}]^{m}$, and $r_{0}>0$. Consider a sequence of exogenous parameters $\varepsilon_{k}\searrow 0$ and a bounded sequence $\{s_{k}\}\subset(0,\infty)$.
Set $k:=0$.
Step 1: Checking stopping criterion
If $\sqrt{\|\nabla_{x}\tilde{L}_{\bar{\lambda}^{k},r_{k}}(x^{k},t_{k})\|^{2}+\|h(x^{k})\|^{2}}\leq tol$, then STOP.
Step 2: Updating primal variables
Find $(x^{k+1},t_{k+1})\in\mathbbm{R}^{n}\times\mathbbm{R}_{++}$ such that
\|\nabla\hat{L}^{s_{k}}_{\bar{\lambda}^{k},r_{k}}(x^{k+1},t_{k+1})\|\leq\varepsilon_{k}.
Step 3: Updating Lagrange multipliers
Define
\lambda^{k+1}=\bar{\lambda}^{k}+r_{k}\frac{h(x^{k+1})}{t_{k+1}}.
Step 4: Updating penalty parameter
If $\|h(x^{k+1})\|\leq\tau\|h(x^{k})\|$, choose $r_{k+1}=r_{k}$. Otherwise, define $r_{k+1}=\gamma r_{k}$.
Step 5: Projecting Lagrange multipliers
Compute $\bar{\lambda}^{k+1}\in[\lambda_{\min},\lambda_{\max}]^{m}$.
Set $k:=k+1$ and go to Step 1.
Algorithm 3: Inexact sharp Lagrangian (varying the smoothing parameter)
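Step 2 of Algorithm 3 asks for a joint inexact stationary point in $(x,t)$. One possible realization, again only a sketch: minimize $\hat{L}^{s_{k}}_{\bar{\lambda}^{k},r_{k}}$ over $(x,t)$ with a bound keeping $t>0$, supplying the gradient (9)–(10):

    import numpy as np
    from scipy.optimize import minimize

    def step2_joint(x, t, lam_bar, r, s, eps_k, f, grad_f, h, jac_h):
        """Inexact joint minimization in (x, t) for Step 2 of Algorithm 3."""
        n = len(x)
        def obj(z):
            x_, t_ = z[:n], z[n]
            hx = h(x_)
            return (f(x_) + lam_bar @ hx + (r / (2 * t_)) * (hx @ hx)
                    + 0.5 * r * t_ + (r / (2 * t_)) * s**2)
        def grd(z):
            x_, t_ = z[:n], z[n]
            hx = h(x_)
            gx = grad_f(x_) + jac_h(x_).T @ (lam_bar + (r / t_) * hx)  # (9)
            gt = 0.5 * r * (1.0 - (hx @ hx + s**2) / t_**2)            # (10)
            return np.append(gx, gt)
        bounds = [(None, None)] * n + [(1e-12, None)]  # keep t strictly positive
        res = minimize(obj, np.append(x, t), jac=grd, method="L-BFGS-B",
                       bounds=bounds, options={"gtol": eps_k})
        return res.x[:n], res.x[n]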
Remark 2

From Step 2 of Algorithm 3 it can be deduced that the sequence $\{\|h(x^{k})\|/t_{k}\}$ is bounded, and also that $\{t_{k}\}$ is bounded if the sequence $\{h(x^{k})\}$ is bounded. Indeed, from (10), we have

\left|\frac{\partial\hat{L}^{s_{k}}_{\bar{\lambda}^{k},r_{k}}}{\partial t}(x^{k+1},t_{k+1})\right|\leq\varepsilon_{k},

which holds if and only if

1-\frac{2\varepsilon_{k}}{r_{k}}\leq\frac{\|h(x^{k+1})\|^{2}+s_{k}^{2}}{t_{k+1}^{2}}\leq 1+\frac{2\varepsilon_{k}}{r_{k}}.

Thus,

\frac{\|h(x^{k+1})\|}{t_{k+1}}\leq\sqrt{1+\frac{2\varepsilon_{k}}{r_{k}}},\qquad t_{k+1}\leq\sqrt{\frac{\|h(x^{k+1})\|^{2}+s_{k}^{2}}{1-\frac{2\varepsilon_{k}}{r_{k}}}}.

The following proposition shows that every limit point of the sequence generated by Algorithm 3 is either a stationary point of problem (1) or a stationary point of the problem that minimizes infeasibility.

Proposition 4

Let $\bar{x}\in\mathbbm{R}^{n}$ be a limit point of the sequence $\{x^{k}\}$ generated by Algorithm 3. Then:

  (a) If $\{r_{k}\}$ is bounded, then $\bar{x}$ is a stationary point of problem (1).

  (b) If $\{r_{k}\}$ is unbounded, then $\bar{x}$ is a stationary point of the unconstrained problem of minimizing $\|h(x)\|^{2}$.

Proof

Let $\bar{x}$ be a limit point of the sequence $\{x^{k}\}$ generated by Algorithm 3. Then there exists an infinite subset $\mathcal{K}\subset\mathbbm{N}$ such that $\lim_{k\to\infty,\,k\in\mathcal{K}}x^{k+1}=\bar{x}$. With this in mind, we consider two cases.

(a) Suppose $\{r_{k}\}$ is bounded. Then, by Step 4 of Algorithm 3, there exists $k_{0}\in\mathbbm{N}$ such that $r_{k}=r_{k_{0}}$ and $\|h(x^{k+1})\|\leq\tau\|h(x^{k})\|$ for all $k\geq k_{0}$. Consequently, $h(\bar{x})=0$, which means that $\bar{x}$ is a feasible point for problem (1).

On the other hand, by Remark 2, the sequence $\{h(x^{k+1})/t_{k+1}\}_{k\in\mathcal{K}}$ is bounded. Taking the limit over a suitable subsequence, from the inequality in Step 2 of Algorithm 3, we obtain

\nabla f(\bar{x})+\nabla h(\bar{x})(\bar{\lambda}+r_{k_{0}}\xi)=0,

for some $\xi$, which is a limit point of $\{h(x^{k+1})/t_{k+1}\}_{k\in\mathcal{K}}$, and for some $\bar{\lambda}$, a limit point of $\{\bar{\lambda}^{k}\}_{k\in\mathcal{K}}$.

Therefore, $\bar{x}$ is a stationary point of the problem of minimizing $f(x)$ subject to $h(x)=0$.

(b) Suppose $\{r_{k}\}$ is unbounded. Since $\{h(x^{k+1})\}_{k\in\mathcal{K}}$ is bounded (because $\{x^{k+1}\}_{k\in\mathcal{K}}$ is convergent and $h$ is a continuous function), it follows from Remark 2 that $\{t_{k+1}\}_{k\in\mathcal{K}}$ is also bounded.

On the other hand, if we multiply both sides of the inequality in Step 2 of Algorithm 3 by $t_{k+1}/r_{k}$, we obtain

\left\|\frac{t_{k+1}}{r_{k}}\nabla f(x^{k+1})+\nabla h(x^{k+1})\left[\frac{t_{k+1}}{r_{k}}\bar{\lambda}^{k}+h(x^{k+1})\right]\right\|\leq\frac{t_{k+1}}{r_{k}}\varepsilon_{k}.

Taking the limit as $k\to\infty$ with $k\in\mathcal{K}$, we get

\nabla h(\bar{x})h(\bar{x})=0.

Consequently, $\bar{x}$ is a stationary point of the problem of minimizing $\|h(x)\|^{2}$.

4 Boundedness of the penalty parameters

We have observed that, in both Algorithm 2 and Algorithm 3, if the sequence of penalty parameters is bounded, then the limit points of the sequence generated by these algorithms are stationary points of the primal problem. This raises the question of when it is possible to guarantee such a bound, that is, under what conditions the sequence of penalty parameters remains bounded. This issue is of interest not only from a theoretical point of view but also from a computational one, since when the penalty parameters are excessively large, the subproblems tend to be ill-conditioned, making their solution more difficult. More generally, the study of conditions that ensure the boundedness of the penalty parameters is fundamental in approaches based on augmented Lagrangians (see andreani2007augmented ; fernandez2012local ; birgin2014practical ).

Henceforth, we will assume that the following hypotheses hold.

  (H1) The sequences $\{x^{k}\}$ and $\{\lambda^{k}\}$ generated by the application of Algorithm 2 or Algorithm 3 satisfy $\lim_{k\to\infty}x^{k}=\bar{x}$ and $\bar{\lambda}^{k+1}=P_{[\lambda_{\min},\lambda_{\max}]^{m}}(\lambda^{k+1})$, the orthogonal projection of $\lambda^{k+1}$ onto $[\lambda_{\min},\lambda_{\max}]^{m}$.

  (H2) The point $\bar{x}$ is feasible (that is, $h(\bar{x})=0$).

  (H3) The LICQ condition is satisfied at $\bar{x}$, meaning that the gradients $\nabla h_{i}(\bar{x})$, $i=1,\ldots,m$, are linearly independent.

  (H4) The second-order sufficient optimality condition is satisfied at $\bar{x}$, with Lagrange multiplier $\bar{\lambda}\in\mathbbm{R}^{m}$.

  (H5) $\bar{\lambda}\in[\lambda_{\min},\lambda_{\max}]^{m}$.

Remark 3

Note that, due to Hypothesis (H3), the Lagrange multiplier mentioned in Hypothesis (H4) is unique.

Proposition 5

Suppose that Hypotheses (H1), (H2), (H3) and (H5) hold. Then $\lim_{k\to\infty}\bar{\lambda}^{k}=\bar{\lambda}$.

Proof

First, let us show that the sequence $\{\lambda^{k}\}$ is bounded. Suppose that it is not; then there exists a subsequence $\{\lambda^{k}\}_{k\in\mathcal{K}}$, where $\mathcal{K}\subset\mathbbm{N}$ is infinite, such that $\lim_{k\to\infty,\,k\in\mathcal{K}}\|\lambda^{k}\|=\infty$. Furthermore, in both Algorithm 2 and Algorithm 3, from Step 2 and the definition of $\lambda^{k+1}$ it follows that, for each $k\in\mathbbm{N}_{0}$,

\left\|\nabla f(x^{k+1})+\nabla h(x^{k+1})\lambda^{k+1}\right\|\leq\varepsilon_{k}.

Or, equivalently,

\left\|\nabla f(x^{k})+\nabla h(x^{k})\lambda^{k}\right\|\leq\varepsilon_{k-1},\quad\text{for all }k\in\mathbbm{N}. \qquad (11)

Consequently, for each $k\in\mathcal{K}$, it follows that

\left\|\frac{\nabla f(x^{k})}{\|\lambda^{k}\|}+\nabla h(x^{k})\frac{\lambda^{k}}{\|\lambda^{k}\|}\right\|\leq\frac{\varepsilon_{k-1}}{\|\lambda^{k}\|}.

Thus, by taking the limit over $k$ in $\mathcal{K}$ and using Hypothesis (H1), we obtain

\nabla h(\bar{x})\tilde{\lambda}=0,

where $\tilde{\lambda}$ is a limit point of $\{\lambda^{k}/\|\lambda^{k}\|\}_{k\in\mathcal{K}}$. Since $\|\tilde{\lambda}\|=1$, this contradicts Hypothesis (H3). Therefore, $\{\lambda^{k}\}$ must be bounded.

Since $\{\lambda^{k}\}$ is a bounded sequence, to show that $\lim_{k\to\infty}\lambda^{k}=\bar{\lambda}$ it suffices to prove that $\bar{\lambda}$ is its unique limit point. To this end, let $\hat{\lambda}$ be a limit point of $\{\lambda^{k}\}$; then there exists an infinite set $\mathcal{K}\subset\mathbbm{N}$ such that $\lim_{k\to\infty,\,k\in\mathcal{K}}\lambda^{k}=\hat{\lambda}$. Taking the limit of (11) over $k$ in $\mathcal{K}$, we obtain

\nabla f(\bar{x})+\nabla h(\bar{x})\hat{\lambda}=0.

Therefore, by Remark 3, we have $\hat{\lambda}=\bar{\lambda}$. Thus, $\bar{\lambda}$ is the unique limit point of the sequence $\{\lambda^{k}\}$.

Finally, by the continuity of the projection, it follows that $\lim_{k\to\infty}\bar{\lambda}^{k}=\bar{\lambda}$.

Theorem 4.1

Let us consider either Algorithm 2 or Algorithm 3, and assume that Hypotheses (H1) through (H5) are satisfied. Additionally, suppose that $s_{k}=O(\|h(x^{k})\|)$ and $\varepsilon_{k}=o(\|h(x^{k})\|)$. Then, the sequence of penalty parameters $\{r_{k}\}$ is bounded.

Proof

First, note that $\frac{r_{k}}{t_{k+1}}\to\infty$. Indeed, from Hypotheses (H1) and (H2), and the continuity of $h$, we have $\lim_{k\to\infty}h(x^{k})=0$. Since $s_{k}=O(\|h(x^{k})\|)$, it follows that $t_{k+1}\to 0$. Given that $\{r_{k}\}$ is non-decreasing, the result follows.

On the other hand, recall that $\tilde{L}_{\lambda,r}(x,t)=\bar{L}_{2}(x;\lambda,\frac{r}{t})+\frac{r}{2}t$ when $t>0$. Considering this, along with the previous paragraph, and using Hypotheses (H1), (H3) and (H4), we obtain, according to (fernandez2012local, Prop. 4.2), that there exists $M>0$ such that, for sufficiently large $k$,

\|x^{k+1}-\bar{x}\|+\|\lambda^{k+1}-\bar{\lambda}\|\leq M\left(\varepsilon_{k}+\frac{t_{k+1}}{r_{k}}\|\bar{\lambda}^{k}-\bar{\lambda}\|\right). \qquad (12)

Furthermore, since $h(\bar{x})=0$, and due to the continuity of the first derivatives of $h$, there exists $L>0$ such that, for all $k\in\mathbbm{N}_{0}$,

\|h(x^{k+1})\|\leq L\|x^{k+1}-\bar{x}\|.

Therefore, by (12), it follows that, for sufficiently large $k$,

\|h(x^{k+1})\|\leq LM\left(\varepsilon_{k}+\frac{t_{k+1}}{r_{k}}\|\bar{\lambda}^{k}-\bar{\lambda}\|\right). \qquad (13)

Now, it is necessary to distinguish between two cases, depending on the algorithm used to obtain $t_{k+1}$:

  • In the case where $t_{k+1}$ is obtained from the application of Algorithm 2, according to Step 2 we have $t_{k+1}=\sqrt{\|h(x^{k})\|^{2}+s_{k}^{2}}$. Since $s_{k}=O(\|h(x^{k})\|)$, it follows that $t_{k+1}=O(\|h(x^{k})\|)$. Consequently, given that $\lim_{k\to\infty}\bar{\lambda}^{k}=\bar{\lambda}$, we obtain that $\frac{t_{k+1}}{r_{k}}\|\bar{\lambda}^{k}-\bar{\lambda}\|=o(\|h(x^{k})\|)$. Therefore, since $\varepsilon_{k}=o(\|h(x^{k})\|)$, from (13) it follows that $\|h(x^{k+1})\|=o(\|h(x^{k})\|)$.

  • In the case where $t_{k+1}$ arises from the application of Algorithm 3, by Remark 2 we have

    t_{k+1}\leq\sqrt{\frac{\|h(x^{k+1})\|^{2}+s_{k}^{2}}{1-\frac{2\varepsilon_{k}}{r_{k}}}}.

    Then, since $1-\frac{2\varepsilon_{k}}{r_{k}}\to 1$ and $s_{k}=O(\|h(x^{k})\|)$, it follows that $t_{k+1}=O(\|h(x^{k+1})\|)+O(\|h(x^{k})\|)$. Therefore, applying reasoning similar to the previous case, we obtain $\|h(x^{k+1})\|=o(\|h(x^{k+1})\|)+o(\|h(x^{k})\|)$, which, as is easily verified, implies $\|h(x^{k+1})\|=o(\|h(x^{k})\|)$.

In summary, we have shown that, regardless of the algorithm used to obtain $t_{k+1}$, it holds that $\|h(x^{k+1})\|=o(\|h(x^{k})\|)$. Therefore, $\|h(x^{k+1})\|\leq\tau\|h(x^{k})\|$ for sufficiently large $k$. Consequently, the sequence $\{r_{k}\}$ must be bounded.

5 Numerical results

In this section, we analyze a series of problems from the test collection hock1980test , selecting only those with equality constraints: problems 6–9, 26–28, 39–40, 42, 47–52, 56, 61, 77–79. Additionally, problems formulated by the authors (501–514) have been included, which feature either one or two constraints, a domain dimension of at most 3, and involve simple functions (typically linear or quadratic). Some of these problems have finite feasible sets, while others have infinite but easily described ones (such as spheres or circles). Problem 508 uses the well-known Rosenbrock function as the objective function smith2020 .

For the practical implementation of ALGENCAN, we used the default parameters. For both Algorithm 2 and Algorithm 3, the parameters were set as follows: $\lambda_{\min}=-10^{20}$, $\lambda_{\max}=10^{20}$, $\tau=0.9$, $\gamma=10$, $tol=10^{-8}$, $x^{0}$ as specified for each problem, $t_{0}=1$, $\bar{\lambda}^{0}=0$, $r_{0}=10$.
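To illustrate how these parameters interact, the sketch below outlines a safeguarded outer loop in Python (the actual experiments used the Fortran 95 codes described next). The inner solver `solve_subproblem` is a hypothetical placeholder for GENCAN, the $t$-update follows Step 2 of Algorithm 2 with the illustrative choice $s_{k}=\|h(x^{k})\|$, and the penalty and multiplier updates shown are the standard safeguarded ones; none of this should be read as a verbatim transcription of Algorithms 2 and 3.

```python
import numpy as np

def outer_loop(x0, grad_f, h, jac_h, solve_subproblem,
               r0=10.0, t0=1.0, tau=0.9, gamma=10.0, tol=1e-8,
               lam_min=-1e20, lam_max=1e20, max_it=100):
    """Sketch of a safeguarded augmented Lagrangian outer loop.

    `solve_subproblem(x, lam_bar, r, t)` stands in for the inner solver
    (GENCAN in the paper's experiments) and is assumed to return an
    approximate stationary point together with a multiplier estimate.
    """
    x = np.asarray(x0, dtype=float)
    r, t = r0, t0
    lam_bar = np.zeros_like(np.atleast_1d(h(x)))    # lambda_bar^0 = 0
    h_prev = np.linalg.norm(h(x))
    for _ in range(max_it):
        x, lam = solve_subproblem(x, lam_bar, r, t)
        # Stopping test: Euclidean norm of the KKT residual of system (2).
        kkt = np.linalg.norm(np.concatenate(
            (grad_f(x) + jac_h(x).T @ lam, np.atleast_1d(h(x)))))
        if kkt < tol:
            return x, lam, 0                        # Inform = 0: solved
        # Safeguard the multiplier estimate into [lam_min, lam_max].
        lam_bar = np.clip(lam, lam_min, lam_max)
        h_now = np.linalg.norm(h(x))
        if h_now > tau * h_prev:                    # insufficient progress
            r *= gamma                              # increase the penalty
        # t-update of Algorithm 2, here with s_k = ||h(x^k)||.
        t = np.sqrt(h_now**2 + h_now**2)
        h_prev = h_now
    return x, lam_bar, 1                            # Inform = 1: max iterations
```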

All experiments were conducted on a Linux PC with an Intel Core(TM) i7-10510U CPU at 1.80GHz and 32 GB of RAM. The algorithms were implemented in GNU Fortran 95 (compiler version 4:11.2.0).

Step 2 of Algorithms 2 and 3 was solved by GENCAN, which is part of the well–known ALGENCAN software birgin2014practical ; birgin2000nonmonotone ; andreani2008augmented (version 3.1.1). GENCAN birgin2002large ; andreani2010second ; andretta2005practical ; birgin2001box ; birgin2008structured is an active–set method with projected gradients, designed for bound–constrained minimization.
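GENCAN itself is a sophisticated Fortran code; purely to illustrate the class of methods it belongs to, a projected gradient step for a box-constrained subproblem can be sketched as follows (a schematic only, not GENCAN's actual implementation):

```python
import numpy as np

def projected_gradient_step(x, grad, lower, upper, alpha):
    """One gradient step followed by projection onto the box [lower, upper].

    Schematic only: GENCAN combines such projected steps with active-set
    identification and spectral step lengths inside the faces of the box.
    """
    return np.clip(x - alpha * grad, lower, upper)
```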

The test problems were solved with ALGENCAN, Algorithm 2, and Algorithm 3. The exit flags of ALGENCAN are stored in the variable Inform, and can take the following values:

  • 0: Solution was found (according to the stopping criteria of ALGENCAN).

  • 1: The penalty parameter is too large. The problem may be infeasible or badly scaled. Further analysis is required.

  • 2: Maximum number of iterations reached. Feasibility–complementarity and optimality tolerances could not be achieved. Whether the final iterate is a solution or not requires further analysis.

  • 3: A stationary point of the infeasibility measure was apparently found, so the problem is probably infeasible. Whether the final iterate is a solution or not requires further analysis.

The exit flags for the other two algorithms are also stored in the variable Inform, with the following values:

  • 0: Solution was found (i.e., Step 1 was successfully completed).

  • 1: Maximum number of iterations reached.

Tables 1, 2, and 3 show the results using ALGENCAN, Algorithm 2, and Algorithm 3, respectively. Each column of the tables represents, from left to right:

  • Prob.: problem number to be solved.

  • It.: number of external iterations.

  • KKT norm: KKT norm of the last calculated point.

  • Int. It.: the sum of all internal iterations during the algorithm’s execution.

  • Inform: exit flag, as explained above.

  • $x$: last point determined by the algorithm.

  • $f(x)$: value of the objective function at the last approximation.

  • $\lambda$: last Lagrange multiplier obtained.

  • Infeas.: norm of infeasibility at the final point.

It is important to highlight that the stopping criteria for ALGENCAN differ from those of Algorithms 2 and 3. Specifically, the latter algorithms stop when the KKT norm (recalled below) falls below a predefined tolerance, whereas ALGENCAN employs a different criterion. This discrepancy explains why, for some problems, ALGENCAN reports exit flag 0 even though the final point does not meet the KKT norm tolerance required by Algorithms 2 and 3.
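For reference, and assuming (consistently with the result tables) that the KKT norm is the Euclidean norm of the residual of system (2), the stopping test of Algorithms 2 and 3 reads

$\left\|\left(\nabla_{x}L\left(x^{k},\lambda^{k}\right),\,h\left(x^{k}\right)\right)\right\|<tol=10^{-8},$

while ALGENCAN tests feasibility–complementarity and optimality separately against its own tolerances.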

Considering these stopping criteria, we observe that ALGENCAN was unable to solve problems 56, 79, 511 and 512, whereas Algorithms 2 and 3 could not solve problems 26, 56, 79 and 511.

Some preliminary conclusions can be drawn from inspecting the tables:

  • Each of the three algorithms solves all problems except four. ALGENCAN could not solve problems 56, 79, 511, and 512, while Algorithms 2 and 3 could not solve problems 26, 56, 79, and 511. Hence ALGENCAN alone succeeded on problem 26, and Algorithms 2 and 3 alone succeeded on problem 512.

  • In terms of internal iterations, Algorithm 2 outperforms Algorithm 3.

  • In terms of internal iterations, Algorithm 2 is a strong competitor to ALGENCAN; in fact, it performs fewer internal iterations on almost all problems.

  • All three algorithms obtain reasonably good primal approximations, but they may fail to recover the Lagrange multipliers accurately.

The numerical experiments shown in the result tables suggest that our algorithms are competitive with the well–known ALGENCAN software. This encourages us to investigate this novel strategy further, with the aim of solving a wider range of test problems.

Table 1: Problems solved by ALGENCAN.
Prob. It. KKT norm Int. it. Inform x f(x) λ Infeas.
6 6 3.19E-10 23 0 1.0000E+00 7.6980E-20 3.1164E-11 5.2225E-12
1.0000E+00
7 15 4.00E-10 49 0 -1.8908E-13 -1.7321E+00 2.8868E-01 4.0008E-10
1.7321E+00
8 3 7.94E-14 9 0 4.6016E+00 -1.0000E+00 -1.6095E-15 7.3949E-14
1.9558E+00 -3.0103E-15
9 4 3.44E-10 8 0 -7.5000E+01 -5.0000E-01 -3.2725E-02 2.6745E-10
-1.0000E+02
26 5 5.96E-09 37 0 1.0004E+00 3.6300E-13 -2.4609E-10 1.4997E-10
1.0004E+00
9.9961E-01
27 8 8.43E-09 55 0 -1.0000E+00 4.0000E-02 4.0000E-02 3.8751E-11
1.0000E+00
-1.1131E-08
28 7 2.23E-09 33 0 5.0000E-01 1.5406E-18 -1.2872E-09 1.9018E-11
-5.0000E-01
5.0000E-01
39 20 1.97E-10 125 0 1.0000E+00 -1.0000E+00 -1.0000E+00 1.7795E-10
1.0000E+00 -1.0000E+00
1.0262E-14
-1.2755E-14
40 8 1.33E-09 54 0 7.9370E-01 -2.5000E-01 5.0000E-01 5.1885E-10
7.0711E-01 -4.7194E-01
5.2973E-01 3.5355E-01
8.4090E-01
42 7 4.05E-09 33 0 2.0000E+00 1.3858E+01 -2.0000E+00 8.3106E-11
2.0000E+00 2.5355E+00
8.4853E-01
1.1314E+00
47 7 4.34E-09 68 0 1.0000E+00 8.7943E-19 -7.8935E-11 9.1590E-11
1.0000E+00 1.7514E-09
1.0000E+00 -8.9889E-10
1.0000E+00
1.0000E+00
48 6 1.82E-08 22 0 1.0000E+00 3.5793E-17 -2.9494E-09 1.2964E-12
1.0000E+00 4.5016E-10
1.0000E+00
1.0000E+00
1.0000E+00
49 7 3.02E-07 51 0 1.0096E+00 5.4099E-10 5.1837E-08 3.3812E-11
1.0096E+00 -2.9788E-08
1.0000E+00
9.9518E-01
1.0000E+00
50 5 2.82E-08 20 0 1.0000E+00 6.6509E-18 6.2155E-09 2.1283E-10
1.0000E+00 -1.0453E-08
1.0000E+00 5.1301E-09
1.0000E+00
1.0000E+00
51 6 1.74E-09 19 0 1.0000E+00 2.8382E-20 6.3542E-10 7.2982E-11
1.0000E+00 9.5253E-11
1.0000E+00 -1.4789E-09
1.0000E+00
1.0000E+00
52 8 2.46E-08 42 0 -9.4556E-02 5.3266E+00 3.2779E+00 1.6924E-11
3.1519E-02 2.9054E+00
5.1576E-01 -7.7479E+00
-4.5272E-01
3.1519E-02
56 15 5.55E+37 34 1 -3.7198E+16 1.3968E+47 -9.5844E+36 6.0282E+16
3.4676E+16 8.9346E+36
1.0829E+14 2.7902E+34
-1.9582E+18 9.2673E+35
-1.9282E+18
-1.9291E+18
2.5304E+18
61 6 2.79E-08 18 0 5.3268E+00 -1.4365E+02 -8.8768E-01 3.3420E-11
-2.1190E+00 -1.7378E+00
3.2105E+00
77 10 5.53E-09 54 0 1.1662E+00 2.4151E-01 -8.5540E-02 4.5998E-11
1.1821E+00 -3.1878E-02
1.3803E+00
1.5060E+00
6.1092E-01
78 8 8.65E-09 38 0 -1.7171E+00 -2.9197E+00 7.4445E-01 6.7312E-10
1.5957E+00 -7.0358E-01
1.8272E+00 9.6806E-02
-7.6364E-01
-7.6364E-01
79 50 2.11E+13 396 2 3.0206E+04 2.4224E+14 -1.3817E+09 5.1635E-05
-3.5822E+02 -2.9089E+11
-5.4120E+01 2.9505E+08
3.2900E+03
6.6213E-05
501 26 6.28E-10 47 0 1.0000E+00 -1.5000E+00 5.5556E-01 6.2821E-10
502 3 7.20E-11 3 0 7.1957E-11 2.5889E-21 -7.1956E-11 7.1957E-11
503 3 4.73E-11 3 0 2.3672E-11 1.1207E-21 -4.7344E-11 4.7344E-11
2.3672E-11
504 13 3.93E-10 10 0 -1.0000E+00 0.0000E+00 -8.7294E-11 3.9268E-10
505 8 1.78E-10 65 0 -1.8724E-19 -1.0000E+00 1.5000E+00 1.9016E-11
-1.0000E+00
1.4775E-20
506 16 7.43E-10 41 0 -7.0711E-01 -1.4142E+00 7.0711E-01 7.4303E-10
-7.0711E-01
507 14 2.92E-10 30 0 -1.0000E+00 -1.0000E+00 -5.0000E-01 2.9178E-10
508 7 5.34E-03 21 0 1.0000E+00 8.7977E-08 -6.5693E-03 3.6369E-12
1.0000E+00
509 5 1.61E-08 24 0 6.0000E+00 -1.0800E+02 1.5000E+00 1.5200E-10
3.0000E+00
510 7 4.08E-09 35 0 -5.3452E-01 -3.7417E+00 1.8708E+00 2.3976E-11
-8.0178E-01
-2.6726E-01
511 50 1.61E-06 377 2 1.7203E-10 -2.7453E-05 3.6425E+04 4.1482E-10
-2.7453E-05 -1.8212E+04
512 0 1.73E+00 0 3 0.0000E+00 0.0000E+00 0.0000E+00 1.0000E+00
0.0000E+00
513 2 1.66E-10 5 0 1.6601E-10 -7.5943E-40 -1.3280E-19 1.6601E-10
514 6 1.91E-10 6 0 1.0000E+00 5.0000E-01 -1.0000E+00 1.9056E-10
-4.8412E-50
Table 2: Problems solved by Algorithm 2.
Prob. It. KKT norm Int. it. Inform x f(x) λ Infeas.
6 6 6.59E-09 48 0 1.0000E+00 1.7749E-30 -6.2058E-15 1.5465E-12
1.0000E+00
7 6 1.34E-09 18 0 -2.8635E-11 -1.7321E+00 2.8868E-01 2.0242E-12
1.7321E+00
8 4 2.95E-12 9 0 4.6016E+00 -1.0000E+00 -2.8633E-13 0.0000E+00
1.9558E+00 -2.4159E-14
9 3 5.51E-10 4 0 -1.8750E+03 -5.0000E-01 -3.2725E-02 0.0000E+00
-2.5000E+03
26 100 9.65E-02 337 1 9.9490E-01 1.0590E-08 -1.9556E-02 0.0000E+00
9.9490E-01
1.0050E+00
27 7 1.65E-11 27 0 -1.0000E+00 4.0000E-02 4.0000E-02 7.9936E-15
1.0000E+00
-1.2091E-15
28 5 4.44E-12 6 0 5.0000E-01 2.3179E-24 1.1578E-12 6.6613E-16
-5.0000E-01
5.0000E-01
39 9 2.03E-09 22 0 1.0000E+00 -1.0000E+00 -1.0000E+00 1.2511E-11
1.0000E+00 -1.0000E+00
1.1100E-10
1.6069E-15
40 6 6.18E-09 11 0 7.9370E-01 -2.5000E-01 5.0000E-01 2.2698E-11
7.0711E-01 -4.7194E-01
5.2973E-01 3.5355E-01
8.4090E-01
42 7 1.23E-09 14 0 2.0000E+00 1.3858E+01 -2.0000E+00 6.0051E-12
2.0000E+00 2.5355E+00
8.4853E-01
1.1314E+00
47 6 1.61E-09 25 0 1.0000E+00 2.2413E-20 -1.7803E-11 4.3042E-13
1.0000E+00 -4.7435E-11
1.0000E+00 1.1626E-10
1.0000E+00
1.0000E+00
48 4 2.91E-09 6 0 1.0000E+00 2.9251E-21 -9.0754E-12 9.4296E-13
1.0000E+00 -1.3989E-11
1.0000E+00
1.0000E+00
1.0000E+00
49 16 7.48E-09 58 0 1.0018E+00 6.2683E-13 5.7117E-10 6.2172E-15
1.0018E+00 8.2825E-11
1.0000E+00
9.9911E-01
1.0000E+00
50 4 1.08E-09 11 0 1.0000E+00 8.0396E-21 -2.1761E-11 3.3669E-13
1.0000E+00 1.0586E-11
1.0000E+00 -1.2536E-11
1.0000E+00
1.0000E+00
51 6 2.24E-09 8 0 1.0000E+00 5.6697E-24 -2.2413E-10 1.3496E-12
1.0000E+00 -1.9609E-10
1.0000E+00 6.8389E-10
1.0000E+00
1.0000E+00
52 10 6.35E-10 22 0 -9.4556E-02 5.3266E+00 3.2779E+00 2.1715E-12
3.1519E-02 2.9054E+00
5.1576E-01 -7.7479E+00
-4.5272E-01
3.1519E-02
56 100 1.40E+91 298 1 1.5152E+00 1.0502E+00 -5.0000E+03 6.3584E+00
9.6954E-01 -5.0000E+03
-7.1490E-01 -5.0000E+03
2.9998E+19 5.0000E+03
7.1635E+18
-2.2724E+19
-1.9009E+19
61 6 1.49E-09 16 0 5.3268E+00 -1.4365E+02 -8.8768E-01 1.0668E-12
-2.1190E+00 -1.7378E+00
3.2105E+00
77 7 5.21E-10 21 0 1.1662E+00 2.4151E-01 -8.5540E-02 4.8630E-14
1.1821E+00 -3.1878E-02
1.3803E+00
1.5060E+00
6.1092E-01
78 6 1.13E-09 14 0 -1.7171E+00 -2.9197E+00 7.4445E-01 2.5413E-13
1.5957E+00 -7.0358E-01
1.8272E+00 9.6806E-02
-7.6364E-01
-7.6364E-01
79 100 2.47E+06 375 1 5.7299E-01 5.9982E+01 5.0000E+03 2.3441E-12
1.1160E+00 -5.0000E+03
1.6417E+00 5.0000E+03
4.4075E+00
3.4904E+00
501 7 1.90E-10 16 0 1.0000E+00 -1.5000E+00 5.5556E-01 5.1514E-13
502 4 9.17E-09 3 0 5.9086E-11 1.7456E-21 -5.9073E-11 5.9086E-11
503 4 9.46E-09 3 0 -2.1551E-11 9.2889E-22 4.3099E-11 4.3102E-11
-2.1551E-11
504 5 5.47E-11 20 0 1.0000E+00 0.0000E+00 -6.4788E-14 5.2625E-14
505 9 1.51E-09 51 0 1.1604E-11 -1.0000E+00 1.5000E+00 3.3107E-13
-1.0000E+00
2.4703E-14
506 6 7.71E-09 19 0 -7.0711E-01 -1.4142E+00 7.0711E-01 2.0228E-11
-7.0711E-01
507 6 1.73E-09 13 0 -1.0000E+00 -1.0000E+00 -5.0000E-01 4.5453E-12
508 6 6.77E-10 45 0 1.0000E+00 6.2476E-24 -4.9880E-12 2.5120E-12
1.0000E+00
509 5 1.65E-09 17 0 6.0000E+00 -1.0800E+02 1.5000E+00 2.1316E-13
3.0000E+00
510 7 4.37E-10 27 0 -5.3452E-01 -3.7417E+00 1.8708E+00 9.7899E-13
-8.0178E-01
-2.6726E-01
511 100 2.07E+15 501 1 5.1787E-11 1.5724E-05 5.0000E+03 1.4917E-10
1.5724E-05 5.0000E+03
512 6 6.19E-09 7 0 -7.0711E-01 -9.8777E-01 1.1027E-01 1.6256E-11
-7.0711E-01
513 2 6.45E-10 3 0 -4.5608E-12 -4.3266E-46 -3.2249E-10 4.5608E-12
514 7 3.55E-10 6 0 1.0000E+00 5.0000E-01 -1.0000E+00 2.1860E-13
3.5348E-10
Table 3: Problems solved by Algorithm 3.
Prob. It. KKT norm Int. it. Inform x f(x) λ Infeas.
6 13 2.57E-09 110 0 1.0000E+00 7.3651E-18 3.4826E-10 6.6613E-14
1.0000E+00
7 8 1.90E-09 46 0 5.0272E-10 -1.7321E+00 2.8868E-01 1.4895E-12
1.7321E+00
8 13 5.35E-09 40 0 4.6016E+00 -1.0000E+00 1.0027E-10 2.3202E-13
1.9558E+00 -3.3556E-11
9 5 2.18E-09 14 0 -1.5000E+01 -5.0000E-01 -3.2725E-02 1.4211E-13
-2.0000E+01
26 100 4.74E+01 660 1 9.9437E-01 1.5625E-08 -1.1138E-02 2.0828E-13
9.9438E-01
1.0056E+00
27 6 5.22E-09 47 0 -1.0000E+00 4.0000E-02 4.0000E-02 2.9801E-11
1.0000E+00
-2.0599E-09
28 7 2.63E-09 22 0 5.0000E-01 2.0302E-18 1.1782E-09 2.1871E-14
-5.0000E-01
5.0000E-01
39 9 6.28E-09 36 0 1.0000E+00 -1.0000E+00 -1.0000E+00 8.5081E-11
1.0000E+00 -1.0000E+00
2.7535E-10
4.2486E-11
40 6 9.23E-09 18 0 7.9370E-01 -2.5000E-01 5.0000E-01 2.0256E-11
7.0711E-01 -4.7194E-01
5.2973E-01 3.5355E-01
8.4090E-01
42 7 4.02E-09 22 0 2.0000E+00 1.3858E+01 -2.0000E+00 8.7150E-12
2.0000E+00 2.5355E+00
8.4853E-01
1.1314E+00
47 10 9.57E-09 46 0 1.0000E+00 4.1159E-19 4.3795E-10 1.6206E-12
1.0000E+00 -1.0239E-09
1.0000E+00 3.6878E-09
1.0000E+00
1.0000E+00
48 7 5.11E-09 22 0 1.0000E+00 6.1576E-18 1.2104E-09 4.9714E-14
1.0000E+00 4.8590E-10
1.0000E+00
1.0000E+00
1.0000E+00
49 24 6.57E-09 112 0 1.0022E+00 1.4621E-12 1.3366E-09 2.8087E-15
1.0022E+00 -3.6112E-10
1.0000E+00
9.9890E-01
1.0000E+00
50 19 9.91E-09 63 0 1.0000E+00 1.3296E-17 -6.0127E-10 4.0116E-14
1.0000E+00 4.0573E-10
1.0000E+00 -6.9435E-10
1.0000E+00
1.0000E+00
51 13 7.98E-09 40 0 1.0000E+00 4.1029E-18 1.5427E-09 1.8011E-13
1.0000E+00 -4.4049E-10
1.0000E+00 1.8594E-09
1.0000E+00
1.0000E+00
52 10 8.71E-09 32 0 -9.4556E-02 5.3266E+00 3.2779E+00 2.5459E-11
3.1519E-02 2.9054E+00
5.1576E-01 -7.7479E+00
-4.5272E-01
3.1519E-02
56 100 5.80E+98 237 1 -3.0520E+19 -2.2250E+59 -5.0000E+03 1.2964E+20
-7.2904E+19 -5.0000E+03
1.0000E+20 5.0000E+03
-1.0000E+20 5.0000E+03
1.0000E+20
-1.0000E+20
-1.0000E+20
61 8 9.38E-09 37 0 5.3268E+00 -1.4365E+02 -8.8768E-01 3.3364E-12
-2.1190E+00 -1.7378E+00
3.2105E+00
77 8 9.97E-09 51 0 1.1662E+00 2.4151E-01 -8.5540E-02 6.9963E-12
1.1821E+00 -3.1878E-02
1.3803E+00
1.5060E+00
6.1092E-01
78 11 2.19E-09 36 0 -1.7171E+00 -2.9197E+00 7.4445E-01 5.9653E-13
1.5957E+00 -7.0358E-01
1.8272E+00 9.6806E-02
-7.6364E-01
-7.6364E-01
79 100 2.24E+20 330 1 -3.2497E+01 6.0361E+08 -5.0000E+03 3.2945E+06
-5.3327E+01 5.0000E+03
1.4875E+02 5.0000E+03
1.7034E+02
1.3614E+01
501 6 1.03E-09 21 0 1.0000E+00 -1.5000E+00 5.5556E-01 5.1683E-12
502 6 1.11E-11 21 0 6.3728E-15 2.0307E-29 -7.5249E-15 6.3728E-15
503 6 1.60E-09 21 0 -5.3100E-12 5.6392E-23 7.3257E-10 1.0620E-11
-5.3100E-12
504 8 5.24E-09 40 0 2.0000E+00 9.0000E+00 -1.0000E+00 4.2739E-12
505 7 6.66E-09 33 0 1.0000E+00 7.5269E-14 3.9434E-10 6.0957E-12
4.2221E-05
-1.8313E-09
506 6 7.39E-09 22 0 7.0711E-01 1.4142E+00 -7.0711E-01 1.0221E-11
7.0711E-01
507 6 5.76E-09 20 0 -1.0000E+00 -1.0000E+00 -5.0000E-01 5.9488E-12
508 7 8.53E-09 51 0 1.0000E+00 9.4913E-20 -1.0074E-08 2.2686E-12
1.0000E+00
509 11 9.62E-09 63 0 6.0000E+00 -1.0800E+02 1.5000E+00 1.4779E-12
3.0000E+00
510 7 4.22E-09 42 0 -5.3452E-01 -3.7417E+00 1.8708E+00 9.4080E-13
-8.0178E-01
-2.6726E-01
511 100 4.59E-03 661 1 3.5683E-10 -3.5009E-05 5.0000E+03 5.5027E-10
-3.5009E-05 -5.0000E+03
512 6 2.13E-09 19 0 -7.0711E-01 -9.8777E-01 1.1027E-01 1.9461E-12
-7.0711E-01
513 5 5.38E-12 16 0 3.4568E-14 -1.4280E-54 1.8983E-14 3.4568E-14
514 6 9.78E-09 19 0 1.0000E+00 5.0000E-01 -1.0000E+00 5.5404E-11
-1.9922E-10

6 Conclusions

This paper presents a novel method for solving nonlinear programming problems based on the sharp augmented Lagrangian. It introduces a smoothed function to overcome the nondifferentiability of the sharp augmented Lagrangian, thereby facilitating the minimization process.
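To make the smoothing idea concrete, the following is a minimal numerical sketch in Python. It assumes the smoothed function replaces the sharp penalty term $r\|h(x)\|$ by $r\sqrt{\|h(x)\|^{2}+t^{2}}$, a form consistent with the update $t_{k+1}=\sqrt{\|h(x^{k})\|^{2}+s_{k}^{2}}$ of Algorithm 2; it is an illustration of the technique, not a verbatim transcription of the paper's formula.

```python
import numpy as np

def smoothed_sharp_lagrangian(x, lam, r, t, f, h):
    """Smooth approximation of the sharp augmented Lagrangian.

    Assumption: the nonsmooth term r*||h(x)|| is replaced by
    r*sqrt(||h(x)||**2 + t**2), which is differentiable for t > 0 and
    recovers the sharp penalty as t -> 0. Here lam is the multiplier vector.
    """
    hx = np.atleast_1d(h(x))
    return f(x) + lam @ hx + r * np.sqrt(hx @ hx + t**2)
```

Since $\sqrt{a^{2}+t^{2}}\leq a+t$ for $a,t\geq 0$, the smoothed value differs from the sharp one by at most $rt$, which is why driving $t$ to zero at the rate of the constraint violation preserves the convergence properties.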

The exact algorithm presented in Section 2 converges to a global solution of the primal problem. However, its practical implementation faces several challenges:

  • Nondifferentiability: the objective function becomes nondifferentiable when the smoothing parameter $t$ approaches zero, potentially leading to numerical difficulties with optimization solvers that assume smoothness.

  • Global optimization: finding an exact global minimizer of the smoothed function is computationally expensive and can be challenging, particularly for large–scale problems.

These challenges demand relaxing the requirement for exact global minimizers.

The proposed inexact algorithms offer a more practical approach. A barrier function ensures the existence of inexact stationary points, even though the original function may have nondifferentiable points. One algorithm employs a fixed smoothing parameter, while the other uses a varying one. The convergence properties of both algorithms are analyzed, demonstrating that limit points of the generated sequences are either stationary points of the original problem or stationary points of a feasibility problem.

The boundedness of the penalty parameter is guaranteed under specific assumptions, including the convergence of the iterates, feasibility of the limit point, linear independence of the gradients, and satisfaction of the second–order sufficient optimality condition. Sufficient conditions are provided to ensure the boundedness of the penalty parameter, focusing on the relationship between the smoothing parameter and the constraint violation.

The proposed algorithms exhibit competitive performance compared to ALGENCAN, particularly in terms of the number of internal iterations. However, their performance varies depending on the specific problem characteristics, suggesting the need for further analysis and potential improvements. The promising results obtained in this study motivate further research to explore the potential of these algorithms for solving a wider range of problems.

Appendix A Description of the problems 501–514

Problem 501

Objective function: $f(x)=\frac{1}{2}x^{2}-2x$.
Constraints: $h(x)=x(x-1)(x+1)=0$.
Initial point: $x^{0}=2$.
Solutions: $x^{*}=1$, $\lambda^{*}=\frac{1}{2}$.
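Throughout this appendix, the multipliers follow from the stationarity condition $\nabla f(x^{*})+\lambda^{*}\nabla h(x^{*})=0$. For Problem 501, for instance,

$f'(1)=1-2=-1,\qquad h'(1)=3\cdot 1^{2}-1=2,\qquad \lambda^{*}=-\frac{f'(1)}{h'(1)}=\frac{1}{2}.$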

Problem 502

Objective function: $f(x)=\frac{1}{2}x^{2}$.
Constraints: $h(x)=x=0$.
Initial point: $x^{0}=10$.
Solutions: $x^{*}=0$, $\lambda^{*}=0$.

Problem 503

Objective function: $f(x)=x_{1}^{2}+x_{2}^{2}$.
Constraints: $h(x)=x_{1}+x_{2}=0$.
Initial point: $x^{0}=(3,3)$.
Solutions: $x^{*}=(0,0)$, $\lambda^{*}=0$.

Problem 504

Objective function: $f(x)=(x^{2}-1)^{2}$.
Constraints: $h(x)=(x^{2}-1)(x^{2}-4)=0$.
Initial point: $x^{0}=10$.
Solutions: $x^{*}=\pm 1$, $\lambda^{*}=0$.

Problem 505

Objective function: $f(x)=x_{2}^{3}+x_{1}x_{3}^{2}$.
Constraints: $h(x)=x_{1}^{2}+x_{2}^{2}+x_{3}^{2}-1=0$.
Initial point: $x^{0}=(1,1,1)$.
Solutions: $x^{*}=(0,-1,0)$, $\lambda^{*}=\frac{3}{2}$.

Problem 506

Objective function: $f(x)=x_{1}+x_{2}$.
Constraints: $h(x)=x_{1}^{2}+x_{2}^{2}-1=0$.
Initial point: $x^{0}=(10,10)$.
Solutions: $x^{*}=\left(-\frac{\sqrt{2}}{2},-\frac{\sqrt{2}}{2}\right)$, $\lambda^{*}=\frac{\sqrt{2}}{2}$.

Problem 507

Objective function: $f(x)=x$.
Constraints: $h(x)=x^{3}-x=0$.
Initial point: $x^{0}=-1.5$.
Solutions: $x^{*}=-1$, $\lambda^{*}=-\frac{1}{2}$.

Problem 508

Objective function: $f(x)=100\left(x_{2}-x_{1}^{2}\right)^{2}+\left(1-x_{1}\right)^{2}$.
Constraints: $h(x)=x_{1}-x_{2}=0$.
Initial point: $x^{0}=(100,1.2)$.
Solutions: $x^{*}=(1,1)$, $\lambda^{*}=0$.

Problem 509

Objective function: $f(x)=-x_{1}^{2}x_{2}$.
Constraints: $h(x)=4x_{1}x_{2}+x_{1}^{2}-108=0$.
Initial point: $x^{0}=(3,3)$.
Solutions: $x^{*}=(6,3)$, $\lambda^{*}=\frac{3}{2}$.

Problem 510

Objective function: $f(x)=2x_{1}+3x_{2}+x_{3}$.
Constraints: $h(x)=x_{1}^{2}+x_{2}^{2}+x_{3}^{2}-1=0$.
Initial point: $x^{0}=(1,1,1)$.
Solutions: $x^{*}=-\frac{\sqrt{14}}{14}\left(2,3,1\right)$, $\lambda^{*}=\frac{\sqrt{14}}{2}$.

Problem 511

Objective function: $f(x)=x_{1}+x_{2}$.
Constraints: $h_{1}(x)=(x_{1}-1)^{2}+x_{2}^{2}-1=0$,
$h_{2}(x)=(x_{1}-2)^{2}+x_{2}^{2}-4=0$.
Initial point: $x^{0}=(1,1)$.
Solutions: $x^{*}=(0,0)$ (the only feasible point).

Problem 512

Objective function: $f(x)=\sin\left(x_{1}+x_{2}\right)$.
Constraints: $h(x)=x_{1}^{2}+x_{2}^{2}-1=0$.
Initial point: $x^{0}=(0,0)$.
Solutions: $x^{*}=\left(-\frac{\sqrt{2}}{2},-\frac{\sqrt{2}}{2}\right)$, $\lambda^{*}=\frac{\sqrt{2}}{2}\cos\left(\sqrt{2}\right)$.

Problem 513

Objective function: $f(x)=-x^{4}$.
Constraints: $h(x)=x=0$.
Initial point: $x^{0}=1$.
Solutions: $x^{*}=0$, $\lambda^{*}=0$.

Problem 514

Objective function: $f(x)=\frac{1}{2}\left(x_{1}^{2}+x_{2}^{2}\right)$.
Constraints: $h(x)=x_{1}-1=0$.
Initial point: $x^{0}=(4.9,0.1)$.
Solutions: $x^{*}=(1,0)$, $\lambda^{*}=-1$.
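The solutions listed in this appendix are straightforward to confirm numerically. A minimal sketch for Problem 506, using Python and scipy rather than the Fortran codes of Section 5, might read:

```python
import numpy as np
from scipy.optimize import minimize

# Problem 506: minimize x1 + x2 subject to x1^2 + x2^2 - 1 = 0.
f = lambda x: x[0] + x[1]
h = lambda x: x[0]**2 + x[1]**2 - 1.0

res = minimize(f, x0=np.array([10.0, 10.0]), method="SLSQP",
               constraints=[{"type": "eq", "fun": h}])
print(res.x)  # expected: approximately (-sqrt(2)/2, -sqrt(2)/2)
```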
Acknowledgements.
This work was partially supported by the following grants from SGCyT–UNNE (19F010), ANPCyT (PICT-2021-GRF-TI-00188, PICT-2019-2019-04569) and SeCYT–UNC (33620230100671CB).

References

  • [1] R. Andreani, E.G. Birgin, J.M. Martínez, and M.L. Schuverdt. Augmented Lagrangian methods under the constant positive linear dependence constraint qualification. Mathematical Programming, 111(1):5–32, 2008.
  • [2] R. Andreani, E.G. Birgin, J.M. Martínez, and M.L. Schuverdt. On augmented Lagrangian methods with general lower–level constraints. SIAM Journal on Optimization, 18(4):1286–1309, 2008.
  • [3] R. Andreani, E.G. Birgin, J.M. Martínez, and M.L. Schuverdt. Second–order negative–curvature methods for box–constrained and general constrained optimization. Computational Optimization and Applications, 45(2):209–236, 2010.
  • [4] M. Andretta, E.G. Birgin, and J.M. Martínez. Practical active–set Euclidian trust–region method with spectral projected gradients for bound–constrained minimization. Optimization, 54(3):305–325, 2005.
  • [5] A.M. Bagirov, G. Ozturk, and R. Kasimbeyli. A sharp augmented Lagrangian–based method in constrained non–convex optimization. Optimization Methods and Software, 34(3):462–488, 2019.
  • [6] D.P. Bertsekas. Constrained optimization and Lagrange multiplier methods. Computer Science and Applied Mathematics. Academic Press, 1982.
  • [7] E.G. Birgin, R.A. Castillo, and J.M. Martínez. Numerical comparison of augmented Lagrangian algorithms for nonconvex problems. Computational Optimization and Applications, 31(1):31–55, 2005.
  • [8] E.G. Birgin and J.M. Martínez. A box–constrained optimization algorithm with negative curvature directions and spectral projected gradients. In Topics in Numerical Analysis: With Special Emphasis on Nonlinear Problems, pages 49–60. Springer, 2001.
  • [9] E.G. Birgin and J.M. Martínez. Large–scale active–set box–constrained optimization method with spectral projected gradients. Computational Optimization and Applications, 23:101–125, 2002.
  • [10] E.G. Birgin and J.M. Martínez. Structured minimal–memory inexact quasi–Newton method and secant preconditioners for augmented Lagrangian optimization. Computational Optimization and Applications, 39:1–16, 2008.
  • [11] E.G. Birgin and J.M. Martínez. Practical augmented Lagrangian methods for constrained optimization. SIAM, 2014.
  • [12] E.G. Birgin, J.M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10(4):1196–1211, 2000.
  • [13] R.S. Burachik, R.N. Gasimov, N.A. Ismayilova, and C.Y. Kaya. On a modified subgradient algorithm for dual problems via sharp augmented Lagrangian. Journal of Global Optimization, 34:55–78, 2006.
  • [14] R.S. Burachik, A.N. Iusem, and J.G. Melo. A primal dual modified subgradient algorithm with sharp Lagrangian. Journal of Global Optimization, 46:347–361, 2010.
  • [15] A.R. Conn, N. Gould, A. Sartenaer, and Ph.L. Toint. Convergence properties of an augmented Lagrangian algorithm for optimization with a combination of general equality and linear constraints. SIAM Journal on Optimization, 6(3):674–703, 1996.
  • [16] A.R. Conn, N.I.M. Gould, and P. Toint. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM Journal on Numerical Analysis, 28(2):545–572, 1991.
  • [17] A.R. Conn, N.I.M. Gould, and Ph.L. Toint. LANCELOT: a Fortran package for large–scale nonlinear optimization (Release A), volume 17 of Springer Series in Computational Mathematics. Springer–Verlag, Berlin, 1992.
  • [18] D. Fernández. Augmented Lagrangians quadratic growth and second–order sufficient optimality conditions. Optimization, 71(1):97–115, 2022.
  • [19] D. Fernández and M.V. Solodov. Local convergence of exact and inexact augmented Lagrangian methods under the second–order sufficient optimality condition. SIAM Journal on Optimization, 22(2):384–407, 2012.
  • [20] R.N. Gasimov. Augmented Lagrangian duality and nondifferentiable optimization methods in nonconvex programming. Journal of Global Optimization, 24:187–203, 2002.
  • [21] J.D. Gonçalves de Melo. On general augmented Lagrangians and a modified subgradient algorithm. PhD thesis, IMPA, Rio de Janeiro, 2009.
  • [22] A.R. Hedar. Global optimization test problems. http://www-optima.amp.i.kyoto-u.ac.jp/member/student/hedar/Hedar_files/TestGO.htm.
  • [23] M.R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4(5):303–320, 1969.
  • [24] W. Hock and K. Schittkowski. Test examples for nonlinear programming codes. Journal of Optimization Theory and Applications, 30:127–129, 1980.
  • [25] R. Kasimbeyli, O. Ustun, and A.M. Rubinov. The modified subgradient algorithm based on feasible values. Optimization, 58(5):535–560, 2009.
  • [26] T. Pennanen. Local convergence of the proximal point algorithm and multiplier methods without monotonicity. Mathematics of Operations Research, 27(1):170–191, 2002.
  • [27] M.J.D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, pages 283–298. Academic Press, 1969.
  • [28] R.T. Rockafellar. Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM Journal on Control, 12(2):268–285, 1974.
  • [29] R.T. Rockafellar and R.J.B. Wets. Variational analysis, volume 317. Springer Science & Business Media, 2009.