
Strong contraction mapping and topological non-convex optimization

Siwei Luo siuluosiwei@gmail.com The University of Illinois at Chicago, 1200 W Harrison St, Chicago, IL 60607
Abstract

The strong contraction mapping, a self-mapping whose range is always a subset of its domain, admits a unique fixed-point that can be pinned down by iterating the mapping. We introduce a topological non-convex optimization method, built as an application of strong contraction mapping, to achieve convergence to the global minimum. The strength of the approach is its robustness to local minima and to the position of the initial point.

Strong contraction mapping, Banach fixed-point theorem, Kakutani fixed-point theorem, Cantor’s intersection theorem, topological non-convex optimization

I Introduction

The calculus of variations plays a critical role in modern computation. In most cases, the function of interest cannot be obtained directly; instead, it is characterized as the function that makes a functional attain its extremum. This leads to constructing different functionals for different problems, such as the least action principle, Fermat's principle, maximum likelihood estimation, finite element analysis, machine learning, and so forth. These formulations provide a routine for transferring the original problem into an optimization problem, which is why optimization methods are used almost everywhere.

However, applying the popular gradient-based optimization methods to a non-convex function with many local minima is unprincipled. They face great difficulty in finding the global minimum point of the function, because the derivative information at a single point is not sufficient to determine the global geometric properties of the function. In gradient-based methods, the search domain is effectively divided into several subsets, one for each local minimum, and the iterating point converges to whichever local minimum corresponds to the subset containing the initial point. Convergence to the global minimum therefore requires the initial point to happen to lie sufficiently near the global minimum point.

It is time to think outside the box and cultivate methods other than gradient-based ones. Before doing so, let us first consider why gradient-based optimization methods fail at the task of global minimum convergence. Let (X,d) be a metric space and T: X \rightarrow X a self-mapping, and consider the inequality

d(T(x),T(y)) \leq q\,d(x,y), \quad \forall x,y \in X. \qquad (1)

If q \in [0,1), T is called contractive; if q \in [0,1], T is called nonexpansive; and if q < \infty, T is called Lipschitz continuous (Husain; Latif). Gradient-based methods are usually nonexpansive mappings. A fixed point may exist, but in general it is not unique. For instance, if the objective function f(x) has many local minima, then for the gradient descent mapping T(x) = x - \gamma\nabla f(x), every local minimum is a fixed point. From the perspective of the spectrum of a bounded operator, for a nonexpansive mapping every minimum of the objective function is an eigenvector of the eigenvalue equation T(x) = \lambda x with \lambda = 1. Nonexpansive mappings sometimes work in optimization problems, but this weakness is obvious.
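To make this dependence on the initial point concrete, the following short Python sketch (our own illustration; the double-well objective, step size and iteration count are assumptions, not taken from the paper) iterates the gradient descent mapping T(x) = x - \gamma f'(x) from two starting points and converges to two different fixed points, each a local minimum.

def f(x):
    # Double-well objective: global minimum near x = -1.04,
    # higher local minimum near x = +0.96.
    return (x ** 2 - 1.0) ** 2 + 0.3 * x

def grad_f(x):
    return 4.0 * x * (x ** 2 - 1.0) + 0.3

def gradient_descent(x0, gamma=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x = x - gamma * grad_f(x)   # the gradient descent mapping T(x) = x - gamma * grad f(x)
    return x

print(gradient_descent(-2.0))   # converges to the global minimum (left well)
print(gradient_descent(+2.0))   # converges to a mere local minimum (right well)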

It is worth noting that summarizing the conditions for the existence of a fixed-point of a nonexpansive or Lipschitz continuous mapping is a great challenge. Topologically, proving the existence of a fixed-point of a contraction mapping is easier than doing so for a nonexpansive or Lipschitz continuous mapping: for a contraction mapping the range keeps shrinking, whereas for a nonexpansive or Lipschitz continuous mapping the change of the range set can involve complicated transformations and rotations (Ahues; Rudin), and there is no straightforward way to locate the fixed-point. The fixed-point (or fixed-points) of a Lipschitz continuous mapping, even if it exists, can hide inside an inflating volume. Because both the existence and the uniqueness of the solution matter in optimization, the contractive mapping is favored over the nonexpansive one.

The well-known Banach fixed-point theorem, the first fixed-point theorem for contraction mappings, plays an important role in solving linear and nonlinear systems. For optimization, however, the contraction condition on T: X \rightarrow X, namely d(T(x),T(y)) \leq q\,d(x,y), usually requires convexity of the function and is therefore hard to apply to non-convex optimization problems. In this study, we extend the Banach fixed-point theorem to a notion applicable to optimization problems, which we call strong contraction mapping.

A strong contraction mapping is a self-mapping that always maps onto a subset of its domain. We will prove that a strong contraction mapping admits a unique fixed-point, explain how to build an optimization method as an application of strong contraction mapping, and illustrate why its fixed-point is the global minimum point of the objective function.

II Strong contraction mapping and the fixed-point

Recall the definition of the diameter D(X) of a metric space X.

Definition 1.

Let (X,d) be a metric space. The diameter D(X) refers to the supremum of the distances between points of the space X (Fred):

D(X) := \sup\{d(x,y) : x,y \in X\} \qquad (2)
Definition 2.

Let (X,d) be a complete metric space. A mapping T: X \rightarrow X is called a strong contraction mapping on X if the range of the mapping T is always a subset of its domain during the iteration, namely \mathcal{R}(T) = X_{i+1} \subset X_{i}, and there exists a q \in [0,1) such that D(X_{i+1}) \leq q D(X_{i}).

This contraction mapping is called strong because the requirement D(X_{i+1}) \leq q D(X_{i}) is looser than the condition d(T(x),T(y)) \leq q\,d(x,y) of the Banach fixed-point theorem: it does not require the distance between any two points to keep decreasing, only the diameter of the range of the mapping to keep decreasing and eventually shrink to a point. Consequently, the inequality d(T(x),T(y)) \leq q\,d(x,y) is included in the inequality D(X_{i+1}) \leq q D(X_{i}) as a special case.
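To see why the Banach condition is a special case, note that taking the supremum over all pairs of points in X_{i} converts it directly into the strong contraction condition (a one-line verification added here for completeness):

D(X_{i+1}) = \sup_{x,y\in X_{i}} d(T(x),T(y)) \leq \sup_{x,y\in X_{i}} q\,d(x,y) = q D(X_{i}).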

Theorem 1.

Let (X,d) be a non-empty complete metric space with a strong contraction mapping T: X \rightarrow X. Then T admits a unique fixed-point x^{*} in X such that x^{*} = T(x^{*}).

To prove Theorem 1, one can follow the same logic as the proof of the Banach fixed-point theorem (Banach), substituting the inequality d(T(x),T(y)) \leq q\,d(x,y) with D(X_{i+1}) \leq q D(X_{i}). Let x_{0} \in X be an arbitrary initial point and define the sequence \{x_{i}\} by x_{i} = T(x_{i-1}).

Lemma 1.1.

\{x_{i}\} is a Cauchy sequence in (X,d) and hence converges to a limit x^{*} in X.

Proof. Let m,n \in \mathbb{N} such that m > n.

d(x_{m},x_{n}) \leq D(X_{n}) \leq q^{n} D(X_{0})

Let \epsilon > 0 be arbitrary. Since q \in [0,1), we can find a large N \in \mathbb{N} such that

q^{N} \leq \frac{\epsilon}{D(X_{0})}.

Hence, by choosing m > n \geq N:

d(x_{m},x_{n}) \leq q^{n} D(X_{0}) \leq \frac{\epsilon}{D(X_{0})} D(X_{0}) = \epsilon.

Thus, \{x_{i}\} is a Cauchy sequence. \Box

As long as D(X_{0}) is bounded, convergence is guaranteed, independently of the choice of x_{0}.

Lemma 1.2.

x^{*} := \lim_{i\rightarrow\infty} x_{i} is a fixed-point of T in X.

Proof.

x^{*} = \lim_{i\rightarrow\infty} x_{i} = \lim_{i\rightarrow\infty} T(x_{i-1})
x^{*} = \lim_{i\rightarrow\infty} x_{i} = T(\lim_{i\rightarrow\infty} x_{i-1})

Thus, x^{*} = T(x^{*}). \Box

Definition 3.

X^{*} := \lim_{i\rightarrow\infty} X_{i} is the fixed set of T in X.

Lemma 1.3.

x^{*} is the only fixed-point of T in (X,d), the only element of X^{*}, and the diameter D(X^{*}) = 0.

Proof. Suppose there exists another fixed-point y with T(y) = y. Consider the subspace X_{i} = \{x^{*}, y\}, whose only elements are x^{*} and y. By definition, X_{i+1} = \mathcal{R}(T(X_{i})), and since both x^{*} and y are fixed, both are elements of X_{i+1}, namely,

0 \leq d(x^{*},y) \leq D(X_{i+1}) \leq q D(X_{i}) = q\,d(x^{*},y)
d(x^{*},y) = 0
D(X^{*}) = D(\lim_{i\rightarrow\infty} X_{i}) = \lim_{i\rightarrow\infty} D(X_{i}) \leq \lim_{i\rightarrow\infty} q^{i} D(X_{0}) = 0

Since (X,d) is a non-empty complete metric space and the diameter D(X^{*}) = 0, X^{*} has a single element x^{*}.

Thus x^{*} = y. \Box

In this section, we have proven the existence and uniqueness of the fixed-point of a strong contraction mapping. Compared with a contraction mapping, a strong contraction mapping can address a much wider range of problems, since the requirement D(X_{i+1}) \leq q D(X_{i}) is looser than d(T(x),T(y)) \leq q\,d(x,y). Unlike d(T(x),T(y)) \leq q\,d(x,y), the inequality D(X_{i+1}) \leq q D(X_{i}) does not require the points of the sequence \{x_{i}\} to move closer to each other at every step; it only confines the range of the mapping to become smaller and smaller. Consequently, the sequence \{x_{i}\} is a Cauchy sequence and converges to the fixed-point. It is worth mentioning that if some mapping T is not a strong contraction mapping, one can still consider whether T^{n} or T^{-n} is a strong contraction mapping as a way around; if so, then iterating T^{n} or T^{-n} yields a fixed-point.

III Optimization algorithm implementation

After the discussion of strong contraction mapping, let us consider how to construct an optimization algorithm endowed with the strong contraction property.

The objective function f is a mapping defined on X, f: X \rightarrow \mathbb{R}.

Definition 4.

An affine hyperplane is a subset H of X \times \mathbb{R} of the form

H = \{x \in X \times \mathbb{R}; h(x) = L\}

where h is a linear functional that does not vanish identically and L \in \mathbb{R} is a given constant (Brezis).

To overcome the dilemma that the optimization method may be trapped by local minima, one can intuitively use a hyperplane parallel to the domain to cut the objective function f, so that the intersection between the hyperplane and the graph of the function consists of contours. These contours reflect the global geometric properties of the function. The difficulty is how to iterate the hyperplane downwards and determine the position of the global minimum point.

Definition 5.

The contours form a subset C of X \times \mathbb{R} of the form

C = \{x \in X \times \mathbb{R}; h(x) = L \in \mathbb{R}, f(x) = L \in \mathbb{R}\}

Observe that the contours divide the objective function f into two parts: one higher than the height of the contours and the other lower. Our task now is to map to a point lower than the height of the contours. As a numerical method, instead of obtaining all points of the contours or a symbolic expression for them, we obtain n roots on the contours using a root-finding algorithm, that is,

r_{j}^{i} = f^{-1}(L_{i}) = f^{-1}(f(x_{i})), \quad \forall j \in \{1,2,...,n\} \qquad (3)

where i indicates the i-th iterate and j indicates the j-th root.

First, provide an arbitrary initial point x_{0} to the function and calculate the height of the contours, L_{0} = f(x_{0}), at the height of the initial point.
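As a minimal sketch of this root-finding step (our own illustration, not the authors' implementation; the Sphere test function, the random search directions and the bisection routine are assumptions), the following Python code finds several roots r_{j} with f(r_{j}) = L_{0} = f(x_{0}) by bisection along random rays from the initial point.

import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    # Sphere objective f(x) = sum_i x_i^2 (a convex test function)
    return float(np.sum(np.asarray(x, dtype=float) ** 2))

def root_on_contour(f, x, level, d, t0=1e-3, t_max=1e3, tol=1e-10):
    # Bisection along the ray x + t*d, t > 0, for a point at the contour level.
    g = lambda t: f(x + t * d) - level
    if g(t0) >= 0.0:
        return None                      # this ray does not descend below the level
    lo, hi = t0, t0
    while g(hi) < 0.0:                   # expand until the ray rises above the level again
        hi *= 2.0
        if hi > t_max:
            return None
    while hi - lo > tol:                 # plain bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0.0 else (lo, mid)
    return x + 0.5 * (lo + hi) * d

x0 = np.array([1.0, 1.0, 1.0])
L0 = sphere(x0)                          # contour level through the initial point
roots = []
for _ in range(100):                     # try random directions until enough roots are found
    d = rng.normal(size=3)
    d /= np.linalg.norm(d)
    r = root_on_contour(sphere, x0, L0, d)
    if r is not None:
        roots.append(r)
    if len(roots) == 6:
        break
print(np.round(roots, 3))                # each root satisfies f(r) = L0 up to tol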

If the objective function f is non-convex, mapping to a point lower than the height of the contours cannot be achieved by simply averaging all of the roots on the contours. However, the objective function f can mostly be decomposed into many locally convex subsets. We can map to a point lower than the height of the contours by averaging the roots belonging to the same locally convex subset, and thereby iterate the hyperplane downwards.

A function g: \mathbb{R}^{n} \rightarrow \mathbb{R} is convex if for every x,y \in \mathbb{R}^{n} and \lambda \in [0,1] we have the inequality (Bertsekas)

g(\lambda x + (1-\lambda)y) \leq \lambda g(x) + (1-\lambda) g(y) \qquad (4)

The inequality d(x_{m},x_{n}) \leq q^{n} D(X_{0}) indicates the rate of convergence: the smaller q is, the higher the rate of convergence. To achieve a high rate of convergence, it is important to extend Inequality (4) to the case of many points r_{1}, r_{2}, ..., r_{n},

g(\lambda_{1}r_{1} + \lambda_{2}r_{2} + ... + \lambda_{n}r_{n}) \leq \lambda_{1}g(r_{1}) + (1-\lambda_{1})\, g\Big(\frac{\lambda_{2}r_{2} + ... + \lambda_{n}r_{n}}{1-\lambda_{1}}\Big)
g(\lambda_{1}r_{1} + \lambda_{2}r_{2} + ... + \lambda_{n}r_{n}) \leq \lambda_{1}g(r_{1}) + \lambda_{2}g(r_{2}) + (1-\lambda_{1}-\lambda_{2})\, g\Big(\frac{\lambda_{3}r_{3} + ... + \lambda_{n}r_{n}}{1-\lambda_{1}-\lambda_{2}}\Big)

By induction,

g(\lambda_{1}r_{1} + \lambda_{2}r_{2} + ... + \lambda_{n}r_{n}) \leq \lambda_{1}g(r_{1}) + \lambda_{2}g(r_{2}) + ... + \lambda_{n}g(r_{n}) \qquad (5)

which is Jensen's inequality (Chandler),

g\Big(\sum_{j=1}^{n}\lambda_{j}r_{j}\Big) \leq \sum_{j=1}^{n}\lambda_{j}g(r_{j}) \qquad (6)

where,

\sum_{j=1}^{n}\lambda_{j} = 1

When we apply Jensen's Inequality (6) to the objective function f with \lambda_{j} = \frac{1}{n} for j = 1,2,...,n, the inequality becomes strict, that is

f(\lambda_{1}r_{1} + \lambda_{2}r_{2} + ... + \lambda_{n}r_{n}) < \lambda_{1}f(r_{1}) + \lambda_{2}f(r_{2}) + ... + \lambda_{n}f(r_{n}) = f(x_{i}) = L_{i} \qquad (7)

Let x_{i+1} = \frac{1}{n}\sum_{j=1}^{n} r_{j}^{i}, so that

f(x_{i+1}) = L_{i+1} < f(x_{i}) = L_{i} \quad \text{for every iterate } i
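As a trivial worked illustration of this update (our own example, not taken from the experiments), consider f(x) = x^{2} with current point x_{i} = 1, so that L_{i} = 1. The two roots at this level are r_{1} = -1 and r_{2} = 1, and averaging them gives

x_{i+1} = \frac{1}{2}(r_{1} + r_{2}) = 0, \qquad L_{i+1} = f(x_{i+1}) = 0 < 1 = L_{i}.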

Therefore, it is important to check whether two roots belong to the same locally convex subset: we traverse all roots, decompose them into locally convex subsets, and take the average within each subset. Based on Inequality (4), one practical way to do this is to pair every two roots on the contours, scan the function's value along the segment between them, and check whether any point along the segment rises above the contours' height. Traversing all roots and applying this examination decomposes the set of roots with respect to the different locally convex subsets. Concretely, to check whether two roots belong to the same locally convex subset, N random points along the segment between the two roots are tested for whether they lie above the contour level (Schachter; Tseng). If Inequality (4) holds for all N tests, we take the two roots to lie in the same locally convex subset and store them together.
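A minimal Python sketch of this pairwise test and the resulting decomposition (our own illustration; the two-valley test function, the greedy grouping and the numerical tolerance are assumptions not specified in the paper):

import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # A simple non-convex test function with two valleys at (+-1, 0)
    x = np.asarray(x, dtype=float)
    return float((x[0] ** 2 - 1.0) ** 2 + x[1] ** 2)

def same_locally_convex(f, ra, rb, level, N=50, tol=1e-9):
    # Sample N random points on the segment between the roots ra and rb;
    # reject the pair as soon as one point rises above the contour level.
    for _ in range(N):
        lam = rng.uniform(0.0, 1.0)
        if f(lam * ra + (1.0 - lam) * rb) > level + tol:
            return False
    return True

def group_roots(f, roots, level, N=50):
    # Greedy decomposition of the roots into locally convex subsets.
    groups = []
    for r in roots:
        for g in groups:
            if all(same_locally_convex(f, r, s, level, N) for s in g):
                g.append(r)
                break
        else:
            groups.append([r])
    return groups

x0 = np.array([0.5, 0.0])
L = f(x0)                                    # contour level through x0
roots = [np.array([s * v, 0.0]) for s in (1.0, -1.0)
         for v in (0.5, np.sqrt(1.75))]      # the four roots of f(x, 0) = L
for g in group_roots(f, roots, L):
    avg = np.mean(g, axis=0)
    print(np.round(avg, 3), f(avg) < L)      # the average in each subset lies below L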

After the set of roots is decomposed into several locally convex subsets, the average of the roots in each locally convex subset is always lower than the contours' height, by Jensen's Inequality (6).

Theorem 2.

Provided there is a unique global minimum point x_{min} of the objective function f, the fixed-point x^{*} of the strong contraction mapping T is the global minimum point of the function.

Since the iterating point x_{i+1} is always lower than x_{i},

0 \leq f(x_{i+1}) - f(x_{min}) < f(x_{i}) - f(x_{min})

Hence, there exists a \xi \in (0,1) such that

0 \leq f(x_{i+1}) - f(x_{min}) < \xi^{i}(f(x_{0}) - f(x_{min}))

As i goes to infinity,

0 \leq \lim_{i\rightarrow\infty} f(x_{i+1}) - f(x_{min}) < \lim_{i\rightarrow\infty} \xi^{i}(f(x_{0}) - f(x_{min}))
\lim_{i\rightarrow\infty} f(x_{i+1}) - f(x_{min}) = 0
\lim_{i\rightarrow\infty} f(x_{i+1}) = f(x_{min})
f(\lim_{i\rightarrow\infty} x_{i+1}) = f(x_{min})
f(x^{*}) = f(x_{min})

Because the fixed-point x^{*} is at the same height as the global minimum point x_{min} and the global minimum point is unique, the fixed-point x^{*} coincides with the global minimum point. The iteration x_{i+1} = T(x_{i}) yields the fixed-point x^{*} that solves the equation T(x^{*}) = x^{*} (Taylor).

Rewrite the mapping x_{i+1} = T(x_{i}), which averages the roots r_{1}, r_{2}, ..., r_{n} located in the same locally convex subset,

x_{i+1} = T(x_{i}) = \frac{1}{n}\sum_{j=1}^{n} r_{j} = \frac{1}{n}\sum_{j=1}^{n} f^{-1}(f(x_{i})) \qquad (8)

Then, like a Russian doll, the dynamical system can be written explicitly as an expansion of iterates:

x_{i+1} = T(x_{i}) = T^{i+1}(x_{0})
        = \frac{1}{n}\sum_{j=1}^{n} r_{j} = \frac{1}{n}\sum_{j=1}^{n} f^{-1}(f(x_{i}))
        = \frac{1}{n}\sum_{j=1}^{n} f^{-1}\Big(f\Big(\frac{1}{q}\sum_{j=1}^{q} f^{-1}(f(x_{i-1}))\Big)\Big)
        \vdots
        = \frac{1}{n}\sum_{j=1}^{n} f^{-1}\Big(f\Big(\frac{1}{q}\sum_{j=1}^{q} f^{-1}\Big(f\Big(\dots\frac{1}{k}\sum_{j=1}^{k} f^{-1}(f(x_{0}))\dots\Big)\Big)\Big)\Big) \qquad (9)

After the set of roots is decomposed into several locally convex subsets, the average of the roots in each subset is calculated and the lowest one is returned as the update point of the iterate. Thereafter, the remaining calculation is to repeat the iterate until convergence and return the converged point as the global minimum. The decomposition of the root set provides a divide-and-conquer method that transfers the original problem into a number of subproblems and solves them recursively (Cormen).

In summary, the main procedure of the topological non-convex optimization method is decomposed into the following steps: 1. Given an initial guess point x_{0} for the objective function, calculate the contour level L_{0}; 2. Solve the equation f(x) = L_{i} to obtain n roots, decompose the set of roots into several locally convex subsets, and return the lowest average of roots as the update point of the iterate; 3. Repeat the above iterate until convergence.

Algorithm 1 topological non-convex optimization
input: x_0;
initialize: set tolerance ε;
repeat:
   calculate L_i = f(x_i);
   vector<root> convex[Num_Of_Convex];
   for (j = 1; j < m; j++)
      r_j^i = f^{-1}(L_i) = f^{-1}(f(x_i));   ◇ solved by a root-finding algorithm
   traverse and pair every two roots r_p^i and r_q^i:
      bool flag = true;
      for (k = 1; k < N; k++) {
         λ = random(0,1);
         if (f(λ r_p^i + (1-λ) r_q^i) >= L_i) { flag = false; break; }   ◇ check whether they belong to the same locally convex subset
      }
      if (flag) {
         convex[ζ].push_back(r_p^i);
         convex[ζ].push_back(r_q^i);   ◇ store roots of the same locally convex subset together
      }
   for (ζ = 0; ζ < convex.size(); ζ++)
      α_ζ = average of the roots stored in convex[ζ];
   x_{i+1} = argmin_ζ f(α_ζ);   ◇ choose the lowest average to achieve a higher rate of convergence
until: d(x_{i+1}, x_i) < ε
return: x_{i+1}
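For concreteness, here is a minimal end-to-end Python sketch of Algorithm 1 under simplifying assumptions of our own (roots are obtained by bisection along random unit directions from the current point, a fixed number of roots per iterate, a greedy grouping of roots into locally convex subsets, and the two-dimensional Ackley function as the objective); it is an illustration of the procedure rather than the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

def ackley(x):
    # Standard two-dimensional Ackley function; global minimum f(0, 0) = 0.
    x = np.asarray(x, dtype=float)
    return (-20.0 * np.exp(-0.2 * np.sqrt(0.5 * np.sum(x ** 2)))
            - np.exp(0.5 * np.sum(np.cos(2.0 * np.pi * x)))
            + np.e + 20.0)

def root_on_contour(f, x, level, d, t0=1e-6, t_max=64.0, tol=1e-10):
    # Bisection along the ray x + t*d for a point on the contour f = level.
    g = lambda t: f(x + t * d) - level
    if g(t0) >= 0.0:
        return None                      # this ray does not descend below the level
    lo, hi = t0, t0
    while g(hi) < 0.0:                   # expand until the ray rises above the level
        hi *= 2.0
        if hi > t_max:
            return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0.0 else (lo, mid)
    return x + 0.5 * (lo + hi) * d

def same_subset(f, ra, rb, level, N=30, tol=1e-9):
    # Keep the pair only if no sampled point on the segment rises above the level.
    for _ in range(N):
        lam = rng.uniform(0.0, 1.0)
        if f(lam * ra + (1.0 - lam) * rb) > level + tol:
            return False
    return True

def minimize(f, x0, n_roots=20, eps=1e-6, max_iter=60):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        level = f(x)
        roots, tries = [], 0
        while len(roots) < n_roots and tries < 50 * n_roots:
            d = rng.normal(size=x.size)
            d /= np.linalg.norm(d)       # random unit direction
            r = root_on_contour(f, x, level, d)
            if r is not None:
                roots.append(r)
            tries += 1
        if not roots:
            break                        # contour too small to resolve: stop
        groups = []                      # greedy decomposition into locally convex subsets
        for r in roots:
            for g in groups:
                if all(same_subset(f, r, s, level) for s in g):
                    g.append(r)
                    break
            else:
                groups.append([r])
        averages = [np.mean(g, axis=0) for g in groups]
        x_new = min(averages, key=f)     # lowest average becomes the next iterate
        if np.linalg.norm(x_new - x) < eps:
            return x_new
        x = x_new
    return x

print(minimize(ackley, np.array([2.0, 2.0])))   # expected to approach the global minimum (0, 0)

On a function like Ackley the averaging step tends to pull the iterate towards the center of the current contour, which is consistent with the behavior reported in Section IV, where the contours shrink towards the global minimum while ignoring the local ripples.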

According to the previous discussion, the mechanism of the proposed optimization method imposes some restrictions on the objective function. In summary, it requires the objective function to be continuous, to possess a unique global minimum point, to be decomposable into many locally convex components, and to have a non-empty complete domain. The good news is that most real-world non-convex optimization problems satisfy these prerequisites.

IV Experiments on Sphere, McCormick and Ackley functions

First of all, the optimization algorithm has been tested on a convex function, the Sphere function f(x) = \sum x_{i}^{2}. The minimum is (0,0,0), where f(0,0,0) = 0. The iterations of roots and contours are shown in FIG. 1 and the updates of the searching point are shown in TABLE 1.

Figure 1: The red point markers are the roots and the spherical surface is the contour in 3D space for each iteration. For the Sphere function the contour in 3D space is a sphere. The intermediate steps illustrate how the contour shrinks to a point during the procedure.
iterate updating point height of contour
0 (1.00,1.00,1.00) 3.0000
1 (0.15,0.36,-0.25) 0.2146
2 (-0.1,-0.11,-0.042) 0.0237
3 (-0.024,-0.0012,-0.0001) 0.0004
4 (-0.0061,-0.00033,-0.0045)
Table 1: When the optimization method is tested on the Sphere function, the average of roots and the level of the contour for each iteration are shown above.

Then, we test the optimization algorithm on the McCormick function. The first 4 iterates of roots and contours are shown in FIG. 2 and the detailed iteration of the searching point from the numerical calculation is shown in TABLE 2. The test results indicate that the average of roots moves towards the global minimum point (-0.54719, -1.54719), where f(-0.54719, -1.54719) = -1.9133.

Figure 2: The point markers are the roots and the dashed lines are the contours during the iterates. The first 4 iteration results are drawn to illustrate the optimization procedure on the McCormick function.
iterate updating point height of contour
0 (2.00000,2.0000) 2.2431975047
1 (-0.6409,-0.8826) -1.1857067055
2 (-0.8073,-1.8803) -1.7770492114
3 (-0.5962,-1.4248) -1.8814760998
4 (-0.4785,-1.5162) -1.9074191216
5 (-0.5640,-1.5686) -1.9125755974
6 (-0.5561,-1.5467) -1.9131043354
7 (-0.5474,-1.5465) -1.9132219834
8 (-0.5473,-1.5472)
Table 2: When the optimization method is tested on the McCormick function, the average of roots and the level of the contour for each iteration are shown above.

The Ackley function would be a nightmare for most gradient-based methods. Nevertheless, the algorithm has been tested on the Ackley function, whose global minimum is located at (0,0) with f(0,0) = 0. The first 6 iterates of roots and contours are shown in FIG. 3 and the minimum point (-0.00000034, 0.00000003) returned by the algorithm is shown in TABLE 3. The test result shows that the optimization algorithm is robust to local minima and able to achieve global minimum convergence. The quest to locate the global minimum pays off handsomely.

Figure 3: The point markers are the roots and the dashed lines are the contours during the iterates. The first 6 iteration results are drawn to illustrate the optimization procedure on the Ackley function. The intermediate steps show how the contours ignore the existence of local minima and safely approach the global minimum point.
iterate updating point height of contour
0 (2.00000000,2.00000000) 6.59359908
1 (-0.78076083,-1.34128187) 5.82036224
2 (-0.35105371,-0.62030933) 4.11933422
3 (-0.20087095,0.38105138) 3.09359564
4 (0.06032320,-0.88101860) 2.17077104
15 (0.00000404,-0.00000130) 0.00001199
16 (-0.00000194,-0.00000079) 0.00000591
17 (-0.00000034,0.00000003)
Table 3: When the optimization method is tested on the Ackley function, the average of roots and the level of the contour for each iteration are shown above.

The observation from these experiments is that the size of the contours becomes smaller and smaller during the iterative process and they eventually converge to a point, which is the global minimum point of the function.
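For reference, a short Python check (our own addition, assuming the standard textbook forms of the McCormick and Ackley benchmarks, which reproduce the initial contour heights listed in TABLES 2 and 3) evaluates the three test functions at the minima reported above.

import numpy as np

def sphere(x):
    return float(np.sum(np.asarray(x, dtype=float) ** 2))

def mccormick(x):
    x, y = x
    return np.sin(x + y) + (x - y) ** 2 - 1.5 * x + 2.5 * y + 1.0

def ackley(x):
    x = np.asarray(x, dtype=float)
    return (-20.0 * np.exp(-0.2 * np.sqrt(0.5 * np.sum(x ** 2)))
            - np.exp(0.5 * np.sum(np.cos(2.0 * np.pi * x)))
            + np.e + 20.0)

print(sphere([0.0, 0.0, 0.0]))            # 0.0
print(mccormick([-0.54719, -1.54719]))    # approximately -1.9133
print(ackley([0.0, 0.0]))                 # approximately 0.0 (up to rounding)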

V Conclusion

In this paper we introduced the definition of strong contraction mapping and established the existence and uniqueness of its fixed-point. As an extension of the Banach fixed-point theorem, the iteration of a strong contraction mapping is a Cauchy sequence and yields the unique fixed-point, which fits the task of optimization well. Global minimum convergence regardless of local minima and of the initial point's position is a very significant strength of the optimization algorithm, and we illustrated how to implement an optimization method endowed with the strong contraction property. This topological optimization method finds a way around non-convexity: even if the objective function is non-convex, we can decompose it into many locally convex components and take advantage of the convexity of each component to pin down the global minimum point. The method has been tested on the Sphere, McCormick and Ackley functions and successfully achieved global minimum convergence as expected. These experiments demonstrate the shrinking of the contours and the approach of the iterating point to the global minimum point. We look forward to extending our study to higher-dimensional situations and believe that the optimization method works there in principle.

References

  • (1) Husain, T., and Abdul Latif. "Fixed points of multivalued nonexpansive maps." International Journal of Mathematics and Mathematical Sciences 14.3 (1991): 421-430.
  • (2) Latif, Abdul, and Ian Tweddle. "On multivalued f-nonexpansive maps." Demonstratio Mathematica 32.3 (1999): 565-574.
  • (3) Ahues, Mario, Alain Largillier, and Balmohan Limaye. Spectral Computations for Bounded Operators. Chapman and Hall/CRC, 2001.
  • (4) Rudin, Walter. Functional Analysis. International Series in Pure and Applied Mathematics. McGraw-Hill, 1991.
  • (5) Brezis, Haim. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer Science & Business Media, 2010.
  • (6) Croom, Fred H. Principles of Topology. Courier Dover Publications, 2016.
  • (7) Kiwiel, K. C. "Convergence and efficiency of subgradient methods for quasiconvex minimization." Mathematical Programming 90.1 (2001): 1-25.
  • (8) Banach, S. "Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales." Fundamenta Mathematicae 3.1 (1922): 133-181.
  • (9) Branciari, A. "A fixed point theorem for mappings satisfying a general contractive condition of integral type." International Journal of Mathematics and Mathematical Sciences 29.9 (2002).
  • (10) Taylor, A. E., and D. C. Lay. (Vol. 2). New York: Wiley, 1958.
  • (11) Schachter, Bruce. "Decomposition of polygons into convex sets." IEEE Transactions on Computers 11 (1978): 1078-1082.
  • (12) Tseng, Paul. "Applications of a splitting algorithm to decomposition in convex programming and variational inequalities." SIAM Journal on Control and Optimization 29.1 (1991): 119-138.
  • (13) Arveson, W. A Short Course on Spectral Theory. Vol. 209. Springer Science & Business Media, 2006.
  • (14) Blanchard, P., R. L. Devaney, and G. R. Hall. Differential Equations. London: Thompson, 2006, pp. 96-111. ISBN 0-495-01265-3.
  • (15) Bertsekas, Dimitri. Convex Analysis and Optimization. Athena Scientific, 2003.
  • (16) Chandler, David. Introduction to Modern Statistical Mechanics. Oxford University Press, 1987.
  • (17) Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2009.