
^1 Laboratoire d’Informatique (LIX), CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
^2 École Normale Supérieure

Proven Runtime Guarantees for How the MOEA/D Computes the Pareto Front From the Subproblem Solutions

Benjamin Doerr^1 [0000-0002-9786-220X], Martin S. Krejca^1 [0000-0002-1765-1219], and Noé Weeks^2
Abstract

The decomposition-based multi-objective evolutionary algorithm (MOEA/D) does not directly optimize a given multi-objective function $f$, but instead optimizes $N+1$ single-objective subproblems of $f$ in a co-evolutionary manner. It maintains an archive of all non-dominated solutions found and outputs it as an approximation to the Pareto front. Once the MOEA/D has found all optima of the subproblems (the $g$-optima), it may still miss Pareto optima of $f$. The algorithm is then tasked to find the remaining Pareto optima directly by mutating the $g$-optima.

In this work, we analyze for the first time how the MOEA/D with only standard mutation operators computes the whole Pareto front of the OneMinMax benchmark when the $g$-optima are a strict subset of the Pareto front. For standard bit mutation, we prove an expected runtime of $O(nN\log n + n^{n/(2N)}N\log n)$ function evaluations. Especially for the second, more interesting phase, when the algorithm starts with all $g$-optima, we prove an $\Omega(n^{(1/2)(n/N+1)}\sqrt{N}\,2^{-n/N})$ expected runtime. This runtime is super-polynomial if $N=o(n)$, since this leaves large gaps between the $g$-optima, which require costly mutations to cover.

For power-law mutation with exponent $\beta\in(1,2)$, we prove an expected runtime of $O(nN\log n + n^{\beta}\log n)$ function evaluations. The $O(n^{\beta}\log n)$ term stems from the second phase of starting with all $g$-optima, and it is independent of the number of subproblems $N$. This leads to a huge speedup compared to the lower bound for standard bit mutation. In general, our overall bound for power-law mutation suggests that the MOEA/D performs best for $N=O(n^{\beta-1})$, resulting in an $O(n^{\beta}\log n)$ bound. In contrast to standard bit mutation, smaller values of $N$ are better for power-law mutation, as it is capable of easily creating missing solutions.

Keywords:
MOEA/D · multi-objective optimization · runtime analysis · power-law mutation.

1 Introduction

Many real-world problems require the simultaneous optimization of different objectives. In this setting, known as multi-objective optimization, different solutions may not be comparable based on their objective values, as one solution can win over another in one objective but lose in another. This results in a set of incomparable optimal objective values, commonly referred to as the Pareto front. The aim in multi-objective optimization is to find the Pareto front of a problem, or a good approximation thereof.

Due to their population-based and heuristic nature, evolutionary algorithms lend themselves very well to multi-objective optimization, and they have been successfully applied for decades to a plethora of hard multi-objective optimization problems [32]. This strong interest has led to a variety of algorithms [24, 32], following different paradigms.

Multi-objective evolutionary algorithms have been analyzed by theoretical means since the early days of the theory of evolutionary algorithms [21, 17, 12, 22]. This area saw a boost of activity in the last two years, when the first mathematical runtime analysis of the NSGA-II [31] inspired many deep analyses of this important algorithm and of variants such as the NSGA-III or the SMS-EMOA [26, 1, 9, 4, 10, 11, 3, 2, 23, 27, 28, 29, 30, 20].

A substantially different, yet also very important algorithm is the decomposition-based multi-objective evolutionary algorithm (MOEA/D) [25]. It decomposes a multi-objective optimization problem $f$ into various single-objective subproblems of $f$. These subproblems are optimized in parallel. While doing so, the non-dominated solutions for $f$ are maintained in an archive.

Despite its popularity in empirical research and its good performance on real-world problems [24, 32], the MOEA/D has not been extensively studied theoretically [19, 15, 14, 5]. In particular, it is currently not known how the basic MOEA/D using only standard mutation operators as variation operators finds Pareto optima that are not already an optimum of one of the subproblems (we refer to Section 2 for a detailed discussion of the previous works). We recall that the MOEA/D solves a number of single-objective subproblems essentially via single-objective approaches. Hence, if all Pareto optima of the original problem appear as optima of subproblems (we call these $g$-optima in the remainder; this is the setting regarded, e.g., in [19]), then the multi-objective aspect of the problem vanishes, and the only question is how efficiently the single-objective approaches solve the subproblems.

Our Contribution.

Naturally, in a typical application of the MOEA/D, one cannot assume that the subproblems are sufficiently numerous and evenly distributed so that each Pareto optimum appears as a $g$-optimum. To better understand how the MOEA/D copes with such situations, we study in this work mathematically how the MOEA/D computes the full Pareto front when started with a population consisting of all $g$-optima. In this first work on this topic, as often in the mathematical runtime analysis, we consider the basic OneMinMax ($\mathrm{OMM}$) benchmark [13]. As in most previous works, we assume that the $N+1$ subproblems are constructed in a manner that the corresponding $g$-optima are spread equidistantly, with respect to the Hamming distance, across the Pareto front. We regard the basic MOEA/D using standard bit mutation as its only variation operator. Interestingly, this is the first theoretical analysis of the MOEA/D in this setting (apart from the case that the $g$-optima cover the Pareto front [19]). Hence our results (Theorem 5.1), together with standard estimates on the time to compute the $g$-optima (Corollary 3), also give the first estimates on the full runtime of the MOEA/D in this setting (Corollary 1).

Since our results show that the main bottleneck for computing points on the Pareto front that are far from all $g$-optima is that standard bit mutation rarely flips many bits (due to the concentration behavior of the binomial distribution), we also resort to the heavy-tailed mutation operator proposed by Doerr et al. [8]. Interestingly, this allows for drastic speed-ups when considering the phase of the optimization that starts with all $g$-optima (Theorem 5.2).

In detail, our various results provide us with expected-runtime estimates for the MOEA/D with either mutation operator to optimize $\mathrm{OMM}$ (Corollaries 1 and 2). These results prove, respectively, an expected number of $O(nN\log n + n^{n/(2N)}N\log n)$ function evaluations for the MOEA/D with standard bit mutation, and $O(nN\log n + n^{\beta}\log n)$ expected function evaluations for power-law mutation with power-law exponent $\beta\in(1,2)$. In both results, the second term refers to the interesting phase where the algorithm is initialized with all $g$-optima and is tasked to find the remaining Pareto optima of $\mathrm{OMM}$.

Our overall bound for standard bit mutation yields $O(n^2\log n)$ in the best case of $N=n$, matching the result by Li et al. [19, Proposition 4]. For general $N$, the second term in our bound suggests that the MOEA/D performs best if $N\in[\frac{n}{2}..n]$ and that the runtime is super-polynomial once $N=o(n)$. Moreover, we prove a lower bound of $\Omega(n^{(1/2)(n/N+1)}\sqrt{N}\,2^{-n/N})$ for this second term (Theorem 5.1), which supports this runtime characterization. However, this lower bound is not necessarily applicable to the entire optimization of the MOEA/D on $\mathrm{OMM}$, as we only prove it for a certain phase of the optimization process. Nonetheless, we believe that it is true for sufficiently large $n$. We discuss this behavior in more detail in Section 5 and especially in Section 5.1. Overall, our bounds suggest that standard bit mutation performs better when $N$ is large.

Our upper bound for power-law mutation is best once $N=O(n^{\beta-1})$, resulting in a runtime bound of $O(n^{\beta}\log n)$. The $O(n^{\beta}\log n)$ term stems from the second phase of the optimization, where the algorithm is initialized with all $g$-optima (Theorem 5.2). It is noteworthy that this bound does not depend on $N$, as, roughly, the time to fill in the gaps between $g$-optima is inversely proportional to the cost of performing $N+1$ function evaluations each iteration. Hence, the parameter $N$ only influences our bound for the first phase of finding all $g$-optima. Overall, this suggests that the MOEA/D performs better when $N$ is small, which is the opposite of the behavior with standard bit mutation. Moreover, the bound for power-law mutation is $O(n^2\log n)$ in the worst case, matching the best case of our bound for standard bit mutation. This suggests that power-law mutation is preferable to standard bit mutation in our setting, as it exhibits a far more robust runtime behavior.

Last, for each of the two phases we consider, we obtain independent results that hold for a larger class of algorithms (Lemmas 3 and 4) or functions (Theorems 5.1 and 5.2). Moreover, we prove an anti-concentration bound for the hypergeometric distribution around its expected value (Lemma 2). Due to the more general setting, we believe that these results are of independent interest.

2 Related Work

Theoretical analyses of the MOEA/D are scarce so far. Most results do not consider the MOEA/D with only standard mutation operators, and those that do make simplifying assumptions about the decomposition, using problem-specific knowledge. We also note that we could not find any proofs for how the MOEA/D finds its reference point (one of its parameters), which is important in general scenarios. If the reference point is mentioned, it is immediately assumed to be best-possible.

Li et al. [19] conducted the first mathematical analysis of the MOEA/D. They study the runtime of the algorithm on the classic bi-objective OneMinMax ($\mathrm{OMM}$) [13] and LeadingOnesTrailingZeros ($\mathrm{LOTZ}$) benchmarks (Footnote 1) of size $n$, that is, the number of objective-function evaluations until the MOEA/D finds all Pareto optima of the problem. Both benchmarks have a Pareto front of size $n+1$. The authors consider a decomposition of each problem into $n+1$ subproblems, matching the size of the Pareto front, and they assume that the MOEA/D uses standard bit mutation. The authors prove that if the subproblems are chosen such that the $n+1$ $g$-optima correspond one-to-one to the Pareto front of the problem, then the MOEA/D optimizes $\mathrm{OMM}$ in $O(n^2\log n)$ expected function evaluations, and $\mathrm{LOTZ}$ in $O(n^3)$. We note that this requires a different subproblem structure for $\mathrm{OMM}$ and $\mathrm{LOTZ}$. Moreover, the authors show that if the MOEA/D uses the commonly used subproblem structure of $\mathrm{OMM}$ for $\mathrm{LOTZ}$, then the $g$-optima do not cover the entire Pareto front of $\mathrm{LOTZ}$, for sufficiently large $n$. In this case, the authors argue that optimization takes a long time, as the missing solutions need to be found, but these arguments are not made formal.

Footnote 1: We note that the authors call $\mathrm{OMM}$ COCZ, and that they consider a version of $\mathrm{LOTZ}$, called LPTNO, that considers strings over $\{-1,1\}$ instead of binary strings, effectively replacing 0s with $-1$s. This changes some values for certain parameters but effectively results in the same bounds, to the best of our knowledge. Hence, we use the names $\mathrm{OMM}$ and $\mathrm{LOTZ}$ throughout this section.

Huang and Zhou [14] analyze the MOEA/D when using contiguous hypermutations, a non-standard mutation operator stemming from artificial immune systems. The authors study two versions of contiguous hypermutation, and they consider the $\mathrm{OMM}$ and the $\mathrm{LOTZ}$ benchmarks as well as a deceptive bi-objective problem and one containing a plateau. Moreover, the authors consider a general decomposition of each problem into $N+1$ subproblems. This decomposition is always the same and corresponds to the commonly used one, which results in evenly spread $g$-optima on the Pareto front of $\mathrm{OMM}$, as also analyzed by Li et al. [19] above. Huang and Zhou [14] prove that both MOEA/D variants optimize each of the four benchmarks in $O(Nn^2\log n)$ expected function evaluations. Moreover, they prove the same runtime bounds for the 4-objective versions of $\mathrm{OMM}$ and $\mathrm{LOTZ}$. Overall, their results suggest that a choice of $N=O(1)$ is more beneficial, as the hypermutation operators are capable of covering larger distances than standard bit mutation and can thus efficiently find Pareto optima that do not correspond to $g$-optima.

Huang et al. [15] analyze a variant of the MOEA/D that employs standard bit mutation as well as one-point crossover at a rate of 0.5. When performing crossover, one of the parents is the best solution of the current subproblem $g$, and the other one is chosen uniformly at random among the non-dominated solutions of the subproblems closest to $g$ with respect to the Euclidean distance in one of their parameters. The problem decomposition is always the commonly used one with equidistant $g$-optima, as in the two papers discussed above. The authors consider essentially the same four problems as Huang and Zhou [14] above (Footnote 2), and they also consider a general number of $N+1$ subproblems. For $\mathrm{LOTZ}$ and the plateau function, the authors prove an expected runtime of $O(Nn^2)$ function evaluations, and they prove an $O(Nn\log n)$ bound for $\mathrm{OMM}$ and the deceptive function.

Footnote 2: The authors actually treat $\mathrm{LOTZ}$ and LPTNO as two functions, but the results are the same, which is why we count them as one problem.

Very recently, Do et al. [5] analyzed the MOEA/D on the multi-objective minimum-weight-base problem, which is an abstraction of classical NP-hard combinatorial problems. Their MOEA/D variant uses a decomposition based on weight scalarization, different from the previous works above. The authors then prove that this variant finds an approximation of the Pareto front of the problem within expected fixed-parameter polynomial time.

3 Preliminaries

We denote the natural numbers by $\mathbb{N}$, including 0, and the real numbers by $\mathbb{R}$. For $a,b\in\mathbb{R}$, let $[a..b]=[a,b]\cap\mathbb{N}$ and $[a]=[1..a]$.

Let $n\in\mathbb{N}_{\geq 1}$. We consider bi-objective optimization problems, that is, functions $f\colon\{0,1\}^n\to\mathbb{R}^2$. We always assume that the dimension $n\in\mathbb{N}_{\geq 1}$ is given implicitly. When using big-O notation, it refers to asymptotics in this $n$. In this sense, an event occurs with high probability if its probability is at least $1-o(1)$.

We call a point $x\in\{0,1\}^n$ an individual and $f(x)$ the objective value of $x$. For all $i\in[n]$ and $j\in[2]$, we let $x_i$ denote the $i$-th component of $x$ and $f_j(x)$ the $j$-th component of $f(x)$. Moreover, let $|x|_0$ denote the number of 0s of $x$, and let $|x|_1$ denote its number of 1s.

For all $u,v\in\mathbb{R}^2$, we say that $v$ weakly dominates $u$ (written $v\succeq u$) if and only if for all $i\in[2]$ holds that $v_i\geq u_i$. We say that $v$ strictly dominates $u$ (written $v\succ u$) if and only if $v$ weakly dominates $u$ and at least one of these inequalities is strict. We extend this notation to individuals, where a dominance holds if and only if it holds for their respective objective values.

We consider the maximization of bi-objective functions $f$, that is, we are interested in $\succeq$-maximal elements, called Pareto-optimal individuals. The set of all objective values that are not strictly dominated, that is, the set $F^*\coloneqq\{v\in\mathbb{R}^2\mid\nexists x\in\{0,1\}^n\colon f(x)\succ v\}$, is called the Pareto front of $f$.

OneMinMax

We analyze the OneMinMax ($\mathrm{OMM}$) benchmark problem [13], which returns for each individual the number of 0s as the first objective and the number of 1s as the second objective. Formally, $\mathrm{OMM}\colon x\mapsto(|x|_0,|x|_1)$.

Note that each individual is Pareto optimal. The Pareto front of OneMinMax is $\{(i,n-i)\mid i\in[0..n]\}$.
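For concreteness, the following minimal Python sketch (our own illustration, not part of the paper; bit strings are represented as lists of 0s and 1s) evaluates the benchmark.

def one_min_max(x):
    """OneMinMax: the first objective counts the 0-bits, the second the 1-bits."""
    ones = sum(x)
    return (len(x) - ones, ones)

# Every bit string is Pareto optimal; for n = 4, the Pareto front is
# {(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)}.
print(one_min_max([1, 0, 1, 1]))  # (1, 3)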

Mathematical Tools

We use the well-known multiplicative drift theorem [7] with tail bounds [6]. We present the theorem in a fashion that is sufficient for our purposes. Throughout this article, if we write for a stopping time $T$ and two unary formulas $P$ and $Q$ that for all $t\in\mathbb{N}$ with $t<T$ holds that $P(t)\geq Q(t)$, then we mean that for all $t\in\mathbb{N}$ holds that $P(t)\cdot\mathds{1}\{t<T\}\geq Q(t)\cdot\mathds{1}\{t<T\}$, where $\mathds{1}$ denotes the indicator function.

Theorem 3.1 (Multiplicative drift [7], upper tail bound [6], [18, Theorem 2.4.5])

Let $n\in\mathbb{N}$, let $(X_t)_{t\in\mathbb{N}}$ be a random process over $[0..n]$, and let $T=\inf\{t\in\mathbb{N}\mid X_t=0\}$. Moreover, assume that there is a $\delta\in\mathbb{R}_{>0}$ such that for all $t\in\mathbb{N}$ with $t<T$ holds that $\operatorname{E}[X_t-X_{t+1}\mid X_t]\geq\delta X_t$. Then $\operatorname{E}[T]\leq\frac{1}{\delta}\ln n$, and for all $r\in\mathbb{R}_{\geq 0}$ holds that $\Pr[T>\frac{1}{\delta}(r+\ln n)]\leq\mathrm{e}^{-r}$.

4 The MOEA/D

We analyze the decomposition-based multi-objective evolutionary algorithm (MOEA/D; Algorithm 1) [25] for multi-objective optimization. The MOEA/D decomposes its objective function into a pre-specified number of single-objective subproblems. These subproblems are optimized in parallel. The MOEA/D maintains an archive of the $\succeq$-best solutions found so far, allowing it to find Pareto-optimal solutions for the original problem while only explicitly optimizing the subproblems.

More formally, given, among other inputs, a bi-objective optimization problem $f$ as well as a decomposition number $N\in[n]$ and a weight vector $w\in[0,1]^{N+1}$, the MOEA/D optimizes $f$ by decomposing $f$ into $N+1$ single-objective subproblems $\{g_i\colon\{0,1\}^n\times\mathbb{R}^2\to\mathbb{R}\}_{i\in[0..N]}$, weighted by $w$. Each of these subproblems is subject to minimization (of an error). The MOEA/D maintains a population of $N+1$ individuals $(x_i)_{i\in[0..N]}\in(\{0,1\}^n)^{N+1}$ as well as a reference point $z^*\in\mathbb{R}^2$ such that for each $i\in[0..N]$, individual $x_i$ is the currently best solution found for subproblem $i$ with respect to $z^*$. Ideally, the reference point is such that for all $j\in[2]$, value $z^*_j$ is optimal for objective $f_j$. To this end, the MOEA/D updates $z^*$ whenever it optimizes a subproblem. Moreover, the algorithm maintains a set $P\subseteq\mathbb{R}^2$ (the Pareto front) of non-dominated objective values.

We consider subproblems that measure the maximum distance to the reference point, known as the Chebyshev approach. That is, for all $i\in[0..N]$, $x\in\{0,1\}^n$, and $z^*\in\mathbb{R}^2$, it holds that

$g_i(x,z^*)=\max\bigl(w_i\cdot|z^*_1-f_1(x)|,\,(1-w_i)\cdot|z^*_2-f_2(x)|\bigr).$   (1)
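As an illustration (ours, with hypothetical names), equation (1) translates into a small Python function; the objective vector of the individual, the reference point, and the weight $w_i$ are passed in explicitly.

def chebyshev_g(f_x, z_star, w_i):
    """Chebyshev subproblem of equation (1): the larger of the two weighted
    distances between the objective vector f_x and the reference point z_star."""
    return max(w_i * abs(z_star[0] - f_x[0]),
               (1.0 - w_i) * abs(z_star[1] - f_x[1]))

# Example: objective value (3, 7), reference point (10, 10), weight 0.5.
print(chebyshev_g((3, 7), (10, 10), 0.5))  # 3.5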

When minimizing subproblem $i\in[0..N]$, the MOEA/D picks $x_i$ as the parent and mutates it according to a given mutation operator. Afterward, it compares the offspring of the mutation to $x_i$ and selects the better one (Footnote 3).

Footnote 3: We note that the general MOEA/D allows one to specify neighborhoods among the subproblems, within which non-dominated solutions are exchanged. In this article, we focus on no exchange.

We define the runtime of the MOEA/D on $f$ as the number of function evaluations of $f$ until the Pareto front $P$ of the algorithm is equal to the Pareto front $F^*$ of $f$ for the first time.

In this article, we consider the MOEA/D with different mutation operators.

Input: A decomposition number $N\in\mathbb{N}_{\geq 1}$, a weight vector $w\in[0,1]^{N+1}$, subproblems $\{g_i\}_{i\in[0..N]}$, a mutation operator $\mathrm{mut}\colon\{0,1\}^n\to\{0,1\}^n$, and a termination criterion.
1  Initialization: for each $i\in[0..N]$, choose $x_i$ uniformly at random from $\{0,1\}^n$; set $z^*_1=\max_{i\in[0..N]}f_1(x_i)$ and $z^*_2=\max_{i\in[0..N]}f_2(x_i)$, and iteratively add $f(x_i)$ to $P$ if there is no $j\in[0..i-1]$ such that $x_i$ is weakly dominated by $x_j$;
2  while the stopping criterion is not met do
3    for each subproblem $i\in[0..N]$ do
4      Mutation: $y\leftarrow\mathrm{mut}(x_i)$;
5      Update $z^*$: set $z^*_1\leftarrow\max(z^*_1,f_1(y))$ and $z^*_2\leftarrow\max(z^*_2,f_2(y))$;
6      Update $x_i$: if $g_i(y,z^*)\leq g_i(x_i,z^*)$, then $x_i\leftarrow y$;
7      Update $P$: remove all elements weakly dominated by $f(y)$ from $P$, and add $f(y)$ to $P$ if it is not weakly dominated by an element of $P$;

Algorithm 1: The MOEA/D [25] maximizing a bi-objective problem $f\colon\{0,1\}^n\to\mathbb{R}^2$. See also Section 4.
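For concreteness, the following Python sketch (our own reading of Algorithm 1; the function names, the passed-in objective f and subproblem evaluator, and the fixed iteration budget in place of a general termination criterion are all assumptions of this illustration) shows the overall control flow.

import random

def moead(f, n, N, subproblem, mutate, iterations):
    """Sketch of Algorithm 1: N+1 subproblems optimized in parallel, a shared
    reference point z_star, and an archive P of non-dominated objective values."""
    def weakly_dominates(u, v):
        # u weakly dominates v if u is at least as good in both objectives.
        return u[0] >= v[0] and u[1] >= v[1]

    def update_archive(P, value):
        # Remove archive entries weakly dominated by the new value and add the
        # new value if no remaining entry weakly dominates it.
        remaining = [p for p in P if not weakly_dominates(value, p)]
        if not any(weakly_dominates(p, value) for p in remaining):
            remaining.append(value)
        return remaining

    x = [[random.randint(0, 1) for _ in range(n)] for _ in range(N + 1)]
    z_star = (max(f(xi)[0] for xi in x), max(f(xi)[1] for xi in x))
    P = []
    for xi in x:
        P = update_archive(P, f(xi))

    for _ in range(iterations):
        for i in range(N + 1):
            y = mutate(x[i])                                         # mutation
            fy = f(y)
            z_star = (max(z_star[0], fy[0]), max(z_star[1], fy[1]))  # update z*
            if subproblem(i, fy, z_star) <= subproblem(i, f(x[i]), z_star):
                x[i] = y                                             # update x_i
            P = update_archive(P, fy)                                # update P
    return P

Here, subproblem(i, f_x, z_star) is assumed to evaluate the scalarization of subproblem i on an objective value, for instance the Chebyshev function from equation (1) with the i-th weight.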

Mutation Operators

We consider both standard bit mutation as well as power-law mutation [8]. Let $x\in\{0,1\}^n$ be the input (the parent) of the mutation. Both operators create a new individual (the offspring) by first copying $x$ and then adjusting its bit values. Standard bit mutation flips, for all $i\in[n]$, bit $x_i$ independently with probability $1/n$.

Power-law mutation requires a parameter $\beta\in(1,2)$ (the power-law exponent) as input and utilizes the power-law distribution $\mathrm{Pow}(\beta,n)$ over $[n]$, defined as follows. Let $C_\beta=\sum_{i\in[n]}i^{-\beta}$ as well as $X\sim\mathrm{Pow}(\beta,n)$. For all $i\in[n]$, it holds that $\Pr[X=i]=i^{-\beta}/C_\beta$. The power-law mutation first draws $X\sim\mathrm{Pow}(\beta,n)$ (the mutation strength) and then flips an $X$-cardinality subset of positions in $x$, chosen uniformly at random.
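Both operators are straightforward to implement. The following Python sketch (ours; the exponent value 1.5 is just an example from the admissible range (1, 2)) can serve as the mutate argument of the sketch above.

import random

def standard_bit_mutation(x):
    """Flip each bit independently with probability 1/n."""
    n = len(x)
    return [bit ^ 1 if random.random() < 1.0 / n else bit for bit in x]

def power_law_mutation(x, beta=1.5):
    """Draw the mutation strength from Pow(beta, n) and flip a uniformly
    random subset of that many positions."""
    n = len(x)
    weights = [i ** (-beta) for i in range(1, n + 1)]  # proportional to i^(-beta)
    strength = random.choices(range(1, n + 1), weights=weights)[0]
    y = list(x)
    for pos in random.sample(range(n), strength):
        y[pos] ^= 1
    return y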

The following lemma bounds the probability of mutating an individual with at most $\frac{n}{4}$ 0s into one with fewer than $\frac{n}{2}$ 0s, showing that this probability is at least inversely proportional to a polynomial in the distance in the number of 0s. Its proof makes use of the anti-concentration of the hypergeometric distribution around its expectation (Lemma 2), which we discuss after Lemma 1. We note that, due to space restrictions, the proofs of these results are found in the appendix.

Lemma 1

Let $x,y\in\{0,1\}^n$ with $u\coloneqq|x|_0\in[0..\frac{n}{4}]$ and $v\coloneqq|y|_0\in[u+1..\frac{n}{2}-1]$ with $u<v$. Moreover, let $\beta\in(1,2)$, and let $\mathrm{mut}_\beta$ denote the power-law mutation with power-law exponent $\beta$. Then $\Pr[|\mathrm{mut}_\beta(x)|_0=v]=\Omega\bigl((v-u)^{-\beta}\bigr)$.

Hypergeometric distribution.

The hypergeometric distribution ($\mathrm{Hyp}$) takes three parameters, namely, $n\in\mathbb{N}$, $k\in[0..n]$, and $r\in[0..n]$, and it has a support of $[\max(0,r+k-n)..\min(k,r)]$. A random variable $X\sim\mathrm{Hyp}(n,k,r)$ describes the number of good balls drawn when drawing $r$ balls uniformly at random without replacement from a set of $n$ balls, out of which $k$ are good. That is, for all $i\in[\max(0,r+k-n)..\min(k,r)]$ holds $\Pr[X=i]=\binom{k}{i}\binom{n-k}{r-i}/\binom{n}{r}$. Moreover, $\operatorname{E}[X]=r\frac{k}{n}$ as well as $\operatorname{Var}[X]=r\frac{k}{n}(1-\frac{k}{n})\frac{n-r}{n-1}$. In the context of power-law mutation, $n$ represents the number of bits, $k$ the number of specific bits to flip (for example, 0-bits), and $r$ the mutation strength.

The following lemma shows that the hypergeometric distribution has a reasonable probability of sampling each value that deviates from its expected value by at most twice the standard deviation.

Lemma 2

Let $n\in\mathbb{N}$, $k\in[0..\frac{n}{2}]$, and $r\in[0..\frac{3}{4}n]$, and let $\gamma\in\mathbb{R}_{>0}$ be a constant. Moreover, let $H\sim\mathrm{Hyp}(n,k,r)$. Then, for all $x\in[\operatorname{E}[H]-2\sqrt{\operatorname{Var}[H]},\operatorname{E}[H]+2\sqrt{\operatorname{Var}[H]}]$ holds that $\Pr[H=x]\geq\gamma/\sqrt{\operatorname{Var}[H]}$.
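The following short Python check (ours; the parameter values n = 100, k = 40, r = 25 are arbitrary examples) computes the hypergeometric probability mass function from the formula above and prints Pr[H = x] scaled by the standard deviation for the integers x within two standard deviations of the mean, illustrating numerically the anti-concentration behavior that Lemma 2 formalizes.

from math import comb, sqrt, ceil, floor

def hyp_pmf(n, k, r, i):
    """Pr[X = i] for X ~ Hyp(n, k, r)."""
    return comb(k, i) * comb(n - k, r - i) / comb(n, r)

n, k, r = 100, 40, 25
mean = r * k / n
var = r * (k / n) * (1 - k / n) * (n - r) / (n - 1)
lo, hi = ceil(mean - 2 * sqrt(var)), floor(mean + 2 * sqrt(var))
# The scaled probabilities stay bounded away from 0 on this window.
print([round(hyp_pmf(n, k, r, i) * sqrt(var), 3) for i in range(lo, hi + 1)])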

5 Runtime Analysis

We analyze the MOEA/D (Algorithm 1) on the OneMinMax ($\mathrm{OMM}$) function (of dimension $n\in\mathbb{N}_{\geq 1}$) with subproblems spread uniformly across the Pareto front of $\mathrm{OMM}$. To this end, we make the following assumptions about the input of the algorithm, which we refer to as the parametrization: We consider decomposition numbers $N\in[n]$, we define the weight vector as $w=(i\frac{n}{N})_{i\in[0..N]}$, and we consider the subproblems as defined in equation (1). In our calculations, we assume that $N$ divides $n$, as this avoids rounding, but we note that all results are equally valid if $N$ does not divide $n$, although the computations become more verbose, adding little merit. We note that, due to space restrictions, some of our proofs are in the appendix.

Our main results are the following, bounding the expected runtime for both standard bit mutation and power-law mutation.

Corollary 1

Consider the MOEA/D maximizing $\mathrm{OMM}$ with standard bit mutation and with the parametrization specified at the beginning of Section 5. Then its expected runtime is $O(nN\log n + n^{n/(2N)}N\log n)$ function evaluations.

Corollary 2

Consider the MOEA/D maximizing $\mathrm{OMM}$ with power-law mutation with power-law exponent $\beta\in(1,2)$ and with the parametrization specified at the beginning of Section 5. Then its expected runtime is $O(nN\log n + n^{\beta}\log n)$ function evaluations.

Both runtime results present two terms, which stem from the two phases into which we separate our analysis. The first term in each result is an upper bound for the first phase, which is the number of function evaluations it takes the MOEA/D to optimize all subproblems. We call the solutions to these subproblems $g$-optima. Our bounds for the first phase are the same for either mutation operator (Corollary 3). The optimization mainly resembles performing $N+1$ times an optimization similar to that of the well-known $(1+1)$ evolutionary algorithm on the OneMax benchmark function. For $N=n$, our result for standard bit mutation recovers the result by Li et al. [19, Proposition 4]. We go into detail about the analysis in Section 5.1.

The second phase starts immediately after the first phase and stops once the Pareto front of the MOEA/D covers the Pareto front of $\mathrm{OMM}$. During this analysis, we only assume that the MOEA/D has found the $g$-optima so far. Thus, in the worst case, it still needs to find all other Pareto optima of $\mathrm{OMM}$. To this end, each such optimum needs to be created via mutation directly from one of the $g$-optima, as the latter are never replaced, since they are optimal. Depending on the gap size, that is, the number of Pareto optima between two $g$-optima of neighboring subproblems, the mutation operator makes a big difference for the expected runtime bound. We analyze both mutation operators separately in Section 5.2.

Regarding Corollary 1, we see that the upper bound for standard bit mutation only handles values of $N\in[\frac{n}{2}..n]$ without any slowdown in comparison to the first phase. For smaller values, the upper bound is dominated by the second term and becomes super-polynomial once $N=o(n)$. In Theorem 5.1, we prove a lower bound for the second phase that shows that the expected runtime is also at least super-polynomial once $N=o(n)$. However, as this lower bound only applies to the second phase, it may be possible that during the whole optimization of $\mathrm{OMM}$, the algorithm finds most of the $\mathrm{OMM}$ Pareto optima already during the first phase, although we conjecture this not to happen for sufficiently small $N$.

For power-law mutation (Corollary 2), the bound for the second phase is independent of $N$ (Theorem 5.2). This shows that the power-law mutation picks up missing Pareto optima fairly quickly. In fact, even for an optimal value of $N=n$ for standard bit mutation, which results in a bound of $O(n^2\log n)$ for the second phase, the bound for power-law mutation is still smaller by a factor of $n^{\beta-2}$, which is less than 1, as we assume that $\beta\in(1,2)$. Moreover, for the whole optimization, the upper bound for power-law mutation is best once $N=O(n^{\beta-1})$, that is, for smaller values of $N$. This is in contrast to the upper bound for standard bit mutation, which gets better for larger values of $N$.

Overall, our results suggest that standard bit mutation profits from having many subproblems, as initially skipped solutions may be hard to create once all subproblems are optimized. In contrast, power-law mutation is slowed down by optimizing many subproblems in parallel. Instead, it profits from optimizing fewer such subproblems and then creating the initially skipped solutions.

In the following, we first analyze the first phase (Section 5.1) and then the second phase (Section 5.2).

5.1 First Phase

Recall that the first phase considers the optimization of $\mathrm{OMM}$ only until all subproblems are optimized, that is, until all $g$-optima are found. Our main result is the following, and it shows that finding the $g$-optima is not challenging for the MOEA/D, regardless of the mutation operator.

Corollary 3

Consider the MOEA/D maximizing $\mathrm{OMM}$ with the parametrization specified at the beginning of Section 5. Then the expected time until $P$ contains all $g$-optima of $\mathrm{OMM}$ is $O(nN\log n)$ function evaluations, for both standard bit mutation and power-law mutation with power-law exponent $\beta\in(1,2)$.

For our proof of Corollary 3, we split the first phase into two parts. The first part considers the time until the reference point $z^*$ is optimal, that is, until $z^*=(n,n)$. For this part, only the optimization of $g_0$ and $g_N$ is relevant.

The second part starts with an optimal reference point and considers the time until all $g$-optima are found. For this part, we consider the optimization of an arbitrary subproblem and multiply the resulting time by roughly $N\log N$, as we consider $N+1$ subproblems and we wait until all of them are optimized.

In order to prove results that hold for the MOEA/D with standard bit mutation as well as with power-law mutation, we consider a general mutation operator, which we call general mutation. It takes one parameter $p_1\in(0,1]$ and works as follows: It chooses to flip exactly one bit in the parent with probability $p_1$ (and it flips some other number of bits with probability $1-p_1$). Conditional on flipping exactly one bit, it flips one of the $n$ bits of the parent, chosen uniformly at random, and returns the result. Note that standard bit mutation is general mutation with $p_1=(1-\frac{1}{n})^{n-1}$ and that power-law mutation with power-law exponent $\beta$ is general mutation with $p_1=1/C_\beta$. Both of these values are constants.
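To illustrate the last claim, the following small Python computation (ours; n = 1000 and beta = 1.5 are example values) evaluates $p_1$ for both operators and shows that both are constants bounded away from 0.

def p1_standard_bit(n):
    """Probability that standard bit mutation flips exactly one bit:
    n * (1/n) * (1 - 1/n)^(n-1) = (1 - 1/n)^(n-1), which tends to 1/e."""
    return (1 - 1 / n) ** (n - 1)

def p1_power_law(n, beta):
    """Probability that power-law mutation draws strength 1: 1 / C_beta."""
    c_beta = sum(i ** (-beta) for i in range(1, n + 1))
    return 1 / c_beta

print(p1_standard_bit(1000), p1_power_law(1000, 1.5))  # roughly 0.368 and 0.39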

For the first part, we get the following result.

Lemma 3

Consider the MOEA/D maximizing $\mathrm{OMM}$ with the parametrization specified at the beginning of Section 5 and with general mutation with parameter $p_1\in(0,1]$. Then the expected time until $z^*=(n,n)$ holds is $O(\frac{n}{p_1}N\log n)$ function evaluations.

Proof

Let $T$ be the first iteration such that $z^*_1=n$. Without loss of generality, we only analyze $\operatorname{E}[T]$, as the analysis for $z^*_2=n$ is identical when changing all 1s into 0s and vice versa in the individuals in all arguments that follow. Thus, the expected runtime that we aim to prove is at most $2\operatorname{E}[T]$ by linearity of expectation. Hence, we are left to show that $\operatorname{E}[T]=O(\frac{n}{p_1}\log n)$, where the factor of $N$ in the bound of Lemma 3 stems from the MOEA/D making $N+1$ function evaluations per iteration. To this end, we only consider the optimization of $g_N$.

By equation (1), the choice of the weight vector $w$, and the definition of $\mathrm{OMM}$, it follows for all $x\in\{0,1\}^n$ and $z\in\mathbb{R}^2$ that $g_N(x,z)=\max(|z_1-|x|_1|,0)=|z_1-|x|_1|$. In each iteration, let $x_N$ denote the best-so-far solution for $g_N$ at the beginning of the iteration, and let $y$ denote its offspring generated via mutation. Note that, due to how $z^*$ is initialized and updated, in each iteration, it holds that $z^*_1\geq\max(f_1(x_N),f_1(y))=\max(|x_N|_1,|y|_1)$. Thus, if $|y|_1>|x_N|_1$, then $g_N(y,z^*)<g_N(x_N,z^*)$, and thus $x_N$ is updated to $y$ at the end of the iteration. Hence, the optimization of $g_N$ proceeds like a $(1+1)$ EA variant with general mutation optimizing OneMax.

More formally, let $(X_t)_{t\in\mathbb{N}}$ be such that for each $t\in\mathbb{N}$, the value $X_t$ denotes the number of 0s in $x_N$ at the end of iteration $t$. Note that $X_T=0$. We aim to apply the multiplicative drift theorem (Theorem 3.1) to $X$ with $T$. By the definition of the mutation operator, it follows for all $t<T$ that $\operatorname{E}[X_t-X_{t+1}\mid X_t]\geq X_t\frac{p_1}{n}$, since it is sufficient to choose to flip exactly one bit (with probability $p_1$) and then to flip one of the $X_t$ 0s of $x_N$ (at the beginning of the iteration), as the flipped bit is chosen uniformly at random. Thus, by Theorem 3.1, it follows that $\operatorname{E}[T]\leq\frac{n}{p_1}\log n$, concluding the proof. ∎

For the second part, we get the following result.

Lemma 4

Consider the MOEA/D maximizing $\mathrm{OMM}$ with the parametrization specified at the beginning of Section 5, with $z^*=(n,n)$, and with general mutation with parameter $p_1\in(0,1]$. Then the expected time until $P$ contains all $g$-optima of $\mathrm{OMM}$ is $O(\frac{n}{p_1}N\log n)$ function evaluations.

Proof

Let $i\in[0..N]$. We bound with high probability the time $T$ until $g_i$ is optimized, only counting the function evaluations for subproblem $i$. The result of Lemma 4 then follows by considering the maximum runtime among all values of $i$ and multiplying it by $N+1$, as we perform $N+1$ function evaluations per iteration and optimize all subproblems in parallel. We bound the maximum with high probability by taking a union bound over all $N+1$ different values of $i$. If the maximum of $T$ over all $i$ is at least $B\in\mathbb{R}_{\geq 0}$ with probability at most $q\in[0,1)$, then we get the overall expected runtime by repeating our analysis $\frac{1}{1-q}$ times in expectation, as the actual runtime is dominated by a geometric random variable with success probability $1-q$. The overall expected runtime is then $O(BN\frac{1}{1-q})$. Thus, it remains to show that $\Pr[T>\frac{n}{p_1}(\ln(n)+2\ln(N+1))]\leq(N+1)^{-2}$, as it then follows that $q\leq(N+1)^{-1}$ and thus $\frac{1}{1-q}\leq 2$. We aim to prove this probability bound with the multiplicative drift theorem (Theorem 3.1).

Let $(X_t)_{t\in\mathbb{N}}$ be such that for all $t\in\mathbb{N}$, value $X_t$ denotes the value of $g_i(x_i,z^*)$ at the beginning of iteration $t$. Note that $X_T=0$, that for all $t<T$ holds that $X_t\in[0..n]$, and that for $X_t$ to decrease (if $X_t>0$), it is sufficient to flip one of the $X_t$ bits that reduce the distance. Thus, by the definition of the mutation operator, it follows that $\operatorname{E}[X_t-X_{t+1}\mid X_t]\geq X_t\frac{p_1}{n}$. Overall, by Theorem 3.1, it follows that $\Pr[T>\frac{n}{p_1}(\ln(n)+2\ln(N+1))]\leq(N+1)^{-2}$. The proof is concluded by noting that $N\leq n$ and thus $\ln(n)+2\ln(N+1)=O(\log n)$. ∎

By the linearity of expectation, the expected time of the first phase is the sum of the expected runtimes of both parts. Moreover, since standard bit mutation and power-law mutation are both general mutation operators with $p_1=\Theta(1)$, we can omit $p_1$ in the asymptotic notation. Overall, we obtain Corollary 3.

5.2 Second Phase

Recall that the second phase assumes that the MOEA/D starts with the $g$-optima as its solutions to the subproblems, and it lasts until all $\mathrm{OMM}$ Pareto optima are found.

For this phase, the actual objective function is not very important. All that matters is that if the solutions $(x_i)_{i\in[0..N]}$ of the MOEA/D are such that for all $i\in[0..N]$ holds that $|x_i|_1=i\frac{n}{N}$, then $x_i$ is optimal for $g_i$. We refer to such a situation as evenly spread $g$-optima.

Since there is a drastic difference between the runtimes of standard bit mutation and power-law mutation, we analyze these two operators separately.

Standard Bit Mutation

Since standard bit mutation is highly concentrated around flipping only a constant number of bits, it does not perform well when it needs to fill in larger gaps. The following theorem is our main result, and it proves an upper and a lower bound for the expected runtime of the second phase. These bounds are not tight, but they show that the runtime is already super-polynomial once $N=o(n)$.

Theorem 5.1

Consider the MOEA/D maximizing a bi-objective function with evenly spread $g$-optima and with standard bit mutation, using the parametrization specified at the beginning of Section 5. Moreover, assume that $\frac{n}{2N}$ is an integer and that the algorithm is initialized with $(x_i)_{i\in[0..N]}$ such that for all $i\in[0..N]$ holds that $|x_i|_1=i\cdot\frac{n}{N}$. Then the expected runtime until, for each $j\in[0..n]$, at least one individual with $j$ 0s is created via mutation is $O(n^{n/(2N)}N\log n)\cap\Omega(n^{(1/2)(n/N+1)}\sqrt{N}\,2^{-n/N})$ function evaluations.

Power-Law Mutation

Power-law mutation creates individuals at a larger distance from the parent with far higher probability than standard bit mutation. Our main result is the following theorem, which shows that the MOEA/D with power-law mutation optimizes the second phase of $\mathrm{OMM}$ efficiently. As before, we state this theorem in a more general fashion.

Theorem 5.2

Consider the MOEA/D optimizing a bi-objective function with evenly spread $g$-optima and with power-law mutation with power-law exponent $\beta\in(1,2)$, using the parametrization specified at the beginning of Section 5. Moreover, assume that the algorithm is initialized with $(x_i)_{i\in[0..N]}$ such that for all $i\in[0..N]$ holds that $|x_i|_1=i\cdot\frac{n}{N}$. Then the expected runtime until, for each $j\in[0..n]$, at least one individual with $j$ 0s is created via mutation is $O(n^{\beta}\log n)$.

The bound of Theorem 5.2 does not depend on $N$, in contrast to the bound for the first phase (Corollary 3). The reason for our bound not depending on $N$ is, roughly, that the effort to fill in a gap between two $g$-optima is inversely proportional to the cost of an iteration, namely, $N+1$. Thus, a smaller value of $N$ leads to faster iterations but more iterations spent to fill the gaps, and vice versa.

Our (omitted) proof of Theorem 5.2 makes use of the following lemma, which bounds the probability of creating, from any of the $g$-optima, an individual with a given number of 0s in a single iteration of the MOEA/D.

Lemma 5

Consider a specific iteration during the optimization by the MOEA/D of a bi-objective function with evenly spread $g$-optima and with power-law mutation with power-law exponent $\beta\in(1,2)$, using the parametrization specified at the beginning of Section 5. Moreover, assume that the algorithm is initialized with $(x_i)_{i\in[0..N]}$ such that for all $i\in[0..N]$ holds that $|x_i|_1=i\cdot\frac{n}{N}$. Last, let $y\in\{0,1\}^n$ be such that $u\coloneqq|y|_0\in[0..\frac{n}{2}]$. Then the probability that an individual with exactly $u$ 0s is produced during mutation in this iteration is $\Omega(Nn^{-\beta})$.

Proof

Let $i$ be such that $ni/N<u\leq n(i+1)/N$. Clearly, $i\leq N/2$, as $u\leq n/2$. Note that there are at least $i/4$ values of $j\in[0..N/4]$ such that $jn/N\leq n/4$. By Lemma 1, each individual $x_j$ mutates into an individual with $u$ 0-bits with probability $\Omega\bigl(((i+1)n/N)^{-\beta}\bigr)$. Because there are $\Omega(i)$ such possible mutations during an iteration, the probability of generating at least one during an iteration is $\Omega(i^{1-\beta}n^{-\beta}N^{\beta})=\Omega\bigl(N\bigl(\tfrac{N}{i}\bigr)^{\beta-1}n^{-\beta}\bigr)=\Omega(Nn^{-\beta})$, concluding the proof. ∎

6 Conclusion

We theoretically studied the impact of the decomposition number $N$ of the MOEA/D [25] on the classic multi-objective benchmark OneMinMax ($\mathrm{OMM}$) [13]. Our analyses considered subproblems that are evenly spread out on the Pareto front of $\mathrm{OMM}$. In particular, we studied the expected runtime when starting with all optima of the subproblems (the $g$-optima) and requiring the algorithm to find the remaining Pareto optima of $\mathrm{OMM}$.

One of our theoretical results (Theorem 5.2) shows that using power-law mutation allows the MOEA/D to efficiently find the entire Pareto front of $\mathrm{OMM}$ even if it is initialized only with the $g$-optima. Interestingly, this bound is independent of the number of subproblems $N$ and thus of the number of initially missing Pareto optima between two neighboring $g$-optima. Together with our general bound for finding all $g$-optima (Corollary 3), this shows that the MOEA/D with power-law mutation always optimizes $\mathrm{OMM}$ efficiently (Corollary 2). Depending on $N$, our total-runtime bound ranges from $O(n^2\log n)$ in the worst case to $O(n^{\beta}\log n)$ in the best case of $N=O(n^{\beta-1})$. This suggests that the MOEA/D is, in fact, slowed down when using many subproblems, despite large values of $N$ implying that the subproblems cover the Pareto front of $\mathrm{OMM}$ better than smaller values. The reason is that a large value of $N$ roughly translates to optimizing the same problem $N$ times. With power-law mutation, it is better to optimize fewer and therefore more diverse subproblems and to then find the remaining Pareto optima efficiently via the power-law mutation.

For standard bit mutation, when starting with all $g$-optima, we show (Theorem 5.1) that the MOEA/D is not capable of efficiently finding all Pareto optima if $N$ is sufficiently small, as our lower bound is super-polynomial for $N=o(n)$. Nonetheless, for $N=\Theta(n)$, the expected runtime of the MOEA/D with standard bit mutation is polynomial in this phase. This translates to an overall polynomial expected runtime (Corollary 1) for $N=\Theta(n)$, which is even $O(n^2\log n)$ for $N\in[\frac{n}{2}..n]$, matching our worst-case bound with power-law mutation.

Overall, our results suggest a clear benefit of power-law mutation over standard bit mutation for the MOEA/D in the setting we considered. Not only is the power-law variant faster, it is also far more robust to the choice of $N$ and thus to how the problem is decomposed.

For future work, it would be interesting to improve the upper bounds or prove matching lower bounds. A similar direction is to consider an exchange of best-so-far solutions among the subproblems. The classic MOEA/D supports such an exchange, which could potentially lead to finding the $g$-optima more quickly. Another promising direction is the study of different problem decompositions, for example, unevenly spread subproblems or subproblem definitions different from equation (1). Last, we considered the $\mathrm{OMM}$ setting, although some of our results hold in greater generality (Theorems 5.1 and 5.2). However, it is not clear to what extent the benefit of power-law mutation carries over to problems with an entirely different structure, such as LeadingOnesTrailingZeros.

References

  • [1] Bian, C., Qian, C.: Better running time of the non-dominated sorting genetic algorithm II (NSGA-II) by using stochastic tournament selection. In: Parallel Problem Solving From Nature, PPSN 2022. pp. 428–441. Springer (2022)
  • [2] Cerf, S., Doerr, B., Hebras, B., Kahane, J., Wietheger, S.: The first proven performance guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) on a combinatorial optimization problem. In: International Joint Conference on Artificial Intelligence, IJCAI 2023. pp. 5522–5530. ijcai.org (2023)
  • [3] Dang, D.C., Opris, A., Salehi, B., Sudholt, D.: Analysing the robustness of NSGA-II under noise. In: Genetic and Evolutionary Computation Conference, GECCO 2023. pp. 642–651. ACM (2023)
  • [4] Dang, D.C., Opris, A., Salehi, B., Sudholt, D.: A proof that using crossover can guarantee exponential speed-ups in evolutionary multi-objective optimisation. In: Conference on Artificial Intelligence, AAAI 2023. pp. 12390–12398. AAAI Press (2023)
  • [5] Do, A.V., Neumann, A., Neumann, F., Sutton, A.M.: Rigorous runtime analysis of MOEA/D for solving multi-objective minimum weight base problems. In: Advances in Neural Information Processing Systems. pp. 36434–36448. Curran Associates (2023), https://proceedings.neurips.cc/paper_files/paper/2023/file/72416ded78a439907ff72165ac9c56e0-Paper-Conference.pdf
  • [6] Doerr, B., Goldberg, L.A.: Adaptive drift analysis. Algorithmica 65, 224–250 (2013)
  • [7] Doerr, B., Johannsen, D., Winzen, C.: Multiplicative drift analysis. Algorithmica 64, 673–697 (2012)
  • [8] Doerr, B., Le, H.P., Makhmara, R., Nguyen, T.D.: Fast genetic algorithms. In: Genetic and Evolutionary Computation Conference, GECCO 2017. pp. 777–784. ACM (2017)
  • [9] Doerr, B., Qu, Z.: A first runtime analysis of the NSGA-II on a multimodal problem. IEEE Transactions on Evolutionary Computation 27, 1288–1297 (2023)
  • [10] Doerr, B., Qu, Z.: From understanding the population dynamics of the NSGA-II to the first proven lower bounds. In: Conference on Artificial Intelligence, AAAI 2023. pp. 12408–12416. AAAI Press (2023)
  • [11] Doerr, B., Qu, Z.: Runtime analysis for the NSGA-II: provable speed-ups from crossover. In: Conference on Artificial Intelligence, AAAI 2023. pp. 12399–12407. AAAI Press (2023)
  • [12] Giel, O.: Expected runtimes of a simple multi-objective evolutionary algorithm. In: Congress on Evolutionary Computation, CEC 2003. pp. 1918–1925. IEEE (2003)
  • [13] Giel, O., Lehre, P.K.: On the effect of populations in evolutionary multi-objective optimisation. Evolutionary Computation 18, 335–356 (2010)
  • [14] Huang, Z., Zhou, Y.: Runtime analysis of somatic contiguous hypermutation operators in MOEA/D framework. In: Conference on Artificial Intelligence, AAAI 2020. pp. 2359–2366. AAAI Press (2020)
  • [15] Huang, Z., Zhou, Y., Chen, Z., He, X., Lai, X., Xia, X.: Running time analysis of MOEA/D on pseudo-Boolean functions. IEEE Transactions on Cybernetics 51, 5130–5141 (2021)
  • [16] Janson, S.: Tail bounds for sums of geometric and exponential variables. CoRR abs/1709.08157 (2017). https://doi.org/10.48550/arXiv.1709.08157
  • [17] Laumanns, M., Thiele, L., Zitzler, E.: Running time analysis of multiobjective evolutionary algorithms on pseudo-Boolean functions. IEEE Transactions on Evolutionary Computation 8, 170–182 (2004)
  • [18] Lengler, J.: Drift analysis. CoRR abs/1712.00964 (2017)
  • [19] Li, Y.L., Zhou, Y.R., Zhan, Z.H., Zhang, J.: A primary theoretical study on decomposition-based multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation 20, 563–576 (2016)
  • [20] Opris, A., Dang, D.C., Neumann, F., Sudholt, D.: Runtime analyses of NSGA-III on many-objective problems. In: Genetic and Evolutionary Computation Conference, GECCO 2024. ACM (2024), to appear
  • [21] Rudolph, G.: Evolutionary search for minimal elements in partially ordered finite sets. In: Evolutionary Programming, EP 1998. pp. 345–353. Springer (1998)
  • [22] Thierens, D.: Convergence time analysis for the multi-objective counting ones problem. In: Evolutionary Multi-Criterion Optimization, EMO 2003. pp. 355–364. Springer (2003)
  • [23] Wietheger, S., Doerr, B.: A mathematical runtime analysis of the Non-dominated Sorting Genetic Algorithm III (NSGA-III). In: International Joint Conference on Artificial Intelligence, IJCAI 2023. pp. 5657–5665. ijcai.org (2023)
  • [24] Zhang, J., Xing, L.: A survey of multiobjective evolutionary algorithms. In: International Conference on Computational Science and Engineering (CSE). pp. 93–100. IEEE (2017). https://doi.org/10.1109/CSE-EUC.2017.27
  • [25] Zhang, Q., Li, H.: MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation 11, 712–731 (2007)
  • [26] Zheng, W., Doerr, B.: Better approximation guarantees for the NSGA-II by using the current crowding distance. In: Genetic and Evolutionary Computation Conference, GECCO 2022. pp. 611–619. ACM (2022)
  • [27] Zheng, W., Doerr, B.: Mathematical runtime analysis for the non-dominated sorting genetic algorithm II (NSGA-II). Artificial Intelligence 325, 104016 (2023)
  • [28] Zheng, W., Doerr, B.: Runtime analysis for the NSGA-II: proving, quantifying, and explaining the inefficiency for many objectives. IEEE Transactions on Evolutionary Computation (2023), in press, https://doi.org/10.1109/TEVC.2023.3320278
  • [29] Zheng, W., Doerr, B.: Runtime analysis of the SMS-EMOA for many-objective optimization. In: Conference on Artificial Intelligence, AAAI 2024. AAAI Press (2024)
  • [30] Zheng, W., Li, M., Deng, R., Doerr, B.: How to use the metropolis algorithm for multi-objective optimization? In: Conference on Artificial Intelligence, AAAI 2024. AAAI Press (2024)
  • [31] Zheng, W., Liu, Y., Doerr, B.: A first mathematical runtime analysis of the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). In: Conference on Artificial Intelligence, AAAI 2022. pp. 10408–10416. AAAI Press (2022)
  • [32] Zhou, A., Qu, B.Y., Li, H., Zhao, S.Z., Suganthan, P.N., Zhang, Q.: Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm and Evolutionary Computation 1, 32–49 (2011)

Appendix A: Appendix of Paper 252, "Proven Runtime Guarantees for How the MOEA/D Computes the Pareto Front From the Subproblem Solutions"

This appendix contains all of the proofs that are omitted in the main paper, due to space restrictions. It is meant to be read at the reviewer’s discretion only.

Proofs of Section 4

The proof of Lemma 1 makes use of the anti-concentration of the hypergeometric distribution (Lemma 2), which is why we defer the proof of Lemma 1 to the end of this section.

In order to prove Lemma 2, we show that the ratio of neighboring probabilities of a hypergeometrically distributed random variable is close to 1.

Lemma 6

Let $n\in\mathbb{N}$, $k\in[0..\frac{n}{2}]$, and $r\in[0..\min(k,\frac{3}{4}n)]$, and let $C\in\mathbb{R}_{>1}$ be a constant. Moreover, let $H\sim\mathrm{Hyp}(n,k,r)$, let $i\in[-2\sqrt{\operatorname{Var}[H]}-1..2\sqrt{\operatorname{Var}[H]}+1]$, and let $\delta_i=\Pr[H=\operatorname{E}[H]+i+1]/\Pr[H=\operatorname{E}[H]+i]$. Then

$(1-7/\sqrt{\operatorname{Var}[H]})\leq\delta_i\leq(1+7/\sqrt{\operatorname{Var}[H]}).$
Proof (of Lemma 6)

Take $H$ and $i$ as in the statement of the lemma, and let $\sigma\coloneqq\sqrt{\operatorname{Var}[H]}$. Notice that $\sigma^2=r\frac{k}{n}\bigl(1-\frac{k}{n}\bigr)\frac{n-r}{n-1}$, so that $\sigma^2\leq\frac{rk}{n}$.

Suppose that i0i\geq 0. We lower-bound δi\delta_{i} by an expression of the desired form

δi\displaystyle\delta_{i} =(krkn+i)(rrkni)(rkn+i+1)(nkr+rkn+i+1).\displaystyle=\frac{\mathopen{}\mathclose{{}\left(k-r\frac{k}{n}+i}\right)\mathopen{}\mathclose{{}\left(r-\frac{rk}{n}-i}\right)}{\mathopen{}\mathclose{{}\left(\frac{rk}{n}+i+1}\right)\mathopen{}\mathclose{{}\left(n-k-r+\frac{rk}{n}+i+1}\right)}.
In this expression, factorize by kk in the top left expression, by rr in the top right expression, by rkn\frac{rk}{n} in the bottom left expression and for the bottom right one, notice that nkr+rkn=1n(n2nknr+rk)=1n(nk)(nr)n-k-r+\frac{rk}{n}=\frac{1}{n}(n^{2}-nk-nr+rk)=\frac{1}{n}(n-k)(n-r) to obtain
δi\displaystyle\delta_{i} =k(1rn+ik)r(1knir)rkn(1+i+1rkn)(1n(nk)(nr)+i+1).\displaystyle=\frac{k\mathopen{}\mathclose{{}\left(1-\frac{r}{n}+\frac{i}{k}}\right)r\mathopen{}\mathclose{{}\left(1-\frac{k}{n}-\frac{i}{r}}\right)}{\frac{rk}{n}\mathopen{}\mathclose{{}\left(1+\frac{i+1}{\frac{rk}{n}}}\right)\mathopen{}\mathclose{{}\left(\frac{1}{n}(n-k)(n-r)+i+1}\right)}.
Here, multiply the top two expressions by nn using the 1n\frac{1}{n} terms in the denominator and obtain
δi\displaystyle\delta_{i} =(nr+nik)(nknir)(1+i+1rkn)((nk)(nr)+n(i+1)).\displaystyle=\frac{\mathopen{}\mathclose{{}\left(n-r+\frac{ni}{k}}\right)\mathopen{}\mathclose{{}\left(n-k-\frac{ni}{r}}\right)}{\mathopen{}\mathclose{{}\left(1+\frac{i+1}{\frac{rk}{n}}}\right)\big{(}(n-k)(n-r)+n(i+1)\big{)}}.
Dividing the numerator and denominator by (nk)(nr)(n-k)(n-r) and distributing the factors properly yields
δi\displaystyle\delta_{i} =(1+nik(nr))(1nir(nk))(1+i+1rkn)(1+ni+1(nk)(nr)).\displaystyle=\frac{\mathopen{}\mathclose{{}\left(1+\frac{ni}{k(n-r)}}\right)\mathopen{}\mathclose{{}\left(1-\frac{ni}{r(n-k)}}\right)}{\mathopen{}\mathclose{{}\left(1+\frac{i+1}{\frac{rk}{n}}}\right)\mathopen{}\mathclose{{}\left(1+n\frac{i+1}{(n-k)(n-r)}}\right)}.
Now remember that kn/2k\leq n/2 which implies nkkn-k\geq k. Use that in the top right expression to get
δi\displaystyle\delta_{i} (1+nik(nr))(1nirk)(1+i+1rkn)(1+ni+1k(nr)).\displaystyle\geq\frac{\mathopen{}\mathclose{{}\left(1+\frac{ni}{k(n-r)}}\right)\mathopen{}\mathclose{{}\left(1-\frac{ni}{rk}}\right)}{\mathopen{}\mathclose{{}\left(1+\frac{i+1}{\frac{rk}{n}}}\right)\mathopen{}\mathclose{{}\left(1+n\frac{i+1}{k(n-r)}}\right)}.
The top right term is at least 11/σ1-1/\sigma from iσi\leq\sigma and rk/nσ2rk/n\geq\sigma^{2} so that
δi\displaystyle\delta_{i} (1+nik(nr))(11σ)(1+i+1rkn)(1+ni+1k(nr)).\displaystyle\geq\frac{\mathopen{}\mathclose{{}\left(1+\frac{ni}{k(n-r)}}\right)\mathopen{}\mathclose{{}\left(1-\frac{1}{\sigma}}\right)}{\mathopen{}\mathclose{{}\left(1+\frac{i+1}{\frac{rk}{n}}}\right)\mathopen{}\mathclose{{}\left(1+n\frac{i+1}{k(n-r)}}\right)}.
Similarly, the bottom left term is at most (1+1/σ)(1+1/\sigma) so that
δi\displaystyle\delta_{i} (1+nik(nr))(11σ)(1+1/σ)(1+ni+1k(nr)).\displaystyle\geq\frac{\mathopen{}\mathclose{{}\left(1+\frac{ni}{k(n-r)}}\right)\mathopen{}\mathclose{{}\left(1-\frac{1}{\sigma}}\right)}{\mathopen{}\mathclose{{}\left(1+1/\sigma}\right)\mathopen{}\mathclose{{}\left(1+n\frac{i+1}{k(n-r)}}\right)}.
Now use the fact that 11/σ1+1/σ=σ1σ+113/σ\frac{1-1/\sigma}{1+1/\sigma}=\frac{\sigma-1}{\sigma+1}\geq 1-3/\sigma to get
δi\displaystyle\delta_{i} (13/σ)(1+nik(nr))(1+ni+1k(nr))\displaystyle\geq(1-3/\sigma)\frac{\mathopen{}\mathclose{{}\left(1+\frac{ni}{k(n-r)}}\right)}{\mathopen{}\mathclose{{}\left(1+n\frac{i+1}{k(n-r)}}\right)}
=(13σ)1+nik(nr)1+nik(nr)+nk(nr).\displaystyle=\mathopen{}\mathclose{{}\left(1-\frac{3}{\sigma}}\right)\frac{1+\frac{ni}{k(n-r)}}{1+\frac{ni}{k(n-r)}+\frac{n}{k(n-r)}}.
Divide the denominator by the numerator to obtain
\begin{align*}
\delta_i &\geq \left(1-\frac{3}{\sigma}\right)\frac{1}{1+\frac{n}{k(n-r)}\cdot\frac{1}{1+\frac{ni}{k(n-r)}}}\\
&= \left(1-\frac{3}{\sigma}\right)\frac{1}{1+\frac{n}{k(n-r)+ni}}.
\end{align*}
Because $r\leq 3n/4$, it holds that $n-r\geq r/3$, and because $i\geq 0$ and $\frac{rk}{n}\geq\sigma^{2}\geq\sigma$, we obtain
\begin{align*}
\delta_i &\geq \left(1-\frac{3}{\sigma}\right)\frac{1}{1+\frac{3n}{kr}}\\
&\geq \left(1-\frac{3}{\sigma}\right)\frac{1}{1+3/\sigma}\\
&\geq \left(1-\frac{3}{\sigma}\right)^{2}\\
&\geq 1-\frac{6}{\sigma},
\end{align*}

which is the desired lower bound.

For the upper bound, we bound $\delta_i$ from above directly, starting from the same reformulation as for the lower bound, that is,

\begin{align*}
\delta_i &= \frac{\left(k-\frac{rk}{n}+i\right)\left(r-\frac{rk}{n}-i\right)}{\left(\frac{rk}{n}+i+1\right)\left(n-k-r+\frac{rk}{n}+i+1\right)}\\
&= \frac{\left(1+\frac{ni}{k(n-r)}\right)\left(1-\frac{ni}{r(n-k)}\right)}{\left(1+\frac{i+1}{rk/n}\right)\left(1+\frac{n(i+1)}{(n-k)(n-r)}\right)}.
\end{align*}
Now upper-bound the top right term by 1 and lower-bound the terms in the denominator by 1 to obtain
\[
\delta_i \leq 1+\frac{ni}{k(n-r)}.
\]
Because $r\leq 3n/4$, we have $n-r\geq r/3$, so that
\begin{align*}
\delta_i &\leq 1+\frac{3ni}{kr}\\
&\leq 1+\frac{6}{\sigma}.
\end{align*}

The same calculations work for negative ii as well, which concludes the proof. ∎
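
As a small numerical illustration (outside of the formal argument), the following Python sketch checks, for one arbitrary choice of parameters with $k\leq n/2$ and $r\leq 3n/4$, that consecutive point probabilities of a hypergeometric distribution change by at most a factor of $1\pm 6/\sigma$ in a $2\sigma$-window around the mean. Here we take $\sigma$ to be the standard deviation of $H$, which is one valid choice since $\operatorname{Var}[H]\leq rk/n$; the concrete numbers are assumptions made only for this illustration.

# Numerical sanity check (illustrative only): ratios of consecutive
# hypergeometric point probabilities stay within 1 +- 6/sigma near the mean.
from math import comb, sqrt

def hyp_pmf(m, n, k, r):
    # Pr[H = m] for H ~ Hypergeometric(population n, k special elements, r draws)
    return comb(k, m) * comb(n - k, r - m) / comb(n, r)

n, k, r = 10_000, 3_000, 2_000      # arbitrary parameters with k <= n/2, r <= 3n/4
mu = r * k / n
sigma = sqrt(r * (k / n) * (1 - k / n) * (n - r) / (n - 1))

low, high = 1 - 6 / sigma, 1 + 6 / sigma
for m in range(int(mu - 2 * sigma), int(mu + 2 * sigma)):
    ratio = hyp_pmf(m + 1, n, k, r) / hyp_pmf(m, n, k, r)
    assert low <= ratio <= high, (m, ratio)
print(f"all consecutive ratios lie in [{low:.3f}, {high:.3f}]")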

Proof of Lemma 2

Proof (of Lemma 2)

Recall that $\mu\coloneqq\operatorname{E}[H]=\frac{rk}{n}$ and that $\sigma^{2}\coloneqq\operatorname{Var}[H]=\frac{rk}{n}\left(1-\frac{k}{n}\right)\frac{n-r}{n-1}$.

From the expression for $\sigma^{2}$, it is clear that $\sigma^{2}\leq\frac{rk}{n}$. Because $r\leq 3n/4$ and $k\leq n/2$, it also follows that $\sigma^{2}\geq\frac{rk}{8n}$, so $\sigma^{2}$ is of the order of $rk/n$.

We aim to show two properties: first, that $H$ is concentrated around $\mu$; second, that the point probabilities do not change too quickly within a few standard deviations of $\mu$. The first property follows from Chebyshev's inequality, as

\[
\Pr\bigl[|H-\mu|\leq 2\sigma\bigr]\geq 3/4.
\]

The second property follows from Lemma 6, by which we have that

\[
1-7/\mu\leq\delta_i\leq 1+7/\mu,
\]

where $\delta_i=\frac{\Pr[H=\mu+i+1]}{\Pr[H=\mu+i]}$.

Now notice that if $\mu-2\sigma\leq u<v\leq\mu+2\sigma$, then

\[
\Pr[H=v]=\Pr[H=u]\cdot\prod_{i=u-\mu}^{v-\mu-1}\delta_i.
\]

Hence,

\[
(1-7/\sigma)^{v-u}\cdot\Pr[H=u]\leq\Pr[H=v]\leq(1+7/\sigma)^{v-u}\cdot\Pr[H=u].
\]

Because $v-u\leq 4\sigma$, we obtain

\[
\frac{\mathrm{e}^{-28}}{28}\Pr[H=u]\leq\Pr[H=v]\leq\mathrm{e}^{28}\Pr[H=u].
\]

Because $\Pr[H\in[\mu-2\sigma,\mu+2\sigma]]\geq 3/4$, we have

\[
\max_{i\in[-2\sigma..2\sigma]}\Pr[H=\mu+i]\geq\frac{3}{16\sigma}.
\]

Thus, owing to the comparison between any pair of probabilities in that interval that we just established, we have for all $i\in[-2\sigma..2\sigma]$ that

\[
\Pr[H=\mu+i]\geq\frac{\mathrm{e}^{-28}}{28}\cdot\frac{3}{16\sigma},
\]

which concludes the proof. ∎
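
The qualitative conclusion of Lemma 2 can again be checked numerically. The following sketch (illustrative only, with arbitrarily chosen parameters) verifies that, within two standard deviations of the mean, every point probability of a hypergeometric distribution is at least a constant fraction of $1/\sigma$; the threshold $1/(100\sigma)$ below is simply a convenient constant that these parameters satisfy, not the (much smaller) constant obtained in the proof above.

# Numerical sanity check (illustrative only): hypergeometric point
# probabilities within two standard deviations of the mean are Omega(1/sigma).
from math import comb, sqrt

def hyp_pmf(m, n, k, r):
    return comb(k, m) * comb(n - k, r - m) / comb(n, r)

n, k, r = 10_000, 3_000, 2_000      # arbitrary parameters with k <= n/2, r <= 3n/4
mu = r * k / n
sigma = sqrt(r * (k / n) * (1 - k / n) * (n - r) / (n - 1))

smallest = min(hyp_pmf(m, n, k, r)
               for m in range(int(mu - 2 * sigma), int(mu + 2 * sigma) + 1))
print(f"smallest point probability {smallest:.3e} vs 1/(100 sigma) {1 / (100 * sigma):.3e}")
assert smallest >= 1 / (100 * sigma)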

Proof of Lemma 1

Proof (of Lemma 1)

Let $\Delta$ be the number of 0s that are flipped when going from $u$ to $v$ 0s. Then $v-u+\Delta$ 1s have to be flipped. This happens with probability

\[
(2\Delta+v-u)^{-\beta}\,\frac{\binom{u}{\Delta}\binom{n-u}{v-u+\Delta}}{\binom{n}{2\Delta+v-u}}\eqqcolon(2\Delta+v-u)^{-\beta}\,p_{\Delta}.
\]

The fraction $p_{\Delta}$ is exactly the probability that a hypergeometrically distributed random variable $H_{\Delta}\sim\mathcal{H}(n,u,2\Delta+v-u)$ takes the value $\Delta$.

Notice that $\operatorname{E}[H_{\Delta}]=(2\Delta+v-u)\frac{u}{n}$. Hence, $\operatorname{E}[H_{\Delta}]-\Delta=(v-u)\frac{u}{n}-(1-2u/n)\Delta$. If $\Delta=\frac{1}{1-2u/n}\cdot\frac{(v-u)u}{n}\eqqcolon\Delta_{0}$, then $\operatorname{E}[H_{\Delta}]-\Delta=0$.

We now bound $\Delta_{0}$. To this end, let $s=n/2-u$ and $t=n/2-v$. Since $s-t\leq s$ and $n/2-s\leq n/2$, we have

\[
\Delta_{0}=\frac{(s-t)(n/2-s)}{2s}\leq\frac{n}{4}.
\]

Let $\sigma=\sqrt{\operatorname{Var}[H_{\Delta_{0}}]}$. Notice that

\[
\sigma^{2}=(2\Delta_{0}+v-u)\,\frac{u}{n}\left(1-\frac{u}{n}\right)\frac{n-2\Delta_{0}-(v-u)}{n-1}.
\]

Thus, it follows that $\sigma\leq\sqrt{2\Delta_{0}+v-u}\leq\sqrt{n}$. Hence, we obtain

\begin{align*}
\bigl|\operatorname{E}[H_{\Delta_{0}+i}]-(\Delta_{0}+i)\bigr| &= \bigl|\operatorname{E}[H_{\Delta_{0}}]-\Delta_{0}+2iu/n-i\bigr|\\
&= \bigl|(1-2u/n)\,i\bigr|\\
&\leq |i|.
\end{align*}

Consequently, if $0\leq i\leq\alpha\sigma$, we have $\Delta_{0}+i\leq(1/4+\alpha)n$. For the remainder, we suppose that $\alpha\leq 1/2$, which gives $\Delta_{0}+i\leq 3n/4$. By Lemma 2, with its constant $\gamma$, the probability of going from $u$ to $v$ 0s while flipping exactly $\Delta_{0}+i$ 0s is at least

\[
(2\Delta_{0}+2i+v-u)^{-\beta}\,\frac{\gamma}{\sqrt{\operatorname{Var}[H_{\Delta_{0}+i}]}}.
\]

We now bound the number of bits to flip, $2\Delta_{0}+2i+v-u$. First notice that $i\leq\sigma\leq\sqrt{v-u+\Delta_{0}}\leq v-u+\Delta_{0}$, so that

\begin{align*}
2\Delta_{0}+2i+v-u &\leq 4\Delta_{0}+3(v-u)\\
&= \frac{4(v-u)\frac{u}{n}+3\left(1-2\frac{u}{n}\right)(v-u)}{1-2\frac{u}{n}}\\
&= (v-u)\,\frac{3-2\frac{u}{n}}{1-2\frac{u}{n}}\\
&\leq \frac{3(v-u)}{1-2\frac{u}{n}}.
\end{align*}
Because $u\leq n/4$, we have $2u/n\leq 1/2$, so that $2\Delta_{0}+2i+v-u\leq 6(v-u)\leq 8(v-u)$.

The variance $\operatorname{Var}[H_{\Delta_{0}+i}]$ equals

\[
(2\Delta_{0}+2i+v-u)\,\frac{u}{n}\left(1-\frac{u}{n}\right)\frac{n-2\Delta_{0}-2i-(v-u)}{n-1}.
\]

Note that, viewed as a function of $i$, this expression is at most quadratic, independently of $u$, $v$, $n$, and $N$. This implies that its square root is at most linear in $i$. From this it follows that there exists a constant $\kappa>0$ such that, for all $i\leq\alpha\sigma$, we have $\sqrt{\operatorname{Var}[H_{\Delta_{0}+i}]}\leq\kappa\sigma$. It follows that the probability of getting from $u$ to $v$ while flipping exactly $\Delta_{0}+i$ 0s is at least

\[
\frac{(8(v-u))^{-\beta}\,\gamma}{\kappa\sigma}.
\]

Since the events of flipping exactly $\Delta_{0}+i$ 0s are disjoint for different values of $i$, and since there are at least $\alpha\sigma$ admissible values of $i$, summing these bounds shows that the probability to get from $u$ to $v$ is at least $(8(v-u))^{-\beta}\alpha\gamma/\kappa$. ∎
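
To illustrate the resulting estimate (outside of the proof), the following Python sketch computes the exact probability that a power-law mutation turns a parent with $u$ 0s into an offspring with $v$ 0s by summing the expression from the beginning of the proof over all $\Delta$, and compares it with $(v-u)^{-\beta}$. The normalization $C_{\beta}=\sum_{m=1}^{n}m^{-\beta}$ for the choice of the mutation strength is an assumption made for this illustration and may differ in constants from the operator used in the paper; the parameter values are arbitrary.

# Exact jump probability of a power-law mutation (illustrative only).
from math import comb

def jump_probability(n, u, v, beta):
    # Probability to go from u 0s to v 0s (v > u): sum over the number
    # delta of 0s flipped to 1s; then v - u + delta 1s are flipped to 0s.
    c_beta = sum(m ** (-beta) for m in range(1, n + 1))   # assumed normalization
    total = 0.0
    for delta in range(0, u + 1):
        flips = 2 * delta + (v - u)                       # total mutation strength
        if flips > n or v - u + delta > n - u:
            break
        p_strength = flips ** (-beta) / c_beta            # choose strength m = flips
        p_positions = comb(u, delta) * comb(n - u, v - u + delta) / comb(n, flips)
        total += p_strength * p_positions
    return total

n, beta = 1_000, 1.5
for u, v in [(100, 150), (200, 300), (400, 480)]:
    p = jump_probability(n, u, v, beta)
    print(f"u={u}, v={v}: probability {p:.3e}, scaled by (v-u)^beta: {p * (v - u) ** beta:.4f}")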

Proofs of Section 5

Proof of Theorem 5.1

Proof (of Theorem 5.1)

Let $b\coloneqq\frac{n}{2N}$ denote the maximum Hamming distance of a non-$g$-optimal individual to a $g$-optimum. We first prove the upper bound. For all $i\in[0..N-1]$, we bound, with high probability, the number of iterations $T_1$, counting only the function evaluations of subproblem $g_i$, until for all $j\in[b]$, the mutation with parent $x_i$ has created an individual with $i\frac{n}{N}+j$ 1s. Taking the maximum of $T_1$ over all values of $i$ and multiplying the result by $N+1$ yields an upper bound for covering the left side of all gaps between two $g$-optima. Afterward, we proceed analogously for all $i'\in[N]$ with the number of iterations $T_2$, counting only the function evaluations of subproblem $g_{i'}$, until for all $j\in[b]$, the mutation with parent $x_{i'}$ has created an individual with $i'\frac{n}{N}-j$ 1s. Taking again the maximum over all values of $i'$ and multiplying the result by $N+1$ yields an upper bound for covering the right side of all gaps between two $g$-optima.

We estimate the maximum of the respective runtimes in the same way as in the proof of Lemma 4, that is, we determine a general upper bound $B\in\mathbb{R}_{\geq 0}$ that holds with probability at least $1-q\in(0,1]$ for $T_1$, for every choice of $i$ (and analogously for $T_2$). We do so via the multiplicative drift theorem (Theorem 3.1). Afterward, we take a union bound over all $N$ values of $i$, showing that the maximum over all $T_1$ is at most $B$ with probability at least $1-Nq$. Since the overall runtime for covering the left half of each gap is stochastically dominated by $BN$ times a geometric random variable with success probability $1-Nq$, the expected runtime for this part is at most $\frac{1}{1-Nq}BN$ function evaluations. By linearity of expectation, the overall expected runtime is the sum of the expected runtimes of the two parts.

We start with the left half of each gap. Let $i\in[0..N-1]$, and let $j\in[b]$. In order to create an individual with $i\frac{n}{N}+j$ 1s, it is sufficient to flip $j$ specific 0s of $x_i$ into 1s. Hence, the probability to create such an individual is at least $n^{-j}(1-\frac{1}{n})^{n-j}\geq\mathrm{e}^{-1}n^{-j}\geq\mathrm{e}^{-1}n^{-b}$. Let $(X_t)_{t\in\mathbb{N}}$ be such that, for all $t\in\mathbb{N}$, the value $X_t$ is the number of desired individuals that have not been created yet (that is, the number of values of $j$ that are still missing). Note that $X_{T_1}=0$ and that, for all $t<T_1$, since the events of creating the individuals for different missing values of $j$ are disjoint, it holds that $\operatorname{E}[X_t-X_{t+1}\mid X_t]\geq\mathrm{e}^{-1}n^{-b}X_t$. Thus, by the multiplicative drift theorem (Theorem 3.1), it follows that $\Pr[T_1>\mathrm{e}n^{b}\bigl(\ln(b)+2\ln(N+1)\bigr)]\leq(N+1)^{-2}$. Hence, via a union bound over all $N$ choices of $i$, the maximum of the $T_1$ is at most $\mathrm{e}n^{b}\bigl(\ln(b)+2\ln(N+1)\bigr)$ with probability at least $1-N(N+1)^{-2}\geq 1-(N+1)^{-1}\geq\frac{1}{2}$. Hence, the overall expected runtime of this part is $O(n^{b}N\ln(bN))$.

For the right half of each gap, the arguments are the same, now with $i'\in[N]$ and with $-j$ in place of $+j$.

Overall, noting that $bN\leq n$, the expected runtime for filling each half is thus $O(n^{b}N\log n)$, concluding the proof of the upper bound.

For the lower bound, we only bound the expected time it takes to create an individual $y$ with $|y|_1=b$. To this end, we first bound the probability, denoted by $q$, to create $y$ via a mutation that chooses $x_1$ as parent. Recall that $|x_1|_1=2b$. In order to create $y$, the mutation needs to flip at least $b$ of the $2b$ 1s of $x_1$. Hence, $q\leq n^{-b}\binom{2b}{b}$. Since $\binom{2b}{b}\leq 2^{2b}/\sqrt{b}$, it follows that $q\leq n^{-b}\cdot 2^{2b}/\sqrt{b}\eqqcolon q'$. For the probability to create $y$ when choosing $x_0$ as parent, we also use $q'$ as an upper bound. For all other choices of the parent, the probability to create $y$ is $o(q')$, as at least $\frac{3}{2}b$ bits need to be flipped, and the resulting probability decreases exponentially with base $n$, whereas the binomial coefficients involved only grow exponentially with base $2$. Hence, overall, the probability $p$ to create $y$ within one iteration, that is, within $N+1$ function evaluations, is at most $2q'\bigl(1+o(1)\bigr)$.

The number of iterations until $y$ is created follows a geometric distribution with success probability $p$. Due to $p\leq 2q'\bigl(1+o(1)\bigr)$, it follows that the expected number of iterations to create $y$ is $\Omega(\frac{1}{q'})=\Omega(n^{b}\sqrt{b}\,2^{-2b})$. Multiplying this expression by $N+1$ in order to account for the $N+1$ fitness evaluations in each iteration concludes the proof. ∎
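
The union-bound estimate for the parent $x_1$ can be illustrated numerically. The following sketch (not part of the proof; the small toy values of $n$ and $b$ are assumptions made only for readability) computes the exact probability that standard bit mutation applied to a parent with $2b$ 1s creates an offspring with exactly $b$ 1s and compares it with the bound $\binom{2b}{b}n^{-b}$.

# Exact probability vs. union bound for creating the gap midpoint from x_1
# (illustrative only, toy parameters).
from math import comb

def prob_offspring_has_b_ones(n, b):
    # Parent has 2b 1s; the offspring has exactly b 1s if, for some j >= b,
    # exactly j of the 1s and exactly j - b of the 0s are flipped.
    total = 0.0
    for j in range(b, 2 * b + 1):
        flips = 2 * j - b
        total += (comb(2 * b, j) * comb(n - 2 * b, j - b)
                  * (1 / n) ** flips * (1 - 1 / n) ** (n - flips))
    return total

n, b = 200, 5
exact = prob_offspring_has_b_ones(n, b)
bound = comb(2 * b, b) / n ** b
print(f"exact probability {exact:.3e} <= union bound {bound:.3e}")
assert exact <= bound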

Proof of Theorem 5.2

For the proof of Theorem 5.2, we make use of the following result.

Corollary 4 ([16, Corollary 2.2])

Let $n\in\mathbb{N}_{\geq 1}$, let $(p_i)_{i\in[n]}\in(0,1]^{n}$, and let $(X_i)_{i\in[n]}$ be independent random variables such that, for all $i\in[n]$, the variable $X_i$ follows a geometric distribution with success probability $p_i$. Last, let $Y=\sum_{i\in[n]}X_i$. Then it holds for all $r\in\mathbb{R}_{\geq 1}$ that $\Pr[Y\geq r\operatorname{E}[Y]]\leq r\mathrm{e}^{1-r}$.
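
As a brief illustration of this tail bound (not needed for the proof), the following Monte Carlo sketch compares the empirical tail of a sum of independent geometric random variables with the bound $r\mathrm{e}^{1-r}$; the success probabilities and sample sizes below are arbitrary assumptions for this illustration.

# Monte Carlo illustration of the tail bound for sums of geometric variables.
import numpy as np

rng = np.random.default_rng(42)
probs = np.array([0.05, 0.1, 0.2, 0.4])          # arbitrary success probabilities p_i
samples = rng.geometric(probs, size=(200_000, len(probs))).sum(axis=1)
expectation = (1 / probs).sum()                  # E[Y] for geometrics supported on {1, 2, ...}

for r in (1.5, 2.0, 3.0):
    empirical = np.mean(samples >= r * expectation)
    print(f"r = {r}: empirical tail {empirical:.4f} <= bound {r * np.exp(1 - r):.4f}")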

Proof (of Theorem 5.2)

By symmetry, we only have to bound the time until, for each $u\leq n/2$, an individual with $u$ 0-bits is obtained, as the runtime to obtain the individuals with $u\geq n/2$ 0-bits is the same when exchanging the roles of 0s and 1s. Thus, the total runtime is at most twice the runtime needed to obtain all individuals with at most $n/2$ 0s.

Let $u\leq n/2$. By Lemma 5, an iteration creates an individual with $u$ 0s with probability $\Omega(Nn^{-\beta})$. Hence, the expected number of iterations required to obtain such an individual is $O(n^{\beta}/N)$. Multiplying by $N$, we get that the expected number of function evaluations to obtain it is $O(n^{\beta})$.

For $u\in[0..n]$, let $X_u$ be the random variable denoting the number of function evaluations, counted from the end of the first phase of the analysis, until an individual with $u$ 0s is obtained. By Corollary 4, since $X_u$ is a geometric random variable, we see for all $\lambda\in\mathbb{R}_{\geq 1}$ that

\[
\Pr\bigl[X_u\geq\lambda\log(n)\,n^{\beta}\bigr]=O\!\left(\frac{\log n}{n^{\lambda}}\right).
\]

We aim to show that $\operatorname{E}[\max(X_0,\ldots,X_n)]=O(n^{\beta}\log n)$. First notice that

\[
\operatorname{E}[\max\nolimits_{u}(X_u)]\leq\sum\nolimits_{u}\operatorname{E}[X_u]=O(n^{\beta+1}).
\]

Taking $\lambda=\beta+3$ in the previous tail bound and applying a union bound over the $n+1$ events $\{X_u\geq\lambda\log(n)n^{\beta}\}$, we obtain

\[
\Pr\bigl[\max\nolimits_{u}(X_u)\geq(\beta+3)\log(n)\,n^{\beta}\bigr]=O\!\left(\frac{(n+1)\log n}{n^{\beta+3}}\right)=o\bigl(n^{-\beta-1}\bigr).
\]

Decomposing the random variable $\max_{u}(X_u)$ according to whether $\max_{u}(X_u)\leq(\beta+3)\log(n)n^{\beta}$ holds or not, we obtain

\[
\operatorname{E}[\max\nolimits_{u}(X_u)]\leq(\beta+3)\,n^{\beta}\log n+o\bigl(n^{-\beta-1}\bigr)\operatorname{E}\Bigl[\sum\nolimits_{u}X_u\Bigr]=O(n^{\beta}\log n).
\]

Hence, it follows that the expected number of function evaluations until the whole Pareto front is covered is $O(n^{\beta}\log n)$. ∎
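
The argument above rests on the fact that the maximum of roughly $n$ independent geometric waiting times exceeds a single expected waiting time only by a logarithmic factor. The following small simulation (illustrative only; the values of $n$, $p$, and the number of trials are arbitrary toy assumptions) shows this effect.

# Expected maximum of many geometric waiting times (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 500, 0.01, 2_000
waiting_times = rng.geometric(p, size=(trials, n + 1))   # one waiting time per Pareto-front point
average_max = waiting_times.max(axis=1).mean()
print(f"E[max] ~ {average_max:.0f}, (ln n)/p = {np.log(n) / p:.0f}, single mean 1/p = {1 / p:.0f}")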