
A Crowding Distance That Provably Solves the Difficulties of the NSGA-II in Many-Objective Optimization

Weijie Zheng (School of Computer Science and Technology, International Research Institute for Artificial Intelligence, Harbin Institute of Technology, Shenzhen, China)    Yan Gao (School of Computer Science and Technology, International Research Institute for Artificial Intelligence, Harbin Institute of Technology, Shenzhen, China)    Benjamin Doerr (Laboratoire d’Informatique (LIX), CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France; corresponding author)
Abstract

Recent theoretical works have shown that the NSGA-II can have enormous difficulties solving problems with more than two objectives. In contrast, algorithms like the NSGA-III or SMS-EMOA, differing from the NSGA-II only in the secondary selection criterion, provably perform well in these situations.

To remedy this shortcoming of the NSGA-II, but at the same time keep the advantages of the widely accepted crowding distance, we use the insights of these previous works to define a variant of the crowding distance, called truthful crowding distance. Different from the classic crowding distance, it has for any number of objectives the desirable property that a small crowding distance value indicates that some other solution has a similar objective vector.

Building on this property, we conduct mathematical runtime analyses for the NSGA-II with truthful crowding distance. We show that this algorithm can solve the many-objective versions of the OneMinMax, COCZ, LOTZ, and \textsc{OneJumpZeroJump}_{k} problems in the same (polynomial) asymptotic runtimes as the NSGA-III and the SMS-EMOA. This contrasts with the exponential lower bounds previously shown for the classic NSGA-II. For the bi-objective versions of these problems, our NSGA-II has a performance similar to that of the classic NSGA-II, profiting however from smaller admissible population sizes. For the bi-objective OneMinMax problem, we also observe a (minimally) better performance in approximating the Pareto front.

These results suggest that our truthful version of the NSGA-II has the same good performance as the classic NSGA-II in two objectives, but can resolve the drastic problems in more than two objectives.

1 Introduction

In many practical applications, the problems to be solved have several, often conflicting objectives. Since such problems often do not have a single optimal solution, one resorts to computing a set of diverse good solutions (ideally so-called Pareto optima) and lets a human decision maker take the final decision among these.

One of the most successful algorithms for computing such a set of solutions for a multi-objective optimization problem is the Non-dominated Sorting Genetic Algorithm II (NSGA-II) by Deb et al. [DPAM02], currently cited more than 50,000 times according to Google Scholar.

While it was always known that the performance of this algorithm becomes weaker with increasing numbers of objectives – this was the main motivation for Deb and Jain [DJ14] to propose the NSGA-III –, very recent mathematical analyses of multi-objective evolutionary algorithms (MOEAs) could quantify and obtain a deeper understanding of this shortcoming. In [ZD23b], it was proven that the NSGA-II with any population size linear in the Pareto front size cannot optimize the simplistic OneMinMax benchmark in subexponential time when the number of objectives is at least three (for two objectives, a small polynomial runtime guarantee was proven by Zheng and Doerr [ZD23a]). In contrast, for the NSGA-III and the SMS-EMOA, two algorithms differing from the NSGA-II only in that the crowding distance is replaced by a different secondary selection criterion, polynomial runtime guarantees could be proven for the OneMinMax and several other benchmarks in any (constant) number of objectives [WD23, ODNS24, ZD24b, WD24]. This different optimization behavior suggests that it is the crowding distance which is the root for the problems of the NSGA-II in higher numbers of objectives.

Given that the NSGA-II is the by far dominant MOEA in practice, clearly beating the NSGA-III (cited less than 6,000 times according to Google scholar) and the SMS-EMOA (cited less than 2,200 times), and speculating that practitioners prefer working with a variant of the NSGA-II rather than switching to a different algorithm (and also noting that the NSGA-III and SMS-EMOA have some known shortcomings the NSGA-II does not have), in this work we propose to use the NSGA-II unchanged apart from a mild modification to the crowding distance. This change will again build on insights from Zheng and Doerr [ZD23b]. We defer the technical details to a separate section below and state here only that we call our crowding distance truthful crowding distance since we feel that it better reflects how close a solution is to others.

Given that we mostly build on previous works of mathematical nature and strongly profit from the precision of such results, we analyze the NSGA-II with truthful crowding distance also via mathematical means. Our main results are the following. (i) For the standard many-objective benchmarks mOneMinMax, mCOCZ, mLOTZ, and mOJZJ, the NSGA-II with truthful crowding distance computes the whole Pareto front efficiently, in asymptotically the same time as the NSGA-III or the SMS-EMOA. This demonstrates clearly that it was in fact a weakness of the original crowding distance that led to the drastic problems observed for the NSGA-II in many-objective optimization. (ii) For the bi-objective versions of these benchmarks, for which the NSGA-II was efficient, we show that the NSGA-II with truthful crowding distance is equally efficient, and this already for population sizes equal to the Pareto front size of the problem, whereas the previous results needed a population size larger by a constant factor. (iii) We also regard the problem of approximating the Pareto front when the population size is too small to cover the full Pareto front. Here we show that our NSGA-II with sequential selection, analogously to the sequential version of the classic NSGA-II, computes good approximations to the Pareto front of the bi-objective OneMinMax problem (the approximation quality is minimally better for our algorithm).

In summary, these results show that the NSGA-II with truthful crowding distance overcomes the difficulties of the classic NSGA-II in many-objective optimization, but preserves its good performance in bi-objective optimization.

2 Preliminaries

In this work, we discuss variants of the NSGA-II, the most prominent multi-objective evolutionary algorithm and one of the most successful approaches to solve multi-objective optimization problems.

A multi-objective optimization problem is a tuple f=(f_{1},\dots,f_{m}) of functions defined on a common search space. As common in discrete evolutionary optimization, we always consider the search space \{0,1\}^{n}. We call n the problem size. When using asymptotic notation, this will be with respect to n tending to infinity.

Also, our goal will always be to maximize f. Since the individual objectives f_{i} might be conflicting, we usually do not have a single solution maximizing all f_{i}. In this case, the best we can hope for are solutions that are not strictly dominated by others. We say that a solution x dominates a solution y, written as x\succeq y, if f_{i}(x)\geq f_{i}(y) for all i\in[1..m]. If in addition one of these inequalities is strict, we speak of strict domination, denoted by \succ. The Pareto set of a problem is the set of solutions that are not strictly dominated by another solution; the set of their objective values is called the Pareto front.

A common solution concept for multi-objective problems is to compute a small set S of solutions such that f(S) is the Pareto front or approximates it in some sense. The idea is that a human decision maker, based on preferences not included in the problem formulation, can then select from S the final solution.

For this problem, evolutionary algorithms have been employed with great success [CLvV07, ZQL+11]. The by far dominant algorithm among these multi-objective evolutionary algorithms (MOEAs) is the non-dominated sorting genetic algorithm II (NSGA-II) proposed by Deb et al. [DPAM02]. This algorithm works with a population P, initialized randomly, of fixed size N. Each iteration of the main optimization loop consists of creating N new solutions from these parents (“offspring”) and selecting from the combined parent and offspring population the next population of N individuals.

Various ways of creating the offspring have been used. We shall regard random parent selection (each offspring is generated from randomly chosen parents) and fair parent selection (only with mutation; here, from each parent one offspring is generated via mutation), 1-bit and bit-wise mutation with mutation rate 1/n, and uniform crossover. When using crossover, we assume that there is a positive constant p<1 (the crossover rate) and that in each iteration, with probability p an offspring is created via crossover, else via mutation. Binary tournament parent selection has also been studied, but the existing mathematical results, see, e.g., [ZD23a], suggest that it does not lead to substantially different results, but only to more complicated analyses.
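
For concreteness, the following minimal Python sketch illustrates these variation operators (random parent selection, bit-wise mutation with rate 1/n, one-bit mutation, and uniform crossover applied with a constant rate p<1). It is only an illustration of the operators described above; the function names and the interface are ours and not taken from any reference implementation.

import random

def bitwise_mutation(x, rate=None):
    """Flip each bit independently with probability rate (default 1/n)."""
    rate = 1.0 / len(x) if rate is None else rate
    return [b ^ 1 if random.random() < rate else b for b in x]

def one_bit_mutation(x):
    """Flip exactly one uniformly chosen bit."""
    y = list(x)
    i = random.randrange(len(y))
    y[i] ^= 1
    return y

def uniform_crossover(x, y):
    """Choose each bit independently from one of the two parents."""
    return [a if random.random() < 0.5 else b for a, b in zip(x, y)]

def make_offspring(population, p_crossover=0.0):
    """Random parent selection; with probability p_crossover create the offspring
    via uniform crossover of two random parents, else via bit-wise mutation."""
    if random.random() < p_crossover:
        return uniform_crossover(random.choice(population), random.choice(population))
    return bitwise_mutation(random.choice(population))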

More important and characteristic for the NSGA-II is the selection of the next parent population. The most important selection criterion is non-dominated sorting, that is, the combined parent and offspring population R is partitioned into fronts F_{1},F_{2},\dots such that F_{i} consists of all non-dominated elements (that is, elements not strictly dominated by another one) of R\setminus(F_{1}\cup\dots\cup F_{i-1}). Individuals in an earlier front are preferred in the selection of the next population, that is, for the maximum i^{*} such that F_{1}\cup\dots\cup F_{i^{*}-1} contains fewer than N elements, these fronts all fully go into the next population. The remaining elements are selected from the critical front F_{i^{*}} using a secondary criterion, which is the crowding distance for the NSGA-II. We defer the precise definition of the crowding distance to the subsequent section. We note that the non-dominated-sorting partition is uniquely defined and can be computed in time O(m|R|^{2}). While the crowding distance depends on how ties in the sortings are broken, the crowding distances of all individuals in F_{i^{*}} can be computed very efficiently, in time O(m|F_{i^{*}}|\log|F_{i^{*}}|).
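
To make the partition into fronts concrete, the following Python sketch implements a simple O(m|R|^2) non-dominated sorting for maximization, with individuals represented by their objective vectors. It is a minimal illustration in the spirit of the fast non-dominated sorting of [DPAM02], not the reference implementation.

def dominates(u, v):
    """u dominates v: at least as good in all objectives, strictly better in one (maximization)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def fast_nondominated_sort(objs):
    """Partition the indices 0..len(objs)-1 into fronts F_1, F_2, ... (lists of indices)."""
    n = len(objs)
    dominated_by = [[] for _ in range(n)]   # indices strictly dominated by i
    counter = [0] * n                       # number of individuals strictly dominating i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated_by[i].append(j)
            elif dominates(objs[j], objs[i]):
                counter[i] += 1
        if counter[i] == 0:
            fronts[0].append(i)
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated_by[i]:
                counter[j] -= 1
                if counter[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]                      # drop the trailing empty front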

The pseudocode of the NSGA-II can be found in Algorithm 1, where we note that the presentation is optimized for a uniform treatment of this NSGA-II and the variant with sequential selection to be discussed now.

Noting that the removal of an individual changes the crowding distance of the remaining individuals, Kukkonen and Deb [KD06] proposed to take into account this change, that is, to sequentially remove individuals with smallest crowding distance and update the crowding distance of the remaining individuals. This was shown to give superior results in the empirical study [KD06] and the mathematical analysis [ZD24a].

Algorithm 1 The NSGA-II algorithm in its classic version [DPAM02] and with sequential selection [KD06]. When replacing in either version the original crowding distance with the truthful crowding distance proposed in this work, we obtain our truthful (sequential) NSGA-II-T.
1:Uniformly at random generate the initial population P_{0}=\{x_{1},x_{2},\dots,x_{N}\} with x_{i}\in\{0,1\}^{n}, i=1,2,\dots,N.
2:for t=0,1,2,\dots do
3:   Generate the offspring population Q_{t} with size N
4:   Use non-dominated sorting to divide R_{t}=P_{t}\cup Q_{t} into F_{1},F_{2},\dots
5:   Find i^{*}\geq 1 such that \sum_{i=1}^{i^{*}-1}|F_{i}|<N and \sum_{i=1}^{i^{*}}|F_{i}|\geq N. Let N_{r}:=\sum_{i=1}^{i^{*}}|F_{i}|-N
6:   Compute the crowding distance of each individual in F_{i^{*}}
7:   for d=1,\dots,N_{r} do
8:      Remove one individual in F_{i^{*}} with the smallest crowding distance, chosen at random in case of a tie
9:      For sequential selection only: Update the crowding distances of the individuals affected by the removal
10:   end for
11:   P_{t+1}:=\bigcup_{i=1}^{i^{*}}F_{i}
12:end for
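
As an illustration of lines 4 to 11 of Algorithm 1, the following Python sketch performs the survival selection of one generation. Here crowding_distance is a placeholder for either the original or the truthful crowding distance (both are sketched later in this paper), fast_nondominated_sort is the sketch above, and all names are ours; ties among smallest values are broken arbitrarily rather than at random to keep the sketch short.

def survival_selection(R_objs, N, crowding_distance, sequential=False):
    """Select N indices of R_objs (the combined parent and offspring objective vectors)."""
    fronts = fast_nondominated_sort(R_objs)
    survivors, i = [], 0
    while len(survivors) + len(fronts[i]) < N:          # fronts that fit completely
        survivors.extend(fronts[i])
        i += 1
    critical = list(fronts[i])                          # the critical front F_{i*}
    cd = crowding_distance([R_objs[j] for j in critical])
    while len(survivors) + len(critical) > N:           # remove N_r individuals
        worst = min(range(len(critical)), key=lambda k: cd[k])  # ties broken arbitrarily
        critical.pop(worst)
        if sequential:                                  # sequential selection: recompute
            cd = crowding_distance([R_objs[j] for j in critical])
        else:
            cd.pop(worst)
    return survivors + critical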

3 Classic and Truthful Crowding Distance

In this section, we first describe the original crowding distance used in the NSGA-II of Deb et al. [DPAM02] and compare it with other ways to select a subset of individuals from the critical front of a non-dominated sorting (secondary selection criterion). This comparison motivates the development of a modification of the crowding distance, called truthful crowding distance, done in the second half of this section.

3.1 Original Crowding Distance

When selecting the next population, the NSGA-II, NSGA-III, and SMS-EMOA first perform non-dominated sorting, resulting in a partition F_{1},F_{2},\dots of the combined parent and offspring population into fronts of pair-wise non-dominated individuals. For a suitable number i^{*}, the first i^{*}-1 fronts are all taken into the next population; from F_{i^{*}} a subset is selected according to a secondary criterion.

For the NSGA-II, this secondary criterion is the crowding distance. The crowding distance of an individual x in a set S is the sum, over all objectives, of the normalized distances of the two neighboring objective values. Formally, let m be the number of objectives and let S=\{S_{1},\dots,S_{|S|}\} be the set of individuals. For each i\in[1..m], let S_{i,1},\dots,S_{i,|S|} be the sorted list of S w.r.t. f_{i}. How ties in these sortings are broken has to be specified by the algorithm designer; we do not make any particular assumptions on this issue. For x\in S, we denote by i_{x} its position in the sorted list w.r.t. f_{i}, that is, x=S_{i,i_{x}}. The crowding distance of x then is

\operatorname{cDis}(x)=\begin{cases}+\infty,&\text{if } i_{x}\in\{1,|S|\}\text{ for some }i\in[1..m],\\ \sum_{i=1}^{m}\frac{|f_{i}(S_{i,i_{x}-1})-f_{i}(S_{i,i_{x}+1})|}{|f_{i}(S_{i,1})-f_{i}(S_{i,|S|})|},&\text{otherwise.}\end{cases} \qquad (1)
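
In code, (1) reads roughly as follows (Python, maximization, ties in the sortings broken arbitrarily by the sort); this is a minimal sketch of our own, not the original implementation.

def classic_crowding_distance(objs):
    """Crowding distance (1) of each objective vector in objs (maximization)."""
    n, m = len(objs), len(objs[0])
    cd = [0.0] * n
    for i in range(m):
        order = sorted(range(n), key=lambda j: objs[j][i])        # sorted w.r.t. f_i
        cd[order[0]] = cd[order[-1]] = float('inf')               # boundary individuals
        span = objs[order[-1]][i] - objs[order[0]][i]
        if span == 0:                                             # constant objective: skip
            continue
        for pos in range(1, n - 1):
            j = order[pos]
            if cd[j] != float('inf'):
                cd[j] += (objs[order[pos + 1]][i] - objs[order[pos - 1]][i]) / span
    return cd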

The simple and intuitive definition of the crowding distance puts it ahead of other secondary criteria in several respects. Compared to the hypervolume contribution used by the SMS-EMOA, the crowding distance can be computed very efficiently, namely in time O(m|S|\log|S|), which for m\geq 4 is significantly faster than the best known approach to compute the hypervolume contribution, an algorithm with runtime O(|S|^{m/3}\operatorname{polylog}|S|) from the breakthrough paper [Cha13]; see also the surveys [SIHP20, GFP21].

Compared to the reference point mechanism employed by the NSGA-III, the crowding distance needs no parameters to be set. In contrast, the NSGA-III requires a normalization procedure (for which several proposals exist) and a set of reference points (for which several constructions exist, all having at least the number of reference points as a parameter).

Besides many successful applications in practice, a decent number of mathematical results also show that the NSGA-II with its crowding distance secondary selection criterion is able to compute or approximate the Pareto front of various classic problems [ZLD22, BQ22, DQ23a, DQ23b, DOSS23b, DOSS23a, CDH+23, ZLDD24, ZD24a].

However, the positive results for the NSGA-II are limited to bi-objective problems, and this limitation is intimately connected to the crowding distance. As demonstrated by Zheng and Doerr [ZD23b], the NSGA-II fails to compute the Pareto front of the simple OneMinMax problem once the number of objectives is at least three. The reason deduced in that work is the independent treatment of the objectives in calculating the crowding distance. Subsequent positive results for the NSGA-III [WD23, ODNS24] and the SMS-EMOA [ZD24b, WD24] for three or more objectives support the view that the crowding distance has intrinsic shortcomings.

3.2 Truthful Crowding Distance

Given the undeniable algorithmic advantages of the crowding distance and its high acceptance by practitioners, we now design a simple and efficient variant of the crowding distance that also works well for many objectives.

As pointed out in the example in [ZD23b], with the original crowding distance, points far away from a solution can still cause it to have a small crowding distance. This counter-intuitive and undesired behavior stems from the fully independent consideration of the objectives in the calculation of the crowding distance: in (1), the i-th summand only relies on distances w.r.t. f_{i} and ignores possibly large distances stemming from other objectives f_{j}, j\neq i.

To avoid the undesired influence of points far away on the crowding distance components, but at the same time allow for a highly efficient computation of the crowding distance, we proceed as follows. (i) We replace the (normalized) distance in the i-th objective by the (normalized) L_{1} distance. This avoids that points far away in the objective space lead to low crowding distance values. (ii) We keep the property that the crowding distance is the sum of the crowding distance contributions of the different objectives. This was the key reason why the original crowding distance can be computed very efficiently. (iii) In the computation of the i-th crowding distance contribution, we also keep working with the individuals sorted in order of descending f_{i} value. (iv) Noting that the use of the L_{1} distance might imply that the point closest to some S_{i.j} is not necessarily S_{i.j-1} or S_{i.j+1}, we consider the minimum L_{1} distance among the S_{i.k}, k<j. We note that this renders our crowding distance less symmetric than the original crowding distance, but we could not see a reason to let, in the language of the original crowding distance, |f_{i}(S_{i.j})-f_{i}(S_{i.j-1})| contribute to both the crowding distance of S_{i.j} and that of S_{i.j-1}. In fact, we shall observe that this slightly less symmetric formulation reduces the number of solutions with identical objective vector and positive crowding distance. (v) Finally, we shall assume that the different sortings used sort individuals with identical objective vectors in the same order (correlated tie-breaking). The original crowding distance does not specify how to break such ties, but any stable sorting algorithm will have this property, so this assumption is not very innovative. As observed in [BQ22], this assumption of correlated tie-breaking can reduce the minimum required population size for certain guarantees to hold.

We now give the formal definition of our crowding distance, which we call truthful crowding distance to reflect the fact that it better describes how isolated a solution is. Let S=\{S_{1},\dots,S_{|S|}\} be a set of pair-wise non-dominated individuals. For all i\in[1..m], let S_{i.1},\dots,S_{i.|S|} be a sorted list of S in descending order of f_{i}. Assume correlated tie-breaking, that is, if two individuals have identical objective values, then they appear in all sortings in the same order.

If an individual x appears as the first element of some sorting, that is, x=S_{i.1} for some i\in[1..m], then its truthful crowding distance is \operatorname{tCD}(x):=\infty. Otherwise, its crowding distance shall be the sum \operatorname{tCD}(x)=\sum_{i=1}^{m}\operatorname{tCD}_{i}(x) of the crowding distance contributions \operatorname{tCD}_{i}(x), which we define now.

To this aim, let i\in[1..m] and j\in[2..|S|] such that x=S_{i.j}. For k<j, we define the normalized L_{1} distance by

d(S_{i.k},S_{i.j}):=\sum_{a=1}^{m}\frac{|f_{a}(S_{i.k})-f_{a}(S_{i.j})|}{f_{a}(S_{a.1})-f_{a}(S_{a.|S|})},

where we count summands “0/0” as zero (this happens in the exotic case that in some objective, only a single objective value is present in S). With this distance, the i-th crowding distance contribution \operatorname{tCD}_{i}(x) is defined as the smallest distance between S_{i.j} and a solution at an earlier position in the i-th list:

\operatorname{tCD}_{i}(x):=\min_{k<j}d(S_{i.k},S_{i.j}).

This defines our variant of the crowding distance, called truthful crowding distance. The pseudocode of an algorithm computing it is given in Algorithm 2. As is easy to see, this algorithm has a time complexity quadratic in the size of the set S, more precisely, \Theta(m|S|^{2}). This is more costly than the computation of the original crowding distance, which takes time \Theta(m|S|\log|S|). Since the best known time complexity of non-dominated sorting in the general case is \Theta(m|S|^{2}) and no better runtime can be expected in the general case [YRL+20], this moderate increase in the complexity of computing the crowding distance appears tolerable.

Algorithm 2 Computation of the truthful crowding distance tCD(S)\operatorname{\operatorname{tCD}}(S)

Input: S=\{S_{1},\dots,S_{|S|}\}, a set of individuals
Output: \operatorname{tCD}(S)=(\operatorname{tCD}(S_{1}),\dots,\operatorname{tCD}(S_{|S|})), where \operatorname{tCD}(S_{i}) is the truthful crowding distance of S_{i}

1:\operatorname{tCD}(S):=(0,\dots,0)
2:for each objective f_{i}, i=1,\dots,m do
3:   Sort S in order of descending f_{i} value with correlated tie-breaking: S_{i.1},\dots,S_{i.|S|}
4:end for
5:for each objective f_{i}, i=1,\dots,m do
6:   \operatorname{tCD}(S_{i.1}):=+\infty
7:   for j=2,\dots,|S| do
8:      for k=1,\dots,j-1 do
9:         d(S_{i.k},S_{i.j}):=\sum_{a=1}^{m}\frac{|f_{a}(S_{i.k})-f_{a}(S_{i.j})|}{f_{a}(S_{a.1})-f_{a}(S_{a.|S|})}
10:      end for
11:      \operatorname{tCD}(S_{i.j}):=\operatorname{tCD}(S_{i.j})+\min_{k=1,\dots,j-1}d(S_{i.k},S_{i.j})
12:   end for
13:end for
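
A direct Python transcription of Algorithm 2 might look as follows; correlated tie-breaking is obtained for free from a stable sort, and this sketch (with names chosen by us) is only meant to make the definition concrete.

def truthful_crowding_distance(objs):
    """Truthful crowding distance (Algorithm 2) of each objective vector in objs."""
    n, m = len(objs), len(objs[0])
    orders, spans = [], []
    for i in range(m):
        # descending order of f_i; Python's stable sort yields correlated tie-breaking
        order = sorted(range(n), key=lambda j: -objs[j][i])
        orders.append(order)
        spans.append(objs[order[0]][i] - objs[order[-1]][i])

    def dist(a, b):
        # normalized L_1 distance between objs[a] and objs[b]; summands 0/0 count as zero
        return sum(abs(objs[a][k] - objs[b][k]) / spans[k] if spans[k] != 0 else 0.0
                   for k in range(m))

    tcd = [0.0] * n
    for i in range(m):
        order = orders[i]
        tcd[order[0]] = float('inf')           # first element of the i-th sorting
        for pos in range(1, n):
            j = order[pos]
            if tcd[j] != float('inf'):         # infinite values stay infinite
                tcd[j] += min(dist(order[k], j) for k in range(pos))
    return tcd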

As said, we propose in this work to use the classic NSGA-II or the sequential NSGA-II, but with the original crowding distance replaced by the truthful crowding distance. We call the resulting algorithms truthful (sequential) NSGA-II, abbreviated (sequential) NSGA-II-T.

4 Runtime Analysis: Computing the Pareto Front

Having introduced the truthful crowding distance and the truthful (sequential) NSGA-II, denoted by NSGA-II-T, in this and the subsequent section we will conduct several runtime analyses of this algorithm. The results in this section will in particular show that the NSGA-II-T can efficiently optimize the standard many-objective benchmark problems, in contrast to the exponential runtime of the original NSGA-II on mOneMinMax [ZD23b].

4.1 Not Losing Pareto Front Points

The key ingredient to all proofs in this section is what we show in this subsection (in Theorem 2), namely that the NSGA-II-T with sufficiently large population size cannot lose Pareto optimal solution values (and more generally, can lose solution values only by replacing them with better ones). This is a critical difference to the classic NSGA-II, as shown in [ZD24b].

A step towards proving this important property is the following lemma, which asserts that for each objective vector of the population exactly one individual with this function value has a positive crowding distance.

Lemma 1.

Let m\in\mathbb{N} be the number of objectives of the discussed function f=(f_{1},\dots,f_{m}). Let S be a population of individuals in \{0,1\}^{n}. Assume that we compute the truthful crowding distance \operatorname{tCD}(S) via Algorithm 2. Then for any function value v\in f(S), exactly one individual x\in S with f(x)=v has a positive truthful crowding distance (and the others have a truthful crowding distance of zero).

Proof.

Let S=\{S_{1},\dots,S_{|S|}\}. Let I=\{j\in[1..|S|]\mid f(S_{j})=v\} be the index set of the individuals in S with function value v. For the sorted list S_{i.1},\dots,S_{i.|S|}, let I^{(i)}_{1},\dots,I^{(i)}_{|I|} be the increasing sequence of the positions of the individuals with function value v in this list, that is, \{S_{j}\mid j\in I\}=\{S_{i.I^{(i)}_{k}}\mid k\in[1..|I|]\} and I^{(i)}_{k}<I^{(i)}_{k+1} for all k\in[1..|I|-1]. By the definition of correlated tie-breaking, we know that for all k=1,\dots,|I|, we have S_{1.I^{(1)}_{k}}=\dots=S_{m.I^{(m)}_{k}}.

We first show that S_{1.I^{(1)}_{1}} has a positive crowding distance. If I^{(1)}_{1}=1, then \operatorname{tCD}(S_{1.I^{(1)}_{1}})=+\infty>0. If I^{(1)}_{1}>1, then all individuals S_{1.j} with j<I^{(1)}_{1} have function values different from v. That is, for each j<I^{(1)}_{1} there exists i^{\prime}\in[1..m] such that f_{i^{\prime}}(S_{1.j})\neq f_{i^{\prime}}(S_{1.I^{(1)}_{1}}). Recalling that we regard a normalized version of the L_{1} distance, this implies d(S_{1.j},S_{1.I^{(1)}_{1}})>0 for all j<I^{(1)}_{1}, thus 0<\operatorname{tCD}_{1}(S_{1.I^{(1)}_{1}})\leq\operatorname{tCD}(S_{1.I^{(1)}_{1}}) as desired.

We end the proof by showing that for k>1, we have \operatorname{tCD}(S_{1.I^{(1)}_{k}})=0. To this aim, we observe that for all i\in[1..m], we have S_{i.I^{(i)}_{k}}=S_{1.I^{(1)}_{k}}, this individual and S_{i.I^{(i)}_{1}} have the same f-value, giving d(S_{i.I^{(i)}_{1}},S_{i.I^{(i)}_{k}})=0, and I^{(i)}_{1}<I^{(i)}_{k}. Consequently, \operatorname{tCD}_{i}(S_{i.I^{(i)}_{k}}) is zero. Since this holds for all i=1,\dots,m, we have \operatorname{tCD}(S_{1.I^{(1)}_{k}})=0. ∎

From Lemma 1, we derive our main technical tool asserting that a sufficiently high population size ensures that the NSGA-II-T does not lose desirable solutions.

Theorem 2.

Let m\in\mathbb{N} be the number of objectives for a given f=(f_{1},\dots,f_{m}), and let \overline{M}\in\mathbb{N} be such that any set S of incomparable solutions satisfies |S|\leq\overline{M}. Consider using the (sequential) NSGA-II-T with population size N\geq\overline{M} to optimize f. For any solution x in the combined parent and offspring population R_{t}, in the next and all future generations, there is at least one individual y such that y\succeq x.

We note that for many problems, the maximum size \overline{M} of a set of incomparable solutions is already witnessed by the Pareto front F^{*} (that is, |F^{*}|=\overline{M}). Hence the requirement N\geq\overline{M} of the theorem is needed anyway to ensure that the algorithm can store a population P with f(P)=F^{*}.

Proof of Theorem 2.

We conduct the proof for the more complicated case of the sequential NSGA-II-T; the proof for the NSGA-II-T follows from a subset of the arguments.

Let x\in R_{t} be in the i-th front, that is, x\in F_{i}. If i<i^{*}, from the selection in the NSGA-II-T, we know that x will enter the next generation, and y=x suffices. If i\geq i^{*} and i^{*}>1, then there exists a solution y\in F_{1} such that y\succeq x, and y will enter the next generation. Hence, we only need to discuss the case i=i^{*}=1 in the following.

From Lemma 1, we know that for each function value in f(F_{1}) there is an individual in R_{t} with positive truthful crowding distance. Let y\in R_{t} be the individual with f(y)=f(x) and positive truthful crowding distance. Then y\in F_{1} as well and y\succeq x.

From the definition of \overline{M} and Lemma 1 again (now referring to the assertion that there is at most one individual per objective value with positive truthful crowding distance), we know that before each removal, there are at most \overline{M} individuals in F_{1} with positive truthful crowding distance. Since N\geq\overline{M} individuals of F_{1} survive, this means that in the whole removal procedure, only individuals with zero truthful crowding distance will be removed. By the definition of the truthful crowding distance, a crowding distance of zero means that there is a second individual with the same objective value appearing before the removed individual in all sortings. Hence the removal does not change the truthful crowding distance of any other individual (again by the definition of the truthful crowding distance). Hence, all individuals having initially a positive truthful crowding distance, including y, will survive to the next population. ∎

Corollary 3.

Under the assumptions of Theorem 2, once a solution with a given Pareto optimal solution value is found, such a solution will be contained in the population for all future generations.

4.2 Runtime Results for Many Objectives

We now build on the structural insights on the NSGA-II-T gained in the previous subsection and show that this algorithm can easily optimize the standard benchmarks, roughly with the same efficiency as the global SEMO algorithm, a minimalistic MOEA mostly used in theoretical analyses, and the two classic MOEAs NSGA-III and SMS-EMOA. This in particular shows that the truthful NSGA-II does not face the problems the classic NSGA-II faces when the number of objectives is three or more.

For our analysis, we are lucky that we can heavily build on the work [WD24], in which near-tight runtime guarantees for many-objective optimization were proven. As discussed in [WD24, Section 5], their proofs only rely on two crucial properties: (i) that solution values are never lost except when replaced by better ones, and (ii) that there is a number S such that for any individual x in the population, with probability 1/S this x is selected as parent and an offspring is generated from it via bit-wise mutation with mutation rate \frac{1}{n}.

It is easy to see that these properties are fulfilled for our (sequential) NSGA-II-T when using bit-wise mutation. Property (i) is just the assertion of Theorem 2. Property (ii) follows immediately from the definition of our algorithm: the probability for this event is 1 for fair selection and \frac{1}{1-p}(1-1/N)^{N}=\Theta(1) for random selection with crossover rate p<1. With these considerations, we immediately extend the results of [WD24] to the truthful (sequential) NSGA-II.

Theorem 4.

Consider using the (sequential) NSGA-II-T with population size N\geq M:=(2n/m+1)^{m/2}, fair or random selection, standard bit-wise mutation with mutation rate 1/n, and possibly crossover with rate less than one in the case of random selection, to optimize the mOneMinMax or mCOCZ benchmark. Then in an expected number of O(nm)+O(m^{2}\ln n) iterations, the full Pareto front of the mOneMinMax or mCOCZ benchmark is covered by the population.
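
As a quick sanity check on this population size requirement, consider for instance m=4 objectives and n=40. Then

N \geq M=\left(\frac{2\cdot 40}{4}+1\right)^{4/2}=21^{2}=441,

so already for this moderate problem size the minimum admissible population size, which for mOneMinMax equals the Pareto front size (see Section 6), is 441.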

Theorem 5.

Consider using the (sequential) NSGA-II-T with population size N\geq M:=(2n/m+1)^{m/2}, fair or random selection, standard bit-wise mutation with mutation rate 1/n, and possibly crossover with rate less than one in the case of random selection, to optimize the mLOTZ benchmark. Then in an expected number of O(n^{2}/m)+O(mn\ln(n/m))+O(n\ln n) iterations, the full Pareto front of the mLOTZ benchmark is covered.

Theorem 6.

Let k\in[2..n/m]. Let M=(2n/m-2k+3)^{m/2}. Consider using the (sequential) NSGA-II-T with population size N\geq M, fair or random selection, standard bit-wise mutation with mutation rate 1/n, and possibly crossover with rate less than one in the case of random selection, to optimize m\textsc{OJZJ}_{k}. Then in an expected number of O(mn^{k}) iterations, the full Pareto front of the m\textsc{OJZJ}_{k} benchmark is covered.

We have not defined the benchmark problems regarded in the above results, both because they are the most common benchmarks in the theory of MOEAs and because our proofs do not directly refer to them (all problem-specific arguments are taken from [WD24]). The reader interested in the definitions can find them all in [WD24].

4.3 Runtime Results for Two Objectives

We now turn to the bi-objective versions of the benchmarks studied above and the DLTB benchmark. Here the classic NSGA-II was shown to be efficient in previous work [ZLD22, ZD23a, BQ22, DQ23a, ZLDD24]. Using the same arguments as in the previous section, we show the following results for the (sequential) NSGA-II-T. In the runtimes, they agree with the known asymptotic results for the classic NSGA-II. However, the minimum required population size (which has a direct influence on the cost of one iteration) is smaller by a factor of two or four than in the previous works. Since it is equal to the size of the Pareto front, it is clear that even smaller population sizes cannot be employed.
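
As a concrete example of such a bi-objective benchmark, OneMinMax asks to simultaneously maximize the number of ones and the number of zeros of a bit string, so that every solution is Pareto optimal and the Pareto front has size n+1. In code, using the convention of Section 5 that 1^n maximizes f_1, it could be written as follows (again our own sketch; the formal definitions of all benchmarks are in the references above).

def one_min_max(x):
    """Bi-objective OneMinMax: f_1 counts the ones, f_2 counts the zeros (both maximized)."""
    ones = sum(x)
    return (ones, len(x) - ones)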

Theorem 7.

Consider using the (sequential) NSGA-II-T with population size N\geq n+1, fair or random selection, standard bit-wise mutation with mutation rate 1/n, and possibly crossover with rate less than one in the case of random selection, to optimize OneMinMax or COCZ. Then in an expected number of O(n\log n) iterations, the full Pareto front of OneMinMax or COCZ is covered.

Theorem 8.

Consider using the (sequential) NSGA-II-T with population size N\geq n+1, fair or random selection, standard bit-wise mutation with mutation rate 1/n, and possibly crossover with rate less than one in the case of random selection, to optimize LOTZ. Then in an expected number of O(n^{2}) iterations, the full Pareto front of LOTZ is covered.

Theorem 9.

Let k\in[1..n/2]. Consider using the (sequential) NSGA-II-T with population size N\geq n-2k+3, fair or random selection, standard bit-wise mutation with mutation rate 1/n, and possibly crossover with rate less than one in the case of random selection, to optimize \textsc{OneJumpZeroJump}_{k}. Then in an expected number of O(n^{k}) iterations, the full Pareto front of \textsc{OneJumpZeroJump}_{k} is covered.

Theorem 10.

Consider using the (sequential) NSGA-II-T with population size N\geq n+1, fair or random selection, standard bit-wise mutation with mutation rate 1/n, and possibly crossover with rate less than one in the case of random selection, to optimize DLTB. Then in an expected number of O(n^{3}) iterations, the full Pareto front of DLTB is covered.

5 Approximation Ability and Runtime

In Section 4, we proved that the standard and sequential NSGA-II-T can efficiently optimize the many-objective mOJZJ, mOneMinMax, mCOCZ, and mLOTZ benchmarks as well as the popular bi-objective OneJumpZeroJump, OneMinMax, COCZ, LOTZ, and DLTB benchmarks. In this section, we will consider the approximation ability when the population size is too small to cover the full Pareto front. We will prove that the sequential NSGA-II-T has a slightly better approximation performance than the sequential NSGA-II and the steady-state NSGA-II for OneMinMax [ZD24a].

We note that there are no proven approximation guarantees for non-sequential variants of the NSGA-II (except for the steady-state version) so far, and the mathematical results in [ZD24a] suggest that such results might be difficult to obtain. For that reason, we do not aim at such results for the truthful NSGA-II. We also note that so far there is no theoretical study on the approximation ability of the NSGA-II other than for the (bi-objective) OneMinMax benchmark [ZD24a]. We shall therefore also only consider this problem. We expect that results for larger numbers of objectives or other benchmarks need considerably new methods, as already the approximation measure MEI may not be suitable then.

The following lemma gives a useful criterion for individuals surviving into the next generation.

Lemma 11.

Consider using the sequential NSGA-II-T with population size N\geq 2 to optimize OneMinMax with problem size n. Assume that the two extreme points 0^{n} and 1^{n} are in the population P_{t_{0}}. Then for any generation t\geq t_{0}, in Steps 7 to 10 of Algorithm 1, any individual with truthful crowding distance more than \frac{4}{N-1} (including the two extreme points) will survive to P_{t+1}.

Proof.

Consider some iteration t\geq t_{0}. Let R denote the combined parent and offspring population. We recall that P_{t+1} is constructed from R by sequentially removing individuals with the smallest current \operatorname{tCD}-value. By definition, the removal of an individual will not decrease the truthful crowding distance of the remaining individuals. In particular, individuals that initially have an infinite truthful crowding distance or a crowding distance of at least \frac{4}{N-1} will keep this property throughout this iteration.

It is not difficult to see that there is exactly one copy of 0^{n} and one copy of 1^{n} with infinite truthful crowding distance. Since N\geq 2, both individuals will be kept in the next and all future generations.

Now consider R at some stage of the sequential selection process towards P_{t+1}, that is, with some individuals already removed. Let r:=|R| and let s_{1}^{1},\dots,s_{r}^{1} and s_{1}^{2},\dots,s_{r}^{2} be the two lists representing R w.r.t. decreasing values of f_{1} and f_{2}, respectively. Let j_{1}\in[1..r] be the position of s_{1}^{1} in the sorted list w.r.t. f_{2}, that is, s^{2}_{j_{1}}=s^{1}_{1}. Likewise, let i_{1}\in[1..r] be the position of s_{1}^{2} in the sorted list w.r.t. f_{1}, that is, s^{1}_{i_{1}}=s^{2}_{1}. For any x\in R, there are unique i,j\in[1..r] such that x=s_{i}^{1}=s_{j}^{2}. Since 0^{n} and 1^{n} are in R, we have (s_{1}^{1},s_{r}^{1})=(1^{n},0^{n}) and (s_{1}^{2},s_{r}^{2})=(0^{n},1^{n}), and for x with i\in[2..r]\setminus\{i_{1}\} and j\in[2..r]\setminus\{j_{1}\}, we have

\begin{align*}\operatorname{tCD}(x)={}&\left(\frac{f_{1}(s_{i-1}^{1})-f_{1}(s_{i}^{1})}{n}+\frac{f_{2}(s_{i}^{1})-f_{2}(s_{i-1}^{1})}{n}\right)\\ &+\left(\frac{f_{1}(s_{j}^{2})-f_{1}(s_{j-1}^{2})}{n}+\frac{f_{2}(s_{j-1}^{2})-f_{2}(s_{j}^{2})}{n}\right).\end{align*}

Noting that s_{1}^{1}\neq s_{1}^{2} since s_{1}^{1}=1^{n} and s_{1}^{2}=0^{n}, we compute

\begin{align*}\sum_{x\in R\setminus\{s_{1}^{1},s_{1}^{2}\}}\operatorname{tCD}(x)={}&\sum_{i\in[2..r]\setminus\{i_{1}\}}\left(\frac{f_{1}(s_{i-1}^{1})-f_{1}(s_{i}^{1})}{n}+\frac{f_{2}(s_{i}^{1})-f_{2}(s_{i-1}^{1})}{n}\right)\\ &+\sum_{j\in[2..r]\setminus\{j_{1}\}}\left(\frac{f_{1}(s_{j}^{2})-f_{1}(s_{j-1}^{2})}{n}+\frac{f_{2}(s_{j-1}^{2})-f_{2}(s_{j}^{2})}{n}\right)\\ ={}&\left(\sum_{i=2}^{i_{1}-1}+\sum_{i=i_{1}+1}^{r}\right)\left(\frac{f_{1}(s_{i-1}^{1})-f_{1}(s_{i}^{1})}{n}+\frac{f_{2}(s_{i}^{1})-f_{2}(s_{i-1}^{1})}{n}\right)\\ &+\left(\sum_{j=2}^{j_{1}-1}+\sum_{j=j_{1}+1}^{r}\right)\left(\frac{f_{1}(s_{j}^{2})-f_{1}(s_{j-1}^{2})}{n}+\frac{f_{2}(s_{j-1}^{2})-f_{2}(s_{j}^{2})}{n}\right)\\ ={}&\frac{(f_{1}(s_{1}^{1})-f_{1}(s_{i_{1}-1}^{1}))+(f_{2}(s_{i_{1}-1}^{1})-f_{2}(s_{1}^{1}))}{n}\\ &+\frac{(f_{1}(s_{i_{1}}^{1})-f_{1}(s_{r}^{1}))+(f_{2}(s_{r}^{1})-f_{2}(s_{i_{1}}^{1}))}{n}\\ &+\frac{(f_{1}(s_{j_{1}-1}^{2})-f_{1}(s_{1}^{2}))+(f_{2}(s_{1}^{2})-f_{2}(s_{j_{1}-1}^{2}))}{n}\\ &+\frac{(f_{1}(s_{r}^{2})-f_{1}(s_{j_{1}}^{2}))+(f_{2}(s_{j_{1}}^{2})-f_{2}(s_{r}^{2}))}{n}\\ ={}&\frac{f_{1}(s_{1}^{1})-f_{1}(s_{r}^{1})+f_{1}(s_{i_{1}}^{1})-f_{1}(s_{i_{1}-1}^{1})}{n}\\ &+\frac{f_{2}(s_{r}^{1})-f_{2}(s_{1}^{1})+f_{2}(s_{i_{1}-1}^{1})-f_{2}(s_{i_{1}}^{1})}{n}\\ &+\frac{f_{1}(s_{r}^{2})-f_{1}(s_{1}^{2})+f_{1}(s_{j_{1}-1}^{2})-f_{1}(s_{j_{1}}^{2})}{n}\\ &+\frac{f_{2}(s_{1}^{2})-f_{2}(s_{r}^{2})+f_{2}(s_{j_{1}}^{2})-f_{2}(s_{j_{1}-1}^{2})}{n}\\ \leq{}&\frac{f_{1}(s_{1}^{1})-f_{1}(s_{r}^{1})+f_{2}(s_{r}^{1})-f_{2}(s_{1}^{1})}{n}\\ &+\frac{f_{1}(s_{r}^{2})-f_{1}(s_{1}^{2})+f_{2}(s_{1}^{2})-f_{2}(s_{r}^{2})}{n}\\ ={}&4,\end{align*}

where the last inequality uses f_{1}(s_{i_{1}-1}^{1})\geq f_{1}(s_{i_{1}}^{1}) and f_{2}(s_{j_{1}-1}^{2})\geq f_{2}(s_{j_{1}}^{2}) due to the sorted lists, and further f_{2}(s_{i_{1}-1}^{1})\leq f_{2}(s_{i_{1}}^{1}) and f_{1}(s_{j_{1}-1}^{2})\leq f_{1}(s_{j_{1}}^{2}) since in a bi-objective incomparable set, any sorting with respect to the first objective is a sorting in inverse order for the second objective. Since |R\setminus\{s_{1}^{1},s_{1}^{2}\}|\geq N-1, we know that at least one of the individuals in R\setminus\{s_{1}^{1},s_{1}^{2}\} has \operatorname{tCD}\leq\frac{4}{N-1}, and thus any individual with \operatorname{tCD}>\frac{4}{N-1} will not be removed. ∎

The following lemma shows that once the two extreme points are in the population, a linear runtime suffices to obtain a good approximation of the Pareto front of OneMinMax.

Lemma 12.

Consider using the sequential NSGA-II-T with population size N\geq 2, fair or random parent selection, one-bit mutation or standard bit-wise mutation, crossover with constant rate below 1 or no crossover, to optimize OneMinMax with problem size n. Let L:=\max\{\frac{2n}{N-1},1\}. Assume that the two extreme points 0^{n} and 1^{n} are in the population for the first time at some generation t_{0}. Then after O(n) more iterations (both in expectation and with high probability), the population has an MEI value of at most L. It remains in this state for all future generations.

Proof sketch.

We use a similar argument as in the proof of [ZD24a, Lemma 14] and only show the differences here. Let i\in[0..n-1] and t\geq t_{0}. Let X_{t} and Y_{t} be the lengths of the empty intervals containing i+0.5 in f(P_{t}) and in f(R_{t}), respectively. We first show that if Y_{t}\leq M for some M\geq L, then X_{\tau}\leq M for all \tau>t. If not, since Y_{t}\leq X_{t}, w.l.o.g., we assume that X_{t+1}>M. Let x be the individual whose removal lets the length of the empty interval containing i+0.5 increase from a value of at most M to a value larger than M. Then before the removal, f_{1}(x) must be one of the end points of the empty interval containing i+0.5, w.l.o.g. the left end point (the smaller f_{1} value). We also know that the empty interval containing i+0.5 after the removal of x has length equal to the truthful crowding distance of x multiplied by n/2. Hence, we know that

\operatorname{tCD}(x)>\frac{2M}{n}\geq\frac{2L}{n}=\frac{4}{N-1},

which contradicts our insight from Lemma 11 that an x with \operatorname{tCD}>4/(N-1) cannot be removed.

The remaining argument about the first time the population has \textsc{MEI}\leq L is exactly the same as in the proof of [ZD24a, Lemma 14], except for the case with crossover. Since crossover is used with a constant probability less than one, a constant fraction of the iterations use mutation only. The arguments analyzing the selection are independent of the variation operator (so in particular, the empty interval lengths are non-increasing when at least L). Consequently, by simply ignoring a possible profit from crossover iterations, we obtain the same runtime guarantee as in the mutation-only case. ∎

Noting that the maximal function values of f_{1}(P_{t}) and f_{2}(P_{t}) cannot decrease (there always is one individual witnessing this value and having infinite truthful crowding distance), we easily obtain that in an expected number of O(n\log n) iterations both extreme points 0^{n} and 1^{n} are reached for the first time, and they then remain in the population for all future iterations. This can be shown with a proof analogous to the one of [ZD24a, Lemma 15]. Therefore, we have the following main result on the approximation ability and runtime of the sequential NSGA-II-T.

Theorem 13.

Consider using the sequential NSGA-II-T with population size N\geq 2, fair or random parent selection, one-bit mutation or standard bit-wise mutation, crossover with constant rate below 1 or no crossover, to optimize OneMinMax with problem size n. Then after an expected number of O(n\log n) iterations, the population contains 0^{n} and 1^{n} and satisfies \textsc{MEI}\leq\max\{\frac{2n}{N-1},1\}. After that, these conditions will be kept for all future iterations.

Note that the best possible value for the MEI is \lceil\frac{n}{N-1}\rceil [ZD24a]. Theorem 13 shows that the sequential NSGA-II-T reaches an MEI value that exceeds this optimal value by at most a factor of two. We also note that for the NSGA-II with sequential survival selection using the classic crowding distance, an approximation guarantee of \textsc{MEI}\leq\max\{\frac{2n}{N-3},1\} (also within O(n\log n) iterations) was shown in [ZD24a]. The slightly better approximation ability shown above (within the same runtime), as the proof of Lemma 12 shows, stems from the fact that our definition of the crowding distance admits at most one individual with infinite crowding distance contribution per objective, whereas the classic crowding distance admits two.
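
We do not restate the formal definition of the MEI from [ZD24a] here; for OneMinMax, the proof of Lemma 12 suggests reading it as the largest length of an empty interval between covered f_1-values. Under this reading, which is our interpretation and not a definition taken from [ZD24a], it could be computed as follows.

def mei_one_min_max(f1_values, n):
    """Largest empty interval between consecutive covered f_1-values on {0, ..., n}
    (our own reading of the MEI of [ZD24a] for OneMinMax)."""
    points = sorted(set([0, n] + list(f1_values)))
    return max((b - a for a, b in zip(points, points[1:])), default=0)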

6 Experiments

To complement our theoretical findings, we now show a few experimental results. These are meant as illustrations of our main (mathematical) results, not as substantial stand-alone results.

Computing the full Pareto front, many objectives: Our main theoretical result was a proof that the NSGA-II-T can efficiently solve many-objective problems, in contrast to the classic NSGA-II, for which an exponential lower bound was shown for the OneMinMax problem. To illustrate how the NSGA-II-T solves many-objective problems, we regard the 4-objective OneMinMax problem. That the NSGA-II cannot solve this problem efficiently was shown, also experimentally, in [ZD23b]. We hence study only the (sequential) NSGA-II-T, using random selection, bit-wise mutation, no crossover, and the minimal possible population size N=M (the Pareto front size) as well as N=2M. We also use the GSEMO toy algorithm. In Figure 1, we display the median (over 20 runs) number of function evaluations these algorithms took to cover the full Pareto front of 4-OneMinMax for different problem sizes n.

Figure 1: Median number (with 1st and 3rd quartiles, in 20 runs) of function evaluations to compute the full Pareto front of the 4-objective OneMinMax problem.

We observe that all algorithms efficiently find the full Pareto front, in drastic contrast to the results for the classic NSGA-II in [ZD23b, Figures 1 and 2]. Not surprisingly, a larger population size is not helpful, which shows that it is good that the NSGA-II-T admits smaller population sizes than the classic NSGA-II. Also not surprisingly, the sequential versions give slightly better results. The minimal inferiority of the (sequential) NSGA-II-T (with N=M) to the toy GSEMO does not mean a lot given that the latter algorithm is rarely used in practice.

Computing the full Pareto front, two objectives: We conducted analogous experiments for two objectives, where a comparison with the NSGA-II is interesting. The proven guarantees for the NSGA-II require a population size of N\geq 4M, with which this algorithm is clearly slower than all others regarded by us. We therefore did some preliminary experiments showing that already for N=1.5M the NSGA-II is consistently able to solve our problem instances. The results for this NSGA-II, the NSGA-II-T with the optimal population size N=M and with N=1.5M, and the GSEMO are shown in Figure 2. With this optimized population size for the NSGA-II, all algorithms show a roughly similar performance on the 2-objective OneMinMax problem, with the NSGA-II slightly ahead.

Figure 2: The number of function evaluations to cover the full Pareto front for OneMinMax.
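
To convey the flavor of this setup in code, the following skeleton runs the NSGA-II-T on the bi-objective OneMinMax until the full Pareto front is covered and returns the number of function evaluations. It reuses the helper sketches from Sections 2, 3, and 4.3; all function names are ours, and this is an illustration of the setup, not the code used for the figures.

import random

def run_nsga2t_on_oneminmax(n, N, sequential=True, seed=0):
    """Function evaluations until f(P) covers the full Pareto front of OneMinMax."""
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(N)]
    evals = N
    while len({one_min_max(x)[0] for x in pop}) < n + 1:
        offspring = [make_offspring(pop) for _ in range(N)]
        evals += N
        combined = pop + offspring
        objs = [one_min_max(x) for x in combined]
        keep = survival_selection(objs, N, truthful_crowding_distance, sequential)
        pop = [combined[j] for j in keep]
    return evals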

Approximation results: To analyze how well the different NSGA-II variants with small population size approximate the Pareto front, we conduct the following experiments. Note that the GSEMO cannot be used for computing such approximations and is therefore not included. Following the experimental settings in the only previous theoretical work [ZD24a] on the approximation topic, we regard the bi-objective OneMinMax problem with problem size n=601. We use the same algorithms as above (except for the GSEMO), with population sizes N=\lceil(n+1)/2\rceil(=301), \lceil(n+1)/4\rceil(=151), and \lceil(n+1)/8\rceil(=76). As before, we measure the approximation quality via the MEI. We note that the best possible MEI values are 3, 5, and 9 for N=301,151,76, respectively.

As in [ZD24a], we regard the approximation quality in two time intervals, namely in iterations [1..100] and [3001..3100] after the two extreme points of the Pareto front have entered the population.

Figure 3: The MEI for generations [1..100] and [3001..3100] after the two extreme points were found (one run).

Figure 3 shows the MEI values for the different algorithms in a single run, for reasons of space only for N=151 (the other population sizes gave a similar picture). We clearly see a much better performance of the sequential algorithms, with no significant differences between the classic and the truthful sequential NSGA-II.

7 Conclusion and Future Work

To overcome the difficulties the NSGA-II was found to have in many-objective optimization, we used the insights from several previous theoretical works, most notably [ZD23b], to design a truthful crowding distance for the NSGA-II. Different from the original crowding distance, this new measure has the natural and desirable property that solutions with objective vector far from all others receive a large crowding distance value. The truthful crowding distances are slightly more complex to compute, but asymptotically not more complex than the non-dominated sorting step of the NSGA-II.

Via mathematical runtime analyses on several classical benchmark problems, we prove that the NSGA-II with the truthful crowding distance is indeed effective in more than two objectives, admitting the same performance guarantees as previously shown for the harder-to-use NSGA-III and the SMS-EMOA, which is computationally demanding due to the use of the hypervolume contribution. For the bi-objective benchmarks, for which the classic NSGA-II has been analyzed, we prove the same runtime guarantees (however, only requiring the population size to be at least the size of the Pareto front). Similarly, the truthful NSGA-II admits essentially the same (that is, minimally stronger) approximation guarantees as previously shown for the classic NSGA-II.

Consequently, the NSGA-II with truthful crowding distance overcomes the difficulties of the classic NSGA-II in many-objective optimization, without any disadvantages that we could observe in two objectives, where the classic NSGA-II has shown a very good performance.

References

  • [BQ22] Chao Bian and Chao Qian. Better running time of the non-dominated sorting genetic algorithm II (NSGA-II) by using stochastic tournament selection. In Parallel Problem Solving From Nature, PPSN 2022, pages 428–441. Springer, 2022.
  • [CDH+23] Sacha Cerf, Benjamin Doerr, Benjamin Hebras, Jakob Kahane, and Simon Wietheger. The first proven performance guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) on a combinatorial optimization problem. In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5522–5530. ijcai.org, 2023.
  • [Cha13] Timothy M. Chan. Klee’s measure problem made easy. In IEEE Symposium on Foundations of Computer Science, FOCS 2013, pages 410–419. IEEE Computer Society, 2013.
  • [CLvV07] Carlos Artemio Coello Coello, Gary B. Lamont, and David A. van Veldhuizen. Evolutionary Algorithms for Solving Multi-Objective Problems. Springer, 2nd edition, 2007.
  • [DJ14] Kalyanmoy Deb and Himanshu Jain. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints. IEEE Transactions on Evolutionary Computation, 18:577–601, 2014.
  • [DOSS23a] Duc-Cuong Dang, Andre Opris, Bahare Salehi, and Dirk Sudholt. Analysing the robustness of NSGA-II under noise. In Genetic and Evolutionary Computation Conference, GECCO 2023, pages 642–651. ACM, 2023.
  • [DOSS23b] Duc-Cuong Dang, Andre Opris, Bahare Salehi, and Dirk Sudholt. A proof that using crossover can guarantee exponential speed-ups in evolutionary multi-objective optimisation. In Conference on Artificial Intelligence, AAAI 2023, pages 12390–12398. AAAI Press, 2023.
  • [DPAM02] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6:182–197, 2002.
  • [DQ23a] Benjamin Doerr and Zhongdi Qu. A first runtime analysis of the NSGA-II on a multimodal problem. IEEE Transactions on Evolutionary Computation, 27:1288–1297, 2023.
  • [DQ23b] Benjamin Doerr and Zhongdi Qu. Runtime analysis for the NSGA-II: provable speed-ups from crossover. In Conference on Artificial Intelligence, AAAI 2023, pages 12399–12407. AAAI Press, 2023.
  • [GFP21] Andreia P. Guerreiro, Carlos M. Fonseca, and Luís Paquete. The hypervolume indicator: Computational problems and algorithms. ACM Computing Surveys (CSUR), 54:1–42, 2021.
  • [KD06] Saku Kukkonen and Kalyanmoy Deb. Improved pruning of non-dominated solutions based on crowding distance for bi-objective optimization problems. In Conference on Evolutionary Computation, CEC 2006, pages 1179–1186. IEEE, 2006.
  • [ODNS24] Andre Opris, Duc Cuong Dang, Frank Neumann, and Dirk Sudholt. Runtime analyses of NSGA-III on many-objective problems. In Genetic and Evolutionary Computation Conference, GECCO 2024, pages 1596–1604. ACM, 2024.
  • [SIHP20] Ke Shang, Hisao Ishibuchi, Linjun He, and Lie Meng Pang. A survey on the hypervolume indicator in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation, 25:1–20, 2020.
  • [WD23] Simon Wietheger and Benjamin Doerr. A mathematical runtime analysis of the Non-dominated Sorting Genetic Algorithm III (NSGA-III). In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5657–5665. ijcai.org, 2023.
  • [WD24] Simon Wietheger and Benjamin Doerr. Near-tight runtime guarantees for many-objective evolutionary algorithms. In Parallel Problem Solving From Nature, PPSN 2024. Springer, 2024. To appear. Preprint at https://arxiv.org/abs/2404.12746.
  • [YRL+20] Sorrachai Yingchareonthawornchai, Proteek Chandan Roy, Bundit Laekhanukit, Eric Torng, and Kalyanmoy Deb. Worst-case conditional hardness and fast algorithms with random inputs for non-dominated sorting. In Genetic and Evolutionary Computation Conference, GECCO 2020, Companion Volume, pages 185–186. ACM, 2020.
  • [ZD23a] Weijie Zheng and Benjamin Doerr. Mathematical runtime analysis for the non-dominated sorting genetic algorithm II (NSGA-II). Artificial Intelligence, 325:104016, 2023.
  • [ZD23b] Weijie Zheng and Benjamin Doerr. Runtime analysis for the NSGA-II: proving, quantifying, and explaining the inefficiency for many objectives. IEEE Transactions on Evolutionary Computation, 2023. In press, https://doi.org/10.1109/TEVC.2023.3320278.
  • [ZD24a] Weijie Zheng and Benjamin Doerr. Approximation guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). IEEE Transactions on Evolutionary Computation, 2024. In press, https://doi.org/10.1109/TEVC.2024.3402996.
  • [ZD24b] Weijie Zheng and Benjamin Doerr. Runtime analysis of the SMS-EMOA for many-objective optimization. In Conference on Artificial Intelligence, AAAI 2024, pages 20874–20882. AAAI Press, 2024.
  • [ZLD22] Weijie Zheng, Yufei Liu, and Benjamin Doerr. A first mathematical runtime analysis of the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). In Conference on Artificial Intelligence, AAAI 2022, pages 10408–10416. AAAI Press, 2022.
  • [ZLDD24] Weijie Zheng, Mingfeng Li, Renzhong Deng, and Benjamin Doerr. How to use the Metropolis algorithm for multi-objective optimization? In Conference on Artificial Intelligence, AAAI 2024, pages 20883–20891. AAAI Press, 2024.
  • [ZQL+11] Aimin Zhou, Bo-Yang Qu, Hui Li, Shi-Zheng Zhao, Ponnuthurai Nagaratnam Suganthan, and Qingfu Zhang. Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm and Evolutionary Computation, 1:32–49, 2011.