
CMA-ES for Safe Optimization

Kento Uchida (uchida-kento-fz@ynu.ac.jp, ORCID 0000-0002-4179-6020), Yokohama National University, Yokohama, Kanagawa, Japan 240-8501; Ryoki Hamano (hamano_ryoki_xa@cyberagent.co.jp, ORCID 0000-0002-4425-1683), CyberAgent, Inc. and Yokohama National University, Shibuya, Tokyo, Japan 150-0042; Masahiro Nomura (nomura_masahiro@cyberagent.co.jp, ORCID 0000-0002-4945-5984), CyberAgent, Inc., Shibuya, Tokyo, Japan 150-0042; Shota Saito (saito-shota-bt@ynu.jp, ORCID 0000-0002-9863-6765), Yokohama National University and SKILLUP NeXt Ltd., Yokohama, Kanagawa, Japan 240-8501; and Shinichi Shirakawa (shirakawa-shinichi-bg@ynu.ac.jp, ORCID 0000-0002-4659-6108), Yokohama National University, Yokohama, Kanagawa, Japan 240-8501
(2024)
Abstract.

In several real-world applications in medical and control engineering, there are unsafe solutions whose evaluations involve inherent risk. This optimization setting is known as safe optimization and is formulated as a specialized type of constrained optimization problem with constraints for safety functions. Safe optimization requires performing efficient optimization without evaluating unsafe solutions. A few studies have proposed optimization methods for safe optimization based on Bayesian optimization and evolutionary algorithms. However, Bayesian optimization-based methods often struggle to achieve superior solutions, and the evolutionary algorithm-based method fails to effectively reduce unsafe evaluations. This study focuses on CMA-ES as an efficient evolutionary algorithm and proposes an optimization method termed safe CMA-ES. The safe CMA-ES is designed to achieve both safety and efficiency in safe optimization. The safe CMA-ES estimates the Lipschitz constants of the safety functions transformed with the distribution parameters, using the maximum norm of the gradient of the prediction mean in Gaussian process regression. Subsequently, the safe CMA-ES projects the samples to the nearest point in the safe region constructed with the estimated Lipschitz constants. Numerical simulations on benchmark functions show that the safe CMA-ES performs optimization successfully while suppressing unsafe evaluations, whereas the existing methods struggle to significantly reduce them.

safe optimization, covariance matrix adaptation evolution strategy, Gaussian process regression, Lipschitz constant
journalyear: 2024; copyright: acmlicensed; conference: Genetic and Evolutionary Computation Conference (GECCO '24), July 14–18, 2024, Melbourne, VIC, Australia; doi: 10.1145/3638529.3654193; isbn: 979-8-4007-0494-9/24/07; ccs: Mathematics of computing, Continuous mathematics

1. Introduction

In real-world applications in the medical and control engineering fields (Berkenkamp et al., 2023; Louis et al., 2022; Modugno et al., 2016; Sui et al., 2015), unsafe solutions may be present, and their evaluations involve risks such as clinical deterioration and the breakdown of control systems. For instance, in spinal cord therapy (Sui et al., 2015), the configuration of the electrical stimulation is optimized to improve spinal reflex and locomotor function, and an unsafe configuration can aggravate spinal cord injuries. In real-world optimization of a drone control system (Berkenkamp et al., 2023), the system parameters must be optimized while preventing collisions with surrounding objects so as not to break the drone. These optimization problems, which prohibit the evaluation of unsafe solutions, are termed safe optimization. Safe optimization is formulated as a specialized type of constrained optimization problem that aims to reduce the evaluations of solutions whose safety function values exceed the safety thresholds. Additionally, a set of safe solutions, referred to as safe seeds, is provided to the optimizer.

Several methods have been developed for safe optimization. SafeOpt (Sui et al., 2015) is a representative method in this category. SafeOpt relies on Bayesian optimization and constructs a safe region using the Lipschitz constant of the safety function so as not to evaluate unsafe solutions. Several extensions of SafeOpt have been proposed, such as modified SafeOpt (Berkenkamp et al., 2016), which eliminates the need for the Lipschitz constant of the safety function, and swarm-based SafeOpt (Duivenvoorden et al., 2017), which introduces particle swarm optimization (Serkan Kiranyaz, 2014) to improve the search performance on high-dimensional problems. In the context of evolutionary algorithms for safe optimization, a general approach called violation avoidance (Kaji et al., 2009) has been proposed. This method regenerates a solution when the nearest evaluated solution to the generated one is unsafe. However, Kim et al. (2022) point out that violation avoidance may not effectively suppress the evaluations of unsafe solutions compared with SafeOpt. Considering the computational cost of updates and the optimization performance with a large budget, evolutionary algorithms have advantages over Bayesian optimization (Hutter et al., 2013). Consequently, there is a demand for efficient evolutionary algorithms tailored for safe optimization.

This study focuses on the covariance matrix adaptation evolution strategy (CMA-ES) (Hansen and Ostermeier, 1996) as an efficient evolutionary algorithm. The CMA-ES employs a multivariate Gaussian distribution as a sampling distribution for solutions and updates the distribution parameters to generate better solutions. The CMA-ES possesses several invariance properties, such as invariance to affine transformations of the search space, and achieves powerful optimization performance on ill-conditioned and non-separable problems (Hansen and Auger, 2014).

This study proposes an optimization method for safe optimization based on the CMA-ES, termed safe CMA-ES, to realize both safety and efficiency in safe optimization. In addition to the original update procedure of the CMA-ES, the safe CMA-ES estimates the Lipschitz constants of the safety functions and constructs a safe region to repair the samples drawn from the multivariate Gaussian distribution. The estimation of the Lipschitz constants uses Gaussian process regression (GPR) (Rasmussen et al., 2006) trained with the evaluated solutions and computes the maximum norm of the gradient of the prediction mean. The safe CMA-ES estimates the Lipschitz constants in the space transformed using the distribution parameters to inherit the invariance to affine transformations of the search space. Then, the safe CMA-ES projects each sample generated from the multivariate Gaussian distribution to the nearest point in the safe region computed with the estimated Lipschitz constants. Additionally, the safe CMA-ES corrects the initial distribution parameters using the safe seeds.

In numerical simulations, we evaluated the search performance of the safe CMA-ES on benchmark functions for safe optimization. While the existing methods for safe optimization failed to suppress the evaluation of unsafe solutions, the safe CMA-ES successfully found the optimal solution with few or no unsafe evaluations.

2. CMA-ES

The CMA-ES is a probabilistic model-based evolutionary algorithm for continuous black-box optimization problems. The CMA-ES employs a multivariate Gaussian distribution parameterized by the mean vector $\boldsymbol{m}\in\mathbb{R}^{d}$, the covariance matrix $\boldsymbol{C}\in\mathbb{R}^{d\times d}$, and the step-size $\sigma\in\mathbb{R}_{>0}$.

We introduce the update procedure of the CMA-ES on the objective function $f:\mathbb{R}^{d}\to\mathbb{R}$. At first, the CMA-ES generates $\lambda$ samples $\{\boldsymbol{x}^{\langle\ell\rangle}\}_{\ell=1}^{\lambda}$ from the multivariate Gaussian distribution $\mathcal{N}(\boldsymbol{m}^{(t)},(\sigma^{(t)})^{2}\boldsymbol{C}^{(t)})$ as

(1) $\boldsymbol{z}^{\langle\ell\rangle} \sim \mathcal{N}(\mathbf{0},\mathbf{I})$
(2) $\boldsymbol{y}^{\langle\ell\rangle} = \sqrt{\boldsymbol{C}^{(t)}}\,\boldsymbol{z}^{\langle\ell\rangle}$
(3) $\boldsymbol{x}^{\langle\ell\rangle} = \boldsymbol{m}^{(t)}+\sigma^{(t)}\boldsymbol{y}^{\langle\ell\rangle}\enspace.$
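As an illustration, the sampling step in (1)–(3) can be written in a few lines of NumPy. The helper below (a name of our own choosing, not from the paper) is a minimal sketch that forms the symmetric matrix square root of $\boldsymbol{C}$ from an eigendecomposition.

```python
import numpy as np

def sample_population(m, C, sigma, lam, rng=np.random.default_rng()):
    """Draw lam candidate solutions following Eqs. (1)-(3)."""
    d = m.shape[0]
    # Symmetric matrix square root of C via eigendecomposition.
    eigvals, B = np.linalg.eigh(C)
    sqrt_C = B @ np.diag(np.sqrt(eigvals)) @ B.T
    Z = rng.standard_normal((lam, d))   # z ~ N(0, I), Eq. (1)
    Y = Z @ sqrt_C.T                    # y = sqrt(C) z, Eq. (2)
    X = m + sigma * Y                   # x = m + sigma y, Eq. (3)
    return X, Y, Z
```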

Then, the CMA-ES evaluates the samples on the objective function and computes their rankings. We denote the index of the $\ell$-th best sample as $\ell\!:\!\lambda$.

Next, the CMA-ES updates two evolution paths $\boldsymbol{p}_{\sigma},\boldsymbol{p}_{c}\in\mathbb{R}^{d}$, which are initialized as $\boldsymbol{p}_{\sigma}^{(0)}=\boldsymbol{p}_{c}^{(0)}=\mathbf{0}$. The update of the evolution paths uses two weighted sums of the $\mu$ best solutions, $\Delta_{\boldsymbol{z}}=\sum_{\ell=1}^{\mu}w_{\ell}\boldsymbol{z}^{\langle\ell:\lambda\rangle}$ and $\Delta_{\boldsymbol{y}}=\sum_{\ell=1}^{\mu}w_{\ell}\boldsymbol{y}^{\langle\ell:\lambda\rangle}$, computed using the positive weights $w_{1}\geq\cdots\geq w_{\mu}>0$, and is performed as

(4) $\boldsymbol{p}_{\sigma}^{(t+1)} = (1-c_{\sigma})\boldsymbol{p}_{\sigma}^{(t)}+\sqrt{c_{\sigma}(2-c_{\sigma})\mu_{\mathrm{eff}}}\cdot\Delta_{\boldsymbol{z}}$
(5) $\boldsymbol{p}_{c}^{(t+1)} = (1-c_{c})\boldsymbol{p}_{c}^{(t)}+h_{\sigma}^{(t+1)}\sqrt{c_{c}(2-c_{c})\mu_{\mathrm{eff}}}\cdot\Delta_{\boldsymbol{y}}\enspace,$

where $c_{\sigma},c_{c}\in\mathbb{R}_{>0}$ are the accumulation rates of the evolution paths and $\mu_{\mathrm{eff}}=(\sum_{\ell=1}^{\mu}w_{\ell}^{2})^{-1}$. The Heaviside function $h_{\sigma}^{(t+1)}\in\{0,1\}$ becomes one if and only if it satisfies

(6) $\displaystyle\frac{\|\boldsymbol{p}_{\sigma}^{(t+1)}\|}{\sqrt{1-(1-c_{\sigma})^{2(t+1)}}}<\left(1.4+\frac{2}{d+1}\right)\chi_{d}\enspace,$

where $\chi_{d}=\sqrt{d}\left(1-\frac{1}{4d}+\frac{1}{21d^{2}}\right)$ is an approximation of the expectation $\mathbb{E}[\|\mathcal{N}(\mathbf{0},\mathbf{I})\|]$.

Finally, the CMA-ES updates the distribution parameters of the multivariate Gaussian distribution as

(7) $\boldsymbol{m}^{(t+1)} = \boldsymbol{m}^{(t)}+c_{m}\sigma^{(t)}\Delta_{\boldsymbol{y}}$
(8) $\sigma^{(t+1)} = \sigma^{(t)}\exp\left(\frac{c_{\sigma}}{d_{\sigma}}\left(\frac{\|\boldsymbol{p}_{\sigma}^{(t+1)}\|}{\chi_{d}}-1\right)\right)$
(9) $\boldsymbol{C}^{(t+1)} = \left(1+(1-h_{\sigma}^{(t+1)})c_{1}c_{c}(2-c_{c})\right)\boldsymbol{C}^{(t)} + c_{1}\left(\boldsymbol{p}_{c}^{(t+1)}\left(\boldsymbol{p}_{c}^{(t+1)}\right)^{\mathrm{T}}-\boldsymbol{C}^{(t)}\right) + c_{\mu}\sum_{\ell=1}^{\mu}w_{\ell}\left(\boldsymbol{y}^{\langle\ell:\lambda\rangle}\left(\boldsymbol{y}^{\langle\ell:\lambda\rangle}\right)^{\mathrm{T}}-\boldsymbol{C}^{(t)}\right)$

where $c_{m},c_{1},c_{\mu}\in\mathbb{R}_{>0}$ are the learning rates and $d_{\sigma}\in\mathbb{R}_{>0}$ is the damping factor. The CMA-ES comes with well-tuned default values for its hyperparameters; refer to (Hansen, 2016; Hansen and Auger, 2014) for details.
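For reference, the following sketch collects the updates (4)–(9) in a single function. The function signature and the hyperparameter dictionary are our own illustrative choices; a practical implementation would rely on an existing library such as pycma.

```python
import numpy as np

def update_parameters(m, C, sigma, p_sigma, p_c, Z, Y, ranking, params, t):
    """One CMA-ES parameter update following Eqs. (4)-(9).
    `ranking` lists sample indices sorted from best to worst objective value."""
    d = m.shape[0]
    w, mu_eff = params["weights"], params["mu_eff"]
    c_sigma, c_c = params["c_sigma"], params["c_c"]
    c_m, c_1, c_mu, d_sigma = params["c_m"], params["c_1"], params["c_mu"], params["d_sigma"]
    mu = len(w)
    chi_d = np.sqrt(d) * (1.0 - 1.0 / (4 * d) + 1.0 / (21 * d ** 2))

    dz = sum(w[i] * Z[ranking[i]] for i in range(mu))
    dy = sum(w[i] * Y[ranking[i]] for i in range(mu))

    # Evolution paths, Eqs. (4)-(6).
    p_sigma = (1 - c_sigma) * p_sigma + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff) * dz
    h_sigma = (np.linalg.norm(p_sigma) / np.sqrt(1 - (1 - c_sigma) ** (2 * (t + 1)))
               < (1.4 + 2 / (d + 1)) * chi_d)
    p_c = (1 - c_c) * p_c + h_sigma * np.sqrt(c_c * (2 - c_c) * mu_eff) * dy

    # Mean, step-size, and covariance updates, Eqs. (7)-(9).
    m = m + c_m * sigma * dy
    sigma = sigma * np.exp(c_sigma / d_sigma * (np.linalg.norm(p_sigma) / chi_d - 1))
    C = ((1 + (1 - h_sigma) * c_1 * c_c * (2 - c_c)) * C
         + c_1 * (np.outer(p_c, p_c) - C)
         + c_mu * sum(w[i] * (np.outer(Y[ranking[i]], Y[ranking[i]]) - C) for i in range(mu)))
    return m, C, sigma, p_sigma, p_c
```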

Algorithm 1 Safe CMA-ES
Require: The objective function $f$ to be minimized
Require: Initial distribution parameters $\boldsymbol{m}^{(0)},\boldsymbol{C}^{(0)},\sigma^{(0)}$
Require: Hyperparameters for the estimation of the Lipschitz constants: $T_{\mathrm{data}}=5$, $\zeta_{\mathrm{init}}=10$, $\alpha=10$
Require: Hyperparameters for initialization: $L_{\min}=100$, $\gamma=0.9$
Require: Safe seeds $\mathcal{X}_{\mathrm{seed}}=\{\boldsymbol{x}_{\mathrm{seed}}\}$, safety thresholds $\{h_{j}\}_{j=1}^{p}$, iteration counter $t=0$
1:  Compute the initial estimates of the Lipschitz constants $L_{1}^{(0)},\cdots,L_{p}^{(0)}$ of the transformed safety functions using (25).
2:  Modify the initial mean vector $\boldsymbol{m}^{(0)}$ and the initial step-size $\sigma^{(0)}$ using (26) and (27), respectively.
3:  while termination condition is not met do
4:     for $\ell=1,\cdots,\lambda$ do
5:        Generate $\boldsymbol{z}^{\langle\ell\rangle}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ and project $\boldsymbol{z}^{\langle\ell\rangle}$ to the nearest point $\tilde{\boldsymbol{z}}^{\langle\ell\rangle}$ in the safe region by (15).
6:        Compute $\boldsymbol{x}^{\langle\ell\rangle}$ using (3) with the modified sample $\tilde{\boldsymbol{z}}^{\langle\ell\rangle}$.
7:        Evaluate $\boldsymbol{x}^{\langle\ell\rangle}$ on the objective function $f$ and the safety functions $s_{j}$ for $j=1,\cdots,p$.
8:     end for
9:     Update $\boldsymbol{m}^{(t+1)},\boldsymbol{C}^{(t+1)},\sigma^{(t+1)}$ with the modified samples $\{\boldsymbol{x}^{\langle\ell\rangle}\}_{\ell=1}^{\lambda}$ using the update rules (7), (8), and (9), respectively.
10:     Estimate the Lipschitz constants $L_{1}^{(t+1)},\cdots,L_{p}^{(t+1)}$ of the transformed safety functions using (21) with the $N_{\mathrm{data}}$ solutions evaluated in the last $T_{\mathrm{data}}$ iterations.
11:     $t\leftarrow t+1$
12:  end while

3. Formulation of Safe Optimization

We follow the formulation of safe optimization outlined in (Kim et al., 2021). We consider a constrained minimization problem of the objective function as

(10) $\displaystyle\min_{\boldsymbol{x}\in\mathbb{R}^{d}}\,f(\boldsymbol{x})\quad\text{s.t.}\quad s_{j}(\boldsymbol{x})\leq h_{j}\quad\text{for all}\enspace j=1,\cdots,p$

where $s_{j}:\mathbb{R}^{d}\to\mathbb{R}$ and $h_{j}\in\mathbb{R}$ are the safety function and the safety threshold, respectively. We consider a solution safe when it satisfies all safety constraints and unsafe when it violates at least one safety constraint. In safe optimization, the optimizer can access the safety thresholds and is required to optimize the objective function while suppressing the evaluations of unsafe solutions. In addition, the optimizer receives $N_{\mathrm{seed}}$ safe solutions as the safe seeds. The upper limit of unsafe evaluations depends on the target application. The evaluation of unsafe solutions is usually prohibited in medical applications, while a few unsafe evaluations may be acceptable in control system optimization. For example, in control system optimization for robots, one can prepare multiple robots of the same type in preparation for breakage, and the number of robots then serves as the upper limit for unsafe evaluations.

We assume that the evaluations of the objective function and the safety constraints are jointly performed. Additionally, we assume that the safety functions are noiseless black-box functions. For simplicity in this research, we assume that the evaluation values of the unsafe solutions on the objective function are accessible.
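To make the formulation concrete, the following toy instance (our own example) encodes a 5-dimensional sphere objective with the single safety constraint $s(\boldsymbol{x})=x_{1}\leq 0$ used later in Section 5.4, and collects safe seeds by rejection sampling.

```python
import numpy as np

# A toy 5-dimensional instance of problem (10): sphere objective with one
# safety constraint s(x) = x_1 <= 0.
def objective(x):
    return float(np.sum(x ** 2))

def safety(x):
    return float(x[0])

safety_threshold = 0.0

def is_safe(x):
    """A solution is safe iff it satisfies every safety constraint."""
    return safety(x) <= safety_threshold

# Safe seeds: uniform samples kept only if they satisfy the constraint.
rng = np.random.default_rng(0)
seeds = []
while len(seeds) < 10:
    x = rng.uniform(-5.0, 5.0, size=5)
    if is_safe(x):
        seeds.append(x)
```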

4. Proposed Method: Safe CMA-ES

This study proposes the safe CMA-ES, an extension of the CMA-ES that achieves efficient optimization performance in safe optimization. The safe CMA-ES constructs the safe region based on the estimated Lipschitz constants of the safety functions and projects the generated samples to the nearest point in the safe region to avoid evaluating unsafe solutions (Section 4.1). The Lipschitz constants are estimated in the transformed search space by the Gaussian process regression (Section 4.2). To further enhance safety, the safe CMA-ES initializes the distribution, ensuring that it fits within the safe region (Section 4.3). Algorithm 1 shows the pseudo-code of the safe CMA-ES.

Figure 1. The safe region on the safety function $s(\boldsymbol{x})=x_{1}^{2}+10x_{2}^{2}$ with the safety threshold $h=5$ and four safe seeds. The distribution parameters are given by $\boldsymbol{m}=\mathbf{0}$, $\sigma=1$, and $\boldsymbol{C}=\mathrm{diag}((1,0.1)^{\mathrm{T}})$. We generated safe seeds uniformly at random in the range $[-1,1]^{2}$, plotted as a white dotted box. The white circles and the orange line show the safe region and the border of the safety constraint, respectively. The left figure shows the safe region on the composition function $s_{j}\circ\phi^{-1}$, and the center figure shows the safe region on the safety function $s_{j}$. The right figure shows the safe region on the safety function $s_{j}$ computed with the Lipschitz constant of the safety function $s_{j}$ instead of the composition function $s_{j}\circ\phi^{-1}$.

4.1. Projection of Samples to Safe Region

The safe CMA-ES introduces the transformation $\phi$ using the current distribution parameters $\boldsymbol{\theta}^{(t)}=\{\boldsymbol{m}^{(t)},\boldsymbol{C}^{(t)},\sigma^{(t)}\}$ to inherit the invariance to affine transformations of the search space as

(11) $\displaystyle\phi(\boldsymbol{x}):=\phi(\boldsymbol{x};\boldsymbol{\theta}^{(t)})=\left(\sqrt{\boldsymbol{C}^{(t)}}\right)^{-1}\frac{\boldsymbol{x}-\boldsymbol{m}^{(t)}}{\sigma^{(t)}}\enspace.$
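A minimal sketch of the transformation (11) and its inverse, assuming $\boldsymbol{C}^{(t)}$ is positive definite so that its symmetric square root can be formed by an eigendecomposition (the function names are ours):

```python
import numpy as np

def make_phi(m, C, sigma):
    """Return phi and its inverse for the current distribution parameters, Eq. (11)."""
    eigvals, B = np.linalg.eigh(C)
    sqrt_C = B @ np.diag(np.sqrt(eigvals)) @ B.T
    inv_sqrt_C = B @ np.diag(1.0 / np.sqrt(eigvals)) @ B.T

    def phi(x):
        # Map a solution to the normalized space of the sampling distribution.
        return inv_sqrt_C @ (x - m) / sigma

    def phi_inv(z):
        # Map a normalized point back to the original search space.
        return m + sigma * (sqrt_C @ z)

    return phi, phi_inv
```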

The safe CMA-ES constructs the safe region using the following fact: given the Lipschitz constant $L_{j}$ of the composition function $s_{j}\circ\phi^{-1}$, any safe solution $\boldsymbol{x}\in\mathbb{R}^{d}$ and any solution $\boldsymbol{x}^{\prime}\in\mathbb{R}^{d}$ satisfy

(12) $\displaystyle\|\phi(\boldsymbol{x})-\phi(\boldsymbol{x}^{\prime})\| \leq\frac{h_{j}-s_{j}(\boldsymbol{x})}{L_{j}}\enspace\Rightarrow\enspace s_{j}(\boldsymbol{x}^{\prime})\leq h_{j}\enspace.$

The safe CMA-ES uses $N_{\mathrm{data}}=\min\{N_{\mathrm{seed}}+\lambda t,\lambda T_{\mathrm{data}}\}$ solutions evaluated in the last $T_{\mathrm{data}}$ iterations to construct the safe region. The safe CMA-ES computes the safe region using the safe solutions $\mathcal{A}_{\mathrm{safe}}$ among the $N_{\mathrm{data}}$ evaluated solutions and the estimated Lipschitz constants $L_{j}^{(t)}$, explained in the next section, as

$\displaystyle\mathcal{X}_{\mathrm{safe}}=\bigcup_{\boldsymbol{x}\in\mathcal{A}_{\mathrm{safe}}}\left\{\boldsymbol{x}^{\prime}\in\mathcal{X}\mid\|\phi(\boldsymbol{x})-\phi(\boldsymbol{x}^{\prime})\|\leq\delta(\boldsymbol{x})\right\}\enspace.$

The function $\delta$ returns the radius of the safe region around a given safe solution on the composition function $s_{j}\circ\phi^{-1}$, as

(13) $\displaystyle\delta(\boldsymbol{x}) =\min_{1\leq j\leq p}\frac{h_{j}-s_{j}(\boldsymbol{x})}{L_{j}^{(t)}}\enspace.$
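The radius $\delta(\boldsymbol{x})$ in (13) reduces to a one-line minimum over the safety constraints; the helper below is an illustrative sketch with argument names of our own.

```python
def safe_radius(safety_values, thresholds, lipschitz):
    """Radius delta(x) of the safe ball around a safe solution x, Eq. (13).
    safety_values[j] = s_j(x), thresholds[j] = h_j, lipschitz[j] = L_j."""
    return min((h - s) / L for s, h, L in zip(safety_values, thresholds, lipschitz))
```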

Figure 1 illustrates an example of the safe region. Figure 1 also shows, for reference, the safe region computed with the Lipschitz constant of the safety function $s_{j}$ instead of the composition function $s_{j}\circ\phi^{-1}$. It can be observed that the Lipschitz constant of the composition function expands the safe region when the distribution parameters of the CMA-ES are learned appropriately.

To prevent the evaluation of unsafe solutions, the safe CMA-ES projects the generated solution $\boldsymbol{x}^{\langle\ell\rangle}$ in (3) to the nearest point in the safe region with respect to the Mahalanobis distance as

(14) $\displaystyle\tilde{\boldsymbol{x}}^{\langle\ell\rangle}\in\mathop{\mathrm{arg\,min}}\limits_{\boldsymbol{x}^{\prime}\in\mathcal{X}_{\mathrm{safe}}}\|\phi(\boldsymbol{x}^{\prime})-\phi(\boldsymbol{x}^{\langle\ell\rangle})\|\enspace.$

The projection is performed by modifying the sample $\boldsymbol{z}^{\langle\ell\rangle}$ generated in (1) as

(15) $\displaystyle\tilde{\boldsymbol{z}}^{\langle\ell\rangle}=\xi^{\langle\ell\rangle}\boldsymbol{z}^{\langle\ell\rangle}+(1-\xi^{\langle\ell\rangle})\phi(\boldsymbol{x}^{\langle\ell\rangle}_{\mathrm{near}})\enspace,$

where $\boldsymbol{x}^{\langle\ell\rangle}_{\mathrm{near}}\in\mathcal{A}_{\mathrm{safe}}$ and $\xi^{\langle\ell\rangle}\in[0,1]$ are given as

(16) $\displaystyle\boldsymbol{x}^{\langle\ell\rangle}_{\mathrm{near}} =\mathop{\mathrm{arg\,max}}\limits_{\boldsymbol{x}\in\mathcal{A}_{\mathrm{safe}}}\left\{\delta(\boldsymbol{x})-\|{\boldsymbol{z}}^{\langle\ell\rangle}-\phi(\boldsymbol{x})\|\right\}$
(17) $\displaystyle\xi^{\langle\ell\rangle} =\min\left\{1,\frac{\delta(\boldsymbol{x}^{\langle\ell\rangle}_{\mathrm{near}})}{\|{\boldsymbol{z}}^{\langle\ell\rangle}-\phi(\boldsymbol{x}^{\langle\ell\rangle}_{\mathrm{near}})\|}\right\}\enspace.$

The safe solution $\boldsymbol{x}^{\langle\ell\rangle}_{\mathrm{near}}$ is the solution in $\mathcal{A}_{\mathrm{safe}}$ whose safe region is closest to ${\boldsymbol{z}}^{\langle\ell\rangle}$, and $\xi^{\langle\ell\rangle}$ interpolates $\tilde{\boldsymbol{z}}^{\langle\ell\rangle}$ between ${\boldsymbol{z}}^{\langle\ell\rangle}$ and $\phi(\boldsymbol{x}^{\langle\ell\rangle}_{\mathrm{near}})$; it becomes $\xi^{\langle\ell\rangle}=1$ when the solution $\boldsymbol{x}^{\langle\ell\rangle}$ is originally generated in the safe region.
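Putting (15)–(17) together, the projection of a sample can be sketched as follows; here `safe_solutions`, `phi`, and `delta` stand for $\mathcal{A}_{\mathrm{safe}}$, the transformation (11), and the radius (13), and the function name is ours.

```python
import numpy as np

def project_to_safe_region(z, safe_solutions, phi, delta):
    """Project a sample z onto the union of safe balls, Eqs. (15)-(17)."""
    # Eq. (16): the safe solution whose ball is closest to z.
    x_near = max(safe_solutions,
                 key=lambda x: delta(x) - np.linalg.norm(z - phi(x)))
    dist = np.linalg.norm(z - phi(x_near))
    # Eq. (17): xi = 1 when z is already inside the safe region.
    xi = min(1.0, delta(x_near) / dist) if dist > 0 else 1.0
    # Eq. (15): move z toward phi(x_near) just enough to enter the ball.
    return xi * z + (1 - xi) * phi(x_near)
```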

Figure 2. Transitions of the best evaluation value and the number of evaluations of unsafe solutions with the safety function $s(\boldsymbol{x})=f(\boldsymbol{x})$. We plot the medians and interquartile ranges over 50 trials.

4.2. Estimation of Lipschitz Constants

Based on the updated distribution parameters, the safe CMA-ES estimates the Lipschitz constant of each safety function using Gaussian process regression (GPR) trained with evaluated solutions. The safe CMA-ES uses the $N_{\mathrm{data}}$ solutions evaluated in the last $T_{\mathrm{data}}$ iterations (including the solutions evaluated in the current iteration) as the training data for the GPR. The safe CMA-ES normalizes the $i$-th solution in the training data $\boldsymbol{x}_{i}$ using the updated distribution parameters as $\boldsymbol{z}_{i}=\phi(\boldsymbol{x}_{i};\boldsymbol{\theta}^{(t+1)})$. Additionally, the target variable corresponding to the $i$-th training data point is normalized as

(18) $\displaystyle\omega_{i,j}=\frac{s_{j}(\boldsymbol{x}_{i})-\mu_{j}}{\sigma_{j}}\enspace,$

where $\mu_{j}$ and $\sigma_{j}$ are the average and standard deviation of the evaluation values on the $j$-th safety function over the $N_{\mathrm{data}}$ solutions, respectively. The safe CMA-ES computes the estimated Lipschitz constant of the composition function $s_{j}\circ\phi^{-1}$ using the posterior distribution of the GPR with the training data $\mathcal{D}=\{(\boldsymbol{z}_{i},\omega_{i,j})\}_{i=1}^{N_{\mathrm{data}}}$ as

(19) $\displaystyle\hat{L}_{j}^{(t+1)}=\sigma_{j}\cdot\max_{\boldsymbol{z}\in\mathcal{Z}}\left\|\nabla_{\boldsymbol{z}}\mu(\boldsymbol{z}\mid\mathcal{D})\right\|\enspace,$

where $\mu(\boldsymbol{z}\mid\mathcal{D})$ is the mean of the posterior distribution. We set the search space for the maximization in (19) as $\mathcal{Z}=[-3,3]^{d}$. Based on the approach in (Gonzalez et al., 2016), the safe CMA-ES applies a two-step optimization process to solve the maximization. In the first step, the safe CMA-ES generates $5\lambda$ samples $\hat{\mathcal{Z}}=\{\hat{\boldsymbol{z}}_{i}\}_{i=1}^{5\lambda}$ from the standard multivariate normal distribution $\mathcal{N}(\mathbf{0},\mathbf{I})$ and computes the norm $\|\nabla_{\boldsymbol{z}}\mu(\boldsymbol{z}\mid\mathcal{D})\|$ of the gradient of the posterior mean at each sample $\hat{\boldsymbol{z}}_{i}$. In the second step, the safe CMA-ES employs L-BFGS (Liu and Nocedal, 1989) with box-constraint handling and runs it from the maximizer in $\hat{\mathcal{Z}}$ for 200 iterations. Following the settings of the surrogate-assisted CMA-ES with GPR (Toal and Arnold, 2020; Yamada et al., 2023), we use the RBF kernel

(20) $\displaystyle k(\boldsymbol{z},\boldsymbol{z}^{\prime})=\exp\left(-\frac{\|\boldsymbol{z}-\boldsymbol{z}^{\prime}\|^{2}}{2H^{2}}\right)\enspace.$

We set the length scale and the observation noise as $H=8d$ and $0$, respectively.
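The estimation (19) with the kernel (20) can be sketched with a noiseless GPR whose posterior-mean gradient is available in closed form. The two-step maximization below follows the procedure described above, but the function and argument names are our own, and a small jitter is added to the kernel matrix for numerical stability.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_lipschitz(Z_train, omega, sigma_s, lam, H, rng=np.random.default_rng()):
    """Estimate L_j as in Eq. (19): the maximum gradient norm of the GPR
    posterior mean, rescaled by the target standard deviation sigma_s.
    Z_train: normalized solutions, omega: normalized safety values."""
    n, d = Z_train.shape

    def kernel(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * H ** 2))            # RBF kernel, Eq. (20)

    K = kernel(Z_train, Z_train) + 1e-10 * np.eye(n)  # jitter for stability
    alpha = np.linalg.solve(K, omega)                 # noiseless GPR weights

    def neg_grad_norm(z):
        # Analytic gradient of the posterior mean mu(z) = sum_i alpha_i k(z, z_i).
        k = kernel(z[None, :], Z_train)[0]
        grad = ((Z_train - z) / H ** 2 * k[:, None]).T @ alpha
        return -np.linalg.norm(grad)

    # Step 1: coarse search over 5*lam standard normal samples.
    cands = rng.standard_normal((5 * lam, d))
    best = cands[np.argmin([neg_grad_norm(z) for z in cands])]
    # Step 2: refine with box-constrained L-BFGS on Z = [-3, 3]^d.
    res = minimize(neg_grad_norm, best, method="L-BFGS-B",
                   bounds=[(-3, 3)] * d, options={"maxiter": 200})
    return sigma_s * (-res.fun)
```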

In addition, the safe CMA-ES has two correction mechanisms that increase the estimated Lipschitz constant when the estimation using the GPR is unreliable. Introducing the coefficients $\tau^{(t+1)}$ and $\rho_{j}^{(t+1)}$ for the correction, the estimated Lipschitz constant is computed as

(21) $\displaystyle L_{j}^{(t+1)}=\hat{L}_{j}^{(t+1)}\cdot\tau^{(t+1)}\cdot\rho_{j}^{(t+1)}\enspace.$

The updates of these coefficients are as follows. The safe CMA-ES increases the coefficient $\tau^{(t+1)}$ when the number of training data $N_{\mathrm{data}}$ is small because the prediction with a small training dataset is unreliable. Recalling that the maximum number of training data is $\lambda T_{\mathrm{data}}$, the coefficient $\tau^{(t+1)}$ is updated as

(22) $\displaystyle\tau^{(t+1)}=\begin{cases}(\zeta_{\mathrm{init}})^{\frac{1}{N_{\mathrm{data}}}}&\text{if}\enspace N_{\mathrm{data}}<\lambda T_{\mathrm{data}}\\ 1&\text{otherwise}\end{cases}\enspace,$

where $\zeta_{\mathrm{init}}>1$ determines the effect of the coefficient $\tau^{(t+1)}$. The coefficient $\rho_{j}^{(t+1)}$ is set for each safety constraint and increased when solutions violate it. Since a safety constraint violated by many solutions requires a drastic correction, the safe CMA-ES determines the update strength of the coefficient based on the ratio $\nu_{j}$ of constraint-violating solutions among the $\lambda$ solutions generated in the current iteration, given as

(23) $\displaystyle\nu_{j}=\frac{1}{\lambda}\sum_{i=1}^{\lambda}\mathbb{I}\{s_{j}(\boldsymbol{x}_{i})>h_{j}\}\enspace.$

Then, the safe CMA-ES updates the coefficient as

(24) $\displaystyle\rho_{j}^{(t+1)}=\begin{cases}\rho_{j}^{(t)}\cdot\alpha^{\nu_{j}}&\text{if}\quad\nu_{j}>0\\ \max\{1,\rho_{j}^{(t)}/\alpha^{1/d}\}&\text{otherwise}\end{cases}\enspace,$

where $\alpha>1$ determines the effect of the coefficient $\rho_{j}^{(t+1)}$. We set the initial value of the coefficient as $\rho_{j}^{(0)}=1$.
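A compact sketch of the correction coefficients (22)–(24); the helper names and the representation of the violation indicator are ours.

```python
import numpy as np

def update_tau(N_data, lam, T_data, zeta_init=10.0):
    """Coefficient tau of Eq. (22): inflate the estimate while data are scarce."""
    return zeta_init ** (1.0 / N_data) if N_data < lam * T_data else 1.0

def update_rho(rho, violation_flags, d, alpha=10.0):
    """Coefficient rho_j of Eq. (24); violation_flags marks s_j(x_i) > h_j."""
    nu = float(np.mean(violation_flags))              # Eq. (23)
    if nu > 0:
        return rho * alpha ** nu
    return max(1.0, rho / alpha ** (1.0 / d))
```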

4.3. Initialization of Distribution Parameters

The Lipschitz constants of the safety functions are estimated before generating solutions in the first iteration. In the first estimation, the safe CMA-ES requires a predefined lower bound $L_{\min}$ for the estimated Lipschitz constant and estimates the Lipschitz constant as

(25) $\displaystyle L_{j}^{(0)}=\max\left\{L_{\mathrm{min}},\hat{L}^{(0)}_{j}\cdot\tau^{(0)}\right\}\enspace.$

We note that $\hat{L}_{j}^{(0)}$ is computed with the initial distribution parameters $\boldsymbol{m}^{(0)},\sigma^{(0)},\boldsymbol{C}^{(0)}$ and the GPR trained with the safe seeds, and $\tau^{(0)}=(\zeta_{\mathrm{init}})^{1/N_{\mathrm{seed}}}$ is computed with the number $N_{\mathrm{seed}}$ of safe seeds. If the number of safe seeds is one, the safe CMA-ES sets $L_{j}^{(0)}=L_{\mathrm{min}}$ because the GPR does not work well.

The safe CMA-ES also modifies the initial mean vector and step-size using the safe seeds. The initial mean vector is set to the safe seed with the best evaluation value on the objective function as

(26) $\displaystyle\tilde{\boldsymbol{m}}^{(0)}=\mathop{\mathrm{arg\,min}}\limits_{\boldsymbol{x}\in\mathcal{X}_{\mathrm{seed}}}f(\boldsymbol{x})\enspace.$

The step-size is modified so that the ratio of solutions left unchanged by the projection in (15) is kept above $\gamma\in(0,1)$, as

(27) $\displaystyle\tilde{\sigma}^{(0)}=\sigma^{(0)}\cdot\min\left\{\frac{\delta(\tilde{\boldsymbol{m}}^{(0)})}{\sqrt{\chi^{2}_{\mathrm{ppf}}(\gamma)}},\,1\right\}\enspace,$

where $\chi^{2}_{\mathrm{ppf}}$ is the percent point function of the $\chi^{2}$-distribution with $d$ degrees of freedom, which gives the squared radius of the trust region on the standard $d$-dimensional Gaussian distribution with probability $\gamma$. We note that the function $\delta$ is given by (13). We set the target ratio to $\gamma=0.9$.
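The initialization in (26) and (27) can be sketched as follows, using the percent point function of the $\chi^{2}$-distribution from SciPy; the function name and arguments are our own.

```python
import numpy as np
from scipy.stats import chi2

def initialize_distribution(seeds, f, delta, sigma0, d, gamma=0.9):
    """Initial mean and step-size corrections of Eqs. (26) and (27).
    delta(x) is the safe radius of Eq. (13) under the initial parameters."""
    m0 = min(seeds, key=f)                           # Eq. (26): best safe seed
    radius = np.sqrt(chi2.ppf(gamma, df=d))          # trust-region radius
    sigma0 = sigma0 * min(delta(m0) / radius, 1.0)   # Eq. (27)
    return m0, sigma0
```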

Figure 3. Transitions of the best evaluation value and the number of evaluations of unsafe solutions with the safety function $s(\boldsymbol{x})=x_{1}$. We plot the medians and interquartile ranges over 50 trials.

5. Experiment

We investigate the following aspects through numerical simulation. (The code will be made available at https://github.com/CyberAgentAILab/cmaes (Nomura and Shibata, 2024).)

  • The performance evaluation of the safe CMA-ES in the early phase of the optimization (Section 5.3).

  • The performance evaluation of the safe CMA-ES throughout the optimization (Section 5.4).

  • The performance comparison of the safe CMA-ES with the optimization methods for safe optimization (Section 5.5).

We also provide the results of sensitivity experiments of hyperparameters of the safe CMA-ES in the supplemental material.

5.1. Comparative Methods

SafeOpt

SafeOpt (Sui et al., 2015) is an optimization method for safe optimization based on Bayesian optimization. SafeOpt assumes access to the Lipschitz constant $L$ of the safety function and constructs the safe region based on the upper confidence bound $u_{t}(\boldsymbol{x})$ of the evaluation value on the safety function as (differently from the original paper (Sui et al., 2015), which assumes the safety constraint $s(\boldsymbol{x})\geq h$, we explain the update of SafeOpt with the safety constraint $s(\boldsymbol{x})\leq h$)

(28) $\displaystyle S_{t}=\bigcup_{\boldsymbol{x}\in S_{t-1}}\left\{\boldsymbol{x}^{\prime}\in\mathcal{X}\mid\|\boldsymbol{x}-\boldsymbol{x}^{\prime}\|\leq\frac{h-u_{t}(\boldsymbol{x})}{L}\right\}\enspace.$

The initial safe region is given by the safe seeds. Subsequently, SafeOpt computes two regions within the safe region: the region $M_{t}\subseteq S_{t}$, where the optimal solution is expected to be included, and the region $G_{t}\subseteq S_{t}$, comprising solutions that could potentially extend the safe region. SafeOpt evaluates the solution with the largest confidence interval in the union $M_{t}\cup G_{t}$. As a variant of SafeOpt, reference (Sui et al., 2015) also proposes SafeOpt-UCB, which evaluates the solution with the smallest lower bound of the predictive confidence interval within the safe region $S_{t}$.

Several methods have been developed as extensions of SafeOpt. Reference (Berkenkamp et al., 2016) proposed modified SafeOpt, which changes the update of SafeOpt so that the Lipschitz constant of the safety function is no longer required. Additionally, to reduce the computational cost in high-dimensional problems, swarm-based SafeOpt (Duivenvoorden et al., 2017) uses particle swarm optimization (Serkan Kiranyaz, 2014) to search for solutions on the GPR.

Violation Avoidance

Violation avoidance (Kaji et al., 2009) is a general handling technique for evolutionary algorithms in safe optimization. It modifies the generation process of the evolutionary algorithm to regenerate a solution when the nearest evaluated solution to the generated one is unsafe. The distance between a generated solution $\boldsymbol{x}_{\mathrm{new}}$ and an already evaluated solution $\boldsymbol{x}_{\mathrm{old}}$ is given by

(29) $\displaystyle d(\boldsymbol{x}_{\mathrm{new}},\boldsymbol{x}_{\mathrm{old}})=\frac{1}{w(\boldsymbol{x}_{\mathrm{old}})}\cdot\|\boldsymbol{x}_{\mathrm{new}}-\boldsymbol{x}_{\mathrm{old}}\|\enspace.$

The weight is determined based on whether $\boldsymbol{x}_{\mathrm{old}}$ is safe or unsafe as

(30) $\displaystyle w(\boldsymbol{x}_{\mathrm{old}})=\begin{cases}w_{\mathrm{safe}}&\text{if $\boldsymbol{x}_{\mathrm{old}}$ is safe}\\ w_{\mathrm{unsafe}}&\text{if $\boldsymbol{x}_{\mathrm{old}}$ is unsafe}\end{cases}$

where $w_{\mathrm{safe}},w_{\mathrm{unsafe}}\in\mathbb{R}_{>0}$ are predefined weights. The evolutionary algorithm tends to evaluate solutions close to safe solutions when $w_{\mathrm{safe}}$ is larger than $w_{\mathrm{unsafe}}$, and close to unsafe solutions otherwise.

In numerical simulation, we included the CMA-ES with violation avoidance as one of the comparative methods. We set the weights as $w_{\mathrm{safe}}=w_{\mathrm{unsafe}}=1$. We sampled $10\lambda$ points from the multivariate Gaussian distribution and randomly selected $\lambda$ solutions from the samples whose closest evaluated solutions are safe. We terminated the optimization when we could not obtain $\lambda$ solutions from the $10\lambda$ samples.
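For illustration, a simplified per-sample version of violation avoidance based on (29) and (30) can be sketched as follows. Note that the experiment above uses a batch variant that draws $10\lambda$ candidates at once, so this helper (our own naming) only conveys the underlying rule.

```python
import numpy as np

def violation_avoidance_sample(sample_fn, archive, w_safe=1.0, w_unsafe=1.0,
                               max_resample=10):
    """Resample until the nearest evaluated solution (Eqs. (29)-(30)) is safe.
    archive holds (x_old, is_safe) pairs; sample_fn() draws one candidate."""
    for _ in range(max_resample):
        x_new = sample_fn()
        weights = np.array([w_safe if safe else w_unsafe for _, safe in archive])
        dists = np.array([np.linalg.norm(x_new - x_old)
                          for x_old, _ in archive]) / weights
        if archive[int(np.argmin(dists))][1]:   # nearest evaluated solution is safe
            return x_new
    return None                                  # give up after max_resample draws
```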

5.2. Experimental Setting

We used the following benchmark functions with a unique optimal solution $\boldsymbol{x}^{\ast}=\mathbf{0}$.

  • Sphere: $f(\boldsymbol{x})=\sum_{i=1}^{d}x_{i}^{2}$

  • Ellipsoid: $f(\boldsymbol{x})=\sum_{i=1}^{d}\left(1000^{\frac{i-1}{d-1}}x_{i}\right)^{2}$

  • Reversed Ellipsoid: $f(\boldsymbol{x})=\sum_{i=1}^{d}\left(1000^{\frac{d-i}{d-1}}x_{i}\right)^{2}$

  • Rosenbrock: $f(\boldsymbol{x})=\sum_{i=1}^{d-1}\left(100((x_{i+1}+1)-(x_{i}+1)^{2})^{2}+x_{i}^{2}\right)$

The sphere, ellipsoid, and rosenbrock functions are well-known benchmarks for continuous black-box optimization. Additionally, we designed the reversed ellipsoid function by reversing the order of the coefficients in the ellipsoid function. This reversed ellipsoid function was employed to investigate the impact of safety function settings on the performance of the safe CMA-ES.

In the first and second experiments, we compared the safe CMA-ES with three comparative methods: the naive CMA-ES, the CMA-ES with constraint handling, and the CMA-ES with violation avoidance. We used the augmented Lagrangian constraint handling (Atamna et al., 2016) as the constraint handling for the CMA-ES (we implemented the constraint handling with pycma (Hansen et al., 2019) and did not use the negative weights for a fair comparison). We terminated the optimization when the best evaluation value reached $10^{-8}$ or when the minimum eigenvalue of $(\sigma^{(t)})^{2}\boldsymbol{C}^{(t)}$ became smaller than $10^{-30}$.

In the third experiment, we compared the safe CMA-ES with the existing optimization methods designed for safe optimization: SafeOpt, modified SafeOpt, and swarm-based SafeOpt (we used the implementations provided by the authors at https://github.com/befelix/SafeOpt). We used the RBF kernel in (20) as their kernel for a fair comparison with the safe CMA-ES (we did not use automatic relevance determination (ARD) as it did not lead to performance improvements in preliminary experiments). We optimized the hyperparameters of the kernel by the maximum likelihood method. As SafeOpt requires the Lipschitz constant of the safety function, we provided the maximum norm of the gradient over the grid points that divide each dimension of the search space into $11$ equal parts, i.e., $11^{d}$ grid points in total. For violation avoidance, consistent with the setting in the original paper, we fixed $w_{\mathrm{unsafe}}=1$ and varied the weight for safe solutions, in addition to the default setting, as $w_{\mathrm{safe}}=0.5,1,2$.

For all methods, we set the search space as $\mathcal{X}=[-5,5]^{d}$. We obtained the safe seeds from samples that are generated uniformly at random within the search space and satisfy all safety constraints. We set the number of safe seeds as $N_{\mathrm{seed}}=10$. For the methods employing the CMA-ES, the initial mean vector $\boldsymbol{m}^{(0)}$ was set to the safe seed with the best evaluation value on the objective function. The initial step-size and covariance matrix were given by $\sigma^{(0)}=2$ and $\boldsymbol{C}^{(0)}=\mathbf{I}$, respectively. Note that the safe CMA-ES corrects the initial step-size using (27). We set the number of dimensions as $d=5,20$. The population size was set as $\lambda=4+\lfloor 3\ln d\rfloor$. We conducted 50 independent trials for each setting.

5.3. Result of Performance Evaluation in Early Phase of Optimization

We set the safety function and threshold to investigate the performance in the early phase of the optimization process as

(31) $\displaystyle s(\boldsymbol{x})=f(\boldsymbol{x})\qquad\text{and}\qquad h=q(f,\mathcal{X},0.5)\enspace,$

where $q(f,\mathcal{X},\gamma)$ represents the $\gamma$-quantile of the objective function $f$ under the uniform distribution over the search space $\mathcal{X}$. In this case, a solution with a poor evaluation value on the objective function is unsafe. The safety threshold $q(f,\mathcal{X},\gamma)$ was estimated using 10,000 samples generated uniformly at random across the search space. We set the total number of evaluations to 1,000.
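The threshold $q(f,\mathcal{X},0.5)$ can be estimated by simple Monte-Carlo sampling, as sketched below (the function name and defaults are ours).

```python
import numpy as np

def estimate_threshold(f, d, gamma=0.5, n_samples=10_000,
                       rng=np.random.default_rng()):
    """Monte-Carlo estimate of the gamma-quantile q(f, X, gamma) over X = [-5, 5]^d."""
    X = rng.uniform(-5.0, 5.0, size=(n_samples, d))
    values = np.apply_along_axis(f, 1, X)
    return float(np.quantile(values, gamma))
```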

Figure 2 shows the transitions of the best evaluation value and the number of unsafe evaluations. Focusing on the result of 5-dimensional problems, the safe CMA-ES successfully optimized all benchmark functions, suppressing the unsafe evaluations. It is noteworthy that the safe CMA-ES avoided evaluating unsafe solutions in over 75% of the trials. Meanwhile, the violation avoidance did not significantly reduce the unsafe evaluations compared to the naive CMA-ES.

For 20-dimensional problems, the safe CMA-ES evaluated unsafe solutions in the early iterations on the ellipsoid and rosenbrock functions. We consider that this is because the estimation of the 20-dimensional GPR trained with limited training data was unreliable. The safe CMA-ES successfully evaluated only safe solutions after the initial iterations. Meanwhile, we observe a slower decrease in the best evaluation value of the safe CMA-ES on the sphere function compared to the other methods. We consider that the overestimation of the Lipschitz constant of the safety function led to this slight deterioration. However, the safe CMA-ES accelerated the decrease of the best evaluation value on the ellipsoid function. This is because the safe solutions had good evaluation values in this setting, and the projection to the safe region improved the evaluation values of the samples on the objective function, thereby accelerating the optimization process.

Figure 4. The computational time for performing five updates in each method. We plot the average time over three trials.
Figure 5. The performance comparison of the safe CMA-ES with the swarm-based SafeOpt and the CMA-ES with violation avoidance. We plot the medians and interquartile ranges of the best evaluation value and the number of unsafe evaluations.

5.4. Result of Performance Evaluation Throughout Optimization Process

We used the safety function and safety threshold to investigate the performance throughout the optimization process as

(32) $\displaystyle s(\boldsymbol{x})=x_{1}\qquad\text{and}\qquad h=0\enspace.$

This safety function constrains the first element of the solution. Optimization without evaluating unsafe solutions becomes challenging on the reversed ellipsoid function. We set the total number of evaluations to $d\times 10^{4}$.

Figure 3 shows the transitions of the best evaluation value satisfying the safety constraint and the number of unsafe evaluations. It is noteworthy that the median number of unsafe evaluations was zero for the safe CMA-ES on all problems. The results on the sphere function show that the best evaluation value of the safe CMA-ES stagnated in the early phase of the optimization. This occurred because the Lipschitz constant of the safety function was overestimated and the modification of the initial step-size in (27) therefore set an initial value smaller than necessary. On the other functions, the safe CMA-ES required more evaluations to reduce the best evaluation value throughout the optimization compared to the comparative methods; on the 20-dimensional reversed ellipsoid in particular, it required almost twice as many evaluations. However, the safe CMA-ES completed the whole optimization process without evaluating unsafe solutions, in contrast to the comparative methods, which continued to accumulate unsafe evaluations. Additionally, the CMA-ES with violation avoidance failed to optimize the reversed ellipsoid and rosenbrock functions. These results show the safety and efficiency of the safe CMA-ES in safe optimization.

5.5. Result of Performance Comparison with Existing Methods

Finally, we compared the safe CMA-ES with existing optimization methods for safe optimization. Before evaluating the optimization performance, we compared the computational cost of the updates. Figure 4 shows the computational time for performing five updates, varying the number of dimensions as $d=2,3,4,5$ (we measured the computational time using an AMD EPYC 7763 (2.45GHz, 64 cores) and implemented the safe CMA-ES and violation avoidance using NumPy 1.21.3 (Harris et al., 2020), SciPy 1.7.1 (Virtanen et al., 2020), and GPyTorch 1.10 (Gardner et al., 2018)). We observed a significant increase in the computational costs of SafeOpt and modified SafeOpt as the number of dimensions increased, which made the comparison with the safe CMA-ES difficult. We consider that this is because SafeOpt and modified SafeOpt use a grid that divides the search space into even intervals, and the growing number of grid points results in an increased computational cost (we used a grid dividing each dimension into 20 points, and we did not optimize the hyperparameters using the maximum likelihood method because it deteriorated their performance in our preliminary experiment).

Figure 5 shows the comparison results between the safe CMA-ES, swarm-based SafeOpt, and violation avoidance using the safety constraint (31), as in the first experiment. We observed that swarm-based SafeOpt failed to reduce the best evaluation value on the ellipsoid and rosenbrock functions. The reason for this failure is that the GPR could not accurately estimate the original landscape of those functions. In contrast, because the safe CMA-ES uses the GPR on the composition function $s\circ\phi^{-1}$, it obtained a reliable estimation, realizing safe optimization efficiently. Furthermore, more unsafe evaluations occurred with violation avoidance than with the safe CMA-ES. Additionally, in 20-dimensional problems, violation avoidance with $w_{\mathrm{safe}}=0.5$ terminated due to the inability to generate solutions whose nearest evaluated solutions were safe. These results reveal that the superiority of the safe CMA-ES is maintained even when the weights of violation avoidance are adjusted.

6. Conclusion

We proposed the safe CMA-ES as an efficient optimization method tailored for safe optimization. The safe CMA-ES estimates the Lipschitz constants of the safety functions transformed using the distribution parameters. The estimation process uses a GPR trained with $N_{\mathrm{data}}$ evaluated solutions and the maximum norm of the gradient $\|\nabla\mu(\boldsymbol{z})\|$ of the mean of the posterior distribution. Additionally, the safe CMA-ES constructs the safe region and projects the samples to the nearest points in the safe region to reduce unsafe evaluations. The safe CMA-ES also modifies the initial mean vector and initial step-size using the safe seeds. The numerical simulations show that the safe CMA-ES optimized the benchmark problems while suppressing unsafe evaluations, although the rate of improvement in the best evaluation value was slower compared to the other methods. As the safe CMA-ES assumes the existence of the Lipschitz constants of the safety functions, algorithmic improvements for safe optimization with discontinuous safety functions are considered future work. Additionally, since we used synthetic problems in our experiments, the evaluation of the safe CMA-ES on realistic problems is also left as future work.

Acknowledgements.
This work was partially supported by JSPS KAKENHI (JP23H00491, JP23H03466, JP23KJ0985), JST PRESTO (JPMJPR2133), and NEDO (JPNP18002, JPNP20006).

References

  • Atamna et al. (2016) Asma Atamna, Anne Auger, and Nikolaus Hansen. 2016. Augmented Lagrangian Constraint Handling for CMA-ES — Case of a Single Linear Constraint. In Parallel Problem Solving from Nature – PPSN XIV. Springer International Publishing, Cham, 181–191.
  • Berkenkamp et al. (2023) Felix Berkenkamp, Andreas Krause, and Angela P. Schoellig. 2023. Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics. Machine Learning 112, 10 (2023), 3713–3747. https://doi.org/10.1007/s10994-021-06019-1
  • Berkenkamp et al. (2016) Felix Berkenkamp, Angela P. Schoellig, and Andreas Krause. 2016. Safe controller optimization for quadrotors with Gaussian processes. In 2016 IEEE International Conference on Robotics and Automation (ICRA). 491–496. https://doi.org/10.1109/ICRA.2016.7487170
  • Duivenvoorden et al. (2017) Rikky R.P.R. Duivenvoorden, Felix Berkenkamp, Nicolas Carion, Andreas Krause, and Angela P. Schoellig. 2017. Constrained Bayesian Optimization with Particle Swarms for Safe Adaptive Controller Tuning. IFAC-PapersOnLine 50, 1 (2017), 11800–11807. https://doi.org/10.1016/j.ifacol.2017.08.1991 20th IFAC World Congress.
  • Gardner et al. (2018) Jacob R Gardner, Geoff Pleiss, David Bindel, Kilian Q Weinberger, and Andrew Gordon Wilson. 2018. GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration. In Advances in Neural Information Processing Systems.
  • Gonzalez et al. (2016) Javier Gonzalez, Zhenwen Dai, Philipp Hennig, and Neil Lawrence. 2016. Batch Bayesian Optimization via Local Penalization. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Vol. 51. PMLR, 648–657.
  • Hansen (2016) Nikolaus Hansen. 2016. The CMA Evolution Strategy: A Tutorial. CoRR abs/1604.00772 (2016). arXiv:1604.00772
  • Hansen et al. (2019) Nikolaus Hansen, Youhei Akimoto, and Petr Baudis. 2019. CMA-ES/pycma on Github.
  • Hansen and Auger (2014) Nikolaus Hansen and Anne Auger. 2014. Principled Design of Continuous Stochastic Search: From Theory to Practice. Springer Berlin Heidelberg, Berlin, Heidelberg, 145–180. https://doi.org/10.1007/978-3-642-33206-7_8
  • Hansen and Ostermeier (1996) Nikolaus Hansen and Andreas Ostermeier. 1996. Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. In Proceedings of IEEE International Conference on Evolutionary Computation. 312–317. https://doi.org/10.1109/ICEC.1996.542381
  • Harris et al. (2020) Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. 2020. Array programming with NumPy. Nature 585, 7825 (2020), 357–362. https://doi.org/10.1038/s41586-020-2649-2
  • Hutter et al. (2013) Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. 2013. An evaluation of sequential model-based optimization for expensive blackbox functions. In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation. Association for Computing Machinery, New York, NY, USA, 1209–1216. https://doi.org/10.1145/2464576.2501592
  • Kaji et al. (2009) Hirotaka Kaji, Kokolo Ikeda, and Hajime Kita. 2009. Avoidance of constraint violation for experiment-based evolutionary multi-objective optimization. In 2009 IEEE Congress on Evolutionary Computation. 2756–2763. https://doi.org/10.1109/CEC.2009.4983288
  • Kim et al. (2022) Youngmin Kim, Richard Allmendinger, and Manuel López-Ibáñez. 2022. Are Evolutionary Algorithms Safe Optimizers?. In Proceedings of the Genetic and Evolutionary Computation Conference. Association for Computing Machinery, 814–822. https://doi.org/10.1145/3512290.3528818
  • Kim et al. (2021) Youngmin Kim, Richard Allmendinger, and Manuel López-Ibáñez. 2021. Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art. In Trustworthy AI - Integrating Learning, Optimization and Reasoning. Springer International Publishing, Cham, 123–139.
  • Liu and Nocedal (1989) Dong C Liu and Jorge Nocedal. 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming 45, 1 (1989), 503–528. https://doi.org/10.1007/BF01589116
  • Louis et al. (2022) Maxime Louis, Hector Romero Ugalde, Pierre Gauthier, Alice Adenis, Yousra Tourki, and Erik Huneker. 2022. Safe Reinforcement Learning for Automatic Insulin Delivery in Type I Diabetes. In Reinforcement Learning for Real Life Workshop, NeurIPS 2022.
  • Modugno et al. (2016) Valerio Modugno, Ugo Chervet, Giuseppe Oriolo, and Serena Ivaldi. 2016. Learning soft task priorities for safe control of humanoid robots with constrained stochastic optimization. In 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids). 101–108. https://doi.org/10.1109/HUMANOIDS.2016.7803261
  • Nomura and Shibata (2024) Masahiro Nomura and Masashi Shibata. 2024. cmaes : A Simple yet Practical Python Library for CMA-ES. arXiv preprint arXiv:2402.01373 (2024).
  • Rasmussen et al. (2006) Carl Edward Rasmussen, Christopher KI Williams, et al. 2006. Gaussian Processes for Machine Learning. Vol. 1. Springer.
  • Serkan Kiranyaz (2014) Serkan Kiranyaz, Turker Ince, and Moncef Gabbouj. 2014. Multidimensional Particle Swarm Optimization for Machine Learning and Pattern Recognition (first ed.). Springer. https://doi.org/10.1007/978-3-642-37846-1
  • Sui et al. (2015) Yanan Sui, Alkis Gotovos, Joel Burdick, and Andreas Krause. 2015. Safe Exploration for Optimization with Gaussian Processes. In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 37). PMLR, 997–1005.
  • Toal and Arnold (2020) Lauchlan Toal and Dirk V. Arnold. 2020. Simple Surrogate Model Assisted Optimization with Covariance Matrix Adaptation. In Parallel Problem Solving from Nature – PPSN XVI. Springer International Publishing, 184–197.
  • Virtanen et al. (2020) Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. 2020. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17 (2020), 261–272. https://doi.org/10.1038/s41592-019-0686-2
  • Yamada et al. (2023) Yutaro Yamada, Kento Uchida, Shota Saito, and Shinichi Shirakawa. 2023. Surrogate-Assisted (1+1)-CMA-ES with Switching Mechanism of Utility Functions. In Applications of Evolutionary Computation. Springer Nature Switzerland, 798–814.

Appendix A Investigation of Hyperparameter Sensitivity

We investigated the sensitivities of the hyperparameters $\alpha$, $\zeta_{\mathrm{init}}$, and $T_{\mathrm{data}}$ of the safe CMA-ES. The effects of these hyperparameters are summarized as follows:

  • The hyperparameter $\alpha$ controls the rates of increase and decrease of the coefficient $\rho_{j}^{(t+1)}$ for the estimated Lipschitz constant when unsafe solutions are evaluated.

  • The hyperparameter $\zeta_{\mathrm{init}}$ controls the coefficient $\tau^{(t+1)}$ for the estimated Lipschitz constant when the number of evaluated solutions used for the Gaussian process regression is small.

  • The hyperparameter $T_{\mathrm{data}}$ controls the number $N_{\mathrm{data}}$ of solutions used both to construct the safe region and as training data for the Gaussian process regression.

We used the safety constraint used in the first and third experiments in the paper as

(33) $\displaystyle s(\boldsymbol{x})=f(\boldsymbol{x})\quad\text{and}\quad h=q(f,\mathcal{X},0.5)\enspace.$

We ran the safe CMA-ES changing a single hyperparameter at a time while keeping the other hyperparameters at their recommended settings. We ran 50 independent trials for each setting.

A.1. Result of Sensitivity Experiment of $\alpha$

We varied the setting of $\alpha$ as $\alpha=1,5,10,20,40,80,160$. Figure 6 shows the transitions of the best evaluation value and the number of unsafe evaluations. We note that no unsafe evaluation occurred on the 5-dimensional functions and the 20-dimensional sphere function. Since the hyperparameter $\alpha$ affects the dynamics only when an unsafe evaluation occurs, the dynamics of the safe CMA-ES were unchanged in those cases. For the 20-dimensional ellipsoid function, the unsafe evaluations tended to be reduced as the hyperparameter $\alpha$ increased, while the best evaluation value slightly stagnated in the first few updates. On the 20-dimensional rosenbrock function, a similar stagnation of the best evaluation value was observed with large $\alpha$, while the number of unsafe evaluations remained unchanged except for the case $\alpha=1$, i.e., the case without the adaptation of the coefficient $\rho_{j}^{(t+1)}$. Overall, the recommended setting $\alpha=10$ seems to be a reasonable choice.

A.2. Result of Sensitivity Experiment of $\zeta_{\mathrm{init}}$

We varied the setting of $\zeta_{\mathrm{init}}$ as $\zeta_{\mathrm{init}}=1,5,10,20,40$. Figure 7 shows the transitions of the best evaluation value and the number of unsafe evaluations. When $\zeta_{\mathrm{init}}=1$, i.e., when the adaptation of the coefficient $\tau^{(t+1)}$ is not applied, unsafe evaluations occasionally occurred on the 5-dimensional ellipsoid function. On the 20-dimensional sphere function, a larger setting of $\zeta_{\mathrm{init}}$ delayed the decrease of the best evaluation value, while unsafe evaluations did not occur with any of the settings. In contrast, on the 20-dimensional ellipsoid function, a larger setting of $\zeta_{\mathrm{init}}$ reduced the number of unsafe evaluations while maintaining the decrease rate of the best evaluation value. We consider the performance of the safe CMA-ES to be somewhat sensitive to $\zeta_{\mathrm{init}}$, and the suitable setting of $\zeta_{\mathrm{init}}$ depends on the problem.

A.3. Result of Sensitivity Experiment of $T_{\mathrm{data}}$

We varied the setting of $T_{\mathrm{data}}$ as $T_{\mathrm{data}}=1,3,5,7,9$. Figure 8 shows the transitions of the best evaluation value and the number of unsafe evaluations. The setting of $T_{\mathrm{data}}$ did not significantly change the dynamics of the safe CMA-ES on most of the problems. When $T_{\mathrm{data}}=1$, unsafe evaluations occurred on the 5-dimensional ellipsoid function. On the 20-dimensional rosenbrock function, a larger $T_{\mathrm{data}}$ reduced the unsafe evaluations. Overall, the recommended setting $T_{\mathrm{data}}=5$ performed well across all problems.

Figure 6. Result of the sensitivity experiment of $\alpha$. We plot the medians and interquartile ranges of the best evaluation value and the number of evaluations of unsafe solutions.
Figure 7. Result of the sensitivity experiment of $\zeta_{\mathrm{init}}$. We plot the medians and interquartile ranges of the best evaluation value and the number of evaluations of unsafe solutions.
Figure 8. Result of the sensitivity experiment of $T_{\mathrm{data}}$. We plot the medians and interquartile ranges of the best evaluation value and the number of evaluations of unsafe solutions.