
Gradient-free Importance Sampling Scheme for Efficient Reliability Estimation

Elsayed Eshra
The Pennsylvania State University
University Park, PA 16802
eme5375@psu.edu
Konstantinos G. Papakonstantinou
The Pennsylvania State University
University Park, PA 16802
kpapakon@psu.edu
Abstract

This work presents a novel gradient-free importance sampling-based framework for precisely and efficiently estimating rare event probabilities, often encountered in reliability analyses of engineering systems. The approach is formulated around our foundational Approximate Sampling Target with Post-processing Adjustment (ASTPA) methodology. ASTPA uniquely constructs and directly samples an unnormalized target distribution, relaxing the optimal importance sampling density (ISD). The target’s normalizing constant is then estimated using our inverse importance sampling (IIS) scheme, employing an ISD fitted based on the obtained samples. In this work, a gradient-free sampling method within ASTPA is developed through a guided dimension-robust preconditioned Crank-Nicolson (pCN) algorithm, particularly suitable for black-box computational models where analytical gradient information is not available. To boost the sampling efficiency of pCN in our context, a computationally effective, general discovery stage for the rare event domain is devised, providing (multi-modal) rare event samples used in initializing the pCN chains. A series of diverse test functions and engineering problems involving high dimensionality and strong nonlinearity is presented, demonstrating the advantages of the proposed framework compared to several state-of-the-art sampling methods.

Keywords: Rare Event · Gradient-free Sampling · Inverse Importance Sampling · Reliability Estimation · Preconditioned Crank-Nicolson · High Dimensions · ASTPA.

1 Introduction

This work introduces a novel gradient-free framework for reliability analysis and rare event probability estimation in high-dimensional random variable spaces. Evaluation of system reliability and rare event uncertainty quantification (UQ) play a vital role in modern decision-making across diverse fields. Yet, as our systems and computational models grow increasingly complex, the computational cost of repeatedly evaluating them becomes a major challenge. Additionally, analytically deriving gradients for such models can sometimes be cumbersome or even infeasible, and using numerical gradient estimation methods in high-dimensional settings is impractical due to the high computational cost involved. These challenges are further compounded by the exceptionally low probabilities associated with such rare events, often ranging from $10^{-4}$ to $10^{-9}$ or lower. This necessitates the development of efficient gradient-free approaches capable of accurately estimating rare event probabilities while minimizing the number of model evaluations, a challenge that this work aims to address.

In the context of reliability estimation, failures are often considered rare events; therefore, rare event probability is also commonly referred to as failure probability. Several techniques have been developed over the years to estimate rare event (failure) probabilities, with sampling-based approaches being able to address this problem in its utmost generality. To overcome the limitations of the crude Monte Carlo approach [1], various advanced sampling techniques have been proposed. Among others, importance sampling (IS) [2], a well-known variance-reduction approach, is capable of efficiently estimating rare event probabilities. IS employs an appropriate importance sampling density (ISD), placing greater importance on rare event domains in the random variable space. This results in an increased number of rare event samples, thereby reducing the number of required model calls. Although the theoretically optimal ISD is known [3], it is challenging to sample from and impractical to implement. Notably, finding an effective near-optimal ISD is an ongoing, challenging research pursuit. The existing approaches can be largely classified into two categories [4]. The first seeks to approximate the optimal ISD by a parametric or non-parametric normalized probability density function (PDF). An early approach in this category utilizes densities centered around one or more design points [5], a powerful, practical process, yet with certain limitations for high-dimensional, complex spaces, as also discussed in [6]. A more recent, popular approach is the cross entropy-based importance sampling (CE-IS). It constructs a parametric ISD $h$ by minimizing the Kullback-Leibler (KL) divergence between the optimal ISD and a chosen parametric family of probability distributions [7, 8, 9, 10, 11]. Recent CE-IS variants incorporate information from influential rare event points, e.g., design points, to effectively initialize the cross entropy process [12, 13]. Within this first category of methods, several other noteworthy approaches have also been proposed to approximate the optimal ISD, e.g., [14, 15, 16, 17, 18]. The second category of methods employs sequential variants of importance sampling, mainly utilizing a number of intermediate unnormalized distributions [19, 20, 21, 22]. These unnormalized distributions are generally more flexible than the PDFs usually utilized within the first category, thus being capable of providing a better approximation of the optimal ISD. Another sequential sampling method that shares similarities [4] with this second methodological category is the Subset Simulation (SuS) approach, originally presented in [6] and continuously refined [23, 24, 25, 26, 27], due to its potential to handle high-dimensional problems, despite some limitations [28]. A key to the success of all sequential methods is to design nearby intermediate distributions and to sufficiently sample them to accurately compute the sought rare event probability, a non-trivial and computationally demanding task.

Recently, we developed a novel foundational importance sampling-based approach in a non-sequential manner, termed Approximate Sampling Target with Post-processing Adjustment (ASTPA), for reliability and rare event probability estimation [29, 4]. The ASTPA approach innovatively decomposes the challenging rare event probability estimation process into two less demanding estimation problems. First, ASTPA uniquely constructs and directly samples an unnormalized target distribution, relaxing the optimal ISD, thereby making it comparatively much easier to sample. The obtained samples are subsequently utilized not only to compute a shifted estimate of the sought probability but also to guide the second ASTPA stage. Post-sampling, the normalizing constant of the approximate sampling target is computed through the inverse importance sampling (IIS) technique [29], which utilizes an importance sampling density (ISD) fitted based on the samples drawn in the first stage. The rare event probability is eventually computed by utilizing this computed normalizing constant to correct the shifted estimate of the first stage. Consequently, ASTPA significantly reduces the computational cost typically associated with using multiple intermediate distributions. Sampling the approximate target distribution in ASTPA has so far been performed using gradient-based Hamiltonian Markov Chain Monte Carlo (HMCMC) methods, demonstrating outstanding performance in complex, high-dimensional static and first-passage dynamic problems, both in Gaussian spaces [29] and directly in non-Gaussian spaces [30, 4]. However, the competitiveness of this gradient-based approach is hindered when analytical gradients cannot be obtained.

Gradient-free sampling has been extensively employed in various reliability estimation approaches, such as importance sampling and Subset Simulation, as discussed earlier. One common method is the Component-wise Metropolis-Hastings (CWMH) algorithm, often applied within Subset Simulation [6] to effectively sample high-dimensional conditional distributions by sequentially updating individual dimensions. To further enhance efficiency, delayed rejection variants of CWMH have been introduced, allowing for multiple proposal attempts within a single iteration when a proposal is rejected, thereby increasing acceptance rates and reducing sample autocorrelation [31, 32, 33, 34, 35]. Another significant development is the preconditioned Crank-Nicolson (pCN) algorithm, an MCMC method designed for high-dimensional probability distributions and infinite-dimensional settings, often encountered in Bayesian inverse problems [36, 37]. Inspired by the Crank-Nicolson discretization for solving partial differential equations, the pCN sampler proposes updates by combining the current state with a perturbation drawn from the prior distribution, scaled by a tunable parameter to control the step size. This design maintains stable acceptance probabilities even as dimensionality increases, achieving dimension-robust performance [38]. To further improve efficiency, some extensions of the standard pCN algorithm incorporate information from the posterior distribution by adapting the proposal covariance using sampling history [39, 40] or by estimating the posterior covariance through gradient information [41]. The pCN algorithm has been applied in various reliability estimation frameworks, including Subset Simulation [42, 32] and importance sampling [20, 43].

In this paper, we introduce a novel gradient-free framework for efficiently and accurately estimating rare event probabilities, building on our foundational ASTPA approach, particularly suitable for black-box computational models where analytical gradient information is, for whatever reason, not available. Central to the framework is a guided variant of the dimension-robust pCN algorithm, tailored to sample the approximate target distribution within ASTPA. The standard pCN algorithm encounters challenges with multimodal distributions or when the target is distant from prior information, as often occurs with rare event domains located in the tails of original probability distributions. To overcome these limitations, we propose a computationally efficient discovery stage in this work, properly designed to enhance sampling efficiency by generating (multimodal) rare event samples to serve as initial seeds for the pCN chains. This process begins by sampling from the original distribution or an adjusted version, such as one with a modified covariance matrix, followed by conditional sampling based on the reciprocal of the original distribution, facilitating rapid diffusion toward the rare event domain. The procedure is stopped after generating a specified number of rare event samples, from which a defined number of seeds is randomly selected to initialize the pCN chains. The generated pCN sample set is then employed within the ASTPA approach to eventually estimate the sought failure probability. Consistent with our previous findings [29, 4], the ASTPA estimator remains unbiased, and the derived analytical coefficient of variation aligns very well with the sampling one, as demonstrated by the numerical experiments presented in this work.

The structure of this paper is as follows: Section 2 defines the rare event probability estimation problem, highlighting its complexities and the considered challenging yet realistic and practical scenarios. The ASTPA framework is presented in Section 3, followed by a detailed description of the developed gradient-free sampling scheme in Section 4. A summary of the pCN-based ASTPA framework is presented in Section 5. To fully evaluate the capabilities of the proposed methodology, Section 6 demonstrates its performance, successfully comparing it with state-of-the-art Subset Simulation and advanced importance sampling methods across a series of challenging low- and high-dimensional problems. The paper then concludes in Section 7.

2 Rare Event Probability Estimation

Let $\bm{X}$ be a continuous random vector taking values in $\mathcal{X}\subseteq\mathbb{R}^{d}$ and having a joint probability density function (PDF) $\pi_{\bm{X}}$. We are interested in rare events $\mathcal{F}\coloneqq\{\bm{x}:g(\bm{x})\leq 0\}$, where $g:\mathcal{X}\rightarrow\mathbb{R}$ is a performance function, also known as the limit-state function, defining the occurrence of a rare event. In this work, we aim to estimate the rare event probability $p_{\mathcal{F}}$:

p_{\mathcal{F}}=\int_{\mathcal{F}}\pi_{\bm{X}}(\bm{x})\,d\bm{x}=\int_{\mathcal{X}}I_{\mathcal{F}}(\bm{x})\,\pi_{\bm{X}}(\bm{x})\,d\bm{x}=\mathbb{E}_{\pi_{\bm{X}}}[I_{\mathcal{F}}(\bm{X})]   (1)

where $I_{\mathcal{F}}:\mathcal{X}\rightarrow\{0,1\}$ is the indicator function, i.e., $I_{\mathcal{F}}(\bm{x})=1$ if $\bm{x}\in\mathcal{F}$, and $I_{\mathcal{F}}(\bm{x})=0$ otherwise, and $\mathbb{E}$ is the expectation operator.

Our objective in this work is to estimate the integral in Eq. 1 under these challenging yet realistic settings: (i) the analytical calculation of Eq. 1 is generally intractable; (ii) the computational effort for evaluating $I_{\mathcal{F}}$ for each realization $\bm{x}$ is assumed to be quite significant, often relying on computationally intensive models, necessitating the minimization of such function evaluations (model calls); (iii) the rare event probability $p_{\mathcal{F}}$ is extremely low, typically in the range of $10^{-4}$–$10^{-9}$ or lower; (iv) the random variable space, $\mathcal{X}\subseteq\mathbb{R}^{d}$, is high-dimensional, with $d$ often as large as $10^{2}$ or more; (v) the analytical gradient of the limit-state function is not available, motivating the use of gradient-free approaches; and (vi) the joint probability density, $\pi_{\bm{X}}(\bm{x})$, is assumed to follow an independent standard Gaussian distribution, i.e., $\pi_{\bm{X}}(\bm{x})=\mathcal{N}(\bm{X};\,\mathbf{0},\,\mathbf{I})$, with $\mathbf{I}$ being an identity matrix, a common assumption in rare event estimation. In most cases, this assumption is not restrictive, as transforming the original random variables to the independent standard Gaussian space can be uncomplicated [44, 45, 46, 47]. However, when such Gaussian transformations are not feasible, directly estimating rare event probabilities in complex non-Gaussian spaces may require sophisticated gradient-based samplers, a challenge addressed in our Quasi-Newton mass preconditioned HMCMC (QNp-HMCMC)-based ASTPA framework, as demonstrated in [4, 48, 49]. The gradient-free sampling approach developed in this work is specifically tailored for Gaussian spaces, with ongoing efforts to extend these methods in the future to direct estimations in non-Gaussian spaces.

Under these settings, several sampling-based methods for estimating rare event probabilities become highly inefficient. For instance, the standard Monte Carlo estimator of Eq. 1 using $\{\bm{x}_{i}\}_{1}^{N}$ draws from $\pi_{\bm{X}}$, $\hat{p}_{\mathcal{F}}=\frac{1}{N}\sum_{i=1}^{N}I_{\mathcal{F}}(\bm{x}_{i})$, has a coefficient of variation of $\sqrt{(1-p_{\mathcal{F}})/(Np_{\mathcal{F}})}$, rendering this estimate inefficient by requiring a prohibitively large number of samples to accurately quantify small probabilities; e.g., achieving a $10\%$ coefficient of variation for $p_{\mathcal{F}}=10^{-6}$ calls for $N\approx 10^{8}$ samples; see setting (iii) above. As discussed in Section 1, importance sampling (IS) can efficiently compute the integral in Eq. 1, through sampling from an appropriate importance sampling density (ISD), $h(\bm{X})$. The ISD places greater importance on the rare event domain, $\mathcal{F}$, in the random variable space compared to $\pi_{\bm{X}}(\bm{x})$, satisfying the sufficient condition $\text{supp}(I_{\mathcal{F}}\,\pi_{\bm{X}})\subseteq\text{supp}(h)$, with $\text{supp}(\cdot)$ denoting the support of the function. Eq. 1 can then be written as:

p_{\mathcal{F}}=\int_{\mathcal{X}}I_{\mathcal{F}}(\bm{x})\,\dfrac{\pi_{\bm{X}}(\bm{x})}{h(\bm{x})}\,h(\bm{x})\,d\bm{x}=\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{h(\bm{X})}\bigg]   (2)

leading to the unbiased IS estimator:

\hat{p}_{\mathcal{F}}=\dfrac{1}{N}\sum_{i=1}^{N}I_{\mathcal{F}}(\bm{x}_{i})\,\dfrac{\pi_{\bm{X}}(\bm{x}_{i})}{h(\bm{x}_{i})}   (3)

where $\{\bm{x}_{i}\}_{1}^{N}\sim h$. The efficiency of IS relies heavily on the careful selection of both the ISD and the sampling algorithm. A theoretically optimal ISD $h^{*}$, providing a zero-variance IS estimator, is given by [3]:

h^{*}(\bm{X})=\dfrac{1}{p_{\mathcal{F}}}\,I_{\mathcal{F}}(\bm{X})\,\pi_{\bm{X}}(\bm{X})   (4)

However, this optimal ISD cannot be utilized because the normalizing constant in Eq. 4 is the sought probability $p_{\mathcal{F}}$ itself. Additionally, $h^{*}$ can be highly concentrated within the rare event domain, making it challenging to sample, particularly in high dimensions [4]. To address these challenges, we employ our developed importance sampling-based framework, ASTPA [29, 4], which constructs a relaxed ISD that approximates the optimal one while being much easier to sample. This relaxation allows for more efficient sampling, leading to a more accurate estimation of the sought probability. The primary focus of this work is to introduce an efficient, gradient-free sampling approach, which is especially beneficial when analytical gradients are not available, as detailed in Section 4.

3 Approximate Sampling Target with Post-processing Adjustment (ASTPA) in Gaussian Spaces

ASTPA estimates the rare event probability in Eq. 1 through an innovative non-sequential importance sampling variant, as introduced in [29, 4]. In this approach, ASTPA constructs a single unnormalized approximate sampling target $\tilde{h}$, relaxing the optimal ISD in Eq. 4. After sampling $\tilde{h}$, its normalizing constant $C_{h}$ is computed utilizing inverse importance sampling. By employing the unnormalized importance sampling distribution $\tilde{h}$, Eq. 2 can be written as:

p_{\mathcal{F}}=\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{h(\bm{X})}\bigg]=C_{h}\,\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{\tilde{h}(\bm{X})}\bigg]=C_{h}\,\tilde{p}_{\mathcal{F}}   (5)

Thus, computing $p_{\mathcal{F}}$ is decomposed into two generally simpler tasks. The first involves constructing $\tilde{h}$ and sampling $\{\bm{x}_{i}\}_{i=1}^{N}\sim h$ (equivalent to $\{\bm{x}_{i}\}_{i=1}^{N}\sim\tilde{h}$), to compute the unbiased expectation of the weighted indicator function (shifted probability estimate) as:

\tilde{p}_{\mathcal{F}}=\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{\tilde{h}(\bm{X})}\bigg]\approx\hat{\tilde{p}}_{\mathcal{F}}=\dfrac{1}{N}\sum_{i=1}^{N}I_{\mathcal{F}}(\bm{x}_{i})\,\dfrac{\pi_{\bm{X}}(\bm{x}_{i})}{\tilde{h}(\bm{x}_{i})}   (6)

The normalizing constant, $\hat{C}_{h}$, is then estimated using inverse importance sampling, as discussed in Section 3.2. The ASTPA estimator for the sought rare event probability can then be computed as:

\hat{p}_{\mathcal{F}}=\hat{\tilde{p}}_{\mathcal{F}}\,\hat{C}_{h}   (7)

Fig. 1 provides a concise illustration of the ASTPA framework using a nonlinear bimodal limit-state function, as detailed in Section 6.1, with $p_{\mathcal{F}}\sim 9.47\times 10^{-6}$. The construction of the approximate sampling target in ASTPA, along with recommended parameters, is discussed in Section 3.1. We then present the inverse importance sampling technique in Section 3.2. Subsequently, Section 3.3 shows the statistical properties of the ASTPA estimator presented in Eq. 7.

Figure 1: Outlining the ASTPA framework; (a) shows an original bivariate independent standard Gaussian distribution $\pi_{\bm{X}}$ in red, and a limit-state function at $g(\bm{X})=0$ shown using a grey line, with a bimodal rare event domain inside $g(\bm{X})=0$, as detailed in Section 6.1; (b) visualizes the optimal importance sampling density, $h^{*}$, in Eq. 4; (c) depicts the constructed approximate sampling target, $\tilde{h}$, in Eq. 9; (d) showcases samples from this target, subsequently used to compute $\hat{\tilde{p}}_{\mathcal{F}}$ in Eq. 6; and (e) and (f) demonstrate the inverse importance sampling procedure to compute the normalizing constant $\hat{C}_{h}$ in Eq. 11, presenting a crudely fitted $Q(\bm{X})$ and its samples, respectively. The sought probability is eventually computed using Eq. 7.

3.1 Target distribution formulation

As discussed earlier, the basic idea is to construct a near-optimal sampling target distribution $\tilde{h}$, placing higher importance on the rare event domain $\mathcal{F}$. In ASTPA, this is achieved by approximating the indicator function $I_{\mathcal{F}}$ in Eq. 4 with a smooth likelihood function $\ell_{g_{\bm{X}}}$, chosen as a logistic cumulative distribution function (CDF) of the negative of a scaled limit-state function, $F_{\text{CDF}}(-g(\bm{X})/g_{c})$, with $g_{c}$ being a scaling constant:

\ell_{g_{\bm{X}}}=F_{\text{CDF}}\bigg(\dfrac{-g(\bm{X})}{g_{c}}\,\Big|\,\mu_{g},\,\sigma\bigg)=\Bigg(1+\exp\bigg(\dfrac{\frac{g(\bm{X})}{g_{c}}+\mu_{g}}{\frac{\sqrt{3}}{\pi}\,\sigma}\bigg)\Bigg)^{-1}   (8)

where $\mu_{g}$ and $\sigma$ are the mean and dispersion factor of the logistic CDF, respectively. The approximate sampling target in ASTPA is then expressed as:

\tilde{h}(\bm{X})=\ell_{g_{\bm{X}}}\,\pi_{\bm{X}}(\bm{X})=\Bigg(1+\exp\bigg(\dfrac{\frac{g(\bm{X})}{g_{c}}+\mu_{g}}{\frac{\sqrt{3}}{\pi}\,\sigma}\bigg)\Bigg)^{-1}\,\pi_{\bm{X}}(\bm{X})   (9)

The dispersion factor $\sigma$ controls the spread of the likelihood function and thus influences the shape of the constructed target $\tilde{h}$. Recommended values for $\sigma$ lie in the range of $[0.1,\,0.6]$, with lower values (typically between $0.1$ and $0.4$) generally working efficiently. Higher values within the recommended range may, however, be needed for high-dimensional cases involving strongly nonlinear effects. The mean parameter, $\mu_{g}$, of the logistic CDF is typically chosen as $\mu_{g}=1.21\sigma$, generally placing $\mu_{g}$ inside the rare event domain, as derived in [4]. In Fig. 2, we illustrate how the dispersion factor, $\sigma$, influences the shape of the target distribution, based on a nonlinear bimodal limit-state function, as detailed in Section 6.1. As shown, decreasing $\sigma$ results in a more concentrated target distribution inside the rare event domain. Although pushing the target further inside the rare event domain, i.e., closer to the optimal distribution $(I_{\mathcal{F}}\,\pi_{\bm{X}})$, can generate more rare event samples, it may also complicate the sampling process and, thus, the computation of the normalizing constant $\hat{C}_{h}$, as discussed in [4].

The scaling $g(\bm{X})/g_{c}$ accounts for the varying magnitudes of limit-state functions, ensuring that the input to the logistic CDF, $g(\bm{X})/g_{c}$, evaluated at the mean of the original distribution, consistently falls within a predefined range. This process standardizes the input while keeping the recommended values for $\sigma$ unchanged. To ensure the input at $\bm{x}=\bm{0}$ lies within a desired range of $[3,7]$, $g_{c}$ is defined as $g_{c}=g(\bm{0})/q$, with $q\in[3,7]$, if $g(\bm{0})\notin[3,7]$, and $g_{c}=1$ otherwise. Our empirical analysis confirms the effectiveness of this scaling technique, demonstrating that the recommended range for $q$ is practical for the studied Gaussian problems.

Fig. 3 demonstrates the effect of the scaling constant $g_{c}$ on constructing the target distribution $\tilde{h}(\bm{X})$. Since the limit-state function considered in this problem gives $g(\bm{0})>7$, the scaling constant $g_{c}$ can be defined as $g(\bm{0})/4$. Fig. 3(b) shows the unnormalized target distribution $\tilde{h}(\bm{X})$ constructed without applying the proper scaling constant. As a result, the input $g(\bm{X})/g_{c}$ at $\bm{x}=\bm{0}$ does not lie within the desired range, leading to a highly concentrated target. In contrast, Fig. 3(c) presents the corrected target distribution after applying the appropriate scaling factor $g_{c}=g(\bm{0})/4$. These plots clearly demonstrate how scaling the limit-state function helps maintain an appropriately relaxed target, with the recommended parameter values remaining unchanged, thus enhancing sampling efficiency.
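For concreteness, the construction of Eqs. 8 and 9 can be sketched in a few lines of Python; this is a minimal illustration in log space (the function names and the return convention, the log-target together with the log-likelihood reused later by the pCN acceptance ratio, are our own choices, not part of any reference implementation):

import numpy as np
from scipy.stats import norm

def make_log_target(g, d, sigma=0.3, q=4.0):
    """Sketch of Eqs. 8-9: log h~(x) and the log-likelihood log l_gX(x)."""
    g0 = g(np.zeros(d))
    gc = g0 / q if not (3.0 <= g0 <= 7.0) else 1.0    # scaling constant g_c (Section 3.1)
    mu_g = 1.21 * sigma                               # recommended mean parameter
    s = (np.sqrt(3.0) / np.pi) * sigma                # logistic scale parameter

    def log_target(x):
        # Log of the logistic-CDF likelihood, -log(1 + exp((g/gc + mu_g)/s)),
        # computed with logaddexp to remain stable deep in the distribution tails
        log_lik = -np.logaddexp(0.0, (g(x) / gc + mu_g) / s)
        log_prior = norm.logpdf(x).sum()              # independent standard Gaussian pi_X
        return log_prior + log_lik, log_lik

    return log_target

Working in log space avoids the numerical underflow that a direct evaluation of $\tilde{h}$ would incur far in the distribution tails, where rare event samples live.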

Figure 2: Effect of the likelihood dispersion factor $\sigma$ on the constructed target distribution for a nonlinear bimodal limit-state function, shown in gray at $g(\bm{X})=0$, as detailed in Section 6.1.
Figure 3: Illustrating the impact of the scaling constant $g_{c}$ on the construction of the target distribution $\tilde{h}(\bm{X})$. The gray island-shaped curve represents Himmelblau’s multi-modal limit-state function $g(\bm{X})$, with the failure domain inside the curves and $p_{\mathcal{F}}\sim 2.81\times 10^{-7}$, as detailed in Section 6.3. (a) Bivariate Gaussian distribution $\pi_{\bm{X}}$. (b) Constructed target distribution $\tilde{h}(\bm{X})$ without the proper value of the scaling constant ($g_{c}=1$). (c) The corrected target $\tilde{h}(\bm{X})$ after applying an appropriate scaling constant $g_{c}=g(\bm{0})/4$.

3.2 Inverse importance sampling

Inverse importance sampling (IIS) is a general technique to compute normalizing constants, demonstrating exceptional performance in the context of rare event probability estimation [29]. Given an unnormalized distribution $\tilde{h}$ and samples $\{\bm{x}_{i}\}_{i=1}^{N}\sim h$, inverse importance sampling first fits an ISD $Q(\bm{x})$ based on these samples. IIS then estimates the normalizing constant $C_{h}$ as:

C_{h}=\int_{\mathcal{X}}\tilde{h}(\bm{x})\,d\bm{x}=\int_{\mathcal{X}}\dfrac{\tilde{h}(\bm{x})}{Q(\bm{x})}\,Q(\bm{x})\,d\bm{x}=\mathbb{E}_{Q}\bigg[\dfrac{\tilde{h}(\bm{X})}{Q(\bm{X})}\bigg]   (10)

By drawing $\{\bm{x}_{i}^{\prime}\}_{i=1}^{M}$ i.i.d. samples from $Q$, the unbiased IIS estimator can be computed as:

\hat{C}_{h}=\dfrac{1}{M}\sum_{i=1}^{M}\dfrac{\tilde{h}(\bm{x}_{i}^{\prime})}{Q(\bm{x}_{i}^{\prime})}   (11)

In this work, a Gaussian Mixture Model (GMM), based on the available samples $\{\bm{x}_{i}\}_{i=1}^{N}\sim h$, is employed as the ISD $Q(\bm{X})$. In low-dimensional spaces, where $d<20$, we employ a GMM with a large number $(\sim 10)$ of Gaussian components that have full covariance matrices. This approach aims to closely approximate the target distribution $\tilde{h}$ with an ISD $Q(\cdot)$, thus improving the IIS estimator in Eq. 11. To address the challenges posed by high-dimensional spaces, GMMs with diagonal covariance matrices are used in the high-dimensional examples in this work. This choice reduces the number of GMM parameters to be estimated, thereby mitigating the scalability issues typically associated with density estimation methods. Notably, IIS has been shown to work effectively even with a crudely fitted $Q$ [4, 29].

The IIS technique is further enhanced using the splitting approach introduced in [4] to improve the robustness of the IIS estimator against the impact of rarely observed outliers or anomalous samples. In this approach, the sample set $\{\bm{x}_{i}^{\prime}\}_{i=1}^{M}\sim Q$ is split into two equal-sized subsets, and the IIS estimate $\hat{C}_{h}$ is computed separately for each subset. If the two estimates are within a reasonable range of each other (specifically, $\frac{1}{3}\leq\hat{C}_{h}^{1}/\hat{C}_{h}^{2}\leq 3$), the final estimate is taken as their average. Otherwise, the final estimate is conservatively set to the minimum of the two. This method has proven effective across all examples presented in this work.
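A minimal sketch of the resulting estimator, including this splitting safeguard, is given below; it assumes a scalar-valued log-density `log_h_tilde` for $\tilde{h}$ and uses scikit-learn's GMM purely for illustration (all names are our own):

import numpy as np
from sklearn.mixture import GaussianMixture

def inverse_importance_sampling(samples, log_h_tilde, M, n_components=10, rng=None):
    """Estimate the normalizing constant C_h of h~ via Eq. 11, with splitting."""
    rng = rng or np.random.default_rng()
    d = samples.shape[1]
    # Full covariance matrices for d < 20, diagonal otherwise (Section 3.2)
    cov = "full" if d < 20 else "diag"
    gmm = GaussianMixture(n_components=n_components, covariance_type=cov).fit(samples)

    x_prime, _ = gmm.sample(M)                  # i.i.d. draws from the fitted ISD Q
    log_w = np.array([log_h_tilde(x) for x in x_prime]) - gmm.score_samples(x_prime)
    w = rng.permutation(np.exp(log_w))          # shuffle: gmm.sample orders by component

    # Splitting safeguard: average the half-sample estimates if they agree, else take the min
    c1, c2 = w[: M // 2].mean(), w[M // 2:].mean()
    return 0.5 * (c1 + c2) if 1.0 / 3.0 <= c1 / c2 <= 3.0 else min(c1, c2)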

3.3 Statistical properties of the ASTPA estimator

The ASTPA estimator of the rare event probability can finally be computed as:

p_{\mathcal{F}}=\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{\tilde{h}(\bm{X})}\bigg]\,\mathbb{E}_{Q}\bigg[\dfrac{\tilde{h}(\bm{X})}{Q(\bm{X})}\bigg]\approx\hat{p}_{\mathcal{F}}=\hat{\tilde{p}}_{\mathcal{F}}\,\hat{C}_{h}=\bigg(\dfrac{1}{N}\sum_{i=1}^{N}I_{\mathcal{F}}(\bm{x}_{i})\,\dfrac{\pi_{\bm{X}}(\bm{x}_{i})}{\tilde{h}(\bm{x}_{i})}\bigg)\bigg(\dfrac{1}{M}\sum_{i=1}^{M}\dfrac{\tilde{h}(\bm{x}_{i}^{\prime})}{Q(\bm{x}_{i}^{\prime})}\bigg)   (12)

where $\{\bm{x}_{i}\}_{i=1}^{N}$ and $\{\bm{x}_{i}^{\prime}\}_{i=1}^{M}$ are samples from $h$ and $Q$, respectively.

As shown in [29, 4], the ASTPA estimator is unbiased, i.e., $\mathbb{E}[\hat{p}_{\mathcal{F}}]=\mathbb{E}[\hat{\tilde{p}}_{\mathcal{F}}\,\hat{C}_{h}]=\tilde{p}_{\mathcal{F}}\,C_{h}$. The analytical Coefficient of Variation (C.o.V) of the ASTPA estimator can also be computed as $\textrm{C.o.V}\approx\sqrt{\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})}/\hat{p}_{\mathcal{F}}$, where the variance $\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})$ is estimated by:

\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})=(\hat{\tilde{p}}_{\mathcal{F}})^{2}\,\widehat{\text{Var}}(\hat{C}_{h})+(\hat{C}_{h})^{2}\,\widehat{\text{Var}}(\hat{\tilde{p}}_{\mathcal{F}})+\widehat{\text{Var}}(\hat{\tilde{p}}_{\mathcal{F}})\,\widehat{\text{Var}}(\hat{C}_{h})   (13)

Here, $\hat{\tilde{p}}_{\mathcal{F}}$ and $\hat{C}_{h}$ can be computed according to Eqs. 6 and 11, respectively. The variances $\widehat{\text{Var}}(\hat{\tilde{p}}_{\mathcal{F}})$ and $\widehat{\text{Var}}(\hat{C}_{h})$ can thus be estimated as follows:

\widehat{\text{Var}}(\hat{\tilde{p}}_{\mathcal{F}})=\frac{1}{N_{s}(N_{s}-1)}\sum_{i=1}^{N_{s}}\bigg(\dfrac{I_{\mathcal{F}}(\bm{x}_{i})\,\pi_{\bm{X}}(\bm{x}_{i})}{\tilde{h}(\bm{x}_{i})}-\hat{\tilde{p}}_{\mathcal{F}}\bigg)^{2},\qquad\widehat{\text{Var}}(\hat{C}_{h})=\frac{1}{M(M-1)}\sum_{i=1}^{M}\bigg(\dfrac{\tilde{h}(\bm{x}_{i}^{\prime})}{Q(\bm{x}_{i}^{\prime})}-\hat{C}_{h}\bigg)^{2}   (14)

where $N_{s}$ denotes the number of used Markov chain samples, accounting for the fact that the samples are not independent and identically distributed (i.i.d.) in this case. To address this, a thinning process is adopted to reduce the sample size from $N$ to $N_{s}$ based on the effective sample size (ESS) of the sample set $\{\bm{x}_{i}\}_{i=1}^{N}$. We select every $j^{\text{th}}$ sample, where $j$, an integer, is determined by:

j=\left\lfloor\dfrac{N}{4\cdot\text{ESS}_{\text{min}}}\right\rfloor   (15)

where $\text{ESS}_{\text{min}}$ represents the minimum effective sample size across all dimensions. To ensure $j$ remains within practical bounds, it is constrained to the set $\{3,4,5,\ldots,30\}$. Any calculated value of $j$ outside this range is adjusted to the nearest bound within it. The analytical C.o.V computed according to this approach showed very good agreement with the sampling C.o.V in all our numerical examples.
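Under the same definitions, the thinning rule of Eq. 15 and the first variance term of Eq. 14 can be sketched as follows (the per-dimension ESS computation itself, available in standard MCMC diagnostics libraries, is taken as given; names are illustrative):

import numpy as np

def variance_of_shifted_estimate(weights, ess_min):
    """Estimate Var(p~^) per Eqs. 14-15 from correlated chain samples.

    weights : length-N array of I_F(x_i) pi_X(x_i) / h~(x_i) over the pCN samples
    ess_min : minimum effective sample size across all dimensions (assumed given)
    """
    N = len(weights)
    j = int(np.clip(N // (4 * ess_min), 3, 30))  # stride of Eq. 15, kept in {3,...,30}
    w = weights[::j]                             # thinned, nearly independent values
    Ns = len(w)
    return np.sum((w - w.mean()) ** 2) / (Ns * (Ns - 1))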

4 Gradient-free Sampling of the Approximate Target ($\tilde{h}$) in ASTPA

In this section, we introduce a novel gradient-free approach for sampling the approximate target, $\tilde{h}=\ell_{g_{\bm{X}}}\,\pi_{\bm{X}}$, in ASTPA. Rare event probability estimation typically faces two fundamental sampling challenges. First, efficiently sampling high-dimensional distributions is difficult due to the curse of dimensionality, where the volume of the space expands exponentially with the number of dimensions [50]. Second, the rare event domain, $\mathcal{F}\coloneqq\{\bm{x}:g(\bm{x})\leq 0\}$, often lies in the tails of the probability distribution $\pi_{\bm{X}}$, making it challenging to effectively guide samples toward these regions of interest, which may also exhibit multi-modality; see Fig. 1(a) for an example. In addressing the first challenge, we adopt the dimension-robust preconditioned Crank-Nicolson (pCN) algorithm within our approach, as discussed in Section 4.1. For the second challenge, we develop an effective discovery method, aimed at precisely locating rare event domains and appropriately initializing the pCN chains, as detailed in Section 4.2. Our developed algorithm, termed guided pCN, thus enhances sampling from $\tilde{h}$ by overcoming the limitations of the standard pCN sampler in the context of rare event probability estimation.

4.1 The preconditioned Crank-Nicolson (pCN) algorithm

The preconditioned Crank-Nicolson (pCN) algorithm is a gradient-free Markov Chain Monte Carlo (MCMC) method designed for efficient sampling from high-dimensional probability distributions [36, 37]. To sample from a target distribution $\tilde{h}=\ell_{g_{\bm{X}}}\,\pi_{\bm{X}}$, the pCN sampler proposes an update, $\tilde{\bm{x}}$, at each iteration, $t$, by combining the current sample, $\bm{x}_{t}$, with a perturbation drawn from the original (prior) Gaussian distribution, $\pi_{\bm{X}}=\mathcal{N}(\bm{X};\,\bm{0},\mathbf{C})$, as follows:

\tilde{\bm{x}}=\sqrt{1-\beta^{2}}\,\bm{x}_{t}+\beta\,\bm{\xi},\quad\bm{\xi}\sim\mathcal{N}(\mathbf{0},\mathbf{C})   (16)

where $\beta\in(0,1]$ is a tunable scaling parameter, discussed below, and $\mathbf{C}$ is a covariance matrix. This update is equivalent to using a Gaussian proposal distribution $\mathcal{N}(\sqrt{1-\beta^{2}}\,\bm{x}_{t},\,\beta^{2}\mathbf{C})$, yielding the acceptance probability:

\alpha=\min\bigg\{1,\dfrac{\ell_{g_{\bm{X}}}(\tilde{\bm{x}})}{\ell_{g_{\bm{X}}}(\bm{x}_{t})}\bigg\}   (17)

where $\ell_{g_{\bm{X}}}$ is the likelihood function in Eq. 8. This formula illustrates a fundamental characteristic of the pCN algorithm: its invariance with respect to the underlying Gaussian distribution $\mathcal{N}(\bm{X};\,\bm{0},\mathbf{C})$. In essence, this ensures that the proposed sample is always accepted when sampling from $\pi_{\bm{X}}$, making pCN highly suitable for efficiently exploring high-dimensional Gaussian spaces. This invariance can be understood by examining the Metropolis-Hastings (MH) ratio when using the pCN algorithm to sample $\mathcal{N}(\bm{X};\,\bm{0},\mathbf{C})$:

\alpha=\min\bigg\{1,\dfrac{\mathcal{N}(\tilde{\bm{x}};\,\bm{0},\mathbf{C})\,\mathcal{N}(\bm{x}_{t}\mid\tilde{\bm{x}};\,\sqrt{1-\beta^{2}}\,\tilde{\bm{x}},\,\beta^{2}\mathbf{C})}{\mathcal{N}(\bm{x}_{t};\,\bm{0},\mathbf{C})\,\mathcal{N}(\tilde{\bm{x}}\mid\bm{x}_{t};\,\sqrt{1-\beta^{2}}\,\bm{x}_{t},\,\beta^{2}\mathbf{C})}\bigg\}   (18)
Algorithm 1 The preconditioned Crank-Nicolson (pCN) Algorithm
1: procedure pCN($\bm{x}_{0}$, $\beta_{0}$, $\alpha^{*}$, $L_{\text{chain}}$, $\ell_{g_{\bm{X}}}(\bm{X})$, $\mathbf{C}$)
2:     for $t=1$ to $L_{\text{chain}}$ do
3:         $\bm{x}_{t}\leftarrow\bm{x}_{t-1}$
4:         $\tilde{\bm{x}}=\sqrt{1-\beta_{t-1}^{2}}\,\bm{x}_{t-1}+\beta_{t-1}\,\bm{\xi},\quad\bm{\xi}\sim\mathcal{N}(\mathbf{0},\mathbf{C})$   ▷ Propose candidate sample
5:         with probability:   ▷ Accept or reject candidate sample
6:             $\alpha_{t}=\min\Big\{1,\dfrac{\ell_{g_{\bm{X}}}(\tilde{\bm{x}})}{\ell_{g_{\bm{X}}}(\bm{x}_{t-1})}\Big\}$
7:             $\bm{x}_{t}\leftarrow\tilde{\bm{x}}$
8:         $\log\beta_{t}=\log\beta_{t-1}+t^{-1/2}(\alpha_{t}-\alpha^{*})$   ▷ Adapt $\beta$ using the Robbins-Monro algorithm (Eq. 21)
9:     end for
10: end procedure

Considering the marginal distribution $\mathcal{N}(\tilde{\bm{X}};\,\bm{0},\mathbf{C})$ and the conditional distribution $\mathcal{N}(\bm{X}_{t}\mid\tilde{\bm{X}};\,\sqrt{1-\beta^{2}}\,\tilde{\bm{x}},\,\beta^{2}\mathbf{C})$, the joint random vector $[\tilde{\bm{X}}\,\,\bm{X}_{t}]^{\text{T}}$ is therefore Gaussian with a mean vector of $[\bm{0}\,\,\bm{0}]^{\text{T}}$ and a covariance matrix of $\begin{bmatrix}\mathbf{C}&\sqrt{1-\beta^{2}}\,\mathbf{C}\\ \sqrt{1-\beta^{2}}\,\mathbf{C}&\mathbf{C}\end{bmatrix}$. The conditional distribution of $\tilde{\bm{X}}$ given $\bm{X}_{t}$ is hence Gaussian with a conditional mean $\bm{\mu}_{\tilde{\bm{X}}\mid\bm{X}_{t}}=\sqrt{1-\beta^{2}}\,\bm{X}_{t}$ and a covariance $\bm{\Sigma}_{\tilde{\bm{X}}\mid\bm{X}_{t}}=\beta^{2}\mathbf{C}$. We therefore get:

\mathcal{N}(\tilde{\bm{X}};\,\bm{0},\mathbf{C})\,\mathcal{N}(\bm{X}_{t}\mid\tilde{\bm{X}};\,\sqrt{1-\beta^{2}}\,\tilde{\bm{x}},\,\beta^{2}\mathbf{C})=\mathcal{N}\bigg(\begin{bmatrix}\tilde{\bm{X}}\\ \bm{X}_{t}\end{bmatrix};\,\begin{bmatrix}\bm{0}\\ \bm{0}\end{bmatrix},\begin{bmatrix}\mathbf{C}&\sqrt{1-\beta^{2}}\,\mathbf{C}\\ \sqrt{1-\beta^{2}}\,\mathbf{C}&\mathbf{C}\end{bmatrix}\bigg)=\mathcal{N}(\bm{X}_{t};\,\bm{0},\mathbf{C})\,\mathcal{N}(\tilde{\bm{X}}\mid\bm{X}_{t};\,\sqrt{1-\beta^{2}}\,\bm{x}_{t},\,\beta^{2}\mathbf{C})   (19)

Consequently, the MH ratio in Eq. 18 simplifies to one, confirming that the pCN proposal is indeed invariant under the original Gaussian distribution $\pi_{\bm{X}}$. As a result, when applying the pCN algorithm to sample from the target distribution $\tilde{h}=\ell_{g_{\bm{X}}}\,\pi_{\bm{X}}$, the acceptance probability reduces to the ratio between the likelihood function values at the proposed and current samples:

\alpha=\min\bigg\{1,\dfrac{\ell_{g_{\bm{X}}}(\tilde{\bm{x}})\,\mathcal{N}(\tilde{\bm{x}};\,\bm{0},\mathbf{C})\,\mathcal{N}(\bm{x}_{t}\mid\tilde{\bm{x}};\,\sqrt{1-\beta^{2}}\,\tilde{\bm{x}},\,\beta^{2}\mathbf{C})}{\ell_{g_{\bm{X}}}(\bm{x}_{t})\,\mathcal{N}(\bm{x}_{t};\,\bm{0},\mathbf{C})\,\mathcal{N}(\tilde{\bm{x}}\mid\bm{x}_{t};\,\sqrt{1-\beta^{2}}\,\bm{x}_{t},\,\beta^{2}\mathbf{C})}\bigg\}=\min\bigg\{1,\dfrac{\ell_{g_{\bm{X}}}(\tilde{\bm{x}})}{\ell_{g_{\bm{X}}}(\bm{x}_{t})}\bigg\}   (20)

The scaling parameter $\beta$ determines the balance between retaining information from the current sample and exploring new regions of the random variable space. When $\beta$ is close to zero, we have $\sqrt{1-\beta^{2}}\approx 1$, which keeps the proposed sample $\tilde{\bm{x}}$ close to the current one. This proximity leads to high acceptance rates but potentially slows the mixing of the Markov chain due to limited exploration, thus reducing the effective sample size. Conversely, as $\beta$ approaches one, we have $\sqrt{1-\beta^{2}}\approx 0$, making $\tilde{\bm{x}}$ largely independent of $\bm{x}_{t}$. This allows the sampler to explore the random variable space more broadly, but the proposed samples are more likely to land in low-probability regions, leading to an increase in rejections. To enhance sampling efficiency, it is essential to find an optimal $\beta$ balancing these trade-offs, which is typically achieved by tuning $\beta$ to target a desired average acceptance ratio $\alpha^{*}$. The Robbins-Monro stochastic approximation algorithm provides a method for such adaptation [51, 52]:

\log\beta_{t+1}=\log\beta_{t}+\gamma_{t}\left(\alpha_{t}-\alpha^{*}\right)   (21)

where $\alpha_{t}$ is the acceptance probability at iteration $t$, computed by Eq. 17, and $\gamma_{t}$ is a diminishing step size sequence, commonly chosen as $\gamma_{t}=t^{-0.5}$ [32]. In this work, we target an average acceptance probability in the range of $20$–$40\%$, with the lower values applied to high-dimensional distributions and the higher values to low-dimensional ones. This range is consistent with the values typically employed in the literature [37, 41] and has proven effective across all numerical examples considered in this work. Algorithm 1 describes the pCN algorithm, where $\bm{x}_{0}$ is the initial sample, $\beta_{0}$ is the initial scaling parameter ($\beta_{0}=0.5$ in this work), $\alpha^{*}$ is the desired average acceptance ratio, and $L_{\text{chain}}$ indicates the number of steps in the pCN chain.
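Algorithm 1 translates almost line-for-line into code; the following minimal sketch specializes it to the standard-Gaussian case $\mathbf{C}=\mathbf{I}$ used in this work (function and variable names are illustrative):

import numpy as np

def pcn_chain(x0, log_lik, L_chain, alpha_star=0.3, beta0=0.5, rng=None):
    """Adaptive pCN sampler for h~ = l_gX * N(0, I), following Algorithm 1.

    x0 is a seed from the discovery stage; log_lik is the log of Eq. 8.
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    ll, beta, d = log_lik(x), beta0, len(x)
    chain = np.empty((L_chain, d))
    for t in range(1, L_chain + 1):
        # Propose by mixing the current state with a fresh prior draw (Eq. 16)
        x_prop = np.sqrt(1.0 - beta**2) * x + beta * rng.standard_normal(d)
        ll_prop = log_lik(x_prop)
        # Gaussian prior terms cancel, so acceptance uses the likelihood ratio only (Eq. 17)
        alpha_t = np.exp(min(0.0, ll_prop - ll))
        if rng.random() < alpha_t:
            x, ll = x_prop, ll_prop
        # Robbins-Monro adaptation of beta toward alpha_star (Eq. 21), clamped to (0, 1]
        beta = min(1.0, beta * np.exp(t ** -0.5 * (alpha_t - alpha_star)))
        chain[t - 1] = x
    return chain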

Figure 4: Outlining the developed discovery approach of the rare event domain discussed in Section 4.2; (a) shows a bivariate independent standard Gaussian distribution in red, and a limit-state function at $g(\bm{X})=0$ shown using a grey line; (b) presents the first sample set $\{\bm{x}_{i}^{0}\}_{i=1}^{300}$ drawn from the original distribution $\mathcal{N}(\bm{X};\,\bm{0},\mathbf{I})$; (c) visualizes an additional conditional sample set $\{\bm{x}_{i}^{1}\}_{i=1}^{300}$ drawn from $1/\pi_{\bm{X}\mid\mathcal{F}_{1}}$; (d) adds a second conditional sample set $\{\bm{x}_{i}^{2}\}_{i=1}^{300}$ drawn from $1/\pi_{\bm{X}\mid\mathcal{F}_{2}}$, which provided at least $p_{0}N_{\text{level}}$ rare event samples, at which point the procedure was stopped; and (e) depicts the pCN seeds in red, selected according to the computed probabilities in Eq. 22. These seeds are used to initialize the pCN chains but are not utilized in estimating the sought probability.

4.2 Rare event domain discovery

In this section, we introduce a computationally efficient discovery approach for rare event domains, designed to enhance sampling efficiency by generating (multimodal) rare event samples that serve as initial seeds for the pCN chains. This approach addresses the limitations the pCN algorithm may encounter in scenarios involving multimodality and target distributions, $\tilde{h}$, that are significantly distant from the original (prior) distribution $\pi_{\bm{X}}$.

Figure 5: Illustrating the benefits of utilizing a more dispersed initial distribution during the rare event domain discovery, using the same example as in Fig. 4; (a) shows a bivariate independent Gaussian distribution with a covariance matrix of $4\mathbf{I}$; (b) presents the first sample set $\{\bm{x}_{i}^{0}\}_{i=1}^{300}$ drawn from $\mathcal{N}(\bm{X};\,\bm{0},4\mathbf{I})$; (c) visualizes an additional conditional sample set $\{\bm{x}_{i}^{1}\}_{i=1}^{300}$ drawn from $1/\pi_{\bm{X}\mid\mathcal{F}_{1}}=1/\mathcal{N}(\bm{X}\mid\mathcal{F};\,\bm{0},\,\mathbf{I})$, which provides at least $p_{0}N_{\text{level}}$ rare event samples, so the procedure is stopped one step earlier compared to Fig. 4, highlighting greater computational efficiency; and (d) depicts the pCN seeds in red, selected according to the computed probabilities in Eq. 22. These seeds are used to initialize the pCN chains but are not utilized in estimating the sought probability.
Algorithm 2 Rare Event Domain Discovery Algorithm
1: procedure RareEventDiscovery($N_{\text{level}}$, $p_{0}$, $\epsilon$, $g(\bm{X})$, $N_{\text{chain}}$)
2:     Generate an initial sample set $\{\bm{x}_{i}^{0}\}_{i=1}^{N_{\text{level}}}$ from $\mathcal{N}(\bm{0},\epsilon\mathbf{I})$.
3:     Sort the samples $\{\bm{x}_{i}^{0}\}_{i=1}^{N_{\text{level}}}$ in increasing order of their limit-state values $\{g(\bm{x}_{i}^{0})\}_{i=1}^{N_{\text{level}}}$.
4:     Find the threshold $\lambda_{1}$ as the $p_{0}$-percentile of the ordered $\{g(\bm{x}_{i}^{0})\}_{i=1}^{N_{\text{level}}}$, and set $\mathcal{F}_{1}=\{\bm{x}:g(\bm{x})\leq\lambda_{1}\}$.
5:     Set the intermediate events counter $j\leftarrow 1$.
6:     while $\lambda_{j}>0$ do
7:         Find $N_{S}=p_{0}N_{\text{level}}$ seeds, $\{\bm{x}_{i}^{(j-1)}\}_{i=1}^{N_{S}}$, where $\bm{x}_{i}^{(j-1)}\in\mathcal{F}_{j}$.
8:         Generate $N_{\text{level}}$ samples $\{\bm{x}_{i}^{j}\}_{i=1}^{N_{\text{level}}}$ from $1/\pi_{\bm{X}\mid\mathcal{F}_{j}}$, by drawing $(1/p_{0}-1)$ additional samples, starting from each seed $\bm{x}_{i}^{(j-1)}\in\mathcal{F}_{j}$, using an MCMC sampler.   ▷ $\pi_{\bm{X}\mid\mathcal{F}_{j}}=\mathcal{N}(\bm{X}\mid\mathcal{F}_{j};\,\bm{0},\mathbf{I})$
9:         Find $\lambda_{j+1}$ as the $p_{0}$-percentile of the ascendingly ordered $\{g(\bm{x}_{i}^{j})\}_{i=1}^{N_{\text{level}}}$, and set $\mathcal{F}_{j+1}=\{\bm{x}:g(\bm{x})\leq\lambda_{j+1}\}$.
10:         $j\leftarrow j+1$
11:     end while
12:     Sample $N_{\text{chain}}$ pCN seeds from the final rare event sample set $\{\bm{x}_{i}^{(j-1)}\}_{i=1}^{N_{S}}$, for which $\bm{x}_{i}^{(j-1)}\in\mathcal{F}$.
13: end procedure

This process begins by sampling from the original distribution, $\pi_{\bm{X}}=\mathcal{N}(\bm{X};\,\bm{0},\mathbf{I})$, or a more dispersed version, $\mathcal{N}(\bm{X};\,\bm{0},\epsilon\mathbf{I})$, with the dispersion parameter $\epsilon\geq 1$, to facilitate broader exploration of the random variable space. This step is followed by conditional sampling based on the reciprocal of the conditional distribution, $1/\pi_{\bm{X}\mid\mathcal{F}_{j}}$, where $\mathcal{F}_{j}$ is an intermediate event defined as $\mathcal{F}_{j}\coloneqq\{\bm{x}:g(\bm{x})\leq\lambda_{j}\}$, with $\mathcal{F}_{1}\supset\mathcal{F}_{2}\supset\dots\supset\mathcal{F}_{M}$, where $\mathcal{F}_{M}=\mathcal{F}$ and $\lambda_{1}\geq\lambda_{2}\geq\dots\geq\lambda_{M}=0$, chosen adaptively as discussed below. The use of $1/\pi_{\bm{X}\mid\mathcal{F}_{j}}$ significantly facilitates rapid diffusion toward the rare event domain, compared to sampling from the conditional distribution $\pi_{\bm{X}\mid\mathcal{F}_{j}}$. While the latter is necessary for methods sequentially computing the sought probability, our method focuses solely on generating samples in the rare event domain (without utilizing any sequential information). The procedure stops after generating a specified number of rare event samples from the synthetic distribution $1/\pi_{\bm{X}\mid\mathcal{F}}$, from which a defined number of seeds, $N_{\text{chain}}$, is randomly selected to initialize the pCN chains, ensuring that the sampler is initialized within the rare event domain.

The detailed steps of this discovery process are outlined in Algorithm 2, where $N_{\text{level}}$ denotes the number of samples of each intermediate event $\mathcal{F}_{j}$; $p_{0}$ is the conditional probability for intermediate events, controlling the selection of the threshold levels $\lambda_{j}$; $\epsilon$ is a dispersion parameter for the initial distribution, $\mathcal{N}(\bm{0},\epsilon\mathbf{I})$; and $N_{\text{chain}}$ represents the number of the desired pCN chains. Fig. 4 outlines the proposed rare event domain discovery stage, showing the evolution of samples and seed selection, while the computational benefits of using a more dispersed initial distribution are demonstrated in Fig. 5. In this work, we use $\epsilon=4$ for all numerical examples to facilitate broader exploration of the random variable space by the initial sample set $\{\bm{x}_{i}^{0}\}_{i=1}^{N_{\text{level}}}$.
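A compact sketch of Algorithm 2 follows. The isotropic random-walk proposal and its step size `step` are our own simplification of the generic MCMC sampler invoked in step 8, and the final uniform seed selection corresponds to the high-dimensional recommendation discussed below (the weighted alternative of Eq. 22 is sketched afterwards):

import numpy as np

def rare_event_discovery(g, d, N_level=300, p0=0.1, eps=4.0, N_chain=10,
                         step=0.5, rng=None):
    """Sketch of Algorithm 2: produce pCN seeds inside the rare event domain F."""
    rng = rng or np.random.default_rng()
    X = np.sqrt(eps) * rng.standard_normal((N_level, d))   # dispersed initial set
    G = np.array([g(x) for x in X])
    while True:
        lam = np.quantile(G, p0)                           # threshold lambda_j
        if lam <= 0.0:                                     # >= p0*N_level samples already in F
            break
        idx = np.argsort(G)[: int(p0 * N_level)]           # seeds inside F_j = {g <= lam}
        Xs, Gs = list(X[idx]), list(G[idx])
        for x, gx in zip(X[idx], G[idx]):                  # grow the level back to N_level
            for _ in range(int(1.0 / p0) - 1):
                xp = x + step * rng.standard_normal(d)
                gp = g(xp)
                # MH move targeting 1/pi_X restricted to F_j: the log-ratio
                # +(|xp|^2 - |x|^2)/2 favors moves away from the origin, diffusing
                # the chain rapidly toward the rare event domain
                if gp <= lam and np.log(rng.random()) < 0.5 * (xp @ xp - x @ x):
                    x, gx = xp, gp
                Xs.append(x.copy()); Gs.append(gx)
        X, G = np.array(Xs), np.array(Gs)
    seeds = X[G <= 0.0]
    return seeds[rng.choice(len(seeds), size=N_chain, replace=False)]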

Let $\mathbf{X}^{\mathcal{F}}=\{\bm{x}_{1},\bm{x}_{2},\ldots,\bm{x}_{N_{\mathcal{F}}}\}$, with $N_{\mathcal{F}}\geq N_{S}$, be the rare event sample set obtained in Algorithm 2. The probability of selecting $\bm{x}_{i}\in\mathbf{X}^{\mathcal{F}}$ as a pCN seed can be given by:

\text{Pr}(\bm{x}_{i})=\dfrac{\tilde{h}(\bm{x}_{i})}{\sum_{j=1}^{N_{\mathcal{F}}}\tilde{h}(\bm{x}_{j})}   (22)

where $\tilde{h}(\bm{x}_{i})$ is the value of the target distribution in ASTPA at sample $\bm{x}_{i}$. Then, $N_{\text{chain}}$ pCN seeds can be sampled from the set $\mathbf{X}^{\mathcal{F}}$ without replacement, using the selection probability $\text{Pr}(\bm{x}_{i})$. This approach ensures that pCN seeds are more likely to be selected from different rare event modes, based on their statistical weights. Fig. 4(e) and Fig. 5(d) illustrate the selected pCN seeds using this weighted selection method. While this technique is effective for generating multimodal seeds in low-dimensional cases, as demonstrated in the low-dimensional examples in this work, it may not scale well in high-dimensional, multimodal spaces due to the sensitivity of the high-dimensional target distribution values, even for relatively small perturbations. In such cases, we recommend using uniform sampling to select pCN seeds, which ensures a broader exploration of high-dimensional, multimodal rare event domains. This uniform sampling method is successfully adopted in the high-dimensional examples in this work. Two examples of this uniform sampling are shown in Fig. 6(a) and Fig. 6(b), illustrating that the selected seeds are located in the rare event domain but do not follow any relevant target distribution ($\pi_{\bm{X}}$ or $\tilde{h}$). The computational efficiency of the proposed discovery approach, which requires significantly fewer model evaluations just to identify the relevant domains compared to conventional sequential methods utilized for probability estimation, is highlighted by comparing its performance in Fig. 6(a) and Fig. 6(b) with the sampling evolution of the Subset Simulation (SuS) method, as shown in Fig. 6(c).
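In code, the weighted selection of Eq. 22 amounts to a draw without replacement; a short sketch follows (names are illustrative; log-space weights guard against underflow of $\tilde{h}$ in the tails):

import numpy as np

def select_seeds_weighted(X_F, log_h_tilde, N_chain, rng=None):
    """Draw pCN seeds from the rare event samples X_F with the probabilities of Eq. 22."""
    rng = rng or np.random.default_rng()
    log_w = np.array([log_h_tilde(x) for x in X_F])
    p = np.exp(log_w - log_w.max())      # subtracting the max keeps the weights finite
    p /= p.sum()                         # Pr(x_i) = h~(x_i) / sum_j h~(x_j)
    idx = rng.choice(len(X_F), size=N_chain, replace=False, p=p)
    return X_F[idx]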

Figure 6: (a) and (b) visualize uniformly selected seeds (shown in red) for the two cases depicted in Fig. 4 and Fig. 5, respectively. These seeds are used to initialize the pCN chains but, unlike in Subset Simulation (SuS), are not utilized in estimating the sought probability. (c) shows the sampling evolution of the SuS method, emphasizing the computational efficiency of the proposed discovery stage, which utilizes significantly fewer samples just to discover the rare event domain.

5 Summary of Gradient-free ASTPA

The proposed guided pCN-based ASTPA framework can be applied as follows:

1. Construct the approximate sampling target $\tilde{h}$ in Eq. 9, where its two parameters $\sigma$ and $g_{c}$ are determined as recommended in Section 3.1.

2. Run the rare event discovery stage as described in Algorithm 2 to obtain $N_{\text{chain}}$ initial seeds for the pCN sampler. $N_{\text{level}}=300$, $p_{0}=\{0.1,0.2\}$, and $\epsilon=4$ typically work effectively in Algorithm 2, with $p_{0}=0.2$ being more suitable for high-dimensional problems. Denote the total number of model calls in this stage as $N_{\text{discovery}}$.

3. Generate $N_{\text{chain}}$ pCN chains, as described in Algorithm 1, each with a number of steps $L_{\text{chain}}$, starting from the seeds obtained in the previous step, to get a sample set $\{\bm{x}_{i}\}_{i=1}^{N}\sim h$. A burn-in phase with a length of $10\%$ of $L_{\text{chain}}$ is considered, with $N_{\text{burn}}$ denoting the total number of burn-in samples, i.e., $N=\lceil 0.9\,N_{\text{chain}}\,L_{\text{chain}}\rceil$ and $N_{\text{burn}}=\lfloor 0.1\,N_{\text{chain}}\,L_{\text{chain}}\rfloor$. The required number of chains, $N_{\text{chain}}$, typically ranges between 5 and 25, with higher numbers recommended for multimodal cases.

4. Compute the expected value of the weighted indicator function, $\hat{\tilde{p}}_{\mathcal{F}}$, using Eq. 6, based on $\{\bm{x}_{i}\}_{i=1}^{N}$.

5. Apply inverse importance sampling (IIS) to compute the normalizing constant $\hat{C}_{h}$. This process initiates by fitting a GMM, $Q(\bm{X})$, based on $\{\bm{x}_{i}\}_{i=1}^{N}$, with $Q(\bm{X})$ structured as recommended in Section 3.2. An i.i.d. sample set $\{\bm{x}_{i}^{\prime}\}_{i=1}^{M}$ is then drawn from $Q(\bm{X})$ to compute $\hat{C}_{h}$ using Eq. 11. $M$ can generally be around $30\%$ of $N$.

6. Compute the sought rare event probability as $\hat{p}_{\mathcal{F}}=\hat{\tilde{p}}_{\mathcal{F}}\,\hat{C}_{h}$.

A target total computational effort in model calls, $N_{\text{total}}=N_{\text{discovery}}+N_{\text{burn}}+N+M$, can be attributed to the different stages according to the aforementioned steps. The convergence of the estimator and/or any additional needed model calls can then be checked through Eq. 14.
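Putting the steps together, a hypothetical end-to-end driver built from the sketches of Sections 3.1, 3.2, and 4 could look as follows (all function names are our own; a real implementation would cache limit-state evaluations from the sampling stage rather than recompute them):

import numpy as np

def astpa_estimate(g, d, sigma=0.3, q=4.0, N_chain=10, L_chain=150, M=300, rng=None):
    """End-to-end sketch of the guided pCN-based ASTPA pipeline (steps 1-6)."""
    rng = rng or np.random.default_rng()
    log_target = make_log_target(g, d, sigma=sigma, q=q)               # step 1
    seeds = rare_event_discovery(g, d, N_chain=N_chain, rng=rng)       # step 2
    log_lik = lambda x: log_target(x)[1]
    chains = [pcn_chain(s, log_lik, L_chain, rng=rng) for s in seeds]  # step 3
    burn = int(0.1 * L_chain)                                          # 10% burn-in per chain
    X = np.vstack([c[burn:] for c in chains])
    # Step 4: shifted estimate of Eq. 6; since h~ = l_gX * pi_X, the weight
    # pi_X / h~ reduces to 1 / l_gX for samples inside F
    w = np.array([np.exp(-log_lik(x)) if g(x) <= 0.0 else 0.0 for x in X])
    p_tilde = w.mean()
    # Step 5: normalizing constant C_h via inverse importance sampling (Eq. 11)
    C_h = inverse_importance_sampling(X, lambda x: log_target(x)[0], M, rng=rng)
    return p_tilde * C_h                                               # step 6 (Eq. 7)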

6 Numerical Results

This section studies several numerical examples to examine the performance and efficiency of the proposed methods. In all examples, the guided pCN-based ASTPA is implemented as summarized in Section 5. The results are compared with state-of-the-art gradient-free algorithms, including the improved Cross Entropy-based Importance Sampling (iCE-IS) algorithm [9], the Sequential Importance Sampling (SIS) method [20], and the adaptive Conditional Sampling Subset Simulation variant (aCS-SuS) [32], all of which have well documented, publicly available code implementations [53]. Comparisons are made in terms of accuracy and computational cost, based on $500$ independent simulations. Specifically, for each simulation $j$, we record the target probability estimate $(\hat{p}_{\mathcal{F}})^{j}$, the total number of model calls $(N_{\text{total}})^{j}$, and the analytical Coefficient of Variation of ASTPA, $(\text{C.o.V-Anal})^{j}$, computed as discussed in Section 3.3. $N_{\text{total}}$ is defined for ASTPA in Section 5. Then, we report the mean of the rare event probability estimates, $\mathbb{E}[\hat{p}_{\mathcal{F}}]$, the mean of the total number of model calls, $\mathbb{E}[N_{\text{total}}]$, and the sampling C.o.V, computed as:

\textrm{C.o.V}=\dfrac{\sqrt{\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})}}{\mathbb{E}[\hat{p}_{\mathcal{F}}]},\qquad\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})=\dfrac{1}{500-1}\sum_{j=1}^{500}\big((\hat{p}_{\mathcal{F}})^{j}-\mathbb{E}[\hat{p}_{\mathcal{F}}]\big)^{2}   (23)

The mean of the analytical C.o.V, $\mathbb{E}[\text{C.o.V-Anal}]$, is also reported in parentheses. The total number of limit-state function evaluations, $N_{\text{total}}$, for ASTPA has been determined based on achieving C.o.V values $\in[0.1,\,0.35]$. Reference rare event probabilities are computed based on $100$ independent standard Monte Carlo simulations (MCS), as described in Section 2, using $10^{7}$–$10^{9}$ samples, as appropriate. The problem dimensionality is denoted by $d$, and all ASTPA parameters are carefully chosen for all examples but are not optimized.

6.1 Example 1: Nonlinear bimodal convex limit-state function

The first example involves a bimodal nonlinear limit-state function characterized by two independent standard normal random variables:

g(𝑿)=min[412(X1+X2)+2.5(X1X2)2,  4+12(X1+X2)+2.5(X1X2)2]\displaystyle g(\bm{X})=\min\left[4-\frac{1}{\sqrt{2}}\ (X_{1}+X_{2})+2.5\ (X_{1}-X_{2})^{2},\,\,4+\frac{1}{\sqrt{2}}\ (X_{1}+X_{2})+2.5\ (X_{1}-X_{2})^{2}\right] (24)
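For illustration, Eq. 24 can be implemented and cross-checked with a crude Monte Carlo run, batched here only to bound memory; the function name g_eq24 is ours, and the sample budget is indicative of the MCS column of Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Limit-state of Eq. 24: two symmetric failure modes in (X1 + X2),
# shifted by a convex penalty in (X1 - X2).
def g_eq24(x):
    s = (x[:, 0] + x[:, 1]) / np.sqrt(2.0)
    q = 2.5 * (x[:, 0] - x[:, 1]) ** 2
    return np.minimum(4.0 - s + q, 4.0 + s + q)

# Crude Monte Carlo cross-check; for p_F ~ 9.5E-6, roughly 1E8 samples
# give a C.o.V of about 3%.
hits, n = 0, 0
for _ in range(100):
    x = rng.standard_normal((10**6, 2))
    hits += int(np.count_nonzero(g_eq24(x) <= 0.0))
    n += 10**6
print(hits / n)  # should approach the 9.47E-6 reference of Table 1
```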

The approximate sampling target, h~\tilde{h}, in ASTPA is constructed using a likelihood dispersion factor σ=0.3\sigma=0.3 and a scaling constant gc=1g_{c}=1, as discussed in Section 3.1. The discovery stage then starts using Nlevel=300N_{\text{level}}=300 and p0=0.1p_{0}=0.1, yielding 10 rare event seeds for the pCN sampler (Nchain=10N_{\text{chain}}=10), with each chain running for Lchain=150L_{\text{chain}}=150 iterations. Subsequently, inverse importance sampling (IIS) is applied using M=300M=300 samples. Table 1 compares the performance of the guided pCN-ASTPA with iCE-IS, SIS, and aCS-SuS. As shown in Table 1, the guided pCN-based ASTPA provides an 𝔼[p^]\mathbb{E}[\hat{p}_{\mathcal{F}}] estimate closely matching the reference probability obtained through standard Monte Carlo simulation (MCS). Compared to the other methods considered, the guided pCN-based ASTPA demonstrates improved performance, achieving a lower C.o.V using fewer total model calls, 𝔼[Ntotal]\mathbb{E}[N_{\text{total}}]. Notably, iCE-IS also demonstrates strong efficiency, outperforming both SIS and aCS-SuS in this 2D example. The excellent agreement between the sampling C.o.V and analytical C.o.V, as reported in parentheses, confirms the accuracy and effectiveness of the ASTPA analytical C.o.V expression. Since this example was introduced earlier in the paper to illustrate the ASTPA framework and the discovery stage, the steps and performance of the proposed framework are visualized in Figs. 1 and 5.

Table 1: Example 1: Performance of various methods for the nonlinear bimodal limit-state function (Eq. 24, 500 simulations, d=2d=2, σ=0.3,q=\sigma=0.3,\,q= n.a.; the analytical C.o.V of ASTPA is reported in parentheses).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E9 | 2,373 | 2,972 | 5,525 | 5,741
C.o.V | 0.01 | 0.16 (0.16) | 0.18 | 1.18 | 0.72
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 9.47E-6 | 9.46E-6 | 9.52E-6 | 9.89E-6 | 9.66E-6

6.2 Example 2: Quartic bimodal limit-state function

In the second example, we consider a quartic bimodal limit-state function, defined in the standard Gaussian space as:

g(𝑿)=6.512(X1+X2)2.5(X1X2)2+(X1X2)4\displaystyle g(\bm{X})=6.5-\frac{1}{\sqrt{2}}\ (X_{1}+X_{2})-2.5\ (X_{1}-X_{2})^{2}+\ (X_{1}-X_{2})^{4} (25)

The approximate target h~\tilde{h} is constructed with a likelihood dispersion factor of σ=0.2\sigma=0.2 and a scaling constant gc=1g_{c}=1. The discovery stage is applied using Nlevel=300N_{\text{level}}=300 and p0=0.1p_{0}=0.1, generating 18 rare event seeds for the pCN sampler (Nchain=18N_{\text{chain}}=18), with each chain running for Lchain=120L_{\text{chain}}=120 steps. As in the previous example, inverse importance sampling (IIS) is used with M=300M=300 samples. Fig. 7 shows the evolution of the rare event domain discovery stage, highlighting the effectiveness of the proposed algorithm in accurately identifying and sampling rare event regions.
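To make the target construction concrete, a minimal sketch for this example is given below, assuming, purely for illustration, that the relaxation of the indicator takes the form of a standard normal CDF applied to the scaled limit-state, Φ(−g(x)/(gcσ)); Section 3.1 gives the precise construction used in ASTPA.

```python
import numpy as np
from scipy.stats import norm

sigma, g_c = 0.2, 1.0  # likelihood dispersion factor and scaling constant

# Limit-state of Eq. 25.
def g_eq25(x):
    s = (x[:, 0] + x[:, 1]) / np.sqrt(2.0)
    d = x[:, 0] - x[:, 1]
    return 6.5 - s - 2.5 * d**2 + d**4

# Unnormalized target h_tilde = relaxation * standard Gaussian density;
# the CDF-based relaxation below is an illustrative stand-in for the
# exact construction of Section 3.1.
def log_h_tilde(x):
    log_lik = norm.logcdf(-g_eq25(x) / (g_c * sigma))
    log_pi = -0.5 * np.sum(x**2, axis=1) - x.shape[1] * 0.5 * np.log(2.0 * np.pi)
    return log_lik + log_pi
```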

Table 2: Example 2: Performance of various methods for the quartic bimodal limit-state function (Eq. 25, 500 simulations, d=2d=2, σ=0.2,q=\sigma=0.2,\,q= n.a.).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E9 | 3,165 | 5,000 | 10,750 | 7,945
C.o.V | 0.30 | 0.12 (0.11) | 0.15 | 2.12 | 2.07
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 5.91E-8 | 5.92E-8 | 5.80E-8 | 4.28E-8 | 5.79E-8

The results, as summarized in Table 2, show that the guided pCN-ASTPA significantly outperforms the other methods, achieving the lowest C.o.V, while using the fewest model calls (𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}]). This demonstrates the robustness and efficiency of the guided pCN-ASTPA framework, even in the presence of increased nonlinearity introduced through higher-order terms in the limit-state function. The guided pCN-based ASTPA also provides 𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] estimates in close agreement with the reference probability computed by Monte Carlo simulation (MCS). The very good agreement between the sampling and analytical C.o.V values (reported in parentheses) further confirms the reliability of the analytical C.o.V expression for ASTPA in this more challenging problem.

Figure 7: Example 2: (a) The constructed approximate sampling target h~\tilde{h}, (b) the sampling evolution of the rare event domain discovery stage, with red points being the selected pCN seeds, and (c) samples from h~\tilde{h}, subsequently used to compute p~^\hat{\tilde{p}}_{\mathcal{F}} in Eq. 6.

6.3 Example 3: The Himmelblau function

The Himmelblau function [54] is a commonly used fourth-order polynomial test function in nonlinear optimization. In this example, we modify the Himmelblau function to suit rare event scenarios, particularly those with multiple separated rare event domains. The modified limit-state function is defined in the standard Gaussian space as:

g(𝑿)=((0.75X10.5)21.81+(0.75X20.5)1.8111)2+((0.75X11)1.81+(0.75X20.5)21.817)250g(\bm{X})=\bigg{(}\dfrac{(0.75X_{1}-0.5)^{2}}{1.81}+\dfrac{(0.75X_{2}-0.5)}{1.81}-11\bigg{)}^{2}+\bigg{(}\dfrac{(0.75X_{1}-1)}{1.81}+\dfrac{(0.75X_{2}-0.5)^{2}}{1.81}-7\bigg{)}^{2}-50 (26)

where X1X_{1} and X2X_{2} are independent standard normal random variables. The approximate target h~\tilde{h} is constructed using a likelihood dispersion factor σ=0.3\sigma=0.3 and a scaling constant gc=g(𝟎)/4g_{c}=\nicefrac{{g(\bm{0})}}{{4}}, as described in Section 3.1 and Fig. 3. The discovery stage is applied using Nlevel=300N_{\text{level}}=300 and p0=0.1p_{0}=0.1, generating 16 rare event seeds for the pCN sampler, with each chain subsequently running for Lchain=160L_{\text{chain}}=160 iterations. Similar to the previous examples, inverse importance sampling (IIS) is applied using M=300M=300 samples. Fig. 8 illustrates the constructed target distribution, the rare event domain discovery process, and the pCN samples, effectively exploring all rare event modes. The results in Table 3 show that the guided pCN-ASTPA outperforms the other methods, achieving a lower C.o.V and fewer total model evaluations. This demonstrates the robustness and efficiency of the guided pCN-ASTPA framework, particularly in handling complex, multi-modal rare event domains. The 𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] estimate produced by the guided pCN-based ASTPA closely matches the reference probability, further confirming the method’s accuracy.
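The selection of pCN seeds from the discovery-stage samples can be sketched as below; the weighting scheme shown is an illustrative assumption, reflecting the observation, discussed later in the conclusions, that weighted selection suits low-dimensional cases while uniform selection is preferred in high-dimensional, multimodal ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative selection of N_chain pCN seeds from the rare event samples
# returned by the discovery stage; the exact weighting used in the paper
# may differ.
def select_seeds(samples, log_h_tilde, n_chain, weighted=True):
    if weighted:
        # Low-dimensional cases: favor samples with higher target density.
        logs = log_h_tilde(samples)
        w = np.exp(logs - logs.max())
        w /= w.sum()
        idx = rng.choice(len(samples), size=n_chain, replace=False, p=w)
    else:
        # High-dimensional, multimodal cases: uniform selection.
        idx = rng.choice(len(samples), size=n_chain, replace=False)
    return samples[idx]
```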

Figure 8: Example 3: (a) The constructed approximate sampling target h~\tilde{h}, (b) the sampling evolution of the rare event domain discovery stage, with red points representing the selected pCN seeds, and (c) samples from h~\tilde{h}.
Table 3: Example 3: Performance of various methods for the Himmelblau limit-state function (Eq. 26, 500 simulations, d=2d=2, σ=0.3,q=4\sigma=0.3,\,q=4).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E9 | 3,430 | 6,466 | 6,216 | 7,101
C.o.V | 0.05 | 0.18 (0.17) | 0.37 | 0.30 | 0.53
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 2.81E-7 | 2.82E-7 | 2.36E-7 | 2.21E-7 | 3.02E-7

6.4 Example 4: Limit-state surface with changing topological parameter space

This example examines a limit-state function with significant topological changes, making it a challenging scenario for rare event sampling methods, as described in [28]. The function is defined as:

g(𝑿)=30[(4(X1+2)29+X2225)2+1]1+20[((X12.5)24+(X20.5)225)2+1]15g(\bm{X})=30\left[\bigg{(}\dfrac{4(X_{1}+2)^{2}}{9}+\dfrac{X_{2}^{2}}{25}\bigg{)}^{2}+1\right]^{-1}+20\left[\bigg{(}\dfrac{(X_{1}-2.5)^{2}}{4}+\dfrac{(X_{2}-0.5)^{2}}{25}\bigg{)}^{2}+1\right]^{-1}-5 (27)

where X1X_{1} and X2X_{2} are independent standard normal random variables. This limit-state surface is particularly challenging due to its abrupt changes in topology, as shown in Fig. 9(a), requiring a robust sampling framework to accurately capture the rare event domain.

The guided pCN-ASTPA algorithm is applied with a likelihood dispersion factor of σ=0.1\sigma=0.1 and a scaling constant gc=g(𝟎)/5g_{c}=\nicefrac{{g(\mathbf{0})}}{{5}}, as outlined in Section 3.1. The discovery stage uses Nlevel=300N_{\text{level}}=300 and p0=0.1p_{0}=0.1, generating 6 rare event seeds for the pCN sampler, with each chain running for Lchain=100L_{\text{chain}}=100 iterations. Inverse importance sampling (IIS) is employed with M=200M=200 samples. Fig. 9 visualizes the limit-state surface, the constructed sampling target, and the evolution of samples in the rare event domain discovery process, highlighting the effectiveness of guided pCN-ASTPA in accurately capturing the influential rare event region. In contrast, Fig. 9(e) demonstrates the limitations of sequential methods like SuS, which struggle to efficiently identify the rare event mode with a competitive number of model evaluations. These rare event domain discovery limitations are more pronounced in iCE-IS in this example, which exhibits a significant error in the estimated mean of the rare event probability. As presented in Table 4, the guided pCN-ASTPA achieves an accurate rare event probability estimate, 𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}], with significantly fewer model evaluations and a lower Coefficient of Variation (C.o.V) compared to the other methods. This underscores the effectiveness of the pCN-ASTPA framework in tackling problems with complex topological changes, where traditional sequential methods often underperform.

Figure 9: Example 4: (a) The surface of the limit-state in Eq. 27, showing changing topology, (b) the constructed ASTPA sampling target h~\tilde{h}, capturing the influential rare event mode, (c) the sampling evolution of the rare event domain discovery stage, with red points representing the selected pCN seeds, (d) samples generated from h~\tilde{h}, and (e) the sampling evolution of SuS, which fails to identify the rare event mode with a competitive number of model calls, demonstrating the limitations of sequential methods in such scenarios.
Table 4: Example 4: Performance of various methods for the limit-state surface with changing topological parameter space (Eq. 27, 500 simulations, d=2d=2, σ=0.1,q=5\sigma=0.1,\,q=5).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E9 | 1,370 | 14,744 | 13,542 | 14,597
C.o.V | 0.01 | 0.11 (0.13) | 0.09 | 3.33 | 1.92
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 1.13E-5 | 1.10E-5 | 1.92E-8 | 1.32E-5 | 1.26E-5

6.5 Example 5: A thirty-four-story structural example

This example analyzes a thirty-four-story structure, adapted from [55], as represented in Fig. 10. The structure is subjected to thirty-four static loads FiF_{i}, for i=1,2,,34i=1,2,\dots,34. The floor slabs/beams are assumed to be rigid, and all columns have the same height, H=4H=4 m, but different flexural stiffnesses EIkEI_{k}, for k=1,2,,68k=1,2,\dots,68. Both the loads and stiffnesses are treated as random variables, resulting in a total of d=102d=102 random variables. The loads are modeled as normally distributed with a mean of 2 kN and a C.o.V. of 0.4. Similarly, the stiffnesses are normally distributed with a mean of 20 MNm² and a C.o.V. of 0.2. The top story displacement, uu, can be determined by summing the interstory displacements uiu_{i}, expressed as:

ui=(j=i34Fj)H312(k=2i12iEIk),g=Y0i=134uiu_{i}=\dfrac{\left(\sum_{j=i}^{34}F_{j}\right)H^{3}}{12\left(\sum_{k=2i-1}^{2i}EI_{k}\right)},\quad g=Y_{0}-\sum_{i=1}^{34}u_{i} (28)

The limit-state function gg indicates failure (a rare event) when the top story displacement uu exceeds a threshold value Y0Y_{0}, set to 0.22, 0.23, and 0.235 m for different cases, to examine different levels of failure probability. All random variables are transformed into the independent standard Gaussian space for the analysis. Since g(𝟎)<2g(\bm{0})<2, as discussed in Section 3.1, scaling is performed with gc=g(𝟎)/4g_{c}=\nicefrac{{g(\mathbf{0})}}{{4}}. The discovery stage is performed with Nlevel=300N_{\text{level}}=300 and p0=0.2p_{0}=0.2 to effectively identify the rare event seeds for 5 pCN chains (Nchains=5N_{\text{chains}}=5), each with a chain length of 1,0001,000 samples (Lchain=1,000L_{\text{chain}}=1,000). The inverse importance sampling (IIS) is performed with 2,0002,000 resampled points (M=2,000M=2,000). Table 5 highlights the outstanding performance of the guided pCN-ASTPA in this high-dimensional problem. Our framework consistently provides accurate failure probability estimates with a low Coefficient of Variation (C.o.V.), while requiring significantly fewer total model evaluations compared to the other methods considered.
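A vectorized sketch of the limit-state in Eq. 28 is given below; the ordering of the 102 variables (34 loads followed by 68 stiffnesses, two per story) and the unit handling are illustrative assumptions.

```python
import numpy as np

H, Y0 = 4.0, 0.22  # column height (m) and displacement threshold (m)

# Sketch of the limit-state of Eq. 28: x is an (n, 102) array in physical
# space, assumed ordered as 34 loads F_i (kN) followed by 68 stiffnesses
# EI_k (MN*m^2), two per story; this ordering is our illustrative choice.
def g_frame(x):
    F = x[:, :34]
    EI = x[:, 34:].reshape(-1, 34, 2).sum(axis=2) * 1e3  # per story, kN*m^2
    shear = np.cumsum(F[:, ::-1], axis=1)[:, ::-1]       # sum_{j=i}^{34} F_j
    u = shear * H**3 / (12.0 * EI)                       # interstory drifts, m
    return Y0 - u.sum(axis=1)
```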

Figure 10: Example 5: A thirty-four-story structure under static loads.
Table 5: Example 5: Performance of various methods for the thirty-four-story structure (Eq. 28, 500 simulations, d=102d=102, σ=0.3,q=4\sigma=0.3,\,q=4).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
Y0=0.22Y_{0}=0.22:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E8 | 7,540 | 7,570 | 10,082 | 9,200
C.o.V | 0.03 | 0.14 (0.14) | 0.55 | 0.60 | 0.24
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 2.41E-5 | 2.44E-5 | 1.93E-5 | 2.75E-5 | 2.42E-5
Y0=0.23Y_{0}=0.23:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E8 | 7,540 | 10,851 | 11,504 | 11,470
C.o.V | 0.09 | 0.22 (0.20) | 0.81 | 0.63 | 0.30
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 1.22E-6 | 1.22E-6 | 0.83E-6 | 1.25E-6 | 1.22E-6
Y0=0.235Y_{0}=0.235:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E8 | 7,540 | 11,833 | 12,782 | 12,815
C.o.V | 0.20 | 0.27 (0.28) | 0.85 | 1.32 | 0.35
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 2.46E-7 | 2.48E-7 | 1.55E-7 | 2.41E-7 | 2.41E-7

6.6 Example 6: Steel plate

In this example, we consider a low-carbon steel plate, as shown in Fig. 11 [56, 57]. The plate has dimensions of 2,0002,000 mm × 2,0002,000 mm, with a circular hole of radius 200 mm located at the center. Due to symmetry, only a quarter of the plate is analyzed. The Poisson ratio is assumed to be ν=0.3\nu=0.3 and the plate thickness is t=10t=10 mm. A deterministic uniform load of q=100N/mm2q=100\,\text{N/mm}^{2} is applied on the right edge of the plate. The elastic modulus E(x,y)E(x,y) of the plate is modeled as a two-dimensional homogeneous normal random field with a mean value of μE=2×105N/mm2\mu_{E}=2\times 10^{5}\,\text{N/mm}^{2} and a standard deviation of σE=2×104N/mm2\sigma_{E}=2\times 10^{4}\,\text{N/mm}^{2}. The autocorrelation function of the random field E(x,y)E(x,y) is given as:

ρE(Δx,Δy)=exp[(Δxlx)2(Δyly)2]\displaystyle\rho_{E}(\Delta x,\Delta y)=\exp\left[-\left(\dfrac{\Delta x}{l_{x}}\right)^{2}-\left(\dfrac{\Delta y}{l_{y}}\right)^{2}\right] (29)

where lxl_{x} and lyl_{y} are correlation lengths, both set to 300 mm in this example. The stochastic field is discretized using the Karhunen-Loève (KL) expansion with 100 terms, i.e., 100 standard Gaussian random variables within the KL expansion (d=100d=100).
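A minimal sketch of such a KL-based field representation is given below, using a discrete eigendecomposition of the covariance of Eq. 29 on a coarse grid as a stand-in for the continuous KL problem; the grid, resolution, and omission of the hole are illustrative simplifications.

```python
import numpy as np

mu_E, sig_E = 2e5, 2e4     # field mean and standard deviation (N/mm^2)
lx, ly, n_kl = 300.0, 300.0, 100

# Coarse grid over the quarter plate (mm); resolution is illustrative only.
gx, gy = np.meshgrid(np.linspace(0.0, 1000.0, 25), np.linspace(0.0, 1000.0, 25))
pts = np.column_stack([gx.ravel(), gy.ravel()])

# Covariance built from the autocorrelation of Eq. 29.
dx = pts[:, None, 0] - pts[None, :, 0]
dy = pts[:, None, 1] - pts[None, :, 1]
C = sig_E**2 * np.exp(-((dx / lx) ** 2) - ((dy / ly) ** 2))

# Discrete eigendecomposition as a stand-in for the continuous KL problem;
# keep the n_kl dominant eigenpairs (eigh returns them in ascending order).
vals, vecs = np.linalg.eigh(C)
vals, vecs = vals[::-1][:n_kl], vecs[:, ::-1][:, :n_kl]

def sample_field(xi):
    """Map d = 100 standard normals xi to a realization of E on the grid."""
    return mu_E + vecs @ (np.sqrt(np.maximum(vals, 0.0)) * xi)
```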

The stress, strain, and displacement fields of the plate are governed by the Navier-Cauchy equilibrium equation [58], simplified under the plane stress assumption as:

G(x,y)2𝒖(x,y)+E(x,y)2(1ν)(𝒖(x,y))+𝐟=𝟎G(x,y)\nabla^{2}\bm{u}(x,y)+\dfrac{E(x,y)}{2(1-\nu)}\nabla(\nabla\cdot\bm{u}(x,y))+\mathbf{f}=\mathbf{0} (30)

where G(x,y)G(x,y) is the shear modulus, defined as E(x,y)/[2(1+ν)]E(x,y)/[2(1+\nu)], 𝒖\bm{u} is the displacement vector, and 𝐟\mathbf{f} is the body force acting on the plate. This equation is solved using Finite Element Analysis (FEA) with a mesh of 368 four-node quadrilateral elements, as shown in Fig. 11. The limit-state function is defined as:

g(𝑿)=410max(σ1)g(\bm{X})=410-\max(\sigma_{1}) (31)

where max(σ1)\max(\sigma_{1}) is the maximum first principal stress, with failure occurring when this stress exceeds the threshold of 410 MPa.

For the guided pCN-ASTPA implementation, 5 chains (Nchains=5N_{\text{chains}}=5) are used, each with a chain length of 1,0001,000 iterations (Lchain=1,000L_{\text{chain}}=1,000). The IIS step is performed with 2,0002,000 resampled points (M=2,000M=2,000). The likelihood dispersion factor is set to σ=0.2\sigma=0.2, and the scaling constant gc=g(𝟎)/4g_{c}=\nicefrac{{g(\mathbf{0})}}{{4}} is applied. The discovery stage utilizes Nlevel=100N_{\text{level}}=100 and p0=0.2p_{0}=0.2. The results in Table 6 demonstrate that the guided pCN-ASTPA and iCE-IS methods provide comparable performance in this example, with pCN-ASTPA being slightly better, and with both significantly outperforming SIS and aCS-SuS.

Figure 11: Example 6: A quarter of a symmetric 2D plate-with-a-hole under random elastic modulus. (a) shows the finite element mesh. (b) showcases a random field realization of the elastic modulus.
Table 6: Example 6: Performance of various methods for the plate example (Eq. 31, 500 simulations, d=100d=100, σ=0.2,q=4\sigma=0.2,\,q=4).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 7,247 | 7,611 | 11,605 | 12,996
C.o.V | 0.29 | 0.16 (0.17) | 0.21 | 1.22 | 0.31
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 1.04E-6 | 1.01E-6 | 1.02E-6 | 0.96E-6 | 1.01E-6

6.7 Example 7: Bimodal high-dimensional decic problem

To further investigate the performance of the proposed method in high-dimensional spaces with severe nonlinearities, we consider the following challenging decic (tenth-degree) limit-state function in the standard normal space:

g(𝑿)=min{2.801di=1200Xi+f(𝑿)2.80+1di=1200Xi+f(𝑿),wheref(𝑿)=(j=1γXj)2+exp[(k=1γXk)7]+(l=1γXl)10g(\bm{X})=\min\begin{cases}2.80-\frac{1}{\sqrt{d}}\ \sum_{i=1}^{200}X_{i}+f(\bm{X})\\[12.0pt] 2.80+\frac{1}{\sqrt{d}}\ \sum_{i=1}^{200}X_{i}+f(\bm{X})\end{cases},\text{where}\,f(\bm{X})=\big{(}\sum_{j=1}^{\gamma}X_{j}\big{)}^{2}+\exp\big{[}\big{(}\sum_{k=1}^{\gamma}X_{k}\big{)}^{7}\big{]}+\big{(}\sum_{l=1}^{\gamma}X_{l}\big{)}^{10} (32)

where γ\gamma controls the number of nonlinear terms, and hence the problem’s complexity. To construct the sampling target in ASTPA, the likelihood dispersion factor is set to σ=0.5\sigma=0.5, and the scaling constant to gc=1g_{c}=1. The discovery stage of the proposed framework employs Nlevel=500N_{\text{level}}=500 and p0=0.2p_{0}=0.2. Following this stage, 2020 to 2222 pCN chains are used, each with a chain length ranging from 1,0001,000 to 1,2001,200 iterations. Higher values within this range are utilized in scenarios with greater nonlinearities to account for the increased sampling complexity. The inverse importance sampling (IIS) step employs 4,0004,000 to 7,0007,000 samples, with larger numbers chosen for more complex scenarios. As shown in Table 7, the guided pCN-ASTPA framework demonstrates remarkable adaptability to increasing nonlinearity, achieving the lowest coefficient of variation (C.o.V.) while requiring fewer model evaluations compared to other methods. In contrast, the iCE-IS method fails to converge with a comparable number of model calls in this example. The ability of the guided pCN-ASTPA to efficiently manage both high dimensionality and strong nonlinearity highlights its robustness and computational efficiency, making it particularly well-suited for complex reliability estimation problems involving intricate variable interactions.
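For completeness, Eq. 32 can be implemented directly as sketched below (the function name is ours); note that the exponential term may overflow to infinity in floating point for large positive sums, which is benign here since it only pushes g further from the failure domain.

```python
import numpy as np

d = 200  # problem dimensionality

# Sketch of the decic limit-state of Eq. 32.
def g_eq32(x, gamma=10):
    s = x.sum(axis=1) / np.sqrt(d)    # (1/sqrt(d)) * sum_{i=1}^{200} X_i
    t = x[:, :gamma].sum(axis=1)      # common sum in the nonlinear terms
    with np.errstate(over="ignore"):
        # exp may overflow to inf for large positive t; benign here, as it
        # only moves g further away from the failure domain.
        f = t**2 + np.exp(t**7) + t**10
    return np.minimum(2.80 - s + f, 2.80 + s + f)
```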

Table 7: Example 7: Performance of various methods for the bimodal high-dimensional decic problem (Eq. 32, 500 simulations, d=200d=200, σ=0.6,q=\sigma=0.6,\,q=\,n.a.).
Metric | MCS | pCN-ASTPA | SIS | aCS-SuS
γ=10\gamma=10:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 25,434 | 38,488 | 28,526
C.o.V | 0.08 | 0.26 (0.23) | 3.39 | 0.76
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 1.02E-5 | 1.01E-5 | 1.20E-5 | 1.06E-5
γ=15\gamma=15:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 26,568 | 41,921 | 29,946
C.o.V | 0.12 | 0.33 (0.30) | 1.66 | 1.26
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 6.66E-6 | 6.58E-6 | 6.42E-6 | 6.83E-6
γ=20\gamma=20:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 29,630 | 43,886 | 31,007
C.o.V | 0.16 | 0.34 (0.31) | 1.86 | 1.56
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 4.51E-6 | 4.46E-6 | 4.41E-6 | 4.58E-6
γ=25\gamma=25:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 35,072 | 45,830 | 38,520
C.o.V | 0.18 | 0.33 (0.33) | 2.89 | 2.07
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 3.12E-6 | 3.15E-6 | 3.24E-6 | 3.42E-6

7 Conclusions

In this work, we introduce a novel gradient-free framework for reliability and rare event probabilities estimation in complex, high-dimensional Gaussian spaces, particularly suited for black-box computational models where analytical gradients are not available. At the core of this framework is the foundational Approximate Sampling Target with Post-processing Adjustment (ASTPA) approach, which efficiently estimates rare event (failure) probabilities by constructing and sampling from an unnormalized target distribution, relaxing the optimal importance sampling density (ISD) [29]. This relaxation enables efficient sampling, especially in high-dimensional and highly nonlinear problem settings, with the generated samples employed to compute a shifted estimate of the sought probability. The inverse importance sampling (IIS) procedure subsequently refines the probability estimation by employing a fitted ISD using the already obtained samples to compute the target’s normalizing constant, avoiding scalability challenges.

While previous work has shown that integrating gradient-based samplers with ASTPA yields outstanding performance in both Gaussian and non-Gaussian spaces [4, 29], the absence of analytical gradients hinders the efficient applicability of such approaches. The proposed gradient-free framework addresses this limitation by advancing the preconditioned Crank-Nicolson (pCN) algorithm, incorporating a rare event domain discovery stage to significantly enhance the sampling process. The scale parameter of the pCN algorithm is adapted using the Robbins-Monro stochastic approximation algorithm to achieve a 20-40% average acceptance probability, which proved effective in our numerical examples; a minimal sketch of this adaptive step is given below. Despite the dimension-robustness of pCN, its effectiveness depends on carefully initializing its chains in relation to the regions of interest, particularly in high-dimensional multimodal distributions. To address this, we developed a computationally efficient rare event domain discovery technique, conditionally sampling well-designed synthetic variants of the original distribution, thereby facilitating rapid diffusion toward relevant regions in the random variable space. Our investigation into selecting rare event (pCN) seeds from the obtained rare event sample set shows that weighted sampling is effective in low-dimensional cases, whereas uniform sampling is more suitable for initializing the pCN chains in high-dimensional, multimodal cases, where the target distributions become sensitive to even small perturbations. This approach significantly enhances the performance of the pCN algorithm in sampling the approximate target in ASTPA.
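The following minimal sketch illustrates one such adaptive pCN chain, assuming a prior-reversible pCN proposal and a Robbins-Monro update of the scale β on the logit scale toward a 30% acceptance target; the gain sequence and target value are illustrative choices, not the paper's exact settings.

```python
import numpy as np

# Sketch of a single adapted pCN chain: prior-reversible proposal
# x' = sqrt(1 - beta^2) x + beta * xi, xi ~ N(0, I), accepted with the
# likelihood ratio only, and beta adapted via a Robbins-Monro recursion.
def pcn_chain(x0, log_lik, n_steps, beta=0.5, a_star=0.3, seed=0):
    rng = np.random.default_rng(seed)
    x, ll = np.asarray(x0, dtype=float).copy(), log_lik(x0)
    out = np.empty((n_steps, x.size))
    for k in range(n_steps):
        xp = np.sqrt(1.0 - beta**2) * x + beta * rng.standard_normal(x.size)
        llp = log_lik(xp)
        acc = np.exp(min(0.0, llp - ll))  # standard Gaussian prior cancels
        if rng.random() < acc:
            x, ll = xp, llp
        # Robbins-Monro update on logit(beta) with decaying gain (k+1)^(-0.7).
        z = np.log(beta / (1.0 - beta)) + (acc - a_star) / (k + 1) ** 0.7
        beta = 1.0 / (1.0 + np.exp(-z))
        out[k] = x
    return out  # discard the first 10% as burn-in, per Section 5
```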

The guided pCN-based ASTPA methodology has been tested across a range of challenging Gaussian problems, including mathematically formulated test functions and engineering examples, such as a 34-story frame structure and a steel plate, with rare event probabilities as low as 10810^{-8}. These mathematical limit-state functions are carefully designed to involve multimodality and up to decic nonlinear terms, representing challenging benchmarks. Our method consistently outperformed state-of-the-art gradient-free approaches, including improved Cross-Entropy Importance Sampling (iCE-IS), Sequential Importance Sampling (SIS), and adaptive Conditional Sampling Subset Simulation (aCS-SuS), both in computational cost and accuracy. Moreover, the ASTPA analytical coefficient of variation (C.o.V) demonstrated close agreement with empirical results, further validating the accuracy of this theoretical estimate.

This guided pCN-based ASTPA framework sets a new performance benchmark for gradient-free rare event probability estimation, offering a scalable and computationally efficient solution for high-dimensional, nonlinear, and multimodal problems. Future work will explore further extensions of the framework, including the development of gradient-free variants suited for direct applicability in non-Gaussian spaces and estimating high-dimensional first-passage problems under various settings [30].

References

  • [1] Y. Bai, A. B. Dieker, and H. Lam, “Curse of dimensionality in rare-event simulation,” in 2023 Winter Simulation Conference (WSC), pp. 375–384, IEEE, 2023.
  • [2] R. Y. Rubinstein and D. P. Kroese, Simulation and the Monte Carlo Method. John Wiley & Sons, 2016.
  • [3] G. L. Ang, A. H.-S. Ang, and W. H. Tang, “Optimal importance-sampling density estimator,” Journal of Engineering Mechanics, vol. 118, no. 6, pp. 1146–1163, 1992.
  • [4] E. Eshra, K. G. Papakonstantinou, and H. Nikbakht, “A direct importance sampling-based framework for rare event uncertainty quantification in non-Gaussian spaces,” arXiv preprint arXiv:2405.14149, 2025.
  • [5] G. I. Schuëller and R. Stix, “A critical appraisal of methods to determine failure probabilities,” Structural Safety, vol. 4, no. 4, pp. 293–309, 1987.
  • [6] S.-K. Au and J. L. Beck, “Estimation of small failure probabilities in high dimensions by Subset Simulation,” Probabilistic Engineering Mechanics, vol. 16, no. 4, pp. 263–277, 2001.
  • [7] N. Kurtz and J. Song, “Cross-entropy-based adaptive importance sampling using Gaussian mixture,” Structural Safety, vol. 42, pp. 35–44, 2013.
  • [8] Z. Wang and J. Song, “Cross-entropy-based adaptive importance sampling using von Mises-Fisher mixture for high dimensional reliability analysis,” Structural Safety, vol. 59, pp. 42–52, 2016.
  • [9] I. Papaioannou, S. Geyer, and D. Straub, “Improved cross entropy-based importance sampling with a flexible mixture model,” Reliability Engineering & System Safety, vol. 191, p. 106564, 2019.
  • [10] F. Uribe, I. Papaioannou, Y. M. Marzouk, and D. Straub, “Cross-entropy-based importance sampling with failure-informed dimension reduction for rare event simulation,” SIAM/ASA Journal on Uncertainty Quantification, vol. 9, no. 2, pp. 818–847, 2021.
  • [11] J. Demange-Chryst, F. Bachoc, J. Morio, and T. Krauth, “Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling,” arXiv preprint arXiv:2310.09194, 2023.
  • [12] S. Tong and G. Stadler, “Large deviation theory-based adaptive importance sampling for rare events in high dimensions,” SIAM/ASA Journal on Uncertainty Quantification, vol. 11, no. 3, pp. 788–813, 2023.
  • [13] M. Chiron, C. Genest, J. Morio, and S. Dubreuil, “Failure probability estimation through high-dimensional elliptical distribution modeling with multiple importance sampling,” Reliability Engineering & System Safety, vol. 235, p. 109238, 2023.
  • [14] S.-K. Au and J. L. Beck, “A new adaptive importance sampling scheme for reliability calculations,” Structural Safety, vol. 21, no. 2, pp. 135–158, 1999.
  • [15] S.-K. Au and J. L. Beck, “Important sampling in high dimensions,” Structural Safety, vol. 25, no. 2, pp. 139–163, 2003.
  • [16] A. Tabandeh, G. Jia, and P. Gardoni, “A review and assessment of importance sampling methods for reliability analysis,” Structural Safety, vol. 97, p. 102216, 2022.
  • [17] M. Ehre, I. Papaioannou, and D. Straub, “Stein variational rare event simulation,” arXiv preprint arXiv:2308.04971, 2023.
  • [18] T. Cui, S. Dolgov, and R. Scheichl, “Deep importance sampling using tensor trains with application to a priori and a posteriori rare events,” SIAM Journal on Scientific Computing, vol. 46, no. 1, pp. C1–C29, 2024.
  • [19] P. Del Moral, A. Doucet, and A. Jasra, “Sequential Monte Carlo samplers,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 3, pp. 411–436, 2006.
  • [20] I. Papaioannou, C. Papadimitriou, and D. Straub, “Sequential importance sampling for structural reliability analysis,” Structural Safety, vol. 62, pp. 66–75, 2016.
  • [21] A. Sinha, M. O’Kelly, R. Tedrake, and J. C. Duchi, “Neural bridge sampling for evaluating safety-critical autonomous systems,” Advances in Neural Information Processing Systems, vol. 33, pp. 6402–6416, 2020.
  • [22] J. Xian and Z. Wang, “Relaxation-based importance sampling for structural reliability analysis,” Structural Safety, vol. 106, p. 102393, 2024.
  • [23] K. M. Zuev, J. L. Beck, S.-K. Au, and L. S. Katafygiotis, “Bayesian post-processor and other enhancements of Subset Simulation for estimating failure probabilities in high dimensions,” Computers & Structures, vol. 92, pp. 283–296, 2012.
  • [24] Z. Wang, M. Broccardo, and J. Song, “Hamiltonian Monte Carlo methods for Subset Simulation in reliability analysis,” Structural Safety, vol. 76, pp. 51–67, 2019.
  • [25] M. D. Shields, D. G. Giovanis, and V. Sundar, “Subset Simulation for problems with strongly non-Gaussian, highly anisotropic, and degenerate distributions,” Computers & Structures, vol. 245, p. 106431, 2021.
  • [26] W. Chen, Z. Wang, M. Broccardo, and J. Song, “Riemannian Manifold Hamiltonian Monte Carlo based Subset Simulation for reliability analysis in non-Gaussian space,” Structural Safety, vol. 94, p. 102134, 2022.
  • [27] D. Thaler, S. L. Dhulipala, F. Bamer, B. Markert, and M. D. Shields, “Reliability analysis of complex systems using Subset Simulations with Hamiltonian Neural Networks,” arXiv preprint arXiv:2401.05244, 2024.
  • [28] K. Breitung, “The geometry of limit state function graphs and Subset Simulation: Counterexamples,” Reliability Engineering & System Safety, vol. 182, pp. 98–106, 2019.
  • [29] K. G. Papakonstantinou, H. Nikbakht, and E. Eshra, “Hamiltonian MCMC methods for estimating rare events probabilities in high-dimensional problems,” Probabilistic Engineering Mechanics, vol. 74, p. 103485, 2023.
  • [30] K. G. Papakonstantinou, E. Eshra, and H. Nikbakht, “Hamiltonian MCMC based framework for time-variant rare event uncertainty quantification,” in 14th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP), Dublin, Ireland, 2023.
  • [31] K. M. Zuev and L. S. Katafygiotis, “Modified Metropolis–Hastings algorithm with delayed rejection,” Probabilistic Engineering Mechanics, vol. 26, no. 3, pp. 405–412, 2011.
  • [32] I. Papaioannou, W. Betz, K. Zwirglmaier, and D. Straub, “MCMC algorithms for Subset Simulation,” Probabilistic Engineering Mechanics, vol. 41, pp. 89–103, 2015.
  • [33] A. Mira, “On Metropolis–Hastings algorithms with delayed rejection,” Metron, vol. 59, no. 3-4, pp. 231–241, 2001.
  • [34] F. Miao and M. Ghosn, “Modified Subset Simulation method for reliability analysis of structural systems,” Structural Safety, vol. 33, no. 4-5, pp. 251–260, 2011.
  • [35] A. Santoso, K. Phoon, and S. Quek, “Modified Metropolis–Hastings algorithm with reduced chain correlation for efficient Subset Simulation,” Probabilistic Engineering Mechanics, vol. 26, no. 2, pp. 331–341, 2011.
  • [36] R. M. Neal, “Regression and classification using Gaussian process priors,” Bayesian Statistics, vol. 6, pp. 475–501, 1998.
  • [37] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White, “MCMC methods for functions: modifying old algorithms to make them faster,” Statistical Science, pp. 424–446, 2013.
  • [38] M. Hairer, A. M. Stuart, and S. J. Vollmer, “Spectral gaps for a Metropolis–Hastings algorithm in infinite dimensions,” The Annals of Applied Probability, vol. 24, no. 6, pp. 2455 – 2490, 2014.
  • [39] Z. Hu, Z. Yao, and J. Li, “On an adaptive preconditioned Crank–Nicolson MCMC algorithm for infinite dimensional Bayesian inference,” Journal of Computational Physics, vol. 332, pp. 492–503, 2017.
  • [40] B. Carrera and I. Papaioannou, “Covariance-based MCMC for high-dimensional Bayesian updating with Sequential Monte Carlo,” Probabilistic Engineering Mechanics, vol. 77, p. 103667, 2024.
  • [41] D. Rudolf and B. Sprungk, “On a generalization of the preconditioned Crank–Nicolson Metropolis algorithm,” Foundations of Computational Mathematics, vol. 18, pp. 309–343, 2018.
  • [42] S.-K. Au and E. Patelli, “Rare event simulation in finite-infinite dimensional space,” Reliability Engineering & System Safety, vol. 148, pp. 67–77, 2016.
  • [43] S. Xiao and W. Nowak, “Failure probability estimation with failure samples: An extension of the two-stage Markov chain Monte Carlo simulation,” Mechanical Systems and Signal Processing, vol. 212, p. 111300, 2024.
  • [44] M. Hohenbichler and R. Rackwitz, “Non-normal dependent vectors in structural safety,” Journal of the Engineering Mechanics Division, vol. 107, no. 6, pp. 1227–1238, 1981.
  • [45] A. Der Kiureghian and P.-L. Liu, “Structural reliability under incomplete probability information,” Journal of Engineering Mechanics, vol. 112, no. 1, pp. 85–104, 1986.
  • [46] R. Lebrun and A. Dutfoy, “An innovating analysis of the Nataf transformation from the copula viewpoint,” Probabilistic Engineering Mechanics, vol. 24, no. 3, pp. 312–320, 2009.
  • [47] I. Papaioannou and D. Straub, “Efficient sampling of non-Gaussian priors in high dimensions,” in Proc. 12th International Conference on Structural Safety & Reliability (ICOSSAR), Vienna, Austria, 2017.
  • [48] K. G. Papakonstantinou, H. Nikbakht, and E. Eshra, “Quasi-Newton Hamiltonian MCMC sampling for reliability estimation in high-dimensional non-Gaussian spaces,” in Proc. 13th International Conference on Structural Safety & Reliability (ICOSSAR), Shanghai, China, 2022.
  • [49] H. Nikbakht and K. G. Papakonstantinou, “A direct Hamiltonian MCMC approach for reliability estimation,” in Proc. 3rd International Conference on Uncertainty Quantification in Computational Sciences and Engineering (UNCECOMP), Hersonissos, Greece, 2019.
  • [50] M. Betancourt, “A conceptual introduction to Hamiltonian Monte Carlo,” arXiv preprint arXiv:1701.02434, 2017.
  • [51] H. Robbins and S. Monro, “A stochastic approximation method,” The Annals of Mathematical Statistics, pp. 400–407, 1951.
  • [52] C. Andrieu and J. Thoms, “A tutorial on adaptive MCMC,” Statistics and Computing, vol. 18, pp. 343–373, 2008.
  • [53] Engineering Risk Analysis Group, “Reliability analysis tools,” 2024. Technical University of Munich. Available at https://github.com/ERA-Software/Overview.
  • [54] D. M. Himmelblau, Applied Nonlinear Programming. McGraw-Hill, 1972.
  • [55] C. Bucher, Computational Analysis of Randomness in Structural Mechanics: Structures and Infrastructures Book Series, vol. 3. CRC Press, 2009.
  • [56] I. Papaioannou, M. Ehre, and D. Straub, “PLS-based adaptation for efficient PCE representation in high dimensions,” Journal of Computational Physics, vol. 387, pp. 186–204, 2019.
  • [57] M. Ko and K. G. Papakonstantinou, “Quadratic point estimate method for probabilistic moments computation,” Probabilistic Engineering Mechanics, vol. 79, p. 103705, 2025.
  • [58] C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method. Dover Publications, 2012.