
Gradient-free Importance Sampling Scheme for Efficient Reliability Estimation

Elsayed Eshra
The Pennsylvania State University
University Park, PA 16802
eme5375@psu.edu
Konstantinos G. Papakonstantinou
The Pennsylvania State University
University Park, PA 16802
kpapakon@psu.edu
Abstract

This work presents a novel gradient-free importance sampling-based framework for precisely and efficiently estimating rare event probabilities, often encountered in reliability analyses of engineering systems. The approach is formulated around our foundational Approximate Sampling Target with Post-processing Adjustment (ASTPA) methodology. ASTPA uniquely constructs and directly samples an unnormalized target distribution, relaxing the optimal importance sampling density (ISD). The target’s normalizing constant is then estimated using our inverse importance sampling (IIS) scheme, employing an ISD fitted based on the obtained samples. In this work, a gradient-free sampling method within ASTPA is developed through a guided dimension-robust preconditioned Crank-Nicolson (pCN) algorithm, particularly suitable for black-box computational models where analytical gradient information is not available. To boost the sampling efficiency of pCN in our context, a computationally effective, general discovery stage for the rare event domain is devised, providing (multi-modal) rare event samples used in initializing the pCN chains. A series of diverse test functions and engineering problems involving high dimensionality and strong nonlinearity is presented, demonstrating the advantages of the proposed framework compared to several state-of-the-art sampling methods.

Keywords: Rare Event · Gradient-free Sampling · Inverse Importance Sampling · Reliability Estimation · Preconditioned Crank-Nicolson · High Dimensions · ASTPA.

1 Introduction

This work introduces a novel gradient-free framework for reliability analysis and rare event probability estimation in high-dimensional random variable spaces. Evaluation of system reliability and rare event uncertainty quantification (UQ) play a vital role in modern decision-making across diverse fields. Yet, as our systems and computational models grow increasingly complex, the computational cost of repeatedly evaluating them becomes a major challenge. Additionally, analytically deriving gradients for such models can sometimes be cumbersome or even infeasible, and using numerical gradient estimation methods in high-dimensional settings is impractical due to the high computational cost involved. These challenges are further compounded by the exceptionally low probabilities associated with such rare events, often ranging from $10^{-4}$ to $10^{-9}$ or lower. This necessitates the development of efficient gradient-free approaches capable of accurately estimating rare event probabilities while minimizing the number of model evaluations, a challenge that this work aims to address.

In the context of reliability estimation, failures are often considered rare events; therefore, rare event probability is also commonly referred to as failure probability. Several techniques have been developed over the years to estimate rare event (failure) probabilities, with sampling-based approaches being able to address this problem in its utmost generality. To overcome the limitations of the crude Monte Carlo approach [1], various advanced sampling techniques have been proposed. Among others, importance sampling (IS) [2], a well-known variance-reduction approach, is capable of efficiently estimating rare event probabilities. IS employs an appropriate importance sampling density (ISD), placing greater importance on rare event domains in the random variable space. This results in an increased number of rare event samples, thereby reducing the number of required model calls. Although the theoretically optimal ISD is known [3], it is challenging to sample from and impractical to implement. Notably, finding an effective near-optimal ISD is an ongoing, challenging research pursuit. The existing approaches can be largely classified into two categories [4]. The first seeks to approximate the optimal ISD by a parametric or non-parametric normalized probability density function (PDF). An early approach in this category utilizes densities centered around one or more design points [5], a powerful, practical process, yet with certain limitations for high-dimensional, complex spaces, as also discussed in [6]. A more recent, popular approach is the cross entropy-based importance sampling (CE-IS). It constructs a parametric ISD $h$ by minimizing the Kullback-Leibler (KL) divergence between the optimal ISD and a chosen parametric family of probability distributions [7, 8, 9, 10, 11]. Recent CE-IS variants incorporate information from influential rare event points, e.g., design points, to effectively initialize the cross entropy process [12, 13]. Within this first category of methods, several other noteworthy approaches have also been proposed to approximate the optimal ISD, e.g., [14, 15, 16, 17, 18]. The second category of methods employs sequential variants of importance sampling, mainly utilizing a number of intermediate unnormalized distributions [19, 20, 21, 22]. These unnormalized distributions are generally more flexible than the PDFs usually utilized within the first category, thus being capable of providing a better approximation of the optimal ISD. Another sequential sampling method that shares similarities [4] with this second methodological category is the Subset Simulation (SuS) approach, originally presented in [6] and continuously refined [23, 24, 25, 26, 27], due to its potential to handle high-dimensional problems, despite some limitations [28]. A key to the success of all sequential methods is to design nearby intermediate distributions and to sufficiently sample them to accurately compute the sought rare event probability, a non-trivial and computationally demanding task.

Recently, we developed a novel foundational importance sampling-based approach in a non-sequential manner, termed Approximate Sampling Target with Post-processing Adjustment (ASTPA), for reliability and rare event probability estimation [29, 4]. The ASTPA approach innovatively decomposes the challenging rare event probability estimation process into two less demanding estimation problems. First, ASTPA uniquely constructs and directly samples an unnormalized target distribution, relaxing the optimal ISD, thereby making it comparatively much easier to sample. The obtained samples are subsequently utilized not only to compute a shifted estimate of the sought probability but also to guide the second ASTPA stage. Post-sampling, the normalizing constant of the approximate sampling target is computed through the inverse importance sampling (IIS) technique [29], which utilizes an importance sampling density (ISD) fitted based on the samples drawn in the first stage. The rare event probability is eventually computed by utilizing this computed normalizing constant to correct the shifted estimate of the first stage. Consequently, ASTPA significantly reduces the computational cost typically associated with using multiple intermediate distributions. Sampling the approximate target distribution in ASTPA has so far been performed using gradient-based Hamiltonian Markov Chain Monte Carlo (HMCMC) methods, demonstrating outstanding performance in complex, high-dimensional static and first-passage dynamic problems, both in Gaussian spaces [29] and directly in non-Gaussian spaces [30, 4]. However, the competitiveness of this gradient-based approach is hindered when analytical gradients cannot be obtained.

Gradient-free sampling has been extensively employed in various reliability estimation approaches, such as importance sampling and Subset Simulation, as discussed earlier. One common method is the Component-wise Metropolis-Hastings (CWMH) algorithm, often applied within Subset Simulation [6] to effectively sample high-dimensional conditional distributions by sequentially updating individual dimensions. To further enhance efficiency, delayed rejection variants of CWMH have been introduced, allowing for multiple proposal attempts within a single iteration when a proposal is rejected, thereby increasing acceptance rates and reducing sample autocorrelation [31, 32, 33, 34, 35]. Another significant development is the preconditioned Crank-Nicolson (pCN) algorithm, an MCMC method designed for high-dimensional probability distributions and infinite-dimensional settings, often encountered in Bayesian inverse problems [36, 37]. Inspired by the Crank-Nicolson discretization for solving partial differential equations, the pCN sampler proposes updates by combining the current state with a perturbation drawn from the prior distribution, scaled by a tunable parameter to control the step size. This design maintains stable acceptance probabilities even as dimensionality increases, achieving dimension-robust performance [38]. To further improve efficiency, some extensions of the standard pCN algorithm incorporate information from the posterior distribution by adapting the proposal covariance using sampling history [39, 40] or by estimating the posterior covariance through gradient information [41]. The pCN algorithm has been applied in various reliability estimation frameworks, including Subset Simulation [42, 32] and importance sampling [20, 43].

In this paper, we introduce a novel gradient-free framework for efficiently and accurately estimating rare event probabilities, building on our foundational ASTPA approach, particularly suitable for black-box computational models where analytical gradient information is, for whatever reason, not available. Central to the framework is a guided variant of the dimension-robust pCN algorithm, tailored to sample the approximate target distribution within ASTPA. The standard pCN algorithm encounters challenges with multimodal distributions or when the target is distant from prior information, as often occurs with rare event domains located in the tails of original probability distributions. To overcome these limitations, we propose a computationally efficient discovery stage in this work, properly designed to enhance sampling efficiency by generating (multimodal) rare event samples to serve as initial seeds for the pCN chains. This process begins by sampling from the original distribution or an adjusted version, such as one with a modified covariance matrix, followed by conditional sampling based on the reciprocal of the original distribution, facilitating rapid diffusion toward the rare event domain. The procedure is stopped after generating a specified number of rare event samples, from which a defined number of seeds is randomly selected to initialize the pCN chains. The generated pCN sample set is then employed within the ASTPA approach to eventually estimate the sought failure probability. Consistent with our previous findings [29, 4], the ASTPA estimator remains unbiased, and the derived analytical coefficient of variation aligns very well with the sampling one, as demonstrated by the numerical experiments presented in this work.

The structure of this paper is as follows: Section 2 defines the rare event probability estimation problem, highlighting its complexities and the considered challenging yet realistic and practical scenarios. The ASTPA framework is presented in Section 3, followed by a detailed description of the developed gradient-free sampling scheme in Section 4. A summary of the pCN-based ASTPA framework is presented in Section 5. To fully evaluate the capabilities of the proposed methodology, Section 6 demonstrates its performance, successfully comparing it with state-of-the-art Subset Simulation and advanced importance sampling methods across a series of challenging low- and high-dimensional problems. The paper then concludes in Section 7.

2 Rare Event Probability Estimation

Let $\bm{X}$ be a continuous random vector taking values in $\mathcal{X}\subseteq\mathbb{R}^{d}$ and having a joint probability density function (PDF) $\pi_{\bm{X}}$. We are interested in rare events $\mathcal{F}\coloneqq\{\bm{x}:g(\bm{x})\leq 0\}$, where $g:\mathcal{X}\rightarrow\mathbb{R}$ is a performance function, also known as the limit-state function, defining the occurrence of a rare event. In this work, we aim to estimate the rare event probability $p_{\mathcal{F}}$:

p_{\mathcal{F}}=\int_{\mathcal{F}}\pi_{\bm{X}}(\bm{x})\,d\bm{x}=\int_{\mathcal{X}}I_{\mathcal{F}}(\bm{x})\,\pi_{\bm{X}}(\bm{x})\,d\bm{x}=\mathbb{E}_{\pi_{\bm{X}}}[I_{\mathcal{F}}(\bm{X})]   (1)

where $I_{\mathcal{F}}:\mathcal{X}\rightarrow\{0,1\}$ is the indicator function, i.e., $I_{\mathcal{F}}(\bm{x})=1$ if $\bm{x}\in\mathcal{F}$, and $I_{\mathcal{F}}(\bm{x})=0$ otherwise, and $\mathbb{E}$ is the expectation operator.

Our objective in this work is to estimate the integral in Eq. 1 under these challenging yet realistic settings: (i) the analytical calculation of Eq. 1 is generally intractable; (ii) the computational effort for evaluating $I_{\mathcal{F}}$ for each realization $\bm{x}$ is assumed to be quite significant, often relying on computationally intensive models, necessitating the minimization of such function evaluations (model calls); (iii) the rare event probability $p_{\mathcal{F}}$ is extremely low, typically in the range of $10^{-4}$–$10^{-9}$ or lower; (iv) the random variable space, $\mathcal{X}\subseteq\mathbb{R}^{d}$, is high-dimensional, with $d$ often as large as $10^{2}$ or more; (v) the analytical gradient of the limit-state function is not available, motivating the use of gradient-free approaches; and (vi) the joint probability density, $\pi_{\bm{X}}(\bm{x})$, is assumed to follow an independent standard Gaussian distribution, i.e., $\pi_{\bm{X}}(\bm{x})=\mathcal{N}(\bm{X};\,\mathbf{0},\,\mathbf{I})$, with $\mathbf{I}$ being an identity matrix, a common assumption in rare event estimation. In most cases, this assumption is not restrictive, as transforming the original random variables to the independent standard Gaussian space can be uncomplicated [44, 45, 46, 47]. However, when such Gaussian transformations are not feasible, directly estimating rare event probabilities in complex non-Gaussian spaces may require sophisticated gradient-based samplers, a challenge addressed in our Quasi-Newton mass preconditioned HMCMC (QNp-HMCMC)-based ASTPA framework, as demonstrated in [4, 48, 49]. The gradient-free sampling approach developed in this work is specifically tailored for Gaussian spaces, with ongoing efforts to extend these methods in the future to direct estimations in non-Gaussian spaces.

Under these settings, several sampling-based methods for estimating rare event probabilities become highly inefficient. For instance, the standard Monte Carlo estimator of Eq. 1 using $\{\bm{x}_{i}\}_{1}^{N}$ draws from $\pi_{\bm{X}}$, $\hat{p}_{\mathcal{F}}=\frac{1}{N}\sum_{i=1}^{N}I_{\mathcal{F}}(\bm{x}_{i})$, has a coefficient of variation of $\sqrt{(1-p_{\mathcal{F}})/(Np_{\mathcal{F}})}$, rendering this estimate inefficient by requiring a prohibitively large number of samples to accurately quantify small probabilities; e.g., achieving a $10\%$ coefficient of variation for $p_{\mathcal{F}}=10^{-6}$ calls for $N\approx 10^{8}$ samples; see setting (iii) above. As discussed in Section 1, importance sampling (IS) can efficiently compute the integral in Eq. 1, through sampling from an appropriate importance sampling density (ISD), $h(\bm{X})$. The ISD places greater importance on the rare event domain, $\mathcal{F}$, in the random variable space compared to $\pi_{\bm{X}}(\bm{x})$, satisfying the sufficient condition $\text{supp}(I_{\mathcal{F}}\,\pi_{\bm{X}})\subseteq\text{supp}(h)$, with $\text{supp}(\cdot)$ denoting the support of the function. Eq. 1 can then be written as:

p_{\mathcal{F}}=\int_{\mathcal{X}}I_{\mathcal{F}}(\bm{x})\,\dfrac{\pi_{\bm{X}}(\bm{x})}{h(\bm{x})}\,h(\bm{x})\,d\bm{x}=\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{h(\bm{X})}\bigg]   (2)

leading to the unbiased IS estimator:

\hat{p}_{\mathcal{F}}=\dfrac{1}{N}\sum_{i=1}^{N}I_{\mathcal{F}}(\bm{x}_{i})\,\dfrac{\pi_{\bm{X}}(\bm{x}_{i})}{h(\bm{x}_{i})}   (3)

where $\{\bm{x}_{i}\}_{1}^{N}\sim h$. The efficiency of IS relies heavily on the careful selection of both the ISD and the sampling algorithm. A theoretically optimal ISD $h^{*}$, providing a zero-variance IS estimator, is given by [3]:

h^{*}(\bm{X})=\dfrac{1}{p_{\mathcal{F}}}\,I_{\mathcal{F}}(\bm{X})\,\pi_{\bm{X}}(\bm{X})   (4)

However, this optimal ISD cannot be utilized because the normalizing constant in Eq. 4 is the sought probability $p_{\mathcal{F}}$ itself. Additionally, $h^{*}$ can be highly concentrated within the rare event domain, making it challenging to sample, particularly in high dimensions [4]. To address these challenges, we employ our developed importance sampling-based framework, ASTPA [29, 4], which constructs a relaxed ISD that approximates the optimal one while being much easier to sample. This relaxation allows for more efficient sampling, leading to a more accurate estimation of the sought probability. The primary focus of this work is to introduce an efficient, gradient-free sampling approach, which is especially beneficial when analytical gradients are not available, as detailed in Section 4.

3 Approximate Sampling Target with Post-processing Adjustment (ASTPA) in Gaussian Spaces

ASTPA estimates the rare event probability in Eq. 1 through an innovative non-sequential importance sampling variant, as introduced in [29, 4]. In this approach, ASTPA constructs a single unnormalized approximate sampling target $\tilde{h}$, relaxing the optimal ISD in Eq. 4. After sampling $\tilde{h}$, its normalizing constant $C_{h}$ is computed utilizing inverse importance sampling. By employing the unnormalized importance sampling distribution $\tilde{h}$, Eq. 2 can be written as:

p_{\mathcal{F}}=\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{h(\bm{X})}\bigg]=C_{h}\,\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{\tilde{h}(\bm{X})}\bigg]=C_{h}\,\tilde{p}_{\mathcal{F}}   (5)

Thus, computing $p_{\mathcal{F}}$ is decomposed into two generally simpler tasks. The first involves constructing $\tilde{h}$ and sampling $\{\bm{x}_{i}\}_{i=1}^{N}\sim h$ (equivalent to $\{\bm{x}_{i}\}_{i=1}^{N}\sim\tilde{h}$), to compute the unbiased expectation of the weighted indicator function (shifted probability estimate) as:

\tilde{p}_{\mathcal{F}}=\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{\tilde{h}(\bm{X})}\bigg]\approx\hat{\tilde{p}}_{\mathcal{F}}=\dfrac{1}{N}\sum_{i=1}^{N}I_{\mathcal{F}}(\bm{x}_{i})\,\dfrac{\pi_{\bm{X}}(\bm{x}_{i})}{\tilde{h}(\bm{x}_{i})}   (6)

The normalizing constant, $\hat{C}_{h}$, is then estimated using inverse importance sampling, as discussed in Section 3.2. The ASTPA estimator for the sought rare event probability can then be computed as:

\hat{p}_{\mathcal{F}}=\hat{\tilde{p}}_{\mathcal{F}}\,\hat{C}_{h}   (7)

Fig. 1 provides a concise illustration of the ASTPA framework using a nonlinear bimodal limit-state function, as detailed in Section 6.1, with $p_{\mathcal{F}}\sim 9.47\times 10^{-6}$. The construction of the approximate sampling target in ASTPA, along with recommended parameters, is discussed in Section 3.1. We then present the inverse importance sampling technique in Section 3.2. Subsequently, Section 3.3 shows the statistical properties of the ASTPA estimator presented in Eq. 7.

Figure 1: Outlining the ASTPA framework; (a) shows an original bivariate independent standard Gaussian distribution $\pi_{\bm{X}}$ in red, and a limit-state function at $g(\bm{X})=0$ shown using a grey line, with a bimodal rare event domain inside $g(\bm{X})=0$, as detailed in Section 6.1; (b) visualizes the optimal importance sampling density, $h^{*}$, in Eq. 4; (c) depicts the constructed approximate sampling target, $\tilde{h}$, in Eq. 9; (d) showcases samples from this target, subsequently used to compute $\hat{\tilde{p}}_{\mathcal{F}}$ in Eq. 6; and (e) and (f) demonstrate the inverse importance sampling procedure to compute the normalizing constant $\hat{C}_{h}$ in Eq. 11, presenting a crudely fitted $Q(\bm{X})$ and its samples, respectively. The sought probability is eventually computed using Eq. 7.

3.1 Target distribution formulation

As discussed earlier, the basic idea is to construct a near-optimal sampling target distribution $\tilde{h}$, placing higher importance on the rare event domain $\mathcal{F}$. In ASTPA, this is achieved by approximating the indicator function $I_{\mathcal{F}}$ in Eq. 4 with a smooth likelihood function $\ell_{g_{\bm{X}}}$, chosen as a logistic cumulative distribution function (CDF) of the negative of a scaled limit-state function, $F_{\text{CDF}}(-g(\bm{X})/g_{c})$, with $g_{c}$ being a scaling constant:

\ell_{g_{\bm{X}}}=F_{\text{CDF}}\bigg(\dfrac{-g(\bm{X})}{g_{c}}\,\Big|\,\mu_{g},\,\sigma\bigg)=\Bigg(1+\exp\bigg(\dfrac{\frac{g(\bm{X})}{g_{c}}+\mu_{g}}{\frac{\sqrt{3}}{\pi}\,\sigma}\bigg)\Bigg)^{-1}   (8)

where $\mu_{g}$ and $\sigma$ are the mean and dispersion factor of the logistic CDF, respectively. The approximate sampling target in ASTPA is then expressed as:

\tilde{h}(\bm{X})=\ell_{g_{\bm{X}}}\,\pi_{\bm{X}}(\bm{X})=\Bigg(1+\exp\bigg(\dfrac{\frac{g(\bm{X})}{g_{c}}+\mu_{g}}{\frac{\sqrt{3}}{\pi}\,\sigma}\bigg)\Bigg)^{-1}\,\pi_{\bm{X}}(\bm{X})   (9)

The dispersion factor $\sigma$ controls the spread of the likelihood function and thus influences the shape of the constructed target $\tilde{h}$. Recommended values for $\sigma$ lie in the range of $[0.1,\,0.6]$, with lower values (typically between $0.1$ and $0.4$) generally working efficiently. Higher values within the recommended range may, however, be needed for high-dimensional cases involving strongly nonlinear effects. The mean parameter, $\mu_{g}$, of the logistic CDF is typically chosen as $\mu_{g}=1.21\sigma$, generally placing $\mu_{g}$ inside the rare event domain, as derived in [4]. In Fig. 2, we illustrate how the dispersion factor, $\sigma$, influences the shape of the target distribution, based on a nonlinear bimodal limit-state function, as detailed in Section 6.1. As shown, decreasing $\sigma$ results in a more concentrated target distribution inside the rare event domain. Although pushing the target further inside the rare event domain, i.e., closer to the optimal distribution $(I_{\mathcal{F}}\,\pi_{\bm{X}})$, can generate more rare event samples, it may also complicate the sampling process and, thus, the computation of the normalizing constant $\hat{C}_{h}$, as discussed in [4].

The scaling $g(\bm{X})/g_{c}$ accounts for the varying magnitudes of limit-state functions, ensuring that the input to the logistic CDF, $g(\bm{X})/g_{c}$, evaluated at the mean of the original distribution, consistently falls within a predefined range. This process standardizes the input while keeping the recommended values for $\sigma$ unchanged. To ensure the input at $\bm{x}=\bm{0}$ lies within a desired range of $[3,7]$, $g_{c}$ is defined as $g_{c}=g(\bm{0})/q$, with $q\in[3,7]$, if $g(\bm{0})\notin[3,7]$, and $g_{c}=1$ otherwise. Our empirical analysis confirms the effectiveness of this scaling technique, demonstrating that the recommended range for $q$ is practical for the studied Gaussian problems.

Fig. 3 demonstrates the effect of the scaling constant $g_{c}$ on constructing the target distribution $\tilde{h}(\bm{X})$. Since the limit-state function considered in this problem gives $g(\bm{0})>7$, the scaling constant $g_{c}$ can be defined as $g(\bm{0})/4$. Fig. 3(b) shows the unnormalized target distribution $\tilde{h}(\bm{X})$ constructed without applying the proper scaling constant. As a result, the input $g(\bm{X})/g_{c}$ at $\bm{x}=\bm{0}$ does not lie within the desired range, leading to a highly concentrated target. In contrast, Fig. 3(c) presents the corrected target distribution after applying the appropriate scaling factor $g_{c}=g(\bm{0})/4$. These plots clearly demonstrate how scaling the limit-state function helps maintain an appropriately relaxed target, with the recommended parameter values remaining unchanged, thus enhancing sampling efficiency.
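For concreteness, the construction of Eqs. 8 and 9 can be sketched in a few lines of Python; this is a minimal illustration in log space (the function names and the return convention, the log-target together with the log-likelihood reused later by the pCN acceptance ratio, are our own choices, not part of any reference implementation):

import numpy as np
from scipy.stats import norm

def make_log_target(g, d, sigma=0.3, q=4.0):
    """Sketch of Eqs. 8-9: log h~(x) and the log-likelihood log l_gX(x)."""
    g0 = g(np.zeros(d))
    gc = g0 / q if not (3.0 <= g0 <= 7.0) else 1.0    # scaling constant g_c (Section 3.1)
    mu_g = 1.21 * sigma                               # recommended mean parameter
    s = (np.sqrt(3.0) / np.pi) * sigma                # logistic scale parameter

    def log_target(x):
        # Log of the logistic-CDF likelihood, -log(1 + exp((g/gc + mu_g)/s)),
        # computed with logaddexp to remain stable deep in the distribution tails
        log_lik = -np.logaddexp(0.0, (g(x) / gc + mu_g) / s)
        log_prior = norm.logpdf(x).sum()              # independent standard Gaussian pi_X
        return log_prior + log_lik, log_lik

    return log_target

Working in log space avoids the numerical underflow that a direct evaluation of $\tilde{h}$ would incur far in the distribution tails, where rare event samples live.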

Figure 2: Effect of the likelihood dispersion factor $\sigma$ on the constructed target distribution for a nonlinear bimodal limit-state function, shown in gray at $g(\bm{X})=0$, as detailed in Section 6.1.
Figure 3: Illustrating the impact of the scaling constant $g_{c}$ on the construction of the target distribution $\tilde{h}(\bm{X})$. The gray island-shaped curve represents Himmelblau’s multi-modal limit-state function $g(\bm{X})$, with the failure domain inside the curves and $p_{\mathcal{F}}\sim 2.81\times 10^{-7}$, as detailed in Section 6.3. (a) Bivariate Gaussian distribution $\pi_{\bm{X}}$. (b) Constructed target distribution $\tilde{h}(\bm{X})$ without the proper value of the scaling constant ($g_{c}=1$). (c) The corrected target $\tilde{h}(\bm{X})$ after applying an appropriate scaling constant $g_{c}=g(\bm{0})/4$.

3.2 Inverse importance sampling

Inverse importance sampling (IIS) is a general technique to compute normalizing constants, demonstrating exceptional performance in the context of rare event probability estimation [29]. Given an unnormalized distribution $\tilde{h}$ and samples $\{\bm{x}_{i}\}_{i=1}^{N}\sim h$, inverse importance sampling first fits an ISD $Q(\bm{x})$ based on these samples. IIS then estimates the normalizing constant $C_{h}$ as:

C_{h}=\int_{\mathcal{X}}\tilde{h}(\bm{x})\,d\bm{x}=\int_{\mathcal{X}}\dfrac{\tilde{h}(\bm{x})}{Q(\bm{x})}\,Q(\bm{x})\,d\bm{x}=\mathbb{E}_{Q}\bigg[\dfrac{\tilde{h}(\bm{X})}{Q(\bm{X})}\bigg]   (10)

By drawing $\{\bm{x}_{i}^{\prime}\}_{i=1}^{M}$ i.i.d. samples from $Q$, the unbiased IIS estimator can be computed as:

\hat{C}_{h}=\dfrac{1}{M}\sum_{i=1}^{M}\dfrac{\tilde{h}(\bm{x}_{i}^{\prime})}{Q(\bm{x}_{i}^{\prime})}   (11)

In this work, a Gaussian Mixture Model (GMM), based on the available samples $\{\bm{x}_{i}\}_{i=1}^{N}\sim h$, is employed as the ISD $Q(\bm{X})$. In low-dimensional spaces, where $d<20$, we employ a GMM with a large number $(\sim 10)$ of Gaussian components that have full covariance matrices. This approach aims to closely approximate the target distribution $\tilde{h}$ with an ISD $Q(\cdot)$, thus improving the IIS estimator in Eq. 11. To address the challenges posed by high-dimensional spaces, GMMs with diagonal covariance matrices are used in the high-dimensional examples in this work. This choice reduces the number of GMM parameters to be estimated, thereby mitigating the scalability issues typically associated with density estimation methods. Notably, IIS has been shown to work effectively even with a crudely fitted $Q$ [4, 29].

The IIS technique is further enhanced using the splitting approach introduced in [4] to improve the robustness of the IIS estimator against the impact of rarely observed outliers or anomalous samples. In this approach, the sample set $\{\bm{x}_{i}^{\prime}\}_{i=1}^{M}\sim Q$ is split into two equal-sized subsets, and the IIS estimate $\hat{C}_{h}$ is computed separately for each subset. If the two estimates are within a reasonable range of each other (specifically, $\frac{1}{3}\leq\hat{C}_{h}^{1}/\hat{C}_{h}^{2}\leq 3$), the final estimate is taken as their average. Otherwise, the final estimate is conservatively set to the minimum of the two. This method has proven effective across all examples presented in this work.
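A minimal sketch of the resulting estimator, including this splitting safeguard, is given below; it assumes a scalar-valued log-density `log_h_tilde` for $\tilde{h}$ and uses scikit-learn's GMM purely for illustration (all names are our own):

import numpy as np
from sklearn.mixture import GaussianMixture

def inverse_importance_sampling(samples, log_h_tilde, M, n_components=10, rng=None):
    """Estimate the normalizing constant C_h of h~ via Eq. 11, with splitting."""
    rng = rng or np.random.default_rng()
    d = samples.shape[1]
    # Full covariance matrices for d < 20, diagonal otherwise (Section 3.2)
    cov = "full" if d < 20 else "diag"
    gmm = GaussianMixture(n_components=n_components, covariance_type=cov).fit(samples)

    x_prime, _ = gmm.sample(M)                  # i.i.d. draws from the fitted ISD Q
    log_w = np.array([log_h_tilde(x) for x in x_prime]) - gmm.score_samples(x_prime)
    w = rng.permutation(np.exp(log_w))          # shuffle: gmm.sample orders by component

    # Splitting safeguard: average the half-sample estimates if they agree, else take the min
    c1, c2 = w[: M // 2].mean(), w[M // 2:].mean()
    return 0.5 * (c1 + c2) if 1.0 / 3.0 <= c1 / c2 <= 3.0 else min(c1, c2)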

3.3 Statistical properties of the ASTPA estimator

The ASTPA estimator of the rare event probability can finally be computed as:

p_{\mathcal{F}}=\mathbb{E}_{h}\bigg[I_{\mathcal{F}}(\bm{X})\,\dfrac{\pi_{\bm{X}}(\bm{X})}{\tilde{h}(\bm{X})}\bigg]\,\mathbb{E}_{Q}\bigg[\dfrac{\tilde{h}(\bm{X})}{Q(\bm{X})}\bigg]\approx\hat{p}_{\mathcal{F}}=\hat{\tilde{p}}_{\mathcal{F}}\,\hat{C}_{h}=\bigg(\dfrac{1}{N}\sum_{i=1}^{N}I_{\mathcal{F}}(\bm{x}_{i})\,\dfrac{\pi_{\bm{X}}(\bm{x}_{i})}{\tilde{h}(\bm{x}_{i})}\bigg)\bigg(\dfrac{1}{M}\sum_{i=1}^{M}\dfrac{\tilde{h}(\bm{x}_{i}^{\prime})}{Q(\bm{x}_{i}^{\prime})}\bigg)   (12)

where $\{\bm{x}_{i}\}_{i=1}^{N}$ and $\{\bm{x}_{i}^{\prime}\}_{i=1}^{M}$ are samples from $h$ and $Q$, respectively.

As shown in [29, 4], the ASTPA estimator is unbiased, i.e., $\mathbb{E}[\hat{p}_{\mathcal{F}}]=\mathbb{E}[\hat{\tilde{p}}_{\mathcal{F}}\,\hat{C}_{h}]=\tilde{p}_{\mathcal{F}}\,C_{h}$. The analytical Coefficient of Variation (C.o.V) of the ASTPA estimator can also be computed as $\textrm{C.o.V}\approx\sqrt{\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})}/\hat{p}_{\mathcal{F}}$, where the variance $\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})$ is estimated by:

\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})=(\hat{\tilde{p}}_{\mathcal{F}})^{2}\,\widehat{\text{Var}}(\hat{C}_{h})+(\hat{C}_{h})^{2}\,\widehat{\text{Var}}(\hat{\tilde{p}}_{\mathcal{F}})+\widehat{\text{Var}}(\hat{\tilde{p}}_{\mathcal{F}})\,\widehat{\text{Var}}(\hat{C}_{h})   (13)

Here, $\hat{\tilde{p}}_{\mathcal{F}}$ and $\hat{C}_{h}$ can be computed according to Eqs. 6 and 11, respectively. The variances $\widehat{\text{Var}}(\hat{\tilde{p}}_{\mathcal{F}})$ and $\widehat{\text{Var}}(\hat{C}_{h})$ can thus be estimated as follows:

\widehat{\text{Var}}(\hat{\tilde{p}}_{\mathcal{F}})=\frac{1}{N_{s}(N_{s}-1)}\sum_{i=1}^{N_{s}}\bigg(\dfrac{I_{\mathcal{F}}(\bm{x}_{i})\,\pi_{\bm{X}}(\bm{x}_{i})}{\tilde{h}(\bm{x}_{i})}-\hat{\tilde{p}}_{\mathcal{F}}\bigg)^{2},\qquad\widehat{\text{Var}}(\hat{C}_{h})=\frac{1}{M(M-1)}\sum_{i=1}^{M}\bigg(\dfrac{\tilde{h}(\bm{x}_{i}^{\prime})}{Q(\bm{x}_{i}^{\prime})}-\hat{C}_{h}\bigg)^{2}   (14)

where $N_{s}$ denotes the number of used Markov chain samples, accounting for the fact that the samples are not independent and identically distributed (i.i.d.) in this case. To address this, a thinning process is adopted to reduce the sample size from $N$ to $N_{s}$ based on the effective sample size (ESS) of the sample set $\{\bm{x}_{i}\}_{i=1}^{N}$. We select every $j^{\text{th}}$ sample, where $j$, an integer, is determined by:

j=\left\lfloor\dfrac{N}{4\cdot\text{ESS}_{\text{min}}}\right\rfloor   (15)

where $\text{ESS}_{\text{min}}$ represents the minimum effective sample size across all dimensions. To ensure $j$ remains within practical bounds, it is constrained to the set $\{3,4,5,\ldots,30\}$. Any calculated value of $j$ outside this range is adjusted to the nearest bound within it. The analytical C.o.V computed according to this approach showed very good agreement with the sampling C.o.V in all our numerical examples.
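Under the same definitions, the thinning rule of Eq. 15 and the first variance term of Eq. 14 can be sketched as follows (the per-dimension ESS computation itself, available in standard MCMC diagnostics libraries, is taken as given; names are illustrative):

import numpy as np

def variance_of_shifted_estimate(weights, ess_min):
    """Estimate Var(p~^) per Eqs. 14-15 from correlated chain samples.

    weights : length-N array of I_F(x_i) pi_X(x_i) / h~(x_i) over the pCN samples
    ess_min : minimum effective sample size across all dimensions (assumed given)
    """
    N = len(weights)
    j = int(np.clip(N // (4 * ess_min), 3, 30))  # stride of Eq. 15, kept in {3,...,30}
    w = weights[::j]                             # thinned, nearly independent values
    Ns = len(w)
    return np.sum((w - w.mean()) ** 2) / (Ns * (Ns - 1))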

4 Gradient-free Sampling of the Approximate Target ($\tilde{h}$) in ASTPA

In this section, we introduce a novel gradient-free approach for sampling the approximate target, $\tilde{h}=\ell_{g_{\bm{X}}}\,\pi_{\bm{X}}$, in ASTPA. Rare event probability estimation typically faces two fundamental sampling challenges. First, efficiently sampling high-dimensional distributions is difficult due to the curse of dimensionality, where the volume of the space expands exponentially with the number of dimensions [50]. Second, the rare event domain, $\mathcal{F}\coloneqq\{\bm{x}:g(\bm{x})\leq 0\}$, often lies in the tails of the probability distribution $\pi_{\bm{X}}$, making it challenging to effectively guide samples toward these regions of interest, which may also exhibit multi-modality; see Fig. 1(a) for an example. In addressing the first challenge, we adopt the dimension-robust preconditioned Crank-Nicolson (pCN) algorithm within our approach, as discussed in Section 4.1. For the second challenge, we develop an effective discovery method, aimed at precisely locating rare event domains and appropriately initializing the pCN chains, as detailed in Section 4.2. Our developed algorithm, termed guided pCN, thus enhances sampling from $\tilde{h}$ by overcoming the limitations of the standard pCN sampler in the context of rare event probability estimation.

4.1 The preconditioned Crank-Nicolson (pCN) algorithm

The preconditioned Crank-Nicolson (pCN) algorithm is a gradient-free Markov Chain Monte Carlo (MCMC) method designed for efficient sampling from high-dimensional probability distributions [36, 37]. To sample from a target distribution $\tilde{h}=\ell_{g_{\bm{X}}}\,\pi_{\bm{X}}$, the pCN sampler proposes an update, $\tilde{\bm{x}}$, at each iteration, $t$, by combining the current sample, $\bm{x}_{t}$, with a perturbation drawn from the original (prior) Gaussian distribution, $\pi_{\bm{X}}=\mathcal{N}(\bm{X};\,\bm{0},\mathbf{C})$, as follows:

\tilde{\bm{x}}=\sqrt{1-\beta^{2}}\,\bm{x}_{t}+\beta\,\bm{\xi},\quad\bm{\xi}\sim\mathcal{N}(\mathbf{0},\mathbf{C})   (16)

where $\beta\in(0,1]$ is a tunable scaling parameter, discussed below, and $\mathbf{C}$ is a covariance matrix. This update is equivalent to using a Gaussian proposal distribution $\mathcal{N}(\sqrt{1-\beta^{2}}\,\bm{x}_{t},\,\beta^{2}\mathbf{C})$, yielding the acceptance probability:

\alpha=\min\bigg\{1,\dfrac{\ell_{g_{\bm{X}}}(\tilde{\bm{x}})}{\ell_{g_{\bm{X}}}(\bm{x}_{t})}\bigg\}   (17)

where $\ell_{g_{\bm{X}}}$ is the likelihood function in Eq. 8. This formula illustrates a fundamental characteristic of the pCN algorithm: its invariance with respect to the underlying Gaussian distribution $\mathcal{N}(\bm{X};\,\bm{0},\mathbf{C})$. In essence, this ensures that the proposed sample is always accepted when sampling from $\pi_{\bm{X}}$, making pCN highly suitable for efficiently exploring high-dimensional Gaussian spaces. This invariance can be understood by examining the Metropolis-Hastings (MH) ratio when using the pCN algorithm to sample $\mathcal{N}(\bm{X};\,\bm{0},\mathbf{C})$:

\alpha=\min\bigg\{1,\dfrac{\mathcal{N}(\tilde{\bm{x}};\,\bm{0},\mathbf{C})\,\mathcal{N}(\bm{x}_{t}\mid\tilde{\bm{x}};\,\sqrt{1-\beta^{2}}\,\tilde{\bm{x}},\,\beta^{2}\mathbf{C})}{\mathcal{N}(\bm{x}_{t};\,\bm{0},\mathbf{C})\,\mathcal{N}(\tilde{\bm{x}}\mid\bm{x}_{t};\,\sqrt{1-\beta^{2}}\,\bm{x}_{t},\,\beta^{2}\mathbf{C})}\bigg\}   (18)
Algorithm 1 The preconditioned Crank-Nicolson (pCN) Algorithm
1: procedure pCN($\bm{x}_{0}$, $\beta_{0}$, $\alpha^{*}$, $L_{\text{chain}}$, $\ell_{g_{\bm{X}}}(\bm{X})$, $\mathbf{C}$)
2:     for $t=1$ to $L_{\text{chain}}$ do
3:         $\bm{x}_{t}\leftarrow\bm{x}_{t-1}$
4:         $\tilde{\bm{x}}=\sqrt{1-\beta_{t-1}^{2}}\,\bm{x}_{t-1}+\beta_{t-1}\,\bm{\xi},\quad\bm{\xi}\sim\mathcal{N}(\mathbf{0},\mathbf{C})$   ▷ Propose candidate sample
5:         with probability:   ▷ Accept or reject candidate sample
6:             $\alpha_{t}=\min\Big\{1,\dfrac{\ell_{g_{\bm{X}}}(\tilde{\bm{x}})}{\ell_{g_{\bm{X}}}(\bm{x}_{t-1})}\Big\}$
7:             $\bm{x}_{t}\leftarrow\tilde{\bm{x}}$
8:         $\log\beta_{t}=\log\beta_{t-1}+t^{-1/2}(\alpha_{t}-\alpha^{*})$   ▷ Adapt $\beta$ using the Robbins-Monro algorithm (Eq. 21)
9:     end for
10: end procedure

Considering the marginal distribution $\mathcal{N}(\tilde{\bm{X}};\,\bm{0},\mathbf{C})$ and the conditional distribution $\mathcal{N}(\bm{X}_{t}\mid\tilde{\bm{X}};\,\sqrt{1-\beta^{2}}\,\tilde{\bm{x}},\,\beta^{2}\mathbf{C})$, the joint random vector $[\tilde{\bm{X}}\,\,\bm{X}_{t}]^{\text{T}}$ is therefore Gaussian with a mean vector of $[\bm{0}\,\,\bm{0}]^{\text{T}}$ and a covariance matrix of $\begin{bmatrix}\mathbf{C}&\sqrt{1-\beta^{2}}\,\mathbf{C}\\ \sqrt{1-\beta^{2}}\,\mathbf{C}&\mathbf{C}\end{bmatrix}$. The conditional distribution of $\tilde{\bm{X}}$ given $\bm{X}_{t}$ is hence Gaussian with a conditional mean $\bm{\mu}_{\tilde{\bm{X}}\mid\bm{X}_{t}}=\sqrt{1-\beta^{2}}\,\bm{X}_{t}$ and a covariance $\bm{\Sigma}_{\tilde{\bm{X}}\mid\bm{X}_{t}}=\beta^{2}\mathbf{C}$. We therefore get:

\mathcal{N}(\tilde{\bm{X}};\,\bm{0},\mathbf{C})\,\mathcal{N}(\bm{X}_{t}\mid\tilde{\bm{X}};\,\sqrt{1-\beta^{2}}\,\tilde{\bm{x}},\,\beta^{2}\mathbf{C})=\mathcal{N}\bigg(\begin{bmatrix}\tilde{\bm{X}}\\ \bm{X}_{t}\end{bmatrix};\,\begin{bmatrix}\bm{0}\\ \bm{0}\end{bmatrix},\begin{bmatrix}\mathbf{C}&\sqrt{1-\beta^{2}}\,\mathbf{C}\\ \sqrt{1-\beta^{2}}\,\mathbf{C}&\mathbf{C}\end{bmatrix}\bigg)=\mathcal{N}(\bm{X}_{t};\,\bm{0},\mathbf{C})\,\mathcal{N}(\tilde{\bm{X}}\mid\bm{X}_{t};\,\sqrt{1-\beta^{2}}\,\bm{x}_{t},\,\beta^{2}\mathbf{C})   (19)

Consequently, the MH ratio in Eq. 18 simplifies to one, confirming that the pCN proposal is indeed invariant under the original Gaussian distribution $\pi_{\bm{X}}$. As a result, when applying the pCN algorithm to sample from the target distribution $\tilde{h}=\ell_{g_{\bm{X}}}\,\pi_{\bm{X}}$, the acceptance probability reduces to the ratio between the likelihood function values at the proposed and current samples:

\alpha=\min\bigg\{1,\dfrac{\ell_{g_{\bm{X}}}(\tilde{\bm{x}})\,\mathcal{N}(\tilde{\bm{x}};\,\bm{0},\mathbf{C})\,\mathcal{N}(\bm{x}_{t}\mid\tilde{\bm{x}};\,\sqrt{1-\beta^{2}}\,\tilde{\bm{x}},\,\beta^{2}\mathbf{C})}{\ell_{g_{\bm{X}}}(\bm{x}_{t})\,\mathcal{N}(\bm{x}_{t};\,\bm{0},\mathbf{C})\,\mathcal{N}(\tilde{\bm{x}}\mid\bm{x}_{t};\,\sqrt{1-\beta^{2}}\,\bm{x}_{t},\,\beta^{2}\mathbf{C})}\bigg\}=\min\bigg\{1,\dfrac{\ell_{g_{\bm{X}}}(\tilde{\bm{x}})}{\ell_{g_{\bm{X}}}(\bm{x}_{t})}\bigg\}   (20)

The scaling parameter $\beta$ determines the balance between retaining information from the current sample and exploring new regions of the random variable space. When $\beta$ is close to zero, we have $\sqrt{1-\beta^{2}}\approx 1$, which keeps the proposed sample $\tilde{\bm{x}}$ close to the current one. This proximity leads to high acceptance rates but potentially slows the mixing of the Markov chain due to limited exploration, thus reducing the effective sample size. Conversely, as $\beta$ approaches one, we have $\sqrt{1-\beta^{2}}\approx 0$, making $\tilde{\bm{x}}$ largely independent of $\bm{x}_{t}$. This allows the sampler to explore the random variable space more broadly, but the proposed samples are more likely to land in low-probability regions, leading to an increase in rejections. To enhance sampling efficiency, it is essential to find an optimal $\beta$ balancing these trade-offs, which is typically achieved by tuning $\beta$ to target a desired average acceptance ratio $\alpha^{*}$. The Robbins-Monro stochastic approximation algorithm provides a method for such adaptation [51, 52]:

\log\beta_{t+1}=\log\beta_{t}+\gamma_{t}\left(\alpha_{t}-\alpha^{*}\right)   (21)

where $\alpha_{t}$ is the acceptance probability at iteration $t$, computed by Eq. 17, and $\gamma_{t}$ is a diminishing step size sequence, commonly chosen as $\gamma_{t}=t^{-0.5}$ [32]. In this work, we target an average acceptance probability in the range of $20$–$40\%$, with the lower values applied to high-dimensional distributions and the higher values to low-dimensional ones. This range is consistent with the values typically employed in the literature [37, 41] and has proven effective across all numerical examples considered in this work. Algorithm 1 describes the pCN algorithm, where $\bm{x}_{0}$ is the initial sample, $\beta_{0}$ is the initial scaling parameter ($\beta_{0}=0.5$ in this work), $\alpha^{*}$ is the desired average acceptance ratio, and $L_{\text{chain}}$ indicates the number of steps in the pCN chain.
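Algorithm 1 translates almost line-for-line into code; the following minimal sketch specializes it to the standard-Gaussian case $\mathbf{C}=\mathbf{I}$ used in this work (function and variable names are illustrative):

import numpy as np

def pcn_chain(x0, log_lik, L_chain, alpha_star=0.3, beta0=0.5, rng=None):
    """Adaptive pCN sampler for h~ = l_gX * N(0, I), following Algorithm 1.

    x0 is a seed from the discovery stage; log_lik is the log of Eq. 8.
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    ll, beta, d = log_lik(x), beta0, len(x)
    chain = np.empty((L_chain, d))
    for t in range(1, L_chain + 1):
        # Propose by mixing the current state with a fresh prior draw (Eq. 16)
        x_prop = np.sqrt(1.0 - beta**2) * x + beta * rng.standard_normal(d)
        ll_prop = log_lik(x_prop)
        # Gaussian prior terms cancel, so acceptance uses the likelihood ratio only (Eq. 17)
        alpha_t = np.exp(min(0.0, ll_prop - ll))
        if rng.random() < alpha_t:
            x, ll = x_prop, ll_prop
        # Robbins-Monro adaptation of beta toward alpha_star (Eq. 21), clamped to (0, 1]
        beta = min(1.0, beta * np.exp(t ** -0.5 * (alpha_t - alpha_star)))
        chain[t - 1] = x
    return chain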

Figure 4: Outlining the developed discovery approach of the rare event domain discussed in Section 4.2; (a) shows a bivariate independent standard Gaussian distribution in red, and a limit-state function at $g(\bm{X})=0$ shown using a grey line; (b) presents the first sample set $\{\bm{x}_{i}^{0}\}_{i=1}^{300}$ drawn from the original distribution $\mathcal{N}(\bm{X};\,\bm{0},\mathbf{I})$; (c) visualizes an additional conditional sample set $\{\bm{x}_{i}^{1}\}_{i=1}^{300}$ drawn from $1/\pi_{\bm{X}\mid\mathcal{F}_{1}}$; (d) adds a second conditional sample set $\{\bm{x}_{i}^{2}\}_{i=1}^{300}$ drawn from $1/\pi_{\bm{X}\mid\mathcal{F}_{2}}$, which provided at least $p_{0}N_{\text{level}}$ rare event samples, at which point the procedure was stopped; and (e) depicts the pCN seeds in red, selected according to the computed probabilities in Eq. 22. These seeds are used to initialize the pCN chains but are not utilized in estimating the sought probability.

4.2 Rare event domain discovery

In this section, we introduce a computationally efficient discovery approach for rare event domains, designed to enhance sampling efficiency by generating (multimodal) rare event samples that serve as initial seeds for the pCN chains. This approach addresses the limitations the pCN algorithm may encounter in scenarios involving multimodality and target distributions, $\tilde{h}$, that are significantly distant from the original (prior) distribution $\pi_{\bm{X}}$.

Figure 5: Illustrating the benefits of utilizing a more dispersed initial distribution during the rare event domain discovery, using the same example as in Fig. 4; (a) shows a bivariate independent Gaussian distribution with a covariance matrix of $4\mathbf{I}$; (b) presents the first sample set $\{\bm{x}_{i}^{0}\}_{i=1}^{300}$ drawn from $\mathcal{N}(\bm{X};\,\bm{0},4\mathbf{I})$; (c) visualizes an additional conditional sample set $\{\bm{x}_{i}^{1}\}_{i=1}^{300}$ drawn from $1/\pi_{\bm{X}\mid\mathcal{F}_{1}}=1/\mathcal{N}(\bm{X}\mid\mathcal{F};\,\bm{0},\,\mathbf{I})$, which provides at least $p_{0}N_{\text{level}}$ rare event samples, so the procedure is stopped one step earlier compared to Fig. 4, highlighting greater computational efficiency; and (d) depicts the pCN seeds in red, selected according to the computed probabilities in Eq. 22. These seeds are used to initialize the pCN chains but are not utilized in estimating the sought probability.
Algorithm 2 Rare Event Domain Discovery Algorithm
1: procedure RareEventDiscovery($N_{\text{level}}$, $p_{0}$, $\epsilon$, $g(\bm{X})$, $N_{\text{chain}}$)
2:     Generate an initial sample set $\{\bm{x}_{i}^{0}\}_{i=1}^{N_{\text{level}}}$ from $\mathcal{N}(\bm{0},\epsilon\mathbf{I})$.
3:     Sort the samples $\{\bm{x}_{i}^{0}\}_{i=1}^{N_{\text{level}}}$ in increasing order of their limit-state values $\{g(\bm{x}_{i}^{0})\}_{i=1}^{N_{\text{level}}}$.
4:     Find the threshold $\lambda_{1}$ as the $p_{0}$-percentile of the ordered $\{g(\bm{x}_{i}^{0})\}_{i=1}^{N_{\text{level}}}$, and set $\mathcal{F}_{1}=\{\bm{x}:g(\bm{x})\leq\lambda_{1}\}$.
5:     Set the intermediate events counter $j\leftarrow 1$.
6:     while $\lambda_{j}>0$ do
7:         Find $N_{S}=p_{0}N_{\text{level}}$ seeds, $\{\bm{x}_{i}^{(j-1)}\}_{i=1}^{N_{S}}$, where $\bm{x}_{i}^{(j-1)}\in\mathcal{F}_{j}$.
8:         Generate $N_{\text{level}}$ samples $\{\bm{x}_{i}^{j}\}_{i=1}^{N_{\text{level}}}$ from $1/\pi_{\bm{X}\mid\mathcal{F}_{j}}$, by drawing $(1/p_{0}-1)$ additional samples, starting from each seed $\bm{x}_{i}^{(j-1)}\in\mathcal{F}_{j}$, using an MCMC sampler.   ▷ $\pi_{\bm{X}\mid\mathcal{F}_{j}}=\mathcal{N}(\bm{X}\mid\mathcal{F}_{j};\,\bm{0},\mathbf{I})$
9:         Find $\lambda_{j+1}$ as the $p_{0}$-percentile of the ascendingly ordered $\{g(\bm{x}_{i}^{j})\}_{i=1}^{N_{\text{level}}}$, and set $\mathcal{F}_{j+1}=\{\bm{x}:g(\bm{x})\leq\lambda_{j+1}\}$.
10:         $j\leftarrow j+1$
11:     end while
12:     Sample $N_{\text{chain}}$ pCN seeds from the final rare event sample set $\{\bm{x}_{i}^{(j-1)}\}_{i=1}^{N_{S}}$, for which $\bm{x}_{i}^{(j-1)}\in\mathcal{F}$.
13: end procedure

This process begins by sampling from the original distribution, $\pi_{\bm{X}}=\mathcal{N}(\bm{X};\,\bm{0},\mathbf{I})$, or a more dispersed version, $\mathcal{N}(\bm{X};\,\bm{0},\epsilon\mathbf{I})$, with the dispersion parameter $\epsilon\geq 1$, to facilitate broader exploration of the random variable space. This step is followed by conditional sampling based on the reciprocal of the conditional distribution, $1/\pi_{\bm{X}\mid\mathcal{F}_{j}}$, where $\mathcal{F}_{j}$ is an intermediate event defined as $\mathcal{F}_{j}\coloneqq\{\bm{x}:g(\bm{x})\leq\lambda_{j}\}$, with $\mathcal{F}_{1}\supset\mathcal{F}_{2}\supset\dots\supset\mathcal{F}_{M}$, where $\mathcal{F}_{M}=\mathcal{F}$ and $\lambda_{1}\geq\lambda_{2}\geq\dots\geq\lambda_{M}=0$, chosen adaptively as discussed below. The use of $1/\pi_{\bm{X}\mid\mathcal{F}_{j}}$ significantly facilitates rapid diffusion toward the rare event domain, compared to sampling from the conditional distribution $\pi_{\bm{X}\mid\mathcal{F}_{j}}$. While the latter is necessary for methods sequentially computing the sought probability, our method focuses solely on generating samples in the rare event domain (without utilizing any sequential information). The procedure stops after generating a specified number of rare event samples from the synthetic distribution $1/\pi_{\bm{X}\mid\mathcal{F}}$, from which a defined number of seeds, $N_{\text{chain}}$, is randomly selected to initialize the pCN chains, ensuring that the sampler is initialized within the rare event domain.

The detailed steps of this discovery process are outlined in Algorithm 2, where $N_{\text{level}}$ denotes the number of samples of each intermediate event $\mathcal{F}_{j}$; $p_{0}$ is the conditional probability for intermediate events, controlling the selection of the threshold levels $\lambda_{j}$; $\epsilon$ is a dispersion parameter for the initial distribution, $\mathcal{N}(\bm{0},\epsilon\mathbf{I})$; and $N_{\text{chain}}$ represents the number of the desired pCN chains. Fig. 4 outlines the proposed rare event domain discovery stage, showing the evolution of samples and seed selection, while the computational benefits of using a more dispersed initial distribution are demonstrated in Fig. 5. In this work, we use $\epsilon=4$ for all numerical examples to facilitate broader exploration of the random variable space by the initial sample set $\{\bm{x}_{i}^{0}\}_{i=1}^{N_{\text{level}}}$.
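A compact sketch of Algorithm 2 follows. The isotropic random-walk proposal and its step size `step` are our own simplification of the generic MCMC sampler invoked in step 8, and the final uniform seed selection corresponds to the high-dimensional recommendation discussed below (the weighted alternative of Eq. 22 is sketched afterwards):

import numpy as np

def rare_event_discovery(g, d, N_level=300, p0=0.1, eps=4.0, N_chain=10,
                         step=0.5, rng=None):
    """Sketch of Algorithm 2: produce pCN seeds inside the rare event domain F."""
    rng = rng or np.random.default_rng()
    X = np.sqrt(eps) * rng.standard_normal((N_level, d))   # dispersed initial set
    G = np.array([g(x) for x in X])
    while True:
        lam = np.quantile(G, p0)                           # threshold lambda_j
        if lam <= 0.0:                                     # >= p0*N_level samples already in F
            break
        idx = np.argsort(G)[: int(p0 * N_level)]           # seeds inside F_j = {g <= lam}
        Xs, Gs = list(X[idx]), list(G[idx])
        for x, gx in zip(X[idx], G[idx]):                  # grow the level back to N_level
            for _ in range(int(1.0 / p0) - 1):
                xp = x + step * rng.standard_normal(d)
                gp = g(xp)
                # MH move targeting 1/pi_X restricted to F_j: the log-ratio
                # +(|xp|^2 - |x|^2)/2 favors moves away from the origin, diffusing
                # the chain rapidly toward the rare event domain
                if gp <= lam and np.log(rng.random()) < 0.5 * (xp @ xp - x @ x):
                    x, gx = xp, gp
                Xs.append(x.copy()); Gs.append(gx)
        X, G = np.array(Xs), np.array(Gs)
    seeds = X[G <= 0.0]
    return seeds[rng.choice(len(seeds), size=N_chain, replace=False)]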

Let $\mathbf{X}^{\mathcal{F}}=\{\bm{x}_{1},\bm{x}_{2},\ldots,\bm{x}_{N_{\mathcal{F}}}\}$, with $N_{\mathcal{F}}\geq N_{S}$, be the rare event sample set obtained in Algorithm 2. The probability of selecting $\bm{x}_{i}\in\mathbf{X}^{\mathcal{F}}$ as a pCN seed can be given by:

\text{Pr}(\bm{x}_{i})=\dfrac{\tilde{h}(\bm{x}_{i})}{\sum_{j=1}^{N_{\mathcal{F}}}\tilde{h}(\bm{x}_{j})}   (22)

where $\tilde{h}(\bm{x}_{i})$ is the value of the target distribution in ASTPA at sample $\bm{x}_{i}$. Then, $N_{\text{chain}}$ pCN seeds can be sampled from the set $\mathbf{X}^{\mathcal{F}}$ without replacement, using the selection probability $\text{Pr}(\bm{x}_{i})$. This approach ensures that pCN seeds are more likely to be selected from different rare event modes, based on their statistical weights. Fig. 4(e) and Fig. 5(d) illustrate the selected pCN seeds using this weighted selection method. While this technique is effective for generating multimodal seeds in low-dimensional cases, as demonstrated in the low-dimensional examples in this work, it may not scale well in high-dimensional, multimodal spaces due to the sensitivity of the high-dimensional target distribution values, even for relatively small perturbations. In such cases, we recommend using uniform sampling to select pCN seeds, which ensures a broader exploration of high-dimensional, multimodal rare event domains. This uniform sampling method is successfully adopted in the high-dimensional examples in this work. Two examples of this uniform sampling are shown in Fig. 6(a) and Fig. 6(b), illustrating that the selected seeds are located in the rare event domain but do not follow any relevant target distribution ($\pi_{\bm{X}}$ or $\tilde{h}$). The computational efficiency of the proposed discovery approach, which requires significantly fewer model evaluations just to identify the relevant domains compared to conventional sequential methods utilized for probability estimation, is highlighted by comparing its performance in Fig. 6(a) and Fig. 6(b) with the sampling evolution of the Subset Simulation (SuS) method, as shown in Fig. 6(c).
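In code, the weighted selection of Eq. 22 amounts to a draw without replacement; a short sketch follows (names are illustrative; log-space weights guard against underflow of $\tilde{h}$ in the tails):

import numpy as np

def select_seeds_weighted(X_F, log_h_tilde, N_chain, rng=None):
    """Draw pCN seeds from the rare event samples X_F with the probabilities of Eq. 22."""
    rng = rng or np.random.default_rng()
    log_w = np.array([log_h_tilde(x) for x in X_F])
    p = np.exp(log_w - log_w.max())      # subtracting the max keeps the weights finite
    p /= p.sum()                         # Pr(x_i) = h~(x_i) / sum_j h~(x_j)
    idx = rng.choice(len(X_F), size=N_chain, replace=False, p=p)
    return X_F[idx]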

Figure 6: (a) and (b) visualize uniformly selected seeds (shown in red) for the two cases depicted in Fig. 4 and Fig. 5, respectively. These seeds are used to initialize the pCN chains but, unlike in Subset Simulation (SuS), are not utilized in estimating the sought probability. (c) shows the sampling evolution of the SuS method, emphasizing the computational efficiency of the proposed discovery stage, which utilizes significantly fewer samples just to discover the rare event domain.

5 Summary of Gradient-free ASTPA

The proposed guided pCN-based ASTPA framework can be applied as follows:

1. Construct the approximate sampling target $\tilde{h}$ in Eq. 9, where its two parameters $\sigma$ and $g_{c}$ are determined as recommended in Section 3.1.

2. Run the rare event discovery stage as described in Algorithm 2 to obtain $N_{\text{chain}}$ initial seeds for the pCN sampler. $N_{\text{level}}=300$, $p_{0}=\{0.1,0.2\}$, and $\epsilon=4$ typically work effectively in Algorithm 2, with $p_{0}=0.2$ being more suitable for high-dimensional problems. Denote the total number of model calls in this stage as $N_{\text{discovery}}$.

3. Generate $N_{\text{chain}}$ pCN chains, as described in Algorithm 1, each with a number of steps $L_{\text{chain}}$, starting from the seeds obtained in the previous step, to get a sample set $\{\bm{x}_{i}\}_{i=1}^{N}\sim h$. A burn-in phase with a length of $10\%$ of $L_{\text{chain}}$ is considered, with $N_{\text{burn}}$ denoting the total number of burn-in samples, i.e., $N=\lceil 0.9\,N_{\text{chain}}\,L_{\text{chain}}\rceil$ and $N_{\text{burn}}=\lfloor 0.1\,N_{\text{chain}}\,L_{\text{chain}}\rfloor$. The required number of chains, $N_{\text{chain}}$, typically ranges between 5 and 25, with higher numbers recommended for multimodal cases.

4. Compute the expected value of the weighted indicator function, $\hat{\tilde{p}}_{\mathcal{F}}$, using Eq. 6, based on $\{\bm{x}_{i}\}_{i=1}^{N}$.

5. Apply inverse importance sampling (IIS) to compute the normalizing constant $\hat{C}_{h}$. This process initiates by fitting a GMM, $Q(\bm{X})$, based on $\{\bm{x}_{i}\}_{i=1}^{N}$, with $Q(\bm{X})$ structured as recommended in Section 3.2. An i.i.d. sample set $\{\bm{x}_{i}^{\prime}\}_{i=1}^{M}$ is then drawn from $Q(\bm{X})$ to compute $\hat{C}_{h}$ using Eq. 11. $M$ can generally be around $30\%$ of $N$.

6. Compute the sought rare event probability as $\hat{p}_{\mathcal{F}}=\hat{\tilde{p}}_{\mathcal{F}}\,\hat{C}_{h}$.

A target total computational effort in model calls, $N_{\text{total}}=N_{\text{discovery}}+N_{\text{burn}}+N+M$, can be attributed to the different stages according to the aforementioned steps. The convergence of the estimator and/or any additional needed model calls can then be checked through Eq. 14.
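Putting the steps together, a hypothetical end-to-end driver built from the sketches of Sections 3.1, 3.2, and 4 could look as follows (all function names are our own; a real implementation would cache limit-state evaluations from the sampling stage rather than recompute them):

import numpy as np

def astpa_estimate(g, d, sigma=0.3, q=4.0, N_chain=10, L_chain=150, M=300, rng=None):
    """End-to-end sketch of the guided pCN-based ASTPA pipeline (steps 1-6)."""
    rng = rng or np.random.default_rng()
    log_target = make_log_target(g, d, sigma=sigma, q=q)               # step 1
    seeds = rare_event_discovery(g, d, N_chain=N_chain, rng=rng)       # step 2
    log_lik = lambda x: log_target(x)[1]
    chains = [pcn_chain(s, log_lik, L_chain, rng=rng) for s in seeds]  # step 3
    burn = int(0.1 * L_chain)                                          # 10% burn-in per chain
    X = np.vstack([c[burn:] for c in chains])
    # Step 4: shifted estimate of Eq. 6; since h~ = l_gX * pi_X, the weight
    # pi_X / h~ reduces to 1 / l_gX for samples inside F
    w = np.array([np.exp(-log_lik(x)) if g(x) <= 0.0 else 0.0 for x in X])
    p_tilde = w.mean()
    # Step 5: normalizing constant C_h via inverse importance sampling (Eq. 11)
    C_h = inverse_importance_sampling(X, lambda x: log_target(x)[0], M, rng=rng)
    return p_tilde * C_h                                               # step 6 (Eq. 7)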

6 Numerical Results

This section studies several numerical examples to examine the performance and efficiency of the proposed methods. In all examples, the guided pCN-based ASTPA is implemented as summarized in Section 5. The results are compared with state-of-the-art gradient-free algorithms, including the improved Cross Entropy-based Importance Sampling (iCE-IS) algorithm [9], the Sequential Importance Sampling (SIS) method [20], and the adaptive Conditional Sampling Subset Simulation variant (aCS-SuS) [32], all of which have well documented, publicly available code implementations [53]. Comparisons are made in terms of accuracy and computational cost, based on $500$ independent simulations. Specifically, for each simulation $j$, we record the target probability estimate $(\hat{p}_{\mathcal{F}})^{j}$, the total number of model calls $(N_{\text{total}})^{j}$, and the analytical Coefficient of Variation of ASTPA, $(\text{C.o.V-Anal})^{j}$, computed as discussed in Section 3.3. $N_{\text{total}}$ is defined for ASTPA in Section 5. Then, we report the mean of the rare event probability estimates, $\mathbb{E}[\hat{p}_{\mathcal{F}}]$, the mean of the total number of model calls, $\mathbb{E}[N_{\text{total}}]$, and the sampling C.o.V, computed as:

\textrm{C.o.V}=\dfrac{\sqrt{\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})}}{\mathbb{E}[\hat{p}_{\mathcal{F}}]},\qquad\widehat{\text{Var}}(\hat{p}_{\mathcal{F}})=\dfrac{1}{500-1}\sum_{j=1}^{500}\big((\hat{p}_{\mathcal{F}})^{j}-\mathbb{E}[\hat{p}_{\mathcal{F}}]\big)^{2}   (23)

The mean of the analytical C.o.V, $\mathbb{E}[\text{C.o.V-Anal}]$, is also reported in parentheses. The total number of limit-state function evaluations, $N_{\text{total}}$, for ASTPA has been determined based on achieving C.o.V values $\in[0.1,\,0.35]$. Reference rare event probabilities are computed based on $100$ independent standard Monte Carlo simulations (MCS), as described in Section 2, using $10^{7}$–$10^{9}$ samples, as appropriate. The problem dimensionality is denoted by $d$, and all ASTPA parameters are carefully chosen for all examples but are not optimized.

6.1 Example 1: Nonlinear bimodal convex limit-state function

The first example involves a bimodal nonlinear limit-state function characterized by two independent standard normal random variables:

g(𝑿)=min[412(X1+X2)+2.5(X1X2)2,  4+12(X1+X2)+2.5(X1X2)2]\displaystyle g(\bm{X})=\min\left[4-\frac{1}{\sqrt{2}}\ (X_{1}+X_{2})+2.5\ (X_{1}-X_{2})^{2},\,\,4+\frac{1}{\sqrt{2}}\ (X_{1}+X_{2})+2.5\ (X_{1}-X_{2})^{2}\right] (24)
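For illustration, Eq. 24 can be implemented and cross-checked with a crude Monte Carlo run, batched here only to bound memory; the function name g_eq24 is ours, and the sample budget is indicative of the MCS column of Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Limit-state of Eq. 24: two symmetric failure modes in (X1 + X2),
# shifted by a convex penalty in (X1 - X2).
def g_eq24(x):
    s = (x[:, 0] + x[:, 1]) / np.sqrt(2.0)
    q = 2.5 * (x[:, 0] - x[:, 1]) ** 2
    return np.minimum(4.0 - s + q, 4.0 + s + q)

# Crude Monte Carlo cross-check; for p_F ~ 9.5E-6, roughly 1E8 samples
# give a C.o.V of about 3%.
hits, n = 0, 0
for _ in range(100):
    x = rng.standard_normal((10**6, 2))
    hits += int(np.count_nonzero(g_eq24(x) <= 0.0))
    n += 10**6
print(hits / n)  # should approach the 9.47E-6 reference of Table 1
```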

The approximate sampling target, h~\tilde{h}, in ASTPA is constructed using a likelihood dispersion factor σ=0.3\sigma=0.3 and a scaling constant gc=1g_{c}=1, as discussed in Section 3.1. The discovery stage then starts using Nlevel=300N_{\text{level}}=300 and p0=0.1p_{0}=0.1, yielding 10 rare event seeds for the pCN sampler (Nchain=10N_{\text{chain}}=10), with each chain running for Lchain=150L_{\text{chain}}=150 iterations. Subsequently, inverse importance sampling (IIS) is applied using M=300M=300 samples. Table 1 compares the performance of the guided pCN-ASTPA with iCE-IS, SIS, and aCS-SuS. As shown in Table 1, the guided pCN-based ASTPA provides an 𝔼[p^]\mathbb{E}[\hat{p}_{\mathcal{F}}] estimate closely matching the reference probability obtained through standard Monte Carlo simulation (MCS). Compared to the other methods considered, the guided pCN-based ASTPA demonstrates improved performance, achieving a lower C.o.V using fewer total model calls, 𝔼[Ntotal]\mathbb{E}[N_{\text{total}}]. Notably, iCE-IS also demonstrates strong efficiency, outperforming both SIS and aCS-SuS in this 2D example. The excellent agreement between the sampling C.o.V and analytical C.o.V, as reported in parentheses, confirms the accuracy and effectiveness of the ASTPA analytical C.o.V expression. Since this example was introduced earlier in the paper to illustrate the ASTPA framework and the discovery stage, the steps and performance of the proposed framework are visualized in Figs. 1 and 5.

Table 1: Example 1: Performance of various methods for the nonlinear bimodal limit-state function (Eq. 24, 500 simulations, d=2d=2, σ=0.3,q=\sigma=0.3,\,q= n.a.; the analytical C.o.V of ASTPA is reported in parentheses).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E9 | 2,373 | 2,972 | 5,525 | 5,741
C.o.V | 0.01 | 0.16 (0.16) | 0.18 | 1.18 | 0.72
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 9.47E-6 | 9.46E-6 | 9.52E-6 | 9.89E-6 | 9.66E-6

6.2 Example 2: Quartic bimodal limit-state function

In the second example, we consider a quartic bimodal limit-state function, defined in the standard Gaussian space as:

g(𝑿)=6.512(X1+X2)2.5(X1X2)2+(X1X2)4\displaystyle g(\bm{X})=6.5-\frac{1}{\sqrt{2}}\ (X_{1}+X_{2})-2.5\ (X_{1}-X_{2})^{2}+\ (X_{1}-X_{2})^{4} (25)

The approximate target h~\tilde{h} is constructed with a likelihood dispersion factor of σ=0.2\sigma=0.2 and a scaling constant gc=1g_{c}=1. The discovery stage is applied using Nlevel=300N_{\text{level}}=300 and p0=0.1p_{0}=0.1, generating 18 rare event seeds for the pCN sampler (Nchain=18N_{\text{chain}}=18), with each chain running for Lchain=120L_{\text{chain}}=120 steps. As in the previous example, inverse importance sampling (IIS) is used with M=300M=300 samples. Fig. 7 shows the evolution of the rare event domain discovery stage, highlighting the effectiveness of the proposed algorithm in accurately identifying and sampling rare event regions.
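To make the target construction concrete, a minimal sketch for this example is given below, assuming, purely for illustration, that the relaxation of the indicator takes the form of a standard normal CDF applied to the scaled limit-state, Φ(−g(x)/(gcσ)); Section 3.1 gives the precise construction used in ASTPA.

```python
import numpy as np
from scipy.stats import norm

sigma, g_c = 0.2, 1.0  # likelihood dispersion factor and scaling constant

# Limit-state of Eq. 25.
def g_eq25(x):
    s = (x[:, 0] + x[:, 1]) / np.sqrt(2.0)
    d = x[:, 0] - x[:, 1]
    return 6.5 - s - 2.5 * d**2 + d**4

# Unnormalized target h_tilde = relaxation * standard Gaussian density;
# the CDF-based relaxation below is an illustrative stand-in for the
# exact construction of Section 3.1.
def log_h_tilde(x):
    log_lik = norm.logcdf(-g_eq25(x) / (g_c * sigma))
    log_pi = -0.5 * np.sum(x**2, axis=1) - x.shape[1] * 0.5 * np.log(2.0 * np.pi)
    return log_lik + log_pi
```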

Table 2: Example 2: Performance of various methods for the quartic bimodal limit-state function (Eq. 25, 500 simulations, d=2d=2, σ=0.2,q=\sigma=0.2,\,q= n.a.).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E9 | 3,165 | 5,000 | 10,750 | 7,945
C.o.V | 0.30 | 0.12 (0.11) | 0.15 | 2.12 | 2.07
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 5.91E-8 | 5.92E-8 | 5.80E-8 | 4.28E-8 | 5.79E-8

The results, as summarized in Table 2, show that the guided pCN-ASTPA significantly outperforms the other methods, achieving the lowest C.o.V, while using the fewest model calls (𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}]). This demonstrates the robustness and efficiency of the guided pCN-ASTPA framework, even in the presence of increased nonlinearity introduced through higher-order terms in the limit-state function. The guided pCN-based ASTPA also provides 𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] estimates in close agreement with the reference probability computed by Monte Carlo simulation (MCS). The very good agreement between the sampling and analytical C.o.V values (reported in parentheses) further confirms the reliability of the analytical C.o.V expression for ASTPA in this more challenging problem.

Figure 7: Example 2: (a) The constructed approximate sampling target h~\tilde{h}, (b) the sampling evolution of the rare event domain discovery stage, with red points being the selected pCN seeds, and (c) samples from h~\tilde{h}, subsequently used to compute p~^\hat{\tilde{p}}_{\mathcal{F}} in Eq. 6.

6.3 Example 3: The Himmelblau function

The Himmelblau function [54] is a commonly used fourth-order polynomial test function in nonlinear optimization. In this example, we modify the Himmelblau function to suit rare event scenarios, particularly those with multiple separated rare event domains. The modified limit-state function is defined in the standard Gaussian space as:

g(𝑿)=((0.75X10.5)21.81+(0.75X20.5)1.8111)2+((0.75X11)1.81+(0.75X20.5)21.817)250g(\bm{X})=\bigg{(}\dfrac{(0.75X_{1}-0.5)^{2}}{1.81}+\dfrac{(0.75X_{2}-0.5)}{1.81}-11\bigg{)}^{2}+\bigg{(}\dfrac{(0.75X_{1}-1)}{1.81}+\dfrac{(0.75X_{2}-0.5)^{2}}{1.81}-7\bigg{)}^{2}-50 (26)

where X1X_{1} and X2X_{2} are independent standard normal random variables. The approximate target h~\tilde{h} is constructed using a likelihood dispersion factor σ=0.3\sigma=0.3 and a scaling constant gc=g(𝟎)/4g_{c}=\nicefrac{{g(\bm{0})}}{{4}}, as described in Section 3.1 and Fig. 3. The discovery stage is applied using Nlevel=300N_{\text{level}}=300 and p0=0.1p_{0}=0.1, generating 16 rare event seeds for the pCN sampler, with each chain subsequently running for Lchain=160L_{\text{chain}}=160 iterations. Similar to the previous examples, inverse importance sampling (IIS) is applied using M=300M=300 samples. Fig. 8 illustrates the constructed target distribution, the rare event domain discovery process, and the pCN samples, effectively exploring all rare event modes. The results in Table 3 show that the guided pCN-ASTPA outperforms the other methods, achieving a lower C.o.V and fewer total model evaluations. This demonstrates the robustness and efficiency of the guided pCN-ASTPA framework, particularly in handling complex, multi-modal rare event domains. The 𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] estimate produced by the guided pCN-based ASTPA closely matches the reference probability, further confirming the method’s accuracy.
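The selection of pCN seeds from the discovery-stage samples can be sketched as below; the weighting scheme shown is an illustrative assumption, reflecting the observation, discussed later in the conclusions, that weighted selection suits low-dimensional cases while uniform selection is preferred in high-dimensional, multimodal ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative selection of N_chain pCN seeds from the rare event samples
# returned by the discovery stage; the exact weighting used in the paper
# may differ.
def select_seeds(samples, log_h_tilde, n_chain, weighted=True):
    if weighted:
        # Low-dimensional cases: favor samples with higher target density.
        logs = log_h_tilde(samples)
        w = np.exp(logs - logs.max())
        w /= w.sum()
        idx = rng.choice(len(samples), size=n_chain, replace=False, p=w)
    else:
        # High-dimensional, multimodal cases: uniform selection.
        idx = rng.choice(len(samples), size=n_chain, replace=False)
    return samples[idx]
```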

Figure 8: Example 3: (a) The constructed approximate sampling target h~\tilde{h}, (b) the sampling evolution of the rare event domain discovery stage, with red points representing the selected pCN seeds, and (c) samples from h~\tilde{h}.
Table 3: Example 3: Performance of various methods for the Himmelblau limit-state function (Eq. 26, 500 simulations, d=2d=2, σ=0.3,q=4\sigma=0.3,\,q=4).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E9 | 3,430 | 6,466 | 6,216 | 7,101
C.o.V | 0.05 | 0.18 (0.17) | 0.37 | 0.30 | 0.53
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 2.81E-7 | 2.82E-7 | 2.36E-7 | 2.21E-7 | 3.02E-7

6.4 Example 4: Limit-state surface with changing topological parameter space

This example examines a limit-state function with significant topological changes, making it a challenging scenario for rare event sampling methods, as described in [28]. The function is defined as:

g(𝑿)=30[(4(X1+2)29+X2225)2+1]1+20[((X12.5)24+(X20.5)225)2+1]15g(\bm{X})=30\left[\bigg{(}\dfrac{4(X_{1}+2)^{2}}{9}+\dfrac{X_{2}^{2}}{25}\bigg{)}^{2}+1\right]^{-1}+20\left[\bigg{(}\dfrac{(X_{1}-2.5)^{2}}{4}+\dfrac{(X_{2}-0.5)^{2}}{25}\bigg{)}^{2}+1\right]^{-1}-5 (27)

where X1X_{1} and X2X_{2} are independent standard normal random variables. This limit-state surface is particularly challenging due to its abrupt changes in topology, as shown in Fig. 9(a), requiring a robust sampling framework to accurately capture the rare event domain.

The guided pCN-ASTPA algorithm is applied with a likelihood dispersion factor of σ=0.1\sigma=0.1 and a scaling constant gc=g(𝟎)/5g_{c}=\nicefrac{{g(\mathbf{0})}}{{5}}, as outlined in Section 3.1. The discovery stage uses Nlevel=300N_{\text{level}}=300 and p0=0.1p_{0}=0.1, generating 6 rare event seeds for the pCN sampler, with each chain running for Lchain=100L_{\text{chain}}=100 iterations. Inverse importance sampling (IIS) is employed with M=200M=200 samples. Fig. 9 visualizes the limit-state surface, the constructed sampling target, and the evolution of samples in the rare event domain discovery process, highlighting the effectiveness of guided pCN-ASTPA in accurately capturing the influential rare event region. In contrast, Fig. 9(e) demonstrates the limitations of sequential methods like SuS, which struggle to efficiently identify the rare event mode with a competitive number of model evaluations. These rare event domain discovery limitations are more pronounced in iCE-IS in this example, which exhibits a significant error in the estimated mean of the rare event probability. As presented in Table 4, the guided pCN-ASTPA achieves an accurate rare event probability estimate, 𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}], with significantly fewer model evaluations and a lower Coefficient of Variation (C.o.V) compared to the other methods. This underscores the effectiveness of the pCN-ASTPA framework in tackling problems with complex topological changes, where traditional sequential methods often underperform.

Figure 9: Example 4: (a) The surface of the limit-state in Eq. 27, showing changing topology, (b) the constructed ASTPA sampling target h~\tilde{h}, capturing the influential rare event mode, (c) the sampling evolution of the rare event domain discovery stage, with red points representing the selected pCN seeds, (d) samples generated from h~\tilde{h}, and (e) the sampling evolution of SuS, which fails to identify the rare event mode with a competitive number of model calls, demonstrating the limitations of sequential methods in such scenarios.
Table 4: Example 4: Performance of various methods for the limit-state surface with changing topological parameter space (Eq. 27, 500 simulations, d=2d=2, σ=0.1,q=5\sigma=0.1,\,q=5).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E9 | 1,370 | 14,744 | 13,542 | 14,597
C.o.V | 0.01 | 0.11 (0.13) | 0.09 | 3.33 | 1.92
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 1.13E-5 | 1.10E-5 | 1.92E-8 | 1.32E-5 | 1.26E-5

6.5 Example 5: A thirty-four-story structural example

This example analyzes a thirty-four-story structure, adapted from [55], as represented in Fig. 10. The structure is subjected to thirty-four static loads FiF_{i}, for i=1,2,,34i=1,2,\dots,34. The floor slabs/beams are assumed to be rigid, and all columns have the same height, H=4H=4 m, but different flexural stiffnesses EIkEI_{k}, for k=1,2,,68k=1,2,\dots,68. Both the loads and stiffnesses are treated as random variables, resulting in a total of d=102d=102 random variables. The loads are modeled as normally distributed with a mean of 2 kN and a C.o.V. of 0.4. Similarly, the stiffnesses are normally distributed with a mean of 20 MNm² and a C.o.V. of 0.2. The top story displacement, uu, can be determined by summing the interstory displacements uiu_{i}, expressed as:

ui=(j=i34Fj)H312(k=2i12iEIk),g=Y0i=134uiu_{i}=\dfrac{\left(\sum_{j=i}^{34}F_{j}\right)H^{3}}{12\left(\sum_{k=2i-1}^{2i}EI_{k}\right)},\quad g=Y_{0}-\sum_{i=1}^{34}u_{i} (28)

The limit-state function gg indicates failure (a rare event) when the top story displacement uu exceeds a threshold value Y0Y_{0}, set to 0.22, 0.23, and 0.235 m for different cases, to examine different levels of failure probability. All random variables are transformed into the independent standard Gaussian space for the analysis. Since g(𝟎)<2g(\bm{0})<2, as discussed in Section 3.1, scaling is performed with gc=g(𝟎)/4g_{c}=\nicefrac{{g(\mathbf{0})}}{{4}}. The discovery stage is performed with Nlevel=300N_{\text{level}}=300 and p0=0.2p_{0}=0.2 to effectively identify the rare event seeds for 5 pCN chains (Nchains=5N_{\text{chains}}=5), each with a chain length of 1,0001,000 samples (Lchain=1,000L_{\text{chain}}=1,000). The inverse importance sampling (IIS) is performed with 2,0002,000 resampled points (M=2,000M=2,000). Table 5 highlights the outstanding performance of the guided pCN-ASTPA in this high-dimensional problem. Our framework consistently provides accurate failure probability estimates with a low Coefficient of Variation (C.o.V.), while requiring significantly fewer total model evaluations compared to the other methods considered.
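A vectorized sketch of the limit-state in Eq. 28 is given below; the ordering of the 102 variables (34 loads followed by 68 stiffnesses, two per story) and the unit handling are illustrative assumptions.

```python
import numpy as np

H, Y0 = 4.0, 0.22  # column height (m) and displacement threshold (m)

# Sketch of the limit-state of Eq. 28: x is an (n, 102) array in physical
# space, assumed ordered as 34 loads F_i (kN) followed by 68 stiffnesses
# EI_k (MN*m^2), two per story; this ordering is our illustrative choice.
def g_frame(x):
    F = x[:, :34]
    EI = x[:, 34:].reshape(-1, 34, 2).sum(axis=2) * 1e3  # per story, kN*m^2
    shear = np.cumsum(F[:, ::-1], axis=1)[:, ::-1]       # sum_{j=i}^{34} F_j
    u = shear * H**3 / (12.0 * EI)                       # interstory drifts, m
    return Y0 - u.sum(axis=1)
```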

Figure 10: Example 5: A thirty-four-story structure under static loads.
Table 5: Example 5: Performance of various methods for the thirty-four-story structure (Eq. 28, 500 simulations, d=102d=102, σ=0.3,q=4\sigma=0.3,\,q=4).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
Y0=0.22Y_{0}=0.22:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E8 | 7,540 | 7,570 | 10,082 | 9,200
C.o.V | 0.03 | 0.14 (0.14) | 0.55 | 0.60 | 0.24
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 2.41E-5 | 2.44E-5 | 1.93E-5 | 2.75E-5 | 2.42E-5
Y0=0.23Y_{0}=0.23:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E8 | 7,540 | 10,851 | 11,504 | 11,470
C.o.V | 0.09 | 0.22 (0.20) | 0.81 | 0.63 | 0.30
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 1.22E-6 | 1.22E-6 | 0.83E-6 | 1.25E-6 | 1.22E-6
Y0=0.235Y_{0}=0.235:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E8 | 7,540 | 11,833 | 12,782 | 12,815
C.o.V | 0.20 | 0.27 (0.28) | 0.85 | 1.32 | 0.35
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 2.46E-7 | 2.48E-7 | 1.55E-7 | 2.41E-7 | 2.41E-7

6.6 Example 6: Steel plate

In this example, we consider a low-carbon steel plate, as shown in Fig. 11 [56, 57]. The plate has dimensions of 2,0002,000 mm × 2,0002,000 mm, with a circular hole of radius 200 mm located at the center. Due to symmetry, only a quarter of the plate is analyzed. The Poisson ratio is assumed to be ν=0.3\nu=0.3 and the plate thickness is t=10t=10 mm. A deterministic uniform load of q=100N/mm2q=100\,\text{N/mm}^{2} is applied on the right edge of the plate. The elastic modulus E(x,y)E(x,y) of the plate is modeled as a two-dimensional homogeneous normal random field with a mean value of μE=2×105N/mm2\mu_{E}=2\times 10^{5}\,\text{N/mm}^{2} and a standard deviation of σE=2×104N/mm2\sigma_{E}=2\times 10^{4}\,\text{N/mm}^{2}. The autocorrelation function of the random field E(x,y)E(x,y) is given as:

ρE(Δx,Δy)=exp[(Δxlx)2(Δyly)2]\displaystyle\rho_{E}(\Delta x,\Delta y)=\exp\left[-\left(\dfrac{\Delta x}{l_{x}}\right)^{2}-\left(\dfrac{\Delta y}{l_{y}}\right)^{2}\right] (29)

where lxl_{x} and lyl_{y} are correlation lengths, both set to 300 mm in this example. The stochastic field is discretized using the Karhunen-Loève (KL) expansion with 100 terms, i.e., 100 standard Gaussian random variables within the KL expansion (d=100d=100).
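A minimal sketch of such a KL-based field representation is given below, using a discrete eigendecomposition of the covariance of Eq. 29 on a coarse grid as a stand-in for the continuous KL problem; the grid, resolution, and omission of the hole are illustrative simplifications.

```python
import numpy as np

mu_E, sig_E = 2e5, 2e4     # field mean and standard deviation (N/mm^2)
lx, ly, n_kl = 300.0, 300.0, 100

# Coarse grid over the quarter plate (mm); resolution is illustrative only.
gx, gy = np.meshgrid(np.linspace(0.0, 1000.0, 25), np.linspace(0.0, 1000.0, 25))
pts = np.column_stack([gx.ravel(), gy.ravel()])

# Covariance built from the autocorrelation of Eq. 29.
dx = pts[:, None, 0] - pts[None, :, 0]
dy = pts[:, None, 1] - pts[None, :, 1]
C = sig_E**2 * np.exp(-((dx / lx) ** 2) - ((dy / ly) ** 2))

# Discrete eigendecomposition as a stand-in for the continuous KL problem;
# keep the n_kl dominant eigenpairs (eigh returns them in ascending order).
vals, vecs = np.linalg.eigh(C)
vals, vecs = vals[::-1][:n_kl], vecs[:, ::-1][:, :n_kl]

def sample_field(xi):
    """Map d = 100 standard normals xi to a realization of E on the grid."""
    return mu_E + vecs @ (np.sqrt(np.maximum(vals, 0.0)) * xi)
```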

The stress, strain, and displacement fields of the plate are governed by the Navier-Cauchy equilibrium equation [58], simplified under the plane stress assumption as:

G(x,y)2𝒖(x,y)+E(x,y)2(1ν)(𝒖(x,y))+𝐟=𝟎G(x,y)\nabla^{2}\bm{u}(x,y)+\dfrac{E(x,y)}{2(1-\nu)}\nabla(\nabla\cdot\bm{u}(x,y))+\mathbf{f}=\mathbf{0} (30)

where G(x,y)G(x,y) is the shear modulus, defined as E(x,y)/[2(1+ν)]E(x,y)/[2(1+\nu)], 𝒖\bm{u} is the displacement vector, and 𝐟\mathbf{f} is the body force acting on the plate. This equation is solved using Finite Element Analysis (FEA) with a mesh of 368 four-node quadrilateral elements, as shown in Fig. 11. The limit-state function is defined as:

g(𝑿)=410max(σ1)g(\bm{X})=410-\max(\sigma_{1}) (31)

where max(σ1)\max(\sigma_{1}) is the maximum first principal stress, with failure occurring when this stress exceeds the threshold of 410 MPa.

For the guided pCN-ASTPA implementation, 5 chains (Nchains=5N_{\text{chains}}=5) are used, each with a chain length of 1,0001,000 iterations (Lchain=1,000L_{\text{chain}}=1,000). The IIS step is performed with 2,0002,000 resampled points (M=2,000M=2,000). The likelihood dispersion factor is set to σ=0.2\sigma=0.2, and the scaling constant gc=g(𝟎)/4g_{c}=\nicefrac{{g(\mathbf{0})}}{{4}} is applied. The discovery stage utilizes Nlevel=100N_{\text{level}}=100 and p0=0.2p_{0}=0.2. The results in Table 6 demonstrate that the guided pCN-ASTPA and iCE-IS methods provide comparable performance in this example, with pCN-ASTPA being slightly better, and with both significantly outperforming SIS and aCS-SuS.

Figure 11: Example 6: A quarter of a symmetric 2D plate-with-a-hole under random elastic modulus. (a) shows the finite element mesh. (b) showcases a random field realization of the elastic modulus.
Table 6: Example 6: Performance of various methods for the plate example (Eq. 31, 500 simulations, d=100d=100, σ=0.2,q=4\sigma=0.2,\,q=4).
Metric | MCS | pCN-ASTPA | iCE-IS | SIS | aCS-SuS
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 7,247 | 7,611 | 11,605 | 12,996
C.o.V | 0.29 | 0.16 (0.17) | 0.21 | 1.22 | 0.31
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 1.04E-6 | 1.01E-6 | 1.02E-6 | 0.96E-6 | 1.01E-6

6.7 Example 7: Bimodal high-dimensional decic problem

To further investigate the performance of the proposed method in high-dimensional spaces with severe nonlinearities, we consider the following challenging decic (tenth-degree) limit-state function in the standard normal space:

g(𝑿)=min{2.801di=1200Xi+f(𝑿)2.80+1di=1200Xi+f(𝑿),wheref(𝑿)=(j=1γXj)2+exp[(k=1γXk)7]+(l=1γXl)10g(\bm{X})=\min\begin{cases}2.80-\frac{1}{\sqrt{d}}\ \sum_{i=1}^{200}X_{i}+f(\bm{X})\\[12.0pt] 2.80+\frac{1}{\sqrt{d}}\ \sum_{i=1}^{200}X_{i}+f(\bm{X})\end{cases},\text{where}\,f(\bm{X})=\big{(}\sum_{j=1}^{\gamma}X_{j}\big{)}^{2}+\exp\big{[}\big{(}\sum_{k=1}^{\gamma}X_{k}\big{)}^{7}\big{]}+\big{(}\sum_{l=1}^{\gamma}X_{l}\big{)}^{10} (32)

where γ\gamma controls the number of nonlinear terms, and hence the problem’s complexity. To construct the sampling target in ASTPA, the likelihood dispersion factor is set to σ=0.5\sigma=0.5, and the scaling constant to gc=1g_{c}=1. The discovery stage of the proposed framework employs Nlevel=500N_{\text{level}}=500 and p0=0.2p_{0}=0.2. Following this stage, 2020 to 2222 pCN chains are used, each with a chain length ranging from 1,0001,000 to 1,2001,200 iterations. Higher values within this range are utilized in scenarios with greater nonlinearities to account for the increased sampling complexity. The inverse importance sampling (IIS) step employs 4,0004,000 to 7,0007,000 samples, with larger numbers chosen for more complex scenarios. As shown in Table 7, the guided pCN-ASTPA framework demonstrates remarkable adaptability to increasing nonlinearity, achieving the lowest coefficient of variation (C.o.V.) while requiring fewer model evaluations compared to other methods. In contrast, the iCE-IS method fails to converge with a comparable number of model calls in this example. The ability of the guided pCN-ASTPA to efficiently manage both high dimensionality and strong nonlinearity highlights its robustness and computational efficiency, making it particularly well-suited for complex reliability estimation problems involving intricate variable interactions.
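For completeness, Eq. 32 can be implemented directly as sketched below (the function name is ours); note that the exponential term may overflow to infinity in floating point for large positive sums, which is benign here since it only pushes g further from the failure domain.

```python
import numpy as np

d = 200  # problem dimensionality

# Sketch of the decic limit-state of Eq. 32.
def g_eq32(x, gamma=10):
    s = x.sum(axis=1) / np.sqrt(d)    # (1/sqrt(d)) * sum_{i=1}^{200} X_i
    t = x[:, :gamma].sum(axis=1)      # common sum in the nonlinear terms
    with np.errstate(over="ignore"):
        # exp may overflow to inf for large positive t; benign here, as it
        # only moves g further away from the failure domain.
        f = t**2 + np.exp(t**7) + t**10
    return np.minimum(2.80 - s + f, 2.80 + s + f)
```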

Table 7: Example 7: Performance of various methods for the bimodal high-dimensional decic problem (Eq. 32, 500 simulations, d=200d=200, σ=0.6,q=\sigma=0.6,\,q=\,n.a.).
Metric | MCS | pCN-ASTPA | SIS | aCS-SuS
γ=10\gamma=10:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 25,434 | 38,488 | 28,526
C.o.V | 0.08 | 0.26 (0.23) | 3.39 | 0.76
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 1.02E-5 | 1.01E-5 | 1.20E-5 | 1.06E-5
γ=15\gamma=15:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 26,568 | 41,921 | 29,946
C.o.V | 0.12 | 0.33 (0.30) | 1.66 | 1.26
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 6.66E-6 | 6.58E-6 | 6.42E-6 | 6.83E-6
γ=20\gamma=20:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 29,630 | 43,886 | 31,007
C.o.V | 0.16 | 0.34 (0.31) | 1.86 | 1.56
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 4.51E-6 | 4.46E-6 | 4.41E-6 | 4.58E-6
γ=25\gamma=25:
𝔼[Ntotal]\mathop{\mathbb{E}}[N_{\text{total}}] | 1.00E7 | 35,072 | 45,830 | 38,520
C.o.V | 0.18 | 0.33 (0.33) | 2.89 | 2.07
𝔼[p^]\mathop{\mathbb{E}}[\hat{p}_{\mathcal{F}}] | 3.12E-6 | 3.15E-6 | 3.24E-6 | 3.42E-6

7 Conclusions

In this work, we introduce a novel gradient-free framework for reliability and rare event probabilities estimation in complex, high-dimensional Gaussian spaces, particularly suited for black-box computational models where analytical gradients are not available. At the core of this framework is the foundational Approximate Sampling Target with Post-processing Adjustment (ASTPA) approach, which efficiently estimates rare event (failure) probabilities by constructing and sampling from an unnormalized target distribution, relaxing the optimal importance sampling density (ISD) [29]. This relaxation enables efficient sampling, especially in high-dimensional and highly nonlinear problem settings, with the generated samples employed to compute a shifted estimate of the sought probability. The inverse importance sampling (IIS) procedure subsequently refines the probability estimation by employing a fitted ISD using the already obtained samples to compute the target’s normalizing constant, avoiding scalability challenges.

While previous work has shown that integrating gradient-based samplers with ASTPA yields outstanding performance in both Gaussian and non-Gaussian spaces [4, 29], the absence of analytical gradients hinders the efficient applicability of such approaches. The proposed gradient-free framework addresses this limitation by advancing the preconditioned Crank-Nicolson (pCN) algorithm, incorporating a rare event domain discovery stage to significantly enhance the sampling process. The scale parameter of the pCN algorithm is adapted using the Robbins-Monro stochastic approximation algorithm to achieve a 20-40% average acceptance probability, which proved effective in our numerical examples; a minimal sketch of this adaptive step is given below. Despite the dimension-robustness of pCN, its effectiveness depends on carefully initializing its chains in relation to the regions of interest, particularly in high-dimensional multimodal distributions. To address this, we developed a computationally efficient rare event domain discovery technique, conditionally sampling well-designed synthetic variants of the original distribution, thereby facilitating rapid diffusion toward relevant regions in the random variable space. Our investigation into selecting rare event (pCN) seeds from the obtained rare event sample set shows that weighted sampling is effective in low-dimensional cases, whereas uniform sampling is more suitable for initializing the pCN chains in high-dimensional, multimodal cases, where the target distributions become sensitive to even small perturbations. This approach significantly enhances the performance of the pCN algorithm in sampling the approximate target in ASTPA.
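The following minimal sketch illustrates one such adaptive pCN chain, assuming a prior-reversible pCN proposal and a Robbins-Monro update of the scale β on the logit scale toward a 30% acceptance target; the gain sequence and target value are illustrative choices, not the paper's exact settings.

```python
import numpy as np

# Sketch of a single adapted pCN chain: prior-reversible proposal
# x' = sqrt(1 - beta^2) x + beta * xi, xi ~ N(0, I), accepted with the
# likelihood ratio only, and beta adapted via a Robbins-Monro recursion.
def pcn_chain(x0, log_lik, n_steps, beta=0.5, a_star=0.3, seed=0):
    rng = np.random.default_rng(seed)
    x, ll = np.asarray(x0, dtype=float).copy(), log_lik(x0)
    out = np.empty((n_steps, x.size))
    for k in range(n_steps):
        xp = np.sqrt(1.0 - beta**2) * x + beta * rng.standard_normal(x.size)
        llp = log_lik(xp)
        acc = np.exp(min(0.0, llp - ll))  # standard Gaussian prior cancels
        if rng.random() < acc:
            x, ll = xp, llp
        # Robbins-Monro update on logit(beta) with decaying gain (k+1)^(-0.7).
        z = np.log(beta / (1.0 - beta)) + (acc - a_star) / (k + 1) ** 0.7
        beta = 1.0 / (1.0 + np.exp(-z))
        out[k] = x
    return out  # discard the first 10% as burn-in, per Section 5
```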

The guided pCN-based ASTPA methodology has been tested across a range of challenging Gaussian problems, including mathematically formulated test functions and engineering examples, such as a 34-story frame structure and a steel plate, with rare event probabilities as low as 10810^{-8}. These mathematical limit-state functions are carefully designed to involve multimodality and up to decic nonlinear terms, representing challenging benchmarks. Our method consistently outperformed state-of-the-art gradient-free approaches, including improved Cross-Entropy Importance Sampling (iCE-IS), Sequential Importance Sampling (SIS), and adaptive Conditional Sampling Subset Simulation (aCS-SuS), both in computational cost and accuracy. Moreover, the ASTPA analytical coefficient of variation (C.o.V) demonstrated close agreement with empirical results, further validating the accuracy of this theoretical estimate.

This guided pCN-based ASTPA framework sets a new performance benchmark for gradient-free rare event probability estimation, offering a scalable and computationally efficient solution for high-dimensional, nonlinear, and multimodal problems. Future work will explore further extensions of the framework, including the development of gradient-free variants suited for direct applicability in non-Gaussian spaces and estimating high-dimensional first-passage problems under various settings [30].

References

  • [1] Y. Bai, A. B. Dieker, and H. Lam, “Curse of dimensionality in rare-event simulation,” in 2023 Winter Simulation Conference (WSC), pp. 375–384, IEEE, 2023.
  • [2] R. Y. Rubinstein and D. P. Kroese, Simulation and the Monte Carlo Method. John Wiley & Sons, 2016.
  • [3] G. L. Ang, A. H.-S. Ang, and W. H. Tang, “Optimal importance-sampling density estimator,” Journal of Engineering Mechanics, vol. 118, no. 6, pp. 1146–1163, 1992.
  • [4] E. Eshra, K. G. Papakonstantinou, and H. Nikbakht, “A direct importance sampling-based framework for rare event uncertainty quantification in non-Gaussian spaces,” arXiv preprint arXiv:2405.14149, 2025.
  • [5] G. I. Schuëller and R. Stix, “A critical appraisal of methods to determine failure probabilities,” Structural Safety, vol. 4, no. 4, pp. 293–309, 1987.
  • [6] S.-K. Au and J. L. Beck, “Estimation of small failure probabilities in high dimensions by Subset Simulation,” Probabilistic Engineering Mechanics, vol. 16, no. 4, pp. 263–277, 2001.
  • [7] N. Kurtz and J. Song, “Cross-entropy-based adaptive importance sampling using Gaussian mixture,” Structural Safety, vol. 42, pp. 35–44, 2013.
  • [8] Z. Wang and J. Song, “Cross-entropy-based adaptive importance sampling using von Mises-Fisher mixture for high dimensional reliability analysis,” Structural Safety, vol. 59, pp. 42–52, 2016.
  • [9] I. Papaioannou, S. Geyer, and D. Straub, “Improved cross entropy-based importance sampling with a flexible mixture model,” Reliability Engineering & System Safety, vol. 191, p. 106564, 2019.
  • [10] F. Uribe, I. Papaioannou, Y. M. Marzouk, and D. Straub, “Cross-entropy-based importance sampling with failure-informed dimension reduction for rare event simulation,” SIAM/ASA Journal on Uncertainty Quantification, vol. 9, no. 2, pp. 818–847, 2021.
  • [11] J. Demange-Chryst, F. Bachoc, J. Morio, and T. Krauth, “Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling,” arXiv preprint arXiv:2310.09194, 2023.
  • [12] S. Tong and G. Stadler, “Large deviation theory-based adaptive importance sampling for rare events in high dimensions,” SIAM/ASA Journal on Uncertainty Quantification, vol. 11, no. 3, pp. 788–813, 2023.
  • [13] M. Chiron, C. Genest, J. Morio, and S. Dubreuil, “Failure probability estimation through high-dimensional elliptical distribution modeling with multiple importance sampling,” Reliability Engineering & System Safety, vol. 235, p. 109238, 2023.
  • [14] S.-K. Au and J. L. Beck, “A new adaptive importance sampling scheme for reliability calculations,” Structural Safety, vol. 21, no. 2, pp. 135–158, 1999.
  • [15] S.-K. Au and J. L. Beck, “Important sampling in high dimensions,” Structural Safety, vol. 25, no. 2, pp. 139–163, 2003.
  • [16] A. Tabandeh, G. Jia, and P. Gardoni, “A review and assessment of importance sampling methods for reliability analysis,” Structural Safety, vol. 97, p. 102216, 2022.
  • [17] M. Ehre, I. Papaioannou, and D. Straub, “Stein variational rare event simulation,” arXiv preprint arXiv:2308.04971, 2023.
  • [18] T. Cui, S. Dolgov, and R. Scheichl, “Deep importance sampling using tensor trains with application to a priori and a posteriori rare events,” SIAM Journal on Scientific Computing, vol. 46, no. 1, pp. C1–C29, 2024.
  • [19] P. Del Moral, A. Doucet, and A. Jasra, “Sequential Monte Carlo samplers,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 3, pp. 411–436, 2006.
  • [20] I. Papaioannou, C. Papadimitriou, and D. Straub, “Sequential importance sampling for structural reliability analysis,” Structural Safety, vol. 62, pp. 66–75, 2016.
  • [21] A. Sinha, M. O’Kelly, R. Tedrake, and J. C. Duchi, “Neural bridge sampling for evaluating safety-critical autonomous systems,” Advances in Neural Information Processing Systems, vol. 33, pp. 6402–6416, 2020.
  • [22] J. Xian and Z. Wang, “Relaxation-based importance sampling for structural reliability analysis,” Structural Safety, vol. 106, p. 102393, 2024.
  • [23] K. M. Zuev, J. L. Beck, S.-K. Au, and L. S. Katafygiotis, “Bayesian post-processor and other enhancements of Subset Simulation for estimating failure probabilities in high dimensions,” Computers & Structures, vol. 92, pp. 283–296, 2012.
  • [24] Z. Wang, M. Broccardo, and J. Song, “Hamiltonian Monte Carlo methods for Subset Simulation in reliability analysis,” Structural Safety, vol. 76, pp. 51–67, 2019.
  • [25] M. D. Shields, D. G. Giovanis, and V. Sundar, “Subset Simulation for problems with strongly non-Gaussian, highly anisotropic, and degenerate distributions,” Computers & Structures, vol. 245, p. 106431, 2021.
  • [26] W. Chen, Z. Wang, M. Broccardo, and J. Song, “Riemannian Manifold Hamiltonian Monte Carlo based Subset Simulation for reliability analysis in non-Gaussian space,” Structural Safety, vol. 94, p. 102134, 2022.
  • [27] D. Thaler, S. L. Dhulipala, F. Bamer, B. Markert, and M. D. Shields, “Reliability analysis of complex systems using Subset Simulations with Hamiltonian Neural Networks,” arXiv preprint arXiv:2401.05244, 2024.
  • [28] K. Breitung, “The geometry of limit state function graphs and Subset Simulation: Counterexamples,” Reliability Engineering & System Safety, vol. 182, pp. 98–106, 2019.
  • [29] K. G. Papakonstantinou, H. Nikbakht, and E. Eshra, “Hamiltonian MCMC methods for estimating rare events probabilities in high-dimensional problems,” Probabilistic Engineering Mechanics, vol. 74, p. 103485, 2023.
  • [30] K. G. Papakonstantinou, E. Eshra, and H. Nikbakht, “Hamiltonian MCMC based framework for time-variant rare event uncertainty quantification,” in 14th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP), Dublin, Ireland, 2023.
  • [31] K. M. Zuev and L. S. Katafygiotis, “Modified Metropolis–Hastings algorithm with delayed rejection,” Probabilistic Engineering Mechanics, vol. 26, no. 3, pp. 405–412, 2011.
  • [32] I. Papaioannou, W. Betz, K. Zwirglmaier, and D. Straub, “MCMC algorithms for Subset Simulation,” Probabilistic Engineering Mechanics, vol. 41, pp. 89–103, 2015.
  • [33] A. Mira, “On Metropolis–Hastings algorithms with delayed rejection,” Metron, vol. 59, no. 3-4, pp. 231–241, 2001.
  • [34] F. Miao and M. Ghosn, “Modified Subset Simulation method for reliability analysis of structural systems,” Structural Safety, vol. 33, no. 4-5, pp. 251–260, 2011.
  • [35] A. Santoso, K. Phoon, and S. Quek, “Modified Metropolis–Hastings algorithm with reduced chain correlation for efficient Subset Simulation,” Probabilistic Engineering Mechanics, vol. 26, no. 2, pp. 331–341, 2011.
  • [36] R. M. Neal, “Regression and classification using Gaussian process priors,” Bayesian Statistics, vol. 6, pp. 475–501, 1998.
  • [37] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White, “MCMC methods for functions: modifying old algorithms to make them faster,” Statistical Science, pp. 424–446, 2013.
  • [38] M. Hairer, A. M. Stuart, and S. J. Vollmer, “Spectral gaps for a Metropolis–Hastings algorithm in infinite dimensions,” The Annals of Applied Probability, vol. 24, no. 6, pp. 2455 – 2490, 2014.
  • [39] Z. Hu, Z. Yao, and J. Li, “On an adaptive preconditioned Crank–Nicolson MCMC algorithm for infinite dimensional Bayesian inference,” Journal of Computational Physics, vol. 332, pp. 492–503, 2017.
  • [40] B. Carrera and I. Papaioannou, “Covariance-based MCMC for high-dimensional Bayesian updating with Sequential Monte Carlo,” Probabilistic Engineering Mechanics, vol. 77, p. 103667, 2024.
  • [41] D. Rudolf and B. Sprungk, “On a generalization of the preconditioned Crank–Nicolson Metropolis algorithm,” Foundations of Computational Mathematics, vol. 18, pp. 309–343, 2018.
  • [42] S.-K. Au and E. Patelli, “Rare event simulation in finite-infinite dimensional space,” Reliability Engineering & System Safety, vol. 148, pp. 67–77, 2016.
  • [43] S. Xiao and W. Nowak, “Failure probability estimation with failure samples: An extension of the two-stage Markov chain Monte Carlo simulation,” Mechanical Systems and Signal Processing, vol. 212, p. 111300, 2024.
  • [44] M. Hohenbichler and R. Rackwitz, “Non-normal dependent vectors in structural safety,” Journal of the Engineering Mechanics Division, vol. 107, no. 6, pp. 1227–1238, 1981.
  • [45] A. Der Kiureghian and P.-L. Liu, “Structural reliability under incomplete probability information,” Journal of Engineering Mechanics, vol. 112, no. 1, pp. 85–104, 1986.
  • [46] R. Lebrun and A. Dutfoy, “An innovating analysis of the Nataf transformation from the copula viewpoint,” Probabilistic Engineering Mechanics, vol. 24, no. 3, pp. 312–320, 2009.
  • [47] I. Papaioannou and D. Straub, “Efficient sampling of non-Gaussian priors in high dimensions,” in Proc. 12th International Conference on Structural Safety & Reliability (ICOSSAR), Vienna, Austria, 2017.
  • [48] K. G. Papakonstantinou, H. Nikbakht, and E. Eshra, “Quasi-Newton Hamiltonian MCMC sampling for reliability estimation in high-dimensional non-Gaussian spaces,” in Proc. 13th International Conference on Structural Safety & Reliability (ICOSSAR), Shanghai, China, 2022.
  • [49] H. Nikbakht and K. G. Papakonstantinou, “A direct Hamiltonian MCMC approach for reliability estimation,” in Proc. 3rd International Conference on Uncertainty Quantification in Computational Sciences and Engineering (UNCECOMP), Hersonissos, Greece, 2019.
  • [50] M. Betancourt, “A conceptual introduction to Hamiltonian Monte Carlo,” arXiv preprint arXiv:1701.02434, 2017.
  • [51] H. Robbins and S. Monro, “A stochastic approximation method,” The Annals of Mathematical Statistics, pp. 400–407, 1951.
  • [52] C. Andrieu and J. Thoms, “A tutorial on adaptive MCMC,” Statistics and Computing, vol. 18, pp. 343–373, 2008.
  • [53] Engineering Risk Analysis Group, “Reliability analysis tools,” 2024. Technical University of Munich. Available at https://github.com/ERA-Software/Overview.
  • [54] D. M. Himmelblau, Applied Nonlinear Programming. McGraw-Hill, 1972.
  • [55] C. Bucher, Computational Analysis of Randomness in Structural Mechanics: Structures and Infrastructures Book Series, vol. 3. CRC Press, 2009.
  • [56] I. Papaioannou, M. Ehre, and D. Straub, “PLS-based adaptation for efficient PCE representation in high dimensions,” Journal of Computational Physics, vol. 387, pp. 186–204, 2019.
  • [57] M. Ko and K. G. Papakonstantinou, “Quadratic point estimate method for probabilistic moments computation,” Probabilistic Engineering Mechanics, vol. 79, p. 103705, 2025.
  • [58] C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method. Dover Publications, 2012.