
Non-intrusive optimal experimental design for large-scale nonlinear Bayesian inverse problems using a Bayesian approximation error approach

Karina Koval (Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany; karina.koval@iwr.uni-heidelberg.de) and Ruanui Nicholson (Department of Engineering Science, University of Auckland, Auckland, New Zealand; ruanui.nicholson@auckland.ac.nz)
Abstract

We consider optimal experimental design (OED) for nonlinear inverse problems within the Bayesian framework. Optimizing the data acquisition process for large-scale nonlinear Bayesian inverse problems is a computationally challenging task since the posterior is typically intractable and commonly encountered optimality criteria depend on the observed data. Since these challenges are not present in OED for linear Bayesian inverse problems, we propose an approach based on first linearizing the associated forward problem and then optimizing the experimental design. Replacing an accurate but costly model with a linear surrogate, while justified for certain problems, can lead to incorrect posteriors and sub-optimal designs if model discrepancy is ignored. To avoid this, we use the Bayesian approximation error (BAE) approach to formulate an A-optimal design objective for sensor selection that is aware of the model error. In line with recent developments, we prove that this uncertainty-aware objective is independent of the exact choice of linearization. This key observation facilitates the formulation of an uncertainty-aware OED objective function using a completely trivial linear map, the zero map, as a surrogate to the forward dynamics. The base methodology is also extended to marginalized OED problems, accommodating uncertainties arising from both linear approximations and unknown auxiliary parameters. Our approach only requires parameter and data sample pairs, and hence is particularly well-suited for black-box forward models. We demonstrate the effectiveness of our method for finding optimal designs in an idealized subsurface flow inverse problem and for tsunami detection.

keywords:
optimal experimental design, Bayesian inverse problems, Bayesian approximation error, linearization
AMS subject classifications:
62F15, 65K05, 62-08, 62E17

1 Introduction

Many problems of interest in engineering and the natural sciences can be described by mathematical models involving partial differential equations (PDEs) or systems of ordinary differential equations. In various settings, however, it is parameters of the models, such as coefficients or boundary/initial conditions, which are of importance. These parameters often cannot be observed directly, and are instead estimated using observations of a related quantity as well as the governing mathematical equations. In typical applications, the observations are noisy and informative about a small subset of the unknown parameters, making parameter estimation an ill-posed problem. The Bayesian approach to parameter estimation (or inverse problems) is commonly used to deal with the ill-posedness of the problem. In the Bayesian approach, the solution to the inverse problem is a conditional distribution that enables quantifying the probability that the observed data originated from any candidate parameter choice.

The quality and quantity of the measurement data have a significant effect on the quality of the solution to the Bayesian inverse problem. Thus, it is crucial to guide data acquisition and choose experimental conditions that lead to informative observations. This requires solving an optimal experimental design (OED) problem [1, 13]. While the OED framework encompasses various problem-specific design types, the subclass we focus on is sensor placement. Specifically, assuming measurements can be acquired at some set of sensors, the OED problem consists of choosing the most informative sensor subset from some candidate set.

Finding the optimal sensor placement is a particularly challenging problem if the parameter-to-observable (PTO) map is nonlinear. For nonlinear Bayesian inverse problems, the most informative design can be formulated as the optimizer of an expected utility function that evaluates the effectiveness of solving the inverse problem with data collected at any design choice. However, finding the analytic optimizer is generally impossible, and numerical approximation presents further difficulties. These stem from various factors: many utility functions (including the classical A- and D-optimality criteria) are functions of the intractable posterior distribution, the expectation is taken with respect to the (typically unknown) conditional density for the data given the design (i.e., the evidence), and even approximate evaluation of the utility function requires a large number of costly forward (and potentially adjoint) PDE solves.

A commonly employed approximation to the expected utility function is obtained via linearization of the PTO map or through a Gaussian approximation to the posterior [4, 2, 56]. This is also the approach we take in the present work; however, we employ a global linearization (as opposed to the local linearizations used in the aforementioned works) and incorporate the resulting model error into the Bayesian inverse problem using the Bayesian approximation error (BAE) approach [34, 36]. Focusing on the A-optimal criterion, we expand on the work of [49] and show that the optimal designs obtained using the global linear map are independent of the choice of linearization when model error is incorporated with the BAE approach. As a consequence, we present a scalable, derivative-free linearization approach for finding A-optimal sensor placements for nonlinear Bayesian inverse problems. We also extend our approach to OED under uncertainty, i.e., finding designs that are optimal for estimating some primary parameter(s) of interest while incorporating uncertainty in auxiliary nuisance parameters.

Related work

There has been a recent surge of interest in optimal experimental design for parameter estimation problems governed by uncertain mathematical models. Many of these references assume exact knowledge of the governing dynamics and focus on uncertainty due to unknown nuisance/auxiliary inputs to the model [5, 2, 23, 38], or misspecification of the statistical model [7]. However, in certain situations, the accurate forward model is unknown or prohibitively expensive to use in an optimization procedure, hence an approximate or surrogate model is used instead. The importance of accounting for model error in the context of Bayesian inversion has been well-established in the literature (see, e.g., [35, 37]). This importance naturally extends to the OED problem, and in [18], the authors explore OED under uncertainty from the perspective of designing observation operators that mitigate model error. In contrast, we focus on the classical OED problem of choosing experimental conditions that lead to optimal parameter inference and formulate an uncertainty-aware optimality criterion.

Model error-informed approximations to posterior distributions can be obtained in various ways, e.g., by employing Gaussian processes [37, 29], via error-corrected delayed acceptance Markov chain Monte Carlo (MCMC) [17], and through the Bayesian approximation error approach [36]. Here, we follow the latter technique for incorporating model error into the OED problem. The BAE approach is also used for sensor placement design problems in [2] to account for model error stemming from unknown nuisance parameters; however, our formulation differs in the following key ways: (i) we use a global linear approximation to the parameter-to-observable map, rather than many local Gaussian approximations, (ii) our approach accommodates model error due to the use of a linear surrogate as well as auxiliary input parameters, and (iii) we employ the full BAE approach, rather than the so-called enhanced error model. The latter point facilitates a derivative-free approach to linearized OED. An alternative derivative-free approach to OED under model uncertainty is presented in [20], where D-optimal designs are obtained using an ensemble-based approximation to the posterior covariance matrix. In their approach, which is based on a calibrate-emulate-sample algorithm, a cheap-to-evaluate emulator and the corresponding model error correction term are learned using Gaussian processes.

Contributions

The main contributions of this article are: (i) We propose a tractable, uncertainty-aware approach for optimal sensor placement in infinite-dimensional Bayesian inverse problems by substituting the costly accurate forward model with an inexpensive linear surrogate and incorporating the corresponding model error using the BAE approach. (ii) Expanding on the work of [49], we show that the resulting approximate uncertainty-aware OED objective is asymptotically independent of the specific choice of linear surrogate. This result is key in formulating a black-box OED algorithm that only requires sample parameter and data pairs. (iii) The base methodology and theorems are also extended to marginalized OED, where the unknowns include some primary parameters of interest as well as some unimportant (or uninteresting) auxiliary inputs. (iv) With two model problems, we present comprehensive computational studies that illustrate the effectiveness of our approach for both standard and marginalized OED. In particular, we showcase the necessity of incorporating model error into the design problem, and the effectiveness of the optimal designs computed using our non-intrusive approach in solving the original high-fidelity inverse problem.

Limitations

The approach presented here also has some limitations: (i) As with any linearization-based approximation procedure, our approach may not be well-suited for Bayesian inverse problems involving highly nonlinear PTO maps or non-Gaussian priors. While our approach does not place explicit restrictions on the prior, there is no guarantee that the resulting designs are effective in solving the original problem. However, if the problem is not well-approximated by a Gaussian, the uncertainty-informed approximate posterior can still be used in more accurate (hence more costly) OED schemes, e.g., as a proposal density for MCMC-based methods, or as a reference density for transport-based approaches. (ii) Independence of the approximate posterior from the choice of linear surrogate is only guaranteed asymptotically; thus, sufficient approximation of the second-order statistics may require a potentially large number of expensive PDE solves, particularly if the linear surrogate is a poor approximation to the true dynamics (as is the case when using the zero operator). This challenge is not exclusive to our methodology, since many OED algorithms rely on expensive offline PDE solves. However, if the number of samples required is prohibitively large and the PTO map is differentiable, using the derivative as a control variate could speed up convergence to the true error statistics. A theoretical study of the surrogate-dependent statistical convergence could help develop a better understanding of the suitability and limitations of our approach. (iii) In the current paper, the methods used are based on explicit formulation of the associated covariance operators/matrices, and as such could become infeasible for large problems. However, low-rank and other matrix representations, or potentially matrix-free variants, could avoid this potential bottleneck.

Outline of the paper

In Section 2 we introduce notation used throughout the paper and outline relevant background material on infinite-dimensional Bayesian inverse problems and the Bayesian approximation error approach. In Section 3, we outline our uncertainty-aware linearize-then-optimize approach to optimal experimental design and show that the resulting objective is independent of the choice of linear surrogate. As shown in Section 3.2, this independence also extends to marginalized OED under model error, i.e., to OED for Bayesian inverse problems where the model error stems from the use of a linear surrogate and the presence of auxiliary unknown inputs to the forward model. The effectiveness of our proposed method is illustrated in Section 4 and Section 5, where OED is considered for an idealized subsurface flow geometric inverse problem and a tsunami source detection problem, respectively.

2 Background

In this section, we motivate our approach for carrying out derivative-free linearized OED for large-scale PDE-constrained problems. We begin by introducing the required notation and preliminaries in Section 2.1. We then briefly recall the Bayesian approach to inverse problems and the Bayesian approximation error approach.

2.1 Preliminaries

In this article we are interested in parameters which take values in possibly infinite-dimensional Hilbert spaces. For a Hilbert space \mathscr{H}, the corresponding inner product is denoted by \langle\cdot,\cdot\rangle_{\mathscr{H}} and the associated norm by \left\|\cdot\right\|_{\mathscr{H}}. For Hilbert spaces \mathscr{H}_{1} and \mathscr{H}_{2}, we let \mathcal{L}(\mathscr{H}_{1},\mathscr{H}_{2}) denote the space of bounded linear operators from \mathscr{H}_{1} to \mathscr{H}_{2}.

In this article we are particularly interested in Gaussian measures on Hilbert spaces. To this end, throughout the article we use \mathcal{N}(v_{0},\mathcal{C}_{vv}) to denote a Gaussian measure with mean v_{0} and covariance operator \mathcal{C}_{vv}. In the infinite-dimensional case, the covariance operator is required to satisfy certain regularity assumptions to ensure the Bayesian inverse problem is well-defined [53]. As such, we assume \mathcal{C}_{vv} is a strictly positive, self-adjoint, trace-class operator. We recall that the Gaussian measure \mathcal{N}(v_{0},\mathcal{C}_{vv}) on \mathscr{H} induces the Cameron-Martin space \mathscr{E}\coloneqq\text{range}(\mathcal{C}_{vv}^{1/2}), which is endowed with the inner product \langle u,w\rangle_{\mathscr{E}}\coloneqq\langle\mathcal{C}_{vv}^{-1/2}u,\mathcal{C}_{vv}^{-1/2}w\rangle_{\mathscr{H}} for all u,w\in\mathscr{E}.

For real Hilbert spaces \mathscr{H}_{1} and \mathscr{H}_{2}, and for a linear operator \mathcal{A}\in\mathcal{L}(\mathscr{H}_{1},\mathscr{H}_{2}), we denote by \mathcal{A}^{\ast}\in\mathcal{L}(\mathscr{H}_{2},\mathscr{H}_{1}) the adjoint. Moreover, for u\in\mathscr{H}_{1} and w\in\mathscr{H}_{2}, we define the (outer product) operator u\otimes w by (u\otimes w)v=\langle w,v\rangle_{\mathscr{H}_{2}}u for any v\in\mathscr{H}_{2}. The (cross-)covariance operator (between u and w) is then defined as

(1) \mathcal{C}_{uw}=\mathbb{E}[(u-u_{0})\otimes(w-w_{0})]=\mathcal{C}_{wu}^{\ast}.
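In a finite-dimensional setting, the outer product u\otimes w is simply the rank-one matrix uw^{T}, so the identity (u\otimes w)v=\langle w,v\rangle u can be checked directly. A minimal numpy sketch (the vectors are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(3)   # u in H1 = R^3
w = rng.standard_normal(4)   # w in H2 = R^4
v = rng.standard_normal(4)   # test vector in H2

# The outer product u (x) w is the rank-one matrix u w^T ...
M = np.outer(u, w)

# ... and applying it to v picks out <w, v> times u.
lhs = M @ v
rhs = (w @ v) * u
```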

2.2 Bayesian inverse problems

The discussion presented here is carried out in the possibly infinite-dimensional Hilbert space setting [53]. We consider the problem of inferring an unknown parameter v\in\mathscr{H} given observations \mathbf{d}\in\mathbb{R}^{d} that are related to v as

(2) \mathbf{d}=\mathcal{G}(v)+\mathbf{e},

where \mathbf{e}\in\mathbb{R}^{d} denotes measurement noise. In our target applications, the (potentially) nonlinear parameter-to-observable map \mathcal{G}:\mathscr{H}\rightarrow\mathbb{R}^{d} can be decomposed into the composition of two operators, \mathcal{G}(\cdot)\coloneqq(\mathcal{V}\circ\mathcal{S})(\cdot), where the parameter-to-state map \mathcal{S}:\mathscr{H}\rightarrow\mathcal{X} (for some suitably-chosen function space \mathcal{X}) requires solving a partial differential equation (PDE), and \mathcal{V}:\mathcal{X}\rightarrow\mathbb{R}^{d} denotes a linear spatio-temporal observation operator.

In the Bayesian approach for parameter estimation, v is treated as a random variable and is endowed with a prior probability law (\mu_{v}) that encompasses any knowledge we have about v prior to data collection. Here we use a Gaussian prior \mathcal{N}(v_{0},\mathcal{C}_{vv}) with prior mean v_{0}\in\mathscr{E} and a trace-class covariance operator \mathcal{C}_{vv}. In practice it is often assumed that the data is corrupted by Gaussian noise which is independent of the parameter, i.e., \mathbf{e}\sim\mathcal{N}(\mathbf{e}_{0},\mathbf{\Gamma_{e}}), where \mathbf{\Gamma_{e}}\in\mathbb{R}^{d\times d} is a symmetric positive definite (SPD) covariance matrix and \mathbf{e}\perp v.

Within the Bayesian framework, the solution to the inverse problem is the (data-informed) posterior law of v|\mathbf{d}, which we denote by \mu_{v|d}. The posterior law is established as a Radon-Nikodym derivative via Bayes' law and takes the following form under our additive Gaussian noise model,

(3) \frac{d\mu_{v|d}}{d\mu_{v}}\propto\pi_{\rm{like}}(\mathbf{d}|v),\quad\pi_{\rm{like}}(\mathbf{d}|v)\propto\exp\left[-\frac{1}{2}\left\|\mathbf{d}-\left(\mathcal{G}(v)+\mathbf{e}_{0}\right)\right\|_{\mathbf{\Gamma_{e}}^{-1}}^{2}\right].

Fully characterizing the posterior in the case of PDE-constrained inverse problems is generally infeasible. As a computationally practical alternative, it is common to compute a Gaussian approximation to the posterior measure. A commonly utilized approximation is the so-called Laplace approximation [55], which approximates \mu_{v|d} as a Gaussian, \mathscr{L}_{\mu_{v|d}}\coloneqq\mathcal{N}(v_{\rm MAP},\mathcal{C}_{\rm post}), centered around the maximum a posteriori (MAP) estimate [27],

(4) v_{\rm MAP}\coloneqq\operatorname*{arg\,min}_{v\in\mathscr{E}}\mathcal{J}(\mathbf{d},v),\quad\mathcal{J}(\mathbf{d},v)\coloneqq\frac{1}{2}\left\|\mathbf{d}-\left(\mathcal{G}(v)+\mathbf{e}_{0}\right)\right\|_{\mathbf{\Gamma_{e}}^{-1}}^{2}+\frac{1}{2}\left\|v-v_{0}\right\|_{\mathcal{C}_{vv}^{-1}}^{2}.

The approximate posterior covariance operator of the Laplace approximation is given by \mathcal{C}_{\rm post}=\mathcal{H}^{-1}_{v_{\rm MAP}}, where \mathcal{H}_{v_{\rm MAP}} denotes the Hessian of \mathcal{J} evaluated at the MAP estimate.

Construction of the Laplace approximation presents many challenges. While the MAP estimator is guaranteed to exist under relatively mild assumptions on the PTO map and data misfit, uniqueness of the estimator cannot be guaranteed in general for nonlinear Bayesian inverse problems (see [19] for details). Additional theoretical difficulties arise if the map \mathcal{G} is not differentiable, since the Laplace approximation requires access to derivatives of the PTO map. Constructing the Laplace approximation is also computationally costly. Finding the MAP estimator via minimization of (4) can involve many applications of the PTO map and its adjoint. Additionally, the inverse Hessian defining the posterior covariance operator is typically not available in explicit form and needs to be estimated with additional forward and adjoint solves using, e.g., optimal low-rank approximations as described in [52]. This computational cost is exacerbated when one seeks optimal designs for nonlinear Bayesian inverse problems. As discussed in Section 3, OED in the nonlinear setting typically requires solving many Bayesian inverse problems, hence typical approaches (e.g., [4]) involve building many Gaussian approximations.
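To make these objects concrete, the sketch below builds a Laplace approximation for a small, hypothetical finite-dimensional problem: the MAP estimate is obtained by minimizing (4) with scipy's L-BFGS-B, and the Hessian is replaced by its Gauss-Newton approximation. The forward map, dimensions, and noise level are our own illustrative choices, not the paper's test problems.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, d = 4, 6

# Hypothetical smooth nonlinear PTO map G(v) = A tanh(v) and its Jacobian.
A = rng.standard_normal((d, n))
G = lambda v: A @ np.tanh(v)
J = lambda v: A * (1.0 / np.cosh(v) ** 2)   # chain rule: A @ diag(sech^2(v))

v0 = np.zeros(n)                  # prior mean, with C_vv = I
C_prior_inv = np.eye(n)
noise_prec = 1.0 / 0.01           # Gamma_e = 0.01 I, e0 = 0
data = G(rng.standard_normal(n)) + 0.1 * rng.standard_normal(d)

# Negative log-posterior J(d, v), cf. (4), and its gradient.
def objective(v):
    r = data - G(v)
    return 0.5 * noise_prec * (r @ r) + 0.5 * (v - v0) @ C_prior_inv @ (v - v0)

def gradient(v):
    return -noise_prec * J(v).T @ (data - G(v)) + C_prior_inv @ (v - v0)

v_map = minimize(objective, v0, jac=gradient, method="L-BFGS-B").x

# Gauss-Newton Hessian at the MAP point gives the Laplace covariance C_post = H^{-1}.
Jm = J(v_map)
C_post = np.linalg.inv(noise_prec * Jm.T @ Jm + C_prior_inv)
```

The Gauss-Newton approximation drops second derivatives of G; for a PDE-based map the explicit Jacobian above would be replaced by forward/adjoint solves.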

An alternative approach to obtaining a Gaussian approximation to the posterior distribution that circumvents some of the aforementioned challenges is to employ a linear surrogate to the PTO map. We leave the precise form of the linear (affine) surrogate arbitrary for now, and denote the linear map by \mathcal{F}\in\mathcal{L}(\mathscr{H},\mathbb{R}^{d}). Naturally, replacing the nonlinear map with some linear approximation leads to errors that need to be accounted for to avoid erroneous overconfidence and bias in the resulting Gaussian posterior. These model approximation errors can be incorporated using the Bayesian approximation error (BAE) approach [34, 36].

Before outlining the BAE approach to account for the resulting approximation errors, we recall the explicit results of the linear Gaussian case.

The linear Gaussian case

In the case of a linear PTO map, i.e.,

\mathbf{d}=\mathcal{F}v+\mathbf{e}

for \mathcal{F}\in\mathcal{L}(\mathscr{H},\mathbb{R}^{d}), under a Gaussian prior and additive Gaussian noise, the posterior measure is Gaussian, with conditional mean and conditional covariance given, respectively, by (see [53, Example 6.23])

(5) v_{0|d}=\mathcal{C}_{v|d}\left(\mathcal{F}^{\ast}\mathbf{\Gamma_{e}}^{-1}(\mathbf{d}-\mathbf{e}_{0})+\mathcal{C}_{vv}^{-1}v_{0}\right),\quad\mathcal{C}_{v|d}=(\mathcal{C}_{vv}^{-1}+\mathcal{F}^{\ast}\mathbf{\Gamma_{e}}^{-1}\mathcal{F})^{-1}.
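For a discretized parameter, (5) can be implemented directly. A minimal numpy sketch (dimensions, prior, and noise model are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 3

F = rng.standard_normal((d, n))               # linear PTO map
C_prior = np.diag(rng.uniform(0.5, 2.0, n))   # prior covariance C_vv
v0 = np.zeros(n)                              # prior mean
Gamma_e = 0.1 * np.eye(d)                     # noise covariance
e0 = np.zeros(d)                              # noise mean

# Synthetic data from a prior draw.
v_true = rng.multivariate_normal(v0, C_prior)
data = F @ v_true + rng.multivariate_normal(e0, Gamma_e)

# Conditional mean and covariance, cf. (5).
Ge_inv = np.linalg.inv(Gamma_e)
C_post = np.linalg.inv(np.linalg.inv(C_prior) + F.T @ Ge_inv @ F)
v_post = C_post @ (F.T @ Ge_inv @ (data - e0) + np.linalg.inv(C_prior) @ v0)
```

When d is much smaller than n, the equivalent Woodbury form C_prior - C_prior F^T (F C_prior F^T + Gamma_e)^{-1} F C_prior avoids forming C_prior^{-1} explicitly.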

2.3 The Bayesian approximation error approach

Neglecting the model errors and uncertainties induced by the use of a surrogate to the PTO map in general leads to overconfidence in biased estimates [36, 34]. To avoid this, we employ the BAE approach, which has been used for incorporating various types of model uncertainty into both the Bayesian inverse problem and the OED problem (see, e.g., [6, 48, 26, 2]). Here we outline the BAE approach as it pertains to our method.

As alluded to in the preceding section, the typical approach to OED within the Bayesian framework requires repeatedly solving the optimization problem (4) and constructing the (local) Laplace approximation. (An alternative, approximate approach is proposed in [56] (see Section 3.4), where the Laplace approximation is carried out at samples from the prior, thus negating the need to compute any MAP estimates; however, this approximate approach is not well-suited for highly ill-posed inverse problems.) While using a surrogate (or reduced-order model) approximation to the PTO map can reduce the computational costs associated with computing the MAP estimate, the cost of an accurate Monte Carlo (MC) approximation of the optimality criterion can still be significant, since it requires approximating the MAP estimate for many data samples.

In what follows, we let \mathcal{F}:\mathscr{H}\to\mathbb{R}^{d} denote a surrogate model. The starting point for the BAE approach is to rewrite the accurate relationship between the data and parameters (2) as

(6) \mathbf{d}=\mathcal{G}(v)+\mathbf{e}=\mathcal{F}(v)+\boldsymbol{\varepsilon}(v)+\mathbf{e}=\mathcal{F}(v)+\boldsymbol{\eta}(v),

where \mathbb{R}^{d}\ni\boldsymbol{\varepsilon}(v)\coloneqq\mathcal{G}(v)-\mathcal{F}(v) is the approximation error, and \mathbb{R}^{d}\ni\boldsymbol{\eta}(v)\coloneqq\boldsymbol{\varepsilon}(v)+\mathbf{e} is the total error. At this point, a conditional Gaussian approximation for the approximation error is made, i.e.,

(7) \pi_{\boldsymbol{\varepsilon}|v}(\boldsymbol{\varepsilon}|v)\approx\mathcal{N}(\boldsymbol{\varepsilon}_{0|v},\mathbf{\Gamma}_{\varepsilon|v}),

with (where we denote \boldsymbol{\varepsilon}_{0}\coloneqq\mathbb{E}\boldsymbol{\varepsilon})

\boldsymbol{\varepsilon}_{0|v}=\boldsymbol{\varepsilon}_{0}+\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1}(v-v_{0}),\quad\mathbf{\Gamma}_{\varepsilon|v}=\mathbf{\Gamma}_{\varepsilon\varepsilon}-\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1}\mathcal{C}_{v\varepsilon},

where \mathcal{C}_{v\varepsilon} and \mathcal{C}_{\varepsilon v} are defined as in (1). As a direct consequence of (7), we have

(8) \pi_{\boldsymbol{\eta}|v}(\boldsymbol{\eta}|v)\approx\mathcal{N}(\boldsymbol{\eta}_{0|v},\mathbf{\Gamma}_{\eta|v}),\quad\boldsymbol{\eta}_{0|v}=\mathbf{e}_{0}+\boldsymbol{\varepsilon}_{0|v},\quad\mathbf{\Gamma}_{\eta|v}=\mathbf{\Gamma_{e}}+\mathbf{\Gamma}_{\varepsilon|v},

and we approximate Bayes’ law (3) by

(9) \frac{d\mu_{v|d}}{d\mu_{v}}\propto\pi_{\boldsymbol{\eta}|v}(\boldsymbol{\eta}|v)\propto\exp\left\{-\frac{1}{2}\left\|\mathbf{d}-\left(\mathcal{F}(v)+\boldsymbol{\eta}_{0|v}\right)\right\|_{\mathbf{\Gamma}_{\eta|v}^{-1}}^{2}\right\}.

In some applications of the BAE approach, the further approximation \mathcal{C}_{v\varepsilon}=0 is employed, which is often referred to as the enhanced error model [6, 34].

A particularly straightforward choice of surrogate model is an (affine) linear model, i.e., \mathcal{F}\in\mathcal{L}(\mathscr{H},\mathbb{R}^{d}). The standard choice for such a surrogate is the (generalized) derivative of \mathcal{G} evaluated at some nominal point v_{\ast}\in\mathscr{H}, i.e., \mathcal{F}v=\mathcal{G}(v_{\ast})+\mathsf{D}\mathcal{G}(v_{\ast})(v-v_{\ast}). However, as long as the approximation error is taken into account using the BAE approach, the resulting (approximate) posterior is independent of the particular choice of linear surrogate, as stated in the following theorem from [49].

Theorem 1.

Let \mathscr{H} be a Hilbert space with v\in\mathscr{H}, and assume v has prior measure \mu_{v} with mean and (trace-class) covariance operator given by v_{0} and \mathcal{C}_{vv}, respectively. Suppose further that

\mathbf{d}=\mathcal{G}(v)+\mathbf{e},

where \mathcal{G}:\mathscr{H}\rightarrow\mathbb{R}^{d} is the bounded PTO map, and \mathbf{e}\in\mathbb{R}^{d} has mean \mathbf{e}_{0} and covariance matrix \mathbf{\Gamma_{e}}. Then the approximate likelihood model parameterized by \mathcal{F}\in\mathcal{L}(\mathscr{H},\mathbb{R}^{d}),

\pi^{\mathcal{F}}_{\rm{like}}(\mathbf{d}|v)\propto\exp\left\{-\frac{1}{2}\left\|L_{\eta|v}(\mathbf{d}-\mathcal{F}v-\boldsymbol{\eta}_{0|v})\right\|_{2}^{2}\right\},

where L_{\eta|v}^{T}L_{\eta|v}=\mathbf{\Gamma}_{\eta|v}^{-1}, and

\boldsymbol{\eta}_{0|v}=\mathbf{e}_{0}+\boldsymbol{\varepsilon}_{0}+\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1}(v-v_{0}),\quad\mathbf{\Gamma}_{\eta|v}=\mathbf{\Gamma_{e}}+\mathbf{\Gamma}_{\varepsilon\varepsilon}-\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1}\mathcal{C}_{v\varepsilon},

is independent of the choice of \mathcal{F}.

As an immediate consequence, using (5), we have the following corollary.

Corollary 1.

The resulting (approximate) posterior is Gaussian and independent of the choice of linear model, with conditional mean and covariance given by

(10) v_{0|d}=\mathcal{C}_{v|d}\left(\tilde{\mathcal{F}}^{*}\mathbf{\Gamma}_{\eta|v}^{-1}(\mathbf{d}-\mathbf{e}_{0}-\boldsymbol{\varepsilon}_{0}+\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1}v_{0})+\mathcal{C}_{vv}^{-1}v_{0}\right),
(11) \mathcal{C}_{v|d}=(\mathcal{C}_{vv}^{-1}+\tilde{\mathcal{F}}^{*}\mathbf{\Gamma}_{\eta|v}^{-1}\tilde{\mathcal{F}})^{-1},

respectively, with \mathcal{L}(\mathscr{H},\mathbb{R}^{d})\ni\tilde{\mathcal{F}}\coloneqq\mathcal{F}+\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1}.

Remark 1.

The independence of the approximate posterior to the choice of linearization is a consequence of employing a linear surrogate map and accounting for model uncertainty using the (full) BAE approach. Specifically, employing the BAE approach introduces an affine (in v) correction to the approximate forward model, namely \boldsymbol{\varepsilon}_{0}+\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1}(v-v_{0}). As such, any change in the linear surrogate map is offset by the correction term. However, as seen in the proof of [49, Theorem 1], the cross-covariance term \mathcal{C}_{\varepsilon v} is crucial for the result; if the cross-covariance is neglected (as in the enhanced error model), the result does not follow.
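This cancellation can be verified numerically in finite dimensions. In the sketch below (the forward map and dimensions are our own illustrative choices), all first- and second-order statistics are estimated from a single prior ensemble; the BAE-corrected posterior covariance (11) then agrees to round-off whether the surrogate is the zero map or an arbitrary linear map:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, q = 4, 3, 2000

A = rng.standard_normal((d, n))
G = lambda V: A @ V + 0.5 * (A @ V) ** 2   # hypothetical nonlinear PTO map (columnwise)
Gamma_e = 0.05 * np.eye(d)

V = rng.standard_normal((n, q))            # prior ensemble, columns are samples
GV = G(V)

def bae_posterior_cov(F):
    """Posterior covariance (11) with all statistics estimated from the same ensemble."""
    E = GV - F @ V                                   # approximation-error samples
    dE = E - E.mean(axis=1, keepdims=True)
    dV = V - V.mean(axis=1, keepdims=True)
    G_ee = dE @ dE.T / (q - 1)                       # Gamma_{epsilon epsilon}
    C_ev = dE @ dV.T / (q - 1)                       # C_{epsilon v}
    C_vv_inv = np.linalg.inv(dV @ dV.T / (q - 1))    # C_vv^{-1}
    F_tilde = F + C_ev @ C_vv_inv                    # effective map; independent of F
    G_eta = Gamma_e + G_ee - C_ev @ C_vv_inv @ C_ev.T
    return np.linalg.inv(C_vv_inv + F_tilde.T @ np.linalg.inv(G_eta) @ F_tilde)

C_zero = bae_posterior_cov(np.zeros((d, n)))             # zero map as surrogate
C_rand = bae_posterior_cov(rng.standard_normal((d, n)))  # arbitrary linear surrogate
```

With the zero map, no linearization of the forward map is needed at all; only the sample pairs (v, G(v)) enter the computation.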

In various inverse problems there are additional uncertain model parameters which are not estimated [34]. This is usually because a) these parameters are not of interest, or b) estimation of these parameters is too costly (or impossible). We refer to these parameters, which are additional to the primary parameters of interest, as auxiliary parameters, though in the literature the terms nuisance parameters, secondary parameters, and latent parameters are also common. Neglecting the uncertainty in the auxiliary parameters while inferring the primary parameters typically results in misleading estimates and significantly underestimated uncertainty. The same can also be said for the OED process; neglecting the uncertainty in the auxiliary parameters while carrying out OED generally yields sub-optimal designs [2].

The BAE approach can be applied as a means to account for uncertainty in auxiliary parameters during both the inference and OED stages. Specifically, let v=(m,\xi), with m\in\mathscr{M} the primary parameter and \xi\in\mathscr{X} the auxiliary parameter. Then, assuming the accurate PTO model is \mathcal{G}:\mathscr{M}\times\mathscr{X}\to\mathbb{R}^{d}, we can rewrite the relationship between the data and the primary parameters as (cf. (6))

(12) \mathbf{d}=\mathcal{G}(m,\xi)+\mathbf{e}=\mathcal{F}(m)+\boldsymbol{\varepsilon}(m,\xi)+\mathbf{e}=\mathcal{F}(m)+\boldsymbol{\eta}(m,\xi).

In keeping with the theme of the paper, we will be particularly interested in cases where the surrogate model is linear, i.e., \mathcal{F}\in\mathcal{L}(\mathscr{M},\mathbb{R}^{d}). For such cases, the following corollary, outlining the independence of the approximate posterior to the choice of linear(-ized) model, can be useful.

Corollary 2.

Let \mathscr{M} and \mathscr{X} be Hilbert spaces with m\in\mathscr{M} and \xi\in\mathscr{X}. Assume (m,\xi) has (joint) prior measure \mu_{m,\xi} with mean and (trace-class) covariance operator given by (m_{0},\xi_{0}) and \mathcal{C}_{m,\xi}=\begin{pmatrix}\mathcal{C}_{mm}&\mathcal{C}_{m\xi}\\ \mathcal{C}_{\xi m}&\mathcal{C}_{\xi\xi}\end{pmatrix}, respectively. Suppose further that

\mathbf{d}=\mathcal{G}(m,\xi)+\mathbf{e},

where \mathcal{G}:\mathscr{M}\times\mathscr{X}\rightarrow\mathbb{R}^{d} is the bounded PTO map, and \mathbf{e}\in\mathbb{R}^{d} has mean \mathbf{e}_{0} and covariance matrix \mathbf{\Gamma_{e}}. Then the approximate Gaussian marginal posterior measure parameterized by \mathcal{F}\in\mathcal{L}(\mathscr{M},\mathbb{R}^{d}), with mean and covariance given by

m_{0|d}^{\mathcal{F}}=\mathcal{C}_{m|d}^{\mathcal{F}}\left(\tilde{\mathcal{F}}^{*}\mathbf{\Gamma}_{\eta|m}^{-1}(\mathbf{d}-\mathbf{e}_{0}-\boldsymbol{\varepsilon}_{0}+\mathcal{C}_{\varepsilon m}\mathcal{C}_{mm}^{-1}m_{0})+\mathcal{C}_{mm}^{-1}m_{0}\right),\quad\mathcal{C}_{m|d}^{\mathcal{F}}=(\mathcal{C}_{mm}^{-1}+\tilde{\mathcal{F}}^{*}\mathbf{\Gamma}_{\eta|m}^{-1}\tilde{\mathcal{F}})^{-1},

respectively, with \mathcal{L}(\mathscr{M},\mathbb{R}^{d})\ni\tilde{\mathcal{F}}\coloneqq\mathcal{F}+\mathcal{C}_{\varepsilon m}\mathcal{C}_{mm}^{-1}, is independent of the choice of \mathcal{F}.
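Corollary 2 admits the same kind of finite-dimensional numerical check: when the approximation-error samples absorb both the linearization error and the auxiliary parameter, the marginal posterior covariance for m estimated from one joint prior ensemble does not depend on the choice of surrogate. A sketch with an illustrative forward map (all names and dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
nm, nx, d, q = 3, 2, 4, 3000

A = rng.standard_normal((d, nm))
B = rng.standard_normal((d, nx))
G = lambda M, X: np.tanh(A @ M) + B @ X   # hypothetical PTO map with auxiliary input xi
Gamma_e = 0.05 * np.eye(d)

M = rng.standard_normal((nm, q))          # primary-parameter samples (columns)
X = rng.standard_normal((nx, q))          # auxiliary-parameter samples
GMX = G(M, X)

def marginal_posterior_cov(F):
    E = GMX - F @ M                       # error absorbs both linearization and xi
    dE = E - E.mean(axis=1, keepdims=True)
    dM = M - M.mean(axis=1, keepdims=True)
    G_ee = dE @ dE.T / (q - 1)
    C_em = dE @ dM.T / (q - 1)
    C_mm_inv = np.linalg.inv(dM @ dM.T / (q - 1))
    F_tilde = F + C_em @ C_mm_inv
    G_eta = Gamma_e + G_ee - C_em @ C_mm_inv @ C_em.T
    return np.linalg.inv(C_mm_inv + F_tilde.T @ np.linalg.inv(G_eta) @ F_tilde)

C_zero = marginal_posterior_cov(np.zeros((d, nm)))
C_rand = marginal_posterior_cov(rng.standard_normal((d, nm)))
```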

Computing the approximation error statistics

In general the (second order) statistics of the approximation error are not known a priori. As such, these are computed using Monte Carlo simulations. More specifically, an ensemble of samples v()=(m(),ξ())v^{(\ell)}=(m^{(\ell)},\xi^{(\ell)}) for =1,2,,q\ell=1,2,\dots,q are drawn from the joint prior and the associated approximation error is computed, i.e.,

(13) 𝜺()=𝒢(v())v().\displaystyle\boldsymbol{\varepsilon}^{(\ell)}=\mathcal{G}(v^{(\ell)})-\mathcal{F}v^{(\ell)}.

From this ensemble of samples the sample means and (cross-)covariances can be computed:

(14) 𝜺0\displaystyle\boldsymbol{\varepsilon}_{0} 1q=1q𝜺(),𝚪εε1q1=1q(𝜺()𝜺0)(𝜺()𝜺0),\displaystyle\approx\frac{1}{q}\sum_{\ell=1}^{q}\boldsymbol{\varepsilon}^{(\ell)},\quad\mathbf{\Gamma}_{\varepsilon\varepsilon}\approx\frac{1}{q-1}\sum_{\ell=1}^{q}(\boldsymbol{\varepsilon}^{(\ell)}-\boldsymbol{\varepsilon}_{0})\otimes(\boldsymbol{\varepsilon}^{(\ell)}-\boldsymbol{\varepsilon}_{0}),
(15) 𝒞εv\displaystyle\mathcal{C}_{\varepsilon v} 1q1=1q(𝜺()𝜺0)(v()v0),\displaystyle\approx\frac{1}{q-1}\sum_{\ell=1}^{q}(\boldsymbol{\varepsilon}^{(\ell)}-\boldsymbol{\varepsilon}_{0})\otimes(v^{(\ell)}-v_{0}),
(16) v0\displaystyle v_{0} 1q=1qv(),𝒞vv1q1=1q(v()v0)(v()v0).\displaystyle\approx\frac{1}{q}\sum_{\ell=1}^{q}v^{(\ell)},\quad\mathcal{C}_{vv}\approx\frac{1}{q-1}\sum_{\ell=1}^{q}(v^{(\ell)}-v_{0})\otimes(v^{(\ell)}-v_{0}).

As noted previously, the number of samples required to compute the required quantities in (14) and (15) could be reduced by employing a control variate approach [51, Chapter 2.3] with a judicious choice of control variate such as the (generalized) derivative.
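In finite dimensions, computing these sample statistics is a few lines of linear algebra. The following Python sketch is illustrative only (the maps `G` and `sample_prior` are hypothetical stand-ins; setting `F` to the zero matrix recovers the zero-map surrogate discussed later):

```python
import numpy as np

def bae_statistics(G, F, sample_prior, q, rng):
    """Monte Carlo estimates of the BAE statistics in (13)-(16).

    G: accurate parameter-to-observable map (black box, R^n -> R^d),
    F: d x n matrix representing the linear surrogate,
    sample_prior: callable drawing one prior sample v in R^n.
    """
    V = np.stack([sample_prior(rng) for _ in range(q)])  # q x n parameter samples
    E = np.stack([G(v) for v in V]) - V @ F.T            # eps^(l) = G(v^(l)) - F v^(l), eq. (13)
    v0, eps0 = V.mean(axis=0), E.mean(axis=0)            # sample means, eqs. (14) and (16)
    Cvv = np.cov(V, rowvar=False)                        # uses the 1/(q-1) normalization
    Gee = np.cov(E, rowvar=False)
    Cev = (E - eps0).T @ (V - v0) / (q - 1)              # cross-covariance, eq. (15)
    return v0, eps0, Cvv, Gee, Cev
```

As a quick sanity check: if the accurate map is itself linear and `F` matches it exactly, every error statistic vanishes identically.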

3 A data-driven approach to optimal experimental designs

Here we present our tractable approach to approximating optimal designs for nonlinear Bayesian inverse problems. In Section 3.1 we outline relevant material on sensor placement design problems and their associated computational challenges. Our black-box approach that overcomes these challenges for OED and marginalized OED is presented in Section 3.2. In Section 3.3 we discuss an efficient numerical implementation of our method.

3.1 Optimal sensor placement for Bayesian inverse problems

Here we focus on inverse problems where data is collected at a set of sensors. In this setting, the OED goal is finding an optimal set of locations for sensor deployment in some prescribed measurement domain 𝒟\mathscr{D}. To this end, we fix a candidate set of locations {𝐬i}i=1s\{\mathbf{s}_{i}\}_{i=1}^{s} (with each 𝐬i𝒟\mathbf{s}_{i}\in\mathscr{D}) and define the design problem as that of finding an optimal subset of locations from this set. (This discrete approach to sensor placement is common in the OED literature [1]; a continuous formulation is also possible [47].) To distinguish between different designs or sensor arrangements, a binary weight wi{0,1}w_{i}\in\{0,1\} is assigned to each sensor location 𝐬i\mathbf{s}_{i}. If wi=1w_{i}=1, then data is collected using the sensor at location 𝐬i\mathbf{s}_{i}, whereas a weight of 0 implies no measurement is conducted at 𝐬i\mathbf{s}_{i}. For general Bayesian inverse problems, the optimal arrangement of kk sensors is then defined through the optimal weight vector 𝐰{0,1}s\mathbf{w}^{*}\in\{0,1\}^{s}, which minimizes an expected utility function, i.e.,

(17) 𝐰=argmin𝐰{0,1}s𝔼𝐝|𝐰[U(𝐰,𝐝)]s.t.i=1swi=k.\mathbf{w}^{*}=\operatorname*{arg\,min}_{\mathbf{w}\in\{0,1\}^{s}}\mathbb{E}_{\mathbf{d}|\mathbf{w}}\left[U(\mathbf{w},\mathbf{d})\right]\quad\text{s.t.}\sum_{i=1}^{s}w_{i}=k.

The utility function UU assesses the effectiveness of using any sensor combination (defined via the weight vector 𝐰{0,1}s\mathbf{w}\in\{0,1\}^{s}) in solving the Bayesian inverse problem with measurement data 𝐝d\mathbf{d}\in\mathbb{R}^{d}. Since the posterior measure depends on the data, minimizing the expectation of the utility function ensures that the chosen sensor placement works well on average for all possible realizations of the data. The utility function is typically problem-specific and some common choices for infinite-dimensional Bayesian OED are described in [1]. For illustrative purposes, we focus on the A-optimality criterion, though our approach could be extended to other utility functions. The kk-sensor A-optimal design minimizes the expected value of the average posterior pointwise variance. Letting 𝒞v|d,w(𝐰,𝐝)\mathcal{C}_{v|d,w}(\mathbf{w},\mathbf{d}) denote the posterior covariance operator at a fixed design 𝐰\mathbf{w} and measurement data 𝐝\mathbf{d}, the A-optimal design can be obtained by solving the minimization problem (17) with U(𝐰,𝐝)trace(𝒞v|d,w(𝐰,𝐝))U(\mathbf{w},\mathbf{d})\coloneqq\operatorname*{trace}(\mathcal{C}_{v|d,w}(\mathbf{w},\mathbf{d})).

For Bayesian inverse problems governed by nonlinear PTO maps or involving non-Gaussian prior measures, there is no closed-form expression for the data-dependent posterior covariance operator. Thus efficient techniques for approximating the expected utility are required. In the infinite-dimensional setting, the approximation is often carried out by combining a Gaussian approximation to the posterior measure with a sample average approximation (SAA) to the expectation, e.g., using a Laplace or Gauss-Newton approximation (see [4, 43]). As mentioned in Section 2.2, employing these techniques to approximate the SAA to the expected utility requires computing the MAP estimator for many data samples.

The linear Gaussian case

For linear Bayesian inverse problems with Gaussian priors and additive Gaussian noise, the A-optimality criterion simplifies, and an explicit formula is available. For notational convenience, we introduce a weight matrix 𝐖r×d\mathbf{W}\in\mathbb{R}^{r\times d}. In our target applications, we assume measurements can be obtained at each sensor for τ\tau\in\mathbb{N} different times, so r=kτr=k\tau (where ksk\leq s denotes the number of selected sensors) and d=τsd=\tau s. The matrix 𝐖\mathbf{W} is thus a block-diagonal matrix with each diagonal block 𝐏𝐰{0,1}k×s\mathbf{P}_{\mathbf{w}}\in\{0,1\}^{k\times s} defined as a sub-matrix of diag(𝐰)\text{diag}(\mathbf{w}) where the rows corresponding to zero weights have been removed (as described in [2]).
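Concretely, one way to assemble 𝐏_𝐰 and 𝐖 in code (this sketch assumes the time-major stacking of the d = τs measurements described above) is:

```python
import numpy as np

def selection_matrix(w):
    """P_w: the rows of diag(w) corresponding to nonzero weights."""
    idx = np.flatnonzero(w)                   # indices of the k active sensors
    P = np.zeros((idx.size, len(w)))
    P[np.arange(idx.size), idx] = 1.0
    return P

def design_matrix(w, tau):
    """Block-diagonal W with one copy of P_w per measurement time."""
    return np.kron(np.eye(tau), selection_matrix(w))

w = np.array([1, 0, 1, 0])                    # 2 of s = 4 sensors active
W = design_matrix(w, tau=2)                   # shape (k*tau, s*tau) = (4, 8)
```

Applying `W` to a stacked data vector then simply extracts the measurements at the active sensors for each of the τ times.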

With this notation and the assumptions v𝒩(v0,𝒞vv)v\sim\mathcal{N}(v_{0},\mathcal{C}_{vv}) and 𝐞𝒩(𝐞0,𝚪𝐞)\mathbf{e}\sim\mathcal{N}(\mathbf{e}_{0},\mathbf{\Gamma_{e}}), the design-dependent relationship between the data and parameters,

𝐝(𝐰)=𝐖(v+𝐞),\mathbf{d}(\mathbf{w})=\mathbf{W}\left(\mathcal{F}v+\mathbf{e}\right),

induces a design-dependent Gaussian posterior μv|d,w\mu_{v|d,w} with conditional mean and covariance

(18) v0|d,w(𝐰,𝐝)\displaystyle v_{0|d,w}(\mathbf{w},\mathbf{d}) =𝒞v|w(𝚺(𝐰)(𝐝𝐞0)+𝒞vv1v0),𝒞v|w(𝐰)=(𝒞vv1+𝚺(𝐰))1,\displaystyle=\mathcal{C}_{v|w}\left(\mathcal{F}^{\ast}\boldsymbol{\Sigma}(\mathbf{w})\left(\mathbf{d}-\mathbf{e}_{0}\right)+\mathcal{C}_{vv}^{-1}v_{0}\right),\quad\mathcal{C}_{v|w}(\mathbf{w})=(\mathcal{C}_{vv}^{-1}+\mathcal{F}^{\ast}\boldsymbol{\Sigma}(\mathbf{w})\mathcal{F})^{-1},

where 𝚺(𝐰)𝐖T(𝐖𝚪𝐞𝐖T)1𝐖\boldsymbol{\Sigma}(\mathbf{w})\coloneqq\mathbf{W}^{T}\left(\mathbf{W}\mathbf{\Gamma_{e}}\mathbf{W}^{T}\right)^{-1}\mathbf{W}. For linear Bayesian inverse problems, the posterior covariance operator is independent of the observed data, i.e., the expectation in (17) is extraneous. Hence the A-optimal design can be found by minimizing the simplified criterion:

(19) 𝐰=argmin𝐰{0,1}strace[𝒞v|w(𝐰)],s.t.i=1swi=k.\mathbf{w}^{\ast}=\operatorname*{arg\,min}_{\mathbf{w}\in\{0,1\}^{s}}\operatorname*{trace}\left[\mathcal{C}_{v|w}(\mathbf{w})\right],\quad\text{s.t.}\sum_{i=1}^{s}w_{i}=k.
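In a discretized setting, the objective in (19) can be evaluated for any weight vector by restricting the rows of the forward matrix and noise covariance to the selected measurements; a small sketch follows, where `F`, `Cvv`, and `Gamma_e` are placeholder matrices:

```python
import numpy as np

def a_criterion(w, F, Cvv, Gamma_e, tau=1):
    """Trace of the posterior covariance (18) for design w, i.e. the objective in (19)."""
    idx = np.flatnonzero(np.tile(w, tau))       # measurement rows retained by W
    Fw = F[idx, :]                              # W F
    Ge_w = Gamma_e[np.ix_(idx, idx)]            # W Gamma_e W^T
    # C_{v|w}^{-1} = Cvv^{-1} + F^* Sigma(w) F, with Sigma(w) = W^T (W Gamma_e W^T)^{-1} W
    H = np.linalg.inv(Cvv) + Fw.T @ np.linalg.solve(Ge_w, Fw)
    return np.trace(np.linalg.inv(H))
```

Adding a sensor can only shrink (or preserve) this criterion, which gives a quick correctness check.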

3.2 Invariance of the uncertainty-aware optimal design to linearization choice

In this section, we outline our approach to optimal sensor placement for Bayesian inverse problems where the accurate (but computationally costly) PTO map has been replaced with some linear surrogate. As in Section 2.3, with (,d)\mathcal{F}\in\mathcal{L}(\mathscr{H},\mathbb{R}^{d}) we denote the surrogate map that approximates the accurate nonlinear operator 𝒢:d\mathcal{G}:\mathscr{H}\rightarrow\mathbb{R}^{d}.

To incorporate model error into the design-dependent Bayesian inverse problem, we follow the BAE approach outlined in Section 2.3. The data measured at any subset of the candidate locations can be modeled by

𝐝(𝐰)=𝐖(𝒢(v)+𝐞)=𝐖(v+𝐞)+ε(v,𝐰)𝐖v+𝜼|v(𝐰),\displaystyle\mathbf{d}(\mathbf{w})=\mathbf{W}\left(\mathcal{G}(v)+\mathbf{e}\right)=\mathbf{W}\left(\mathcal{F}v+\mathbf{e}\right)+\varepsilon(v,\mathbf{w})\approx\mathbf{W}\mathcal{F}v+\boldsymbol{\eta}|v(\mathbf{w}),

where 𝜺(v,𝐰)=𝐖(𝒢(v)v)\boldsymbol{\varepsilon}(v,\mathbf{w})=\mathbf{W}\left(\mathcal{G}(v)-\mathcal{F}v\right), and the random variable 𝜼|v(𝐰)𝒩(𝐖𝜼0|v,𝐖𝚪η|v𝐖T)\boldsymbol{\eta}|v(\mathbf{w})\sim\mathcal{N}(\mathbf{W}\boldsymbol{\eta}_{0|v},\mathbf{W}\mathbf{\Gamma}_{\eta|v}\mathbf{W}^{T}) is used to approximate the total error at the chosen sensor locations. The total error mean 𝜼0|v\boldsymbol{\eta}_{0|v} and covariance 𝚪η|v\mathbf{\Gamma}_{\eta|v} are defined in (2.3) and (7) and can be estimated using Monte Carlo sampling as described at the end of Section 2.3. Since the design matrix enters the model linearly, the design-dependent approximate posterior is also independent of the particular choice of linearization; that is, the earlier invariance results extend to design-dependent Bayesian inverse problems. This is made precise in the following corollary.

Corollary 3.

Let \mathscr{H} be a Hilbert space and assume vv\in\mathscr{H} has prior measure μv\mu_{v} with mean v0v_{0} and trace-class covariance operator 𝒞vv\mathcal{C}_{vv}. Assume that

𝐝(𝐰)=𝐖(𝒢(v)+𝐞),\mathbf{d}(\mathbf{w})=\mathbf{W}\left(\mathcal{G}(v)+\mathbf{e}\right),

where 𝒢:d\mathcal{G}:\mathscr{H}\rightarrow\mathbb{R}^{d} is a bounded PTO map, the matrix 𝐖𝐖(𝐰){0,1}r×d\mathbf{W}\coloneqq\mathbf{W}(\mathbf{w})\in\{0,1\}^{r\times d} is defined as described in Section 3.1 for any 𝐰{0,1}s\mathbf{w}\in\{0,1\}^{s}, and 𝐞𝒩(𝐞0,𝚪𝐞)\mathbf{e}\sim\mathcal{N}(\mathbf{e}_{0},\mathbf{\Gamma_{e}}).

Then for any 𝐰\mathbf{w}, the approximate posterior parameterized by some linear map (,d)\mathcal{F}\in\mathcal{L}(\mathscr{H},\mathbb{R}^{d}),

μv|d,w(𝐰,𝐝)=𝒩(v0|d,w(𝐰,𝐝),𝒞v|w(𝐰)),\mu^{\mathcal{F}}_{v|d,w}(\mathbf{w},\mathbf{d})=\mathcal{N}(v^{\mathcal{F}}_{0|d,w}(\mathbf{w},\mathbf{d}),\mathcal{C}^{\mathcal{F}}_{v|w}(\mathbf{w})),

where

𝒞v|w(𝐰)\displaystyle\mathcal{C}^{\mathcal{F}}_{v|w}(\mathbf{w}) =(𝒞vv1+~𝐖T(𝐖𝚪η|v𝐖T)1𝐖~)1,\displaystyle=\left(\mathcal{C}_{vv}^{-1}+\widetilde{\mathcal{F}}^{\ast}\mathbf{W}^{T}(\mathbf{W}\mathbf{\Gamma}_{\eta|v}\mathbf{W}^{T})^{-1}\mathbf{W}\widetilde{\mathcal{F}}\right)^{-1},
v0|d,w(𝐰,𝐝)\displaystyle v^{\mathcal{F}}_{0|d,w}(\mathbf{w},\mathbf{d}) =𝒞v|w(𝐰)(~𝐖T(𝐖𝚪η|v𝐖T)1𝐖𝐝~+𝒞vv1v0),\displaystyle=\mathcal{C}^{\mathcal{F}}_{v|w}(\mathbf{w})\left(\widetilde{\mathcal{F}}^{\ast}\mathbf{W}^{T}(\mathbf{W}\mathbf{\Gamma}_{\eta|v}\mathbf{W}^{T})^{-1}\mathbf{W}\widetilde{\mathbf{d}}+\mathcal{C}_{vv}^{-1}v_{0}\right),
~\displaystyle\widetilde{\mathcal{F}} =+𝒞εv𝒞vv1,𝐝~=𝐝𝜼0|v=𝐝𝐞0𝜺0+𝒞εv𝒞vv1v0,\displaystyle=\mathcal{F}+\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1},\quad\widetilde{\mathbf{d}}=\mathbf{d}-\boldsymbol{\eta}_{0|v}=\mathbf{d}-\mathbf{e}_{0}-\boldsymbol{\varepsilon}_{0}+\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1}v_{0},
𝚪η|v\displaystyle\mathbf{\Gamma}_{\eta|v} =𝚪𝐞+𝚪εε𝒞εv𝒞vv1𝒞vε,\displaystyle=\mathbf{\Gamma_{e}}+\mathbf{\Gamma}_{\varepsilon\varepsilon}-\mathcal{C}_{\varepsilon v}\mathcal{C}_{vv}^{-1}\mathcal{C}_{v\varepsilon},

is independent of the choice of \mathcal{F}. In particular, for any 1,2(,d)\mathcal{F}_{1},\mathcal{F}_{2}\in\mathcal{L}(\mathscr{H},\mathbb{R}^{d}), we have:

trace[𝒞v|w1(𝐰)]=trace[𝒞v|w2(𝐰)].\operatorname*{trace}\left[\mathcal{C}^{\mathcal{F}_{1}}_{v|w}(\mathbf{w})\right]=\operatorname*{trace}\left[\mathcal{C}^{\mathcal{F}_{2}}_{v|w}(\mathbf{w})\right].

Proof.

Note that as a consequence of 1, ~\widetilde{\mathcal{F}}, 𝐝~\widetilde{\mathbf{d}} and 𝚪η|v\mathbf{\Gamma}_{\eta|v} are independent of the choice of \mathcal{F}, and the result thus follows.

An immediate consequence of the invariance of the A-optimality criterion to the specific linearization is that the uncertainty-aware A-optimal designs are also independent of the specific choice of surrogate map \mathcal{F}. Thus the trivial map, the zero operator 𝖮:v𝟎\mathsf{O}\colon v\mapsto\boldsymbol{0}, leads to the same optimal sensor placements as a more complicated tailored surrogate when model error is incorporated using the BAE approach. This is the key insight that facilitates our greedy, derivative-free approach to optimal experimental design, which we describe in Section 3.3.
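This invariance is easy to verify numerically in finite dimensions. In the sketch below (using a toy nonlinear map `G` of our own invention), the error statistics are estimated from the same parameter/data sample pairs for two different surrogates, the zero map and a random linear map, and the resulting posterior traces coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, q = 5, 3, 400
A = rng.standard_normal((d, n))
G = lambda v: A @ v + 0.2 * np.sin(A @ v)        # toy nonlinear PTO map (illustrative)
Gamma_e = 0.01 * np.eye(d)

V = rng.standard_normal((q, n))                  # prior samples, v ~ N(0, I)
D = np.array([G(v) for v in V])                  # corresponding noiseless data

def bae_posterior_cov(F):
    E = D - V @ F.T                              # approximation-error samples for this F
    v0, e0 = V.mean(0), E.mean(0)
    Cvv = np.cov(V, rowvar=False)
    Cev = (E - e0).T @ (V - v0) / (q - 1)
    Ft = F + Cev @ np.linalg.inv(Cvv)            # F tilde
    Gnv = Gamma_e + np.cov(E, rowvar=False) - Cev @ np.linalg.solve(Cvv, Cev.T)
    return np.linalg.inv(np.linalg.inv(Cvv) + Ft.T @ np.linalg.solve(Gnv, Ft))

t_zero = np.trace(bae_posterior_cov(np.zeros((d, n))))       # zero-map surrogate
t_rand = np.trace(bae_posterior_cov(rng.standard_normal((d, n))))
# t_zero and t_rand agree up to floating-point error
```

Note that here the invariance holds exactly because both surrogates use statistics estimated from the same samples; with independently drawn sample sets the agreement is only asymptotic, as the following remark discusses.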

Remark 2.

As mentioned previously, in practice, the invariance results underlying 3 only hold asymptotically as the number of samples used in the MC approximation of the model error statistics goes to infinity. In particular, in [49], it was shown that the use of the zero map led to posteriors with underestimated pointwise variance when a small number of samples (typically fewer than 1000) was used. Thus, if the cost of evaluating 𝒢\mathcal{G} significantly limits the number of samples qq, it is advisable to use a surrogate \mathcal{F} that is highly correlated with 𝒢\mathcal{G}.

Uncertainty-aware marginalized OED

As alluded to in Section 2.3, in many inverse problems there may be uncertain auxiliary parameters which are not estimated. In such settings, designs are chosen to minimize uncertainty in the marginal posterior for the primary parameters-of-interest. It is straightforward to extend our uncertainty-aware approach to approximate optimal sensor placements for such marginalized design problems. Decompose v=(m,ξ)v=(m,\xi) into a primary parameter-of-interest mm\in\mathscr{M} and an auxiliary parameter ξ𝒳\xi\in\mathscr{X}, and let Π:\Pi_{\mathscr{M}}:\mathscr{H}\to\mathscr{M} denote a linear projection operator onto \mathscr{M}. The approximate marginal posterior measure, parameterized by some linear surrogate \mathcal{F}, is μm|d,w𝒩(Πv0|d,w,Π𝒞v|wΠ)𝒩(m0|d,w,𝒞m|d,w),\mu^{\mathcal{F}}_{m|d,w}\sim\mathcal{N}(\Pi_{\mathscr{M}}v^{\mathcal{F}}_{0|d,w},\Pi_{\mathscr{M}}\mathcal{C}^{\mathcal{F}}_{v|w}\Pi_{\mathscr{M}}^{\ast})\eqqcolon\mathcal{N}(m_{0|d,w}^{\mathcal{F}},\mathcal{C}^{\mathcal{F}}_{m|d,w}), and the optimal sensor placements for the marginalized design problem thus satisfy

(20) 𝐰=argmin𝐰{0,1}strace[𝒞m|d,w(𝐰)],s.t.i=1swi=k.\mathbf{w}^{*}=\operatorname*{arg\,min}_{\mathbf{w}\in\{0,1\}^{s}}\operatorname*{trace}\left[\mathcal{C}^{\mathcal{F}}_{m|d,w}(\mathbf{w})\right],\quad\text{s.t.}\sum_{i=1}^{s}w_{i}=k.

Since v0|d,wv^{\mathcal{F}}_{0|d,w} and 𝒞v|w\mathcal{C}^{\mathcal{F}}_{v|w} are independent of the choice of \mathcal{F} (using 3), as is the projection operator Π\Pi_{\mathscr{M}}, it is straightforward to see that the (approximate) optimal design for the marginalized problem is independent of the linear surrogate \mathcal{F}. As a consequence, the zero operator is also a valid surrogate for the marginalized OED problem.
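In the discretized setting, with v = (m, ξ) stacked so that m occupies the leading coordinates, Π_ℳ is simply coordinate extraction, and the marginal criterion is the trace of the m-block of the joint posterior covariance; schematically (with made-up dimensions and a placeholder covariance):

```python
import numpy as np

n_m, n_xi = 3, 2                                       # dims of m and xi (illustrative)
L = np.random.default_rng(0).standard_normal((5, 5))
C_post = L @ L.T                                       # stand-in joint posterior covariance
Pi = np.hstack([np.eye(n_m), np.zeros((n_m, n_xi))])   # projection Pi_M onto m
marginal_trace = np.trace(Pi @ C_post @ Pi.T)          # equals trace of the leading m-block
```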

3.3 Efficient computation of the greedy A-optimal designs

In this section, we outline a greedy procedure for computing A-optimal sensor placements for both design problems (19) and (20). We emphasize that our approach can be used with the zero map surrogate (and this is what we use for the numerical examples in Section 4 and Section 5); however, for generality we present the procedure using an arbitrary linear surrogate map \mathcal{F}.

For the remainder of this section, let vn\textbf{v}\in\mathbb{R}^{n}, 𝐆:nd\mathbf{G}\colon\mathbb{R}^{n}\rightarrow\mathbb{R}^{d}, and 𝐅(n,d)\mathbf{F}\in\mathcal{L}(\mathbb{R}^{n},\mathbb{R}^{d}) denote the discretized parameter vv, accurate PTO map 𝒢\mathcal{G}, and linear surrogate PTO map \mathcal{F}, respectively. The discretized uncertainty-aware model is thus

𝐝(𝐰)=𝐖(𝐅v+𝜼|v),\displaystyle\mathbf{d}(\mathbf{w})=\mathbf{W}(\mathbf{F}\textbf{v}+\boldsymbol{\eta}|\textbf{v}),

where 𝜼|v𝒩(𝜼0|v,𝚪η|v)\boldsymbol{\eta}|\textbf{v}\sim\mathcal{N}(\boldsymbol{\eta}_{0|v},\mathbf{\Gamma}_{\eta|v}), with 𝜼0|v=𝐞0+𝜺0𝐂εv𝐂vv1(vv0)\boldsymbol{\eta}_{0|v}=\mathbf{e}_{0}+\boldsymbol{\varepsilon}_{0}-\mathbf{C}_{\varepsilon v}\mathbf{C}_{vv}^{-1}\left(\textbf{v}-\textbf{v}_{0}\right) and 𝚪η|v=𝚪𝐞+𝚪εε𝐂εv𝐂vv1𝐂vε\mathbf{\Gamma}_{\eta|v}=\mathbf{\Gamma_{e}}+\mathbf{\Gamma}_{\varepsilon\varepsilon}-\mathbf{C}_{\varepsilon v}\mathbf{C}_{vv}^{-1}\mathbf{C}_{v\varepsilon}. Here, the discretized statistics v0,𝜺0,𝐂vv,𝐂vε=𝐂εv\textbf{v}_{0},\,\boldsymbol{\varepsilon}_{0},\,\mathbf{C}_{vv},\,\mathbf{C}_{v\varepsilon}=\mathbf{C}^{\ast}_{\varepsilon v}, and 𝚪εε\mathbf{\Gamma}_{\varepsilon\varepsilon} are computed via Monte Carlo (as in (14)) using the discretized operators 𝐅\mathbf{F} and 𝐆\mathbf{G}. Letting 𝐅~=𝐅+𝐂εv𝐂vv1\widetilde{\mathbf{F}}=\mathbf{F}+\mathbf{C}_{\varepsilon v}\mathbf{C}_{vv}^{-1}, the corresponding discretized posterior covariance operator is then

(21) 𝐂v|d,w𝐅(𝐰)\displaystyle\mathbf{C}^{\mathbf{F}}_{v|d,w}(\mathbf{w}) =(𝐂vv1+𝐅~𝐖T(𝐖𝚪η|v𝐖T)1𝐖𝐅~)1\displaystyle=\left(\mathbf{C}_{vv}^{-1}+\widetilde{\mathbf{F}}^{\ast}\mathbf{W}^{T}\left(\mathbf{W}\mathbf{\Gamma}_{\eta|v}\mathbf{W}^{T}\right)^{-1}\mathbf{W}\widetilde{\mathbf{F}}\right)^{-1}
(22) =𝐂vv(𝐂vv𝐅+𝐂v𝜺)𝐖T𝐐(𝐖)𝐖(𝐅𝐂vv+𝐂𝜺v)\displaystyle=\mathbf{C}_{vv}-\left(\mathbf{C}_{vv}{\mathbf{F}}^{\ast}+\mathbf{C}_{v\boldsymbol{\varepsilon}}\right)\mathbf{W}^{T}\mathbf{Q}(\mathbf{W})\mathbf{W}\left(\mathbf{F}\mathbf{C}_{vv}+\mathbf{C}_{\boldsymbol{\varepsilon}v}\right)

where 𝐅~\widetilde{\mathbf{F}}^{\ast} denotes the adjoint of 𝐅~\widetilde{\mathbf{F}}, 𝐐(𝐖)[𝐖(𝚪η|v+𝐅~𝐂vv𝐅~)𝐖T]1\mathbf{Q}(\mathbf{W})\coloneqq\left[\mathbf{W}\left(\mathbf{\Gamma}_{\eta|v}+\widetilde{\mathbf{F}}\mathbf{C}_{vv}\widetilde{\mathbf{F}}^{\ast}\right)\mathbf{W}^{T}\right]^{-1} and the last equation follows from the Woodbury matrix identity and 𝐅~𝐂vv=𝐅𝐂vv+𝐂εv\widetilde{\mathbf{F}}\mathbf{C}_{vv}=\mathbf{F}\mathbf{C}_{vv}+\mathbf{C}_{\varepsilon v}.

Using the last equality and the cyclic property of the trace operator, we have

trace[𝐂v|d,w𝐅(𝐰)]=trace[𝐂vv]trace[𝐐(𝐖)𝐖(𝐅𝐂vv+𝐂εv)(𝐂vv𝐅+𝐂vε)𝐖T].\displaystyle\operatorname*{trace}\left[\mathbf{C}^{\mathbf{F}}_{v|d,w}(\mathbf{w})\right]=\operatorname*{trace}\left[\mathbf{C}_{vv}\right]-\operatorname*{trace}\left[\mathbf{Q}(\mathbf{W})\mathbf{W}\left(\mathbf{F}\mathbf{C}_{vv}+\mathbf{C}_{\varepsilon v}\right)\left(\mathbf{C}_{vv}{\mathbf{F}}^{\ast}+\mathbf{C}_{v\varepsilon}\right)\mathbf{W}^{T}\right].

Thus, letting 𝐊(𝐰)𝐐(𝐖)𝐖(𝐅𝐂vv+𝐂εv)(𝐂vv𝐅+𝐂vε)𝐖T\mathbf{K}(\mathbf{w})\coloneqq\mathbf{Q}(\mathbf{W})\mathbf{W}\left(\mathbf{F}\mathbf{C}_{vv}+\mathbf{C}_{\varepsilon v}\right)\left(\mathbf{C}_{vv}{\mathbf{F}}^{\ast}+\mathbf{C}_{v\varepsilon}\right)\mathbf{W}^{T} the A-optimal design satisfies

(23) 𝐰=argmax𝐰{0,1}strace[𝐊(𝐰)]s.t.i=1swi=k.\mathbf{w}^{*}=\operatorname*{arg\,max}_{\mathbf{w}\in\{0,1\}^{s}}\operatorname*{trace}\left[\mathbf{K}(\mathbf{w})\right]\quad\text{s.t.}\sum_{i=1}^{s}w_{i}=k.

This reformulation of the A-optimality criterion is computationally advantageous if ndn\gg d, which is often the case for inverse problems, particularly when dealing with discretizations of infinite-dimensional parameters vv.
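The trace identity behind (23) can be sanity-checked on random matrices; note that the reformulated side solves only a d × d system, which is the source of the savings when n ≫ d (the dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 6                                   # parameter dim n >> data dim d
Ft = rng.standard_normal((d, n))                # stands in for F tilde = F + C_ev Cvv^{-1}
L = rng.standard_normal((n, n)); Cvv = L @ L.T + n * np.eye(n)    # SPD prior covariance
M = rng.standard_normal((d, d)); Gnv = M @ M.T + np.eye(d)        # SPD Gamma_{eta|v}

# (21): trace of the posterior covariance via an n x n inverse
t_direct = np.trace(np.linalg.inv(np.linalg.inv(Cvv) + Ft.T @ np.linalg.solve(Gnv, Ft)))

# (22)-(23): trace[Cvv] minus trace[K], requiring only a d x d solve
FC = Ft @ Cvv                                   # F tilde Cvv = F Cvv + C_ev
t_woodbury = np.trace(Cvv) - np.trace(np.linalg.solve(Gnv + FC @ Ft.T, FC @ FC.T))
```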

Discrete marginalized A-optimal design problem

In the case where v=(m,ξ)v=(m,\xi), and we are interested in finding designs that optimize the marginalized A-optimality criterion (20), we again let v=[m,𝝃]\textbf{v}=[\textbf{m},\boldsymbol{\xi}], with mnm\textbf{m}\in\mathbb{R}^{n_{m}} and 𝝃nξ\boldsymbol{\xi}\in\mathbb{R}^{n_{\xi}} (with n=nm+nξn=n_{m}+n_{\xi}) denoting the discretized primary and auxiliary parameters, respectively. Additionally, we decompose 𝐅=[𝐅m𝐅ξ]{\mathbf{F}}=[\mathbf{F}_{m}\quad\mathbf{F}_{\xi}], 𝐂εv=[𝐂εm𝐂εξ]\mathbf{C}_{\varepsilon v}=[\mathbf{C}_{\varepsilon m}\quad\mathbf{C}_{\varepsilon\xi}], and 𝐂vv=[𝐂mm𝐂mξ𝐂ξm𝐂ξξ]\mathbf{C}_{vv}=\begin{bmatrix}\mathbf{C}_{mm}&\mathbf{C}_{m\xi}\\ \mathbf{C}_{\xi m}&\mathbf{C}_{\xi\xi}\end{bmatrix}. Using (22), it is straightforward to see that

\mathbf{C}_{m|d,w}(\mathbf{w})=\mathbf{C}_{mm}-\left(\mathbf{C}_{mm}\mathbf{F}^{\ast}_{m}+\mathbf{C}_{m\xi}\mathbf{F}^{\ast}_{\xi}+\mathbf{C}_{m\varepsilon}\right)\mathbf{W}^{T}\mathbf{Q}(\mathbf{W})\mathbf{W}\left(\mathbf{F}_{m}\mathbf{C}_{mm}+\mathbf{F}_{\xi}\mathbf{C}_{\xi m}+\mathbf{C}_{\varepsilon m}\right).

Denoting 𝐊m(𝐰)𝐐(𝐖)𝐖(𝐅m𝐂mm+𝐅ξ𝐂ξm+𝐂εm)(𝐂mm𝐅m+𝐂mξ𝐅ξ+𝐂mε)𝐖T\mathbf{K}_{m}(\mathbf{w})\coloneqq\mathbf{Q}(\mathbf{W})\mathbf{W}(\mathbf{F}_{m}\mathbf{C}_{mm}+\mathbf{F}_{\xi}\mathbf{C}_{\xi m}+\mathbf{C}_{\varepsilon m})(\mathbf{C}_{mm}\mathbf{F}^{\ast}_{m}+\mathbf{C}_{m\xi}\mathbf{F}^{\ast}_{\xi}+\mathbf{C}_{m\varepsilon})\mathbf{W}^{T}, the marginal A-optimal design thus satisfies

(24) 𝐰=argmax𝐰{0,1}strace[𝐊m(𝐰)]s.t.i=1swi=k.\mathbf{w}^{*}=\operatorname*{arg\,max}_{\mathbf{w}\in\{0,1\}^{s}}\operatorname*{trace}\left[\mathbf{K}_{m}(\mathbf{w})\right]\quad\text{s.t.}\sum_{i=1}^{s}w_{i}=k.

Note that (23) and (24) define NP-hard optimization problems. Approaches to solving this challenging problem often involve relaxation techniques [3, 28, 57], although relaxation-free approaches are also available (see, e.g., [8]). However, here we take a greedy approach to finding the placement of kk sensors that approximately solves the optimization problem (23). In the greedy approach to sensor placement, sensors that lead to the largest decrease in the A-optimality criterion are chosen one at a time from the (diminishing) candidate set. The procedure is outlined in Algorithm 1. The greedy approach to sensor placement generally yields sub-optimal designs. However, it has been shown to produce reasonable designs in practice [5, 2, 33]. Additionally, the greedy approach to sensor placement is completely hands-off (since no derivative information of the objective is required). Thus, when combined with the zero map surrogate, 𝖮\mathcal{F}\equiv\mathsf{O}, our proposed approach is a black box that only requires sample parameter and data pairs.

At each step =1,,k\ell=1,\ldots,k, the objective function defined in (23) (resp. (24) in the marginal OED case) needs to be evaluated s(1)s-(\ell-1) times. Once the matrices 𝚪η|v+𝐅~𝐂vv𝐅~\mathbf{\Gamma}_{\eta|v}+\widetilde{\mathbf{F}}\mathbf{C}_{vv}\widetilde{\mathbf{F}}^{\ast} and (𝐅𝐂vv+𝐂εv)(𝐂vv𝐅+𝐂vε)\left(\mathbf{F}\mathbf{C}_{vv}+\mathbf{C}_{\varepsilon v}\right)\left(\mathbf{C}_{vv}{\mathbf{F}}^{\ast}+\mathbf{C}_{v\varepsilon}\right) (resp. (𝐅m𝐂mm+𝐅ξ𝐂ξm+𝐂εm)(𝐂mm𝐅m+𝐂mξ𝐅ξ+𝐂mε)(\mathbf{F}_{m}\mathbf{C}_{mm}+\mathbf{F}_{\xi}\mathbf{C}_{\xi m}+\mathbf{C}_{\varepsilon m})(\mathbf{C}_{mm}\mathbf{F}^{\ast}_{m}+\mathbf{C}_{m\xi}\mathbf{F}^{\ast}_{\xi}+\mathbf{C}_{m\varepsilon})) are precomputed, each evaluation of the objective only requires solving a system with a matrix of dimension τ\ell\tau (where τ\tau denotes the number of times measurements are collected at each sensor) and summing up the diagonal entries. The computational bottleneck for most problems with PDE-dependent PTO maps will be the computation of the statistics ((14)-(16)) required for modeling the error using the BAE approach, since this requires solving qq PDEs with the accurate (costly) PTO map. We note that for our proposed zero map surrogate, precomputing the matrices in question is straightforward. In particular, 𝐐=[𝐖(𝚪𝐞+𝚪εε)𝐖T]1\mathbf{Q}=\left[\mathbf{W}\left(\mathbf{\Gamma_{e}}+\mathbf{\Gamma}_{\varepsilon\varepsilon}\right)\mathbf{W}^{T}\right]^{-1} and the equations for 𝐊(𝐰)\mathbf{K}(\mathbf{w}) and 𝐊m(𝐰)\mathbf{K}_{m}(\mathbf{w}) simplify:

𝐊(𝐰)\displaystyle\mathbf{K}(\mathbf{w}) =[𝐖(𝚪𝐞+𝚪εε)𝐖T]1𝐖𝐂εv𝐂vε𝐖T\displaystyle=\left[\mathbf{W}\left(\mathbf{\Gamma_{e}}+\mathbf{\Gamma}_{\varepsilon\varepsilon}\right)\mathbf{W}^{T}\right]^{-1}\mathbf{W}\mathbf{C}_{\varepsilon v}\mathbf{C}_{v\varepsilon}\mathbf{W}^{T}
𝐊m(𝐰)\displaystyle\mathbf{K}_{m}(\mathbf{w}) =[𝐖(𝚪𝐞+𝚪εε)𝐖T]1𝐖𝐂εm𝐂mε𝐖T.\displaystyle=\left[\mathbf{W}\left(\mathbf{\Gamma_{e}}+\mathbf{\Gamma}_{\varepsilon\varepsilon}\right)\mathbf{W}^{T}\right]^{-1}\mathbf{W}\mathbf{C}_{\varepsilon m}\mathbf{C}_{m\varepsilon}\mathbf{W}^{T}.

However, if 𝐅\mathbf{F} is defined via a discretized (linear) PDE, then some approximation to 𝐅\mathbf{F} may be required. There are various ways of constructing such approximations efficiently, e.g., using randomized matrix algorithms [25].

Algorithm 1 Algorithm for finding A-optimal uncertainty-aware greedy sensor placements

Input Target number of sensors kk, prior distribution μv\mu_{v}, mean 𝐞0\mathbf{e}_{0} and covariance 𝚪𝐞\mathbf{\Gamma_{e}} of the observational noise, accurate and (linear) surrogate parameter-to-observable maps 𝐆\mathbf{G}, 𝐅\mathbf{F}
    Output Optimal weight vector 𝐰\mathbf{w}^{*}

1:  Create the set of pairs {(v(),𝜺())}=1q\{(\textbf{v}^{(\ell)},\boldsymbol{\varepsilon}^{(\ell)})\}_{\ell=1}^{q} of parameter samples v()μv\textbf{v}^{(\ell)}\sim\mu_{v} and corresponding approximation error samples 𝜺()\boldsymbol{\varepsilon}^{(\ell)} using Equation 13
2:  Compute sample means 𝜺0,v0\boldsymbol{\varepsilon}_{0},\textbf{v}_{0} and (cross-)covariances 𝚪εε,𝐂𝜺v,𝐂v𝜺,𝐂vv\mathbf{\Gamma}_{\varepsilon\varepsilon},\mathbf{C}_{\boldsymbol{\varepsilon}v},\mathbf{C}_{v\boldsymbol{\varepsilon}},\mathbf{C}_{vv} using (14)-(16)
3:  Precompute 𝚪η|v+𝐅~𝐂vv𝐅~\mathbf{\Gamma}_{\eta|v}+\widetilde{\mathbf{F}}\mathbf{C}_{vv}\widetilde{\mathbf{F}}^{\ast} and (𝐅𝐂vv+𝐂εv)(𝐂vv𝐅+𝐂vε)\left(\mathbf{F}\mathbf{C}_{vv}+\mathbf{C}_{\varepsilon v}\right)\left(\mathbf{C}_{vv}{\mathbf{F}}^{\ast}+\mathbf{C}_{v\varepsilon}\right)                  {or (𝐅m𝐂mm+𝐅ξ𝐂ξm+𝐂εm)(𝐂mm𝐅m+𝐂mξ𝐅ξ+𝐂mε)(\mathbf{F}_{m}\mathbf{C}_{mm}+\mathbf{F}_{\xi}\mathbf{C}_{\xi m}+\mathbf{C}_{\varepsilon m})(\mathbf{C}_{mm}\mathbf{F}^{\ast}_{m}+\mathbf{C}_{m\xi}\mathbf{F}^{\ast}_{\xi}+\mathbf{C}_{m\varepsilon}) in marginal case}
4:  Initialize candidate and chosen index sets: chosen\mathcal{I}_{\text{chosen}}\leftarrow\varnothing, candidate{1,,s}\mathcal{I}_{\text{candidate}}\leftarrow\{1,\ldots,s\}
5:  for i=1ki=1\cdots k do
6:     𝐰(chosen)=1\mathbf{w}^{*}(\mathcal{I}_{\text{chosen}})=1
7:     Ii=argmaxjcandidatetrace[𝐊(𝐰+𝐞j)]I_{i}=\operatorname*{arg\,max}_{j\in\mathcal{I}_{\text{candidate}}}\operatorname*{trace}\big{[}\mathbf{K}(\mathbf{w}^{*}+\mathbf{e}_{j})\big{]} {𝐞js\mathbf{e}_{j}\in\mathbb{R}^{s} denotes the jj-th coordinate vector}                            {or trace[𝐊m(𝐰+𝐞j)]\operatorname*{trace}\big{[}\mathbf{K}_{m}(\mathbf{w}^{*}+\mathbf{e}_{j})\big{]} in marginal case}
8:     chosenchosenIi\mathcal{I}_{\text{chosen}}\leftarrow\mathcal{I}_{\text{chosen}}\cup I_{i}
9:     candidatecandidatechosen\mathcal{I}_{\text{candidate}}\leftarrow\mathcal{I}_{\text{candidate}}\setminus\mathcal{I}_{\text{chosen}}
10:  end for
11:  return  𝐰\mathbf{w}^{*}
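For concreteness, a compact Python sketch of Algorithm 1 with the zero-map surrogate and a single measurement time (τ = 1, so that 𝐖 simply selects rows) might look as follows; the inputs are just the sample statistics:

```python
import numpy as np

def greedy_aopt(Gamma_e, Gamma_ee, Cev, k):
    """Greedy A-optimal sensor selection (Algorithm 1, zero-map surrogate, tau = 1).

    Gamma_e, Gamma_ee : s x s noise and approximation-error covariances,
    Cev               : s x n sample cross-covariance C_{eps v},
    k                 : number of sensors to place.
    Greedily maximizes trace[(W(Gamma_e + Gamma_ee)W^T)^{-1} W C_ev C_ve W^T].
    """
    M = Gamma_e + Gamma_ee
    B = Cev @ Cev.T                              # C_ev C_ve, precomputed once
    chosen, candidates = [], set(range(M.shape[0]))
    for _ in range(k):
        def score(j):                            # objective with sensor j added
            S = chosen + [j]
            return np.trace(np.linalg.solve(M[np.ix_(S, S)], B[np.ix_(S, S)]))
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.discard(best)
    return chosen
```

Each evaluation solves a system whose size equals the number of sensors chosen so far, in line with the cost discussion above.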

In the next two sections we illustrate the application of our proposed linearize-then-optimize approach for computing optimal designs for two inverse problems: an idealized subsurface flow problem and a tsunami detection problem.

4 Numerical example 1: Idealized subsurface flow

The first numerical example we consider is an idealized subsurface flow problem. Specifically, we consider the inverse problem of estimating the (log-)boundary flux mm and the (log-)permeability field ξ\xi in a two-dimensional subsurface aquifer from noisy (point-wise) measurements of the (head) pressure uu. The boundary flux and permeability field are related to the pressure by the steady state Darcy flow equation within the aquifer:

(25) (exp(ξ)u)\displaystyle\nabla\cdot(\exp{(\xi)}\nabla u) =0\displaystyle=0\quad in Ω,\displaystyle\text{in }\Omega,
(exp(ξ)u)𝒏\displaystyle(\exp{(\xi)}\nabla u)\cdot\boldsymbol{n} =exp(m)\displaystyle=\exp{(m)}\quad on ΓT,\displaystyle\text{on }\Gamma_{\rm T},
(exp(ξ)u)𝒏\displaystyle(\exp{(\xi)}\nabla u)\cdot\boldsymbol{n} =1\displaystyle=-1\quad on ΓB,\displaystyle\text{on }\Gamma_{\rm B},
u\displaystyle u =0\displaystyle=0\quad on ΓL,\displaystyle\text{on }\Gamma_{\rm L},
(exp(ξ)u)𝒏\displaystyle(\exp{(\xi)}\nabla u)\cdot\boldsymbol{n} =0\displaystyle=0\quad on ΓR,\displaystyle\text{on }\Gamma_{\rm R},

with 𝒏\boldsymbol{n} denoting the outward normal vector. For a more in-depth discussion on the physical interpretation of the problem, we refer to  [11, 10, 12]. Similar problems are commonly considered as benchmark tests [42, 31, 4].

In the current work we employ a level set approach [21, 32] for the parametrization of the permeability ξ\xi. Specifically, we take

(26) ξ=Φ(ψ)=i=1L𝗓iχDi(ψ),\displaystyle\xi=\Phi(\psi)=\sum_{i=1}^{L}\mathsf{z}_{i}\chi_{D_{i}(\psi)},

where χ\chi is the indicator function, Di(ψ)={xΩ|i1ψ(x)i}D_{i}(\psi)=\{x\in\Omega\,|\,\ell_{i-1}\leq\psi(x)\leq\ell_{i}\}, and 𝗓i\mathsf{z}_{i}\in\mathbb{R} for i=1,2,,Li=1,2,\dots,L are the discrete (log\log-)permeability values. Thus Φ\Phi maps the so-called level set field ψ\psi to the permeability ξ\xi. We emphasize that although the permeability is the parameter of interest, inference and the associated optimal experimental design problem are carried out for the level set field ψ\psi.
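In discrete form, the level set map (26) amounts to binning the nodal values of ψ against the thresholds; a minimal sketch (the thresholds and permeability values below are illustrative):

```python
import numpy as np

def level_set_map(psi, levels, z):
    """Phi(psi): assign value z_i on D_i = {x : l_{i-1} <= psi(x) <= l_i}, cf. (26)."""
    bins = np.digitize(psi, levels)     # interval index for each nodal value of psi
    return np.asarray(z, dtype=float)[bins]

psi = np.array([-0.7, 0.2, 1.5])        # nodal values of the level set field
xi = level_set_map(psi, levels=[0.0], z=[0.0, 1.0])   # -> [0., 1., 1.]
```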

4.1 Problem set up

We consider the problem in the unit square, Ω=(0,1)×(0,1)\Omega=(0,1)\times(0,1), and set ΓT\Gamma_{\rm T}, ΓB\Gamma_{\rm B}, ΓL\Gamma_{\rm L} and ΓR\Gamma_{\rm R} as the top, bottom, left, and right boundaries, respectively, i.e., ΓT:=(0,1)×{1}\Gamma_{\rm T}:=(0,1)\times\{1\}, ΓB:=(0,1)×{0}\Gamma_{\rm B}:=(0,1)\times\{0\}, ΓL:={0}×(0,1)\Gamma_{\rm L}:=\{0\}\times(0,1), and ΓR:={1}×(0,1)\Gamma_{\rm R}:=\{1\}\times(0,1). A Galerkin finite element method is used to solve the forward problem (25). More specifically, the domain is discretized into 3042 triangular elements, with 1600 continuous piecewise linear basis functions. Thus the discretized pressure uu and permeability both have dimension 1600, while the unknown log-boundary flux is 40-dimensional. We impose a single level set threshold at =0\ell=0 on the level set field, with the permeability values (see (26)) 𝗓1=0\mathsf{z}_{1}=0 when ψ0\psi\leq 0, and 𝗓2=1\mathsf{z}_{2}=1 when ψ>0\psi>0. Finally, we take a total of 1600 candidate sensor locations, distributed in an equally spaced 40-by-40 grid-like array (see Figure 5). In Figure 1 we show the true underlying level set field ψtrue\psi_{\rm true}, the resulting true permeability ξtrue=Φ(ψtrue)\xi_{\rm true}=\Phi(\psi_{\rm true}), the true boundary flux mtruem_{\rm true}, and the resulting true pressure utrueu_{\rm true}.

Figure 1: The true underlying level set field ψtrue\psi_{\rm true} (far left), the induced true (log\log-) permeability ξtrue=Φ(ψtrue)\xi_{\rm true}=\Phi(\psi_{\rm true}) (center left), the true (log\log-) boundary flux mtruem_{\rm true} through ΓT\Gamma_{\rm T} (center right), and the resulting pressure utrueu_{\rm true} (far right).

Prior models

We assume the parameters mm and ξ\xi are a priori independent, i.e., μm,ξ=μmμξ\mu_{m,\xi}=\mu_{m}\mu_{\xi}. Furthermore, the (marginal) prior measure for the permeability is the push-forward of the prior measure of the level set field [21, 22, 32], i.e., μξ=Φμψ\mu_{\xi}=\Phi_{\sharp}\mu_{\psi}. As such, the joint prior used here is of the form μm,ψ=μmμψ\mu_{m,\psi}=\mu_{m}\mu_{\psi}.

We postulate Gaussian prior measures for both the top boundary flux and the underlying level set field of the permeability, i.e., μm=𝒩(m0,𝒞mm)\mu_{m}=\mathcal{N}(m_{0},\mathcal{C}_{mm}) and μψ=𝒩(ψ0,𝒞ψψ)\mu_{\psi}=\mathcal{N}(\psi_{0},\mathcal{C}_{\psi\psi}). The covariance operators are defined by

(27) 𝒞mm\displaystyle\mathcal{C}_{mm} =𝒜2,with 𝒜=c2c3d2dx2,\displaystyle=\mathcal{A}^{-2},\quad\text{with }\mathcal{A}=c_{2}-c_{3}\frac{d^{2}}{dx^{2}},
(28) 𝒞ψψv(x)\displaystyle\mathcal{C}_{\psi\psi}v(x) =Ωc(x,y)v(y)𝑑ywith c(x,y)=exp(xy22c12),\displaystyle=\int_{\Omega}c(x,y)v(y)\,dy\quad\text{with }c(x,y)=\exp{\left(-\frac{\left\|x-y\right\|_{\mathbb{R}^{2}}^{2}}{c_{1}^{2}}\right)},

for v ∈ L²(Ω), with the domain of 𝒜 given by 𝒟(𝒜) := H²_0(Γ_T). The parameter c_1 controls the characteristic length scale of samples of ψ, while c_2 and c_3 control the correlation length and variance of samples of m. Such covariance operators are fairly standard for subsurface flow Bayesian inverse problems [32, 30, 4]. In the current example we set c_1 = 1/8, c_2 = 2, and c_3 = 8×10^{-2}. In Figure 2 we show several prior samples of m and ψ, along with the corresponding samples of the permeability ξ.
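The prior sampling used throughout this section can be sketched numerically. The following is a minimal illustration (not the paper's implementation) of assembling the squared-exponential covariance (28) on a coarse grid of points and drawing Gaussian samples via a Cholesky factor; the grid resolution, jitter level, and sample count are illustrative assumptions.

```python
import numpy as np

def squared_exp_cov(points, c1=1.0 / 8.0):
    """Squared-exponential covariance c(x,y) = exp(-||x-y||^2 / c1^2) on a point set."""
    diff = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / c1**2)

def sample_gaussian(mean, cov, n_samples, rng, jitter=1e-8):
    """Draw samples from N(mean, cov) using a jittered Cholesky factor."""
    L = np.linalg.cholesky(cov + jitter * np.eye(cov.shape[0]))
    return mean[None, :] + rng.standard_normal((n_samples, cov.shape[0])) @ L.T

rng = np.random.default_rng(0)
# Coarse grid on the unit square, standing in for the finite element nodes.
x = np.linspace(0.0, 1.0, 10)
X, Y = np.meshgrid(x, x)
pts = np.column_stack([X.ravel(), Y.ravel()])
C = squared_exp_cov(pts)
psi_samples = sample_gaussian(np.zeros(len(pts)), C, n_samples=5, rng=rng)
```

The small jitter on the diagonal guards against the numerical rank deficiency typical of squared-exponential Gram matrices.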

Figure 2: Samples from the prior. The top row shows samples of ψ and the bottom row the corresponding samples of ξ. Samples of m are visualized in the right-most column, where the grey-scale represents the prior density and the black dotted lines the plus/minus one (marginal) standard deviation.

Optimal experimental design

We consider several cases of OED for this numerical example, each carried out using the proposed linearize-then-optimize procedure. In particular, we consider joint OED (of both m and ξ), and then marginal OED for m. For the marginal case we also compare the results obtained when the uncertainty in ξ is neglected. Thus, for the idealized subsurface flow example, we consider the following three OED problems:

  1. Joint OED for m and ξ,

  2. Marginal OED for m while accounting for the uncertainty in ξ, and

  3. Marginal OED for m while ignoring the uncertainty in ξ.

Linearization

In all cases we use the zero operator 𝖮 ∈ ℒ(𝒟(𝒜)×L²(Ω), ℝ^d), defined by 𝖮v = 0 for all v ∈ 𝒟(𝒜)×L²(Ω), as the approximate linear forward model. The zero model is particularly straightforward to implement, completely non-intrusive (i.e., well-suited to black-box models), computationally cheap, and avoids the differentiability issues associated with the level set parameterization. More specifically, we generate q = 10,000 samples from the joint prior, i.e., (m^(ℓ), ψ^(ℓ)) ∼ μ_{m,ψ} for ℓ = 1, 2, …, 10,000, and compute the corresponding (noiseless) model predictions 𝐝^(ℓ). Subsequently, all required quantities are computed using Equations (14)–(16).
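Since Equations (14)–(16) are not reproduced here, the following sketch assumes they reduce, for the zero surrogate, to standard sample moments of the parameter/data pairs: with 𝖮 = 0 the approximation error on each sample is simply the noiseless prediction itself. The linear map `A` below is a toy stand-in for the expensive forward model, used only to exercise the moment computation.

```python
import numpy as np

def bae_sample_statistics(V, D):
    """Sample mean and joint covariance blocks of parameter/data pairs.

    With the zero map as surrogate, the approximation error on each sample
    is the (noiseless) model prediction itself, so the required BAE
    statistics reduce to sample moments.
    V : (q, n) array of parameter samples; D : (q, d) array of predictions.
    """
    q = V.shape[0]
    mu_v, mu_d = V.mean(axis=0), D.mean(axis=0)
    Vc, Dc = V - mu_v, D - mu_d
    C_vv = Vc.T @ Vc / (q - 1)
    C_vd = Vc.T @ Dc / (q - 1)  # cross-covariance: parameters vs. predictions
    C_dd = Dc.T @ Dc / (q - 1)
    return mu_v, mu_d, C_vv, C_vd, C_dd

rng = np.random.default_rng(1)
V = rng.standard_normal((200, 3))
A = rng.standard_normal((3, 4))   # toy stand-in for the forward model
D = V @ A                         # "predictions" for each parameter sample
mu_v, mu_d, C_vv, C_vd, C_dd = bae_sample_statistics(V, D)
```

For an exactly linear stand-in model the identities C_vd = C_vv A and C_dd = Aᵀ C_vv A hold exactly at the sample level, which provides a convenient sanity check.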

4.2 Reference case

Before considering the OED problem we solve the Bayesian inverse problem based on measurements collected at all candidate sensor locations. Specifically, we compute the approximate joint posterior for m and ξ based on the linear(-ized) BAE approach outlined in Section 2.3, and the "true" joint posterior with the full nonlinear PTO map using Markov chain Monte Carlo (MCMC) with the preconditioned Crank–Nicolson (pCN) algorithm [14, 16]. This reference case serves to provide a best-case scenario in terms of uncertainty reduction, while also giving some insight into the accuracy of the approximate conditional mean (m_{0|d}, ξ_{0|d}) and covariance 𝒞_{m,ξ|d} (see (10) and (11), respectively) relative to those computed using MCMC. For the MCMC approach, we use a single chain of four million samples and discard the first four hundred thousand as burn-in. The results of the reference case, including those found using the linear BAE-based approach, are shown in Figure 3 for m and in Figure 4 for ξ (and ψ).

We generally observe that the (approximate) posterior computed using the linear BAE approach has higher levels of uncertainty (in m, ξ, and ψ) than the respective posteriors computed using MCMC. This is expected, however, as a fundamental feature of the BAE framework is the incorporation of model errors/uncertainties (induced here by the use of the linear(-ized) surrogate model), which generally increases posterior uncertainty in the parameters. To investigate the posterior uncertainty of ξ we compute the (sample-based) posterior covariance from 10,000 posterior samples of ψ (one batch from the MCMC-based posterior and one batch from the linear BAE-based posterior), mapping each sample through ξ = Φ(ψ). It is worth noting that, due to the nature of the relationship between ψ and ξ via Φ, the posterior uncertainty of ξ found using the linear BAE approach is dependent on the data. This is evident in the increased uncertainty towards the interfaces of the conditional mean of ξ in Figure 4.
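The push-forward step described above can be sketched as follows, assuming the binary level-set map Φ of (26) with values z_1 = 0 and z_2 = 1; the toy batch of posterior samples is illustrative. For a {0,1}-valued field the pointwise standard deviation is largest where samples disagree about the sign of ψ, consistent with the elevated uncertainty observed near material interfaces.

```python
import numpy as np

def level_set_map(psi, threshold=0.0, z1=0.0, z2=1.0):
    """Binary level-set map Phi: permeability is z1 where psi <= threshold, z2 above."""
    return np.where(psi > threshold, z2, z1)

def pushforward_moments(psi_samples):
    """Pointwise sample mean and std of xi = Phi(psi) from posterior samples of psi."""
    xi = level_set_map(psi_samples)
    return xi.mean(axis=0), xi.std(axis=0)

# Toy batch: two posterior samples of psi at three spatial points.  The third
# point changes sign across samples, so its pushed-forward std is nonzero.
psi_samples = np.array([[1.0, -1.0, 1.0],
                        [2.0, -2.0, -1.0]])
xi_mean, xi_std = pushforward_moments(psi_samples)
```
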

Figure 3: Comparison of the marginal distributions of m for the subsurface flow problem. On the left we show samples of m from the (approximate) posterior found using the linear(-ized) approach, while on the right we show samples of m from the posterior found using pCN MCMC. In both figures the grey-scale represents the respective posterior density and the black dotted lines the respective posterior plus/minus one (marginal) standard deviation.
Figure 4: Comparison of the marginal distributions of ψ and ξ for the subsurface flow problem. The top row shows, from left to right, the marginal posterior standard deviation of ψ, the conditional mean estimate for ψ, the push-forward of the conditional mean of ψ, i.e., Φ(ψ_{∗|𝐝}), and the marginal posterior standard deviation of ξ, all computed using the linear(-ized) approach; the bottom row shows the respective quantities computed using pCN MCMC.

Our main focus in this work is on reductions and measures of posterior uncertainty rather than on the posterior estimates themselves. However, as expected, the conditional mean estimates found using the MCMC-based approach more closely resemble the true parameters than those found using the linear BAE-based approach (see Figure 4 and Figure 3).

4.3 Optimal experimental design results

We now consider the problem of optimal sensor selection using the proposed linearize-then-optimize OED procedure. Specifically, we compute the optimal 20 sensor locations at which to measure the pressure u for each of the three problems listed above.

To illustrate the effectiveness of the proposed approach to OED, we compare the optimal design to randomly chosen designs. Specifically, for a given number of sensors k ∈ {1, 2, …, 20} we sample (without replacement) 100 random sensor configurations. The design criterion (the trace of the relevant joint or marginal (approximate) posterior covariance operator) is then evaluated for each random configuration. It should be noted that the optimal designs computed using the linear surrogate may not be optimal for the true posterior, even when model uncertainty is accounted for. To this end, for the joint OED problem (Problem 1) we also investigate the effectiveness of our computed designs in minimizing 𝔼_{𝐝|𝐰}[trace(𝒞_{v|d,w}(𝐰,𝐝))], where 𝒞_{v|d,w} is the "true" posterior covariance operator obtained using the model (2) with the accurate PTO map 𝒢.
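Evaluating the design criterion for a candidate sensor set can be sketched via Gaussian conditioning: restrict the (BAE-corrected) joint covariance of parameters and data to the selected sensors and take the trace of the conditioned covariance. The covariance matrices and noise variance below are toy stand-ins, not quantities from the paper.

```python
import numpy as np

def a_opt_criterion(C_vv, C_vd, C_dd, sensors, noise_var):
    """Trace of the approximate posterior covariance for a candidate sensor set.

    Conditioning the joint Gaussian of (parameters, data) on the selected
    measurements w gives
        C_post = C_vv - C_vd[:, w] (C_dd[w, w] + noise_var I)^{-1} C_vd[:, w]^T.
    """
    w = np.asarray(sensors, dtype=int)
    S = C_dd[np.ix_(w, w)] + noise_var * np.eye(len(w))
    K = C_vd[:, w]
    return np.trace(C_vv - K @ np.linalg.solve(S, K.T))

# Toy joint covariance: v ~ N(0, I_3), data d = A^T v + independent unit noise.
A = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 0.2]])
C_vv = np.eye(3)
C_vd = A
C_dd = A.T @ A + np.eye(2)
t1 = a_opt_criterion(C_vv, C_vd, C_dd, [0], noise_var=0.5)
t2 = a_opt_criterion(C_vv, C_vd, C_dd, [0, 1], noise_var=0.5)
```

Adding measurements can only shrink the posterior trace, so in this toy setting t2 < t1 < trace(C_vv).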

The results for Problem 1 are shown in Figure 5, which presents the optimal design found using the proposed linearize-then-optimize approach as well as a comparison of this optimal design to randomly selected designs. We see that the design found using our proposed approach outperforms the random designs, and this becomes more pronounced as the number of sensors increases up to 20 (if the number of sensors were to grow toward the total possible number of sensors, the difference would diminish). This appears to demonstrate the usefulness of the proposed method for OED. As alluded to above, to further investigate the applicability of our approach, for Problem 1 we also compare the performance of the designs found using the linearized approach to random designs using a sample average approximation (SAA) of 𝔼_{𝐝|𝐰}[trace(𝒞_{v|d,w}(𝐰,𝐝))]. Specifically, 10 different samples of (m, ξ) from the joint prior are used to generate 10 sets of data at all measurement locations. For each sample we then evaluate the trace of the accurate posterior covariance operator 𝒞_{v|d,w}(𝐰,𝐝) with a subset of data collected at the optimal design computed using the linearize-then-optimize approach, as well as at 10 different (randomly selected) designs, all consisting of 20 sensors. Each of the (110 in total) accurate posterior covariances is computed using the pCN MCMC algorithm with a single chain of four hundred thousand samples, the first one hundred thousand of which are discarded as burn-in. Note that in this case the optimal design significantly outperforms the randomly chosen designs.

Figure 5: Optimal experimental design results for the subsurface flow problem treating m and ξ as inversion parameters. On the left we show the optimal sensor placements found using the linearize-then-optimize approach. In the center we compare the trace of the uncertainty-aware approximate posterior covariance operator found using the optimal design (green stars) to 100 randomly chosen designs (black boxplot). On the right we compare the trace of the posterior covariance operator for 20 sensors using the linearization and pCN MCMC for the optimal sensor placements (green and magenta stars, respectively) and for randomly chosen designs (black and blue boxplots, respectively), as well as the trace of the prior covariance operator (black dotted line).

We next investigate the performance of the proposed approach on the marginal OED problem, i.e., carrying out OED for m while treating ξ as an auxiliary parameter. As a comparison, we also consider the uncertainty-unaware formulation of the marginal OED problem, i.e., ignoring the additional uncertainty due to the unknown auxiliary parameter. The results for the marginal OED problem are shown in Figure 6. There is a significant difference between the optimal designs found using the uncertainty-aware and uncertainty-unaware approaches. This can most likely be attributed to the large uncertainty in the approximation errors resulting from marginalizing over ξ (which has significantly larger uncertainty than m). While the optimal design found using the proposed uncertainty-aware approach is reasonably distributed throughout the domain, the optimal design found using the uncertainty-unaware approach is localized near the top boundary. This is to be expected: the boundary flux m is defined over the top boundary Γ_T only, thus when ignoring the additional uncertainty induced by ξ it is natural to measure near or on Γ_T to reduce the uncertainty in m. When comparing to random designs, we see that both the uncertainty-aware and uncertainty-unaware designs perform well (using the uncertainty-aware formulation of the posterior); however, the uncertainty-aware design outperforms the corresponding uncertainty-unaware design in all cases considered.

Figure 6: Marginal optimal experimental design results for the subsurface flow problem treating only the boundary flux m as the inversion parameter. On the left are the optimal sensor locations found using the uncertainty-aware approach, while in the center are those found using the uncertainty-unaware approach. On the right we compare the trace of the uncertainty-aware posterior covariance operator evaluated at the uncertainty-aware (green stars) and uncertainty-unaware (purple circles) optimal sensor locations to 100 randomly chosen designs (black boxplot), together with the trace of the (marginal) prior covariance operator (black dotted line).

5 Numerical example 2: tsunami detection problem

In our second example, we aim to find optimal designs for tsunami source reconstruction in the deep ocean. Propagation of earthquake-induced tsunami waves in a two-dimensional spatial domain Ω ⊂ ℝ² is commonly modeled using the shallow water equations (SWEs) [40, 54]. The SWEs are a nonlinear hyperbolic system of depth-averaged conservation laws used to model gravity waves, and are well-suited to simulating tsunami waves due to their characteristically long wavelengths relative to the water depth. For tsunamis arising from an instantaneous change to the ocean floor, i.e., a bathymetry change, the shallow water equations describing the evolving water depth h(x,y,t) (defined as the height of the water above the ocean floor) and the fluid flows in the x and y directions (u(x,y,t) and w(x,y,t), respectively) at any point (x,y,t) ∈ Ω×(0,T], are

h_t + u_x + w_y = 0,
u_t + (u²/h + (1/2)g h²)_x + (uw/h)_y = −g h B_x,
w_t + (uw/h)_x + (w²/h + (1/2)g h²)_y = −g h B_y,

where B(x,y) ∈ ℬ, with B(x,y) < 0, is the post-earthquake bathymetry and g is the gravitational acceleration. It is assumed that the ocean is initially at rest, i.e., h(x,y,0) = −B_R(x,y) and h_t(x,y,0) = 0, where B_R(x,y) is the pre-earthquake bathymetry.

5.1 Modeling the bathymetry change using the Okada model

Our target examples are tsunamis caused by suboceanic earthquakes. The Okada model [50] is commonly used to model the relationship between slips at fault plates beneath the ocean floor and the resulting seafloor deformations, or bathymetry changes. Given a discretization of the fault region into a finite number of patches, the Okada model assumes the Earth behaves like a linear elastic material and provides a closed-form expression for the instantaneous bathymetry change induced by slips at these fault patches in a prescribed rake, or direction.

Given a vector of slip magnitudes 𝐦 = [m_1, …, m_N] ∈ M and rakes 𝝃 = [ξ_1, …, ξ_N] ∈ X at N ∈ ℕ fault patches, the post-earthquake bathymetry B(x,y) can be defined as

(29) B(x,y) = B_R(x,y) + (𝒪𝐡(𝐦,𝝃))(x,y),

where the linear operator 𝒪 : ℝ^{2N} → ℋ is defined by

(30) (𝒪𝐡(𝐦,𝝃))(x,y) = ∑_{i=1}^{N} 𝒪_i [m_i sin ξ_i ; m_i cos ξ_i],

with the functions 𝒪_i = [𝒪_i^s, 𝒪_i^c] defining the seafloor deformation induced by a slip at patch i.
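A minimal sketch of evaluating (30), assuming the per-patch deformation fields O_i^s and O_i^c have been precomputed on a grid (the constant fields below are placeholders, not actual Okada solutions): the bathymetry change is a linear combination of these fields with weights m_i sin ξ_i and m_i cos ξ_i, so the map is linear in those weights but nonlinear in (m, ξ) jointly.

```python
import numpy as np

def bathymetry_change(m, xi, Os, Oc):
    """Evaluate the superposition (30): per-patch deformation fields weighted
    by m_i*sin(xi_i) and m_i*cos(xi_i).

    m, xi : (N,) slip magnitudes and rake angles (radians)
    Os, Oc : (N, ny, nx) precomputed deformation fields per patch (placeholders here)
    """
    return (np.tensordot(m * np.sin(xi), Os, axes=1)
            + np.tensordot(m * np.cos(xi), Oc, axes=1))

# Two hypothetical patches with constant placeholder deformation fields.
Os = np.stack([np.full((4, 4), 1.0), np.full((4, 4), 3.0)])
Oc = np.zeros_like(Os)
m = np.array([1.0, 2.0])
xi = np.array([np.pi / 2, np.pi / 2])  # pure thrust: rake of 90 degrees
dB = bathymetry_change(m, xi, Os, Oc)
```

With a 90° rake the cosine weights vanish, so the change reduces to a magnitude-weighted sum of the dip-slip fields.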

The Okada model makes various simplifying assumptions about the physical properties of the Earth and the mechanisms of the deformation (e.g., it assumes the Earth is a homogeneous, isotropic, elastic material and that the rupture occurs instantaneously), and thus only provides an approximation to the true bathymetry change induced by a slip at a fault. However, it is generally assumed to provide adequate approximations for the purposes of tsunami modeling and is often used in the literature. Additionally, the scarcity of the observed data (due to financial constraints limiting the number of deployed deep-ocean pressure sensors) makes detailed reconstruction of the infinite-dimensional ocean deformations difficult, particularly without the use of a physically relevant prior. Parametrizing the seafloor deformation through the Okada model facilitates the use of a prior on the slip patches that results in physically realistic seafloor deformations, as discussed in Section 5.2.

5.2 Bayesian inversion using a linearization of the SWEs

The typical goal for tsunami hazard assessment and tsunami warning systems is to estimate the tsunami-causing seafloor rupture and use the estimate for predictions and threat assessment. Time is crucial for these predictions, and using the nonlinear shallow water equations can be costly. However, away from shore in the deep ocean, the linearized SWEs (centered around the ocean at rest) provide a reasonably accurate approximation to the dynamics of propagating tsunami waves [40]. This motivates the use of an affine surrogate to the PTO map obtained through a first-order Taylor expansion centered around the bathymetry of the ocean at rest, i.e.,

(31) 𝐝𝒢(mR,𝝃R)+𝒢m|mR,𝝃R(mmR)+𝒢𝝃|mR,𝝃R(𝝃𝝃R)=𝒢(vR)+𝖣v𝒢(vR)[m^𝝃^],\mathbf{d}\approx\mathcal{G}(\textbf{m}_{R},\boldsymbol{\xi}_{R})+\mathcal{G}_{\textbf{m}}^{\prime}|_{\textbf{m}_{R},\boldsymbol{\xi}_{R}}(\textbf{m}-\textbf{m}_{R})+\mathcal{G}_{\boldsymbol{\xi}}^{\prime}|_{\textbf{m}_{R},\boldsymbol{\xi}_{R}}(\boldsymbol{\xi}-\boldsymbol{\xi}_{R})=\mathcal{G}(\textbf{v}_{R})+\mathsf{D}_{\textbf{v}}\mathcal{G}(\textbf{v}_{R})\begin{bmatrix}\hat{\textbf{m}}\\ \hat{\boldsymbol{\xi}}\end{bmatrix},

with 𝐦_R = [0, …, 0], 𝝃_R = [π/2, …, π/2], and 𝒢'_𝐦|_{𝐦_R,𝝃_R}, 𝒢'_𝝃|_{𝐦_R,𝝃_R} denoting the derivatives of 𝒢 (with respect to 𝐦 and 𝝃, respectively) evaluated at the fixed parameters 𝐦_R and 𝝃_R. The linearized PTO map D_𝐯𝒢(𝐯_R), mapping the slip magnitude perturbations 𝐦̂ ∈ M and rake vectors 𝝃̂ ∈ X to d incremental ocean-depth observations, is obtained through: (1) solution of the hyperbolic system

(32) [h^u^w^]t+[010gBR00000][h^u^w^]x+[001000gBR00][h^u^w^]y=[0gBR(𝒪sm^)xgBR(𝒪sm^)y],\begin{bmatrix}\hat{h}\\ \hat{u}\\ \hat{w}\end{bmatrix}_{t}+\begin{bmatrix}0&1&0\\ -gB_{R}&0&0\\ 0&0&0\end{bmatrix}\begin{bmatrix}\hat{h}\\ \hat{u}\\ \hat{w}\end{bmatrix}_{x}+\begin{bmatrix}0&0&1\\ 0&0&0\\ -gB_{R}&0&0\end{bmatrix}\begin{bmatrix}\hat{h}\\ \hat{u}\\ \hat{w}\end{bmatrix}_{y}=\begin{bmatrix}0\\ gB_{R}\left(\mathcal{O}^{s}\hat{\textbf{m}}\right)_{x}\\ gB_{R}\left(\mathcal{O}^{s}\hat{\textbf{m}}\right)_{y}\end{bmatrix},

with zero initial conditions for ĥ, û, and ŵ, and (2) application of an observation operator mapping the incremental state ĥ to the spatio-temporal observations of the water depth. We note that the linearization, when centered at the rest bathymetry, is invariant to changes in the rake vector, i.e., 𝒢'_𝝃|_{𝐯_R}𝝃̂ ≡ 0. This means that simultaneous inversion for both the magnitude and rake is not possible using this linear model alone.

Our primary focus is on tsunamis caused by large-magnitude earthquakes. In particular, we focus on earthquakes with a magnitude class between 8 and 9. To ensure our designs perform well for detecting such earthquakes, we choose a Gaussian prior on the slip magnitudes (𝐦 ∼ 𝒩(𝐦_0, 𝚪_𝐦)) following the procedure outlined in [41, Sections 2 and 5]. Regardless of the earthquake magnitude, the direction of slip for thrust earthquakes is typically around 90°, therefore we assume a priori that ξ_i ∼ 𝒩(90, 10) for each slip patch i = 1, …, N. Some sample slip magnitudes (as well as the corresponding bathymetry changes) obtained from this tailored prior are visualized in Figure 7.

Figure 7: Sample magnitudes and rakes from the prior distribution (as described in Section 5.2) are visualized in the top row at the slip patches used to parameterize the tsunami-inducing bathymetry change. The bottom row visualizes the corresponding induced bathymetry change due to each sample slip.

5.3 Computational setup

The bathymetry data used for the simulations, visualized in Figure 8, is from [44] and can be found in the corresponding repository [39]. Since the linear approximation to the shallow water equations degrades in accuracy near shore, we limit our computational domain to a rectangular region offshore, as can be seen in the right image in Figure 8. We assume the tsunami originates from a vertical deformation of the seafloor in this region. To simulate the bathymetry change, a potential fault region is discretized into 44 slip patches 25 km in length and width, each with a dip of 14° and strike of 193° (see Figure 8). The slip patches chosen for these numerical experiments are a slightly modified subset of the patches used in [24]. Specifically, to suit our needs, each original patch was split into four equally-sized patches and the depths were adjusted accordingly.

We model our ocean-floor sensors on a simplified Deep-ocean Assessment and Reporting of Tsunamis (DART) II system, which consists of a pressure sensor tethered to the ocean floor and a sea-surface companion buoy equipped with satellite telecommunications capability [46]. For simplicity, we assume the sensors can measure the height of the water column above them directly. We specify 214 locations for possible data collection, assuming the sensors can only be placed at depths between 1 and 6 kilometers. A visualization of these possible locations is shown on the right in Figure 8. Since the water amplitude can be resolved with higher accuracy away from the region of maximum deformation (see, e.g., [46]), we impose uncorrelated Gaussian measurement noise with zero mean and standard deviation 0.005 meters for data collected at sensors situated away from the fault, and 0.08 meters for sensors close to the fault.

Figure 8: The left figure, included for spatial reference, depicts the topographic data of the full domain. The solid grey lines indicate the boundaries of the slip patches obtained from [24], which we have split into quarters. The boundaries of the patches used in our simulations are depicted by solid red lines. The right image shows a zoomed-in view of our computational domain with the numbering used for the patches. The white dots show potential locations for sensor placement.

To simulate the "event mode" of the DART sensors, we assume each sensor produces measurements at 30-second intervals for the first four minutes (starting 5 seconds after the seafloor rupture), resulting in 9 depth readings for each sensor. For simplicity, we assume the 30-second-interval depth readings are averages over a two-second window, i.e., for τ_i ∈ {5, 35, …, 245}, h(τ_i) ≈ (1/2)∫_{τ_i−1}^{τ_i+1} h(t) dt, which we approximate with the trapezoidal rule.
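The averaging step can be sketched directly: with samples at τ_i − 1, τ_i, and τ_i + 1, the trapezoidal rule on the two one-second subintervals gives (1/2)∫ h dt ≈ (h(τ_i−1) + 2h(τ_i) + h(τ_i+1))/4. The stand-in signals below are illustrative; a linear signal makes the quadrature exact.

```python
def averaged_reading(h, tau):
    """Two-second averaged depth reading (1/2) * int_{tau-1}^{tau+1} h(t) dt,
    approximated by the trapezoidal rule on the samples h(tau-1), h(tau), h(tau+1)."""
    s0, s1, s2 = h(tau - 1.0), h(tau), h(tau + 1.0)
    # Trapezoid on each one-second subinterval: integral ~ (s0 + 2*s1 + s2)/2,
    # then halve per the (1/2) prefactor.
    return (s0 + 2.0 * s1 + s2) / 4.0

# Nine readings per sensor at tau in {5, 35, ..., 245}, as in the event-mode setup.
readings = [averaged_reading(lambda t: t, tau) for tau in range(5, 246, 30)]
```
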

5.4 Results

In this section, we compare the optimal placement of sensors using the uncertainty-aware and uncertainty-unaware formulations of the OED problem. The uncertainty-aware designs are obtained by treating both the magnitudes 𝐦 and directions 𝝃 at each fault patch as parameters of interest and accounting for the uncertainty due to the use of the linear surrogate map. All forward simulations in this section were performed using the Conservation Laws Package (Clawpack [15, 45]) and the GeoClaw toolbox [9].

As in Section 4, we again choose the zero operator for obtaining the uncertainty-aware designs. For the uncertainty-unaware designs, the linearized SWEs (32) are used to approximate the propagating tsunami and the model error is ignored. For both cases, we compute the optimal placement of the sensors using the greedy approach outlined in Section 3.1, with 500 samples used to approximate the BAE statistics for the uncertainty-aware model. The uncertainty-aware and uncertainty-unaware optimal placements of 20 sensors are visualized in Figure 9.
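The greedy approach of Section 3.1 is not reproduced here; the following generic sketch captures the pattern: at each step, add the candidate sensor that minimizes the design criterion of the augmented set. The dictionary-based criterion is a toy stand-in for the trace of the approximate posterior covariance.

```python
def greedy_design(candidates, criterion, k):
    """Greedy sensor selection: repeatedly add the candidate that yields the
    smallest design criterion for the augmented sensor set."""
    chosen, remaining = [], list(candidates)
    for _ in range(k):
        best = min(remaining, key=lambda s: criterion(chosen + [s]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy criterion: (hypothetical) sensor "information" weights; more informative
# sensors reduce the criterion more.
weights = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.7}
design = greedy_design(range(4), lambda sel: -sum(weights[s] for s in sel), k=2)
```

In this toy setting the greedy loop picks the two most informative sensors in decreasing order of weight.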

Figure 9: Uncertainty-aware sensor placements for the tsunami model problem (left column), and the uncertainty-unaware optimal design (right column). In all the plots, the chosen sensors are numbered according to the order in which they were chosen using the greedy approach.

To illustrate the effectiveness of uncertainty-aware designs compared to uncertainty-unaware designs, in Figure 10 we compare the posterior mean and standard deviation obtained using both designs with 13 sensors and the uncertainty-aware formulation of the Bayesian inverse problem. In all cases, noisy data was simulated with a randomly chosen "true" parameter vector [𝐦_true, 𝝃_true] and the full nonlinear SWEs. The designs optimized using the linearized SWEs without accounting for uncertainty lead to lower posterior uncertainty in the slip magnitudes than the uncertainty-aware designs. This is not surprising: uncertainty in the rakes cannot be reduced using (32), so sensor placements are chosen to optimally infer the slip magnitudes. Consequently, the uncertainty in the slip rakes is higher, and the overall reconstruction of the fault (and the resulting bathymetry change) is worse when using the uncertainty-unaware designs. This observation is strengthened by the relative distance (measured in the Euclidean norm) from the truth to the posterior means: approximately 0.44 and 0.26 for the relative error in the slip magnitude estimates using the uncertainty-unaware and uncertainty-aware designs, respectively, and 0.17 and 0.13 for the relative error in the slip rake estimates using the uncertainty-unaware and uncertainty-aware designs, respectively. Note that without accounting for model error in the Bayesian inverse problem, the optimal uncertainty-unaware design performs rather poorly in reconstructing the true parameter vector (see Figure 11).

Figure 10: Visualization of posterior statistics using the uncertainty-aware and unaware optimal sensor placements. The slip magnitudes mtrue\textbf{m}_{\text{true}} and rakes 𝝃true\boldsymbol{\xi}_{\text{true}} as well as the corresponding bathymetry change used to simulate data is shown in the top and bottom figures of column (i) respectively. In the top row of columns (ii) and (iii) we show the posterior slip magnitudes and rakes obtained using the uncertainty-aware and uncertainty-unaware design. The corresponding average bathymetry change induced by the posterior distribution with both design choices is shown in the bottom row of columns (ii) and (iii). Columns (iv) and (v) visualize the posterior standard deviations (obtained using the uncertainty-aware and uncertainty-unaware designs respectively) in the slip magnitude (top row) and rakes (bottom row).
Figure 11: Visualization of the posterior mean using the uncertainty-unaware optimal sensor placement. The slip magnitudes mtrue\textbf{m}_{\text{true}} and rakes 𝝃true\boldsymbol{\xi}_{\text{true}} as well as the corresponding bathymetry change used to simulate data is shown in the top and bottom figures of column (i) respectively. In the top row of column (ii) we show the estimated slip magnitudes and rakes for each slip patch. The corresponding average bathymetry change induced by the posterior distribution is shown in the bottom row of column (ii). The posterior was obtained using the linearized SWEs without accounting for model error.

We further evaluate the quality of the optimal uncertainty-aware designs compared to the optimal uncertainty-unaware designs and randomly chosen configurations in Figure 12. For each design configuration involving k sensors, we evaluate the trace of the resulting uncertainty-aware posterior covariance matrix and the expected relative error of the posterior mean. To obtain the latter, we fix 100 sample parameters 𝐯^(i) from the prior distribution (different from those used for computing the covariance statistics required for the BAE approach) and synthesize noisy data using the nonlinear SWEs, i.e., 𝐝^(i) = 𝒢(𝐯^(i)) + 𝐞. The expected error in the posterior mean is then approximated by the sample average (1/100)∑_{i=1}^{100} ‖𝐯^(i) − 𝐯̄^{𝐝^(i)}‖/‖𝐯^(i)‖, where 𝐯̄^{𝐝^(i)} denotes the posterior mean corresponding to data 𝐝^(i). The uncertainty-aware designs outperform the uncertainty-unaware and random designs in reducing both the posterior trace and the average approximation error in the posterior mean. Additionally, while the uncertainty-aware designs compared in Fig. 9 and Fig. 10 were obtained using error statistics computed with 500 samples, for Figure 12 we computed optimal uncertainty-aware designs using q = 50, 250, 500, and 1000 samples. While using more samples does produce more effective designs, we emphasize that reasonable designs can be obtained using as few as 50 samples, i.e., using only 50 runs of the costly forward model.

Figure 12: On the left we compare the trace of the posterior covariance operator using the uncertainty-aware designs (green stars), uncertainty-unaware designs (purple circles), and 100 randomly chosen designs (black boxplot). In all cases, the posterior covariance operator was computed using the zero map while incorporating model uncertainty via the BAE approach with 1000 samples. On the right we compare the average relative error of the posterior mean (computed using 100 samples 𝐯^(i) ∼ μ_v) for each design choice. The posterior mean was obtained using noisy data synthesized using 𝐯^(i) and the nonlinear SWEs. The black squares correspond to the average over 100 randomly chosen designs. In both figures, the uncertainty-aware designs were obtained using q = 50, 250, 500, and 1000 samples (visualized using green stars of increasing opacity, respectively) to approximate the error statistics.

6 Discussion and Conclusion

We have presented a scalable approach for approximating optimal sensor placements for nonlinear Bayesian inverse problems. Our method involves replacing a nonlinear parameter-to-observable map with a linear surrogate while incorporating model error into the inverse problem with the Bayesian approximation error approach. Notably, we demonstrated that this formulation yields an approximation to the A-optimality criterion that is asymptotically independent of the specific linearization choice. This result enables a derivative-free approach to linearized OED. Through two numerical examples, we illustrated the efficacy of sensor placements obtained with our method using the zero map as a surrogate to the accurate forward dynamics.

The results in this article point to several possibilities for future work. Firstly, we reiterate that the equivalence of A-optimal designs under different linearizations can only be guaranteed asymptotically. While the design comparisons in Sections 4 and 5 indicate that the number of samples used in our examples was sufficient to yield effective designs, simulating tens of thousands of data samples using the accurate model may be prohibitively expensive for certain problems. To accommodate such cases, exploring the use of well-chosen control variates to expedite convergence will be crucial to reducing the number of samples needed.

Additionally, a key aspect of our approach is that it relies solely on sample parameter and data pairs. While the “accurate” data used in our examples were synthetic, this suggests the potential of applying our method to model-free OED: in situations where the accurate forward model is unknown, our approach could be driven by experimental data collected for fixed parameter choices. Lastly, another intriguing direction is a study of how well our designs perform in minimizing the uncertainty in the “true” posterior. We provide a heuristic comparison for the subsurface flow example in Section 4; however, a more rigorous study would provide insight into the limitations of linearization-based approximations to optimal designs.

Acknowledgments

The work of KK has been partially funded by Carl-Zeiss-Stiftung through the project “Model-Based AI: Physical Models and Deep Learning for Imaging and Cancer Treatment”. The authors thank Alen Alexanderian, Alex de Beer, Oliver Maclaren, and Georg Stadler for helpful discussions and their valuable comments about this manuscript.

References

  • [1] A. Alexanderian, Optimal experimental design for infinite-dimensional Bayesian inverse problems governed by PDEs: A review, Inverse Problems, (2021).
  • [2] A. Alexanderian, R. Nicholson, and N. Petra, Optimal design of large-scale nonlinear Bayesian inverse problems under model uncertainty, arXiv preprint arXiv:2211.03952, (2022).
  • [3] A. Alexanderian, N. Petra, G. Stadler, and O. Ghattas, A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized $\ell_{0}$-sparsification, SIAM Journal on Scientific Computing, 36 (2014), pp. A2122–A2148.
  • [4] A. Alexanderian, N. Petra, G. Stadler, and O. Ghattas, A fast and scalable method for A-optimal design of experiments for infinite-dimensional Bayesian nonlinear inverse problems, SIAM Journal on Scientific Computing, 38 (2016), pp. A243–A272.
  • [5] A. Alexanderian, N. Petra, G. Stadler, and I. Sunseri, Optimal design of large-scale Bayesian linear inverse problems under reducible model uncertainty: good to know what you don’t know, SIAM/ASA Journal on Uncertainty Quantification, 9 (2021), pp. 163–184.
  • [6] S. R. Arridge, J. P. Kaipio, V. Kolehmainen, M. Schweiger, E. Somersalo, T. Tarvainen, and M. Vauhkonen, Approximation errors and model reduction with an application in optical diffusion tomography, Inverse Problems, 22 (2006), p. 175.
  • [7] A. Attia, S. Leyffer, and T. Munson, Robust A-optimal experimental design for Bayesian inverse problems, arXiv preprint arXiv:2305.03855, (2023).
  • [8] A. Attia, S. Leyffer, and T. S. Munson, Stochastic learning approach for binary optimization: Application to Bayesian optimal design of experiments, SIAM Journal on Scientific Computing, 44 (2022), pp. B395–B427.
  • [9] M. J. Berger, D. L. George, R. J. LeVeque, and K. T. Mandli, The GeoClaw software for depth-averaged flows with adaptive refinement, Advances in Water Resources, 34 (2011), pp. 1195–1206.
  • [10] J. Carrera and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions: 1. Maximum likelihood method incorporating prior information, Water Resources Research, 22 (1986), pp. 199–210.
  • [11] J. Carrera and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions: 2. Uniqueness, stability, and solution algorithms, Water Resources Research, 22 (1986), pp. 211–227.
  • [12] J. Carrera and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions: 3. Application to synthetic and field data, Water Resources Research, 22 (1986), pp. 228–242.
  • [13] K. Chaloner and I. Verdinelli, Bayesian experimental design: A review, Statistical Science, (1995), pp. 273–304.
  • [14] V. Chen, M. M. Dunlop, O. Papaspiliopoulos, and A. M. Stuart, Dimension-robust MCMC in Bayesian inverse problems, arXiv preprint arXiv:1803.03344, (2018).
  • [15] Clawpack Development Team, Clawpack software. https://doi.org/10.5281/zenodo.4025432, 2020. Version 5.8.0.
  • [16] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White, MCMC methods for functions: Modifying old algorithms to make them faster, Statistical Science, 28 (2013), pp. 424–446.
  • [17] T. Cui, C. Fox, and M. J. O’Sullivan, A posteriori stochastic correction of reduced models in delayed-acceptance MCMC, with application to multiphase subsurface inverse problems, International Journal for Numerical Methods in Engineering, 118 (2019), pp. 578–605.
  • [18] N. Cvetković, H. C. Lie, H. Bansal, and K. Veroy, Choosing observation operators to mitigate model error in Bayesian inverse problems, arXiv preprint arXiv:2301.04863, (2023).
  • [19] M. Dashti, K. J. Law, A. M. Stuart, and J. Voss, MAP estimators and their consistency in Bayesian nonparametric inverse problems, Inverse Problems, 29 (2013), p. 095017.
  • [20] O. R. Dunbar, M. F. Howland, T. Schneider, and A. M. Stuart, Ensemble-based experimental design for targeting data acquisition to inform climate models, Journal of Advances in Modeling Earth Systems, 14 (2022), p. e2022MS002997.
  • [21] M. M. Dunlop, M. A. Iglesias, and A. M. Stuart, Hierarchical Bayesian level set inversion, Statistics and Computing, 27 (2017), pp. 1555–1584.
  • [22] M. M. Dunlop and Y. Yang, Stability of Gibbs posteriors from the Wasserstein loss for Bayesian full waveform inversion, SIAM/ASA Journal on Uncertainty Quantification, 9 (2021), pp. 1499–1526.
  • [23] C. Feng and Y. M. Marzouk, A layered multiple importance sampling scheme for focused optimal Bayesian experimental design, arXiv preprint arXiv:1903.11187, (2019).
  • [24] Y. Fujii, K. Satake, S. Sakai, M. Shinohara, and T. Kanazawa, Tsunami source of the 2011 off the Pacific coast of Tohoku earthquake, Earth, Planets and Space, 63 (2011), pp. 815–820.
  • [25] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, 53 (2011), pp. 217–288.
  • [26] N. Hänninen, A. Pulkkinen, A. Leino, and T. Tarvainen, Application of diffusion approximation in quantitative photoacoustic tomography in the presence of low-scattering regions, Journal of Quantitative Spectroscopy and Radiative Transfer, 250 (2020), p. 107065.
  • [27] T. Helin and M. Burger, Maximum a posteriori probability estimates in infinite-dimensional Bayesian inverse problems, Inverse Problems, 31 (2015), p. 085009.
  • [28] E. Herman, A. Alexanderian, and A. K. Saibaba, Randomization and reweighted $\ell_{1}$-minimization for A-optimal design of linear inverse problems, SIAM Journal on Scientific Computing, 42 (2020), pp. A1714–A1740.
  • [29] D. Higdon, M. Kennedy, J. C. Cavendish, J. A. Cafeo, and R. D. Ryne, Combining field data and computer simulations for calibration and prediction, SIAM Journal on Scientific Computing, 26 (2004), pp. 448–466.
  • [30] L. Holbach, M. Gurnis, and G. Stadler, A Bayesian level set method for identifying subsurface geometries and rheological properties in Stokes flow, Geophysical Journal International, 235 (2023), pp. 260–272.
  • [31] M. A. Iglesias, K. J. Law, and A. M. Stuart, Ensemble Kalman methods for inverse problems, Inverse Problems, 29 (2013), p. 045001.
  • [32] M. A. Iglesias, Y. Lu, and A. M. Stuart, A Bayesian level set method for geometric inverse problems, Interfaces and free boundaries, 18 (2016), pp. 181–217.
  • [33] J. Jagalur-Mohan and Y. Marzouk, Batch greedy maximization of non-submodular functions: Guarantees and applications to experimental design, The Journal of Machine Learning Research, 22 (2021), pp. 11397–11458.
  • [34] J. Kaipio and V. Kolehmainen, Approximate marginalization over modeling errors and uncertainties in inverse problems, Bayesian theory and applications, (2013), pp. 644–672.
  • [35] J. Kaipio and E. Somersalo, Statistical and computational inverse problems, Springer, Dordrecht, 2005.
  • [36] J. Kaipio and E. Somersalo, Statistical inverse problems: Discretization, model reduction and inverse crimes, Journal of Computational and Applied Mathematics, 198 (2007), pp. 493–504.
  • [37] M. C. Kennedy and A. O’Hagan, Bayesian calibration of computer models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63 (2001), pp. 425–464.
  • [38] K. Koval, A. Alexanderian, and G. Stadler, Optimal experimental design under irreducible uncertainty for linear inverse problems governed by PDEs, Inverse Problems, 36 (2020), p. 075007.
  • [39] R. LeVeque. https://github.com/rjleveque/tohoku2011-paper1, 2014.
  • [40] R. J. LeVeque, D. L. George, and M. J. Berger, Tsunami modelling with adaptively refined finite volume methods, Acta Numerica, 20 (2011), p. 211.
  • [41] R. J. LeVeque, K. Waagan, F. I. González, D. Rim, and G. Lin, Generating random earthquake events for probabilistic tsunami hazard assessment, in Global Tsunami Science: Past and Future, Volume I, Springer, 2016, pp. 3671–3692.
  • [42] Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, and A. Anandkumar, Neural operator: Learning maps between function spaces with applications to PDEs., J. Mach. Learn. Res., 24 (2023), pp. 1–97.
  • [43] Q. Long, M. Scavino, R. Tempone, and S. Wang, Fast estimation of expected information gains for Bayesian experimental designs based on Laplace approximations, Computer Methods in Applied Mechanics and Engineering, 259 (2013), pp. 24–39.
  • [44] B. T. MacInnes, A. R. Gusman, R. J. LeVeque, and Y. Tanioka, Comparison of earthquake source models for the 2011 Tohoku event using tsunami simulations and near-field observations, Bulletin of the Seismological Society of America, 103 (2013), pp. 1256–1274.
  • [45] K. T. Mandli, A. J. Ahmadia, M. Berger, D. Calhoun, D. L. George, Y. Hadjimichael, D. I. Ketcheson, G. I. Lemoine, and R. J. LeVeque, Clawpack: Building an open source ecosystem for solving hyperbolic PDEs, PeerJ Computer Science, 2 (2016), p. e68.
  • [46] C. Meinig, S. E. Stalin, A. I. Nakamura, and H. B. Milburn, Real-time deep-ocean tsunami measuring, monitoring, and reporting system: The NOAA DART II description and disclosure, NOAA, Pacific Marine Environmental Laboratory (PMEL), (2005), pp. 1–15.
  • [47] I. Neitzel, K. Pieper, B. Vexler, and D. Walter, A sparse control approach to optimal sensor placement in PDE-constrained parameter estimation problems, Numerische Mathematik, 143 (2019), pp. 943–984.
  • [48] R. Nicholson, N. Petra, and J. P. Kaipio, Estimation of the Robin coefficient field in a Poisson problem with uncertain conductivity field, Inverse Problems, 34 (2018), p. 115005.
  • [49] R. Nicholson, N. Petra, U. Villa, and J. P. Kaipio, On global normal linear approximations for nonlinear Bayesian inverse problems, Inverse Problems, 39 (2023), p. 054001.
  • [50] Y. Okada, Surface deformation due to shear and tensile faults in a half-space, Bulletin of the Seismological Society of America, 75 (1985), pp. 1135–1154.
  • [51] C. P. Robert and G. Casella, Monte Carlo statistical methods, vol. 2, Springer, 1999.
  • [52] A. Spantini, A. Solonen, T. Cui, J. Martin, L. Tenorio, and Y. Marzouk, Optimal low-rank approximations of Bayesian linear inverse problems, SIAM Journal on Scientific Computing, 37 (2015), pp. A2451–A2487.
  • [53] A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numerica, 19 (2010), pp. 451–559.
  • [54] S. Tong, E. Vanden-Eijnden, and G. Stadler, Extreme event probability estimation using PDE-constrained optimization and large deviation theory, with application to tsunamis, Communications in Applied Mathematics and Computational Science, 16 (2021), pp. 181–225.
  • [55] R. Wong, Asymptotic approximations of integrals, SIAM, 2001.
  • [56] K. Wu, P. Chen, and O. Ghattas, A fast and scalable computational framework for large-scale and high-dimensional Bayesian optimal experimental design, arXiv preprint arXiv:2010.15196, (2020).
  • [57] J. Yu and M. Anitescu, Multidimensional sum-up rounding for integer programming in optimal experimental design, Mathematical Programming, (2017), pp. 1–40.