
Optimal design of large-scale nonlinear Bayesian inverse problems under model uncertainty

Alen Alexanderian¹, Ruanui Nicholson², and Noemi Petra³
¹Department of Mathematics, North Carolina State University, Raleigh, NC, USA
²Department of Engineering Science, University of Auckland, New Zealand
³Department of Applied Mathematics, University of California, Merced, CA, USA
alexanderian@ncsu.edu, ruanui.nicholson@auckland.ac.nz, npetra@ucmerced.edu
Abstract

We consider optimal experimental design (OED) for Bayesian nonlinear inverse problems governed by partial differential equations (PDEs) under model uncertainty. Specifically, we consider inverse problems in which, in addition to the inversion parameters, the governing PDEs include secondary uncertain parameters. We focus on problems with infinite-dimensional inversion and secondary parameters and present a scalable computational framework for optimal design of such problems. The proposed approach enables Bayesian inversion and OED under uncertainty within a unified framework. We build on the Bayesian approximation error (BAE) approach to incorporate modeling uncertainties in the Bayesian inverse problem, and on methods for A-optimal design of infinite-dimensional Bayesian nonlinear inverse problems. Specifically, a Gaussian approximation to the posterior at the maximum a posteriori probability point is used to define an uncertainty-aware OED objective that is tractable to evaluate and optimize. In particular, the OED objective can be computed at a cost, in the number of PDE solves, that does not grow with the dimension of the discretized inversion and secondary parameters. The OED problem is formulated as a binary bilevel PDE-constrained optimization problem, and a greedy algorithm, which provides a pragmatic approach, is used to find optimal designs. We demonstrate the effectiveness of the proposed approach on a model inverse problem governed by an elliptic PDE on a three-dimensional domain. Our computational results also highlight the pitfalls of ignoring modeling uncertainties in the OED and/or inference stages.


Keywords: Optimal experimental design, sensor placement, Bayesian inverse problems, model uncertainty, Bayesian approximation error.

1 Introduction

Models governed by partial differential equations (PDEs) are common in science and engineering applications. Such PDE models often contain parameters that need to be estimated using observed data and the model. This requires solving an inverse problem. The quality of the estimated parameters is influenced significantly by the quantity and quality of the measurement data. Therefore, optimizing the data acquisition process is crucial. This requires solving an optimal experimental design (OED) problem [7, 13, 43]. In the present work, we focus on inverse problems in which measurement data are collected at a set of sensors. In this case, OED amounts to finding an optimal sensor placement. In this context, OED is especially important when only a few sensors can be deployed.

While some parameters in the governing PDEs can be estimated by solving an inverse problem, there are often additional uncertain model parameters that are not being estimated. These parameters might be too costly or impossible to estimate. We call the uncertain parameters that are being estimated in an inverse problem the inversion parameters and refer to the additional uncertain parameters as the secondary parameters. Such secondary parameters have also been referred to as auxiliary parameters, latent parameters, or nuisance parameters in the literature. When solving inverse problems whose secondary parameters carry significant uncertainty, both the parameter estimation and data acquisition processes need to account for that uncertainty. In this article, we present a computational framework for optimal design of nonlinear Bayesian inverse problems governed by PDEs under model uncertainty.

Uncertainties in mathematical models can be divided into two classes: reducible and irreducible [39]. The reducible uncertainties are epistemic uncertainties that can be reduced via statistical parameter estimation. On the other hand, irreducible uncertainties are either aleatoric uncertainties inherent to the model that are impossible to reduce or are epistemic uncertainties that are too costly or impractical to reduce. Also, in some applications we might have access to a probabilistic description of secondary model parameters from previous studies and further reduction of the uncertainty in such parameters may not be worth the additional computational cost. We consider such uncertainties as irreducible as well. In the present work, we focus on the case of irreducible uncertainties.

We consider models of the form

$\boldsymbol{y} = \mathcal{G}(m, \xi) + \boldsymbol{\eta},$   (1)

where $\boldsymbol{y}$ is a vector of measured data, $\mathcal{G}$ is a PDE-based model, $m$ and $\xi$ are uncertain parameters, and $\boldsymbol{\eta}$ is a random vector that models measurement noise. Herein, $m$ is the inversion parameter which we seek to infer, and $\xi$ is a secondary uncertain model parameter. We assume the uncertainty in $\xi$ to be irreducible. The parameters $m$ and $\xi$ are assumed to be independent random variables that take values in infinite-dimensional real separable Hilbert spaces $\mathscr{M}$ and $\mathscr{X}$, respectively. Moreover, $\mathcal{G}$ is assumed to be nonlinear in both $m$ and $\xi$. The methods presented in this article enable computing optimal experimental designs in such a way that the uncertainty in the secondary parameters is accounted for.

Related work

In recent years, there have been numerous research efforts directed at OED in inverse problems governed by PDEs. See the review [1] for a survey of the recent literature in this area. There has also been an increased interest in parameter inversion and design of experiments in systems governed by uncertain forward models; see e.g., [27, 6, 24, 33, 14, 38] for a small sample of the literature addressing inverse problems under uncertainty. Methods for OED in such inverse problems have been studied in [28, 5, 18, 11]. The works [28, 5] concern optimal design of infinite-dimensional Bayesian linear inverse problems governed by PDEs. Specifically, [28] considers design of linear inverse problems governed by PDEs with irreducible sources of model uncertainty. On the other hand, [5] targets OED for linear inverse problems with reducible sources of uncertainty. The efforts [18, 11] focus on inverse problems with finite- and low-dimensional inversion and secondary parameters. These articles devise sampling approaches for estimating the expected information gain in such problems. The setting considered in [18] is that of inverse problems with reducible uncertainties. The approach in [11] employs a small noise approximation that applies to problems with nuisance parameters that have small uncertainty levels.

Our approach

We focus on Bayesian nonlinear inverse problems governed by PDEs with infinite-dimensional inversion and secondary parameters. Traditionally, when solving such inverse problems all the secondary model parameters are fixed at their nominal values and the focus is on the estimation of the inversion parameters. Considering (1), this amounts to using the approximate model $\mathcal{F}(m) := \mathcal{G}(m, \bar{\xi})$, where $\bar{\xi}$ is some nominal value. In Section 2, we consider simpler instances of (1) and illustrate the role of model uncertainty in Bayesian inverse problems and the importance of accounting for such uncertainties in parameter inversion and OED. The discussion in Section 2 motivates our approach for incorporating secondary uncertainties in nonlinear Bayesian inverse problems and the corresponding OED problems. This is done using the Bayesian approximation error (BAE) approach [26, 24]; see Section 3.

Subsequently, we build on methods for optimal design of infinite-dimensional nonlinear inverse problems [4, 44, 1] to derive an uncertainty-aware OED objective. Specifically, we follow an A-optimal design strategy, where the goal is to obtain designs that minimize the average posterior variance. To cope with the non-Gaussianity of the posterior, we rely on a Gaussian approximation to the posterior. This enables deriving approximate measures of posterior uncertainty that are tractable to optimize for infinite-dimensional inverse problems; see Section 4. We then present two approaches for formulating and computing the OED objective; see Section 5. The first approach uses Monte Carlo trace estimators and the second formulates the OED problem as an eigenvalue optimization problem. In each case, the OED problem is formulated as a bilevel binary PDE-constrained optimization problem. In both approaches, the cost of computing the OED objective, in terms of the number of PDE solves, is independent of the dimensions of the discretized inversion and secondary parameters. This makes these approaches suitable for large-scale applications. In the present work, we rely on a greedy approach to solve the resulting optimization problems. As discussed in Section 5, a greedy algorithm is especially suited to the formulation of the OED problem in Section 5.2.2 as a binary PDE-constrained eigenvalue optimization problem.

We elaborate the proposed approach in the context of a model inverse problem governed by an elliptic PDE in a three-dimensional domain; see Section 6. This inverse problem, which is motivated by heat transfer applications, concerns estimation of a coefficient field on the bottom boundary of the domain, using sensor measurements of the temperature on the top boundary. The secondary uncertain parameter in this inverse problem is the log-conductivity field, which is modeled as a random field on the three-dimensional physical domain. Our computational results in Section 7 demonstrate the effectiveness of the proposed strategy in computing optimal sensor placements under model uncertainty. We also systematically study the drawbacks of ignoring the uncertainty in the Bayesian inversion and experimental design stages. These studies illustrate the fact that ignoring uncertainty in OED or inference stages can lead to inferior designs and highly inaccurate results in parameter estimation.

Contributions

The contributions of this article are as follows: (1) We present an uncertainty-aware formulation of the OED problem, uncertainty-aware OED objectives along with scalable methods for computing them, and an extensible optimization framework for computing optimal designs. These make OED feasible for nonlinear Bayesian inverse problems governed by PDEs with infinite-dimensional inversion and secondary parameters. Additionally, the proposed approach enables Bayesian inversion and OED under uncertainty within a unified framework. (2) We elaborate the proposed approach for an inverse problem governed by an elliptic PDE, on a three-dimensional domain, with infinite-dimensional inversion and secondary parameters. This is used to elucidate the implementation of our proposed approach for the design of inverse problems governed by PDEs under uncertainty. (3) We present comprehensive computational studies that illustrate the effectiveness of the proposed approach and the importance of accounting for modeling uncertainties in both the parameter inversion and experimental design stages. (4) By considering simpler instances of the model (1) in Section 2, we present a systematic study of the role of model uncertainty in Bayesian inverse problems and the importance of accounting for such uncertainties in parameter inversion and OED. That study also reveals a connection between the BAE-based approach taken in the present work and the method for OED in linear inverse problems under uncertainty in [5].

The developments in this article point naturally to a number of extensions of the presented methods. We discuss such issues in Section 8, where we present our concluding remarks, and discuss potential limitations of the presented approach and opportunities for future extensions.

2 Motivation and overview

In this section, we motivate our approach for OED under uncertainty and set the stage for the developments in the rest of the article. After a brief coverage of requisite notation and preliminaries in Section 2.1, we begin our discussion in Section 2.2 by considering a simple form of (1) in which $\mathcal{G}$ is linear in $m$ and $\xi$. This facilitates an intuitive study of the role of model uncertainty in Bayesian inverse problems and OED. We then consider nonlinear models of varying complexity in Section 2.3 to motivate our approach for design of inverse problems governed by nonlinear models of the form (1).

2.1 Preliminaries

In this article, we consider inversion and secondary parameters that take values in infinite-dimensional Hilbert spaces. For a Hilbert space $\mathscr{H}$, we denote the corresponding inner product by $\langle \cdot, \cdot \rangle_{\mathscr{H}}$ and the associated induced norm by $\|\cdot\|_{\mathscr{H}}$; i.e., $\|\cdot\|_{\mathscr{H}} := \langle \cdot, \cdot \rangle_{\mathscr{H}}^{1/2}$. For Hilbert spaces $\mathscr{H}_1$ and $\mathscr{H}_2$, $\mathscr{L}(\mathscr{H}_1, \mathscr{H}_2)$ denotes the space of bounded linear transformations from $\mathscr{H}_1$ to $\mathscr{H}_2$. The space of bounded linear operators on a Hilbert space $\mathscr{H}$ is denoted by $\mathscr{L}(\mathscr{H})$, and the subspace of bounded selfadjoint operators is denoted by $\mathscr{L}^{\mathrm{sym}}(\mathscr{H})$. We let $\mathscr{L}^{\mathrm{sym}+}(\mathscr{H})$ denote the set of bounded positive selfadjoint operators. The subspace of trace-class operators in $\mathscr{L}(\mathscr{H})$ is denoted by $\mathscr{L}_1(\mathscr{H})$, and the subspace of selfadjoint trace-class operators is denoted by $\mathscr{L}_1^{\mathrm{sym}}(\mathscr{H})$. Also, the sets of positive and strictly positive selfadjoint trace-class operators are denoted by $\mathscr{L}_1^{\mathrm{sym}+}(\mathscr{H})$ and $\mathscr{L}_1^{\mathrm{sym}++}(\mathscr{H})$, respectively.

Throughout the article, $\mathcal{N}(a, \mathcal{C})$ denotes a Gaussian measure with mean $a$ and covariance operator $\mathcal{C}$. For a Gaussian measure on an infinite-dimensional Hilbert space $\mathscr{H}$, the covariance operator $\mathcal{C}$ is required to be in $\mathscr{L}_1^{\mathrm{sym}+}(\mathscr{H})$. Herein, we consider non-degenerate Gaussian measures; i.e., we assume that $\mathcal{C} \in \mathscr{L}_1^{\mathrm{sym}++}(\mathscr{H})$. For further details on Gaussian measures, we refer to [15, 40]. Also, when considering measures on Hilbert spaces, we equip these spaces with their associated Borel sigma algebra. Throughout the article, for notational convenience, we suppress this choice of the sigma-algebra in our notation.

The adjoint of a linear transformation $\mathcal{A} \in \mathscr{L}(\mathscr{H}_1, \mathscr{H}_2)$, where $\mathscr{H}_1$ and $\mathscr{H}_2$ are (real) Hilbert spaces, is denoted by $\mathcal{A}^{*}$. Recall that $\mathcal{A}^{*} \in \mathscr{L}(\mathscr{H}_2, \mathscr{H}_1)$ and

$\langle \mathcal{A} v_1, v_2 \rangle_{\mathscr{H}_2} = \langle v_1, \mathcal{A}^{*} v_2 \rangle_{\mathscr{H}_1}, \quad \text{for all } v_1 \in \mathscr{H}_1,\ v_2 \in \mathscr{H}_2.$

We also recall the following basic result regarding affine transformations of Gaussian random variables. Let $X$ be an $\mathscr{H}_1$-valued Gaussian random variable with law $\mathcal{N}(a, \mathcal{C})$, $\mathcal{A} \in \mathscr{L}(\mathscr{H}_1, \mathscr{H}_2)$, and $b \in \mathscr{H}_2$. Then, the random variable $\mathcal{A}X + b$ is an $\mathscr{H}_2$-valued Gaussian random variable with law $\mathcal{N}(\mathcal{A}a + b, \mathcal{A}\mathcal{C}\mathcal{A}^{*})$; see [15] for details.
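To make the affine transformation rule above concrete, the following small sketch is a purely illustrative, finite-dimensional sampling check with arbitrarily chosen matrices (the dimensions, $a$, $\mathcal{C}$, $\mathcal{A}$, and $b$ are assumptions, not quantities from the paper): it verifies that if $X \sim \mathcal{N}(a, \mathcal{C})$, then $\mathcal{A}X + b$ has mean $\mathcal{A}a + b$ and covariance $\mathcal{A}\mathcal{C}\mathcal{A}^{*}$.

```python
# A minimal sampling check of the affine transformation rule: if X ~ N(a, C),
# then A X + b ~ N(A a + b, A C A^T). All dimensions and matrices here are
# arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 3
a = rng.standard_normal(n1)
L = rng.standard_normal((n1, n1)); C = L @ L.T        # covariance of X
A = rng.standard_normal((n2, n1)); b = rng.standard_normal(n2)

X = rng.multivariate_normal(a, C, size=200_000)       # samples of X
Y = X @ A.T + b                                       # samples of A X + b

print("mean error:", np.abs(Y.mean(axis=0) - (A @ a + b)).max())
print("cov error :", np.abs(np.cov(Y, rowvar=False) - A @ C @ A.T).max())
```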

2.2 Linear models

Consider the model

$\boldsymbol{y} = \mathcal{S} m + \mathcal{T} \xi + \boldsymbol{\eta},$   (2)

where $\boldsymbol{y} \in \mathbb{R}^d$ denotes measurement data, $m \in \mathscr{M}$ is the inversion parameter, $\xi \in \mathscr{X}$ is the secondary uncertain parameter, and $\boldsymbol{\eta}$ is the measurement noise vector. (The spaces $\mathscr{M}$ and $\mathscr{X}$ are as described in the introduction.) This type of model, which was considered in [5], may correspond to inverse problems governed by linear PDEs with uncertainties in source terms or boundary conditions. We assume $\mathcal{S} \in \mathscr{L}(\mathscr{M}, \mathbb{R}^d)$ and $\mathcal{T} \in \mathscr{L}(\mathscr{X}, \mathbb{R}^d)$. Moreover, we assume that $\boldsymbol{\eta} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Gamma}_{\mathrm{noise}})$ and that $\boldsymbol{\eta}$ is independent of $m$ and $\xi$. We consider a Gaussian prior law $\mu_{\mathrm{pr}} = \mathcal{N}(m_{\mathrm{pr}}, \mathcal{C}_{\mathrm{pr}})$ for $m$, and for the purpose of this illustrative example, let the secondary model uncertainty $\xi$ be distributed according to a Gaussian $\mu_{\xi} = \mathcal{N}(\bar{\xi}, \mathcal{C}_{\xi})$. Also, we assume $\mathbf{\Gamma}_{\mathrm{noise}} = \sigma^2 \mathbf{I}$, with $\sigma^2$ denoting the noise level.

Incorporating model uncertainty in the inverse problem

For a fixed realization of $\xi$, estimating $m$ from (2) is a standard problem. This also follows the common practice of fixing additional model parameters at some nominal values before solving the inverse problem. However, it is possible to account for the model uncertainty in this process. To see this, suppose we fix $\xi$ at $\bar{\xi}$ and consider the approximate model $\mathcal{F}(m) = \mathcal{S} m + \mathcal{T}\bar{\xi}$. Note that

$\underbrace{\mathcal{S} m + \mathcal{T}\xi}_{\text{accurate model}} = \underbrace{\mathcal{S} m + \mathcal{T}\bar{\xi}}_{\mathcal{F}(m)} + \underbrace{\mathcal{T}(\xi - \bar{\xi})}_{\text{error } \boldsymbol{\varepsilon}}.$

In this case, the approximation error $\boldsymbol{\varepsilon} = \mathcal{T}(\xi - \bar{\xi})$ has a Gaussian law, $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathcal{T}\mathcal{C}_{\xi}\mathcal{T}^{*})$. Hence, we may rewrite (2) in terms of the approximate model $\mathcal{F}$ as follows:

$\boldsymbol{y} = \mathcal{F}(m) + \boldsymbol{\nu},$   (3)

where $\boldsymbol{\nu} = \boldsymbol{\varepsilon} + \boldsymbol{\eta}$ denotes the total error. In the present setting, $\boldsymbol{\nu} \sim \mathcal{N}(\mathbf{0}, \mathcal{T}\mathcal{C}_{\xi}\mathcal{T}^{*} + \mathbf{\Gamma}_{\mathrm{noise}})$. Note that we have incorporated the uncertainty due to $\xi$ in the error covariance matrix. (If the approximate model $\mathcal{F}$ were defined by fixing $\xi$ at a point different from $\bar{\xi}$, then the error term $\boldsymbol{\nu}$ would have nonzero mean.) The present procedure for incorporating the secondary model uncertainty into the inverse problem is a special case of the Bayesian approximation error (BAE) approach (see Section 3).

Since we have Gaussian prior and noise models and $\mathcal{F}$ in (3) is affine, the posterior is also Gaussian with analytic formulas for its mean and covariance operator; see e.g., [40]. In particular, the posterior covariance operator is given by

$\mathcal{C}_{\mathrm{post}} = \big(\mathcal{S}^{*}\mathbf{\Gamma}_{\nu}^{-1}\mathcal{S} + \mathcal{C}_{\mathrm{pr}}^{-1}\big)^{-1} = \big(\mathcal{S}^{*}(\mathcal{T}\mathcal{C}_{\xi}\mathcal{T}^{*} + \mathbf{\Gamma}_{\mathrm{noise}})^{-1}\mathcal{S} + \mathcal{C}_{\mathrm{pr}}^{-1}\big)^{-1}.$   (4)

Considering, for example, the A-optimality criterion $\mathsf{tr}(\mathcal{C}_{\mathrm{post}})$, (4) illustrates the manner in which the uncertainty due to the use of an approximate model impacts the posterior uncertainty.
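To make (4) concrete, the following sketch is a finite-dimensional stand-in for the operators above, with all dimensions, operators, and noise levels chosen arbitrarily for illustration (they are assumptions, not values from the paper). It compares the A-optimality value $\mathsf{tr}(\mathcal{C}_{\mathrm{post}})$ obtained with the total error covariance $\mathcal{T}\mathcal{C}_{\xi}\mathcal{T}^{*} + \mathbf{\Gamma}_{\mathrm{noise}}$ against the value obtained when the model-error contribution is ignored.

```python
# A minimal finite-dimensional sketch of (4): all operators are replaced by
# matrices, and the dimensions, prior, and operators below are illustrative
# assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, k, d = 50, 20, 10                  # dim. of discretized m, xi, and data
S = rng.standard_normal((d, n))       # discretized forward operator S
T = rng.standard_normal((d, k))       # discretized operator T
C_pr = 0.5 * np.eye(n)                # prior covariance of m
C_xi = 0.1 * np.eye(k)                # covariance of secondary parameter xi
sigma2 = 1e-3                         # noise level
G_noise = sigma2 * np.eye(d)

# Total error covariance: model-error part plus measurement noise
G_nu = T @ C_xi @ T.T + G_noise

# Posterior covariance (4), with and without accounting for model uncertainty
C_post_bae = np.linalg.inv(S.T @ np.linalg.inv(G_nu) @ S + np.linalg.inv(C_pr))
C_post_naive = np.linalg.inv(S.T @ np.linalg.inv(G_noise) @ S + np.linalg.inv(C_pr))

print("A-optimality value tr(C_post), BAE  :", np.trace(C_post_bae))
print("A-optimality value tr(C_post), naive:", np.trace(C_post_naive))
```

In this toy setting, ignoring the model-error covariance typically yields a smaller (overconfident) posterior trace, which is precisely the pitfall discussed in the text.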

Interplay between measurement error and approximation error

Note that $\mathcal{T}\mathcal{C}_{\xi}\mathcal{T}^{*} \in \mathscr{L}^{\mathrm{sym}+}(\mathbb{R}^d)$. This operator admits a spectral decomposition $\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{\mathsf{T}}$, where $\mathbf{V}$ is an orthogonal matrix of eigenvectors and $\mathbf{\Lambda}$ is a diagonal matrix with the eigenvalues on its diagonal. Thus, the total error covariance matrix can be written as $\mathbf{\Gamma}_{\nu} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{\mathsf{T}} + \sigma^2\mathbf{I} = \sum_{j}(\lambda_j + \sigma^2)\,\boldsymbol{v}_j\boldsymbol{v}_j^{\mathsf{T}}$. Therefore, the modes for which $\lambda_j \ll \sigma^2$ may be ignored. To observe the impact of model uncertainty on individual observations, we note that

$\mathsf{Var}\{\nu_i\} = \boldsymbol{e}_i^{\mathsf{T}}\mathbf{\Gamma}_{\nu}\boldsymbol{e}_i = \sum_{j}(\lambda_j + \sigma^2)\big(\boldsymbol{e}_i^{\mathsf{T}}\boldsymbol{v}_j\big)^2, \quad i \in \{1, \ldots, d\},$

where the $\boldsymbol{e}_i$ are the standard basis vectors in $\mathbb{R}^d$. Thus, we see that only large eigenvalues contribute significantly to the total error in the $i$th measurement.

We can also consider the interplay between the spectral representation of the error covariance and the posterior covariance operator. Specifically, $\mathcal{C}_{\mathrm{post}} = \big(\mathcal{S}^{*}\mathbf{\Gamma}_{\nu}^{-1}\mathcal{S} + \mathcal{C}_{\mathrm{pr}}^{-1}\big)^{-1} = \mathcal{C}_{\mathrm{pr}}^{1/2}\big(\tilde{\mathcal{S}}^{*}\mathbf{\Gamma}_{\nu}^{-1}\tilde{\mathcal{S}} + I\big)^{-1}\mathcal{C}_{\mathrm{pr}}^{1/2}$ with $\tilde{\mathcal{S}} = \mathcal{S}\mathcal{C}_{\mathrm{pr}}^{1/2}$. Thus, letting $\mathbf{E}_j = \boldsymbol{v}_j\boldsymbol{v}_j^{\mathsf{T}}$, $j = 1, \ldots, d$, we can write

$\mathcal{C}_{\mathrm{post}} = \mathcal{C}_{\mathrm{pr}}^{1/2}\Big[\sum_{j=1}^{d}(\lambda_j + \sigma^2)^{-1}\tilde{\mathcal{S}}^{*}\mathbf{E}_j\tilde{\mathcal{S}} + I\Big]^{-1}\mathcal{C}_{\mathrm{pr}}^{1/2}.$

Note that in the directions corresponding to very large eigenvalues the measurements will have a negligible impact on posterior uncertainty.
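The following short sketch illustrates this spectral view; it uses illustrative matrices of the same kind as in the previous snippet (again assumptions, not the paper's operators). It compares the eigenvalues $\lambda_j$ of $\mathcal{T}\mathcal{C}_{\xi}\mathcal{T}^{*}$ with the noise level and evaluates the per-measurement variances $\mathsf{Var}\{\nu_i\}$ via the modal sum above.

```python
# A small sketch of the spectral view of the total error covariance: the
# eigenvalues lambda_j of T C_xi T^T are compared with the noise level, and the
# variance of each total-error component nu_i is split into modal contributions.
# T, C_xi, and sigma2 are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 20
T = rng.standard_normal((d, k))
C_xi = 0.1 * np.eye(k)
sigma2 = 1e-3

M = T @ C_xi @ T.T                      # model-error covariance T C_xi T^*
lam, V = np.linalg.eigh(M)              # spectral decomposition V Lambda V^T

# Modes with lam_j << sigma^2 are dominated by measurement noise
print("eigenvalues relative to noise level:", lam / sigma2)

# Var{nu_i} = sum_j (lambda_j + sigma^2) (e_i^T v_j)^2
var_nu = (V**2) @ (lam + sigma2)
print("per-measurement total-error variances:", var_nu)
```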

The above discussion indicates that model uncertainty cannot be ignored in the inverse problem, especially when the model uncertainty overwhelms the measurement noise. The latter point is also important in the design of experiments. Specifically, some of the measurements might be rendered useless by the large amount of model uncertainty associated with them. This needs to be accounted for in the OED problem to ensure that only measurements that are helpful in reducing posterior uncertainty are selected.

Connection to post-marginalization

In general, the BAE approach involves pre-marginalization over the secondary model uncertainties. This can be related to the idea of post-marginalization in the case of linear Gaussian inverse problems. Namely, if we consider $\xi$ as a reducible uncertainty that is being estimated along with $m$ and $\mu_{\xi}$ as the corresponding prior law, then (4) is the covariance operator of the marginal posterior law of $m$. To see this, we first note that

$\big(\mathcal{T}\mathcal{C}_{\xi}\mathcal{T}^{*} + \mathbf{\Gamma}_{\mathrm{noise}}\big)^{-1} = \mathbf{\Gamma}_{\mathrm{noise}}^{-1} - \mathbf{\Gamma}_{\mathrm{noise}}^{-1}\mathcal{T}\big(\mathcal{C}_{\xi}^{-1} + \mathcal{T}^{*}\mathbf{\Gamma}_{\mathrm{noise}}^{-1}\mathcal{T}\big)^{-1}\mathcal{T}^{*}\mathbf{\Gamma}_{\mathrm{noise}}^{-1}.$   (5)

This relation can be derived by following a calculation similar to the one in [40, p. 536]. (Equation (5) can be viewed as a special form of the Sherman–Morrison–Woodbury formula involving Hilbert space operators.) Then, we substitute (5) into (4) to obtain

$\mathcal{C}_{\mathrm{post}} = \big[\mathcal{C}_{\mathrm{pr}}^{-1} + \mathcal{S}^{*}\mathbf{\Gamma}_{\mathrm{noise}}^{-1}\mathcal{S} - \mathcal{S}^{*}\mathbf{\Gamma}_{\mathrm{noise}}^{-1}\mathcal{T}\big(\mathcal{C}_{\xi}^{-1} + \mathcal{T}^{*}\mathbf{\Gamma}_{\mathrm{noise}}^{-1}\mathcal{T}\big)^{-1}\mathcal{T}^{*}\mathbf{\Gamma}_{\mathrm{noise}}^{-1}\mathcal{S}\big]^{-1}.$

This $\mathcal{C}_{\mathrm{post}}$ is the same as the marginal posterior covariance operator of $m$, as noted in [5, Equation 2.4]. Thus, for a linear Gaussian inverse problem with reducible secondary uncertainty, the posterior covariance operator obtained using the BAE approach equals the marginal posterior covariance operator of $m$ obtained following joint estimation of $m$ and $\xi$.
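As a quick sanity check, the following finite-dimensional sketch (reusing the same kind of illustrative matrices as above, which are assumptions rather than quantities from the paper) verifies numerically that the pre-marginalized BAE form (4) and the post-marginalization form obtained by substituting (5) into (4) agree.

```python
# A quick finite-dimensional check that the BAE posterior covariance (4)
# coincides with the marginal posterior covariance obtained after jointly
# estimating m and xi, i.e., that identity (5) holds numerically.
# All matrices below are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, k, d = 50, 20, 10
S = rng.standard_normal((d, n))
T = rng.standard_normal((d, k))
C_pr = 0.5 * np.eye(n)
C_xi = 0.1 * np.eye(k)
G_noise = 1e-3 * np.eye(d)
Gn_inv = np.linalg.inv(G_noise)

# Left-hand side: BAE form, with pre-marginalized total error covariance
lhs = np.linalg.inv(S.T @ np.linalg.inv(T @ C_xi @ T.T + G_noise) @ S
                    + np.linalg.inv(C_pr))

# Right-hand side: post-marginalization form obtained by substituting (5) in (4)
W = np.linalg.inv(np.linalg.inv(C_xi) + T.T @ Gn_inv @ T)
rhs = np.linalg.inv(np.linalg.inv(C_pr) + S.T @ Gn_inv @ S
                    - S.T @ Gn_inv @ T @ W @ T.T @ Gn_inv @ S)

print("max discrepancy:", np.abs(lhs - rhs).max())   # ~ machine precision
```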

2.3 Nonlinear models

The linear model (2) can be generalized in the following ways:

additive model, linear in $m$, nonlinear in $\xi$: $\mathcal{G}(m, \xi) = \mathcal{S} m + \mathcal{T}(\xi)$;   (6)
additive model, nonlinear in $m$, linear in $\xi$: $\mathcal{G}(m, \xi) = \mathcal{S}(m) + \mathcal{T}\xi$;   (7)
additive model, nonlinear in both $m$ and $\xi$: $\mathcal{G}(m, \xi) = \mathcal{S}(m) + \mathcal{T}(\xi)$;   (8)
nonadditive model, nonlinear in both $m$ and $\xi$.   (9)

While the cases (6)–(8) might be of independent interest, our focus in this article is on the general case (9). However, items (6)–(8) do serve to illustrate some of the key challenges.

In the case of (6), one can repeat the steps leading to (3), except that $\boldsymbol{\varepsilon}$ will no longer be Gaussian and therefore the distribution of $\boldsymbol{\nu}$ will not be known analytically. In that case, one may obtain a Gaussian approximation to $\boldsymbol{\varepsilon}$ either by fitting a Gaussian to $\boldsymbol{\varepsilon}$ or by using a linear approximation of $\mathcal{T}(\xi)$. Then, one may obtain a Gaussian posterior, where one also relies on the linearity of $\mathcal{S}$. On the other hand, in the case of (7), the total error $\boldsymbol{\nu}$ will be Gaussian as before, but due to the nonlinearity of $\mathcal{S}(m)$ the posterior will not be Gaussian. The latter leads to one of the fundamental challenges in OED of nonlinear inverse problems: defining a suitable OED objective whose optimization is tractable. The more complicated cases (8) and (9) inherit the challenges corresponding to the previous cases.

In the rest of this article, we build on the BAE approach to incorporate the uncertainty in $\xi$ in inverse problems governed by nonlinear models of the type (9). The uncertainty in $\xi$ will be assumed irreducible, and in general, $\xi$ will not be assumed to follow a Gaussian law. All that we require is the ability to generate samples of $\xi$. Following the BAE approach, we approximate the approximation error $\boldsymbol{\varepsilon}$ with a Gaussian. This enables incorporating the model uncertainty in the data likelihood; see Section 3. To cope with the non-Gaussianity of the posterior, we rely on a Gaussian approximation to the posterior to derive an uncertainty-aware OED objective that is tractable to evaluate and optimize for infinite-dimensional inverse problems; see Section 4 for the definition of the OED objective and Section 5 for computational methods.

3 Infinite-dimensional Bayesian inverse problems under uncertainty

We consider the inverse problem of inferring a parameter $m$ from a model of the form (1), where $\mathcal{G}: \mathscr{M} \times \mathscr{X} \to \mathbb{R}^d$ is a parameter-to-observable map that in general is nonlinear in both arguments. We focus on problems where $\mathcal{G}$ is defined as a composition of an observation operator and a PDE solution operator. The model $\mathcal{G}$ is assumed to be Fréchet differentiable in $m$ at $\xi = \bar{\xi}$, where $\bar{\xi} \in \mathscr{X}$ is a nominal value. As before, we employ a Gaussian noise model, $\boldsymbol{\eta} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Gamma}_{\mathrm{noise}})$, a Gaussian prior law $\mu_{\mathrm{pr}} = \mathcal{N}(m_{\mathrm{pr}}, \mathcal{C}_{\mathrm{pr}})$ for $m$, and assume $\boldsymbol{\eta}$ is independent of $m$ and $\xi$. The prior induces the Cameron–Martin space $\mathscr{E} = \mathsf{range}(\mathcal{C}_{\mathrm{pr}}^{1/2})$, which is endowed with the inner product [15, 17]

$\langle a, b \rangle_{\mathscr{E}} = \langle \mathcal{C}_{\mathrm{pr}}^{-1/2} a, \mathcal{C}_{\mathrm{pr}}^{-1/2} b \rangle_{\mathscr{M}}, \quad a, b \in \mathscr{E},$

where $\langle \cdot, \cdot \rangle_{\mathscr{M}}$ is the inner product on the parameter space $\mathscr{M}$.

To account for model uncertainty (due to $\xi$) in the inverse problem, we rely on the BAE approach [26, 24], which we explain next. We fix the secondary parameter to $\bar{\xi}$ and consider the approximate (also known as inaccurate, reduced order, or surrogate) model

$\mathcal{F}(m) = \mathcal{G}(m, \bar{\xi}).$   (10)

As mentioned before, this is typically what is done in practice where the secondary model parameters are fixed at some (possibly well-justified) nominal values. In the BAE approach, we quantify and incorporate errors due to the use of this approximate model in the Bayesian inverse problem analogously to what was done in Section 2.2. Namely, we consider

$\boldsymbol{y} = \mathcal{F}(m) + \underbrace{\mathcal{G}(m, \xi) - \mathcal{F}(m)}_{\text{approximation error } \boldsymbol{\varepsilon}} + \boldsymbol{\eta} = \mathcal{F}(m) + \boldsymbol{\nu}(m, \xi),$   (11)

where $\boldsymbol{\nu}(m, \xi) = \boldsymbol{\varepsilon}(m, \xi) + \boldsymbol{\eta}$ is the total error. In the BAE framework, the approximation error $\boldsymbol{\varepsilon}$ is approximated as a conditionally Gaussian random variable; that is, the distribution of $\boldsymbol{\varepsilon} | m$ is assumed to be Gaussian. In the present work, we employ the so-called enhanced error model [27, 24, 10], which ignores the correlation between $\boldsymbol{\varepsilon}$ and $m$ and approximates the law of $\boldsymbol{\varepsilon}$ as a Gaussian $\boldsymbol{\varepsilon} \sim \mathcal{N}(\boldsymbol{\varepsilon}_0, \mathbf{\Gamma}_{\varepsilon})$ with

$\boldsymbol{\varepsilon}_0 = \int_{\mathscr{M}} \int_{\mathscr{X}} \boldsymbol{\varepsilon}(m, \xi)\, \mu_{\mathrm{pr}}(dm)\, \mu_{\xi}(d\xi),$   (12)
$\mathbf{\Gamma}_{\varepsilon} = \int_{\mathscr{M}} \int_{\mathscr{X}} \big(\boldsymbol{\varepsilon}(m, \xi) - \boldsymbol{\varepsilon}_0\big)\big(\boldsymbol{\varepsilon}(m, \xi) - \boldsymbol{\varepsilon}_0\big)^{\mathsf{T}}\, \mu_{\mathrm{pr}}(dm)\, \mu_{\xi}(d\xi).$

In general, the approximation errors can be (highly) correlated with the parameters [34, 24]. However, ignoring the correlation between $\boldsymbol{\varepsilon}$ and $m$ (i.e., employing the enhanced error model) is typically viewed as a conservative (safe) approximation, as it is analogous to approximating the conditional distribution of $\boldsymbol{\varepsilon} | m$ with the marginal distribution of $\boldsymbol{\varepsilon}$, which cannot reduce variance [25, Section 3.4]. On the other hand, employing the enhanced error model can significantly reduce the costs associated with computing the mean and covariance operator of $\boldsymbol{\varepsilon}$ [24], which are in practice computed via sampling; see Section 5.
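The sketch below illustrates one way such a sampling estimate of (12) could look under the enhanced error model. The prior samplers and the models G and F are placeholders standing in for the PDE-based models and measures of the paper (all names, dimensions, and the noise level are assumptions for illustration only).

```python
# A minimal sampling sketch of (12) under the enhanced error model: the mean
# and covariance of the approximation error are estimated from joint samples
# of (m, xi). G, F, and the samplers below are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
d, n_samples = 10, 500

def sample_m():                      # placeholder prior sample of m
    return rng.standard_normal(50)

def sample_xi():                     # placeholder sample of secondary parameter xi
    return rng.standard_normal(20)

def G(m, xi):                        # placeholder accurate model G(m, xi)
    return np.tanh(m[:d]) + 0.1 * xi[:d] ** 2

def F(m):                            # approximate model F(m) = G(m, xi_bar)
    return G(m, np.zeros(20))

# Draw joint samples and record the approximation error eps = G(m, xi) - F(m)
errs = []
for _ in range(n_samples):
    m, xi = sample_m(), sample_xi()
    errs.append(G(m, xi) - F(m))
eps = np.array(errs)

eps0 = eps.mean(axis=0)                      # sample estimate of eps_0
Gamma_eps = np.cov(eps, rowvar=False)        # sample estimate of Gamma_eps
Gamma_nu = Gamma_eps + 1e-3 * np.eye(d)      # total error covariance (noise level assumed)
```

Each sample of the error requires one solve of the accurate model (the approximate model is fixed once $\bar{\xi}$ is chosen), which is the sampling cost referred to in Section 5.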

With these approximations, and by our assumption on the measurement noise (which is independent of the parameters), the total error $\boldsymbol{\nu}$ is modeled by a Gaussian $\mathcal{N}(\boldsymbol{\varepsilon}_0, \mathbf{\Gamma}_{\nu})$, where $\mathbf{\Gamma}_{\nu} = \mathbf{\Gamma}_{\varepsilon} + \mathbf{\Gamma}_{\mathrm{noise}}$. Using this approximate noise model along with the approximate model $\mathcal{F}$, we arrive at the following data likelihood:

$\Pi_{\mathrm{like}}^{\mathrm{BAE}}(\boldsymbol{y} | m) \propto \exp\Big\{-\tfrac{1}{2}\big(\boldsymbol{y} - \mathcal{F}(m) - \boldsymbol{\varepsilon}_0\big)^{\mathsf{T}}\mathbf{\Gamma}_{\nu}^{-1}\big(\boldsymbol{y} - \mathcal{F}(m) - \boldsymbol{\varepsilon}_0\big)\Big\}.$   (13)

With the prior measure in place, and using this BAE-based data likelihood, we can state the Bayes formula [40],

$\dfrac{d\mu_{\mathrm{post}}^{\boldsymbol{y}}}{d\mu_{\mathrm{pr}}} \propto \Pi_{\mathrm{like}}^{\mathrm{BAE}}(\boldsymbol{y} | m).$

To make the computations involved in design of large-scale inverse problems tractable, we rely on a local Gaussian approximation to the posterior. Namely, we use $\hat{\mu}_{\mathrm{post}}^{\boldsymbol{y}} = \mathcal{N}(m_{\mathrm{MAP}}^{\boldsymbol{y}}, \mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}})$, where $m_{\mathrm{MAP}}^{\boldsymbol{y}}$ is the maximum a posteriori probability (MAP) point and $\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}}$ is an approximate posterior covariance operator, described below. The MAP point is given by

$m_{\mathrm{MAP}}^{\boldsymbol{y}} = \operatorname*{arg\,min}_{m \in \mathscr{E}} \mathcal{J}(m) := \tfrac{1}{2}\big(\boldsymbol{y} - \mathcal{F}(m) - \boldsymbol{\varepsilon}_0\big)^{\mathsf{T}}\mathbf{\Gamma}_{\nu}^{-1}\big(\boldsymbol{y} - \mathcal{F}(m) - \boldsymbol{\varepsilon}_0\big) + \tfrac{1}{2}\langle m - m_{\mathrm{pr}}, m - m_{\mathrm{pr}}\rangle_{\mathscr{E}}.$   (14)

For the approximate posterior covariance operator $\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}}$, we use

$\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}} = \big(\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}})^{*}\,\mathbf{\Gamma}_{\nu}^{-1}\,\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}}) + \mathcal{C}_{\mathrm{pr}}^{-1}\big)^{-1},$   (15)

where $\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}})$ is the Fréchet derivative of $\mathcal{F}$ evaluated at $m_{\mathrm{MAP}}^{\boldsymbol{y}}$.
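The following finite-dimensional sketch illustrates the structure of (14) and (15): the MAP point is found by minimizing the BAE objective, and the approximate posterior covariance is built from the Jacobian of the approximate model at the MAP point. The model F, the synthetic data, and all dimensions are illustrative assumptions, not the PDE-based setting of the paper.

```python
# A finite-dimensional sketch of the Gaussian approximation (14)-(15): the MAP
# point minimizes the BAE objective J(m), and the approximate posterior
# covariance uses the Jacobian of F at the MAP point. Everything below is an
# illustrative placeholder.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 30, 10
A = rng.standard_normal((d, n))

def F(m):                                  # placeholder nonlinear approximate model
    return np.tanh(A @ m)

def F_jac(m):                              # Jacobian of F at m
    return (1.0 - np.tanh(A @ m) ** 2)[:, None] * A

C_pr = 0.5 * np.eye(n); C_pr_inv = np.linalg.inv(C_pr)
m_pr = np.zeros(n)
eps0 = np.zeros(d)                               # BAE mean (assumed precomputed)
Gamma_nu_inv = np.linalg.inv(1e-2 * np.eye(d))   # total error precision (assumed)
y = F(rng.standard_normal(n)) + 0.05 * rng.standard_normal(d)   # synthetic data

def J(m):                                  # objective in (14)
    r = y - F(m) - eps0
    dm = m - m_pr
    return 0.5 * r @ Gamma_nu_inv @ r + 0.5 * dm @ C_pr_inv @ dm

m_map = minimize(J, m_pr, method="L-BFGS-B").x

# Approximate posterior covariance (15) via the Jacobian at the MAP point
Jm = F_jac(m_map)
C_post = np.linalg.inv(Jm.T @ Gamma_nu_inv @ Jm + C_pr_inv)
```

In the large-scale PDE setting of the paper, these dense matrix operations are of course replaced by adjoint-based, matrix-free computations; the sketch only conveys the algebraic structure of the approximation.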

Note that the true posterior $\mu_{\mathrm{post}}^{\boldsymbol{y}}$ is equivalent to the prior measure. It is also possible to show that the Gaussian approximation is equivalent to $\mu_{\mathrm{pr}}$ as well. This fact, which is made precise below, is important in justifying the use of this Gaussian approximation for defining an approximate measure of posterior uncertainty. Namely, to define a notion of uncertainty reduction, it is important that our surrogate for the posterior measure is absolutely continuous with respect to our reference measure, which is given by the prior. The equivalence of $\hat{\mu}_{\mathrm{post}}^{\boldsymbol{y}}$ to $\mu_{\mathrm{pr}}$ may be inferred from the more general developments in [35]. However, we present an accessible argument that applies to the specific problem setup under study in the present work. We first present the following result.

Proposition 3.1.

Consider the linearized forward model

$\mathcal{L}(m) = \mathcal{F}(m_{\mathrm{MAP}}) + \mathcal{F}_m(m_{\mathrm{MAP}})(m - m_{\mathrm{MAP}}), \quad m \in \mathscr{M},$   (16)

where we have suppressed the dependence of $m_{\mathrm{MAP}}$ on $\boldsymbol{y}$ for notational convenience. Define the data model

$\boldsymbol{y} = \mathcal{L}(m) + \boldsymbol{\nu},$   (17)

where $\boldsymbol{\nu} \sim \mathcal{N}(\boldsymbol{\varepsilon}_0, \mathbf{\Gamma}_{\nu})$. Consider the Bayesian inverse problem of estimating $m$ using (17) and the prior $\mu_{\mathrm{pr}} = \mathcal{N}(m_{\mathrm{pr}}, \mathcal{C}_{\mathrm{pr}})$. The corresponding posterior measure is given by $\hat{\mu}_{\mathrm{post}}^{\boldsymbol{y}}$.

Proof.

Using the theory of Bayesian linear inverse problems in a Hilbert space [40], the solution of the linear inverse problem under study yields a Gaussian posterior $\mu_{\mathrm{post}}^{\mathrm{lin}} = \mathcal{N}(m_{\mathrm{MAP}}^{\mathrm{lin}}, \mathcal{C}_{\mathrm{post}})$. The covariance operator $\mathcal{C}_{\mathrm{post}}$ is as in (15) and $m_{\mathrm{MAP}}^{\mathrm{lin}}$ is found by minimizing

$\mathcal{J}_{L}(m) := \tfrac{1}{2}\big(\boldsymbol{y} - \mathcal{L}(m) - \boldsymbol{\varepsilon}_0\big)^{\mathsf{T}}\mathbf{\Gamma}_{\nu}^{-1}\big(\boldsymbol{y} - \mathcal{L}(m) - \boldsymbol{\varepsilon}_0\big) + \tfrac{1}{2}\langle m - m_{\mathrm{pr}}, m - m_{\mathrm{pr}}\rangle_{\mathscr{E}},$   (18)

over the Cameron–Martin space $\mathscr{E}$. To complete the proof, we show $m_{\mathrm{MAP}}^{\mathrm{lin}} = m_{\mathrm{MAP}}$. Note that $\mathcal{J}_{L}$ is a strictly convex quadratic functional with a unique global minimizer. We consider the Euler–Lagrange equation for the present optimization problem. Note that the Fréchet derivative of $\mathcal{L}$ is given by $\mathcal{L}_m = \mathcal{F}_m(m_{\mathrm{MAP}})$. It is straightforward to see

$\dfrac{d}{d\varepsilon}\Big|_{\varepsilon=0}\mathcal{J}_{L}(m + \varepsilon\tilde{m}) = -\big\langle \tilde{m},\, \mathcal{F}_m(m_{\mathrm{MAP}})^{*}\mathbf{\Gamma}_{\nu}^{-1}\big(\boldsymbol{y} - \mathcal{L}(m) - \boldsymbol{\varepsilon}_0\big)\big\rangle_{\mathscr{M}} + \big\langle \tilde{m},\, m - m_{\mathrm{pr}}\big\rangle_{\mathscr{E}},$

for every $\tilde{m} \in \mathscr{E}$. Thus, recalling $\mathcal{L}(m_{\mathrm{MAP}}) = \mathcal{F}(m_{\mathrm{MAP}})$, we note that for every $\tilde{m} \in \mathscr{E}$,

$\dfrac{d}{d\varepsilon}\Big|_{\varepsilon=0}\mathcal{J}_{L}(m_{\mathrm{MAP}} + \varepsilon\tilde{m}) = -\big\langle \tilde{m},\, \mathcal{F}_m(m_{\mathrm{MAP}})^{*}\mathbf{\Gamma}_{\nu}^{-1}\big(\boldsymbol{y} - \mathcal{F}(m_{\mathrm{MAP}}) - \boldsymbol{\varepsilon}_0\big)\big\rangle_{\mathscr{M}} + \big\langle \tilde{m},\, m_{\mathrm{MAP}} - m_{\mathrm{pr}}\big\rangle_{\mathscr{E}} = \dfrac{d}{d\varepsilon}\Big|_{\varepsilon=0}\mathcal{J}(m_{\mathrm{MAP}} + \varepsilon\tilde{m}) = 0.$

The last equality follows from the fact that $m_{\mathrm{MAP}}$ is a minimizer of $\mathcal{J}$ in (14). Hence, $m_{\mathrm{MAP}}$ is the unique global minimizer of $\mathcal{J}_{L}$. Therefore, $m_{\mathrm{MAP}}^{\mathrm{lin}} = m_{\mathrm{MAP}}$. ∎

Proposition 3.1 shows that $\hat{\mu}_{\mathrm{post}}^{\boldsymbol{y}}$ is the posterior measure corresponding to the linearized Bayesian inverse problem considered in the result. Therefore, by construction, $\hat{\mu}_{\mathrm{post}}^{\boldsymbol{y}}$ is equivalent to $\mu_{\mathrm{pr}}$.

4 A-optimal experimental design under uncertainty

We consider inverse problems in which measurement data are collected at a set of sensors. In this case, the OED problem seeks to find an optimal placement of sensors. Specifically, we formulate the OED problem as that of selecting an optimal subset from a set of candidate sensor locations, which is a common approach; see e.g., [43, 20, 3]. To make matters concrete, we begin by fixing a set of points $\{\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_{n_{\mathrm{s}}}\}$ that indicate the candidate sensor locations. We then assign a binary weight $w_i$ to each candidate location $\boldsymbol{x}_i$; a weight of one indicates that a sensor will be placed at the corresponding candidate location. The binary vector $\boldsymbol{w} \in \{0,1\}^{n_{\mathrm{s}}}$ thus fully specifies an experimental design in the present setting. Note that the specification of a set of candidate sensor locations will in general depend on the specific application at hand. For example, placing sensors in certain parts of the domain might be impractical or impossible. Also, in problems with Dirichlet boundary conditions, placing sensors on or very close to such boundaries would be a waste of resources.

4.1 Design of the Bayesian inverse problem

The design $\boldsymbol{w}$ enters the formulation of the Bayesian inverse problem through the data likelihood. This requires additional care in the present work because the total error covariance matrix $\mathbf{\Gamma}_{\nu}$ is non-diagonal. We follow the setup in [31] to incorporate $\boldsymbol{w}$ in the Bayesian inverse problem formulation. For a binary design vector $\boldsymbol{w} \in \{0,1\}^{n_{\mathrm{s}}}$, we define the matrix $\mathbf{P}_{\boldsymbol{w}}$ as the submatrix of $\mathrm{diag}(\boldsymbol{w})$ with the rows corresponding to zero weights removed. Thus, given a generic measurement vector $\boldsymbol{d} \in \mathbb{R}^{n_{\mathrm{s}}}$, $\mathbf{P}_{\boldsymbol{w}}\boldsymbol{d}$ returns the measurements corresponding to active sensors. For $\boldsymbol{d} \in \mathbb{R}^{n_{\mathrm{s}}}$, we use the notation $\boldsymbol{d}_{\boldsymbol{w}} = \mathbf{P}_{\boldsymbol{w}}\boldsymbol{d}$.

Next, we describe how a design vector $\boldsymbol{w}$ enters the Bayesian inverse problem within the BAE framework. For a given $\boldsymbol{w}$, we consider the model $\boldsymbol{y}_{\boldsymbol{w}} = \mathbf{P}_{\boldsymbol{w}}\big(\mathcal{F}(m) + \boldsymbol{\nu}\big)$. Using this model leads to the following $\boldsymbol{w}$-dependent data likelihood

\Pi_{\mathrm{like}}^{\mathrm{BAE}}(\boldsymbol{y}\,|\,m) \propto \exp\Big\{-\frac{1}{2}\big(\boldsymbol{y}-\mathcal{F}(m)-\boldsymbol{\varepsilon}_0\big)^{\mathsf{T}}\,\mathbf{\Sigma}(\boldsymbol{w})\,\big(\boldsymbol{y}-\mathcal{F}(m)-\boldsymbol{\varepsilon}_0\big)\Big\}, \qquad (19)

with

\mathbf{\Sigma}(\boldsymbol{w}) = \mathbf{P}_{\boldsymbol{w}}^{\mathsf{T}}\,\mathbf{\Gamma}_{\nu,\boldsymbol{w}}^{-1}\,\mathbf{P}_{\boldsymbol{w}}, \quad \text{where} \quad \mathbf{\Gamma}_{\nu,\boldsymbol{w}} = \mathbf{P}_{\boldsymbol{w}}\,\mathbf{\Gamma}_{\nu}\,\mathbf{P}_{\boldsymbol{w}}^{\mathsf{T}}. \qquad (20)

Consequently, we obtain the following $\boldsymbol{w}$-dependent Gaussian approximation to the posterior, $\hat{\mu}_{\mathrm{post}}^{\boldsymbol{y},\boldsymbol{w}} = \mathcal{N}\big(m_{\mathrm{MAP}}^{\boldsymbol{y}_{\boldsymbol{w}}},\, \mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}_{\boldsymbol{w}}}\big)$, where $m_{\mathrm{MAP}}^{\boldsymbol{y}_{\boldsymbol{w}}}$ is obtained by minimizing

\mathcal{J}_{\boldsymbol{w}}(m;\boldsymbol{y}) = \frac{1}{2}\big(\boldsymbol{y}-\mathcal{F}(m)-\boldsymbol{\varepsilon}_0\big)^{\mathsf{T}}\,\mathbf{\Sigma}(\boldsymbol{w})\,\big(\boldsymbol{y}-\mathcal{F}(m)-\boldsymbol{\varepsilon}_0\big) + \frac{1}{2}\langle m - m_{\mathrm{pr}},\, m - m_{\mathrm{pr}}\rangle_{\mathscr{E}}, \qquad (21)

and

\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}_{\boldsymbol{w}}} = \big(\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}_{\boldsymbol{w}}})^{*}\,\mathbf{\Sigma}(\boldsymbol{w})\,\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}_{\boldsymbol{w}}}) + \mathcal{C}_{\mathrm{pr}}^{-1}\big)^{-1}. \qquad (22)

Note that in practical computations, the cost function $\mathcal{J}_{\boldsymbol{w}}$ in (21) is implemented as

\mathcal{J}_{\boldsymbol{w}}(m;\boldsymbol{y}) = \frac{1}{2}\big(\boldsymbol{y}_{\boldsymbol{w}} - \mathbf{P}_{\boldsymbol{w}}\mathcal{F}(m) - \mathbf{P}_{\boldsymbol{w}}\boldsymbol{\varepsilon}_0\big)^{\mathsf{T}}\,\mathbf{\Gamma}_{\nu,\boldsymbol{w}}^{-1}\,\big(\boldsymbol{y}_{\boldsymbol{w}} - \mathbf{P}_{\boldsymbol{w}}\mathcal{F}(m) - \mathbf{P}_{\boldsymbol{w}}\boldsymbol{\varepsilon}_0\big) + \frac{1}{2}\langle m - m_{\mathrm{pr}},\, m - m_{\mathrm{pr}}\rangle_{\mathscr{E}},

with $\mathbf{\Gamma}_{\nu,\boldsymbol{w}}$ as in (20). This amounts to using the data from the active sensors and removing the rows and columns corresponding to inactive sensors from the error covariance matrix $\mathbf{\Gamma}_{\nu}$.
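As a concrete illustration of this restriction, the data-misfit term of $\mathcal{J}_{\boldsymbol{w}}$ might be evaluated as in the following sketch, which assumes finite-dimensional stand-ins for $\mathcal{F}(m)$, $\boldsymbol{\varepsilon}_0$, and $\mathbf{\Gamma}_{\nu}$ (hypothetical inputs, not the actual PDE-based quantities):

```python
import numpy as np

def bae_data_misfit(w, y, Fm, eps0, Gamma_nu):
    """Data-misfit term of J_w in restricted form: keep only the active sensors,
    and drop the corresponding rows/columns of the total error covariance."""
    active = np.flatnonzero(w)
    r = (y - Fm - eps0)[active]                  # restricted residual y_w - P_w F(m) - P_w eps_0
    Gamma_w = Gamma_nu[np.ix_(active, active)]   # Gamma_{nu,w} = P_w Gamma_nu P_w^T
    return 0.5 * r @ np.linalg.solve(Gamma_w, r)
```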

4.2 The design criterion

In the present work, we follow an A-optimal design strategy, where the goal is to find designs that minimize the average posterior variance. Generally, computing the average posterior variance for a Bayesian nonlinear inverse problem is computationally challenging. We follow the developments in [4] to define a Bayesian A-optimality criterion in the case of nonlinear inverse problems.

Given a data vector $\boldsymbol{y} \in \mathbb{R}^{n_\text{s}}$, an approximate measure of posterior uncertainty is provided by $\mathsf{tr}(\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}_{\boldsymbol{w}}})$. However, when solving the OED problem, data are not available; indeed, it is the goal of the OED problem to specify how data should be collected. To overcome this, we follow the general approach in Bayesian OED of nonlinear inverse problems and consider $\mathbb{E}_{\boldsymbol{y}}\{\mathsf{tr}(\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}_{\boldsymbol{w}}})\}$, with $\mathbb{E}_{\boldsymbol{y}}$ denoting expectation with respect to the set of all likely data. We compute this expectation by using the information available in the Bayesian inverse problem and the information regarding the distribution of the model uncertainty. Namely, following the approach in [4], we use the design criterion

\Phi(\boldsymbol{w}) = \int_{\mathscr{X}}\int_{\mathscr{M}}\int_{\mathbb{R}^{n_\text{s}}} \mathsf{tr}(\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}_{\boldsymbol{w}}})\, \Pi_{\mathrm{noise}}\big(\mathcal{G}(m,\xi)-\boldsymbol{y}\big)\, d\boldsymbol{y}\, \mu_{\mathrm{pr}}(dm)\, \mu_{\xi}(d\xi), \qquad (23)

where $\Pi_{\mathrm{noise}}$ is the probability density function (pdf) of the noise distribution $\mathcal{N}(\boldsymbol{0}, \mathbf{\Gamma}_{\mathrm{noise}})$. In practice, $\Phi(\boldsymbol{w})$ will be computed via sample averaging. Specifically, we use

\Phi_{n_\text{d}}(\boldsymbol{w}) = \frac{1}{n_\text{d}} \sum_{i=1}^{n_\text{d}} \mathsf{tr}(\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}), \qquad (24)

where the training data samples $\boldsymbol{y}^{i}_{\boldsymbol{w}}$ are given by $\boldsymbol{y}^{i}_{\boldsymbol{w}} = \mathbf{P}_{\boldsymbol{w}}\big(\mathcal{G}(m^{i},\xi^{i}) + \boldsymbol{\eta}^{i}\big)$, with $\{(m^{i},\xi^{i},\boldsymbol{\eta}^{i})\}_{i=1}^{n_\text{d}}$ a sample set from the product space $(\mathscr{M},\mu_{\mathrm{pr}}) \otimes (\mathscr{X},\mu_{\xi}) \otimes (\mathbb{R}^{n_\text{s}}, \mathcal{N}(\boldsymbol{0},\mathbf{\Gamma}_{\mathrm{noise}}))$. In large-scale applications, only a small $n_\text{d}$ can typically be afforded. However, in this context, a modest $n_\text{d}$ is typically sufficient for computing good quality optimal designs, as also demonstrated in the computational results in the present work.
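A sketch of how such training data might be generated, assuming a hypothetical accurate model `G` and user-supplied samples of the inversion and secondary parameters, is:

```python
import numpy as np

def generate_training_data(w, m_samples, xi_samples, G, Gamma_noise, rng):
    """Draw y_w^i = P_w ( G(m^i, xi^i) + eta^i ), i = 1, ..., n_d, for the
    sample-average objective (24); G, m_samples, and xi_samples are stand-ins."""
    active = np.flatnonzero(w)
    zero = np.zeros(Gamma_noise.shape[0])
    return [
        (G(m_i, xi_i) + rng.multivariate_normal(zero, Gamma_noise))[active]
        for m_i, xi_i in zip(m_samples, xi_samples)
    ]
```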

Note that, for each $i \in \{1,\ldots,n_\text{d}\}$,

\mathsf{tr}(\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}) = \sum_{j=1}^{\infty} \langle \mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}} e_j,\, e_j \rangle_{\mathscr{M}},

where $\{e_j\}_{j=1}^{\infty}$ is a complete orthonormal set in $\mathscr{M}$. Inserting this in (24) and using the definition of $\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}$, we have

\Phi_{n_\text{d}}(\boldsymbol{w}) = \frac{1}{n_\text{d}} \sum_{i=1}^{n_\text{d}} \sum_{j=1}^{\infty} \langle c_{ij},\, e_j \rangle_{\mathscr{M}}, \qquad (25a)
where, for $i \in \{1,\ldots,n_\text{d}\}$ and $j \in \mathbb{N}$,
m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}} = \operatorname*{arg\,min}_{m} \mathcal{J}_{\boldsymbol{w}}(m;\boldsymbol{y}^{i}), \qquad (25b)
\big(\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}})^{*}\,\mathbf{\Sigma}(\boldsymbol{w})\,\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}) + \mathcal{C}_{\mathrm{pr}}^{-1}\big)\, c_{ij} = e_j. \qquad (25c)

Computing the OED objective as defined above is not practical. However, (25) provides insight into the key challenges in computing the OED objective. In the first place, (25b) is a challenging PDE-constrained optimization problem whose solution is the MAP point $m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}$. Moreover, upon discretization, (25c) becomes a high-dimensional linear system whose coefficient matrix is the discretization of the operator $\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}})^{*}\,\mathbf{\Sigma}(\boldsymbol{w})\,\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}) + \mathcal{C}_{\mathrm{pr}}^{-1}$. Such systems can be tackled with Krylov iterative methods, which require only the application of the coefficient matrix to vectors. In Section 5, we present two approaches for efficiently computing the OED objective: one approach uses randomized trace estimation and the other utilizes low-rank approximation of $\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}})^{*}\,\mathbf{\Sigma}(\boldsymbol{w})\,\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}})$. These approaches rely on scalable optimization methods for (25b), as well as adjoint-based gradient and Hessian-apply computations.
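To make the matrix-free Krylov idea concrete, the following small finite-dimensional analogue (a sketch with randomly generated stand-ins for the Jacobian $\mathcal{F}_m$, the matrix $\mathbf{\Sigma}(\boldsymbol{w})$, and the prior precision) applies a conjugate gradient solver to a system of the form (25c) using only operator applications:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, q = 200, 10                                       # parameter and (active) data dimensions
J = rng.standard_normal((q, n))                      # stand-in for the Jacobian F_m at the MAP point
Sigma_w = np.eye(q)                                  # stand-in for Sigma(w)
Cpr_inv = np.diag(1.0 + np.arange(n, dtype=float))   # stand-in for the prior precision

def apply_op(v):
    # action of (F_m^* Sigma(w) F_m + C_pr^{-1}) on a vector, as in (25c)
    return J.T @ (Sigma_w @ (J @ v)) + Cpr_inv @ v

A = LinearOperator((n, n), matvec=apply_op)
e1 = np.zeros(n); e1[0] = 1.0
c1, info = cg(A, e1)                                 # approximates the solution of (25c) with e_j = e_1
```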

4.3 The optimization problem for finding an A-optimal design

We state the OED problem of selecting the best $K$ sensors, with $K \leq n_\text{s}$, as follows:

\min_{\boldsymbol{w}\in\{0,1\}^{n_\text{s}}} \Phi_{n_\text{d}}(\boldsymbol{w}) \qquad \text{s.t.} \quad \sum_{\ell=1}^{n_\text{s}} w_{\ell} = K. \qquad (26)

This is a challenging binary optimization problem. One possibility is to pursue a relaxation strategy [31, 3, 4] to enable gradient-based optimization with design weights $w_i \in [0,1]$. A practical alternative is to follow a greedy approach to find an approximate solution of (26). Namely, we place sensors one at a time. In each step of a greedy algorithm, we select the sensor that provides the largest decrease in the value of the OED objective $\Phi_{n_\text{d}}$. While the solutions obtained using a greedy algorithm are suboptimal in general, greedy approaches have shown good performance in many sensor placement problems; see, e.g., [29, 37, 23, 5]. In the present work, we follow a greedy approach for finding approximate solutions of (26). The effectiveness of this approach is demonstrated in our computational results in Section 7.

5 Computational methods

We begin this section with a brief discussion on computing the BAE error statistics in Section 5.1. We then detail our proposed methods for computing the OED objective in Section 5.2. The greedy approach for computing optimal designs is outlined in Section 5.3. Then, we discuss the computational cost of OED objective evaluation, using our proposed methods, as well as the greedy procedure in Section 5.4.

5.1 Estimating approximation error statistics

As discussed in Section 3, the mean and covariance of the approximation error in (12) will be approximated via Monte Carlo sampling. Specifically, we begin by drawing samples $\{(m^{i},\xi^{i})\}_{i=1}^{n_\text{mc}}$ from $(\mathscr{M}\times\mathscr{X},\, \mu_{\mathrm{pr}}\otimes\mu_{\xi})$ and compute

\widehat{\boldsymbol{\varepsilon}}_0 = \frac{1}{n_\text{mc}} \sum_{i=1}^{n_\text{mc}} \boldsymbol{\varepsilon}^{i}, \qquad \widehat{\mathbf{\Gamma}}_{\varepsilon} = \frac{1}{n_\text{mc}-1} \sum_{i=1}^{n_\text{mc}} (\boldsymbol{\varepsilon}^{i}-\widehat{\boldsymbol{\varepsilon}}_0)(\boldsymbol{\varepsilon}^{i}-\widehat{\boldsymbol{\varepsilon}}_0)^{\mathsf{T}}, \qquad \text{where } \boldsymbol{\varepsilon}^{i} = \mathcal{G}(m^{i},\xi^{i}) - \mathcal{F}(m^{i}). \qquad (27)

Subsequently, we use $\boldsymbol{\varepsilon}_0 \approx \widehat{\boldsymbol{\varepsilon}}_0$ and $\mathbf{\Gamma}_{\varepsilon} \approx \widehat{\mathbf{\Gamma}}_{\varepsilon}$. The computational cost of this process is $2 n_\text{mc}$ model evaluations. Note, however, that these model evaluations are done a priori, in parallel, and can be used for both the OED problem and for solving the inverse problem. Typically, only a modest sample size is sufficient for approximating the mean and covariance of the model error [33, 10]. Specifically, as noted in Section 2, only the dominant modes of the error covariance matrix need to be resolved. Note also that (a subset of) the model evaluations in (27) may be reused in the generation of the training data needed for computation of the OED objective.
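A minimal sketch of this sample-based estimation, assuming the accurate and approximate model outputs have already been computed and stored as arrays, is:

```python
import numpy as np

def bae_statistics(G_vals, F_vals):
    """Sample mean and covariance of eps^i = G(m^i, xi^i) - F(m^i), cf. (27).
    G_vals, F_vals: arrays of shape (n_mc, n_s) of precomputed model evaluations."""
    eps = np.asarray(G_vals) - np.asarray(F_vals)
    eps0_hat = eps.mean(axis=0)
    Gamma_eps_hat = np.cov(eps, rowvar=False, ddof=1)   # matches the 1/(n_mc - 1) factor in (27)
    return eps0_hat, Gamma_eps_hat
```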

5.2 Computation of the OED objective

In this section, we present scalable computational methods for computing the OED objective. Specifically, we present two methods: (i) a method based on the use of randomized trace estimators (Section 5.2.1) and (ii) a method based on low-rank spectral decompositions (Section 5.2.2). As discussed further below, the latter is particularly suited to our overall approach in the present work. Therefore, we primarily focus on the method based on low-rank spectral decompositions, which we also fully elaborate in the context of a model inverse problem (see Section 6 and Section 7). However, the first method does have its own merits and is included to provide an alternative approach. The relative benefits and computational complexity of these methods are discussed in Section 5.4.

Before we proceed further, we define some notation to simplify the discussion that follows. For $i \in \{1,\ldots,n_\text{d}\}$, we define

\mathcal{H}^{i}(\boldsymbol{w}) = \mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}})^{*}\,\mathbf{\Sigma}(\boldsymbol{w})\,\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}) \quad\text{and}\quad \widetilde{\mathcal{H}}^{i}(\boldsymbol{w}) = \mathcal{C}_{\mathrm{pr}}^{1/2}\, \mathcal{H}^{i}(\boldsymbol{w})\, \mathcal{C}_{\mathrm{pr}}^{1/2}. \qquad (28)

Note that the operator $\mathcal{H}^{i}$ is the so-called Gauss–Newton Hessian of the data-misfit term in the definition of $\mathcal{J}_{\boldsymbol{w}}(m;\boldsymbol{y}^{i})$ in (21), evaluated at $m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}$.

5.2.1 Monte Carlo trace estimator approach

We begin by deriving a randomized (Monte Carlo) trace estimator for the posterior covariance operator in (22). This is facilitated by the following technical result.

Proposition 5.1.

Let $\mathcal{C} \in \mathscr{L}_1^{\mathrm{sym}{+}{+}}(\mathscr{M})$ and $\mathcal{K} \in \mathscr{L}(\mathscr{M})$, and consider the Gaussian measure $\mu = \mathcal{N}(0,\mathcal{C})$ on $\mathscr{M}$. Then, $\int_{\mathscr{M}} \langle \mathcal{K}z,\, z\rangle_{\mathscr{M}}\, \mu(dz) = \mathsf{tr}(\mathcal{C}^{1/2}\mathcal{K}\mathcal{C}^{1/2})$.

Proof.

By the assumptions on $\mathcal{C}$ and $\mathcal{K}$, both $\mathcal{C}\mathcal{K}$ and $\mathcal{C}^{1/2}\mathcal{K}\mathcal{C}^{1/2}$ are trace-class. Also, by the formula for the expectation of a quadratic form in the infinite-dimensional Hilbert space setting [2, Lemma 1], we have $\int_{\mathscr{M}} \langle \mathcal{K}z,\, z\rangle_{\mathscr{M}}\, \mu(dz) = \mathsf{tr}(\mathcal{C}\mathcal{K})$. To finish the proof, we let $\{e_j\}_{j=1}^{\infty}$ be the (orthonormal) basis of eigenvectors of $\mathcal{C}$ with corresponding (real, positive) eigenvalues $\{\lambda_j\}_{j=1}^{\infty}$, and note that $\mathsf{tr}(\mathcal{C}\mathcal{K}) = \sum_j \langle \mathcal{C}\mathcal{K}e_j,\, e_j\rangle_{\mathscr{M}} = \sum_j \lambda_j \langle \mathcal{K}e_j,\, e_j\rangle_{\mathscr{M}} = \sum_j \langle \mathcal{K}\mathcal{C}^{1/2}e_j,\, \mathcal{C}^{1/2}e_j\rangle_{\mathscr{M}} = \sum_j \langle \mathcal{C}^{1/2}\mathcal{K}\mathcal{C}^{1/2}e_j,\, e_j\rangle_{\mathscr{M}} = \mathsf{tr}(\mathcal{C}^{1/2}\mathcal{K}\mathcal{C}^{1/2})$. $\square$

Next, note that we can write $\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}$ as

\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}} = \big(\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}})^{*}\,\mathbf{\Sigma}(\boldsymbol{w})\,\mathcal{F}_m(m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}) + \mathcal{C}_{\mathrm{pr}}^{-1}\big)^{-1} = \mathcal{C}_{\mathrm{pr}}^{1/2}\big(\widetilde{\mathcal{H}}^{i}(\boldsymbol{w}) + I\big)^{-1}\mathcal{C}_{\mathrm{pr}}^{1/2}. \qquad (29)

Using this and letting $\mathcal{C} = \mathcal{C}_{\mathrm{pr}}$ and $\mathcal{K} = \big(\widetilde{\mathcal{H}}^{i}(\boldsymbol{w}) + I\big)^{-1}$ in Proposition 5.1, we have $\mathsf{tr}(\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}) = \int_{\mathscr{M}} \big\langle \big(\widetilde{\mathcal{H}}^{i}(\boldsymbol{w}) + I\big)^{-1} z,\, z \big\rangle_{\mathscr{M}}\, \mu(dz)$. Approximating the integral on the right-hand side via sampling, we obtain

\mathsf{tr}(\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}) \approx \frac{1}{n_\text{tr}} \sum_{j=1}^{n_\text{tr}} \big\langle \big(\widetilde{\mathcal{H}}^{i}(\boldsymbol{w}) + I\big)^{-1} z_j,\, z_j \big\rangle_{\mathscr{M}},

where $\{z_j\}_{j=1}^{n_\text{tr}}$ are draws from the measure $\mu = \mathcal{N}(0,\mathcal{C}_{\mathrm{pr}})$. This randomized (Monte Carlo) trace estimator allows approximating the OED objective (24) as follows:

\Phi_{n_\text{d}}^{\mathrm{tr}}(\boldsymbol{w}) = \frac{1}{n_\text{tr}\, n_\text{d}} \sum_{i=1}^{n_\text{d}} \sum_{j=1}^{n_\text{tr}} \langle c_{ij},\, z_j \rangle_{\mathscr{M}}, \qquad (30a)
where, for $i \in \{1,\ldots,n_\text{d}\}$ and $j \in \{1,\ldots,n_\text{tr}\}$,
m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}} = \operatorname*{arg\,min}_{m} \mathcal{J}_{\boldsymbol{w}}(m;\boldsymbol{y}^{i}), \qquad (30b)
\big(\widetilde{\mathcal{H}}^{i}(\boldsymbol{w}) + I\big)\, c_{ij} = z_j. \qquad (30c)

The major computational challenge in computing $\Phi_{n_\text{d}}^{\mathrm{tr}}(\boldsymbol{w})$ is solving the MAP estimation problems in (30b). These inner optimization problems can be solved efficiently using an inexact Newton conjugate gradient (Newton-CG) method. The required first- and second-order derivatives can be obtained efficiently using adjoint-based gradient and Hessian-apply computations. The Hessian solves in (30c) are also done using (preconditioned) CG. This only requires the action of $\widetilde{\mathcal{H}}^{i}(\boldsymbol{w})$ on vectors, which can be computed using the adjoint method. A detailed study of the computational cost of computing $\Phi_{n_\text{d}}^{\mathrm{tr}}(\boldsymbol{w})$ is provided in Section 5.4.
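The structure of the trace estimator is summarized in the following finite-dimensional sketch; `Htilde_apply` is a hypothetical callable applying the prior-preconditioned Gauss–Newton Hessian, and the $\mathscr{M}$ inner product is replaced by the Euclidean one for simplicity:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def posterior_trace_estimate(Htilde_apply, Cpr, n_tr, rng):
    """Monte Carlo estimate of tr(C_post) via (30): draw z ~ N(0, C_pr),
    solve (Htilde + I) c = z with CG, and average <c, z>."""
    n = Cpr.shape[0]
    L = np.linalg.cholesky(Cpr)                                  # factor to sample z ~ N(0, C_pr)
    A = LinearOperator((n, n), matvec=lambda v: Htilde_apply(v) + v)
    total = 0.0
    for _ in range(n_tr):
        z = L @ rng.standard_normal(n)
        c, _ = cg(A, z)                                          # cf. (30c)
        total += c @ z
    return total / n_tr
```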

5.2.2 Low-rank approximation approach

Due to the use of finite-dimensional observations, $\widetilde{\mathcal{H}}^{i}$ in (28) has a finite-dimensional range. Moreover, $\widetilde{\mathcal{H}}^{i}$ often exhibits rapid spectral decay and, in cases where the dimension of the measurements is high, the numerical rank of this operator is typically much smaller than its exact rank. Note that the exact rank is bounded by the measurement dimension. Such low-rank structure is due to the possible smoothing properties of the forward operator and of $\mathcal{C}_{\mathrm{pr}}$; see, e.g., [12]. The following technical result facilitates exploiting such low-rank structure to compute the OED objective.

Proposition 5.2.

Let $\mathcal{C}$ and $\mathcal{A}$ be in $\mathscr{L}_1^{\mathrm{sym}{+}}(\mathscr{M})$. Letting $\{v_k\}_{k=1}^{\infty}$ be the orthonormal basis of eigenvectors of $\mathcal{A}$ with corresponding (real, non-negative) eigenvalues $\{\lambda_k\}_{k=1}^{\infty}$, we have

\mathsf{tr}\big(\mathcal{C}^{1/2}(I+\mathcal{A})^{-1}\mathcal{C}^{1/2}\big) = \mathsf{tr}(\mathcal{C}) - \sum_{k=1}^{\infty} \frac{\lambda_k}{1+\lambda_k}\, \|\mathcal{C}^{1/2}v_k\|_{\mathscr{M}}^{2}. \qquad (31)
Proof.

The result follows by noting that

\mathsf{tr}(\mathcal{C}) - \mathsf{tr}\big(\mathcal{C}^{1/2}(I+\mathcal{A})^{-1}\mathcal{C}^{1/2}\big)
  = \sum_{k=1}^{\infty} \langle \mathcal{C}v_k,\, v_k\rangle_{\mathscr{M}} - \sum_{k=1}^{\infty} \langle \mathcal{C}(I+\mathcal{A})^{-1}v_k,\, v_k\rangle_{\mathscr{M}}
  = \sum_{k=1}^{\infty} \langle \mathcal{C}v_k,\, v_k\rangle_{\mathscr{M}} - \sum_{k=1}^{\infty} \frac{1}{1+\lambda_k}\,\langle \mathcal{C}v_k,\, v_k\rangle_{\mathscr{M}}
  = \sum_{k=1}^{\infty} \frac{\lambda_k}{1+\lambda_k}\,\langle \mathcal{C}v_k,\, v_k\rangle_{\mathscr{M}}
  = \sum_{k=1}^{\infty} \frac{\lambda_k}{1+\lambda_k}\,\|\mathcal{C}^{1/2}v_k\|_{\mathscr{M}}^{2}. \qquad \square

The relation (31) facilitates approximating $\mathsf{tr}\big(\mathcal{C}^{1/2}(I+\mathcal{A})^{-1}\mathcal{C}^{1/2}\big)$. Namely, we can truncate the infinite summation on the right-hand side to the first $r$ terms, corresponding to the $r$ dominant eigenvalues of $\mathcal{A}$. We can also quantify the approximation error due to this truncation, as follows.

Proposition 5.3.

Let $\mathcal{A}$ and $\mathcal{C}$ be as in Proposition 5.2 and let $\{\lambda_k\}_{k=1}^{r+1}$ be the $r+1$ largest eigenvalues of $\mathcal{A}$. Define $T_r(\mathcal{A},\mathcal{C}) = \mathsf{tr}(\mathcal{C}) - \sum_{k=1}^{r} \frac{\lambda_k}{1+\lambda_k}\,\|\mathcal{C}^{1/2}v_k\|_{\mathscr{M}}^{2}$. Then,

\big|\mathsf{tr}\big(\mathcal{C}^{1/2}(I+\mathcal{A})^{-1}\mathcal{C}^{1/2}\big) - T_r(\mathcal{A},\mathcal{C})\big| \leq \frac{\lambda_{r+1}}{1+\lambda_{r+1}}\, \mathsf{tr}(\mathcal{C}). \qquad (32)
Proof.

We have

\big|\mathsf{tr}\big(\mathcal{C}^{1/2}(I+\mathcal{A})^{-1}\mathcal{C}^{1/2}\big) - T_r(\mathcal{A},\mathcal{C})\big|
  = \sum_{k=r+1}^{\infty} \frac{\lambda_k}{1+\lambda_k}\,\|\mathcal{C}^{1/2}v_k\|_{\mathscr{M}}^{2}
  \leq \frac{\lambda_{r+1}}{1+\lambda_{r+1}} \sum_{k=r+1}^{\infty} \|\mathcal{C}^{1/2}v_k\|_{\mathscr{M}}^{2}
  \leq \frac{\lambda_{r+1}}{1+\lambda_{r+1}} \sum_{k=1}^{\infty} \langle \mathcal{C}v_k,\, v_k\rangle_{\mathscr{M}}
  = \frac{\lambda_{r+1}}{1+\lambda_{r+1}}\,\mathsf{tr}(\mathcal{C}). \qquad \square
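Propositions 5.2 and 5.3 can be checked numerically in a finite-dimensional setting. The following sketch (with randomly generated symmetric positive (semi)definite stand-ins for $\mathcal{C}$ and $\mathcal{A}$) verifies the identity (31) and illustrates the truncation bound (32):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 50, 3
B = rng.standard_normal((n, n))
C = B @ B.T / n                                   # symmetric positive definite stand-in for C
M = rng.standard_normal((n, 5))
A = M @ M.T                                       # symmetric positive semidefinite, rank <= 5
lam, V = np.linalg.eigh(A)
lam, V = lam[::-1], V[:, ::-1]                    # sort eigenpairs in decreasing order

lhs = np.trace(np.linalg.solve(np.eye(n) + A, C))            # tr(C^{1/2} (I + A)^{-1} C^{1/2})
rhs = np.trace(C) - sum(l / (1 + l) * (v @ C @ v) for l, v in zip(lam, V.T))
assert np.isclose(lhs, rhs)                                  # identity (31)

T_r = np.trace(C) - sum(l / (1 + l) * (v @ C @ v) for l, v in zip(lam[:r], V[:, :r].T))
assert abs(lhs - T_r) <= lam[r] / (1 + lam[r]) * np.trace(C) + 1e-12   # bound (32)
```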

Next, we consider $\widetilde{\mathcal{H}}^{i}(\boldsymbol{w})$ and let $\{(\lambda_{ik}, v_{ik})\}_{k=1}^{r}$ be its dominant eigenpairs. (The dependence of the eigenvalues and eigenvectors on $\boldsymbol{w}$ is suppressed for notational convenience.) Using Proposition 5.2 with $\mathcal{C} = \mathcal{C}_{\mathrm{pr}}$ and $\mathcal{A} = \widetilde{\mathcal{H}}^{i}(\boldsymbol{w})$, we obtain the approximation

\mathsf{tr}(\mathcal{C}_{\mathrm{post}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}}) = \mathsf{tr}\big(\mathcal{C}_{\mathrm{pr}}^{1/2}\,(I+\widetilde{\mathcal{H}}^{i}(\boldsymbol{w}))^{-1}\,\mathcal{C}_{\mathrm{pr}}^{1/2}\big) = \mathsf{tr}\big(\mathcal{C}_{\mathrm{pr}}\,(I+\widetilde{\mathcal{H}}^{i}(\boldsymbol{w}))^{-1}\big) \approx \mathsf{tr}(\mathcal{C}_{\mathrm{pr}}) - \sum_{k=1}^{r} \frac{\lambda_{ik}}{1+\lambda_{ik}}\,\|\mathcal{C}_{\mathrm{pr}}^{1/2} v_{ik}\|_{\mathscr{M}}^{2}. \qquad (33)

Observe that only the second term on the right-hand side depends on $\boldsymbol{w}$. This term, which provides a measure of uncertainty reduction, can be used to define the OED objective. Specifically, using (24) along with (33) leads to the following form of the OED objective:

\Phi_{n_\text{d}}^{\mathrm{eig}}(\boldsymbol{w}) = -\frac{1}{n_\text{d}} \sum_{i=1}^{n_\text{d}} \sum_{k=1}^{r} \frac{\lambda_{ik}}{1+\lambda_{ik}}\,\|\mathcal{C}_{\mathrm{pr}}^{1/2} v_{ik}\|_{\mathscr{M}}^{2}, \qquad (34a)
where, for $i \in \{1,\ldots,n_\text{d}\}$,
m_{\mathrm{MAP}}^{\boldsymbol{y}^{i}_{\boldsymbol{w}}} = \operatorname*{arg\,min}_{m} \mathcal{J}_{\boldsymbol{w}}(m;\boldsymbol{y}^{i}), \qquad (34b)
\widetilde{\mathcal{H}}^{i}(\boldsymbol{w})\, v_{ik} = \lambda_{ik}\, v_{ik}, \qquad k \in \{1,\ldots,r\}, \qquad (34c)
\langle v_{ik},\, v_{il} \rangle_{\mathscr{M}} = \delta_{kl}, \qquad k, l \in \{1,\ldots,r\}. \qquad (34d)

Note that the numerical rank of $\widetilde{\mathcal{H}}^{i}$ will, in general, depend on $i$. In (34), we have used a common target rank $r$ for simplicity. In the present setting, this target rank is bounded by the number of active sensors for a given $\boldsymbol{w}$. Note also that Proposition 5.3 shows how to quantify the error due to the use of truncated spectral decompositions. Namely, in view of (32), the error due to the truncation is bounded by $\big(\frac{1}{n_\text{d}}\sum_{i=1}^{n_\text{d}} \frac{\lambda_{i,r+1}}{1+\lambda_{i,r+1}}\big)\, \mathsf{tr}(\mathcal{C}_{\mathrm{pr}})$.

As with the approach outlined in Section 5.2.1, the dominant computational challenge in computing $\Phi_{n_\text{d}}^{\mathrm{eig}}(\boldsymbol{w})$ is the solution of the MAP estimation problems (34b), which is tackled using the same techniques. The eigenvalue problem (34c) can be tackled via Lanczos or randomized approaches. The target rank $r$ can be selected as the number of active sensors; this is suitable in cases where the number of active sensors is small. See Section 5.4 for further details regarding the computational cost of evaluating $\Phi_{n_\text{d}}^{\mathrm{eig}}$.
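Given the dominant eigenpairs of each $\widetilde{\mathcal{H}}^{i}(\boldsymbol{w})$ (computed, e.g., by a Lanczos or randomized eigensolver), the objective (34a) reduces to a weighted sum. A finite-dimensional sketch, with the eigenpairs and a prior covariance matrix supplied as stand-in arrays, is:

```python
import numpy as np

def phi_eig(eig_list, Cpr):
    """Low-rank OED objective (34a): eig_list holds, for each training datum i,
    a pair (lam_i, V_i) with the r dominant eigenvalues and eigenvectors (as columns)
    of the prior-preconditioned Gauss-Newton Hessian; ||Cpr^{1/2} v||^2 = v^T Cpr v."""
    phi = 0.0
    for lam, V in eig_list:
        phi -= sum(l / (1.0 + l) * (v @ Cpr @ v) for l, v in zip(lam, V.T))
    return phi / len(eig_list)
```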

We point out that if the dominant eigenvalues are simple, the condition $\langle v_{ik},\, v_{ik}\rangle_{\mathscr{M}} = 1$ for all $k \in \{1,\ldots,r\}$ implies the orthonormality condition (34d). This follows from the fact that eigenvectors corresponding to distinct eigenvalues of $\widetilde{\mathcal{H}}^{i}$, which belongs to $\mathscr{L}_1^{\mathrm{sym}{+}}(\mathscr{M})$, are orthogonal. The simplicity assumption on the dominant eigenvalues is observed in many applications where the dominant eigenvalues decay rapidly. Making this simplicity assumption, and letting $s_{ik} = \mathcal{C}_{\mathrm{pr}}^{1/2} v_{ik}$, we may also write the eigenvalue problem above as the following generalized eigenvalue problem:

\mathcal{H}^{i}(\boldsymbol{w})\, s_{ik} = \lambda_{ik}\, \mathcal{C}_{\mathrm{pr}}^{-1} s_{ik}, \qquad \langle s_{ik},\, s_{ik}\rangle_{\mathscr{E}} = 1, \qquad (35)

where $i \in \{1,\ldots,n_\text{d}\}$ and $k \in \{1,\ldots,r\}$. This formulation is helpful when describing the eigenvalue problem in weak form, as seen in Section 6.2. In particular, the eigenvalue problem (35) will be formulated in terms of the incremental state and adjoint equations, and the adjoint-based expression for $\widetilde{\mathcal{H}}^{i}$ applies.
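In a discretized, finite-dimensional setting, the generalized eigenvalue problem (35) can be solved directly with a generalized symmetric eigensolver, as in the following sketch (random stand-ins for $\mathcal{H}^{i}(\boldsymbol{w})$ and $\mathcal{C}_{\mathrm{pr}}^{-1}$; in large-scale practice one would instead use matrix-free Lanczos or randomized methods):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n, q = 30, 4
J = rng.standard_normal((q, n))
H = J.T @ J                                          # stand-in for the Gauss-Newton Hessian H^i(w)
Cpr_inv = np.diag(1.0 + np.arange(n, dtype=float))   # stand-in for the prior precision

lam, S = eigh(H, Cpr_inv)        # solves H s = lam * Cpr_inv s, with S.T @ Cpr_inv @ S = I
lam, S = lam[::-1], S[:, ::-1]   # dominant eigenpairs first; each column satisfies <s, s>_E = 1
```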

5.3 Greedy optimization

We follow a greedy approach for solving the OED problem. That is, we select sensors one at a time: at each step, we pick the sensor that results in the greatest decrease in the OED objective value; see Algorithm 1. As mentioned before, we use the approach in Section 5.2.2 for computing the OED objective; that is, we use the OED objective $\Phi_{n_\text{d}}^{\mathrm{eig}}$, as defined in (34).

Algorithm 1 Greedy approach for solving the OED problem (26) with $\Phi_{n_\text{d}} = \Phi_{n_\text{d}}^{\mathrm{eig}}$.
Input: the target number of sensors $K$ in (26).
Output: the optimal design vector $\boldsymbol{w}$.
1:  $\boldsymbol{w} \leftarrow \boldsymbol{0}$
2:  $\mathcal{I}_{\mathrm{candidate}} \leftarrow \{1,\ldots,n_\text{s}\}$
3:  $\mathcal{I}_{\mathrm{active}} \leftarrow \emptyset$
4:  for $\ell = 1$ to $K$ do
5:      Evaluate $\Phi_{n_\text{d}}^{\mathrm{eig}}(\boldsymbol{w} + \boldsymbol{e}_j)$ for all $j \in \mathcal{I}_{\mathrm{candidate}}$  {$\boldsymbol{e}_j$ is the $j$th coordinate vector in $\mathbb{R}^{n_\text{s}}$}
6:      $i_\ell \leftarrow \operatorname*{arg\,min}_{j \in \mathcal{I}_{\mathrm{candidate}}} \Phi_{n_\text{d}}^{\mathrm{eig}}(\boldsymbol{w} + \boldsymbol{e}_j)$
7:      $\mathcal{I}_{\mathrm{active}} \leftarrow \mathcal{I}_{\mathrm{active}} \cup \{i_\ell\}$
8:      $\mathcal{I}_{\mathrm{candidate}} \leftarrow \mathcal{I}_{\mathrm{candidate}} \setminus \{i_\ell\}$
9:      $\boldsymbol{w} \leftarrow \boldsymbol{w} + \boldsymbol{e}_{i_\ell}$
10: end for
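A compact sketch of Algorithm 1, with the OED objective supplied as a hypothetical callable `phi` taking a binary design vector, is given below; the per-step objective evaluations can be distributed across processors.

```python
import numpy as np

def greedy_oed(phi, n_s, K):
    """Greedy sensor placement (Algorithm 1): at each step, activate the sensor
    whose addition yields the smallest value of the OED objective phi(w)."""
    w = np.zeros(n_s, dtype=int)
    candidates = set(range(n_s))
    for _ in range(K):
        scores = {}
        for j in candidates:                  # these evaluations can run in parallel
            w_trial = w.copy()
            w_trial[j] = 1
            scores[j] = phi(w_trial)
        i_best = min(scores, key=scores.get)  # sensor giving the largest decrease
        w[i_best] = 1
        candidates.discard(i_best)
    return w
```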

Theoretical justifications for the use of a greedy approach for sensor placement, in various contexts, have been investigated in a number of works [29, 37, 23]. The solution obtained using the greedy algorithm is, in general, suboptimal. However, as observed in practice, a greedy algorithm is a pragmatic approach that is effective in obtaining near-optimal sensor placements; see also [30, 44, 5]. We demonstrate the effectiveness of this approach in our computational results in Section 7.

5.4 Computational cost of sensor placement

The $\ell$th step of the greedy algorithm requires $n_\text{s} - \ell + 1$ OED objective evaluations (cf. step 5 of Algorithm 1); these can be performed in parallel. It is straightforward to note that placing $K$ sensors using the greedy approach requires a total of $C(K, n_\text{s}) := K n_\text{s} - K(K-1)/2$ OED objective function evaluations. Therefore, the overall cost of computing an optimal design, in terms of the number of PDE solves, using the proposed approach is $C(K, n_\text{s})$ times the number of PDE solves required in each OED objective evaluation. Thus, to provide a complete picture, in this section we detail the cost of OED objective evaluation using the two approaches discussed in Section 5.2. A key aspect of both approaches is that the cost of OED objective evaluation, in terms of the number of required PDE solves, is independent of the dimension of the discretized inversion and secondary parameters.
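The evaluation count $C(K, n_\text{s})$ follows from summing the per-step counts, as this small check illustrates:

```python
def greedy_eval_count(K, n_s):
    # step l evaluates the objective once per remaining candidate: n_s - l + 1 evaluations
    return sum(n_s - l + 1 for l in range(1, K + 1))

assert greedy_eval_count(10, 100) == 10 * 100 - 10 * 9 // 2   # equals K*n_s - K*(K-1)/2
```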

In the following discussion, the rank of the operators H̃^i(𝒘) in 28 plays an important role. As mentioned before, the exact rank of these operators is bounded by the number of active sensors. Moreover, if the number of active sensors is large, the numerical rank is typically considerably smaller than the exact rank. We denote the number of active sensors in a given design by n_act = n_act(𝒘). Note that for a binary design vector 𝒘, n_act(𝒘) = ‖𝒘‖₁.

Cost of evaluating Φndtr\Phi_{{n_{\text{d}}}}^{\mathrm{tr}}

The most expensive step in evaluating 30 is solving the inner optimization problems for the MAP points m_MAP^{𝒚^i_𝒘}, i ∈ {1, …, n_d}. In the present work, the inner optimization problem is solved via an inexact Gauss–Newton-CG method with line search. The cost of each Gauss–Newton iteration is dominated by the CG solves for the search direction. When using the prior covariance operator as a preconditioner, the number of CG iterations is bounded by 𝒪(n_act); see, e.g., [22]. Each CG step in turn requires two linearized PDE solves (incremental forward/adjoint solves). Hence, the cost of the n_d MAP estimation problems, in terms of "forward-like" PDE solves, is 𝒪(2 × n_d × n_newton × n_act), where n_newton is the (average) number of Gauss–Newton iterations. The 2 × n_d × n_tr solves in 30c are also performed using preconditioned CG. As noted before, each of these solves requires 𝒪(n_act) CG iterations. To summarize, the overall cost of evaluating 30 is bounded by

𝒪(2 × n_d × n_newton × n_act)  [cost of the MAP point solves in 30b]  +  𝒪(2 × n_d × n_tr × n_act)  [cost of the solves in 30c].     (36)

Note that the MAP estimation problems, as well as the linear solves in 30c, can be performed in parallel.
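For illustration, the following sketch shows the generic randomized (Hutchinson-type) trace estimator underlying Φ_nd^tr. Here apply_A is a placeholder for the application of the relevant posterior-covariance-related operator; in the full problem each such application involves 𝒪(n_act) CG iterations, i.e., pairs of linearized PDE solves. The dense toy matrix is included only to verify the estimator.

```python
import numpy as np

def hutchinson_trace(apply_A, n, n_tr, rng):
    """Estimate tr(A) as (1/n_tr) * sum_k z_k^T A z_k with Rademacher probes z_k."""
    est = 0.0
    for _ in range(n_tr):
        z = rng.choice([-1.0, 1.0], size=n)    # Rademacher probe vector
        est += z @ apply_A(z)
    return est / n_tr

# toy verification against the exact trace of a dense SPD matrix
rng = np.random.default_rng(1)
M = rng.standard_normal((200, 200))
A = M @ M.T
print(hutchinson_trace(lambda z: A @ z, n=200, n_tr=50, rng=rng), np.trace(A))
```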

Cost of evaluating Φndeig\Phi_{{n_{\text{d}}}}^{\mathrm{eig}}

As is the case with 30, the solution of the MAP estimation problems 34b dominates the computational cost of evaluating Φ_nd^eig. This cost was analyzed above for Φ_nd^tr. We next discuss the cost of solving the eigenvalue problems in 34c. We rely on the Lanczos method [19] for these eigenvalue problems; another option is the use of randomized methods [21]. For each i ∈ {1, …, n_d}, the Lanczos method requires 𝒪(n_act) applications of H̃^i to vectors, each costing two PDE solves. Therefore, solving the eigenvalue problems requires a total of 𝒪(2 × n_d × n_act) PDE solves. Hence, the overall cost of computing Φ_nd^eig is bounded by

𝒪(2 × n_d × n_newton × n_act)  [cost of the MAP point solves in 34b]  +  𝒪(2 × n_d × n_act)  [cost of the solves in 34c].     (37)

To sum up, the computational cost of evaluating both Φ_nd^tr and Φ_nd^eig is dominated by the cost of solving the MAP estimation problems. However, by exploiting the low-rank structure of the operators H̃^i, Φ_nd^eig is more efficient to compute, as it does not require n_tr Hessian solves (compare the second terms in 36 and 37). Moreover, computing the traces using the eigenvalues is, in general, more accurate than a sampling-based approach, provided sufficiently many eigenvalues are used. Note also that when using a greedy approach, n_act starts at 1 in the first step and increases by one in each iteration.
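To illustrate the eigenvalue-based route, the sketch below computes the dominant eigenvalues of a matrix-free operator with a Lanczos-type solver (SciPy's eigsh) and forms the sum of λ/(1 + λ) over the retained eigenvalues; the weighted variant appearing in the paper's objective follows the same pattern. In the full problem each matrix-vector product with H̃^i costs two linearized PDE solves; the operator apply_H and the rank r below are illustrative placeholders.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def low_rank_contribution(apply_Htilde, n, r):
    """Sum of lambda/(1+lambda) over the r dominant eigenvalues of a matrix-free operator."""
    H_op = LinearOperator((n, n), matvec=lambda x: apply_Htilde(np.ravel(x)), dtype=float)
    lam = eigsh(H_op, k=r, which='LM', return_eigenvectors=False)
    lam = np.maximum(lam, 0.0)              # guard against tiny negative eigenvalues
    return float(np.sum(lam / (1.0 + lam)))

# toy operator with rapidly decaying spectrum (stands in for H-tilde^i)
rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.standard_normal((500, 20)))
d = np.geomspace(100.0, 0.01, 20)
apply_H = lambda x: U @ (d * (U.T @ x))
print(low_rank_contribution(apply_H, n=500, r=10))
```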

However, the idea of using a randomized trace estimator does have some merits. For one thing, a small n_tr (on the order of tens) is typically sufficient when solving the OED problem. Moreover, Φ_nd^tr is simpler to implement, as it does not require solving eigenvalue problems. Note, however, that if a large n_tr is needed, then the cost of the Hessian solves in 30c might exceed the cost of the MAP estimation problems.

6 Model problem

Here, we present the model problem used to study our approach for OED under uncertainty. We consider a nonlinear inverse problem governed by a linear elliptic PDE in a three-dimensional domain. This problem, which is adapted from [33], is motivated by heat transfer applications. In Section 6.1, we describe the Bayesian inverse problem under study. Subsequently, in Section 6.2, we specialize the OED objective Φ_nd^eig to this model problem.

Figure 1: Sketch of the physical domain Ω\Omega and location of candidate sensor locations (blue circles).

6.1 The Bayesian inverse problem

We consider the following model

−∇·(e^ξ ∇u) = 0   in Ω,
u = 0   on Γ_D,
e^ξ ∇u · 𝒏 = h   on Γ_N,
e^ξ ∇u · 𝒏 + e^m u = 0   on Γ_R.     (38)

In this problem, Ω ⊂ ℝ³ is a bounded domain with sufficiently smooth boundary Γ = Γ_D ∪ Γ_N ∪ Γ_R, where Γ_D, Γ_N, and Γ_R are mutually disjoint. In 38, u is the state variable, m is the inversion parameter, and ξ is the secondary uncertain parameter. The Neumann boundary data is h ∈ L²(Γ_N), 𝒏 is the unit-length outward normal on the boundaries Γ_N and Γ_R, and for simplicity we consider a zero volume source and homogeneous Dirichlet and Robin conditions. In our numerical experiments, Ω is a three-dimensional domain with a unit square base and a height such that the aspect ratio (base/height) is 100; see Figure 1. In the present example, Γ_R is the bottom face of the domain, Γ_N is the top face, and Γ_D is the union of the side faces. In our computations, we define the boundary source term h according to h(𝒙) = 1 + sin(4π √((x₁ − 1)² + (x₂ − 1)²)).

Note that in this problem, the inversion and secondary parameters are both functions. The inversion parameter belongs to the Hilbert space =L2(ΓR)\mathscr{M}=L^{2}(\Gamma_{\!\text{R}}), which is equipped with the L2(ΓR)L^{2}(\Gamma_{\!\text{R}})-inner product. The secondary parameter ξ\xi takes values in L2(Ω)L^{2}(\Omega). As discussed below, we define ξ\xi as an L2(Ω)L^{2}(\Omega)-valued Gaussian random variable with almost surely continuous realizations, which ensures (almost sure) well-posedness of the problem 38. Below we use ,Ω{\langle{\cdot},{\cdot}\rangle}_{{\Omega}} and ,ΓN{\langle{\cdot},{\cdot}\rangle}_{{\Gamma_{\!\text{N}}}} to denote the L2(Ω)L^{2}(\Omega)- and L2(ΓN)L^{2}(\Gamma_{\!\text{N}})-inner products, respectively. Also, with a slight abuse of notation, we denote the L2(Ω)3L^{2}(\Omega)^{3} inner-product with ,Ω{\langle{\cdot},{\cdot}\rangle}_{{\Omega}} as well.

We use measurements of u on the top boundary, Γ_N, to estimate the inversion parameter m. The measurements are collected at a set of sensor locations as depicted in Figure 1. Hence, in the present problem, the parameter-to-observable map is defined as the composition of a linear observation operator ℬ, which extracts the solution values at the sensor locations, with the PDE solution operator. Note that this parameter-to-observable map is a nonlinear function of the inversion and secondary parameters.

As detailed in Section 3, we assume a Gaussian noise model and use the BAE approach to account for the secondary uncertainties in the inverse problem. Also, we use a Gaussian prior law for m and model the secondary uncertainty as a Gaussian random variable. (As discussed earlier, the presented framework does not rely on a specific choice of distribution law for ξ; the Gaussian assumption here is merely for computational convenience.) For both m and ξ, we use covariance operators that are defined as negative powers of Laplacian-like operators [40]; see Section 7. Note that with an appropriate choice of the covariance operator, the realizations of ξ are almost surely continuous on the closure of Ω.

6.2 OED under uncertainty problem formulation

In this section we give the precise definition of Φ_nd^eig for the model inverse problem discussed in Section 6.1. The MAP estimation problems in 34b require minimizing 𝒥_𝒘(m; 𝒚^i) defined in 21. For notational convenience, in this section we use the shorthand m_i^⋆ to denote m_MAP^{𝒚^i_𝒘}. In the present example, the first-order optimality conditions for this optimization problem are given by

⟨e^ξ ∇u_i, ∇p̃⟩_Ω − ⟨h, p̃⟩_{Γ_N} + ⟨e^{m_i^⋆} u_i, p̃⟩_ℳ = 0,   ∀ p̃ ∈ 𝒱₀,     (39)
⟨e^ξ ∇p_i, ∇ũ⟩_Ω + ⟨e^{m_i^⋆} ũ, p_i⟩_ℳ + ⟨ℬ* Σ(𝒘)(ℬ u_i − 𝒚^i + ε₀), ũ⟩_Ω = 0,   ∀ ũ ∈ 𝒱₀,     (40)
⟨m_i^⋆ − m_pr, m̃⟩_ℰ + ⟨m̃ e^{m_i^⋆} u_i, p_i⟩_ℳ = 0,   ∀ m̃ ∈ ℰ,     (41)

where 𝒱₀ = {v ∈ H¹(Ω) : v|_{Γ_D} = 0}. For the derivation details, we refer to [33]. Note that 39 is the weak form of the state equation 38, and 40 and 41 are the weak forms of the adjoint and gradient equations, respectively. Note also that the left-hand side of 41 describes the action of the derivative of 𝒥_𝒘(m; 𝒚^i) in the direction m̃.

To specify the eigenvalue problem in 34c, we first discuss the action of the operator i\mathcal{H}^{i} (evaluated at mim_{i}^{\star}) in the direction m^\hat{m}. In weak form, this Hessian application satisfies,

⟨ℋ^i(m_i^⋆) m̂, m̃⟩_ℳ = ⟨m̃ e^{m_i^⋆} u_i, p̂_i⟩_ℳ,     (42)

for every m~\tilde{m}\in\mathscr{E}. In 42, for a given mim_{i}^{\star}, uiu_{i} solves the state problem (38), p^i\hat{p}_{i} solves the incremental adjoint problem

⟨e^ξ ∇p̂_i, ∇ũ⟩_Ω + ⟨e^{m_i^⋆} ũ, p̂_i⟩_ℳ + ⟨ℬ* Σ(𝒘) ℬ û_i, ũ⟩_Ω = 0,   ∀ ũ ∈ 𝒱₀,     (43)

and u^i\hat{u}_{i} solves the so-called incremental state problem

⟨e^ξ ∇û_i, ∇p̃⟩_Ω + ⟨e^{m_i^⋆} û_i, p̃⟩_ℳ + ⟨e^{m_i^⋆} m̂ u_i, p̃⟩_{Γ_R} = 0,   ∀ p̃ ∈ 𝒱₀.     (44)

We next summarize the OED problem of minimizing (34) as a PDE-constrained optimization problem specialized for the model inverse problem in Section 6:

min_{𝒘 ∈ {0,1}^{n_s}}  −(1/n_d) Σ_{i=1}^{n_d} Σ_{k=1}^{r}  λ_ik/(1 + λ_ik) ‖s_ik‖²_ℳ,     (45a)
where, for i ∈ {1, …, n_d} and k ∈ {1, …, r},
⟨e^ξ ∇u_i, ∇p̃⟩_Ω − ⟨h, p̃⟩_{Γ_N} + ⟨e^{m_i^⋆} u_i, p̃⟩_ℳ = 0,   ∀ p̃ ∈ 𝒱₀,     (45b)
⟨e^ξ ∇p_i, ∇ũ⟩_Ω + ⟨e^{m_i^⋆} ũ, p_i⟩_ℳ + ⟨ℬ* Σ(𝒘)(ℬ u_i − 𝒚^i + ε₀), ũ⟩_Ω = 0,   ∀ ũ ∈ 𝒱₀,     (45c)
⟨m_i^⋆ − m_pr, m̃⟩_ℰ + ⟨m̃ e^{m_i^⋆} u_i, p_i⟩_ℳ = 0,   ∀ m̃ ∈ ℰ,     (45d)
⟨e^ξ ∇p̂_ik, ∇ũ⟩_Ω + ⟨e^{m_i^⋆} ũ, p̂_ik⟩_ℳ + ⟨ℬ* Σ(𝒘) ℬ û_ik, ũ⟩_Ω = 0,   ∀ ũ ∈ 𝒱₀,     (45e)
⟨m̃ e^{m_i^⋆} u_i, p̂_ik⟩_ℳ = λ_ik ⟨s_ik, m̃⟩_ℰ,   ∀ m̃ ∈ ℰ,     (45f)
⟨e^ξ ∇û_ik, ∇p̃⟩_Ω + ⟨e^{m_i^⋆} û_ik, p̃⟩_ℳ + ⟨e^{m_i^⋆} s_ik u_i, p̃⟩_ℳ = 0,   ∀ p̃ ∈ 𝒱₀,     (45g)
⟨s_ik, s_ik⟩_ℰ = 1.     (45h)

The PDE constraints (45b)–(45d) are the optimality system (39)–(41) characterizing the MAP point m_i^⋆ = m_MAP^{𝒚^i_𝒘} described in (34b). The equations (45e)–(45g) are the PDE constraints that describe the Hessian application and the eigenvalue problem. Note that we have reformulated the eigenvalue problem according to 35.
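The following high-level sketch indicates how one evaluation of the objective in 45 is organized: for each training data sample, solve the inner MAP problem, apply the matrix-free Hessian-like operator via the incremental solves, compute its dominant eigenvalues, and accumulate the contributions. The callables solve_map and make_hessian_apply are placeholders for the PDE-based computations of Section 6.2, and the ‖s_ik‖² weights are omitted for brevity.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def phi_nd_eig(w, training_data, solve_map, make_hessian_apply, n_param, r):
    """One evaluation of the eigenvalue-based OED objective for design w (sketch)."""
    total = 0.0
    for y_i in training_data:                       # loop over the n_d training samples
        m_star = solve_map(w, y_i)                  # inner MAP estimation problem
        apply_H = make_hessian_apply(w, m_star)     # matrix-free Hessian-like operator
        H_op = LinearOperator((n_param, n_param),
                              matvec=lambda x: apply_H(np.ravel(x)), dtype=float)
        lam = eigsh(H_op, k=r, which='LM', return_eigenvectors=False)
        lam = np.maximum(lam, 0.0)
        total += np.sum(lam / (1.0 + lam))
    # the callables above must be supplied by the PDE solver layer;
    # the returned value is the quantity minimized over designs w
    return -total / len(training_data)
```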

7 Computational results

In this section, we numerically study our approach for OED under uncertainty, applied to the model inverse problem described in Section 6. We begin by specifying the Bayesian problem setup and discretization details in Section 7.1. Then, we study the impact of the secondary model uncertainty on the measurements and compute the BAE error statistics in Section 7.2. Finally, in Section 7.3, we examine the effectiveness of our approach in computing uncertainty aware designs. We also study the impact of ignoring model uncertainty in (i) the experimental design stage and (ii) both the experimental design and inference stages. Ignoring uncertainty in the design stage amounts to fixing the secondary parameter ξ at its nominal value (i.e., using the approximate model 10) and ignoring the approximation error when solving the OED problem. We refer to designs computed in this manner as uncertainty unaware designs. Note that in case (i), the uncertainty is still accounted for when solving the inverse problem. This study illustrates the importance of accounting for model uncertainty when computing experimental designs. The study of case (ii), on the other hand, illustrates the pitfalls of ignoring uncertainty in both the optimal design problem and the subsequent solution of the inverse problem.

7.1 Problem setup

We consider ns=100{n_{\text{s}}}=100 candidate sensor locations that are arranged in a regular grid on the top boundary ΓN\Gamma_{\!\text{N}}; see Figure 1. The additive noise in the synthetic measurements has a covariance matrix of the form 𝚪noise=σ2𝐈\mathbf{{\Gamma}}_{\mathup{noise}}=\sigma^{2}\mathbf{{I}}, with σ=103\sigma=10^{-3}. This amounts to about one percent noise.

We use a Gaussian prior law μ_pr = 𝒩(m_pr, 𝒞_pr) for m. The prior mean is taken to be the constant function m_pr ≡ 1, and the covariance operator is defined through a Laplacian-like operator. Specifically, we let 𝒞_pr := 𝒜⁻², with 𝒜[m] = −∇·(θ∇m) + αm, where we take θ = 0.1 and α = 1. To help mitigate undesirable boundary effects that can arise due to the use of PDE-based prior covariance operators, we equip the operator 𝒜 with Robin boundary conditions [16]. For illustration, four random draws from the prior distribution are shown in Figure 2.
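As a minimal illustration of this prior construction, the following one-dimensional finite-difference sketch draws samples from a Gaussian with covariance 𝒞 = 𝒜⁻², 𝒜 = −θ d²/dx² + α I. Mass-matrix weighting, the three-dimensional geometry, and the Robin boundary treatment of [16] are omitted; the grid size and boundary handling are illustrative choices.

```python
import numpy as np

n, theta, alpha = 200, 0.1, 1.0
h = 1.0 / (n - 1)
# second-difference matrix approximating -d^2/dx^2 on a uniform grid
lap = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
A = theta * lap + alpha * np.eye(n)            # Laplacian-like operator

m_pr = np.ones(n)                              # constant prior mean
rng = np.random.default_rng(3)
# since C = A^{-2} and A is symmetric, m = m_pr + A^{-1} w (w white noise) has covariance A^{-2}
samples = [m_pr + np.linalg.solve(A, rng.standard_normal(n)) for _ in range(4)]
```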

Figure 2: Samples of the primary uncertain parameter mm.

The law of the secondary parameter ξ is chosen to be a Gaussian measure μ_ξ = 𝒩(ξ̄, 𝒞_ξ). We set the mean of ξ to be the constant function ξ̄ ≡ 0, and the covariance operator is defined as 𝒞_ξ := ℒ⁻², where ℒ[ξ] = −∇·(Θ∇ξ) + γξ with Θ = 0.25 diag(1, 1, 1/100) and γ = 50. This choice of Θ corresponds to a random field with a much shorter correlation length in the z-direction than in the x- and y-directions, in line with the aspect ratio of 100 used in defining the domain Ω. We show four representative samples of ξ in Figure 3.

Figure 3: Samples of the secondary uncertain parameter ξ\xi.

The forward problem 38 is solved using a continuous Galerkin finite element method. We use 9,6009{,}600 tetrahedral elements and 2,2052{,}205 piecewise linear basis functions. As such, the discretized state and adjoint variables, as well as the secondary parameter ξ\xi, have dimension 2,2052{,}205. On the other hand, the discretized inversion parameter mm, which is defined on the bottom boundary, is of dimension 441441.

7.2 Incorporating the model uncertainty in the inverse problem

We begin by studying the impact of the secondary parameter on the solution of the forward problem. This is illustrated in Figure 4. In this experiment, we solve the forward problem for a fixed mm and four different realizations of ξ\xi. Note that, although the qualitative behavior of uu on ΓN\Gamma_{\!\text{N}} is similar for the different samples of ξ\xi, there are considerable differences in the values. This indicates that the approximation error due to fixing ξ\xi at the nominal value will have significant variations.

Figure 4: The effect of the secondary uncertain parameter ξ\xi on the state uu on top surface for the (fixed) true primary parameter mm and four different realizations of ξ\xi.

We compute the sample mean and covariance matrix of the approximation error as in 27 with n_mc = 1,000. The mean and marginal standard deviations are shown in Figure 5. To illustrate the correlation structure of the approximation error, we show the correlation of the approximation error at two of the candidate locations with the remaining candidate locations in Figure 6 (left and middle). To provide an overall picture, we report the correlation matrix of the approximation error in Figure 6 (right). Note that the approximation errors at the sensor sites are not only larger than the measurement noise, but are also highly correlated and have a nonzero, non-constant mean. Thus, in the present application, the approximation error due to model uncertainty cannot be ignored and must be accounted for.
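Schematically, the BAE statistics are obtained as in the following sketch; forward_accurate and forward_approx are placeholders for the parameter-to-observable maps with ξ varying and with ξ fixed at its nominal value, respectively, and the toy linear maps are included only so the snippet runs end to end.

```python
import numpy as np

def bae_statistics(forward_accurate, forward_approx, sample_prior, n_mc):
    """Sample mean and covariance of the approximation error: accurate minus approximate data."""
    errors = []
    for _ in range(n_mc):
        m, xi = sample_prior()                       # draw (m, xi) from their priors
        errors.append(forward_accurate(m, xi) - forward_approx(m))
    E = np.vstack(errors)                            # shape: n_mc x (number of candidate sensors)
    return E.mean(axis=0), np.cov(E, rowvar=False)   # eps_mean, Gamma_eps

# toy stand-ins: a linear map whose output is perturbed by a scalar "xi"
rng = np.random.default_rng(4)
B = rng.standard_normal((100, 50))
fwd_accurate = lambda m, xi: (1.0 + 0.1 * xi) * (B @ m)
fwd_approx = lambda m: B @ m                         # xi frozen at its nominal value
draw = lambda: (rng.standard_normal(50), rng.standard_normal())
eps_mean, Gamma_eps = bae_statistics(fwd_accurate, fwd_approx, draw, n_mc=1000)
# the BAE likelihood then uses the mean-shifted data together with the combined
# covariance Gamma_noise + Gamma_eps, as in the total-error model of the paper
```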

Figure 5: Left: the mean of the approximation error at the candidate sensor locations; right: the marginal standard deviation of the approximation error at these locations (i.e., the square root of diagonal entries of 𝚪^ε\widehat{\mathbf{{\Gamma}}}_{\mathup{\varepsilon}} given in 27).
Figure 6: Left: The correlation of a measurement from a sensor near the middle marked by red dot, with the remaining sensors; middle: The same information for the sensor in the top left corner of the sensor grid; right: the correlation matrix of the approximation error.

7.3 Optimal experimental design under uncertainty

We begin by solving the OED problem 45 with nd=5{n_{\text{d}}}=5 training data samples. In Figure 7 (left), we show an uncertainty aware optimal sensor placement with 20 sensors. Note that due to the use of a greedy algorithm, we can track the order in which the sensors are picked. To understand the impact of ignoring model uncertainty in the design stage, we also compute an uncertainty unaware design; this is reported in Figure 7 (right).

Figure 7: The uncertainty aware (left) and uncertainty unaware (right) sensor placements. The numbers indicate the order in which the sensors are picked by the greedy algorithm for minimizing 34a.

To evaluate the quality of the computed designs, we first study the expected posterior variance and expected relative error of the MAP point. To facilitate this, we draw parameter samples {miv}i=1nv\{m^{\text{v}}_{i}\}_{i=1}^{n_{\text{v}}} from μpr\mu_{\mathup{pr}} and generate validation data samples,

𝐝^i_𝒘 = 𝐏_𝒘 (𝒢(m_i^v, ξ_i^v) + η_i^v),   i ∈ {1, …, n_v},

where {ξ_i^v}_{i=1}^{n_v} and {η_i^v}_{i=1}^{n_v} are draws from μ_ξ and 𝒩(0, Γ_noise), respectively. Then, we consider

V̄(𝒘) = (1/n_v) Σ_{i=1}^{n_v} tr(𝒞_post^{𝐝^i_𝒘})   and   Ē_MAP(𝒘) = (1/n_v) Σ_{i=1}^{n_v} ‖m_MAP^{𝐝^i_𝒘} − m_i^v‖_ℳ / ‖m_i^v‖_ℳ.     (46)

In our numerical experiments we use n_v = 100. In Figure 8, we compare V̄(𝒘) and Ē_MAP(𝒘) obtained when solving the inverse problem with the computed optimal designs versus randomly chosen designs, for designs with different numbers of sensors. We also examine the impact of ignoring model uncertainty when solving the OED problem (see the black dots in Figure 8). Note that the uncertainty aware designs outperform both the random designs and the uncertainty unaware designs. This is most pronounced when the number of sensors is small, which is precisely when optimal placement of sensors is crucial. We also observe that as the number of sensors in the designs increases, the cloud of points moves left and downward. This is expected: the more sensors we use, the more we can improve the quality of the MAP point and reduce posterior uncertainty. For designs with 30 or more sensors, the difference between uncertainty aware and uncertainty unaware designs becomes small (with 30 sensors, they nearly overlap), and with 40 or more sensors even the random designs become competitive. All of this is expected: as mentioned earlier, optimal placement of sensors is crucial when we have access to only a "small" number of measurement points, and what constitutes "small" is problem dependent.
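For completeness, the following sketch shows how the validation metrics in 46 can be assembled; solve_map and posterior_trace are placeholders for the BAE-aware MAP solves and (low-rank) posterior trace computations, and the toy usage at the end only demonstrates the calling pattern.

```python
import numpy as np

def design_metrics(w, data_sets, m_true_samples, solve_map, posterior_trace):
    """Expected posterior variance V-bar and expected relative MAP error E-bar, cf. (46)."""
    V_bar = np.mean([posterior_trace(w, d) for d in data_sets])
    E_bar = np.mean([np.linalg.norm(solve_map(w, d) - m) / np.linalg.norm(m)
                     for d, m in zip(data_sets, m_true_samples)])
    return V_bar, E_bar

# trivial placeholders, only to show the calling pattern
toy_data = [np.zeros(3) for _ in range(5)]
toy_truth = [np.ones(3) for _ in range(5)]
print(design_metrics(np.ones(10), toy_data, toy_truth,
                     solve_map=lambda w, d: 0.9 * np.ones(3),
                     posterior_trace=lambda w, d: 1.0))
```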

Figure 8: The expected relative error Ē_MAP(𝒘) of the MAP point versus the expected posterior variance V̄(𝒘) using random designs (red dots), uncertainty aware sensor placements (blue dots), and uncertainty unaware sensor placements (black dots) with 10, 15, and 20 sensors.

Next, we illustrate the effectiveness of the computed optimal designs in reducing posterior uncertainty. To this end, we consider the computed optimal designs with 10 sensors. In Figure 9, we show the effectiveness of the uncertainty aware optimal design in reducing posterior uncertainty; we also report the posterior standard deviation field, when solving the inverse problem with an uncertainty unaware design. To complement this study, we consider the quality of the MAP points computed using uncertainty aware and uncertainty unaware designs in Figure 10. Overall, we observe that the uncertainty aware design is more effective in reducing posterior uncertainty and results in a higher quality MAP point. This conclusion is also supported by the results reported in Figure 8.

Note that the data used to solve the inverse problem are synthesized by solving the PDE model 1 with our choice of the "truth" inversion parameter (see Figure 10 (left)) and a randomly chosen ξ, followed by extracting measurements at the sensor sites and adding measurement noise. This simulates the practical situation in which field data corresponding to an unknown ξ are collected.

Figure 9: The effectiveness of the computed optimal design with 10 sensors on posterior uncertainty. We report the pointwise prior standard deviation of mm (left), and the pointwise posterior standard deviation of mm using the uncertainty aware (middle) and uncertainty unaware (right) optimal designs.
Figure 10: The quality of the MAP points computed using optimal designs with 10 sensors. We show the true parameter field (left), the MAP point computed using the uncertainty aware optimal design (middle) and using the uncertainty unaware design (right).

In the above experiments, when examining the performance of uncertainty unaware designs, model uncertainty was still accounted for (following the BAE framework) when solving the inverse problem with these designs. Next, we examine the impact of ignoring model uncertainty in both the OED and inference stages. For this experiment, we use the same synthesized data as that used to obtain the results in Figure 10 (right). In Figure 11 we report the result of solving the inverse problem with an uncertainty unaware design, while ignoring model uncertainty in the inverse problem as well. We observe a dramatic reduction of uncertainty; see Figure 11 (left), which uses the same scale as the standard deviation plots in Figure 9. On the other hand, the MAP point computed in this case is of very poor quality; see Figure 11 (right), where we have used the same scale as the MAP point plots in Figure 10. Intuitively, this indicates that ignoring model uncertainty in both the OED and inference stages, in the presence of significant modeling uncertainties, can lead to an unfortunate situation in which one is highly certain (i.e., low posterior variance) about a badly erroneous parameter estimate.

Figure 11: The posterior standard deviation field (left) and MAP point (right) when solving the uncertainty unaware inverse problem using an uncertainty unaware design.
Figure 12: The impact of the number of training data samples nd{n_{\text{d}}} on the quality of the computed optimal design with 10 sensors.

As mentioned earlier, we used n_d = 5 training data samples when solving the OED problem 45. Clearly, increasing n_d would improve the quality of the design, since we would be optimizing a more accurate estimate of the OED objective; however, this comes with increased computational cost. In practice, a small n_d is typically sufficient to obtain good quality designs, as demonstrated in the results above (e.g., Figure 8). To examine the impact of n_d on the quality of the computed optimal design, we perform a brief computational study focusing on designs with 10 sensors. We compute the expected posterior variance (as in 46) at the computed optimal design, 𝒘_opt, for n_d ∈ {3, 5, 10, 20, 30}. The results are reported in Figure 12. As before, V̄(𝒘_opt) is computed with n_v = 100 validation data samples. Note that going beyond n_d = 10 yields diminishing returns while entailing increased computational cost. In practice, a study such as the one conducted here may be necessary to estimate a reasonable choice of n_d. In the present example, n_d = 5 lies approximately at the "elbow" of the curve in Figure 12, making it a reasonable practical choice.

8 Conclusions

In the present work, we addressed OED under uncertainty for Bayesian nonlinear inverse problems governed by PDEs with infinite-dimensional inversion and secondary parameters. We presented a mathematical framework and scalable computational methods for computing uncertainty aware optimal designs. Our results demonstrate that ignoring the uncertainty in the OED and/or the parameter inversion stages can lead to inferior designs and inaccurate parameter estimates. Hence, it is important to account for modeling uncertainties in both Bayesian inversion and OED.

The main limitations of the proposed approach lie in its reliance on Gaussian approximations for the posterior and for the approximation error. The former is common in large-scale Bayesian inverse problems as well as in the BAE literature. The Gaussian approximation to the posterior is suitable if a linearization of the forward model at the MAP point is sufficiently accurate for the set of parameters with significant posterior probability. On the other hand, the Gaussian approximation of the approximation error, which is guided by the BAE approach, might fail to adequately capture the distribution of the approximation error. However, as shown in various studies in the BAE literature, this Gaussian approximation is reasonable in broad classes of inverse problems; see, e.g., [27, 24, 32, 33]. Additionally, in the present work we considered a greedy approach for tackling the binary OED optimization problem. This can become expensive if the number of candidate sensor locations or the desired number of sensors in the computed placement becomes very large.

The numerical experiments in the present work focus on an academic, albeit application-driven, model problem. The present application was chosen since it exhibits key problem structures seen in broad classes of ill-posed inverse problems. Namely, smoothing properties of the forward operator lead to low-rank structures that can be exploited when developing numerical methods. The present numerical study also builds on and complements the previous study of BAE for the Robin boundary condition inversion problems in [33]. Examining the properties of the present method in more challenging inverse problems with multiple types of modeling uncertainties is a subject for future work.

The discussions in this article point to a number of opportunities for future work. In the first place, we point out that using the BAE approach to account for modeling uncertainties in Bayesian inverse problems is only one possible application of this approach. In general, BAE has been used in a wide range of applications to account for uncertainty due to the use of approximate forward models. For example, BAE can be used to model the approximation errors due to the use of reduced order models or upscaled models. BAE has also been used to account for the errors due to the use of a mean-field model instead of an underlying high-fidelity stochastic model [38]. Thus, the framework presented for OED under uncertainty in the present work can be extended to OED for inverse problems in which the forward model is replaced by a computationally cheaper low-fidelity approximation, as long as the approximation error can be modeled adequately by a Gaussian.

Another interesting line of inquiry involves replacing the greedy strategy used to find optimal designs with more powerful optimization algorithms. One possibility is to follow a relaxation approach [31, 3, 4, 8] in which the design weights are allowed to take values in the interval [0,1]. This enables the use of efficient gradient-based optimization algorithms and can be combined with a suitable penalty approach to control the sparsity of the computed sensor placements. An attractive alternative is the approach in [9], which tackles the binary OED optimization problem by replacing it with a related stochastic programming problem. A further line of inquiry is investigating the idea of using fixed MAP points, computed prior to solving the OED problem, as done in [44] for OED problems with no additional uncertainties. This idea, at the expense of further approximations, replaces the bilevel optimization problem with a simpler one, hence reducing the computational cost significantly.

Finally, in inverse problems governed by complex models with multiple sources of secondary uncertainty, an a priori sensitivity analysis of the Bayesian inverse problem may be necessary to identify secondary modeling uncertainties that are most influential to the solution of the inverse problem. This may be accomplished by suitable adaptations of hyper-differential sensitivity analysis methods for inverse problems [42, 36, 41].

Acknowledgments

The work of N. Petra was supported in part by US National Science Foundation grant DMS #1723211. The work of A. Alexanderian was supported in part by US National Science Foundation grants DMS #1745654 and #2111044.

References


  • [1] A. Alexanderian. Optimal experimental design for infinite-dimensional Bayesian inverse problems governed by PDEs: A review. Inverse Problems, 37(4), 2021.
  • [2] A. Alexanderian, P. J. Gloor, and O. Ghattas. On Bayesian A-and D-optimal experimental designs in infinite dimensions. Bayesian Anal., 11(3):671–695, 2016.
  • [3] A. Alexanderian, N. Petra, G. Stadler, and O. Ghattas. A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized 0\ell_{0}-sparsification. SIAM J. Sci. Comput., 36(5):A2122–A2148, 2014.
  • [4] A. Alexanderian, N. Petra, G. Stadler, and O. Ghattas. A fast and scalable method for A-optimal design of experiments for infinite-dimensional Bayesian nonlinear inverse problems. SIAM J. Sci. Comput., 38(1):A243–A272, 2016.
  • [5] A. Alexanderian, N. Petra, G. Stadler, and I. Sunseri. Optimal design of large-scale Bayesian linear inverse problems under reducible model uncertainty: good to know what you don’t know. SIAM/ASA J. Uncertain. Quantif., 9(1):163–184, 2021.
  • [6] A. Y. Aravkin and T. Van Leeuwen. Estimating nuisance parameters in inverse problems. Inverse Problems, 28(11):115016, 2012.
  • [7] A. C. Atkinson and A. N. Donev. Optimum Experimental Designs. Oxford, 1992.
  • [8] A. Attia and E. Constantinescu. Optimal experimental design for inverse problems in the presence of observation correlations. SIAM J. Sci. Comput., 44(4):A2808–A2842, 2022.
  • [9] A. Attia, S. Leyffer, and T. S. Munson. Stochastic learning approach for binary optimization: Application to Bayesian optimal design of experiments. SIAM J. Sci. Comput., 44(2):B395–B427, 2022.
  • [10] O. Babaniyi, R. Nicholson, U. Villa, and N. Petra. Inferring the basal sliding coefficient field for the Stokes ice sheet model under rheological uncertainty. The Cryosphere, 15(4):1731–1750, 2021.
  • [11] A. Bartuska, L. Espath, and R. Tempone. Small-noise approximation for Bayesian optimal experimental design with nuisance uncertainty. Comput. Methods Appl. Mech. Engrg., 399:115320, 2022.
  • [12] T. Bui-Thanh, O. Ghattas, J. Martin, and G. Stadler. A computational framework for infinite-dimensional Bayesian inverse problems. Part I: The linearized case, with application to global seismic inversion. SIAM J. Sci. Comput., 35(6):A2494–A2523, 2013.
  • [13] K. Chaloner and I. Verdinelli. Bayesian experimental design: A review. Statist. Sci., 10(3):273–304, 1995.
  • [14] E. M. Constantinescu, N. Petra, J. Bessac, and C. G. Petra. Statistical treatment of inverse problems constrained by differential equations-based models with stochastic terms. SIAM/ASA J. Uncertain. Quantif., 8(1):170–197, 2020.
  • [15] G. Da Prato. An introduction to infinite-dimensional analysis. Springer, 2006.
  • [16] Y. Daon and G. Stadler. Mitigating the influence of boundary conditions on covariance operators derived from elliptic PDEs. Inverse Probl. Imaging, 12(5):1083–1102, 2018.
  • [17] M. Dashti and A. M. Stuart. The Bayesian approach to inverse problems. In R. Ghanem, D. Higdon, and H. Owhadi, editors, Handbook of Uncertainty Quantification, pages 311–428. Spinger, 2017.
  • [18] C. Feng and Y. M. Marzouk. A layered multiple importance sampling scheme for focused optimal Bayesian experimental design. preprint, 2019. https://arxiv.org/abs/1903.11187.
  • [19] G. H. Golub and C. F. Van Loan. Matrix computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, fourth edition, 2013.
  • [20] E. Haber, L. Horesh, and L. Tenorio. Numerical methods for experimental design of large-scale linear ill-posed inverse problems. Inverse Problems, 24(055012):125–137, 2008.
  • [21] N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217–288, 2011.
  • [22] T. Isaac, N. Petra, G. Stadler, and O. Ghattas. Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet. J. Comput. Phys., 296:348–368, 2015.
  • [23] J. Jagalur-Mohan and Y. M. Marzouk. Batch greedy maximization of non-submodular functions: Guarantees and applications to experimental design. J. Mach. Learn. Res., 22, 2021.
  • [24] J. Kaipio and V. Kolehmainen. Approximate marginalization over modeling errors and uncertainties in inverse problems. Bayesian Theory and Applications, pages 644–672, 2013.
  • [25] J. Kaipio and E. Somersalo. Statistical and Computational Inverse Problems, volume 160 of Applied Mathematical Sciences. Springer-Verlag, New York, 2005.
  • [26] J. Kaipio and E. Somersalo. Statistical inverse problems: discretization, model reduction and inverse crimes. Journal of computational and applied mathematics, 198(2):493–504, 2007.
  • [27] V. Kolehmainen, T. Tarvainen, S. R. Arridge, and J. P. Kaipio. Marginalization of uninteresting distributed parameters in inverse problems-application to diffuse optical tomography. Int. J. Uncertain. Quantif., 1(1), 2011.
  • [28] K. Koval, A. Alexanderian, and G. Stadler. Optimal experimental design under irreducible uncertainty for linear inverse problems governed by PDEs. Inverse Problems, 36(7), 2020.
  • [29] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. J. Mach. Learn. Res., 9:235–284, 2008.
  • [30] F. Li. A combinatorial approach to goal-oriented optimal Bayesian experimental design. Master’s thesis, Massachusetts Institute of Technology, 2019.
  • [31] S. Liu, S. P. Chepuri, M. Fardad, E. Maşazade, G. Leus, and P. K. Varshney. Sensor selection for estimation with correlated measurement noise. IEEE Trans. Signal Process., 64(13):3509–3522, 2016.
  • [32] M. Mozumder, T. Tarvainen, S. Arridge, J. P. Kaipio, C. D’Andrea, and V. Kolehmainen. Approximate marginalization of absorption and scattering in fluorescence diffuse optical tomography. Inverse Problems & Imaging, 10(1):227, 2016.
  • [33] R. Nicholson, N. Petra, and J. P. Kaipio. Estimation of the Robin coefficient field in a Poisson problem with uncertain conductivity field. Inverse Problems, 34(11):115005, 2018.
  • [34] R. Nicholson, N. Petra, U. Villa, and J. P. Kaipio. On global normal linear approximations for nonlinear Bayesian inverse problems. Inverse Problems, 39(5):054001, 2023.
  • [35] F. J. Pinski, G. Simpson, A. M. Stuart, and H. Weber. Kullback–Leibler approximation for probability measures on infinite dimensional spaces. SIAM J. Math. Anal., 47(6):4091–4122, 2015.
  • [36] W. Reese, J. Hart, B. van Bloemen Waanders, M. Perego, J. Jakeman, and A. Saibaba. Hyper-differential sensitivity analysis in the context of Bayesian inference applied to ice-sheet problems. International Journal for Uncertainty Quantification, 14(3).
  • [37] G. Shulkind, L. Horesh, and H. Avron. Experimental design for nonparametric correction of misspecified dynamical models. SIAM/ASA J. Uncertain. Quantif., 6(2):880–906, 2018.
  • [38] M. J. Simpson, R. E. Baker, P. R. Buenzli, R. Nicholson, and O. J. Maclaren. Reliable and efficient parameter estimation using approximate continuum limit descriptions of stochastic models. J. Theoret. Biol., 549, 2022.
  • [39] R. C. Smith. Uncertainty quantification: Theory, implementation, and applications, volume 12 of Computational Science and Engineering Series. SIAM, 2013.
  • [40] A. M. Stuart. Inverse problems: A Bayesian perspective. Acta Numerica, 19:451–559, 2010.
  • [41] I. Sunseri, A. Alexanderian, J. Hart, and B. van Bloemen Waanders. Hyper-differential sensitivity analysis for nonlinear Bayesian inverse problems. International Journal for Uncertainty Quantification, 14(2), 2024.
  • [42] I. Sunseri, J. Hart, B. van Bloemen Waanders, and A. Alexanderian. Hyper-differential sensitivity analysis for inverse problems constrained by partial differential equations. Inverse Problems, 2020.
  • [43] D. Uciński. Optimal measurement methods for distributed parameter system identification. CRC Press, Boca Raton, 2005.
  • [44] K. Wu, P. Chen, and O. Ghattas. A fast and scalable computational framework for large-scale high-dimensional Bayesian optimal experimental design. SIAM/ASA Journal on Uncertainty Quantification, 11(1):235–261, 2023.