
Nonlinear reduced models for state and parameter estimation

Albert Cohen, Wolfgang Dahmen, Olga Mula and James Nichols

This research was supported by the ERC Adv grant BREAD; the Emergence Project of the Paris City Council “Models and Measures”; the NSF Grants DMS ID 1720297 and DMS ID 2012469; and by the SmartState and Williams-Hedberg Foundation.
Abstract

State estimation aims at approximately reconstructing the solution u to a parametrized partial differential equation from m linear measurements, when the parameter vector y is unknown. Fast numerical recovery methods have been proposed in [19] based on reduced models, which are linear spaces of moderate dimension n tailored to approximate the solution manifold {\cal M} where the solution sits. These methods can be viewed as deterministic counterparts to Bayesian estimation approaches, and are proved to be optimal when the prior is expressed by approximability of the solution with respect to the reduced model [2]. However, they are inherently limited by their linear nature, which bounds their best possible performance from below by the Kolmogorov width d_{m}({\cal M}) of the solution manifold. In this paper we propose to break this barrier by using simple nonlinear reduced models that consist of a finite union of linear spaces V_{k}, each having dimension at most m and leading to different estimators u_{k}^{*}. A model selection mechanism based on minimizing the PDE residual over the parameter space is used to select from this collection the final estimator u^{*}. Our analysis shows that u^{*} meets optimal recovery benchmarks that are inherent to the solution manifold and not tied to its Kolmogorov width. The residual minimization procedure is computationally simple in the relevant case of affine parameter dependence in the PDE. In addition, it results in an estimator y^{*} for the unknown parameter vector. In this setting, we also discuss an alternating minimization (coordinate descent) algorithm for joint state and parameter estimation, which potentially improves the quality of both estimators.

1 Introduction

1.1 Parametrized PDEs and inverse problems

Parametrized partial differential equations are commonly used to model complex physical systems. Such equations can generally be written in abstract form as

𝒫(u,y)=0,{\cal P}(u,y)=0, (1.1)

where y=(y_{1},\dots,y_{d}) is a vector of scalar parameters ranging in some domain Y\subset\mathbb{R}^{d}. We assume well-posedness, that is, for any y\in Y the problem admits a unique solution u=u(y) in some Hilbert space V. We may therefore consider the parameter to solution map

yu(y),y\mapsto u(y), (1.2)

from Y to V, which is typically nonlinear, as well as the solution manifold

{\cal M}:=\{u(y)\,:\,y\in Y\}\subset V (1.3)

that describes the collection of all admissible solutions. Throughout this paper, we assume that Y is compact in \mathbb{R}^{d} and that the map (1.2) is continuous. Therefore {\cal M} is a compact set of V. We sometimes refer to the solution u(y) as the state of the system for the given parameter vector y.

The parameters are used to represent physical quantities such as diffusivity, viscosity, velocity, source terms, or the geometry of the physical domain in which the PDE is posed. In several relevant instances, y may be high or even countably infinite dimensional, that is, d\gg 1 or d=\infty.

In this paper, we are interested in inverse problems which occur when only a vector of linear measurements

z=(z_{1},\dots,z_{m})\in\mathbb{R}^{m},\quad z_{i}=\ell_{i}(u),\quad i=1,\dots,m, (1.4)

is observed, where each \ell_{i}\in V^{\prime} is a known continuous linear functional on V. We also sometimes use the notation

z=\ell(u),\quad\ell=(\ell_{1},\dots,\ell_{m}). (1.5)

One wishes to recover from z the unknown state u\in{\cal M} or even the underlying parameter vector y\in Y for which u=u(y). Therefore, in an idealized setting, one partially observes the result of the composition map

y\in Y\mapsto u\in{\cal M}\mapsto z\in\mathbb{R}^{m} (1.6)

for the unknown y. More realistically, the measurements may be affected by additive noise

z_{i}=\ell_{i}(u)+\eta_{i}, (1.7)

and the model itself might be biased, meaning that the true state u deviates from the solution manifold {\cal M} by some amount. Thus, two types of inverse problems may be considered:

  1. (i)

    State estimation: recover an approximation u^{*} of the state u from the observation z=\ell(u). This is a linear inverse problem, in which the prior information on u is given by the manifold {\cal M}, which has a complex geometry and is not explicitly known.

  2. (ii)

    Parameter estimation: recover an approximation y^{*} of the parameter y from the observation z=\ell(u) when u=u(y). This is a nonlinear inverse problem, for which the prior information available on y is given by the domain Y.

These problems become severely ill-posed when Y has dimension d>m. For this reason, they are often addressed through Bayesian approaches [17, 25]: a prior probability distribution P_{y} being assumed on y\in Y (thus inducing a push-forward distribution P_{u} for u\in{\cal M}), the objective is to understand the posterior distributions of y or u conditioned on the observations z in order to compute plausible solutions y^{*} or u^{*} under such probabilistic priors. The accuracy of these solutions should therefore be assessed in some average sense.

In this paper, we do not follow this avenue: the only priors made on y and u are their membership to Y and {\cal M}. We are interested in developing practical estimation methods that offer uniform recovery guarantees under such deterministic priors, in the form of upper bounds on the worst case error for the estimators over all y\in Y or u\in{\cal M}. We also aim to understand whether our error bounds are optimal in some sense. Our primary focus will actually be on state estimation (i). Nevertheless we present in §4 several implications on parameter estimation (ii), which to our knowledge are new. For state estimation, error bounds have recently been established for a class of methods based on linear reduced modeling, as we recall next.

1.2 Reduced models: the PBDW method

In several relevant instances, the particular parametrized PDE structure allows one to construct linear spaces V_{n} of moderate dimension n that are specifically tailored to the approximation of the solution manifold {\cal M}, in the sense that

{\rm dist}({\cal M},V_{n})=\max_{u\in{\cal M}}\min_{v\in V_{n}}\|u-v\|\leq\varepsilon_{n}, (1.8)

where \varepsilon_{n} is a certified bound that decays with n significantly faster than when using for V_{n} classical approximation spaces of dimension n such as finite elements, algebraic or trigonometric polynomials, or spline functions. Throughout this paper

\langle\cdot,\cdot\rangle_{V}^{1/2}=:\|\cdot\|=\|\cdot\|_{V}, (1.9)

denotes the norm of the Hilbert space V. The natural benchmark for such approximation spaces is the Kolmogorov n-width

d_{n}({\cal M}):=\min_{\dim(E)=n}{\rm dist}({\cal M},E). (1.10)

The space E_{n} that achieves the above minimum is thus the best possible reduced model for approximating all of {\cal M}; however, it is computationally out of reach.

One instance of computational reduced model spaces is generated by sparse polynomial approximations of the form

u_{n}(y)=\sum_{\nu\in\Lambda_{n}}u_{\nu}y^{\nu},\quad y^{\nu}:=\prod_{j\geq 1}y_{j}^{\nu_{j}}, (1.11)

where \Lambda_{n} is a conveniently chosen set of multi-indices such that \#(\Lambda_{n})=n. Such approximations can be derived, for example, by best n-term truncations of infinite Taylor or orthogonal polynomial expansions. We refer to [11, 12] where convergence estimates of the form

\sup_{y\in Y}\|u(y)-u_{n}(y)\|\leq Cn^{-s}, (1.12)

are established for some s>0 even when d=\infty. Therefore, the space V_{n}:={\rm span}\{u_{\nu}\,:\,\nu\in\Lambda_{n}\} approximates the solution manifold with accuracy \varepsilon_{n}=Cn^{-s}.

Another instance, known as reduced basis approximation, consists of using spaces of the form

V_{n}:={\rm span}\{u_{1},\dots,u_{n}\}, (1.13)

where u_{i}=u(y^{i})\in{\cal M} are instances of solutions corresponding to a particular selection of parameter values y^{i}\in Y (see [20, 23, 24]). One typical selection procedure is based on a greedy algorithm: one picks y^{k} such that u_{k}=u(y^{k}) is furthest away from the previously constructed space V_{k-1}, in the sense of maximizing a computable and tight a-posteriori bound of the projection error \|u(y)-P_{V_{k-1}}u(y)\| over a sufficiently fine discrete training set \tilde{Y}\subset Y. In turn, this method also delivers a computable upper estimate \varepsilon_{k} for {\rm dist}({\cal M},V_{k}). It was proved in [1, 16] that the reduced basis spaces resulting from this greedy algorithm have near-optimal approximation properties, in the sense that if d_{n}({\cal M}) has a certain polynomial or exponential rate of decay as n\to\infty, then the same rate is achieved by {\rm dist}({\cal M},V_{n}).
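To fix ideas, the following sketch illustrates such a greedy selection on a discrete training set. It is a simplification of the procedure described above: the snapshots are stored as columns of a matrix, the Euclidean norm stands in for the V-norm, and the exact projection error replaces the residual-based a-posteriori bound; all names are illustrative.

```python
import numpy as np

def greedy_reduced_basis(snapshots, n_max, tol=1e-8):
    """Greedy reduced basis selection over a discrete training set.

    snapshots : (N, M) array whose columns are training solutions u(y^i),
                expressed in coordinates where the Euclidean norm stands in
                for the V-norm (an assumption of this sketch).
    Returns an orthonormal basis of V_n and the history of worst-case errors.
    """
    basis = np.zeros((snapshots.shape[0], 0))
    history = []
    for _ in range(n_max):
        # projection error of every training snapshot onto the current space
        residual = snapshots - basis @ (basis.T @ snapshots)
        errors = np.linalg.norm(residual, axis=0)
        k = int(np.argmax(errors))
        history.append(errors[k])
        if errors[k] <= tol:
            break
        # add the worst-approximated snapshot after Gram-Schmidt orthonormalization
        new_vec = residual[:, k] / errors[k]
        basis = np.hstack([basis, new_vec[:, None]])
    return basis, history
```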

In both cases, these reduced models come in the form of a hierarchy (V_{n})_{n\geq 1}, with computable decreasing error bounds (\varepsilon_{n})_{n\geq 1}, where n corresponds to the level of truncation in the first case and the step of the greedy algorithm in the second case. Given a reduced model V_{n}, one way of tackling the state estimation problem is to replace the complex solution manifold {\cal M} by the simpler prior class described by the cylinder

{\cal K}={\cal K}(V_{n},\varepsilon_{n})=\{v\in V\;:\;\mathop{\rm dist}(v,V_{n})\leq\varepsilon_{n}\} (1.14)

that contains {\cal M}. The set {\cal K} therefore reflects the approximability of {\cal M} by V_{n}. This point of view leads to the Parametrized Background Data Weak (PBDW) method introduced in [19], also termed the one-space method and further analyzed in [2], which we recall below in a nutshell.

In the noiseless case, the knowledge of z=(zi)i=1,,mz=(z_{i})_{i=1,\dots,m} is equivalent to that of the orthogonal projection w=PWuw=P_{W}u, where

W:=span{ω1,,ωm}W:={\rm span}\{\omega_{1},\dots,\omega_{m}\} (1.15)

and ωiV\omega_{i}\in V are the Riesz representers of the linear functionals i\ell_{i}, that is

i(v)=ωi,v,vV.\ell_{i}(v)=\langle\omega_{i},v\rangle,\quad v\in V. (1.16)

Thus, the data indicates that uu belongs to the affine space

Vw:=w+W.V_{w}:=w+W^{\perp}. (1.17)

Combining this information with the prior class 𝒦{\cal K}, the unknown state thus belongs to the ellipsoid

𝒦w:=𝒦Vw={v𝒦:PWv=w}.{\cal K}_{w}:={\cal K}\cap V_{w}=\{v\in{\cal K}\;:\;P_{W}v=w\}. (1.18)

For this posterior class 𝒦w{\cal K}_{w}, the optimal recovery estimator uu^{*} that minimizes the worst case error maxu𝒦wuu\max_{u\in{\cal K}_{w}}\|u-u^{*}\| is therefore the center of the ellipsoid, which is equivalently given by

u=u(w):=argmin{vPVnv:PWv=w}.u^{*}=u^{*}(w):={\rm argmin}\{\|v-P_{V_{n}}v\|\;:\;P_{W}v=w\}. (1.19)

It can be computed from the data ww in an elementary manner by solving a finite set of linear equations. The worst case performance for this estimator, both over 𝒦{\cal K} and 𝒦w{\cal K}_{w}, for any ww, is thus given by the half-diameter of the ellipsoid which is the product of the width εn\varepsilon_{n} of 𝒦{\cal K} and the quantity

μn=μ(Vn,W):=maxvVnvPWv.\mu_{n}=\mu(V_{n},W):=\max_{v\in V_{n}}\frac{\|v\|}{\|P_{W}v\|}. (1.20)

Note that \mu_{n} is the inverse of the cosine of the angle between V_{n} and W. For n\geq 1, this quantity can be computed as the inverse of the smallest singular value of the n\times m cross-Gramian matrix with entries \langle\phi_{i},\psi_{j}\rangle between any pair of orthonormal bases (\phi_{i})_{i=1,\dots,n} and (\psi_{j})_{j=1,\dots,m} of V_{n} and W, respectively. It is readily seen that one also has

\mu_{n}=\max_{w\in W^{\perp}}\frac{\|w\|}{\|P_{V_{n}^{\perp}}w\|}, (1.21)

allowing us to extend the above definition to the case of the zero-dimensional space V_{n}=\{0\}, for which \mu(\{0\},W)=1.
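For illustration, the following sketch assembles this cross-Gramian and forms the PBDW estimate in coordinates, using the representation u^{*}=v^{*}+(w-P_{W}v^{*}) with v^{*}\in V_{n} minimizing \|P_{W}v-w\| (a standard algebraic formulation of (1.19)); the Euclidean inner product is used as a stand-in for the V-inner product and all names are illustrative.

```python
import numpy as np

def pbdw_estimate(Phi, Psi, b):
    """One-space (PBDW) reconstruction sketch.

    Phi : (N, n) orthonormal basis of the reduced space V_n,
    Psi : (N, m) orthonormal basis of the measurement space W,
    b   : (m,) coordinates of the observation w = P_W u in the basis Psi.
    Both bases are assumed orthonormal in the (discretized) V-inner product,
    here replaced by the Euclidean one for simplicity.
    """
    A = Psi.T @ Phi                                   # m x n cross-Gramian <psi_j, phi_i>
    mu_n = 1.0 / np.linalg.svd(A, compute_uv=False).min()
    c, *_ = np.linalg.lstsq(A, b, rcond=None)         # coefficients of v* in the basis Phi
    u_star = Phi @ c + Psi @ (b - A @ c)              # u* = v* + (w - P_W v*)
    return u_star, mu_n

# tiny usage example with random (hypothetical) bases
rng = np.random.default_rng(0)
Phi, _ = np.linalg.qr(rng.standard_normal((50, 3)))   # n = 3
Psi, _ = np.linalg.qr(rng.standard_normal((50, 8)))   # m = 8
u_true = rng.standard_normal(50)
u_star, mu_n = pbdw_estimate(Phi, Psi, Psi.T @ u_true)
```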

Since 𝒦{\cal M}\subset{\cal K}, the worst case error bound over {\cal M} of the estimator, defined as

Ewc:=maxuuu(PWu),E_{wc}:=\max_{u\in{\cal M}}\|u-u^{*}(P_{W}u)\|, (1.22)

satisfies the error bound

E_{wc}=\max_{u\in{\cal M}}\|u-u^{*}(P_{W}u)\|\leq\max_{u\in{\cal K}}\|u-u^{*}(P_{W}u)\|=\mu_{n}\varepsilon_{n}. (1.23)
Remark 1.1

The estimation map w\mapsto u^{*}(w) is linear with norm \mu_{n} and does not depend on \varepsilon_{n}. It thus satisfies, for any individual u\in V and \eta\in W,

\|u-u^{*}(P_{W}u+\eta)\|\leq\mu_{n}({\rm dist}(u,V_{n})+\|\eta\|). (1.24)

We may therefore account for additional measurement noise and model bias: if the observation is w=P_{W}u+\eta with \|\eta\|\leq\varepsilon_{noise}, and if the true states do not lie in {\cal M} but satisfy {\rm dist}(u,{\cal M})\leq\varepsilon_{model}, the guaranteed error bound (1.23) should be modified into

\|u-u^{*}(w)\|\leq\mu_{n}(\varepsilon_{n}+\varepsilon_{noise}+\varepsilon_{model}). (1.25)

In practice, the noise component \eta\in W typically results from a noise vector \overline{\eta}\in\mathbb{R}^{m} affecting the observation z according to z=\ell(u)+\overline{\eta}. Assuming a bound \|\overline{\eta}\|_{2}\leq\overline{\varepsilon}_{noise}, where \|\cdot\|_{2} is the Euclidean norm in \mathbb{R}^{m}, we thus obtain the above error bound with \varepsilon_{noise}:=\|M\|\overline{\varepsilon}_{noise}, where M\in\mathbb{R}^{m\times m} is the matrix that transforms the representer basis \omega=\{\omega_{1},\ldots,\omega_{m}\} into an orthonormal basis \psi=\{\psi_{1},\ldots,\psi_{m}\} of W. Here, estimation accuracy benefits from decreasing noise without increasing computational cost. This is in contrast to Bayesian methods, for which a small noise level induces computational difficulties due to the concentration of the posterior distribution.

Remark 1.2

To bring out the essential mechanisms, we have idealized (and technically simplified) the description of the PBDW method by omitting certain discretization aspects that are unavoidable in computational practice and should be accounted for. To start with, the snapshots uiu_{i} (or the polynomial coefficients uνu_{\nu}) that span the reduced basis spaces VnV_{n} cannot be computed exactly, but only up to some tolerance by a numerical solver. One typical instance is the finite element method, which yields an approximate parameter to solution map

yuh(y)Vh,y\mapsto u_{h}(y)\in V_{h}, (1.26)

where VhV_{h} is a reference finite element space ensuring a prescribed accuracy

u(y)uh(y)εh,yY.\|u(y)-u_{h}(y)\|\leq\varepsilon_{h},\quad y\in Y. (1.27)

The computable states are therefore elements of the perturbed manifold

h:={uh(y):yY}.{\cal M}_{h}:=\{u_{h}(y)\,:\,y\in Y\}. (1.28)

The reduced model spaces VnV_{n} are low dimensional subspaces of VhV_{h}, and with certified accuracy

dist(h,Vn)εn.{\rm dist}({\cal M}_{h},V_{n})\leq\varepsilon_{n}. (1.29)

The true states do not belong to {\cal M}_{h} and this deviation can therefore be interpreted as a model bias in the sense of the previous remark, with \varepsilon_{model}=\varepsilon_{h}. The application of the PBDW method also requires the introduction of the Riesz lifts \omega_{i} in order to define the measurement space W. Since we operate in the space V_{h}, these can be defined as elements of this space satisfying

\langle\omega_{i},v\rangle_{V}=\ell_{i}(v),\quad v\in V_{h}, (1.30)

thus resulting in a measurement space W\subset V_{h}. For example, if V is the Sobolev space H^{1}_{0}(\Omega) for some domain \Omega and V_{h} is a finite element subspace, the Riesz lifts are the unique solutions to the Galerkin problem

Ωωiv=i(v),vVh,\intop\limits_{\Omega}\nabla\omega_{i}\nabla v=\ell_{i}(v),\quad v\in V_{h}, (1.31)

and can be identified by solving n_{h}\times n_{h} linear systems. Measuring accuracy in V, i.e., in a metric dictated by the continuous PDE model, the above idealization, which is largely maintained in what follows, also helps in understanding how to properly adapt the background discretization V_{h} to the overall achievable estimation accuracy. Other computational issues involving the space V_{h} will be discussed in §3.4.

Note that \mu_{n}\geq 1 increases with n and that its finiteness imposes that \dim(V_{n})\leq\dim(W), that is, m\geq n. Therefore, one natural way to decide which space V_{n} to use is to take the value of n\in\{0,\dots,m\} that minimizes the bound \mu_{n}\varepsilon_{n}. This choice is somewhat crude since it might not be the value of n that minimizes the true reconstruction error for a given u\in{\cal M}, and for this reason it was referred to as a poor man algorithm in [1].
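Once the pairs (\varepsilon_{n},\mu_{n}) have been tabulated, this poor man selection is a one-line computation; a minimal sketch with hypothetical input arrays:

```python
import numpy as np

def poor_man_dimension(eps, mus):
    """eps[n] and mus[n] hold the certified accuracy eps_n and the stability
    constant mu(V_n, W) for n = 0, ..., m; return the n minimizing mu_n * eps_n."""
    bounds = np.asarray(mus) * np.asarray(eps)
    n_star = int(np.argmin(bounds))
    return n_star, float(bounds[n_star])
```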

The PBDW approach to state estimation can be improved in various ways:

  • One variant that is relevant to the present work is to use reduced models of affine form

    Vn=u¯n+V¯n,V_{n}=\overline{u}_{n}+\overline{V}_{n}, (1.32)

    where \overline{V}_{n} is a linear space and \overline{u}_{n} is a given offset. The optimal recovery estimator is again defined by the minimization property (1.19). Its computation amounts to the same type of linear systems, and the reconstruction map w\mapsto u^{*}(w) is now affine. The error bound (1.23) remains valid with \mu_{n}=\mu(\overline{V}_{n},W) and \varepsilon_{n} a bound for {\rm dist}({\cal M},V_{n}). Note that \varepsilon_{n} is also a bound for the distance of {\cal M} to the linear space \overline{V}_{n+1}:=\overline{V}_{n}\oplus\mathbb{R}\overline{u}_{n} of dimension n+1. However, using this linear space instead could result in a stability constant \mu_{n+1}=\mu(\overline{V}_{n+1},W) that is much larger than \mu_{n}, in particular when the offset \overline{u}_{n} is close to W^{\perp}.

  • Another variant proposed in [10] consists in using a large set {\cal T}_{N}=\{u_{i}=u(y^{i})\,:\,i=1,\dots,N\} of precomputed solutions in order to train the reconstruction map w\mapsto u^{*}(w) by minimizing the least-squares fit \sum_{i=1}^{N}\|u_{i}-u^{*}(P_{W}u_{i})\|^{2} over all linear or affine maps, which amounts to optimizing the choice of the space V_{n} in the PBDW method.

  • Conversely, for a given reduced basis space V_{n}, it is also possible to optimize the choice of linear functionals (\ell_{1},\dots,\ell_{m}) giving rise to the data, among a dictionary {\cal D} that represents a set of admissible measurement devices. The objective is to minimize the stability constant \mu(V_{n},W) for the resulting space W; see in particular [3], where a greedy algorithm is proposed for selecting the \ell_{i}. We do not take this view in the present paper and think of the space W as fixed once and for all: the measurement devices are given to us and cannot be modified.

1.3 Objective and outline

The simplicity of the PBDW method and its above variants comes together with a fundamental limitation of its performance: since the map w\mapsto u^{*}(w) is linear or affine, the reconstruction necessarily belongs to an m or m+1 dimensional space, and therefore the worst case performance is necessarily bounded from below by the Kolmogorov width d_{m}({\cal M}) or d_{m+1}({\cal M}). In view of this limitation, one principal objective of the present work is to develop nonlinear state estimation techniques which provably overcome the bottleneck of the Kolmogorov width d_{m}({\cal M}).

In §2, we introduce various benchmark quantities that describe the best possible performance of a recovery map in a worst case sense. We first consider an idealized setting where the state u is assumed to exactly satisfy the theoretical model described by the parametric PDE, that is, u\in{\cal M}. Then we introduce similar benchmark quantities in the presence of model bias and measurement noise. All these quantities can be substantially smaller than d_{m}({\cal M}).

In §3, we discuss a nonlinear recovery method based on a family of affine reduced models (V_{k})_{k=1,\dots,K}, where each V_{k} has dimension n_{k}\leq m and serves as a local approximation to a portion {\cal M}_{k} of the solution manifold. Applying the PBDW method with each such space results in a collection of state estimators u^{*}_{k}. Since the value k for which the true state u belongs to {\cal M}_{k} is unknown, we introduce a model selection procedure in order to pick a value k^{*}, and define the resulting estimator u^{*}=u^{*}_{k^{*}}. We show that this estimator has performance comparable to the benchmark introduced in §2. Such performance cannot be achieved by the standard PBDW method due to the above described limitations.

Model selection is a classical topic of mathematical statistics [22], with representative techniques such as complexity penalization or cross-validation in which the data are used to select a proper model. Our approach differs from these techniques in that it exploits (in the spirit of data assimilation) the PDE model which is available to us, by evaluating the distance to the manifold

dist(v,)=minyYvu(y),\mathop{\rm dist}(v,{\cal M})=\min_{y\in Y}\|v-u(y)\|, (1.33)

of the different estimators v=ukv=u^{*}_{k} for k=1,,Kk=1,\dots,K, and picking the value kk^{*} that minimizes it. In practice, the quantity (1.33) cannot be exactly computed and we instead rely on a computable surrogate quantity 𝒮(v,){\cal S}(v,{\cal M}) expressed in terms of the residual to the PDE, see § 3.4.

One typical instance where such a surrogate is available is when (1.1) has the form of a linear operator equation

A(y)u=f(y),A(y)u=f(y), (1.34)

where A(y)A(y) is boundedly invertible from VV to VV^{\prime}, or more generally, from VZV\to Z^{\prime} for a test space ZZ different from VV, uniformly over yYy\in Y. Then 𝒮(v,){\cal S}(v,{\cal M}) is obtained by minimizing the residual

{\cal R}(v,y)=\|A(y)v-f(y)\|_{Z^{\prime}}, (1.35)

over y\in Y. This task itself is greatly facilitated in the case where the operators A(y) and source terms f(y) have affine dependence on y. One relevant example that has often been considered in the literature is the second order elliptic diffusion equation with affine diffusion coefficient,

-{\rm div}(a\nabla u)=f(y),\quad a=a(y)=\overline{a}+\sum_{j=1}^{d}y_{j}\psi_{j}. (1.36)
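For such affine parametrizations, the squared residual \|A(y)v-f(y)\|^{2} is a quadratic polynomial in y once v is fixed, so that minimizing it over the parameter box is computationally cheap. The following sketch illustrates this in a discretized setting where the operators are given as matrices and the Euclidean norm serves as a proxy for the dual norm; it uses a simple projected gradient iteration and all names are illustrative.

```python
import numpy as np

def residual_surrogate(v, A0, A_list, f0, f_list, y_box, n_iter=500):
    """Minimize R(v, y) = ||A(y) v - f(y)|| over y in a box Y, for
    A(y) = A0 + sum_j y_j A_j and f(y) = f0 + sum_j y_j f_j.

    A0, A_list : (N, N) matrices;  f0, f_list : (N,) vectors (hypothetical
    discrete operators).  y_box is a list of (lo, hi) bounds for each y_j.
    The Euclidean norm stands in for the Z'-dual norm in this sketch.
    """
    # residual components: r(y) = r0 + sum_j y_j r_j, so ||r(y)||^2 = [1, y]^T G [1, y]
    R = np.column_stack([A0 @ v - f0] + [Aj @ v - fj for Aj, fj in zip(A_list, f_list)])
    G = R.T @ R
    Q, q = G[1:, 1:], G[1:, 0]
    lo = np.array([b[0] for b in y_box])
    hi = np.array([b[1] for b in y_box])
    y = 0.5 * (lo + hi)
    step = 1.0 / max(np.linalg.norm(Q, 2), 1e-14)   # safe step size for the iteration
    for _ in range(n_iter):                          # projected gradient descent on the box
        y = np.clip(y - step * (Q @ y + q), lo, hi)
    value = np.array([1.0, *y]) @ G @ np.array([1.0, *y])
    return float(np.sqrt(max(value, 0.0))), y
```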

In §4, we discuss the more direct approach for both state and parameter estimation based on minimizing {\cal R}(v,y) over both y\in Y and v\in w+W^{\perp}. The associated alternating minimization algorithm amounts to a simple succession of quadratic problems in the particular case of linear PDEs with affine parameter dependence. Such an algorithm is not guaranteed to converge to a global minimum (since the residual is not globally convex), and for this reason its limit may miss the optimality benchmark. On the other hand, using the estimator derived in §3 as a “good initialization point” for this minimization algorithm leads to a limit state that has at least the same order of accuracy.

These various approaches are numerically tested in §5 for the elliptic equation (1.36), for both the overdetermined regime mdm\geq d, and the underdetermined regime m<dm<d.

2 Optimal recovery benchmarks

In this section we describe the performance of the best possible recovery map

wu(w),w\mapsto u^{*}(w), (2.1)

in terms of its worst case error. We consider first the case of noiseless data and no model bias. In a subsequent step we take such perturbations into account. While these best recovery maps cannot be implemented by a simple algorithm, their performance serves as benchmark for the nonlinear state estimation algorithms discussed in the next section.

2.1 Optimal recovery for the solution manifold

In the absence of model bias and when a noiseless measurement w=PWuw=P_{W}u is given, our knowledge on uu is that it belongs to the set

w:=Vw.{\cal M}_{w}:={\cal M}\cap V_{w}. (2.2)

The best possible recovery map can be described through the following general notion.

Definition 2.1

The Chebychev ball of a bounded set S\subset V is the closed ball B(v,r) of minimal radius that contains S. One denotes by v={\rm cen}(S) the Chebychev center of S and by r={\rm rad}(S) its Chebychev radius.

In particular one has

12diam(S)rad(S)diam(S),\frac{1}{2}{\rm diam}(S)\leq{\rm rad}(S)\leq{\rm diam}(S), (2.3)

where {\rm diam}(S):=\sup\{\|u-v\|\,:\,u,v\in S\} is the diameter of S. Therefore, the recovery map that minimizes the worst case error over {\cal M}_{w} for any given w, and therefore over {\cal M}, is defined by

u(w)=cen(w).u^{*}(w)={\rm cen}({\cal M}_{w}). (2.4)

Its worst case error is

Ewc=sup{rad(w):wW}.E_{\rm wc}^{*}=\sup\{\mathop{\rm rad}({\cal M}_{w})\,:\,w\in W\}. (2.5)

In view of the equivalence (2.3), we can relate EwcE^{*}_{\rm wc} to the quantity

\delta_{0}=\delta_{0}({\cal M},W):=\sup\{{\rm diam}({\cal M}_{w})\,:\,w\in W\}=\sup\{\|u-v\|\;:\;u,v\in{\cal M},\;u-v\in W^{\bot}\}, (2.6)

by the equivalence

12δ0Ewcδ0.\frac{1}{2}\delta_{0}\leq E^{*}_{\rm wc}\leq\delta_{0}. (2.7)

Note that injectivity of the measurement map P_{W} over {\cal M} is equivalent to \delta_{0}=0. We provide in Figure 2.1(a) an illustration of the above benchmark concepts.

If w=P_{W}u for some u\in{\cal M}, then any u^{*}\in{\cal M} such that P_{W}u^{*}=w meets the ideal benchmark \|u-u^{*}\|\leq\delta_{0}. Therefore, one way of finding such a u^{*} would be to minimize the distance to the manifold over all functions such that P_{W}v=w, that is, solve

minvVwdist(v,)=minvVwminyYu(y)v.\min_{v\in V_{w}}{\rm dist}(v,{\cal M})=\min_{v\in V_{w}}\min_{y\in Y}\|u(y)-v\|. (2.8)

This problem is computationally out of reach since it amounts to the nested minimization of two non-convex functions in high dimension.

Computationally feasible algorithms such as the PBDW methods are based on a simplification of the manifold {\cal M} which induces an approximation error. We introduce next a somewhat relaxed benchmark that takes this error into account.

(a) Perfect model.
(b) Model bias.
Figure 2.1: Illustration of the optimal recovery benchmark on a manifold in the two dimensional Euclidean space.

2.2 Optimal recovery under perturbations

In order to account for manifold simplification as well as model bias, for any given accuracy \sigma>0, we introduce the \sigma-offset of {\cal M},

σ:={vV:dist(v,)σ}=uB(u,σ).{\cal M}_{\sigma}:=\{v\in V\;:\;\mathop{\rm dist}(v,{\cal M})\leq\sigma\}=\bigcup_{u\in{\cal M}}B(u,\sigma). (2.9)

Likewise, we introduce the perturbed set

σ,w=σVw,{\cal M}_{\sigma,w}={\cal M}_{\sigma}\cap V_{w}, (2.10)

which, however, still excludes uncertainties in w. Our benchmark for the worst case error is now defined as (see Figure 2.1(b) for an illustration)

δσ:=maxwWdiam(σ,w)=max{uv:u,vσ,uvW}.\delta_{\sigma}:=\max_{w\in W}{\rm diam}({\cal M}_{\sigma,w})=\max\{\|u-v\|\;:\;u,v\in{\cal M}_{\sigma},\;u-v\in W^{\bot}\}. (2.11)

The map σδσ\sigma\mapsto\delta_{\sigma} satisfies some elementary properties:

  • Monotonicity and continuity: it is obviously non-decreasing

    σσ~δσδσ~.\sigma\leq\tilde{\sigma}\implies\delta_{\sigma}\leq\delta_{\tilde{\sigma}}. (2.12)

    Simple finite dimensional examples show that this map may have jump discontinuities. Take for example a compact set {\cal M}\subset\mathbb{R}^{2} consisting of the two points (0,0) and (1/2,1), and W=\mathbb{R}e_{1} where e_{1}=(1,0). Then \delta_{\sigma}=2\sigma for 0\leq\sigma<\frac{1}{4}, while \delta_{\frac{1}{4}}({\cal M},W)=1. Using the compactness of {\cal M}, it is possible to check that \sigma\mapsto\delta_{\sigma} is continuous from the right and in particular \lim_{\sigma\to 0}\delta_{\sigma}({\cal M},W)=\delta_{0}.

  • Bounds from below and above: for any u,v\in{\cal M}_{\sigma,w} and any \tilde{\sigma}\geq 0, let \tilde{u}=u+\tilde{\sigma}g and \tilde{v}=v-\tilde{\sigma}g with g=(u-v)/\|u-v\|. Then \|\tilde{u}-\tilde{v}\|=\|u-v\|+2\tilde{\sigma} and \tilde{u}-\tilde{v}\in W^{\bot}, which shows that \tilde{u},\,\tilde{v}\in{\cal M}_{\sigma+\tilde{\sigma},w}, and

    δσ+σ~δσ+2σ~.\delta_{\sigma+\tilde{\sigma}}\geq\delta_{\sigma}+2\tilde{\sigma}. (2.13)

    In particular,

    δσδ0+2σ2σ.\delta_{\sigma}\geq\delta_{0}+2\sigma\geq 2\sigma. (2.14)

    On the other hand, we obviously have the upper bound δσdiam(σ)diam()+2σ\delta_{\sigma}\leq{\rm diam}({\cal M}_{\sigma})\leq{\rm diam}({\cal M})+2\sigma.

  • The quantity

    μ(,W):=12supσ>0δσδ0σ,\mu({\cal M},W):=\frac{1}{2}\sup_{\sigma>0}\frac{\delta_{\sigma}-\delta_{0}}{\sigma}, (2.15)

    may be viewed as a general stability constant inherent to the recovery problem, similar to μ(Vn,W)\mu(V_{n},W) that is more specific to the particular PBDW method: in the special case where =Vn{\cal M}=V_{n} and VnW={0}V_{n}\cap W^{\perp}=\{0\}, one has δ0=0\delta_{0}=0 and δσσ=μ(Vn,W)\frac{\delta_{\sigma}}{\sigma}=\mu(V_{n},W) for all σ>0\sigma>0. Note that μ(,W)1\mu({\cal M},W)\geq 1 in view of (2.14).

Regarding measurement noise, it is natural to introduce the quantity

δ~σ:=max{uv:u,v,PWuPWvσ}.\tilde{\delta}_{\sigma}:=\max\{\|u-v\|\;:\;u,v\in{\cal M},\;\|P_{W}u-P_{W}v\|\leq\sigma\}. (2.16)

The two quantities δσ\delta_{\sigma} and δ~σ\tilde{\delta}_{\sigma} are not equivalent, however one has the framing

δσ2σδ~2σδσ+2σ.\delta_{\sigma}-2\sigma\leq\tilde{\delta}_{2\sigma}\leq\delta_{\sigma}+2\sigma. (2.17)

To prove the lower inequality, we note that for any u,v\in{\cal M}_{\sigma} such that u-v\in W^{\bot}, there exist \tilde{u},\tilde{v}\in{\cal M} at distance \sigma from u,v, respectively, and therefore such that \|P_{W}(\tilde{u}-\tilde{v})\|\leq 2\sigma. Conversely, for any \tilde{u},\tilde{v}\in{\cal M} such that \|P_{W}(\tilde{u}-\tilde{v})\|\leq 2\sigma, there exist u,v at distance \sigma from \tilde{u},\tilde{v}, respectively, such that u-v\in W^{\bot}, which gives the upper inequality. Note that the upper inequality in (2.17) combined with (2.14) implies that \tilde{\delta}_{2\sigma}\leq 2\delta_{\sigma}.

In the following analysis of reconstruction methods, we use the quantity δσ\delta_{\sigma} as a benchmark which, in view of this last observation, also accounts for the lack of accuracy in the measurement of PWuP_{W}u. Our objective is therefore to design an algorithm that, for a given tolerance σ>0\sigma>0, recovers from the measurement w=PWuw=P_{W}u an approximation to uu with accuracy comparable to δσ\delta_{\sigma}. Such an algorithm requires that we are able to capture the solution manifold up to some tolerance εσ\varepsilon\leq\sigma by some reduced model.

3 Nonlinear recovery by reduced model selection

3.1 Piecewise affine reduced models

Linear or affine reduced models, as used in the PBDW algorithm, are not suitable for approximating the solution manifold when the required tolerance \varepsilon is too small. In particular, when \varepsilon<d_{m}({\cal M}) one would then need to use a linear space V_{n} of dimension n>m, therefore making \mu(V_{n},W) infinite.

One way out is to replace the single space VnV_{n} by a family of affine spaces

Vk=u¯k+V¯k,k=1,,K,V_{k}=\overline{u}_{k}+\overline{V}_{k},\quad k=1,\dots,K, (3.1)

each of them having dimension

dim(Vk)=nkm,\dim(V_{k})=n_{k}\leq m, (3.2)

such that the manifold is well captured by the union of these spaces, in the sense that

dist(,k=1KVk)ε{\rm dist}\Bigl{(}{\cal M},\bigcup_{k=1}^{K}V_{k}\Bigr{)}\leq\varepsilon (3.3)

for some prescribed tolerance ε>0\varepsilon>0. This is equivalent to saying that there exists a partition of the solution manifold

=k=1Kk,{\cal M}=\bigcup_{k=1}^{K}{\cal M}_{k}, (3.4)

such that we have local certified bounds

dist(k,Vk)εkε,k=1,,K.{\rm dist}({\cal M}_{k},V_{k})\leq\varepsilon_{k}\leq\varepsilon,\quad k=1,\dots,K. (3.5)

We may thus think of the family (Vk)k=1,,K(V_{k})_{k=1,\dots,K} as a piecewise affine approximation to {\cal M}. We stress that, in contrast to the hierarchies (Vn)n=0,,m(V_{n})_{n=0,\dots,m} of reduced models discussed in §1.2, the spaces VkV_{k} do not have dimension kk and are not nested. Most importantly, KK is not limited by mm while each nkn_{k} is.

The objective of using a piecewise reduced model in the context of state estimation is to have a joint control on the local accuracy εk\varepsilon_{k} as expressed by (3.5) and on the stability of the PBDW when using any individual VkV_{k}. This means that, for some prescribed μ>1\mu>1, we ask that

μk=μ(V¯k,W)μ,k=1,,K.\mu_{k}=\mu(\overline{V}_{k},W)\leq\mu,\quad k=1,\dots,K. (3.6)

According to (1.23), the worst case error bound over {\cal M}_{k} when using the PBDW method with a space V_{k} is given by the product \mu_{k}\varepsilon_{k}. This suggests alternatively requiring from the collection (V_{k})_{k=1,\dots,K} that, for some prescribed \sigma>0, one has

\sigma_{k}:=\mu_{k}\varepsilon_{k}\leq\sigma,\quad k=1,\dots,K. (3.7)

This leads us to the following definitions.

Definition 3.1

The family (Vk)k=1,,K(V_{k})_{k=1,\dots,K} is σ\sigma-admissible if (3.7) holds. It is (ε,μ)(\varepsilon,\mu)-admissible if (3.5) and (3.6) are jointly satisfied.

Obviously, any (ε,μ)(\varepsilon,\mu)-admissible family is σ\sigma-admissible with σ:=με\sigma:=\mu\varepsilon. In this sense the notion of (ε,μ)(\varepsilon,\mu)-admissibility is thus more restrictive than that of σ\sigma-admissibility. The benefit of the first notion is in the uniform control on the size of μ\mu which is critical in the presence of noise, as hinted at by Remark 1.1.

If uu\in{\cal M} is our unknown state and w=PWuw=P_{W}u is its observation, we may apply the PBDW method for the different VkV_{k} in the given family, which yields a corresponding family of estimators

uk=uk(w)=argmin{dist(v,Vk):vVw},k=1,,K.u_{k}^{*}=u_{k}^{*}(w)={\rm argmin}\{\mathop{\rm dist}(v,V_{k})\,:\,v\in V_{w}\},\quad k=1,\dots,K. (3.8)

If (Vk)k=1,,K(V_{k})_{k=1,\dots,K} is σ\sigma-admissible, we find that the accuracy bound

uukμkdist(u,Vk)μkεk=σkσ,\|u-u_{k}^{*}\|\leq\mu_{k}{\rm dist}(u,V_{k})\leq\mu_{k}\varepsilon_{k}=\sigma_{k}\leq\sigma, (3.9)

holds whenever uku\in{\cal M}_{k}.

Therefore, if in addition to the observed data ww one had an oracle giving the information on which portion k{\cal M}_{k} of the manifold the unknown state sits, we could derive an estimator with worst case error

Ewcσ.E_{wc}\leq\sigma. (3.10)

This information is, however, not available and such a worst case error estimate cannot be hoped for, even with an additional multiplicative constant. Indeed, as we shall see below, σ\sigma can be fixed arbitrarily small by the user when building the family (Vk)k=1,,K(V_{k})_{k=1,\dots,K}, while we know from §2.1 that the worst case error is bounded from below by Ewc12δ0E_{wc}^{*}\geq\frac{1}{2}\delta_{0} which could be non-zero. We will thus need to replace the ideal choice of kk by a model selection procedure only based on the data ww, that is, a map

wk(w),w\mapsto k^{*}(w), (3.11)

leading to a choice of estimator u=uku^{*}=u^{*}_{k^{*}}. We shall prove further that such an estimator is able to achieve the accuracy

Ewcδσ,E_{wc}\leq\delta_{\sigma}, (3.12)

that is, the benchmark introduced in §2.2. Before discussing this model selection, we discuss the existence and construction of σ\sigma-admissible or (ε,μ)(\varepsilon,\mu)-admissible families.

3.2 Constructing admissible reduced model families

For any arbitrary choice of ε>0\varepsilon>0 and μ1\mu\geq 1, the existence of an (ε,μ)(\varepsilon,\mu)-admissible family results from the following observation: since the manifold {\cal M} is a compact set of VV, there exists a finite ε\varepsilon-cover of {\cal M}, that is, a family u¯1,,u¯KV\overline{u}_{1},\dots,\overline{u}_{K}\in V such that

k=1KB(u¯k,ε),{\cal M}\subset\bigcup_{k=1}^{K}B(\overline{u}_{k},\varepsilon), (3.13)

or equivalently, for all v\in{\cal M}, there exists a k such that \|v-\overline{u}_{k}\|\leq\varepsilon. With such an \varepsilon-cover, we consider the family of trivial affine spaces defined by

Vk={u¯k}=u¯k+V¯k,V¯k={0},V_{k}=\{\overline{u}_{k}\}=\overline{u}_{k}+\overline{V}_{k},\quad\overline{V}_{k}=\{0\}, (3.14)

thus with nk=0n_{k}=0 for all kk. The covering property implies that (3.5) holds. On the other hand, for the 0 dimensional space, one has

μ({0},W)=1,\mu(\{0\},W)=1, (3.15)

and therefore (3.6) also holds. The family (Vk)k=1,,K(V_{k})_{k=1,\dots,K} is therefore (ε,μ)(\varepsilon,\mu)-admissible, and also σ\sigma-admissible with σ=ε\sigma=\varepsilon.

This family is however not satisfactory for algorithmic purposes, for two main reasons. First, the manifold is not explicitly given to us and the construction of the centers \overline{u}_{k} is by no means trivial. Second, asking for an \varepsilon-cover would typically require that K becomes extremely large as \varepsilon goes to 0. For example, assuming that the parameter to solution map y\mapsto u(y) has Lipschitz constant L,

u(y)u(y~)L|yy~|,y,y~Y,\|u(y)-u(\tilde{y})\|\leq L|y-\tilde{y}|,\quad y,\tilde{y}\in Y, (3.16)

for some norm |\cdot| of \mathbb{R}^{d}, then an \varepsilon-cover for {\cal M} would be induced by an L^{-1}\varepsilon-cover for Y, which has cardinality K growing like \varepsilon^{-d} as \varepsilon\to 0. Having a family of moderate size K is important for the estimation procedure since we intend to apply the PBDW method for all k=1,\dots,K.

In order to construct (ε,μ)(\varepsilon,\mu)-admissible or σ\sigma-admissible families of better controlled size, we need to split the manifold in a more economical manner than through an ε\varepsilon-cover, and use spaces VkV_{k} of general dimensions nk{0,,m}n_{k}\in\{0,\dots,m\} for the various manifold portions k{\cal M}_{k}. To this end, we combine standard constructions of linear reduced model spaces with an iterative splitting procedure operating on the parameter domain YY. Let us mention that various ways of splitting the parameter domain have already been considered in order to produce local reduced bases having both, controlled cardinality and prescribed accuracy [18, 21, 5]. Here our goal is slightly different since we want to control both the accuracy ε\varepsilon and the stability μ\mu with respect to the measurement space WW.

We describe the greedy algorithm for constructing σ\sigma-admissible families, and explain how it should be modified for (ε,μ)(\varepsilon,\mu)-admissible families. For simplicity we consider the case where YY is a rectangular domain with sides parallel to the main axes, the extension to a more general bounded domain YY being done by embedding it in such a hyper-rectangle. We are given a prescribed target value σ>0\sigma>0 and the splitting procedure starts from YY.

At step jj, a disjoint partition of YY into rectangles (Yk)k=1,,Kj(Y_{k})_{k=1,\dots,K_{j}} with sides parallel to the main axes has been generated. It induces a partition of {\cal M} given by

k:={u(y):yYk},k=1,,Kj.{\cal M}_{k}:=\{u(y)\,:\,y\in Y_{k}\},\quad k=1,\dots,K_{j}. (3.17)

To each k{1,,Kj}k\in\{1,\dots,K_{j}\} we associate a hierarchy of affine reduced basis spaces

V_{n,k}=\overline{u}_{k}+\overline{V}_{n,k},\quad n=0,\dots,m, (3.18)

where \overline{u}_{k}=u(\overline{y}_{k}), with \overline{y}_{k} the vector defined as the center of the rectangle Y_{k}. The nested linear spaces

\overline{V}_{0,k}\subset\overline{V}_{1,k}\subset\cdots\subset\overline{V}_{m,k},\quad\dim(\overline{V}_{n,k})=n, (3.19)

are meant to approximate the translated portion of the manifold ku¯k{\cal M}_{k}-\overline{u}_{k}. For example, they could be reduced basis spaces obtained by applying the greedy algorithm to ku¯k{\cal M}_{k}-\overline{u}_{k}, or spaces resulting from local nn-term polynomial approximations of u(y)u(y) on the rectangle YkY_{k}. Each space Vn,kV_{n,k} has a given accuracy bound and stability constant

dist(k,Vn,k)εn,kandμn,k:=μ(V¯n,k,W).{\rm dist}({\cal M}_{k},V_{n,k})\leq\varepsilon_{n,k}\quad{\rm and}\quad\mu_{n,k}:=\mu(\overline{V}_{n,k},W). (3.20)

We define the test quantity

τk=minn=0,,mμn,kεn,k.\tau_{k}=\min_{n=0,\dots,m}\mu_{n,k}\varepsilon_{n,k}. (3.21)

If τkσ\tau_{k}\leq\sigma, the rectangle YkY_{k} is not split and becomes a member of the final partition. The affine space associated to k{\cal M}_{k} is

Vk=u¯k+V¯k,V_{k}=\overline{u}_{k}+\overline{V}_{k}, (3.22)

where Vk=Vn,kV_{k}=V_{n,k} for the value of nn that minimizes μn,kεn,k\mu_{n,k}\varepsilon_{n,k}. The rectangles YkY_{k} with τk>σ\tau_{k}>\sigma are, on the other hand, split into a finite number of sub-rectangles in a way that we discuss below. This results in the new larger partition (Yk)k=1,,Kj+1(Y_{k})_{k=1,\dots,K_{j+1}} after relabelling the YkY_{k}. The algorithm terminates at the step jj as soon as τkσ\tau_{k}\leq\sigma for all k=1,,Kj=Kk=1,\dots,K_{j}=K, and the family (Vk)k=1,,K(V_{k})_{k=1,\dots,K} is σ\sigma-admissible. In order to obtain an (ε,μ)(\varepsilon,\mu)-admissible family, we simply modify the test quantity τk\tau_{k} by defining it instead as

τk:=minn=0,,mmax{μn,kμ,εn,kε}\tau_{k}:=\min_{n=0,\dots,m}\max\Big{\{}\frac{\mu_{n,k}}{\mu},\frac{\varepsilon_{n,k}}{\varepsilon}\Big{\}} (3.23)

and splitting the cells for which τk>1\tau_{k}>1.

The splitting of one single rectangle Y_{k} can be performed in various ways. When the parameter dimension d is moderate, we may subdivide each side-length at the mid-point, resulting in 2^{d} sub-rectangles of equal size. This splitting becomes too costly as d gets large, in which case it is preferable to make a choice of i\in\{1,\dots,d\} and subdivide Y_{k} at the mid-point of the side-length in the i-th coordinate, resulting in only 2 sub-rectangles. In order to decide which coordinate to pick, we consider the d possibilities and take the value of i that minimizes the quantity

\tau_{k,i}=\max\{\tau_{k,i}^{-},\tau_{k,i}^{+}\}, (3.24)

where (\tau_{k,i}^{-},\tau_{k,i}^{+}) are the values of \tau_{k} for the two sub-rectangles obtained by splitting along the i-th coordinate. In other words, we split in the direction that decreases \tau_{k} most effectively. In order to be certain that all side-lengths are eventually split, we can mitigate the greedy choice of i in the following way: if Y_{k} has been generated by l consecutive refinements, and therefore has volume |Y_{k}|=2^{-l}|Y|, and if l is even, we choose i=(l/2\,{\rm mod}\,d). This means that at each even level we split in a cyclic manner in the coordinates i\in\{1,\dots,d\}.

Using such elementary splitting rules, we are ensured that the algorithm must terminate. Indeed, we are guaranteed that for any η>0\eta>0, there exists a level l=l(η)l=l(\eta) such that any rectangle YkY_{k} generated by ll consecutive refinements has side-length smaller than 2η2\eta in each direction. Since the parameter-to-solution map is continuous, for any ε>0\varepsilon>0, we can pick η>0\eta>0 such that

yy~ηu(y)u(y~)ε,y,y~Y.\|y-\tilde{y}\|_{\ell^{\infty}}\leq\eta\implies\|u(y)-u(\tilde{y})\|\leq\varepsilon,\quad y,\tilde{y}\in Y. (3.25)

Applying this to yYky\in Y_{k} and y~=y¯k\tilde{y}=\overline{y}_{k}, we find that for u¯k=u(y¯k)\overline{u}_{k}=u(\overline{y}_{k})

uu¯kε,uk.\|u-\overline{u}_{k}\|\leq\varepsilon,\quad u\in{\cal M}_{k}. (3.26)

Therefore, for any rectangle Y_{k} of generation l, we find that the trivial affine space V_{k}=\{\overline{u}_{k}\} has local accuracy \varepsilon_{k}\leq\varepsilon and \mu_{k}=\mu(\{0\},W)=1\leq\mu, which implies that such a rectangle would not be further refined by the algorithm.
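The overall splitting loop can be summarized in a few lines. The sketch below is deliberately simplified: it splits the longest side of a cell at its midpoint rather than using the greedy or cyclic choice of coordinate described above, and it delegates the construction of the local spaces and of the test quantity \tau_{k} to a hypothetical callback.

```python
import numpy as np

def build_sigma_admissible_family(Y_box, sigma, local_model, max_cells=1024):
    """Greedy splitting of the parameter rectangle Y into cells Y_k.

    Y_box       : list of (lo, hi) intervals describing the rectangle Y;
    local_model : hypothetical callback returning, for a cell Y_k, a pair
                  (V_k, tau_k) with tau_k = min_n mu_{n,k} * eps_{n,k}.
    Returns the accepted cells and their associated affine reduced spaces.
    """
    queue, accepted = [list(Y_box)], []
    while queue and len(queue) + len(accepted) < max_cells:
        Yk = queue.pop()
        Vk, tau_k = local_model(Yk)
        if tau_k <= sigma:                      # cell meets the target: keep (Y_k, V_k)
            accepted.append((Yk, Vk))
            continue
        i = int(np.argmax([hi - lo for lo, hi in Yk]))   # split the longest side
        lo, hi = Yk[i]
        left, right = list(Yk), list(Yk)
        left[i], right[i] = (lo, 0.5 * (lo + hi)), (0.5 * (lo + hi), hi)
        queue.extend([left, right])
    return accepted
```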

3.3 Reduced model selection and recovery bounds

We return to the problem of selecting an estimator within the family (uk)k=1,,K(u_{k}^{*})_{k=1,\dots,K} defined by (3.8). In an idealized version, the selection procedure picks the value kk^{*} that minimizes the distance of uku_{k}^{*} to the solution manifold, that is,

k=argmin{dist(uk,):k=1,,K}k^{*}={\rm argmin}\{\mathop{\rm dist}(u_{k}^{*},{\cal M})\,:\,k=1,\dots,K\} (3.27)

and takes for the final estimator

u=u(w):=uk(w).u^{*}=u^{*}(w):=u^{*}_{k^{*}}(w). (3.28)

Note that kk^{*} also depends on the observed data ww. This estimation procedure is not realistic, since the computation of the distance of a known function vv to the manifold

dist(v,)=minyYu(y)v,\mathop{\rm dist}(v,{\cal M})=\min_{y\in Y}\|u(y)-v\|, (3.29)

is a high-dimensional non-convex problem, which necessitates exploring the solution manifold. A more realistic procedure is based on replacing this distance by a surrogate quantity {\cal S}(v,{\cal M}) that is easily computable and satisfies a uniform equivalence

rdist(v,)𝒮(v,)Rdist(v,),vV,r\mathop{\rm dist}(v,{\cal M})\leq{\cal S}(v,{\cal M})\leq R\mathop{\rm dist}(v,{\cal M}),\quad v\in V, (3.30)

for some constants 0<r\leq R. We then instead take for k^{*} the value that minimizes this surrogate, that is,

k^{*}={\rm argmin}\{{\cal S}(u_{k}^{*},{\cal M})\,:\,k=1,\dots,K\}. (3.31)
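Once the local PBDW estimates and a computable surrogate are available, the selection step itself is elementary; a minimal sketch (the surrogate callable stands in for the residual-based quantity of §3.4, evaluated here on the local cell as in Remark 3.3 below):

```python
def select_estimator(estimators, cells, surrogate):
    """Pick k* minimizing the surrogate distance of u_k^* to the manifold.

    estimators : list of local PBDW estimates u_k^* (e.g. coefficient vectors);
    cells      : the corresponding parameter cells Y_k;
    surrogate  : hypothetical callable (v, Y_k) -> S(v, M_k).
    """
    scores = [surrogate(u_k, Y_k) for u_k, Y_k in zip(estimators, cells)]
    k_star = min(range(len(scores)), key=scores.__getitem__)
    return estimators[k_star], k_star
```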

Before discussing the derivation of 𝒮(v,){\cal S}(v,{\cal M}) in concrete cases, we establish a recovery bound in the absence of model bias and noise.

Theorem 3.2

Assume that the family (Vk)k=1,,K(V_{k})_{k=1,\dots,K} is σ\sigma-admissible for some σ>0\sigma>0. Then, the idealized estimator based on (3.27), (3.28), satisfies the worst case error estimate

Ewc=maxuuu(PWu)δσ,E_{wc}=\max_{u\in{\cal M}}\|u-u^{*}(P_{W}u)\|\leq\delta_{\sigma}, (3.32)

where δσ\delta_{\sigma} is the benchmark quantity defined in (2.11). When using the estimator based on (3.31), the worst case error estimate is modified into

Ewcδκσ,κ=Rr>1.E_{wc}\leq\delta_{\kappa\sigma},\quad\kappa=\frac{R}{r}>1. (3.33)

Proof: Let u\in{\cal M} be an unknown state and w=P_{W}u. There exists l=l(u)\in\{1,\dots,K\} such that u\in{\cal M}_{l}, and for this value we know that

uulμlεl=σlσ.\|u-u_{l}^{*}\|\leq\mu_{l}\varepsilon_{l}=\sigma_{l}\leq\sigma. (3.34)

Since uu\in{\cal M}, it follows that

dist(ul,)σ.\mathop{\rm dist}(u_{l}^{*},{\cal M})\leq\sigma. (3.35)

On the other hand, for the value kk^{*} selected by (3.31) and u=uku^{*}=u_{k^{*}}^{*}, we have

dist(u,)R𝒮(u,)R𝒮(ul,)κdist(ul,)κσ.\mathop{\rm dist}(u^{*},{\cal M})\leq R\,{\cal S}(u^{*},{\cal M})\leq R\,{\cal S}(u_{l}^{*},{\cal M})\leq\kappa\mathop{\rm dist}(u_{l}^{*},{\cal M})\leq\kappa\sigma. (3.36)

It follows that u^{*} belongs to the offset {\cal M}_{\kappa\sigma}. Since u\in{\cal M}\subset{\cal M}_{\sigma}\subseteq{\cal M}_{\kappa\sigma} and u-u^{*}\in W^{\perp}, we find that

uuδκσ,\|u-u^{*}\|\leq\delta_{\kappa\sigma}, (3.37)

which establishes the recovery estimate (3.33). The estimate (3.32) for the idealized estimator follows since it corresponds to having r=R=1r=R=1. \Box

Remark 3.3

One possible variant of the selection mechanism, which is actually adopted in our numerical experiments, consists in picking the value kk^{*} that minimizes the distance of uku_{k}^{*} to the corresponding local portion k{\cal M}_{k} of the solution manifold, or a surrogate 𝒮(uk,k){\cal S}(u_{k}^{*},{\cal M}_{k}) with equivalence properties analogous to (3.30). It is readily checked that Theorem 3.2 remains valid for the resulting estimator uu^{*} with the same type of proof.

In the above result, we do not obtain the best possible accuracy satisfied by the different uku_{k}^{*}, since we do not have an oracle providing the information on the best choice of kk. We next show that this order of accuracy is attained in the particular case where the measurement map PWP_{W} is injective on {\cal M} and the stability constant of the recovery problem defined in (2.15) is finite.

Theorem 3.4

Assume that δ0=0\delta_{0}=0 and that

μ(,W)=12supσ>0δσσ<.\mu({\cal M},W)=\frac{1}{2}\sup_{\sigma>0}\frac{\delta_{\sigma}}{\sigma}<\infty. (3.38)

Then, for any given state uu\in{\cal M} with observation w=PWuw=P_{W}u, the estimator uu^{*} obtained by the model selection procedure (3.31) satisfies the oracle bound

uuCmink=1,,Kuuk,C:=2μ(,W)κ.\|u-u^{*}\|\leq C\min_{k=1,\dots,K}\|u-u_{k}^{*}\|,\quad C:=2\mu({\cal M},W)\kappa. (3.39)

In particular, if (Vk)k=1,,K(V_{k})_{k=1,\dots,K} is σ\sigma-admissible, it satisfies

uuCσ.\|u-u^{*}\|\leq C\sigma. (3.40)

Proof: Let l{1,,K}l\in\{1,\dots,K\} be the value for which uul=mink=1,,Kuuk\|u-u_{l}^{*}\|=\min_{k=1,\dots,K}\|u-u_{k}^{*}\|. Reasoning as in the proof of Theorem 3.2, we find that

dist(u,)κβ,β:=dist(ul,),\mathop{\rm dist}(u^{*},{\cal M})\leq\kappa\beta,\quad\beta:=\mathop{\rm dist}(u_{l}^{*},{\cal M}), (3.41)

and therefore

uuδκβ2μ(,W)κdist(ul,),\|u-u^{*}\|\leq\delta_{\kappa\beta}\leq 2\mu({\cal M},W)\kappa\mathop{\rm dist}(u_{l}^{*},{\cal M}), (3.42)

which is (3.39). We then obtain (3.40) using the fact that uukσ\|u-u^{*}_{k}\|\leq\sigma for the value of kk such that uku\in{\cal M}_{k}. \Box

We next discuss how to incorporate model bias and noise in the recovery bound, provided that we have a control on the stability of the PBDW method, through a uniform bound on μk\mu_{k}, which holds when we use (ε,μ)(\varepsilon,\mu)-admissible families.

Theorem 3.5

Assume that the family (Vk)k=1,,K(V_{k})_{k=1,\dots,K} is (ε,μ)(\varepsilon,\mu)-admissible for some ε>0\varepsilon>0 and μ1\mu\geq 1. If the observation is w=PWu+ηw=P_{W}u+\eta with ηεnoise\|\eta\|\leq\varepsilon_{noise}, and if the true state does not lie in {\cal M} but satisfies dist(u,)εmodel{\rm dist}(u,{\cal M})\leq\varepsilon_{model}, then, the estimator based on (3.31) satisfies the estimate

uu(w)δκρ+εnoise,ρ:=μ(ε+εnoise)+(μ+1)εmodel,κ=Rr,\|u-u^{*}(w)\|\leq\delta_{\kappa\rho}+\varepsilon_{noise},\quad\rho:=\mu(\varepsilon+\varepsilon_{noise})+(\mu+1)\varepsilon_{model},\quad\kappa=\frac{R}{r}, (3.43)

and the idealized estimator based on (3.27) satisfies a similar estimate with \kappa=1.

Proof: There exists l=l(u){1,,K}l=l(u)\in\{1,\dots,K\} such that

dist(u,l)εmodel,\mathop{\rm dist}(u,{\cal M}_{l})\leq\varepsilon_{model}, (3.44)

and therefore

dist(u,Vl)εl+εmodel.\mathop{\rm dist}(u,V_{l})\leq\varepsilon_{l}+\varepsilon_{model}. (3.45)

As already noted in Remark 1.1, we know that the PBDW method for this value of kk has accuracy

uulμl(εl+εnoise+εmodel)μ(ε+εnoise+εmodel).\|u-u_{l}^{*}\|\leq\mu_{l}(\varepsilon_{l}+\varepsilon_{noise}+\varepsilon_{model})\leq\mu(\varepsilon+\varepsilon_{noise}+\varepsilon_{model}). (3.46)

Therefore

dist(ul,)μ(ε+εnoise+εmodel)+εmodel=ρ,\mathop{\rm dist}(u_{l}^{*},{\cal M})\leq\mu(\varepsilon+\varepsilon_{noise}+\varepsilon_{model})+\varepsilon_{model}=\rho, (3.47)

and in turn

dist(u,)κρ.\mathop{\rm dist}(u^{*},{\cal M})\leq\kappa\rho. (3.48)

On the other hand, we define

v:=u+η=u+wPWu=u+PW(uu),v:=u+\eta=u+w-P_{W}u=u+P_{W}(u^{*}-u), (3.49)

so that

dist(v,)vu+εmodelεnoise+εmodelρ.\mathop{\rm dist}(v,{\cal M})\leq\|v-u\|+\varepsilon_{model}\leq\varepsilon_{noise}+\varepsilon_{model}\leq\rho. (3.50)

Since vuWv-u^{*}\in W^{\perp}, we conclude that vuδκρ\|v-u^{*}\|\leq\delta_{\kappa\rho}, from which (3.43) follows. \Box

While the reduced model selection approach provides us with an estimator w\mapsto u^{*}(w) of a single plausible state, the estimated distance to {\cal M} of some of the other estimators u_{k}^{*}(w) may be of comparable size. Therefore, one could be interested in recovering a more complete estimate of a plausible set that may contain the true state u, or even several states in {\cal M} sharing the same measurement. This more ambitious goal can be viewed as a deterministic counterpart to the search for the entire posterior probability distribution of the state in a Bayesian estimation framework, instead of only searching for a single estimated state, for instance the expectation of this distribution. For simplicity, we discuss this problem in the absence of model bias and noise. Our goal is therefore to approximate the set

w=Vw.{\cal M}_{w}={\cal M}\cap V_{w}. (3.51)

Given the family (Vk)k=1,,K(V_{k})_{k=1,\dots,K}, we consider the ellipsoids

{\cal E}_{k}:=\{v\in V_{w}\;:\;\mathop{\rm dist}(v,V_{k})\leq\varepsilon_{k}\},\quad k=1,\dots,K, (3.52)

which have center uku^{*}_{k} and diameter at most μkεk\mu_{k}\varepsilon_{k}. We already know that w{\cal M}_{w} is contained inside the union of the k{\cal E}_{k} which could serve as a first estimator. In order to refine this estimator, we would like to discard the k{\cal E}_{k} that do not intersect the associated portion k{\cal M}_{k} of the solution manifold.

For this purpose, we define our estimator of w{\cal M}_{w} as the union

w:=kSk,{\cal M}_{w}^{*}:=\bigcup_{k\in S}{\cal E}_{k}, (3.53)

where SS is the set of those kk such that

{\cal S}(u_{k}^{*},{\cal M}_{k})\leq R\mu_{k}\varepsilon_{k}. (3.54)

It is readily seen that kSk\notin S implies that kk={\cal E}_{k}\cap{\cal M}_{k}=\emptyset. The following result shows that this set approximates w{\cal M}_{w} with an accuracy of the same order as the recovery bound established for the estimator u(w)u^{*}(w).

Theorem 3.6

For any state uu\in{\cal M} with observation w=PWuw=P_{W}u, one has the inclusion

ww.{\cal M}_{w}\subset{\cal M}_{w}^{*}. (3.55)

If the family (Vk)k=1,,K(V_{k})_{k=1,\dots,K} is σ\sigma-admissible for some σ>0\sigma>0, the Hausdorff distance between the two sets satisfies the bound

d(w,w)=maxvwminuwvuδ(κ+1)σ,κ=Rr.d_{{\cal H}}({\cal M}_{w}^{*},{\cal M}_{w})=\max_{v\in{\cal M}_{w}^{*}}\min_{u\in{\cal M}_{w}}\|v-u\|\leq\delta_{(\kappa+1)\sigma},\quad\kappa=\frac{R}{r}. (3.56)

Proof: Any uwu\in{\cal M}_{w} is a state from {\cal M} that gives the observation PWuP_{W}u. This state belongs to l{\cal M}_{l} for some particular l=l(u)l=l(u), for which we know that uu belongs to the ellipsoid l{\cal E}_{l} and that

uulμlεl.\|u-u^{*}_{l}\|\leq\mu_{l}\varepsilon_{l}. (3.57)

This implies that dist(ul,l)μlεl\mathop{\rm dist}(u^{*}_{l},{\cal M}_{l})\leq\mu_{l}\varepsilon_{l}, and therefore 𝒮(ul,l)Rμlεl{\cal S}(u^{*}_{l},{\cal M}_{l})\leq R\mu_{l}\varepsilon_{l}. Hence lSl\in S, which proves the inclusion (3.55). In order to prove the estimate on the Hausdorff distance, we take any kSk\in S, and notice that

dist(uk,k)κμkεkκσ,\mathop{\rm dist}(u_{k}^{*},{\cal M}_{k})\leq\kappa\mu_{k}\varepsilon_{k}\leq\kappa\sigma, (3.58)

and therefore, for all such kk and all vkv\in{\cal E}_{k}, we have

dist(v,k)(κ+1)μkεk.\mathop{\rm dist}(v,{\cal M}_{k})\leq(\kappa+1)\mu_{k}\varepsilon_{k}. (3.59)

Since uvWu-v\in W^{\perp}, it follows that

vuδ(κ+1)σ,\|v-u\|\leq\delta_{(\kappa+1)\sigma}, (3.60)

which proves (3.56). \Box

Remark 3.7

If we could take SS to be exactly the set of those kk such that kk{\cal E}_{k}\cap{\cal M}_{k}\neq\emptyset, the resulting w{\cal M}^{*}_{w} would still contain w{\cal M}_{w} but with a sharper error bound. Indeed, any vwv\in{\cal M}^{*}_{w} belongs to a set k{\cal E}_{k} that intersects k{\cal M}_{k} at some uwu\in{\cal M}_{w}, so that

d(w,w)2σ.d_{{\cal H}}({\cal M}_{w}^{*},{\cal M}_{w})\leq 2\sigma. (3.61)

In order to identify if a kk belongs to this smaller SS, we need to solve the minimization problem

minvk𝒮(v,k),\min_{v\in{\cal E}_{k}}{\cal S}(v,{\cal M}_{k}), (3.62)

and check whether the minimum is zero. As explained next, the quantity 𝒮(v,k){\cal S}(v,{\cal M}_{k}) is itself obtained by a minimization problem over yYky\in Y_{k}. The resulting double minimization problem is globally non-convex, but it is convex separately in vv and yy, which allows one to apply simple alternating minimization techniques. These procedures (which are not guaranteed to converge to the global minimum) are discussed in §4.2.

3.4 Residual based surrogates

The computational realization of the above concepts hinges on two main constituents, namely (i) the ability to evaluate bounds εn\varepsilon_{n} for dist(,Vn)\mathop{\rm dist}({\cal M},V_{n}), and (ii) the availability of computationally affordable surrogates 𝒮(v,){\cal S}(v,{\cal M}) for dist(v,)=minuvu\mathop{\rm dist}(v,{\cal M})=\min_{u\in{\cal M}}\|v-u\|. In both cases one exploits the fact that errors in VV are equivalent to residuals in a suitable dual norm. Regarding (i), the derivation of bounds εn\varepsilon_{n} has been discussed extensively in the context of Reduced Basis Methods [23]; see also [15] for the more general framework discussed below. Substantial computational effort in an offline phase provides residual based surrogates for uu(y)\|u-u(y)\|, permitting the frequent parameter queries at the online stage needed, in particular, to construct reduced bases. This strategy becomes challenging though for high parameter dimensionality, and we refer to [9] for remedies based on trading deterministic certificates against probabilistic ones at significantly reduced computational cost. Therefore, we focus here on task (ii).

One typical setting where a computable surrogate 𝒮(v,){\cal S}(v,{\cal M}) can be derived is when u(y)u(y) is the solution to a parametrized operator equation of the general form

A(y)u(y)=f(y).A(y)u(y)=f(y). (3.63)

Here we assume that for every yYy\in Y the right-hand side f(y)f(y) belongs to the dual ZZ^{\prime} of a Hilbert test space ZZ, and that A(y)A(y) is boundedly invertible from VV to ZZ^{\prime}. The operator equation therefore has an equivalent variational formulation

𝒜y(u(y),v)=y(v),vZ,{\cal A}_{y}(u(y),v)={\cal F}_{y}(v),\quad v\in Z, (3.64)

with parametrized bilinear form 𝒜y(w,v)=A(y)w,vZ,Z{\cal A}_{y}(w,v)=\langle A(y)w,v\rangle_{Z^{\prime},Z} and linear form y(v)=f(y),vZ,Z{\cal F}_{y}(v)=\langle f(y),v\rangle_{Z^{\prime},Z}. This setting includes classical elliptic problems with Z=VZ=V, as well as saddle-point and unsymmetric problems such as convection-diffusion problems or space-time formulations of parabolic problems.

We assume continuous dependence of A(y)A(y) and f(y)f(y) with respect to yYy\in Y, which by compactness of YY, implies uniform boundedness and invertibility, that is

A(y)VZRandA(y)1ZVr1,yY,\|A(y)\|_{V\to Z^{\prime}}\leq R\quad{\rm and}\quad\|A(y)^{-1}\|_{Z^{\prime}\to V}\leq r^{-1},\quad y\in Y, (3.65)

for some 0<rR<0<r\leq R<\infty. It follows that for any vVv\in V, one has the equivalence

rvu(y)V(v,y)Rvu(y)V,r\|v-u(y)\|_{V}\leq{\cal R}(v,y)\leq R\|v-u(y)\|_{V}, (3.66)

where

(v,y):=A(y)vf(y)Z,{\cal R}(v,y):=\|A(y)v-f(y)\|_{Z^{\prime}}, (3.67)

is the residual of the PDE for a state vv and parameter yy.

Therefore the quantity

𝒮(v,):=minyY(v,y),{\cal S}(v,{\cal M}):=\min_{y\in Y}{\cal R}(v,y), (3.68)

provides us with a surrogate of dist(v,)\mathop{\rm dist}(v,{\cal M}) that satisfies the required framing (3.30).

A first advantage of this surrogate quantity is that, for each given yYy\in Y, the evaluation of the residual A(y)vf(y)Z\|A(y)v-f(y)\|_{Z^{\prime}} does not require computing the solution u(y)u(y). Its second advantage is that the minimization in yy is facilitated in the relevant case where A(y)A(y) and f(y)f(y) have affine dependence on yy, that is,

A(y)=A0+j=1dyjAjandf(y)=f0+j=1dyjfj.A(y)=A_{0}+\sum_{j=1}^{d}y_{j}A_{j}\quad{\rm and}\quad f(y)=f_{0}+\sum_{j=1}^{d}y_{j}f_{j}. (3.69)

Indeed, 𝒮(v,){\cal S}(v,{\cal M}) then amounts to the minimization over yYy\in Y of the function

(v,y)2:=A0vf0+j=1dyj(Ajvfj)Z2,{\cal R}(v,y)^{2}:=\Big{\|}A_{0}v-f_{0}+\sum_{j=1}^{d}y_{j}(A_{j}v-f_{j})\Big{\|}_{Z^{\prime}}^{2}, (3.70)

which is a convex quadratic polynomial in yy. Hence, a minimizer y(v)Yy(v)\in Y of the corresponding constrained linear least-squares problem exists, rendering the surrogate 𝒮(v,)=(v,y(v)){\cal S}(v,{\cal M})={\cal R}(v,y(v)) well-defined.
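To make the convexity explicit, write g_{j}:=A_{j}v-f_{j} for j=0,\dots,d; expanding the squared dual norm gives

{\cal R}(v,y)^{2}=\|g_{0}\|_{Z^{\prime}}^{2}+2\sum_{j=1}^{d}y_{j}\langle g_{0},g_{j}\rangle_{Z^{\prime}}+\sum_{j,k=1}^{d}y_{j}y_{k}\langle g_{j},g_{k}\rangle_{Z^{\prime}},

whose Hessian 2\big{(}\langle g_{j},g_{k}\rangle_{Z^{\prime}}\big{)}_{j,k=1,\dots,d} is a Gram matrix and therefore positive semi-definite.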

In all the above mentioned examples the norm Z=,Z1/2\|\cdot\|_{Z}=\langle\cdot,\cdot\rangle_{Z}^{1/2} can be efficiently computed. For instance, in the simplest case of an H01(Ω)H^{1}_{0}(\Omega)-elliptic problem one has Z=V=H01(Ω)Z=V=H^{1}_{0}(\Omega) with

v,zZ=Ωvzdx.\langle v,z\rangle_{Z}=\intop\limits_{\Omega}\nabla v\cdot\nabla z\,dx. (3.71)

The obvious obstacle is then, however, the computation of the dual norm Z\|\cdot\|_{Z^{\prime}} which in the particular example above is the H1(Ω)H^{-1}(\Omega)-norm. A viable strategy is to use the Riesz lift rZ:ZZr_{\mbox{\tiny Z}}:Z^{\prime}\to Z, defined by

rZg,zZ=g,zZ,Z=g(z),gZ,zZ,\langle r_{\mbox{\tiny Z}}g,z\rangle_{Z}=\langle g,z\rangle_{Z^{\prime},Z}=g(z),\quad g\in Z^{\prime},\,z\in Z, (3.72)

which implies that rZgZ=gZ\|r_{\mbox{\tiny Z}}g\|_{Z}=\|g\|_{Z^{\prime}}. Thus, (v,y)2{\cal R}(v,y)^{2} is computed for a given (v,y)V×Y(v,y)\in V\times Y by introducing the lifted elements

ejrZ(Ajvfj),j=0,,d,e_{j}\coloneqq r_{\mbox{\tiny Z}}(A_{j}v-f_{j}),\quad j=0,\ldots,d, (3.73)

so that, by linearity

(v,y)2=e0+j=1dyjejZ2.{\cal R}(v,y)^{2}=\Big{\|}e_{0}+\sum_{j=1}^{d}y_{j}e_{j}\Big{\|}^{2}_{Z}. (3.74)

Note that the above derivation is still idealized as the d+1d+1 variational problems (3.73) are posed in the infinite dimensional space ZZ. As already stressed in Remark 1.2, all computations take place in reference finite element spaces VhVV_{h}\subset V and ZhZZ_{h}\subset Z. We thus approximate the eje_{j} by ej,hZhe_{j,h}\in Z_{h}, for vVhv\in V_{h}, using the Galerkin approximation of (3.72). This gives rise to a computable least-squares functional

h(v,y)2=e0,h+j=1dyjej,hZ2,yY.{\cal R}_{h}(v,y)^{2}=\Big{\|}e_{0,h}+\sum_{j=1}^{d}y_{j}e_{j,h}\Big{\|}^{2}_{Z},\quad y\in Y. (3.75)

The practical distance surrogate is then defined through the corresponding constrained least-squares problem

𝒮h(v,h):=minyYh(v,y),{\cal S}_{h}(v,{\cal M}_{h}):=\min_{y\in Y}{\cal R}_{h}(v,y), (3.76)

which can be solved by standard optimization methods. As indicated earlier, the recovery schemes can be interpreted as taking place in a fixed discrete setting, with {\cal M} replaced by h{\cal M}_{h}, comprised of approximate solutions in a large finite element space VhVV_{h}\subset V, and measuring accuracy only in this finite dimensional setting. One should note though that the approach allows one to disentangle discretization errors from recovery estimates, even with regard to the underlying continuous PDE model. In fact, given any target tolerance εh\varepsilon_{h}, using a posteriori error control in ZZ, the spaces Vh,ZhV_{h},Z_{h} can be chosen large enough to guarantee that

|(v,y)h(v,y)|εhv,vVh.\big{|}{\cal R}(v,y)-{\cal R}_{h}(v,y)\big{|}\leq\varepsilon_{h}\|v\|,\quad v\in V_{h}. (3.77)

Accordingly, one has |𝒮h(v,h)𝒮(v,)|εhv\big{|}{\cal S}_{h}(v,{\cal M}_{h})-{\cal S}(v,{\cal M})\big{|}\leq\varepsilon_{h}\|v\|, so that recovery estimates remain meaningful with respect to the continuous setting as long as εh\varepsilon_{h} remains sufficiently dominated by the thresholds εk,σ,εnoise\varepsilon_{k},\sigma,\varepsilon_{\rm noise} appearing in the above results. For notational simplicity we therefore continue working in the continuous setting.
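For illustration, the following Python sketch outlines how the practical surrogate could be evaluated in a given discrete setting; it is a minimal sketch, not the implementation used for the experiments below. All names are assumptions standing for the discretization at hand: G_Z is the Gram matrix of the Z_h inner product (the stiffness matrix in the elliptic example above), A_ops and f_rhs collect the operator matrices and load vectors of the affine decomposition (3.69), v is the coefficient vector of the candidate state, and Y_box the bounds of the parameter box Y.

```python
import numpy as np
from scipy.linalg import cholesky
from scipy.optimize import lsq_linear

def surrogate_distance(v, G_Z, A_ops, f_rhs, Y_box):
    """Schematic evaluation of S_h(v, M_h) = min_{y in Y} R_h(v, y), cf. (3.75)-(3.76)."""
    # Riesz lifts e_j = G_Z^{-1} (A_j v - f_j) of the residual components, cf. (3.73)
    E = np.column_stack([np.linalg.solve(G_Z, A_ops[j] @ v - f_rhs[j])
                         for j in range(len(A_ops))])
    e0, E_aff = E[:, 0], E[:, 1:]
    # ||e_0 + sum_j y_j e_j||_Z = ||L^T (e_0 + E_aff y)||_2  with  G_Z = L L^T
    L = cholesky(G_Z, lower=True)
    res = lsq_linear(L.T @ E_aff, -(L.T @ e0), bounds=Y_box)    # box-constrained LS over Y
    y_min = res.x
    return np.linalg.norm(L.T @ (e0 + E_aff @ y_min)), y_min    # R_h(v, y_min) and minimizer
```

The minimizing parameter returned alongside the surrogate value is reused below for parameter estimation.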

4 Joint parameter and state estimation

4.1 An estimate for yy

Searching for a parameter yYy\in Y, that explains an observation w=PWu(y)w=P_{W}u(y), is a nonlinear inverse problem. As shown next, a quantifiable estimate for yy can be obtained from a state estimate u(w)u^{*}(w) combined with a residual minimization.

For any state estimate u(w)u^{*}(w) which we compute from ww, the most plausible parameter is the one associated to the metric projection of u(w)u^{*}(w) onto {\cal M}, that is,

yargminyYu(y)u(w).y^{*}\in\mathop{\rm argmin}_{y\in Y}\|u(y)-u^{*}(w)\|.

Note that yy^{*} depends on ww but we omit the dependence in the notation in what follows. Finding yy^{*} is a difficult task since it requires solving a non-convex optimization problem.

However, as we have already noticed, a near metric projection of uu^{*} to {\cal M} can be computed through a simple convex problem in the case of affine parameter dependence (3.69), minimizing the residual (v,y){\cal R}(v,y) given by (3.70). Our estimate for the parameter is therefore

yargminyY(u,y),y^{*}\in\mathop{\rm argmin}_{y\in Y}{\cal R}(u^{*},y), (4.1)

and it satisfies, in view of (3.66),

uu(y)r1(u,y)κdist(u,),κ=R/r.\|u^{*}-u(y^{*})\|\leq r^{-1}{\cal R}(u^{*},y^{*})\leq\kappa\mathop{\rm dist}(u^{*},{\cal M}),\quad\kappa=R/r. (4.2)

Hence, if we use, for instance, the state estimate u(w)u^{*}(w) from (3.28), we conclude by Theorem 3.2 that u(y)u(y^{*}) deviates from the true state u(y)u(y) by

u(y)u(y)\displaystyle\|u(y)-u(y^{*})\| u(y)u(w)+u(w)u(y)\displaystyle\leq\|u(y)-u^{*}(w)\|+\|u^{*}(w)-u(y^{*})\|
(1+κ)u(y)u(w)\displaystyle\leq(1+\kappa)\|u(y)-u^{*}(w)\|
(1+κ)δκσ,\displaystyle\leq(1+\kappa)\delta_{\kappa\sigma}, (4.3)

where δκσ\delta_{\kappa\sigma} is the benchmark quantity defined in (2.11). If, in addition, the map PW:WP_{W}:{\cal M}\to W is injective, so that δ0=0\delta_{0}=0, and if WW and {\cal M} are favorably oriented, as detailed in the assumptions of Theorem 3.4, one even obtains

u(y)u(y)(2μ(,W)+1)κσ.\|u(y)-u(y^{*})\|\leq(2\mu({\cal M},W)+1)\kappa\sigma. (4.4)

To derive from such bounds estimates for the deviation of yy^{*} from yy, more information on the underlying PDE model is needed. For instance, for the parametric family of second order elliptic PDEs (1.36) with strictly positive right-hand side ff, it is shown in [4] that the parameter-to-solution map is injective. If in addition the parameter dependent diffusion coefficient a(y)a(y) belongs to H1(Ω)H^{1}(\Omega), one has a quantitative inverse stability estimate of the form

a(y)a(y)L2(Ω)Cu(y)u(y)1/6.\|a(y)-a(y^{\prime})\|_{L^{2}(\Omega)}\leq C\|u(y)-u(y^{\prime})\|^{1/6}. (4.5)

Combining this, for instance, with (4.3), yields

a(y)a(y)L2(Ω)C(1+κ)1/6δκσ1/6.\|a(y)-a(y^{*})\|_{L^{2}(\Omega)}\leq C(1+\kappa)^{1/6}\delta_{\kappa\sigma}^{1/6}. (4.6)

Under the favorable assumptions of Theorem 3.4, one obtains a bound of the form

a(y)a(y)L2(Ω)σ1/6.\|a(y)-a(y^{*})\|_{L^{2}(\Omega)}\lesssim\sigma^{1/6}. (4.7)

Finally, in relevant situations (Karhunen-Loève expansions) the functions ψj\psi_{j} in the expansion of a(y)a(y) form an L2L^{2}-orthogonal system. The above estimates then translate into estimates for a weighted 2\ell_{2}-norm,

(j1cj(yjyj)2)1/2σ1/6,\Bigl{(}\sum_{j\geq 1}c_{j}(y_{j}-y_{j}^{*})^{2}\Bigr{)}^{1/2}\lesssim\sigma^{1/6}, (4.8)

where cj=ψjL22c_{j}=\|\psi_{j}\|_{L^{2}}^{2}.

4.2 Alternating residual minimization

The state estimate u(w)u^{*}(w) is defined by selecting among the potential estimates uk(w)u^{*}_{k}(w) the one that sits closest to the solution manifold, in the sense of the surrogate distance 𝒮(v,){\cal S}(v,{\cal M}). Finding the element in Vw=w+WV_{w}=w+W^{\perp} that is closest to {\cal M} would provide a possibly improved state estimate and, as pointed out in the previous section, also an improved parameter estimate. As explained earlier, it would in addition yield improved set estimators for w{\cal M}_{w}.

Adhering to the definition of the residual (v,y){\cal R}(v,y) from (3.67), we are thus led to consider the double minimization problem

min(v,y)(w+W)×Y(v,y)=minvw+W𝒮(v,).\min_{(v,y)\in(w+W^{\perp})\times Y}{\cal R}(v,y)=\min_{v\in w+W^{\perp}}{\cal S}(v,{\cal M}). (4.9)

We first show that a global minimizing pair (u,y)(u^{*},y^{*}) meets the optimal benchmarks introduced in §2. In the unbiased and non-noisy case, the value of the global minimum is 0, attained by the exact parameter yy and state u(y)u(y). Any global minimizing pair (u,y)(u^{*},y^{*}) will thus satisfy PWu=wP_{W}u^{*}=w and u=u(y)u^{*}=u(y^{*})\in{\cal M}. In other words, the state estimate uu^{*} belongs to w{\cal M}_{w}, and therefore meets the optimal benchmark

uuδ0.\|u-u^{*}\|\leq\delta_{0}. (4.10)

In the presence of model bias and noise, the state uu satisfies

dist(u,)εmodelandwPWuεnoise.{\rm dist}(u,{\cal M})\leq\varepsilon_{model}\quad{\rm and}\quad\|w-P_{W}u\|\leq\varepsilon_{noise}. (4.11)

It follows that there exists a parameter yy such that uu(y)εmodel\|u-u(y)\|\leq\varepsilon_{model} and a state u~w+W\tilde{u}\in w+W^{\perp} such that uu~εnoise\|u-\tilde{u}\|\leq\varepsilon_{noise}. For this state and parameter, one thus has

(u~,y)Ru(y)u~R(εmodel+εnoise).{\cal R}(\tilde{u},y)\leq R\|u(y)-\tilde{u}\|\leq R(\varepsilon_{model}+\varepsilon_{noise}). (4.12)

Any global minimizing pair (u,y)(u^{*},y^{*}) will thus satisfy

uu(y)1r(u,y)κ(εmodel+εnoise),κ:=Rr.\|u^{*}-u(y^{*})\|\leq\frac{1}{r}{\cal R}(u^{*},y^{*})\leq\kappa(\varepsilon_{model}+\varepsilon_{noise}),\quad\kappa:=\frac{R}{r}. (4.13)

Therefore uu^{*} belongs to the set ε,w{\cal M}_{\varepsilon,w}, as defined by (2.10), with ε:=κ(εmodel+εnoise)\varepsilon:=\kappa(\varepsilon_{model}+\varepsilon_{noise}) and so does u~\tilde{u} since u~u(y)εmodel+εnoiseε\|\tilde{u}-u(y)\|\leq\varepsilon_{model}+\varepsilon_{noise}\leq\varepsilon. In turn, the state estimate uu^{*} meets the perturbed benchmark

uuεnoise+uu~εnoise+δε2δε.\|u^{*}-u\|\leq\varepsilon_{noise}+\|u^{*}-\tilde{u}\|\leq\varepsilon_{noise}+\delta_{\varepsilon}\leq 2\delta_{\varepsilon}. (4.14)

From a numerical perspective, the search for a global minimizing pair is a difficult task due to the fact that (v,y)(v,y)(v,y)\mapsto{\cal R}(v,y) is generally not a convex function. However, it should be noted that in the case of affine parameter dependence (3.69), the residual (v,y){\cal R}(v,y) given by (3.70) is a convex function in each of the two variables v,yv,y separately, keeping the other one fixed. More precisely, (v,y)(v,y)2(v,y)\mapsto{\cal R}(v,y)^{2} is a convex quadratic function in each variable. This suggests the following alternating minimization procedure. Starting with an initial guess u0w+Wu^{0}\in w+W^{\perp}, we iteratively compute, for j=0,1,2,j=0,1,2,\dots,

yj+1\displaystyle y^{j+1} argminyY(uj,y)\displaystyle\in\mathop{\rm argmin}_{y\in Y}{\cal R}(u^{j},y) (4.15)
uj+1\displaystyle u^{j+1} argminvVw(v,yj+1).\displaystyle\in\mathop{\rm argmin}_{v\in V_{w}}{\cal R}(v,y^{j+1}). (4.16)

Each problem has a simply computable minimizer, as discussed in the next section, and the residuals are non-increasing

(uj,yj)(uj,yj+1)(uj+1,yj+1){\cal R}(u^{j},y^{j})\geq{\cal R}(u^{j},y^{j+1})\geq{\cal R}(u^{j+1},y^{j+1})\geq\cdots (4.17)

Of course, one cannot guarantee in general that (uj,yj)(u^{j},y^{j}) converges to a global minimizer, and the procedure may stagnate at a local minimum.
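In schematic form, and assuming the two minimization steps are available as routines (they are detailed in §4.3), the procedure can be sketched as follows; the routine names are placeholders, not part of the method itself.

```python
def alternating_minimization(u0, w, argmin_y, argmin_v, n_iter=20):
    """Sketch of the iterations (4.15)-(4.16); argmin_y and argmin_v are placeholder solvers.

    argmin_y(u)    : parameter minimizing R(u, .) over Y (constrained least squares)
    argmin_v(y, w) : state in w + W^perp minimizing R(., y) (the m+3 solves of Section 4.3)
    """
    u, y = u0, None
    for _ in range(n_iter):          # residuals R(u^j, y^j) are non-increasing, cf. (4.17)
        y = argmin_y(u)              # step (4.15): update the parameter for fixed state
        u = argmin_v(y, w)           # step (4.16): update the state for fixed parameter
    return u, y
```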

The above improvement property still tells us that if we initialize the algorithm by taking u0=u=u(w)u^{0}=u^{*}=u^{*}(w), the state estimate from (3.28), and y0argminyY(u,y)y^{0}\in{\rm argmin}_{y\in Y}{\cal R}(u^{*},y), then we are ensured at every step jj that

(uj,yj)(u,y),{\cal R}(u^{j},y^{j})\leq{\cal R}(u^{*},y^{*}), (4.18)

and therefore, by the same arguments as in the proof of Theorem 3.5, one finds that

uujδκρ+εnoise,\|u-u^{j}\|\leq\delta_{\kappa\rho}+\varepsilon_{noise}, (4.19)

with κ\kappa and ρ\rho as in (3.43). In other words, the new estimate uju^{j} satisfies at least the same accuracy bound as uu^{*}. The numerical tests performed in §5.3 reveal that it can be significantly more accurate.

4.3 Computational issues

We now explain how to efficiently compute the steps in (4.15) and (4.16). We continue to consider a family of linear parametric PDEs with affine parameter dependence (3.69), admitting a uniformly stable variational formulation over the pair of trial and test spaces V,ZV,Z, see (3.64)-(3.65).

Minimization of (4.15):

Problem (4.15) requires minimizing (v,y){\cal R}(v,y) for a fixed vVwv\in V_{w} over yYy\in Y. According to (3.74), it amounts to solving the constrained linear least-squares problem

minyYe0+j=1dyjejZ2,\min_{y\in Y}\Big{\|}e_{0}+\sum_{j=1}^{d}y_{j}e_{j}\Big{\|}^{2}_{Z}, (4.20)

where the ejZe_{j}\in Z are the Riesz-lifts rZ(Ajvfj)r_{\mbox{\tiny Z}}(A_{j}v-f_{j}), j=0,,dj=0,\ldots,d, defined in (3.73). As indicated earlier, the numerical solution of (4.20) (for ej=ej,hZhZe_{j}=e_{j,h}\in Z_{h}\subset Z) is standard.

Minimization of (4.16):

Problem (4.16) is of the form

minvw+W(v,y)2=minvw+WA(y)vf(y)Z2\min_{v\in w+W^{\perp}}{\cal R}(v,y)^{2}=\min_{v\in w+W^{\perp}}\|A(y)v-f(y)\|^{2}_{Z^{\prime}} (4.21)

for a fixed yYy\in Y. A naive approach for solving (4.21) would consist in working in a closed subspace W~W\widetilde{W}^{\perp}\subseteq W^{\perp} of sufficiently large dimension. We would then optimize over vw+W~v\in w+\widetilde{W}^{\perp}. However, this would lead to a large quadratic problem of size dimW~\dim\widetilde{W}^{\perp} which would involve dimW~\dim\widetilde{W}^{\perp} Riesz representer computations. We next propose an alternative strategy involving the solution of only m+3m+3 variational problems. To that end, we assume in what follows that VV is continuously embedded in ZZ^{\prime}, which is the case for all the examples of interest mentioned earlier in the paper.

The proposed strategy is based on two isomorphisms from VV to ZZ that preserve inner products in a sense to be explained next. We make again heavy use of the Riesz-isometry defined in (3.72) and consider the two isomorphisms

T=T(y):=rZA(y):VZ,S=S(y):=A(y)rV1:VZ,T=T(y):=r_{\mbox{\tiny Z}}A(y):V\to Z,\quad S=S(y):=A(y)^{-*}r_{\mbox{\tiny V}}^{-1}:V\to Z, (4.22)

where rZ:ZZr_{Z}:Z^{\prime}\to Z and rV:VVr_{V}:V^{\prime}\to V are the previously introduced Riesz lifts. One then observes that, by standard duality arguments, they preserve inner products in the sense that for u,vVu,v\in V

Tu,SvZ=rZA(y)u,A(y)rV1vZ=u,vV,\langle Tu,Sv\rangle_{Z}=\langle r_{\mbox{\tiny Z}}A(y)u,A(y)^{-*}r_{\mbox{\tiny V}}^{-1}v\rangle_{Z}=\langle u,v\rangle_{V}, (4.23)

where we have used selfadjointness of Riesz isometries. In these terms the objective functional (v,y)2{\cal R}(v,y)^{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}2}} takes the form

A(y)vf(y)Z2=TvrZf(y)Z2.\|A(y)v-f(y)\|^{2}_{Z^{\prime}}=\|Tv-r_{\mbox{\tiny Z}}f(y)\|^{2}_{Z}. (4.24)

We can use (4.23) to reformulate (4.21) as

minvw+W(v,y)2=minvw+WTvrZf(y)Z2=minzTw+S(W)zrZf(y)Z2,\min_{v\in w+W^{\perp}}{\cal R}(v,y)^{2}=\min_{v\in w+W^{\perp}}\|Tv-r_{\mbox{\tiny Z}}f(y)\|^{2}_{Z}=\min_{z\in Tw+S(W)^{\perp}}\|z-r_{\mbox{\tiny Z}}f(y)\|^{2}_{Z}, (4.25)

where we have used that T(W)=S(W)T(W^{\perp})=S(W)^{\perp} to obtain the last equality. Note that the unique solution zZz^{*}\in Z of the right-hand side problem gives a solution vVv^{*}\in V to the original problem through the relationship Tv=zTv^{*}=z^{*}. The minimizer zz^{*} can be obtained by means of an appropriate orthogonal projection onto S(W)S(W). This amounts to solving only m+3m+3 variational problems, without the loss of accuracy that would result from choosing a too moderate dimension for a subspace W~\widetilde{W}^{\perp} of WW^{\perp}.

More precisely, we have z=Tw+z~z^{*}=Tw+\tilde{z} where z~S(W)\tilde{z}\in S(W)^{\perp} minimizes z~+TwrZf(y)Z2\|\tilde{z}+Tw-r_{\mbox{\tiny Z}}f(y)\|^{2}_{Z}, and therefore

z~=PS(W)(rZf(y)Tw)=rZf(y)TwPS(W)(rZf(y)Tw).\tilde{z}=P_{S(W)^{\perp}}(r_{\mbox{\tiny Z}}f(y)-Tw)=r_{\mbox{\tiny Z}}f(y)-Tw-P_{S(W)}(r_{\mbox{\tiny Z}}f(y)-Tw). (4.26)

This shows that

z=z(y):=rZf(y)PS(W)(rZf(y)Tw).z^{*}=z^{*}(y):=r_{\mbox{\tiny Z}}f(y)-P_{S(W)}(r_{\mbox{\tiny Z}}f(y)-Tw). (4.27)

Thus, a single minimization step of the form (4.21) requires assembling zz^{*}, followed by solving the variational problem

Tv,zZ=(A(y)v)(z)=z,zZ,zZ,\langle Tv^{*},z\rangle_{Z}=(A(y)v^{*})(z)=\langle z^{*},z\rangle_{Z},\quad z\in Z, (4.28)

that gives vv^{*}. Assembling zz^{*} involves

  1. (i)

    evaluating TwTw, which means solving the Riesz-lift Tw,zZ=(A(y)w)(z)\langle Tw,z\rangle_{Z}=(A(y)w)(z), zZz\in Z;

  2. (ii)

    computing the Riesz-lift rZf(y)r_{\mbox{\tiny Z}}f(y) by solving rZf(y),zZ=(f(y))(z)\langle r_{\mbox{\tiny Z}}f(y),z\rangle_{Z}=(f(y))(z), zZz\in Z;

  3. (iii)

    computing the projection PS(W)(rZf(y)Tw)P_{S(W)}(r_{\mbox{\tiny Z}}f(y)-Tw). This requires computing the transformed basis functions Swi=A(y)rV1wiSw_{i}=A(y)^{-*}r_{\mbox{\tiny V}}^{-1}w_{i}, which are solutions to the variational problems

    (A(y)Swi)(v)=wi,vV,vV,i=1,,m.(A(y)^{*}Sw_{i})(v)=\langle w_{i},v\rangle_{V},\,\,v\in V,\quad i=1,\ldots,m. (4.29)

Of course, these variational problems are solved only approximately in appropriately large but finite dimensional spaces VhV,ZhZV_{h}\subset V,Z_{h}\subset Z, along the lines of the remarks at the end of the previous section. While approximate Riesz-lifts involve symmetric variational formulations which are well-treated by Galerkin schemes, the problems involving the operator A(y)A(y) or A(y)A(y)^{*} may in general require an unsymmetric variational formulation where ZVZ\neq V and Petrov-Galerkin schemes on the discrete level. For each of the examples (such as a space-time variational formulation of parabolic or convection-diffusion equations) stable discretizations are known, see e.g. [6, 8, 14, 15, 26]. A schematic sketch of the resulting discrete state update is given below.
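The sketch spells out steps (i)-(iii) and the final solve in a discrete Galerkin setting with equal dimensions of V_h and Z_h; it is only an illustration under these assumptions. All names are placeholders: B is the matrix of the bilinear form {\cal A}_{y}, f the load vector of f(y), G_V and G_Z the Gram matrices of the V_h and Z_h inner products, w the coefficient vector of the data, and W_basis the V_h-coefficients of a basis of W.

```python
import numpy as np

def state_update(B, f, G_V, G_Z, w, W_basis):
    """Schematic discrete version of the state update (4.16) via m+3 linear solves.

    B[i, j] = a_y(phi_j, psi_i) and f[i] = f(y)(psi_i), where phi and psi denote
    the V_h and Z_h bases; W_basis has the coefficient vectors of w_1, ..., w_m as columns.
    """
    Tw = np.linalg.solve(G_Z, B @ w)            # (i)   Riesz lift of A(y)w
    rf = np.linalg.solve(G_Z, f)                # (ii)  Riesz lift of f(y)
    SW = np.linalg.solve(B.T, G_V @ W_basis)    # (iii) transformed basis S w_i, cf. (4.29)
    # Z-orthogonal projection of rf - Tw onto span{S w_1, ..., S w_m}
    r = rf - Tw
    coeffs = np.linalg.solve(SW.T @ G_Z @ SW, SW.T @ G_Z @ r)
    z_star = rf - SW @ coeffs                   # cf. (4.27)
    # recover v* from <T v*, z>_Z = <z*, z>_Z for all z in Z_h, i.e. B v* = G_Z z*
    return np.linalg.solve(B, G_Z @ z_star)
```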

A particularly important strategy for unsymmetric problems is to first write the PDE as an equivalent system of first order PDEs permitting a so-called “ultraweak” formulation, where the (infinite-dimensional) trial space VV is actually an L2L_{2}-space and the required continuous embedding VZV\subset Z^{\prime} holds. The mapping rVr_{\mbox{\tiny V}} is then just the identity, and so-called Discontinuous Petrov Galerkin methods offer a way of systematically finding appropriate test spaces in the discrete case with uniform inf-sup stability [7]. In this context, the mapping TT from (4.22) plays a pivotal role in the identification of “optimal test spaces” and is referred to as the “trial-to-test map”.

Of course, in the case of problems that admit a symmetric variational formulation, i.e., V=ZV=Z, things simplify even further. To exemplify this, consider the parametric family of elliptic PDEs (1.36). In this case one has (assuming homogeneous boundary conditions) V=Z=H01(Ω)V=Z=H^{1}_{0}(\Omega) so that rZ=rV=(Δ)1r_{\mbox{\tiny Z}}=r_{\mbox{\tiny V}}=(-\Delta)^{-1}. Due to the selfadjointness of the underlying elliptic operators A(y)A(y) in this case, the problems (4.29) are of the same form as (4.28) and can be treated on the discrete level by standard Galerkin discretizations.

5 Numerical illustration

In this section we illustrate the construction of nonlinear reduced models, and demonstrate the mechanism of model selection using the residual surrogate methods outlined in §3.4.

In our tests we consider the elliptic problem mentioned in §1.3 on the unit square D=]0,1[2D=]0,1[^{2} with homogeneous Dirichlet boundary conditions, and a parameter dependence in the diffusivity field aa. Specifically, we consider the problem

div(a(y)u)=f,-{\rm div}(a(y)\nabla u)=f, (5.1)

with f=1f=1 on DD and u|D=0u_{|\partial D}=0. The classical variational formulation uses the same trial and test space V=Z=H01(D)V=Z=H^{1}_{0}(D). We perform space discretization by the Galerkin method using 1\mathbb{P}_{1} finite elements to produce solutions uh(y)u_{h}(y), with a triangulation on a regular grid of mesh size h=27h=2^{-7}.

5.1 Test 1: pre-determined splittings

In this first test, we examine the reconstruction performance with localized reduced bases on a manifold having a predetermined splitting. Specifically, we consider two partitions of the unit square, {D1,}=14\{D_{1,\ell}\}_{\ell=1}^{4} and {D2,}=14\{D_{2,\ell}\}_{\ell=1}^{4}, with

D1,1[0,34[×[0,34[\displaystyle D_{1,1}\coloneqq\Bigl{[}0,\frac{3}{4}\Bigl{[}\times\Bigl{[}0,\frac{3}{4}\Bigl{[} D1,2[0,34[×[34,1]\displaystyle D_{1,2}\coloneqq\Bigl{[}0,\frac{3}{4}\Bigl{[}\times\Bigl{[}\frac{3}{4},1\Bigr{]} D1,3[34,1]×[0,34[\displaystyle D_{1,3}\coloneqq\Bigl{[}\frac{3}{4},1\Bigr{]}\times\Bigl{[}0,\frac{3}{4}\Bigl{[} D1,4[34,1]×[34,1]\displaystyle D_{1,4}\coloneqq\Bigl{[}\frac{3}{4},1\Bigr{]}\times\Bigl{[}\frac{3}{4},1\Bigr{]}
D2,1[14,1]×[14,1]\displaystyle D_{2,1}\coloneqq\Bigl{[}\frac{1}{4},1\Bigr{]}\times\Bigl{[}\frac{1}{4},1\Bigr{]} D2,2[14,1]×[0,14[\displaystyle D_{2,2}\coloneqq\Bigl{[}\frac{1}{4},1\Bigr{]}\times\Bigl{[}0,\frac{1}{4}\Bigl{[} D2,3[0,14[×[14,1]\displaystyle D_{2,3}\coloneqq\Bigl{[}0,\frac{1}{4}\Bigl{[}\times\Bigl{[}\frac{1}{4},1\Bigr{]} D2,4[0,14[×[0,14[\displaystyle D_{2,4}\coloneqq\Bigl{[}0,\frac{1}{4}\Bigl{[}\times\Bigl{[}0,\frac{1}{4}\Bigl{[}

The partitions are symmetric along the axis x+y=1x+y=1 as illustrated in Figure 5.2. This will play a role in the understanding of the results below. We next define two parametric diffusivity fields

a1(y)a¯+c=14χD1,yanda2(y)a¯+c=14χD2,y,a_{1}(y)\coloneqq\overline{a}+c\sum_{\ell=1}^{4}\raise 1.29167pt\hbox{\large$\chi$}_{D_{1,\ell}}y_{\ell}\quad\text{and}\quad a_{2}(y)\coloneqq\overline{a}+c\sum_{\ell=1}^{4}\raise 1.29167pt\hbox{\large$\chi$}_{D_{2,\ell}}y_{\ell}\,, (5.2)

where the vector of parameters y=(y1,,y4)y=(y_{1},\dots,y_{4}) ranges in Y=[1,1]4Y=[-1,1]^{4} and χD1,\raise 1.29167pt\hbox{\large$\chi$}_{D_{1,\ell}} is the indicator function of D1,D_{1,\ell} (similarly for χD2,\raise 1.29167pt\hbox{\large$\chi$}_{D_{2,\ell}}). The fields a1(y)a_{1}(y) and a2(y)a_{2}(y) are mirror images of each other along x+y=1x+y=1. In the numerical tests that follow, we take a¯=1\overline{a}=1 and c=0.9c=0.9.

Refer to caption
Figure 5.2: Left, the partition of the unit square used in a1a_{1}, and right, the partition used in a2a_{2}.

We denote by u1(y)u_{1}(y) the solution to the elliptic problem (5.1) with diffusivity field a1(y)a_{1}(y), and then label by 1:={u1(y):yY}{\cal M}_{1}:=\{u_{1}(y)\;:\;y\in Y\} the resulting solution manifold. Strictly speaking, we should write h,1{\cal M}_{h,1} as our solutions are finite dimensional approximations, however we suppress the hh as there should be little ambiguity going forward. Similarly, 2{\cal M}_{2} is the set of all solutions u2(y)u_{2}(y) of (5.1) over YY where the diffusivity field is given by a2a_{2}. We take their union =12{\cal M}={\cal M}_{1}\cup{\cal M}_{2} to be our global solution manifold that has the obvious pre-determined splitting available to us.

For our computations, we generate training and test sets. For the training, we draw Ntr=5000N_{\mathrm{tr}}=5000 independent samples Y~tr=(yjtr)j=1Ntr\widetilde{Y}_{\mathrm{tr}}=(y^{\mathrm{tr}}_{j})_{j=1}^{N_{\mathrm{tr}}} that are uniformly distributed over YY. The collections of solutions ~1{u1(yjtr)}j=1Ntr\widetilde{\cal M}_{1}\coloneqq\{u_{1}(y^{\mathrm{tr}}_{j})\}_{j=1}^{N_{\mathrm{tr}}} and ~2{u2(yjtr)}j=1Ntr\widetilde{\cal M}_{2}\coloneqq\{u_{2}(y^{\mathrm{tr}}_{j})\}_{j=1}^{N_{\mathrm{tr}}} are used as training sets for 1{\cal M}_{1} and 2{\cal M}_{2}. The training set for the full manifold {\cal M} is ~=~1~2\widetilde{\cal M}=\widetilde{\cal M}_{1}\cup\widetilde{\cal M}_{2}. Since we use the same parameter points yjtry^{\mathrm{tr}}_{j} for both sets, any solution in ~1\widetilde{\cal M}_{1} has a corresponding solution in ~2\widetilde{\cal M}_{2} that is its symmetric image along the axis x+y=1x+y=1. To test our reconstruction methods, we generate Nte=2000N_{\mathrm{te}}=2000 independent points in YY that are distinct from the training set. The corresponding test solution sets are 𝒯~1\widetilde{\cal T}_{1} and 𝒯~2\widetilde{\cal T}_{2}. All computations are done by solving (5.1) in the finite element space.

Given an unknown uu\in{\cal M}, we want to recover it from its observation w=PWuw=P_{W}u. For the measurement space WW, we take a collection of m=8m=8 measurement functionals i(u)=ωi,u=|Bi|1uχBi\ell_{i}(u)=\langle\omega_{i},u\rangle=|B_{i}|^{-1}\intop\limits u\,\raise 1.29167pt\hbox{\large$\chi$}_{B_{i}} that are local averages over small areas BiB_{i}, which are boxes of width 2h=262h=2^{-6}, each placed randomly in the unit square. The measurement space is then W=span{ω1,,ωm}W=\mathop{\rm span}\{\omega_{1},\ldots,\omega_{m}\}.
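For illustration only, the following small Python sketch mimics this measurement setup on the nodal grid; it approximates the box averages by nodal averages and is not the finite element code actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
h, width, m = 2.0**-7, 2.0**-6, 8                 # mesh size, box width 2h, number of sensors
x = np.arange(0.0, 1.0 + h, h)                    # grid nodes in each direction
corners = rng.uniform(0.0, 1.0 - width, (m, 2))   # random lower-left corners of the boxes B_i

def measurements(u_nodal):
    """Approximate l_i(u) = |B_i|^{-1} int_{B_i} u by averaging nodal values inside B_i.

    u_nodal is the array of nodal values of u on the len(x) x len(x) grid.
    """
    w = np.empty(m)
    for i, (cx, cy) in enumerate(corners):
        in_x = (x >= cx) & (x <= cx + width)
        in_y = (x >= cy) & (x <= cy + width)
        w[i] = u_nodal[np.ix_(in_x, in_y)].mean()
    return w
```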

Since we are only given ww, we do not know if the function to reconstruct is in 1{\cal M}_{1} or 2{\cal M}_{2} and we consider two possibilities for reconstruction:

  • Affine method: We use affine reduced models Vn,0=u¯0+V¯n,0V_{n,0}=\bar{u}_{0}+\bar{V}_{n,0} generated for the full manifold =12{\cal M}={\cal M}_{1}\cup{\cal M}_{2}. In our example, we take u¯0=u(y=0)\bar{u}_{0}=u(y=0) and V¯n,0\bar{V}_{n,0} is computed by the greedy selection algorithm over ~u¯0\widetilde{\cal M}-\bar{u}_{0}. Of course, the spaces Vn,0V_{n,0} with nn sufficiently large have high potential for approximation of the full manifold {\cal M}, and obviously also for the subsets 1{\cal M}_{1} and 2{\cal M}_{2} (see Figure 5.4). Yet, we can expect some bad artefacts in the reconstruction with this space since the true solution will be approximated by snapshots, some of which come from the wrong part of the manifold and are thus associated to the wrong partition of DD. In addition, we can only work with nm=8n\leq m=8 and this may not be sufficient regarding the approximation power. Our estimator u0(w)u^{*}_{0}(w) uses the space Vn0,0V_{n^{*}_{0},0}, where the dimension n0n^{*}_{0} is the one that reaches τ0=min1nmμn,0εn,0\tau_{0}=\min_{1\leq n\leq m}\mu_{n,0}\varepsilon_{n,0} as defined in (3.20) and (3.21). Figure 5.4 shows the product μn,0εn,0\mu_{n,0}\varepsilon_{n,0} for n=1,,mn=1,\dots,m and we see that n0=3n^{*}_{0}=3.

  • Nonlinear method: We generate affine reduced basis spaces Vn,1=u¯1+V¯n,1V_{n,1}=\bar{u}_{1}+\bar{V}_{n,1} and Vn,2=u¯2+V¯n,2V_{n,2}=\bar{u}_{2}+\bar{V}_{n,2}, each one specific to 1{\cal M}_{1} and 2{\cal M}_{2}. Similarly as for the affine method, we take as offsets u¯i=ui(y=0)=u¯0\bar{u}_{i}=u_{i}(y=0)=\bar{u}_{0}, for i=1,2i=1,2, and we run two separate greedy algorithms over ~1u¯1\widetilde{\cal M}_{1}-\bar{u}_{1} and ~2u¯2\widetilde{\cal M}_{2}-\bar{u}_{2} to build V¯n,1\bar{V}_{n,1} and V¯n,2\bar{V}_{n,2}. We select the dimensions nkn^{*}_{k} that reach τk=minn=1,,mμn,kεn,k\tau_{k}=\min_{n=1,\dots,m}\mu_{n,k}\varepsilon_{n,k} for k=1,2k=1,2. From Figure 5.4, we deduce that n1=4n^{*}_{1}=4 and n2=3n^{*}_{2}=3. (Due to the spatial symmetry along the axis x+y=1x+y=1 for the functions in ~1\widetilde{\cal M}_{1} and ~2\widetilde{\cal M}_{2}, the greedy algorithm selects exactly the same candidates to build Vn,2V_{n,2} as for Vn,1V_{n,1}, except that each element is mirrored in the axis. One may thus wonder why n1n2n^{*}_{1}\neq n^{*}_{2}. The reason is that the measurement space WW breaks this symmetry, so that the reconstruction scheme is no longer spatially symmetric, contrary to the Vn,kV_{n,k}.) This yields two estimators u1(w)u^{*}_{1}(w) and u2(w)u^{*}_{2}(w). We can expect better results than with the affine approach if we can detect in which part of the manifold the target function is located. The main question is thus whether our model selection strategy outlined in Section 3.3 is able to detect from the observed data ww whether the true uu lies in 1{\cal M}_{1} or 2{\cal M}_{2}. For this, we compute the surrogate manifold distances

    𝒮(uk(w),k)minyYk(uk(w),y),k=1,2,\displaystyle{\cal S}(u^{*}_{k}(w),{\cal M}_{k})\coloneqq\min_{y\in Y}{\cal R}_{k}(u^{*}_{k}(w),y),\quad k=1,2, (5.3)

    where

    k(uk(w),y)div(ak(y)uk(w))+fV{\cal R}_{k}(u^{*}_{k}(w),y)\coloneqq\|\mathrm{div}(a_{k}(y)\nabla u^{*}_{k}(w))+f\|_{V^{\prime}}

    is the residual of uk(w)u^{*}_{k}(w) related to the PDE with diffusion field ak(y)a_{k}(y). To solve problem (5.3), we follow the steps given in Section 3.4. The final estimator is u=uku^{*}=u^{*}_{k^{*}}, where

    k=argmink=1,2𝒮(uk(w),k).k^{*}=\mathop{\rm argmin}_{k=1,2}{\cal S}(u^{*}_{k}(w),{\cal M}_{k}).

Table 1 quantifies the quality of the model selection approach. It displays how many times our model selection strategy yields the correct result k=1k^{*}=1 or the incorrect result k=2k^{*}=2 for the functions from the test set 𝒯~11\widetilde{\cal T}_{1}\subset{\cal M}_{1} (and vice-versa for 𝒯~2\widetilde{\cal T}_{2}). Recalling that these test sets have Nte=2000N_{te}=2000 snapshots, we conclude that the residual gives us the correct manifold portion roughly 80% of the time. We can compare this performance with the one given by the oracle estimator (see Table 1)

koracle=argmink=1,2uuk(w).k^{*}_{\text{oracle}}=\mathop{\rm argmin}_{k=1,2}\|u-u^{*}_{k}(w)\|.

In this case, we see that the oracle selection is very efficient since it gives us the correct manifold portion roughly 99% of the time. Figure 5.3 completes the information given in Table 1 by showing the distribution of the values of the residuals and oracle errors. The distributions give visual confirmation that both the model and oracle selection tend to pick the correct model by giving residual/error values which are lower in the right manifold portion. Last but not least, Figure 5.4 gives information on the value of inf-sup constants and residual errors leading to the choice of the dimension nn^{*} for the reduced models. Table 2 summarizes the reconstruction errors.

Test function from:
Surrogate selection 𝒯~11\widetilde{\cal T}_{1}\subset{\cal M}_{1} 𝒯~22\widetilde{\cal T}_{2}\subset{\cal M}_{2}
k=1k^{*}=1 1625 386
k=2k^{*}=2 375 1614
Success rate 81.2 % 80.7 %
Test function from:
Oracle selection 𝒯~11\widetilde{\cal T}_{1}\subset{\cal M}_{1} 𝒯~22\widetilde{\cal T}_{2}\subset{\cal M}_{2}
k=1k^{*}=1 1962 9
k=2k^{*}=2 38 1991
Success rate 98.1 % 99.5 %
Table 1: Performance of model selection and oracle selection.
Refer to caption
Figure 5.3: Kernel density estimate (KDE) plots of the residual and oracle error values for the estimators u1u_{1}^{*} and u2u_{2}^{*}.
Refer to caption
Figure 5.4: Inf-sup constants μn,k\mu_{n,k} and residual errors εn,k\varepsilon_{n,k}, leading to the choice of the dimension nkn^{*}_{k} for VkV_{k}.
Test function from:
Average error 𝒯~11\widetilde{\cal T}_{1}\subset{\cal M}_{1} 𝒯~22\widetilde{\cal T}_{2}\subset{\cal M}_{2}
Affine method u0u^{*}_{0} 6.047e-02 6.661e-02
Nonlinear with oracle model selection 5.057e-02 4.855e-02
Nonlinear with surrogate model selection 5.522e-02 5.201e-02
Test function from:
Worst case error 𝒯~11\widetilde{\cal T}_{1}\subset{\cal M}_{1} 𝒯~22\widetilde{\cal T}_{2}\subset{\cal M}_{2}
Affine method u0u^{*}_{0} 4.203e-01 4.319e-01
Nonlinear with oracle model selection 2.786e-01 2.641e-01
Nonlinear with surrogate model selection 4.798e-01 2.660e-01
Table 2: Reconstruction errors with the different methods.

5.2 Test 2: constructing σ\sigma-admissible families

In this example we examine the behavior of the splitting scheme to construct σ\sigma-admissible families outlined in §3.2.

The manifold {\cal M} is given by the solutions to equation (5.1) associated to the diffusivity field

a(y)=a¯+=1dcχDy,yY,a(y)=\overline{a}+\sum_{\ell=1}^{d}c_{\ell}\raise 1.29167pt\hbox{\large$\chi$}_{D_{\ell}}y_{\ell},\quad y\in Y, (5.4)

where χD\raise 1.29167pt\hbox{\large$\chi$}_{D_{\ell}} is the indicator function on the set DD_{\ell}, and parameters ranging uniformly in Y=[1,1]dY=[-1,1]^{d}. We study the impact of the intrinsic dimensionality of the manifold by considering two cases for the partition of the unit square DD, a 2×22\times 2 uniform grid partition resulting in d=4d=4 parameters, and a 4×44\times 4 grid partition of DD resulting in d=16d=16 parameters. We also study the impact of coercivity and anisotropy on our reconstruction algorithm by examining the different manifolds generated by taking c=c1rc_{\ell}=c_{1}\ell^{-r} with c1=0.9c_{1}=0.9 or 0.990.99 and r=1r=1 or 22. The value c1=0.99c_{1}=0.99 corresponds to a severe degeneration of coercivity, and the rate r=2r=2 corresponds to a more pronounced anisotropy.

Refer to caption
Figure 5.5: Measurement locations for Test 2 and Test 3

We use two different measurement spaces, one with m=dim(W)=4m=\dim(W)=4 evenly spaced local averages and the other with m=16m=16 evenly spaced local averages. The measurement locations are shown diagrammatically in Figure 5.5. The local averages are exactly as in the last test, taken in squares of side-length 262^{-6}. Note that the two values m=4m=4 and m=16m=16 which we consider for the dimension of the measurement space are the same as the parameter dimensions d=4d=4 and d=16d=16 of the manifolds. This allows us to study different regimes:

  • When m<dm<d, we have a highly ill-posed problem since the intrinsic dimension of the manifold is larger than the dimension of the measurement space. In particular, we expect that the fundamental barrier δ0()\delta_{0}({\cal M}) is strictly positive. Thus we cannot expect very accurate reconstructions even with the splitting strategy.

  • When mdm\geq d, the situation is more favorable and we can expect that the reconstruction involving manifold splitting brings significant accuracy gains.

As in the previous case, the training set ~\widetilde{\cal M} is generated by a subset Y~tr={yjtr}j=1,,Ntr\widetilde{Y}_{\mathrm{tr}}=\{y^{\mathrm{tr}}_{j}\}_{j=1,\ldots,N_{\mathrm{tr}}} of Ntr=5000N_{\mathrm{tr}}=5000 samples taken uniformly on YY. We build the σ\sigma-admissible families outlined in §3.2 using a dyadic splitting, where the splitting rule is given by (3.21). For example, our first split of YY results in two rectangular cells Y1Y_{1} and Y2Y_{2}, and the corresponding collections of parameter points Y~1Y1\widetilde{Y}_{1}\subset Y_{1} and Y~2Y2\widetilde{Y}_{2}\subset Y_{2}, as well as split collections of solutions ~1\widetilde{\cal M}_{1} and ~2\widetilde{\cal M}_{2}. On each ~k\widetilde{\cal M}_{k} we apply the greedy selection procedure, resulting in VkV_{k}, with computable values μk\mu_{k} and εk\varepsilon_{k}. The coordinate direction in which we split YY is precisely the direction that gives us the smallest resulting σ=maxk=1,2μkεk\sigma=\max_{k=1,2}\mu_{k}\varepsilon_{k}, so we need to compute greedy reduced bases for each possible splitting direction before deciding which results in the lowest σ\sigma. Subsequent splittings are performed in the same manner, but at each step we first choose the cell ksplit=argmaxk=1,,Kμkεkk_{\mathrm{split}}=\mathop{\rm argmax}_{k=1,\ldots,K}\mu_{k}\varepsilon_{k} to be split. A schematic sketch of this splitting loop is given below.
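The sketch is schematic Python under stated assumptions: the routine build_rb is a placeholder for the greedy reduced basis construction on the training snapshots of a given parameter cell, assumed to return the associated pair (\mu_{k},\varepsilon_{k}), and cells are represented as lists of parameter intervals.

```python
def halve(cell, j):
    """Split the j-th interval of a cell (a list of (lo, hi) tuples) at its midpoint."""
    lo, hi = cell[j]
    left, right = list(cell), list(cell)
    left[j], right[j] = (lo, 0.5 * (lo + hi)), (0.5 * (lo + hi), hi)
    return left, right

def dyadic_splitting(Y_root, build_rb, n_splits):
    """Greedily split the worst cell, trying every coordinate direction, cf. Section 3.2."""
    cells, bounds = [Y_root], [build_rb(Y_root)]        # bounds[k] = (mu_k, eps_k)
    for _ in range(n_splits):
        k = max(range(len(cells)), key=lambda i: bounds[i][0] * bounds[i][1])  # worst cell
        best = None
        for j in range(len(cells[k])):                  # candidate splitting directions
            children = halve(cells[k], j)
            child_bounds = [build_rb(c) for c in children]
            sigma = max(mu * eps for mu, eps in child_bounds)
            if best is None or sigma < best[0]:
                best = (sigma, children, child_bounds)
        _, children, child_bounds = best
        cells[k:k + 1], bounds[k:k + 1] = children, child_bounds
    return cells, bounds
```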

After K1K-1 splits, the parameter domain is divided into KK disjoint subsets YkY_{k} with Y=k=1KYkY=\bigcup_{k=1}^{K}Y_{k}, and we have computed a family of KK affine reduced spaces (Vk)k=1,,K(V_{k})_{k=1,\ldots,K}. For a given wWw\in W, we have KK possible reconstructions u1(w),,uK(w)u_{1}^{*}(w),\dots,u_{K}^{*}(w) and we select a value kk^{*} with the surrogate based model selection outlined in §3.4. The test is done on a test set of Nte=1000N_{\text{te}}=1000 snapshots which are different from the ones used for the training set ~\widetilde{\cal M}.

In Figure 5.6 we plot the reconstruction error, averaged over the test set, as a function of the number of splits KK for all the different configurations: we consider the 2 different diffusivity fields a(y)a(y) with d=4d=4 and d=16d=16 parameters, the two measurement spaces of dimension m=4m=4 and m=16m=16, and the 44 different ellipticity/coercivity regimes of cc_{\ell} in a(y)a(y). We also plot the error when taking for kk^{*} the oracle value, that is, the value of kk such that YkY_{k} contains the parameter yy which gave rise to the snapshot and measurement.

Our main findings can be summarized as follows:

  1. (i)

    The error decreases with the number of splits. As anticipated, the splitting strategy is more effective in the overdetermined regime mdm\geq d.

  2. (ii)

    Degrading coercivity has a negative effect on the estimation error, while anisotropy has a positive effect.

  3. (iii)

    Choosing kk^{*} by the surrogate based model selection yields error curves that are above yet close to those obtained with the oracle choice. The largest discrepancy is observed when coercivity degrades.

Figure 5.7 presents the error bounds σK:=maxk=1,,Kμkεk\sigma_{K}:=\max_{k=1,\dots,K}\mu_{k}\varepsilon_{k}, which are known to be upper bounds for the estimation error when choosing the oracle value for kk^{*} at the given step KK of the splitting procedure. We observe that these worst-case upper bounds behave similarly to the averaged error curves depicted in Figure 5.6. In Figure 5.8, for the particular configuration dim(Y)=dim(W)=16\dim(Y)=\dim(W)=16, we demonstrate that σK\sigma_{K} indeed acts as an upper bound for the worst case error of the oracle estimator.

Refer to caption
Figure 5.6: Average of errors ujteuk(wjte)\|u^{\mathrm{te}}_{j}-u^{*}_{k^{*}}(w^{\mathrm{te}}_{j})\| for different choices of kk^{*} .
Refer to caption
Figure 5.7: Error bounds of local linear families, given by σK=maxk=1Kμkεk\sigma_{K}=\max_{k=1\ldots K}\mu_{k}\varepsilon_{k}.
Refer to caption
Figure 5.8: Comparison between σK\sigma_{K} (dashed curve), the averaged oracle error (full curve) and the range from maximum to minimum oracle error (shaded region).

5.3 Test 3: improving the state estimate by alternating residual minimization

The goal of this test is to illustrate how the alternating residual minimization outlined in §4.2 allows one to improve the accuracy of the state estimate. We use the same setting as in Test 2; in particular, we consider the solution manifold {\cal M} of equation (5.1) with the diffusivity field a(y)a(y) defined as in (5.4). Again we consider the cases where the DD_{\ell} from (5.4) are from the 2×22\times 2 and 4×44\times 4 grid, resulting in d=dim(Y)=4d=\dim(Y)=4 and 1616 respectively. Our test uses all three measurement regimes presented in Figure 5.5, with m=4m=4 and 1616 evenly spaced local average functions, and m=8m=8 randomly placed local averages confined to the upper half of the domain. We use the coercivity/anisotropy regime c=0.91c_{\ell}=0.9\,\ell^{-1}.

In this test we compare three different candidates for u0u^{0}, the starting point of the alternating minimization procedure:

  • u0=wu^{0}=w, the measurement vector without any further approximation, or equivalently the reconstruction of minimal H01H^{1}_{0} norm among all functions that agree with the observations.

  • u0=u(w)u^{0}=u^{*}(w), the PBDW state estimation using the greedy basis over the whole manifold, thus starting the minimization from a “lifted” candidate that we hope is closer to the manifold {\cal M} and should thus offer better performance.

  • u0=uk(w)u^{0}=u^{*}_{k^{*}}(w), the surrogate-chosen local linear reconstruction from the same family of local linear models from §5.2 (where kk^{*} is the index of the chosen local linear model). In this last case we take K=20K=20 local linear models, i.e. where YY has been split 19 times.

Furthermore, in the third case, we restrict our parameter range to be the local parameter range chosen by the surrogate, that is we alter the step outlined in (4.15) to be

yj+1=argminyYk(uj,y),y^{j+1}=\mathop{\rm argmin}_{y\in Y_{k^{*}}}{\cal R}(u^{j},y),

where yj+1y^{j+1} denotes the parameter found at the (j+1)(j+1)-th step of the procedure. The hope is that, thanks to our local model selection, we have correctly identified the local linear model and the restricted parameter range from which the true solution comes. The alternating minimization then benefits from a better starting position and, due to the restricted parameter range, potentially from a faster convergence.

In our test we use the same training set ~\widetilde{\cal M} as in the previous test, with Ntr=5000N_{\text{tr}}=5000 samples, in order to generate the reduced basis spaces. We consider a set of Nte=10N_{\text{te}}=10 snapshots, distinct from any snapshots in ~\widetilde{\cal M}, and perform the alternate minimization for each of the snapshots in the test set.

The final figures display the state error trajectories juujj\mapsto\|u-u^{j}\| for each snapshot (dashed lines) as well as their geometric average (full lines), in different colors depending on the initialization choice. Similarly, we display the residual trajectories j(uj,yj)=A(yj)ujfVj\mapsto{\cal R}(u^{j},y^{j})=\|A(y^{j})u^{j}-f\|_{V^{\prime}} and the parameter error trajectories j|yyj|2j\mapsto|y-y^{j}|_{2}. Our main findings can be summarized as follows:

  1. (i)

    In all cases, there is a substantial gain in taking u0=uk(w)u^{0}=u^{*}_{k^{*}}(w), the surrogate-chosen local linear reconstruction, as starting point. In certain cases, the iterative procedure initiated from the two other choices ww or u(w)u^{*}(w) stagnates at an error level that is even higher than uuk(w)\|u-u^{*}_{k^{*}}(w)\|.

  2. (ii)

    The state error, residual and parameter error decrease to zero in the overdetermined configurations where (dim(W),dim(Y))(\dim(W),\dim(Y)) is (4,4)(4,4), (16,16)(16,16) or (16,4)(16,4), with equally spaced measurement sites. In the underdetermined configuration (4,16)(4,16), the state and parameter errors stagnate, while the residual error decreases to zero, which reflects the fact that there are several pairs (u,y)(w+W)×Y(u,y)\in(w+W^{\perp})\times Y satisfying (u,y)=0{\cal R}(u,y)=0, making the fundamental barrier δ0\delta_{0} strictly positive.

  3. (iii)

    The state error, residual and parameter error do not decrease to zero in the overdetermined configuration (dim(W),dim(Y))=(8,4)(\dim(W),\dim(Y))=(8,4) where the measurement sites are concentrated on the upper half of the domain. This case is interesting since, while we may expect that there is a unique pair (u(y),y)(w+W)×Y(u(y),y)\in(w+W^{\perp})\times Y reaching the global minimal value (u,y)=0{\cal R}(u,y)=0, the algorithm seems to get trapped in local minima.

[Figures: state error, residual, and parameter error trajectories for the different measurement/parameter configurations described above.]

References

  • [1] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk, Convergence Rates for Greedy Algorithms in Reduced Basis Methods, SIAM Journal of Mathematical Analysis 43, 1457-1472, 2011. DOI: 10.1137/100795772
  • [2] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk, Data assimilation in reduced modeling, SIAM/ASA J. Uncertain. Quantif. 5, 1-29, 2017. DOI: 10.1137/15M1025384
  • [3] P. Binev, A. Cohen, O. Mula and J. Nichols, Greedy algorithms for optimal measurements selection in state estimation using reduced models, SIAM Journal on Uncertainty Quantification 43, 1101-1126, 2018. DOI: 10.1137/17M1157635
  • [4] A. Bonito, A. Cohen, R. DeVore, G. Petrova and G. Welper, Diffusion Coefficients Estimation for Elliptic Partial Differential Equations, SIAM J. Math. Anal 49. 1570-1592, 2017. DOI: 10.1137/16M1094476
  • [5] A. Bonito, A. Cohen, R. DeVore, D. Guignard, P. Jantsch and G. Petrova, Nonlinear methods for model reduction, arXiv:2005.02565, 2020.
  • [6] D. Broersen and R. P. Stevenson, A Petrov-Galerkin discretization with optimal test space of a mild-weak formulation of convection-diffusion equations in mixed form, IMA J. Numer. Anal. 35 (2015), no. 1, 39–73. DOI 10.1093/imanum/dru003.
  • [7] C. Carstensen, L. Demkowicz, J. Gopalakrishnan, Breaking spaces and forms for the DPG method and applications including Maxwell equations, Comput. Math. Appl. 72 (2016), no. 3, 494–522. DOI: 10.1016/j.camwa.2016.05.004
  • [8] A. Cohen, W. Dahmen, and G. Welper, Adaptivity and variational stabilization for convection-diffusion equations, ESAIM Math. Model. Numer. Anal. 46 (2012), no. 5, 1247–1273, DOI: 10.1051/m2an/2012003.
  • [9] A. Cohen, W. Dahmen, R. DeVore, and J. Nichols, Reduced Basis Greedy Selection Using Random Training Sets, ESAIM: Mathematical Modelling and Numerical Analysis, vol. 54, no. 5, 2020. DOI: 10.1051/m2an/2020004
  • [10] A. Cohen, W. Dahmen, R. DeVore, J. Fadili, O. Mula and J. Nichols, Optimal reduced model algorithms for data-based state estimation, arXiv:1903.07938. Accepted in SIAM Journal of Numerical Analysis, 2020.
  • [11] A. Cohen and R. DeVore, Approximation of high dimensional parametric PDEs, Acta Numerica, no. 24, 1-159, 2015. DOI: 10.1017/S0962492915000033
  • [12] A. Cohen, R. DeVore and C. Schwab, Analytic Regularity and Polynomial Approximation of Parametric Stochastic Elliptic PDEs, Analysis and Applications 9, 11-47, 2011. DOI: 10.1142/S0219530511001728
  • [13] A. Cohen, R. DeVore, G. Petrova, and P. Wojtaszczyk, Finding the minimum of a function, Methods and Applications of Analysis 20, 393-410, 2013. DOI: 10.4310/MAA.2013.v20.n4.a4
  • [14] W. Dahmen, C. Huang, C. Schwab, and G. Welper, Adaptive Petrov-Galerkin methods for first order transport equations, SIAM J. Numer. Anal. 50 (2012), no. 5, 2420–2445. DOI: 10.1137/110823158
  • [15] W. Dahmen, C. Plesken, G. Welper, Double Greedy Algorithms: Reduced Basis Methods for Transport Dominated Problems, ESAIM: Mathematical Modelling and Numerical Analysis, 48(3) (2014), 623–663. DOI: 10.1051/m2an/2013103.
  • [16] R. DeVore, G. Petrova, and P. Wojtaszczyk, Greedy algorithms for reduced bases in Banach spaces, Constructive Approximation, 37, 455-466, 2013. DOI: 10.1007/s00365-013-9186-2
  • [17] M. Dashti and A.M. Stuart, The Bayesian Approach to Inverse Problems, Handbook of Uncertainty Quantification, 311–428, Editors R. Ghanem, D. Higdon and H. Owhadi, Springer, 2017. DOI: 10.1007/978-3-319-12385-1_7
  • [18] J.L. Eftang, A.T. Patera and E.M. Ronquist, An hphp certified reduced basis method for parametrized elliptic partial differential equations, SIAM J. Sci. Comput. 32, 3170-3200, 2010. DOI: 10.1137/090780122
  • [19] Y. Maday, A.T. Patera, J.D. Penn and M. Yano, A parametrized-background data-weak approach to variational data assimilation: Formulation, analysis, and application to acoustics, Int. J. Numer. Meth. Eng. 102, 933-965, 2015. DOI: 10.1002/nme.4747
  • [20] Y. Maday, A. T. Patera, and G. Turinici, Global a priori convergence theory for reduced-basis approximations of single-parameter symmetric coercive elliptic partial differential equations, Comptes Rendus Académie des Sciences, Série I, Math.335, 289-294, 2002. DOI: 10.1016/S1631-073X(02)02466-4
  • [21] Y. Maday and B. Stamm, Locally adaptive greedy approximations for anisotropic parameter reduced basis spaces, SIAM J. Scientific Computing 35, 2417-2441, 2013. DOI: 10.1137/120873868
  • [22] P. Massart, Concentration inequalities and model selection, Springer 2007. DOI: 10.1007/978-3-540-48503-2
  • [23] G. Rozza, D.B.P. Huynh, and A.T. Patera, Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations - application to transport and continuum mechanics, Archive of Computational Methods in Engineering 15, 229-275, 2008. DOI: 10.1007/s00791-006-0044-7
  • [24] S. Sen, Reduced-basis approximation and a posteriori error estimation for many-parameter heat conduction problems, Numerical Heat Transfer B-Fund 54, 369-389, 2008. DOI: 10.1080/10407790802424204
  • [25] A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numerica, 451-559, 2010. DOI: 10.1017/S0962492910000061
  • [26] R. P. Stevenson, J. Westerdiep, Stability of Galerkin discretizations of a mixed space-time variational formulation of parabolic evolution equations, IMA J. Numer. Anal. (2020). DOI: 10.1093/imanum/drz069