
Bernstein–von Mises theorems for
time evolution equations


Richard Nickl

University of Cambridge
Abstract

We consider a class of infinite-dimensional dynamical systems driven by non-linear parabolic partial differential equations with initial condition $\theta$ modelled by a Gaussian process ‘prior’ probability measure. Given discrete samples of the state of the system evolving in space-time, one obtains updated ‘posterior’ measures on a function space containing all possible trajectories. We give a general set of conditions under which these non-Gaussian posterior distributions are approximated, in Wasserstein distance for the supremum-norm metric, by the law of a Gaussian random function. We demonstrate the applicability of our results to periodic non-linear reaction diffusion equations

$$\begin{aligned}\frac{\partial}{\partial t}u-\Delta u &= f(u)\\ u(0) &= \theta\end{aligned}$$

where $f$ is any smooth and compactly supported reaction function. In this case the limiting Gaussian measure can be characterised as the solution of a time-dependent Schrödinger equation with ‘rough’ Gaussian initial conditions whose covariance operator we describe.

1 Introduction

1.1 Bayesian inference in dynamical systems

Denote by $L^{2}(\Omega)$ the Hilbert space of square-integrable vector fields over a bounded domain $\Omega\subset\mathbb{R}^{d}$. For initial condition $\theta=u(0)$, consider states $u_{\theta}(t)$ at times $t>0$ of a dynamical system evolving in $L^{2}(\Omega)$. Examples we have in mind include the Navier-Stokes, reaction-diffusion or McKean-Vlasov systems [43, 39, 9, 41], where $u_{\theta}$ models the velocity field of an incompressible fluid, the concentration of a chemical substance, or the distribution of interacting particles, respectively. The infinitesimal dynamics of such systems are described by a non-linear partial differential equation (PDE)

$$\frac{\partial u}{\partial t}=\Delta u+F(u) \tag{1}$$

where $\Delta$ is the Laplacian and $F$ is a functional modelling the underlying non-linearity.

In scientific applications, the initial condition $\theta$ is uncertain and moreover, physical observations are necessarily discrete and include statistical error. For instance, following the paradigm of probabilistic numerics [13, 10, 40, 3], an experimenter may sample the state $u_{\theta}$ of the system at positions $(t_{i},\omega_{i})_{i=1}^{N}$ drawn uniformly and independently at random from the time-space cylinder $[0,T]\times\Omega$, with $T>0$ a fixed time horizon. One thus observes the vector $Z^{(N)}=(Y_{i},t_{i},\omega_{i})_{i=1}^{N}$ from the regression equation

$$Y_{i}=u_{\theta}(t_{i},\omega_{i})+\varepsilon_{i},\quad i=1,\dots,N,\quad N\in\mathbb{N}, \tag{2}$$

where the $\varepsilon_{i}\sim\mathcal{N}(0,I)$ are independent and identically distributed (iid) Gaussian random noise. The infinite product probability measure describing the law of the sequence $Z^{(\infty)}$ arising from initial condition $\theta$ will be denoted by $P_{\theta}^{\mathbb{N}}$.
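To make the sampling scheme (2) concrete, the following is a minimal sketch of how such data could be simulated for a scalar state. The function names `sample_observations` and `u_demo` are illustrative only, and `u_demo` is a stand-in state (the solution of the periodic heat equation started at $\cos(2\pi x)$, i.e. the case of no reaction term), not the reaction-diffusion state treated later.

```python
import numpy as np

def sample_observations(u, N, T=1.0, d=1, rng=None):
    """Draw N noisy point evaluations Y_i = u(t_i, w_i) + eps_i as in (2)."""
    rng = np.random.default_rng() if rng is None else rng
    t = rng.uniform(0.0, T, size=N)          # sampling times, uniform on [0, T]
    w = rng.uniform(0.0, 1.0, size=(N, d))   # sampling locations, uniform on [0, 1]^d
    eps = rng.standard_normal(N)             # iid N(0, 1) noise (scalar state assumed)
    y = u(t, w) + eps
    return y, t, w

def u_demo(t, w):
    """Hypothetical stand-in state: heat-equation solution started at cos(2 pi x)."""
    return np.exp(-4.0 * np.pi ** 2 * t) * np.cos(2.0 * np.pi * w[:, 0])

y, t, w = sample_observations(u_demo, N=2000, T=0.1, rng=np.random.default_rng(0))
```

The triple `(y, t, w)` plays the role of $Z^{(N)}$; any other callable `u(t, w)` returning one value per design point can be substituted.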

The uncertainty about the state $\theta$ of the system before measurements are taken can be updated to a posterior distribution on the $u_{\theta}$’s given $Z^{(N)}$, by applying the rules of conditional probability. This follows seminal ideas of Laplace (Chapter VI in [24]) and nowadays is called Bayesian inference [19] or sometimes more specifically – when considering dynamical systems in applied sciences – ‘data assimilation’ or ‘filtering’ [16, 38, 25, 26, 23]. In infinite-dimensional settings where $\theta$ is a function, a typical approach is to model the initial condition by a Gaussian random field $(\theta(x):x\in\Omega)$ whose mean function is based on past knowledge (or in absence of it, equals zero) and whose covariance is an inverse power of the Laplace operator on $\Omega$. If we denote by $\Pi$ the induced ‘prior’ probability distribution on $\theta$, the updating step (i.e., Bayes’ theorem) generates the posterior Gibbs’ probability distribution for $\theta|Z^{(N)}$ of the form

$$d\Pi(\theta|Z^{(N)})\propto\exp\Big\{-\frac{1}{2}\sum_{i=1}^{N}|Y_{i}-u_{\theta}(t_{i},\omega_{i})|^{2}\Big\}\,d\Pi(\theta),\quad\theta\in L^{2}(\Omega). \tag{3}$$
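For numerical work one usually evaluates the exponent in (3) on the log scale. The sketch below is only an illustration under stated assumptions: `predictions` stands for the values $u_{\theta}(t_{i},\omega_{i})$ produced by some forward solver, and `log_prior_value` for a user-supplied log-density of a finite-dimensional discretisation of $\Pi$; neither is specified by the text.

```python
import numpy as np

def log_likelihood(y, predictions):
    """Exponent of (3): -0.5 * sum_i |Y_i - u_theta(t_i, w_i)|^2."""
    residuals = np.asarray(y, dtype=float) - np.asarray(predictions, dtype=float)
    return -0.5 * float(np.sum(residuals ** 2))

def log_posterior_unnormalised(y, predictions, log_prior_value):
    """Log of the unnormalised posterior density relative to a discretised prior.

    log_prior_value is an assumption: the log-density of a finite-dimensional
    approximation of the Gaussian prior, supplied by the caller.
    """
    return log_likelihood(y, predictions) + float(log_prior_value)
```

Such a function is the basic ingredient of the MCMC and particle methods mentioned in this section, which only require the posterior up to its normalising constant.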

We also obtain an update for the whole dynamical system $u_{\theta}(t)$ whose marginal time distributions on $L^{2}(\Omega)$ are given by the image measures

$$\hat{\Pi}_{t,N}=\mathrm{Law}(u_{\theta}(t)),\quad\theta\sim\Pi(\cdot|Z^{(N)}),\quad t\geq 0. \tag{4}$$

For the last observation time $t_{(N)}=\max_{i\leq N}t_{i}\in(0,T]$, the preceding laws are sometimes called filtering distributions; for $t>t_{(N)}$ they can be used to forecast future states of the system, while for $t<t_{(N)}$ they are referred to as ‘smoothing distributions’, see, e.g., p.xii in [25] or p.173 in [38] for this terminology from the data assimilation literature. Various iterative algorithms exist that aim to compute these posterior measures by numerical (e.g., MCMC, or particle filter) methods [40, 11, 22, 38, 25, 4, 21]. The resulting inferences are widely used in scientific application areas, but statistical theory for the performance of such methods in non-linear settings remains elusive.

The distributions $\hat{\Pi}_{t,N}$ are random probability measures in function space that involve the non-linearity of the underlying evolution operator both in the Gibbs’ measure (3) and in the push-forward (4), and have no simple closed form representation. A first sanity check for this Bayesian approach would be to establish its ‘posterior consistency’ [19]; namely that $\hat{\Pi}_{t,N}$ converges in an appropriate sense to Dirac measure $\delta_{u_{\theta_{0}}}$ at the state $u_{\theta_{0}}(t)$ arising from ‘ground truth’ initial condition $\theta_{0}$, at least as the signal to noise ratio increases (i.e., as sample size $N\to\infty$), and with high $P_{\theta_{0}}^{\mathbb{N}}$-probability. For the $2d$-Navier-Stokes model such a result was proved recently in [36] – see Remark 2 for more discussion.

In this article we aim to further advance our understanding of the statistical behaviour of the random measures $\hat{\Pi}_{t,N}$ for large sample sizes $N\to\infty$. Specifically, following another idea of Laplace, we investigate whether the non-Gaussian posterior measures $\hat{\Pi}_{t,N}$ are perhaps approximately Gaussian in a suitable sense, a phenomenon that goes by the name of Bernstein-von Mises (BvM) theorem in the mathematical statistics literature. Such results hold in ‘regular’ finite-dimensional statistical models [44] but are more intricate in high- or infinite-dimensional settings [18, 5, 6, 19, 33]. We will devise a template tailored to dynamical systems of parabolic type (1) that allows us to show that in a suitable sense, uniformly in fixed time windows, and with high $P_{\theta_{0}}^{\mathbb{N}}$-probability,

$$\hat{\Pi}_{t,N}\approx\mathcal{N}\big(u_{\tilde{\theta}_{N}},\mathcal{C}_{t}/N\big),\quad t>0, \tag{5}$$

where $\tilde{\theta}_{N}$ is the posterior mean vector and where the limit law $\mathcal{N}(0,\mathcal{C}_{t})$ can be characterised by the solution at time $t$ of a linear stochastic PDE with certain ‘rough’ Gaussian initial conditions. We demonstrate how the theory works for a concrete nonlinear reaction-diffusion system and discuss structurally similar non-linear parabolic PDEs (1) in Remark 2.

These Gaussian approximations will be shown to hold in the uniform norm over $\Omega$, and at the convergence rate $1/\sqrt{N}$. This is possible by first proving a Gaussian approximation in weak ‘Schwartz’-type topologies for the initial conditions $\theta$ (partly following ideas from [5, 6, 30] in different models), and then using strong smoothing properties of the semigroup underlying (1) at positive times $t>0$. Some discussion of the last point and its relationship to non-existence results [18] for infinite-dimensional Bernstein-von Mises theorems can be found in Remark 3.

1.2 Asymptotics for posterior measures in reaction-diffusion equations

We let $\Omega=[0,1]^{d}$ denote the $d$-dimensional torus and consider periodic solutions to the PDE

$$\begin{aligned}\frac{\partial}{\partial t}u &=\Delta u+f(u),\quad\text{on }(0,\infty)\times\Omega,\\ u(0) &=\theta,\quad\text{on }\Omega,\end{aligned} \tag{6}$$

where $\Delta$ is the Laplacian and $f$ is a given non-linear ‘reaction’ term that is smooth, compactly supported and time-independent, acting on $u$ by composition $f(u)=f\circ u$ point-wise. The unique solutions of this PDE induce a dynamical system evolving in $L^{2}(\Omega)$, see [43, 39]. In applications the initial condition $\theta$ may represent the distribution of some substance, and (6) describes its evolution over time, where the non-linear reaction term models the amount of the substance that is created or destroyed as a function $f(u(t))$ of the current state $u(t)$. For the proofs we will assume the dimension of the state space $\Omega=[0,1]^{d}$ to be $d\leq 3$ and we also only consider scalar $u$ taking values in $\mathbb{R}$ rather than general vector fields $u:\Omega\to\mathbb{R}^{d}$. Extensions to $d\geq 4$ are mostly of a technical nature (involving Schauder theory, as in Sec. 5.2 of [27]) while coupled systems of equations could be dealt with by mostly notational changes as long as $f=\nabla F$ is a gradient vector field of smooth $F:\Omega\to\mathbb{R}$.
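For intuition, the dynamics (6) can be approximated numerically in dimension $d=1$. The following sketch uses a semi-implicit Euler scheme in Fourier space, which is a standard numerical choice and not a method taken from this paper; the reaction term `bump_reaction` is one hypothetical example of a smooth, compactly supported $f$, and the grid size and step size are arbitrary.

```python
import numpy as np

def bump_reaction(u):
    """Example f(u) = u * exp(-1 / (1 - u^2)) for |u| < 1 and 0 otherwise (smooth, compact support)."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    inside = np.abs(u) < 1.0
    out[inside] = u[inside] * np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

def solve_reaction_diffusion(theta, t_end, dt=1e-4, f=bump_reaction):
    """Approximate u(t_end) for u_t = u_xx + f(u), u(0) = theta, on the 1D torus [0, 1)."""
    n = theta.size
    k = np.fft.fftfreq(n, d=1.0 / n)      # integer wave numbers
    lam = (2.0 * np.pi * k) ** 2          # eigenvalues of minus the periodic Laplacian
    u = np.array(theta, dtype=float)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        rhs = np.fft.fft(u + dt * f(u))                      # explicit step for the reaction term
        u = np.real(np.fft.ifft(rhs / (1.0 + dt * lam)))     # implicit step for the diffusion term
    return u

x = np.arange(64) / 64
u_final = solve_reaction_diffusion(0.5 * np.cos(2.0 * np.pi * x), t_end=0.01)
```

Treating the Laplacian implicitly keeps the scheme stable for the stiff high-frequency modes, while the bounded reaction term can be handled explicitly.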

We model the uncertainty about the initial condition $\theta$ of the system by a $\gamma$-regular Gaussian random field $(\theta(x):x\in\Omega)$ whose prior $\mathrm{Law}(\theta)$ on the space $L^{2}_{0}(\Omega)=L^{2}(\Omega)\cap\{h:\int_{\Omega}h=0\}$ arises from

$$\theta\sim\Pi=\mathcal{N}(0,\rho^{2}\Delta^{-\gamma}),\quad\gamma>1+d/2,\quad\rho>0, \tag{7}$$

see Condition 1 for details. The choice of $\rho$ provides necessary prior regularisation to prevent the nonlinearities inherent in the posterior Gibbs measure (3) from behaving erratically at ‘low temperatures’ $N\to\infty$. One way to achieve this, exhibited in [28], takes the form

$$\rho=\rho_{N}=1/(\sqrt{N}\delta_{N}),\quad\text{where }\delta_{N}\simeq N^{-\gamma/(2\gamma+d)}. \tag{8}$$
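Taking equality in (8), the scaling simplifies to $\rho_{N}=N^{-d/(4\gamma+2d)}$, so the prior shrinks towards zero at a slow polynomial rate. A small sketch that computes both quantities (the function name `prior_scaling` is illustrative):

```python
def prior_scaling(N, gamma, d):
    """Return (delta_N, rho_N) from (8), taking delta_N = N^(-gamma / (2 gamma + d)) exactly."""
    delta_N = N ** (-gamma / (2.0 * gamma + d))
    rho_N = 1.0 / (N ** 0.5 * delta_N)
    return delta_N, rho_N

delta_N, rho_N = prior_scaling(10 ** 6, gamma=3, d=1)
# rho_N agrees with the closed form N ** (-d / (4 * gamma + 2 * d))
```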

The posterior distribution $\Pi(\theta|Z^{(N)})$ and its pushforward $\hat{\Pi}_{t,N}$ then arise from data (2) as in (3) and (4), with dynamical system described by the PDE (6) for any fixed $f\in C^{\infty}_{c}(\Omega)$.

Our main Theorem 1 will show that the (non-Gaussian) posterior measures $\hat{\Pi}_{t,N}$ are approximated by a Gaussian law, as announced in (5). This approximation will hold as $N\to\infty$, and in $P_{\theta_{0}}^{\mathbb{N}}$-probability, with ground truth initial condition $\theta_{0}$ that generated the data (2). The most delicate step in the identification of the limit distribution is the construction of a mean zero Gaussian random field $(\vartheta(x):x\in\Omega)$ whose covariance is related to an appropriate inverse of the underlying ‘Fisher information’ operator of the statistical model (see [44] or Ch.3 in [31] for more on this notion). We will show that this process indeed exists for the reaction-diffusion system (6) and that the law $\mathcal{N}_{\theta_{0}}$ it induces defines a Gaussian Borel measure in the negative Sobolev space $H^{-a}$ whenever $a>1+d/2$. Then let $u_{\theta_{0}}$ be the solution of (6), write $f'=df/dx$, and consider the Gaussian process $U$ over $(0,T]\times\Omega$ obtained from the unique weak solution of the following linear time-dependent Schrödinger equation with random initial condition:

$$\begin{aligned}\frac{\partial}{\partial t}U(t,\cdot)-\Delta U(t,\cdot)-f'(u_{\theta_{0}}(t,\cdot))U(t,\cdot) &=0\quad\text{on }(0,\infty)\times\Omega\\ U(0,\cdot) &=\vartheta\sim\mathcal{N}_{\theta_{0}}.\end{aligned} \tag{9}$$

Even though the initial condition $\vartheta$ is not point-wise defined as a function (almost surely), our proofs will imply that the weak solutions $U(t,\cdot)$ exist and almost surely define continuous functions on $\Omega$ for $t>0$. This is related to the time-smoothing properties of the parabolic solution operator underlying (9) and the fact that the null space of the Schrödinger operator $\Delta+f'(\theta_{0})$ governing the ‘explosion’ at time $t=0$ will be seen to be finite-dimensional, see also Remark 3.

We will assume that the true initial condition $\theta_{0}$ (but not the prior model) is smooth to simplify the statement of the following result. This restriction is not necessary, however, and can be weakened to sufficient Sobolev-regularity of $\theta_{0}$. Fix any time window $0<t_{\min}<t_{\max}<\infty$ and define the Banach space

$$\mathscr{C}:=C\big([t_{\min},t_{\max}],C(\Omega)\big),\quad\|v\|_{\mathscr{C}}:=\sup_{t\in[t_{\min},t_{\max}],\,x\in\Omega}|v(t,x)|, \tag{10}$$

of continuous maps on $[t_{\min},t_{\max}]\times\Omega$. Denote by $\mathscr{W}_{1}$ (equal to $W_{1,\mathscr{C}}$ in (20) below) the $1$-Wasserstein distance on the space of probability measures on $\mathscr{C}$.

Theorem 1.

Let $\mu_{N}=\mu(\cdot|Z^{(N)})$ be the conditional law in $\mathscr{C}$ of the stochastic process

$$\Big\{\sqrt{N}(u_{\theta}(t,x)-u_{\tilde{\theta}_{N}}(t,x))|Z^{(N)}:t\in[t_{\min},t_{\max}],x\in\Omega\Big\}$$

where $\theta\sim\Pi(\cdot|Z^{(N)})$ arises from posterior (3) with data (2) in the reaction-diffusion system (6), prior $\Pi=\Pi_{N}$ in (7) for $\rho$ as in (8), integer $\gamma>2+3d$, $d\leq 3$, and where $\tilde{\theta}_{N}=E^{\Pi}[\theta|Z^{(N)}]$ is the posterior mean in $L_{0}^{2}(\Omega)$. Denote by $\mu$ the law in $\mathscr{C}$ of the Gaussian random function arising from the unique weak solution $U$ to the PDE (9) with initial condition $\vartheta\sim\mathcal{N}_{\theta_{0}}$ and smooth $\theta_{0}\in L^{2}_{0}$. Then we have as $N\to\infty$

$$\mathscr{W}_{1}(\mu_{N},\mu)\to^{P_{\theta_{0}}^{\mathbb{N}}}0\quad\text{as well as}\quad\sqrt{N}(u_{\tilde{\theta}_{N}}-u_{\theta_{0}})\to_{\mathscr{C}}^{d}\mu.$$

From convergence in law $\to^{d}$ in the space $\mathscr{C}$ one deduces in particular that $\sqrt{N}(u_{\tilde{\theta}}-u_{\theta_{0}})$ is uniformly tight in $\mathscr{C}$, so the theorem implies that the updated posterior mean estimates $u_{\tilde{\theta}}(t)$ are concentrated near the ‘true’ dynamical system $u_{\theta_{0}}(t)$, and that their uniform $\|\cdot\|_{\mathscr{C}}$ fluctuations at a $\sqrt{N}$ scale are approximately Gaussian.

Theorem 1 has various important applications that we only outline briefly here. One is to uncertainty quantification and the frequentist validity of posterior credible bands which are random subsets $C_{N}\subset\mathscr{C}$ of posterior probability $1-\alpha$ for some fixed $0<\alpha<1$. Theorem 1 and arguments as in the proof of Theorem 7.3.21 in [20] can be combined to show that appropriate $C_{N}$ are of diameter $O_{P_{\theta_{0}}^{\mathbb{N}}}(1/\sqrt{N})$ and satisfy

$$P_{\theta_{0}}^{\mathbb{N}}\big((u_{\theta_{0}}(t,x):t\in[t_{\min},t_{\max}],x\in\Omega)\in C_{N}\big)\to 1-\alpha,\quad\text{as }N\to\infty. \tag{11}$$
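One natural candidate for such a set $C_{N}$ is a sup-norm ball around the posterior mean of $u_{\theta}$ whose radius is an empirical posterior quantile. The sketch below builds this band from posterior draws evaluated on a space-time grid. It is an illustrative construction under the assumption that such draws are available (for instance from an MCMC run), and is not claimed to be the exact set analysed in [20]; the function name is hypothetical.

```python
import numpy as np

def sup_norm_credible_band(samples, alpha=0.05):
    """Centre and radius of a sup-norm ball of empirical posterior mass about 1 - alpha.

    samples: array of shape (n_draws, n_t, n_x) holding posterior draws of
    u_theta on a grid over [t_min, t_max] x Omega (assumed to be given).
    """
    samples = np.asarray(samples, dtype=float)
    centre = samples.mean(axis=0)                                   # posterior mean on the grid
    deviations = np.abs(samples - centre).reshape(samples.shape[0], -1)
    sup_dist = deviations.max(axis=1)                               # sup-norm distance of each draw
    radius = float(np.quantile(sup_dist, 1.0 - alpha))
    return centre, radius
```

A function $v$ on the grid then belongs to the band when $\max_{t,x}|v(t,x)-\text{centre}(t,x)|\leq\text{radius}$.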

Another interesting application of our results concerns guarantees for posterior computation: when Gaussian approximation theorems such as Theorem 1 hold, they furnish a strategy to prove the existence of polynomial-time computational algorithms based on MCMC [2, 22, 37, 31], overcoming potential computational hardness barriers [1]. We note that the hypothesis of a Gaussian approximation is also used in recent work on convergence guarantees for Kalman-type filters in non-linear settings [4].

1.3 Notation and preliminaries

For real numbers $a,b$, we write $a\lesssim b$ whenever $a\leq Cb$ for some fixed constant $C>0$, and $a\simeq b$ if $a\lesssim b,b\lesssim a$. We write $Z\sim\mu$ when a random variable has law $\mu$, write $\to^{P}$ for convergence in probability, $\to^{d}$ for convergence in distribution, and use the standard $O_{P},o_{P}$ notation for stochastic orders of magnitude, see [44].

For $1\leq p\leq\infty$ and $(\mathscr{Z},\mathcal{A},\mu)$ a measure space, the $L^{p}(\mu)$-spaces are defined in the usual way as all measurable functions $H$ on $\mathscr{Z}$ such that $|H|^{p}$ is $\mu$-integrable, or essentially bounded if $p=\infty$. When $\mathscr{Z}$ is a subset of $\mathbb{R}^{d}$ we take $\mu$ to be Lebesgue measure on the Borel sets, unless specified otherwise. For $X$ a normed linear space, we define the function space $L^{p}([0,T],X)$ of measurable maps $H$ from $[0,T]\to X$ such that $\|H(t)\|_{X}$ lies in $L^{p}([0,T])$. We also define the space $C([0,T],X)$ of continuous maps from $[0,T]\to X$ normed by the supremum norm $\sup_{0<t<T}\|H(t)\|_{X}$ as well as the spaces of weakly differentiable maps

$$C^{1}([0,T],X)=\Big\{H:\|H\|_{C^{1}([0,T],X)}:=\sup_{0<t<T}\|H(t)\|_{X}+\mathrm{ess}\sup_{0<t<T}\|H'(t)\|_{X}<\infty\Big\},$$
$$H^{1}([0,T],X)=\Big\{H:\|H\|^{2}_{H^{1}([0,T],X)}:=\int_{0}^{T}\|H(t)\|^{2}_{X}dt+\int_{0}^{T}\|H'(t)\|^{2}_{X}dt<\infty\Big\}.$$

For $\Omega=[0,1]^{d}$ the $d$-dimensional torus (with opposite points identified), $C(\Omega)$ denotes the Banach space of bounded continuous functions defined on $\Omega$, equipped with supremum norm $\|\cdot\|_{\infty}$, and $C^{\gamma}(\Omega)$ denotes the usual spaces of functions defined on $\Omega$ whose partial derivatives up to order $\gamma\in\mathbb{N}$ are bounded functions; for $\gamma\notin\mathbb{N}$ these are the Hölder spaces, and $C^{\infty}(\Omega)=\cap_{\gamma>0}C^{\gamma}(\Omega)$ denotes the set of smooth (infinitely-differentiable) functions defined on $\Omega$. We also introduce the spaces

$$C^{1,\infty}(\Omega)\equiv\cap_{b>0}C^{1}([0,T],C^{b}(\Omega)). \tag{12}$$

Further define the Sobolev spaces $H^{\gamma}(\Omega),\gamma\geq 0,$ of functions whose weak partial derivatives up to order $\gamma$ lie in $L^{2}(\Omega)$. The norms $\|\cdot\|_{C^{\gamma}},\|\cdot\|_{H^{\gamma}}$ on $C^{\gamma},H^{\gamma},\gamma\in\mathbb{N},$ are then given by $\|H\|+\sum_{|\alpha|=\gamma}\|D^{\alpha}H\|,$ where $D^{\alpha}$ is the (weak) partial differential operator for multi-index $\alpha$ and the norms $\|\cdot\|$ equal $\|\cdot\|_{\infty}$ or $\|\cdot\|_{L^{2}}$, respectively.

The Sobolev spaces $H^{\gamma}$ carry equivalent sequence space norms

$$\|h\|_{h^{\gamma}}^{2}=\sum_{j\geq 0}(1+\lambda_{j})^{\gamma}|\langle e_{j},h\rangle_{L^{2}}|^{2}, \tag{13}$$

where $\langle\cdot,\cdot\rangle_{L^{2}}$ is the inner product of $L^{2}(\Omega)$, and the $e_{j}$ are the $L^{2}(\Omega)$-orthonormal eigenfunctions of the periodic Laplacian $\Delta$ for (negative) eigenvalues $\lambda_{j}$. Concretely we have $e_{0}=1,\lambda_{0}=0$, and

$$\Delta e_{j}=-\lambda_{j}e_{j},\quad j\geq 1,\quad 0<\lambda_{j}\leq\lambda_{j+1}\simeq j^{2/d}\quad\text{as }j\to\infty, \tag{14}$$

where $j=0,1,2,\dots$ is an enumeration of the integer lattice $\mathbb{Z}^{d}$ and

$$e_{j}(x)\propto e^{2\pi ik_{j}\cdot x},\quad x\in\Omega,\quad k_{j}\in\mathbb{Z}^{d}. \tag{15}$$

An equivalent norm is obtained by replacing $(1+\lambda_{j})^{\gamma}$ by $(1+\lambda_{j}^{\gamma})$ and this norm is further equivalent to the graph norm of the image of $L^{2}$ under $(id-\Delta^{\gamma/2})$.
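In dimension $d=1$ the sequence norm (13) is easy to evaluate with the fast Fourier transform, since then $\lambda_{k}=(2\pi k)^{2}$ for wave number $k\in\mathbb{Z}$. The following sketch assumes that $h$ is sampled on a uniform grid of $[0,1)$ and is resolved by that grid; the function name is illustrative.

```python
import numpy as np

def sobolev_seq_norm(h, gamma):
    """Sequence norm (13) of grid samples of h on the 1D torus; gamma may be any real number."""
    n = h.size
    coeffs = np.fft.fft(h) / n                 # approximates <e_k, h> for e_k(x) = exp(2 pi i k x)
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer wave numbers
    lam = (2.0 * np.pi * k) ** 2               # eigenvalues of minus the periodic Laplacian
    return float(np.sqrt(np.sum((1.0 + lam) ** gamma * np.abs(coeffs) ** 2)))

x = np.arange(64) / 64
h = np.sqrt(2.0) * np.cos(2.0 * np.pi * x)     # unit L^2 norm, frequencies k = +1 and k = -1 only
norm_h2 = sobolev_seq_norm(h, 2.0)             # equals 1 + 4 pi^2 up to rounding
```

For this single-frequency example the norm is $(1+4\pi^{2})^{\gamma/2}$, which gives a quick sanity check of the weights.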

We define topological dual spaces $H^{-\gamma}(\Omega)=(H^{\gamma}(\Omega))^{*}$ with norm

$$\|f\|_{H^{-\gamma}}=\sup_{\psi\in H^{\gamma}:\|\psi\|_{H^{\gamma}}\leq 1}\big|\langle f,\psi\rangle_{L^{2}}\big|, \tag{16}$$

where the supremum may be restricted to $\psi\in C^{\infty}(\Omega)$. If we understand $\langle h,e_{j}\rangle_{L^{2}}$ as the action $T_{h}(e_{j})$ of periodic Schwartz distributions $T_{h}$ on smooth test functions $e_{j}\in C^{\infty}(\Omega)$, then the $h^{-\gamma}$ norms from (13) are equivalent to these dual norms also for negative $-\gamma<0$, and (16) is valid for all $\gamma\in\mathbb{R}$. We define closed subspaces

$$H^{\gamma}_{0}=H^{\gamma}\cap\{h:\langle h,1\rangle_{L^{2}}=0\}$$

and note that $H^{0}_{0}(\Omega)=L^{2}_{0}(\Omega)$. On $H^{\gamma}_{0}(\Omega)$ we have equivalent norms as in (13) but using only $\lambda_{j}^{\gamma}$ in place of $(1+\lambda_{j})^{\gamma}$, taking note of the Poincaré inequality $\lambda_{1}\geq 1/2\pi$.

If we set $B^{a}=H^{a}$ for $a>d/2$ and $B^{a}=C^{a}$ for $0\leq a\leq d/2$, then the multiplier inequality for Sobolev norms is

$$\|fg\|_{H^{a}}\lesssim\|f\|_{H^{a}}\|g\|_{B^{a}}. \tag{17}$$

For $a<0$ we can use (16) and (17) to obtain

$$\|fg\|_{H^{a}}\lesssim\|f\|_{H^{a}}\|g\|_{B^{|a|}}. \tag{18}$$

We also need the following interpolation inequality for Sobolev norms (see (A.16), p.473, in [42] and Lemma 3.27 in [39])

$$\|h\|_{h^{\xi}}\leq\|h\|^{1-m}_{h^{\bar{\gamma}}}\|h\|^{m}_{h^{-b}},\quad m=\frac{\bar{\gamma}-\xi}{\bar{\gamma}+b},\quad -\infty<-b\leq\xi\leq\bar{\gamma}<\infty. \tag{19}$$
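In the sequence norms (13), inequality (19) follows from Hölder's inequality applied to the weights, because $\xi=(1-m)\bar{\gamma}+m(-b)$. The sketch below checks this numerically for a given coefficient vector; the helper names are illustrative and the eigenvalues are those of the one-dimensional torus.

```python
import numpy as np

def seq_norm(coeffs, lam, s):
    """Weighted sequence norm with weights (1 + lambda_j)^s, as in (13)."""
    return float(np.sqrt(np.sum((1.0 + lam) ** s * np.abs(coeffs) ** 2)))

def interpolation_gap(coeffs, lam, xi, gamma_bar, b):
    """Right-hand side minus left-hand side of (19); non-negative when (19) holds."""
    m = (gamma_bar - xi) / (gamma_bar + b)
    lhs = seq_norm(coeffs, lam, xi)
    rhs = seq_norm(coeffs, lam, gamma_bar) ** (1.0 - m) * seq_norm(coeffs, lam, -b) ** m
    return rhs - lhs

j = np.arange(1, 51)
lam = (2.0 * np.pi * j) ** 2                                   # 1D periodic Laplacian eigenvalues
coeffs = np.random.default_rng(0).standard_normal(50) / j ** 2
gap = interpolation_gap(coeffs, lam, xi=0.5, gamma_bar=2.0, b=1.0)
```

At the endpoint $\xi=\bar{\gamma}$ one has $m=0$ and the two sides coincide, so the gap vanishes there.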

To metrise the distance between two probability measures $\mu,\nu$ on a metric space $(X,d)$, we will use the transportation Wasserstein-$1$ distance,

$$W_{1}(\mu,\nu)=W_{1,(X,d)}(\mu,\nu)=\sup_{F:X\to\mathbb{R},\|F\|_{Lip}\leq 1}|EF(Z_{1})-EF(Z_{2})| \tag{20}$$

where $Z_{1}\sim\mu,Z_{2}\sim\nu$ and

$$\|F\|_{Lip}=\sup_{x\neq y;\,x,y\in X}\frac{|F(x)-F(y)|}{d(x,y)}.$$

See Chapter 6 in [46] for equivalent definitions and further properties.
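As a simple illustration of (20), consider the special case $X=\mathbb{R}$ with two empirical measures on the same number of atoms. There the supremum in (20) is attained by the monotone (sorted) coupling, a standard fact for the real line. The sketch below uses this; it does not cover the function space $\mathscr{C}$ used in Theorem 1, and the function name is illustrative.

```python
import numpy as np

def wasserstein1_empirical(x, y):
    """W_1 between the empirical measures of two equal-size samples on the real line."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(x - y)))       # average distance between matched order statistics

sample = np.random.default_rng(0).standard_normal(500)
shift_distance = wasserstein1_empirical(sample, sample + 0.3)   # a pure shift by 0.3
```

Shifting a sample by a constant $c$ moves every atom by $|c|$, so the distance equals $|c|$ in that case.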

2 Functional Bernstein–von Mises theorems

2.1 A general result for non-linear parabolic PDEs

Let $\Omega=[0,1]^{d},d\in\mathbb{N}$, and let $W$ be a finite-dimensional vector space. If each entry $u_{k}$ of a $W$-valued vector field $u$ lies in a function space $X(\Omega)$ we write $u\in X(\Omega,W)$, and the corresponding norm is denoted by $\|u\|^{2}_{X(\Omega,W)}=\sum_{k}\|u_{k}\|_{X(\Omega)}^{2}$. Often we will just write $X(\Omega)$ for $X(\Omega,W)$. We consider a general forward map

$$\theta\mapsto\mathscr{G}(\theta),\quad\mathscr{G}:H^{1}(\Omega,W)\to L^{2}([0,T],L^{2}(\Omega,W)),\quad T>0. \tag{21}$$

When $W=\mathbb{R}$ this accommodates the solution map of the reaction-diffusion system (6), but further allows dynamical systems of vector fields, such as coupled systems of PDEs. The measurement model (2) is then a special case of the random design regression equation

$$Y_{i}=\mathscr{G}(\theta)(X_{i})+\varepsilon_{i},\quad i=1,\dots,N,\quad\varepsilon_{i}\sim^{iid}\mathcal{N}(0,Id_{W}), \tag{22}$$

where the $X_{i}$ are drawn iid from the uniform distribution $\lambda$ on $\mathcal{X}=[0,T]\times\Omega$, independently of the Gaussian noise vectors $\varepsilon_{i}$. Just as after (2) we write $Z^{(N)}:=(Y_{i},X_{i})_{i=1}^{N}$, denote the product measure describing the law of the infinite observation vector $(Y_{i},X_{i})_{i=1}^{\infty}$ by $P_{\theta}^{\mathbb{N}}$, and occasionally write $L^{2}(\mathcal{X})$ for $L^{2}([0,T],L^{2}(\Omega))$. This random design regression setting allows us to borrow from the theory developed in [31] and to use tools from concentration of product measures in high dimensions. Modelling explicit dependence structures in the design complicates the development significantly but is in principle possible, for instance as in [32].

We now turn to the prior probability measure for the parameter $\theta\in H^{1}$ – we refer to Ch.2 in [20] for standard background from the theory of Gaussian processes, such as the definition of their reproducing kernel Hilbert spaces (RKHS).

Condition 1.

Consider the centred Gaussian Borel probability measure $\Pi'=\Pi'_{\gamma}$ defined on $L^{2}_{0}(\Omega,W)$ with RKHS $\mathcal{H}=H^{\gamma}_{0}(\Omega,W)$ for some $\gamma>1+d/2$. Then for $\theta'\sim\Pi'$ take as prior $\Pi=\Pi_{\gamma,N}$ the law of

$$\theta=\frac{1}{\sqrt{N}\delta_{N}}\theta'\quad\text{where }\delta_{N}=N^{-\frac{\gamma}{2\gamma+d}},$$

with resulting RKHS-norm $\|\cdot\|_{\mathcal{H}_{N}}=\sqrt{N}\delta_{N}\|\cdot\|_{\mathcal{H}}$.

For $e_{j}$ from (15), the prior $\Pi'_{\gamma}$ can be represented by the law of a Gaussian random series

$$\theta'(x)=\sum_{j=1}^{\infty}\lambda_{j}^{-\gamma/2}g_{j}e_{j}(x),\quad x\in\Omega,\quad\text{where }g_{j}\sim^{iid}N(0,1), \tag{23}$$

augmented to independent such copies in each of its coordinates whenever $\dim(W)>1$. Shrinking the prior towards zero ensures a degree of a priori regularisation and has been used throughout proofs in the non-linear inverse problems literature since [28]. By the theory of radonifying maps (as in Thm B.1.3 in [31]), $\Pi'_{\gamma}$ and $\Pi$ define centred Gaussian Borel probability measures on the separable Hilbert space

$$H^{\bar{\gamma}}_{0}(\Omega,W),\quad\text{for any}~1\leq\bar{\gamma}<\gamma-d/2,$$

which serves as the natural ‘parameter’ space charged almost surely by prior draws. The posterior law of $\theta|Z^{(N)}$ then arises from a dominated model and is given by

$$d\Pi(\theta|Z^{(N)})\propto e^{\ell_{N}(\theta)}d\Pi(\theta),\quad\ell_{N}(\theta)=-\frac{1}{2}\sum_{i=1}^{N}|Y_{i}-\mathscr{G}(\theta)(X_{i})|_{W}^{2},\quad\theta\in H^{1}_{0}(\Omega,W), \tag{24}$$

see [19] or Sec.1.2.3 in [31].
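The random series (23) and the rescaling of Condition 1 translate directly into a sampler. The sketch below draws from a truncation of (23) for $d=1$ and scalar $W=\mathbb{R}$. It uses the real orthonormal basis $\sqrt{2}\cos(2\pi kx),\sqrt{2}\sin(2\pi kx)$ in place of the complex exponentials (15), and the truncation level `n_modes` is an arbitrary choice; both are assumptions of this illustration.

```python
import numpy as np

def sample_prior(n_grid, gamma, N=None, d=1, n_modes=64, rng=None):
    """Draw theta' from a truncated version of (23) on the 1D torus.

    If N is given, the draw is rescaled by 1 / (sqrt(N) * delta_N) as in Condition 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.arange(n_grid) / n_grid
    k = np.arange(1, n_modes + 1)
    lam = (2.0 * np.pi * k) ** 2                   # eigenvalue shared by the cos and sin mode
    g_cos = rng.standard_normal(n_modes)           # iid N(0, 1) coefficients
    g_sin = rng.standard_normal(n_modes)
    phase = 2.0 * np.pi * np.outer(k, x)
    weights = lam ** (-gamma / 2.0)
    theta = np.sqrt(2.0) * ((weights * g_cos) @ np.cos(phase)
                            + (weights * g_sin) @ np.sin(phase))
    if N is not None:
        delta_N = N ** (-gamma / (2.0 * gamma + d))
        theta = theta / (np.sqrt(N) * delta_N)
    return x, theta

x, theta_prime = sample_prior(256, gamma=2.0, rng=np.random.default_rng(0))
```

Since the constant mode $e_{0}$ is excluded, every draw integrates to zero, in line with the prior being supported on mean-zero functions.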

We now formulate some analytical conditions on $\mathscr{G}$ – these are designed for solution maps $\theta\mapsto\mathscr{G}(\theta)$ of non-linear parabolic PDEs as in (1). We denote the balls in $H^{r}_{0}$ by

$$U(r,B)=\big\{u\in H^{r}_{0}(\Omega,W):\|u\|_{H^{r}}\leq B\big\},\quad B>0, \tag{25}$$

and the regularity properties will be required to hold uniformly on such balls with constants that may depend on $B$ and the fixed ground truth $\theta_{0}$. The constants may depend on further parameters $T,W,d,\bar{\gamma},\gamma,\zeta$ which we however suppress in the notation.

Condition 2.

Let $\theta_{0}\in H^{\gamma}_{0}(\Omega,W)$ for some $\gamma>1+d$. Suppose a forward map $\mathscr{G}$ as in (21) satisfies the following hypotheses:

A) There exists $1\leq\bar{\gamma}\leq\gamma-d/2$ such that for every $B>0$ and some $u=u(B)>0$,

$$\sup_{\theta\in U(\bar{\gamma},B),\,0<t\leq T,\,x\in\Omega}|\mathscr{G}(\theta)(t,x)|\leq u<\infty. \tag{26}$$

B) For every $B>0$, there exists $c=c(B)>0$ such that

$$\|\mathscr{G}(\theta)-\mathscr{G}(\vartheta)\|_{L^{2}([0,T],L^{2}(\Omega))}\leq c\|\theta-\vartheta\|_{L^{2}(\Omega)}\quad\forall\theta,\vartheta\in U(1,B).$$

C) [Linear approximation:] There exists a continuous linear operator

$$\mathbb{I}_{\theta_{0}}\equiv D\mathscr{G}_{\theta_{0}}:L^{2}(\Omega)\to L^{2}([0,T],L^{2}(\Omega)) \tag{27}$$

and $1\leq\bar{\gamma}\leq\gamma-d/2$, $c'>0$, $c=c(\theta_{0},B)>0$, such that for all $h\in U(\bar{\gamma},B)$, $B>0$, with $\|h\|_{L^{2}}\leq c'$ we have

$$\|\mathscr{G}(\theta_{0}+h)-\mathscr{G}(\theta_{0})-\mathbb{I}_{\theta_{0}}[h]\|_{L^{2}([0,T],L^{2}(\Omega))}\leq c\|h\|^{2}_{L^{2}(\Omega)}.$$

D) [$L^{\infty}$-mapping properties:] Suppose that for some $d/2<\zeta<\gamma-d/2$ and all $B>0$, there exists $c=c(\theta_{0},B)>0$ such that

$$\|\mathscr{G}(\theta)-\mathscr{G}(\theta_{0})\|_{L^{\infty}([0,T],L^{\infty}(\Omega))}\leq c\|\theta-\theta_{0}\|_{H^{\zeta}},\quad\forall\theta\in U(\zeta,B)\cap H^{1}. \tag{28}$$

Suppose also that $\mathbb{I}_{\theta_{0}}$ from (27) is continuous from $H^{\zeta}\to L^{\infty}([0,T],L^{\infty}(\Omega))$.

E) [Stability:] Suppose that for some $1\leq\bar{\gamma}<\gamma-d/2$ and all $B>0$ there exists a constant $c=c(\theta_{0},B)>0$ such that

$$\|\mathscr{G}(\theta)-\mathscr{G}(\theta_{0})\|_{L^{2}([0,T],L^{2}(\Omega))}\geq c\|\theta-\theta_{0}\|_{H^{-1}},\quad\forall\theta\in U(\bar{\gamma},B).$$

F) [Inverse information:] Take $\mathbb{I}_{\theta_{0}}$ from (27) with Hilbert space adjoint operator

$$\mathbb{I}_{\theta_{0}}^{*}:L^{2}([0,T],L^{2}(\Omega))\to L^{2}(\Omega),$$

and consider the information operator $\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}$ acting on $H^{\eta}_{0}(\Omega)\subset L^{2}(\Omega)$ by restriction. For $\Delta$ the Laplacian, some $\eta_{0}\geq 0$ and all $\eta\geq\eta_{0}$, assume that

$$\mathcal{I}\equiv\Delta\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}} \tag{29}$$

defines a continuous linear homeomorphism from $H^{\eta}_{0}\to H^{\eta}_{0}$.

Remark 1 (Priors on subspaces).

In some applications it is of interest to consider $\theta'$ in (23) where a fixed subsequence of the $g_{j}$’s is set to zero, for instance to enforce the divergence free constraint in the Navier-Stokes model via the Stokes projector. In this case Condition 2 can be modified to hold with $H^{\gamma}_{0}(\Omega,W)$ replaced everywhere by the resulting closed subspace spanned by the remaining non-zero $e_{j}$’s. The proof of Theorem 2 below then still applies after only notational adjustments.

Remark 2 (Condition 2 and parabolic PDE).

Conditions A)-D) can be expected to follow from regularity estimates for ‘dissipative’ systems of the form (1), see e.g., [39, 15]. We will demonstrate in Sec.3 how this works for periodic reaction-diffusion equations (6), but similar estimates exist for McKean-Vlasov systems [34] and for $2d$-Navier-Stokes equations [36]. In contrast, Conditions E) and F) are more subtle and related to identifiability properties of the dynamical system, and the question of statistical consistency of likelihood-based inference methods – see [36] and also [32, 34] for related settings. Notably, [36] prove the consistency of the posterior distributions $\hat{\Pi}_{t,N}$ from (4) in the periodic $2d$-Navier-Stokes setting, based on a logarithmic stability estimate for $\theta\mapsto\mathscr{G}(\theta)$. The Lipschitz stability condition E) is, however, a stronger requirement, which we verify for reaction diffusion equations below. It is possible because we have access to time average measurements near the origin $t=0$ in (22) – note that the logarithmic lower bounds in [36] explicitly rule out such situations. Our proof relies on the symmetry of certain Schrödinger operators (116) appearing in the linearisation. While this technique may not apply directly to general PDEs (1), it is conceivable that Condition E) and the related injectivity part of Condition F) can still be verified without these symmetries. The surjectivity part of Condition F) should also hold for ‘dissipative’ parabolic systems such as Navier-Stokes dynamics, possibly as in Theorem 8 by comparing the linearised equation to a time-independent parabolic system – see also Remark 3. These important extensions will be investigated elsewhere.

We now state a Bernstein-von Mises theorem for the posterior on θ\theta in the ‘weak’ norm topology of the space Hk2H^{-k-2}. The posterior mean vector θ~N=EΠ[θ|Z(N)]\tilde{\theta}_{N}=E^{\Pi}[\theta|Z^{(N)}] exists as a Bochner integral in L02L^{2}_{0} and hence also in Hκ2H^{-\kappa-2}, while the limiting Gaussian measure 𝒩θ0\mathcal{N}_{\theta_{0}} is constructed in Proposition 2 as a tight Borel law on Hk2H^{-k-2}.

Theorem 2.

Suppose Conditions 1 and 2 hold for some integer γ>max(3ζ+(3d/2)+2,η0)\gamma>\max(3\zeta+(3d/2)+2,\eta_{0}) and k>max(ζ+(5d/2),η0)k>\max(\zeta+(5d/2),\eta_{0}). Let θ|Z(N)Π(|Z(N))\theta|Z^{(N)}\sim\Pi(\cdot|Z^{(N)}) with posterior from (24). For W1W_{1} the Wasserstein distance (20) on Hk2(Ω)H^{-k-2}(\Omega) we have as NN\to\infty

\[
W_{1}\big(Law(\sqrt{N}(\theta-\tilde{\theta}_{N})|Z^{(N)}),\mathcal{N}_{\theta_{0}}\big)\to^{P_{\theta_{0}}^{\mathbb{N}}}0
\]

as well as

\[
\sqrt{N}(\tilde{\theta}_{N}-\theta_{0})\to^{d}\mathcal{N}_{\theta_{0}}\text{ in }H^{-k-2}.
\]

The particular choice of $k$ is not important when combined with a parabolic smoothing argument such as the one used in the proof of Theorem 1 below. The lower bound on $\gamma$ could be improved slightly by introducing further technicalities (e.g., by using Schauder estimates rather than just energy estimates in Condition 2D).

The general proof strategy follows ideas developed in [6, 7, 30, 29], which derive Bernstein-von Mises theorems in a variety of ‘non-conjugate’ settings. Infinite-dimensional (‘functional’) results of this kind have so far only been obtained in ‘direct’ iid models in [6, 8] and for non-linear inverse problems in [30, 35], not covering Gaussian priors. Here we extend the ‘semi-parametric’ proof in [29] for fixed one-dimensional functionals $\langle\theta,\psi\rangle$ to hold uniformly in $\psi$ belonging to a unit ball of $H^{k+2}$, thereby obtaining a Gaussian approximation in the infinite-dimensional $\|\cdot\|_{H^{-k-2}}$ space. We also prove our results in the stronger Wasserstein distance rather than just in a metric for weak convergence, which allows us to obtain the asymptotics of the posterior mean and first moments.

2.2 Proof of Theorem 2

For the proof we need to fix a few constants in advance: given kk from the theorem and ζ>d/2\zeta>d/2 from Condition 2D, we take κ,κ¯\kappa,\bar{\kappa} satisfying

\[
2d+\zeta<\kappa<k-\frac{d}{2},\qquad \frac{d}{2}<\bar{\kappa}<\kappa-\frac{d}{2}. \tag{30}
\]

Further we take γ¯\bar{\gamma} satisfying

\[
3\zeta+d+2<\bar{\gamma}<\gamma-\frac{d}{2}, \tag{31}
\]

and assume without loss of generality that Condition 2 holds for this γ¯\bar{\gamma}. [We note that if Condition 2 holds for some γ¯\bar{\gamma}^{\prime}, then it also holds for all γ¯γ¯\bar{\gamma}\geq\bar{\gamma}^{\prime} because H0γ¯H0γ¯H_{0}^{\bar{\gamma}}\subset H_{0}^{\bar{\gamma}^{\prime}}.]
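As a quick sanity check on the constants, the following worked inequalities indicate why the intervals in (30) and (31) are non-empty under the hypotheses of Theorem 2. This is only a bookkeeping sketch; it uses nothing beyond the stated lower bounds on $k$ and $\gamma$ together with $\zeta>d/2>0$.

```latex
\begin{align*}
k > \zeta + \tfrac{5d}{2}
  &\iff 2d+\zeta < k-\tfrac{d}{2}
  && \text{(room for } \kappa \text{ in (30))},\\
\kappa > 2d+\zeta > d
  &\implies \tfrac{d}{2} < \kappa-\tfrac{d}{2}
  && \text{(room for } \bar{\kappa} \text{ in (30))},\\
\gamma > 3\zeta+\tfrac{3d}{2}+2
  &\iff 3\zeta+d+2 < \gamma-\tfrac{d}{2}
  && \text{(room for } \bar{\gamma} \text{ in (31))}.
\end{align*}
```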

Define the span in L2(Ω,W)L^{2}(\Omega,W) of the (dim(W)dim(W)-dimensional tensors of) trigonometric polynomials (15) as

EJ=span{ej:0jJ} with associated L2-projector PEJ:L2EJ.E_{J}=span\{e_{j}:0\leq j\leq J\}\text{ with associated }L^{2}\text{-projector }P_{E_{J}}:L^{2}\to E_{J}. (32)

Since the RKHS of the base prior from Condition 1 equals =H0γ(Ω)\mathcal{H}=H^{\gamma}_{0}(\Omega) with equivalent norm hγ\|\cdot\|_{h^{\gamma}}, and using (14), we have for every JJ\in\mathbb{N} and some c>0c>0 the standard approximation bounds

\[
\|P_{E_{J}}\phi-\phi\|_{L^{2}}\leq c\|\phi\|_{\mathcal{H}}J^{-\gamma},\qquad \|P_{E_{J}}\phi\|_{\mathcal{H}}\leq c\max(1,J^{\gamma-r})\|\phi\|_{H^{r}},\qquad \forall\phi\in L^{2}_{0}(\Omega), \tag{33}
\]

which we shall use repeatedly in the proofs.

2.2.1 Posterior contraction, regularity, and localisation

The first step is a global ‘consistency’ result about the contraction of the posterior distribution (24) towards the ‘ground truth’ initial condition θ0\theta_{0}. Proofs of this kind for ‘direct’ regression models follow ideas laid out in [19], specifically for Gaussian priors see [45]. In the context of (non-linear) inverse problems as relevant here, such proofs were developed in [28] and then, in a general setting in Chapter 2 in [31], assuming ‘stability’ estimates for the 𝒢\mathscr{G} map that we will verify using Condition 2E). A novel feature we require is uniform control of the interaction of the RKHS inner product ,\langle\cdot,\cdot\rangle_{\mathcal{H}} of the prior with test functions exhausting the dual norm of HκH^{-\kappa}.

For γ,δN\gamma,\delta_{N} from Condition 1 and γ¯,κ¯\bar{\gamma},\bar{\kappa} from (31), (30), define sequences

\[
\tilde{\delta}_{N}(\xi)=\delta_{N}^{(\bar{\gamma}-\xi)/(\bar{\gamma}+1)},\quad 0\leq\xi\leq\bar{\gamma},\qquad \tilde{\delta}_{N}(0)\equiv\tilde{\delta}_{N}, \tag{34}
\]

as well as

\[
J_{N}\in\mathbb{N},\quad J_{N}\simeq N\delta_{N}^{2},\qquad K_{N}=K_{N,\bar{\kappa}}=\sqrt{N}\delta_{N}\max(1,J_{N}^{\gamma-\bar{\kappa}}). \tag{35}
\]
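To get a feel for the size of these sequences, here is a minimal numerical sketch. It assumes the rate $\delta_N=N^{-\gamma/(2\gamma+d)}$, which is a guess at what Condition 1 prescribes and is not restated in this section; the function name `rates` and all numerical constants are hypothetical and chosen only to satisfy (30) and (31) for $d=1$.

```python
import math

def rates(N, gamma, gamma_bar, kappa_bar, d, xi=0.0):
    """Evaluate the sequences in (34)-(35) for one sample size N.

    ASSUMPTION: delta_N = N^{-gamma/(2*gamma+d)}; the actual delta_N is
    fixed in Condition 1, which is not restated here.
    """
    delta = N ** (-gamma / (2 * gamma + d))
    # (34): interpolated rate in the H^xi norm
    delta_tilde = delta ** ((gamma_bar - xi) / (gamma_bar + 1))
    # (35): projection dimension J_N ~ N * delta_N^2, at least 1
    J = max(1, round(N * delta ** 2))
    # (35): threshold K_N for the RKHS inner products
    K = math.sqrt(N) * delta * max(1.0, J ** (gamma - kappa_bar))
    return {"delta": delta, "delta_tilde": delta_tilde, "J": J, "K": K}

# Hypothetical constants for d = 1 and zeta = 0.6 (> d/2)
d = 1
gamma = 6         # integer > 3*zeta + 3d/2 + 2 = 5.3
gamma_bar = 5.4   # inside (3*zeta + d + 2, gamma - d/2) = (4.8, 5.5)
kappa_bar = 1.0   # inside (d/2, kappa - d/2) for some kappa > 2d + zeta

for N in (10**3, 10**5, 10**7):
    r = rates(N, gamma, gamma_bar, kappa_bar, d)
    print(N, r, "N*delta_tilde^3 =", N * r["delta_tilde"] ** 3)
```

Under these assumed constants the printed quantity $N\tilde\delta_N^3(0)$ decreases with $N$, which is the behaviour needed later when the cubic remainder of the likelihood expansion is bounded.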

In the following proof only Conditions 2A), B) and E) are used, and the hypothesis on γ\gamma from Theorem 2 could be somewhat weakened, as inspection of the proof shows.

Theorem 3.

For ψL02(Ω,W)\psi\in L_{0}^{2}(\Omega,W) define L2L^{2}-projections onto EJE_{J} from (32) as

pN(ψ)=PEJNψp_{N}(\psi)=P_{E_{J_{N}}}\psi (36)

Let γ¯,κ,κ¯\bar{\gamma},\kappa,\bar{\kappa} be as in (31), (30), the RKHS N\mathcal{H}_{N} as in Condition 1, and for L>0L>0 define measurable sets

\[
\begin{aligned}
\bar{\Theta}_{N}=\bar{\Theta}_{N,L}:={}&\Big\{\theta\in H^{1}_{0}:\|\theta\|_{H^{\bar{\gamma}}}\leq L,\ \sup_{\psi\in U(\kappa,1)}|\langle\theta,p_{N}(\psi)\rangle_{\mathcal{H}_{N}}|\leq L\sqrt{N}\delta_{N}K_{N,\bar{\kappa}}\Big\}\\
&\cap\Big\{\|\mathscr{G}(\theta)-\mathscr{G}(\theta_{0})\|_{L^{2}([0,T],L^{2}(\Omega))}\leq L\delta_{N},\ \|\theta-\theta_{0}\|_{H^{\xi}}\leq L\tilde{\delta}_{N}(\xi)\ \forall\, 0\leq\xi<\bar{\gamma}\Big\}.
\end{aligned}
\tag{37}
\]

Let the posterior measure Π(|Z(N))\Pi(\cdot|Z^{(N)}) be as in (24) arising from data (22) and prior Πγ,N\Pi_{\gamma,N} from Condition 1. Then for all b>0b>0 we can choose LL large enough but finite such that

Pθ0(Π(Θ¯N,L|Z(N))1ebNδN2)N0.P_{\theta_{0}}^{\mathbb{N}}\big{(}\Pi(\bar{\Theta}_{N,L}|Z^{(N)})\leq 1-e^{-bN\delta_{N}^{2}}\big{)}\to_{N\to\infty}0. (38)

We further have for any 1q<,0ξγ¯,1\leq q<\infty,0\leq\xi\leq\bar{\gamma}, that

\[
\|E^{\Pi}[\theta|Z^{(N)}]-\theta_{0}\|_{H^{\xi}}\leq\big(E^{\Pi}[\|\theta-\theta_{0}\|^{q}_{H^{\xi}}|Z^{(N)}]\big)^{1/q}=O_{P_{\theta_{0}}^{\mathbb{N}}}(\tilde{\delta}_{N}(\xi)). \tag{39}
\]
Proof.

We start with a preparatory inequality: Consider

\[
\begin{aligned}
\Pi(\Theta^{\prime}_{N})&\equiv\Pi\big(\theta:\sup_{\psi\in U(\kappa,1)}|\langle\theta,p_{N}(\psi)\rangle_{\mathcal{H}_{N}}|>L\sqrt{N}\delta_{N}K_{N}\big)\\
&=\Pi^{\prime}\big(\theta^{\prime}:\sup_{\psi\in U(\kappa,1)}|\langle\theta^{\prime},p_{N}(\psi)\rangle_{\mathcal{H}}|>LK_{N}\big),
\end{aligned}
\tag{40}
\]

with ,N=NδN2,\langle\cdot,\cdot\rangle_{\mathcal{H}_{N}}=N\delta_{N}^{2}\langle\cdot,\cdot\rangle_{\mathcal{H}} and θ=θ/NδN\theta=\theta^{\prime}/{\sqrt{N}\delta_{N}}. From the series expansion (23) for θΠ\theta^{\prime}\sim\Pi^{\prime} and (33) with r=κ¯r=\bar{\kappa} we know that

{X(ψ)=θ,pN(ψ):ψU(κ,1)}\{X(\psi)=\langle\theta^{\prime},p_{N}(\psi)\rangle_{\mathcal{H}}:\psi\in U(\kappa,1)\}

is a mean zero Gaussian process with covariance metric

dX2(ψ,ψ)=pN(ψ)pN(ψ)2max(1,JN2(γκ¯))ψψHκ¯2σN2ψψHκ¯2.d^{2}_{X}(\psi,\psi^{\prime})=\|p_{N}(\psi)-p_{N}(\psi^{\prime})\|_{\mathcal{H}}^{2}\lesssim\max(1,J_{N}^{2(\gamma-\bar{\kappa})})\|\psi-\psi^{\prime}\|^{2}_{H^{\bar{\kappa}}}\equiv\sigma_{N}^{2}\|\psi-\psi^{\prime}\|^{2}_{H^{\bar{\kappa}}}.

The unit ball U(κ,1)U(\kappa,1) of H0κH_{0}^{\kappa} has η\eta-covering numbers for Hκ¯H^{{\bar{\kappa}}}-distance of the order

N(U(κ,1),Hκ¯,η)exp{(A/η)d/(κκ¯)}, 0<η<A,A>0,N(U(\kappa,1),\|\cdot\|_{H^{\bar{\kappa}}},\eta)\lesssim\exp\{(A/\eta)^{d/(\kappa-\bar{\kappa})}\},\leavevmode\nobreak\ 0<\eta<A,\leavevmode\nobreak\ A>0,

see Prop. A.3.1 in [31]. In turn the logarithm of its η\eta^{\prime}-covering numbers for the dXd_{X} distance is of order

logN(U(κ,1),dX,η)(JNγκ¯Aη)d/(κκ¯), 0<η<JNγκ¯A,A>0,\log N(U(\kappa,1),d_{X},\eta^{\prime})\lesssim\left(\frac{J_{N}^{\gamma-\bar{\kappa}}A^{\prime}}{\eta^{\prime}}\right)^{d/(\kappa-\bar{\kappa})},\leavevmode\nobreak\ 0<\eta^{\prime}<J_{N}^{\gamma-\bar{\kappa}}A^{\prime},\leavevmode\nobreak\ \leavevmode\nobreak\ A^{\prime}>0,

and we can apply Dudley’s metric entropy bound, Theorem 2.3.7 in [20] with t0=ψ=0U(κ,1)t_{0}=\psi=0\in U(\kappa,1), the substitution v=ηJNγ+κ¯v=\eta J_{N}^{-\gamma+\bar{\kappa}}, as well as κκ¯>d/2\kappa-\bar{\kappa}>d/2 by (30), to obtain the moment bound

EsupψU(κ,1)|X(ψ)|0JNγκ¯A(JNγκ¯Aη)d2(κκ¯)𝑑ηJNγκ¯.E\sup_{\psi\in U(\kappa,1)}|X(\psi)|\lesssim\int_{0}^{J_{N}^{\gamma-\bar{\kappa}}A^{\prime}}\Big{(}\frac{J_{N}^{\gamma-\bar{\kappa}}A^{\prime}}{\eta}\Big{)}^{\frac{d}{2(\kappa-\bar{\kappa})}}d\eta\lesssim J_{N}^{\gamma-\bar{\kappa}}.
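The last inequality is an elementary integral. As a sketch, write $a=J_N^{\gamma-\bar\kappa}A'$ and $\beta=d/(2(\kappa-\bar\kappa))$ (both are shorthand introduced here only for illustration); then $\beta<1$ precisely because $\kappa-\bar\kappa>d/2$, and the entropy integral converges at zero:

```latex
\[
\int_{0}^{a}\Big(\frac{a}{\eta}\Big)^{\beta}d\eta
  = a^{\beta}\,\frac{a^{1-\beta}}{1-\beta}
  = \frac{a}{1-\beta}
  \simeq J_{N}^{\gamma-\bar{\kappa}},
  \qquad 0<\beta=\frac{d}{2(\kappa-\bar{\kappa})}<1 .
\]
```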

Then using the concentration inequality Theorem 2.1.20 in [20] for suprema of Gaussian processes we obtain for all $L$ large enough that the probability in (40) is bounded as

\[
\begin{aligned}
\Pi(\Theta^{\prime}_{N})&\leq\Pi^{\prime}\Big(\theta^{\prime}:\sup_{\psi\in U(\kappa,1)}|\langle\theta^{\prime},p_{N}(\psi)\rangle_{\mathcal{H}}|-E\sup_{\psi\in U(\kappa,1)}|\langle\theta^{\prime},p_{N}(\psi)\rangle_{\mathcal{H}}|>(L/2)K_{N}\Big)\\
&\lesssim e^{-cL^{2}K_{N}^{2}/\sigma_{N}^{2}}\lesssim e^{-c^{\prime}L^{2}N\delta_{N}^{2}}
\end{aligned}
\tag{41}
\]

for some c,c>0c,c^{\prime}>0.
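The exponent in the Gaussian tail bound can be read off directly from the definitions. The following sketch takes $\sigma_N^2=\max(1,J_N^{2(\gamma-\bar\kappa)})$ from the covariance metric bound (up to a constant) and $K_N$ from (35):

```latex
\[
\frac{K_{N}^{2}}{\sigma_{N}^{2}}
  = \frac{N\delta_{N}^{2}\,\max(1,J_{N}^{\gamma-\bar{\kappa}})^{2}}
         {\max(1,J_{N}^{2(\gamma-\bar{\kappa})})}
  = N\delta_{N}^{2},
\qquad\text{so}\qquad
e^{-cL^{2}K_{N}^{2}/\sigma_{N}^{2}} = e^{-cL^{2}N\delta_{N}^{2}} .
\]
```

The same comparison suggests why the centring step is harmless: the moment bound gives $E\sup_\psi|X(\psi)|\lesssim\max(1,J_N^{\gamma-\bar\kappa})=K_N/(\sqrt{N}\delta_N)$, which is at most $(L/2)K_N$ for $L$ large as long as $\sqrt{N}\delta_N$ stays bounded away from zero (an assumption on $\delta_N$ that is not restated in this section).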

We now apply Theorem 2.2.2 in [31] with parameter space Θ=H01\Theta=H^{1}_{0} and regularisation space =H0γ¯(Ω)\mathcal{R}=H_{0}^{\bar{\gamma}}(\Omega) (as remarked on p.31 in [31], the proof applies in our periodic setting just as well). We can verify Condition 2.1.1 in [31] for κ=0,𝒳=[0,T]×Ω,V=W,\kappa=0,\mathcal{X}=[0,T]\times\Omega,V=W, in view of Condition 2A) and B), and Condition 2.2.1 in [31] with such \mathcal{R} by Condition 1. Thus Theorem 2.2.2 from [31] implies for all LL large enough,

Π(θ(ΘN)c:θHγ¯L,𝒢(θ)𝒢(θ0)L2([0,T],L2(Ω))LδN|Z(N))Pθ01\Pi\Big{(}\theta\in(\Theta^{\prime}_{N})^{c}:\|\theta\|_{H^{\bar{\gamma}}}\leq L,\|\mathscr{G}(\theta)-\mathscr{G}(\theta_{0})\|_{L^{2}([0,T],L^{2}(\Omega))}\leq L\delta_{N}|Z^{(N)}\Big{)}\to^{P_{\theta_{0}}^{\mathbb{N}}}1 (42)

as NN\to\infty, in fact as in [31] with the required convergence rate bound, and restricting to (ΘN)c(\Theta^{\prime}_{N})^{c} being possible by (41) and the remark on p.33 in [31]. The proof further implies by virtue of (1.28) in [31] that

Pθ0(AN)N1 where AN={H01eN(θ)N(θ0)𝑑ΠN(θ)e(A+2)NδN2}P_{\theta_{0}}^{\mathbb{N}}\big{(}A_{N}\big{)}\to_{N\to\infty}1\textit{ where }A_{N}=\left\{\int_{H^{1}_{0}}e^{\ell_{N}(\theta)-\ell_{N}(\theta_{0})}d\Pi_{N}(\theta)\geq e^{-(A+2)N\delta_{N}^{2}}\right\} (43)

for some A=A(Π)>0A=A(\Pi^{\prime})>0, which we will use below.

The next step is based on Condition 2E) which implies for θ\theta in the event inside of the probability in (42) that

𝒢(θ)𝒢(θ0)L2(𝒳)cθθ0H1.\|\mathscr{G}(\theta)-\mathscr{G}(\theta_{0})\|_{L^{2}(\mathcal{X})}\geq c\|\theta-\theta_{0}\|_{H^{-1}}. (44)

From (19) with b=1,m=(γ¯ξ)/(γ¯+1)b=-1,m=(\bar{\gamma}-\xi)/(\bar{\gamma}+1) we deduce

θθ0Hξ𝒢(θ)𝒢(θ0)L2(𝒳)m\|\theta-\theta_{0}\|_{H^{\xi}}\lesssim\|\mathscr{G}(\theta)-\mathscr{G}(\theta_{0})\|^{m}_{L^{2}(\mathcal{X})} (45)

which combined with (42) implies the limit (38).
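To make the interpolation step explicit, here is a sketch under two assumptions that are not restated in this section: that (19) is the standard Sobolev interpolation inequality between $H^{-1}$ and $H^{\bar\gamma}$, and that $\theta_0\in H^{\bar\gamma}$. With $h=\theta-\theta_0$ and $m=(\bar\gamma-\xi)/(\bar\gamma+1)$, so that $\xi=-m+(1-m)\bar\gamma$:

```latex
\begin{align*}
\|h\|_{H^{\xi}}
  &\lesssim \|h\|_{H^{-1}}^{m}\,\|h\|_{H^{\bar{\gamma}}}^{1-m}
  && \text{(interpolation)}\\
  &\lesssim \|\mathscr{G}(\theta)-\mathscr{G}(\theta_{0})\|_{L^{2}(\mathcal{X})}^{m}\,
            \big(L+\|\theta_{0}\|_{H^{\bar{\gamma}}}\big)^{1-m}
  && \text{(by (44) and } \|\theta\|_{H^{\bar{\gamma}}}\leq L)\\
  &\lesssim \delta_{N}^{m}
   = \delta_{N}^{(\bar{\gamma}-\xi)/(\bar{\gamma}+1)}
   = \tilde{\delta}_{N}(\xi)
  && \text{(by (42) and (34))}.
\end{align*}
```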

It remains to prove (39). The first inequality follows from Jensen’s inequality. Then from the Cauchy-Schwarz inequality we can bound

\[
\begin{aligned}
\big(E^{\Pi}[\|\theta-\theta_{0}\|^{q}_{H^{\xi}}|Z^{(N)}]\big)^{2}&\leq L^{2q}\tilde{\delta}^{2q}_{N}(\xi)+E^{\Pi}[\|\theta-\theta_{0}\|_{H^{\xi}}^{2q}|Z^{(N)}]\,\Pi(\|\theta-\theta_{0}\|_{H^{\xi}}>L\tilde{\delta}_{N}(\xi)|Z^{(N)})\\
&\equiv L^{2q}\tilde{\delta}^{2q}_{N}(\xi)+t_{N}.
\end{aligned}
\tag{46}
\]

For the second summand we use (38), (43), Markov’s inequality, Fubini’s theorem, (24), $E_{\theta_{0}}^{\mathbb{N}}[e^{\ell_{N}(\theta)-\ell_{N}(\theta_{0})}]\leq 1$ and finiteness of all $\|\cdot\|_{H^{\xi}}$-norm moments of $\Pi$ from Condition 1 (cf. Exercise 2.1.5 in [20]) to obtain

\[
\begin{aligned}
P_{\theta_{0}}^{\mathbb{N}}\big(t_{N}\geq\tilde{\delta}^{2q}_{N}(\xi)\big)&\leq P_{\theta_{0}}^{\mathbb{N}}\Big(e^{(A+2-b)N\delta_{N}^{2}}\int\|\theta-\theta_{0}\|_{H^{\xi}}^{2q}e^{\ell_{N}(\theta)-\ell_{N}(\theta_{0})}d\Pi(\theta)>\tilde{\delta}^{2q}_{N}(\xi)\Big)+o(1)\\
&\leq e^{(A+2-b)N\delta_{N}^{2}}\tilde{\delta}^{-2q}_{N}(\xi)\int\|\theta-\theta_{0}\|_{H^{\xi}}^{2q}d\Pi(\theta)+o(1)\to_{N\to\infty}0
\end{aligned}
\]

for LL and then bb large enough, so that (39) follows. ∎

We can use the previous contraction theorem to show that the posterior measure is asymptotically equivalent to a hypothetical posterior ΠΘ¯N(|Z(N))\Pi^{\bar{\Theta}_{N}}(\cdot|Z^{(N)}) arising from (24) with ‘localised’ prior

ΠΘ¯N=Π(Θ¯N)Π(Θ¯N)\Pi^{\bar{\Theta}_{N}}=\frac{\Pi(\cdot\cap\bar{\Theta}_{N})}{\Pi(\bar{\Theta}_{N})} (47)

restricted to the regularisation set Θ¯N=Θ¯N,L\bar{\Theta}_{N}=\bar{\Theta}_{N,L} from (37). Specifically, from the arguments on p.142 in [44], Theorem 3 then implies that for any b>0b>0 and L(b)L(b) large enough, with Pθ0P_{\theta_{0}}^{\mathbb{N}}-probability approaching one as NN\to\infty,

Π(|Z(N))ΠΘ¯N,L(|Z(N))TVΠ(Θ¯N,Lc|Z(N))ebNδN2\|\Pi(\cdot|Z^{(N)})-\Pi^{\bar{\Theta}_{N,L}}(\cdot|Z^{(N)})\|_{TV}\leq\Pi(\bar{\Theta}_{N,L}^{c}|Z^{(N)})\lesssim e^{-bN\delta_{N}^{2}} (48)

where TV\|\cdot\|_{TV} is the usual total variation norm on the space of probability measures on L02(Ω)L^{2}_{0}(\Omega).

Now if we denote by $\tau_{N}=Law(\sqrt{N}(\theta-T))$ the (conditional) law of $\sqrt{N}(\theta-T)$ for $\theta\sim\Pi(\cdot|Z^{(N)})$ and any fixed re-centring $T\in H^{-k-2}$, and if we denote by $\bar{\tau}_{N}$ the corresponding law where $\theta$ is replaced by a draw $\theta\sim\Pi^{\bar{\Theta}_{N}}(\cdot|Z^{(N)})$, then we obtain for the Wasserstein distance $W_{1}=W_{1,H^{-k-2}}$ featuring in Theorem 2 the following approximation:

Proposition 1.

For any THk2T\in H^{-k-2} and some c>0c>0 we have as NN\to\infty

W1(τN,τ¯N)=OPθ0(ecNδN2).W_{1}(\tau_{N},\bar{\tau}_{N})=O_{P_{\theta_{0}}^{\mathbb{N}}}(e^{-cN\delta_{N}^{2}}).
Proof.

We use Theorem 6.15 in [46] to obtain for ϑ0=N(θ0T)\vartheta_{0}=\sqrt{N}(\theta_{0}-T) the inequality

\[
\begin{aligned}
W_{1}(\tau_{N},\bar{\tau}_{N})&\leq\int_{H^{-k-2}}\|\vartheta-\vartheta_{0}\|_{H^{-k-2}}\,d|\tau_{N}-\bar{\tau}_{N}|(\vartheta)\\
&\leq\|\Pi(\cdot|Z^{(N)})-\Pi^{\bar{\Theta}_{N,L}}(\cdot|Z^{(N)})\|_{TV}\\
&\quad+\int_{\|\vartheta-\vartheta_{0}\|_{H^{-k-2}}>1}\|\vartheta-\vartheta_{0}\|_{H^{-k-2}}\,d\tau_{N}(\vartheta)+\int_{\|\vartheta-\vartheta_{0}\|_{H^{-k-2}}>1}\|\vartheta-\vartheta_{0}\|_{H^{-k-2}}\,d\bar{\tau}_{N}(\vartheta).
\end{aligned}
\tag{49}
\]

From (48) the first term is of order ebNδN20e^{-bN\delta_{N}^{2}}\to 0 as NN\to\infty. For the second term we have from ϑϑ0=N(θθ0)\vartheta-\vartheta_{0}=\sqrt{N}(\theta-\theta_{0}), the Cauchy-Schwarz inequality, the continuous imbedding L2Hk2L^{2}\subset H^{-k-2} and Theorem 3, for all NN large enough,

\[
\begin{aligned}
&\sqrt{N}E^{\Pi}[\|\theta-\theta_{0}\|_{H^{-k-2}}1_{\{\|\theta-\theta_{0}\|_{H^{-k-2}}>1\}}|Z^{(N)}]\\
&\leq\sqrt{N}\big(E^{\Pi}[\|\theta-\theta_{0}\|_{L^{2}}^{2}|Z^{(N)}]\big)^{1/2}\big(\Pi(\|\theta-\theta_{0}\|_{L^{2}}>1|Z^{(N)})\big)^{1/2}=O_{P_{\theta_{0}}^{\mathbb{N}}}\big(e^{-bN\delta_{N}^{2}/2}\tilde{\delta}_{N}N^{1/2}\big).
\end{aligned}
\]

Finally, by definition of Θ¯N\bar{\Theta}_{N} and since δ~N=o(1)\tilde{\delta}_{N}=o(1), we have ΠΘ¯N(θθ0Hk2>1|Z(N))=0\Pi^{\bar{\Theta}_{N}}(\|\theta-\theta_{0}\|_{H^{-k-2}}>1|Z^{(N)})=0 from some NN onwards so the third term equals zero eventually, and the result is proved. ∎

The previous result holds in fact for the W1,L02W_{1,L^{2}_{0}}-distance as long as TL02T\in L^{2}_{0}. We conclude that it suffices to prove the first limit in Theorem 2 for τ¯N\bar{\tau}_{N} rather than τN\tau_{N}.

2.2.2 Inverting the Fisher information operator, and the limiting process

Let ψH0η+2(Ω)\psi\in H_{0}^{\eta+2}(\Omega) so that Δψ\Delta\psi lies in H0η(Ω)H^{\eta}_{0}(\Omega). If ηη0\eta\geq\eta_{0} then Condition 2F) implies that there exists ψ¯H0η(Ω)\bar{\psi}\in H^{\eta}_{0}(\Omega) such that ψ¯=Δψ\mathcal{I}\bar{\psi}=\Delta\psi, which we write as

ψ¯=1Δψ,\bar{\psi}=\mathcal{I}^{-1}\Delta\psi, (50)

for continuous inverse 1\mathcal{I}^{-1}. By definition of \mathcal{I} this implies

Δ𝕀θ0𝕀θ0(ψ¯)=ΔψΔ(𝕀θ0𝕀θ0(ψ¯)ψ)=0\Delta\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}(\bar{\psi})=\Delta\psi\leavevmode\nobreak\ \Rightarrow\leavevmode\nobreak\ \Delta(\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}(\bar{\psi})-\psi)=0

and therefore (e.g., by the maximum principle, or examining Fourier coefficients)

𝕀θ0𝕀θ0(ψ¯)ψ=cψ where cψ is constant on Ω.\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}(\bar{\psi})-\psi=c_{\psi}\text{ where }c_{\psi}\text{ is constant on }\Omega.

[Since Ωψ=0\int_{\Omega}\psi=0 this constant equals the first Fourier coefficient cψ=𝕀θ0𝕀θ0(ψ¯),1L2c_{\psi}=\langle\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}(\bar{\psi}),1\rangle_{L^{2}}, but we shall not need this here.] We can write, for all hL02(Ω)h\in L^{2}_{0}(\Omega),

𝕀θ0h,𝕀θ0ψ¯L2([0,T],L2(Ω))=h,𝕀θ0𝕀θ0ψ¯L2(Ω)=h,𝕀θ0𝕀θ0ψ¯cψL2(Ω)=h,ψL2,\langle\mathbb{I}_{\theta_{0}}h,\mathbb{I}_{\theta_{0}}\bar{\psi}\rangle_{L^{2}([0,T],L^{2}(\Omega))}=\langle h,\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}\bar{\psi}\rangle_{L^{2}(\Omega)}=\langle h,\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}\bar{\psi}-c_{\psi}\rangle_{L^{2}(\Omega)}=\langle h,\psi\rangle_{L^{2}}, (51)

so ψ¯\bar{\psi} acts as the inverse (𝕀θ0𝕀θ0)1ψ(\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}})^{-1}\psi for such ψ\psi. We also have, by continuity of 1\mathcal{I}^{-1},

ψ¯Hη=1[Δψ]HηΔψHηψHη+2,ψH0η+2.\|\bar{\psi}\|_{H^{\eta}}=\|\mathcal{I}^{-1}[\Delta\psi]\|_{H^{\eta}}\lesssim\|\Delta\psi\|_{H^{\eta}}\lesssim\|\psi\|_{H^{\eta+2}},\leavevmode\nobreak\ \leavevmode\nobreak\ \psi\in H^{\eta+2}_{0}. (52)

What precedes will be central to our proof of Theorem 2, and is also the key to constructing the limiting Gaussian measure $\mathcal{N}_{\theta_{0}}$: For $g,h\in L^{2}_{0}(\Omega)\cap C^{\infty}(\Omega)$, let $\bar{g}=\mathcal{I}^{-1}[\Delta g]$, $\bar{h}=\mathcal{I}^{-1}[\Delta h]$ and define a mean zero Gaussian process $\mathbb{W}$ with covariance

E𝕎(g)𝕎(h)\displaystyle E\mathbb{W}(g)\mathbb{W}(h) =𝕀θ0g¯,𝕀θ0h¯L2([0,T],L2(Ω))=g¯,𝕀θ0𝕀θ0h¯L2(Ω),g,hL02(Ω)C.\displaystyle=\langle\mathbb{I}_{\theta_{0}}\bar{g},\mathbb{I}_{\theta_{0}}\bar{h}\rangle_{L^{2}([0,T],L^{2}(\Omega))}=\langle\bar{g},\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}\bar{h}\rangle_{L^{2}(\Omega)},\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ g,h\in L^{2}_{0}(\Omega)\cap C^{\infty}.

The induced law defines a cylindrical probability measure on $\mathbb{R}^{\mathbb{N}}$ by the action $Z=(Z_{j}=\mathbb{W}(e_{j}):j\geq 1)$ of $\mathbb{W}$ on the orthonormal basis functions $e_{j}\in C^{\infty}$ from (15). Its restriction to the subspace $H_{0}^{-a}$ of $\mathbb{R}^{\mathbb{N}}$ exists almost surely since, using the definitions, Parseval’s identity, (14) and the Cauchy-Schwarz inequality,

\[
\begin{aligned}
E\|Z\|_{h^{-a}}^{2}&\lesssim\sum_{j\geq 1}\lambda_{j}^{-a}E|\mathbb{W}(e_{j})|^{2}=\sum_{j\geq 1}\lambda_{j}^{-a}\langle\mathcal{I}^{-1}\Delta e_{j},\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}\mathcal{I}^{-1}\Delta e_{j}\rangle_{L^{2}(\Omega)}\\
&=\sum_{j\geq 1}\lambda_{j}^{-a-1}\langle\mathcal{I}^{-1}\Delta e_{j},\Delta\mathbb{I}_{\theta_{0}}^{*}\mathbb{I}_{\theta_{0}}\mathcal{I}^{-1}\Delta e_{j}\rangle_{L^{2}(\Omega)}=\sum_{j\geq 1}\lambda_{j}^{-a+1}\langle\mathcal{I}^{-1}e_{j},e_{j}\rangle_{L^{2}(\Omega)}\\
&\leq\sum_{j\geq 1}\lambda_{j}^{-a+1}\|\mathcal{I}^{-1}e_{j}\|_{h^{\eta_{0}}}\|e_{j}\|_{h^{-\eta_{0}}}\lesssim\sum_{j\geq 1}\lambda_{j}^{-a+1}\|e_{j}\|_{h^{\eta_{0}}}\|e_{j}\|_{h^{-\eta_{0}}}\\
&\leq\sum_{j\geq 1}\lambda_{j}^{-a+1}<\infty
\end{aligned}
\]

whenever 2(a1)/d>12(a-1)/d>1. Therefore, using that H0aH_{0}^{-a} is separable and standard results on Gaussian measures (e.g., Section 2.1 in [20]), we obtain:

Proposition 2.

The cylindrical law of 𝕎\mathbb{W} defines a tight centred Gaussian Borel probability measure 𝒩θ0\mathcal{N}_{\theta_{0}} on Ha(Ω)H^{-a}(\Omega) for every a>1+d/2a>1+d/2.
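The threshold $a>1+d/2$ can be traced back to the growth of the Laplacian eigenvalues. As a sketch, assume that (14) encodes the usual Weyl asymptotics $\lambda_j\simeq j^{2/d}$ (this is not restated in the present section); then the series bounding $E\|Z\|_{h^{-a}}^2$ is a $p$-series:

```latex
\[
\sum_{j\geq 1}\lambda_{j}^{-a+1}
  \simeq \sum_{j\geq 1} j^{-2(a-1)/d} < \infty
\iff \frac{2(a-1)}{d} > 1
\iff a > 1+\frac{d}{2}.
\]
```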

One may show further that a>1+d/2a>1+d/2 is necessary for this result to be true (similar to [30], Proposition 6), in particular 𝕎\mathbb{W} does not extend to a Gaussian random function with sample paths in L2(Ω)L^{2}(\Omega).

2.2.3 Perturbation expansion of the posterior Laplace transform

We now turn to an asymptotic approximation of the Laplace transform of ΠΘ¯N(|Z(N))\Pi^{\bar{\Theta}_{N}}(\cdot|Z^{(N)}). For any fixed ψHκ+2(Ω)\psi\in H^{\kappa+2}(\Omega) with κ\kappa as in (30), let

ψ¯=ψ¯θ0=1Δψ,\bar{\psi}=\bar{\psi}_{\theta_{0}}=\mathcal{I}^{-1}\Delta\psi,

which by (52) and the Sobolev imbedding theorem defines an element of

ψ¯θ0H0κ(Ω)C2(Ω)L2(Ω).\bar{\psi}_{\theta_{0}}\in H^{\kappa}_{0}(\Omega)\subset C^{2}(\Omega)\subset L^{2}(\Omega).

Moreover, recall that PEJP_{E_{J}} denotes the L2L^{2}-projector onto EJ=span{ej:0jJ}E_{J}=span\{e_{j}:0\leq j\leq J\}. For JNNδN2J_{N}\simeq N\delta_{N}^{2} from (35) let us write pN()=PEJN()p_{N}(\cdot)=P_{E_{J_{N}}}(\cdot) and for any θHγ¯(Ω)\theta\in H^{\bar{\gamma}}(\Omega) define

θ(t,ψ):=θtNpN(ψ¯θ0),t,\theta_{(t,\psi)}:=\theta-\frac{t}{\sqrt{N}}p_{N}(\bar{\psi}_{\theta_{0}}),\leavevmode\nobreak\ t\in\mathbb{R}, (53)

which again lies in the support H0γ¯(Ω)H_{0}^{\bar{\gamma}}(\Omega) of the prior (noting also that pN(h),e0L2=0\langle p_{N}(h),e_{0}\rangle_{L^{2}}=0 whenever h,e0L2=0\langle h,e_{0}\rangle_{L^{2}}=0). Then for 𝕀θ0\mathbb{I}_{\theta_{0}} as in (27) define

Ψ^N=ψ,θ0L2+1Ni=1N𝕀θ0pN(ψ¯θ0)(Xi),εiW.\hat{\Psi}_{N}=\langle\psi,\theta_{0}\rangle_{L^{2}}+\frac{1}{N}\sum_{i=1}^{N}\langle\mathbb{I}_{\theta_{0}}p_{N}(\bar{\psi}_{\theta_{0}})(X_{i}),\varepsilon_{i}\rangle_{W}. (54)

It will be shown in (81) that N(Ψ^Nθ0,ψL2)\sqrt{N}(\hat{\Psi}_{N}-\langle\theta_{0},\psi\rangle_{L^{2}}) converges in distribution for fixed ψC\psi\in C^{\infty} to the 𝒩(0,𝕀θ0ψ¯θ0L2(𝒳)2)\mathcal{N}(0,\|\mathbb{I}_{\theta_{0}}\bar{\psi}_{\theta_{0}}\|_{L^{2}(\mathcal{X})}^{2}) distribution, so Ψ^N\hat{\Psi}_{N} serves as an appropriate centering for a Bernstein-von Mises theorem.

Theorem 4.

Let ψHκ+2(Ω)\psi\in H^{\kappa+2}(\Omega) and consider the localised posterior from (48) for any L>0L>0. Then we have for every tt\in\mathbb{R} that

EΠΘ¯N[exp{tN(θ,ψL2Ψ^N)}|Z(N)]=et22𝕀θ0(ψ¯)L2(𝒳)2×Θ¯NeN(θ(t,ψ))𝑑Π(θ)Θ¯NeN(θ)𝑑Π(θ)×erN(t,ψ)\displaystyle E^{\Pi^{\bar{\Theta}_{N}}}\big{[}\exp\big{\{}t\sqrt{N}\big{(}\langle\theta,\psi\rangle_{L^{2}}-\hat{\Psi}_{N}\big{)}\big{\}}|Z^{(N)}\big{]}=e^{\frac{t^{2}}{2}\|\mathbb{I}_{\theta_{0}}(\bar{\psi})\|_{L^{2}(\mathcal{X})}^{2}}\times\frac{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta_{(t,\psi)})}d\Pi(\theta)}{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta)}d\Pi(\theta)}\times e^{r_{N}(t,\psi)}

where N\ell_{N} is as in (24), and where for every B>0,tB>0,t\in\mathbb{R},

supψHκ+2BrN(t,ψ)=oPθ0(1), as N.\sup_{\|\psi\|_{H^{\kappa+2}}\leq B}r_{N}(t,\psi)=o_{P_{\theta_{0}}^{\mathbb{N}}}(1),\leavevmode\nobreak\ \leavevmode\nobreak\ \text{ as }N\to\infty.
Proof.

We can restrict to ψU(κ+2,B)\psi\in U(\kappa+2,B) since Ωθ=0\int_{\Omega}\theta=0 and ΔΩψ=0\Delta\int_{\Omega}\psi=0. To prove the theorem we expand as on p.72 in [31],

N(θ)N(θ(t))=I+II\ell_{N}(\theta)-\ell_{N}(\theta_{(t)})=I+II

where

I=tNi=1Nεi,𝕀θ0pN(ψ¯θ0)(Xi)W+R0,N(θ,ψ)Rt,N(θ,ψ)I=\frac{t}{\sqrt{N}}\sum_{i=1}^{N}\langle\varepsilon_{i},\mathbb{I}_{\theta_{0}}p_{N}(\bar{\psi}_{\theta_{0}})(X_{i})\rangle_{W}+R_{0,N}(\theta,\psi)-R_{t,N}(\theta,\psi) (55)

with

Rt,N(θ,ψ)=i=1Nεi,𝒢(θ(t,ψ))(Xi)𝒢(θ0)(Xi)D𝒢θ0(Xi)[θ(t,ψ)θ0]W.R_{t,N}(\theta,\psi)=\sum_{i=1}^{N}\langle\varepsilon_{i},\mathscr{G}(\theta_{(t,\psi)})(X_{i})-\mathscr{G}(\theta_{0})(X_{i})-D\mathscr{G}_{\theta_{0}}(X_{i})[\theta_{(t,\psi)}-\theta_{0}]\rangle_{W}. (56)

and where

II=N2𝒢(θ0)𝒢(θ)L2(𝒳)2+N2𝒢(θ0)𝒢(θ(t,ψ))L2(𝒳)2+W0,N(θ,ψ)+Wt,N(θ,ψ)II=-\frac{N}{2}\|\mathscr{G}(\theta_{0})-\mathscr{G}(\theta)\|_{L^{2}(\mathcal{X})}^{2}+\frac{N}{2}\|\mathscr{G}(\theta_{0})-\mathscr{G}(\theta_{(t,\psi)})\|_{L^{2}(\mathcal{X})}^{2}+W_{0,N}(\theta,\psi)+W_{t,N}(\theta,\psi) (57)

with

|Wt,N(θ,ψ)|=|i=1N[|𝒢(θ0)(Xi)𝒢(θ(t,ψ))(Xi)|W2E|𝒢(θ0)(Xi)𝒢(θ(t,ψ))(Xi)|W2]|.|W_{t,N}(\theta,\psi)|=\big{|}\sum_{i=1}^{N}\big{[}|\mathscr{G}(\theta_{0})(X_{i})-\mathscr{G}(\theta_{(t,\psi)})(X_{i})|_{W}^{2}-E\big{|}\mathscr{G}(\theta_{0})(X_{i})-\mathscr{G}(\theta_{(t,\psi)})(X_{i})|_{W}^{2}\big{]}\big{|}.

From Markov’s inequality and Lemma 1 below we deduce that the remainder terms Rt,N,Wt,NR_{t,N},W_{t,N} are all oPθ0(1)o_{P_{\theta_{0}}^{\mathbb{N}}}(1) uniformly in θΘ¯N\theta\in\bar{\Theta}_{N} and ψU(κ+2,B)\psi\in U(\kappa+2,B).

Next, using Condition 2C, the first two summands in IIII can be expanded, as on p.73 in [31], as

tN𝕀θ0(θθ0),𝕀θ0pN[ψ¯]L2+t22𝕀θ0pN(ψ¯)L22+VN-t\sqrt{N}\langle\mathbb{I}_{\theta_{0}}(\theta-\theta_{0}),\mathbb{I}_{\theta_{0}}p_{N}[\bar{\psi}]\rangle_{L^{2}}+\frac{t^{2}}{2}\|\mathbb{I}_{\theta_{0}}p_{N}(\bar{\psi})\|_{L^{2}}^{2}+V_{N} (58)

where in view of (34), the hypothesis

γ>3ζ+3d2+2>3d+2γ2γ+dγd/2γ+1d/2>13,\gamma>3\zeta+\frac{3d}{2}+2>3d+2\Rightarrow\frac{\gamma}{2\gamma+d}\frac{\gamma-d/2}{\gamma+1-d/2}>\frac{1}{3},

the Cauchy-Schwarz inequality and continuity of 𝕀θ0:L2(Ω)L2([0,T],L2(Ω))\mathbb{I}_{\theta_{0}}:L^{2}(\Omega)\to L^{2}([0,T],L^{2}(\Omega)), we have for θΘ¯N\theta\in\bar{\Theta}_{N}, ψU(κ+2,B)\psi\in U(\kappa+2,B),

VNNθ(t,ψ)θ0L23N(θθ0L2+tNpN(ψ¯)L2)3Nδ~N3(0)=o(1).V_{N}\lesssim N\|\theta_{(t,\psi)}-\theta_{0}\|_{L^{2}}^{3}\lesssim N\big{(}\|\theta-\theta_{0}\|_{L^{2}}+\frac{t}{\sqrt{N}}\|p_{N}(\bar{\psi})\|_{L^{2}}\big{)}^{3}\lesssim N\tilde{\delta}^{3}_{N}(0)=o(1).
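The final step $N\tilde\delta_N^3(0)=o(1)$ rests on the exponent inequality displayed in the hypothesis above. Assuming $\delta_N=N^{-\gamma/(2\gamma+d)}$ (a guess at Condition 1, which is not restated here), one has $N\tilde\delta_N^3(0)=N^{1-3\frac{\gamma}{2\gamma+d}\frac{\bar\gamma}{\bar\gamma+1}}$, so the product of the two fractions must exceed $1/3$. The sketch below checks the displayed product, with $\bar\gamma$ replaced by its upper limit $\gamma-d/2$, at the boundary value $\gamma=3d+2$ in exact arithmetic; the helper name `exponent_product` is hypothetical.

```python
from fractions import Fraction

def exponent_product(gamma, d):
    """Return (gamma/(2*gamma+d)) * ((gamma-d/2)/(gamma+1-d/2)) exactly."""
    g, dd = Fraction(gamma), Fraction(d)
    return (g / (2 * g + dd)) * ((g - dd / 2) / (g + 1 - dd / 2))

for d in range(1, 6):
    gamma = 3 * d + 2  # boundary of the hypothesis gamma > 3d + 2
    p = exponent_product(gamma, d)
    # Both factors increase with gamma, so the boundary is the worst case.
    print(d, gamma, p, float(p), p > Fraction(1, 3))
```

Since both factors are increasing in $\gamma$, the inequality at the boundary carries over to every $\gamma>3d+2$; passing from $\gamma-d/2$ to an admissible $\bar\gamma$ from (31) then requires $\bar\gamma$ to be chosen close enough to $\gamma-d/2$.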

This bound is further uniform in ψU(κ+2,B)\psi\in U(\kappa+2,B) as this and (30) imply that ψ¯\bar{\psi} and then also pN(ψ¯)p_{N}(\bar{\psi}) are uniformly bounded in L2L^{2} by (52). Similarly since

N(δ~N(0)JNκ+JN2κ)=o(1)\sqrt{N}(\tilde{\delta}_{N}(0)J_{N}^{-\kappa}+J_{N}^{-2\kappa})=o(1)

in view of (34), (31), (30) and using also (33) and again continuity of 𝕀θ0\mathbb{I}_{\theta_{0}}, we can replace (58) by

tN𝕀θ0(θθ0),𝕀θ0ψ¯L2+t22𝕀θ0[ψ¯]L22+o(1).-t\sqrt{N}\langle\mathbb{I}_{\theta_{0}}(\theta-\theta_{0}),\mathbb{I}_{\theta_{0}}\bar{\psi}\rangle_{L^{2}}+\frac{t^{2}}{2}\|\mathbb{I}_{\theta_{0}}[\bar{\psi}]\|_{L^{2}}^{2}+o(1). (59)

To conclude let us write

W^N=tNi=1N𝕀θ0pN(ψ¯θ0)(Xi),εiW\hat{W}_{N}=\frac{t}{\sqrt{N}}\sum_{i=1}^{N}\langle\mathbb{I}_{\theta_{0}}p_{N}(\bar{\psi}_{\theta_{0}})(X_{i}),\varepsilon_{i}\rangle_{W}

for the first term in (55). Then using the definitions (24), (54) as well as the identity (51) with h=θθ0h=\theta-\theta_{0}, we obtain

\[
\begin{aligned}
&E^{\Pi^{\bar{\Theta}_{N}}}\big[\exp\{t\sqrt{N}\big(\langle\theta,\psi\rangle_{L^{2}}-\hat{\Psi}_{N}\big)\}|Z^{(N)}\big]\\
&=\frac{\int_{\bar{\Theta}_{N}}e^{t\sqrt{N}\langle\theta-\theta_{0},\psi\rangle_{L^{2}}-\hat{W}_{N}+\ell_{N}(\theta)-\ell_{N}(\theta_{(t,\psi)})}e^{\ell_{N}(\theta_{(t,\psi)})}d\Pi(\theta)}{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta)}d\Pi(\theta)}\\
&=e^{\frac{t^{2}}{2}\|\mathbb{I}_{\theta_{0}}(\bar{\psi})\|_{L^{2}(\mathcal{X})}^{2}}\times\frac{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta_{(t,\psi)})}d\Pi(\theta)}{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta)}d\Pi(\theta)}\times e^{r_{N}(t,\psi)},
\end{aligned}
\]

for a sequence rN(t,ψ)r_{N}(t,\psi) with the desired properties, so the result follows. ∎
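For the reader's convenience, the exponent bookkeeping behind the last identity of the proof can be sketched as follows. Here $R_N$ is shorthand, introduced only for this illustration, for the sum of all remainder terms from (55), (57), (58) and (59), which are $o_{P_{\theta_0}^{\mathbb{N}}}(1)$ uniformly in $\theta\in\bar\Theta_N$ and $\psi\in U(\kappa+2,B)$.

```latex
\begin{align*}
t\sqrt{N}\big(\langle\theta,\psi\rangle_{L^{2}}-\hat{\Psi}_{N}\big)
  &= t\sqrt{N}\langle\theta-\theta_{0},\psi\rangle_{L^{2}}-\hat{W}_{N}
  && \text{(by (54))},\\
\ell_{N}(\theta)-\ell_{N}(\theta_{(t,\psi)})
  &= \hat{W}_{N}
     - t\sqrt{N}\langle\mathbb{I}_{\theta_{0}}(\theta-\theta_{0}),
                        \mathbb{I}_{\theta_{0}}\bar{\psi}\rangle_{L^{2}}
     + \tfrac{t^{2}}{2}\|\mathbb{I}_{\theta_{0}}\bar{\psi}\|_{L^{2}}^{2}
     + R_{N}
  && \text{(by (55), (59))}\\
  &= \hat{W}_{N}
     - t\sqrt{N}\langle\theta-\theta_{0},\psi\rangle_{L^{2}}
     + \tfrac{t^{2}}{2}\|\mathbb{I}_{\theta_{0}}\bar{\psi}\|_{L^{2}}^{2}
     + R_{N}
  && \text{(by (51))}.
\end{align*}
```

Adding the two lines, the terms $\pm\hat W_N$ and $\pm t\sqrt{N}\langle\theta-\theta_0,\psi\rangle_{L^2}$ cancel, leaving $\tfrac{t^2}{2}\|\mathbb{I}_{\theta_0}\bar\psi\|_{L^2}^2+R_N$ in the exponent; the uniformity of $R_N$ over $\bar\Theta_N$ is what allows it to be pulled out of the integral as the factor $e^{r_N(t,\psi)}$.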

The following lemma contains the main technical work to control the stochastic remainder terms in (55), (57) in the previous proof. It is based on applying inequalities from empirical process theory (Ch.3 in [20]). We need to upgrade the proofs of Lemmata 4.1.7 and 4.1.8 in [31] to include the additional uniformity in ψ\psi, and to hold under our weaker hypothesis (28).

Lemma 1.

Let κ\kappa be as in (30) and Θ¯N,L\bar{\Theta}_{N,L} as in (37) with γ¯\bar{\gamma} as in (31). We have for every fixed tt\in\mathbb{R} and L>0L>0 that

EsupθΘ¯N,L,ψU(κ+2,1)|i=1Nεi,𝒢(θ(t,ψ))(Xi)𝒢(θ0)(Xi)D𝒢θ0(Xi)[θ(t,ψ)θ0]W|=o(1)E\sup_{\theta\in\bar{\Theta}_{N,L},\psi\in U(\kappa+2,1)}\Big{|}\sum_{i=1}^{N}\langle\varepsilon_{i},\mathscr{G}(\theta_{(t,\psi)})(X_{i})-\mathscr{G}(\theta_{0})(X_{i})-D\mathscr{G}_{\theta_{0}}(X_{i})[\theta_{(t,\psi)}-\theta_{0}]\rangle_{W}\Big{|}=o(1)

as well as

EsupθΘ¯N,L,ψU(κ+2,1)|i=1N[|𝒢(θ0)(Xi)𝒢(θ(t,ψ))(Xi)|W2E|𝒢(θ0)(Xi)𝒢(θ(t,ψ))(Xi)|W2]|=o(1).E\sup_{\theta\in\bar{\Theta}_{N,L},\psi\in U(\kappa+2,1)}\Big{|}\sum_{i=1}^{N}\big{[}|\mathscr{G}(\theta_{0})(X_{i})-\mathscr{G}(\theta_{(t,\psi)})(X_{i})|_{W}^{2}-E\big{|}\mathscr{G}(\theta_{0})(X_{i})-\mathscr{G}(\theta_{(t,\psi)})(X_{i})|_{W}^{2}\big{]}\Big{|}=o(1).
Proof.

Let us deal with the first supremum: setting

gθ,ψ=𝒢(θtNpN(ψ¯θ0))𝒢(θ0)D𝒢θ0[θtNpN(ψ¯θ0)θ0]g_{\theta,\psi}=\mathscr{G}\big{(}\theta-\frac{t}{\sqrt{N}}p_{N}(\bar{\psi}_{\theta_{0}})\big{)}-\mathscr{G}(\theta_{0})-D\mathscr{G}_{\theta_{0}}\Big{[}\theta-\frac{t}{\sqrt{N}}p_{N}(\bar{\psi}_{\theta_{0}})-\theta_{0}\Big{]}

we need to bound

\[
E\sup_{\theta\in\bar{\Theta}_{N,L},\psi\in U(\kappa+2,1)}\Big|\sum_{i=1}^{N}\langle\varepsilon_{i},g_{\theta,\psi}(X_{i})\rangle_{W}\Big|=E\sup_{\theta\in\bar{\Theta}_{N,L},\psi\in U(\kappa+2,1)}\Big|\sum_{i=1}^{N}f_{\theta,\psi}(Z_{i})\Big| \tag{60}
\]

where the ZiZ_{i} are iid copies of the random variable Z=(ε1,X1)Z=(\varepsilon_{1},X_{1}) on ×[0,T]×Ω\mathbb{R}\times[0,T]\times\Omega from after (22), and where

fθ,ψ(z)=e,gθ,ψ(x)W,z=(e,x)×[0,T]×Ω,f_{\theta,\psi}(z)=\langle e,g_{\theta,\psi}(x)\rangle_{W},\leavevmode\nobreak\ \leavevmode\nobreak\ z=(e,x)\in\mathbb{R}\times[0,T]\times\Omega,

are centred random variables, Efθ,ψ(Z)=0Ef_{\theta,\psi}(Z)=0. Let us set W=W=\mathbb{R} for simplicity, the general case is proved after only notational adjustment. By Condition 2B and L2L2([0,T],L2)L^{2}\to L^{2}([0,T],L^{2})-continuity of 𝕀θ0\mathbb{I}_{\theta_{0}}, we can bound the ‘weak variances’ uniformly in ψ,θ\psi,\theta as

Efθ,ψ2(Z)Cθθ0tNpN(ψ¯θ0)L2(Ω)2δ~N2(ζ)+1NσN2,Ef_{\theta,\psi}^{2}(Z)\leq C\big{\|}\theta-\theta_{0}-\frac{t}{\sqrt{N}}p_{N}(\bar{\psi}_{\theta_{0}})\big{\|}_{L^{2}(\Omega)}^{2}\lesssim\tilde{\delta}^{2}_{N}(\zeta)+\frac{1}{N}\equiv\sigma_{N}^{2},

where we note that $\psi\in U(\kappa+2,B)$ implies that $\bar{\psi}$ and then also $p_{N}(\bar{\psi})$ are uniformly bounded in $H^{\kappa}\subset L^{2}$ by (52), (30). [This bound actually holds for $\zeta=0$ but the chaining inequality to be used below with envelopes $F_{N}$ would not allow us to take advantage of it because of the ‘Poissonian’ regime of Bernstein’s inequality.] Further from Condition 2D and again (30) we have the uniform bound

\[
\|g_{\theta,\psi}\|_{\infty}\lesssim\|\theta-\theta_{0}\|_{H^{\zeta}}+\frac{t}{\sqrt{N}}\|p_{N}(\bar{\psi}_{\theta_{0}})\|_{H^{\zeta}}\lesssim\tilde{\delta}_{N}(\zeta)+\frac{1}{\sqrt{N}}\leq v\tilde{\delta}_{N}(\zeta),\quad\text{some }v>0.
\]

We can take point-wise ‘envelope’ functions

\[
|f_{\theta,\psi}(e,x)|\leq F_{N}(e,x)\equiv v|e|\tilde{\delta}_{N}(\zeta),\quad e\in\mathbb{R},\ x\in[0,T]\times\Omega,
\]

and introduce, for any finitely supported probability measure $Q$ on $\mathbb{R}\times[0,T]\times\Omega$, the constants

\[
s^{2}_{Q}=E_{Q}[e^{2}],\quad\|F_{N}\|_{L^{2}(Q)}=s_{Q}v\tilde{\delta}_{N}(\zeta).
\]

Then for any $\theta,\theta^{\prime}\in\bar{\Theta}_{N,L}$ and $\psi,\psi^{\prime}\in U(\kappa+2,1)$, and using Condition 2D,

\[
\begin{aligned}
\|f_{\theta,\psi}-f_{\theta^{\prime},\psi^{\prime}}\|_{L^{2}(Q)}
&\leq\|f_{\theta,\psi}-f_{\theta,\psi^{\prime}}\|_{L^{2}(Q)}+\|f_{\theta,\psi^{\prime}}-f_{\theta^{\prime},\psi^{\prime}}\|_{L^{2}(Q)}\\
&\lesssim s_{Q}\|g_{\theta,\psi}-g_{\theta,\psi^{\prime}}\|_{\infty}+s_{Q}\|g_{\theta,\psi^{\prime}}-g_{\theta^{\prime},\psi^{\prime}}\|_{\infty}\\
&\lesssim\frac{s_{Q}}{\sqrt{N}}\|p_{N}(\bar{\psi}_{\theta_{0}}-\bar{\psi}^{\prime}_{\theta_{0}})\|_{H^{\zeta}}+s_{Q}\|\theta-\theta^{\prime}\|_{H^{\zeta}}\\
&\leq\frac{1}{\sqrt{N}v\tilde{\delta}_{N}(\zeta)}\|F_{N}\|_{L^{2}(Q)}\|\bar{\psi}-\bar{\psi}^{\prime}\|_{H^{\zeta}}+\frac{1}{v\tilde{\delta}_{N}(\zeta)}\|F_{N}\|_{L^{2}(Q)}\|\theta-\theta^{\prime}\|_{H^{\zeta}}.
\end{aligned}
\]

The logarithm of the $u$-covering numbers of $U(\kappa,1)\cup U(\bar{\gamma},1)$ by balls of $H^{\zeta}$-radius at most $u$ is of the order $(A/u)^{d/(\min(\kappa,\bar{\gamma})-\zeta)}$ in view of Proposition A.3.1 in [31], and so using Theorem 3.5.4 in [20] with entropy functional

\[
\mathscr{J}_{N}(s)\lesssim\int_{0}^{s}\Big(\frac{1}{v\tilde{\delta}_{N}(\zeta)u}\Big)^{\frac{d}{2(\min(\kappa,\bar{\gamma})-\zeta)}}du,\quad s>0,
\]

and $U_{N}=v\tilde{\delta}_{N}(\zeta)\max_{i\leq N}|\varepsilon_{i}|$ (such that $EU_{N}^{2}\lesssim\tilde{\delta}^{2}_{N}(\zeta)\log N$ in view of Lemma 2.3.3 in [20]) we can estimate (60) from above by a constant multiple of

\[
\sqrt{N}\max\Big[\tilde{\delta}_{N}(\zeta)\mathscr{J}_{N}\Big(\frac{\sigma_{N}}{\tilde{\delta}_{N}(\zeta)}\Big),\frac{\sqrt{\log N}\,\tilde{\delta}^{3}_{N}(\zeta)\mathscr{J}_{N}^{2}(\sigma_{N}/\tilde{\delta}_{N}(\zeta))}{\sqrt{N}\sigma_{N}^{2}}\Big]. \tag{61}
\]

Writing $\beta=\min(\kappa,\bar{\gamma})-\zeta$, straightforward calculations show that the first term in the preceding maximum is of order

\[
\sqrt{N}\int_{0}^{\sigma_{N}}u^{-d/(2\beta)}du\lesssim\sqrt{N}\tilde{\delta}_{N}^{2-d/\beta}(\zeta)
\]

while the second is of order

\[
\sqrt{\log N}\,\tilde{\delta}^{1-2d/\beta}_{N}(\zeta).
\]

We see from this that we require $\beta=\min(\kappa,\bar{\gamma})-\zeta>2d$ for this term to converge to zero, and then we also have

\[
\sqrt{N}\tilde{\delta}^{2-d/\beta}_{N}(\zeta)\to 0\quad\text{since}\quad\tilde{\delta}^{3}_{N}(\zeta)=o(N^{-1}),
\]

in view of (30), (31), (34) and the hypothesis $\gamma>3\zeta+(3d/2)+2$.

The second empirical process in the lemma is of smaller order of magnitude than (60) and can be treated by similar (in fact simpler) arguments, as in the proof of Lemma 4.1.8 in [31]; the details are left to the reader. ∎

To proceed, the ratio of integrals featuring in Theorem 4 can be written as

\[
\frac{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta_{(t,\psi)})}d\Pi(\theta)}{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta)}d\Pi(\theta)}=\frac{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta_{(t,\psi)})}\frac{d\Pi(\theta)}{d\Pi_{t}(\theta)}d\Pi_{t}(\theta)}{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta)}d\Pi(\theta)} \tag{62}
\]

where $\Pi_{t}$ is the Gaussian law of the shifted vector $\theta_{(t,\psi)}=\theta-tp_{N}(\bar{\psi}_{\theta_{0}})/\sqrt{N}$ from (53) with $\theta\sim\Pi$. By construction the projections $p_{N}(\bar{\psi}_{\theta_{0}})$ lie in $E_{J_{N}}\ominus\{\text{constants}\}$ and hence in the RKHS $H_{0}^{\gamma}=\mathcal{H}$ of the prior from Condition 1, which has ‘rescaled’ RKHS norm $\|\cdot\|_{\mathcal{H}_{N}}=\sqrt{N}\delta_{N}\|\cdot\|_{\mathcal{H}}$. Applying the Cameron-Martin theorem (e.g., Theorem 2.6.13 in [20]) we obtain an expression for the Radon-Nikodym density and in turn the bound

\[
\Big|\log\frac{d\Pi}{d\Pi_{t}}(\theta)\Big|\leq\frac{|t|}{\sqrt{N}}\big|\langle\theta,p_{N}(\bar{\psi}_{\theta_{0}})\rangle_{\mathcal{H}_{N}}\big|+\frac{t^{2}}{N}\|p_{N}(\bar{\psi}_{\theta_{0}})\|^{2}_{\mathcal{H}_{N}}.
\]

For $\psi\in U(\kappa+2,1)$ we know from (52) that $\bar{\psi}$ lies in $U(\kappa,c)$ for some $c>0$. Thus on the event $\bar{\Theta}_{N}$ from (37) we can bound the modulus of the first term in the exponent by

\[
\frac{|t|}{\sqrt{N}}|\langle\theta,p_{N}(\bar{\psi}_{\theta_{0}})\rangle_{\mathcal{H}_{N}}|\lesssim|t|\delta_{N}K_{N,\bar{\kappa}}\lesssim\delta_{N}^{2}\sqrt{N}J_{N}^{\gamma-\bar{\kappa}}=N^{-(\bar{\kappa}-d/2)/(2\gamma+d)}\to 0
\]

as $N\to\infty$, in view of (35) and the choice of $\bar{\kappa}$ in (30). Also by (33) with $r=\kappa$ and (52)

\[
\frac{t^{2}}{N}\|p_{N}(\bar{\psi}_{\theta_{0}})\|^{2}_{\mathcal{H}_{N}}=t^{2}\delta_{N}^{2}\|p_{N}(\bar{\psi}_{\theta_{0}})\|^{2}_{\mathcal{H}}\lesssim\delta_{N}^{2}\max(J_{N}^{2\gamma-2\kappa},1)\|\psi\|^{2}_{H^{\kappa+2}}\to 0 \tag{63}
\]

as $N\to\infty$. Both preceding bounds are uniform in $\psi\in U(\kappa+2,1)$.
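The Cameron-Martin step can be sanity-checked in finite dimensions. The sketch below is only an illustration under stated assumptions: the prior is replaced by a centred Gaussian on $\mathbb{R}^{J}$ with a diagonal covariance (the standard deviations `sd` and the shift `h` are hypothetical choices, not the prior of Condition 1), and the function names are ours. It compares the exact log density ratio between the law of $\theta$ and the law of $\theta-h$ with the closed form $\langle\theta,h\rangle_{\mathcal{H}}+\frac{1}{2}\|h\|^{2}_{\mathcal{H}}$, whose modulus is dominated by the two terms bounded above.

```python
import math
import random


def log_gauss_density(x, sd):
    """Log density of the centred Gaussian N(0, diag(sd^2)) at the point x."""
    return sum(
        -0.5 * (xi / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for xi, s in zip(x, sd)
    )


def rkhs_inner(a, b, sd):
    """RKHS (Cameron-Martin) inner product of N(0, diag(sd^2))."""
    return sum(ai * bi / s ** 2 for ai, bi, s in zip(a, b, sd))


def log_density_ratio(x, h, sd):
    """log dPi/dPi_h at x, where Pi_h is the law of theta - h for theta ~ Pi.

    The law of theta - h has Lebesgue density p(x + h), p being the density of Pi.
    """
    shifted = [xi + hi for xi, hi in zip(x, h)]
    return log_gauss_density(x, sd) - log_gauss_density(shifted, sd)


def cameron_martin_formula(x, h, sd):
    """Closed form <x, h>_H + ||h||_H^2 / 2 for the same log density ratio."""
    return rkhs_inner(x, h, sd) + 0.5 * rkhs_inner(h, h, sd)


def demo(seed=0, J=5):
    """Draw one prior sample and return (exact ratio, closed form)."""
    rng = random.Random(seed)
    sd = [(j + 1) ** -1.5 for j in range(J)]          # hypothetical prior scales
    x = [rng.gauss(0.0, s) for s in sd]               # one draw theta ~ Pi
    h = [0.1 * (j + 1) ** -3.0 for j in range(J)]     # hypothetical smooth shift
    return log_density_ratio(x, h, sd), cameron_martin_formula(x, h, sd)
```

Calling `demo()` returns two numbers that agree up to rounding error; in the text the role of `h` is played by $tp_{N}(\bar{\psi}_{\theta_{0}})/\sqrt{N}$ and the inner product is that of $\mathcal{H}_{N}$.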

For shifted integration regions

\[
\bar{\Theta}_{N,t,\psi}=\{\vartheta=\theta_{(t,\psi)}:\theta\in\bar{\Theta}_{N}\}
\]

and after renormalising by $\int_{H^{1}}e^{\ell_{N}(\theta)}d\Pi(\theta)$, we thus deduce that the ratio in (62) equals

\begin{align}
(1+o(1))\frac{\int_{\bar{\Theta}_{N,t,\psi}}e^{\ell_{N}(\vartheta)}d\Pi(\vartheta)}{\int_{\bar{\Theta}_{N}}e^{\ell_{N}(\theta)}d\Pi(\theta)}
&=(1+o(1))\frac{\Pi(\bar{\Theta}_{N,t,\psi}|Z^{(N)})}{\Pi(\bar{\Theta}_{N}|Z^{(N)})} \tag{64}\\
&\leq(1+o(1))\frac{1}{\Pi(\bar{\Theta}_{N}|Z^{(N)})}=(1+o_{P_{\theta_{0}}^{\mathbb{N}}}(1)) \tag{65}
\end{align}

for fixed $t$ and using Theorem 3, an upper bound that is uniform in $\psi$.

Finally, for any fixed $\psi\in L^{2}_{0}\cap C^{\infty}$ one has

\[
\Pi(\bar{\Theta}_{N,t,\psi}|Z^{(N)})\to 1 \tag{66}
\]

in probability. To see this, notice that the four defining inequalities in (37) of Theorem 3 are preserved after adding the perturbation $tp_{N}(\bar{\psi})/\sqrt{N}$ and increasing the constant $L$ to $2L$; indeed since $\psi$ is smooth so is $\bar{\psi}$ in view of (52), and so are its projections onto $E_{J}$, hence $N^{-1/2}\|p_{N}(\bar{\psi})\|_{H^{\xi}}\lesssim N^{-1/2}=o(\tilde{\delta}_{N}(\xi))$ for $0\leq\xi\leq\bar{\gamma}$. Combined with Condition 2B), (63), the limit (66) now follows from Theorem 3. Therefore, for smooth $\psi\in L^{2}_{0}$ the ratio in (64) satisfies

\[
\frac{\Pi(\bar{\Theta}_{N,t,\psi}|Z^{(N)})}{\Pi(\bar{\Theta}_{N}|Z^{(N)})}\to^{P_{\theta_{0}}^{\mathbb{N}}}1, \tag{67}
\]

which will be required for the convergence of the finite-dimensional distributions below.

2.2.4 Convergence in function space

We initially prove Theorem 2 with a centring $\hat{\theta}_{N}$ that is different from $\tilde{\theta}_{N}=E^{\Pi}[\theta|Z^{(N)}]$: Evaluating $\hat{\Psi}_{N}=\hat{\Psi}_{N,j}$ from (54) at all the trigonometric polynomials $\psi=e_{j},j\geq 1$, from (15), we obtain a Fourier series (random via $Z^{(N)}$)

\[
\hat{\theta}_{N}=\sum_{j\geq 1}\hat{\Psi}_{N,j}e_{j}. \tag{68}
\]

Using independence, Conditions 2C, F, (14) and $k>d/2$ we have for some $C>0$

\begin{align}
NE_{\theta_{0}}^{\mathbb{N}}\|\hat{\theta}_{N}-\theta_{0}\|^{2}_{H^{-k-2}}
&\lesssim\frac{1}{N}\sum_{j=1}^{\infty}\lambda_{j}^{-k-2}\sum_{i=1}^{N}E_{\theta_{0}}^{\mathbb{N}}\langle\mathbb{I}_{\theta_{0}}(p_{N}(\bar{e}_{j}))(X_{i}),\varepsilon_{i}\rangle_{W}^{2} \tag{69}\\
&=\sum_{j}\lambda_{j}^{-k-2}\|\mathbb{I}_{\theta_{0}}[p_{N}(\bar{e}_{j})]\|_{L^{2}(\mathcal{X})}^{2}\lesssim\sum_{j}\lambda_{j}^{-k-2}\|\bar{e}_{j}\|_{L^{2}(\Omega)}^{2}\notag\\
&=\sum_{j}\lambda_{j}^{-k-2}\langle\mathcal{I}^{-1}\Delta e_{j},\mathcal{I}^{-1}\Delta e_{j}\rangle_{L^{2}(\Omega)}\notag\\
&\leq\sum_{j}\lambda_{j}^{-k}\|\mathcal{I}^{-2}e_{j}\|_{H^{\eta_{0}}}\|e_{j}\|_{H^{-\eta_{0}}}\lesssim\sum_{j}\lambda_{j}^{-k}\leq C,\notag
\end{align}

which in particular implies that $\hat{\theta}_{N}$ defines a random variable in $H_{0}^{-k-2}$.

Recalling Proposition 1 with $T_{N}=\hat{\theta}_{N}$, we now study the ‘localised’ posterior, specifically the resulting conditional laws

\[
\bar{\tau}_{N}=\mathrm{Law}\big(\sqrt{N}(\theta-\hat{\theta}_{N})|Z^{(N)}\big),\quad\theta\sim\Pi^{\bar{\Theta}_{N}}(\cdot|Z^{(N)}), \tag{70}
\]

in $H^{-k-2}$. Further recall the law $\mathcal{N}_{\theta_{0}}$ from Proposition 2.

Proposition 3.

Let $W_{1}$ be the Wasserstein distance on $H^{-k-2}$ from (20). We have

\[
W_{1}(\bar{\tau}_{N},\mathcal{N}_{\theta_{0}})\to^{P_{\theta_{0}}^{\mathbb{N}}}0\quad\text{as }N\to\infty.
\]
Proof.

For $J\in\mathbb{N}$ to be chosen, approximation spaces $E_{J}$ from (32), and $Z_{N},Z$ random variables in $H^{-k-2}$ with laws $\bar{\tau}_{N},\mathcal{N}_{\theta_{0}}$ (conditional on the data vector $Z^{(N)}$), respectively, we define projected laws

\[
\bar{\tau}_{N,J}=\mathrm{Law}(P_{E_{J}}(Z_{N})),\quad n_{J}=\mathrm{Law}(P_{E_{J}}(Z)).
\]

From the triangle inequality and (20) and the Cauchy-Schwarz inequality we have

\begin{align}
W_{1}(\bar{\tau}_{N},\mathcal{N}_{\theta_{0}})
&\leq W_{1}(\bar{\tau}_{N,J},n_{J})+W_{1}(\bar{\tau}_{N},\bar{\tau}_{N,J})+W_{1}(n_{J},\mathcal{N}_{\theta_{0}})\notag\\
&\leq W_{1}(\bar{\tau}_{N,J},n_{J})+(E\|P_{E_{J}}(Z_{N})-Z_{N}\|^{2}_{H^{-k-2}})^{\frac{1}{2}}+(E\|P_{E_{J}}(Z)-Z\|^{2}_{H^{-k-2}})^{\frac{1}{2}}\notag\\
&=a+b+c. \tag{71}
\end{align}

Let $\epsilon>0$ be given. We will show that each of the terms a), b), c) can be made less than $\epsilon/3$ by appropriate choice of $J$ and for all $N$ large enough, outside an event of $P_{\theta_{0}}^{\mathbb{N}}$-probability less than $\epsilon$, so that the proposition follows.

Term c): Since $\Pr(\|Z\|_{H^{-a}}<\infty)=1$ for $a>1+d/2$ by Proposition 2, Fernique’s theorem (e.g., Theorem 2.1.20 and Exercise 2.1.5 in [20]) implies that $E\|Z\|^{2}_{H^{-a}}<\infty$. Thus for any $\epsilon>0$ we can choose $J=J(\epsilon,k,d)$ fixed but finite so that

\[
E\|P_{E_{J}}(Z)-Z\|^{2}_{H^{-k-2}}\lesssim E\sum_{j>J}\lambda_{j}^{-k-2}|\langle Z,e_{j}\rangle_{L^{2}}|^{2}\leq\lambda_{J}^{a-k-2}E\|Z\|^{2}_{H^{-a}}<(\epsilon/3)^{2} \tag{72}
\]

follows, using also (30), (14) in the last inequality.

Term b): Theorem 4 with choice $\psi=\lambda_{j}^{-(\kappa+2)/2}e_{j}\in U(\kappa+2,1)$ and corresponding centring

\[
\hat{\Psi}_{N,j,\kappa}=\lambda_{j}^{-(\kappa+2)/2}\hat{\Psi}_{N,j}
\]

from (54) implies, using also (65), that for some $C,c>0$ and all $j\geq 1$, with $P_{\theta_{0}}^{\mathbb{N}}$-probability converging to one,

\[
E^{\Pi^{\bar{\Theta}_{N}}}\Big[\exp\big\{\sqrt{N}\big(\langle\theta,\lambda_{j}^{-(\kappa+2)/2}e_{j}\rangle_{L^{2}}-\hat{\Psi}_{N,j,\kappa}\big)\big\}\Big|Z^{(N)}\Big]\leq ce^{\frac{1}{2}\|\mathbb{I}_{\theta_{0}}(\lambda_{j}^{-(\kappa+2)/2}\bar{e}_{j})\|_{L^{2}}^{2}}\leq C, \tag{73}
\]

where in the last inequality we used $\sup_{j\geq 1}\|\mathbb{I}_{\theta_{0}}(\lambda_{j}^{-(\kappa+2)/2}\bar{e}_{j})\|_{L^{2}}^{2}\leq c<\infty$, proved just as in (69). Then since $x^{2}\leq 2e^{x}$ for $x\geq 0$, we can choose $J=J(\epsilon,k,\kappa)$ large enough such that with $P_{\theta_{0}}^{\mathbb{N}}$-probability as close to one as desired,

\begin{align}
E\|P_{E_{J}}(Z_{N})-Z_{N}\|^{2}_{H^{-k-2}}
&=\sum_{j>J}\lambda_{j}^{\kappa+2-k-2}E^{\Pi^{\bar{\Theta}_{N}}}\big[N|\langle\theta,\lambda_{j}^{-(\kappa+2)/2}e_{j}\rangle_{L^{2}}-\hat{\Psi}_{N,j,\kappa}|^{2}\big|Z^{(N)}\big]\notag\\
&\lesssim\sum_{j>J}\lambda_{j}^{\kappa-k}<(\epsilon/3)^{2}, \tag{74}
\end{align}

using (14) and $k>\kappa+d/2$ in view of (30) in the last step.
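The choice of $J$ in (72) and (74) only uses that a tail sum $\sum_{j>J}\lambda_{j}^{-s}$ is small. A minimal sketch of that bookkeeping follows, under the assumption (ours, standing in for (14)) that the eigenvalues grow exactly like $\lambda_{j}=j^{2/d}$; the helper names `tail_bound`, `choose_J` and `partial_tail` are hypothetical. The exponent `s` plays the role of $k-\kappa$ in (74), and the series converges precisely when $s>d/2$.

```python
import math


def tail_bound(J, s, d):
    """Integral upper bound for sum_{j>J} j^(-2s/d); requires s > d/2 and J >= 1."""
    p = 2.0 * s / d
    if p <= 1.0:
        raise ValueError("the series diverges unless s > d/2")
    # sum_{j>J} j^(-p) <= int_J^infty x^(-p) dx = J^(1-p) / (p-1)
    return J ** (1.0 - p) / (p - 1.0)


def choose_J(eps, s, d):
    """Smallest J >= 1 whose integral tail bound is below (eps/3)^2."""
    target = (eps / 3.0) ** 2
    p = 2.0 * s / d
    if p <= 1.0:
        raise ValueError("the series diverges unless s > d/2")
    # Solve J^(1-p)/(p-1) = target for J, then correct for rounding.
    J = max(1, math.ceil(((p - 1.0) * target) ** (-1.0 / (p - 1.0))))
    while J > 1 and tail_bound(J - 1, s, d) < target:
        J -= 1
    while tail_bound(J, s, d) >= target:
        J += 1
    return J


def partial_tail(J, s, d, terms=20000):
    """Numerical partial sum of the tail, for comparison with tail_bound."""
    p = 2.0 * s / d
    return sum(j ** (-p) for j in range(J + 1, J + 1 + terms))
```

For instance `choose_J(0.3, 2.0, 1)` picks the cut-off for $\epsilon=0.3$, $s=2$, $d=1$, and `partial_tail` can be compared against `tail_bound` to see that the integral bound is conservative.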

It remains to consider term a) for $J$ fixed but arbitrary. For any $a=(a_{j})\in\mathbb{R}^{J}$, let

\[
\sum_{1\leq j\leq J}a_{j}\langle\theta,e_{j}\rangle_{L^{2}}=\Big\langle\theta,\sum_{1\leq j\leq J}a_{j}e_{j}\Big\rangle_{L^{2}}\equiv\langle\theta,\psi_{J,a}\rangle_{L^{2}},\qquad\sum_{1\leq j\leq J}a_{j}\hat{\Psi}_{N,j}\equiv\hat{\Psi}_{N,J,a}, \tag{75}
\]

in slight abuse of $\hat{\Psi}$ notation. Then $\psi_{J,a}$ belongs to $C^{\infty}(\Omega)\cap L^{2}_{0}$ and we can apply Theorem 4 combined with (64), (67) to deduce convergence of Laplace transforms

\[
E^{\Pi^{\bar{\Theta}_{N}}}\big[\exp\big\{t\sqrt{N}\big(\langle\theta,\psi_{J,a}\rangle_{L^{2}}-\hat{\Psi}_{N,J,a}\big)\big\}\big|Z^{(N)}\big]\to e^{\frac{t^{2}}{2}\|\mathbb{I}_{\theta_{0}}(\bar{\psi}_{J,a})\|_{L^{2}}^{2}}
\]

for all $t\in\mathbb{R}$, as $N\to\infty$ and in $P_{\theta_{0}}^{\mathbb{N}}$-probability. By Exercise 4.3.3 in [31] and the Cramér-Wold device ([44], p.16), this implies that as $N\to\infty$,

\[
\beta(\mathrm{Law}(P_{E_{J}}(Z_{N})),\mathrm{Law}(P_{E_{J}}(Z)))\to^{P_{\theta_{0}}^{\mathbb{N}}}0, \tag{76}
\]

where $\beta$ is any metric (Ch. 11.3 in [14]) for weak convergence of laws on $\mathbb{R}^{J}\approx E_{J}$. To upgrade this to convergence in the Wasserstein distance $W_{1,E_{J}}$ we first show

\[
E\|P_{E_{J}}(Z_{N})\|_{E_{J}}^{2}=O_{P_{\theta_{0}}^{\mathbb{N}}}(1),\quad J\in\mathbb{N}\text{ fixed}. \tag{77}
\]

In view of the inequality $\|h\|_{E_{J}}\leq c(J,k)\|h\|_{H^{-k-2}}$ for $h\in E_{J}$ it suffices to bound

\[
NE^{\Pi^{\bar{\Theta}_{N}}}\big[\|\theta-\hat{\theta}_{N}\|^{2}_{H^{-k-2}}\big|Z^{(N)}\big]\leq 2N\|\hat{\theta}_{N}-\theta_{0}\|^{2}_{H^{-k-2}}+2NE^{\Pi^{\bar{\Theta}_{N}}}\big[\|\theta-\hat{\theta}_{N}\|^{2}_{H^{-k-2}}\big|Z^{(N)}\big].
\]

The first term is bounded in $P_{\theta_{0}}^{\mathbb{N}}$-probability in view of (69) and Markov’s inequality. The second term can be estimated just as in (74) but extending the series over all indices $j$ (so with $J=0$), and is hence also $O_{P_{\theta_{0}}^{\mathbb{N}}}(1)$, so that (77) holds. In particular we deduce that for all $R>0$

\[
E\|P_{E_{J}}(Z_{N})\|_{E_{J}}1_{\{\|P_{E_{J}}(Z_{N})\|_{E_{J}}>R\}}\leq\frac{E\|P_{E_{J}}(Z_{N})\|^{2}_{E_{J}}}{R}=O_{P_{\theta_{0}}^{\mathbb{N}}}(R^{-1}). \tag{78}
\]
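The first inequality in (78) is the usual truncation bound. Spelled out for a generic non-negative random variable $X$ with finite second moment (our notation; here $X=\|P_{E_{J}}(Z_{N})\|_{E_{J}}$ under the localised posterior), it reads as follows.

```latex
X\,1_{\{X>R\}}\;\le\;X\cdot\frac{X}{R}\,1_{\{X>R\}}\;\le\;\frac{X^{2}}{R}
\quad\Longrightarrow\quad
E\big[X\,1_{\{X>R\}}\big]\;\le\;\frac{E X^{2}}{R},\qquad R>0 .
```

Together with (77), this controls the first-moment tails uniformly in $N$ as $R\to\infty$, which is the type of condition needed to pass from weak convergence to convergence in $W_{1}$.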

But now (76), (78) and Theorem 6.9 in [46] imply that for every $J\in\mathbb{N}$ and as $N\to\infty$

\[
W_{1,E_{J}}(\bar{\tau}_{N,J},n_{J})\to^{P_{\theta_{0}}^{\mathbb{N}}}0.
\]

Using also that $W_{1,H^{-k-2}}$ coincides with $W_{1,E_{J}}$ for the projected laws, we conclude that for every $J\in\mathbb{N}$ and $\epsilon>0$ fixed we can find $N_{0}(\epsilon,J)$ large enough such that for $N\geq N_{0}$,

\[
P_{\theta_{0}}^{\mathbb{N}}\big(W_{1}(\bar{\tau}_{N,J},n_{J})>\epsilon/3\big)<\epsilon, \tag{79}
\]

completing the proof of convergence to zero in probability of $W_{1}(\bar{\tau}_{N},\mathcal{N}_{\theta_{0}})$ from (71). ∎

Combining the preceding proposition with Proposition 1 for $T=\hat{\theta}_{N}$ implies for unrestricted posterior draws $\theta\sim\Pi(\cdot|Z^{(N)})$ that

\[
W_{1}\big(\mathrm{Law}(\sqrt{N}(\theta-\hat{\theta}_{N})),\mathcal{N}_{\theta_{0}}\big)\to^{P_{\theta_{0}}^{\mathbb{N}}}0 \tag{80}
\]

as $N\to\infty$, that is, Theorem 2 holds for the centring $\hat{\theta}_{N}$ instead of $\tilde{\theta}_{N}=E^{\Pi}[\theta|Z^{(N)}]$. But convergence in $W_{1}$ implies convergence of first moments by Theorem 6.9 in [46], so we obtain

\[
\sqrt{N}(E^{\Pi}[\theta|Z^{(N)}]-\hat{\theta}_{N})\to^{P_{\theta_{0}}^{\mathbb{N}}}E_{\mathcal{N}_{\theta_{0}}}Z=0\quad\text{in }H^{-k-2},
\]

and then also

\[
W_{1}\big(\mathrm{Law}(\sqrt{N}(\theta-\hat{\theta}_{N})),\mathrm{Law}(\sqrt{N}(\theta-\tilde{\theta}_{N}))\big)\leq\sqrt{N}\|E^{\Pi}[\theta|Z^{(N)}]-\hat{\theta}_{N}\|_{H^{-k-2}}=o_{P_{\theta_{0}}^{\mathbb{N}}}(1),
\]

which combined with (80) completes the proof of the first limit in Theorem 2.
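The re-centring bound used in the last display is elementary. A short derivation, assuming only the representation of $W_{1}$ from (20) as a supremum over 1-Lipschitz functions $F$ and writing $X$, $a$, $b$ for a generic random element and two vectors that are fixed given the data (our notation; here $X=\sqrt{N}\theta$, $a=\sqrt{N}\hat{\theta}_{N}$, $b=\sqrt{N}\tilde{\theta}_{N}$):

```latex
\begin{aligned}
W_{1}\big(\mathrm{Law}(X-a),\mathrm{Law}(X-b)\big)
&=\sup_{\|F\|_{Lip}\le 1}\big|E\big[F(X-a)-F(X-b)\big]\big|\\
&\le\sup_{\|F\|_{Lip}\le 1}E\big|F(X-a)-F(X-b)\big|
\;\le\;\|a-b\|_{H^{-k-2}} .
\end{aligned}
```

The last step uses only that $|F(x-a)-F(x-b)|\le\|a-b\|_{H^{-k-2}}$ for every $x$ when $F$ is 1-Lipschitz on $H^{-k-2}$.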

Asymptotics for the posterior mean: To prove the second limit in Theorem 2 note that by the last display, the limit distribution of $\sqrt{N}(\tilde{\theta}_{N}-\theta_{0})$ in $H^{-k-2}$ coincides with the limit distribution of

\[
Q_{N}=\sqrt{N}(\hat{\theta}_{N}-\theta_{0})=\frac{1}{\sqrt{N}}\sum_{j\geq 1}\sum_{i=1}^{N}\langle\mathbb{I}_{\theta_{0}}p_{N}(\bar{e}_{j})(X_{i}),\varepsilon_{i}\rangle_{W}e_{j},
\]

recalling also (54) and (68). Fix $\psi=\psi_{J,a}\in L^{2}_{0}\cap C^{\infty}$ as in (75) so that the resulting $\bar{\psi}$ from (50) also lies in $L^{2}_{0}\cap C^{\infty}$ in view of (52). By Chebyshev’s inequality and Condition 2C), we have as $N\to\infty$,

\[
\begin{aligned}
E^{\mathbb{N}}_{\theta_{0}}\Big[\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\langle\mathbb{I}_{\theta_{0}}p_{N}(\bar{\psi})(X_{i})-\mathbb{I}_{\theta_{0}}(\bar{\psi})(X_{i}),\varepsilon_{i}\rangle_{W}\Big]^{2}
&\leq\|\mathbb{I}_{\theta_{0}}p_{N}(\bar{\psi}_{\theta_{0}})-\mathbb{I}_{\theta_{0}}\bar{\psi}_{\theta_{0}}\|^{2}_{L^{2}(\mathcal{X})}\\
&\lesssim\|p_{N}(\bar{\psi}_{\theta_{0}})-\bar{\psi}_{\theta_{0}}\|^{2}_{L^{2}(\Omega)}\to 0.
\end{aligned}
\]

Therefore Markov’s inequality, Theorem 2.7 in [44] and the central limit theorem imply

\[
\langle Q_{N},\psi_{J,a}\rangle_{L^{2}}\to^{d}N\big(0,\|\mathbb{I}_{\theta_{0}}\bar{\psi}_{J,a}\|_{L^{2}}^{2}\big) \tag{81}
\]

for any vector $a\in\mathbb{R}^{J}$. By the Cramér-Wold device ([44], p.16) this implies convergence of finite-dimensional distributions

\[
P_{E_{J}}(Q_{N})\to^{d}P_{E_{J}}(Z),\quad Z\sim\mathcal{N}_{\theta_{0}},\quad\text{in }E_{J} \tag{82}
\]

for every $J$ fixed and $P_{E_{J}}$ from (32). Writing $q_{N,J}=\mathrm{Law}(P_{E_{J}}(Q_{N})),n_{J}=\mathrm{Law}(P_{E_{J}}(Z))$, we then decompose as in (71) but now with $\beta$ the bounded Lipschitz metric for weak convergence ([14], Theorem 11.3.3) in $H^{-k-2}$,

\[
\begin{aligned}
\beta(\mathrm{Law}(Q_{N}),\mathcal{N}_{\theta_{0}})
&\leq\beta(q_{N,J},n_{J})+\beta(n_{J},\mathcal{N}_{\theta_{0}})+\beta(q_{N,J},\mathrm{Law}(Q_{N}))\\
&\leq\beta(q_{N,J},n_{J})+(E\|P_{E_{J}}(Z)-Z\|^{2}_{H^{-k-2}})^{\frac{1}{2}}+(E\|P_{E_{J}}(Q_{N})-Q_{N}\|^{2}_{H^{-k-2}})^{\frac{1}{2}}.
\end{aligned}
\]

Given $\epsilon>0$ we can bound the middle term in the last line by $\epsilon/3$ as in (72), for $J=J(\epsilon)$ large enough. For the last term we can repeat the arguments from (69) to obtain the bound $\sum_{j>J}\lambda_{j}^{-k}<\epsilon/3$, again for $J$ large enough. Finally the first term converges to zero as $N\to\infty$ by (82) and hence can be made less than $\epsilon/3$ as well for $N$ large enough. We conclude that as $N\to\infty$,

\[
\sqrt{N}(\hat{\theta}_{N}-\theta_{0})\to^{d}Z\sim\mathcal{N}_{\theta_{0}}\quad\text{in }H^{-k-2},
\]

as desired. The proof of Theorem 2 is complete.

2.3 Application to the reaction diffusion model and proof of Theorem 1

We now apply Theorem 2 to the reaction-diffusion system (6).

Theorem 5.

Assume $\theta_{0}\in C^{\infty}(\Omega)\cap L^{2}_{0}(\Omega)$. The forward map $\mathscr{G}(\theta)\equiv u_{\theta}$ from (96) provided by the solution of the reaction diffusion equation (6) satisfies Condition 2 for

\[
\zeta>d/2,\quad\bar{\gamma}>3+d/2,\quad\eta_{0}>2+d/2,\quad d\leq 3,\quad W=\mathbb{R}.
\]

As a consequence, for a prior $\theta\sim\Pi$ as in Condition 1 with integer $\gamma>2+3d$, the posterior law of $\theta|Z^{(N)}$ arising as in (24), the posterior mean $\tilde{\theta}_{N}=E^{\Pi}[\theta|Z^{(N)}]$, and $\mathcal{N}_{\theta_{0}}$ the Gaussian measure from Proposition 2, we have

\[
W_{1}\big(\mathrm{Law}(\sqrt{N}(\theta-\tilde{\theta}_{N})|Z^{(N)}),\mathcal{N}_{\theta_{0}}\big)\to^{P_{\theta_{0}}^{\mathbb{N}}}0
\]

where $W_{1}=W_{1,H^{-k-2}}$ is the Wasserstein distance (20) on $H^{-k-2}(\Omega)$ for any $k>3d$.

Proof.

The proof is based on Theorem 2 and PDE theory developed in Sec. 3. Here are the pointers: Conditions A), B) are the subject of Corollary 1, while Condition C) is verified in Theorem 7 (for any $c^{\prime}>0$). The inequality (28) in Condition D) follows from (98) and the continuity of $\mathbb{I}_{\theta_{0}}$ required in D) follows from (104). The stability estimate E) follows from (100), while F) is the content of Theorem 8. ∎

While $k>3d$ is not necessarily optimal, Theorem 5 cannot hold in topologies significantly stronger than that of $H^{-k-2}$ (such as, e.g., $L^{2}$) as convergence in law would require the limiting random variable to be tight in this function space (Thm. 11.5.4 in [14]), which $\mathcal{N}_{\theta_{0}}$ is not in view of the remarks after Proposition 2.

Theorem 1 can be deduced from Theorem 5 by the following argument, which relies on linearisation and Lemma 2 on the smoothing properties of solutions to the linear PDE (3.2). Let us define (conditional on the observations) random variables

\[
Y_{N}=\sqrt{N}(u_{\theta}-u_{\tilde{\theta}_{N}})|Z^{(N)},\quad Z_{N}=\sqrt{N}(\theta-\tilde{\theta}_{N})|Z^{(N)},\quad\theta\sim\Pi(\cdot|Z^{(N)}).
\]

Prior and posterior draws lie in $H^{\bar{\gamma}}$ almost surely for some $\bar{\gamma}>d/2$. Similarly the Bochner integrals $\tilde{\theta}_{N}$ lie in $H^{\bar{\gamma}}$ on events of $P_{\theta_{0}}^{\mathbb{N}}$-probability approaching one by the arguments in and after (2.2.1). Thus an application of Proposition 4 implies that we can consider the random variables $Y_{N}$ as taking values in the Banach space $\mathscr{C}$ from (10). Next, Theorem 7 provides the linearisation of $\theta\mapsto u_{\theta}(t)$ around $\theta_{0}\in C^{\infty}\cap L^{2}_{0}$ as

\[
Du_{\theta_{0}}[\cdot]:L^{2}(\Omega)\to L^{2}([t_{\min},t_{\max}],L^{2}(\Omega)).
\]

Since the time-dependent potential $V(t,\cdot)=f^{\prime}(u_{\theta_{0}}(t))$ lies in $C^{1,\infty}$ from (12) in view of Proposition 4, we can invoke Lemma 2 to deduce that the linear operator $Du_{\theta_{0}}$ in fact extends to a continuous linear operator from $H^{-k-2}$ to $L^{\infty}([t_{\min},t_{\max}],H^{\eta})$ for $\eta>d/2$. We can replace $L^{\infty}$ by $C$ since time-continuity follows by considering solutions to (3.2) with initial conditions $h=Du_{\theta_{0}}(t_{\min})\in H^{\eta}$, and using Proposition 5. By the Sobolev imbedding the map and hence the random variables $Du_{\theta_{0}}(Z_{N})$ take values in $\mathscr{C}=C([t_{\min},t_{\max}],C(\Omega))$. Similarly, for $Z$ a random variable in $H^{-k-2}$ of law $\mathcal{N}_{\theta_{0}}$, the Gaussian Borel random variable $Y=Du_{\theta_{0}}(Z)$ takes values in the space $\mathscr{C}$. By the remarks after (3.2) its law coincides with the law of the unique solution of the stochastic PDE (1.2).

To prove Theorem 1, recall from (20) that the Wasserstein distance $\mathscr{W}_{1}(\mu,\nu)$ involves suprema over $1$-Lipschitz functions $F:\mathscr{C}\to\mathbb{R}$. Then we have

\[
\begin{aligned}
\mathscr{W}_{1}(\mathrm{Law}(Y_{N}),\mathrm{Law}(Y))
&=\sup_{\|F\|_{Lip}\leq 1}|EF(\sqrt{N}(u_{\theta}-u_{\tilde{\theta}_{N}}))-EF(Y)|\\
&=\sup_{\|F\|_{Lip}\leq 1}\Big|E\big[F\big(\sqrt{N}(u_{\theta}-u_{\tilde{\theta}_{N}})-Du_{\theta_{0}}(Z_{N})+Du_{\theta_{0}}(Z_{N})\big)-F(Du_{\theta_{0}}(Z_{N}))\big]\\
&\qquad\qquad\quad+EF(Du_{\theta_{0}}(Z_{N}))-EF(Y)\Big|\\
&\leq\sqrt{N}E\big\|u_{\theta}-u_{\tilde{\theta}_{N}}-Du_{\theta_{0}}[\theta-\tilde{\theta}_{N}]\big\|_{\mathscr{C}}+\sup_{\|F\|_{Lip}\leq 1}|EF(Du_{\theta_{0}}(Z_{N}))-EF(Du_{\theta_{0}}[Z])|\\
&=I+II.
\end{aligned}
\]

From what precedes the linear map $Du_{\theta_{0}}$ is continuous from $H^{-k-2}$ to $\mathscr{C}$ and hence Lipschitz, and therefore the composite map $G\equiv F\circ Du_{\theta_{0}}$ is Lipschitz from $H^{-k-2}(\Omega)$ to $\mathbb{R}$. We therefore have from (20) that

\[
II\lesssim\sup_{\|G\|_{Lip}\leq 1}|EG(Z_{N})-EG(Z)|=W_{1,H^{-k-2}}\big(\mathrm{Law}(\sqrt{N}(\theta-\tilde{\theta}_{N})|Z^{(N)}),\mathcal{N}_{\theta_{0}}\big)
\]

which converges to zero in $P_{\theta_{0}}^{\mathbb{N}}$-probability as $N\to\infty$ by Theorem 5.
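The constant hidden in the bound for $II$ can be made explicit. Writing $\|Du_{\theta_{0}}\|_{op}$ for the operator norm of $Du_{\theta_{0}}:H^{-k-2}\to\mathscr{C}$ (our notation, finite by the continuity just established), a sketch of the step is:

```latex
\begin{aligned}
|G(z)-G(z')| &= \big|F(Du_{\theta_{0}}[z])-F(Du_{\theta_{0}}[z'])\big|
\le \|Du_{\theta_{0}}[z-z']\|_{\mathscr{C}}
\le \|Du_{\theta_{0}}\|_{op}\,\|z-z'\|_{H^{-k-2}},\\
II &\le \|Du_{\theta_{0}}\|_{op}\;W_{1,H^{-k-2}}\big(\mathrm{Law}(Z_{N}),\mathcal{N}_{\theta_{0}}\big).
\end{aligned}
```

The second line follows because $G/\|Du_{\theta_{0}}\|_{op}$ is 1-Lipschitz on $H^{-k-2}$ whenever $F$ is 1-Lipschitz on $\mathscr{C}$.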

To show that the first term converges to zero we use the following argument: By the Sobolev imbedding and the interpolation inequality (19) with exponent $m=(\bar{\gamma}-\zeta)/\bar{\gamma}$ for any $d/2<\zeta<\bar{\gamma}<\gamma-d/2$ we obtain the uniform in $t$ bound

\[
\begin{aligned}
&\|u_{\theta}(t,\cdot)-u_{\tilde{\theta}_{N}}(t,\cdot)-Du_{\theta_{0}}(t,\cdot)[\theta-\tilde{\theta}_{N}]\|_{\infty}\\
&\lesssim\|u_{\theta}(t,\cdot)-u_{\tilde{\theta}_{N}}(t,\cdot)-Du_{\theta_{0}}(t,\cdot)[\theta-\tilde{\theta}_{N}]\|_{H^{\zeta}}\\
&\lesssim\|u_{\theta}(t,\cdot)-u_{\tilde{\theta}_{N}}(t,\cdot)-Du_{\theta_{0}}(t,\cdot)[\theta-\tilde{\theta}_{N}]\|_{L^{2}}^{m}\,\|u_{\theta}(t,\cdot)-u_{\tilde{\theta}_{N}}(t,\cdot)-Du_{\theta_{0}}(t,\cdot)[\theta-\tilde{\theta}_{N}]\|_{H^{\bar{\gamma}}}^{1-m}\\
&\lesssim\big(\|\theta-\theta_{0}\|^{2}_{L^{2}}+\|\tilde{\theta}_{N}-\theta_{0}\|_{L^{2}}^{2}\big)^{m}\big(\|\theta\|_{H^{\bar{\gamma}}}+\|\tilde{\theta}_{N}\|_{H^{\bar{\gamma}}}\big)^{1-m}\equiv v_{N}^{m}w_{N}^{1-m},
\end{aligned}
\]

where the first factor was estimated from above, using also Theorem 7, by

\[
\big\|Du_{\theta_{0}}[\theta-\theta_{0}]-Du_{\theta_{0}}[\tilde{\theta}_{N}-\theta_{0}]-Du_{\theta_{0}}[\theta-\tilde{\theta}_{N}]\big\|_{L^{2}}+C\|\theta-\theta_{0}\|^{2}_{L^{2}}+C\|\tilde{\theta}_{N}-\theta_{0}\|_{L^{2}}^{2}
\]

while the second factor was bounded using Propositions 4 and 5. We can bound the (under the posterior, deterministic) variables

\[
\|\tilde{\theta}_{N}-\theta_{0}\|_{L^{2}}^{2}=O_{P^{\mathbb{N}}_{\theta_{0}}}(\tilde{\delta}^{2}_{N}),\quad\|\tilde{\theta}_{N}\|_{H^{\bar{\gamma}}}=O_{P^{\mathbb{N}}_{\theta_{0}}}(1)
\]

in view of (39). To show convergence to zero of $I$ in probability we are thus left with estimating

\[
\sqrt{N}E^{\Pi}\big[\big(\|\theta-\theta_{0}\|_{L^{2}}^{2m}+\tilde{\delta}^{2m}_{N}\big)(1+\|\theta\|^{1-m}_{H^{\bar{\gamma}}})\big|Z^{(N)}\big].
\]
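The interpolation step above can be checked numerically on Fourier coefficients. The sketch below assumes (our choice, for illustration only) the one-dimensional torus with Laplacian eigenvalues $\lambda_{j}=(2\pi j)^{2}$ and the norm $\|h\|^{2}_{H^{s}}=\sum_{j}(1+\lambda_{j})^{s}|\langle h,e_{j}\rangle|^{2}$; the function names are hypothetical. With $m=(\bar{\gamma}-\zeta)/\bar{\gamma}$ the inequality $\|h\|_{H^{\zeta}}\le\|h\|_{L^{2}}^{m}\|h\|_{H^{\bar{\gamma}}}^{1-m}$ is an instance of Hölder's inequality for sums.

```python
import math
import random


def sobolev_norm(coeffs, eigs, s):
    """Sequence-space Sobolev norm: sqrt(sum_j (1 + lambda_j)^s * c_j^2)."""
    return math.sqrt(sum((1.0 + lam) ** s * c * c for c, lam in zip(coeffs, eigs)))


def interpolation_sides(coeffs, eigs, zeta, gamma_bar):
    """Return (lhs, rhs) of ||h||_{H^zeta} <= ||h||_{L^2}^m ||h||_{H^gamma_bar}^(1-m)."""
    m = (gamma_bar - zeta) / gamma_bar
    lhs = sobolev_norm(coeffs, eigs, zeta)
    rhs = (sobolev_norm(coeffs, eigs, 0.0) ** m
           * sobolev_norm(coeffs, eigs, gamma_bar) ** (1.0 - m))
    return lhs, rhs


def demo(seed=1, n_modes=50, zeta=1.0, gamma_bar=4.0):
    """Check the inequality on randomly drawn, polynomially decaying coefficients."""
    rng = random.Random(seed)
    eigs = [(2.0 * math.pi * j) ** 2 for j in range(1, n_modes + 1)]
    coeffs = [rng.gauss(0.0, 1.0) / j ** 3 for j in range(1, n_modes + 1)]
    return interpolation_sides(coeffs, eigs, zeta, gamma_bar)
```

The design point is that a small $L^{2}$ norm (contraction rate) and a merely bounded $H^{\bar{\gamma}}$ norm (regularity) combine into a small $H^{\zeta}$ norm, at the price of the exponent $m<1$.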

Under the maintained hypothesis $\gamma>2+3d$ we can choose $\bar{\gamma},\zeta$ so that

\[
\tilde{\delta}_{N}^{4m}=N^{-4\frac{\gamma}{2\gamma+d}\frac{\bar{\gamma}-\zeta}{\bar{\gamma}+1}}=o(N^{-1}).
\]

Therefore, using again (39)

\[
\tilde{\delta}^{2m}_{N}E^{\Pi}\big[(1+\|\theta\|^{1-m}_{H^{\bar{\gamma}}})\big|Z^{(N)}\big]=o_{P_{\theta_{0}}^{\mathbb{N}}}(1/\sqrt{N})
\]

and, using also the Cauchy-Schwarz inequality,

\[
\begin{aligned}
&E^{\Pi}\big[\|\theta-\theta_{0}\|_{L^{2}}^{2m}(1+\|\theta\|_{H^{\bar{\gamma}}}^{1-m})\big|Z^{(N)}\big]\\
&\leq\big(E^{\Pi}[\|\theta-\theta_{0}\|_{L^{2}}^{4m}|Z^{(N)}]\big)^{\frac{1}{2}}\big(E^{\Pi}[(1+\|\theta\|_{H^{\bar{\gamma}}}^{2-2m})|Z^{(N)}]\big)^{\frac{1}{2}}=O_{P_{\theta_{0}}^{\mathbb{N}}}(\tilde{\delta}_{N}^{2m})=o_{P_{\theta_{0}}^{\mathbb{N}}}(1/\sqrt{N}),
\end{aligned}
\]

completing the proof.

3 PDE results

In this section we derive various analytical results needed for the proof of Theorem 1. We refer to [39, 15] for relevant background material on the theory of parabolic PDEs that we will rely upon.

3.1 Periodic non-linear reaction diffusion equations

We first develop appropriate theory for the non-linear equation (6) underlying the dynamical system featuring in Theorem 1. For $\Omega=[0,1]^{d}$ and any $T>0$ we consider periodic solutions $u$ to the PDE

\begin{align}
\frac{\partial}{\partial t}u-\Delta u&=f(u)\quad\text{on }(0,T]\times\Omega,\notag\\
u(0,\cdot)&=\theta\quad\text{on }\Omega, \tag{83}
\end{align}

where $f:\mathbb{R}\to\mathbb{R}$ is a smooth function of compact support, $f\in C^{\infty}_{c}(\mathbb{R})$, where $\Delta=\sum_{j=1}^{d}\partial^{2}/(\partial x_{j})^{2}$ is the Laplacian, and $\theta=u(0,\cdot)\in H^{1}(\Omega)$ is an initial condition. We note that $f$ acts non-linearly via point-wise composition, $f(u)(x)=f(u(x)),x\in\Omega$.

For $(H^{1}(\Omega))^{*}=H^{-1}(\Omega)$, a weak solution to (83) is a map $u\in L^{2}([0,T],H^{1}(\Omega))\cap C([0,T],L^{2}(\Omega))$ such that $(\partial/\partial t)u=u^{\prime}\in L^{2}([0,T],H^{-1}(\Omega))$ satisfies

\begin{align}
\langle u^{\prime},v\rangle_{L^{2}}+\langle\nabla u,\nabla v\rangle_{L^{2}}&=\langle f(u),v\rangle_{L^{2}},\quad\forall v\in H^{1}\text{ and a.e. }t\in(0,T]\notag\\
u(0)&=\theta, \tag{84}
\end{align}

noting that $f(u)$ is uniformly bounded. We call $u$ solving (3.1) a strong solution if in addition

\[
u\in L^{2}([0,T],H^{2}(\Omega))\cap C([0,T],H^{1}(\Omega)),\quad u^{\prime}\in L^{2}([0,T],L^{2}(\Omega)),
\]

in which case $u$ then solves (3.1) as an equation in $L^{2}([0,T],L^{2}(\Omega))$.

3.1.1 Existence and uniqueness of strong solutions

The following proof of existence adapts arguments from p.537f in [15] to the present setting.

Theorem 6.

For $f\in C_{c}^{\infty}(\mathbb{R})$ and any $\theta\in H^{1}(\Omega)$, $d\in\mathbb{N}$, there exists a strong solution $u=u_{\theta}$ to the reaction-diffusion equation (3.1) that is unique in $C([0,T],L^{2}(\Omega))$.

Proof.

Define the Banach space $X\equiv C([0,T],L^{2}(\Omega))$ with norm

\[
\|u\|_{X}=\sup_{0<t<T}\|u(t)\|_{L^{2}(\Omega)}.
\]

For any $u\in X$ define a new function $h(t,\cdot)=f(u(t,\cdot))\in L^{\infty}([0,T],L^{\infty})$. The linear parabolic PDE

\begin{align}
\frac{\partial}{\partial t}w-\Delta w&=h\quad\text{on }(0,T]\times\Omega\notag\\
w(0,\cdot)&=\theta\quad\text{on }\Omega\tag{85}
\end{align}

has a unique strong solution $w\in L^{2}([0,T],H^{2})\cap C([0,T],H^{1}(\Omega))$ such that $(\partial/\partial t)w\in L^{2}([0,T],L^{2})$, see, e.g., Theorem 7.7 in [39]. Such $w$ can be represented via the usual variation of constants formula

\[
w(t,\cdot)=\sum_{j=0}^{\infty}e^{-\lambda_{j}t}\langle e_{j},\theta\rangle_{L^{2}}e_{j}+\sum_{j=0}^{\infty}\int_{0}^{t}e^{-(t-s)\lambda_{j}}\langle e_{j},h(s)\rangle_{L^{2}}e_{j}\,ds,\quad t>0,\tag{86}
\]

where $e_{j}$ are the orthonormal eigenfunctions of the (periodic) Laplacian from (15), and with the above sums converging in $L^{2}$ under the present hypotheses on $h,\theta$. For $u_{i}\in X$, $h_{i}=f(u_{i})$, $i=1,2$, consider solutions $w_{i}\equiv A(u_{i})\in X$ to (3.1.1) and define, for each fixed $t$, the constant (in $x$)

\[
i_{1,2}=\int_{\Omega}(w_{1}-w_{2})=\langle w_{1}-w_{2},e_{0}\rangle_{L^{2}}.
\]

Since $\langle e_{j},1\rangle_{L^{2}}=0$ for $j\neq 0$, we have from Fubini’s theorem and the Cauchy-Schwarz inequality

\begin{align*}
|i_{1,2}|&=\left|\int_{\Omega}\theta-\int_{\Omega}\theta+\int_{0}^{t}\int_{\Omega}\big(f(u_{1}(s,x))-f(u_{2}(s,x))\big)\,dx\,ds\right|\\
&\leq T\sup_{0<t<T}\|f(u_{1})-f(u_{2})\|_{L^{2}}\leq T\|f\|_{Lip}\|u_{1}-u_{2}\|_{X}.
\end{align*}

Interchanging integration and differentiation (justified by standard arguments for the strong solutions $w_{i}$ to (3.1.1)) and using the inequalities of Cauchy-Schwarz, Young and Poincaré (cf. p.151 in [39]), we have for any $\epsilon>0$ that

\begin{align*}
\frac{d}{dt}\|w_{1}-w_{2}\|_{L^{2}}^{2}&+2\|\nabla(w_{1}-w_{2}-i_{1,2})\|^{2}_{L^{2}}\\
&=2\Big\langle w_{1}-w_{2},\frac{\partial}{\partial t}(w_{1}-w_{2})-\Delta(w_{1}-w_{2})\Big\rangle_{L^{2}}\\
&=2\langle w_{1}-w_{2},h_{1}-h_{2}\rangle_{L^{2}}\\
&\leq(\epsilon/2)\|w_{1}-w_{2}-i_{1,2}+i_{1,2}\|_{L^{2}}^{2}+\frac{1}{2\epsilon}\|f(u_{1})-f(u_{2})\|^{2}_{L^{2}}\\
&\leq\epsilon\|\nabla(w_{1}-w_{2}-i_{1,2})\|_{L^{2}}^{2}+(T^{2}+(2\epsilon)^{-1})\|f\|_{Lip}^{2}\|u_{1}-u_{2}\|^{2}_{X}.
\end{align*}

Then choosing $\epsilon$ small enough and subtracting, we deduce

\[
\frac{d}{dt}\|w_{1}-w_{2}\|_{L^{2}}^{2}\leq c(T^{2}+1)\|u_{1}-u_{2}\|^{2}_{X}
\]

for some $c=c(\|f\|_{Lip})>0$. Integrating this inequality over $[0,t]$ and using $w_{1}(0)-w_{2}(0)=\theta-\theta=0$, we obtain

\[
\|A(u_{1})-A(u_{2})\|^{2}_{X}=\sup_{0<t<T}\|w_{1}(t)-w_{2}(t)\|^{2}_{L^{2}}\leq c(T^{2}+1)T\|u_{1}-u_{2}\|^{2}_{X}.\tag{87}
\]

We deduce that for $T=T_{0}$ small enough depending only on $\|f\|_{Lip}$, the map $A$ is a contraction on $X$, and Banach’s fixed point theorem (p.536, [15]) implies the existence of a unique $u\in X$ satisfying $A(u)=u$ on the interval $[0,T_{0}]$. Moreover, since $u=A(u)$ is in the range of the linear parabolic solution operator from above, it has the desired regularity for a strong solution. In particular $u(T_{0})\in H^{1}$ and we can iterate the preceding argument finitely many times with initial conditions $u(T_{0}),u(T_{1}),\dots$ to extend the solutions to $[0,T]$. ∎
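The fixed point construction can be illustrated numerically. The following is a minimal sketch and not part of the proof: it assumes $d=1$, takes the standard bump function $f(u)=\exp(-1/(1-u^{2}))1_{|u|<1}$ as one concrete choice of $f\in C^{\infty}_{c}(\mathbb{R})$, discretises (86) on a uniform grid by freezing $h$ at the left endpoint of each time step, and uses hypothetical function names throughout. For this scheme the gaps between successive Picard iterates shrink by a factor of at most $T\|f\|_{Lip}$ per iteration.

```python
import numpy as np

def bump(u):
    """Hypothetical reaction term: f(u) = exp(-1/(1-u^2)) for |u| < 1, else 0."""
    out = np.zeros_like(u)
    inside = np.abs(u) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

def solve_linear(theta, h, dt):
    """Discrete analogue of (86) on the 1D torus for w' - w_xx = h, w(0) = theta.

    h has shape (n_steps + 1, n_grid); it is frozen at the left endpoint of each
    time step, and the heat semigroup is applied exactly in Fourier space.
    """
    n_steps, n_grid = h.shape[0] - 1, h.shape[1]
    lam = (2.0 * np.pi * np.fft.fftfreq(n_grid, d=1.0 / n_grid)) ** 2
    decay = np.exp(-lam * dt)
    # integral of exp(-lam * s) over one time step (equal to dt when lam = 0)
    weight = np.where(lam > 0, (1.0 - decay) / np.where(lam > 0, lam, 1.0), dt)
    w = np.empty_like(h)
    w[0] = theta
    for n in range(n_steps):
        w_hat = decay * np.fft.fft(w[n]) + weight * np.fft.fft(h[n])
        w[n + 1] = np.fft.ifft(w_hat).real
    return w

def picard_map(u, theta, dt):
    """The map A(u) from the proof: solve the linear problem with h = f(u)."""
    return solve_linear(theta, bump(u), dt)

def x_norm(u):
    """Discrete version of the X-norm sup_t ||u(t)||_{L^2}."""
    return np.sqrt(np.mean(u ** 2, axis=1)).max()

def picard_iterates(theta, T=0.5, n_steps=200, n_iter=8):
    """Iterate u -> A(u), recording the X-norm gap between successive iterates."""
    dt = T / n_steps
    u = np.tile(theta, (n_steps + 1, 1))  # initial guess: constant in time
    gaps = []
    for _ in range(n_iter):
        u_next = picard_map(u, theta, dt)
        gaps.append(x_norm(u_next - u))
        u = u_next
    return u, gaps

x = np.arange(64) / 64
theta = 0.5 * np.cos(2 * np.pi * x)
u_fixed, gaps = picard_iterates(theta)
print(gaps)
```

The choice $T=0.5$ keeps $T\|f\|_{Lip}$ below $1/2$ for this particular $f$, so a single time interval suffices here; for larger $T$ one would restart from $u(T_{0})$ as in the proof.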

3.1.2 Regularity estimates

We will need that regularity of initial conditions $\theta\in H^{\gamma}$ translates into regularity of solutions $u_{\theta}\in L^{\infty}([0,T],H^{\gamma})$ for any $\gamma$ and in a quantitative way. We will employ ‘energy’ methods via Sobolev norms (13) with corresponding inner product $\langle\cdot,\cdot\rangle_{h^{\gamma}}$. Note that we have for any $u\in H^{\gamma}$ and $d$-dimensional periodic vector field $w\in H^{\gamma}$ the ‘divergence theorem’

\begin{align}
\langle w,\nabla u\rangle_{h^{\gamma}}&=\sum_{l=1}^{d}\sum_{j\geq 0}(1+\lambda_{j})^{\gamma}\langle w_{l},e_{j}\rangle_{L^{2}}\overline{\langle\partial_{l}u,e_{j}\rangle_{L^{2}}}\notag\\
&=-\sum_{l=1}^{d}\sum_{j\geq 0}(1+\lambda_{j})^{\gamma}\langle w_{l},e_{j}\rangle_{L^{2}}\overline{\langle u,\partial_{l}e_{j}\rangle_{L^{2}}}\notag\\
&=\sum_{l=1}^{d}\sum_{j\geq 0}(1+\lambda_{j})^{\gamma}\langle w_{l},c_{j,l}e_{j}\rangle_{L^{2}}\overline{\langle u,e_{j}\rangle_{L^{2}}}=-\langle\nabla\cdot w,u\rangle_{h^{\gamma}},\tag{88}
\end{align}

using integration by parts for the $l$-th variable and $\partial_{l}e_{j}=c_{j,l}e_{j}$ for $c_{j,l}=2\pi ij_{l}$ with $\overline{c_{j,l}}=-c_{j,l}$ and $e_{j}$ from (15). In particular if $w=\nabla u$ we have

\[
\|\nabla u\|_{h^{\gamma}}^{2}=-\langle\Delta u,u\rangle_{h^{\gamma}}.\tag{89}
\]

These identities hold for all $\gamma\in\mathbb{R}$ as long as the action of $u,w$ on $e_{j}$ is appropriately defined.
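Identity (89) is easy to check numerically for a trigonometric polynomial. The sketch below is an illustration only, assuming $d=1$ and a uniform grid; the helper names `sobolev_inner` and `derivative` are hypothetical, and the inner product is computed from FFT coefficients as $\sum_{j}(1+\lambda_{j})^{\gamma}\langle u,e_{j}\rangle\overline{\langle v,e_{j}\rangle}$ with $\lambda_{j}=(2\pi j)^{2}$.

```python
import numpy as np

def sobolev_inner(u, v, gamma):
    """h^gamma inner product of two real grid functions on the 1D torus."""
    n = u.size
    lam = (2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)) ** 2
    u_hat = np.fft.fft(u) / n  # approximates <u, e_j>_{L^2}
    v_hat = np.fft.fft(v) / n
    return np.sum((1.0 + lam) ** gamma * u_hat * np.conj(v_hat)).real

def derivative(u, order=1):
    """Spectral derivative of the given order on the 1D torus."""
    n = u.size
    k = np.fft.fftfreq(n, d=1.0 / n)
    return np.fft.ifft((2j * np.pi * k) ** order * np.fft.fft(u)).real

x = np.arange(64) / 64
u = np.sin(2 * np.pi * x) + 0.3 * np.cos(6 * np.pi * x)

for gamma in (-1.0, 0.0, 1.5):
    lhs = sobolev_inner(derivative(u), derivative(u), gamma)  # ||grad u||^2_{h^gamma}
    rhs = -sobolev_inner(derivative(u, 2), u, gamma)          # -<Laplace u, u>_{h^gamma}
    print(gamma, lhs, rhs)
```

Negative values of $\gamma$ are included to reflect the remark that the identities hold for all real $\gamma$.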

The proof of the following proposition works for any $d$ as long as $\gamma\leq 2$. For $\gamma>2$ in general dimensions $d>3$, energy methods would need to be replaced by Schauder estimates as in Sec. 5.2 in [27], but we abstain from these technicalities to focus on the main ideas.

Proposition 4.

For $\theta\in H^{1}(\Omega)$ and $f\in C^{\infty}_{c}(\mathbb{R})$, let $u$ be a solution of (3.1). Suppose $\|\theta\|_{H^{\gamma}}\leq B$ for $\gamma\geq 0$ and if $\gamma>2$ also assume $d\leq 3$. Then we have

\[
\int_{0}^{T}\|u_{\theta}(t)\|_{H^{\gamma+1}}^{2}dt+\sup_{0\leq t\leq T}\|u_{\theta}(t)\|_{H^{\gamma}}\leq C,\tag{90}
\]

for some $C=C(B,f,\gamma,d)$, and, writing $u^{\prime}=(\partial u/\partial t)$, we also have

\[
\int_{0}^{T}\big\|u^{\prime}_{\theta}(t)\big\|_{H^{\gamma-1}}^{2}dt+{\rm ess}\sup_{0\leq t\leq T}\|u^{\prime}_{\theta}(t)\|_{H^{\gamma-2}}\leq C^{\prime}.\tag{91}
\]

In particular $u_{\theta}\in C([0,T],H^{\gamma})$ and if $\theta\in C^{\infty}(\Omega)$, then $u_{\theta}$ lies in $\cap_{b>0}C^{1}([0,T],C^{b}(\Omega))$.

Proof.

We first assume the solutions uu are sufficiently regular and establish the a-priori estimates (90), (91) in Steps 1 and 2. The general case then follows from a Galerkin argument in Step 3.

Step 1: Preliminary identities and estimates. Differentiating the squared $h^{\gamma}$-norm and using (89) we obtain the identity

\[
\frac{1}{2}\frac{d}{dt}\|u\|_{h^{\gamma}}^{2}+\|\nabla u\|_{h^{\gamma}}^{2}=\Big\langle\frac{\partial}{\partial t}u,u\Big\rangle_{h^{\gamma}}-\langle\Delta u,u\rangle_{h^{\gamma}}=\langle f(u),u\rangle_{h^{\gamma}}.\tag{92}
\]

Since $f(u)$ is uniformly bounded and of compact support, we know that

\[
\langle f(u),u\rangle_{L^{2}}=\int_{\Omega}f(u)u1_{|u|\leq K}\leq c_{f}<\infty,\text{ for some }K>0,
\]

and all $u\in L^{2}$, so that (92) with $\gamma=0$ gives, for the same constant $c_{f}>0$,

\[
\frac{d}{dt}\|u(t)\|^{2}_{L^{2}}+2\|\nabla u(t)\|^{2}_{L^{2}}\leq 2c_{f}\tag{93}
\]

which we can integrate to obtain $\sup_{0\leq t\leq T}\|u(t)\|^{2}_{L^{2}}\leq\|u(0,\cdot)\|^{2}_{L^{2}}+2Tc_{f}<\infty$ as well as

\[
\int_{0}^{T}\|\nabla u\|_{L^{2}}^{2}dt\leq Tc_{f}+\|u(0,\cdot)\|_{L^{2}}^{2}\leq c(B,f,T)<\infty.
\]

Since $\|u\|^{2}_{H^{1}}\simeq\|u\|^{2}_{L^{2}}+\|\nabla u\|_{L^{2}}^{2}$ this already proves the case $\gamma=0$ in (90).
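For completeness, the integration of (93) over $[0,t]$ can be spelled out as follows. This is only an expanded version of the step just used, with $\theta=u(0,\cdot)$ and $\|\theta\|_{L^{2}}\lesssim B$.

```latex
\begin{align*}
\|u(t)\|_{L^2}^2 + 2\int_0^t \|\nabla u(s)\|_{L^2}^2\,ds
  &\le \|\theta\|_{L^2}^2 + 2c_f t, \qquad 0\le t\le T,\\
\sup_{0\le t\le T}\|u(t)\|_{L^2}^2 &\le \|\theta\|_{L^2}^2 + 2Tc_f,\\
\int_0^T \|\nabla u(s)\|_{L^2}^2\,ds &\le \tfrac12\|\theta\|_{L^2}^2 + Tc_f .
\end{align*}
```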

Next by definition of the $h^{\gamma}$-norms and the Cauchy-Schwarz inequality, the r.h.s. in (92) can be upper bounded as

\[
\langle f(u),u\rangle_{h^{\gamma}}\leq\|f(u)\|_{h^{\gamma-1}}\|u\|_{h^{\gamma+1}}\lesssim\|f(u)\|_{H^{\gamma-1}}(\|\nabla u\|_{h^{\gamma}}+\|u\|_{L^{2}})
\]

and then using Young’s inequality for products this is further bounded by

\[
2\|f(u)\|^{2}_{h^{\gamma-1}}+\frac{1}{4}\|\nabla u\|^{2}_{h^{\gamma}}+\frac{1}{4}\|u\|^{2}_{L^{2}}.
\]

We conclude from this, (92) and the bound for $\|u\|_{L^{2}}$ just established for $\gamma=0$ that

\[
\frac{d}{dt}\|u\|_{h^{\gamma}}^{2}+\|\nabla u\|_{h^{\gamma}}^{2}\lesssim\|f(u)\|^{2}_{h^{\gamma-1}}+c\tag{94}
\]

holds for some uniform constant $c>0$ and all $\gamma\geq 1$.

Step 2: Proof of a priori estimates. We now turn to the proof of (90) for $\gamma\geq 1$. We argue by induction on $\gamma$ and assume the result has been proved for $\gamma-1$. The induction hypothesis provides us with

\[
\|u(t)\|_{L^{2}}+\|u(t)\|_{h^{\gamma-1}}\leq C,\quad\gamma\geq 1,
\]

with constant $C=C(T,B)$. We show that as a consequence the r.h.s. in (94) is bounded by a fixed constant. This is clear for $\gamma=1$ and for $\gamma=2$ we have from the chain rule and for multi-index $\alpha$ that

\[
\|f(u)\|_{h^{1}}\lesssim\|f(u)\|_{H^{1}}\lesssim\|f(u)\|_{L^{2}}+\max_{|\alpha|=1}\|f^{\prime}(u)D^{\alpha}u\|_{L^{2}}\leq c+\|f^{\prime}\|_{\infty}\|u\|_{h^{\gamma-1}}\leq c^{\prime}<\infty.
\]

For $\gamma=3$ we have $\|f(u)\|_{h^{2}}\lesssim\|f(u)\|_{H^{2}}\lesssim c+\max_{|\alpha|=2}\|D^{\alpha}(f(u))\|_{L^{2}}$ and need to estimate the last term: we use the chain rule combined with the Cauchy-Schwarz and Ladyzhenskaya’s inequality ([39], p.244) to obtain, with $|\alpha_{i}|=1$, the bound

\begin{align}
&\|f^{\prime\prime}\|_{\infty}\Big[\int_{\Omega}(D^{\alpha_{1}}uD^{\alpha_{2}}u)^{2}\Big]^{1/2}+\|f^{\prime}\|_{\infty}\|u\|_{H^{2}}\lesssim\|D^{\alpha_{1}}u\|_{L^{4}}\|D^{\alpha_{2}}u\|_{L^{4}}+\|u\|_{H^{2}}\tag{95}\\
&\lesssim\|u\|_{H^{1}}^{2-d/2}\|u\|_{H^{2}}^{d/2}+\|u\|_{H^{2}}\lesssim\|u\|_{H^{2}}^{d/2}+\|u\|_{H^{2}}\leq C<\infty.\notag
\end{align}

[This argument is for $d=2,3$; for $d=1$ one can argue directly via (17).] For $\gamma=4$, we have from the chain rule and the multiplier inequality (17) for $H^{2}$-norms with $d<4$ that

\begin{align*}
\|f(u)\|_{H^{3}}&\lesssim c+\max_{0\leq|\alpha|\leq 1}\|f^{\prime}(u)D^{\alpha}u\|_{H^{2}}\lesssim c+\|f^{\prime}(u)\|_{H^{2}}\max_{0\leq|\alpha|\leq 1}\|D^{\alpha}u\|_{H^{2}}\\
&\lesssim c+(\|u\|_{H^{2}}^{d/2}+\|u\|_{H^{2}})\|u\|_{H^{3}}\leq c^{\prime},
\end{align*}

using the same bound as in (95) but now with $f^{\prime\prime\prime}$ replacing $f^{\prime\prime}$. The last argument can now be iterated for all $\gamma>4$, using the multiplier inequality (17) for $H^{\gamma-2}$-norms to see

\[
\|f(u)\|_{H^{\gamma-1}}\lesssim c+\max_{0\leq|\alpha|\leq 1}\|f^{\prime}(u)D^{\alpha}u\|_{H^{\gamma-2}}\lesssim c+\|f^{\prime}(u)\|_{H^{\gamma-2}}\|u\|_{H^{\gamma-1}}\leq C,
\]

since the bound for $\|f^{\prime}(u)\|_{H^{\gamma-2}}$ is the same as the one for $\|f(u)\|_{H^{\gamma-2}}$ up to adjusting the constants to depend on $f^{\prime}$ rather than on $f$.

Summarising, assuming the induction hypothesis with $\gamma-1$, we have shown the inequality

\[
\frac{d}{dt}\|u\|_{h^{\gamma}}^{2}+\|\nabla u\|_{h^{\gamma}}^{2}\leq c_{0},
\]

for some uniform constant $c_{0}>0$. Since $\|u(0)\|_{h^{\gamma}}=\|\theta\|_{h^{\gamma}}\lesssim B$ by hypothesis, this can be integrated just as in Step 1 with $\gamma=0$ to give

\[
\|u(t)\|_{h^{\gamma}}^{2}+\int_{0}^{T}\|\nabla u(t)\|_{h^{\gamma}}^{2}dt\leq c_{1}(B,f,\gamma,T)<\infty,
\]

which proves (90), using also that $\|u\|^{2}_{H^{\gamma+1}}\simeq\|u\|^{2}_{L^{2}}+\|\nabla u\|_{h^{\gamma}}^{2}$. Finally the first estimate in (91) follows from integrating over $[0,T]$ the inequality

\[
\big\|u^{\prime}_{\theta}(t)\big\|_{h^{\gamma-1}}^{2}=\|\Delta u_{\theta}(t)+f(u_{\theta}(t,\cdot))\|_{h^{\gamma-1}}^{2}\lesssim\|u_{\theta}(t)\|_{h^{\gamma+1}}^{2}+\|f(u_{\theta}(t,\cdot))\|^{2}_{h^{\gamma-1}},
\]

and the preceding estimates. Applying the preceding inequality with $\gamma-1$ replaced by $\gamma-2$ gives the second part of (91).

Step 3: Conclusion of the proof. We now employ a standard Galerkin argument: we can use standard ODE theory to construct smooth solutions $u_{J}$ to (3.1) in the finite dimensional space $E_{J}$, $J\in\mathbb{N}$, from (32), with smooth $P_{E_{J}}f$ and initial conditions $P_{E_{J}}\theta\in E_{J}$, where $P_{E_{J}}$ is the $L^{2}$-projector onto $E_{J}$. Note that

\[
\|P_{E_{J}}h-h\|_{h^{\gamma}}\to_{J\to\infty}0,\quad\|P_{E_{J}}h\|_{h^{\gamma}}\leq\|h\|_{h^{\gamma}}\quad\forall h\in H^{\gamma}.
\]

This and the estimates from above then imply bounds, uniform in $J$, for the norms of $u_{J}$ in (90) and (91). We can take weak sub-sequential limits as $J\to\infty$ (by the Banach-Alaoglu theorem in the corresponding Hilbert spaces), and the limits provide weak solutions of (3.1). Since $\theta\in H^{1}$, Theorem 6 implies that these weak solutions coincide with the unique strong solutions; in particular they inherit the same norm bounds as weak limits in Hilbert space. The details are as on p.222f in [39] and left to the reader.

Finally, the time continuity of the solutions follows as in Corollary 7.3 in [39]. If $\theta\in C^{\infty}$ then $\theta\in H^{\gamma}$ for every $\gamma$ and (90), (91) imply that $\|u_{\theta}\|_{H^{\gamma}}+\|u^{\prime}_{\theta}\|_{H^{\gamma-2}}$ are bounded on $0<t<T$. An application of the Sobolev imbedding theorem with $\gamma-2>b+d/2$ then bounds the $C^{b}$-norms uniformly on $[0,T]$, for any $b$. ∎

3.1.3 Regularity of the forward map $\mathscr{G}$

We can now derive some first properties of the ‘forward’ map

\[
\mathscr{G}(\theta)=u_{\theta},\quad\mathscr{G}:H^{1}(\Omega)\to C([0,T],L^{2}(\Omega)),\tag{96}
\]

arising from the solution operator $\theta\mapsto u_{\theta}$ of the PDE (3.1) constructed in Theorem 6.

Corollary 1.

Let $f\in C^{\infty}_{c}(\mathbb{R})$ be fixed and $d\leq 3$. Then hypotheses B) and A) of Condition 2 are satisfied, and with any $\bar{\gamma}>d/2$ in case A).

Proof.

Part A) follows from (90) and the Sobolev imbedding. For Part B), we use (89) and obtain for $w=u_{\theta}-u_{\vartheta}$ and all $0<t<T$ that

\[
\frac{1}{2}\frac{d}{dt}\|w\|_{L^{2}}^{2}+\|\nabla w\|_{L^{2}}^{2}=\langle f(u_{\theta})-f(u_{\vartheta}),u_{\theta}-u_{\vartheta}\rangle_{L^{2}}\leq\|f\|_{Lip}\|u_{\theta}-u_{\vartheta}\|_{L^{2}}^{2}\leq c\|w\|_{L^{2}}^{2}
\]

which combined with Gronwall’s inequality gives

\[
\|w(t)\|_{L^{2}}^{2}\lesssim\|w(0)\|_{L^{2}}^{2},\quad 0<t<T.\tag{97}
\]

Integrating this bound over $[0,T]$ completes the proof. ∎
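The Lipschitz bound (97) can be observed in a simple simulation. The following sketch is illustrative only and makes several assumptions not in the text: $d=1$, the bump function as the reaction term, an exponential Euler time discretisation, and the conservative bound `LIP = 1.0` on $\|f\|_{Lip}$ for this particular $f$ (its derivative is at most about $0.8$ in absolute value). For this scheme the ratio $\|w(t)\|^{2}_{L^{2}}/\|w(0)\|^{2}_{L^{2}}$ stays below the Gronwall factor $e^{2\,\mathrm{LIP}\,T}$.

```python
import numpy as np

T = 0.2      # time horizon (illustrative choice)
LIP = 1.0    # assumed conservative bound on sup|f'| for the bump below

def bump(u):
    """Hypothetical reaction term: f(u) = exp(-1/(1-u^2)) for |u| < 1, else 0."""
    out = np.zeros_like(u)
    inside = np.abs(u) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

def solve_rd(theta, T=T, n_steps=400):
    """Exponential Euler sketch for u_t - u_xx = f(u) on the 1D torus.

    Returns an array of shape (n_steps + 1, n_grid) with all time slices.
    """
    n = theta.size
    dt = T / n_steps
    lam = (2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)) ** 2
    decay = np.exp(-lam * dt)
    weight = np.where(lam > 0, (1.0 - decay) / np.where(lam > 0, lam, 1.0), dt)
    u = np.empty((n_steps + 1, n))
    u[0] = theta
    for i in range(n_steps):
        u_hat = decay * np.fft.fft(u[i]) + weight * np.fft.fft(bump(u[i]))
        u[i + 1] = np.fft.ifft(u_hat).real
    return u

def l2_norm_sq(v):
    """Squared discrete L^2 norm over the spatial (last) axis."""
    return np.mean(v ** 2, axis=-1)

x = np.arange(128) / 128
theta = 0.5 * np.cos(2 * np.pi * x)
vartheta = theta + 0.1 * np.sin(4 * np.pi * x)

w = solve_rd(theta) - solve_rd(vartheta)
ratio = l2_norm_sq(w) / l2_norm_sq(w[0])
print(ratio.max(), np.exp(2 * LIP * T))
```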

Similarly we can take $w=u_{\theta}-u_{\vartheta}$ for $\theta,\vartheta\in H^{\zeta}\cap H^{1}$ and use the mean value theorem with mean values $\tilde{u}(x)\in(u_{\theta}(x),u_{\vartheta}(x))$, as well as the Cauchy-Schwarz and the multiplier inequalities (17) for any $\zeta>d/2$ to deduce the inequality

\[
\frac{1}{2}\frac{d}{dt}\|w\|_{h^{\zeta}}^{2}+\|\nabla w\|_{h^{\zeta}}^{2}=\langle f^{\prime}(\tilde{u})(u_{\theta}-u_{\vartheta}),u_{\theta}-u_{\vartheta}\rangle_{h^{\zeta}}\lesssim\|f^{\prime}(\tilde{u})\|_{H^{\zeta}}\|u_{\theta}-u_{\vartheta}\|_{H^{\zeta}}^{2}\leq c\|w\|_{h^{\zeta}}^{2}
\]

for all $0<t<T$, where $\|f^{\prime}(\tilde{u})\|_{H^{\zeta}}$ can be bounded as in the proof of Proposition 4 by a constant depending only on $f$ and $B\geq\|\theta\|_{H^{\zeta}}+\|\vartheta\|_{H^{\zeta}}$. This implies by Gronwall’s inequality and the Sobolev imbedding $H^{\zeta}\subset L^{\infty}$ that

\[
\sup_{0<t<T}\|u_{\theta}(t)-u_{\vartheta}(t)\|_{\infty}\lesssim\sup_{0<t<T}\|u_{\theta}(t)-u_{\vartheta}(t)\|_{H^{\zeta}}\lesssim\|\theta-\vartheta\|_{H^{\zeta}}.\tag{98}
\]

3.1.4 The linearisation of $\mathscr{G}$ as a Schrödinger equation

Similarly as in the previous subsection, consider now any $\theta,\theta_{0}$ such that $\|\theta\|_{H^{\bar{\gamma}}}+\|\theta_{0}\|_{H^{\bar{\gamma}}}\leq B$, $\bar{\gamma}>\max(1,d/2)$, and let $u_{\theta},u_{\theta_{0}}\in C([0,T],H^{\bar{\gamma}})$ be the corresponding strong solutions of (3.1) from Proposition 4 with initial conditions $\theta,\theta_{0}$. Then $w=u_{\theta}-u_{\theta_{0}}$ satisfies on $[0,T]\times\Omega$

\[
\frac{\partial}{\partial t}w-\Delta w=f(u_{\theta})-f(u_{\theta_{0}})=f^{\prime}(\tilde{u})w\tag{99}
\]

and $w(0)=\theta-\theta_{0}$, where we have used the mean value theorem with mean values

\[
\tilde{u}(t,x)=(1-s)u_{\theta}(t,x)+su_{\theta_{0}}(t,x),\quad 0<s<1,\quad(t,x)\in[0,T]\times\Omega.
\]

We see that $w$ solves the linear PDE (3.2) below with potential $V=f^{\prime}(\tilde{u})$, $m=0$ and initial condition $\theta-\theta_{0}$. We thus deduce from Lemma 4 below the stability estimate

\[
\|u_{\theta}-u_{\theta_{0}}\|^{2}_{L^{2}([0,T],L^{2}(\Omega))}=\|w\|^{2}_{L^{2}([0,T],L^{2}(\Omega))}\geq c\|\theta-\theta_{0}\|^{2}_{H^{-1}},\tag{100}
\]

for some $c>0$ depending only on $d,T,B$, noting that the $C^{1}([0,T],C^{1})$-norm of $V=f^{\prime}(\tilde{u})$ is bounded by a fixed constant $M=M(B)>0$ whenever $\bar{\gamma}>3+d/2$, seen as follows: First we have from (90) and the Sobolev imbedding with $\bar{\gamma}>1+d/2$ that

\[
\|f^{\prime}(\tilde{u}(t))\|_{C^{1}}\lesssim\|f^{\prime}\|_{\infty}+\|f^{\prime\prime}\|_{\infty}\|\tilde{u}(t)\|_{C^{1}}\lesssim c+\sup_{\|\theta\|_{H^{\bar{\gamma}}}\leq B}\|u_{\theta}(t)\|_{H^{\bar{\gamma}}}\leq M
\]

for every $t\in[0,T]$. Next, for any multi-index $\beta$ and partial differential operator $D^{\beta}$, the chain rule implies, for almost every $0<t<T$,

\begin{align*}
\|V^{\prime}(t)\|_{C^{1}}&=\|f^{\prime\prime}(\tilde{u}(t))\tilde{u}^{\prime}(t)\|_{C^{1}}\\
&\lesssim\|f^{\prime\prime}(\tilde{u}(t))\tilde{u}^{\prime}(t)\|_{\infty}+\max_{|\beta|=1}\|f^{\prime\prime\prime}(\tilde{u}(t))D^{\beta}\tilde{u}(t)\tilde{u}^{\prime}(t)+f^{\prime\prime}(\tilde{u}(t))D^{\beta}\tilde{u}^{\prime}(t)\|_{\infty}\\
&\lesssim\|\tilde{u}^{\prime}(t)\|_{\infty}(1+\|\tilde{u}(t)\|_{C^{1}})+\|\tilde{u}^{\prime}(t)\|_{C^{1}}\\
&\lesssim\sup_{\|\theta\|_{H^{\bar{\gamma}}}\leq B}\big[\|u_{\theta}^{\prime}(t)\|_{\infty}(1+\|u_{\theta}(t)\|_{C^{1}})+\|u_{\theta}^{\prime}(t)\|_{C^{1}}\big]\leq c<\infty,
\end{align*}

using (91) and the Sobolev imbedding $H^{\bar{\gamma}-2}\subset C^{1}$.

The preceding argument (99) is only a ‘pseudo’-linearisation argument since the potential $V$ featuring in the linear PDE still depends on $\theta,\theta_{0}$ via $\tilde{u}$. The following theorem derives the full linearisation of the map $\mathscr{G}$ from (96) near a fixed point $\theta_{0}$.

Theorem 7.

Let $f\in C^{\infty}_{c}(\mathbb{R})$ and $\theta_{0}\in C^{\infty}$ be fixed and let $h\in H^{\bar{\gamma}}$, $\bar{\gamma}>\max(1,d/2)$, $d\leq 3$, be such that $\|\theta_{0}+h\|_{H^{\bar{\gamma}}}\leq B$. Denote by $u_{\theta_{0}},u_{\theta_{0}+h}\in L^{\infty}([0,T],H^{\bar{\gamma}})$ the strong solutions of (3.1) from Proposition 4 with initial conditions $\theta_{0},\theta_{0}+h$, respectively. Then we have for some constant $C=C(B,\bar{\gamma},T)>0$ that

\[
\sup_{0<t<T}\|u_{\theta_{0}+h}(t)-u_{\theta_{0}}(t)-U_{f,\theta_{0},h}(t)\|_{L^{2}}\leq C\|h\|^{2}_{L^{2}}\tag{101}
\]

where

\[
U_{h}=U_{f,\theta_{0},h}=:\mathbb{I}_{\theta_{0}}[h]
\]

is the unique solution to the time-dependent linear Schrödinger equation (3.2) with $m=0$, initial condition $h$, and potential $V(t,\cdot)=f^{\prime}(u_{\theta_{0}}(t,\cdot))\in C^{1,\infty}$ from (12).

Proof.

Let us denote the error in the linear approximation by

\[
r(t)=u_{\theta_{0}+h}(t)-u_{\theta_{0}}(t)-U_{f,\theta_{0},h}(t)
\]

and notice that $r$ solves the linear PDE

\[
\frac{\partial}{\partial t}r-\Delta r-f^{\prime}(u_{\theta_{0}})r=m,\quad\text{where }m=f(u_{\theta_{0}+h})-f(u_{\theta_{0}})-f^{\prime}(u_{\theta_{0}})(u_{\theta_{0}+h}-u_{\theta_{0}}),\tag{102}
\]

with initial condition $r(0)=\theta_{0}+h-\theta_{0}-h=0$. To this we can apply the uniform in $0<t<T$ regularity estimate (107) with $a=0$ and $h=0$, a second order Taylor expansion of $f$, as well as (97) to obtain

\[
\sup_{0<t<T}\|r(t)\|_{L^{2}}\lesssim\sup_{0<t<T}\|m(t)\|_{L^{2}}\lesssim\|f\|_{C^{2}}\sup_{0<t<T}\|u_{\theta_{0}+h}(t)-u_{\theta_{0}}(t)\|_{L^{2}}^{2}\lesssim\|h\|_{L^{2}}^{2},\tag{103}
\]

completing the proof of (101), noting also that the stipulated regularity of $V$ follows from the chain rule and Proposition 4 for smooth $\theta_{0}$. ∎
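The quadratic remainder in (101) can be observed numerically: halving $h$ should reduce the remainder by a factor of roughly four. The sketch below is an illustration under assumptions not made in the text, namely $d=1$, the bump function as $f$, and an exponential Euler scheme for both the non-linear equation and its linearisation with potential $V=f^{\prime}(u_{\theta_{0}})$; all names are hypothetical.

```python
import numpy as np

def bump(u):
    """Hypothetical reaction term: f(u) = exp(-1/(1-u^2)) for |u| < 1, else 0."""
    out = np.zeros_like(u)
    inside = np.abs(u) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

def bump_prime(u):
    """Derivative f'(u) of the bump function above."""
    out = np.zeros_like(u)
    inside = np.abs(u) < 1.0
    s = 1.0 - u[inside] ** 2
    out[inside] = np.exp(-1.0 / s) * (-2.0 * u[inside] / s ** 2)
    return out

def propagators(n, dt):
    """Fourier multipliers of the heat semigroup and its time integral over one step."""
    lam = (2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)) ** 2
    decay = np.exp(-lam * dt)
    weight = np.where(lam > 0, (1.0 - decay) / np.where(lam > 0, lam, 1.0), dt)
    return decay, weight

def step(v, forcing, decay, weight):
    """One exponential Euler step for v_t - v_xx = forcing."""
    return np.fft.ifft(decay * np.fft.fft(v) + weight * np.fft.fft(forcing)).real

def remainder(theta0, h, T=0.1, n_steps=200):
    """max over time steps of ||u_{theta0+h} - u_{theta0} - U_h||_{L^2}."""
    decay, weight = propagators(theta0.size, T / n_steps)
    u0, u1, U = theta0.copy(), theta0 + h, h.copy()
    worst = 0.0
    for _ in range(n_steps):
        # linearised equation with potential V = f'(u_{theta0}) at the current time
        U_next = step(U, bump_prime(u0) * U, decay, weight)
        u0 = step(u0, bump(u0), decay, weight)
        u1 = step(u1, bump(u1), decay, weight)
        U = U_next
        worst = max(worst, np.sqrt(np.mean((u1 - u0 - U) ** 2)))
    return worst

x = np.arange(64) / 64
theta0 = 0.5 * np.cos(2 * np.pi * x)
direction = 0.5 + np.sin(2 * np.pi * x)

r_big = remainder(theta0, 2e-3 * direction)
r_small = remainder(theta0, 1e-3 * direction)
ratio = r_big / r_small
print(r_big, r_small, ratio)
```

A ratio close to four is what a remainder of order $\|h\|^{2}_{L^{2}}$ predicts; the experiment checks the scaling only and says nothing about the size of the constant $C$.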

We further record here the consequence of Theorem 7 and Proposition 5 that the mapping $\mathbb{I}_{\theta_{0}}$ is continuous from $L^{2}(\Omega)$ to $L^{2}([0,T],L^{2}(\Omega))$. In fact we also have

\[
\sup_{0<t<T}\|\mathbb{I}_{\theta_{0}}[h](t,\cdot)\|_{\infty}\lesssim\sup_{0<t<T}\|U_{h}(t,\cdot)\|_{H^{\zeta}}\lesssim\|h\|_{H^{\zeta}},\quad\zeta>d/2,\tag{104}
\]

in view of Proposition 5 and the Sobolev imbedding $H^{\zeta}\subset L^{\infty}$.

3.2 Regularity theory for time-dependent linear Schrödinger equations

For $\Omega=[0,1]^{d}$, $d\in\mathbb{N}$, we now turn to a detailed study of the solution operator $h\mapsto U_{h}$ of the linear time-dependent periodic Schrödinger equation

\begin{align}
\frac{\partial}{\partial t}U(t,\cdot)-\Delta U(t,\cdot)-V(t,\cdot)U(t,\cdot)&=m(t,\cdot)\quad\text{on }(0,T]\times\Omega,\notag\\
U(0,\cdot)&=h\quad\text{on }\Omega,\tag{105}
\end{align}

for a general uniformly bounded potential $V\in C([0,T],C(\Omega))$, initial condition $h\in H^{1}(\Omega)$, and source term $m\in L^{2}([0,T],L^{2})$. The existence of unique weak solutions

\[
U\in L^{2}([0,T],H^{1})\cap C([0,T],L^{2})\text{ with }U^{\prime}=(\partial U/\partial t)\in L^{2}([0,T],H^{-1})
\]

satisfying for all $v\in H^{1}$ and a.e. $t\in[0,T]$ the equations

\begin{align}
\langle U^{\prime},v\rangle_{L^{2}}+\langle\nabla U,\nabla v\rangle_{L^{2}}-\langle V(t,\cdot)U,v\rangle_{L^{2}}&=\langle m(t,\cdot),v\rangle_{L^{2}},\notag\\
U(0)&=h,\tag{106}
\end{align}

can be proved by standard arguments from linear parabolic PDE just as on p.380f. in [15]. The regularity estimate Proposition 5 below gives sufficient conditions on $V,m$ for these to be strong solutions $U\in L^{2}([0,T],H^{2})\cap C([0,T],H^{1})$, $U^{\prime}\in L^{2}([0,T],L^{2})$ whenever $h\in H^{1}$. The estimates in Proposition 5 for $a\leq 0$ further imply by the usual Galerkin limiting procedure (again as on p.380f. in [15]) that unique solutions $U_{h}\in L^{2}([0,T],H^{a+1})\cap C([0,T],H^{a})$ with $U^{\prime}\in L^{2}([0,T],H^{a-1})$ also exist for initial conditions $h\in H^{a}$; in this case (3.2) holds in $H^{a-1}$ in the sense that we are testing against $v\in H^{1-a}$ in (3.2).

We will apply the estimates that follow with $V=f^{\prime}(u_{\theta})$ for $u_{\theta}$ the solution of (3.1), as this provides the linearisation of the forward map $\mathscr{G}$ derived in Theorem 7. For the verification of Condition 2E) in (100) we need to track the explicit dependence of the constants on $\theta\in U(\bar{\gamma},B)$ via the $C^{1}([0,T],C^{1})$-norm of the potential $V=f^{\prime}(u_{\theta})$. For the verification of Condition 2F) and to prove the key smoothing Lemma 2, we consider potentials $V\in C^{1,\infty}$ from (12). This is compatible with our choice $V=f^{\prime}(u_{\theta_{0}})$, $f\in C^{\infty}_{c}$, $\theta_{0}\in C^{\infty}$, in view of Proposition 4.

3.2.1 A parabolic regularity estimate

Proposition 5.

Suppose $U_{h}\in L^{2}([0,T],H^{a+1})$ with $U_{h}^{\prime}\in L^{2}([0,T],H^{a-1})$ is a weak solution in $H^{a-1}$ of (3.2) for $h\in H^{a}$, $m\in L^{2}([0,T],H^{a})$, $a\in\mathbb{R}$. Then $U_{h}\in C([0,T],H^{a}(\Omega))$. Assume further either i) that $a=-1$ and $B\geq\|V\|_{C^{1}([0,T],C^{1})}$ for some $B>0$, or ii) that $V\in C^{1,\infty}$ from (12). Then we have for all $0<T_{0}\leq T$,

\begin{align}
&\int_{0}^{T_{0}}\|U_{h}(t)\|_{H^{a+1}}^{2}dt+\sup_{0<t<T_{0}}\|U_{h}(t)\|^{2}_{H^{a}}+\int_{0}^{T_{0}}\|U_{h}^{\prime}(t)\|^{2}_{H^{a-1}}dt\notag\\
&\qquad\leq C\|h\|^{2}_{H^{a}}+C\int_{0}^{T_{0}}\|m(t)\|^{2}_{H^{a}}dt\tag{107}
\end{align}

where $C$ is a constant that depends on $B,f,T,d$ in case i) and on $a,f,T,d,V$ in case ii). Moreover,

\[
{\rm ess}\sup_{0<t<T_{0}}\|U^{\prime}_{h}(t)\|^{2}_{H^{a-2}}\leq C\|h\|^{2}_{H^{a}}+C\int_{0}^{T_{0}}\|m(t)\|^{2}_{H^{a}}dt+C\,{\rm ess}\sup_{0<t<T_{0}}\|m(t)\|_{H^{a-2}}^{2}.\tag{108}
\]
Proof.

We first establish the bounds assuming the solution $U_{h}$ is sufficiently regular so that all expressions are well defined. One can then employ a standard Galerkin argument as in Sections 7.2-7.4 in [39] to extend the result to hold in general. Time continuity of solutions into $H^{a}(\Omega)$ follows as in Corollary 7.3 in [39].

Let us write $U=U_{h}$ in this proof, and set $T_{0}=T$ without loss of generality. It suffices to establish the bound for the equivalent $h^{a}$-norms. From (89) we have $\|\nabla U\|^{2}_{h^{a}}=-\langle\Delta U,U\rangle_{h^{a}}$ for any $a\in\mathbb{R}$. Differentiating the squared $h^{a}$-norm (13) we have from the Cauchy-Schwarz, Sobolev multiplier (18), and Young inequalities

\begin{align}
\frac{1}{2}\frac{d}{dt}\|U\|_{h^{a}}^{2}+\|\nabla U\|^{2}_{h^{a}}&=\Big\langle\frac{\partial}{\partial t}U,U\Big\rangle_{h^{a}}-\langle\Delta U,U\rangle_{h^{a}}\tag{109}\\
&=\langle VU,U\rangle_{h^{a}}+\langle m,U\rangle_{h^{a}}\notag\\
&\leq\|VU\|_{h^{a}}\|U\|_{h^{a}}+\|m\|_{h^{a}}\|U\|_{h^{a}}\notag\\
&\lesssim(1+\|V\|_{C^{|a|}})\|U\|_{h^{a}}^{2}+\|m\|_{h^{a}}^{2}.\notag
\end{align}

What precedes implies first that $\frac{d}{dt}\|U(t)\|^{2}_{h^{a}}\lesssim\|U(t)\|^{2}_{h^{a}}+\|m(t)\|_{h^{a}}^{2}$ for all $0<t<T$, to which we can apply Gronwall’s inequality to deduce

\[
\|U(t)\|^{2}_{h^{a}}\lesssim\|U(0)\|_{h^{a}}^{2}+\int_{0}^{t}\|m(s)\|_{h^{a}}^{2}ds\leq\|h\|_{h^{a}}^{2}+\int_{0}^{T}\|m(s)\|_{h^{a}}^{2}ds,\quad 0<t\leq T.
\]

Then integrating (109) we deduce

\[
\int_{0}^{T}\|\nabla U(t)\|_{h^{a}}^{2}dt\lesssim\|h\|_{h^{a}}^{2}+\int_{0}^{T}\|m(t)\|_{h^{a}}^{2}dt-\frac{1}{2}\int_{0}^{T}\frac{d}{dt}\|U\|_{h^{a}}^{2}\lesssim\|h\|_{h^{a}}^{2}+\int_{0}^{T}\|m(t)\|_{h^{a}}^{2}dt.
\]

Since $\|u\|_{h^{a}}+\|\nabla u\|_{h^{a}}$ is equivalent to the $\|\cdot\|_{h^{a+1}}$-norm, the preceding inequalities in particular bound $\int_{0}^{T}\|U(t)\|^{2}_{h^{a+1}}dt$ by the r.h.s. in (107). The proof of the first inequality is completed upon noting, using again (17) and the assumption on $V$, that for a.e. $t$,

\[
\|U^{\prime}(t)\|_{H^{b}}\lesssim\|\Delta U(t)\|_{H^{b}}+\|V(t)U(t)\|_{H^{b}}+\|m(t)\|_{H^{b}}\lesssim\|U(t)\|_{H^{b+2}}+\|m(t)\|_{H^{b}},\tag{110}
\]

which applied with $b=a-1$ and after squaring and integrating combines with the preceding bound for $\int_{0}^{T}\|U(t)\|^{2}_{H^{a+1}}dt$ to complete the proof of (107). The second inequality then follows similarly from $b=a-2$ in the last display and the preceding bound for $\|U(t)\|_{H^{a}}$. ∎

3.2.2 Forward smoothing of the semigroup

We now strengthen the preceding estimates for strictly positive times, adapting an argument from p.294 in [39] to the present situation.

Proposition 6.

In the setting of Proposition 5, let $m=0$ and assume $V\in C^{1,\infty}$ from (12). For all fixed $0<t_{0}<T_{0}<\infty$ there exists a constant $c=c(t_{0},T_{0},V,a,d)$ such that

\[
{\rm ess}\sup_{t_{0}\leq t\leq T_{0}}\|U^{\prime}_{h}(t)\|_{H^{a-1}}\leq c\|h\|_{H^{a}},\tag{111}
\]
\[
\sup_{t_{0}\leq t\leq T_{0}}\|U_{h}(t)\|_{H^{a+1}}\leq c\|h\|_{H^{a}}.\tag{112}
\]
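To see how the constant $c$ may depend on $t_{0}$, it can help to look at the special case $V=0$. This is an assumption made here purely for illustration, so that $U_{h}(t)=\sum_{j\geq 0}e^{-\lambda_{j}t}\langle h,e_{j}\rangle_{L^{2}}e_{j}$ is the heat semigroup applied to $h$; the computation below is a sketch in that case only, using the $h^{a}$-norms with weights $(1+\lambda_{j})^{a}$.

```latex
\begin{align*}
\|U_h(t)\|_{h^{a+1}}^2
  &= \sum_{j\ge 0}(1+\lambda_j)^{a+1}e^{-2\lambda_j t}|\langle h,e_j\rangle_{L^2}|^2
   \le \sup_{\lambda\ge 0}\big[(1+\lambda)e^{-2\lambda t}\big]\,\|h\|_{h^a}^2,\\
(1+\lambda)e^{-2\lambda t}
  &\le 1+\lambda e^{-2\lambda t}\le 1+\frac{1}{2et},\qquad \lambda\ge 0,\ t>0,\\
\sup_{t_0\le t\le T_0}\|U_h(t)\|_{h^{a+1}}^2
  &\le \Big(1+\frac{1}{2et_0}\Big)\|h\|_{h^a}^2 .
\end{align*}
```

In this special case the constant in (112) grows like $t_{0}^{-1/2}$ as $t_{0}\to 0$, which is consistent with the restriction to strictly positive times. Since $U_{h}^{\prime}=\Delta U_{h}$ when $V=0$ and $\|\Delta u\|_{h^{a-1}}\leq\|u\|_{h^{a+1}}$, the same bound also controls the left hand side of (111).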
Proof.

Again we prove the required estimates first assuming $h,U_{h},U_{h}^{\prime}$ are smooth and a Galerkin approximation argument just as in the proofs of Propositions 4 and 5 then implies the general result.

It suffices to consider the equivalent $h^{a}$-norms. Let us write $U=U_{h}$ in the proof and define $g(t)=tU^{\prime}(t)$ for $t\in(0,T]$ with $g(0)=0$. Differentiate the equation (3.2) with respect to $t$ and multiply by $tg(t)$ to deduce

\[
\langle t^{2}U^{\prime},(\partial/\partial t)U^{\prime}\rangle_{h^{a-1}}=t^{2}\langle U^{\prime},\Delta U^{\prime}\rangle_{h^{a-1}}+t^{2}\langle V^{\prime}U+VU^{\prime},U^{\prime}\rangle_{h^{a-1}}\quad\text{a.e. on }[0,T]
\]

which implies by (89), the Cauchy-Schwarz and Young inequalities, as well as (18) that almost everywhere on [0,T][0,T],

12ddttU(t)ha12tU(t)ha12+t2U(t)ha12cV,T0(U(t)ha12+U(t)ha12)\frac{1}{2}\frac{d}{dt}\|tU^{\prime}(t)\|_{h^{a-1}}^{2}-t\|U^{\prime}(t)\|_{h^{a-1}}^{2}+t^{2}\|\nabla U^{\prime}(t)\|_{h^{a-1}}^{2}\leq c_{V,T_{0}}(\|U^{\prime}(t)\|^{2}_{h^{a-1}}+\|U(t)\|_{h^{a-1}}^{2})

for some finite constant cV,T0c_{V,T_{0}} depending on VC1([0,T],C|a1|)<\|V\|_{C^{1}([0,T],C^{|a-1|})}<\infty. Therefore, integrating the penultimate identity over (0,t0)(0,t_{0}), any t0>0,t_{0}>0, and using g(0)=0g(0)=0 as well as Proposition 5,

t0U(t0)ha12+0t0t2Uha12𝑑t\displaystyle\|t_{0}U^{\prime}(t_{0})\|_{h^{a-1}}^{2}+\int_{0}^{t_{0}}t^{2}\|\nabla U^{\prime}\|_{h^{a-1}}^{2}dt
0t0[(cV,T0+t)U(t)ha12+cV,T0U(t)ha12]𝑑thha2\displaystyle\leq\leavevmode\nobreak\ \leavevmode\nobreak\ \int_{0}^{t_{0}}\big{[}(c_{V,T_{0}}+t)\|U^{\prime}(t)\|^{2}_{h^{a-1}}+c_{V,T_{0}}\|U(t)\|^{2}_{h^{a-1}}\big{]}dt\lesssim\|h\|_{h^{a}}^{2}

which provides the required bound on U(t0)Ha1\|U^{\prime}(t_{0})\|_{H^{a-1}} after dividing by t0>0t_{0}>0, and proves (111) since t0t_{0} was arbitrary. For the final estimate we notice that for a.e. t[t0,T0]t\in[t_{0},T_{0}],

ΔU(t)=U(t)V(t)U(t)\Delta U(t)=U^{\prime}(t)-V(t)U(t)

and hence, using the standard elliptic regularity estimate uha+1(IdΔ)uha1\|u\|_{h^{a+1}}\lesssim\|(Id-\Delta)u\|_{h^{a-1}} (e.g., via (13) and (89)), (111) and again Proposition 5 we obtain for almost all tt0t\geq t_{0},

Uh(t)ha+1ΔUh(t)ha1+Uh(t)ha1Uh(t)ha1+Uh(t)ha1hha.\|U_{h}(t)\|_{h^{a+1}}\lesssim\|\Delta U_{h}(t)\|_{h^{a-1}}+\|U_{h}(t)\|_{h^{a-1}}\lesssim\|U_{h}^{\prime}(t)\|_{h^{a-1}}+\|U_{h}(t)\|_{h^{a-1}}\lesssim\|h\|_{h^{a}}. (113)

Since UhC([0,T],Ha)U_{h}\in C([0,T],H^{a}) this inequality can in fact be shown to hold everywhere on [t0,T0][t_{0},T_{0}] (e.g., as in Lemma 11.2 in [39]). ∎

Applying the above proposition iteratively, a bootstrap argument shows that the mapping $h\mapsto U_{h}(t_{0})$ maps any $H^{a}$ into any $H^{b}$ space, reflecting the smoothing nature of the semigroup action at strictly positive times $t>0$.

Lemma 2.

In the setting of Proposition 5, assume VV lies in C1,C^{1,\infty} from (12). Let tmin>0t_{\min}>0, and consider any real numbers b>ab>a. Then

suptmintTUh(t)HbchHa,\sup_{t_{\min}\leq t\leq T}\|U_{h}(t)\|_{H^{b}}\leq c\|h\|_{H^{a}}, (114)

for some constant c=c(tmin,T,V,a,b,d)>0c=c(t_{\min},T,V,a,b,d)>0.

Proof.

Fix a small 0<ϵ<tmin0<\epsilon<t_{\min} and dissect [ϵ,tmin][\epsilon,t_{\min}] into at least bab-a-many points

ϵl=ϵ+lAtmin(tminϵ),l=0,,Atmin,\epsilon_{l}=\epsilon+\frac{l}{At_{\min}}(t_{\min}-\epsilon),\leavevmode\nobreak\ \leavevmode\nobreak\ l=0,\dots,At_{\min},

for appropriate A=A(a,b,tmin)>0A=A(a,b,t_{\min})>0. We use (112) with t0=ϵlt_{0}=\epsilon_{l} and for equation (3.2) started at time ϵl1\epsilon_{l-1} to obtain the chain of inequalities

Uh(ϵl)HbUh(ϵl1)Hb1,l1,\|U_{h}(\epsilon_{l})\|_{H^{b}}\lesssim\|U_{h}(\epsilon_{l-1})\|_{H^{b-1}},\leavevmode\nobreak\ \leavevmode\nobreak\ l\geq 1, (115)

which can be iterated to obtain for any ttmint\geq t_{min} that

Uh(t)HbUh(tmin)HbUh(ϵ)HahHa,\|U_{h}(t)\|_{H^{b}}\lesssim\|U_{h}(t_{\min})\|_{H^{b}}\lesssim\|U_{h}(\epsilon)\|_{H^{a}}\lesssim\|h\|_{H^{a}},

where we have also used Proposition 5. ∎
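To illustrate the smoothing effect in the simplest case, the following sketch treats the potential-free situation $V=0$, where $U_h(t)=\sum_j e^{-t\lambda_j}e_j\langle e_j,h\rangle_{L^2}$. It assumes that the $h^a$-norms are the sequence norms $\|h\|_{h^a}^2=\sum_j(1+\lambda_j)^a\langle h,e_j\rangle_{L^2}^2$, in which case $\|U_h(t)\|_{h^b}^2\le C(b-a,t)\|h\|_{h^a}^2$ with $C(s,t)=\sup_{\lambda\ge 0}(1+\lambda)^{s}e^{-2t\lambda}$. The function names are hypothetical and the code only evaluates this Fourier multiplier bound; it is not the argument used for general $V$.

```python
import math


def smoothing_constant(s: float, t: float) -> float:
    """Closed form of sup_{lam >= 0} (1 + lam)**s * exp(-2*t*lam), for s > 0, t > 0.

    In the variable x = 1 + lam >= 1 the function x**s * exp(-2*t*(x - 1))
    has its only critical point at x = s / (2*t); if that lies below 1,
    the supremum is attained at x = 1.
    """
    x = max(1.0, s / (2.0 * t))
    return x ** s * math.exp(-2.0 * t * (x - 1.0))


def grid_sup(s: float, t: float, lam_max: float = 100.0, n: int = 100_000) -> float:
    """Brute-force maximum of the same multiplier over a grid in [0, lam_max]."""
    best = 0.0
    for i in range(n + 1):
        lam = lam_max * i / n
        best = max(best, (1.0 + lam) ** s * math.exp(-2.0 * t * lam))
    return best


if __name__ == "__main__":
    # Gaining s = b - a = 3 derivatives costs a constant that grows as t -> 0.
    for t in (0.5, 0.1, 0.01):
        print(t, smoothing_constant(3.0, t))
```

The constant is finite for every fixed $t>0$ but blows up as $t\to 0$, which matches the dependence of $c$ on $t_{\min}$ in (114).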

3.2.3 Constant approximation of the potential

For a potential VC1([0,T],Cb),b1V\in C^{1}([0,T],C^{b}),b\geq 1, and ε>0\varepsilon>0, define

V(ε)(t,)=V(0,),fort[0,ε/2],V(ε)(t,)=V(t,),fortε,V^{(\varepsilon)}(t,\cdot)=V(0,\cdot),\leavevmode\nobreak\ \text{for}\leavevmode\nobreak\ t\in[0,\varepsilon/2],\leavevmode\nobreak\ \leavevmode\nobreak\ V^{(\varepsilon)}(t,\cdot)=V(t,\cdot),\leavevmode\nobreak\ \text{for}\leavevmode\nobreak\ t\geq\varepsilon,

and linear in between

V(ε)(t,)=(1t(ε/2)ε/2)V(0,)+t(ε/2)ε/2V(ε,),t[ε/2,ε].V^{(\varepsilon)}(t,\cdot)=\Big{(}1-\frac{t-(\varepsilon/2)}{\varepsilon/2}\Big{)}V(0,\cdot)+\frac{t-(\varepsilon/2)}{\varepsilon/2}V(\varepsilon,\cdot),\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ t\in[\varepsilon/2,\varepsilon].

This function lies again in C1([0,T],Cb)C^{1}([0,T],C^{b}), with norm bounded by at most a constant multiple of VC1([0,T],Cb)\|V\|_{C^{1}([0,T],C^{b})}. In particular if VC1,V\in C^{1,\infty} then V(ε)V^{(\varepsilon)} also lies in C1,C^{1,\infty}.
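The construction of $V^{(\varepsilon)}$ can be sketched in a few lines. The example below works at a fixed spatial point, so the potential is treated as a scalar function of time only; the helper name `flatten_potential` and the test potential are hypothetical and serve purely as an illustration of the three regimes (frozen, interpolated, unchanged).

```python
from typing import Callable


def flatten_potential(V: Callable[[float], float], eps: float) -> Callable[[float], float]:
    """Return t -> V_eps(t) for a scalar-in-space potential t -> V(t).

    V_eps equals V(0) on [0, eps/2], interpolates linearly between V(0)
    and V(eps) on [eps/2, eps], and coincides with V for t >= eps.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    half = eps / 2.0

    def V_eps(t: float) -> float:
        if t <= half:
            return V(0.0)          # frozen at the initial potential
        if t >= eps:
            return V(t)            # unchanged after time eps
        w = (t - half) / half      # interpolation weight in (0, 1)
        return (1.0 - w) * V(0.0) + w * V(eps)

    return V_eps


if __name__ == "__main__":
    V = lambda t: 1.0 + t ** 2     # hypothetical smooth potential
    V_eps = flatten_potential(V, eps=0.4)
    for t in (0.0, 0.2, 0.3, 0.4, 1.0):
        print(t, V_eps(t))
```

Freezing the potential on $[0,\varepsilon/2]$ is what later allows the time-independent spectral theory to be applied on that interval.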

Proposition 7.

Assume either i) that a=1a=-1 and BVC1([0,T],C1)B\geq\|V\|_{C^{1}([0,T],C^{1})} for some B>0B>0, or that ii) aa\in\mathbb{R} and VC1,V\in C^{1,\infty} from (12). Suppose Uh,Uh(ε)U_{h},U_{h}^{(\varepsilon)} are weak solutions to (3.2) as in Proposition 5 with potentials VV and V(ε)V^{(\varepsilon)} respectively, both with initial condition hHah\in H^{a} and source m=0m=0. Then there exists a constant c>0c>0 depending on T,B,dT,B,d in case i) and on a,T,V,da,T,V,d in case ii) such that for all ε>0,0<T0T,\varepsilon>0,0<T_{0}\leq T, we have

0T0Uh(t)Uh(ε)(t)Ha+12𝑑tcT0ε2hHa2.\int_{0}^{T_{0}}\|U_{h}(t)-U^{(\varepsilon)}_{h}(t)\|_{H^{a+1}}^{2}dt\leq cT_{0}\varepsilon^{2}\|h\|^{2}_{H^{a}}.

In case ii) we further have, for some constant c=c(a,T,V,d)>0c=c(a,T,V,d)>0,

0T0Uh(t)Uh(ε)(t)Ha+12𝑑tcε2hHa12\int_{0}^{T_{0}}\|U_{h}(t)-U^{(\varepsilon)}_{h}(t)\|_{H^{a+1}}^{2}dt\leq c\varepsilon^{2}\|h\|^{2}_{H^{a-1}}
Proof.

The function v¯h=UhUh(ε)\bar{v}_{h}=U_{h}-U_{h}^{(\varepsilon)} solves on (0,T]×Ω(0,T]\times\Omega the equation

(tΔV(ε))v¯=(VV(ε))Uhm\Big{(}\frac{\partial}{\partial t}-\Delta-V^{(\varepsilon)}\Big{)}\bar{v}=(V-V^{(\varepsilon)})U_{h}\equiv m

with v¯(0,)=hh=0\bar{v}(0,\cdot)=h-h=0. From the regularity estimate (107) we obtain

0T0v¯(s)ha+12𝑑s0T0m(t)ha2𝑑t.\int_{0}^{T_{0}}\|\bar{v}(s)\|_{h^{a+1}}^{2}ds\lesssim\int_{0}^{T_{0}}\|m(t)\|_{h^{a}}^{2}dt.

Next, since V=V(ε)V=V^{(\varepsilon)} outside of [0,ε][0,\varepsilon], using Proposition 7.1 in [39], Jensen’s inequality, (18) and again (107) we further bound the r.h.s. by

0min(ε,T0)(V(t)V(0)+V(ε)(0)V(ε)(t))Uh(t)ha2𝑑t\displaystyle\int_{0}^{\min(\varepsilon,T_{0})}\|(V(t)-V(0)+V^{(\varepsilon)}(0)-V^{(\varepsilon)}(t))U_{h}(t)\|_{h^{a}}^{2}dt
=0min(ε,T0)t21t0t(V(s)(V(ε))(s))𝑑sUh(t)ha2𝑑t\displaystyle=\int_{0}^{\min(\varepsilon,T_{0})}t^{2}\big{\|}\frac{1}{t}\int_{0}^{t}(V^{\prime}(s)-(V^{(\varepsilon)})^{\prime}(s))dsU_{h}(t)\big{\|}_{h^{a}}^{2}dt
0min(ε,T0)t0tV(s)(V(ε))(s)C|a|2𝑑sUh(t)ha2𝑑t\displaystyle\lesssim\int_{0}^{\min(\varepsilon,T_{0})}t\int_{0}^{t}\big{\|}V^{\prime}(s)-(V^{(\varepsilon)})^{\prime}(s)\|^{2}_{C^{|a|}}ds\|U_{h}(t)\big{\|}_{h^{a}}^{2}dt
ε2esssup0<s<ε(V(s)C|a|+(V(ε))(s)C|a|)0T0Uh(t)ha2𝑑t\displaystyle\lesssim\varepsilon^{2}\leavevmode\nobreak\ {\rm{ess}}\sup_{0<s<\varepsilon}(\|V^{\prime}(s)\|_{C^{|a|}}+\|(V^{(\varepsilon)})^{\prime}(s)\|_{C^{|a|}})\int_{0}^{T_{0}}\|U_{h}(t)\|^{2}_{h^{a}}dt
ε2T0sup0<t<T0Uh(t)ha2T0ε2hha2.\displaystyle\lesssim\varepsilon^{2}T_{0}\sup_{0<t<T_{0}}\|U_{h}(t)\|_{h^{a}}^{2}\lesssim T_{0}\varepsilon^{2}\|h\|_{h^{a}}^{2}.

This proves the first inequality of the proposition. For the second we follow the same arguments but in the last step bound

0T0Uh(t)Ha2𝑑thHa12\int_{0}^{T_{0}}\|U_{h}(t)\|^{2}_{H^{a}}dt\lesssim\|h\|_{H^{a-1}}^{2}

directly from (107) with a1a-1 in place of aa. ∎

3.3 Spectral results for Schrödinger operators

The solutions Uh(ε)U_{h}^{(\varepsilon)} of (3.2) with potential V(ε)V^{(\varepsilon)} constant in time on [0,ε/2][0,\varepsilon/2] from Proposition 7 can be studied via the theory of elliptic Schrödinger operators which we briefly review here for convenience: Define

𝒮W=ΔW,with bounded potentialW:Ω,WW¯.\mathscr{S}_{W}=\Delta-W,\leavevmode\nobreak\ \textit{with bounded potential}\leavevmode\nobreak\ \leavevmode\nobreak\ W:\Omega\to\mathbb{R},\leavevmode\nobreak\ \leavevmode\nobreak\ \|W\|_{\infty}\leq\bar{W}. (116)
Lemma 3.

The kernel of 𝒮W\mathscr{S}_{W} given by

𝒦=𝒦W={ϕH1(Ω):ΔϕWϕ=0}\mathcal{K}=\mathcal{K}_{W}=\{\phi\in H^{1}(\Omega):\Delta\phi-W\phi=0\} (117)

is finite-dimensional.

Proof.

Suppose ϕ\phi lies in the kernel 𝒦\mathcal{K} so that Δϕ=Wϕ\Delta\phi=W\phi and define A=(IdΔ)=j=0(1+λj)ejej,L2A=(Id-\Delta)=\sum_{j=0}^{\infty}(1+\lambda_{j})e_{j}\langle e_{j},\cdot\rangle_{L^{2}} with ej,λje_{j},\lambda_{j} as in (14), so that the inverse A1A^{-1} is a compact linear operator on H1(Ω)L2(Ω)H^{-1}(\Omega)\supset L^{2}(\Omega). Then Aϕ=(IdΔ)ϕ=ϕWϕA\phi=(Id-\Delta)\phi=\phi-W\phi or equivalently

[IdA1(IdWId)]ϕ=0.[Id-A^{-1}(Id-W\cdot Id)]\phi=0. (118)

But for WLW\in L^{\infty} the operator KA1(IdWId)K\equiv A^{-1}(Id-W\cdot Id) is a composition of a compact and a continuous linear operator on L2L^{2}, hence itself compact, and we deduce that IdKId-K is Fredholm (Proposition A.7.1 on p.593 in [42]) so that the kernel of ϕ\phi’s for which (118) holds is necessarily finite-dimensional, proving the lemma. ∎

Now consider a new Schrödinger operator 𝒮W+\mathscr{S}_{W_{+}} with shifted potential

W+(x):=W(x)+W¯+1,xΩ,W_{+}(x):=W(x)+\bar{W}+1,\leavevmode\nobreak\ x\in\Omega, (119)

so that infxΩW+(x)1>0\inf_{x\in\Omega}W_{+}(x)\geq 1>0. Then we have from the divergence theorem

\[\langle-\mathscr{S}_{W_{+}}u,u\rangle_{L^{2}}=\|\nabla u\|_{L^{2}}^{2}+\langle W_{+}u,u\rangle_{L^{2}}\simeq\|u\|_{H^{1}}^{2},\quad u\in H^{1}(\Omega), \tag{120}\]

hence by (16), 𝒮W+uH1=0\|\mathscr{S}_{W_{+}}u\|_{H^{-1}}=0 implies uH1=0\|u\|_{H^{1}}=0 and 𝒮W+\mathscr{S}_{W_{+}} is injective from H1H^{1} to H1H^{-1}. It is also surjective, for if it is not, then there exists a (nonzero) linear functional u0(H1)=H1u_{0}\in(H^{-1})^{*}=H^{1} that is orthogonal to the range of 𝒮W+\mathscr{S}_{W_{+}}, i.e., such that 𝒮W+u,u0L2=0\langle\mathscr{S}_{W_{+}}u,u_{0}\rangle_{L^{2}}=0 for all uH1u\in H^{1}. But testing u=u0u=u_{0} implies u0=0u_{0}=0, a contradiction to (120). Thus the operator 𝒮W+\mathscr{S}_{W_{+}} has an inverse 𝒮W+1\mathscr{S}_{W_{+}}^{-1} defining a compact linear operator on L2H1L^{2}\subset H^{-1} that is also self-adjoint since the divergence theorem implies that 𝒮W+u,vL2=u,𝒮W+vL2\langle\mathscr{S}_{W_{+}}u,v\rangle_{L^{2}}=\langle u,\mathscr{S}_{W_{+}}v\rangle_{L^{2}} and then

𝒮W+1ϕ,ψL2=𝒮W+1ϕ,𝒮W+𝒮W+1ψL2=ϕ,𝒮W+1ψL2.\langle\mathscr{S}^{-1}_{W_{+}}\phi,\psi\rangle_{L^{2}}=\langle\mathscr{S}^{-1}_{W_{+}}\phi,\mathscr{S}_{W_{+}}\mathscr{S}_{W_{+}}^{-1}\psi\rangle_{L^{2}}=\langle\phi,\mathscr{S}_{W_{+}}^{-1}\psi\rangle_{L^{2}}. (121)

The spectral theorem now implies the existence of eigen-pairs

\[(e_{j,W},\lambda_{j,W_{+}})\in H^{1}(\Omega)\times\mathbb{R},\quad j=0,1,2,\dots,\]

of 𝒮W+-\mathscr{S}_{W_{+}} that form an orthonormal basis of the Hilbert space L2(Ω)L^{2}(\Omega). Taking uL2=1\|u\|_{L^{2}}=1 in (120), the variational characterisation of eigenvalues (e.g., Sec. 4.5 in [12]) and (14) imply that

λj,W+[λj+1,λj+2W¯+1],j0, 0λjj2/d.\lambda_{j,W_{+}}\in[\lambda_{j}+1,\lambda_{j}+2\bar{W}+1],\leavevmode\nobreak\ \leavevmode\nobreak\ j\geq 0,\leavevmode\nobreak\ \leavevmode\nobreak\ 0\leq\lambda_{j}\simeq j^{2/d}. (122)

Thus we arrive at the spectral formulae, for ψH1,ϕL2\psi\in H^{1},\phi\in L^{2},

𝒮W+ψ=jλj,W+ej,Wψ,ej,WL2,𝒮W+1ϕ=jλj,W+1ej,Wϕ,ej,WL2.-\mathscr{S}_{W_{+}}\psi=\sum_{j}\lambda_{j,W_{+}}e_{j,W}\langle\psi,e_{j,W}\rangle_{L^{2}},\leavevmode\nobreak\ \leavevmode\nobreak\ -\mathscr{S}^{-1}_{W_{+}}\phi=\sum_{j}\lambda^{-1}_{j,W_{+}}e_{j,W}\langle\phi,e_{j,W}\rangle_{L^{2}}.

Now let us return to the original operator $\mathscr{S}_{W}$ without shift. Since $\mathscr{S}_{W_{+}}=\mathscr{S}_{W}-(\bar{W}+1)$, if $\phi$ lies in the kernel $\mathcal{K}_{W}$ from (117), then $\mathscr{S}_{W}\phi=0$ and so $-\mathscr{S}_{W_{+}}\phi=(\bar{W}+1)\phi$. Hence if $e_{0,k}$, $k=1,\dots,\dim(\mathcal{K}_{W})$, is any $L^{2}(\Omega)$-orthonormal basis of $\mathcal{K}_{W}$, then these are eigenfunctions of $-\mathscr{S}_{W_{+}}$ for the eigenvalue $\lambda_{W_{+}}=\bar{W}+1$. Generally, any eigenfunction $e_{j,W}$ satisfies

\[-(\mathscr{S}_{W}-\bar{W}-1)e_{j,W}=\lambda_{j,W_{+}}e_{j,W},\quad j\geq 0,\]

and hence the operator 𝒮W-\mathscr{S}_{W} has the same eigenfunctions ej,We_{j,W} but for eigenvalues

\[\lambda_{j,W}=\lambda_{j,W_{+}}-\bar{W}-1\in[\lambda_{j}-\bar{W},\lambda_{j}+\bar{W}]. \tag{123}\]

If ψH1,ϕL2\psi\in H^{1},\phi\in L^{2} also lie in the orthogonal complement L𝒦2L2(Ω)𝒦W,L^{2}_{\mathcal{K}^{\perp}}\equiv L^{2}(\Omega)\ominus\mathcal{K}_{W}, we obtain

𝒮Wψ=j1λj,Wej,Wψ,ej,WL2,𝒮W1ϕ=j1λj,W1ej,Wϕ,ej,WL2.\mathscr{S}_{W}\psi=-\sum_{j\geq 1}\lambda_{j,W}e_{j,W}\langle\psi,e_{j,W}\rangle_{L^{2}},\leavevmode\nobreak\ \leavevmode\nobreak\ \mathscr{S}^{-1}_{W}\phi=-\sum_{j\geq 1}\lambda^{-1}_{j,W}e_{j,W}\langle\phi,e_{j,W}\rangle_{L^{2}}.

In particular we can represent periodic weak solutions (as in (3.2)) w=wh=wh,Ww=w_{h}=w_{h,W} to the linear parabolic PDE with Schrödinger operator 𝒮W=ΔW\mathscr{S}_{W}=\Delta-W,

tw(t,)𝒮Ww(t,)\displaystyle\frac{\partial}{\partial t}w(t,\cdot)-\mathscr{S}_{W}w(t,\cdot) =0on (0,T0]×Ω,\displaystyle=0\leavevmode\nobreak\ \leavevmode\nobreak\ \text{on }(0,T_{0}]\times\Omega,
w(0,)\displaystyle w(0,\cdot) =hon Ω\displaystyle=h\leavevmode\nobreak\ \leavevmode\nobreak\ \text{on }\Omega (124)

for any 0<T0T0<T_{0}\leq T and hH1h\in H^{1} by the formula

wW,h(t)=jetλj,Wej,Wej,W,hL2=Ωpt,W(x,y)h(y)𝑑y, 0<tT0,w_{W,h}(t)=\sum_{j}e^{-t\lambda_{j,W}}e_{j,W}\langle e_{j,W},h\rangle_{L^{2}}=\int_{\Omega}p_{t,W}(x,y)h(y)dy,\leavevmode\nobreak\ 0<t\leq T_{0}, (125)

with symmetric Green kernel

pt,W(x,y)=pt,W(y,x)=jetλj,Wej,W(x)ej,W(y).p_{t,W}(x,y)=p_{t,W}(y,x)=\sum_{j}e^{-t\lambda_{j,W}}e_{j,W}(x)e_{j,W}(y). (126)

In the above, the sum extends also over the basis functions ej,We0,ke_{j,W}\equiv e_{0,k} of the finite-dimensional linear subspace 𝒦W\mathcal{K}_{W} of H1H^{1}, with eigenvalues λj,W=0\lambda_{j,W}=0, where the semigroup acts just as the identity operation.
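As a sanity check of these formulae, consider the special case of a constant potential; this worked example is an illustration under that simplifying assumption and is not used elsewhere. Here $e_j,\lambda_j$ denote the eigen-pairs of $-\Delta$ as in (14), and $w\in\mathbb{R}$ is a hypothetical constant.

```latex
% Worked example (assumption: W \equiv w is constant on \Omega, so \bar W = |w|).
\begin{align*}
\mathscr{S}_W e_j &= \Delta e_j - w e_j = -(\lambda_j + w)\, e_j,
  \qquad\text{so } e_{j,W} = e_j,\quad \lambda_{j,W} = \lambda_j + w \in [\lambda_j - \bar W,\ \lambda_j + \bar W], \\
\mathcal{K}_W &\neq \{0\} \iff w = -\lambda_j \text{ for some } j
  \qquad (\text{e.g. } w = 0 \text{ gives the constants, since } \lambda_0 = 0), \\
w_{W,h}(t) &= \sum_j e^{-t(\lambda_j + w)}\, e_j \langle e_j, h\rangle_{L^2}
  = e^{-tw} \sum_j e^{-t\lambda_j}\, e_j \langle e_j, h\rangle_{L^2}.
\end{align*}
```

In this case the solution is the usual heat flow multiplied by the scalar factor $e^{-tw}$, and the kernel is non-trivial exactly when $-w$ is an eigenvalue of $-\Delta$, in agreement with (123) and the remark on $\mathcal{K}_W$ above.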

3.3.1 An auxiliary Sobolev scale

In the proof of the key stability estimate Lemma 4 to follow, we exploit the structure of function spaces defined spectrally from $\mathscr{S}_{W_{+}}$ as follows:

H~Wa(Ω){h:j0λj,W+ah,ej,WL22hH~Wa2<},a.\tilde{H}^{a}_{W}(\Omega)\equiv\left\{h:\sum_{j\geq 0}\lambda^{a}_{j,W_{+}}\langle h,e_{j,W}\rangle_{L^{2}}^{2}\equiv\|h\|_{\tilde{H}^{a}_{W}}^{2}<\infty\right\},\leavevmode\nobreak\ a\in\mathbb{R}. (127)

By Parseval’s identity we have H~W0=L2\tilde{H}^{0}_{W}=L^{2} for all WW. Moreover for uH1u\in H^{1}, and with constants implicit in \simeq depending only on W¯\bar{W},

uH12\displaystyle\|u\|^{2}_{H^{1}} uL22+uL22(W+)1/2uL22Δu,uL2\displaystyle\simeq\|u\|^{2}_{L^{2}}+\|\nabla u\|_{L^{2}}^{2}\simeq\|(W_{+})^{1/2}u\|_{L^{2}}^{2}-\langle\Delta u,u\rangle_{L^{2}}
=ΔuW+u,uL2=jλj,W+u,ej,WL22uH~W12,\displaystyle=-\langle\Delta u-W_{+}u,u\rangle_{L^{2}}=\sum_{j}\lambda_{j,W_{+}}\langle u,e_{j,W}\rangle_{L^{2}}^{2}\simeq\|u\|^{2}_{\tilde{H}^{1}_{W}}, (128)

so that H1=H~W1H^{1}=\tilde{H}^{1}_{W} with equivalent norms. By duality one shows further that H~W1=H1\tilde{H}_{W}^{-1}=H^{-1} with equivalent norms – below we shall only need

uH1\displaystyle\|u\|_{H^{-1}} =supϕH11|Ωϕu|=supϕH11|jλj,W+1/2ϕ,ej,WL2ej,W,uL2λj,W+1/2|uH~W1\displaystyle=\sup_{\|\phi\|_{H^{1}}\leq 1}\Big{|}\int_{\Omega}\phi u\Big{|}=\sup_{\|\phi\|_{H^{1}}\leq 1}\Big{|}\sum_{j}\lambda^{1/2}_{j,W_{+}}\langle\phi,e_{j,W}\rangle_{L^{2}}\langle e_{j,W},u\rangle_{L^{2}}\lambda_{j,W_{+}}^{-1/2}\Big{|}\lesssim\|u\|_{\tilde{H}^{-1}_{W}} (129)

for all uH1u\in H^{1}, which follows from (16), Parseval’s identity, the Cauchy-Schwarz inequality and (3.3.1).

The final facts we need below are the following: We have

uH~W2ΔuW+uL2uH2u\in\tilde{H}^{2}_{W}\iff\Delta u-W_{+}u\in L^{2}\iff u\in H^{2}

and the norms are equivalent: On the one hand

uH~W2=(ΔW+)uL2ΔuL2+uL2uH2\|u\|_{\tilde{H}_{W}^{2}}=\|(\Delta-W_{+})u\|_{L^{2}}\lesssim\|\Delta u\|_{L^{2}}+\|u\|_{L^{2}}\simeq\|u\|_{H^{2}}

and conversely

uH2ΔuL2+uL2(ΔW+)uL2+uL2uH~W2.\|u\|_{H^{2}}\simeq\|\Delta u\|_{L^{2}}+\|u\|_{L^{2}}\lesssim\|(\Delta-W_{+})u\|_{L^{2}}+\|u\|_{L^{2}}\lesssim\|u\|_{\tilde{H}^{2}_{W}}.

In particular for hH2h\in H^{2} so that 𝒮W+hL2\mathscr{S}_{W_{+}}h\in L^{2}, the series jej,Wh,ej,WL2\sum_{j}e_{j,W}\langle h,e_{j,W}\rangle_{L^{2}} converges in H~W2\tilde{H}_{W}^{2} and for d3d\leq 3 also uniformly on Ω\Omega, since its partial sums hJh_{J} satisfy, by the Sobolev imbedding, as JJ\to\infty

\begin{align}
\|h-h_{J}\|^{2}_{\infty} &\lesssim\|h-h_{J}\|^{2}_{H^{2}}\simeq\|h-h_{J}\|^{2}_{\tilde{H}^{2}_{W}} \notag\\
&=\sum_{j>J}\lambda^{2}_{j,W_{+}}\langle h,e_{j,W}\rangle_{L^{2}}^{2}=\sum_{j>J}\langle\mathscr{S}_{W_{+}}h,e_{j,W}\rangle_{L^{2}}^{2}\to 0. \tag{130}
\end{align}

3.3.2 Lipschitz stability of the integrated parabolic flow

The following lemma exploits the preceding results and is at the heart of the key injectivity results for 𝒢,𝕀θ\mathscr{G},\mathbb{I}_{\theta} relevant in Condition 2.

Lemma 4.

Let $U_{h}(t)\in L^{2}([0,T],L^{2}(\Omega))$ be a solution of (3.2) for source $m=0$, initial condition $h\in H^{1}$, and potential $V$ such that $M\geq\|V\|_{C^{1}([0,T],C^{1})}$. For every $T>0$ there exists a constant $c=c(M,T,d)$ such that

0TUh(s)L2(Ω)2𝑑schH12,hH1.\int_{0}^{T}\|U_{h}(s)\|^{2}_{L^{2}(\Omega)}ds\geq c\|h\|_{H^{-1}}^{2},\leavevmode\nobreak\ \leavevmode\nobreak\ \forall h\in H^{1}. (131)
Proof.

The solution $U_{h}^{(\varepsilon)}$ for $h\in H^{1}$ and bounded potential $V^{(\varepsilon)}$ can be represented via (125) on $(0,\varepsilon/2]$ since $V^{(\varepsilon)}=V(0,\cdot)\equiv-W$ is time-independent there. From Proposition 7 with $a=-1$ and $T_{0}=\varepsilon/2$ we deduce, using also Parseval’s identity,

0TUh(s)L22𝑑s\displaystyle\int_{0}^{T}\|U_{h}(s)\|^{2}_{L^{2}}ds 0ε/2Uh(s)L22𝑑s\displaystyle\geq\int_{0}^{\varepsilon/2}\|U_{h}(s)\|^{2}_{L^{2}}ds
0ε/2Uh(ε)(s)L22𝑑s0ε/2Uh(ε)(s)Uh(s)L22𝑑s\displaystyle\geq\int_{0}^{\varepsilon/2}\|U^{(\varepsilon)}_{h}(s)\|^{2}_{L^{2}}ds-\int_{0}^{\varepsilon/2}\|U^{(\varepsilon)}_{h}(s)-U_{h}(s)\|^{2}_{L^{2}}ds
0ε/2je2sλj,Wej,W,hL22dscM4ε3hH12,\displaystyle\geq\int_{0}^{\varepsilon/2}\sum_{j}e^{-2s\lambda_{j,W}}\langle e_{j,W},h\rangle^{2}_{L^{2}}ds-\frac{c_{M}}{4}\varepsilon^{3}\|h\|_{H^{-1}}^{2},

For eigenvalues λj,W1\lambda_{j,W}\leq 1 and recalling W¯+1+λj,W>1\bar{W}+1+\lambda_{j,W}>1 in view of (122), (123) we see

0ε/2e2sλj,W𝑑s0ε/2e2s𝑑s=12(1eε)121eεW¯+λj,W+1.\int_{0}^{\varepsilon/2}e^{-2s\lambda_{j,W}}ds\geq\int_{0}^{\varepsilon/2}e^{-2s}ds=\frac{1}{2}(1-e^{-\varepsilon})\geq\frac{1}{2}\frac{1-e^{-\varepsilon}}{\bar{W}+\lambda_{j,W}+1}.

For large eigenvalues λj,W>1\lambda_{j,W}>1 we have the estimate

0ε/2e2sλj,W𝑑s=12λj,W(1eελj,W)121eεW¯+λj,W+1.\int_{0}^{\varepsilon/2}e^{-2s\lambda_{j,W}}ds=\frac{1}{2\lambda_{j,W}}(1-e^{-\varepsilon\lambda_{j,W}})\geq\frac{1}{2}\frac{1-e^{-\varepsilon}}{\bar{W}+\lambda_{j,W}+1}.

Combining these and integrating term-wise, we obtain the bound

j0ε/2e2sλj,Wej,W,hL22𝑑s12(1eε)hH~W12b¯2(1eε)hH12,\sum_{j}\int_{0}^{\varepsilon/2}e^{-2s\lambda_{j,W}}\langle e_{j,W},h\rangle^{2}_{L^{2}}ds\geq\frac{1}{2}(1-e^{-\varepsilon})\|h\|_{\tilde{H}_{W}^{-1}}^{2}\geq\frac{\bar{b}}{2}(1-e^{-\varepsilon})\|h\|_{H^{-1}}^{2}, (132)

for constant b¯\bar{b} from (129) depending only on MM. Then choosing ε\varepsilon small enough s.t.

b¯2(1eε)cM4ε3>c′′>0,\frac{\bar{b}}{2}(1-e^{-\varepsilon})-\frac{c_{M}}{4}\varepsilon^{3}>c^{\prime\prime}>0,

possible since (1eε)/ε1(1-e^{-\varepsilon})/\varepsilon\to 1 as ε0\varepsilon\to 0, we obtain

\[\int_{0}^{T}\|U_{h}(s)\|^{2}_{L^{2}}\,ds\geq c^{\prime\prime}\|h\|_{H^{-1}}^{2},\]

completing the proof. ∎
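The two elementary ingredients of this proof, the scalar lower bound for the time integral and the choice of a small $\varepsilon$, can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the values used for $\bar{W}$, $\bar{b}$ and $c_M$ are made-up placeholders, since the actual constants are not computed in the text.

```python
import math


def time_integral(lam: float, eps: float) -> float:
    """Closed form of the integral of exp(-2*s*lam) over s in [0, eps/2]."""
    if lam == 0.0:
        return eps / 2.0
    return -math.expm1(-eps * lam) / (2.0 * lam)   # (1 - e^{-eps*lam}) / (2*lam)


def spectral_lower_bound(lam: float, eps: float, w_bar: float) -> float:
    """Right-hand side (1/2) * (1 - e^{-eps}) / (w_bar + lam + 1) from the proof."""
    return 0.5 * (-math.expm1(-eps)) / (w_bar + lam + 1.0)


def margin(eps: float, b_bar: float, c_m: float) -> float:
    """The quantity (b_bar/2)*(1 - e^{-eps}) - (c_m/4)*eps**3 that must be positive."""
    return 0.5 * b_bar * (-math.expm1(-eps)) - 0.25 * c_m * eps ** 3


def pick_epsilon(b_bar: float, c_m: float, eps_start: float = 1.0) -> float:
    """Halve eps until the margin is positive; terminates because
    (1 - e^{-eps}) / eps -> 1 while eps**3 / eps -> 0 as eps -> 0."""
    eps = eps_start
    while margin(eps, b_bar, c_m) <= 0.0:
        eps /= 2.0
    return eps


if __name__ == "__main__":
    w_bar = 2.0                      # placeholder bound on the potential
    eps = pick_epsilon(b_bar=0.1, c_m=50.0)
    print("eps =", eps, "margin =", margin(eps, 0.1, 50.0))
    # Check the scalar inequality on a range of eigenvalues lam >= -w_bar.
    print(all(
        time_integral(-w_bar + 0.25 * k, eps) >= spectral_lower_bound(-w_bar + 0.25 * k, eps, w_bar)
        for k in range(200)
    ))
```

The halving search mirrors the argument in the proof: the linear term in $\varepsilon$ dominates the cubic one for small $\varepsilon$, whatever the sizes of the two constants.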

3.4 The information operator and its inverse

In this subsection, the parameter θ0C\theta_{0}\in C^{\infty} is a ground truth initial condition for the reaction diffusion equation considered in Theorem 1. As θ0\theta_{0} will be fixed throughout and no other values of θ\theta will be considered, we will write θ=θ0\theta=\theta_{0} to ease notation. By Theorem 7 and Proposition 5

𝕀θh=D𝒢θ[h]=Uh(t):L2(Ω)L2(𝒳),L2(𝒳)=L2([0,T],L2(Ω)),\mathbb{I}_{\theta}h=D\mathscr{G}_{\theta}[h]=U_{h}(t):L^{2}(\Omega)\to L^{2}(\mathcal{X}),\leavevmode\nobreak\ \leavevmode\nobreak\ L^{2}(\mathcal{X})=L^{2}([0,T],L^{2}(\Omega)),

is the continuous linear operator solving the PDE (3.2) with m=0m=0, initial condition hh, and potential V=f(uθ(t,))V=f^{\prime}(u_{\theta}(t,\cdot)) which, in view of Proposition 4, lies in C1,C^{1,\infty} from (12). In particular 𝕀θ\mathbb{I}_{\theta} has a continuous and linear adjoint operator 𝕀θ:L2(𝒳)L2(Ω)\mathbb{I}_{\theta}^{*}:L^{2}(\mathcal{X})\to L^{2}(\Omega) such that for all GL2(𝒳),hL2(Ω)G\in L^{2}(\mathcal{X}),h\in L^{2}(\Omega), we have 𝕀θh,GL2(𝒳)=h,𝕀θGL2(Ω).\langle\mathbb{I}_{\theta}h,G\rangle_{L^{2}(\mathcal{X})}=\langle h,\mathbb{I}_{\theta}^{*}G\rangle_{L^{2}(\Omega)}. The (Fisher-) information operator 𝕀θ𝕀θ\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta} is then a bounded linear operator acting on L2(Ω)L^{2}(\Omega). To study its mapping properties let us first prove the following result:

Lemma 5.

Let η0\eta\geq 0. The linear operator

=Δ𝕀θ𝕀θ\mathcal{I}=\Delta\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta} (133)

maps HηH^{\eta} continuously into H0ηH^{\eta}_{0}, and hence H0ηH^{\eta}_{0} into itself. Moreover \mathcal{I} is injective on H01H^{1}_{0} and hence also on its subspaces H0ηH^{\eta}_{0} for any η1\eta\geq 1.

Proof.

Let us first prove the continuity statement: We have from the definition of the hηh^{\eta} norms, (16), the Cauchy-Schwarz inequality and Proposition 5 with m=0m=0, a=ηa=\eta and a=η2a=-\eta-2, for any ϕHη\phi\in H^{\eta},

ϕhη2\displaystyle\|\mathcal{I}\phi\|^{2}_{h^{\eta}} supψC:ψhη1|Δ𝕀θ𝕀θϕ,ψL2|2=supψC:ψhη1|𝕀θ𝕀θϕ,ΔψL2|2\displaystyle\lesssim\sup_{\psi\in C^{\infty}:\|\psi\|_{h^{-\eta}}\leq 1}|\langle\Delta\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta}\phi,\psi\rangle_{L^{2}}|^{2}=\sup_{\psi\in C^{\infty}:\|\psi\|_{h^{-\eta}}\leq 1}|\langle\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta}\phi,\Delta\psi\rangle_{L^{2}}|^{2}
supψC:ψhη1|0T𝕀θϕ,𝕀θΔψL2(Ω)|2\displaystyle\lesssim\sup_{\psi\in C^{\infty}:\|\psi\|_{h^{-\eta}}\leq 1}\Big{|}\int_{0}^{T}\langle\mathbb{I}_{\theta}\phi,\mathbb{I}_{\theta}\Delta\psi\rangle_{L^{2}(\Omega)}\Big{|}^{2}
supψC:ψhη10T𝕀θϕHη+12𝑑t0T𝕀θΔψHη12𝑑t\displaystyle\lesssim\sup_{\psi\in C^{\infty}:\|\psi\|_{h^{-\eta}}\leq 1}\int_{0}^{T}\|\mathbb{I}_{\theta}\phi\|^{2}_{H^{\eta+1}}dt\int_{0}^{T}\|\mathbb{I}_{\theta}\Delta\psi\|_{H^{-\eta-1}}^{2}dt
supψC:ψhη1ϕHη2ΔψHη22ϕHη2.\displaystyle\lesssim\sup_{\psi\in C^{\infty}:\|\psi\|_{h^{-\eta}}\leq 1}\|\phi\|^{2}_{H^{\eta}}\|\Delta\psi\|_{H^{-\eta-2}}^{2}\lesssim\|\phi\|^{2}_{H^{\eta}}.

It is also clear from the divergence theorem (e.g., (3.1.2)) that \mathcal{I} maps into H0ηH^{\eta}_{0}.

Now to prove that the linear operator \mathcal{I} is injective, let ψH01\psi\in H^{1}_{0}. If ψ=Δ𝕀θ𝕀θψ=0\mathcal{I}\psi=\Delta\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta}\psi=0 then 𝕀θ𝕀θψ\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta}\psi is constant a.e. on Ω\Omega by the maximum principle for Δ\Delta (or by (13), (14)). But then

0TUψ(s)L2(Ω)2𝑑s=𝕀θψL2(𝒳)2=ψ,𝕀θ𝕀θψL2(Ω)=const×Ωψ=0\int_{0}^{T}\|U_{\psi}(s)\|^{2}_{L^{2}(\Omega)}ds=\|\mathbb{I}_{\theta}\psi\|^{2}_{L^{2}(\mathcal{X})}=\langle\psi,\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta}\psi\rangle_{L^{2}(\Omega)}=const\times\int_{\Omega}\psi=0

and this implies ψH1=0\|\psi\|_{H^{-1}}=0 in view of Lemma 4, and then also ψ=0\psi=0 almost everywhere and hence ψ=0\psi=0 in H1H^{1}. ∎

The following key result of this article verifies Condition 2F) for the reaction-diffusion system (6).

Theorem 8.

Let η>2+d/2\eta>2+d/2. Then the operator \mathcal{I} from (133) defines a continuous linear homeomorphism of H0η(Ω)H^{\eta}_{0}(\Omega) onto itself.

Proof.

We will prove that -\mathcal{I} equals the identity IdId plus a compact operator KK mapping H0ηH^{\eta}_{0} into H0η+1H^{\eta+1}_{0} for η>2+d/2\eta>2+d/2. Since \mathcal{I} is injective by Lemma 5, the Fredholm alternative, p.583 in [42], implies that \mathcal{I} is then also surjective onto H0ηH^{\eta}_{0} and hence has an inverse H0ηH0ηH^{\eta}_{0}\to H^{\eta}_{0} which is continuous by the open mapping theorem.

We start with a comparison to the new operator

ε=Δ𝕀θ,ε𝕀θ,ε\mathcal{I}_{\varepsilon}=\Delta\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon} (134)

where $\mathbb{I}_{\theta,\varepsilon}$ is obtained as above but replacing $V=f^{\prime}(u_{\theta})\in C^{1,\infty}$ by the corresponding locally constant potential $V^{(\varepsilon)}\in C^{1,\infty}$ from Proposition 7. Notice that $V^{(\varepsilon)}=f^{\prime}(\theta)\equiv-W$ on $(0,\varepsilon/2)$. The operator $\mathcal{I}_{\varepsilon}$ also maps $H^{\eta}$ into $H^{\eta}_{0}$ (this is proved just as in Lemma 5).

Lemma 6.

For the operator norm from HηHη+1H^{\eta}\to H^{\eta+1} and all ε>0\varepsilon>0 we have

εHηHη+1ε,\|\mathcal{I}_{\varepsilon}-\mathcal{I}\|_{H^{\eta}\to H^{\eta+1}}\lesssim\varepsilon,

in particular ε\mathcal{I}_{\varepsilon}-\mathcal{I} is a compact operator on HηH^{\eta}.

Proof.

By linearity and taking suprema over smooth test functions ψ\psi,

\begin{align*}
\|\Delta\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}-\Delta\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta}\|_{h^{\eta}\to h^{\eta+1}}
&=\sup_{\|\psi\|_{h^{-\eta-1}}\leq 1}\ \sup_{\|\phi\|_{h^{\eta}}\leq 1}|\langle(\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}-\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta})\phi,\Delta\psi\rangle_{L^{2}}| \\
&\leq\sup_{\|\psi\|_{h^{-\eta-1}}\leq 1}\ \sup_{\|\phi\|_{h^{\eta}}\leq 1}|\langle(\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}-\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta})\phi,\Delta\psi\rangle_{L^{2}}| \\
&\quad+\sup_{\|\psi\|_{H^{-\eta-1}}\leq 1}\ \sup_{\|\phi\|_{H^{\eta}}\leq 1}|\langle(\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta}-\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta})\phi,\Delta\psi\rangle_{L^{2}}| \\
&=\sup_{\|\psi\|_{h^{-\eta-1}}\leq 1}\ \sup_{\|\phi\|_{h^{\eta}}\leq 1}|\langle(\mathbb{I}_{\theta,\varepsilon}-\mathbb{I}_{\theta})\phi,\mathbb{I}_{\theta,\varepsilon}\Delta\psi\rangle_{L^{2}(\mathcal{X})}| \\
&\quad+\sup_{\|\psi\|_{H^{-\eta-1}}\leq 1}\ \sup_{\|\phi\|_{H^{\eta}}\leq 1}|\langle\mathbb{I}_{\theta}\phi,(\mathbb{I}_{\theta}-\mathbb{I}_{\theta,\varepsilon})\Delta\psi\rangle_{L^{2}(\mathcal{X})}|.
\end{align*}

The term inside the supremum of the first summand in the last equation is upper bounded, using the definition of the hηh^{\eta} norms, the Cauchy-Schwarz inequality, the second part of Proposition 7 as well as (107) with m=0m=0, by

(0TUϕ(t)Uϕ(ε)(t)hη+22𝑑t)1/2(0TUΔψ(ε)(t)hη22𝑑t)1/2εϕhηΔψHη3\displaystyle\Big{(}\int_{0}^{T}\|U_{\phi}(t)-U_{\phi}^{(\varepsilon)}(t)\|^{2}_{h^{\eta+2}}dt\Big{)}^{1/2}\Big{(}\int_{0}^{T}\|U^{(\varepsilon)}_{\Delta\psi}(t)\|^{2}_{h^{-\eta-2}}dt\Big{)}^{1/2}\lesssim\varepsilon\|\phi\|_{h^{\eta}}\|\Delta\psi\|_{H^{-\eta-3}}

By similar arguments the second term is bounded by

(0TUΔψ(t)UΔψ(ε)(t)Hη12𝑑t)1/2(0TUϕ(t)Hη+12𝑑t)1/2εϕhηΔψHη3,\Big{(}\int_{0}^{T}\|U_{\Delta\psi}(t)-U_{\Delta\psi}^{(\varepsilon)}(t)\|^{2}_{H^{-\eta-1}}dt\Big{)}^{1/2}\Big{(}\int_{0}^{T}\|U_{\phi}(t)\|^{2}_{H^{\eta+1}}dt\Big{)}^{1/2}\lesssim\varepsilon\|\phi\|_{h^{\eta}}\|\Delta\psi\|_{H^{-\eta-3}},

so the result follows. ∎

Writing =ε+ε\mathcal{I}=\mathcal{I}_{\varepsilon}+\mathcal{I}-\mathcal{I}_{\varepsilon} we conclude that it suffices to prove that ε\mathcal{I}_{\varepsilon} equals the identity operator plus a compact perturbation. To achieve this we first derive a more explicit representation of the information operator 𝕀θ,ε𝕀θ,ε\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}. For hHηh\in H^{\eta} we can use (125) and write the solutions of (3.2) for m=0m=0 and potential W=f(θ)W=-f^{\prime}(\theta) as

𝕀θ,ε(h)(t,)=Uh(ε)(t)=Ωpt,W(,y)h(y)𝑑y, 0<t<ε/2.\mathbb{I}_{\theta,\varepsilon}(h)(t,\cdot)=U^{(\varepsilon)}_{h}(t)=\int_{\Omega}p_{t,W}(\cdot,y)h(y)dy,\leavevmode\nobreak\ \leavevmode\nobreak\ 0<t<\varepsilon/2. (135)

As the Fredholm alternative only requires the perturbation to be compact and not small, we could in principle choose ε=4T\varepsilon=4T in which case the preceding representation holds on the time horizon [0,2T][0,2T], and the proof that follows can be simplified by discarding the term involving integrals over [ε/4,T][\varepsilon/4,T]. But anticipating Remark 3, we proceed with arbitrary but fixed ε>0\varepsilon>0.

For any t>0t>0, Proposition 5 implies that the linear operators hUh(ε)(t,)h\mapsto U_{h}^{(\varepsilon)}(t,\cdot) for tt fixed are continuous from L2(Ω)L2(Ω)L^{2}(\Omega)\to L^{2}(\Omega) with operator norms uniformly bounded in tt. Therefore they have continuous linear adjoint operators

Ut,ε:L2(Ω)L2(Ω)U_{t,\varepsilon}^{*}:L^{2}(\Omega)\to L^{2}(\Omega)

with operator norms bounded also uniformly in tt. Thus for any h1L2,h2Hηh_{1}\in L^{2},h_{2}\in H^{\eta} we have, using also Parseval’s identity for the basis {ej,W}\{e_{j,W}\}, and Fubini’s theorem,

h1,𝕀θ,ε𝕀θ,εh2L2(Ω)\displaystyle\langle h_{1},\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}h_{2}\rangle_{L^{2}(\Omega)} (136)
=𝕀θ,εh1,𝕀θ,εh2L2([0,T],L2(Ω))=0TUh1(ε)(t),Uh2(ε)(t)L2(Ω)𝑑t\displaystyle=\langle\mathbb{I}_{\theta,\varepsilon}h_{1},\mathbb{I}_{\theta,\varepsilon}h_{2}\rangle_{L^{2}([0,T],L^{2}(\Omega))}=\int_{0}^{T}\langle U^{(\varepsilon)}_{h_{1}}(t),U^{(\varepsilon)}_{h_{2}}(t)\rangle_{L^{2}(\Omega)}dt
=0ε/4je2tλj,Wej,W,h1L2ej,W,h2L2dt+h1,ε/4TUt,ε[Uh2(ε)(t)]𝑑tL2.\displaystyle=\int_{0}^{\varepsilon/4}\sum_{j}e^{-2t\lambda_{j,W}}\langle e_{j,W},h_{1}\rangle_{L^{2}}\langle e_{j,W},h_{2}\rangle_{L^{2}}dt+\Big{\langle}h_{1},\int_{\varepsilon/4}^{T}U_{t,\varepsilon}^{*}[U_{h_{2}}^{(\varepsilon)}(t)]dt\Big{\rangle}_{L^{2}}.

For the first term we use again (125) and 2tε/22t\leq\varepsilon/2 to recognise

Uh2(ε)(2t,)=je2tλj,Wej,W,h2L2ej,WU^{(\varepsilon)}_{h_{2}}(2t,\cdot)=\sum_{j}e^{-2t\lambda_{j,W}}\langle e_{j,W},h_{2}\rangle_{L^{2}}e_{j,W}

and this series converges uniformly on [0,ε/4]×Ω[0,\varepsilon/4]\times\Omega in view of (3.3.1), h2HηH2h_{2}\in H^{\eta}\subset H^{2} and (123). Therefore by the dominated convergence and Fubini’s theorem

0ε/4je2tλj,Wej,W,h2L2Ωej,W(y)h1(y)𝑑y=Ωh1(y)0ε/4Uh2(ε)(2t,y)𝑑t𝑑y.\displaystyle\int_{0}^{\varepsilon/4}\sum_{j}e^{-2t\lambda_{j,W}}\langle e_{j,W},h_{2}\rangle_{L^{2}}\int_{\Omega}e_{j,W}(y)h_{1}(y)dy=\int_{\Omega}h_{1}(y)\int_{0}^{\varepsilon/4}U^{(\varepsilon)}_{h_{2}}(2t,y)dtdy.

and hence we can write (136), for all h1L2,h2Hηh_{1}\in L^{2},h_{2}\in H^{\eta}, as

\[\langle h_{1},\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}h_{2}\rangle_{L^{2}(\Omega)}=\Big\langle h_{1},\int_{0}^{\varepsilon/4}U^{(\varepsilon)}_{h_{2}}(2t,\cdot)\,dt+\int_{\varepsilon/4}^{T}U_{t,\varepsilon}^{*}[U_{h_{2}}^{(\varepsilon)}(t)]\,dt\Big\rangle_{L^{2}}.\]

In summary, the action of the operator $\mathcal{I}_{\varepsilon}=\Delta\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}$ on $H^{\eta}$ can be represented as

\begin{equation}
\mathcal{I}_{\varepsilon}h=\Delta\int_{0}^{\varepsilon/4}U^{(\varepsilon)}_{h}(2t,\cdot)\,dt+\Delta\int_{\varepsilon/4}^{T}U_{t,\varepsilon}^{*}[U_{h}^{(\varepsilon)}(t)]\,dt=A(h)+B(h). \tag{137}
\end{equation}

For the second term $B$ we use (16), Fubini’s theorem, Lemma 2 and Proposition 5 to obtain

\begin{align*}
\|B(h)\|_{H^{\eta+1}}&=\sup_{\psi\in C^{\infty}:\|\psi\|_{H^{-\eta-1}}\leq 1}\Big|\Big\langle\psi,\Delta\int_{\varepsilon/4}^{T}U_{t,\varepsilon}^{*}[U_{h}^{(\varepsilon)}(t)]\,dt\Big\rangle_{L^{2}}\Big|\\
&=\sup_{\psi\in C^{\infty}:\|\psi\|_{H^{-\eta-1}}\leq 1}\Big|\int_{\varepsilon/4}^{T}\langle\Delta\psi,U_{t,\varepsilon}^{*}[U_{h}^{(\varepsilon)}(t)]\rangle_{L^{2}}\,dt\Big|\\
&=\sup_{\psi\in C^{\infty}:\|\psi\|_{H^{-\eta-1}}\leq 1}\Big|\int_{\varepsilon/4}^{T}\langle U_{\Delta\psi}^{(\varepsilon)}(t),U_{h}^{(\varepsilon)}(t)\rangle_{L^{2}}\,dt\Big|\\
&\lesssim\sup_{t\in[\varepsilon/4,T],\,\psi\in C^{\infty}:\|\psi\|_{H^{-\eta-3}}\leq c}\|U_{\psi}(t)\|_{H^{-\eta}}\|U_{h}(t)\|_{H^{\eta}}\lesssim\|h\|_{H^{\eta}},
\end{align*}

for some $c>0$, so that $B$ maps $H^{\eta}$ continuously into $H^{\eta+1}$. Since the imbedding $H^{\eta+1}\subset H^{\eta}$ is compact, we conclude that $B$ is a compact operator on $H^{\eta}$.

Regarding the term $A$: for $h\in H^{\eta}$, $\eta>2+d/2$, the functions $\Delta U^{(\varepsilon)}_{h}$ are uniformly bounded on $[0,\varepsilon/4]\times\Omega$ by Proposition 5 and the Sobolev imbedding, and so we can take $\Delta$ inside the integral. Likewise, $U^{(\varepsilon)}_{h}(\cdot,x)$ is absolutely continuous on $[0,\varepsilon/4]$ for every $x\in\Omega$. Using (3.3) with our $W=f^{\prime}(\theta)\in C^{\infty}$ and the fundamental theorem of calculus for Lebesgue integrals (p.106 in [17]), we thereupon obtain

\begin{align*}
A(h)&=\int_{0}^{\varepsilon/4}\Delta U^{(\varepsilon)}_{h}(2t,\cdot)\,dt=\Big[\int_{0}^{\varepsilon/4}\frac{\partial}{\partial t}U^{(\varepsilon)}_{h}(2t,\cdot)\,dt+\int_{0}^{\varepsilon/4}WU^{(\varepsilon)}_{h}(2t,\cdot)\,dt\Big]\\
&=-Id(h)+U^{(\varepsilon)}_{h}(\varepsilon/2)+\int_{0}^{\varepsilon/4}WU_{h}(2t,\cdot)\,dt
\end{align*}

where $Id$ is the identity operator. By Proposition 6 the second term maps $h\in H^{\eta}$ linearly and continuously into $H^{\eta+1}$ for every fixed $\varepsilon>0$. The same is true for the third term, since its $H^{\eta+1}$-norm can be bounded, using the Cauchy-Schwarz and Minkowski integral inequalities, (17) as well as Proposition 5, by

\[
c\|W\|_{C^{\eta+1}}\Big(\int_{0}^{T}\|U_{h}(t)\|_{H^{\eta+1}}^{2}\,dt\Big)^{1/2}\lesssim\|h\|_{H^{\eta}}
\]

for some $c=c(\eta,T)<\infty$. Hence these terms are also compact linear operators on $H^{\eta}$.

In summary, we have shown that $-\mathcal{I}=Id+K$ on $H^{\eta}$ for a compact operator $K:H^{\eta}\to H^{\eta}$. To show that $K$ in fact maps $H^{\eta}_{0}$ compactly into $H^{\eta}_{0}$, let $h_{n}$ be a bounded sequence in $H^{\eta}_{0}$. Then $\int_{\Omega}\mathcal{I}h_{n}=0$ for all $n$ by Lemma 5, hence

\[
\int_{\Omega}K(h_{n})=-\int_{\Omega}\mathcal{I}h_{n}-\int_{\Omega}Id(h_{n})=0-0=0,
\]

so $K(h_{n})\in H^{\eta}_{0}$, and by compactness $K(h_{n})$ has a $\|\cdot\|_{H^{\eta}}$-convergent subsequence. As $H^{\eta}_{0}$ is a closed subspace of $H^{\eta}$ we see that $K$ maps compactly into $H^{\eta}_{0}$. ∎

Remark 3.

Unlike the proof of the injectivity result Lemma 5 (via Lemma 4), the proof of surjectivity of the operator $\mathcal{I}$ does not require Proposition 7 for $\varepsilon$ sufficiently small, since any bounded perturbation of the potential $V$ results in a compact perturbation of $\mathcal{I}$ via Lemma 6. But for $\varepsilon\to 0$ our proof provides some further qualitative understanding of the structure of the information operator, which decomposes as

\begin{equation}
\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta}=\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}+\varepsilon K_{2}=-\Delta^{-1}+K_{1,\varepsilon}+\varepsilon K_{2}\quad\text{for any }\varepsilon>0, \tag{138}
\end{equation}

where $K_{2}$ is smoothing of order $3$ and $K_{1,\varepsilon}$ is infinitely smoothing for any fixed $\varepsilon$. The operator $\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}$ corresponds to the case where the infinitesimal behaviour of the reaction-diffusion system at the beginning of time $t\in(0,\varepsilon/2)$ is described by equation (125) with Schrödinger operator $\mathscr{S}=\Delta-f^{\prime}(\theta)$. On the eigenspaces of $\mathscr{S}$ with negative eigenvalues $-\lambda_{j,W}<0$, the dynamics instantly smooth the initial condition, but on the null space of $\mathscr{S}$ the system ‘stalls’ and $\mathbb{I}_{\theta,\varepsilon}^{*}\mathbb{I}_{\theta,\varepsilon}$ equals the identity operator, reproducing a ‘direct’ regression model where non-existence results [18] for Bernstein-von Mises theorems in infinite dimensions would apply. A deeper reason for the possibility of Theorem 1 holding in the strong topology of $\mathscr{C}$ is the fact that the kernel of $\mathscr{S}$ is at most finite-dimensional (Lemma 3), and that $\mathbb{I}_{\theta}^{*}\mathbb{I}_{\theta}$ is, in view of (138), approximated with arbitrary precision as $\varepsilon\to 0$ by the information operator of a system with this property. This remark applies equally to the finite-dimensional (in view of (123) and (14)) eigenspaces corresponding to positive eigenvalues $-\lambda_{j,W}>0$.
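As a hedged illustration of the contrast drawn in this remark, one can look only at the short-time part of (136), which is diagonal in the eigenbasis $e_{j,W}$ with multiplier $\int_{0}^{\tau}e^{-2t\lambda}\,dt$ at $\lambda=\lambda_{j,W}$, where $\tau=\varepsilon/4$. The sketch below evaluates this scalar integral in closed form. The function name `short_time_multiplier` and the sample values of `tau` and `lams` are assumptions chosen for illustration, and the code says nothing about the remaining terms of the information operator.

```python
import numpy as np

def short_time_multiplier(lam, tau):
    """Closed form of the integral of exp(-2 t lam) over t in [0, tau]."""
    lam = np.asarray(lam, dtype=float)
    # Avoid dividing by zero; the lam == 0 entries are overwritten below.
    safe = np.where(lam == 0.0, 1.0, lam)
    value = -np.expm1(-2.0 * tau * safe) / (2.0 * safe)
    # At lam == 0 the integrand is identically 1, so the integral equals tau.
    return np.where(lam == 0.0, tau, value)

tau = 0.05                                   # plays the role of epsilon / 4 (assumed value)
lams = np.array([0.0, 1.0, 1e2, 1e4, 1e6])   # sample values of lambda_{j,W}
mult = short_time_multiplier(lams, tau)
for lam, m in zip(lams, mult):
    print(f"lambda = {lam:9.1e}   multiplier = {m:.3e}")
```

For large $\lambda$ the multiplier decays proportionally to $1/\lambda$, which is the scaling of an inverse-Laplacian-type term, whereas at $\lambda=0$ there is no decay at all: the mode is carried through the time window undamped.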

Acknowledgement. The author gratefully acknowledges support through an ERC Advanced Grant (UKRI G116786) as well as by EPSRC programme grant EP/V026259.

References

  • [1] Afonso S. Bandeira, Antoine Maillard, Richard Nickl, and Sven Wang. On free energy barriers in Gaussian priors and failure of cold start MCMC for high-dimensional unimodal distributions. Philos. Trans. Roy. Soc. A, 381(2247), 2023.
  • [2] Alexandre Belloni and Victor Chernozhukov. On the computational complexity of MCMC-based estimators in large samples. Ann. Statist., 37(4):2011–2055, 2009.
  • [3] François-Xavier Briol, Chris J. Oates, Mark Girolami, Michael A. Osborne, and Dino Sejdinovic. Probabilistic integration: a role in statistical computation? Statist. Sci., 34(1):1–22, 2019.
  • [4] J.A. Carrillo, F. Hoffmann, A.M. Stuart, and U. Vaes. Statistical accuracy of approximate filtering methods. arXiv, 2024.
  • [5] Ismaël Castillo and Richard Nickl. Nonparametric Bernstein–von Mises Theorems in Gaussian white noise. Ann. Statist., 41(4):1999–2028, 2013.
  • [6] Ismaël Castillo and Richard Nickl. On the Bernstein–von Mises phenomenon for nonparametric Bayes procedures. Ann. Statist., 42(5):1941–1969, 2014.
  • [7] Ismaël Castillo and Judith Rousseau. A Bernstein–von Mises theorem for smooth functionals in semiparametric models. Ann. Statist., 43(6):2353–2383, 2015.
  • [8] Ismaël Castillo and Stéphanie van der Pas. Multiscale Bayesian survival analysis. Ann. Statist., 49(6):3559–3582, 2021.
  • [9] Peter Constantin and Ciprian Foias. Navier-Stokes equations. Chicago Lectures in Mathematics. University of Chicago Press, Chicago, IL, 1988.
  • [10] S. L. Cotter, M. Dashti, J. C. Robinson, and A. M. Stuart. Bayesian inverse problems for functions and applications to fluid mechanics. Inverse Problems, 25(11):115008, 43, 2009.
  • [11] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White. MCMC methods for functions: modifying old algorithms to make them faster. Statist. Sci., 28(3):424–446, 2013.
  • [12] E. B. Davies. Spectral theory and differential operators, volume 42 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1995.
  • [13] Persi Diaconis. Bayesian numerical analysis. In Statistical decision theory and related topics, IV, Vol. 1, pages 163–175. Springer, New York, 1988.
  • [14] Richard M. Dudley. Real analysis and probability. Cambridge University Press, Cambridge, 2002.
  • [15] Lawrence C. Evans. Partial differential equations, volume 19. American Mathematical Society, Providence, RI, second edition, 2010.
  • [16] G. Evensen, F.C. Vossepoel, and J. van Leeuwen. Data assimilation fundamentals. Springer textbooks in Earth sciences, Geography and Environment. Springer, Cham, 2022.
  • [17] Gerald B. Folland. Real analysis. John Wiley & Sons, Inc., New York, 1999.
  • [18] David Freedman. On the Bernstein-von Mises theorem with infinite-dimensional parameters. Ann. Statist., 27(4):1119–1140, 1999.
  • [19] Subhashis Ghosal and Aad W. van der Vaart. Fundamentals of Nonparametric Bayesian Inference. Cambridge University Press, New York, 2017.
  • [20] Evarist Giné and Richard Nickl. Mathematical foundations of infinite-dimensional statistical models. Cambridge University Press, New York, 2016.
  • [21] Matteo Giordano and Sven Wang. Statistical algorithms for low-frequency diffusion data: A PDE approach. arXiv, 2024.
  • [22] Martin Hairer, Andrew M. Stuart, and Sebastian J. Vollmer. Spectral gaps for a Metropolis-Hastings algorithm in infinite dimensions. Ann. Appl. Probab., 24(6):2455–2490, 2014.
  • [23] Eugenia Kalnay. Atmospheric modelling, data assimilation, and predictability. Cambridge University Press, Cambridge, UK, 2003.
  • [24] Pierre-Simon M. de Laplace. Théorie analytique des probabilités. Courcier, Paris, 1812.
  • [25] Kody Law, Andrew Stuart, and Konstantinos Zygalakis. Data assimilation, volume 62 of Texts in Applied Mathematics. Springer, Cham, 2015. A mathematical introduction.
  • [26] Andrew J. Majda and John Harlim. Filtering complex turbulent systems. Cambridge University Press, Cambridge, 2012.
  • [27] Martine Marion. Approximate inertial manifolds for reaction-diffusion equations in high space dimension. J. Dynam. Differential Equations, 1(3):245–267, 1989.
  • [28] François Monard, Richard Nickl, and Gabriel P. Paternain. Consistent inversion of noisy non-Abelian X-ray transforms. Comm. Pure Appl. Math., 74(5):1045–1099, 2021.
  • [29] François Monard, Richard Nickl, and Gabriel P. Paternain. Statistical guarantees for Bayesian uncertainty quantification in nonlinear inverse problems with Gaussian process priors. Ann. Statist., 49(6):3255–3298, 2021.
  • [30] Richard Nickl. Bernstein–von Mises theorems for statistical inverse problems I: Schrödinger equation. J. Eur. Math. Soc. (JEMS), 22(8):2697–2750, 2020.
  • [31] Richard Nickl. Bayesian non-linear statistical inverse problems. Zurich Lectures in Advanced Mathematics. European Mathematical Society (EMS) press, Berlin, 2023.
  • [32] Richard Nickl. Consistent inference for diffusions from low frequency measurements. Ann. Statist., 52:519–549, 2024.
  • [33] Richard Nickl and Gabriel P. Paternain. On some information-theoretic aspects of non-linear statistical inverse problems. In ICM—International Congress of Mathematicians. Vol. 7. Sections 15–20, pages 5516–5538. EMS Press, Berlin, [2023].
  • [34] Richard Nickl, Grigorios A. Pavliotis, and Kolyan Ray. Bayesian nonparametric inference in McKean-Vlasov models. arXiv, 2024.
  • [35] Richard Nickl and Jakob Söhl. Bernstein-von Mises theorems for statistical inverse problems II: compound Poisson processes. Electron. J. Stat., 13(2):3513–3571, 2019.
  • [36] Richard Nickl and Edriss S. Titi. On posterior consistency of data assimilation with Gaussian process priors: the 2D Navier-Stokes equations. Ann. Statist., to appear, 2023.
  • [37] Richard Nickl and Sven Wang. On polynomial-time computation of high-dimensional posterior measures by Langevin-type algorithms. J. Eur. Math. Soc. (JEMS), 26:1031–1112, 2024.
  • [38] Sebastian Reich and Colin Cotter. Probabilistic forecasting and Bayesian data assimilation. Cambridge University Press, New York, 2015.
  • [39] James C. Robinson. Infinite-dimensional dynamical systems. Cambridge Texts in Applied Mathematics. Cambridge University Press, Cambridge, 2001.
  • [40] Andrew M. Stuart. Inverse problems: a Bayesian perspective. Acta Numer., 19:451–559, 2010.
  • [41] A.-S. Sznitman. Topics in propagation of chaos. In École d’Été de Probabilités de Saint-Flour XIX—1989, volume 1464 of Lecture Notes in Math., pages 165–251. Springer, Berlin, 1991.
  • [42] Michael E. Taylor. Partial differential equations I. Basic theory. Springer, New York, 2011.
  • [43] Roger Temam. Infinite-dimensional dynamical systems in mechanics and physics, volume 68 of Applied Mathematical Sciences. Springer-Verlag, New York, second edition, 1997.
  • [44] A. W. van der Vaart. Asymptotic statistics. Cambridge Univ. Press, Cambridge, 1998.
  • [45] A. W. van der Vaart and J. H. van Zanten. Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist., 36(3):1435–1463, 2008.
  • [46] Cédric Villani. Optimal transport, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin, 2009.

Department of Pure Mathematics & Mathematical Statistics

University of Cambridge, Cambridge, UK

Email: nickl@maths.cam.ac.uk