

Emmanuel Vazquez · Julien Bect
SUPELEC, 3 rue Joliot-Curie, 91192 Gif-sur-Yvette, France
e-mail: emmanuel.vazquez@supelec.fr and julien.bect@supelec.fr

Pointwise consistency of the kriging predictor with known mean and covariance functions

Emmanuel Vazquez    Julien Bect
Abstract. 

This paper deals with several issues related to the pointwise consistency of the kriging predictor when the mean and the covariance functions are known. These questions are of general importance in the context of computer experiments. The analysis is based on the properties of approximations in reproducing kernel Hilbert spaces. In particular, we fix an erroneous claim of Yakowitz and Szidarovszky (J. Multivariate Analysis, 1985), namely that, under some assumptions, the kriging predictor is pointwise consistent for all continuous sample paths.

Keywords:
kriging; reproducing kernel Hilbert space; asymptotics; consistency

1 Introduction

The domain of computer experiments is concerned with making inferences about the output of an expensive-to-run numerical simulation of some physical system, which depends on a vector of factors with values in $\mathbb{X} \subseteq \mathbb{R}^{d}$. The output of the simulator is formally an unknown function $f: \mathbb{X} \to \mathbb{R}$. For example, to comply with ever-increasing standards regarding pollutant emissions, numerical simulations are used to determine the level of emissions of a combustion engine as a function of its design parameters (Villemonteix, 2008). The emission of pollutants by an engine involves coupled physical phenomena whose numerical simulation by a finite-element method, for a fixed set of design parameters of the engine, can take several hours on high-end servers. It then becomes very helpful to collect the answers already provided by the expensive simulator, and to construct from them a simpler computer model that provides approximate but cheaper answers about a quantity of interest. This approximate model is often called a surrogate, a metamodel, or an emulator of the actual simulator $f$. The quality of the answers given by the approximate model depends on the quality of the approximation, which depends, in turn and in part, on the choice of the evaluation points of $f$, also called experiments. The choice of the evaluation points is usually called the design of experiments. Assuming that $f$ is continuous, an important question is whether the approximate model behaves consistently, in the sense that if the evaluation points $x_{n}$ are chosen sequentially in such a way that a given point $x \in \mathbb{X}$ is an accumulation point of $\{x_{n},\, n\geq 1\}$, then the approximation at $x$ converges to $f(x)$.

Since the seminal paper of Sacks et al. (1989), kriging has been one of the most popular methods for building approximations in the context of computer experiments (see, e.g., Santner et al., 2003). In the framework of kriging, the unknown function $f$ is seen as a sample path of a stochastic process $\xi$, which turns the problem of approximation of $f$ into a prediction problem for the process $\xi$. In this paper, we shall assume that the mean and the covariance functions are known. Motivated by the analysis of the expected improvement algorithm (Vazquez and Bect, 2009), a popular kriging-based optimization algorithm, we discuss several issues related to the pointwise consistency of the kriging predictor, that is, the convergence of the kriging predictor to the true value of $\xi$ at a fixed point $x \in \mathbb{X}$. These issues are barely documented in the literature, and we believe them to be of general importance for the asymptotic analysis of sequential design procedures based on kriging.

The paper is organized as follows. Section 2 introduces notation and various formulations of pointwise consistency, using the reproducing kernel Hilbert space (RKHS) attached to $\xi$. Section 3 investigates whether $L^{2}$-pointwise consistency at $x$ can hold when $x$ is not in the adherence of the set $\{x_{n},\, n\geq 1\}$. Conversely, assuming that $x$ is in the adherence, Section 4 studies the set of sample paths $f = \xi(\omega,\cdot)$ for which pointwise consistency holds. In particular, we fix an erroneous claim of Yakowitz and Szidarovszky (1985)—namely, that the kriging predictor is pointwise consistent for all continuous sample paths under some assumptions.

2 Several formulations of pointwise consistency

Let $\xi$ be a second-order process defined on a probability space $(\Omega, \mathcal{A}, \mathsf{P})$, with parameter $x \in \mathbb{X} \subseteq \mathbb{R}^{d}$. Without loss of generality, it will be assumed that the mean of $\xi$ is zero and that $\mathbb{X} = \mathbb{R}^{d}$. The covariance function of $\xi$ will be denoted by $k(x,y) := \mathsf{E}[\xi(x)\,\xi(y)]$, and the following assumption will be used throughout the paper:

Assumption

The covariance function $k$ is continuous.

The kriging predictor of $\xi(x)$, based on the observations $\xi(x_{i})$, $i = 1, \ldots, n$, is the orthogonal projection

\[
\widehat{\xi}(x;\underline{x}_{n}) \;:=\; \sum_{i=1}^{n} \lambda^{i}(x;\underline{x}_{n})\,\xi(x_{i})
\tag{1}
\]

of $\xi(x)$ onto $\operatorname{span}\{\xi(x_{i}),\, i=1,\ldots,n\}$. The variance of the prediction error, also called the kriging variance in the literature of geostatistics (see, e.g., Chilès and Delfiner, 1999), or the power function in the literature of radial basis functions (see, e.g., Wu and Schaback, 1993), is

\[
\sigma^{2}(x;\underline{x}_{n}) \;:=\; \operatorname{var}\bigl[\xi(x) - \widehat{\xi}(x;\underline{x}_{n})\bigr]
\;=\; k(x,x) - \sum_{i} \lambda^{i}(x;\underline{x}_{n})\, k(x,x_{i})\,.
\]
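As a concrete numerical illustration of (1) and of the formula above (our addition, not part of the original text), the following sketch computes the simple-kriging weights and the kriging variance for a known zero mean; the covariance and the helper names (k_gauss, kriging_weights, kriging_variance) are chosen for illustration only.

\begin{verbatim}
import numpy as np

def k_gauss(x, y, s=1.0, alpha=1.0):
    # "Gaussian" covariance k(x, y) = s^2 exp(-alpha |x - y|^2), cf. (4), on R
    return s**2 * np.exp(-alpha * (x - y) ** 2)

def kriging_weights(x, xn, cov=k_gauss):
    # Simple-kriging weights lambda^i(x; x_n): solve K lambda = k_n(x),
    # where K = (k(x_i, x_j))_{i,j} and k_n(x) = (k(x_i, x))_i.
    K = cov(xn[:, None], xn[None, :])
    return np.linalg.solve(K, cov(xn, x))

def kriging_variance(x, xn, cov=k_gauss):
    # sigma^2(x; x_n) = k(x, x) - sum_i lambda^i(x; x_n) k(x, x_i)
    lam = kriging_weights(x, xn, cov)
    return cov(x, x) - lam @ cov(xn, x)

xn = np.linspace(0.0, 1.0, 6)    # design points x_1, ..., x_6
x = 0.37                         # prediction point
print(kriging_weights(x, xn))
print(kriging_variance(x, xn))   # small, since x lies well inside the design
\end{verbatim}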

For any $x \in \mathbb{R}^{d}$, and any sample path $f = \xi(\omega,\cdot)$, $\omega \in \Omega$, the values $\xi(\omega,x) = f(x)$ and $\widehat{\xi}(\omega,x;\underline{x}_{n})$ can be seen as the result of the application of an evaluation functional to $f$. More precisely, let $\delta_{x}$ be the Dirac measure at $x \in \mathbb{R}^{d}$, and let $\lambda_{n,x}$ denote the measure with finite support defined by $\lambda_{n,x} := \sum_{i=1}^{n}\lambda^{i}(x;\underline{x}_{n})\,\delta_{x_{i}}$. Then, for all $\omega \in \Omega$, $\xi(\omega,x) = \langle\delta_{x},\,f\rangle$ and $\widehat{\xi}(\omega,x;\underline{x}_{n}) = \langle\lambda_{n,x},\,f\rangle$. Pointwise consistency at $x \in \mathbb{R}^{d}$, defined in Section 1 as the convergence of $\widehat{\xi}(\omega,x;\underline{x}_{n})$ to $\xi(x)$, can thus be seen as the convergence of $\lambda_{n,x}$ to $\delta_{x}$ in some sense.

Let $\mathcal{H}$ be the RKHS of functions generated by $k$, and $\mathcal{H}^{*}$ its dual space. Denote by $(\cdot,\cdot)_{\mathcal{H}}$ (resp. $(\cdot,\cdot)_{\mathcal{H}^{*}}$) the inner product of $\mathcal{H}$ (resp. $\mathcal{H}^{*}$), and by $\lVert\cdot\rVert_{\mathcal{H}}$ (resp. $\lVert\cdot\rVert_{\mathcal{H}^{*}}$) the corresponding norm. It is well-known (see, e.g., Wu and Schaback, 1993) that

\[
\bigl\lVert\delta_{x} - \lambda_{n,x}\bigr\rVert^{2}_{\mathcal{H}^{*}} \;=\; \Bigl\lVert k(x,\cdot) - \sum_{i}\lambda^{i}(x;\underline{x}_{n})\,k(x_{i},\cdot)\Bigr\rVert^{2}_{\mathcal{H}} \;=\; \sigma^{2}(x;\underline{x}_{n})\,.
\]
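For the reader's convenience (this short derivation is our addition), the second equality follows by expanding the squared norm, using the reproducing property $(k(x,\cdot),\, k(y,\cdot))_{\mathcal{H}} = k(x,y)$ and the normal equations $\sum_{i} \lambda^{i}(x;\underline{x}_{n})\, k(x_{i},x_{j}) = k(x,x_{j})$, $j=1,\ldots,n$, of the orthogonal projection:

\[
\Bigl\lVert k(x,\cdot) - \sum_{i} \lambda^{i}\, k(x_{i},\cdot) \Bigr\rVert^{2}_{\mathcal{H}}
\;=\; k(x,x) - 2\sum_{i} \lambda^{i}\, k(x,x_{i}) + \sum_{i,j} \lambda^{i}\lambda^{j}\, k(x_{i},x_{j})
\;=\; k(x,x) - \sum_{i} \lambda^{i}\, k(x,x_{i})
\;=\; \sigma^{2}(x;\underline{x}_{n})\,,
\]

where $\lambda^{i}$ is shorthand for $\lambda^{i}(x;\underline{x}_{n})$; the first equality of the display above expresses the isometry between $\mathcal{H}^{*}$ and $\mathcal{H}$ that maps $\delta_{y}$ to $k(y,\cdot)$.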

Therefore, the convergence $\lambda_{n,x} \to \delta_{x}$ holds strongly in $\mathcal{H}^{*}$ if and only if the kriging predictor is $L^{2}(\Omega,\mathcal{A},\mathsf{P})$-consistent at $x$; that is, if $\sigma^{2}(x;\underline{x}_{n})$ converges to zero. Since $k$ is continuous, it is easily seen that $\sigma^{2}(x;\underline{x}_{n}) \to 0$ as soon as $x$ is adherent to $\{x_{n},\, n\geq 1\}$. Indeed,

\[
\sigma^{2}(x;\underline{x}_{n}) \;\leq\; \mathsf{E}\bigl[(\xi(x) - \xi(x_{\varphi_{n}}))^{2}\bigr] \;=\; k(x,x) + k(x_{\varphi_{n}},x_{\varphi_{n}}) - 2\,k(x,x_{\varphi_{n}})\,,
\]

with $(\varphi_{n})_{n\in\mathbb{N}}$ a non-decreasing sequence such that $\varphi_{n} \leq n$ for all $n \geq 1$ and $x_{\varphi_{n}} \to x$. As explained by Vazquez and Bect (2009), it is sometimes important to work with covariance functions such that the converse holds. That leads to our first open issue, which will be discussed in Section 3:

Problem 1

Find necessary and sufficient conditions on a continuous covariance $k$ such that $\sigma^{2}(x;\underline{x}_{n}) \to 0$ implies that $x$ is adherent to $\{x_{n},\, n\geq 1\}$.

Moreover, since strong convergence in $\mathcal{H}^{*}$ implies weak convergence in $\mathcal{H}^{*}$, we have

\[
\lim_{n\to\infty}\sigma^{2}(x;\underline{x}_{n})=0 \;\implies\; \forall f\in\mathcal{H},\quad \lim_{n\to\infty}\langle\lambda_{n,x},\,f\rangle = \langle\delta_{x},\,f\rangle = f(x)\,.
\tag{2}
\]

Therefore, if $x$ is adherent to $\{x_{n},\, n\geq 1\}$, pointwise consistency holds for all sample paths $f \in \mathcal{H}$. However, this result is not satisfying from a Bayesian point of view, since $\mathsf{P}\{\xi \in \mathcal{H}\} = 0$ if $\xi$ is Gaussian (see, e.g., Lukic and Beder, 2001, Driscoll's theorem). In other words, modeling $f$ as a Gaussian process means that $f$ cannot be expected to belong to $\mathcal{H}$. This leads to our second problem:

Problem 2

For a given covariance function $k$, describe the set of functions $\mathcal{G}$ such that, for all sequences $(x_{n})_{n\geq 1}$ in $\mathbb{R}^{d}$ and all $x \in \mathbb{R}^{d}$,

\[
\lim_{n\to\infty}\sigma^{2}(x;\underline{x}_{n})=0 \;\implies\; \forall f\in\mathcal{G},\quad \lim_{n\to\infty}\langle\lambda_{n,x},\,f\rangle = f(x)\,.
\tag{3}
\]

An important question related to this problem, to be discussed in Section 4, is to know whether the set $\mathcal{G}$ contains the set $C(\mathbb{R}^{d})$ of all continuous functions. Before proceeding, we can already establish a result which ensures that considering the kriging predictor is relevant from a Bayesian point of view.

Theorem 2.1

If $\xi$ is Gaussian, then $\{\xi \not\in \mathcal{G}\}$ is $\mathsf{P}$-negligible.

Proof

If $\xi$ is Gaussian, it is well-known that $\widehat{\xi}(x;\underline{x}_{n}) = \mathsf{E}[\xi(x) \mid \mathcal{F}_{n}]$ a.s., where $\mathcal{F}_{n}$ denotes the $\sigma$-algebra generated by $\xi(x_{1}), \ldots, \xi(x_{n})$. Note that $(\mathsf{E}[\xi(x) \mid \mathcal{F}_{n}])$ is an $L^{2}$-bounded martingale sequence and therefore converges, a.s. and in $L^{2}$-norm, to a random variable $\xi_{\infty}$ (see, e.g., Williams, 1991). Moreover, if $\sigma^{2}(x;\underline{x}_{n}) \to 0$, then $\widehat{\xi}(x;\underline{x}_{n})$ also converges to $\xi(x)$ in $L^{2}$-norm, so that $\xi_{\infty} = \xi(x)$ a.s.∎

3 Pointwise consistency in $L^{2}$-norm and the No-Empty-Ball property

The following definition has been introduced by Vazquez and Bect (2009):

Definition 1

A random process $\xi$ has the No-Empty-Ball (NEB) property if, for all sequences $(x_{n})_{n\geq 1}$ in $\mathbb{R}^{d}$ and all $x \in \mathbb{R}^{d}$, the following assertions are equivalent:

  i) $x$ is an adherent point of the set $\{x_{n},\, n\geq 1\}$;

  ii) $\sigma^{2}(x;\underline{x}_{n}) \to 0$ when $n \to +\infty$.

The NEB property implies that there can be no empty ball centered at $x$ if the prediction error at $x$ converges to zero—hence the name. Since $k$ is continuous, the implication i) $\Rightarrow$ ii) is true. Therefore, Problem 1 amounts to finding necessary and sufficient conditions on $k$ for $\xi$ to have the NEB property.

Our contribution to the solution of Problem 1 will be twofold. First, we shall prove that the following assumption, introduced by Yakowitz and Szidarovszky (1985), is a sufficient condition for the NEB property:

Assumption 1

The process $\xi$ is second-order stationary and has spectral density $S$, with the property that $S^{-1}$ has at most polynomial growth.

In other words, Assumption 1 means that there exist $C > 0$ and $r \in \mathbb{N}^{*}$ such that $S(u)\,(1+|u|^{r}) \geq C$, almost everywhere on $\mathbb{R}^{d}$. Note that this is an assumption on $k$, which prevents it from being too regular. In particular, the so-called Gaussian covariance,

\[
k(x,y) = s^{2}\, e^{-\alpha\,\lVert x-y\rVert^{2}}, \qquad s>0,\; \alpha>0,
\tag{4}
\]

does not satisfy Assumption 1. In fact, and this is the second part of our contribution, we shall show that $\xi$ with covariance function (4) does not possess the NEB property. Assumption 1 still allows consideration of a large class of covariance functions, which includes the class of (non-Gaussian) exponential covariances

\[
k(x,y) = s^{2}\, e^{-\alpha\,\lVert x-y\rVert^{\beta}}, \qquad s>0,\; \alpha>0,\; 0<\beta<2\,,
\tag{5}
\]

and the class of Matérn covariances (popularized by Stein, 1999).
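As a quick check (our addition), take $d = 1$, $s = 1$, and recall that $S$ denotes the spectral density of $\xi$, i.e., the Fourier transform of the covariance. For the exponential covariance (5) with $\beta = 1$,

\[
S(u) \;=\; \int_{\mathbb{R}} e^{-\alpha|h|}\, e^{-\mathrm{i}uh}\, dh \;=\; \frac{2\alpha}{\alpha^{2}+u^{2}},
\qquad\text{so that}\qquad
S(u)\,\bigl(1+|u|^{2}\bigr) \;=\; \frac{2\alpha\,(1+u^{2})}{\alpha^{2}+u^{2}} \;\geq\; \min\bigl(2\alpha,\, 2/\alpha\bigr) \;>\; 0\,,
\]

and Assumption 1 holds with $r = 2$. For the Gaussian covariance (4), in contrast, $S(u) = \sqrt{\pi/\alpha}\; e^{-u^{2}/(4\alpha)}$, so that $S(u)\,(1+|u|^{r}) \to 0$ as $|u| \to \infty$ for every $r$, and Assumption 1 fails.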

To summarize, the main result of this section is:

Proposition 1

  i) If Assumption 1 holds, then $\xi$ has the NEB property.

  ii) If $\xi$ has the Gaussian covariance given by (4), then $\xi$ does not possess the NEB property.

The proof of Proposition 1 is given in Section 5. To the best of our knowledge, finding necessary and sufficient conditions for the NEB property—in other words, solving Problem 1—is still an open problem.
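The following small numerical experiment (our addition, merely indicative and not part of the proof) illustrates the second assertion: with the Gaussian covariance, the kriging variance at a point $x_{0}$ that the design never approaches can nevertheless be driven toward zero by filling the interval $[0,1]$. The Gram matrix of the Gaussian covariance becomes severely ill-conditioned as $n$ grows, so a pseudo-inverse is used and the values should be read as rough orders of magnitude only.

\begin{verbatim}
import numpy as np

def k_gauss(x, y, alpha=1.0):
    # Gaussian covariance (4) with s = 1
    return np.exp(-alpha * (x - y) ** 2)

def kriging_variance(x, xn, cov=k_gauss):
    # sigma^2(x; x_n), computed with a pseudo-inverse to tame ill-conditioning
    K = cov(xn[:, None], xn[None, :])
    kx = cov(xn, x)
    lam = np.linalg.pinv(K) @ kx
    return cov(x, x) - lam @ kx

x0 = 2.0                                  # a point the design never approaches
for n in (3, 6, 9, 12):
    xn = np.linspace(0.0, 1.0, n)         # design points fill [0, 1] only
    print(n, kriging_variance(x0, xn))    # tends to decrease although x0 is outside [0, 1]
\end{verbatim}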

4 Pointwise consistency for continuous sample paths

An important question related to Problem 2 is to know whether the set $\mathcal{G}$ contains the set $C(\mathbb{R}^{d})$ of all continuous functions. Yakowitz and Szidarovszky (1985, Lemma 2.1) claim, but fail to establish, the following:

Claim 1

Let Assumption 1 hold. Assume that $\{x_{n},\, n\geq 1\}$ is bounded, and denote by $\mathbb{X}_{0}$ its (compact) closure in $\mathbb{R}^{d}$. Then, if $x \in \mathbb{X}_{0}$,

\[
\forall f \in C(\mathbb{R}^{d})\,,\quad \lim_{n\to\infty}\langle\lambda_{n,x},\,f\rangle = f(x)\,.
\]

Their incorrect proof has two parts, the first of which is correct; it says in essence that, if $x \in \mathbb{X}_{0}$ (i.e., if $x$ is adherent to $\{x_{n},\, n\geq 1\}$), then

\[
\forall f \in \mathcal{S}(\mathbb{R}^{d})\,,\quad \lim_{n\to\infty}\langle\lambda_{n,x},\,f\rangle = f(x)\,,
\tag{6}
\]

where $\mathcal{S}(\mathbb{R}^{d})$ is the vector space of rapidly decreasing functions; recall that $\mathcal{S}(\mathbb{R}^{d})$ corresponds to those $f \in C^{\infty}(\mathbb{R}^{d})$ for which $\sup_{|\nu|\leq N}\, \sup_{x\in\mathbb{R}^{d}}\, (1+|x|^{2})^{N}\, |(D^{\nu}f)(x)| < \infty$ for $N = 0, 1, 2, \ldots$, where $D^{\nu}$ denotes differentiation of order $\nu$. In fact, this result stems from the weak convergence result (2), once it has been remarked that $\mathcal{S}(\mathbb{R}^{d}) \subset \mathcal{H}$ under Assumption 1. Indeed, under Assumption 1, we have, for all $f \in \mathcal{S}(\mathbb{R}^{d})$,
\[
\lVert f\rVert_{\mathcal{H}}^{2} \;=\; \frac{1}{(2\pi)^{d}}\int_{\mathbb{R}^{d}} \bigl|\tilde{f}(u)\bigr|^{2}\, S(u)^{-1}\, du \;\leq\; \frac{1}{C\,(2\pi)^{d}}\int_{\mathbb{R}^{d}} \bigl|\tilde{f}(u)\bigr|^{2}\, \bigl(1+|u|^{r}\bigr)\, du \;<\; +\infty\,,
\]
where $\tilde{f}$ is the Fourier transform of $f$ (see, e.g., Wu and Schaback, 1993).

The second part of the proof of Claim 1 is flawed because the extension of the convergence result from $\mathcal{S}(\mathbb{R}^{d})$ to $C(\mathbb{R}^{d})$, on the ground that $\mathcal{S}(\mathbb{R}^{d})$ is dense in $C(\mathbb{R}^{d})$ for the topology of uniform convergence on compact sets, does not work as claimed by the authors. To get an insight into this, let $f \in C(\mathbb{R}^{d})$, and let $(\phi_{k}) \in \mathcal{S}(\mathbb{R}^{d})^{\mathbb{N}}$ be a sequence that converges to $f$ uniformly on $\mathbb{X}_{0}$. Then we can write

\[
\begin{aligned}
\bigl|\langle\lambda_{n,x},\,f\rangle - f(x)\bigr|
&\;\leq\; \bigl|\langle\lambda_{n,x},\,f-\phi_{k}\rangle\bigr| + \bigl|\langle\lambda_{n,x}-\delta_{x},\,\phi_{k}\rangle\bigr| + \bigl|\phi_{k}(x)-f(x)\bigr| \\
&\;\leq\; \bigl(1 + \lVert\lambda_{n,x}\rVert_{\rm TV}\bigr)\, \sup_{\mathbb{X}_{0}}|f-\phi_{k}| \;+\; \bigl|\langle\lambda_{n,x}-\delta_{x},\,\phi_{k}\rangle\bigr|\,,
\end{aligned}
\]

where $\lVert\lambda_{n,x}\rVert_{\rm TV} := \sum_{i=1}^{n} |\lambda^{i}(x;\underline{x}_{n})|$ is the total variation norm of $\lambda_{n,x}$, also called the Lebesgue constant (at $x$) in the literature of approximation theory. If we assume that the Lebesgue constant is bounded by $K > 0$, then we get, using (6),

\[
\limsup_{n\to\infty}\, \bigl|\langle\lambda_{n,x},\,f\rangle - f(x)\bigr| \;\leq\; (1+K)\; \sup_{\mathbb{X}_{0}}\,|f-\phi_{k}| \;\xrightarrow[k\to\infty]{}\; 0\,.
\]

Conversely, if the Lebesgue constant is not bounded, the Banach-Steinhaus theorem asserts that there exists a dense subset $G$ of $\bigl(C(\mathbb{R}^{d}),\, \lVert\cdot\rVert_{\infty}\bigr)$ such that, for all $f \in G$, $\sup_{n\geq 1} |\langle\lambda_{n,x},\,f\rangle| = +\infty$ (see, e.g., Rudin, 1987, Section 5.8).

Unfortunately, little is known about Lebesgue constants in the literature of kriging and kernel regression. To the best of our knowledge, whether the Lebesgue constant is bounded remains an open problem—although there is empirical evidence in De Marchi and Schaback (2008) that the Lebesgue constant could be bounded in some cases.
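In the same empirical spirit as De Marchi and Schaback (2008), the Lebesgue constant is easy to monitor numerically. The following sketch (our addition; the covariance and the helper names are chosen for illustration only) computes $\lVert\lambda_{n,x}\rVert_{\rm TV}$ for increasingly dense designs in $[0,1]$.

\begin{verbatim}
import numpy as np

def k_exp(x, y, s=1.0, alpha=1.0):
    # exponential covariance (5) with beta = 1
    return s**2 * np.exp(-alpha * np.abs(x - y))

def lebesgue_constant(x, xn, cov=k_exp):
    # ||lambda_{n,x}||_TV = sum_i |lambda^i(x; x_n)|
    K = cov(xn[:, None], xn[None, :])
    lam = np.linalg.solve(K, cov(xn, x))
    return np.abs(lam).sum()

x = 0.37
for n in (5, 10, 20, 40, 80):
    xn = np.linspace(0.0, 1.0, n)        # increasingly dense designs in [0, 1]
    print(n, lebesgue_constant(x, xn))   # stays bounded in this particular example
\end{verbatim}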

Thus, the best result that we can state for now is a corrected version of Lemma 2.1 of Yakowitz and Szidarovszky (1985).

Theorem 4.1

Let Assumption 1 hold. Assume that $\{x_{n},\, n\geq 1\}$ is bounded, and denote by $\mathbb{X}_{0}$ its (compact) closure in $\mathbb{R}^{d}$. Then, for all $x \in \mathbb{X}_{0}$, the following assertions are equivalent:

  i) $\forall f \in C(\mathbb{R}^{d})$, $\lim_{n\to\infty}\langle\lambda_{n,x},\,f\rangle = f(x)$;

  ii) the Lebesgue constant at $x$ is bounded.

5 Proof of Proposition 1

Assume that $x \in \mathbb{R}^{d}$ is not adherent to $\{x_{n},\, n\geq 1\}$. Then, there exists a compactly supported $C^{\infty}(\mathbb{R}^{d})$ function $f$ such that $f(x) \neq 0$ and $f(x_{i}) = 0$ for all $i \geq 1$. For such a function, the quantity $\langle\lambda_{n,x},\,f\rangle$ cannot converge to $f(x)$, since

\[
\langle\lambda_{n,x},\,f\rangle \;=\; \sum_{i=1}^{n}\lambda^{i}(x;\underline{x}_{n})\,f(x_{i}) \;=\; 0 \;\neq\; f(x)\,.
\]

Under Assumption 1, $\mathcal{S}(\mathbb{R}^{d}) \subset \mathcal{H}$, as explained in Section 4. Thus, $f \in \mathcal{H}$; and it follows that $\lambda_{n,x}$ cannot converge (weakly, hence strongly) to $\delta_{x}$ in $\mathcal{H}^{*}$. This proves the first assertion of Proposition 1.
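For instance (this concrete choice is our addition), if $\varepsilon > 0$ is small enough that the ball $B(x,\varepsilon)$ contains no point of the sequence, one may take the standard bump function

\[
f(y) \;=\;
\begin{cases}
\exp\Bigl(-\dfrac{1}{\varepsilon^{2} - \lVert y-x\rVert^{2}}\Bigr), & \lVert y-x\rVert < \varepsilon,\\[1ex]
0, & \text{otherwise},
\end{cases}
\]

which is $C^{\infty}$, compactly supported, vanishes at every $x_{i}$, and satisfies $f(x) = e^{-1/\varepsilon^{2}} \neq 0$.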

In order to prove the second assertion, pick any sequence $(x_{n})_{n\geq 1}$ such that the closure $\mathbb{X}_{0}$ of $\{x_{n},\, n\geq 1\}$ has a non-empty interior. We will show that $\sigma^{2}(x;\underline{x}_{n}) \to 0$ for all $x \in \mathbb{R}^{d}$. Then, choosing $x \not\in \mathbb{X}_{0}$ proves the claim.

Recall that $\widehat{\xi}(x;\underline{x}_{n})$ is the orthogonal projection of $\xi(x)$ onto $\operatorname{span}\{\xi(x_{i}),\, i=1,\ldots,n\}$ in $L^{2}(\Omega,\mathcal{A},\mathsf{P})$. Using the fact that the mapping $\xi(x) \mapsto k(x,\cdot)$ extends linearly to an isometry (often referred to as Loève's isometry; see, e.g., Lukic and Beder, 2001) from $\overline{\operatorname{span}}\{\xi(y),\, y\in\mathbb{R}^{d}\}$ to $\mathcal{H}$, we get that

\[
\sigma(x;\underline{x}_{n}) \;=\; \bigl\lVert\xi(x) - \widehat{\xi}(x;\underline{x}_{n})\bigr\rVert \;=\; d_{\mathcal{H}}\bigl(k(x,\cdot),\, H_{n}\bigr)\,,
\]

where $d_{\mathcal{H}}$ is the distance in $\mathcal{H}$, and $H_{n}$ is the subspace of $\mathcal{H}$ generated by $k(x_{i},\cdot)$, $i=1,\ldots,n$. Therefore

\[
\lim_{n\to\infty}\sigma(x;\underline{x}_{n}) \;=\; \lim_{n\to\infty} d_{\mathcal{H}}\bigl(k(x,\cdot),\, H_{n}\bigr) \;=\; d_{\mathcal{H}}\bigl(k(x,\cdot),\, H_{\infty}\bigr)\,,
\]

where $H_{\infty} = \overline{\cup_{n\geq 1} H_{n}}$. Any function $f \in H_{\infty}^{\perp}$ satisfies $f(x_{i}) = (f,\, k(x_{i},\cdot))_{\mathcal{H}} = 0$ and therefore vanishes on $\mathbb{X}_{0}$, since $\mathcal{H}$ is a space of continuous functions. Corollary 3.9 of Steinwart et al. (2006) leads to the conclusion that $f = 0$, since $\mathbb{X}_{0}$ has a non-empty interior. We have proved that $H_{\infty}^{\perp} = \{0\}$, hence that $H_{\infty} = \mathcal{H}$ since $H_{\infty}$ is a closed subspace. As a consequence, $\lim_{n\to\infty}\sigma(x;\underline{x}_{n}) = d_{\mathcal{H}}(k(x,\cdot),\, H_{\infty}) = 0$, which completes the proof. ∎

References

  • Chilès and Delfiner (1999) J.-P. Chilès and P. Delfiner. Geostatistics: Modeling Spatial Uncertainty. Wiley, New York, 1999.
  • De Marchi and Schaback (2008) S. De Marchi and R. Schaback. Stability of kernel-based interpolation. Adv. in Comp. Math., 2008. doi: 10.1007/s10444-008-9093-4.
  • Lukic and Beder (2001) M. N. Lukic and J. H. Beder. Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Amer. Math. Soc., 353(10):3945–3969, 2001.
  • Rudin (1987) W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, 3rd edition, 1987.
  • Sacks et al. (1989) J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. Design and analysis of computer experiments. Statist. Sci., 4(4):409–435, 1989.
  • Santner et al. (2003) T. J. Santner, B. J. Williams, and W. I. Notz. The Design and Analysis of Computer Experiments. Springer, 2003.
  • Stein (1999) M. L. Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York, 1999.
  • Steinwart et al. (2006) I. Steinwart, D. Hush, and C. Scovel. An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Transactions on Information Theory, 52(10):4635–4643, 2006.
  • Vazquez and Bect (2009) E. Vazquez and J. Bect. On the convergence of the expected improvement algorithm. Preprint available on arXiv, http://arxiv.org/abs/0712.3744v2, 2009.
  • Villemonteix (2008) J. Villemonteix. Optimisation de Fonctions Coûteuses. PhD thesis, Université Paris-Sud XI, Faculté des Sciences d’Orsay, 2008.
  • Williams (1991) D. Williams. Probability with Martingales. Cambridge University Press, Cambridge, 1991.
  • Wu and Schaback (1993) Z. Wu and R. Schaback. Local error estimates for radial basis function interpolation of scattered data. IMA J. Numer. Anal., 13:13–27, 1993.
  • Yakowitz and Szidarovszky (1985) S. J. Yakowitz and F. Szidarovszky. A comparison of kriging with nonparametric regression methods. J. Multivariate Analysis, 16:21–53, 1985.