Emmanuel Vazquez and Julien Bect: SUPELEC, 3 rue Joliot-Curie, 91192 Gif-sur-Yvette, France
Email: emmanuel.vazquez@supelec.fr and julien.bect@supelec.fr

Pointwise consistency of the kriging predictor with known mean and covariance functions

Emmanuel Vazquez Julien Bect

Abstract.

This paper deals with several issues related to the pointwise consistency of the kriging predictor when the mean and the covariance functions are known. These questions are of general importance in the context of computer experiments. The analysis is based on the properties of approximations in reproducing kernel Hilbert spaces. We fix an erroneous claim of Yakowitz and Szidarovszky (J. Multivariate Analysis, 1985) that the kriging predictor is pointwise consistent for all continuous sample paths under some assumptions.

Keywords:

kriging; reproducing kernel Hilbert space; asymptotics; consistency

1 Introduction

The domain of computer experiments is concerned with making inferences about the output of an expensive-to-run numerical simulation of some physical system, which depends on a vector of factors with values in ${\mathbb{X}}\subseteq{\mathbb{R}}^{d}$ . The output of the simulator is formally an unknown function $f:{\mathbb{X}}\to{\mathbb{R}}$ . For example, to comply with ever-increasing standards regarding pollutant emissions, numerical simulations are used to determine the level of emissions of a combustion engine as a function of its design parameters (Villemonteix, 2008). The emission of pollutants by an engine involves coupled physical phenomena whose numerical simulation by a finite-element method, for a fixed set of design parameters of the engine, can take several hours on high-end servers. It then becomes very helpful to collect the answers already provided by the expensive simulator, and to construct from them a simpler computer model, that will provide approximate but cheaper answers about a quantity of interest. This approximate model is often called a surrogate, or a metamodel, or an emulator of the actual simulator $f$ . The quality of the answers given by the approximate model depends on the quality of the approximation, which depends, in turn and in part, on the choice of the evaluation points of $f$ , also called experiments. The choice of the evaluation points is usually called the design of experiments. Assuming that $f$ is continuous, it is an important question to know whether the approximate model behaves consistently, in the sense that if the evaluation points $x_{n}$ are chosen sequentially in such a way that a given point $x\in{\mathbb{X}}$ is an accumulation point of $\{x_{n},\ n\geq 1\}$ , then the approximation at $x$ converges to $f(x)$ .

Since the seminal paper of Sacks et al. (1989), kriging has been one of the most popular methods for building approximations in the context of computer experiments (see, e.g., Santner et al., 2003). In the framework of kriging, the unknown function $f$ is seen as a sample path of a stochastic process $\xi$ , which turns the problem of approximation of $f$ into a prediction problem for the process $\xi$ . In this paper, we shall assume that the mean and the covariance functions are known. Motivated by the analysis of the expected improvement algorithm (Vazquez and Bect, 2009), a popular kriging-based optimization algorithm, we discuss several issues related to the pointwise consistency of the kriging predictor, that is, the convergence of the kriging predictor to the true value of $\xi$ at a fixed point $x\in{\mathbb{X}}$ . These issues are barely documented in the literature, and we believe them to be of general importance for the asymptotic analysis of sequential design procedures based on kriging.

The paper is organized as follows. Section 2 introduces notation and various formulations of pointwise consistency, using the reproducing kernel Hilbert space (RKHS) attached to $\xi$ . Section 3 investigates whether $L^{2}$ -pointwise consistency at $x$ can hold when $x$ is not in the adherence of the set $\{x_{n},n\geq 1\}$ . Conversely, assuming that $x$ is in the adherence, Section 4 studies the set of sample paths $f=\xi(\omega,\bm{\cdot})$ for which pointwise consistency holds. In particular, we fix an erroneous claim of Yakowitz and Szidarovszky (1985)—namely, that the kriging predictor is pointwise consistent for all continuous sample paths under some assumptions.

2 Several formulations of pointwise consistency

Let $\xi$ be a second-order process defined on a probability space $(\Omega,\mathcal{A},\mathsf{P})$, with parameter $x\in{\mathbb{X}}\subseteq{\mathbb{R}}^{d}$. Without loss of generality, it will be assumed that the mean of $\xi$ is zero and that ${\mathbb{X}}={\mathbb{R}}^{d}$. The covariance function of $\xi$ will be denoted by $k(x,y)\mathrel{\mathop{:}}=\operatorname{\mathsf{E}}\left[\xi(x)\xi(y)\right]$, and the following assumption will be used throughout the paper:

Assumption. The covariance function $k$ is continuous.

The kriging predictor of $\xi(x)$, based on the observations $\xi(x_{i})$, $i=1,\ldots,n$, is the orthogonal projection

\widehat{\xi}(x;\underline{x}_{n})\;\mathrel{\mathop{:}}=\;\sum_{i=1}^{n}\lambda^{i}(x;\underline{x}_{n})\,\xi(x_{i})

(1)

of $\xi(x)$ onto $\operatorname{{\rm span}}\{\xi(x_{i}),i=1,\ldots,n\}$ . The variance of the prediction error, also called the kriging variance in the literature of geostatistics (see, e.g., Chilès and Delfiner, 1999), or the power function in the literature of radial basis functions (see, e.g., Wu and Schaback, 1993), is

\begin{aligned}
\sigma^{2}(x;\underline{x}_{n})\;&\mathrel{\mathop{:}}=\;\operatorname{var}\left[\xi(x)-\widehat{\xi}(x;\underline{x}_{n})\right]\\
&=\;k(x,x)-\sum_{i}\lambda^{i}(x;\underline{x}_{n})\,k(x,x_{i})\,.
\end{aligned}
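As a minimal numerical sketch of these two formulas, the code below computes the weights $\lambda^{i}(x;\underline{x}_{n})$ and the kriging variance in dimension $d=1$. The covariance $k(x,y)=s^{2}e^{-\alpha\lvert x-y\rvert^{\beta}}$, the design points and the function names are assumptions chosen for illustration only; the weights are obtained by solving the linear system $K\lambda=k_{x}$, which characterizes the orthogonal projection when the matrix $K=(k(x_{i},x_{j}))$ is invertible.

```python
import numpy as np

def exp_cov(x, y, s=1.0, alpha=1.0, beta=1.0):
    """Illustrative covariance s^2 exp(-alpha |x - y|^beta) on R (an assumption)."""
    x = np.atleast_1d(x).astype(float)
    y = np.atleast_1d(y).astype(float)
    return s**2 * np.exp(-alpha * np.abs(x[:, None] - y[None, :]) ** beta)

def kriging_weights(x, design, cov=exp_cov):
    """Weights lambda^i(x; x_n), solution of K lambda = k_x (zero-mean kriging)."""
    K = cov(design, design)        # n x n matrix of k(x_i, x_j)
    k_x = cov(design, x)[:, 0]     # vector of k(x_i, x)
    return np.linalg.solve(K, k_x)

def kriging_variance(x, design, cov=exp_cov):
    """sigma^2(x; x_n) = k(x, x) - sum_i lambda^i k(x, x_i)."""
    lam = kriging_weights(x, design, cov)
    k_x = cov(design, x)[:, 0]
    return cov(x, x)[0, 0] - lam @ k_x

design = np.array([0.0, 0.3, 0.7, 1.0])   # hypothetical evaluation points
print(kriging_weights(0.5, design))
print(kriging_variance(0.5, design))
print(kriging_variance(0.3, design))      # x is a design point here
```

At a design point the predictor interpolates the observation, so the computed variance vanishes up to rounding error.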

For any $x\in{\mathbb{R}}^{d}$, and any sample path $f=\xi(\omega,\bm{\cdot})$, $\omega\in\Omega$, the values $\xi(\omega,x)=f(x)$ and $\widehat{\xi}(\omega,x;\underline{x}_{n})$ can be seen as the result of the application of an evaluation functional to $f$. More precisely, let $\delta_{x}$ be the Dirac measure at $x\in{\mathbb{R}}^{d}$, and let $\lambda_{\,n,x}$ denote the measure with finite support defined by $\lambda_{\,n,x}\mathrel{\mathop{:}}=\sum_{i=1}^{n}\lambda^{i}(x;\underline{x}_{n})\,\delta_{x_{i}}$. Then, for all $\omega\in\Omega$, $\xi(\omega,x)=\left\langle\,\delta_{x},\,f\,\right\rangle$ and $\widehat{\xi}(\omega,x;\underline{x}_{n})=\left\langle\,\lambda_{\,n,x},\,f\,\right\rangle$. Pointwise consistency at $x\in{\mathbb{R}}^{d}$, defined in Section 1 as the convergence of $\widehat{\xi}(\omega,x;\underline{x}_{n})$ to $\xi(\omega,x)$, can thus be seen as the convergence of $\lambda_{\,n,x}$ to $\delta_{x}$ in some sense.

Let ${\mathcal{H}}$ be the RKHS of functions generated by $k$ , and ${\mathcal{H}}^{*}$ its dual space. Denote by $(\cdot,\cdot)_{{\mathcal{H}}}$ (resp. $(\cdot,\cdot)_{{\mathcal{H}}^{*}}$ ) the inner product of ${\mathcal{H}}$ (resp. ${\mathcal{H}}^{*}$ ), and by $\lVert\cdot\rVert_{{\mathcal{H}}}$ (resp. $\lVert\cdot\rVert_{{\mathcal{H}}^{*}}$ ) the corresponding norm. It is well-known (see, e.g., Wu and Schaback, 1993) that

\big{\lVert}\delta_{x}-\lambda_{\,n,x}\big{\rVert}^{2}_{{\mathcal{H}}^{*}}\;=\;\big{\lVert}k(x,\cdot)-{\sum}_{i}\,\lambda^{i}(x;\underline{x}_{n})\,k(x_{i},\cdot)\big{\rVert}^{2}_{{\mathcal{H}}}\;=\;\sigma^{2}(x;\underline{x}_{n})\,.

Therefore, the convergence $\lambda_{\,n,x}\to\delta_{x}$ holds strongly in ${\mathcal{H}}^{*}$ if and only if the kriging predictor is $L^{2}(\Omega,\mathcal{A},\mathsf{P})$ -consistent at $x$ ; that is, if $\sigma^{2}(x;\underline{x}_{n})$ converges to zero. Since $k$ is continuous, it is easily seen that $\sigma^{2}(x;\underline{x}_{n})\to 0$ as soon as $x$ is adherent to $\{x_{n},n\geq 1\}$ . Indeed,

\sigma^{2}(x;\underline{x}_{n})\leq\operatorname{\mathsf{E}}[(\xi(x)-\xi(x_{\varphi_{n}}))^{2}]=k(x,x)+k(x_{\varphi_{n}},x_{\varphi_{n}})-2k(x,x_{\varphi_{n}}),

with $(\varphi_{n})_{n\in{\mathbb{N}}}$ a non-decreasing sequence such that $\forall n\geq 1$ , $\varphi_{n}\leq n$ and $x_{\varphi_{n}}\to x$ . As explained by Vazquez and Bect (2009), it is sometimes important to work with covariance functions such that the converse holds. That leads to our first open issue, which will be discussed in Section 3:

Problem 1

Find necessary and sufficient conditions on a continuous covariance $k$ such that $\sigma^{2}(x;\underline{x}_{n})\to 0$ implies that $x$ is adherent to $\{x_{n},n\geq 1\}$ .
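The upper bound on $\sigma^{2}(x;\underline{x}_{n})$ used above, which underlies the easy direction of Problem 1, can be checked numerically. The sketch below assumes, for illustration only, $d=1$, the covariance $k(x,y)=e^{-\lvert x-y\rvert}$, the sequence $x_{n}=x+1/n$ and $\varphi_{n}=n$; none of these choices comes from the analysis itself.

```python
import numpy as np

def cov(x, y, alpha=1.0):
    # Assumed illustrative covariance k(x, y) = exp(-alpha |x - y|) on R.
    return np.exp(-alpha * np.abs(np.subtract.outer(x, y)))

def kriging_variance(x, design):
    K = cov(design, design)
    k_x = cov(design, np.array([x]))[:, 0]
    lam = np.linalg.solve(K, k_x)      # kriging weights
    return 1.0 - lam @ k_x             # k(x, x) = 1 for this covariance

x = 0.5
design = x + 1.0 / np.arange(1, 21)    # x_n = x + 1/n, so that x_n -> x
for n in (1, 5, 10, 20):
    sigma2 = kriging_variance(x, design[:n])
    x_phi = design[n - 1]              # phi_n = n in this example
    # k(x,x) + k(x_phi,x_phi) - 2 k(x,x_phi), with k(x,x) = k(x_phi,x_phi) = 1
    bound = 2.0 - 2.0 * cov(np.array([x]), np.array([x_phi]))[0, 0]
    print(n, sigma2, bound)
```

In each printed row the kriging variance should not exceed the bound, and both columns should decrease towards zero as $n$ grows.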

Moreover, since strong convergence in ${\mathcal{H}}^{*}$ implies weak convergence in ${\mathcal{H}}^{*}$ , we have

\lim_{n\to\infty}\sigma^{2}(x;\underline{x}_{n})=0\;\implies\;\forall f\in{\mathcal{H}}\,,\quad\lim_{n\to\infty}\left\langle\,\lambda_{\,n,x},f\,\right\rangle=\left\langle\,\delta_{x},\,f\,\right\rangle=f(x)\,.

(2)
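The implication (2) can be made quantitative. As a sketch, using only the reproducing property $f(y)=\left(f,k(y,\cdot)\right)_{{\mathcal{H}}}$ and the Cauchy-Schwarz inequality, together with the identity relating $\sigma^{2}(x;\underline{x}_{n})$ to the norm of $k(x,\cdot)-\sum_{i}\lambda^{i}(x;\underline{x}_{n})\,k(x_{i},\cdot)$ in ${\mathcal{H}}$, one gets, for every $f\in{\mathcal{H}}$,

\left|\left\langle\,\lambda_{\,n,x}-\delta_{x},\,f\,\right\rangle\right|\;=\;\Big|\Big(f,\;{\sum}_{i}\,\lambda^{i}(x;\underline{x}_{n})\,k(x_{i},\cdot)-k(x,\cdot)\Big)_{{\mathcal{H}}}\Big|\;\leq\;\lVert f\rVert_{{\mathcal{H}}}\;\sigma(x;\underline{x}_{n})\,.

In particular, for a fixed $f\in{\mathcal{H}}$, the prediction error at $x$ decays at least as fast as $\sigma(x;\underline{x}_{n})$.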

Therefore, if $x$ is adherent to $\{x_{n},\,n\geq 1\}$ , pointwise consistency holds for all sample paths $f\in{\mathcal{H}}$ . However, this result is not satisfying from a Bayesian point of view since $\mathsf{P}\{\xi\in{\mathcal{H}}\}=0$ if $\xi$ is Gaussian (see, e.g., Lukic and Beder, 2001, Driscoll’s theorem). In other words, modeling $f$ as a Gaussian process means that $f$ cannot be expected to belong to ${\mathcal{H}}$ . This leads to our second problem:

Problem 2

For a given covariance function $k$ , describe the set of functions ${\mathcal{G}}$ such that, for all sequences $(x_{n})_{n\geq 1}$ in ${\mathbb{R}}^{d}$ and all $x\in{\mathbb{R}}^{d}$ ,

\lim_{n\to\infty}\sigma^{2}(x;\underline{x}_{n})=0\;\implies\;\forall f\in{\mathcal{G}}\,,\quad\lim_{n\to\infty}\left\langle\,\lambda_{\,n,x},f\,\right\rangle=f(x)\,.

(3)

An important question related to this problem, to be discussed in Section 4, is to know whether the set ${\mathcal{G}}$ contains the set $C({\mathbb{R}}^{d})$ of all continuous functions. Before proceeding, we can already establish a result which ensures that considering the kriging predictor is relevant from a Bayesian point of view.

Theorem 2.1

If $\xi$ is Gaussian, then $\{\xi\not\in{\mathcal{G}}\}$ is $\mathsf{P}$ -negligible.

Proof

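If $\xi$ is Gaussian, it is well-known that $\widehat{\xi}(x;\underline{x}_{n})\;=\;\operatorname{\mathsf{E}}[\xi(x)\mid{\mathcal{F}}_{n}]$ a.s., where ${\mathcal{F}}_{n}$ denotes the $\sigma$-algebra generated by $\xi(x_{1})$, …, $\xi(x_{n})$. Note that $\left(\operatorname{\mathsf{E}}[\xi(x)\mid{\mathcal{F}}_{n}]\right)$ is an $L^{2}$-bounded martingale sequence and therefore converges, a.s. and in $L^{2}$-norm, to a random variable $\xi_{\infty}$ (see, e.g., Williams, 1991). Suppose now that $\sigma^{2}(x;\underline{x}_{n})\to 0$ for a given sequence $(x_{n})_{n\geq 1}$ and a given point $x$. Then $\widehat{\xi}(x;\underline{x}_{n})$ also converges to $\xi(x)$ in $L^{2}$-norm, so that $\xi_{\infty}=\xi(x)$ a.s. by uniqueness of the $L^{2}$ limit. Hence $\left\langle\,\lambda_{\,n,x},\,\xi(\omega,\bm{\cdot})\,\right\rangle\to\xi(\omega,x)$ for $\mathsf{P}$-almost every $\omega$.∎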

3 Pointwise consistency in $L^{2}$ -norm and the No-Empty-Ball property

The following definition has been introduced by Vazquez and Bect (2009):

Definition 1

A random process $\xi$ has the No-Empty-Ball (NEB) property if, for all sequences $(x_{n})_{n\geq 1}$ in ${\mathbb{R}}^{d}$ and all $x\in{\mathbb{R}}^{d}$ , the following assertions are equivalent:

i) $x$ is an adherent point of the set $\{x_{n},\,n\geq 1\}$,

ii) $\sigma^{2}(x;\underline{x}_{n})\to 0$ when $n\to+\infty$.

The NEB property implies that there can be no empty ball centered at $x$ if the prediction error at $x$ converges to zero—hence the name. Since $k$ is continuous, the implication 1.i $\Rightarrow$ 1.ii is true. Therefore, Problem $1$ amounts to finding necessary and sufficient conditions on $k$ for $\xi$ to have the NEB property.

Our contribution to the solution of Problem $1$ will be twofold. First, we shall prove that the following assumption, introduced by Yakowitz and Szidarovszky (1985), is a sufficient condition for the NEB property:

Assumption 1. The process $\xi$ is second-order stationary and has spectral density $S$, with the property that $S^{-1}$ has at most polynomial growth.

In other words, Assumption 1 means that there exist $C>0$ and $r\in{\mathbb{N}}^{*}$ such that $S(u)(1+|u|^{r})\;\geq\;C$, almost everywhere on ${\mathbb{R}}^{d}$. Note that this is an assumption on $k$, which prevents it from being too regular. In particular, the so-called Gaussian covariance,

k(x,y)=s^{2}\,e^{-\alpha\,\lVert x-y\rVert^{2}},\qquad s>0,\;\alpha>0,

(4)

does not satisfy Assumption 1. In fact, and this is the second part of our contribution, we shall show that $\xi$ with covariance function (4) does not possess the NEB property. Assumption 1 still allows consideration of a large class of covariance functions, which includes the class of (non-Gaussian) exponential covariances

k(x,y)=s^{2}\,e^{-\alpha\,\lVert x-y\rVert^{\beta}},\qquad s>0,\;\alpha>0,\;0<\beta<2\,,

(5)

and the class of Matérn covariances (popularized by Stein, 1999).
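To make Assumption 1 concrete, here is a worked example in dimension $d=1$. We assume the Fourier convention $S(u)=\int_{{\mathbb{R}}}k(h,0)\,e^{-iuh}\,dh$; other conventions only change the multiplicative constants. For the exponential covariance (5) with $\beta=1$,

S(u)\;=\;\frac{2\,\alpha\,s^{2}}{\alpha^{2}+u^{2}}\,,\qquad\text{hence}\qquad S(u)\,(1+|u|^{2})\;\geq\;2\,\alpha\,s^{2}\,\min\left(1,\alpha^{-2}\right)\;>\;0\,,

so that Assumption 1 holds with $r=2$. For the Gaussian covariance (4),

S(u)\;=\;s^{2}\,\sqrt{\pi/\alpha}\;e^{-u^{2}/(4\alpha)}\,,\qquad\text{hence}\qquad\lim_{|u|\to\infty}S(u)\,(1+|u|^{r})\;=\;0\quad\text{for all }r\in{\mathbb{N}}^{*}\,,

so that no constant $C>0$ can satisfy the inequality required by Assumption 1.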

To summarize, the main result of this section is:

Proposition 1

i) If Assumption 1 holds, then $\xi$ has the NEB property.

ii) If $\xi$ has the Gaussian covariance given by (4), then $\xi$ does not possess the NEB property.

The proof of Proposition 1 is given in Section 5. To the best of our knowledge, finding necessary and sufficient conditions for the NEB property—in other words, solving Problem $1$ —is still an open problem.
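Proposition 1 can be illustrated numerically, with the caveat that a finite computation only hints at the asymptotic behaviour. The sketch below assumes $d=1$, $s=\alpha=1$, nested designs filling $[0,1]$ and the prediction point $x=1.5$, which lies outside the closure of the design; all of these choices and the function names are ours. The values of $n$ are kept small because the Gaussian covariance matrix quickly becomes ill-conditioned.

```python
import numpy as np

def make_cov(beta, alpha=1.0):
    # k(x, y) = exp(-alpha |x - y|^beta): beta = 2 is (4), 0 < beta < 2 is (5), s = 1.
    def cov(x, y):
        return np.exp(-alpha * np.abs(np.subtract.outer(x, y)) ** beta)
    return cov

def kriging_variance(x, design, cov):
    K = cov(design, design)
    k_x = cov(design, np.array([x]))[:, 0]
    lam = np.linalg.solve(K, k_x)         # kriging weights
    return 1.0 - lam @ k_x                # k(x, x) = 1 since s = 1

x = 1.5                                   # not adherent to the design in [0, 1]
for n in (2, 3, 5):
    design = np.linspace(0.0, 1.0, n)     # nested designs: {0,1}, {0,.5,1}, ...
    s2_gauss = kriging_variance(x, design, make_cov(beta=2.0))
    s2_exp = kriging_variance(x, design, make_cov(beta=1.0))
    print(n, s2_gauss, s2_exp)
```

With the exponential covariance ($\beta=1$) the variance at $x$ stays at the same strictly positive value for every $n$, in agreement with the NEB property. With the Gaussian covariance it decreases as the design fills $[0,1]$, which is consistent with the second assertion of Proposition 1.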

4 Pointwise consistency for continuous sample paths

An important question related to Problem $2$ is to know whether the set ${\mathcal{G}}$ contains the set $C({\mathbb{R}}^{d})$ of all continuous functions. Yakowitz and Szidarovszky (1985, Lemma 2.1) claim, but fail to establish, the following:

Claim. Let Assumption 1 hold. Assume that $\{x_{n},\,n\geq 1\}$ is bounded, and denote by ${\mathbb{X}}_{0}$ its (compact) closure in ${\mathbb{R}}^{d}$. Then, if $x\in{\mathbb{X}}_{0}$,

\forall f\in C({\mathbb{R}}^{d})\,,\quad\lim_{n\to\infty}\left\langle\,\lambda_{\,n,x},f\,\right\rangle=f(x)\,.

Their incorrect proof has two parts, the first of which is correct; it says in essence that, if $x\in{\mathbb{X}}_{0}$ (i.e., if $x$ is adherent to $\{x_{n},\,n\geq 1\}$ ), then

\forall f\in\mathcal{S}({\mathbb{R}}^{d}),\quad\lim_{n\to\infty}\left\langle\,\lambda_{\,n,x},f\,\right\rangle=f(x)\,,

(6)

where $\mathcal{S}({\mathbb{R}}^{d})$ is the vector space of rapidly decreasing functions. (Recall that $\mathcal{S}({\mathbb{R}}^{d})$ corresponds to those $f\in C^{\infty}({\mathbb{R}}^{d})$ for which $\sup_{\lvert\nu\rvert\leq N}\,\sup_{x\in{\mathbb{R}}^{d}}~(1+\lvert x\rvert^{2})^{N}\lvert(D^{\nu}f)(x)\rvert<\infty$ for $N=0,1,2,\ldots$, where $D^{\nu}$ denotes differentiation of order $\nu$.) In fact, this result stems from the weak convergence result (2), once it has been remarked that $\mathcal{S}({\mathbb{R}}^{d})\subset{\mathcal{H}}$ under Assumption 1. Indeed, under Assumption 1, we have, for all $f\in\mathcal{S}({\mathbb{R}}^{d})$,

\lVert f\rVert_{{\mathcal{H}}}^{2}\;=\;\frac{1}{(2\pi)^{d}}\int_{{\mathbb{R}}^{d}}\left|\tilde{f}(u)\right|^{2}S(u)^{-1}\,du\;\leq\;\frac{1}{C\,(2\pi)^{d}}\int_{{\mathbb{R}}^{d}}\left|\tilde{f}(u)\right|^{2}\,\left(1+|u|^{r}\right)\,du\;<\;+\infty\,,

where $\tilde{f}$ is the Fourier transform of $f$ (see, e.g., Wu and Schaback, 1993).

The second part of the proof of Claim 4 is flawed because the extension of the convergence result from $\mathcal{S}({\mathbb{R}}^{d})$ to $C({\mathbb{R}}^{d})$ , on the ground that $\mathcal{S}({\mathbb{R}}^{d})$ is dense in $C({\mathbb{R}}^{d})$ for the topology of the uniform convergence on compact sets, does not work as claimed by the authors. To get an insight into this, let $f\in C({\mathbb{R}}^{d})$ , and let $(\phi_{k})\in\mathcal{S}({\mathbb{R}}^{d})^{{\mathbb{N}}}$ be a sequence that converges to $f$ uniformly on ${\mathbb{X}}_{0}$ . Then we can write

\begin{aligned}
\left|\left\langle\,\lambda_{\,n,x},f\,\right\rangle-f(x)\right|\;&\leq\;\left|\left\langle\,\lambda_{\,n,x},f-\phi_{k}\,\right\rangle\right|+\left|\left\langle\,\lambda_{\,n,x}-\delta_{x},\phi_{k}\,\right\rangle\right|+\left|\phi_{k}(x)-f(x)\right|\\
&\leq\;\left(1+\left\lVert\lambda_{\,n,x}\right\rVert_{\rm TV}\right)\,\sup_{{\mathbb{X}}_{0}}\left|f-\phi_{k}\right|\,+\,\left|\left\langle\,\lambda_{\,n,x}-\delta_{x},\phi_{k}\,\right\rangle\right|\,,
\end{aligned}

where $\lVert\lambda_{\,n,x}\rVert_{\rm TV}\mathrel{\mathop{:}}=\sum_{i=1}^{n}\lvert\lambda^{i}(x;\underline{x}_{n})\rvert$ is the total variation norm of $\lambda_{\,n,x}$ , also called the Lebesgue constant (at $x$ ) in the literature of approximation theory. If we assume that the Lebesgue constant is bounded by $K>0$ , then we get, using (6),

\limsup_{n\to\infty}\,\left|\left\langle\,\lambda_{\,n,x},f\,\right\rangle-f(x)\right|\;\leq\;\left(1+K\right)\;\sup_{{\mathbb{X}}_{0}}\,\left|f-\phi_{k}\right|\;\xrightarrow[k\to\infty]{}\;0\,.

Conversely, if the Lebesgue constant is not bounded, the Banach-Steinhaus theorem asserts that there exists a dense subset $G$ of $\left(C({\mathbb{R}}^{d}),\left\lVert\bm{\cdot}\right\rVert_{\infty}\right)$ such that, for all $f\in G$ , $\sup_{n\geq 1}\lvert\left\langle\,\lambda_{\,n,x},f\,\right\rangle\rvert=+\infty$ (see, e.g., Rudin, 1987, Section 5.8).

Unfortunately, little is known about Lebesgue constants in the literature of kriging and kernel regression. To the best of our knowledge, whether the Lebesgue constant is bounded remains an open problem—although there is empirical evidence in De Marchi and Schaback (2008) that the Lebesgue constant could be bounded in some cases.

Thus, the best result that we can state for now is a fixed version of Yakowitz and Szidarovszky (1985), Lemma 2.1.

Theorem 4.1

Let Assumption 1 hold. Assume that $\{x_{n},\,n\geq 1\}$ is bounded, and denote by ${\mathbb{X}}_{0}$ its (compact) closure in ${\mathbb{R}}^{d}$ . Then, for all $x\in{\mathbb{X}}_{0}$ , the following assertions are equivalent:

i) $\forall f\in C({\mathbb{R}}^{d})$, $\lim_{n\to\infty}\left\langle\,\lambda_{\,n,x},f\,\right\rangle=f(x)$,

ii) the Lebesgue constant at $x$ is bounded.
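For a given covariance and design, condition ii) can at least be explored numerically. The sketch below evaluates the Lebesgue constant $\lVert\lambda_{\,n,x}\rVert_{\rm TV}=\sum_{i}\lvert\lambda^{i}(x;\underline{x}_{n})\rvert$ on a grid of prediction points. It assumes $d=1$, the covariance $k(x,y)=e^{-\lvert x-y\rvert}$ and equispaced designs on $[0,1]$; these choices and the function names are hypothetical and serve only as an illustration.

```python
import numpy as np

def cov(x, y, alpha=1.0):
    # Assumed illustrative covariance k(x, y) = exp(-alpha |x - y|) on R.
    return np.exp(-alpha * np.abs(np.subtract.outer(x, y)))

def lebesgue_constant(x, design):
    """Total variation norm of lambda_{n,x}, i.e. sum_i |lambda^i(x; x_n)|."""
    K = cov(design, design)
    k_x = cov(design, np.array([x]))[:, 0]
    lam = np.linalg.solve(K, k_x)          # kriging weights
    return np.sum(np.abs(lam))

grid = np.linspace(0.0, 1.0, 201)          # prediction points
for n in (5, 10, 20, 40):
    design = np.linspace(0.0, 1.0, n)      # equispaced design on [0, 1]
    print(n, max(lebesgue_constant(x, design) for x in grid))
```

Such an experiment gives empirical evidence for one covariance and one family of designs only; it does not settle whether the Lebesgue constant is bounded in general.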

5 Proof of Proposition 1

Assume that $x\in{\mathbb{R}}^{d}$ is not adherent to $\{x_{n},\,n\geq 1\}$. Then, there exists a $C^{\infty}({\mathbb{R}}^{d})$ compactly supported function $f$ such that $f(x)\neq 0$ and $f(x_{i})=0$ for all $i\geq 1$. For such a function, the quantity $\left\langle\,\lambda_{\,n,x},f\,\right\rangle$ cannot converge to $f(x)$ since, for all $n\geq 1$,

\left\langle\,\lambda_{\,n,x},f\,\right\rangle\;=\;\sum_{i=1}^{n}\lambda^{i}(x;\underline{x}_{n})\,f(x_{i})\;=\;0\;\neq\;f(x)\,.

Under Assumption 1, $\mathcal{S}({\mathbb{R}}^{d})\subset{\mathcal{H}}$ , as explained in Section 4. Thus, $f\in{\mathcal{H}}$ ; and it follows that $\lambda_{\,n,x}$ cannot converge (weakly, hence strongly) to $\delta_{x}$ in ${\mathcal{H}}^{*}$ . This proves the first assertion of Proposition 1.

In order to prove the second assertion, pick any sequence $(x_{n})_{n\geq 1}$ such that the closure ${\mathbb{X}}_{0}$ of $\{x_{n},\,n\geq 1\}$ has a non-empty interior. We will show that $\sigma^{2}(x;\underline{x}_{n})\to 0$ for all $x\in{\mathbb{R}}^{d}$. Then, choosing $x\not\in{\mathbb{X}}_{0}$ proves the claim.

Recall that $\widehat{\xi}(x;\underline{x}_{n})$ is the orthogonal projection of $\xi(x)$ onto $\operatorname{{\rm span}}\{\xi(x_{i}),i=1,\ldots,n\}$ in $L^{2}\left(\Omega,\mathcal{A},\mathsf{P}\right)$. Using the fact that the mapping $\xi(x)\mapsto k(x,\cdot)$ extends linearly to an isometry (often referred to as Loève’s isometry; see, e.g., Lukic and Beder, 2001) from $\overline{\operatorname{{\rm span}}}\{\xi(y),\,y\in{\mathbb{R}}^{d}\}$ to ${\mathcal{H}}$, we get that

\sigma(x;\underline{x}_{n})\;=\;\big{\lVert}\xi(x)-\widehat{\xi}(x;\underline{x}_{n})\big{\rVert}\;=\;\mathop{d_{\mathcal{H}}}\!\left(k(x,\cdot),\,H_{n}\right)\,,

where $d_{{\mathcal{H}}}$ is the distance in ${\mathcal{H}}$ , and $H_{n}$ is the subspace of ${\mathcal{H}}$ generated by $k(x_{i},\cdot)$ , $i=1,\ldots,n$ . Therefore

\lim_{n\to\infty}\sigma(x;\underline{x}_{n})\;=\;\lim_{n\to\infty}d_{\mathcal{H}}\!\left(k(x,\cdot),\,H_{n}\right)\;=\;d_{\mathcal{H}}\!\left(k(x,\cdot),\,H_{\infty}\right)\,,

where $H_{\infty}=\overline{\cup_{n\geq 1}H_{n}}$. Any function $f\in H_{\infty}^{\perp}$ satisfies $f(x_{i})=\left(f,\,k(x_{i},\cdot)\right)_{{\mathcal{H}}}=0$ for all $i\geq 1$, and therefore vanishes on ${\mathbb{X}}_{0}$, since ${\mathcal{H}}$ is a space of continuous functions. Corollary 3.9 of Steinwart et al. (2006) leads to the conclusion that $f=0$ since ${\mathbb{X}}_{0}$ has a non-empty interior. We have proved that $H_{\infty}^{\perp}=\{0\}$, hence that $H_{\infty}={\mathcal{H}}$ since $H_{\infty}$ is a closed subspace. As a consequence, $\lim_{n\to\infty}\sigma(x;\underline{x}_{n})=d_{\mathcal{H}}\!\left(k(x,\cdot),\,H_{\infty}\right)=0$, which completes the proof. ∎

References

Chilès and Delfiner (1999) J.-P. Chilès and P. Delfiner. Geostatistics: Modeling Spatial Uncertainty. Wiley, New York, 1999.
De Marchi and Schaback (2008) S. De Marchi and R. Schaback. Stability of kernel-based interpolation. Adv. in Comp. Math., 2008. doi: 10.1007/s10444-008-9093-4.
Lukic and Beder (2001) M. N. Lukic and J. H. Beder. Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Amer. Math. Soc., 353(10):3945–3969, 2001.
Rudin (1987) W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, 3rd edition, 1987.
Sacks et al. (1989) J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. Design and analysis of computer experiments. Statist. Sci., 4(4):409–435, 1989.
Santner et al. (2003) T. J. Santner, B. J. Williams, and W. I. Notz. The Design and Analysis of Computer Experiments. Springer, 2003.
Stein (1999) M. L. Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York, 1999.
Steinwart et al. (2006) I. Steinwart, D. Hush, and C. Scovel. An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Transactions on Information Theory, 52(10):4635–4643, 2006.
Vazquez and Bect (2009) E. Vazquez and J. Bect. On the convergence of the expected improvement algorithm. Preprint available on arXiv, http://arxiv.org/abs/0712.3744v2, 2009.
Villemonteix (2008) J. Villemonteix. Optimisation de Fonctions Coûteuses. PhD thesis, Université Paris-Sud XI, Faculté des Sciences d’Orsay, 2008.
Williams (1991) D. Williams. Probability with Martingales. Cambridge University Press, Cambridge, 1991.
Wu and Schaback (1993) Z. Wu and R. Schaback. Local error estimates for radial basis function interpolation of scattered data. IMA J. Numer. Anal., 13:13–27, 1993.
Yakowitz and Szidarovszky (1985) S. J. Yakowitz and F. Szidarovszky. A comparison of kriging with nonparametric regression methods. J. Multivariate Analysis, 16:21–53, 1985.