email: emmanuel.vazquez@supelec.fr and julien.bect@supelec.fr
Pointwise consistency of the kriging predictor with known mean and covariance functions
Abstract.
This paper deals with several issues related to the pointwise consistency of the kriging predictor when the mean and the covariance functions are known. These questions are of general importance in the context of computer experiments. The analysis is based on the properties of approximations in reproducing kernel Hilbert spaces. We fix an erroneous claim of Yakowitz and Szidarovszky (J. Multivariate Analysis, 1985) that the kriging predictor is pointwise consistent for all continuous sample paths under some assumptions.
Keywords:
kriging; reproducing kernel Hilbert space; asymptotics; consistency
1 Introduction
The domain of computer experiments is concerned with making inferences about the output of an expensive-to-run numerical simulation of some physical system, which depends on a vector of factors with values in $\mathbb{X} \subseteq \mathbb{R}^d$. The output of the simulator is formally an unknown function $f : \mathbb{X} \to \mathbb{R}$. For example, to comply with ever-increasing standards regarding pollutant emissions, numerical simulations are used to determine the level of emissions of a combustion engine as a function of its design parameters (Villemonteix, 2008). The emission of pollutants by an engine involves coupled physical phenomena whose numerical simulation by a finite-element method, for a fixed set of design parameters of the engine, can take several hours on high-end servers. It then becomes very helpful to collect the answers already provided by the expensive simulator, and to construct from them a simpler computer model that will provide approximate but cheaper answers about a quantity of interest. This approximate model is often called a surrogate, a metamodel, or an emulator of the actual simulator $f$. The quality of the answers given by the approximate model depends on the quality of the approximation, which depends, in turn and in part, on the choice of the evaluation points of $f$, also called experiments. The choice of the evaluation points is usually called the design of experiments. Assuming that $f$ is continuous, an important question is whether the approximate model behaves consistently, in the sense that if the evaluation points $x_n$ are chosen sequentially in such a way that a given point $x \in \mathbb{X}$ is an accumulation point of $\{x_n, n \ge 1\}$, then the approximation at $x$ converges to $f(x)$.
Since the seminal paper of Sacks et al. (1989), kriging has been one of the most popular methods for building approximations in the context of computer experiments (see, e.g., Santner et al., 2003). In the framework of kriging, the unknown function $f$ is seen as a sample path of a stochastic process $\xi$, which turns the problem of approximation of $f$ into a prediction problem for the process $\xi$. In this paper, we shall assume that the mean and the covariance functions are known. Motivated by the analysis of the expected improvement algorithm (Vazquez and Bect, 2009), a popular kriging-based optimization algorithm, we discuss several issues related to the pointwise consistency of the kriging predictor, that is, the convergence of the kriging predictor to the true value of $\xi$ at a fixed point $x \in \mathbb{X}$. These issues are barely documented in the literature, and we believe them to be of general importance for the asymptotic analysis of sequential design procedures based on kriging.
The paper is organized as follows. Section 2 introduces notation and various formulations of pointwise consistency, using the reproducing kernel Hilbert space (RKHS) attached to $\xi$. Section 3 investigates whether $L^2$-pointwise consistency at $x$ can hold when $x$ is not in the adherence of the set $\{x_n, n \ge 1\}$. Conversely, assuming that $x$ is in the adherence, Section 4 studies the set of sample paths $f = \xi(\omega, \cdot)$ for which pointwise consistency holds. In particular, we fix an erroneous claim of Yakowitz and Szidarovszky (1985), namely that the kriging predictor is pointwise consistent for all continuous sample paths under some assumptions.
2 Several formulations of pointwise consistency
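Let $\xi$ be a second-order process defined on a probability space $(\Omega, \mathcal{A}, \mathsf{P})$, with parameter $x \in \mathbb{X} \subseteq \mathbb{R}^d$. Without loss of generality, it will be assumed that the mean of $\xi$ is zero and that $\mathbb{X} = \mathbb{R}^d$. The covariance function of $\xi$ will be denoted by $k(x, y) := \mathsf{E}[\xi(x)\xi(y)]$, and the following assumption will be used throughout the paper: the covariance function $k$ is continuous. The kriging predictor of $\xi(x)$, based on the observations $\xi(x_i)$, $i = 1, \ldots, n$, is the orthogonal projection
$$\hat\xi(x; \underline{x}_n) := \sum_{i=1}^{n} \lambda^i(x; \underline{x}_n)\, \xi(x_i) \tag{1}$$
of $\xi(x)$ onto $\mathrm{span}\{\xi(x_i),\ i = 1, \ldots, n\}$, where $\underline{x}_n := (x_1, \ldots, x_n)$. The variance of the prediction error, also called the kriging variance in the literature of geostatistics (see, e.g., Chilès and Delfiner, 1999), or the power function in the literature of radial basis functions (see, e.g., Wu and Schaback, 1993), is
$$\sigma^2(x; \underline{x}_n) := \mathrm{var}\big[\xi(x) - \hat\xi(x; \underline{x}_n)\big] = k(x, x) - \sum_{i=1}^{n} \lambda^i(x; \underline{x}_n)\, k(x, x_i).$$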
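For any $x \in \mathbb{R}^d$, and any sample path $f = \xi(\omega, \cdot)$, $\omega \in \Omega$, the values $\xi(\omega, x) = f(x)$ and $\hat\xi(\omega, x; \underline{x}_n)$ can be seen as the result of the application of an evaluation functional to $f$. More precisely, let $\delta_x$ be the Dirac measure at $x$, and let $\lambda_{n,x}$ denote the measure with finite support defined by $\lambda_{n,x} := \sum_{i=1}^{n} \lambda^i(x; \underline{x}_n)\, \delta_{x_i}$. Then, for all $\omega \in \Omega$, $\xi(\omega, x) = \langle \delta_x, f \rangle$ and $\hat\xi(\omega, x; \underline{x}_n) = \langle \lambda_{n,x}, f \rangle$. Pointwise consistency at $x$, defined in Section 1 as the convergence of $\hat\xi(\omega, x; \underline{x}_n)$ to $\xi(\omega, x)$, can thus be seen as the convergence of $\lambda_{n,x}$ to $\delta_x$ in some sense.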
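The predictor and its variance can be computed by solving one linear system in the Gram matrix of the observation points. The following is a minimal sketch, not the authors' code: the helper names `exp_cov` and `kriging_predict`, the one-dimensional exponential covariance, and the test function are all assumptions chosen for illustration.

```python
import numpy as np

def exp_cov(x, y, alpha=1.0):
    """Illustrative covariance k(x, y) = exp(-alpha * |x - y|) on the real line."""
    return np.exp(-alpha * np.abs(np.subtract.outer(x, y)))

def kriging_predict(x, xs, fs, cov=exp_cov):
    """Return (prediction, kriging variance, weights) at the point x.

    xs: observation points x_1..x_n, fs: observed values f(x_1)..f(x_n).
    The weights solve K @ lam = k_x, i.e. they define the orthogonal projection.
    """
    xs = np.asarray(xs, dtype=float)
    fs = np.asarray(fs, dtype=float)
    K = cov(xs, xs)                        # Gram matrix k(x_i, x_j)
    kx = cov(xs, np.array([x]))[:, 0]      # vector k(x_i, x)
    lam = np.linalg.solve(K, kx)           # kriging weights
    pred = float(lam @ fs)                 # <lambda_{n,x}, f>
    kxx = cov(np.array([x]), np.array([x]))[0, 0]
    var = float(kxx - lam @ kx)            # k(x, x) - sum_i lam_i k(x, x_i)
    return pred, var, lam

xs = np.array([0.0, 0.3, 0.7, 1.0])
fs = np.sin(2 * np.pi * xs)                # hypothetical "simulator" output
pred_mid, var_mid, lam_mid = kriging_predict(0.5, xs, fs)
pred_obs, var_obs, lam_obs = kriging_predict(0.3, xs, fs)
print(pred_mid, var_mid)
print(pred_obs, var_obs)
```

At an observation point the predictor interpolates the data and the kriging variance vanishes (up to rounding), while between observation points the variance is strictly positive.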
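Let $\mathcal{H}$ be the RKHS of functions generated by $k$, and $\mathcal{H}^*$ its dual space. Denote by $(\cdot, \cdot)_{\mathcal{H}}$ (resp. $(\cdot, \cdot)_{\mathcal{H}^*}$) the inner product of $\mathcal{H}$ (resp. $\mathcal{H}^*$), and by $\|\cdot\|_{\mathcal{H}}$ (resp. $\|\cdot\|_{\mathcal{H}^*}$) the corresponding norm. It is well-known (see, e.g., Wu and Schaback, 1993) that
$$\|\delta_x - \lambda_{n,x}\|_{\mathcal{H}^*}^2 = \Big\| k(x, \cdot) - \sum_{i=1}^{n} \lambda^i(x; \underline{x}_n)\, k(x_i, \cdot) \Big\|_{\mathcal{H}}^2 = \sigma^2(x; \underline{x}_n).$$
Therefore, the convergence $\lambda_{n,x} \to \delta_x$ holds strongly in $\mathcal{H}^*$ if and only if the kriging predictor is $L^2(\Omega, \mathcal{A}, \mathsf{P})$-consistent at $x$; that is, if $\sigma^2(x; \underline{x}_n)$ converges to zero. Since $k$ is continuous, it is easily seen that $\sigma^2(x; \underline{x}_n) \to 0$ as soon as $x$ is adherent to $\{x_n, n \ge 1\}$. Indeed,
$$\sigma^2(x; \underline{x}_n) \le \mathsf{E}\big[(\xi(x) - \xi(x_{\varphi_n}))^2\big] = k(x, x) + k(x_{\varphi_n}, x_{\varphi_n}) - 2\,k(x, x_{\varphi_n}),$$
with $(\varphi_n)$ a non-decreasing sequence such that $\varphi_n \le n$ for all $n$, and $x_{\varphi_n} \to x$. As explained by Vazquez and Bect (2009), it is sometimes important to work with covariance functions such that the converse holds. That leads to our first open issue, which will be discussed in Section 3: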
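The upper bound used in this argument can be checked numerically. The sketch below is an illustration under stated assumptions (a one-dimensional exponential covariance, the sequence $x_n = x + 1/n$, and the trivial choice $\varphi_n = n$); the helper names are hypothetical.

```python
import numpy as np

def exp_cov(x, y, alpha=1.0):
    """Illustrative covariance k(x, y) = exp(-alpha * |x - y|) on the real line."""
    return np.exp(-alpha * np.abs(np.subtract.outer(x, y)))

def kriging_variance(x, xs, cov=exp_cov):
    """Kriging variance at x given the observation points xs."""
    xs = np.asarray(xs, dtype=float)
    kx = cov(xs, np.array([x]))[:, 0]
    lam = np.linalg.solve(cov(xs, xs), kx)
    return float(cov(np.array([x]), np.array([x]))[0, 0] - lam @ kx)

x = 0.5
design = x + 1.0 / np.arange(1, 21)        # x_n = x + 1/n, so x is adherent to {x_n}

# Kriging variance at x based on the first n points, n = 1..20.
sigma2 = np.array([kriging_variance(x, design[:n]) for n in range(1, 21)])

# Bound k(x,x) + k(x_n,x_n) - 2 k(x,x_n), i.e. the error of predicting xi(x) by xi(x_n).
kxx = exp_cov(np.array([x]), np.array([x]))[0, 0]
knn = np.diag(exp_cov(design, design))
kxn = exp_cov(np.array([x]), design)[0]
bound = kxx + knn - 2.0 * kxn

for n in (1, 5, 10, 20):
    print(n, sigma2[n - 1], bound[n - 1])
```

Both columns decrease towards zero, with the kriging variance never exceeding the bound, as expected since the projection is the best linear predictor.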
Problem 1
Find necessary and sufficient conditions on a continuous covariance $k$ such that $\sigma^2(x; \underline{x}_n) \to 0$ implies that $x$ is adherent to $\{x_n, n \ge 1\}$.
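Moreover, since strong convergence in $\mathcal{H}^*$ implies weak convergence in $\mathcal{H}^*$, we have
$$\lim_{n \to \infty} \sigma^2(x; \underline{x}_n) = 0 \implies \forall f \in \mathcal{H},\ \lim_{n \to \infty} \langle \lambda_{n,x}, f \rangle = \langle \delta_x, f \rangle = f(x). \tag{2}$$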
Therefore, if $x$ is adherent to $\{x_n, n \ge 1\}$, pointwise consistency holds for all sample paths $f \in \mathcal{H}$. However, this result is not satisfying from a Bayesian point of view since $\mathsf{P}\{\xi \in \mathcal{H}\} = 0$ if $\xi$ is Gaussian (see, e.g., Lukic and Beder, 2001, Driscoll's theorem). In other words, modeling $f$ as a Gaussian process means that $f$ cannot be expected to belong to $\mathcal{H}$. This leads to our second problem:
Problem 2
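For a given covariance function $k$, describe the set $\mathcal{G}$ of functions such that, for all sequences $(x_n)_{n \ge 1}$ in $\mathbb{X}$ and all $x \in \mathbb{X}$,
$$\lim_{n \to \infty} \sigma^2(x; \underline{x}_n) = 0 \implies \forall f \in \mathcal{G},\ \lim_{n \to \infty} \langle \lambda_{n,x}, f \rangle = f(x). \tag{3}$$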
An important question related to this problem, to be discussed in Section 4, is whether the set $\mathcal{G}$ contains the set $C(\mathbb{R}^d)$ of all continuous functions. Before proceeding, we can already establish a result which ensures that considering the kriging predictor is relevant from a Bayesian point of view.
Theorem 2.1
If $\xi$ is Gaussian, then $\{\xi \notin \mathcal{G}\}$ is $\mathsf{P}$-negligible.
Proof
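If $\xi$ is Gaussian, it is well-known that $\hat\xi(x; \underline{x}_n) = \mathsf{E}[\xi(x) \mid \mathcal{F}_n]$ a.s., where $\mathcal{F}_n$ denotes the $\sigma$-algebra generated by $\xi(x_1)$, …, $\xi(x_n)$. Note that $(\mathsf{E}[\xi(x) \mid \mathcal{F}_n])_{n \ge 1}$ is an $L^2$-bounded martingale sequence and therefore converges, a.s. and in $L^2$-norm, to a random variable $\xi_\infty$ (see, e.g., Williams, 1991). If moreover $\sigma^2(x; \underline{x}_n) \to 0$, the $L^2$ limit is $\xi(x)$, so that $\xi_\infty = \xi(x)$ a.s.∎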
3 Pointwise consistency in $L^2$-norm and the No-Empty-Ball property
The following definition has been introduced by Vazquez and Bect (2009):
Definition 1
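A random process $\xi$ has the No-Empty-Ball (NEB) property if, for all sequences $(x_n)_{n \ge 1}$ in $\mathbb{X}$ and all $x \in \mathbb{X}$, the following assertions are equivalent:
i) $x$ is an adherent point of the set $\{x_n, n \ge 1\}$,
ii) $\sigma^2(x; \underline{x}_n) \to 0$ when $n \to +\infty$.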
The NEB property implies that there can be no empty ball centered at $x$ if the prediction error at $x$ converges to zero; hence the name. Since $k$ is continuous, the implication 1.i $\Rightarrow$ 1.ii is true. Therefore, Problem 1 amounts to finding necessary and sufficient conditions on $k$ for $\xi$ to have the NEB property.
Our contribution to the solution of Problem 1 will be twofold. First, we shall prove that the following assumption, introduced by Yakowitz and Szidarovszky (1985), is a sufficient condition for the NEB property:
Assumption 1
The process $\xi$ is second-order stationary and has spectral density $S$, with the property that $S^{-1}$ has at most polynomial growth.
In other words, Assumption 1 means that there exist $C > 0$ and $r \in \mathbb{N}^*$ such that $S(u)\,(1 + |u|^r) \ge C$, almost everywhere on $\mathbb{R}^d$. Note that this is an assumption on $k$, which prevents it from being too regular. In particular, the so-called Gaussian covariance,
$$k(x, y) = s^2 \exp\big(-\alpha \|x - y\|^2\big), \quad s > 0,\ \alpha > 0, \tag{4}$$
does not satisfy Assumption 1. In fact, and this is the second part of our contribution, we shall show that $\xi$ with covariance function (4) does not possess the NEB property. Assumption 1 still allows consideration of a large class of covariance functions, which includes the class of (non-Gaussian) exponential covariances
$$k(x, y) = s^2 \exp\big(-\alpha \|x - y\|^\beta\big), \quad s > 0,\ \alpha > 0,\ 0 < \beta < 2, \tag{5}$$
and the class of Matérn covariances (popularized by Stein, 1999).
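The contrast between the two families of covariances can be observed numerically on a small example. The sketch below is only a finite-sample illustration, not a proof: the design (equally spaced points in $[0, 1]$), the prediction point $x = 1.5$ outside the design region, the parameters $s = \alpha = 1$, $\beta = 1$, and the helper names are all assumptions. The Gram matrix of the Gaussian covariance becomes severely ill-conditioned as points are added, so the design is kept small.

```python
import numpy as np

def gaussian_cov(x, y, alpha=1.0):
    """Gaussian covariance exp(-alpha * |x - y|^2) in dimension one."""
    return np.exp(-alpha * np.subtract.outer(x, y) ** 2)

def exponential_cov(x, y, alpha=1.0):
    """Exponential covariance exp(-alpha * |x - y|) (the case beta = 1)."""
    return np.exp(-alpha * np.abs(np.subtract.outer(x, y)))

def kriging_variance(x, xs, cov):
    """Kriging variance at x given the observation points xs."""
    xs = np.asarray(xs, dtype=float)
    kx = cov(xs, np.array([x]))[:, 0]
    lam = np.linalg.solve(cov(xs, xs), kx)
    return float(cov(np.array([x]), np.array([x]))[0, 0] - lam @ kx)

x_out = 1.5                 # not adherent to any design contained in [0, 1]
sizes = [2, 4, 8]           # number of equally spaced design points in [0, 1]

var_gauss = [kriging_variance(x_out, np.linspace(0.0, 1.0, n), gaussian_cov)
             for n in sizes]
var_exp = [kriging_variance(x_out, np.linspace(0.0, 1.0, n), exponential_cov)
           for n in sizes]

for n, vg, ve in zip(sizes, var_gauss, var_exp):
    print(n, vg, ve)
```

With the Gaussian covariance the variance at $x = 1.5$ shrinks rapidly as the design in $[0, 1]$ is refined, whereas with the exponential covariance it stays at the value determined by the nearest design point.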
To summarize, the main result of this section is:
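Proposition 1
i) If Assumption 1 holds, then $\xi$ has the NEB property.
ii) If $\xi$ has the Gaussian covariance given by (4), then $\xi$ does not possess the NEB property.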
4 Pointwise consistency for continuous sample paths
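An important question related to Problem 2 is whether the set $\mathcal{G}$ contains the set $C(\mathbb{R}^d)$ of all continuous functions. Yakowitz and Szidarovszky (1985, Lemma 2.1) claim, but fail to establish, the following:
Claim
Let Assumption 1 hold. Assume that $\{x_n, n \ge 1\}$ is bounded, and denote by $\mathbb{X}_0$ its (compact) closure in $\mathbb{R}^d$. Then, if $x \in \mathbb{X}_0$,
$$\forall f \in C(\mathbb{X}_0), \quad \lim_{n \to \infty} \langle \lambda_{n,x}, f \rangle = f(x).$$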
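Their incorrect proof has two parts, the first of which is correct; it says in essence that, if $x \in \mathbb{X}_0$ (i.e., if $x$ is adherent to $\{x_n, n \ge 1\}$), then
$$\forall f \in \mathcal{S}(\mathbb{R}^d), \quad \lim_{n \to \infty} \langle \lambda_{n,x}, f \rangle = f(x), \tag{6}$$
where $\mathcal{S}(\mathbb{R}^d)$ is the vector space of rapidly decreasing functions. (Recall that $\mathcal{S}(\mathbb{R}^d)$ corresponds to those $f \in C^\infty(\mathbb{R}^d)$ for which $\sup_{|\nu| \le N} \sup_{x \in \mathbb{R}^d} (1 + |x|^2)^N\, |(D^\nu f)(x)| < +\infty$ for $N = 0, 1, 2, \ldots$, where $D^\nu$ denotes differentiation of order $\nu$.) In fact, this result stems from the weak convergence result (2), once it has been remarked that $\mathcal{S}(\mathbb{R}^d) \subset \mathcal{H}$ under Assumption 1. (Indeed, under Assumption 1, we have, for all $f \in \mathcal{S}(\mathbb{R}^d)$, $\|f\|_{\mathcal{H}}^2 = c \int_{\mathbb{R}^d} |\tilde f(u)|^2\, S(u)^{-1}\, \mathrm{d}u < +\infty$, where $\tilde f$ is the Fourier transform of $f$ and $c > 0$ is a constant that depends only on the normalization of the Fourier transform; see, e.g., Wu and Schaback, 1993.)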
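The second part of the proof of the Claim is flawed because the extension of the convergence result from $\mathcal{S}(\mathbb{R}^d)$ to $C(\mathbb{X}_0)$, on the ground that $\mathcal{S}(\mathbb{R}^d)$ is dense in $C(\mathbb{X}_0)$ for the topology of the uniform convergence on compact sets, does not work as claimed by the authors. To get an insight into this, let $f \in C(\mathbb{X}_0)$, and let $(\phi_k)_{k \ge 1}$ be a sequence in $\mathcal{S}(\mathbb{R}^d)$ that converges to $f$ uniformly on $\mathbb{X}_0$. Then we can write
$$|\langle \lambda_{n,x}, f \rangle - f(x)| \le |\langle \lambda_{n,x}, f - \phi_k \rangle| + |\langle \lambda_{n,x} - \delta_x, \phi_k \rangle| + |\phi_k(x) - f(x)| \le \big(1 + \|\lambda_{n,x}\|_{TV}\big) \sup_{\mathbb{X}_0} |f - \phi_k| + |\langle \lambda_{n,x} - \delta_x, \phi_k \rangle|,$$
where $\|\lambda_{n,x}\|_{TV} := \sum_{i=1}^{n} |\lambda^i(x; \underline{x}_n)|$ is the total variation norm of $\lambda_{n,x}$, also called the Lebesgue constant (at $x$) in the literature of approximation theory. If we assume that the Lebesgue constant is bounded by $K > 0$, then we get, using (6),
$$\limsup_{n \to \infty} |\langle \lambda_{n,x}, f \rangle - f(x)| \le (1 + K) \sup_{\mathbb{X}_0} |f - \phi_k| \xrightarrow[k \to \infty]{} 0.$$
Conversely, if the Lebesgue constant is not bounded, the Banach-Steinhaus theorem asserts that there exists a dense subset $G$ of $(C(\mathbb{X}_0), \|\cdot\|_\infty)$ such that, for all $f \in G$, $\sup_{n \ge 1} |\langle \lambda_{n,x}, f \rangle| = +\infty$ (see, e.g., Rudin, 1987, Section 5.8).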
Unfortunately, little is known about Lebesgue constants in the literature of kriging and kernel regression. To the best of our knowledge, whether the Lebesgue constant is bounded remains an open problem—although there is empirical evidence in De Marchi and Schaback (2008) that the Lebesgue constant could be bounded in some cases.
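For a given design and covariance, the Lebesgue constant at a point is simply the sum of the absolute values of the kriging weights, so it is easy to monitor numerically. The sketch below assumes a one-dimensional exponential covariance and equally spaced designs; the helper names are hypothetical and the experiment says nothing about other kernels or dimensions.

```python
import numpy as np

def exp_cov(x, y, alpha=1.0):
    """Illustrative covariance k(x, y) = exp(-alpha * |x - y|) on the real line."""
    return np.exp(-alpha * np.abs(np.subtract.outer(x, y)))

def lebesgue_constant(x, xs, cov=exp_cov):
    """Total variation norm of lambda_{n,x}: the sum of |kriging weights| at x."""
    xs = np.asarray(xs, dtype=float)
    kx = cov(xs, np.array([x]))[:, 0]
    lam = np.linalg.solve(cov(xs, xs), kx)
    return float(np.sum(np.abs(lam)))

x = 0.45
sizes = [3, 5, 9, 17, 33]
leb = [lebesgue_constant(x, np.linspace(0.0, 1.0, n)) for n in sizes]
for n, value in zip(sizes, leb):
    print(n, value)

# At an observation point the weights reduce to a unit vector.
leb_at_node = lebesgue_constant(0.5, np.linspace(0.0, 1.0, 5))
print(leb_at_node)
```

In this particular one-dimensional example the computed values stay below one as the design is refined; whether a comparable bound holds for general covariances and designs is precisely the open question discussed above.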
Thus, the best result that we can state for now is a corrected version of Lemma 2.1 of Yakowitz and Szidarovszky (1985).
Theorem 4.1
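Let Assumption 1 hold. Assume that $\{x_n, n \ge 1\}$ is bounded, and denote by $\mathbb{X}_0$ its (compact) closure in $\mathbb{R}^d$. Then, for all $x \in \mathbb{X}_0$, the following assertions are equivalent:
i) $\forall f \in C(\mathbb{X}_0)$, $\lim_{n \to \infty} \langle \lambda_{n,x}, f \rangle = f(x)$,
ii) the Lebesgue constant at $x$ is bounded.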
5 Proof of Proposition 1
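Assume that $x \in \mathbb{R}^d$ is not adherent to $\{x_n, n \ge 1\}$. Then, there exists a $C^\infty$ compactly supported function $f$ such that $f(x) \ne 0$ and $f(x_i) = 0$, $\forall i \ge 1$. For such a function, the quantity $\langle \lambda_{n,x}, f \rangle$ cannot converge to $\langle \delta_x, f \rangle$ since
$$\langle \lambda_{n,x}, f \rangle = \sum_{i=1}^{n} \lambda^i(x; \underline{x}_n)\, f(x_i) = 0 \ne f(x) = \langle \delta_x, f \rangle.$$
Under Assumption 1, $\mathcal{S}(\mathbb{R}^d) \subset \mathcal{H}$, as explained in Section 4. Thus, $f \in \mathcal{H}$; and it follows that $\lambda_{n,x}$ cannot converge (weakly, hence strongly) to $\delta_x$ in $\mathcal{H}^*$. This proves the first assertion of Proposition 1.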
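In order to prove the second assertion, pick any sequence $(x_n)_{n \ge 1}$ such that the closure $\mathbb{X}_0$ of $\{x_n, n \ge 1\}$ has a non-empty interior. We will show that $\sigma^2(x; \underline{x}_n) \to 0$ for all $x \in \mathbb{R}^d$. Then, choosing $x \notin \mathbb{X}_0$ proves the claim.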
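Recall that $\hat\xi(x; \underline{x}_n)$ is the orthogonal projection of $\xi(x)$ onto $\mathrm{span}\{\xi(x_i),\ i = 1, \ldots, n\}$ in $L^2(\Omega, \mathcal{A}, \mathsf{P})$. Using the fact that the mapping $\xi(x) \mapsto k(x, \cdot)$ extends linearly to an isometry (often referred to as Loève's isometry; see, e.g., Lukic and Beder, 2001) from the closed linear span of $\{\xi(y),\ y \in \mathbb{R}^d\}$ in $L^2(\Omega, \mathcal{A}, \mathsf{P})$ to $\mathcal{H}$, we get that
$$\sigma(x; \underline{x}_n) = \|\xi(x) - \hat\xi(x; \underline{x}_n)\|_{L^2} = d_{\mathcal{H}}\big(k(x, \cdot), \mathcal{H}_n\big),$$
where $d_{\mathcal{H}}$ is the distance in $\mathcal{H}$, and $\mathcal{H}_n$ is the subspace of $\mathcal{H}$ generated by $k(x_i, \cdot)$, $1 \le i \le n$. Therefore
$$\lim_{n \to \infty} \sigma(x; \underline{x}_n) = d_{\mathcal{H}}\big(k(x, \cdot), \mathcal{H}_\infty\big),$$
where $\mathcal{H}_\infty := \overline{\bigcup_{n \ge 1} \mathcal{H}_n}$. Any function $f \in \mathcal{H}_\infty^{\perp}$ satisfies $f(x_i) = (f, k(x_i, \cdot))_{\mathcal{H}} = 0$ for all $i \ge 1$ and therefore vanishes on $\mathbb{X}_0$, since $\mathcal{H}$ is a space of continuous functions. Corollary 3.9 of Steinwart et al. (2006) leads to the conclusion that $f = 0$ since $\mathbb{X}_0$ has a non-empty interior. We have proved that $\mathcal{H}_\infty^{\perp} = \{0\}$, hence that $\mathcal{H}_\infty = \mathcal{H}$ since $\mathcal{H}_\infty$ is a closed subspace. As a consequence, $\lim_{n \to \infty} \sigma(x; \underline{x}_n) = d_{\mathcal{H}}\big(k(x, \cdot), \mathcal{H}\big) = 0$, which completes the proof. ∎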
References
- Chilès and Delfiner (1999) J.-P. Chilès and P. Delfiner. Geostatistics: Modeling Spatial Uncertainty. Wiley, New York, 1999.
- De Marchi and Schaback (2008) S. De Marchi and R. Schaback. Stability of kernel-based interpolation. Adv. in Comp. Math., 2008. doi: 10.1007/s10444-008-9093-4.
- Lukic and Beder (2001) M. N. Lukic and J. H. Beder. Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Amer. Math. Soc., 353(10):3945–3969, 2001.
- Rudin (1987) W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, 3rd edition, 1987.
- Sacks et al. (1989) J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. Design and analysis of computer experiments. Statist. Sci., 4(4):409–435, 1989.
- Santner et al. (2003) T. J. Santner, B. J. Williams, and W. I. Notz. The Design and Analysis of Computer Experiments. Springer, 2003.
- Stein (1999) M. L. Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York, 1999.
- Steinwart et al. (2006) I. Steinwart, D. Hush, and C. Scovel. An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Transactions on Information Theory, 52(10):4635–4643, 2006.
- Vazquez and Bect (2009) E. Vazquez and J. Bect. On the convergence of the expected improvement algorithm. Preprint available on arXiv, http://arxiv.org/abs/0712.3744v2, 2009.
- Villemonteix (2008) J. Villemonteix. Optimisation de Fonctions Coûteuses. PhD thesis, Université Paris-Sud XI, Faculté des Sciences d’Orsay, 2008.
- Williams (1991) D. Williams. Probability with Martingales. Cambridge University Press, Cambridge, 1991.
- Wu and Schaback (1993) Z. Wu and R. Schaback. Local error estimates for radial basis function interpolation of scattered data. IMA J. Numer. Anal., 13:13–27, 1993.
- Yakowitz and Szidarovszky (1985) S. J. Yakowitz and F. Szidarovszky. A comparison of kriging with nonparametric regression methods. J. Multivariate Analysis, 16:21–53, 1985.