Asymptotic analysis of ML-covariance parameter estimators based on covariance approximations
Abstract
Given a zero-mean Gaussian random field with a covariance function that belongs to a parametric family of covariance functions, we introduce a new notion of likelihood approximations, termed truncated-likelihood functions. Truncated-likelihood functions are based on direct functional approximations of the presumed family of covariance functions. For compactly supported covariance functions, within an increasing-domain asymptotic framework, we provide sufficient conditions under which consistency and asymptotic normality of estimators based on truncated-likelihood functions are preserved. We apply our result to the family of generalized Wendland covariance functions and discuss several examples of Wendland approximations. For families of covariance functions that are not compactly supported, we combine our results with the covariance tapering approach and show that ML estimators, based on truncated-tapered likelihood functions, asymptotically minimize the Kullback-Leibler divergence, when the taper range is fixed.
Keywords: Gaussian random fields, compactly supported covariance functions, likelihood approximations, consistency, asymptotic normality, covariance tapering.
1 Introduction
1.1 On infill- and increasing-domain asymptotics
Maximum likelihood (ML) estimators for covariance parameters are highly popular in inference for random fields. To derive asymptotic properties of such estimators, one needs to specify how the observation points and the associated sampling domain behave as the number of observation points increases. Two well-studied asymptotic frameworks are referred to as infill-domain asymptotics (also termed fixed-domain asymptotics) and increasing-domain asymptotics (see [13], p. for an introduction of terms). In infill-domain asymptotics, observation points are sampled within a bounded sampling domain, whereas in increasing-domain asymptotics, the sampling domain grows as the number of observation points increases. When referring to infill- and increasing-domain asymptotics, one often places additional assumptions on the minimum distance between any two distinct observation points. In increasing-domain asymptotics, the latter distance is often assumed to be bounded away from zero, while in infill-domain asymptotics, one frequently assumes that distinct observation points can be sampled arbitrarily close to each other (see for example [37]). There is a fair amount of literature demonstrating that asymptotic properties of ML estimators for covariance parameters can be quite different under the two asymptotic frameworks (see [37] or, more recently, [6]). For example, it is known that some covariance parameters cannot be estimated consistently under an infill-domain asymptotic framework ([34], [36]), whereas they can be estimated consistently, under given regularity conditions, within an increasing-domain asymptotic framework ([25], [4]). It is worth noting that in infill-domain asymptotics, these results can depend on the dimension of the Euclidean space , where the random field is assumed to be observed. For example, when the true covariance function belongs to the Matérn family ([26]) and the smoothness parameters are given, it is shown in [36] that for , the scale and variance parameters cannot be estimated consistently via an ML approach in an infill-domain asymptotic framework. The case where is still open, but for , it is shown in [2] that under infill-domain asymptotics, all covariance parameters of the Matérn family can be estimated consistently using an ML approach.
1.2 Compactly supported covariance functions
In recent years, dataset sizes have steadily increased, so that statistical analyses of random fields can become quite expensive in terms of computational resources (see for example [15] for a recent discussion). One prominent issue with large datasets is the large size of covariance matrices, constructed upon applying an underlying covariance function to given data. However, in certain fields of application, observed correlations are assumed to vanish beyond a certain cut-off distance (see [18] p. , and references therein, for an example in meteorology, or also [10] and [19]). On the other hand, in the context of real valued random fields, it is common practice to multiply a presumed covariance function with a known positive-definite and compactly supported covariance function, called the covariance taper. The resulting compactly supported covariance function is referred to as the tapered covariance function. For an introduction to covariance tapering we refer to [17]. The use of compactly supported covariance functions can thus be of great importance in some fields of application. Not only do they potentially reflect the nature of the underlying covariance structure, but their application can also lead to sparse covariance matrices. The latter help to reduce the high computational costs associated with large datasets. An excellent introduction to the construction of compactly supported covariance functions, associated with stationary and isotropic Gaussian random fields, is given in [21]. Additional results are available in [35], [28] and [11].
1.3 Motivation
The parametric family of generalized Wendland covariance functions represents one example of a family of compactly supported covariance functions which allows, similar to the Matérn family, for a continuous parametrization of the smoothness (in the mean square sense) of the underlying random field. It originates with Wendland ([32]), and an early adaptation for statistical applications was given by Gneiting ([20]). In its general form (see [21] and [28] for special cases), the generalized Wendland covariance function with smoothness parameters and , variance parameter and range parameter is given by
$$\varphi(r) \;=\; \frac{\sigma^2}{B(2\kappa,\mu+1)} \int_{r/\beta}^{1} u\,\bigl(u^2-(r/\beta)^2\bigr)^{\kappa-1}(1-u)^{\mu}\,\mathrm{d}u \qquad (1)$$
if $0 \le r < \beta$, and zero otherwise; here $\kappa$ and $\mu$ denote the smoothness parameters, $\sigma^2$ the variance parameter and $\beta$ the range parameter. In the above display, $B(\cdot,\cdot)$ is the beta function. For technical details about valid parameter values, we refer to [9] or Section 6 of the present article. Clearly, in comparison with closed-form covariance functions, computing (1) is cumbersome, as it involves numerical integration. Depending on the support and a set of locations , the covariance matrix requires at most calculations of (1). One strategy, which facilitates computing , is to reduce the number of times (1) must be calculated. As an illustration, we give three examples which involve approximations , , of (respectively approximations of ):
- Truncation of the support
- Linear interpolation
- Addition of a nugget effect
For the first example, we truncate to obtain , which has a smaller support than . This becomes especially interesting when the original function tails off slowly (high degree of differentiability at the origin). As a result, will be sparser than . The second example is to predefine the numbers at which (1) is calculated. This is achieved by introducing a partition of the support of . Then, results in calculations of . This defines a closed-form approximation of . Notice that do not need to be equispaced. Finally, can be interpreted as a tuning option for a given approximation of :
With regard to practical usage, this form of approximation increases numerical stability. Further, it allows for more flexibility in practice, where the number of observations is given and the covariance matrix based on might not be positive-definite. The sketch below illustrates all three strategies.
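To make the computational burden concrete, the following Python sketch evaluates a generalized Wendland correlation by numerical quadrature and implements the three approximation strategies above. It assumes a standard parametrization of (1) (smoothness parameters kappa and mu, range beta, variance sigma2); the parameter values, function names and the choice of quadrature routine are illustrative and not part of the article's formal setup.

```python
import numpy as np
from scipy import integrate
from scipy.special import beta as beta_fn

def gen_wendland(r, kappa=1.0, mu=2.5, beta=1.0, sigma2=1.0):
    """Generalized Wendland covariance evaluated by numerical quadrature (assumed form of (1))."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    out = np.zeros_like(r)
    for i, ri in enumerate(r):
        t = ri / beta
        if t < 1.0:
            val, _ = integrate.quad(
                lambda u: u * (u ** 2 - t ** 2) ** (kappa - 1.0) * (1.0 - u) ** mu, t, 1.0)
            out[i] = sigma2 * val / beta_fn(2.0 * kappa, mu + 1.0)
    return out

# (a) truncation of the support: set values beyond a radius t_n < beta to zero
def truncated(r, t_n=0.8, **kw):
    return np.where(np.asarray(r, dtype=float) <= t_n, gen_wendland(r, **kw), 0.0)

# (b) linear interpolation: evaluate (1) only at m + 1 (not necessarily equispaced) knots
def interpolated(r, m=20, beta=1.0, **kw):
    knots = np.linspace(0.0, beta, m + 1)
    return np.interp(np.asarray(r, dtype=float), knots, gen_wendland(knots, beta=beta, **kw))

# (c) nugget effect: add a small constant a_n at distance zero (diagonal entries) only
def with_nugget(r, a_n=1e-6, **kw):
    r = np.asarray(r, dtype=float)
    return gen_wendland(r, **kw) + a_n * (r == 0.0)

r = np.linspace(0.0, 1.2, 7)
print(np.round(gen_wendland(r), 4))
print(np.round(interpolated(r), 4))
```

For instance, filling an n-by-n covariance matrix directly requires on the order of n squared quadrature calls, whereas the interpolation variant needs only m + 1 of them.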
Building on the above examples, consider an approximation of (respectively an approximation of ). Several questions arise:
- What are conditions on to ensure that is asymptotically (as ) equivalent to and eventually (for large enough) remains positive-definite?
- In terms of ML estimators for covariance parameters, how shall a log-likelihood approximation based on be defined?
- Under which conditions on are ML estimators based on consistent and asymptotically normal?
In the more general setting of a given parametric family of covariance functions, the present study gives a concrete context in which these questions are answered, by introducing the notion of truncated-ML estimators.
1.4 Framework and contribution
Truncated-ML estimators for covariance parameters are based on truncated-likelihood functions. The latter are defined upon parametric families of sequences of functions, which approximate a presumed family of covariance functions on a common domain. Colloquially, we will call these parametric sequences of functions covariance approximations. The respective matrices, constructed upon applying covariance approximations to a given collection of observation points, will be termed covariance matrix approximations. We will allow for covariance matrix approximations that are not necessarily positive semi-definite. Therefore, truncated-likelihood functions are more general than existing likelihood approximation methods such as low-rank, Vecchia, or covariance tapering approaches (see [22] for a summary of commonly used methods).
We work in an increasing-domain asymptotic framework, where collections of observation points are realizations of finite collections of a randomly perturbed regular grid (see also [4]). We consider a stationary Gaussian random field, with a zero-mean function and a true unknown covariance function that belongs to a given parametric family of covariance functions. If the presumed family of covariance functions is compactly supported, we provide sufficient conditions under which truncated-ML estimators and (regular) ML estimators for covariance parameters are consistent and asymptotically normal. Some conditions imposed on families of covariance functions are identical to conditions already considered in [4]. The main difference is that we work with compactly supported covariance functions. Therefore, it is possible to simplify some of the conditions that were set up in [4]. In terms of statistical applications, we apply these results to the family of generalized Wendland covariance functions. In contrast to the infill-domain asymptotic framework considered in [9], we show that under the increasing-domain asymptotic framework studied here, and under some conditions on the parameter space, (regular) ML estimators for variance and range parameters are consistent and asymptotically normal. Further, we show that the same asymptotic results are recovered for truncated-ML estimators, based on various generalized Wendland approximations, such as truncations, linear interpolations and added nugget effects.
Additionally, we provide an extension to families of covariance functions which are not compactly supported. We combine our results with the covariance tapering approach. That is, we study covariance taper approximations and their asymptotic influence on the conditional Kullback-Leibler divergence of the misspecified distribution from the true distribution (see also [5]). We show that the latter divergence is minimized by truncated-tapered ML estimators.
1.5 Structure of the article
The rest of the article is organized as follows. Section 2 establishes the context. We introduce some primary notation, define the sampling domain and the random field itself. In Section 3 we introduce regularity conditions on covariance functions and approximations. In Section 4 we present intermediate asymptotic results on covariance matrices and approximations. Section 5 contains our main results: We introduce truncated-ML estimators and present results on consistency and asymptotic normality. In Section 6, we apply our results to the family of generalized Wendland covariance functions and discuss several examples of generalized Wendland approximations. Then, in the context of non-compactly supported covariance functions, Section 7 contains results on the asymptotic influence of taper approximations on the Kullback-Leibler divergence. Section 8 gives an outlook and some final comments. The Appendix is split into three parts. Covariance approximations for isotropic random fields are discussed in Appendix A. Appendix B contains additional supporting results, whereas all the proofs are left for Appendix C.
2 Context
2.1 Primary notation
The sets and represent the positive integers and the non-negative real numbers, respectively. For , we use the notation () for the open (closed) ball of radius with center . Given , for some set , we write for the Borel -algebra on .
For a vector , we write for the Euclidean norm of on . In the case of we use the notation for the Euclidean norm. For two vectors , represents the inner product that induces on . Given , we write for the space of real valued, uniformly bounded functions on , having compact support . If and is also continuous, we use the notation instead of . For we write for the uniform norm on . For vectors , denotes the uniform norm on .
For a real matrix , denotes the spectral norm of . We write () to indicate that is positive-definite (negative-definite). Further, denote the real eigenvalues of a matrix , where represents the space of real symmetric matrices.
We use the notation for the gradient of at , where is any differentiable, real valued function, defined on some . Further, for a vector valued, differentiable function , with values in , defined on some , we write , , , for the Jacobi-matrix of at .
A mapping from a probability space to a measure space will be called a random element if it is measurable. If we write that is measurable, we mean that it is measurable. If denotes a sequence of random elements, where for any , is a mapping from a probability space to a measure space , we use the notation
to indicate convergence of to a random element in probability and in distribution, respectively. Note that for convergence in distribution, the introduced notation indicates that the limit has law on . A sequence of estimators for will be referred to as consistent if it converges in probability to . Finally, indicates a multivariate normal distribution with mean vector and covariance matrix .
2.2 Random sampling scheme
On a probability space , we consider a real valued Gaussian random function , which has sample functions on . We assume that is stationary (homogeneous) with zero-mean function and covariance function , , where , with , compact and convex. Thus, we consider a real valued random field , which has true and unknown covariance function that belongs to a family of covariance functions .
Let and be a stochastic process, defined on the same probability space , but independent of . We assume that the sequence is a sequence of independent random vectors with common law on , which has a strictly positive probability density function on (see also Remark 2.1). Given and a sequence of deterministic points , with , we define a randomly perturbed regular grid as the process
(2) |
where we assume that for all , . Therefore, for any , is a sequence on , with image and any first coordinates are in (see also Figure 1). Unless mentioned otherwise, the parameter and the sequence shall be fixed. Let and denote finite collections of and , respectively. We use the notation for a vector that contains the first entries of a given sequence in . Correspondingly, given , and , we write , , for perturbed grid locations in .
On , we define the random vector
(3) |
which denotes observed at a finite collection of . The situation, where a Gaussian random field is assumed to be observed at a randomly perturbed regular grid, with parameter and deterministic points , as introduced above, is also considered in [4].
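As a concrete illustration of such a sampling scheme (with grid, perturbation law and perturbation parameter chosen purely for the example, not taken from the article), the following sketch generates a perturbed regular grid in dimension two and verifies the minimum-spacing property that characterizes the increasing-domain framework.

```python
import numpy as np

def perturbed_grid(n, d=2, eps=0.4, seed=0):
    """First n points of an illustrative randomly perturbed regular grid in dimension d.

    Deterministic grid points enumerate a block of the integer lattice; each observation
    point is grid point + eps * u_i with u_i i.i.d. uniform on (-1/2, 1/2)^d.  For eps < 1,
    any two distinct points are at Euclidean distance at least 1 - eps > 0.
    """
    rng = np.random.default_rng(seed)
    side = int(np.ceil(n ** (1.0 / d)))
    grid = np.stack(np.meshgrid(*[np.arange(side)] * d, indexing="ij"), axis=-1)
    grid = grid.reshape(-1, d)[:n].astype(float)
    return grid + eps * rng.uniform(-0.5, 0.5, size=(n, d))

x = perturbed_grid(25)
dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
print("minimum spacing:", dists[dists > 0].min())   # bounded away from zero for all n
```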
[Figure 1]
Given , we let denote the non-random covariance matrix based on an arbitrary . On , we write
for the random covariance matrix based on a finite collection of .
Remark 2.1.
Some technical remarks are worth pointing out. We assume that the random function , , is measurable as a function from the measure space to . That is to say that is (jointly) measurable. This condition makes sure that the components , , of (3) are measurable as the composition of the measurable functions and . Thus, the random vector is well defined. Since and are independent, it is readily seen that the conditional distribution of given is Gaussian, with characteristic function , . In addition, we note that for fixed , is not bounded and if we define , we are given some fixed , which is independent of and , such that
(4) |
Hence, we are in an increasing-domain asymptotic framework where the minimum distance between any two distinct observation points is bounded away from zero. The assumption that for any given , has a strictly positive probability density function on , is purely technical (see also the proof of Theorem 5.2). As can be seen from the mentioned proof, if , the assumption becomes redundant.
3 Regularity conditions on covariance functions and covariance approximations
3.1 Regularity conditions on the family of covariance functions
Assumption 3.1 (Regularity conditions on ).
(1) There exist real constants , , which are independent of , such that , with and .
(2) For any , the first, second and third order partial derivatives of exist. In addition, for any , , , where there exist constants , which are independent of , such that and .
(3) Fourier inversion holds, that is, for any
with continuous and strictly positive.
Remark 3.1.
Note that (1) and (2) of Assumption 3.1 differ from the conditions assumed in [4] (compare also to Condition imposed in [5], or Condition stated in [7]). In [4] it is assumed that a given covariance function is not only bounded on , but also decays sufficiently fast in the Euclidean norm on . Explicitly, it is assumed in Condition 2.1 of [4] that there exists a finite constant , which is independent of , such that for any , . This polynomial decay condition on can be interpreted as a summability condition on the entries of the respective covariance matrices , which guarantees that the maximal eigenvalues of are uniformly bounded in , and (see Lemmas D.1 and D.5 in [4]). Note that the exponent can be replaced by , with some fixed constant (see also (6) in [6]). In the present study we show that under the assumption of a minimal spacing between any two distinct observation points, if has compact support on , the number of possible observation points covered by the support of must be bounded uniformly in , and (see Lemma B.1). This, together with the condition that is also uniformly bounded on and , will be sufficient to conclude that the maximal eigenvalues of are uniformly bounded in , and (see Lemmas 4.1 and B.3). Similar remarks can be made with regard to the conditions imposed on the partial derivatives of with respect to (see Lemma B.5). In addition, (3) of Assumption 3.1 is also imposed in [4] (compare also to [8] and [7]). It guarantees that the minimal eigenvalues of are bounded from below, uniformly in , and (see Lemmas 4.1 and B.3). Finally, we remark that within the framework of compactly supported covariance functions, the given conditions are very minimal and can be considered as classical in the context of ML estimation. This is especially true if one is not interested in the asymptotic distribution and rather seeks conditions under which ML estimators are consistent (with regard to a concrete example, we refer to Remark 6.2).
3.2 Regularity conditions on the family of covariance approximations
Given , we let denote a sequence of real valued functions defined on . The families can be put under the following assumption.
Assumption 3.2 (Regularity conditions on ).
(1) For any and , the function is measurable and such that for any .
(2)
(3) .
(4)
(5) For any , , we have that
To make the notation easier, we write . In the following, we formally introduce covariance matrix approximations (random and non-random versions). To do so, let be such that as . Given , we let denote the non-random matrix based on a given family . Then, on , if is a family of Borel measurable sequences of functions, we write
for the random matrix based on a finite collection of . Colloquially, we will use the term covariance approximation when we refer to a given family , which can approximate a family of covariance functions in the sense of Assumption 3.2. In these terms, itself is a covariance approximation. The expression covariance matrix approximation will be used for both, and its random version . Similarly, we use the expression covariance matrix for both, and .
Remark 3.2.
(1), (2) and (4) of Assumption 3.2 are natural extensions of (1) and (2) of Assumption 3.1. Notice that the measurability condition imposed in (1) of Assumption 3.2 makes sure that is measurable. Condition (3) of Assumption 3.2 specifies in which sense a family approximates the family . We require that converges uniformly on to , where the convergence is also uniform on the parameter space . In fact, we will show (see Lemmas B.3 and 4.1) that the uniform convergence of to , together with the condition that the families and have uniformly bounded compact support, are, among others, sufficient criteria to prove that the matrices and are asymptotically (as ) equivalent, uniformly on and . Condition (5) of Assumption 3.2 will allow us to conclude that a similar result holds true for the first, second and third order partial derivatives (with respect to ) of and . For concrete examples of covariance approximations, where the conditions of Assumption 3.2 are verified, we refer to Section 6.
4 Uniform asymptotic equivalence of covariance matrices and covariance matrix approximations
This section presents intermediate results on covariance matrices and approximations. In particular, Lemma 4.1 gives precise conditions under which eventually (for large enough) remains positive-definite with probability one.
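The following minimal numerical sketch illustrates the kind of statement made in this section, with all concrete ingredients assumed for the purpose of illustration: an Askey-type compactly supported correlation plays the role of the covariance function and its piecewise-linear interpolant plays the role of the approximation. As the interpolation grid is refined, the spectral norm of the difference between the two matrices shrinks, while the smallest eigenvalue of the approximating matrix stays bounded away from zero, in line with Lemma 4.1.

```python
import numpy as np

def askey(r, beta=1.0, mu=2.0):
    """Askey correlation (1 - r/beta)_+^mu, used here as an assumed stand-in covariance."""
    return np.clip(1.0 - np.asarray(r, dtype=float) / beta, 0.0, None) ** mu

def cov_matrix(points, fn):
    d = np.abs(points[:, None] - points[None, :])   # pairwise distances on the line
    return fn(d)

rng = np.random.default_rng(1)
pts = np.arange(200, dtype=float) + 0.4 * rng.uniform(-0.5, 0.5, 200)  # 1-d perturbed grid

sigma = cov_matrix(pts, askey)
for m in (2, 8, 32, 128):                                  # finer and finer interpolation knots
    knots = np.linspace(0.0, 1.0, m + 1)
    sigma_tilde = cov_matrix(pts, lambda r: np.interp(r, knots, askey(knots)))
    gap = np.linalg.norm(sigma - sigma_tilde, 2)           # spectral-norm difference
    lam_min = np.linalg.eigvalsh(sigma_tilde).min()
    print(f"m = {m:3d}   spectral gap = {gap:.6f}   smallest eigenvalue = {lam_min:.4f}")
```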
5 Truncated-ML estimators
Given a square matrix , we define to be the product of the strictly positive eigenvalues of . If all of the eigenvalues are less than or equal to zero, . Further, we use the notation for the pseudoinverse of (sometimes called the Moore-Penrose inverse). For the given collection , we define, on , for any and , the random variable
(5) |
Given , shall be called the truncated-modified log-likelihood function based on . A sequence of estimators , defined on , will be called a sequence of truncated-ML estimators for based on , if for any ,
Similarly, on , for a given collection of sequences of real valued functions , we introduce, for any and , the random variable
(6) |
Then, for , the function denotes the truncated-modified log-likelihood function based on . A sequence of estimators , defined on , will be called a sequence of truncated-ML estimators for based on , if for any
(7) |
At this point it is important to note that for a given , it is in general not true that and are continuous in for any . Nevertheless, a consequence of Lemma 4.1 is the following proposition:
Proposition 5.1.
Using Proposition 5.1, we notice that if, for any and , both and are times differentiable, we have that and are times differentiable for large enough, respectively.
For the rest of the article, if we refer to truncated-ML estimators (without specifying whether the estimators are based on families of covariance functions or approximations), we refer to both, truncated-ML estimators based on families of covariance functions and based on approximations. The same applies to the notion of truncated-modified log-likelihood functions based on either covariance functions or approximations. However, if satisfies the assumptions of Proposition 5.1, a sequence of truncated-ML estimators shall be simply called a sequence of ML estimators for . Similarly, we will simply refer to a modified log-likelihood function when the given family is under the assumptions of Proposition 5.1.
Remark 5.1.
The introduction of truncated-modified log-likelihood functions is not standard. Modified refers to the fact that the log-likelihood for the Gaussian density function of a random vector is scaled by . This is common practice in the literature about ML estimators for covariance parameters under an increasing-domain asymptotic framework (see for instance [4], [5] and also [7]). The matrices and are not necessarily positive-definite. In particular, can be negative-definite. If the matrices and are not positive-definite, we truncate the log-likelihood by a pseudo-determinant and -inverse to obtain the functions and . Hence, the use of the expression “truncated”.
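The following sketch spells out one way to compute such a truncated criterion in practice: the eigendecomposition of the (possibly indefinite) covariance matrix approximation supplies the pseudo-determinant (product of strictly positive eigenvalues, set to one if there are none) and the Moore-Penrose pseudoinverse, and the criterion is scaled by the sample size. The function name, the sign convention (written here as a criterion to be minimized) and the omission of the additive constant are assumptions made for illustration.

```python
import numpy as np

def truncated_modified_criterion(y, sigma_tilde, tol=1e-10):
    """Truncated-modified likelihood criterion (up to an additive constant), scaled by 1/n.

    For a positive-definite sigma_tilde this is the usual (negative, scaled) Gaussian
    log-likelihood; otherwise the determinant is replaced by the pseudo-determinant and
    the inverse by the Moore-Penrose pseudoinverse restricted to the positive eigenspace.
    A truncated-ML estimator minimizes this criterion over the covariance parameter.
    """
    n = y.shape[0]
    lam, q = np.linalg.eigh(sigma_tilde)              # sigma_tilde is assumed symmetric
    pos = lam > tol
    if not pos.any():                                 # pseudo-determinant convention: 1
        return 0.0
    log_pseudo_det = np.log(lam[pos]).sum()
    pinv = (q[:, pos] / lam[pos]) @ q[:, pos].T       # Moore-Penrose pseudoinverse
    quad = float(y @ pinv @ y)
    return (log_pseudo_det + quad) / n

# toy usage with an indefinite symmetric matrix standing in for a covariance approximation
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))
sigma_tilde = (a + a.T) / 2.0
y = rng.standard_normal(5)
print(truncated_modified_criterion(y, sigma_tilde))
```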
Remark 5.2.
5.1 Consistency and asymptotic normality of truncated-ML estimators
The main results of this section are that under suitable conditions on the families of covariance functions and approximations, truncated-ML estimators for covariance parameters are not only consistent (Theorem 5.2 and Corollary 5.3) but also asymptotically normal (Theorem 5.4 and Corollary 5.5). In particular, we will make use of the conditions presented in Assumptions 3.1 and 3.2. However, in the context of random fields that are observed at randomly perturbed regular grid locations as defined in (2), we will further make use of the following two technical conditions that were also imposed in [4]. Associated to the common range , of the process , we define the set , where denotes the set of differences between two points in .
Assumption 5.1 (Asymptotic identifiability around ).
For , there does not exist such that for all . If , there does not exist such that is zero a.e. with respect to the Lebesgue measure on and .
Assumption 5.2 (Local identifiability around ).
For , there does not exist such that for all . For , there does not exist such that is zero a.e. with respect to the Lebesgue measure on and .
Theorem 5.2.
Let be a sequence of truncated-ML estimators for based on . Assume that satisfies Assumption 3.1 (regarding (2), and the continuity of first order partial derivatives is sufficient) and Assumption 5.1. Suppose further that satisfies Assumption 3.2 (regarding (4) and (5), and the continuity of first order partial derivatives is sufficient). Then, we have that
The following corollary is immediate.
Corollary 5.3.
Before we present the results about asymptotic normality, it is helpful to consider some additional notation. Let , such that for any , the sequences of functions
(8) |
and
(9) |
are differentiable with respect to . Note that if satisfies Assumption 3.1 and the collection satisfies Assumption 3.2, then we know about the existence of such a under application of Proposition 5.1. For the given , on , we introduce the sequence of random functions
where for and , the random vector has components , , with
and thus
(10) |
Similarly, on , we introduce the sequence of random functions
where for any and , the components of are given by
and thus
(11) |
If the collection satisfies Assumption 3.1, we simply write, for any ,
for the random Jacobi-matrix of evaluated at .
Theorem 5.4.
Corollary 5.5.
Remark 5.3.
Under Assumption 3.1, for any , with probability one. However, even if is under Assumption 3.1 and is under Assumption 3.2, it is not in general true that , where is as in Proposition 5.1. Notice further that under Assumption 3.1, for , represents the second derivative of the log-likelihood based on .
6 Example of application: Generalized Wendland functions
In this section we work in the same setting as in Section 2.2, but we additionally assume that is isotropic. Explicitly, for the given family of covariance functions , we assume that there exists a parametric family such that for any , , . The family is called the radial version of . We can recycle the notation of Section 3 and easily translate Assumptions 3.1 and 3.2 by considering families of approximations for on . This allows us to readily recover the results of Sections 4 and 5 for isotropic random fields. For the details we refer to Assumptions A.1 and A.2, as well as Theorems A.1 and A.2 in Appendix A.
In terms of an explicit family of radial covariance functions, we reconsider the generalized Wendland covariance function which we have already introduced in (1) of Section 1.3. Let , where , with and . We assume that the covariance function of the random field is given by , , , where belongs to the family which is defined by
(13) |
where
compare to (1) of Section 1.3. We treat and as given but such that and . Notice that the latter restriction on and makes sure that for any , belongs to the class , the class of real valued and continuous functions, defined on , which are strictly positive at the origin and such that for any finite collection of points in , evaluation at the Euclidean norm of pairwise differences between points of the collection results in a non-negative definite matrix (see for example [21]). Actually, in the latter reference it is argued that for , if and only if . For the respective family defined on , we use the notation .
Remark 6.1.
The restriction is imposed to prove that the family satisfies Assumptions 5.1 and 5.2 (see the proof of Proposition 6.2). This is not surprising, as defines the minimal spacing between pairs of distinct observation points of the randomly perturbed regular grid, defined in (2) of Section 2.2. Further, as we have noted that if and only if , the two smoothness parameters and cannot be estimated without further constraints.
Proposition 6.1.
Let . Then, the family satisfies Assumption A.1, where for any and for any , , the functions and are continuous on .
Using Propositions 6.1 and 6.2, under application of Theorems A.1 and A.2 (recall also Corollaries 5.3 and 5.5), we obtain the following result:
Proposition 6.3.
Let . A sequence of ML estimators for based on is consistent. Further there exists a non-random symmetric matrix such that
Remark 6.2.
It is worth noting that the restriction is only needed for the asymptotic distribution of ML estimators, respectively truncated-ML estimators. In particular, in Proposition 6.1, if one only demands conditions involving first order partial derivatives of with respect to , then is sufficient. With regard to consistency of the estimator in Proposition 6.3, is sufficient as well. The same applies to the truncated-ML estimators considered in Examples 6.1, 6.2, 6.3 and 6.4. Keeping in mind the differentiability conditions imposed in Assumption A.1, the given restrictions on are not surprising (compare also to [9], within the infill-domain asymptotic framework).
We discuss four examples of generalized Wendland approximations.
Example 6.1 (Truncation of ).
Let be as in Proposition 6.1. Let be defined as follows: For and , we set,
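One plausible reading of such a truncation scheme, sketched below under assumptions that are not taken from the article: the approximation agrees with the generalized Wendland function up to a truncation radius that approaches the original support boundary, and is set to zero beyond it, so that the uniform approximation error vanishes while the support (and hence the covariance matrix) becomes sparser.

```python
import numpy as np

def truncate_support(phi, t_n):
    """r -> phi(r) * 1{r <= t_n}: same values up to t_n, zero beyond (smaller support)."""
    return lambda r: np.where(np.asarray(r, dtype=float) <= t_n, phi(r), 0.0)

phi = lambda r: np.clip(1.0 - np.asarray(r, dtype=float), 0.0, None) ** 3   # assumed stand-in, support [0, 1]
r = np.linspace(0.0, 1.0, 2001)
for n in (10, 100, 1000):
    t_n = 1.0 - 1.0 / n        # illustrative choice: truncation radius tends to the support boundary
    phi_n = truncate_support(phi, t_n)
    print(n, np.max(np.abs(phi(r) - phi_n(r))))   # uniform error tends to zero
```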
Proposition 6.4.
A sequence of truncated-ML estimators for based on is consistent and we have that
where is defined as in Proposition 6.3.
In the following we let denote a real constant, which is independent of such that .
Example 6.2 (Trimmed Bernstein polynomials).
Let be as in Proposition 6.1. We consider a family defined as follows: For and , we set for ,
with
the Bernstein polynomial of the function on , where and we assume that . Thus, for any ,
the distance between adjacent points converges to zero as approaches infinity. See also [12] for an introduction to Bernstein polynomials on unbounded intervals.
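A minimal sketch of a Bernstein-type approximation, with the target function, the interval and the trimming rule chosen only for illustration: the degree-m Bernstein polynomial of a continuous function on a compact interval converges to it uniformly as m grows, and trimming it to zero outside the original support keeps the approximation compactly supported.

```python
import numpy as np
from scipy.special import comb

def trimmed_bernstein(f, m, T=1.0):
    """Degree-m Bernstein polynomial of f on [0, T], set to zero outside [0, T]."""
    k = np.arange(m + 1)
    fk = f(k * T / m)                                       # only m + 1 evaluations of f
    def poly(r):
        r = np.asarray(r, dtype=float)
        x = np.clip(r / T, 0.0, 1.0)
        basis = comb(m, k) * x[..., None] ** k * (1.0 - x[..., None]) ** (m - k)
        return np.where(r <= T, basis @ fk, 0.0)            # trim to the compact support
    return poly

phi = lambda r: np.clip(1.0 - np.asarray(r, dtype=float), 0.0, None) ** 3   # assumed stand-in
r = np.linspace(0.0, 1.2, 601)
for m in (5, 20, 80):
    err = np.max(np.abs(phi(r) - trimmed_bernstein(phi, m)(r)))
    print(f"degree m = {m:3d}, uniform error = {err:.4f}")  # tends to zero as m grows
```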
Proposition 6.5.
The family satisfies Assumption A.2.
Using Propositions 6.1, 6.2 and 6.5, under application of Theorems A.1 and A.2, we have proven the following result:
Proposition 6.6.
A sequence of truncated-ML estimators for based on is consistent and we have that
where is defined as in Proposition 6.3.
Example 6.3 (Linear interpolation).
Let be as in Proposition 6.1. For a given , we consider a partition of the interval , , where and for , . Then, we define the family as follows: For and , we set for ,
where
Thus, for a given , represents a linear interpolation of the function on the interval .
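A sketch of this construction under assumed ingredients (a stand-in correlation and hand-picked knots): the approximation is determined by the function values at the partition points, which need not be equispaced, so only a handful of evaluations of the original function are required.

```python
import numpy as np

def linear_interpolant(phi, knots):
    """Piecewise-linear approximation of phi, determined by its values at the knots."""
    vals = phi(knots)                                  # len(knots) evaluations of phi
    return lambda r: np.interp(np.asarray(r, dtype=float), knots, vals)

phi = lambda r: np.clip(1.0 - np.asarray(r, dtype=float), 0.0, None) ** 3   # assumed stand-in
knots = np.concatenate([np.linspace(0.0, 0.5, 6), np.linspace(0.6, 1.0, 20)])  # not equispaced
phi_m = linear_interpolant(phi, knots)
r = np.linspace(0.0, 1.0, 1001)
print("uniform error:", np.max(np.abs(phi(r) - phi_m(r))))
```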
Proposition 6.7.
The family satisfies Assumption A.2.
Using Propositions 6.1, 6.2 and 6.7, under application of Theorems A.1 and A.2, we have further proven the following result:
Proposition 6.8.
A sequence of truncated-ML estimators for based on is consistent and we have that
where is defined as in Proposition 6.3.
Example 6.4 (Vanishing nugget effect).
Let be as in Proposition 6.1 and consider a family that satisfies Assumption A.2. Then, define for any and , the function
(14) |
where is independent of and and such that , as . Note that since the family satisfies Assumption A.2, we could also choose in (14).
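The sketch below illustrates the effect of such a vanishing nugget under assumed ingredients (the stand-in correlation and the nugget value are not from the article): adding a nugget at distance zero adds the same constant to every diagonal entry of the matrix approximation, so every eigenvalue is shifted up by exactly that amount. This is why a nugget improves numerical stability and can restore positive-definiteness, while the approximation still converges uniformly as the nugget tends to zero.

```python
import numpy as np

def add_nugget(phi_hat, a_n):
    """phi_hat(r) + a_n * 1{r = 0}; on the matrix level this adds a_n to the diagonal."""
    return lambda r: phi_hat(r) + a_n * (np.asarray(r, dtype=float) == 0.0)

phi_hat = lambda r: np.clip(1.0 - np.asarray(r, dtype=float), 0.0, None) ** 3   # assumed stand-in
pts = np.linspace(0.0, 4.0, 9)
d = np.abs(pts[:, None] - pts[None, :])
a_n = 0.05
lam = np.linalg.eigvalsh(phi_hat(d))
lam_nugget = np.linalg.eigvalsh(add_nugget(phi_hat, a_n)(d))
print(np.allclose(lam_nugget, lam + a_n))   # every eigenvalue is shifted up by exactly a_n
```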
Proposition 6.9.
A sequence of truncated-ML estimators for based on is consistent and we have that
where is defined as in Proposition 6.3.
Remark 6.3.
As already mentioned in the introduction, computing (13) is costly. However, if is a positive integer, closed-form solutions of (13) exist. More specifically, if , then
where is a polynomial of order and the Askey function ([3]) of order ,
In addition, if , a positive half-integer, it is shown in [28] that further closed form solutions of (13), involving polynomial, logarithmic and square root terms, exist. Thus, in the specific example of generalized Wendland covariance functions, covariance approximations will facilitate computing (13) when .
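Under the parametrization assumed in the quadrature sketch of Section 1.3 (normalization by a beta function, unit range and variance), the case where the first smoothness parameter equals one integrates in closed form to an Askey function times a first-order polynomial. The short check below compares the quadrature value with that closed form, for an arbitrarily chosen second smoothness parameter; all concrete values are illustrative.

```python
import numpy as np
from scipy import integrate
from scipy.special import beta as beta_fn

mu = 2.5   # illustrative value of the second smoothness parameter

def gw_quadrature(t):
    """Generalized Wendland with first smoothness parameter 1 (unit range and variance)."""
    if t >= 1.0:
        return 0.0
    val, _ = integrate.quad(lambda u: u * (1.0 - u) ** mu, t, 1.0)
    return val / beta_fn(2.0, mu + 1.0)

def gw_closed_form(t):
    """Askey function of order mu + 1 times a first-order polynomial."""
    return max(1.0 - t, 0.0) ** (mu + 1.0) * (1.0 + (mu + 1.0) * t)

for t in (0.0, 0.25, 0.5, 0.75, 0.99):
    print(f"{t:.2f}  {gw_quadrature(t):.6f}  {gw_closed_form(t):.6f}")   # columns agree
```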
7 Covariance taper approximations: Beyond compactly supported covariance functions
Asymptotic properties of (regular) tapered-ML estimators have been addressed in both the infill- and increasing-domain asymptotic frameworks (see [23], [14], [30] and [16]). The direct functional approximation approach studied here can be combined with covariance tapering. Given observations of , it is known that under weak assumptions on the presumed covariance function, ML estimators based on tapered covariance functions (tapered-ML estimators) preserve consistency (see [16], in particular Corollary 2 in the increasing-domain framework). However, this is the case for covariance tapers whose compact support is not fixed, but rather grows to the entire as the number of observations from increases. Within an increasing-domain asymptotic framework, given a fixed compact support of the covariance taper, one cannot in general expect tapered-ML estimators to be consistent. Still, under suitable conditions, tapered-ML estimators asymptotically minimize the Kullback-Leibler divergence (see for instance Theorem 3.3 in [5]). Given the theory developed here, we can readily recover the same result for truncated-tapered ML estimators, that is, ML estimators based on a tapered covariance function in which the covariance taper is replaced with a functional approximation of it. To be more formal, let us remain in the setting of Section 2, but assume that has true and unknown covariance function , , which belongs to a family which satisfies:
- For any , is continuously differentiable
- There exist constants and such that for all , for all and for all , and
-
The given assumptions are very weak and satisfied for instance for the Matérn family (see also Condition 2.1 in [4] or Remark 3.1). Then, we consider a fixed covariance taper , , , compact and convex. We assume that belongs to a family of tapers that satisfies Assumption 3.1 (regarding (2), and the continuity of first order partial derivatives is sufficient). As we have seen in Section 6 (Proposition 6.1), we may choose, with , , , a generalized Wendland taper (see also Remark 6.2). In the given context it is more convenient to write , where is the taper range, that is for . Based on a finite collection of , on , we then define the tapered covariance matrix , . Additionally, we consider a covariance matrix approximation
of , where is a sequence of functions that belongs to a family of taper approximations , for which Assumption 3.2 applies. Again, we write , , to highlight the fixed range parameter. We note that the results of Lemma 4.1 and Proposition 5.1 remain true with and replaced with and , respectively. We know (see Remark 2.1) that the conditional distribution of given is given by the random variable . On the other hand, we can assume a misspecified distribution , where the true covariance matrix is replaced with the tapered covariance matrix , . Then, we define the scaled (see [5]) conditional Kullback-Leibler divergence of from ,
The distribution shall be called a regular taper misspecified distribution. If we choose ( as in Proposition 5.1), we can even further misspecify the distribution of given by replacing with in . This gives rise to the scaled conditional Kullback-Leibler divergence of from ,
We use the notation and for ML and truncated-ML estimators for with respect to and , respectively. In accordance with the literature about tapered-ML estimators, the estimators and are then further referred to as tapered-ML estimators and truncated-tapered ML estimators, respectively. We can now state the following theorem:
Theorem 7.1.
We have that
(15) |
and as ,
(16) |
where .
Therefore, in the given scenario, truncated-tapered ML estimators asymptotically minimize the conditional Kullback-Leibler divergence of taper misspecified distributions from the true distribution (compare also to Theorem 3.3 in [5]). Thus, in terms of Kullback-Leibler divergence, truncated-tapered ML estimators and tapered-ML estimators perform asymptotically equally well.
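To make the objects in this section concrete, the sketch below computes the scaled Kullback-Leibler divergence between two zero-mean Gaussian distributions, one with an exponential (non-compactly supported) covariance matrix and one with its tapered version, obtained as the elementwise (Schur) product with a fixed-range Askey taper. The covariance model, the taper, its range, and the scaling and direction conventions are assumptions for the purpose of illustration.

```python
import numpy as np

def scaled_kl_gaussian(sigma_true, sigma_mis):
    """(1/n) * KL( N(0, sigma_true) || N(0, sigma_mis) ), both matrices positive-definite."""
    n = sigma_true.shape[0]
    _, logdet_true = np.linalg.slogdet(sigma_true)
    _, logdet_mis = np.linalg.slogdet(sigma_mis)
    trace_term = np.trace(np.linalg.solve(sigma_mis, sigma_true))
    return 0.5 * (trace_term - n + logdet_mis - logdet_true) / n

rng = np.random.default_rng(2)
pts = np.arange(40, dtype=float) + 0.4 * rng.uniform(-0.5, 0.5, 40)   # 1-d perturbed grid
d = np.abs(pts[:, None] - pts[None, :])

sigma_true = np.exp(-d / 3.0)                       # presumed covariance, not compactly supported
taper = np.clip(1.0 - d / 5.0, 0.0, None) ** 2      # fixed-range Askey taper (taper range 5)
sigma_tapered = sigma_true * taper                  # Schur product: tapered covariance matrix
print("scaled KL of the taper-misspecified model:", scaled_kl_gaussian(sigma_true, sigma_tapered))
```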
8 Discussion and outlook
With the introduction of truncated-likelihood functions, we allow for more far-reaching forms of covariance approximations, such as linear interpolations or polynomial approximations. Our approximation approach relates directly to the presumed covariance function. Thus, combinations with existing approximation methods such as low-rank or covariance tapering approaches are well possible. We studied the quality of truncated-ML estimators from an asymptotic point of view. For compactly supported covariance functions, the conditions imposed in Sections 3 and 5 permit us to obtain truncated-ML estimators that are asymptotically well-behaved. That is, we obtain estimators that are consistent and asymptotically normal. Our proof strategies were strongly influenced by [4]. We have provided a comprehensive analysis for the family of generalized Wendland covariance functions. That is, we give precise conditions on smoothness, variance and range parameters under which ML estimators for variance and range parameters are consistent and asymptotically normal. To our knowledge, such a result does not exist in the literature so far (compare also to [9], within the infill-domain asymptotic context). Further, we gave four examples of generalized Wendland approximations for which truncated-ML estimators preserve consistency and asymptotic normality.
We now discuss some open questions. Our results on consistency and asymptotic normality depend on the condition that correlations vanish beyond a certain distance. It would be of interest to recover the consistency and asymptotic normality results for truncated-ML estimators when the assumption of a compact support is dropped. To this end, we recall that the imposed conditions on covariance functions and approximations resulted in the uniform asymptotic equivalence of covariance matrices and approximations. Using this, we established the existence of a positive integer , after which covariance matrix approximations remain positive-definite. Expanding to non-compactly supported covariance functions, this result remains unchanged, as long as covariance matrices and approximations are uniformly asymptotically equivalent (uniformly on the parameter and sample space). Thus, in this case, consistency and asymptotic normality can be recovered, even when presumed covariance functions are no longer compactly supported. However, as a mere condition, the asymptotic equivalence of covariance matrices and approximations is of little practical importance. Thus, the case of non-compactly supported covariance functions deserves further attention.
From a more applied point of view, our results provide a strong theoretical basis for further research. It remains to test and extend the given examples of covariance approximations. The four examples of generalized Wendland approximations and their effect on parameter estimations were discussed from a theoretical point of view. An important next step is to provide numerical implementations and practical comparisons.
In conclusion, for large datasets built upon correlated data, the present work provides an essential missing piece in the area of covariance approximations.
Appendix A Covariance approximations for isotropic random fields
We consider families of approximations for on and translate (recycling the notation of Section 3) Assumptions 3.1 and 3.2 as follows:
Assumption A.1 (Regularity conditions on ).
(1) There exist real constants , , which are independent of , such that , with and .
(2) For any , the first, second and third order partial derivatives of exist. In addition, for any , , , where there exist constants , which are independent of , such that and .
(3) Fourier inversion holds, that is, for any ,
where is continuous and strictly positive.
Assumption A.2 (Regularity conditions on ).
(1) For any , for any , the function is measurable.
(2)
(3) .
(4)
(5) For any , , we have that
Note that the family satisfies Assumption A.1 if and only if satisfies Assumption 3.1. Further, for any and , we have that
on . Thus, a sequence of truncated-ML estimators for based on is a sequence of truncated-ML estimators for based on . If we define a sequence of truncated-ML estimators for based on a given upon replacing in (7) with the random matrix , we can recover the results of Sections 4 and 5:
Theorem A.1.
Let be a sequence of truncated-ML estimators for based on . Assume that satisfies Assumption A.1 (regarding (2), and the continuity of first order partial derivatives is sufficient) and satisfies Assumption 5.1. Suppose further that satisfies Assumption A.2 (regarding (4) and (5), and the continuity of first order partial derivatives is sufficient). Then,
Appendix B Supporting results
Let be such that as . For the families and , we introduce, for any and , for an arbitrary , for any , , the non-random matrices
and
whenever the above partial derivatives with respect to exist. Further, for Borel measurable sequences of functions , we introduce, on , the random matrices
and
whenever the above partial derivatives with respect to exist.
Lemma B.1.
Let , be some real constants. Consider such that , with and . Then, for any , for any sequence ,
(17) |
where , with . Further, we also have that
(18) |
Remark B.1.
Lemma B.2.
Let , be some real constants. Consider a sequence of functions , with values in , where for any , , with and . Then, for any , for any sequence ,
(19) |
where , with . Further we also have that
(20) |
Lemma B.3.
Corollary B.4.
Let , and be as in Lemma B.3. Then, we have that
In addition we can conclude that
In particular we have that
Lemma B.5.
Suppose that satisfies (2) of Assumption 3.1. Consider that satisfies (1), (4) and (5) of Assumption 3.2. Then, for any , , we have that (21) and (22) of Lemma B.3 are satisfied with and replaced with the respective partial derivatives and . In particular, for any , , we have that
and in addition it is true that for any , ,
Lemma B.6.
Let be fixed. On , for , we consider a sequence of random symmetric matrices , , such that , for any , . Further we assume that there exists such that , for , . Let , , , be another sequence of random symmetric matrices, defined on the same probability space, which is such that , for ,
Finally we also assume that , for any ,
Then, we have that
Lemma B.7.
On , consider two sequences of random matrices and , , such that
Then, we have that
(25) |
Lemma B.8.
Suppose that satisfies Assumption 3.1 and 5.2 (regularity conditions for partial derivatives up to order are sufficient). Suppose further that satisfies Assumption 3.2 (regularity conditions for partial derivatives up to order are sufficient). Let be as in Proposition 5.1 and define and as in (10) and (11), respectively. We then have that
(26) |
Further, we conclude that the random matrix converges in probability to a non-random matrix , where
Appendix C Proofs
C.1 Proof of results in Appendix B
Proof of Lemma B.1.
Let . For such that we have that and thus as well (since for any ). Therefore, follows since we have assumed that has compact support . The proof of (17) depends on the fact that there exists a minimal spacing between any two distinct observation points (see (4)). This allows us to show that for some arbitrary , if denotes the cardinality of the set , we have that . For a complete argument one could for example consider the proof of Lemma 4 in [16]. Using this we can estimate,
and thus also (17) is proven. ∎
Proof of Lemma B.2.
The proof is similar to the proof of Lemma B.1 and hence we consider the lemma as proven. ∎
Proof of Lemma B.3.
Let , and , be defined as in (1) of Assumption 3.1 and (2) of Assumption 3.2, respectively. We use Lemma B.1 to show that there exists a real constant , which does not depend on , and such that for any , ,
(27) |
To see this, let and . Using (1) of Assumption 3.1, we have that for any , , where now and , with and finite constants that are independent of and . Thus we can write, for any , and , by Gershgorin circle theorem,
under application of Lemma B.1, with , where . Note that is independent of , and . Similarly, by (2) of Assumption 3.2 we then use Lemma B.2, together with Gershgorin circle theorem, to show that for any , and , as well. This shows (27). Thus, we have established that
and
and therefore (21) of Lemma B.3 is verified. It is shown in [4] (Proposition D.4) that because of the increasing-domain setting, where there exists a minimal distance between any two observation points (see (4)), and since (3) of Assumption 3.1 is satisfied,
This shows (23) of Lemma B.3. Using this result, we can fix some (small enough, independent of , and ), such that for any ,
For the above , we can then find such that,
(28) |
This is valid since for the given , by the uniform convergence of to (see (3) of Assumption 3.2), we find such that for any , for any and ,
Then, if we define
since we have assumed that the families and have compact supports, which belong to , we have that for . Thus, by Gershgorin circle theorem, under application of Lemma B.2, for and ,
Since is independent of , and , we can conclude that (28) must be satisfied. Using (28), we have, for , and , and for vectors such that , that
under application of the Cauchy–Schwarz inequality. In conclusion we have for vectors such that , for , and ,
But we know that and was chosen small enough (but otherwise arbitrary). Thus, we have also proven (24) of Lemma B.3. Notice that (22) is proven with (28), hence the proof of Lemma B.3 is complete. ∎
Proof of Lemma B.5.
Proof of Lemma B.6.
For ( as in the statement) and , we can write
and
Note that and are random symmetric matrices. Further, for each of the random symmetric matrices
the smallest eigenvalue is strictly greater than zero, , uniformly in and and hence we have that
(29) |
In addition, since for , by assumption
and
we also have that
(30) |
and
(31) |
Using (29), (30) and (31), we pick arbitrary and define
such that for some given integer , ,
Now write
(32) |
We can then estimate (C.1) from above and below as
But for the given , for , we have that
On the other hand, by (31), we also have that for ,
Since was arbitrary and independent of , the lemma is proven. ∎
Proof of Lemma B.7.
First, using the Cauchy–Schwarz inequality and the compatibility of the spectral norm with the Euclidean norm, we can estimate
Let us fix some arbitrary such that for large enough we have that ,
Then, let be arbitrary and notice that
where is a Gaussian vector, defined on , with zero-mean vector and identity covariance matrix. Then, we use Markov’s inequality to estimate
where the latter term is bounded uniformly in and (see Lemma B.3). Thus we conclude that
which shows that
and thus the proof is complete. ∎
Proof of Lemma B.8.
For , let . Then, for , we have that
and
Similar expressions can then be calculated for based on . We can further calculate, for , for , ,
where
(33) |
and
(34) |
In addition, for , we also have that ,
Again, similar expressions can be obtained for based on , where for , , the respective terms and are defined as in (33) and (34), respectively, but is replaced with . Then, we have for , for , ,
We can apply Lemma B.7 to the sequence of random matrices and to conclude under application of Lemma 4.1 (see also Corollary B.4 and Lemma B.5) that
We also have
using the triangle inequality, von Neumann’s trace inequality and Lemma 4.1 (see also Corollary B.4 and Lemma B.5). Hence, we have shown that for any
In addition, we have that for any , , the expression
is bounded from above by
which again, under application of the triangle inequality, von Neumann’s trace inequality and Lemma 4.1 (see also Corollary B.4 and Lemma B.5), converges to zero. Hence, we have shown that for any ,
which concludes the proof of (26). Now it is shown in [4] (see Propositions D. and D. and also consider the proofs of Propositions and ), under application of Lemmas B.1, B.2, 4.1, B.5 and Corollary B.4, that
where is the limit of a sequence of matrices defined as
Further, by Assumption 5.2, it is concluded that the limit is such that . But then, we use (26) to show that
as well, which concludes the proof of Lemma B.8. ∎
C.2 Proof of results in Section 4
C.3 Proof of results in Section 5
To simplify the notation, we write .
Proof of Theorem 5.2.
Let be as in Lemma 4.1 (or Proposition 5.1) and define, for any , the sequence as in (9) of Section 5.1. We note that, under the given assumptions of Theorem 5.2, the first order partial derivatives with respect to exist for the sequence . Then, we define the sequence of estimators . Therefore minimizes for any . To prove that
it is sufficient to show that
(35) |
We consider a similar approach as given in [4]. As is fixed, we write for , . Under the assumptions of the theorem we have that
(36) |
and
(37) |
To see it, we remark that (using Proposition 5.1),
From here, we can use von Neumann’s trace inequality to show that
Now, by Lemma 4.1 (and Corollary B.4) we can conclude that there exists a real constant , such that for any , , , which proves (36). For (37), we first notice that by Lemma B.5, there exist constants , (which are independent of , and ) such that
Using this result, we have that
(38) |
Let be a Gaussian vector on , with zero-mean vector and identity covariance matrix. Then (see also Remark 2.1), for any finite , we have that the probability
is bounded from above by
Therefore, ,
is bounded from above by as well. Since , (37) is shown.
Notice further that is convex, is continuously differentiable and by (38)
Thus, under application of Corollary of [27], with (36) and (37), we can conclude that
(39) |
To continue, we define the sequences of random variables
For any , we have that ,
Similarly, for any , we have that
Notice that because of (39) we have that
Further, it is shown in [4] (see the proof of Proposition ) that under application of Lemma B.3 there exists some constant (which does not depend on ) such that
Under application of Lemmas B.1, B.2, B.3 and Corollary B.4, it is then shown in the proof of Proposition of [4] that either, if , is deterministic and we have that
where the limit is given by . Or and it is concluded that
where in this case
Notice that because of the assumption that is independent with common law that has a strictly positive probability density function, the function is strictly positive almost everywhere with respect to the Lebesgue measure on (see the end of the proof of Proposition in [4]). In either case, we can thus conclude that
where for any , because of Assumption 5.1, , and the limit is deterministic. We now want to show that there exists some such that for any , for any , ,
(40) |
as well. In this case, with a random function on and a deterministic function of , we would have for any fixed , and for any given ,
where the sequence of estimators , is such that for , ,
and we can conclude the proof of Theorem 5.2, using Theorem of [31]. Hence, it remains to show (40). We write ,
where
and
By Lemma 4.1, Corollary B.4 and Lemma B.6, we already conclude that
converges to zero uniformly in as . Further, we can conclude that
and thus, since by Lemma 4.1 , and are finite, uniformly in and , and
uniformly in , by application of Corollary B.4, we can also see that
converges to zero as , uniformly in . Using a similar argument we can also show that the term converges to zero as , uniformly in . Hence, we have shown that
uniformly in and we can argue that there exists some such that for all , on , which shows (40). Therefore, we have that
which concludes the proof. ∎
Proof of Theorem 5.4.
Let be as in Proposition 5.1 and define, for any , the sequences of functions and as in (8) and (9) of Section 5.1, respectively. From the proof of Theorem 5.2 we know that the sequence of estimators , is such that
Define and as in (10) and (11) respectively. For we set . We have for , , for ,
where, by Lemma 4.1 (see also Corollary B.4 and Lemma B.5) , for , and are finite, uniformly in . Further, notice that
From the proof of Lemma B.8, we already know that , , where for any , for any ,
Now, if we define, on , the sequence of random matrices
we also have that, for any , , . This follows from the fact that for any , we have that
where
and
and again, under application of the triangle inequality, von Neumann’s trace inequality and Lemma 4.1 (see also Corollary B.4 and Lemma B.5) we thus have that . But is the limit of and hence we conclude that is also the limit of . Then, we can apply Proposition D.9 of [4] to conclude that
Notice that because the family satisfies Assumption 3.2, we have that for fixed , is twice differentiable in and we can argue exactly as in the proof of Theorem 5.2 to conclude that the sequence
is bounded in probability . In addition, by Lemma B.8, we also have that
Finally, the sequence of estimators is consistent and such that
Thus we conclude, using for example Proposition D.10 in [4] that
Since was fixed, we can conclude that
as well. ∎
C.4 Proof of results in Appendix A
C.5 Proof of results in Section 6
Since and are assumed to be known, we put . We define the function , . We recall that for ,
Proof of Proposition 6.1.
We have already seen that for any , given known and , is continuous on . Further, for any , has compact support and since , we can also see that for any , for any , and thus , which implies that . Hence, with and , and are independent of and hence we can conclude that (1) of Assumption A.1 is satisfied with replaced with . It is now sufficient to show that for any , for any , , there exist constants , , such that
(41) |
where
(42) |
with , , independent of . This means that in general we need to check the above condition for partial derivatives. Let us first focus on the partial derivatives with respect to the range parameter . For we write
where and are continuously differentiable on . To simplify the notation we will put . Then, since is continuous and for any , since ,
exists and is continuous on the rectangle , we can conclude, using the general Leibniz integral rule, that is continuous and given by
Hence, for
(43) |
exists and is continuous as a function of . But clearly, as for is zero, we have that for , exists as well and is continuous as a function of . Hence, for any , exists, is given by (43) and is continuous as a function of . Thus, by monotonicity of we define
and have that
Further, we find
where and do not depend on . Since we have assumed that , we can now repeat the arguments that led to (43) two more times, and conclude that for any , , and exist as well, are given by
(44) |
with
and , with
and are both continuous as a function of . Therefore, since , and are non-negative and monotonously decreasing, we can define
and
as well as , and have that
Then, we also find
as well as
and , where , and and do not depend on . This then shows that the partial derivatives of with respect to the range parameter exist up to order three and are continuous on with uniform bounds that do not depend on and compact supports that are subsets of . Let us now focus on the partial derivatives with respect to . We can readily see that for ,
(45) |
and thus with and we can choose and such that (41) and (42) are satisfied. Notice that for any , both and are zero. Thus, the existence of the desired constants , and , for and , and , for , such that (41) and (42) are satisfied, is clear. Let us now consider the mixed partial derivatives. Using (43) and (45), we have
and thus the existence of constants , and , for and such that (41) and (42) are satisfied follows with
and
Using (43), (44) and (45) we further have that
and thus
with , and
Further, (42) is satisfied with , and
Finally we can notice that
and hence we can verify the existence of constants
and
for , , and , such that (41) and (42) are satisfied. Thus, we have shown that for , satisfies (2) of Assumption A.1, where for any , , can be replaced with . It now remains to show that (3) of Assumption A.1 is satisfied. We already know, since , that is continuous and non-negative definite on . We write and for the spaces of Lebesgue integrable functions on and , respectively. Since , we have that . Thus we can conclude, using, for example, Theorems and in [33], that for any , , where
with the Bessel function of order . This also shows that is uniformly continuous on , a member of , and that Fourier inversion holds (see, for example, Theorem and Corollary in [29]). It remains to check that is continuous. In the present case, where and , a closed-form representation of has already been established. We can refer to Theorem in [11] (see also Theorem in [9] for a nice summary and further results) and write, for ,
where with , , with
and for any ,
a special case of the generalized hypergeometric functions (see also [1]), where for , denotes the Pochhammer symbol. Note that for , (), is defined via its analytic continuation. Since we know that is continuous on the entire and
is continuous in , we can further note that
This then shows that is continuous as a composition of continuous functions and hence the proposition is proven. ∎
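The closed-form representation referenced above involves generalized hypergeometric functions and Pochhammer symbols. As a hedged illustration only (assuming, for concreteness, a function of type 1F2, which need not be the exact type appearing in [11]), the following sketch evaluates such a series directly from its Pochhammer-symbol definition and compares it with an arbitrary-precision reference.

```python
# Illustrative sketch (not the paper's exact spectral-density formula):
# evaluating 1F2(a; b1, b2; z) from its power series, using the Pochhammer
# symbol (a)_k = a (a+1) ... (a+k-1).
import mpmath as mp

def pochhammer(a, k):
    """Pochhammer symbol (a)_k."""
    out = 1.0
    for j in range(k):
        out *= a + j
    return out

def hyp1f2_series(a, b1, b2, z, n_terms=60):
    """Truncated power series of 1F2(a; b1, b2; z)."""
    total = 0.0
    for k in range(n_terms):
        total += (pochhammer(a, k)
                  / (pochhammer(b1, k) * pochhammer(b2, k) * mp.factorial(k))) * z**k
    return total

# Sanity check against mpmath's generalized hypergeometric function.
a, b1, b2, z = 1.5, 2.0, 2.5, -0.7
print(hyp1f2_series(a, b1, b2, z))
print(mp.hyper([a], [b1, b2], z))
```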
Proof of Proposition 6.2.
We first show that Assumption 5.1 is satisfied. We write and and show that implies that for all . Suppose first that . We then have that for all , since for any , and thus . Suppose now that either , with but , or but . Then, let us assume that . We have with , by monotonicity of , that either
for all or
for all . When , we will in either of the above cases have for all . Further, we can also use a similar argument for the case where either , with but or , or but or . Thus we have shown that implies that for all . Then, for , since , contains at least the integers . Therefore implies on . If , since , has nonzero Lebesgue measure. We have thus shown that Assumption 5.1 is satisfied. Let us now show that also satisfies Assumption 5.2. To do so, fix some interval , where . We will show that for any , there exists such that
(46)
where is called the Wronskian of and at . This then shows that the functions and are linearly independent on the entire interval ; more explicitly, for any ,
will imply that . This then shows that there does not exist , such that for any ,
a.e. with respect to the Lebesgue measure on . This then justifies, for both cases, either or , that Assumption 5.2 must also be satisfied. Hence, let us show (46). We can calculate, using arguments from the proof of Proposition 6.1, that for ,
and
Therefore we have that for , is given by
But the latter expression is not equal to zero on the entire . To see this, assume by contradiction that indeed for all . Using standard algebraic manipulations, one can show that this is equivalent to assuming that the function
with
and
is constantly equal to on . But this is impossible, and thus we arrive at a contradiction. Hence, there exists such that (46) is satisfied, which shows that Assumption 5.2 is satisfied and thus concludes the proof of Proposition 6.2. ∎
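The linear-independence argument above rests on finding a point where a Wronskian is nonzero. The following hedged sketch checks this numerically for two hypothetical stand-in functions; they are not the partial derivatives of the generalized Wendland family used in the proof.

```python
# Hedged illustration of the Wronskian-based linear-independence check:
# if W(f, g)(t0) = f(t0) g'(t0) - f'(t0) g(t0) != 0 at some t0 in the
# interval, then f and g are linearly independent there. The functions
# f and g below are hypothetical examples only.
import numpy as np

def wronskian(f, g, t, h=1e-6):
    """Wronskian of f and g at t, with central finite differences for f', g'."""
    fp = (f(t + h) - f(t - h)) / (2.0 * h)
    gp = (g(t + h) - g(t - h)) / (2.0 * h)
    return f(t) * gp - fp * g(t)

f = lambda t: (1.0 - t) ** 4 * (1.0 + 4.0 * t)   # a compactly supported shape
g = lambda t: (1.0 - t) ** 6                     # a second, smoother shape

ts = np.linspace(0.05, 0.95, 200)
w = np.array([wronskian(f, g, t) for t in ts])
print("max |W(f,g)| on the interval:", np.abs(w).max())  # nonzero => independent
```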
Proof of Proposition 6.4.
The goal is to check that satisfies Assumption A.2; we then conclude using Propositions 6.1 and 6.2, as well as Theorems A.1 and A.2. We first notice that for any , and any , , and are Borel measurable functions on . In addition, for any and , has support , with that satisfies . Further, one can verify that the family is also uniformly bounded by on and that it converges uniformly to on , independently of , that is . Thus (1), (2) and (3) of Assumption A.2 are satisfied. To verify the remaining assumptions, we view as the result of a truncation operator evaluated at . That is, . Then, we remark that for any , , for any ,
Thus, by Proposition 6.1, (4) and (5) of Assumption A.2 are also satisfied.
∎
Proof of Proposition 6.5.
For any , for any , is continuous on and it is also continuous as a function of (see also the proof of Proposition 6.1). Further, it converges uniformly to on , independent of . That is
To see this we can rely, for example, on the proof of Theorem 2.3.1 in [24]. There, it is shown that for any and , for any , there exists such that
for large enough. Since is compact and is continuous, we can choose , independent of and , such that the above inequality is satisfied for arbitrary , with replaced with . Then, we conclude by taking the supremum over and on the left- and right-hand sides.
Notice that because , the latter term is actually zero, independently of , and thus converges uniformly to on the entire , independently of . Thus, we have that . Note also that the convergence (in the uniform norm) of to in particular implies that the sequence of functions is bounded on for any . Therefore we can use that
to find two constants and , independent of and (recall that is compact), such that (2) of Assumption A.2 is satisfied. Clearly, for any and for any , the function is measurable. In conclusion, we have shown that (1), (2) and (3) of Assumption A.2 are satisfied. In the proof of Proposition 6.1 we have shown that for any , for any , , there exist constants , , such that
where for any , , . In addition, we notice that for any , , for any ,
where is the Bernstein polynomial operator for a function with support included in :
for and zero otherwise. Therefore, we can rely on the same arguments that we have used to show that (2) and (3) of Assumption A.2 are satisfied, to show that also (4) and (5) of Assumption A.2 must be satisfied. This then concludes the proof of Proposition 6.5. ∎
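For readers unfamiliar with the Bernstein operator used above, the following minimal sketch shows the classical Bernstein polynomial operator on [0, 1] (cf. Lorentz [24]) and the uniform convergence invoked in the proof; the generic compactly supported example function is illustrative only, and the paper’s operator may be rescaled to a different support.

```python
# Minimal sketch of the classical Bernstein polynomial operator on [0, 1]:
# B_n(f)(x) = sum_{k=0}^{n} f(k/n) * C(n, k) * x^k * (1 - x)^(n - k).
# The example function is a generic compactly supported shape used only to
# illustrate uniform convergence; it is not the generalized Wendland function.
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Evaluate the degree-n Bernstein polynomial of f at points x in [0, 1]."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for k in range(n + 1):
        out += f(k / n) * comb(n, k) * x**k * (1.0 - x) ** (n - k)
    return out

f = lambda t: np.maximum(1.0 - 2.0 * abs(t - 0.25), 0.0)  # support inside [0, 1]
xs = np.linspace(0.0, 1.0, 501)
for n in (10, 50, 200):
    err = np.abs(bernstein(f, n, xs) - f(xs)).max()
    print(f"n = {n:4d}, sup-norm error = {err:.4f}")
```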
C.6 Proof of results in Section 7
Proof of Theorem 7.1.
Given a collection of , let , , denote the covariance matrix based on the family . We first note that under the given assumptions on the family , we have that
with probability one. This can be seen from Proposition D.4 and Lemma D.5 in [4]. Using this, the proof of (15) is immediate: it follows from Lemmas 4.1 and B.6.
If we prove
(47)
where , (16) follows from (15), and we are done. We note that (47) is established if we prove
(48)
where
the random version of the modified log-likelihood function based on the tapered covariance function. This is seen from the proof of Theorem 3.3 in [5]. But under the given assumptions, the family satisfies Assumption 3.1 (regarding (2), up to and the continuity of first-order partial derivatives). Thus (48) can be established in the same way as (39) in the proof of Theorem 5.2. ∎
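As a rough illustration of the tapering construction behind the modified log-likelihood above, the following hedged sketch forms a tapered covariance matrix as the Schur (elementwise) product of a covariance matrix with a compactly supported taper and evaluates the corresponding Gaussian log-likelihood. The exponential covariance and spherical taper are illustrative choices only, not necessarily the families considered in Section 7.

```python
# Hedged sketch of a tapered Gaussian log-likelihood: the covariance matrix is
# multiplied elementwise (Schur product) by a compactly supported taper matrix,
# and the usual Gaussian log-likelihood is evaluated with the tapered matrix.
import numpy as np

def exp_cov(D, variance, scale):
    """Exponential covariance evaluated on a distance matrix D."""
    return variance * np.exp(-D / scale)

def spherical_taper(D, taper_range):
    """Spherical taper: compactly supported on [0, taper_range]."""
    h = np.minimum(D / taper_range, 1.0)
    return (1.0 - 1.5 * h + 0.5 * h**3) * (D < taper_range)

def tapered_neg2loglik(z, D, variance, scale, taper_range):
    """-2 * Gaussian log-likelihood, up to the additive constant n*log(2*pi)."""
    C = exp_cov(D, variance, scale) * spherical_taper(D, taper_range)  # Schur product
    sign, logdet = np.linalg.slogdet(C)
    return logdet + z @ np.linalg.solve(C, z)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 50.0, size=200))        # 1-D observation points
D = np.abs(x[:, None] - x[None, :])                   # distance matrix
z = rng.multivariate_normal(np.zeros(200), exp_cov(D, 1.0, 5.0))
print(tapered_neg2loglik(z, D, variance=1.0, scale=5.0, taper_range=10.0))
```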
Acknowledgments
The authors thank Roman Flury for all the stimulating discussions that were held during the development of this work. This work was supported by the Swiss National Science Foundation SNSF-175529.
References
- [1] Milton Abramowitz and Irene A. Stegun, editors. Handbook of Mathematical Functions, with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series, No. 55. U. S. Government Printing Office, Washington, D. C., 1965. Superintendent of Documents.
- [2] Ethan Anderes. On the consistent separation of scale and variance for Gaussian random fields. The Annals of Statistics, 38(2):870–893, 2010.
- [3] Richard Askey. Radial characteristic functions. Technical report, Mathematics Research Center, University of Wisconsin–Madison, Madison, WI, 1973.
- [4] François Bachoc. Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes. Journal of Multivariate Analysis, 125:1–35, 2014.
- [5] François Bachoc. Asymptotic analysis of covariance parameter estimation for Gaussian processes in the misspecified case. Bernoulli, 24(2):1531–1575, 2018.
- [6] François Bachoc. Asymptotic analysis of maximum likelihood estimation of covariance parameters for Gaussian processes: An introduction with proofs. In Abdelaati Daouia and Anne Ruiz-Gazen, editors, Advances in Contemporary Statistics and Econometrics, pages 283–303. Springer, Cham, 2021.
- [7] François Bachoc, José Betancourt, Reinhard Furrer, and Thierry Klein. Asymptotic properties of the maximum likelihood and cross validation estimators for transformed Gaussian processes. Electronic Journal of Statistics, 14(1):1962–2008, 2020.
- [8] François Bachoc and Reinhard Furrer. On the smallest eigenvalues of covariance matrices of multivariate spatial processes. Stat, 5:102–107, 2016.
- [9] Moreno Bevilacqua, Tarik Faouzi, Reinhard Furrer, and Emilio Porcu. Estimation and prediction using generalized Wendland covariance functions under fixed domain asymptotics. The Annals of Statistics, 47(2):828–856, 2019.
- [10] Federico Blasi, Christian Caamaño Carrillo, Moreno Bevilacqua, and Reinhard Furrer. A selective view of climatological data and likelihood estimation. Spatial Statistics, 50:Paper No. 100596, 2022.
- [11] Andrew Chernih and Simon Hubbert. Closed form representations and properties of the generalised Wendland functions. Journal of Approximation Theory, 177:17–33, 2014.
- [12] I. Chlodovsky. Sur le développement des fonctions définies dans un intervalle infini en séries de polynomes de M. S. Bernstein. Compositio Mathematica, 4:380–393, 1937.
- [13] Noel A. C. Cressie. Statistics for Spatial Data. John Wiley & Sons, Inc., New York, 1993. Reprint, A Wiley-Interscience Publication.
- [14] Juan Du, Hao Zhang, and V. S. Mandrekar. Fixed-domain asymptotic properties of tapered maximum likelihood estimators. The Annals of Statistics, 37(6A):3330–3361, 2009.
- [15] Roman Flury and Reinhard Furrer. Discussion on competition for spatial statistics for large datasets. Journal of Agricultural, Biological, and Environmental Statistics, 26:599–603, 2021.
- [16] Reinhard Furrer, François Bachoc, and Juan Du. Asymptotic properties of multivariate tapering for estimation and prediction. Journal of Multivariate Analysis, 149:177–191, 2016.
- [17] Reinhard Furrer, Marc G. Genton, and Douglas Nychka. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3):502–523, 2006.
- [18] Gregory Gaspari and Stephen E. Cohn. Construction of correlation functions in two and three dimensions. Quarterly Journal of the Royal Meteorological Society, 125(554):723–757, 1999.
- [19] Florian Gerber, Kaspar Mösinger, and Reinhard Furrer. Extending R packages to support 64-bit compiled code: An illustration with spam64 and GIMMS NDVI3g data. Computers & Geosciences, 104:109–119, 2017.
- [20] Tilmann Gneiting. Correlation functions for atmospheric data analysis. Quarterly Journal of the Royal Meteorological Society, 125(559):2449–2464, 1999.
- [21] Tilmann Gneiting. Compactly supported correlation functions. Journal of Multivariate Analysis, 83(2):493–508, 2002.
- [22] Matthew J. Heaton, Abhirup Datta, Andrew O. Finley, et al. A case study competition among methods for analyzing large spatial data. Journal of Agricultural, Biological, and Environmental Statistics, 24(3):398–425, 2019.
- [23] Cari G. Kaufman, Mark J. Schervish, and Douglas W. Nychka. Covariance tapering for likelihood-based estimation in large spatial data sets. Journal of the American Statistical Association, 103(484):1545–1555, 2008.
- [24] G. G. Lorentz. Bernstein Polynomials. Mathematical Expositions, No. 8. University of Toronto Press, Toronto, 1953.
- [25] K. V. Mardia and R. J. Marshall. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71(1):135–146, 1984.
- [26] Bertil Matérn. Spatial Variation: Stochastic Models and their Application to some Problems in Forest Surveys and other Sampling Investigations. Meddelanden Fran Statens Skogsforskningsinstitut, Band 49, Nr. 5, Stockholm, 1960.
- [27] Whitney K. Newey. Uniform convergence in probability and stochastic equicontinuity. Econometrica, 59(4):1161–1167, 1991.
- [28] Robert Schaback. The missing Wendland functions. Advances in Computational Mathematics, 34(1):67–81, 2011.
- [29] Elias M. Stein and Guido Weiss. Introduction to Fourier Analysis on Euclidean Spaces. Princeton Mathematical Series, No. 32. Princeton University Press, Princeton, N.J., 1971.
- [30] Michael L. Stein. Statistical properties of covariance tapers. Journal of Computational and Graphical Statistics, 22(4):866–885, 2013.
- [31] A. W. Van der Vaart. Asymptotic Statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
- [32] Holger Wendland. Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Advances in Computational Mathematics, 4(4):389–396, 1995.
- [33] Holger Wendland. Scattered Data Approximation, volume 17 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2005.
- [34] Zhiliang Ying. Asymptotic properties of a maximum likelihood estimator with data from a Gaussian process. Journal of Multivariate Analysis, 36(2):280–296, 1991.
- [35] V. P. Zastavnyi. On some properties of the Buhmann functions. Ukrainian Mathematical Journal, 58(8):1045–1067, 2006.
- [36] Hao Zhang. Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. Journal of the American Statistical Association, 99(465):250–261, 2004.
- [37] Hao Zhang and Dale L. Zimmerman. Towards reconciling two asymptotic frameworks in spatial statistics. Biometrika, 92(4):921–936, 2005.