
Optimal designs in regression with correlated errors

Holger Dette
Ruhr-Universität Bochum
Fakultät für Mathematik
44780 Bochum
Germany
   Andrey Pepelyshev, Anatoly Zhigljavsky
School of Mathematics
Cardiff University
Cardiff, CF24 4AG
UK
Abstract

This paper discusses the problem of determining optimal designs for regression models, when the observations are dependent and taken on an interval. A complete solution of this challenging optimal design problem is given for a broad class of regression models and covariance kernels.
We propose a class of estimators which are only slightly more complicated than the ordinary least squares estimator. We then demonstrate that the experiments can be designed such that, asymptotically, the new estimators achieve the same precision as the best linear unbiased estimator computed for the whole trajectory of the process. As a by-product we derive explicit expressions for the BLUE in the continuous time model and analytic expressions for the optimal designs in a wide class of regression models. We also demonstrate that for a finite number of observations the precision of the proposed procedure, which includes the estimator and the design, is very close to the best achievable. The results are illustrated by several numerical examples.

Keywords and Phrases: linear regression, correlated observations, signed measures, optimal design, BLUE, Gaussian processes, Doob representation

AMS Subject classification: Primary 62K05; Secondary 31A10

1 Introduction

Optimal design theory is a classical field of mathematical statistics with numerous applications in the life sciences, physics and engineering. In many cases the use of optimal or efficient designs reduces costs, because statistical inference can be carried out with a minimal number of experiments without losing accuracy. Most work on optimal design theory concentrates on experiments with independent observations. Under this assumption the field is very well developed and a powerful methodology for the construction of optimal designs has been established [see, for example, the monograph of Pukelsheim (2006)]. While important and elegant results have been derived in the case of independence, there exist numerous situations where correlation between different observations is present and these classical optimal designs are not applicable.

The theory of optimal design for correlated observations is much less developed and explicit results are available only in rare circumstances. The main difficulty lies in the fact that, in contrast to the independent case, correlations lead to non-convex optimization problems, so that classical tools of convex optimization theory are not applicable. Some exact optimal designs for specific linear models have been studied in Dette et al. (2008); Kiselak and Stehlík (2008); Harman and Štulajter (2010). Because explicit solutions of optimal design problems for correlated observations are rarely available, several authors have proposed to determine optimal designs based on asymptotic arguments [see, for example, Sacks and Ylvisaker (1966, 1968), Bickel and Herzberg (1979), Näther (1985a), Zhigljavsky et al. (2010)], where the references differ in the asymptotic arguments used to embed the discrete (non-convex) optimization problem in a continuous (or approximate) one. However, in contrast to the uncorrelated case, this approach does not simplify the problem substantially, and due to the lack of convexity the resulting approximate optimal design problems are still extremely difficult to solve. As a consequence, optimal designs have mainly been determined analytically for the location model (in this case the optimization problems are in fact convex) and for a few one-parameter linear models [see Boltze and Näther (1982), Näther (1985a, Ch. 4), Näther (1985b), Pázman and Müller (2001) and Müller and Pázman (2003) among others]. Only recently, Dette et al. (2013) determined (asymptotic) optimal designs for least squares estimation in models with more parameters under the additional assumption that the regression functions are eigenfunctions of an integral operator associated with the covariance kernel of the error process. However, due to this assumption, the class of models for which approximate optimal designs can be determined explicitly is rather small.

The present paper provides a complete solution of this challenging optimal design problem for a broad class of regression models and covariance kernels. Roughly speaking, we determine (asymptotic) optimal designs for a slightly modified ordinary least squares estimator (OLSE), such that the new estimator and the corresponding optimal design achieve the same accuracy as the best linear unbiased estimator (BLUE) with its corresponding optimal design.

To be more precise, consider a general regression observation scheme given by

$y(t_{j})=\theta^{T}f(t_{j})+\varepsilon(t_{j}),\qquad j=1,\dots,N,$   (1.1)

where $\mathbb{E}[\varepsilon(t_{j})]=0$ and $K(t_{i},t_{j})=\mathbb{E}[\varepsilon(t_{i})\varepsilon(t_{j})]$ denotes the covariance between observations at the points $t_{i}$ and $t_{j}$ ($i,j=1,\dots,N$), $\theta=(\theta_{1},\ldots,\theta_{m})^{T}$ is a vector of unknown parameters, $f(t)=(f_{1}(t),\ldots,f_{m}(t))^{T}$ is a vector of linearly independent functions, and the explanatory variables $t_{1},\dots,t_{N}$ vary in a compact interval, say $[a,b]$. Parallel to model (1.1) we also consider its continuous time version

$y(t)=\theta^{T}f(t)+\varepsilon(t),\qquad t\in[a,b],$   (1.2)

where the full trajectory of the process $\{y(t)\,|\,t\in[a,b]\}$ can be observed and $\{\varepsilon(t)\,|\,t\in[a,b]\}$ is a centered Gaussian process with covariance kernel $K$, i.e. $K(s,t)=\mathbb{E}[\varepsilon(s)\varepsilon(t)]$. This kernel is assumed to be continuous throughout this paper.

We pay particular attention to the one-parameter case and develop a general method for solving the optimal design problem in model (1.2) explicitly for the OLSE, perhaps slightly modified. The new estimator and the corresponding optimal design achieve the minimal variance among all linear estimators (obtained by the BLUE). In particular, our approach allows us to calculate this optimal variance explicitly. As a by-product we also identify the BLUE in the continuous time model (1.2). Based on these asymptotic considerations, we consider the finite sample case and suggest designs for a new estimation procedure (which is very similar to the OLSE) with an efficiency very close to the best possible (obtained by the BLUE and the corresponding optimal design), for any number of observations. In doing this, we show how to implement the optimal strategies from the continuous time model in practice and demonstrate that even for very small sample sizes the loss of efficiency with respect to the best strategies, based on the BLUE with a corresponding optimal design, can be considered negligible. We point out that, even in the one-dimensional case, the problem of numerically calculating optimal designs for the BLUE for a fixed sample size is extremely challenging due to the lack of convexity of the optimization problem.

In our approach, the importance of the one-parameter design problem is also related to the fact that the optimal design problem for multi-parameter models can be reduced component-wise to problems in one-parameter models. This gives us a way to generate analytically constructed universally optimal designs for a wide range of continuous time multi-parameter models of the form (1.2). Our technique is based on the observation that for a finite number of observations we can always emulate the BLUE in model (1.1) by a different linear estimator. To achieve this we assign not only weights but also signs to the support points of a discrete design in the one-parameter models, while in the multi-parameter case we use matrix weights. We then determine "optimal" signs and weights and consider the weak convergence of these "designs" and estimators as the sample size converges to infinity. Finally, we prove the (universal) optimality of the limits in the continuous time model (1.2).

Theoretically, we construct a sequence of designs for either the pure or a modified OLSE, say $\hat{\theta}_{N}$, such that its variance or covariance matrix satisfies $\mathrm{Var}(\hat{\theta}_{N})\to D^{*}$ as the sample size $N$ converges to infinity, where $D^{*}$ is the variance (if $m=1$) or covariance matrix (if $m>1$) of the BLUE in the continuous time model (1.2). In other words, $D^{*}$ is the smallest possible variance (or covariance matrix with respect to the Loewner ordering) over all unbiased linear estimators and all designs. This makes the designs derived in this paper very competitive in applications against the designs proposed by Sacks and Ylvisaker (1966) and optimal designs constructed numerically for the BLUE (using the Brimkulov-Krug-Savanov algorithm, for example). We emphasize once again that due to non-convexity the numerical construction of optimal designs for the BLUE is extremely difficult. An additional advantage of our approach is that we can analytically compute the BLUE with the corresponding optimal variance (covariance matrix) $D^{*}$ in the continuous time model (1.2) and therefore monitor the proximity of different approximations to the optimal variance $D^{*}$ obtained by the BLUE.

The methodology developed in this paper results in a non-standard estimation and optimal design theory and consists in a delicate interplay between new linear estimators and designs in the models (1.1) and (1.2). For this reason let us briefly introduce various estimators which we will often refer to in the following discussion. Consider the model (1.1) and suppose that $N$ observations are taken at experimental conditions $t_{1},\ldots,t_{N}$. For the corresponding vector of observations $\mathbf{Y}=(y(t_{1}),\ldots,y(t_{N}))^{T}$, a general weighted least squares estimator (WLSE) of $\theta$ is defined by

$\mathrm{WLSE:}\quad \widehat{\theta}_{WLSE}=(\mathbf{X}^{T}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{W}\mathbf{Y},$   (1.3)

where $\mathbf{X}=(f_{i}(t_{j}))^{i=1,\ldots,m}_{j=1,\ldots,N}$ is an $N\times m$ design matrix and $\mathbf{W}$ is some $N\times N$ matrix such that $(\mathbf{X}^{T}\mathbf{W}\mathbf{X})^{-1}$ exists. For any such $\mathbf{W}$ the estimator (1.3) is obviously unbiased. The covariance matrix of the estimator (1.3) is given by

$\mathrm{Var}(\widehat{\theta}_{WLSE})=(\mathbf{X}^{T}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{W}\mathbf{\Sigma}\mathbf{W}^{T}\mathbf{X}(\mathbf{X}^{T}\mathbf{W}^{T}\mathbf{X})^{-1},$   (1.4)

where $\mathbf{\Sigma}=(K(t_{i},t_{j}))_{i,j=1,\ldots,N}$ is an $N\times N$ matrix of variances/covariances. For the standard WLSE the matrix $\mathbf{W}$ is symmetric and non-negative definite; in this case $\widehat{\theta}_{WLSE}$ minimizes the weighted sum of squares $SS_{\mathbf{W}}(\theta)=(\mathbf{Y}-\mathbf{X}\theta)^{T}\mathbf{W}(\mathbf{Y}-\mathbf{X}\theta)$ with respect to $\theta$. Important particular cases of estimators of the form (1.3) are the OLSE, the best linear unbiased estimator (BLUE) and the signed least squares estimator (SLSE):

$\mathrm{OLSE:}\quad \widehat{\theta}_{OLSE}=(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{Y},$   (1.5)
$\mathrm{BLUE:}\quad \widehat{\theta}_{BLUE}=(\mathbf{X}^{T}\mathbf{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{\Sigma}^{-1}\mathbf{Y},$   (1.6)
$\mathrm{SLSE:}\quad \widehat{\theta}_{SLSE}=(\mathbf{X}^{T}\mathbf{S}\,\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{S}\,\mathbf{Y}.$   (1.7)

Here $\mathbf{S}$ is an $N\times N$ diagonal matrix with entries $+1$ and $-1$ on the diagonal; note that if $\mathbf{S}\neq\mathbf{I}_{N}$ then the SLSE is not a standard WLSE. While the use of the BLUE and the OLSE is standard, the SLSE is less common. It was introduced in Boltze and Näther (1982) and further studied in Chapter 5.3 of Näther (1985a). In the context of the present paper, the SLSE will turn out to be very useful for constructing optimal designs for the OLSE and the BLUE in the model (1.2) with one parameter, where the full trajectory can be observed. Another estimator of $\theta$, which is not a special case of the WLSE, will be introduced in Section 3 and used in the multi-parameter models.

The remaining structure of the paper is as follows. In Section 2 we derive optimal designs for continuous time one-parameter models and discuss how to implement the designs in practice. In Section 3 we extend the results of Section 2 to multi-parameter models. In Appendix B we discuss transformations of regression models and associated designs, which are a main tool in the proofs of our results but also of independent interest. In particular, we provide an extension of the famous Doob representation for Gaussian processes [see Doob (1949) and Mehr and McFadden (1965)], which turns out to be a very important ingredient in proving the design optimality results of Sections 2 and 3. Finally, in Appendix A we collect some auxiliary statements and the proofs of the main results of this paper.

2 Optimal designs for one-parameter models

In this section we concentrate on the one-parameter model

$y(t_{j})=\theta f(t_{j})+\varepsilon(t_{j}),\qquad j=1,\dots,N,$   (2.1)

on the interval $[a,b]$ and its continuous time analogue, where $\mathbb{E}[\varepsilon(t)]=0$ and $\mathbb{E}[\varepsilon(t)\varepsilon(t^{\prime})]=K(t,t^{\prime})$. Our approach uses some non-standard ideas and estimators in linear models and therefore we begin this section with a careful explanation of the logic of the material.

  • Sect. 2.1. Under the assumption that the design space is finite we show in Lemma 2.1 that by assigning weights and signs to the observation points $\{t_{1},\dots,t_{N}\}$ we can construct a WLSE which is equivalent to the BLUE. Then, we derive in Corollary 2.1 an explicit form of the optimal weights for a broad class of covariance kernels, which are called triangular covariance kernels.

  • Sect. 2.2. We demonstrate in Theorem 2.1 that the optimal designs derived in Sect. 2.1 converge weakly to a signed measure, if the cardinality of the design space converges to infinity.

  • Sect. 2.3. We consider model (2.1) under the assumption that the full trajectory of the process $\{y(t)\,|\,t\in[a,b]\}$ can be observed. For the specific case of Brownian motion, that is $K(t,t^{\prime})=\min\{t,t^{\prime}\}$, we prove analytically the optimality of the signed measure derived in Theorem 2.1 for the OLSE. Then, in Theorem 2.3 we establish optimality of the asymptotic measures from Theorem 2.1 for general covariance kernels. As a by-product we also identify the BLUE in the continuous time model (1.2) (in the one-dimensional case). For this purpose, we introduce a transformation which maps any regression model with a triangular covariance kernel into another model with a different triangular kernel. These transformations allow us to reduce any optimization problem to the situation considered in Theorem 2.2, which refers to the case of Brownian motion. The construction of this map is based on an extension of the celebrated Doob representation, which is developed in Appendix B.

  • Sect. 2.4. We provide some examples of asymptotic optimal measures for specific models.

  • Sect. 2.5. We introduce a practical implementation of the asymptotic theory derived in the previous sections. For a finite sample size we construct WLSE with corresponding designs which can achieve very high efficiency compared to the BLUE with corresponding optimal design. It turns out that these estimators are slightly modified OLSE, where only observations at the end-points obtain a weight (and in some cases also a sign).

  • Sect. 2.6. We illustrate the new methodology in several examples. In particular, we give a comparison with the best known procedures based on BLUE and show that the loss in precision for the procedures derived in this paper is negligible with our procedures being much simpler and more robust than the procedures based on BLUE.

2.1 Optimal designs for SLSE on a finite design space

In this section, we suppose that the design space for model (2.1) is finite, say ${\cal T}=\{t_{1},\ldots,t_{N}\}$, and demonstrate that in this case the approximate optimal designs for the SLSE (1.7) can be found explicitly. Since we consider the SLSE (1.7) rather than the OLSE (1.5), a generic approximate design on the design space ${\cal T}=\{t_{1},\ldots,t_{N}\}$ is an arbitrary discrete signed measure $\xi=\{t_{1},\ldots,t_{N};w_{1},\ldots,w_{N}\}$, where $w_{i}=s_{i}p_{i}$, $s_{i}\in\{-1,1\}$, $p_{i}\geq 0$ ($i=1,\dots,N$) and $\sum_{i=1}^{N}p_{i}=1$. We assume that the support $t_{1},\ldots,t_{N}$ of the design is fixed but the weights $p_{1},\ldots,p_{N}$ and signs $s_{1},\ldots,s_{N}$, or equivalently the signed weights $w_{i}$, will be chosen to minimize the variance of the SLSE (1.7). In view of (1.4), this variance is given by

$D(\xi)=\sum^{N}_{i=1}\sum^{N}_{j=1}K(t_{i},t_{j})w_{i}w_{j}f(t_{i})f(t_{j})\Big/\Big(\sum^{N}_{i=1}w_{i}f^{2}(t_{i})\Big)^{2}.$   (2.2)

Note that this expression coincides with the variance (1.4) of the WLSE (1.3), where the matrix $\mathbf{W}$ is defined by $\mathbf{W}=\mathrm{diag}(w_{1},\dots,w_{N})$.

We assume that $f(t_{i})\neq 0$ for all $i=1,\ldots,N$. If $f(t_{j})=0$ for some $j$ then the point $t_{j}$ can be removed from the design space $\mathcal{T}$ without changing the SLSE, its variance and the corresponding value $D(\xi)$. In the above definition of the weights $w_{i}$, we have $\sum^{N}_{i=1}|w_{i}|=\sum^{N}_{i=1}p_{i}=1$. Note, however, that the value of the criterion (2.2) does not change if we change all the weights from $w_{i}$ to $cw_{i}$ ($i=1,\ldots,N$) for arbitrary $c\neq 0$.

Despite the fact that the functional $D$ in (2.2) is not convex as a function of $(w_{1},\dots,w_{N})$, the problem of determining the optimal design can be easily solved by a simple application of the Cauchy-Schwarz inequality. The proof of the following lemma is given in Appendix A [see also Theorem 5.3 in Näther (1985a), where this result was proved in a slightly different form].

Lemma 2.1

Assume that the matrix $\mathbf{\Sigma}=(K(t_{i},t_{j}))_{i,j=1,\ldots,N}$ is positive definite and $f(t_{i})\neq 0$ for all $i=1,\ldots,N$. Then the optimal weights $w_{1}^{*},\dots,w_{N}^{*}$ minimizing (2.2) subject to the constraint $\sum^{N}_{i=1}|w_{i}|=1$ are given by

$w_{i}^{*}=c\,\frac{\mathbf{e}_{i}^{T}\mathbf{\Sigma}^{-1}\mathbf{f}}{f(t_{i})}\,;\qquad i=1,\dots,N,$   (2.3)

where $\mathbf{f}=(f(t_{1}),\ldots,f(t_{N}))^{T}$, $\mathbf{e}_{i}=(0,\ldots,0,1,0,\ldots,0)^{T}\in\mathbb{R}^{N}$ is the $i$-th unit vector, and

$c=\Bigl(\sum^{N}_{i=1}\bigl|\mathbf{e}_{i}^{T}\mathbf{\Sigma}^{-1}\mathbf{f}/f(t_{i})\bigr|\Bigr)^{-1}.$

Moreover, for the design $\xi^{*}=\{t_{1},\ldots,t_{N};w_{1}^{*},\ldots,w_{N}^{*}\}$ with weights (2.3) we have $D(\xi^{*})=D^{*}$, where $D^{*}=1/(\mathbf{f}^{T}\mathbf{\Sigma}^{-1}\mathbf{f})$ is the variance of the BLUE defined in (1.6) using all observations $t_{1},\ldots,t_{N}$.

Lemma 2.1 shows, in particular, that the pair {SLSE, corresponding optimal design $\xi^{*}$} provides an unbiased estimator with the best possible variance for the one-parameter model (2.1). This results in a WLSE (1.3) with $\mathbf{W}^{*}=\mathrm{diag}(w^{*}_{1},\dots,w^{*}_{N})$ which is the BLUE. In other words, by a slight modification of the OLSE we are able to emulate the BLUE using the appropriate design or WLSE.
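
For readers who want to experiment with Lemma 2.1 numerically, the following minimal sketch (our own illustration; the Brownian-motion kernel and the regression function are arbitrary choices, not prescribed by the paper) computes the weights (2.3) and checks that the variance (2.2) of the corresponding SLSE equals $D^{*}=1/(\mathbf{f}^{T}\mathbf{\Sigma}^{-1}\mathbf{f})$.

```python
import numpy as np

def optimal_signed_weights(t, f, K):
    """Signed weights (2.3) of Lemma 2.1, normalized so that sum |w_i| = 1."""
    fvec = f(t)
    Sigma = K(t[:, None], t[None, :])                # (K(t_i, t_j))_{i,j}
    w = np.linalg.solve(Sigma, fvec) / fvec          # e_i^T Sigma^{-1} f / f(t_i)
    return w / np.sum(np.abs(w)), Sigma, fvec

def slse_variance(w, fvec, Sigma):
    """Variance (2.2) of the SLSE for a signed design with weights w."""
    num = w @ (Sigma * np.outer(fvec, fvec)) @ w
    return num / (w @ fvec ** 2) ** 2

if __name__ == "__main__":
    # illustrative choice: Brownian motion kernel and f(t) = t^2 + 1 on [1, 2]
    t = np.linspace(1.0, 2.0, 50)
    f = lambda s: s ** 2 + 1.0
    K = lambda s, u: np.minimum(s, u)
    w, Sigma, fvec = optimal_signed_weights(t, f, K)
    D_slse = slse_variance(w, fvec, Sigma)
    D_blue = 1.0 / (fvec @ np.linalg.solve(Sigma, fvec))
    print(D_slse, D_blue)    # both numbers coincide, as stated in Lemma 2.1
```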

While the statement of Lemma 2.1 holds for arbitrary kernels, we are able to determine the optimal weights $w_{i}^{*}$ more explicitly for a broad class of kernels, which are called triangular kernels and are of the form

$K(t,t^{\prime})=u(t)v(t^{\prime})\quad\mathrm{for}\;\;t\leq t^{\prime},$   (2.4)

where $u(\cdot)$ and $v(\cdot)$ are some functions on the interval $[a,b]$. Note that the majority of covariance kernels considered in the literature belong to this class, see for example Näther (1985a), Zhigljavsky et al. (2010) or Harman and Štulajter (2011). The following result is a direct consequence of Lemma A.1 from Appendix A.

Corollary 2.1

Assume that the covariance kernel $K(\cdot,\cdot)$ has the form (2.4) so that the matrix $\mathbf{\Sigma}=(K(t_{i},t_{j}))_{i,j=1,\ldots,N}$ is positive definite and has the entries $K(t_{i},t_{j})=u_{i}v_{j}$ for $i\leq j$, where for $k=1,\ldots,N$ we denote $u_{k}=u(t_{k})$, $v_{k}=v(t_{k})$, and also $f_{k}=f(t_{k})$, $q_{k}=u_{k}/v_{k}$. If $f_{i}\neq 0$ ($i=1,\dots,N$), the weights in (2.3) can be represented explicitly as follows:

$w_{1}^{*}=\frac{c}{f_{1}}\left(\tilde{\sigma}_{11}f_{1}+\tilde{\sigma}_{12}f_{2}\right)=\frac{c\,u_{2}}{f_{1}v_{1}v_{2}(q_{2}-q_{1})}\Bigl(\frac{f_{1}}{u_{1}}-\frac{f_{2}}{u_{2}}\Bigr),$   (2.5)
$w_{N}^{*}=\frac{c}{f_{N}}\left(\tilde{\sigma}_{N,N}f_{N}+\tilde{\sigma}_{N-1,N}f_{N-1}\right)=\frac{c}{f_{N}v_{N}(q_{N}-q_{N-1})}\Bigl(\frac{f_{N}}{v_{N}}-\frac{f_{N-1}}{v_{N-1}}\Bigr),$   (2.6)
$w_{i}^{*}=\frac{c}{f_{i}}\left(\tilde{\sigma}_{i,i}f_{i}+\tilde{\sigma}_{i-1,i}f_{i-1}+\tilde{\sigma}_{i,i+1}f_{i+1}\right)=\frac{c}{f_{i}v_{i}}\Bigl(\frac{(q_{i+1}-q_{i-1})f_{i}}{v_{i}(q_{i+1}-q_{i})(q_{i}-q_{i-1})}-\frac{f_{i-1}}{v_{i-1}(q_{i}-q_{i-1})}-\frac{f_{i+1}}{v_{i+1}(q_{i+1}-q_{i})}\Bigr),$   (2.7)

for $i=2,\ldots,N-1$. In formulas (2.5), (2.6) and (2.7), the quantity $\tilde{\sigma}_{ij}$ denotes the element in the position $(i,j)$ of the matrix $\mathbf{\Sigma}^{-1}=(\tilde{\sigma}_{ij})_{i,j=1,\ldots,N}$.
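
The closed-form weights (2.5)-(2.7) avoid the inversion of $\mathbf{\Sigma}$ altogether. The short sketch below (our own illustration; the exponential kernel is one admissible triangular kernel, written as $u(t)=e^{\lambda t}$, $v(t)=e^{-\lambda t}$) evaluates these formulas with $c=1$ and compares them with the weights (2.3) obtained from $\mathbf{\Sigma}^{-1}$.

```python
import numpy as np

def triangular_weights(t, f, u, v):
    """Closed-form weights (2.5)-(2.7) (with c = 1), normalized to sum |w_i| = 1."""
    fk, uk, vk = f(t), u(t), v(t)
    qk = uk / vk
    N = len(t)
    w = np.empty(N)
    w[0] = uk[1] / (fk[0] * vk[0] * vk[1] * (qk[1] - qk[0])) * (fk[0] / uk[0] - fk[1] / uk[1])
    w[-1] = (fk[-1] / vk[-1] - fk[-2] / vk[-2]) / (fk[-1] * vk[-1] * (qk[-1] - qk[-2]))
    for i in range(1, N - 1):
        w[i] = ((qk[i + 1] - qk[i - 1]) * fk[i] / (vk[i] * (qk[i + 1] - qk[i]) * (qk[i] - qk[i - 1]))
                - fk[i - 1] / (vk[i - 1] * (qk[i] - qk[i - 1]))
                - fk[i + 1] / (vk[i + 1] * (qk[i + 1] - qk[i]))) / (fk[i] * vk[i])
    return w / np.sum(np.abs(w))

if __name__ == "__main__":
    lam, t, f = 1.0, np.linspace(1.0, 2.0, 30), lambda s: s
    u, v = lambda s: np.exp(lam * s), lambda s: np.exp(-lam * s)
    w_closed = triangular_weights(t, f, u, v)
    Sigma = np.exp(-lam * np.abs(t[:, None] - t[None, :]))     # K(t,t') = e^{-lam|t-t'|}
    w_general = np.linalg.solve(Sigma, f(t)) / f(t)
    w_general /= np.sum(np.abs(w_general))
    print(np.max(np.abs(w_closed - w_general)))                # agrees up to rounding error
```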

2.2 Weak convergence of designs

In this section, we consider the asymptotic properties of designs with weights (2.5)-(2.7). Recall that the design space is an interval, say $[a,b]$, and that we assume a triangular covariance kernel of the form (2.4). According to the discussion of triangular covariance kernels provided in Section 4.1 of Appendix B, the functions $u(\cdot)$ and $v(\cdot)$ are continuous and strictly positive on the interval $(a,b)$ and the function $q(\cdot)=u(\cdot)/v(\cdot)$ is positive, continuous and strictly increasing on $(a,b)$. We also assume that the regression function $f$ in (2.1) is continuous and strictly positive on the interval $(a,b)$. We define the transformation

$Q(t)=\frac{q(t)-q(a)}{q(b)-q(a)}$   (2.8)

and note that the function $Q:[a,b]\to[0,1]$ is increasing on the interval $[a,b]$ with $Q(a)=0$ and $Q(b)=1$, that is, $Q(\cdot)$ is a cumulative distribution function (c.d.f.). For fixed $N$ and $i=1,\ldots,N$, define $z_{i,N}=(i-\frac{1}{2})/N$ and the design points

$t_{i,N}=Q^{-1}(z_{i,N}),\qquad i=1,\ldots,N.$   (2.9)
Theorem 2.1

Consider the optimal design problem for the model (2.1), where the error process $\varepsilon(t)$ has a covariance kernel $K(t,s)$ of the form (2.4). Assume that $u(\cdot)$, $v(\cdot)$, $f(\cdot)$ and $q(\cdot)$ are strictly positive, twice continuously differentiable functions on the interval $[a,b]$. Consider the sequence of signed measures

$\xi_{N}=\{t_{1,N},\ldots,t_{N,N};w_{1,N},\ldots,w_{N,N}\},$

where the support points $t_{i,N}$ are defined in (2.9) and the weights $w_{i,N}$ are assigned to these points according to the rule (2.3) of Lemma 2.1. Then the sequence of measures $\{\xi_{N}\}_{N\in\mathbb{N}}$ converges in distribution to a signed measure $\xi^{*}$, which has masses

$P_{a}=\frac{c}{f(a)v^{2}(a)q^{\prime}(a)}\Big[\frac{f(a)u^{\prime}(a)}{u(a)}-f^{\prime}(a)\Big],\qquad P_{b}=c\,\frac{h^{\prime}(b)}{f(b)v(b)q^{\prime}(b)}$   (2.10)

at the points $a$ and $b$, respectively, and the signed density

$p(t)=-\frac{c}{f(t)v(t)}\Big[\frac{h^{\prime}(t)}{q^{\prime}(t)}\Big]^{\prime}$   (2.11)

(that is, the Radon-Nikodym derivative of $\xi^{*}$ with respect to the Lebesgue measure) on the interval $(a,b)$, where the function $h(\cdot)$ is defined by $h(t)=f(t)/v(t)$.

The proof of Theorem 2.1 is technically complicated and therefore given in Appendix A. The constant $c\neq 0$ in (2.10) and (2.11) is arbitrary. If a normalization $|\xi^{*}|([a,b])=1$ is required, then $c$ can be found from the normalizing condition

$\int^{b}_{a}|\xi^{*}|(dt)=|P_{a}|+|P_{b}|+\int_{a}^{b}|p(t)|\,dt=1.$

Throughout this paper we write the limiting designs of Theorem 2.1 in the form

$\xi^{*}(dt)=P_{a}\delta_{a}(dt)+P_{b}\delta_{b}(dt)+p(t)\,dt,$   (2.12)

where $\delta_{a}(dt)$ and $\delta_{b}(dt)$ are the Dirac measures concentrated at the points $a$ and $b$, respectively, and the function $p(\cdot)$ is defined by (2.11). Note also that under the assumptions of Theorem 2.1, the function $p(\cdot)$ is continuous on the interval $[a,b]$. In the case of Brownian motion, the limiting design of Theorem 2.1 is particularly simple.

Example 2.1

If the error process $\varepsilon$ in model (2.1) is the Brownian motion on the interval $[a,b]$ with $0<a<b<\infty$, then $K(t,s)=\min(t,s)$ and hence $u(t)=t$, $v(t)=1$, $q(t)=t$. This implies that the limiting design of Theorem 2.1 is given by (2.12) with

$P_{a}=c\,\frac{f(a)-f^{\prime}(a)a}{af(a)},\qquad P_{b}=c\,\frac{f^{\prime}(b)}{f(b)}\qquad\mathrm{and}\qquad p(t)=-c\,\frac{f^{\prime\prime}(t)}{f(t)}.$   (2.13)
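
The following short sketch (our own illustration of Theorem 2.1 and Example 2.1, with the arbitrary choice $f(t)=t^{2}+1$ on $[1,2]$) shows the weak convergence numerically: the boundary weights of the discrete optimal designs approach $P_{a}$ and $P_{b}$, and the rescaled interior weights approach the signed density $p(t)=-c\,f''(t)/f(t)$.

```python
import numpy as np

a, b, N = 1.0, 2.0, 400
f = lambda t: t ** 2 + 1.0
t = a + (np.arange(1, N + 1) - 0.5) * (b - a) / N       # t_{i,N} = Q^{-1}(z_{i,N}); Q is linear for Brownian motion
Sigma = np.minimum(t[:, None], t[None, :])              # K(t, s) = min(t, s)
w = np.linalg.solve(Sigma, f(t)) / f(t)                 # discrete optimal weights (2.3)
w /= np.sum(np.abs(w))                                  # normalization |xi_N|([a, b]) = 1

# limiting design (2.13) with the same normalization |P_a| + |P_b| + int |p| = 1
grid = a + (np.arange(1, 10001) - 0.5) * (b - a) / 10000
mass = abs((f(a) - 2 * a * a) / (a * f(a))) + abs(2 * b / f(b)) + np.sum(2.0 / f(grid)) * (b - a) / 10000
c = 1.0 / mass
Pa, Pb = c * (f(a) - 2 * a * a) / (a * f(a)), c * 2 * b / f(b)
p = lambda t: -c * 2.0 / f(t)                           # p(t) = -c f''(t)/f(t), since f''(t) = 2

print(w[0], Pa)                                         # first weight approaches P_a (= 0 here)
print(w[-1], Pb)                                        # last weight approaches P_b
print(w[N // 2] * N / (b - a), p(t[N // 2]))            # rescaled interior weights approach p(t)
```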

2.3 Optimal designs and the BLUE

In this section we consider the continuous time model (1.2) in the case $m=1$ and demonstrate that the limiting designs derived in Theorem 2.1 are in fact optimal. A linear estimator for the parameter $\theta$ in model (1.2) is defined by $\hat{\theta}_{\mu}=\int^{b}_{a}y(t)\,\mu(dt)$, where $\mu$ is a signed measure on the interval $[a,b]$. Special cases include the OLSE and the SLSE $\widetilde{\theta_{\xi}}=\int^{b}_{a}y(t)f(t)\,\xi(dt)/\int^{b}_{a}f^{2}(t)\,\xi(dt)$, where $\xi$ is a measure or a signed measure on the interval $[a,b]$, respectively. Note that $\hat{\theta}_{\mu}$ is unbiased if and only if $\int^{b}_{a}f(t)\,\mu(dt)=1$, while $\widetilde{\theta_{\xi}}$ is unbiased by construction. The BLUE in the continuous time model (1.2) minimizes

$\Phi(\mu)=\mathrm{Var}(\hat{\theta}_{\mu})=\int^{b}_{a}\int^{b}_{a}K(x,y)\,\mu(dx)\,\mu(dy)$

in the class of all signed measures $\mu$ satisfying $\int^{b}_{a}f(t)\,\mu(dt)=1$, and

$D^{*}=\inf\{\Phi(\mu)\mid\mu\ \text{signed measure on}\ [a,b]\}$   (2.14)

denotes the best possible variance of all linear unbiased estimators in the continuous time model (1.2).

Similarly, a signed measure $\xi^{*}$ on the interval $[a,b]$ is called optimal for least squares estimation in the one-parameter model (1.2) if it minimizes the functional

$D(\xi)=\mathrm{Var}(\widetilde{\theta_{\xi}})=\int_{a}^{b}\int_{a}^{b}K(t,s)f(t)f(s)\,\xi(dt)\,\xi(ds)\Big/\Big(\int_{a}^{b}f^{2}(t)\,\xi(dt)\Big)^{2},$   (2.15)

in the set of all signed measures $\xi$ on the interval $[a,b]$ such that $\int_{a}^{b}f^{2}(t)\,\xi(dt)\neq 0$. In the case of Brownian motion, we are able to establish the optimality of the design of Example 2.1. A proof of the following result is given in Appendix A.

Theorem 2.2

Let $\{\varepsilon(t)\,|\,t\in[a,b]\}$ be a Brownian motion, so that $K(t,t^{\prime})=\min\{t,t^{\prime}\}$, and let $f$ be a positive, twice continuously differentiable function on the interval $[a,b]\subset\mathbb{R}^{+}$. Then the signed measure $\xi^{*}$ defined by (2.12) and (2.13) with arbitrary $c\neq 0$ minimizes the functional (2.15). The minimal value in (2.15) is obtained as

$D(\xi^{*})=\min_{\xi}D(\xi)=\Big[\frac{f^{2}(a)}{a}+\int^{b}_{a}(f^{\prime}(t))^{2}\,dt\Big]^{-1}.$

Moreover, the BLUE in model (1.2) is given by $\hat{\theta}_{\mu^{*}}$, where $\mu^{*}(dt)=f(t)\,\xi^{**}(dt)$ and $\xi^{**}$ is the signed measure defined by (2.12) and (2.13) with constant $c^{*}=D(\xi^{*})$. This further implies $D^{*}=D(\xi^{*})=\Phi(\mu^{*})$.

Based on the design optimality established in Theorem 2.2 for the special case of Brownian motion and the technique of transformation of regression models described in Appendix B, we can establish the optimality of the asymptotic designs derived in Theorem 2.1 for more general covariance kernels; see Appendix A for the proof.

Theorem 2.3

Under the conditions of Theorem 2.1, the optimal design $\xi^{*}$ minimizing the functional (2.15) is defined by the formulas (2.10)-(2.12) with arbitrary $c\neq 0$. The minimal value in (2.15) is obtained as

$D(\xi^{*})=\Big[\frac{\tilde{f}^{2}(q(a))}{q(a)}+\int^{q(b)}_{q(a)}(\tilde{f}^{\prime}(t))^{2}\,dt\Big]^{-1},$   (2.16)

where $\tilde{f}(t)=f(q^{-1}(t))/v(q^{-1}(t))$. Moreover, the BLUE in model (1.2) is given by $\hat{\theta}_{\mu^{*}}$, where $\mu^{*}(dt)=f(t)\,\xi^{**}(dt)$, $\xi^{**}$ is the signed measure defined in (2.10)-(2.12) with constant $c^{*}=D(\xi^{*})$, and $D^{*}=\Phi(\mu^{*})=D(\xi^{*})$.

2.4 Examples of optimal designs

In this section, we provide the values of $P_{a}$, $P_{b}$ and the function $p(\cdot)$ in the general expression (2.12) for the optimal designs in a number of important special cases of the one-parameter continuous time model (1.2), where the design space is ${\cal T}=[a,b]$. Specifically, optimal designs are given in Table 1 for the location model, in Table 2 for the linear model, in Table 3 for a quadratic model and in Table 4 for a trigonometric model. The last-named model was especially chosen to demonstrate the existence of optimal designs with a density $p$ which changes sign in the interval $(a,b)$. In the tables several triangular covariance kernels are considered. The parameters of these covariance kernels satisfy the constraints $c_{2}>\pm c_{1}$, $\mp c_{2}\not\in[a,b]$, $\gamma>\omega$, $\lambda>0$. For the sake of a transparent presentation, we use the factor $c=1$ in all tables, but we emphasize once again that the optimal designs do not depend on the scaling factor.

As an example, if $K(t,t^{\prime})=e^{-\lambda|t-t^{\prime}|}$ for some $\lambda>0$, we obtain from the last row of Table 2 that the optimal design for the continuous time model $\{\theta t+\varepsilon(t)\,|\,t\in[1,2]\}$ is $\xi^{*}(dt)=(\lambda-1)\delta_{1}(dt)+(\lambda+\frac{1}{2})\delta_{2}(dt)+\lambda^{2}\,dt$, and as a consequence, $D^{*}=(\frac{5}{2}+\frac{1}{2\lambda}+\frac{7}{6}\lambda)^{-1}$.
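
The value $D^{*}$ quoted above can be checked numerically: on a fine grid the variance $1/(\mathbf{f}^{T}\mathbf{\Sigma}^{-1}\mathbf{f})$ of the discrete BLUE decreases towards the continuous time optimum. The sketch below (our own check, with the arbitrary choice $\lambda=2$) illustrates this.

```python
import numpy as np

lam, a, b, N = 2.0, 1.0, 2.0, 2000
t = np.linspace(a, b, N)
fvec = t                                                    # f(t) = t
Sigma = np.exp(-lam * np.abs(t[:, None] - t[None, :]))      # K(t,t') = e^{-lam |t - t'|}
D_grid = 1.0 / (fvec @ np.linalg.solve(Sigma, fvec))        # variance of the BLUE on the grid
D_star = 1.0 / (2.5 + 1.0 / (2.0 * lam) + 7.0 * lam / 6.0)  # closed-form value from Table 2
print(D_grid, D_star)   # the grid value is close to D* and approaches it as N grows
```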

Table 1: Optimal designs for the location model: $f(t)=1$, $t\in[a,b]$.
$u(t)$ | $v(t)$ | $P_a$ | $P_b$ | $p(t)$
any | $1$ | $1$ | $0$ | $0$
$c_1+t$ | $c_2\pm t$ | $\frac{1}{a+c_1}$ | $\frac{-1}{b\pm c_2}$ | $0$
$t^{\gamma}$ | $t^{\omega}$ | $-\gamma a^{-\gamma-\omega}$ | $\omega b^{-\gamma-\omega}$ | $\gamma\omega t^{-1-\gamma-\omega}$
$e^{\lambda t}$ | $e^{-\gamma t}$ | $\lambda e^{a(\gamma-\lambda)}$ | $\gamma e^{b(\gamma-\lambda)}$ | $\lambda\gamma e^{t(\gamma-\lambda)}$
Table 2: Optimal designs for the linear regression model through the origin: $f(t)=t$, $t\in[a,b]$.
$u(t)$ | $v(t)$ | $P_a$ | $P_b$ | $p(t)$
$t$ | $1$ | $0$ | $1$ | $0$
$c_1+t$ | $c_2\pm t$ | $\frac{-c_1}{(a+c_1)a}$ | $\frac{\pm c_2}{(b\pm c_2)b}$ | $0$
$t^{\gamma}$ | $t^{\omega}$ | $-(\gamma-1)a^{-\gamma-\omega}$ | $(\omega-1)b^{-\gamma-\omega}$ | $(1-\gamma)(1-\omega)t^{-1-\gamma-\omega}$
$e^{\lambda t}$ | $1$ | $\frac{(a\lambda-1)e^{-a\lambda}}{a}$ | $\frac{e^{-b\lambda}}{b}$ | $\frac{\lambda e^{-t\lambda}}{t}$
$e^{\lambda t}$ | $e^{-\gamma t}$ | $\frac{a\lambda-1}{a}e^{a(\gamma-\lambda)}$ | $\frac{b\gamma+1}{b}e^{b(\gamma-\lambda)}$ | $\frac{\lambda\gamma t-\gamma+\lambda}{t}e^{t(\gamma-\lambda)}$
Table 3: Optimal designs for the quadratic regression model: $f(t)=t^2+\nu$, $t\in[a,b]$.
$u(t)$ | $v(t)$ | $P_a f(a)$ | $P_b f(b)$ | $p(t)f(t)$
$t$ | $1$ | $(a^2-\nu)/a$ | $-2b$ | $2$
$c_1+t$ | $c_2\pm t$ | $\frac{a^2-\nu+2ac_1}{a+c_1}$ | $\frac{\mp(b^2-\nu\pm 2bc_2)}{b\pm c_2}$ | $2$
$t^{\gamma}$ | $t^{\omega}$ | $((2-\gamma)a^2-\gamma\nu)a^{-\gamma-\omega}$ | $((\omega-2)b^2+\omega\nu)b^{-\gamma-\omega}$ | $((2-\omega)(2-\gamma)+\nu\gamma\omega)t^{1-\gamma-\omega}$
$e^{\lambda t}$ | $1$ | $(2a-(a^2+\nu)\lambda)e^{-a\lambda}$ | $-2be^{-b\lambda}$ | $2(1-t\lambda)e^{-t\lambda}$
$e^{\lambda t}$ | $e^{-\lambda t}$ | $2a-(a^2+\nu)\lambda$ | $-((b^2+\nu)\lambda+2b)$ | $2-\lambda^2(t^2+\nu)$
Table 4: Optimal designs for the trigonometric regression model: $f(t)=1+\frac{1}{2}\sin(2\pi t)$, $t\in[1,2]$.
$u(t)$ | $v(t)$ | $P_a$ | $P_b$ | $p(t)f(t)$
$t$ | $1$ | $1-\pi$ | $\pi$ | $2\pi^2\sin(2\pi t)$
$c_1+t$ | $c_2\pm t$ | $\frac{1-\pi c_1-\pi}{c_1+1}$ | $\mp\frac{1\mp\pi c_2-2\pi}{c_2\pm 2}$ | $2\pi^2\sin(2\pi t)$
$t^2$ | $t$ | $2-\pi$ | $(2\pi-1)/8$ | $2t^{-4}((\pi^2t^2-1)\sin(2\pi t)+\pi t\cos(2\pi t)-1)$
$e^{\lambda t}$ | $1$ | $(\lambda-\pi)e^{-\lambda}$ | $\pi e^{-2\lambda}$ | $(2\pi^2\sin(2\pi t)+\pi\lambda\cos(2\pi t))e^{-\lambda t}$
$e^{\lambda t}$ | $e^{-\lambda t}$ | $\lambda-\pi$ | $\lambda+\pi$ | $(2\pi^2+\lambda^2/2)\sin(2\pi t)+\lambda^2$

2.5 Practical implementation: designs for finite sample size

In practice, efficient designs and corresponding estimators for the model (1.1) have to be derived from the optimal solutions in the continuous time model (1.2), and in this section a procedure with good finite sample performance is proposed. Roughly speaking, it consists of a slight modification of the ordinary least squares estimator and a discretization of a continuous signed measure with the asymptotically optimal density in (2.11).

We assume that the experimenter can take $N+2$ observations, with $N$ observations inside the interval $[a,b]$. In principle, any probability measure on the interval can be approximated by an $(N+2)$-point measure with weights $1/(N+2)$ and, similarly, any finite signed measure can be approximated by an $(N+2)$-point signed measure with equal weights (in absolute value). We could hence use a direct approximation of the optimal signed measures of the form (2.12) by a sequence of $(N+2)$-point signed measures with equal weights (in absolute value). For increasing sample size this sequence will eventually converge to the optimal measure of Theorem 2.3. However, this convergence will typically be very slow, where we measure the speed of convergence by the differences between the variances $D(\xi)$ of the corresponding estimators and the optimal value $D^{*}$ defined in (2.16). The main difficulty lies in the fact that a typical optimal measure has masses at the boundary points $a$ and $b$, in addition to some density on the interval $(a,b)$. The convergence of discrete measures with equal (in absolute value) weights to such a measure will be very slow, especially in view of the fact that in our approximating measures the points cannot be repeated. Summarizing, approximation of the optimal signed measures by measures with equal weights is possible but cannot be accurate for small $N$.

In order to improve the rate of convergence we propose a slight modification of the ordinary least squares procedure. In particular, we propose a WLSE with weights at the points $a$ and $b$ (the end-points of the interval $[a,b]$), which correspond to the masses $P_{a}$ and $P_{b}$ of the asymptotic optimal design. We thus only need to approximate the continuous part of the optimal signed measure, which has a density on $(a,b)$, by an $N$-point design with equal masses. To be precise, consider an optimal measure of the form (2.12). We assume that the density $p(\cdot)$ is not identically zero on the interval $(a,b)$ and choose the constant $c$ such that $|P_{a}|+|P_{b}|+\int_{a}^{b}|p(t)|\,dt=1$. Note that unless $p(\cdot)$ changes sign in $(a,b)$, we can choose $p(t)\geq 0$ for all $t\in(a,b)$. Define $P=\int_{a}^{b}|p(t)|\,dt=1-|P_{a}|-|P_{b}|$, set $\varphi(t)=|p(t)|/P$ for $t\in(a,b)$ and denote by $F(t)=\int_{a}^{t}\varphi(s)\,ds$ the corresponding distribution function. The $N$-point design we use as an $N$-point approximation to the measure with density $\varphi(t)$ is $\hat{\xi}_{N}=\{t_{1,N},\ldots,t_{N,N};1/N,\ldots,1/N\}$, where $t_{i,N}=F^{-1}(z_{i,N})$ with $z_{i,N}=i/(N+1)$, $i=1,2,\ldots,N$. If $p(t)=0$ on a sub-interval of $[a,b]$ and $F^{-1}(z_{i,N})$ is not uniquely defined, then we choose the smallest element of the set $F^{-1}(z_{i,N})$ as $t_{i,N}$. Finally, the design we suggest as an $(N+2)$-point approximation to the optimal measure in (2.12) is

$\xi^{*}_{N+2}=P_{a}\delta_{a}+P_{b}\delta_{b}+P\bar{\xi}_{N},$

where $\bar{\xi}_{N}=\{t_{1,N},\ldots,t_{N,N};s_{1,N}/N,\ldots,s_{N,N}/N\}$ and $s_{i,N}=\mathrm{sign}(p(t_{i,N}))$, $i=1,\ldots,N$.
The matrix $\mathbf{W}$, which corresponds to the design $\xi^{*}_{N+2}$ and is used in the corresponding WLSE (1.3), is the diagonal matrix $\mathbf{W}_{N}=\mathrm{diag}(NP_{a},s_{1,N}P,s_{2,N}P,\ldots,s_{N,N}P,NP_{b})$ of size $(N+2)\times(N+2)$. The set of $N+2$ design points, where the observations should be taken, is given by $\{a,t_{1,N},t_{2,N},\ldots,t_{N,N},b\}$ and the resulting estimator is defined by

$\widehat{\theta}_{WLSE,N}=(\mathbf{X}^{T}\mathbf{W}_{N}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{W}_{N}\mathbf{Y}.$   (2.17)

It follows from (1.4), (2.15) and the discussion of the previous paragraph that

$\lim_{N\to\infty}\mathrm{Var}(\widehat{\theta}_{WLSE,N})=\lim_{N\to\infty}D(\xi^{*}_{N+2})=D^{*},$

where $D^{*}$ is defined in (2.14).

2.6 Some numerical results

Consider the regression model (2.1) with $f(t)=t^{2}+1$, $t\in[1,2]$, where the error process is given by the Brownian motion. The optimal design for this model can be obtained from Table 3, and we have $P_{a}=0$, $P_{b}=-0.55$, $P=0.45$ and $p(t)=1.38/(t^{2}+1)$. By computing the quantiles from the c.d.f. corresponding to $p$ we can easily obtain support points of $(N+2)$-point designs. For example, $\mathrm{supp}(\xi^{*}_{4})=\{1,1.24,1.56,2\}$, $\mathrm{supp}(\xi^{*}_{5})=\{1,1.18,1.39,1.65,2\}$ and $\mathrm{supp}(\xi^{*}_{6})=\{1,1.14,1.30,1.49,1.71,2\}$.
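
A minimal sketch of the construction of Section 2.5 applied to this example (our own illustration; the rounded values of $P_{a}$, $P_{b}$ and $p$ are taken from the preceding paragraph) builds the $(N+2)$-point design, the matrix $\mathbf{W}_{N}$ and the variance (1.4) of the WLSE (2.17).

```python
import numpy as np

def design_and_variance(N, a, b, f, K, Pa, Pb, p):
    """(N+2)-point design of Section 2.5 and the variance of the WLSE (2.17)."""
    P = 1.0 - abs(Pa) - abs(Pb)
    grid = np.linspace(a, b, 20001)
    cdf = np.cumsum(np.abs(p(grid)))
    cdf /= cdf[-1]                                        # c.d.f. of the density |p|/P on (a, b)
    z = np.arange(1, N + 1) / (N + 1.0)
    t_int = grid[np.searchsorted(cdf, z)]                 # t_{i,N} = F^{-1}(z_{i,N})
    t = np.concatenate(([a], t_int, [b]))
    s = np.sign(p(t_int))
    W = np.diag(np.concatenate(([N * Pa], s * P, [N * Pb])))
    X = f(t).reshape(-1, 1)                               # one-parameter model
    Sigma = K(t[:, None], t[None, :])
    A = np.linalg.inv(X.T @ W @ X)
    V = A @ X.T @ W @ Sigma @ W.T @ X @ A.T               # covariance formula (1.4)
    return t, V[0, 0]

if __name__ == "__main__":
    f = lambda t: t ** 2 + 1.0
    K = lambda s, u: np.minimum(s, u)                     # Brownian motion
    Pa, Pb = 0.0, -0.554
    p = lambda t: 1.386 / (t ** 2 + 1.0)
    for N in (2, 4, 8, 16):
        t, var = design_and_variance(N, 1.0, 2.0, f, K, Pa, Pb, p)
        print(N + 2, np.round(t, 2), var)                 # variances stay close to D* ~ 0.075
```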

In Figure 1 we display the variance of various linear unbiased estimators for different sample sizes. We observe that the variance of the WLSE defined by (2.17) for the proposed $(N+2)$-point design $\xi^{*}_{N+2}$ is slightly larger than the variance of the BLUE for the proposed $(N+2)$-point design, which in turn is very close to the variance of the BLUE with the corresponding optimal $(N+2)$-point design. The calculation of these optimal designs is complicated and has been performed numerically by the Nelder-Mead algorithm in MATLAB. We also note that, due to the non-convexity of the optimization problem, it is not clear that the algorithm finds the optimal design. However, by Theorems 2.2 and 2.3 we determined the optimal value (2.14), which is $D^{*}\simeq 0.075004$. This means that for the proposed designs the WLSE has almost the same precision as the BLUE.

Figure 1: The variance of the WLSE defined in (2.17) for the proposed $(N+2)$-point designs $\xi^{*}_{N+2}$ (crosses), of the BLUE for the proposed $(N+2)$-point designs (grey circles) and of the BLUE with corresponding optimal $(N+2)$-point designs (line). The error process in model (2.1) is given by the Brownian motion and the regression function is $f(t)=t^{2}+1$, $t\in[1,2]$.

In our second example we compare the proposed optimal designs with the designs from Sacks and Ylvisaker (1966), which are constructed for the BLUE. For this purpose we consider the model (2.1) with regression function $f(t)=1+0.5\sin(2\pi t)$, $t\in[1,2]$, and a triangular covariance kernel of the form (2.4) with $u(t)=t^{2}$ and $v(t)=t$. The optimal design in the continuous time model can be obtained from Table 4 and its density is depicted in Figure 2.

Figure 2: The density of the optimal design for the continuous time model (2.1) with regression function $f(t)=1+0.5\sin(2\pi t)$, $t\in[1,2]$, and covariance kernel of the form (2.4) with $u(t)=t^{2}$ and $v(t)=t$.

By computing quantiles using this optimal design, we obtain that the 4-point design $\xi^{*}_{4}$ is supported at the points 1, 1.27, 1.68 and 2. For $\xi^{*}_{4}$, the variance of the BLUE is $\simeq 0.6129$. Using the optimal density from Sacks and Ylvisaker (1966), we obtain the 4-point design $\xi^{SY}_{4}$ supported at 1, 1.25, 1.63 and 2. For $\xi^{SY}_{4}$, the variance of the BLUE is $\simeq 0.6200$. For $N=2,3,\ldots,20$, the variances of the BLUE for the proposed $(N+2)$-point designs, the $(N+2)$-point designs from Sacks and Ylvisaker (1966) and the optimal $(N+2)$-point designs for the BLUE are depicted in Figure 3. We observe that for $N=2,3,4$ the new designs yield a smaller variance of the BLUE, while for $N=5$ the design of Sacks and Ylvisaker (1966) shows a better performance. In all other cases the results for both designs are very similar. In particular, for $N\geq 6$ the variances from the $(N+2)$-point designs proposed in this paper and in the paper of Sacks and Ylvisaker (1966) are only slightly worse than the variances of the BLUE with the corresponding best $(N+2)$-point designs (which are computed by direct optimization).

Figure 3: The variance of the BLUE for the proposed $(N+2)$-point designs (grey circles), the $(N+2)$-point designs from Sacks and Ylvisaker (1966) (crosses) and the BLUE with corresponding optimal $(N+2)$-point designs (line) for the model $f(t)=1+0.5\sin(2\pi t)$, $t\in[1,2]$, and the covariance kernel with $u(t)=t^{2}$ and $v(t)=t$; $N=2,\ldots,20$.

3 Multi-parameter models

In this section we discuss optimal design problems for models with more than one parameter. The structure of this section is somewhat similar to the structure of Section 2. In Section 3.1 we introduce a new class of linear estimators of the parameters in model (1.1), which we call matrix-weighted estimators (MWE), and show in Lemma 3.3 that for some special choices of the matrix weights the MWE can always emulate the BLUE. In Section 3.2 matrix-weighted designs associated with the MWE are defined. Then, for the case of triangular kernels, we derive in Corollary 3.1 the asymptotic form of the sequence of designs that are associated with the version of the MWE which emulates the BLUE. In Section 3.3 we prove optimality of the asymptotic matrix-weighted measure derived in Corollary 3.1 in the continuous time model (1.2) (see Theorem 3.1), while some examples of asymptotically optimal measures are provided in Section 3.4. Finally, the practical implementation of the asymptotic measures is discussed in Section 3.5 and numerical examples are provided in Section 3.6.
The proofs of many statements in this section use the results of Section 2. This is possible as there is a lot of freedom in choosing the form of the MWE to emulate the BLUE and we choose a special form which could be considered as component-wise SLSE. Correspondingly, the resulting matrix-weighted designs (including the asymptotic ones) become combinations of designs for one-parameter models.

3.1 Matrix-weighted estimators and designs

Consider the regression model (1.1) and assume that $N$ observations at points $t_{j}$ ($j=1,\ldots,N$) have been made. Let $\mathbf{O}_{j}$ be an $m\times m$ matrix associated with the observation point $t_{j}$, $j=1,\ldots,N$. Recall the definition of the design matrix $\mathbf{X}=(f_{i}(t_{j}))^{i=1,\ldots,m}_{j=1,\ldots,N}$ and of $\mathbf{Y}=(y(t_{1}),\ldots,y(t_{N}))^{T}$. We introduce the $m\times N$ matrix $\mathbf{C}=(\mathbf{O}_{1}f(t_{1}),\ldots,\mathbf{O}_{N}f(t_{N}))$, whose $j$-th column is $\mathbf{O}_{j}f(t_{j})$. Assuming that the $m\times m$ matrix

$\mathbf{M}=\mathbf{C}\mathbf{X}=\sum_{j=1}^{N}\mathbf{O}_{j}f(t_{j})f^{T}(t_{j})$   (3.1)

is non-singular, we define the linear estimator

$\widehat{\theta}_{MWE}=(\mathbf{C}\mathbf{X})^{-1}\mathbf{C}\mathbf{Y}$   (3.2)

for the vector $\theta$ in model (1.1). We call this estimator the matrix-weighted estimator (MWE), because each column of the matrix $\mathbf{X}$ is multiplied by a matrix weight. It is easy to see that for any $\mathbf{C}$ the MWE $\widehat{\theta}_{MWE}$ is unbiased and its covariance matrix is given by

$\mathrm{Var}(\widehat{\theta}_{MWE})=\mathbf{M}^{-1}\mathbf{C}\mathbf{\Sigma}\mathbf{C}^{T}(\mathbf{M}^{-1})^{T},$   (3.3)

where $\mathbf{\Sigma}=(K(t_{i},t_{j}))_{i,j=1,\ldots,N}$ is the $N\times N$ matrix of covariances of the errors. Note that the matrix $\mathbf{M}$ defined in (3.1) generalizes the standard information matrix $\mathbf{X}^{T}\mathbf{X}$ and that $\mathbf{M}$ is not necessarily a symmetric matrix. The following result shows that different matrices $\mathbf{O}_{1},\ldots,\mathbf{O}_{N}$ may yield the same matrix-weighted estimator $\widehat{\theta}_{MWE}$. Its proof is obvious and therefore omitted.

Lemma 3.1

Consider the regression model (1.1) and assume that the matrix $\mathbf{M}$ defined in (3.1) is non-singular. Then the estimator $\widehat{\theta}_{MWE}$ defined in (3.2) coincides with the estimator $\widehat{\theta}_{MWE,\Lambda}=(\mathbf{C}_{\Lambda}\mathbf{X})^{-1}\mathbf{C}_{\Lambda}\mathbf{Y}$, where $\mathbf{C}_{\Lambda}=\Lambda\mathbf{C}$ and $\Lambda$ is an arbitrary non-singular $m\times m$ matrix.

The estimator $\widehat{\theta}_{MWE,\Lambda}$ introduced in Lemma 3.1 is the MWE defined by the matrix weights $\Lambda\mathbf{O}_{1},\ldots,\Lambda\mathbf{O}_{N}$. Lemma 3.1 implies that $\widehat{\theta}_{MWE}$ is exactly the same for any set of matrices $\{\Lambda\mathbf{O}_{1},\ldots,\Lambda\mathbf{O}_{N}\}$ as long as $\Lambda$ is non-singular. In the asymptotic considerations below it will be convenient to interpret the combination of the set of experimental conditions $\{t_{1},\ldots,t_{N}\}$ and the set of corresponding matrices $\{\mathbf{O}_{1},\ldots,\mathbf{O}_{N}\}$ in the MWE as an $N$-point matrix-weighted design.

Definition 3.1

Any combination of $N$ points $\{t_{1},\ldots,t_{N}\}$ and $m\times m$ matrices $\{\mathbf{O}_{1},\ldots,\mathbf{O}_{N}\}$ will be called an $N$-point matrix-weighted design and denoted by

$\xi_{N}=\big\{t_{1},\ldots,t_{N};\frac{1}{N}\mathbf{O}_{1},\ldots,\frac{1}{N}\mathbf{O}_{N}\big\}.$   (3.4)

The covariance matrix $\mathbf{D}(\xi_{N})$ of a matrix-weighted design $\xi_{N}$ is defined as the covariance matrix $\mathrm{Var}(\widehat{\theta}_{MWE})$ in (3.3) of the corresponding estimator $\widehat{\theta}_{MWE}$.

The estimator $\widehat{\theta}_{MWE}$ is not necessarily an estimator of least squares type; that is, it may not be representable in the form (1.3) for some $N\times N$ weight matrix $\mathbf{W}$, and hence there may be no associated weighted sum of squares which is minimized by the MWE. However, for any given $\mathbf{W}$, we can always find matrices $\mathbf{O}_{j}$ such that

$\mathbf{C}=\mathbf{X}^{T}\mathbf{W}$   (3.5)

and therefore achieve $\widehat{\theta}_{MWE}=\widehat{\theta}_{WLSE}$. The following result gives a constructive solution to the matrix equation (3.5).

Lemma 3.2

Assume that $f_{1}(t)\neq 0$ for all $t\in[a,b]$. Define $\mathbf{O}_{j}=\omega_{j}e_{1}^{T}$,

$\omega_{j}=\frac{1}{f_{1}(t_{j})}(\mathbf{X}^{T}\mathbf{W})_{j}\in\mathbb{R}^{m},$   (3.6)

where $e_{1}=(1,0,\ldots,0)^{T}\in\mathbb{R}^{m}$ is the first unit vector and $(\mathbf{X}^{T}\mathbf{W})_{j}$ denotes the $j$-th column of the $m\times N$ matrix $\mathbf{X}^{T}\mathbf{W}$. Then the corresponding matrix-weighted estimator satisfies $\widehat{\theta}_{MWE}=\widehat{\theta}_{WLSE}$.

Proof. The matrix equation (3.5) can be written as $N$ vector equations

$\mathbf{O}_{j}f(t_{j})=(\mathbf{X}^{T}\mathbf{W})_{j};\qquad j=1,\dots,N,$   (3.7)

with respect to the matrices $\mathbf{O}_{j}$. Assume that $\mathbf{O}_{j}=\omega_{j}e_{1}^{T}$ for some $\omega_{j}\in\mathbb{R}^{m}$. Then

$\mathbf{O}_{j}f(t_{j})=\omega_{j}e_{1}^{T}f(t_{j})=\omega_{j}f_{1}(t_{j})$

and equation (3.7) has the unique solution (3.6). $\Box$

The form $\mathbf{O}_{j}=\omega_{j}e_{1}^{T}$ for the matrices $\mathbf{O}_{j}$ considered in Lemma 3.2 means that the matrix $\mathbf{O}_{j}$ has the vector $\omega_{j}$ as its first column while all other entries of this matrix are zero. We shall refer to this form as the one-column form. We can choose other forms for the matrices $\mathbf{O}_{j}$, but then we would require different, somewhat stronger, assumptions regarding the vector $f(t)$. For example, if $f(t)\neq(0,\ldots,0)^{T}$ for all $t\in[a,b]$, then we can always choose diagonal matrices $\mathbf{O}_{j}$ to satisfy (3.5) (see Lemma 3.5 below).

The following choices of $\mathbf{O}_{j}$ ensure coincidence of $\widehat{\theta}_{MWE}$ with the three popular estimators defined in the Introduction.

If $\mathbf{O}_{j}=\mathbf{I}_{m}$ for all $j$, then $\widehat{\theta}_{MWE}=\widehat{\theta}_{OLSE}$.

If $\mathbf{O}_{j}=s_{j}\mathbf{I}_{m}$ for all $j$, then $\widehat{\theta}_{MWE}=\widehat{\theta}_{SLSE}$.

If $\mathbf{W}=\mathbf{\Sigma}^{-1}$ and $\mathbf{O}_{j}=\omega_{j}e_{1}^{T}$ with $\omega_{j}=(\mathbf{X}^{T}\mathbf{\Sigma}^{-1})_{j}/f_{1}(t_{j})$, then $\widehat{\theta}_{MWE}=\widehat{\theta}_{BLUE}$.

We shall call an MWE $\widehat{\theta}_{MWE}$ optimal if it coincides with the BLUE. In view of the importance of the last case, the corresponding result is summarized in the following lemma.

Lemma 3.3

Consider the regression model (1.1) and let $f_{1}(t)\neq 0$ for all $t\in[a,b]$. For a given set of $N$ observation points $\{t_{1},\ldots,t_{N}\}$ the MWE $\widehat{\theta}_{MWE}$ defines a BLUE if $\mathbf{O}_{j}=\omega_{j}^{*}e_{1}^{T}$ with $\omega_{j}^{*}=(\mathbf{X}^{T}\mathbf{\Sigma}^{-1})_{j}/f_{1}(t_{j})$.
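
A compact numerical sketch of Lemma 3.3 (our own illustration; the quadratic regression functions and the exponential kernel are arbitrary choices) constructs the one-column matrix weights $\mathbf{O}_{j}=\omega_{j}^{*}e_{1}^{T}$ and verifies that the covariance matrix (3.3) of the resulting MWE coincides with $(\mathbf{X}^{T}\mathbf{\Sigma}^{-1}\mathbf{X})^{-1}$, the covariance matrix of the BLUE.

```python
import numpy as np

def mwe_covariance(t, F, K):
    """Covariance (3.3) of the MWE with the optimal one-column weights of Lemma 3.3."""
    X = F(t)                                          # N x m design matrix, with f_1(t) != 0
    Sigma = K(t[:, None], t[None, :])
    omega = np.linalg.solve(Sigma, X).T / X[:, 0]     # columns omega_j* = (X^T Sigma^{-1})_j / f_1(t_j)
    C = omega * X[:, 0]                               # column j of C is O_j f(t_j) = omega_j* f_1(t_j)
    Minv = np.linalg.inv(C @ X)                       # inverse of the matrix M in (3.1)
    return Minv @ C @ Sigma @ C.T @ Minv.T            # formula (3.3)

if __name__ == "__main__":
    t = np.linspace(1.0, 2.0, 40)
    F = lambda s: np.column_stack([np.ones_like(s), s, s ** 2])   # f(t) = (1, t, t^2)^T
    K = lambda s, u: np.exp(-np.abs(s - u))
    D_mwe = mwe_covariance(t, F, K)
    X, Sigma = F(t), K(t[:, None], t[None, :])
    D_blue = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))
    print(np.max(np.abs(D_mwe - D_blue)))             # ~0: the MWE emulates the BLUE
```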

If the covariance kernel of the error process has triangular form (2.4) then we can derive the explicit form for the optimal MWE. The result follows by a direct application of Lemma A.1.

Lemma 3.4

Assume that the covariance kernel $K(\cdot,\cdot)$ has the form (2.4) and that the matrix $\mathbf{\Sigma}=(K(t_{i},t_{j}))_{i,j=1,\ldots,N}$ is positive definite with entries $K(t_{i},t_{j})=u_{i}v_{j}$ for $i\leq j$, where for $k=1,\ldots,N$ we denote $u_{k}=u(t_{k})$, $v_{k}=v(t_{k})$ and $q_{k}=u_{k}/v_{k}$. Then we have the following representation for the optimal vectors $\omega_{j}^{*}=(\mathbf{X}^{T}\mathbf{\Sigma}^{-1})_{j}/f_{1}(t_{j})\in\mathbb{R}^{m}$ introduced in Lemma 3.3:

$\omega_{1}^{*}=\frac{c}{f_{1}(t_{1})}\left(\tilde{\sigma}_{11}f(t_{1})+\tilde{\sigma}_{12}f(t_{2})\right)=\frac{c\,u_{2}}{f_{1}(t_{1})v_{1}v_{2}(q_{2}-q_{1})}\left(\frac{f(t_{1})}{u_{1}}-\frac{f(t_{2})}{u_{2}}\right),$   (3.8)
$\omega_{N}^{*}=\frac{c}{f_{1}(t_{N})}\left(\tilde{\sigma}_{N,N}f(t_{N})+\tilde{\sigma}_{N-1,N}f(t_{N-1})\right)=\frac{c}{f_{1}(t_{N})v_{N}(q_{N}-q_{N-1})}\left(\frac{f(t_{N})}{v_{N}}-\frac{f(t_{N-1})}{v_{N-1}}\right),$   (3.9)
$\omega_{i}^{*}=\frac{c}{f_{1}(t_{i})}\left(\tilde{\sigma}_{i,i}f(t_{i})+\tilde{\sigma}_{i-1,i}f(t_{i-1})+\tilde{\sigma}_{i,i+1}f(t_{i+1})\right)=\frac{c}{f_{1}(t_{i})v_{i}}\left(\frac{(q_{i+1}-q_{i-1})f(t_{i})}{v_{i}(q_{i+1}-q_{i})(q_{i}-q_{i-1})}-\frac{f(t_{i-1})}{v_{i-1}(q_{i}-q_{i-1})}-\frac{f(t_{i+1})}{v_{i+1}(q_{i+1}-q_{i})}\right),$   (3.10)

for $i=2,\ldots,N-1$. Here, in formulas (3.8), (3.9) and (3.10), $\tilde{\sigma}_{ij}$ denotes the elements of the matrix $\mathbf{\Sigma}^{-1}=(\tilde{\sigma}_{ij})_{i,j=1,\ldots,N}$.

The following lemma provides a result similar to Lemmas 3.2 and 3.3 in the case where the matrices $\mathbf{O}_{j}$ are diagonal. An extension of Lemma 3.4 to matrices $\mathbf{O}_{j}$ of diagonal form is straightforward and omitted for the sake of brevity.

Lemma 3.5

Consider the regression model (1.1) and let $f_{k}(t)\neq 0$ for all $t\in[a,b]$ and all $k=1,\ldots,m$. For each $j=1,\ldots,N$, define the diagonal matrix $\mathbf{O}_{j}$ by its diagonal elements

$(\mathbf{O}_{j})_{k,k}=\frac{1}{f_{k}(t_{j})}(\mathbf{X}^{T}\mathbf{W})_{k,j};\qquad k=1,\ldots,m,$

where $(\mathbf{X}^{T}\mathbf{W})_{k,j}$ denotes the $(k,j)$-th element of the matrix $\mathbf{X}^{T}\mathbf{W}$. Then $\widehat{\theta}_{MWE}=\widehat{\theta}_{WLSE}$.

If additionally 𝐖=𝚺1\mathbf{W}=\mathbf{\Sigma}^{-1} so that (𝐎j)k,k=(𝐗T𝚺1)k,j/fk(tj)(\mathbf{O}_{j})_{k,k}=(\mathbf{X}^{T}\mathbf{\Sigma}^{-1})_{k,j}/f_{k}(t_{j}), then θ^MWE=θ^BLUE\widehat{\theta}_{MWE}=\widehat{\theta}_{BLUE}.

3.2 Weak convergence of matrix-weighted designs

Let Q:[a,b][0,1]Q:[a,b]\to[0,1] be an increasing function on the interval [a,b][a,b] with Q(a)=0Q(a)=0 and Q(b)=1Q(b)=1 so that Q()Q(\cdot) is a c.d.f. For a fixed NN and j=1,,Nj=1,\ldots,N, define the points t1,N,,tN,Nt_{1,N},\dots,t_{N,N} by (2.9). Suppose that with each t[a,b]t\in[a,b] we can associate an m×mm\times m matrix 𝐎(t)\mathbf{O}(t) and consider an NN-point matrix-weighted design ξN\xi_{N} of the form (3.4) with tj=tj,Nt_{j}=t_{j,N} and 𝐎j=𝐎(tj,N)\mathbf{O}_{j}=\mathbf{O}(t_{j,N}). In view of (3.1) and (3.3) this design has the covariance matrix

𝐃(ξN)=𝐌1(ξN)𝐁(ξN)(𝐌1(ξN))T,\mathbf{D}(\xi_{N})=\mathbf{M}^{-1}(\xi_{N})\mathbf{B}(\xi_{N})\left(\mathbf{M}^{-1}(\xi_{N})\right)^{T}\,,

where the matrices M(ξN)M(\xi_{N}) and B(ξN)B(\xi_{N}) are defined by

𝐌(ξN)\displaystyle\mathbf{M}(\xi_{N}) =\displaystyle= 1Nj=1N𝐎(tj,N)f(tj,N)fT(tj,N),\displaystyle\frac{1}{N}\sum_{j=1}^{N}\mathbf{O}(t_{j,N})f(t_{j,N})f^{T}(t_{j,N}),
𝐁(ξN)\displaystyle\mathbf{B}(\xi_{N}) =\displaystyle= 1N2i=1Nj=1NK(ti,N,tj,N)𝐎(ti,N)f(ti,N)fT(tj,N)𝐎T(tj,N).\displaystyle\frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}K(t_{i,N},t_{j,N})\mathbf{O}(t_{i,N})f(t_{i,N})f^{T}(t_{j,N})\mathbf{O}^{T}(t_{j,N}).

In addition to the sequence of matrix-weighted designs ξN\xi_{N} consider the sequence of uniform distributions on the set {t1,N,,tN,N}.\{t_{1,N},\ldots,t_{N,N}\}. As NN\to\infty, this sequence converges weakly to the design (probability measure) ζ\zeta on the interval [a,b][a,b] with distribution function QQ. This implies

limN𝐌(ξN)=𝐌(ξ)=ab𝐎(t)f(t)fT(t)ζ(dt)\lim_{N\to\infty}\mathbf{M}(\xi_{N})=\mathbf{M}(\xi)=\int_{a}^{b}\mathbf{O}(t)f(t)f^{T}(t)\zeta(dt)
limN𝐁(ξN)=𝐁(ξ)=ababK(t,s)𝐎(t)f(t)fT(s)𝐎T(s)ζ(dt)ζ(ds),\lim_{N\to\infty}\mathbf{B}(\xi_{N})=\mathbf{B}(\xi)=\int_{a}^{b}\int_{a}^{b}K(t,s)\mathbf{O}(t)f(t)f^{T}(s)\mathbf{O}^{T}(s)\zeta(dt)\zeta(ds)\,,

and

limN𝐃(ξN)=𝐃(ξ)=𝐌1(ξ)𝐁(ξ)(𝐌1(ξ))T\displaystyle\lim_{N\to\infty}\mathbf{D}(\xi_{N})=\mathbf{D}(\xi)=\mathbf{M}^{-1}(\xi)\mathbf{B}(\xi)\left(\mathbf{M}^{-1}(\xi)\right)^{T} (3.11)

under the assumptions that the vector-valued function ff, the matrix-valued function 𝐎\mathbf{O} and the kernel KK are continuous on the interval [a,b][a,b] and that the generalized information matrix 𝐌(ξ)\mathbf{M}(\xi) is non-singular. Moreover, the sequence of estimators (3.2) converges (almost surely as NN\to\infty) to

θ^MWE,=𝐌1(ξ)ab𝐎(t)f(t)y(t)ζ(dt),\displaystyle\widehat{\theta}_{MWE,\infty}=\mathbf{M}^{-1}(\xi)\int_{a}^{b}\mathbf{O}(t)f(t)y(t)\zeta(dt)\,, (3.12)

where {y(t)|t[a,b]}\{y(t)\,|\,t\in[a,b]\} is the stochastic process in the continuous time model (1.2). Bearing these limiting expressions in mind we say that the sequence of matrix-weighted designs ξN\xi_{N} defined by (3.4) converges to the limiting matrix-weighted design ξ(dt)=𝐎(t)ζ(dt)\xi(dt)=\mathbf{O}(t)\zeta(dt) as NN\to\infty. This relation justifies the notation 𝐌(ξ),𝐁(ξ)\mathbf{M}(\xi),\mathbf{B}(\xi) and 𝐃(ξ)\mathbf{D}(\xi) of the previous paragraph.
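
The limit transition can be illustrated numerically. The sketch below is an illustration only (it is not taken from the paper) and assumes the Brownian-motion kernel, f(t)=(1,t)^T on [1,2], the constant matrix weight O(t)=I_2 and the uniform distribution function Q, so that t_{i,N}=1+(i-1/2)/N; the matrix D(xi_N) is computed from the finite sums and stabilizes as N grows, in line with (3.11).

import numpy as np

# Illustration (not from the paper): Brownian-motion kernel, f(t) = (1, t)^T on [1, 2],
# constant matrix weight O(t) = I_2 and uniform Q, so t_{i,N} = 1 + (i - 1/2)/N.
f = lambda s: np.array([1.0, s])
def D_of_xi(N):
    t = 1.0 + (np.arange(1, N + 1) - 0.5) / N
    F = np.array([f(ti) for ti in t])            # O(t) = I, so O(t) f(t) = f(t)
    M = F.T @ F / N                              # discrete M(xi_N)
    Kmat = np.minimum.outer(t, t)                # K(t_i, t_j) = min(t_i, t_j)
    B = F.T @ Kmat @ F / N ** 2                  # discrete B(xi_N)
    Minv = np.linalg.inv(M)
    return Minv @ B @ Minv.T                     # discrete D(xi_N)
for N in (25, 100, 400):
    print(N, np.round(D_of_xi(N), 4))            # the matrices stabilize as N grows

With O(t)=I_2 this is simply the covariance matrix of the ordinary least-squares estimator; the only point of the display is that the three matrices approach their limiting integrals M(xi), B(xi) and D(xi).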

The (optimal) limiting matrix-weighted designs constructed below have a structure similar to that of the signed measures in (2.12). They assign matrix weights 𝐎a\mathbf{O}_{a} and 𝐎b\mathbf{O}_{b} to the end-points of the interval [a,b][a,b] and a ‘matrix density’ 𝐎(t)\mathbf{O}(t) to the points t(a,b)t\in(a,b); that is, these designs have the form

ξ(dt)=𝐎aδa(dt)+𝐎bδb(dt)+𝐎(t)dt.\xi(dt)=\mathbf{O}_{a}\delta_{a}(dt)+\mathbf{O}_{b}\delta_{b}(dt)+\mathbf{O}(t)dt\,. (3.13)

In view of (3.12), the MWE in the continuous time model (1.2) associated with any design of the form (3.13) can be written as

θ^MWE(ξ)=𝐌1(ξ)[𝐎af(a)y(a)+𝐎bf(b)y(b)+ab𝐎(t)f(t)y(t)𝑑t],\displaystyle\widehat{\theta}_{MWE}(\xi)=\mathbf{M}^{-1}(\xi)\Big{[}\mathbf{O}_{a}f(a)y(a)+\mathbf{O}_{b}f(b)y(b)+\int_{a}^{b}\mathbf{O}(t)f(t)y(t)dt\Big{]}\,, (3.14)

where 𝐌(ξ)=𝐎af(a)fT(a)+𝐎bf(b)fT(b)+ab𝐎(t)f(t)fT(t)𝑑t\mathbf{M}(\xi)=\mathbf{O}_{a}f(a)f^{T}(a)+\mathbf{O}_{b}f(b)f^{T}(b)+\int_{a}^{b}\mathbf{O}(t)f(t)f^{T}(t)dt. In the particular case associated with Lemma 3.4, we have the following structure of the matrices 𝐎a\mathbf{O}_{a} and 𝐎b\mathbf{O}_{b} and the matrix function 𝐎(t)\mathbf{O}(t) in (3.13):

𝐎a=ωae1T,𝐎b=ωbe1T,𝐎(t)=ω(t)e1Tfort(a,b),\mathbf{O}_{a}=\omega_{a}e_{1}^{T},\;\mathbf{O}_{b}=\omega_{b}e_{1}^{T},\;\,\mathbf{O}(t)=\omega(t)e_{1}^{T}\,\;{\rm for\;}t\in(a,b), (3.15)

where ωa\omega_{a} and ωb\omega_{b} are some mm-dimensional vectors and ω(t)m\omega(t)\in\mathbb{R}^{m} is some vector-valued function defined on the interval (a,b)(a,b). Note that ω(t)\omega(t) does not have to approach ωa\omega_{a} and ωb\omega_{b} as tat\to a and tbt\to b, respectively.

When the sequence of matrix-weighted designs is defined by the formulas of Lemma 3.3 we can compute the limiting matrix-weighted design explicitly. The proof follows by arguments similar to those given in the proof of Theorem 2.1 and is therefore omitted.

Corollary 3.1

Consider model (1.1), where the error process {ε(t)|t[a,b]}\{\varepsilon(t)|\,t\in[a,b]\} has a covariance kernel KK of the form (2.4). Assume that u()u(\cdot), v()v(\cdot), q()q(\cdot) are strictly positive, twice continuously differentiable functions on the interval [a,b][a,b] and that the vector-valued function f()f(\cdot) is twice continuously differentiable with f1(t)0f_{1}(t)\neq 0 for all t[a,b]t\in[a,b]. Consider the matrix-weighted design ξN\xi_{N} of the form (3.4), where the support points tj=tj,Nt_{j}=t_{j,N} are generated by (2.9) and the matrix weights 𝐎j=𝐎j,N\mathbf{O}_{j}=\mathbf{O}_{j,N} are defined in Lemma 3.3. The sequence {ξN}N\{\xi_{N}\}_{N\in\mathbb{N}} converges (in the sense defined above) to a matrix-weighted design ξ\xi defined by (3.13) and (3.15) with

ωa=cf1(a)v2(a)q(a)[f(a)u(a)u(a)f(a)],ωb=ch(b)f1(b)v(b)q(b),ω(t)=cf1(t)v(t)[h(t)q(t)]\displaystyle\omega_{a}\!=\!\frac{c}{f_{1}(a)v^{2}(a)q^{\prime}(a)}\left[\frac{f(a)u^{\prime}(a)}{u(a)}-f^{\prime}(a)\right],\;\omega_{b}\!=\!\frac{c\;h^{\prime}(b)}{f_{1}(b)v(b)q^{\prime}(b)},\;\omega(t)\!=\!\frac{-c}{f_{1}(t)v(t)}\left[\frac{h^{\prime}(t)}{q^{\prime}(t)}\right]^{\prime}\;\;\;\; (3.16)

where h(t)=f(t)/v(t)h(t)=f(t)/v(t) and the constant c0c\neq 0 is arbitrary.

In Corollary 3.1, the one-column representation of the matrices 𝐎j\mathbf{O}_{j} is used. The following statement contains a similar result for the case where the matrices 𝐎j\mathbf{O}_{j} are diagonal.

Corollary 3.2

Let the conditions of Corollary 3.1 hold and assume additionally that fk(t)0f_{k}(t)\neq 0 for all t[a,b]t\in[a,b] and all k=1,,mk=1,\ldots,m. Consider the matrix-weighted design ξN\xi_{N} of the form (3.4), where the support points tj=tj,Nt_{j}=t_{j,N} are generated by (2.9) and the matrices 𝐎j=𝐎j,N\mathbf{O}_{j}=\mathbf{O}_{j,N} are defined in Lemma 3.5 with diagonal elements given by (𝐎j)k,k=(𝐗T𝚺1)k,j/fk(tj)(\mathbf{O}_{j})_{k,k}=(\mathbf{X}^{T}\mathbf{\Sigma}^{-1})_{k,j}/f_{k}(t_{j}). Then the sequence {ξN}N\{\xi_{N}\}_{N\in\mathbb{N}} converges to the optimal matrix-weighted design ξ\xi^{*} of the form (3.13), where the diagonal elements of the matrices 𝐎a=diag(𝐎a,11,,𝐎a,mm)\mathbf{O}_{a}=\mathrm{diag}(\mathbf{O}_{a,11},\ldots,\mathbf{O}_{a,mm}), 𝐎b=diag(𝐎b,11,,𝐎b,mm)\mathbf{O}_{b}=\mathrm{diag}(\mathbf{O}_{b,11},\ldots,\mathbf{O}_{b,mm}) and 𝐎(t)=diag(𝐎11(t),,𝐎mm(t))\mathbf{O}(t)=\mathrm{diag}(\mathbf{O}_{11}(t),\ldots,\mathbf{O}_{mm}(t)) are given by

𝐎a,jj=cfj(a)v2(a)q(a)[fj(a)u(a)u(a)fj(a)],𝐎b,jj=chj(b)fj(b)v(b)q(b),𝐎jj(t)=cfj(t)v(t)[hj(t)q(t)]\mathbf{O}_{a,jj}\!=\!\frac{c}{f_{j}(a)v^{2}(a)q^{\prime}(a)}\!\left[\frac{f_{j}(a)u^{\prime}(a)}{u(a)}\!-\!f^{\prime}_{j}(a)\right],\;\mathbf{O}_{b,jj}\!=\!\frac{c\;h^{\prime}_{j}(b)}{f_{j}(b)v(b)q^{\prime}(b)},\;\mathbf{O}_{jj}(t)\!=\!\frac{-c}{f_{j}(t)v(t)}\!\left[\frac{h^{\prime}_{j}(t)}{q^{\prime}(t)}\right]^{\prime}\;\;\;\;\;\;

respectively, hj(t)=fj(t)/v(t)h_{j}(t)=f_{j}(t)/v(t), j=1,,mj=1,\ldots,m and the constant c0c\neq 0 is arbitrary.

3.3 Optimal designs and best linear estimators

In this section we consider again the continuous time model (1.2), where the full trajectory of the process {y(t)|t[a,b]}\{y(t)|~t\in[a,b]\} can be observed. We start by recalling some known facts concerning best linear unbiased estimation. For details we refer the interested reader to the work of Grenander, (1950) or Section 2.2 in Näther, 1985a . Any linear estimator of θ\theta can be written in the form of the integral

θ^μ=aby(t)μ(dt),\displaystyle\widehat{\theta}_{\mu}=\int_{a}^{b}y(t)\mu(dt), (3.17)

where μ(t)=(μ1(t),,μm(t))T\mu(t)=(\mu_{1}(t),\ldots,\mu_{m}(t))^{T} is a vector of signed measures on the interval [a,b][a,b]. For given μ\mu, the estimator θ^μ\widehat{\theta}_{\mu} is unbiased if and only if abf(t)μT(dt)=𝐈m\int_{a}^{b}f(t)\mu^{T}(dt)=\mathbf{I}_{m}, where 𝐈m\mathbf{I}_{m} denotes the mm-dimensional identity matrix. Theorem 2.3 in Näther, 1985a states that the estimator θ^μ\widehat{\theta}_{\mu^{*}} is BLUE if and only if abf(t)μT(dt)=𝐈m\int_{a}^{b}f(t){\mu^{*}}^{T}(dt)=\mathbf{I}_{m} and the identity

abK(u,v)μ(dv)=𝐀f(u)\int_{a}^{b}K(u,v)\mu^{*}(dv)=\mathbf{A}f(u)

holds for all u[a,b]u\in[a,b], where 𝐀\mathbf{A} is some m×mm\times m matrix. The matrix 𝐀\mathbf{A} is uniquely defined and coincides with the matrix

𝐃=Var(θ^μ)=inf{ababK(u,v)μ(du)μT(dv)|μ vector of signed measures}.\mathbf{D}^{*}=\mbox{Var}(\widehat{\theta}_{\mu^{*}})=\inf\Big{\{}\int_{a}^{b}\int_{a}^{b}K(u,v)\mu(du){\mu}^{T}(dv)~\Big{|}~\mu\mbox{ vector of signed measures}\Big{\}}~. (3.18)

The Gauss-Markov theorem further implies that 𝐃Var(θ^)\mathbf{D}^{*}\leq\mbox{Var}(\widehat{\theta}), where θ^\widehat{\theta} is any other linear unbiased estimator of θ.\theta.

Definition 3.2

A matrix-weighted design ξ\xi^{*} is called optimal if 𝐃(ξ)=𝐃\mathbf{D}(\xi^{*})=\mathbf{D}^{*}, where 𝐃(ξ)\mathbf{D}(\xi) is defined in (3.11) and 𝐃\mathbf{D}^{*} is defined in (3.18).

The designs we consider have the form (3.13) and the corresponding MWE is given by (3.14). The estimator (3.14) can be written in the form (3.17), that is θ^MWE(ξ)=θ^μ\widehat{\theta}_{MWE}(\xi)=\hat{\theta}_{\mu} with

μ(dt)=𝐌1(ξ)[𝐎af(a)δa(dt)+𝐎bf(b)δb(dt)+𝐎(t)f(t)dt].\mu(dt)=\mathbf{M}^{-1}(\xi)\bigl{[}\mathbf{O}_{a}f(a)\delta_{a}(dt)+\mathbf{O}_{b}f(b)\delta_{b}(dt)+\mathbf{O}(t)f(t)dt\bigr{]}\,.

The estimators defined in (3.14) are always unbiased and the following result provides the matrix-weighted optimal design and the BLUE in the continuous time model (1.2). The proof follows by arguments similar to those given in the proofs of Theorems 2.2 and 2.3 and is therefore omitted.

Theorem 3.1

Let K(t,s)K(t,s) be a covariance kernel of the form (2.4) and the vector-function f()f(\cdot) be twice continuously differentiable with f1(t)0f_{1}(t)\neq 0 for all t[a,b]t\in[a,b]. Under the assumptions of Corollary 3.1 the matrix-weighted design ξ\xi^{*} defined by the formulas (3.13) and (3.16) with c=1c=1 is optimal in the sense of Definition 3.2. Moreover, if

μ(dt)=𝐌1(ξ)[ωae1Tf(a)δa(dt)+ωbe1Tf(b)δb(dt)+ω(t)e1Tf(t)dt],\mu^{*}(dt)=\mathbf{M}^{-1}(\xi^{*})\bigl{[}\omega_{a}e_{1}^{T}f(a)\delta_{a}(dt)+\omega_{b}e_{1}^{T}f(b)\delta_{b}(dt)+\omega(t)e_{1}^{T}f(t)dt\bigr{]}\,,

then θ^μ\hat{\theta}_{\mu^{*}} defines the BLUE in model (1.2). Additionally, we have

𝐃(ξ)=𝐃=ababK(s,t)μ(ds)μT(dt)=𝐌1(ξ),\displaystyle\mathbf{D}(\xi^{*})=\mathbf{D}^{*}=\int_{a}^{b}\int_{a}^{b}K(s,t)\mu^{*}(ds){\mu^{*}}^{T}(dt)=\mathbf{M}^{-1}(\xi^{*}),

where the matrix 𝐌(ξ)\mathbf{M}(\xi^{*}) is given by

𝐌(ξ)=f~(q(a))f~T(q(a))q(a)+q(a)q(b)f~(s)f~T(s)𝑑s\displaystyle\mathbf{M}(\xi^{*})=\frac{\tilde{f}(q(a))\tilde{f}^{T}(q(a))}{q(a)}+\int_{q(a)}^{q(b)}\tilde{f}^{\prime}(s)\tilde{f}^{\prime T}(s)ds

and f~(s)=f(q1(s))/v(q1(s))\tilde{f}(s)=f(q^{-1}(s))/v(q^{-1}(s)).
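
As a plausibility check of the last formula (not taken from the paper), one may compare M(xi*) with the information matrix X^T Sigma^{-1} X of the BLUE based on a dense grid. For the Brownian-motion kernel we have q(t)=t and v(t)=1, so that f~ coincides with f; the sketch below does this for the illustrative choice f(t)=(1,t,t^2,t^3)^T on [1,2].

import numpy as np

# Sketch (illustration only): Brownian-motion kernel, so q(t) = t, v(t) = 1 and
# M(xi*) = f(a) f(a)^T / a + int_a^b f'(s) f'(s)^T ds; compare with the BLUE
# information matrix X^T Sigma^{-1} X on a fine grid, for f(t) = (1, t, t^2, t^3)^T.
a, b, N = 1.0, 2.0, 1000
t = np.linspace(a, b, N)
X = np.vander(t, 4, increasing=True)                  # columns 1, t, t^2, t^3
Sigma = np.minimum.outer(t, t)
info_discrete = X.T @ np.linalg.solve(Sigma, X)
fa = np.array([1.0, a, a ** 2, a ** 3])
fprime = lambda s: np.array([0.0, 1.0, 2 * s, 3 * s ** 2])
s_grid = np.linspace(a, b, 4001)
Fp = np.array([fprime(s) for s in s_grid])
w = np.full(len(s_grid), (b - a) / (len(s_grid) - 1)) # trapezoidal quadrature weights
w[[0, -1]] *= 0.5
M_star = np.outer(fa, fa) / a + (Fp * w[:, None]).T @ Fp
D_star = np.linalg.inv(M_star)                        # = D(xi*) by Theorem 3.1
print(np.max(np.abs(info_discrete - M_star)))         # small discretization error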

In Theorem 3.1 we have used the one-column representation for the matrices 𝐎(t)\mathbf{O}(t). Similar arguments establish the optimality of the matrix-weighted designs ξ\xi^{*} defined in Corollary 3.2 where the diagonal representation for the matrices 𝐎(t)\mathbf{O}(t) is used. The details are omitted for the sake of brevity.

3.4 Examples of optimal matrix-weighted designs

Consider the polynomial regression model with f(t)=(1,t,t2,,tm1)Tf(t)=(1,t,t^{2},\ldots,t^{m-1})^{T}, t[a,b]t\in[a,b] and the covariance kernel of the Brownian motion K(t,s)=min(t,s)K(t,s)=\min(t,s). For the construction of matrix-weighted designs we use matrices 𝐎(t)\mathbf{O}(t) in the one-column and diagonal representations.

For the one-column representation we have from Corollary 3.1 and Theorem 3.1 that the optimal matrix-weighted design has masses 𝐎a=ωae1T\mathbf{O}_{a}=\omega_{a}e_{1}^{T} and 𝐎b=ωbe1T\mathbf{O}_{b}=\omega_{b}e_{1}^{T} at points aa and bb, respectively, and the density 𝐎(t)=ω(t)e1T\mathbf{O}(t)=\omega(t)e_{1}^{T}. Here the vectors ωa,ωb\omega_{a},\omega_{b} and ω(t)\omega(t) are given by

ωa\displaystyle\omega_{a} =\displaystyle= (1/a,0,a,,(2m)am2)T,\displaystyle(1/a,0,-a,\ldots,(2-m)a^{m-2})^{T},
ωb\displaystyle\omega_{b} =\displaystyle= (0,1,2b,3b2,,(m1)bm2)T,\displaystyle(0,1,2b,3b^{2},\ldots,(m-1)b^{m-2})^{T},
ω(t)\displaystyle\omega(t) =\displaystyle= (0,0,2,32t,,(m1)(m2)tm3)T,t(a,b),\displaystyle(0,0,-2,-3\cdot 2t,\ldots,-(m-1)(m-2)t^{m-3})^{T},~t\in(a,b),

respectively. For the diagonal representation we have from Corollary 3.2 (and an analogue of Theorem 3.1) that the optimal matrix weighted design has masses 𝐎a\mathbf{O}_{a} and 𝐎b\mathbf{O}_{b} at points aa and bb, respectively, and the density 𝐎(t)\mathbf{O}(t), where

𝐎a\displaystyle\mathbf{O}_{a} =\displaystyle= diag(1/a,0,1/a,,(2m)/a),\displaystyle\mathrm{diag}(1/a,0,-1/a,\ldots,(2-m)/a),
𝐎b\displaystyle\mathbf{O}_{b} =\displaystyle= diag(0,1/b,2/b,,(m1)/b),\displaystyle\mathrm{diag}(0,1/b,2/b,\ldots,(m-1)/b),
𝐎(t)\displaystyle\mathbf{O}(t) =\displaystyle= diag(0,0,2/t2,,(m1)(m2)/t2),t(a,b).\displaystyle\mathrm{diag}(0,0,-2/t^{2},\ldots,-(m-1)(m-2)/t^{2}),~t\in(a,b).

Note that in this case all non-vanishing diagonal elements of the matrix 𝐎(t)\mathbf{O}(t) are proportional to the function 1/t21/t^{2}. According to Lemma 3.1, we can use Λ𝐎(t)\Lambda\mathbf{O}(t) instead of 𝐎(t)\mathbf{O}(t), for any non-singular m×mm\times m matrix Λ\Lambda. By taking the matrix

Λ=diag(1,1,1/2,,1/[(m1)(m2)]),\Lambda=\mathrm{diag}(1,1,-1/2,\ldots,-1/[(m-1)(m-2)]),

we obtain

Λ𝐎a\displaystyle\Lambda\mathbf{O}_{a}\; =\displaystyle= diag(1/a,0,1/(2a),,1/[(m1)a]),\displaystyle\mathrm{diag}(1/a,0,1/(2a),\ldots,1/[(m-1)a]),
Λ𝐎b\displaystyle\Lambda\mathbf{O}_{b}\; =\displaystyle= diag(0,1/b,1/b,,1/[(m2)b]),\displaystyle\mathrm{diag}(0,1/b,-1/b,\ldots,-1/[(m-2)b]),
Λ𝐎(t)\displaystyle\Lambda\mathbf{O}(t) =\displaystyle= diag(0,0,1/t2,,1/t2),t(a,b).\displaystyle\mathrm{diag}(0,0,1/t^{2},\ldots,1/t^{2}),~t\in(a,b).
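
The diagonal weights 𝐎a, 𝐎b and 𝐎(t) of this example are obtained by inserting u(t)=t and v(t)=1 (so that q(t)=t and h_j(t)=f_j(t)) into the formulas of Corollary 3.2. A short symbolic sketch of this computation (illustrative only, with c=1 and m=4; it is not part of the paper) is the following.

import sympy as sp

# Symbolic evaluation of the Corollary 3.2 formulas for the Brownian-motion kernel
# u(t) = t, v(t) = 1 and f_j(t) = t^{j-1}, m = 4; illustration only, constant c = 1.
t, a, b = sp.symbols('t a b', positive=True)
m = 4
u, v = t, sp.Integer(1)
q = u / v
f = [t ** j for j in range(m)]
h = [fj / v for fj in f]
O_a = [sp.simplify(((f[j] * sp.diff(u, t) / u - sp.diff(f[j], t))
                    / (f[j] * v ** 2 * sp.diff(q, t))).subs(t, a)) for j in range(m)]
O_b = [sp.simplify((sp.diff(h[j], t) / (f[j] * v * sp.diff(q, t))).subs(t, b)) for j in range(m)]
O_t = [sp.simplify(-sp.diff(sp.diff(h[j], t) / sp.diff(q, t), t) / (f[j] * v)) for j in range(m)]
print(O_a)   # [1/a, 0, -1/a, -2/a]
print(O_b)   # [0, 1/b, 2/b, 3/b]
print(O_t)   # [0, 0, -2/t**2, -6/t**2]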

As another example, consider the polynomial regression model with f(t)=(1,t,t2,,tm1)Tf(t)=(1,t,t^{2},\ldots,t^{m-1})^{T}, t[a,b]t\in[a,b], and the triangular covariance kernel of the form (2.4) with u(t)=tγu(t)=t^{\gamma} and v(t)=tωv(t)=t^{\omega}.

For the diagonal representation we have from Corollary 3.2 that the optimal matrix weighted design has masses 𝐎a\mathbf{O}_{a} and 𝐎b\mathbf{O}_{b} at points aa and bb, respectively, and the density 𝐎(t)\mathbf{O}(t), where

𝐎a\displaystyle\mathbf{O}_{a} =\displaystyle= aγωdiag(γ,1γ,2γ,,m1γ),\displaystyle a^{-\gamma-\omega}\mathrm{diag}(-\gamma,1-\gamma,2-\gamma,\ldots,m-1-\gamma),
𝐎b\displaystyle\mathbf{O}_{b} =\displaystyle= bγωdiag(ω,ω1,ω2,,ω+1m),\displaystyle b^{-\gamma-\omega}\mathrm{diag}(\omega,\omega-1,\omega-2,\ldots,\omega+1-m),
𝐎(t)\displaystyle\mathbf{O}(t) =\displaystyle= t1γωdiag(τ1,τ2,,τm),t(a,b),\displaystyle t^{-1-\gamma-\omega}\mathrm{diag}(\tau_{1},\tau_{2},\ldots,\tau_{m}),~t\in(a,b),

with τi=(i1γ)(i1ω)\tau_{i}=(i-1-\gamma)(i-1-\omega), i=1,,mi=1,\ldots,m. If we further use Λ=diag(1/τ1,1/τ2,,1/τm)\Lambda=\mathrm{diag}(1/\tau_{1},1/\tau_{2},\ldots,1/\tau_{m}) then we obtain Λ𝐎(t)=t1γωdiag(1,1,,1),t(a,b)\Lambda\mathbf{O}(t)=t^{-1-\gamma-\omega}\mathrm{diag}(1,1,\ldots,1),~t\in(a,b); that is, all components of the matrix Λ𝐎(t)\Lambda\mathbf{O}(t) have exactly the same density.

3.5 Practical implementation

Here we only consider the diagonal representation of the matrices 𝐎a\mathbf{O}_{a}, 𝐎b\mathbf{O}_{b} and 𝐎(t)\mathbf{O}(t); the case of one-column representation of the matrices 𝐎\mathbf{O} can be treated similarly. We assign matrix weights 𝐎a\mathbf{O}_{a} and 𝐎b\mathbf{O}_{b} to the boundary points aa and bb and use an NN-point approximation to an absolutely continuous probability measure on (a,b)(a,b) with some density φ(t)\varphi(t). The density φ(t)\varphi(t) is defined to be either the uniform density on (a,b)(a,b) (if nonzero elements of different components of 𝐎(t)\mathbf{O}(t) are not proportional to each other) or φ(t)=c|𝐎l,l(t)|\varphi(t)=c|\mathbf{O}_{l,l}(t)| for some l{1,,m}l\in\{1,\dots,m\} (if nonzero elements of different components of 𝐎(t)\mathbf{O}(t) are proportional to each other), where cc is the normalization constant and ll is such that the density φ(t)\varphi(t) is not identically zero on the interval (a,b)(a,b). Denote by F(t)=atφ(s)𝑑sF(t)=\int_{a}^{t}\varphi(s)ds the corresponding c.d.f. For given NN, we calculate an NN-point approximation {t1,N,,tN,N;1/N,,1/N}\{t_{1,N},\ldots,t_{N,N};1/N,\ldots,1/N\}, where ti,N=F1(zi,N)t_{i,N}=F^{-1}(z_{i,N}) with zi,N=i/(N+1)z_{i,N}=i/(N+1), i=1,2,,Ni=1,2,\ldots,N, to the probability measure with density φ(t)\varphi(t).

To each point tj,Nt_{j,N} we assign a vector of weights sj=(sj,N,1,,sj,N,m)Ts_{j}=(s_{j,N,1},\ldots,s_{j,N,m})^{T} such that sj,N,k{1,0,1}s_{j,N,k}\in\{-1,0,1\} (k=1,,mk=1,\ldots,m). The values sj,N,k=sign(𝐎k,k(tj))=±1s_{j,N,k}=\mathrm{sign}(\mathbf{O}_{k,k}(t_{j}))=\pm 1 correspond to the sign of the point tj,Nt_{j,N} in the estimation of θk\theta_{k} exactly as in the procedure for one-parameter models described in Section 2.5. Some of the values sj,N,ks_{j,N,k} could be 0. If sj,N,k=0s_{j,N,k}=0 for some kk then the point tj,Nt_{j,N} is not used for the estimation of θk\theta_{k}. By assigning zero weight to a point tj,Nt_{j,N} in the kk-th estimation direction, we perform a thinning of the sample of points t1,N,,tN,Nt_{1,N},\ldots,t_{N,N} in the kk-th direction and thus achieve the required density in each estimation direction. This is a deterministic version of the well-known ‘rejection method’ widely used to generate samples from various probability distributions. If the nonzero components of the matrix weight 𝐎(t)\mathbf{O}(t) are proportional to each other then for these components sj,N,k=1s_{j,N,k}=1 for all jj and NN.
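
When the nonzero components of 𝐎(t) are proportional to each other (as in the first example of Section 3.4, where they are all multiples of 1/t²), the construction of the interior points and of the signs is fully explicit. The following sketch is our own illustration and not the authors' code: it takes φ(t)=c/t² on (a,b), uses the closed-form inverse of the corresponding c.d.f. and sets s_{j,N,k}=sign(O_{k,k}(t_{j,N})); the thinning step needed for non-proportional components is not reproduced here.

import numpy as np

# Illustration of the point construction of Section 3.5 for proportional components:
# phi(t) = c / t^2 on (a, b), so F(t) = (1/a - 1/t)/(1/a - 1/b) and
# F^{-1}(z) = 1 / (1/a - z (1/a - 1/b)); here [a, b] = [1, 2].
a, b, N = 1.0, 2.0, 10
z = np.arange(1, N + 1) / (N + 1)
t = 1.0 / (1.0 / a - z * (1.0 / a - 1.0 / b))          # interior points t_{1,N}, ..., t_{N,N}
# Diagonal matrix weights of the first example in Section 3.4 (Brownian motion, m = 4)
O_kk = lambda s: np.array([0.0, 0.0, -2.0 / s ** 2, -6.0 / s ** 2])
signs = np.array([np.sign(O_kk(ti)) for ti in t])      # s_{j,N,k} = sign(O_{k,k}(t_{j,N}))
print(np.round(t, 3))
print(signs.astype(int))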

The resulting estimator θ^\hat{\theta} has the form (3.2) where

𝐂\displaystyle\mathbf{C} =\displaystyle= (N𝐎af(a),𝐒1𝐏f(t1),,𝐒N𝐏f(tN),N𝐎bf(b)),\displaystyle\Big{(}N\mathbf{O}_{a}f(a),\mathbf{S}_{1}\mathbf{P}f(t_{1}),\ldots,\mathbf{S}_{N}\mathbf{P}f(t_{N}),N\mathbf{O}_{b}f(b)\Big{)},
𝐒j\displaystyle\mathbf{S}_{j} =\displaystyle= diag(sj,N,1,,sj,N,m)m×m\displaystyle\mathrm{diag}(s_{j,N,1},\ldots,s_{j,N,m})\in\mathbb{R}^{m\times m}

and 𝐏\mathbf{P} is the diagonal m×mm\times m matrix whose diagonal elements are given by

𝐏k,k=Nj=1N|sj,N,k|ab𝐎k,k(t)𝑑t.\mathbf{P}_{k,k}=\frac{N}{\sum_{j=1}^{N}|s_{j,N,k}|}\int_{a}^{b}\mathbf{O}_{k,k}(t)dt.

If nonzero elements of different components of the matrix weight 𝐎(t)\mathbf{O}(t) are proportional to each other (as was the case in the examples of Section 3.4) then the (N+2)(N\!+\!2)-point approximations to the limiting design are very similar to the approximations in the one-parameter case considered in Section 2.5; their accuracy is also very high. Otherwise, when the diagonal elements of 𝐎(t)\mathbf{O}(t) are possibly non-proportional, the accuracy of approximations will depend on the degree of non-homogeneity of components of the matrix weight 𝐎(t)\mathbf{O}(t).

3.6 Some numerical results

For the comparison of competing matrix-weighted designs in multiparameter models it is convenient to consider a functional of the covariance matrix. As an example, we investigate in this section the classical DD-optimality criterion, defined as Ψ(𝐃(ξ))=(det𝐃(ξ))1/m\Psi(\mathbf{D}(\xi))=(\det\mathbf{D}(\xi))^{1/m}, which has to be minimized.
As an example where all nonzero elements of the matrix 𝐎(t)\mathbf{O}(t) are proportional to each other, let us consider the cubic regression model with f(t)=(1,t,t2,t3)Tf(t)=(1,t,t^{2},t^{3})^{T} and the Brownian motion error process. The optimal value in the continuous time model (1.2) is Ψ(𝐃)2.7927\Psi(\mathbf{D}^{*})\simeq 2.7927. In Figure 4 we display the DD-criterion of the covariance matrices of the MWE and the BLUE for the proposed (N+2)(N\!+\!2)-point designs and the covariance matrix of the BLUE with corresponding optimal (N+2)(N\!+\!2)-point designs. We can see that the DD-efficiency of the proposed matrix-weighted design is very high, even for small NN.

Figure 4: The DD-optimality criterion of the covariance matrix of the MWE for the proposed (N+2)(N\!+\!2)-point designs (crosses), of the covariance matrix of the BLUE for the proposed (N+2)(N\!+\!2)-point designs (line) and of the covariance matrix of the BLUE with corresponding DD-optimal (N+2)(N\!+\!2)-point designs (grey circles). The error process in model (1.1) is the Brownian motion and the vector of regression functions is given by f(t)=(1,t,t2,t3)f(t)=(1,t,t^{2},t^{3}), t[1,2]t\in[1,2].

The second example of this section considers a situation where nonzero elements of the matrix 𝐎(t)\mathbf{O}(t) are not proportional to each other. For this purpose we consider the model (1.1) with f(t)=(1,t,t2)Tf(t)=(1,t,t^{2})^{T}, t[1,2]t\in[1,2] and covariance kernel K(t,t)=e|tt|K(t,t^{\prime})=e^{-|t-t^{\prime}|} with u(t)=etu(t)=e^{t} and v(t)=etv(t)=e^{-t}. Using the diagonal representation, we obtain for the optimal matrix-weighted design

𝐎a=diag(1,0,1),𝐎b=diag(1,1.5,2),𝐎(t)=diag(1,1,12/t2),t(1,2).\displaystyle\mathbf{O}_{a}=\mathrm{diag}(1,0,-1),\;\;\mathbf{O}_{b}=\mathrm{diag}(1,1.5,2),\;\;\mathbf{O}(t)=\mathrm{diag}(1,1,1-2/t^{2}),~t\in(1,2).

The optimal value in the continuous time model (1.2) is given by Ψ(𝐃)1.6779\Psi(\mathbf{D}^{*})\simeq 1.6779. Since some diagonal elements of 𝐎(t)\mathbf{O}(t) are constant functions, we take the support points of the design ξN+2\xi_{N+2} to be equidistant: ti,N=1+i/(N+1)t_{i,N}=1+i/(N+1) for i=1,,Ni=1,\ldots,N. Then we have sj,N,k=1s_{j,N,k}=1 for all j=1,,Nj=1,\ldots,N and k=1,2k=1,2. However, some elements of (s1,N,3,,sN,N,3)(s_{1,N,3},\ldots,s_{N,N,3}) should be zero because 𝐎3,3(t)\mathbf{O}_{3,3}(t) is not proportional to 𝐎1,1(t)\mathbf{O}_{1,1}(t). For example, for N=10N=10 the vector of signs (s1,N,3,,sN,N,3)(s_{1,N,3},\ldots,s_{N,N,3}) is (1,1,0,0,0,1,0,0,1,0)(-1,-1,0,0,0,1,0,0,1,0) and for N=30N=30 it is
(1,0,1,1,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1).(-1,0,-1,-1,0,-1,0,0,-1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1)\,.

In Figure 5 we depict the DD-optimality criterion of the covariance matrices for various estimators. We observe that in this example for all NN the DD-optimality criterion of the covariance matrices of the MWE is slightly larger than the DD-optimality criterion of the covariance matrices of the BLUE. However, we can also see that the proposed (N+2)(N\!+\!2)-point designs are very efficient compared to the BLUE with corresponding DD-optimal (N+2)(N\!+\!2)-point designs even for small NN.

Figure 5: The DD-optimality criterion of the covariance matrix of the MWE for the proposed (N+2)(N\!+\!2)-point designs (crosses), of the covariance matrix of the BLUE for the proposed (N+2)(N\!+\!2)-point designs (line) and of the BLUE with corresponding DD-optimal (N+2)(N\!+\!2)-point designs (grey circles). The covariance kernel in model (1.1) is K(t,t)=e|tt|K(t,t^{\prime})=e^{-|t-t^{\prime}|} and the vector of regression functions is f(t)=(1,t,t2)f(t)=(1,t,t^{2}), t[1,2]t\in[1,2].

Acknowledgements. This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Teilprojekt C2) of the German Research Foundation (DFG). The research of H. Dette reported in this publication was also partially supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM107639. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We would also like to thank Martina Stein who typed parts of this paper with considerable technical expertise. The work of Andrey Pepelyshev was partly supported by the Russian Foundation for Basic Research, project 12-01-00747.

Appendix A Proof of main results

A.1 Explicit form of the inverse of the covariance matrix of errors

Here we state an auxiliary result, which gives an explicit form for the inverse of the matrix 𝚺=(K(ti,tj))i,j=1N\mathbf{\Sigma}=(K(t_{i},t_{j}))^{N}_{i,j=1}, with a triangular covariance kernel KK. We did not find this result (as formulated below) in the literature. Versions of Lemma A.1, however, have been derived independently by different authors; see, for example, Lemma 7.3.2 in Zhigljavsky, (1991) and formula (8) in Harman and Štulajter, (2011). The proof follows by a straightforward verification of the condition 𝚺1𝚺=𝚺𝚺1=𝐈\mathbf{\Sigma}^{-1}\mathbf{\Sigma}=\mathbf{\Sigma}\mathbf{\Sigma}^{-1}=\mathbf{I}.

Lemma A.1

Consider a symmetric N×NN\times N matrix 𝚺=(σi,j)i,j=1,,N\mathbf{\Sigma}=(\sigma_{i,j})_{i,j=1,\dots,N} whose elements are defined by the formula σi,j=uivj{\sigma}_{i,j}=u_{i}v_{j} for 1ijN1\leq i\leq j\leq N. Assume that q1<q2<<qNq_{1}<q_{2}<\ldots<q_{N} where qi=ui/viq_{i}=u_{i}/v_{i}. Then the inverse matrix 𝚺~=𝚺1\widetilde{\mathbf{\Sigma}}=\mathbf{\Sigma}^{-1} is a symmetric tri-diagonal matrix and its elements σ~i,j\widetilde{{\sigma}}_{i,j} with iji\leq j can be computed as follows:

σ~1,1=u2u1v1v2(q2q1),σ~N,N=1vN2(qNqN1),\displaystyle\widetilde{{\sigma}}_{1,1}={\frac{u_{2}}{u_{1}v_{1}v_{2}\left(q_{2}-q_{1}\right)}}\,,\;\;\;\widetilde{{\sigma}}_{N,N}=\frac{1}{v_{N}^{2}\left(q_{N}-q_{{N-1}}\right)}\,,
σ~i,i=qi+1qi1vi2(qiqi1)(qi+1qi)(i=2,,N1),\displaystyle\widetilde{{\sigma}}_{i,i}=\frac{q_{{i+1}}-q_{{i-1}}}{v_{i}^{2}(q_{{i}}-q_{i-1})(q_{{i+1}}-q_{i})}\,\;(i=2,\ldots,N-1)\,,
σ~i,i+1=1vivi+1(qi+1qi)(i=1,,N1),\displaystyle\widetilde{{\sigma}}_{i,i+1}=-\frac{1}{v_{i}v_{i+1}(q_{i+1}-q_{i})}\,\;(i=1,\ldots,N-1)\,,
σ~i,i+k=0(i=1,,N2,k2).\displaystyle\widetilde{{\sigma}}_{i,i+k}=0\,\;(i=1,\ldots,N-2,\;\;k\geq 2)\,.

In our applications of Lemma A.1 we assume that σi,j=K(ti,tj)\sigma_{i,j}=K(t_{i},t_{j}) with the covariance kernel KK having the form (2.4).
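
A quick numerical confirmation of Lemma A.1 (illustrative only, not part of the paper) can be obtained, for instance, for the exponential kernel K(t,t')=e^{-|t-t'|}, which is of the form (2.4) with u(t)=e^t and v(t)=e^{-t}.

import numpy as np

# Sketch: the tridiagonal matrix of Lemma A.1 compared with a direct inverse,
# for the exponential kernel u(t) = exp(t), v(t) = exp(-t) (illustration only).
t = np.linspace(1.0, 2.0, 7)
u, v = np.exp(t), np.exp(-t)
q = u / v
N = len(t)
Sigma = np.array([[u[min(i, j)] * v[max(i, j)] for j in range(N)] for i in range(N)])
S = np.zeros((N, N))
S[0, 0] = u[1] / (u[0] * v[0] * v[1] * (q[1] - q[0]))
S[N - 1, N - 1] = 1.0 / (v[N - 1] ** 2 * (q[N - 1] - q[N - 2]))
for i in range(1, N - 1):
    S[i, i] = (q[i + 1] - q[i - 1]) / (v[i] ** 2 * (q[i] - q[i - 1]) * (q[i + 1] - q[i]))
for i in range(N - 1):
    S[i, i + 1] = S[i + 1, i] = -1.0 / (v[i] * v[i + 1] * (q[i + 1] - q[i]))
print(np.max(np.abs(S - np.linalg.inv(Sigma))))       # close to machine precision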

A.2 Proof of Lemma 2.1

Denote Kij=K(ti,tj)K_{ij}=K(t_{i},t_{j}), f(ti)=fif(t_{i})=f_{i}, ai=fiwia_{i}=f_{i}w_{i}, i,j=1,,Ni,j=1,\ldots,N, 𝐚=(a1,,aN)T\mathbf{a}=(a_{1},\ldots,a_{N})^{T}. Then for any signed measure ξ={t1,,tN;w1,,wN}\xi=\{t_{1},\ldots,t_{N};w_{1},\ldots,w_{N}\} we have

D(ξ)=ijKijfifjwiwj(ifi2wi)2=ijKijaiaj(ifiai)2=𝐚T𝚺𝐚(𝐚T𝐟)2.\displaystyle D(\xi)=\frac{\sum_{i}\sum_{j}K_{ij}f_{i}f_{j}w_{i}w_{j}}{(\sum_{i}f_{i}^{2}w_{i})^{2}}=\frac{\sum_{i}\sum_{j}K_{ij}a_{i}a_{j}}{(\sum_{i}f_{i}a_{i})^{2}}=\frac{\mathbf{a}^{T}\mathbf{\Sigma}\mathbf{a}}{(\mathbf{a}^{T}\mathbf{f})^{2}}.

Since 𝚺\mathbf{\Sigma} is symmetric and 𝚺>𝟎\mathbf{\Sigma>0}, there exists 𝚺1\mathbf{\Sigma}^{-1} and a symmetric matrix 𝚺1/2>0\mathbf{\Sigma}^{1/2}>0 such that 𝚺=𝚺1/2𝚺1/2\mathbf{\Sigma}=\mathbf{\Sigma}^{1/2}\mathbf{\Sigma}^{1/2}. Denote 𝐛=𝚺1/2𝐚\mathbf{b}=\mathbf{\Sigma}^{1/2}\mathbf{a} and 𝐝=𝚺1/2𝐟\mathbf{d}=\mathbf{\Sigma}^{-1/2}\mathbf{f}. Then we can write the design optimality criterion D(ξ)D(\xi) as D(ξ)=𝐛T𝐛/(𝐛T𝐝)2D(\xi)=\mathbf{b}^{T}\mathbf{b}/(\mathbf{b}^{T}\mathbf{d})^{2}. The Cauchy-Schwarz inequality gives for any two vectors 𝐛\mathbf{b} and 𝐝\mathbf{d} the inequality (𝐛T𝐝)2(𝐛T𝐛)(𝐝T𝐝)(\mathbf{b}^{T}\mathbf{d})^{2}\leq(\mathbf{b}^{T}\mathbf{b})(\mathbf{d}^{T}\mathbf{d}), that is, 𝐛T𝐛/(𝐛T𝐝)21/(𝐝T𝐝)\mathbf{b}^{T}\mathbf{b}/(\mathbf{b}^{T}\mathbf{d})^{2}\geq 1/(\mathbf{d}^{T}\mathbf{d}). This inequality with 𝐛\mathbf{b} and 𝐝\mathbf{d} as above is equivalent to D(ξ)1/𝐟T𝚺1𝐟D(\xi)\geq{1}/{\mathbf{f}^{T}\,\mathbf{\Sigma}^{-1}\mathbf{f}} for all ξ\xi. Equality is attained if the vector 𝐛\mathbf{b} is proportional to the vector 𝐝\mathbf{d}; that is, if bi=cdib_{i}=cd_{i} for all ii and some c0c\neq 0. Finally, the equality bi=cdib_{i}=cd_{i} can be rewritten in the form wi=c(𝚺1𝐟)i/f(ti)w_{i}=c{(\mathbf{\Sigma}^{-1}\mathbf{f})_{i}}/{f(t_{i})}.
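
The argument can also be illustrated numerically. The sketch below is not part of the paper; it assumes the Brownian-motion kernel and f(t)=1+t² on [1,2] and checks that the weights w_i=c(Σ^{-1}f)_i/f(t_i) attain the lower bound 1/(f^TΣ^{-1}f), while, for instance, uniform weights do not.

import numpy as np

# Numerical illustration of Lemma 2.1 (not the authors' code): Brownian-motion
# kernel K(t, s) = min(t, s) and f(t) = 1 + t^2 on [1, 2].
t = np.linspace(1.0, 2.0, 8)
f = 1.0 + t ** 2
Sigma = np.minimum.outer(t, t)
def D(w):                                         # criterion D(xi) for weights w
    a = f * w
    return (a @ Sigma @ a) / (f @ a) ** 2
w_opt = np.linalg.solve(Sigma, f) / f             # w_i = c (Sigma^{-1} f)_i / f(t_i), c = 1
bound = 1.0 / (f @ np.linalg.solve(Sigma, f))
print(D(w_opt) - bound)                           # ~0: the bound is attained
print(D(np.full_like(t, 1.0 / len(t))) > bound)   # True: uniform weights are worse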

A.3 Proof of Theorem 2.1

Before starting the main proof we recall the definition of the design points (2.9) and prove the following auxiliary result.

Lemma A.2

Assume that q()=u()/v()q(\cdot)=u(\cdot)/v(\cdot) is a twice continuously differentiable function on the interval [a,b][a,b]. Then for all i=1,,N1i=1,\ldots,N-1, we have

ti+1,Nti,N\displaystyle t_{i+1,N}-t_{i,N} =\displaystyle= 1NQ(ti,N)+O(1N2)asN.\displaystyle\frac{1}{NQ^{\prime}(t_{i,N})}+O\Bigl{(}\frac{1}{N^{2}}\Bigr{)}\;\;{\rm as}\;N\to\infty. (A.1)
ΔN=(ti+1,Nti1,N)/2\displaystyle\Delta_{N}=(t_{i+1,N}-t_{i-1,N})/2 =\displaystyle= 1NQ(ti,N)(1+O(1N))asN.\displaystyle\frac{1}{NQ^{\prime}(t_{i,N})}\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)}\,\;{\rm as}\;N\to\infty. (A.2)

Proof of Lemma A.2. Recall the definition zi,N=(i1/2)/Nz_{i,N}=(i-1/2)/N (i=1,,Ni=1,\ldots,N) and set

m=q(a)=mint[a,b]q(t),M=q(b)=maxt[a,b]q(t).m=q(a)=\min_{t\in[a,b]}q(t)~,~~M=q(b)=\max_{t\in[a,b]}q(t).

From the definition of the function QQ in (2.8) we have

q(ti+1,N)q(ti,N)=(Mm)(zi+1,Nzi,N)=MmN\displaystyle q(t_{i+1,N})-q(t_{i,N})=(M-m)(z_{i+1,N}-z_{i,N})=\frac{M-m}{N}\, (A.3)

for all i=1,,N1i=1,\ldots,N-1. Observing Taylor’s formula yields for any zz

Q1(z+δ)=Q1(z)+δ(Q1)(z)+O(δ2)asδ0.Q^{-1}(z+\delta)=Q^{-1}(z)+\delta\cdot{(Q^{-1})}^{\prime}(z)+O(\delta^{2})\;\;{\rm as}\;\delta\to 0.

In this formula, set z=zi,Nz=z_{i,N} and δN=1/N\delta_{N}=1/N so that z+δ=zi+1,Nz+\delta=z_{i+1,N}. We thus obtain

ti+1,Nti,N=Q1(zi+1,N)Q1(zi,N)=1N(Q1)(zi,N)+O(1N2)asN.t_{i+1,N}-t_{i,N}=Q^{-1}(z_{i+1,N})-Q^{-1}(z_{i,N})=\frac{1}{N}\cdot{(Q^{-1})}^{\prime}(z_{i,N})+O\Bigl{(}\frac{1}{N^{2}}\Bigr{)}\;\;{\rm as}\;N\to\infty.

By using (2.9) and the relation (Q1)(z)=1/Q(Q1(z)){\left(Q^{-1}\right)}^{\prime}(z)=1/Q^{\prime}(Q^{-1}(z)) we can rewrite this in the form (A.1). The second statement obviously follows from (A.1).

Proof of Theorem 2.1. In view of Lemma 2.1 and the representation of 𝚺1\mathbf{\Sigma}^{-1} in Lemma A.1 we have

w1,N=cNu2f1v1v2(f1u1f2u2),wN,N=cNfNvN(fNvNfN1vN1),\displaystyle w_{1,N}=\frac{c_{N}u_{2}}{f_{1}v_{1}v_{2}}\Bigl{(}\frac{f_{{1}}}{u_{1}}-\frac{f_{2}}{u_{2}}\Bigr{)}\,\,,\;\;\;w_{N,N}=\frac{c_{N}}{f_{N}v_{N}}\Bigl{(}\frac{f_{N}}{v_{N}}-\frac{f_{N-1}}{v_{N-1}}\Bigr{)}\,,
wi,N=cNfivi(2fivifi1vi1fi+1vi+1) for i=2,,N1,\displaystyle w_{i,N}=\frac{c_{N}}{f_{i}v_{i}}\Bigl{(}\frac{2f_{i}}{v_{i}}-\frac{f_{i-1}}{v_{i-1}}-\frac{f_{i+1}}{v_{i+1}}\Bigr{)}\,\;\;\mbox{ for $i=2,\ldots,N-1$},

where we have used the relations (A.3). Here cNc_{N} is the normalization constant ensuring i=1N|wi,N|=1\sum_{i=1}^{N}|w_{i,N}|=1 and we use the notation ui=u(ti,N)u_{i}=u(t_{i,N}), vi=v(ti,N)v_{i}=v(t_{i,N}) and fi=f(ti,N)f_{i}=f(t_{i,N}).

Consider first w1,Nw_{1,N}. Denote g(t)=f(t)/u(t)g(t)=f(t)/u(t), then

w1,N/cN=u(t2,N)f(t1,N)v(t1,N)v(t2,N)(g(t1,N)g(t2,N)),\displaystyle w_{1,N}/c_{N}=\frac{u(t_{2,N})}{f(t_{1,N})v(t_{1,N})v(t_{2,N})}\left(g(t_{1,N})-g(t_{2,N})\right)\,, (A.4)

which gives

u(t2,N)f(t1,N)v(t1,N)v(t2,N)=u(a)f(a)v2(a)(1+O(1N)),\displaystyle\frac{u(t_{2,N})}{f(t_{1,N})v(t_{1,N})v(t_{2,N})}=\frac{u(a)}{f(a)v^{2}(a)}\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)}\,, (A.5)
g(t1,N)g(a)=g(a)(t1,Na)+O((t1,Na)2)=g(a)12NQ(a)+O(1N2)\displaystyle g(t_{1,N})-g(a)=g^{\prime}(a)\left(t_{1,N}-a\right)+O\left(\left(t_{1,N}-a\right)^{2}\right)=g^{\prime}(a)\cdot\frac{1}{2NQ^{\prime}(a)}+O\Bigl{(}\frac{1}{N^{2}}\Bigr{)}

as NN\to\infty. Similarly

g(t2,N)g(a)=g(a)32NQ(a)+O(1N2)\displaystyle g(t_{2,N})-g(a)=g^{\prime}(a)\frac{3}{2NQ^{\prime}(a)}+O\Bigl{(}\frac{1}{N^{2}}\Bigr{)}

yielding

g(t1,N)g(t2,N)=g(a)1NQ(a)+O(1N2).\displaystyle g(t_{1,N})-g(t_{2,N})=-g^{\prime}(a)\frac{1}{NQ^{\prime}(a)}+O\Bigl{(}\frac{1}{N^{2}}\Bigr{)}\,. (A.6)

Combining (A.4), (A.5) and (A.6) we obtain

w1,NcN\displaystyle\frac{w_{1,N}}{c_{N}} =\displaystyle= 1Nu(a)g(a)f(a)v2(a)Q(a)(1+O(1N))\displaystyle-\frac{1}{N}\cdot\frac{u(a)g^{\prime}(a)}{f(a)v^{2}(a)Q^{\prime}(a)}\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)} (A.7)
=\displaystyle= 1N1v2(a)Q(a)[u(a)u(a)f(a)f(a)](1+O(1N))\displaystyle\frac{1}{N}\cdot\frac{1}{v^{2}(a)Q^{\prime}(a)}\Bigl{[}\frac{u^{\prime}(a)}{u(a)}-\frac{f^{\prime}(a)}{f(a)}\Bigr{]}\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)}

as NN\to\infty. Similarly to (A.7) we get the asymptotic expression for wN,Nw_{N,N}:

wN,NcN=1Nh(b)f(b)v(b)Q(b)(1+O(1N))\displaystyle\frac{w_{N,N}}{c_{N}}=\frac{1}{N}\cdot\frac{h^{\prime}(b)}{f(b)v(b)Q^{\prime}(b)}\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)} (A.8)

as NN\to\infty. Consider now the weights

wi,N=cN2h(ti,N)h(ti1,N)h(ti+1,N)f(ti,N)v(ti,N) (i=2,,N1).\displaystyle w_{i,N}=c_{N}\frac{2h(t_{i,N})-h(t_{i-1,N})-h(t_{i+1,N})}{f(t_{i,N})v(t_{i,N})}\,\;\;\mbox{ ($i=2,\ldots,N-1$)}. (A.9)

Assume NN\to\infty and i=i(N)i=i(N) is such that i(N)/N=z+O(1/N)i(N)/N=z+O(1/N) as NN\to\infty for some z(0,1)z\in(0,1), and set t=Q1(z)t=Q^{-1}(z).

We are going to prove that

wi,NcN\displaystyle\frac{w_{i,N}}{c_{N}} =\displaystyle= 2h(ti,N)h(ti1,N)h(ti+1,N)f(ti,N)v(ti,N)\displaystyle\frac{2h(t_{i,N})-h(t_{i-1,N})-h(t_{i+1,N})}{f(t_{i,N})v(t_{i,N})}\,
=\displaystyle= 1N2(Q(t))2f(t)v(t)[h(t)Q′′(t)Q(t)h′′(t)](1+O(1N))\displaystyle\frac{1}{N^{2}(Q^{\prime}(t))^{2}f(t)v(t)}\Bigl{[}\frac{h^{\prime}(t)Q^{\prime\prime}(t)}{Q^{\prime}(t)}-h^{\prime\prime}(t)\Bigr{]}\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)}\,
=\displaystyle= 1N2Q(t)f(t)v(t)[h(t)Q(t)](1+O(1N)).\displaystyle-\frac{1}{N^{2}Q^{\prime}(t)f(t)v(t)}\Bigl{[}\frac{h^{\prime}(t)}{Q^{\prime}(t)}\Bigr{]}^{\prime}\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)}\,. (A.11)

First, in view of (2.9) we have ti,N=t+O(1N)t_{i,N}=t+O\left(\frac{1}{N}\right) and hence

f(ti,N)v(ti,N)=f(t)v(t)(1+O(1N))asN.f(t_{i,N})v(t_{i,N})=f(t)v(t)\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)}\;\;{\rm as}\;N\to\infty\,.

Consider the numerator 2h(ti,N)h(ti1,N)h(ti+1,N)2h(t_{i,N})-h(t_{i-1,N})-h(t_{i+1,N}) in (A.9) and rewrite it as follows:

2h(ti,N)h(ti1,N)h(ti+1,N)=[2h(t~i,N)h(ti1,N)h(ti+1,N)]+2[h(ti,N)h(t~i,N)]\displaystyle 2h(t_{i,N})-h(t_{i-1,N})-h(t_{i+1,N})=\left[2h(\tilde{t}_{i,N})-h(t_{i-1,N})-h(t_{i+1,N})\right]+2\left[h({t}_{i,N})-h(\tilde{t}_{i,N})\right]

where t~i,N=(ti1,N+ti+1,N)/2.\tilde{t}_{i,N}=(t_{i-1,N}+t_{i+1,N})/2\,. We obviously have ti+1,N=t~i,N+ΔNt_{i+1,N}=\tilde{t}_{i,N}+\Delta_{N} and ti1,N=t~i,NΔN,t_{i-1,N}=\tilde{t}_{i,N}-\Delta_{N}, where ΔN=(ti+1,Nti1,N)/2\Delta_{N}=(t_{i+1,N}-t_{i-1,N})/2 is defined in (A.2). This yields

2h(t~i,N)h(ti1,N)h(ti+1,N)ΔN2=h′′(t)(1+O(1N)).\displaystyle\frac{2h(\tilde{t}_{i,N})-h(t_{i-1,N})-h(t_{i+1,N})}{\Delta_{N}^{2}}=-h^{\prime\prime}(t)\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)}\,. (A.12)

Next we consider

h(ti,N)h(t~i,N)ΔN2=h(ti,N)h(t~i,N)ti,Nt~i,Nti,Nt~i,NΔN2.\displaystyle\frac{h({t}_{i,N})-h(\tilde{t}_{i,N})}{\Delta_{N}^{2}}=\frac{h({t}_{i,N})-h(\tilde{t}_{i,N})}{{t}_{i,N}-\tilde{t}_{i,N}}\cdot\frac{{t}_{i,N}-\tilde{t}_{i,N}}{\Delta_{N}^{2}}.

For the first factor we have

h(ti,N)h(t~i,N)ti,Nt~i,N=h(t)(1+O(1N)),\displaystyle\frac{h({t}_{i,N})-h(\tilde{t}_{i,N})}{{t}_{i,N}-\tilde{t}_{i,N}}=h^{\prime}(t)\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)}\,,

while the second factor gives

ti,Nt~i,NΔN2\displaystyle\frac{{t}_{i,N}-\tilde{t}_{i,N}}{\Delta_{N}^{2}} =\displaystyle= 22ti,Nti+1,Nti1,N(ti+1,Nti1,N)2\displaystyle 2\frac{2t_{i,N}-t_{i+1,N}-t_{i-1,N}}{(t_{i+1,N}-t_{i-1,N})^{2}}
=\displaystyle= 22Q1(zi,N)Q1(zi+1,N)Q1(zi1,N)1/N2(1/NQ1(zi+1,N)Q1(zi1,N))2\displaystyle 2\frac{2Q^{-1}(z_{i,N})-Q^{-1}(z_{i+1,N})-Q^{-1}(z_{i-1,N})}{1/N^{2}}\Big{(}\frac{1/N}{Q^{-1}(z_{i+1,N})-Q^{-1}(z_{i-1,N})}\Big{)}^{2}
=\displaystyle= 2(Q1)′′(z)/(2(Q1)(z))2(1+O(1N))=Q′′(t)/(2Q(t))(1+O(1N)),\displaystyle-2{(Q^{-1})}^{\prime\prime}(z)/(2{(Q^{-1})}^{\prime}(z))^{2}\Big{(}1+O\Big{(}\frac{1}{N}\Big{)}\Big{)}=Q^{\prime\prime}(t)/(2Q^{\prime}(t))\Big{(}1+O\Big{(}\frac{1}{N}\Big{)}\Big{)}\,,

where we have used the relation (Q1)′′(z)=Q′′(t)/(Q(t))3{\left(Q^{-1}\right)}^{\prime\prime}(z)=-Q^{\prime\prime}(t)/(Q^{\prime}(t))^{3} with t=Q1(z)t=Q^{-1}(z) in the last equation. This gives, as NN\to\infty,

2h(ti,N)h(t~i,N)ΔN2=2h(ti,N)h(t~i,N)ti,Nt~i,Nti,Nt~i,NΔN2=h(t)Q′′(t)Q(t)(1+O(1N)).\displaystyle 2\frac{h({t}_{i,N})-h(\tilde{t}_{i,N})}{\Delta_{N}^{2}}=2\frac{h({t}_{i,N})-h(\tilde{t}_{i,N})}{{t}_{i,N}-\tilde{t}_{i,N}}\cdot\frac{{t}_{i,N}-\tilde{t}_{i,N}}{\Delta_{N}^{2}}=\frac{h^{\prime}(t)Q^{\prime\prime}(t)}{Q^{\prime}(t)}\Bigl{(}1+O\Bigl{(}\frac{1}{N}\Bigr{)}\Bigr{)}.\;\;\; (A.13)

Combining the expressions (A.2), (A.12), (A.13) and the expansion of f(ti,N)v(ti,N)f(t_{i,N})v(t_{i,N}) given above yields the asymptotic expression (A.11) for wi,N/cN{w_{i,N}}/{c_{N}}.

By noting that cN=NC(1+O(1N))c_{N}=NC\left(1+O\left(\frac{1}{N}\right)\right) as NN\to\infty and that the asymptotic density of the points ti,Nt_{i,N} (i=1,,Ni=1,\ldots,N) is Q(t)Q^{\prime}(t) on the interval [a,b][a,b], we deduce the statement of the theorem as a consequence of the asymptotic formulas (A.7), (A.8) and (A.11) for w1,N/cN{w_{1,N}}/{c_{N}}, wN,N/cN{w_{N,N}}/{c_{N}} and wi,N/cN{w_{i,N}}/{c_{N}} respectively.

A.4 Proof of Theorem 2.2

By Theorem 3.3 in Dette et al., (2013) a design minimizes the functional (2.15) if the identity

abmin(s,t)f(t)𝑑ξ(t)=λf(s)\displaystyle\int_{a}^{b}\min(s,t)f(t)d\xi(t)=\lambda f(s) (A.14)

holds ξa.e.\xi\!-\!a.e., where λ\lambda is some constant. We consider the design ξ=ξ\xi=\xi^{*} defined by (2.12) and (2.13) and verify for this design the condition (A.14). To do this we calculate by partial integration

1cabmin(s,t)f(t)p(t)𝑑t\displaystyle\frac{1}{c}\int_{a}^{b}\min(s,t)f(t)p(t)dt =\displaystyle= ast(f′′(t))𝑑t+sbs(f′′(t))𝑑t\displaystyle\int^{s}_{a}t(-f^{\prime\prime}(t))dt+\int^{b}_{s}s(-f^{\prime\prime}(t))dt
=\displaystyle= {tf(t)|asasf(t)𝑑t}s{f(b)f(s)}\displaystyle-\Bigl{\{}tf^{\prime}(t)\Big{|}^{s}_{a}-\int^{s}_{a}f^{\prime}(t)dt\Bigr{\}}-s\Bigl{\{}f^{\prime}(b)-f^{\prime}(s)\Bigr{\}}
=\displaystyle= (af(a)f(a))sf(b)+f(s).\displaystyle(af^{\prime}(a)-f(a))-sf^{\prime}(b)+f(s).

Observing the definition of the masses in (2.13), the identity (A.14) follows with λ=c\lambda=c. This proves the first part of Theorem 2.2.

For a proof of the second statement consider a linear unbiased estimator θ^μ\hat{\theta}_{\mu^{*}} in model (2.1) based on the full trajectory, where μ(dt)=f(t)ξ(dt)\mu^{*}(dt)=f(t)\xi^{*}(dt) and ξ\xi^{*} is the design in (2.12), (2.13) with a constant cc chosen such that θ^μ\hat{\theta}_{\mu^{*}} is unbiased, that is

c=[f2(a)a+ab(f(t))2𝑑t]1.c^{*}=\Big{[}\frac{f^{2}(a)}{a}+\int^{b}_{a}(f^{\prime}(t))^{2}dt\Big{]}^{-1}.

Standard arguments of optimal design theory show that μ\mu^{*} minimizes Φ\Phi (that is, θ^μ\hat{\theta}_{\mu^{*}} is BLUE in model (2.1) where the full trajectory can be observed) if and only if the inequality

Φ(μ,ν)=ababK(x,y)μ(dx)ν(dy)Φ(μ)\displaystyle\Phi(\mu^{*},\nu)=\int^{b}_{a}\int^{b}_{a}K(x,y)\mu^{*}(dx)\nu(dy)\geq\Phi(\mu^{*}) (A.15)

holds for all signed measures ν\nu satisfying abf(t)ν(dt)=1\int^{b}_{a}f(t)\nu(dt)=1. Observing this condition and the identity (A.14) we obtain

Φ(μ,ν)=cabf(s)ν(ds)=[f2(a)a+ab(f(t))2𝑑t]1=Φ(μ)\Phi(\mu^{*},\nu)=c^{*}\int^{b}_{a}f(s)\nu(ds)=\Bigl{[}\frac{f^{2}(a)}{a}+\int^{b}_{a}(f^{\prime}(t))^{2}dt\Bigr{]}^{-1}=\Phi(\mu^{*})

for all signed measures ν\nu on [a,b][a,b] with abf(t)ν(dt)=1\int^{b}_{a}f(t)\nu(dt)=1. By (A.15) μ\mu^{*} minimizes Φ\Phi. Consequently, the corresponding estimator θ^μ\hat{\theta}_{\mu^{*}} is BLUE with minimal variance

D=c=[f2(a)a+ab(f(t))2𝑑t]1.D^{*}=c^{*}=\Bigl{[}\frac{f^{2}(a)}{a}+\int^{b}_{a}(f^{\prime}(t))^{2}dt\Bigr{]}^{-1}.
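
The value of D* can also be checked against the discrete BLUE on a fine grid. The following sketch (an illustration with f(t)=t² on [1,2], not taken from the paper) compares 1/(f^TΣ^{-1}f) with [f²(a)/a+∫_a^b(f′(t))²dt]^{-1}.

import numpy as np

# Sketch: discrete BLUE variance versus D* of Theorem 2.2 for the Brownian-motion
# kernel and f(t) = t^2 on [1, 2] (illustration only).
a, b, N = 1.0, 2.0, 1000
t = np.linspace(a, b, N)
f = t ** 2
Sigma = np.minimum.outer(t, t)
blue_var = 1.0 / (f @ np.linalg.solve(Sigma, f))
D_star = 1.0 / (f[0] ** 2 / a + 4.0 * (b ** 3 - a ** 3) / 3.0)   # int_1^2 (2t)^2 dt = 28/3
print(blue_var, D_star)                                          # both close to 0.0968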

A.5 Proof of Theorem 2.3

Let {ε~(s)|s[a~,b~]}\{\tilde{\varepsilon}(s)|s\in[\tilde{a},\tilde{b}]\} be a Brownian motion on the interval [a~,b~][\tilde{a},\tilde{b}] and consider the regression model (2.1) with some function f~(s)\tilde{f}(s) and this error process. By Theorem 2.2 the optimal design is given by

ξ~(ds)=P~a~δa~(ds)+P~b~δb~(ds)+p~(s)ds\tilde{\xi}^{*}(ds)=\tilde{P}_{\tilde{a}}\delta_{\tilde{a}}(ds)+\tilde{P}_{\tilde{b}}\delta_{\tilde{b}}(ds)+\tilde{p}(s)ds

with

P~a~=cf~(a~)f~(a~)a~a~f~(a~),P~b~=cf~(b~)f~(b~)andp~(s)=cf~′′(s)f~(s).\tilde{P}_{\tilde{a}}=c\ \frac{\tilde{f}(\tilde{a})-\tilde{f}^{\prime}(\tilde{a})\tilde{a}}{\tilde{a}\tilde{f}(\tilde{a})}\,,\;\;\tilde{P}_{\tilde{b}}=c\ \frac{\tilde{f}^{\prime}(\tilde{b})}{\tilde{f}(\tilde{b})}\qquad{\rm{and}}\qquad\tilde{p}(s)=-c\frac{\tilde{f}^{\prime\prime}(s)}{\tilde{f}(s)}\,.

We shall now use Theorem B.1 to derive the optimal design ξ(dt)\xi^{*}(dt) for the original regression model (2.1) with regression function f(t)f(t) and covariance kernel K(t,t)K(t,t^{\prime}) from the design ξ~(ds)\tilde{\xi}^{*}(ds) for the function f~(s)=h(q1(s))\tilde{f}(s)=h(q^{-1}(s)), where h(t)=f(t)/v(t)h(t)=f(t)/v(t).

For the Brownian motion, the covariance function is defined by (B.4) with v~(t)=1\tilde{v}(t)=1 and q~(t)=t\tilde{q}(t)=t so that by (B.6) we have β(t)=q(t)\beta(t)=q(t), α(t)=v(t)\alpha(t)=v(t) and α~(t)=1/v(q1(t)).\tilde{\alpha}(t)=1/v(q^{-1}(t)). According to (B.14) the optimal design dξ~(s)d\tilde{\xi}^{*}(s) transforms to dξ(t)=α~2(β(t))dξ~(β(t))=dξ~(q(t))/v2(t)d\xi^{*}(t)=\tilde{\alpha}^{2}(\beta(t))d\tilde{\xi}^{*}(\beta(t))=d\tilde{\xi}^{*}(q(t))/v^{2}(t).

Consider first the mass at bb. We have P~b~=cf~(b~)/f~(b~)\tilde{P}_{\tilde{b}}=c{\tilde{f}^{\prime}(\tilde{b})}/{\tilde{f}(\tilde{b})}. By using the transformation of ss into t=q1(s)t=q^{-1}(s), we obtain

Pb=P~bv2(b)=cf~(b~)f~(b~)v2(b)=ch(b)q(b)v2(b)h(b)=ch(b)q(b)v(b)f(b),P_{b}=\frac{\tilde{P}_{b}}{v^{2}(b)}=c\frac{\tilde{f}^{\prime}(\tilde{b})}{\tilde{f}(\tilde{b})v^{2}(b)}=c\frac{h^{\prime}(b)}{q^{\prime}(b)v^{2}(b)h(b)}=c\frac{h^{\prime}(b)}{q^{\prime}(b)v(b)f(b)}\,,

as required. From the representation of P~a~\tilde{P}_{\tilde{a}} we obtain by similar arguments

Pa=P~av2(a)=ch(a)q(a)h(a)/q(a)q(a)v2(a)h(a).P_{a}=\frac{\tilde{P}_{a}}{v^{2}(a)}=c\ \frac{h(a)-q(a)h^{\prime}(a)/q^{\prime}(a)}{q(a)v^{2}(a)h(a)}.

Let us now consider the density p~(s),\tilde{p}(s), s[a~,b~]s\in[\tilde{a},\tilde{b}], and rewrite dξ~p(β(t))d\tilde{\xi}^{*}_{p}(\beta(t)), the absolutely continuous part of the measure ξ~\tilde{\xi}^{*}. The transformation of the variable ss into t=q1(s)[a,b]t=q^{-1}(s)\in[a,b] induces the density

dξ~p(β(t))=p~(q(t))q(t)=cq(t)f~′′(q(t))f~(q(t)).\displaystyle d\tilde{\xi}^{*}_{p}(\beta(t))=\tilde{p}(q(t))q^{\prime}(t)=-cq^{\prime}(t)\frac{\tilde{f}^{\prime\prime}(q(t))}{\tilde{f}(q(t))}\,. (A.16)

Differentiating the equality f~(s)=h(q1(s))\tilde{f}(s)=h(q^{-1}(s)), we have

f~′′(s)=(h(q1(s))(q1(s)))=h′′(q1(s))((q1(s)))2+h(q1(s))(q1(s))′′.\displaystyle\tilde{f}^{\prime\prime}(s)=\left(h^{\prime}(q^{-1}(s))\cdot(q^{-1}(s))^{\prime}\right)^{\prime}=h^{\prime\prime}(q^{-1}(s))\cdot\left((q^{-1}(s))^{\prime}\right)^{2}+h^{\prime}(q^{-1}(s))\cdot\left(q^{-1}(s)\right)^{\prime\prime}.

Now we obtain

f~′′(q(t))=h′′(t)(q(t))2h(t)q′′(t)(q(t))3\displaystyle\tilde{f}^{\prime\prime}(q(t))=\frac{h^{\prime\prime}(t)}{(q^{\prime}(t))^{2}}-h^{\prime}(t)\cdot\frac{q^{\prime\prime}(t)}{(q^{\prime}(t))^{3}}

Inserting this into (A.16) and taking into account that f~(q(t))=h(t)\tilde{f}(q(t))=h(t), we obtain the density

dξ~p(β(t))=c1h(t)q(t)(h(t)q′′(t)q(t)h′′(t))=ch(t)[h(t)q(t)].\displaystyle d\tilde{\xi}^{*}_{p}(\beta(t))=c\,\frac{1}{h(t)q^{\prime}(t)}\Bigl{(}h^{\prime}(t)\cdot\frac{q^{\prime\prime}(t)}{q^{\prime}(t)}-h^{\prime\prime}(t)\Bigr{)}=-\frac{c}{h(t)}\left[\frac{h^{\prime}(t)}{q^{\prime}(t)}\right]^{\prime}\,. (A.17)

In view of the relation dξ(t)=α~2(β(t))dξ~(β(t))d\xi^{*}(t)=\tilde{\alpha}^{2}(\beta(t))d\tilde{\xi}^{*}(\beta(t)) we need to divide the right hand side in (A.17) by v2(t)v^{2}(t) and obtain the expression for the density (2.11). This completes the proof of Theorem 2.3.

Appendix B Gaussian processes with triangular covariance kernels

B.1 Extended Doob’s representation

Assume that {ε(t)|t[a,b]}\{\varepsilon(t)|\;t\in[a,b]\} is a Gaussian process with covariance kernel KK of the form (2.4); that is, K(t,t)=u(t)v(t)K(t,t^{\prime})=u(t)v(t^{\prime}) for ttt\leq t^{\prime}, where u()u(\cdot) and v()v(\cdot) are functions defined on the interval [a,b][a,b]. According to the terminology introduced in Mehr and McFadden, (1965) kernels of the form (2.4) are called triangular. An alternative way of writing these covariance kernels is

K(t,t)=v(t)v(t)min{q(t),q(t)}fort,t[a,b],\displaystyle K(t,t^{\prime})=v(t)v(t^{\prime})\min\{q(t),q(t^{\prime})\}\;\;{\rm for}\;\;t,t^{\prime}\in[a,b]\,, (B.1)

where q(t)=u(t)/v(t)q(t)=u(t)/v(t). We assume that ε(t)\varepsilon(t) is non-degenerate on the open interval (a,b)(a,b), which implies that the function qq is strictly increasing and continuous on the interval [a,b][a,b] [see Mehr and McFadden, (1965), Remark 2]. Moreover, this function is also positive on the interval (a,b)(a,b) [see Remark 1 in Mehr and McFadden, (1965)], which yields that the functions uu and vv must have the same sign and can be assumed to be positive on the interval (a,b)(a,b) without loss of generality. We repeatedly use the following extension of Doob’s celebrated representation [see Doob, (1949)], which relates two Gaussian processes (on compact intervals) by a time-space transformation.

Lemma B.1

Let {ε(t)|t[a,b]}\{\varepsilon(t)|\;t\in[a,b]\} be a non-degenerate Gaussian process with zero mean and covariance function (B.1) and let v~\tilde{v} and q~\tilde{q} be continuous positive functions on [a~,b~][\tilde{a},\tilde{b}], such that q~\tilde{q} is strictly increasing and q~([a~,b~])=q([a,b])\tilde{q}([\tilde{a},\tilde{b}])=q([a,b]). Define the transformations β~:[a~,b~][a,b]\tilde{\beta}:[\tilde{a},\tilde{b}]\to[a,b] and α~:[a~,b~]+\tilde{\alpha}:[\tilde{a},\tilde{b}]\to\mathbb{R}_{+} by

β~(s)=q1(q~(s)),α~(s)=v~(s)/v(β~(s)).\displaystyle\tilde{\beta}(s)=q^{-1}(\tilde{q}(s))\,,\;\;\;\tilde{\alpha}(s)=\tilde{v}(s)/v(\tilde{\beta}(s))\,. (B.2)

Then the Gaussian process {ε~(t)|t[a~,b~]}\{\tilde{\varepsilon}(t)|\;t\in[\tilde{a},\tilde{b}]\} defined by

ε~(s)=α~(s)ε(β~(s))\displaystyle\tilde{\varepsilon}(s)=\tilde{\alpha}(s)\varepsilon(\tilde{\beta}(s))\, (B.3)

has zero mean and the covariance function is given by

K~(s,s)=𝔼[ε~(s)ε~(s)]=v~(s)v~(s)min(q~(s),q~(s)).\displaystyle\tilde{K}(s,s^{\prime})=\mathbb{E}[\tilde{\varepsilon}(s)\tilde{\varepsilon}(s^{\prime})]=\tilde{v}(s)\tilde{v}(s^{\prime})\min(\tilde{q}(s),\tilde{q}(s^{\prime}))\,. (B.4)

Conversely, the Gaussian process ε(t)\varepsilon(t) can be expressed via ε~(s)\tilde{\varepsilon}(s) by the transformation

ε(t)=α(t)ε~(β(t)),\displaystyle\varepsilon(t)=\alpha(t)\tilde{\varepsilon}(\beta(t))\,, (B.5)

where

β(t)=q~1(q(t)),α(t)=v(t)/v~(β(t)).\displaystyle\beta(t)=\tilde{q}^{-1}(q(t))\,,\;\;\alpha(t)=v(t)/\tilde{v}(\beta(t))\,. (B.6)

Proof. Since {ε(t)|t[a,b]}\{\varepsilon(t)|t\in[a,b]\} is Gaussian and has zero mean, the process defined by (B.3) is also Gaussian and has zero mean. For the covariance function of the process (B.3) we have

𝔼[ε~(s)ε~(s)]\displaystyle\mathbb{E}\left[\tilde{\varepsilon}(s)\tilde{\varepsilon}(s^{\prime})\right] =\displaystyle= α~(s)α~(s)𝔼[ε(β~(s))ε(β~(s))]\displaystyle\tilde{\alpha}(s)\tilde{\alpha}(s^{\prime})\mathbb{E}\left[\varepsilon(\tilde{\beta}(s))\varepsilon(\tilde{\beta}(s^{\prime}))\right]
=\displaystyle= α~(s)α~(s)v(β~(s))v(β~(s))min[q(β~(s)),q(β~(s))]\displaystyle\tilde{\alpha}(s)\tilde{\alpha}(s^{\prime})v(\tilde{\beta}(s))v(\tilde{\beta}(s^{\prime}))\min\left[q(\tilde{\beta}(s)),\,q(\tilde{\beta}(s^{\prime}))\right]
=\displaystyle= v~(s)v~(s)min[q~(s),q~(s)]=K~(s,s).\displaystyle{\tilde{v}(s)}{\tilde{v}(s^{\prime})}\min\left[\tilde{q}(s),\,\tilde{q}(s^{\prime})\right]=\tilde{K}(s,s^{\prime})\,.

The second part of the proof follows by the same arguments and the details are therefore omitted. \Box

Remark B.1

(a) The classical result of Doob is a particular case of (B.5) when ε~(t)=W(t)\tilde{\varepsilon}(t)=W(t) is the Brownian motion with covariance function K~(t,s)=min(t,s)\tilde{K}(t,s)=\min(t,s). In this case we have v~(t)=1\tilde{v}(t)=1, q~(t)=t\tilde{q}(t)=t, α(t)=v(t)\alpha(t)=v(t) and β(t)=q(t)\beta(t)=q(t). Specifically, Doob’s representation is given by ε(t)=v(t)W(q(t))\varepsilon(t)=v(t)W(q(t)) [see Doob, (1949)].
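
Doob's representation is also convenient for simulation. The following sketch (an illustration only, not part of the paper) generates the exponential-kernel process with u(t)=e^t and v(t)=e^{-t} via ε(t)=v(t)W(q(t)) and compares the empirical covariance with K(t,t')=e^{-|t-t'|}.

import numpy as np

# Sketch: simulate eps(t) = v(t) W(q(t)) for the exponential kernel (u(t) = e^t,
# v(t) = e^{-t}, q(t) = e^{2t}) and compare the empirical covariance with K.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 6)
v, q = np.exp(-t), np.exp(2.0 * t)
reps = 200000
# Brownian motion at the increasing times q(t_1) < ... < q(t_N): cumulative sums of
# independent Gaussian increments whose variances are the increments of q (the first
# increment starts from time 0, where W(0) = 0).
incr = rng.standard_normal((reps, len(t))) * np.sqrt(np.diff(q, prepend=0.0))
W = np.cumsum(incr, axis=1)
eps = v * W
emp = (eps.T @ eps) / reps                       # empirical covariance of the zero-mean process
K = np.exp(-np.abs(np.subtract.outer(t, t)))
print(np.max(np.abs(emp - K)))                   # small Monte-Carlo error (about 1e-2)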

(b) Both functions β:[a,b][a~,b~]\beta:\;[a,b]\to[\tilde{a},\tilde{b}] and β~:[a~,b~][a,b]\tilde{\beta}:[\tilde{a},\tilde{b}]\to[a,b] are positive strictly increasing functions and are inverses of each other; that is,

β(t)=β~1(t),t[a,b].\displaystyle\beta(t)=\tilde{\beta}^{-1}(t)\,,\;\;\forall~t\in[a,b]\,. (B.7)

(c) The functions α()\alpha(\cdot) and α~()\tilde{\alpha}(\cdot) are positive and satisfy the relation

α(t)α~(β(t))=v(t)v~(β(t))v~(β(t))v(β~(β(t)))=1,t[a,b].\displaystyle\alpha(t)\cdot\tilde{\alpha}(\beta(t))=\frac{v(t)}{\tilde{v}(\beta(t))}\cdot\frac{\tilde{v}(\beta(t))}{v(\tilde{\beta}(\beta(t)))}=1\,,\;\;\forall~t\in[a,b]. (B.8)

(d) The properties (b) and (c) imply that the transformation ε~ε\tilde{\varepsilon}\to\varepsilon defined by (B.5) is the inverse of the transformation εε~\varepsilon\to\tilde{\varepsilon} defined in (B.3).

B.2 Transformation of regression models

Associated with the transformation of the triangular covariance kernels there exists a canonical transformation for the corresponding regression models. To be precise, consider the regression model (1.1) or its continuous time version (1.2), where the covariance kernel K(,)K(\cdot,\cdot) has the form (B.1). Recall the transformation β:[a,b][a~,b~]\beta:\;[a,b]\to[\tilde{a},\tilde{b}] defined in (B.6), which maps the observation points tjt_{j} to t~j=β(tj)\tilde{t}_{j}=\beta(t_{j}), j=1,,Nj=1,\ldots,N, and define

f~(s)=f(β~(s))α(β~(s)),ε~(s)=ε(β~(s))α(β~(s)),y~(t~j)=y(tj)α(tj),\displaystyle\tilde{f}(s)=\frac{f(\tilde{\beta}(s))}{\alpha(\tilde{\beta}(s))},~\tilde{\varepsilon}(s)=\frac{\varepsilon(\tilde{\beta}(s))}{\alpha(\tilde{\beta}(s))},~\tilde{y}(\tilde{t}_{j})=\frac{y(t_{j})}{\alpha(t_{j})}\,, (B.9)

where s[a~,b~]s\in[\tilde{a},\tilde{b}] so that β~(s)[a,b]\tilde{\beta}(s)\in[a,b]. The regression model (1.1) can now be rewritten in the form

y~(t~j)=θTf~(t~j)+ε~(t~j),t~j[a~,b~],j=1,,N.\displaystyle\tilde{y}(\tilde{t}_{j})=\theta^{T}\tilde{f}(\tilde{t}_{j})+\tilde{\varepsilon}(\tilde{t}_{j}),~\tilde{t}_{j}\in[\tilde{a},\tilde{b}],~j=1,\ldots,N. (B.10)

The errors ε~(t~j)\tilde{\varepsilon}(\tilde{t}_{j}) in (B.10) have zero mean and, by Lemma B.1 and the identity (B.8), their covariances are given by

𝔼[ε~(t~i)ε~(t~j)]=K~(t~i,t~j).\displaystyle~\mathbb{E}[\tilde{\varepsilon}(\tilde{t}_{i})\tilde{\varepsilon}(\tilde{t}_{j})]=\tilde{K}(\tilde{t}_{i},\tilde{t}_{j}). (B.11)

Hence we have transformed the regression observation scheme (1.1) with error covariances 𝔼[ε(ti)ε(tj)]=K(ti,tj)\mathbb{E}[\varepsilon(t_{i})\varepsilon(t_{j})]=K(t_{i},t_{j}) to the scheme (B.10) with covariances (B.11). Conversely, we can transform the model (B.10) with covariances (B.11) to the model (1.1) using the transformations

f(t)=f~(β(t))α~(β(t)),ε(t)=ε~(β(t))α~(β(t)),t[a,b].\displaystyle f(t)=\frac{\tilde{f}(\beta(t))}{\tilde{\alpha}(\beta(t))}\,,\;\;\varepsilon(t)=\frac{\tilde{\varepsilon}(\beta(t))}{\tilde{\alpha}(\beta(t))}\,,\;t\in[a,b]\,. (B.12)
Lemma B.2

The transformation ff~f\to\tilde{f} defined in (B.9) is an inverse to the transformation f~f\tilde{f}\to f defined in (B.12).

Proof. Inserting the expression for f~\tilde{f} from (B.9) into (B.12), we have

f(t)=f~(β(t))α~(β(t))=f(β~(β(t)))α(β~(β(t)))α~(β(t))=f(t)α(t)α~(β(t))=f(t),\displaystyle f(t)=\frac{\tilde{f}(\beta(t))}{\tilde{\alpha}(\beta(t))}=\frac{f(\tilde{\beta}(\beta(t)))}{\alpha(\tilde{\beta}(\beta(t)))\,\tilde{\alpha}(\beta(t))}=\frac{f(t)}{\alpha(t)\,\tilde{\alpha}(\beta(t))}=f(t),

where we have used the identities β~(β(t))=t\tilde{\beta}(\beta(t))=t, see (B.7), and α(t)α~(β(t))=1\alpha(t)\,\tilde{\alpha}(\beta(t))=1, see (B.8).

B.3 Transformation of designs

In this section we consider a transformation of the matrix-weighted designs under a given transformation of the regression models. In the one-parameter case with m=1m=1, these matrix-weighted designs become signed measures; that is, signed designs as considered in Section 2. In this section, it is convenient to define all integrals as Lebesgue-Stieltjes integrals with respect to the distribution functions of the measures ζ\zeta and ζ~\tilde{\zeta}.

To be precise, let dξ(t)=𝐎ξ(t)dζ(t)d\xi(t)=\mathbf{O}_{\xi}(t)d\zeta(t) be a matrix-weighted design on the interval t[a,b]t\in[a,b]. Recalling the definition of α,α~\alpha,\tilde{\alpha} and β,β~\beta,\tilde{\beta} in (B.2) and (B.6) we define a matrix-weighted design dξ~(s)=𝐎~ξ~(s)dζ~(s)d\tilde{\xi}(s)=\tilde{\mathbf{O}}_{\tilde{\xi}}(s)d\tilde{\zeta}(s) by

dζ~(s)=dζ(β~(s))and𝐎~ξ~(s)=α2(β~(s))𝐎ξ(β~(s)).\displaystyle d\tilde{\zeta}(s)=d\zeta(\tilde{\beta}(s))\,\;\;{\rm and}\;\;\tilde{\mathbf{O}}_{\tilde{\xi}}(s)=\alpha^{2}(\tilde{\beta}(s))\mathbf{O}_{\xi}(\tilde{\beta}(s))\,. (B.13)

Note that ζ~\tilde{\zeta} and ζ\zeta are probability measures on the intervals [a~,b~][\tilde{a},\tilde{b}] and [a,b][a,b], respectively. Similarly, for a given matrix-weighted design dξ~(s)=𝐎~ξ~(s)dζ~(s)d\tilde{\xi}(s)=\tilde{\mathbf{O}}_{\tilde{\xi}}(s)d\tilde{\zeta}(s) on the interval [a~,b~][\tilde{a},\tilde{b}] we define a matrix-weighted design dξ(t)=𝐎ξ(t)dζ(t)d\xi(t)=\mathbf{O}_{\xi}(t)d\zeta(t) on the interval [a,b][a,b] by

dζ(t)=dζ~(β(t))and𝐎ξ(t)=α~2(β(t))𝐎~ξ~(β(t)).\displaystyle d\zeta(t)=d\tilde{\zeta}(\beta(t))\,\;\;{\rm and}\;\;\mathbf{O}_{\xi}(t)=\tilde{\alpha}^{2}(\beta(t))\tilde{\mathbf{O}}_{\tilde{\xi}}(\beta(t)). (B.14)

Similar to Lemma B.2 we can see that the transformation ξ~ξ\tilde{\xi}\to\xi defined by (B.14) is the inverse to the transformation ξξ~\xi\to\tilde{\xi} defined by (B.13).

For the following discussion we recall the definition of the covariance matrix 𝐃(ξ)\mathbf{D}(\xi) in (3.11). For the model (B.10), the covariance matrix of the design dξ~(s)=𝐎~ξ~(s)dζ~(s)d\tilde{\xi}(s)=\tilde{\mathbf{O}}_{\tilde{\xi}}(s)d\tilde{\zeta}(s), defined by (B.13), is given by

𝐃~(ξ~)=𝐌~1(ξ~)𝐁~(ξ~)(𝐌~1(ξ~))T,\displaystyle\tilde{\mathbf{D}}(\tilde{\xi})=\tilde{\mathbf{M}}^{-1}(\tilde{\xi})\tilde{\mathbf{B}}(\tilde{\xi})(\tilde{\mathbf{M}}^{-1}(\tilde{\xi}))^{T}, (B.15)

where

𝐁~(ξ~)=a~b~a~b~K~(t,s)𝐎~ξ~(t)f~(t)(𝐎~ξ~(s)f~(s))T𝑑ζ~(t)𝑑ζ~(s),𝐌~(ξ~)=a~b~𝐎~ξ~(t)f~(t)f~T(t)𝑑ζ~(t)\displaystyle\tilde{\mathbf{B}}(\tilde{\xi})=\int_{\tilde{a}}^{\tilde{b}}\int_{\tilde{a}}^{\tilde{b}}\tilde{K}(t,s){\tilde{\mathbf{O}}}_{\tilde{\xi}}(t)\tilde{f}(t)({\tilde{\mathbf{O}}}_{\tilde{\xi}}(s)\tilde{f}(s))^{T}d\tilde{\zeta}(t)d\tilde{\zeta}(s),\;\;\tilde{\mathbf{M}}(\tilde{\xi})=\int_{\tilde{a}}^{\tilde{b}}{\tilde{\mathbf{O}}}_{\tilde{\xi}}(t){\tilde{f}}(t){\tilde{f}}^{T}(t)d\tilde{\zeta}(t)\,

and the kernel K~\tilde{K} is defined by (B.4).

Theorem B.1

For any matrix-weighted design dξ(t)=𝐎ξ(t)dζ(t)d\xi(t)=\mathbf{O}_{\xi}(t)d\zeta(t) and the corresponding matrix-weighted design ξ~\tilde{\xi} defined by (B.13), we have 𝐃(ξ)=𝐃~(ξ~).\mathbf{D}(\xi)=\tilde{\mathbf{D}}(\tilde{\xi}). In particular, 𝐃=𝐃~\mathbf{D}^{*}=\tilde{\mathbf{D}}^{*}, where 𝐃\mathbf{D}^{*} and 𝐃~\tilde{\mathbf{D}}^{*} are the covariance matrices of the BLUE in the continuous time models (1.2) and in the model {θTf~(s)+ε~(s)|s[a~,b~]}\{\theta^{T}\tilde{f}(s)+\tilde{\varepsilon}(s)|s\in[\tilde{a},\tilde{b}]\}, respectively.

Proof. Using the variable transformation β~(s)=t\tilde{\beta}(s)=t and (B.9), we have

𝐌~(ξ~)\displaystyle\tilde{\mathbf{M}}(\tilde{\xi}) =\displaystyle= 𝐎~ξ~(s)f~(s)f~T(s)𝑑ζ~(s)=𝐎ξ(β~(s))f(β~(s))α(β~(s))fT(β~(s))α(β~(s))α2(β~(s))𝑑ζ(β~(s))\displaystyle\int{\tilde{\mathbf{O}}}_{\tilde{\xi}}(s){\tilde{f}}(s){\tilde{f}}^{T}(s)d\tilde{\zeta}(s)=\int\frac{\mathbf{O}_{\xi}(\tilde{\beta}(s))f(\tilde{\beta}(s))}{\alpha(\tilde{\beta}(s))}\frac{f^{T}(\tilde{\beta}(s))}{\alpha(\tilde{\beta}(s))}\cdot\alpha^{2}(\tilde{\beta}(s))d\zeta(\tilde{\beta}(s))
=\displaystyle= 𝐎ξ(t)f(t)fT(t)𝑑ζ(t)=𝐌(ξ).\displaystyle\int\mathbf{O}_{\xi}(t)f(t)f^{T}(t)d\zeta(t)=\mathbf{M}(\xi).

Next, we calculate the corresponding expression for 𝐁~(ξ~)\tilde{\mathbf{B}}(\tilde{\xi}), that is

\begin{align*}
\tilde{\mathbf{B}}(\tilde{\xi}) &= \int\!\!\int \tilde{K}(x,y)\,\tilde{\mathbf{O}}_{\tilde{\xi}}(x)\tilde{f}(x)\bigl(\tilde{\mathbf{O}}_{\tilde{\xi}}(y)\tilde{f}(y)\bigr)^{T}d\tilde{\zeta}(x)\,d\tilde{\zeta}(y) \\
&= \int\!\!\int \tilde{v}(x)\tilde{v}(y)\min(\tilde{q}(x),\tilde{q}(y))\,\tilde{\mathbf{O}}_{\tilde{\xi}}(x)\tilde{f}(x)\bigl(\tilde{\mathbf{O}}_{\tilde{\xi}}(y)\tilde{f}(y)\bigr)^{T}d\tilde{\zeta}(x)\,d\tilde{\zeta}(y) \\
&= \int\!\!\int \tilde{v}(x)\tilde{v}(y)\min(\tilde{q}(x),\tilde{q}(y))\,
\frac{\mathbf{O}_{\xi}(\tilde{\beta}(x))f(\tilde{\beta}(x))}{\alpha(\tilde{\beta}(x))}\,
\frac{\bigl(\mathbf{O}_{\xi}(\tilde{\beta}(y))f(\tilde{\beta}(y))\bigr)^{T}}{\alpha(\tilde{\beta}(y))}\,
\alpha^{2}(\tilde{\beta}(x))\,d\zeta(\tilde{\beta}(x))\,\alpha^{2}(\tilde{\beta}(y))\,d\zeta(\tilde{\beta}(y)).
\end{align*}

Define $s=\tilde{\beta}(x)$ and $t=\tilde{\beta}(y)$, so that $x=\tilde{\beta}^{-1}(s)=\beta(s)$ and, similarly, $y=\beta(t)$. Changing the variables in the integrals above we obtain

\[
\tilde{\mathbf{B}}(\tilde{\xi}) = \int\!\!\int \tilde{v}(\beta(s))\tilde{v}(\beta(t))\min\bigl(\tilde{q}(\beta(s)),\tilde{q}(\beta(t))\bigr)\,\mathbf{O}_{\xi}(s)f(s)\bigl(\mathbf{O}_{\xi}(t)f(t)\bigr)^{T}\alpha(s)\alpha(t)\,d\zeta(s)\,d\zeta(t).
\]

Using the definition of $\beta$ in (B.6) yields $\tilde{q}(\beta(t))=\tilde{q}(\tilde{q}^{-1}(q(t)))=q(t)$, and by the definition of $\alpha$ in (B.6) we finally get

\begin{align*}
\tilde{\mathbf{B}}(\tilde{\xi}) &= \int\!\!\int \tilde{v}(\beta(s))\tilde{v}(\beta(t))\min(q(s),q(t))\,\mathbf{O}_{\xi}(s)f(s)\bigl(\mathbf{O}_{\xi}(t)f(t)\bigr)^{T}\frac{v(s)}{\tilde{v}(\beta(s))}\,\frac{v(t)}{\tilde{v}(\beta(t))}\,d\zeta(s)\,d\zeta(t) \\
&= \int\!\!\int \min(q(s),q(t))\,\mathbf{O}_{\xi}(s)f(s)\bigl(\mathbf{O}_{\xi}(t)f(t)\bigr)^{T}v(s)v(t)\,d\zeta(s)\,d\zeta(t) = \mathbf{B}(\xi).
\end{align*}

The identity $\mathbf{D}(\xi)=\tilde{\mathbf{D}}(\tilde{\xi})$ now follows from the definitions (3.11) and (B.15). The equality $\mathbf{D}^{*}=\tilde{\mathbf{D}}^{*}$ stated in the theorem then follows, since (B.13) and (B.14) define a one-to-one correspondence between the matrix-weighted designs for the two models.
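To illustrate Theorem B.1 numerically (this is not part of the paper's argument), the identity $\mathbf{D}(\xi)=\tilde{\mathbf{D}}(\tilde{\xi})$ can be checked by quadrature for a toy configuration. The sketch below assumes the Brownian-motion kernel $K(t,s)=\min(t,s)$ on $[1,2]$ (so $v\equiv 1$, $q(t)=t$), the regression functions $f(t)=(1,t)^{T}$, the uniform design measure with identity matrix weights, and a transformed model generated by $\tilde{q}(s)=s^{2}$, $\tilde{v}(s)=s$; all of these choices, as well as the grid size, are illustrative assumptions.

import numpy as np

# Original model: K(t,s) = min(t,s) on [a,b] = [1,2] (v = 1, q(t) = t),
# f(t) = (1, t)^T, uniform zeta, matrix weights O_xi(t) = I.
a, b, N = 1.0, 2.0, 2000
t = a + (b - a) * (np.arange(N) + 0.5) / N       # midpoint grid on [a, b]
wt = np.full(N, (b - a) / N)                     # quadrature weights of zeta

F = np.vstack([np.ones(N), t])                   # columns f(t_j), shape (2, N)
K = np.minimum.outer(t, t)                       # K(t_i, t_j)
M = (F * wt) @ F.T                               # int O f f^T d zeta
B = (F * wt) @ K @ (F * wt).T                    # int int K O f (O f)^T d zeta d zeta
D = np.linalg.solve(M, np.linalg.solve(M, B).T)  # M^{-1} B M^{-T}, cf. (3.11)

# Transformed model: q~(s) = s^2, v~(s) = s, hence beta(t) = sqrt(t),
# beta~(s) = s^2, alpha(t) = v(t)/v~(beta(t)) = 1/sqrt(t),
# f~(s) = f(beta~(s))/alpha(beta~(s)) = s * f(s^2),
# d zeta~(s) = 2 s ds and O~(s) = alpha^2(beta~(s)) I = I / s^2, cf. (B.13).
at, bt = 1.0, np.sqrt(2.0)
s = at + (bt - at) * (np.arange(N) + 0.5) / N
ws = (bt - at) / N * 2.0 * s                     # weights of zeta~ (density 2s)

Ft = s * np.vstack([np.ones(N), s**2])           # f~(s) = (s, s^3)^T
OFt = Ft / s**2                                  # O~(s) f~(s)
Kt = np.outer(s, s) * np.minimum.outer(s**2, s**2)    # K~(x,y) = x y min(x^2, y^2)
Mt = (OFt * ws) @ Ft.T                           # M~(xi~)
Bt = (OFt * ws) @ Kt @ (OFt * ws).T              # B~(xi~)
Dt = np.linalg.solve(Mt, np.linalg.solve(Mt, Bt).T)   # D~(xi~), cf. (B.15)

print(np.max(np.abs(D - Dt)))                    # small: quadrature error only

Up to the quadrature error of the midpoint rule the two matrices coincide, in agreement with Theorem B.1.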
