Optimality of estimators for misspecified semi-Markov models
Abstract
Suppose we observe a geometrically ergodic semi-Markov process
and have a parametric model for the transition distribution
of the embedded Markov chain,
for the conditional distribution of the inter-arrival times, or for both.
The first two models for the process are semiparametric,
and the parameters can be estimated by conditional maximum likelihood
estimators. The third model for the process is parametric,
and the parameter can be estimated by an unconditional
maximum likelihood estimator.
We determine heuristically the asymptotic distributions of these estimators
and show that they are asymptotically efficient.
If the parametric models are not correct,
the (conditional) maximum likelihood estimators estimate the parameter
that maximizes the Kullback–Leibler information.
We show that they remain asymptotically efficient in a nonparametric sense.
AMS Subject Classification:
Primary: 62M09; secondary: 62F12, 62G20.
Keywords:
Hellinger differentiability, local asymptotic normality, asymptotically linear estimator, Markov renewal process.
1 Introduction
For i.i.d. observations, Daniels [6] and Huber [20] show that the maximum likelihood estimator of a misspecified parametric model estimates the parameter that maximizes the Kullback–Leibler (KL) information, and determine its asymptotic distribution. Weaker conditions are given by Pollard [33]. For applications see also White [35], Müller [30], and Doksum, Ozeki, Kim and Neto [7]. Analogous results are obtained for parametric Markov chain models by Ogata [31], for parametric time series by Hosoya [19] and by Andrews and Pollard [1], and for parametric diffusion models by McKeague [28] and Kutoyants [25]. We refer also to the monograph of Kutoyants [26]. Applications to time series models in econometrics are studied by White [36] and Sin and White [34], and in the monograph of White [37].
Greenwood and Wefelmeyer [15] prove that the maximum likelihood estimator of a misspecified parametric Markov chain model is efficient in a nonparametric sense. Related efficiency results for misspecified parametric time series are in Dahlhaus and Wefelmeyer [5]. Here we outline corresponding results for semi-Markov processes. We consider both parametric and semiparametric misspecified models. The arguments are heuristic; sufficient regularity conditions can be obtained as in the above references.
Suppose we observe a semi-Markov process , , with values in an arbitrary measurable space , on a time interval . Let denote the embedded Markov renewal process. Its transition distribution factors as
where is the transition distribution of the embedded Markov chain , and is the conditional distribution of the inter-arrival time given and .
We assume that the embedded Markov chain is stationary. We write , and for the stationary laws of , and , respectively. Of course, and . Set . We note that studying a semi-Markov process is equivalent to studying the embedded Markov renewal process. The latter is a Markov chain. Observing the semi-Markov process up to time is equivalent to observing the embedded Markov renewal process up to the random time .
Natural estimators for , and are the empirical distributions
where denotes the Dirac measure at a point .
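As a rough illustration of these empirical distributions, the following sketch assumes that the path is recorded as the visited states together with their jump times, that the sample consists of the transitions completed by time t, and that each observed point receives mass one over the number of such transitions. The function names and the toy path are hypothetical, and the normalization is an assumption rather than a quotation of the definition.

```python
from collections import Counter
from fractions import Fraction


def completed_transitions(states, jump_times, t):
    """Triples (previous state, next state, inter-arrival time) for jumps up to time t.

    states[j] is the state entered at jump_times[j]; jump_times[0] is the start time.
    """
    triples = []
    for j in range(1, len(states)):
        if jump_times[j] > t:
            break  # this jump happens after the observation window
        triples.append((states[j - 1], states[j], jump_times[j] - jump_times[j - 1]))
    return triples


def empirical_law(points):
    """Empirical distribution: an average of Dirac measures at the observed points."""
    n = len(points)
    return {p: Fraction(c, n) for p, c in Counter(points).items()}


# Hypothetical toy path: the process alternates between states "a" and "b".
states = ["a", "b", "a", "b", "a"]
jump_times = [0.0, 1.0, 2.5, 3.0, 6.0]
triples = completed_transitions(states, jump_times, t=4.0)

law_x = empirical_law([x for x, _, _ in triples])        # law of the previous state
law_xy = empirical_law([(x, y) for x, y, _ in triples])  # law of the transition pair
mean_sojourn = sum(v for _, _, v in triples) / len(triples)  # empirical mean of v
print(len(triples), law_x, law_xy, mean_sojourn)
```

The last jump of the toy path occurs after the end of the window and is therefore not counted, which mirrors the random sample size of the embedded Markov renewal process.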
Let be an open subset of . We consider the following three models for the semi-Markov process. In Model Q we assume a parametric form , , of the transition distribution of the embedded Markov chain. These models are also considered in Greenwood, Müller and Wefelmeyer [11]. In Model R we assume a parametric form , , of the conditional distribution of the inter-arrival times. In Model S we assume parametric forms and , , for both. Of course, the last model covers the case that and carry different parameters. We assume that has a density with respect to some dominating measure , and has a density with respect to some dominating measure .
If Model Q holds, then the transition distribution of the semi-Markov process is semiparametric, , with an infinite-dimensional nuisance parameter. A natural estimator of is the partial maximum likelihood estimator , which maximizes
Suppose that Model Q is misspecified, and that the true transition distribution of the embedded Markov chain is . Then is an empirical version of the KL information . Let denote the parameter that maximizes . We call a KL functional. Note that the partial maximum likelihood estimator is the empirical version of the KL functional, . Since Model Q is misspecified, the semi-Markov model is nonparametric. The empirical distribution is efficient for in a certain sense. If the KL functional is smooth, i.e. compactly differentiable in an appropriate sense, it follows that is efficient for . We will not use this approach in this paper. Instead we derive, in Section 3, a stochastic expansion of , and determine its influence function. We also show that the KL functional is pathwise differentiable, and determine its canonical gradient. To keep the exposition simple, we do not give regularity conditions for these results. They can be adapted e.g. from those of Greenwood and Wefelmeyer [15]. It turns out that the canonical gradient equals the influence function of . By the characterization of efficient estimators in Section 2, this shows that is efficient in the nonparametric semi-Markov model. We also show that remains efficient when Model Q is true. The advantage of our approach is that we do not need to check compact differentiability of and a corresponding efficiency property of .
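To make the relation between the partial maximum likelihood estimator and the KL functional concrete, here is a minimal sketch under strong simplifying assumptions that are not part of the text: a two-state embedded chain with states 0 and 1, and a deliberately crude working model in which the next state is 1 with probability theta regardless of the current state. For this hypothetical model the partial log-likelihood is maximized by the observed fraction of transitions into state 1, which is the KL criterion evaluated at the empirical distribution of the transition pairs.

```python
import math


def partial_log_likelihood(theta, chain):
    """Sum of log q_theta(x_prev, x_next) for the working model q_theta(x, 1) = theta."""
    return sum(math.log(theta if y == 1 else 1.0 - theta) for y in chain[1:])


def partial_mle(chain):
    """Closed-form maximizer: the fraction of transitions that end in state 1."""
    targets = chain[1:]
    return sum(targets) / len(targets)


# Hypothetical observed embedded chain (inter-arrival times play no role here).
chain = [0, 1, 1, 0, 1]
theta_hat = partial_mle(chain)

# Sanity check by grid search over the open interval (0, 1).
grid = [k / 100 for k in range(1, 100)]
theta_grid = max(grid, key=lambda th: partial_log_likelihood(th, chain))
print(theta_hat, theta_grid)
```

If the true chain has dependence that this working model ignores, the same fraction still converges, by the law of large numbers, to the stationary probability of entering state 1, which is the parameter that maximizes the KL information for this working model.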
The other two models are treated analogously. If Model R holds, then the transition distribution of the semi-Markov process is semiparametric, , with an infinite-dimensional nuisance parameter. A natural estimator of is the partial maximum likelihood estimator , which maximizes
Suppose that Model R is misspecified, and that the true conditional distribution of the inter-arrival times is . Then is an empirical version of . Again we call the latter KL information. We denote by the parameter that maximizes , and we call a KL functional. Then . In Section 4 we derive a stochastic expansion of and the canonical gradient of and show that is efficient in the nonparametric semi-Markov model. We also show that remains efficient when Model R is true.
If Model S holds, then the transition distribution of the semi-Markov process is parametric, . Set
A natural estimator of is the maximum likelihood estimator , which maximizes
Suppose that Model S is misspecified, and that the true transition distribution of the embedded Markov renewal process is . Then is an empirical version of . Again we call the latter KL information. We denote by the parameter that maximizes , and we call a KL functional. Then . In Section 5 we derive a stochastic expansion of and the canonical gradient of and show that is efficient in the nonparametric semi-Markov model. We also show that remains efficient when Model S is true. Section 6 contains some additional comments.
2 Characterization of efficient estimators
We assume that the embedded Markov chain is positive Harris recurrent and geometrically ergodic in . We make the usual assumption that the conditional distribution of the inter-arrival times does not charge zero. We also assume that the mean inter-arrival time is finite. Then
(1)
For a function we have the strong law of large numbers
(2)
For a function with we have the martingale central limit theorem
(3)
where denotes a standard normal random variable.
In order to characterize efficient estimators for functionals of semi-Markov models, we consider a family , , of transition distributions of the embedded Markov chain, and a family , , of conditional distributions of the inter-arrival time. Here is a possibly infinite-dimensional set, the parameter space. We fix and set , and
Note that and can be viewed as orthogonal subspaces of . We assume that the parametrization is smooth in the following sense. There is a linear space , the tangent space of , and a bounded linear operator , and for each there is a sequence in such that is Hellinger differentiable at with derivative ,
and is Hellinger differentiable at with derivative ,
Now write for the distribution of , , if and are in effect, and if and are. By Taylor expansion and (2) and (3), we obtain local asymptotic normality:
(4)
and
(5)
For Markov chains, different proofs are in Penev [32], Bickel [2] and Greenwood and Wefelmeyer [13]; see also Bickel and Kwon [4]. For Markov step processes see Höpfner, Jacod and Ladelli [18] and Höpfner [16, 17]. A proof for nonparametric semi-Markov models is in Greenwood and Wefelmeyer [14].
We want to estimate a -dimensional functional of the parameter . We call differentiable at with gradient if , , and
(6)
The canonical gradient of is the componentwise projection of onto the closure of in . If is closed in , we can write for some . This will be the case in Sections 3–5.
An estimator is called regular for at with limit if is a -dimensional random vector such that
The convolution theorem says that
with a -dimensional standard normal random vector, and a -dimensional random vector independent of . This justifies calling efficient for at if is asymptotically normal under with covariance matrix .
An estimator is called asymptotically linear for at with influence function if , , and
We have the following characterization. An estimator is regular and efficient for at if and only if it is asymptotically linear with influence function equal to the canonical gradient,
For proofs of the convolution theorem and the characterization we refer to Bickel, Klaassen, Ritov and Wellner [3].
To prove asymptotic linearity of estimators in misspecified models, we need the following martingale approximation. Set . The potential of the embedded Markov chain is defined by
For set
Then and
Let and set . Then we obtain the stochastic expansion
(7)
Note that and . Hence and are orthogonal martingale increments. For discrete-time processes, the martingale approximation (7) is due to Gordin [9] and Gordin and Lifšic [10]. It was discovered independently by Maigret [27], Dürr and Goldstein [8] and Greenwood and Wefelmeyer [13]. See also Section 17.4 in the monograph of Meyn and Tweedie [29]. The martingale approximation (7) and the martingale central limit theorem (3) imply that
To calculate canonical gradients of functionals in misspecified models, we need the following perturbation expansion, due to Kartashov [21, 22, 23],
(8)
Here denotes the distribution of if is in effect. This pathwise version of the perturbation expansion suffices for our purposes. Greenwood and Wefelmeyer [13] show that it follows also from the martingale approximation (7).
3 Model Q
In Model Q we assume a parametric model , , for the -density of the transition distribution of the embedded Markov chain, and consider the conditional inter-arrival time distribution as unknown. Suppose the model is misspecified, and the true transition distribution is . Then the KL functional maximizes , and the partial maximum likelihood estimator maximizes . Write
for the -dimensional vector of partial derivatives of . Then solves , and solves . Heuristically, by Taylor expansion,
(9)
Here is the matrix of partial derivatives of . With (1) and (2) we obtain
(10)
If Model Q is correctly specified and , then . We also have the following relations, which are well-known in the i.i.d. case,
In particular, the partial Fisher information matrix for Model Q is . Hence, for the correctly specified model, the partial maximum likelihood estimator has the stochastic expansion
This means that is asymptotically linear with influence function , and is asymptotically normal with covariance matrix .
If the model is misspecified, then is not in . We apply the martingale approximation (7) to (10) and see that is asymptotically linear with influence function . Hence is asymptotically normal with covariance matrix
Let us now prove efficiency of , first for the correctly specified model. For set . Assume that is Hellinger differentiable at ,
(11)
Let denote the set of all conditional inter-arrival distributions. For choose a sequence in that is Hellinger differentiable at ,
(12)
Then the assumptions of Section 2 hold with , , , . The functional to be estimated is . By orthogonality of and , its canonical gradient is obtained from (6) as with matrix determined by
i.e. . Hence the canonical gradient of is and equals the influence function of , which is therefore efficient for the correctly specified model.
Suppose now that the model is misspecified, and let be the set of all transition distributions of the embedded Markov chain. Let denote the true transition distribution. For choose a sequence in that is Hellinger differentiable at ,
(13)
Then the assumptions of Section 2 hold with , , , . The functional to be estimated is . Heuristically,
With we obtain
The perturbation expansion (8) yields
(14)
Hence
and the canonical gradient of is obtained from (6) as and equals the influence function of , which is therefore efficient for the misspecified model.
4 Model R
Model R is completely analogous to Model Q, with interchanged roles of the transition distribution of the embedded Markov chain, and the conditional inter-arrival time distribution . Specifically, in Model R we assume a parametric model , , for the -density of the conditional inter-arrival time, and consider the transition distribution of the embedded Markov chain as unknown. Suppose the model is misspecified, and the true conditional inter-arrival time distribution is . Then the KL functional maximizes , and the partial maximum likelihood estimator maximizes . Write
for the -dimensional vector of partial derivatives of . Then solves , and solves . Heuristically, by Taylor expansion,
(15)
Here is the matrix of partial derivatives of . With (1) and (2) we obtain
(16)
If Model R is correctly specified and , then . We also have the following relations,
In particular, the partial Fisher information matrix for Model R is . Hence, for the correctly specified model, the partial maximum likelihood estimator has the stochastic expansion
This means that is asymptotically linear with influence function , and is asymptotically normal with covariance matrix .
If the model is misspecified, then is not in . We apply the martingale approximation (7) to (16) and see that is asymptotically linear with influence function
Hence is asymptotically normal with covariance matrix
where
Let us now prove efficiency of , first for the correctly specified model. For set . Assume that is Hellinger differentiable at ,
(17)
Let denote the set of all transition distributions of the embedded Markov chain. For choose a sequence in that is Hellinger differentiable (13) at . Then the assumptions of Section 2 hold with , , , . The functional to be estimated is . By orthogonality of and , its canonical gradient is obtained from (6) as with matrix determined by
i.e. . Hence the canonical gradient of is and equals the influence function of , which is therefore efficient for the correctly specified model.
Suppose now that the model is misspecified, and let be the set of all conditional inter-arrival time distributions. Let denote the true conditional inter-arrival time distribution. For choose a sequence in that is Hellinger differentiable (12) at . Then the assumptions of Section 2 hold with , , , . The functional to be estimated is . Heuristically,
With we obtain
Write and apply the perturbation expansion (14) to obtain
and the canonical gradient of is obtained from (6) as
and equals the influence function of , which is therefore efficient for the misspecified model.
5 Model S
While Models Q and R are semiparametric, Model S is parametric. In Model S we assume parametric models and , , for the -density of the transition distribution of the embedded Markov chain and for the -density of the conditional inter-arrival time. We have . Hence the KL functional maximizes , and the maximum likelihood estimator maximizes . Write
for the -dimensional vector of partial derivatives of . Then solves , and solves . Taylor expansions analogous to (9) and (15) imply
where is the matrix of partial derivatives of . We obtain
(18)
If Model S is correctly specified with and , then . From Sections 3 and 4 we obtain the Fisher information matrix for Model S as . Hence, for the correctly specified model, the maximum likelihood estimator has the stochastic expansion
This means that is asymptotically linear with influence function , and is asymptotically normal with covariance matrix .
If the model is misspecified, then is not in and is not in . We apply the martingale approximation (7) to (18) and see that is asymptotically linear with influence function
Hence is asymptotically normal with covariance matrix
where
Let us now prove efficiency of , first for the correctly specified model. For set . Assume that is Hellinger differentiable (11) at , and is Hellinger differentiable (17) at . Then the assumptions of Section 2 hold with , , , . The functional to be estimated is . The canonical gradient is obtained from (6) as . It equals the influence function of , which is therefore efficient in the correctly specified model.
Suppose now that the model is misspecified. Let be the set of all transition distributions of the embedded Markov chain, and let be the set of all conditional inter-arrival time distributions. For choose a sequence in that is Hellinger differentiable (13) at . For choose a sequence in that is Hellinger differentiable (12) at . Then the assumptions of Section 2 hold with , , , . The functional to be estimated is . As in Section 4,
and therefore
Hence by (6) the canonical gradient of is obtained as
and equals the influence function of , which is therefore efficient for the misspecified model.
6 Remarks
In this section we comment on examples and possible extensions of our results.
1. If the distribution of the inter-arrival times charges only 1, so that , then the semi-Markov process reduces to a Markov chain with transition distribution , and for Model Q we recover the results of Greenwood and Wefelmeyer [15].
2. Our results carry over to observations of the embedded Markov renewal process. Just replace by . In particular, instead of the central limit theorem (3) with random summation index , use
and replace by 1 everywhere.
In some examples we can describe the KL functional more explicitly.
3. Suppose the embedded Markov chain is a linear autoregressive model of order 1, i.e. , where and the innovations are i.i.d. with mean 0, finite variance, and known density . Then Model Q holds with , and with . Hence the KL functional solves . If is the density of for some , then and . Hence the KL functional is , and the partial maximum likelihood estimator for is the least squares estimator
a ratio of two empirical estimators.
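A small sketch of this remark, with hypothetical names and a toy data-generating process: the least squares estimator is computed as a ratio of two sums over consecutive values of the embedded chain, assuming it has the usual form of the sum of products of consecutive values divided by the sum of squared lagged values. The simulated innovations are uniform rather than normal, so the normal working density is misspecified, yet the estimator still targets the autoregression parameter because the innovations have mean zero.

```python
import random


def least_squares_ar1(xs):
    """Ratio of two empirical averages: sum x_{j-1} x_j over sum x_{j-1}^2."""
    num = sum(x0 * x1 for x0, x1 in zip(xs, xs[1:]))
    den = sum(x0 * x0 for x0 in xs[:-1])
    return num / den


def simulate_ar1(theta, n, rng):
    """AR(1) chain X_j = theta * X_{j-1} + eps_j with uniform(-1, 1) innovations."""
    xs = [0.0]
    for _ in range(n):
        xs.append(theta * xs[-1] + rng.uniform(-1.0, 1.0))
    return xs


# Noise-free check: a geometric sequence with ratio 0.5 is recovered exactly.
exact = least_squares_ar1([1.0, 0.5, 0.25, 0.125])

# Simulated check with non-normal innovations (seed fixed for reproducibility).
rng = random.Random(0)
estimate = least_squares_ar1(simulate_ar1(theta=0.5, n=20000, rng=rng))
print(exact, estimate)
```

The sketch works with the embedded chain only; for a semi-Markov path observed up to a fixed time, the sums would run over the transitions completed by that time.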
4. Suppose the inter-arrival time given and is exponentially distributed with mean not depending on ,
Then the semi-Markov process is a Markov step process. If the mean is constant, , , then Model R holds with , and . Hence the KL functional solves , and we obtain . The partial maximum likelihood estimator for is
a function of an empirical estimator. Efficiency of empirical estimators in Markov step processes is studied in Greenwood and Wefelmeyer [12].
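The following sketch illustrates this remark under the assumption that the working model is the exponential density with mean theta, so that the partial log-likelihood is the sum of -log(theta) - v/theta over the observed inter-arrival times v. Its maximizer is then the average inter-arrival time. The parametrization by the mean, the function names and the data are hypothetical.

```python
import math


def exp_log_likelihood(theta, sojourns):
    """Log-likelihood of inter-arrival times under an exponential law with mean theta."""
    return sum(-math.log(theta) - v / theta for v in sojourns)


def exp_mean_mle(sojourns):
    """Maximizer of the exponential log-likelihood: the average inter-arrival time."""
    return sum(sojourns) / len(sojourns)


# Hypothetical inter-arrival times of the transitions completed in the window.
sojourns = [1.0, 1.5, 0.5]
theta_hat = exp_mean_mle(sojourns)

# The closed form should beat nearby parameter values.
best = exp_log_likelihood(theta_hat, sojourns)
beats_lower = best >= exp_log_likelihood(0.9, sojourns)
beats_upper = best >= exp_log_likelihood(1.1, sojourns)
print(theta_hat, beats_lower, beats_upper)
```

When the inter-arrival times are not exponential, the same average estimates the mean inter-arrival time, which is the parameter that maximizes the KL information for this working model.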
Models Q, R and S are described in terms of the conditional distributions and . It is occasionally reasonable to model instead the marginal distributions , or . Results for these three models differ considerably from each other and from those for Models Q, R and S.
5. Suppose we have a parametric model for the -density of . The marginal maximum likelihood estimator maximizes
It estimates the KL functional , the parameter that maximizes . Note that the marginal maximum likelihood estimator is an empirical version of the KL functional, .
However, is not efficient for when the marginal model is correctly specified. The reason is that the specification of the marginal density implies a constraint on the conditional distribution of the embedded Markov chain, but the marginal maximum likelihood estimator does not use this information. An efficient estimator for is difficult to construct. See Kessler, Schick and Wefelmeyer [24] for an efficient estimator of in a Markov chain model with a (correctly specified) parametric model for the (one-dimensional) marginal density. On the other hand, is efficient for in a nonparametric sense when the marginal model is misspecified.
We note that, in this respect, semi-Markov processes and Markov chains are different from the i.i.d. case. Suppose we have i.i.d. observations with joint distribution , where is unknown. Then is not constrained by the marginal model , and the marginal maximum likelihood estimator is efficient for if the marginal model is correctly specified, and also efficient for if the marginal model is misspecified.
6. Suppose we have a parametric model for the -density of . The marginal maximum likelihood estimator maximizes
It estimates the KL functional , the parameter that maximizes , and . The perturbation expansion (8) suggests that maximizing is asymptotically equivalent to solving , and the martingale approximation (7) suggests that this is asymptotically equivalent to solving . Hence the marginal maximum likelihood estimator is asymptotically equivalent to the conditional maximum likelihood estimator and therefore efficient in the correctly specified model . The reason is that , and determines , which therefore does not contain additional information about .
This is again different from the i.i.d. case. Suppose we have i.i.d. observations with joint density . Then contains, in general, additional information about .
7. Suppose we have a parametric model for the -density of . The marginal maximum likelihood estimator maximizes
It estimates the KL functional , the parameter that maximizes , and . We can write . Now carries additional information about , as in the i.i.d. case.
8. Remark 5 tells us in particular the following, rather obvious, fact. If a parametric estimator is efficient in a nonparametric sense, then the reason is not that it is efficient in a parametric model. Rather, an estimator usually is nonparametrically efficient because it is a smooth function of an empirical estimator. We can illustrate this also with Model S. Suppose we have parametric models and for the densities of and . Let be the conditional maximum likelihood estimator based on the model alone. In general, will not be efficient for when Model S is correctly specified, because does not use the information about in the model . But if both and are misspecified, will be nonparametrically efficient for , which is the KL functional for Model Q but not for Model S.
References
- [1] Andrews, D. W. K. and Pollard, D., 1994, An introduction to functional central limit theorems for dependent stochastic processes. Internat. Statist. Rev. 62, 119–132.
- [2] Bickel, P. J., 1993, Estimation in semiparametric models. In: (C. R. Rao, Ed.) Multivariate Analysis: Future Directions (Amsterdam: North-Holland), pp. 55–73. MR1246354
- [3] Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A., 1998, Efficient and Adaptive Estimation for Semiparametric Models (New York: Springer). MR1623559
- [4] Bickel, P. J. and Kwon, J., 2001, Inference for semiparametric models: Some questions and an answer (with discussion). Statist. Sinica 11, 863–960. MR1867326
- [5] Dahlhaus, R. and Wefelmeyer, W., 1996, Asymptotically optimal estimation in misspecified time series models. Ann. Statist. 24, 952–974. MR1401832
- [6] Daniels, H. E., 1961, The asymptotic efficiency of a maximum likelihood estimator. Proc. Fourth Berkeley Sympos. Math. Statist. and Probability 1, 151–163. MR0131924
- [7] Doksum, K., Ozeki, A., Kim, J. and Neto, E. C., 2007, Thinking outside the box: Statistical inference based on Kullback-Leibler empirical projections. Statist. Probab. Lett. 77, 1201–1213.
- [8] Dürr, D. and Goldstein, S., 1986, Remarks on the central limit theorem for weakly dependent random variables. In: (S. Albeverio, P. Blanchard and L. Streit, Eds.) Stochastic Processes – Mathematics and Physics, Lecture Notes in Mathematics 1158 (Berlin: Springer), pp. 104–118. MR0838560
- [9] Gordin, M. I., 1969, The central limit theorem for stationary processes. Soviet Math. Dokl. 10, 1174–1176. MR0251785
- [10] Gordin, M. I. and Lifšic, B. A., 1978, The central limit theorem for stationary Markov processes. Soviet Math. Dokl. 19, 392–394.
- [11] Greenwood, P. E., Müller, U. U. and Wefelmeyer, W., 2004, Efficient estimation for semiparametric semi-Markov processes. Comm. Statist. Theory Methods 33, 419–435. MR2056947
- [12] Greenwood, P. E. and Wefelmeyer, W., 1994, Nonparametric estimators for Markov step processes. Stochastic Process. Appl. 52, 1–16. MR1289165
- [13] Greenwood, P. E. and Wefelmeyer, W., 1995, Efficiency of empirical estimators for Markov chains. Ann. Statist. 23, 132–143. MR1331660
- [14] Greenwood, P. E. and Wefelmeyer, W., 1996, Empirical estimators for semi-Markov processes. Math. Meth. Statist. 5, 299–315. MR1417674
- [15] Greenwood, P. E. and Wefelmeyer, W., 1997, Maximum likelihood estimator and Kullback–Leibler information in misspecified Markov chain models. Theory Probab. Appl. 42, 103–111. MR1453336
- [16] Höpfner, R., 1993a, On statistics of Markov step processes: representation of log-likelihood ratio processes in filtered local models. Probab. Theory Related Fields 94, 375–398. MR1198653
- [17] Höpfner, R., 1993b, Asymptotic inference for Markov step processes: observation up to a random time. Stochastic Process. Appl. 48, 295–310. MR1244547
- [18] Höpfner, R., Jacod, J. and Ladelli, L., 1990, Local asymptotic normality and mixed normality for Markov statistical models. Probab. Theory Related Fields 86, 105–129. MR1061951
- [19] Hosoya, Y., 1989, The bracketing condition for limit theorems on stationary linear processes. Ann. Statist. 17, 401–418. MR0981458
- [20] Huber, P. J., 1967, The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Sympos. Math. Statist. and Probability 1, 221–233. MR0216620
- [21] Kartashov, N. V., 1985a, Criteria for uniform ergodicity and strong stability of Markov chains with a common phase space. Theory Probab. Math. Statist. 30, 71–89.
- [22] Kartashov, N. V., 1985b, Inequalities in theorems of ergodicity and stability for Markov chains with common phase space. I. Theory Probab. Appl. 30, 247–259.
- [23] Kartashov, N. V., 1996, Strong Stable Markov Chains (Utrecht: VSP). MR1451375
- [24] Kessler, M., Schick, A. and Wefelmeyer, W., 2001, The information in the marginal law of a Markov chain. Bernoulli 7, 243–266. MR1828505
- [25] Kutoyants, Yu. A., 1988, On an identification problem for dynamical systems with small noise. Izv. Akad. Nauk Armyan. SSR 23, 270–285. MR0976484
- [26] Kutoyants, Yu. A., 2004, Statistical Inference for Ergodic Diffusion Processes, Springer Series in Statistics (London: Springer). MR2144185
- [27] Maigret, N., 1978, Théorème de limite centrale fonctionnel pour une chaîne de Markov récurrente au sens de Harris et positive. Ann. Inst. H. Poincaré Probab. Statist. 14, 425–440. MR0523221
- [28] McKeague, I. W., 1984, Estimation for diffusion processes under misspecified models. J. Appl. Probab. 21, 511–520. MR0752016
- [29] Meyn, S. P. and Tweedie, R. L., 1993, Markov Chains and Stochastic Stability (London: Springer). MR1287609
- [30] Müller, U. U., 2007, Weighted least squares estimators in possibly misspecified nonlinear regression. Metrika 66, 39–59. MR2306376
- [31] Ogata, Y., 1980, Maximum likelihood estimates of incorrect Markov models for time series and the derivation of AIC. J. Appl. Probab. 17, 59–72. MR0557435
- [32] Penev, S., 1991, Efficient estimation of the stationary distribution for exponentially ergodic Markov chains. J. Statist. Plann. Inference 27, 105–123. MR1089356
- [33] Pollard, D., 1985, New ways to prove central limit theorems. Econometric Theory 1, 295–314.
- [34] Sin, C.-Y. and White, H., 1996, Information criteria for selecting possibly misspecified parametric models. J. Econometrics 71, 207–225. MR1381082
- [35] White, H., 1982, Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25. MR0640163
- [36] White, H., 1984, Maximum likelihood estimation of misspecified dynamic models. In: (T. K. Dijkstra, Ed.) Misspecification Analysis, Lecture Notes in Economics and Mathematical Systems 237 (Berlin: Springer), pp. 1–19. MR0791952
- [37] White, H., 1994, Estimation, Inference and Specification Analysis, Econometric Society Monographs 22 (Cambridge: Cambridge University Press). MR1292251