
Optimality of estimators for misspecified semi-Markov models

URSULA U. MÜLLER†, ANTON SCHICK‡ and WOLFGANG WEFELMEYER§

† Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA
‡ Department of Mathematical Sciences, Binghamton University, Binghamton, NY 13902-6000, USA
§ Mathematisches Institut, Universität zu Köln, Weyertal 86-90, 50931 Köln, Germany

Anton Schick was supported in part by NSF Grant DMS0405791. Corresponding author: Wolfgang Wefelmeyer, email wefelm@math.uni-koeln.de.

(v3.1 released December 2006)
Abstract

Suppose we observe a geometrically ergodic semi-Markov process and have a parametric model for the transition distribution of the embedded Markov chain, for the conditional distribution of the inter-arrival times, or for both. The first two models for the process are semiparametric, and the parameters can be estimated by conditional maximum likelihood estimators. The third model for the process is parametric, and the parameter can be estimated by an unconditional maximum likelihood estimator. We determine heuristically the asymptotic distributions of these estimators and show that they are asymptotically efficient. If the parametric models are not correct, the (conditional) maximum likelihood estimators estimate the parameter that maximizes the Kullback–Leibler information. We show that they remain asymptotically efficient in a nonparametric sense.


Mathematics Subject Classification: Primary 62M09; secondary 62F12, 62G20.

Keywords: Hellinger differentiability, local asymptotic normality, asymptotically linear estimator, Markov renewal process.

1 Introduction

For i.i.d. observations, Daniels [6] and Huber [20] show that the maximum likelihood estimator of a misspecified parametric model estimates the parameter that maximizes the Kullback–Leibler (KL) information, and determine its asymptotic distribution. Weaker conditions are given by Pollard [33]. For applications see also White [35], Müller [30], and Doksum, Ozeki, Kim and Neto [7]. Analogous results are obtained for parametric Markov chain models by Ogata [31], for parametric time series by Hosoya [19] and by Andrews and Pollard [1], and for parametric diffusion models by McKeague [28] and Kutoyants [25]. We refer also to the monograph of Kutoyants [26]. Applications to time series models in econometrics are studied by White [36] and Sin and White [34], and in the monograph of White [37].

Greenwood and Wefelmeyer [15] prove that the maximum likelihood estimator of a misspecified parametric Markov chain model is efficient in a nonparametric sense. Related efficiency results for misspecified parametric time series are in Dahlhaus and Wefelmeyer [5]. Here we outline corresponding results for semi-Markov processes. We consider both parametric and semiparametric misspecified models. The arguments are heuristic; sufficient regularity conditions can be obtained as in the above references.

Suppose we observe a semi-Markov process Z_{t}, t\geq 0, with values in an arbitrary measurable space E, on a time interval 0\leq t\leq n. Let (X_{0},T_{0}),(X_{1},T_{1}),\dots denote the embedded Markov renewal process. Its transition distribution factors as

S(x,dy,du)=Q\otimes R(x,dy,du)=Q(x,dy)R(x,y,du),

where Q(x,dy) is the transition distribution of the embedded Markov chain X_{0},X_{1},\dots, and R(x,y,du) is the conditional distribution of the inter-arrival time U_{j}=T_{j}-T_{j-1} given X_{j-1}=x and X_{j}=y.

We assume that the embedded Markov chain is stationary. We write P_{1}(dx), P_{2}(dx,dy) and P_{3}(dx,dy,du) for the stationary laws of X_{j-1}, (X_{j-1},X_{j}) and (X_{j-1},X_{j},U_{j}), respectively. Of course, P_{2}=P_{1}\otimes Q and P_{3}=P_{2}\otimes R=P_{1}\otimes Q\otimes R. Set N=\max\{j:T_{j}\leq n\}. We note that studying a semi-Markov process is equivalent to studying the embedded Markov renewal process. The latter is a Markov chain. Observing the semi-Markov process up to time n is equivalent to observing the embedded Markov renewal process up to the random time N.

Natural estimators for P_{1}, P_{2} and P_{3} are the empirical distributions

\mathbb{P}_{1}=\frac{1}{N}\sum_{j=1}^{N}\delta_{X_{j-1}},\qquad\mathbb{P}_{2}=\frac{1}{N}\sum_{j=1}^{N}\delta_{(X_{j-1},X_{j})},\qquad\mathbb{P}_{3}=\frac{1}{N}\sum_{j=1}^{N}\delta_{(X_{j-1},X_{j},U_{j})},

where \delta_{x} denotes the Dirac measure at a point x.
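To make these objects concrete, the following minimal sketch (ours, not from the paper) simulates a toy semi-Markov process with a two-state embedded chain, an assumed transition matrix and state-dependent exponential inter-arrival laws; it records the N jumps up to time n and forms empirical averages such as \mathbb{P}_{3}[f].

import numpy as np

rng = np.random.default_rng(0)
Q = np.array([[0.3, 0.7],
              [0.6, 0.4]])      # assumed transition matrix of the embedded chain
rate = np.array([1.0, 2.0])     # assumed state-dependent exponential rates

def simulate(n_time):
    """Return the jumps (X_{j-1}, X_j, U_j), j = 1..N, with T_N <= n_time."""
    x, t, jumps = 0, 0.0, []
    while True:
        y = rng.choice(2, p=Q[x])
        u = rng.exponential(1.0 / rate[x])   # R(x, y, du), independent of y here
        if t + u > n_time:
            return np.array(jumps)
        jumps.append((x, y, u))
        x, t = y, t + u

jumps = simulate(n_time=1000.0)
N = len(jumps)                               # N = max{j : T_j <= n}
def P3(f):                                   # empirical version of P_3[f]
    return np.mean([f(x, y, u) for x, y, u in jumps])
print(N, P3(lambda x, y, u: u))              # e.g. empirical mean inter-arrival time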

Let \Theta be an open subset of \mathbb{R}^{d}. We consider the following three models for the semi-Markov process. In Model Q we assume a parametric form Q=Q_{\vartheta}, \vartheta\in\Theta, of the transition distribution of the embedded Markov chain. These models are also considered in Greenwood, Müller and Wefelmeyer [11]. In Model R we assume a parametric form R=R_{\vartheta}, \vartheta\in\Theta, of the conditional distribution of the inter-arrival times. In Model S we assume parametric forms Q=Q_{\vartheta} and R=R_{\vartheta}, \vartheta\in\Theta, for both. Of course, the last model covers the case that Q and R carry different parameters. We assume that Q_{\vartheta}(x,dy) has a density q_{\vartheta}(x,y) with respect to some dominating measure \mu(dy), and R_{\vartheta}(x,y,du) has a density r_{\vartheta}(x,y,u) with respect to some dominating measure \nu(du).

If Model Q holds, then the transition distribution of the semi-Markov process is semiparametric, S=Q_{\vartheta}\otimes R, with R an infinite-dimensional nuisance parameter. A natural estimator of \vartheta is the partial maximum likelihood estimator \hat{\vartheta}_{Q}, which maximizes

\mathbb{P}_{2}[\log q_{\vartheta}]=\frac{1}{N}\sum_{j=1}^{N}\log q_{\vartheta}(X_{j-1},X_{j}).

Suppose that Model Q is misspecified, and that the true transition distribution of the embedded Markov chain is Q. Then \mathbb{P}_{2}[\log q_{\vartheta}] is an empirical version of the KL information P_{2}[\log q_{\vartheta}]. Let K_{Q}(P_{2}) denote the parameter that maximizes P_{2}[\log q_{\vartheta}]. We call K_{Q} a KL functional. Note that the partial maximum likelihood estimator is the empirical version of the KL functional, \hat{\vartheta}_{Q}=K_{Q}(\mathbb{P}_{2}). Since Model Q is misspecified, the semi-Markov model is nonparametric. The empirical distribution \mathbb{P}_{2} is efficient for P_{2} in a certain sense. If the KL functional is smooth, i.e. compactly differentiable in an appropriate sense, it follows that \hat{\vartheta}_{Q}=K_{Q}(\mathbb{P}_{2}) is efficient for K_{Q}(P_{2}). We will not use this approach in this paper. Instead we derive, in Section 3, a stochastic expansion of \hat{\vartheta}_{Q}, and determine its influence function. We also show that the KL functional K_{Q} is pathwise differentiable, and determine its canonical gradient. To keep the exposition simple, we do not give regularity conditions for these results. They can be adapted e.g. from those of Greenwood and Wefelmeyer [15]. It turns out that the canonical gradient equals the influence function of \hat{\vartheta}_{Q}. By the characterisation of efficient estimators in Section 2, this shows that \hat{\vartheta}_{Q} is efficient in the nonparametric semi-Markov model. We also show that \hat{\vartheta}_{Q} remains efficient when Model Q is true. The advantage of our approach is that we do not need to check compact differentiability of K_{Q} and a corresponding efficiency property of \mathbb{P}_{2}.
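As a concrete illustration, here is a minimal sketch (ours; the Gaussian working family q_{\vartheta} and the function names are assumptions, not objects from the paper) of computing \hat{\vartheta}_{Q}=K_{Q}(\mathbb{P}_{2}) by maximizing the empirical KL criterion over the observed pairs (X_{j-1},X_{j}).

import numpy as np
from scipy.optimize import minimize_scalar

def log_q(theta, x, y):
    # assumed working density q_theta(x, y): N(theta * x, 1) in y,
    # written up to an additive constant
    return -0.5 * (y - theta * x) ** 2

def theta_hat_Q(pairs):
    """pairs: (N, 2) array of observed (X_{j-1}, X_j)."""
    x, y = pairs[:, 0], pairs[:, 1]
    # minus the empirical KL criterion P_N[log q_theta]
    objective = lambda th: -np.mean(log_q(th, x, y))
    return minimize_scalar(objective, bounds=(-0.99, 0.99), method="bounded").x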

The other two models are treated analogously. If Model R holds, then the transition distribution of the semi-Markov process is semiparametric, S=Q\otimes R_{\vartheta}, with Q an infinite-dimensional nuisance parameter. A natural estimator of \vartheta is the partial maximum likelihood estimator \hat{\vartheta}_{R}, which maximizes

\mathbb{P}_{3}[\log r_{\vartheta}]=\frac{1}{N}\sum_{j=1}^{N}\log r_{\vartheta}(X_{j-1},X_{j},U_{j}).

Suppose that Model R is misspecified, and that the true conditional distribution of the inter-arrival times is R. Then \mathbb{P}_{3}[\log r_{\vartheta}] is an empirical version of P_{3}[\log r_{\vartheta}]. Again we call the latter KL information. We denote by K_{R}(P_{3}) the parameter that maximizes P_{3}[\log r_{\vartheta}], and we call K_{R} a KL functional. Then \hat{\vartheta}_{R}=K_{R}(\mathbb{P}_{3}). In Section 4 we derive a stochastic expansion of \hat{\vartheta}_{R} and the canonical gradient of K_{R} and show that \hat{\vartheta}_{R} is efficient in the nonparametric semi-Markov model. We also show that \hat{\vartheta}_{R} remains efficient when Model R is true.

If Model S holds, then the transition distribution of the semi-Markov process is parametric, S_{\vartheta}=Q_{\vartheta}\otimes R_{\vartheta}. Set

s_{\vartheta}(x,y,u)=q_{\vartheta}(x,y)r_{\vartheta}(x,y,u).

A natural estimator of \vartheta is the maximum likelihood estimator \hat{\vartheta}_{S}, which maximizes

\mathbb{P}_{3}[\log s_{\vartheta}]=\mathbb{P}_{2}[\log q_{\vartheta}]+\mathbb{P}_{3}[\log r_{\vartheta}]=\frac{1}{N}\sum_{j=1}^{N}\log q_{\vartheta}(X_{j-1},X_{j})+\frac{1}{N}\sum_{j=1}^{N}\log r_{\vartheta}(X_{j-1},X_{j},U_{j}).
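The additive split of the criterion is easy to exploit numerically. A minimal sketch (ours; the two working families below are artificial assumptions that merely share the parameter \vartheta) maximizes the Q-part and the R-part jointly.

import numpy as np
from scipy.optimize import minimize_scalar

def log_q(th, x, y):                  # assumed Q-part: N(th * x, 1) density in y
    return -0.5 * (y - th * x) ** 2

def log_r(th, u):                     # assumed R-part: exponential with rate exp(th)
    return th - np.exp(th) * u        # log density: log(e^th) - e^th * u

def theta_hat_S(x, y, u):
    """x, y, u: arrays of (X_{j-1}, X_j, U_j); maximize the combined criterion."""
    crit = lambda th: -np.mean(log_q(th, x, y) + log_r(th, u))
    return minimize_scalar(crit, bounds=(-5.0, 5.0), method="bounded").x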

Suppose that Model S is misspecified, and that the true transition distribution of the embedded Markov renewal process is S=Q\otimes R. Then \mathbb{P}_{3}[\log s_{\vartheta}] is an empirical version of P_{3}[\log s_{\vartheta}]. Again we call the latter KL information. We denote by K_{S}(P_{3}) the parameter that maximizes P_{3}[\log s_{\vartheta}], and we call K_{S} a KL functional. Then \hat{\vartheta}_{S}=K_{S}(\mathbb{P}_{3}). In Section 5 we derive a stochastic expansion of \hat{\vartheta}_{S} and the canonical gradient of K_{S} and show that \hat{\vartheta}_{S} is efficient in the nonparametric semi-Markov model. We also show that \hat{\vartheta}_{S} remains efficient when Model S is true. Section 6 contains some additional comments.

2 Characterization of efficient estimators

We assume that the embedded Markov chain is positive Harris recurrent and geometrically ergodic in L_{2}(P_{2}). We make the usual assumption that the conditional distribution of the inter-arrival times does not charge zero. We also assume that the mean inter-arrival time m=EU_{j} is finite. Then

n/N\to m\quad a.s. (1)

For a function f\in L_{2}(P_{3}) we have the strong law of large numbers

\frac{1}{N}\sum_{j=1}^{N}f(X_{j-1},X_{j},U_{j})\to P_{3}[f]\quad\mbox{a.s.} (2)

For a function f\in L_{2}(P_{3}) with Sf=0 we have the martingale central limit theorem

n^{-1/2}\sum_{j=1}^{N}f(X_{j-1},X_{j},U_{j})\Rightarrow m^{-1/2}(P_{3}[f^{2}])^{1/2}Y, (3)

where Y denotes a standard normal random variable.

In order to characterize efficient estimators for functionals of semi-Markov models, we consider a family Q_{\delta}, \delta\in\Delta, of transition distributions of the embedded Markov chain, and a family R_{\delta}, \delta\in\Delta, of conditional distributions of the inter-arrival time. Here \Delta is a possibly infinite-dimensional set, the parameter space. We fix \delta\in\Delta and set Q=Q_{\delta}, R=R_{\delta} and

V=\{v\in L_{2}(P_{2}):Qv=0\},\qquad W=\{w\in L_{2}(P_{3}):Rw=0\}.

Note that V and W can be viewed as orthogonal subspaces of L_{2}(P_{3}). We assume that the parametrization is smooth in the following sense. There is a linear space K, the tangent space of \Delta, and a bounded linear operator D=(D_{Q},D_{R}):K\to V\times W, and for each k\in K there is a sequence \delta_{nk} in \Delta such that Q_{nk}=Q_{\delta_{nk}} is Hellinger differentiable at Q with derivative D_{Q}k\in V,

P_{1}\Big{[}\int\Big{(}dQ_{nk}^{1/2}-dQ^{1/2}-\frac{1}{2}n^{-1/2}D_{Q}k\,dQ^{1/2}\Big{)}^{2}\Big{]}\to 0,

and R_{nk}=R_{\delta_{nk}} is Hellinger differentiable at R with derivative D_{R}k\in W,

P_{2}\Big{[}\int\Big{(}dR_{nk}^{1/2}-dR^{1/2}-\frac{1}{2}n^{-1/2}D_{R}k\,dR^{1/2}\Big{)}^{2}\Big{]}\to 0.

Now write M_{n} for the distribution of Z_{t}, 0\leq t\leq n, if Q and R are in effect, and M_{nk} if Q_{nk} and R_{nk} are. By Taylor expansion and (2) and (3), we obtain local asymptotic normality:

\log\frac{dM_{nk}}{dM_{n}}=n^{-1/2}\sum_{j=1}^{N}\big{(}D_{Q}k(X_{j-1},X_{j})+D_{R}k(X_{j-1},X_{j},U_{j})\big{)}-\frac{1}{2}m^{-1}(P_{2}[(D_{Q}k)^{2}]+P_{3}[(D_{R}k)^{2}])+o_{p}(1) (4)

and

n^{-1/2}\sum_{j=1}^{N}\big{(}D_{Q}k(X_{j-1},X_{j})+D_{R}k(X_{j-1},X_{j},U_{j})\big{)}\Rightarrow m^{-1/2}(P_{2}[(D_{Q}k)^{2}]+P_{3}[(D_{R}k)^{2}])^{1/2}Y. (5)

For Markov chains, different proofs are in Penev [32], Bickel [2] and Greenwood and Wefelmeyer [13]; see also Bickel and Kwon [4]. For Markov step processes see Höpfner, Jacod and Ladelli [18] and Höpfner [16, 17]. A proof for nonparametric semi-Markov models is in Greenwood and Wefelmeyer [14].

We want to estimate a d-dimensional functional \varphi:\Delta\to\mathbb{R}^{d} of the parameter \delta. We call \varphi differentiable at \delta with gradient (v_{\varphi},w_{\varphi}) if v_{\varphi}\in V^{d}, w_{\varphi}\in W^{d}, and

n^{1/2}(\varphi(\delta_{nk})-\varphi(\delta))\to m^{-1}(P_{2}[v_{\varphi}D_{Q}k]+P_{3}[w_{\varphi}D_{R}k]),\quad k\in K. (6)

The canonical gradient (v_{\varphi}^{*},w_{\varphi}^{*}) of \varphi is the componentwise projection of (v_{\varphi},w_{\varphi}) onto the closure of (DK)^{d} in (L_{2}(P_{3}))^{d}. If DK is closed in L_{2}(P_{3}), we can write (v_{\varphi}^{*},w_{\varphi}^{*})=(D_{Q}k_{\varphi},D_{R}k_{\varphi}) for some k_{\varphi}\in K. This will be the case in Sections 3–5.

An estimator \hat{\varphi} is called regular for \varphi at \delta with limit L if L is a d-dimensional random vector such that

n^{1/2}(\hat{\varphi}-\varphi(\delta_{nk}))\Rightarrow L\quad\mbox{under }M_{nk},\quad k\in K.

The convolution theorem says that

L=A+m^{-1/2}(P_{2}[v_{\varphi}^{*}v_{\varphi}^{*\top}]+P_{3}[w_{\varphi}^{*}w_{\varphi}^{*\top}])^{1/2}Y_{d},

with Y_{d} a d-dimensional standard normal random vector, and A a d-dimensional random vector independent of Y_{d}. This justifies calling \hat{\varphi} efficient for \varphi at \delta if n^{1/2}(\hat{\varphi}-\varphi(\delta)) is asymptotically normal under M_{n} with covariance matrix m^{-1}(P_{2}[v_{\varphi}^{*}v_{\varphi}^{*\top}]+P_{3}[w_{\varphi}^{*}w_{\varphi}^{*\top}]).

An estimator \hat{\varphi} is called asymptotically linear for \varphi at \delta with influence function (a,b) if a\in V^{d}, b\in W^{d}, and

n^{1/2}(\hat{\varphi}-\varphi(\delta))=n^{-1/2}\sum_{j=1}^{N}\big{(}a(X_{j-1},X_{j})+b(X_{j-1},X_{j},U_{j})\big{)}+o_{p}(1).

We have the following characterization. An estimator \hat{\varphi} is regular and efficient for \varphi at \delta if and only if it is asymptotically linear with influence function equal to the canonical gradient,

n^{1/2}(\hat{\varphi}-\varphi(\delta))=n^{-1/2}\sum_{j=1}^{N}\big{(}v_{\varphi}^{*}(X_{j-1},X_{j})+w_{\varphi}^{*}(X_{j-1},X_{j},U_{j})\big{)}+o_{p}(1).

For proofs of the convolution theorem and the characterization we refer to Bickel, Klaassen, Ritov and Wellner [3].

To prove asymptotic linearity of estimators in misspecified models, we need the following martingale approximation. Set L_{2,0}(P_{2})=\{f\in L_{2}(P_{2}):P_{2}[f]=0\}. The potential G of the embedded Markov chain is defined by

Gf=\sum_{i=0}^{\infty}Q^{i}f,\quad f\in L_{2,0}(P_{2}).

For f\in L_{2}(P_{2}) set

Af(x,y)=G(f-P_{2}[f])(y)-QG(f-P_{2}[f])(x)=\sum_{i=0}^{\infty}(Q^{i}f(y)-Q^{i+1}f(x)).

Then QAf=0 and

P_{2}[(Af)^{2}]=P_{2}[f^{2}]-(P_{2}[f])^{2}+2\sum_{i=1}^{\infty}P_{2}[(f-P_{2}[f])Q^{i}f].
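For a finite-state embedded chain these operators are explicit, and the defining properties can be checked numerically. The following sketch (ours, on an assumed two-state example, with f a function of the arrival state only) computes g=G(f-P_{2}[f]) by solving a linear system and verifies QAf=0.

import numpy as np

Q = np.array([[0.3, 0.7],
              [0.6, 0.4]])                 # assumed toy transition matrix
evals, evecs = np.linalg.eig(Q.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()                             # stationary law of the chain

f = np.array([1.0, -2.0])                  # f depending on the arrival state
f0 = f - pi @ f                            # centred version f - P_2[f]
# g = sum_i Q^i f0 solves (I - Q) g = f0; the "fundamental matrix" inverse
# below picks the solution with pi @ g = 0.
g = np.linalg.solve(np.eye(2) - Q + np.outer(np.ones(2), pi), f0)

Af = g[None, :] - (Q @ g)[:, None]         # Af(x, y) = g(y) - Qg(x)
print(np.allclose((Q * Af).sum(axis=1), 0.0))   # QAf = 0: True
print(np.allclose(g - Q @ g, f0))               # potential equation: True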

Let f\in L_{2}(P_{3}) and set f_{0}=f-Rf. Then we obtain the stochastic expansion

n^{-1/2}\sum_{j=1}^{N}(f(X_{j-1},X_{j},U_{j})-P_{3}[f])=n^{-1/2}\sum_{j=1}^{N}\big{(}ARf(X_{j-1},X_{j})+f_{0}(X_{j-1},X_{j},U_{j})\big{)}+o_{p}(1). (7)

Note that QARf=0 and Sf_{0}=0. Hence ARf(X_{j-1},X_{j}) and f_{0}(X_{j-1},X_{j},U_{j}) are orthogonal martingale increments. For discrete-time processes, the martingale approximation (7) is due to Gordin [9] and Gordin and Lifšic [10]. It was discovered independently by Maigret [27], Dürr and Goldstein [8] and Greenwood and Wefelmeyer [13]. See also Section 17.4 in the monograph of Meyn and Tweedie [29]. The martingale approximation (7) and the martingale central limit theorem (3) imply that

n^{-1/2}\sum_{j=1}^{N}(f(X_{j-1},X_{j},U_{j})-P_{3}[f])\Rightarrow m^{-1/2}(P_{2}[(ARf)^{2}]+P_{3}[(f-Rf)^{2}])^{1/2}Y.

To calculate canonical gradients of functionals in misspecified models, we need the following perturbation expansion, due to Kartashov [21, 22, 23],

n^{1/2}(P_{2nk}[f]-P_{2}[f])\to P_{2}[D_{Q}k\cdot Af],\quad k\in K. (8)

Here P_{2nk} denotes the distribution of (X_{j-1},X_{j}) if Q_{nk} is in effect. This pathwise version of the perturbation expansion suffices for our purposes. Greenwood and Wefelmeyer [13] show that it follows also from the martingale approximation (7).

3 Model Q

In Model Q we assume a parametric model q_{\vartheta}, \vartheta\in\Theta\subset\mathbb{R}^{d}, for the \mu-density of the transition distribution of the embedded Markov chain, and consider the conditional inter-arrival time distribution as unknown. Suppose the model is misspecified, and the true transition distribution is Q. Then the KL functional K_{Q}(P_{2}) maximizes P_{2}[\log q_{\vartheta}], and the partial maximum likelihood estimator \hat{\vartheta}_{Q} maximizes \mathbb{P}_{2}[\log q_{\vartheta}]. Write

\chi_{\vartheta}(x,y)=\partial_{\vartheta}\log q_{\vartheta}(x,y)

for the d-dimensional vector of partial derivatives of \log q_{\vartheta}(x,y). Then K_{Q}(P_{2}) solves P_{2}[\chi_{\vartheta}]=0, and \hat{\vartheta}_{Q} solves \mathbb{P}_{2}[\chi_{\vartheta}]=0. Heuristically, by Taylor expansion,

0=\mathbb{P}_{2}[\chi_{\hat{\vartheta}_{Q}}]=\frac{1}{N}\sum_{j=1}^{N}\chi_{\hat{\vartheta}_{Q}}(X_{j-1},X_{j})=\frac{1}{N}\sum_{j=1}^{N}\chi_{K_{Q}(P_{2})}(X_{j-1},X_{j})+\frac{1}{N}\sum_{j=1}^{N}\dot{\chi}_{K_{Q}(P_{2})}(X_{j-1},X_{j})(\hat{\vartheta}_{Q}-K_{Q}(P_{2}))+o_{p}(n^{-1/2}). (9)

Here \dot{\chi}_{\vartheta}(x,y) is the d\times d matrix of partial derivatives of \chi_{\vartheta}(x,y). With (1) and (2) we obtain

n^{1/2}(\hat{\vartheta}_{Q}-K_{Q}(P_{2}))=-m(P_{2}[\dot{\chi}_{K_{Q}(P_{2})}])^{-1}n^{-1/2}\sum_{j=1}^{N}\chi_{K_{Q}(P_{2})}(X_{j-1},X_{j})+o_{p}(1). (10)

If Model Q is correctly specified and Q=Q_{\vartheta}, then K_{Q}(P_{2})=\vartheta. We also have the following relations, which are well known in the i.i.d. case,

0=\partial_{\vartheta}Q_{\vartheta}(\cdot,E)=Q_{\vartheta}\chi_{\vartheta},\qquad 0=\partial_{\vartheta}Q_{\vartheta}\chi_{\vartheta}=Q_{\vartheta}\chi_{\vartheta}\chi_{\vartheta}^{\top}+Q_{\vartheta}\dot{\chi}_{\vartheta}.

In particular, the partial Fisher information matrix for Model Q is I_{\vartheta}=-P_{2}[\dot{\chi}_{\vartheta}]=P_{2}[\chi_{\vartheta}\chi_{\vartheta}^{\top}]. Hence, for the correctly specified model, the partial maximum likelihood estimator \hat{\vartheta}_{Q} has the stochastic expansion

n^{1/2}(\hat{\vartheta}_{Q}-\vartheta)=mI_{\vartheta}^{-1}n^{-1/2}\sum_{j=1}^{N}\chi_{\vartheta}(X_{j-1},X_{j})+o_{p}(1).

This means that \hat{\vartheta}_{Q} is asymptotically linear with influence function mI_{\vartheta}^{-1}(\chi_{\vartheta},0), and n^{1/2}(\hat{\vartheta}_{Q}-\vartheta) is asymptotically normal with covariance matrix mI_{\vartheta}^{-1}.

If the model is misspecified, then \chi_{K_{Q}(P_{2})} is not in V^{d}. We apply the martingale approximation (7) to (10) and see that \hat{\vartheta}_{Q} is asymptotically linear with influence function -m(P_{2}[\dot{\chi}_{K_{Q}(P_{2})}])^{-1}(A\chi_{K_{Q}(P_{2})},0). Hence n^{1/2}(\hat{\vartheta}_{Q}-K_{Q}(P_{2})) is asymptotically normal with covariance matrix

m(P_{2}[\dot{\chi}_{K_{Q}(P_{2})}])^{-1}P_{2}[A\chi_{K_{Q}(P_{2})}(A\chi_{K_{Q}(P_{2})})^{\top}](P_{2}[\dot{\chi}_{K_{Q}(P_{2})}])^{-1}.
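This sandwich covariance can be estimated by plug-in. A minimal sketch (ours, an assumed approach rather than one from the paper; all names are ours): by the series formula for P_{2}[(Af)^{2}] above, P_{2}[A\chi(A\chi)^{\top}] is the long-run covariance of the scores \chi(X_{j-1},X_{j}), which we approximate by a truncated sum of empirical autocovariances.

import numpy as np

def sandwich_cov(chi, chi_dot, n_time, lag=10):
    """Plug-in sandwich covariance for n^(1/2)(theta_hat_Q - K_Q(P_2)).

    chi:      (N, d) array of scores chi_{theta_hat}(X_{j-1}, X_j)
    chi_dot:  (N, d, d) array of derivative matrices at theta_hat
    n_time:   observation horizon n, so m is estimated by n/N, cf. (1)
    lag:      truncation point for the long-run covariance series
    """
    N = chi.shape[0]
    m_hat = n_time / N
    H = chi_dot.mean(axis=0)                 # estimates P_2[chi_dot]
    S = chi.T @ chi / N                      # lag-0 autocovariance
    for i in range(1, lag + 1):              # truncated long-run covariance
        C = chi[:-i].T @ chi[i:] / N
        S += C + C.T
    Hinv = np.linalg.inv(H)
    return m_hat * Hinv @ S @ Hinv.T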

Let us now prove efficiency of \hat{\vartheta}_{Q}, first for the correctly specified model. For c\in\mathbb{R}^{d} set \vartheta_{nc}=\vartheta+n^{-1/2}c. Assume that q_{nc}=q_{\vartheta_{nc}} is Hellinger differentiable at \vartheta,

\int\int\Big{(}q_{nc}^{1/2}(x,y)-q_{\vartheta}^{1/2}(x,y)-\frac{1}{2}n^{-1/2}c^{\top}\chi_{\vartheta}(x,y)q_{\vartheta}^{1/2}(x,y)\Big{)}^{2}\mu(dy)P_{1}(dx)\to 0. (11)

Let \mathcal{R} denote the set of all conditional inter-arrival distributions. For w\in W choose a sequence R_{nw} in \mathcal{R} that is Hellinger differentiable at R,

P_{2}\Big{[}\int\Big{(}dR_{nw}^{1/2}-dR^{1/2}-\frac{1}{2}n^{-1/2}w\,dR^{1/2}\Big{)}^{2}\Big{]}\to 0. (12)

Then the assumptions of Section 2 hold with \Delta=\Theta\times\mathcal{R}, K=\mathbb{R}^{d}\times W, D_{Q}(c,w)=c^{\top}\chi_{\vartheta}, D_{R}(c,w)=w. The functional to be estimated is \varphi(\vartheta,R)=\vartheta. By orthogonality of V and W, its canonical gradient is obtained from (6) as (c_{\vartheta}^{\top}\chi_{\vartheta},0) with d\times d matrix c_{\vartheta} determined by

c=m^{-1}c_{\vartheta}^{\top}P_{2}[\chi_{\vartheta}\chi_{\vartheta}^{\top}]c=m^{-1}c_{\vartheta}^{\top}I_{\vartheta}c,\quad c\in\mathbb{R}^{d},

i.e. c_{\vartheta}=mI_{\vartheta}^{-1}. Hence the canonical gradient of \vartheta is mI_{\vartheta}^{-1}(\chi_{\vartheta},0) and equals the influence function of \hat{\vartheta}_{Q}, which is therefore efficient for the correctly specified model.

Suppose now that the model is misspecified, and let \mathcal{Q} be the set of all transition distributions of the embedded Markov chain. Let Q denote the true transition distribution. For v\in V choose a sequence Q_{nv} in \mathcal{Q} that is Hellinger differentiable at Q,

P_{1}\Big{[}\int\Big{(}dQ_{nv}^{1/2}-dQ^{1/2}-\frac{1}{2}n^{-1/2}v\,dQ^{1/2}\Big{)}^{2}\Big{]}\to 0. (13)

Then the assumptions of Section 2 hold with \Delta=\mathcal{Q}\times\mathcal{R}, K=V\times W, D_{Q}(v,w)=v, D_{R}(v,w)=w. The functional to be estimated is \varphi(Q,R)=K_{Q}(P_{2}). Heuristically,

0=P_{2nv}[\chi_{K_{Q}(P_{2nv})}]=P_{2nv}[\chi_{K_{Q}(P_{2})}]+P_{2nv}[\dot{\chi}_{K_{Q}(P_{2})}](K_{Q}(P_{2nv})-K_{Q}(P_{2}))+o_{p}(n^{-1/2}).

With P_{2nv}[\dot{\chi}_{K_{Q}(P_{2})}]\to P_{2}[\dot{\chi}_{K_{Q}(P_{2})}] we obtain

K_{Q}(P_{2nv})-K_{Q}(P_{2})=-(P_{2}[\dot{\chi}_{K_{Q}(P_{2})}])^{-1}P_{2nv}[\chi_{K_{Q}(P_{2})}]+o_{p}(n^{-1/2}).

The perturbation expansion (8) yields

n^{1/2}P_{2nv}[\chi_{K_{Q}(P_{2})}]=n^{1/2}(P_{2nv}-P_{2})[\chi_{K_{Q}(P_{2})}]\to P_{2}[vA\chi_{K_{Q}(P_{2})}]. (14)

Hence

n^{1/2}(K_{Q}(P_{2nv})-K_{Q}(P_{2}))\to-(P_{2}[\dot{\chi}_{K_{Q}(P_{2})}])^{-1}P_{2}[vA\chi_{K_{Q}(P_{2})}],\quad v\in V,

and the canonical gradient of K_{Q} is obtained from (6) as -m(P_{2}[\dot{\chi}_{K_{Q}(P_{2})}])^{-1}(A\chi_{K_{Q}(P_{2})},0) and equals the influence function of \hat{\vartheta}_{Q}, which is therefore efficient for the misspecified model.

4 Model R

Model R is completely analogous to Model Q, with interchanged roles of the transition distribution Q of the embedded Markov chain, and the conditional inter-arrival time distribution R. Specifically, in Model R we assume a parametric model r_{\vartheta}, \vartheta\in\Theta\subset\mathbb{R}^{d}, for the \nu-density of the conditional inter-arrival time distribution, and consider the transition distribution of the embedded Markov chain as unknown. Suppose the model is misspecified, and the true conditional inter-arrival time distribution is R. Then the KL functional K_{R}(P_{3}) maximizes P_{3}[\log r_{\vartheta}], and the partial maximum likelihood estimator \hat{\vartheta}_{R} maximizes \mathbb{P}_{3}[\log r_{\vartheta}]. Write

\varrho_{\vartheta}(x,y,u)=\partial_{\vartheta}\log r_{\vartheta}(x,y,u)

for the d-dimensional vector of partial derivatives of \log r_{\vartheta}(x,y,u). Then K_{R}(P_{3}) solves P_{3}[\varrho_{\vartheta}]=0, and \hat{\vartheta}_{R} solves \mathbb{P}_{3}[\varrho_{\vartheta}]=0. Heuristically, by Taylor expansion,

0=\mathbb{P}_{3}[\varrho_{\hat{\vartheta}_{R}}]=\frac{1}{N}\sum_{j=1}^{N}\varrho_{\hat{\vartheta}_{R}}(X_{j-1},X_{j},U_{j})=\frac{1}{N}\sum_{j=1}^{N}\varrho_{K_{R}(P_{3})}(X_{j-1},X_{j},U_{j})+\frac{1}{N}\sum_{j=1}^{N}\dot{\varrho}_{K_{R}(P_{3})}(X_{j-1},X_{j},U_{j})(\hat{\vartheta}_{R}-K_{R}(P_{3}))+o_{p}(n^{-1/2}). (15)

Here \dot{\varrho}_{\vartheta}(x,y,u) is the d\times d matrix of partial derivatives of \varrho_{\vartheta}(x,y,u). With (1) and (2) we obtain

n^{1/2}(\hat{\vartheta}_{R}-K_{R}(P_{3}))=-m(P_{3}[\dot{\varrho}_{K_{R}(P_{3})}])^{-1}n^{-1/2}\sum_{j=1}^{N}\varrho_{K_{R}(P_{3})}(X_{j-1},X_{j},U_{j})+o_{p}(1). (16)

If Model R is correctly specified and R=R_{\vartheta}, then K_{R}(P_{3})=\vartheta. We also have the following relations,

0=\partial_{\vartheta}R_{\vartheta}(\cdot,\cdot,\mathbb{R})=R_{\vartheta}\varrho_{\vartheta},\qquad 0=\partial_{\vartheta}R_{\vartheta}\varrho_{\vartheta}=R_{\vartheta}\varrho_{\vartheta}\varrho_{\vartheta}^{\top}+R_{\vartheta}\dot{\varrho}_{\vartheta}.

In particular, the partial Fisher information matrix for Model R is J_{\vartheta}=-P_{3}[\dot{\varrho}_{\vartheta}]=P_{3}[\varrho_{\vartheta}\varrho_{\vartheta}^{\top}]. Hence, for the correctly specified model, the partial maximum likelihood estimator \hat{\vartheta}_{R} has the stochastic expansion

n^{1/2}(\hat{\vartheta}_{R}-\vartheta)=mJ_{\vartheta}^{-1}n^{-1/2}\sum_{j=1}^{N}\varrho_{\vartheta}(X_{j-1},X_{j},U_{j})+o_{p}(1).

This means that \hat{\vartheta}_{R} is asymptotically linear with influence function mJ_{\vartheta}^{-1}(0,\varrho_{\vartheta}), and n^{1/2}(\hat{\vartheta}_{R}-\vartheta) is asymptotically normal with covariance matrix mJ_{\vartheta}^{-1}.

If the model is misspecified, then \varrho_{K_{R}(P_{3})} is not in W^{d}. We apply the martingale approximation (7) to (16) and see that \hat{\vartheta}_{R} is asymptotically linear with influence function

-m(P_{3}[\dot{\varrho}_{K_{R}(P_{3})}])^{-1}(AR\varrho_{K_{R}(P_{3})},\varrho_{K_{R}(P_{3})}-R\varrho_{K_{R}(P_{3})}).

Hence n^{1/2}(\hat{\vartheta}_{R}-K_{R}(P_{3})) is asymptotically normal with covariance matrix

m(P_{3}[\dot{\varrho}_{K_{R}(P_{3})}])^{-1}\Sigma_{R}(P_{3}[\dot{\varrho}_{K_{R}(P_{3})}])^{-1},

where

\Sigma_{R}=P_{2}[AR\varrho_{K_{R}(P_{3})}(AR\varrho_{K_{R}(P_{3})})^{\top}]+P_{3}[(\varrho_{K_{R}(P_{3})}-R\varrho_{K_{R}(P_{3})})(\varrho_{K_{R}(P_{3})}-R\varrho_{K_{R}(P_{3})})^{\top}].

Let us now prove efficiency of \hat{\vartheta}_{R}, first for the correctly specified model. For c\in\mathbb{R}^{d} set \vartheta_{nc}=\vartheta+n^{-1/2}c. Assume that r_{nc}=r_{\vartheta_{nc}} is Hellinger differentiable at \vartheta,

\int\int\Big{(}r_{nc}^{1/2}(x,y,u)-r_{\vartheta}^{1/2}(x,y,u)-\frac{1}{2}n^{-1/2}c^{\top}\varrho_{\vartheta}(x,y,u)r_{\vartheta}^{1/2}(x,y,u)\Big{)}^{2}\nu(du)P_{2}(d(x,y))\to 0. (17)

Let \mathcal{Q} denote the set of all transition distributions of the embedded Markov chain. For v\in V choose a sequence Q_{nv} in \mathcal{Q} that is Hellinger differentiable (13) at Q. Then the assumptions of Section 2 hold with \Delta=\mathcal{Q}\times\Theta, K=V\times\mathbb{R}^{d}, D_{Q}(v,c)=v, D_{R}(v,c)=c^{\top}\varrho_{\vartheta}. The functional to be estimated is \varphi(Q,\vartheta)=\vartheta. By orthogonality of V and W, its canonical gradient is obtained from (6) as (0,c_{\vartheta}^{\top}\varrho_{\vartheta}) with d\times d matrix c_{\vartheta} determined by

c=m^{-1}c_{\vartheta}^{\top}J_{\vartheta}c,\quad c\in\mathbb{R}^{d},

i.e. c_{\vartheta}=mJ_{\vartheta}^{-1}. Hence the canonical gradient of \vartheta is mJ_{\vartheta}^{-1}(0,\varrho_{\vartheta}) and equals the influence function of \hat{\vartheta}_{R}, which is therefore efficient for the correctly specified model.

Suppose now that the model is misspecified, and let \mathcal{R} be the set of all conditional inter-arrival time distributions. Let R denote the true conditional distribution of the inter-arrival times. For w\in W choose a sequence R_{nw} in \mathcal{R} that is Hellinger differentiable (12) at R. Then the assumptions of Section 2 hold with \Delta=\mathcal{Q}\times\mathcal{R}, K=V\times W, D_{Q}(v,w)=v, D_{R}(v,w)=w. The functional to be estimated is \varphi(Q,R)=K_{R}(P_{3}). Heuristically,

0=P_{3nvw}[\varrho_{K_{R}(P_{3nvw})}]=P_{3nvw}[\varrho_{K_{R}(P_{3})}]+P_{3nvw}[\dot{\varrho}_{K_{R}(P_{3})}](K_{R}(P_{3nvw})-K_{R}(P_{3}))+o_{p}(n^{-1/2}).

With P_{3nvw}[\dot{\varrho}_{K_{R}(P_{3})}]\to P_{3}[\dot{\varrho}_{K_{R}(P_{3})}] we obtain

K_{R}(P_{3nvw})-K_{R}(P_{3})=-(P_{3}[\dot{\varrho}_{K_{R}(P_{3})}])^{-1}P_{3nvw}[\varrho_{K_{R}(P_{3})}]+o_{p}(n^{-1/2}).

Write P_{3nvw}=P_{2nv}\otimes R_{nw} and apply the perturbation expansion (14) to obtain

n^{1/2}(K_{R}(P_{3nvw})-K_{R}(P_{3}))\to-(P_{3}[\dot{\varrho}_{K_{R}(P_{3})}])^{-1}\big{(}P_{2}[vAR\varrho_{K_{R}(P_{3})}]+P_{3}[w\varrho_{K_{R}(P_{3})}]\big{)}=-(P_{3}[\dot{\varrho}_{K_{R}(P_{3})}])^{-1}\big{(}P_{2}[vAR\varrho_{K_{R}(P_{3})}]+P_{3}[w(\varrho_{K_{R}(P_{3})}-R\varrho_{K_{R}(P_{3})})]\big{)},

and the canonical gradient of K_{R} is obtained from (6) as

-m(P_{3}[\dot{\varrho}_{K_{R}(P_{3})}])^{-1}(AR\varrho_{K_{R}(P_{3})},\varrho_{K_{R}(P_{3})}-R\varrho_{K_{R}(P_{3})})

and equals the influence function of \hat{\vartheta}_{R}, which is therefore efficient for the misspecified model.

5 Model S

While Models Q and R are semiparametric, Model S is parametric. In Model S we assume parametric models q_{\vartheta} and r_{\vartheta}, \vartheta\in\Theta\subset\mathbb{R}^{d}, for the \mu-density of the transition distribution of the embedded Markov chain and for the \nu-density of the conditional inter-arrival time distribution. We have s_{\vartheta}(x,y,u)=q_{\vartheta}(x,y)r_{\vartheta}(x,y,u). Hence the KL functional K_{S}(P_{3}) maximizes P_{3}[\log s_{\vartheta}]=P_{2}[\log q_{\vartheta}]+P_{3}[\log r_{\vartheta}], and the maximum likelihood estimator \hat{\vartheta}_{S} maximizes \mathbb{P}_{3}[\log s_{\vartheta}]=\mathbb{P}_{2}[\log q_{\vartheta}]+\mathbb{P}_{3}[\log r_{\vartheta}]. Write

\sigma_{\vartheta}(x,y,u)=\partial_{\vartheta}\log s_{\vartheta}(x,y,u)=\chi_{\vartheta}(x,y)+\varrho_{\vartheta}(x,y,u)

for the d-dimensional vector of partial derivatives of \log s_{\vartheta}(x,y,u). Then K_{S}(P_{3}) solves P_{3}[\sigma_{\vartheta}]=P_{2}[\chi_{\vartheta}]+P_{3}[\varrho_{\vartheta}]=0, and \hat{\vartheta}_{S} solves \mathbb{P}_{3}[\sigma_{\vartheta}]=\mathbb{P}_{2}[\chi_{\vartheta}]+\mathbb{P}_{3}[\varrho_{\vartheta}]=0. Taylor expansions analogous to (9) and (15) imply

0=\mathbb{P}_{3}[\sigma_{\hat{\vartheta}_{S}}]=\frac{1}{N}\sum_{j=1}^{N}\sigma_{\hat{\vartheta}_{S}}(X_{j-1},X_{j},U_{j})=\frac{1}{N}\sum_{j=1}^{N}\sigma_{K_{S}(P_{3})}(X_{j-1},X_{j},U_{j})+\frac{1}{N}\sum_{j=1}^{N}\dot{\sigma}_{K_{S}(P_{3})}(X_{j-1},X_{j},U_{j})(\hat{\vartheta}_{S}-K_{S}(P_{3}))+o_{p}(n^{-1/2}),

where \dot{\sigma}_{\vartheta}(x,y,u)=\dot{\chi}_{\vartheta}(x,y)+\dot{\varrho}_{\vartheta}(x,y,u) is the d\times d matrix of partial derivatives of \sigma_{\vartheta}(x,y,u). We obtain

n^{1/2}(\hat{\vartheta}_{S}-K_{S}(P_{3}))=-m(P_{3}[\dot{\sigma}_{K_{S}(P_{3})}])^{-1}n^{-1/2}\sum_{j=1}^{N}\sigma_{K_{S}(P_{3})}(X_{j-1},X_{j},U_{j})+o_{p}(1). (18)

If Model S is correctly specified with Q=Q_{\vartheta} and R=R_{\vartheta}, then K_{S}(P_{3})=\vartheta. From Sections 3 and 4 we obtain the Fisher information matrix for Model S as I_{\vartheta}+J_{\vartheta}. Hence, for the correctly specified model, the maximum likelihood estimator \hat{\vartheta}_{S} has the stochastic expansion

n^{1/2}(\hat{\vartheta}_{S}-\vartheta)=m(I_{\vartheta}+J_{\vartheta})^{-1}n^{-1/2}\sum_{j=1}^{N}\sigma_{\vartheta}(X_{j-1},X_{j},U_{j})+o_{p}(1).

This means that \hat{\vartheta}_{S} is asymptotically linear with influence function m(I_{\vartheta}+J_{\vartheta})^{-1}(\chi_{\vartheta},\varrho_{\vartheta}), and n^{1/2}(\hat{\vartheta}_{S}-\vartheta) is asymptotically normal with covariance matrix m(I_{\vartheta}+J_{\vartheta})^{-1}.

If the model is misspecified, then \chi_{K_{S}(P_{3})} is not in V^{d} and \varrho_{K_{S}(P_{3})} is not in W^{d}. We apply the martingale approximation (7) to (18) and see that \hat{\vartheta}_{S} is asymptotically linear with influence function

-m(P_{3}[\dot{\sigma}_{K_{S}(P_{3})}])^{-1}(A\chi_{K_{S}(P_{3})}+AR\varrho_{K_{S}(P_{3})},\varrho_{K_{S}(P_{3})}-R\varrho_{K_{S}(P_{3})}).

Hence n^{1/2}(\hat{\vartheta}_{S}-K_{S}(P_{3})) is asymptotically normal with covariance matrix

m(P_{3}[\dot{\sigma}_{K_{S}(P_{3})}])^{-1}\Sigma_{S}(P_{3}[\dot{\sigma}_{K_{S}(P_{3})}])^{-1},

where

\Sigma_{S}=P_{2}[A(\chi_{K_{S}(P_{3})}+R\varrho_{K_{S}(P_{3})})(A(\chi_{K_{S}(P_{3})}+R\varrho_{K_{S}(P_{3})}))^{\top}]+P_{3}[(\varrho_{K_{S}(P_{3})}-R\varrho_{K_{S}(P_{3})})(\varrho_{K_{S}(P_{3})}-R\varrho_{K_{S}(P_{3})})^{\top}].

Let us now prove efficiency of \hat{\vartheta}_{S}, first for the correctly specified model. For c\in\mathbb{R}^{d} set \vartheta_{nc}=\vartheta+n^{-1/2}c. Assume that q_{nc}=q_{\vartheta_{nc}} is Hellinger differentiable (11) at \vartheta, and r_{nc}=r_{\vartheta_{nc}} is Hellinger differentiable (17) at \vartheta. Then the assumptions of Section 2 hold with \Delta=\Theta, K=\mathbb{R}^{d}, D_{Q}c=c^{\top}\chi_{\vartheta}, D_{R}c=c^{\top}\varrho_{\vartheta}. The functional to be estimated is \varphi(\vartheta)=\vartheta. The canonical gradient is obtained from (6) as m(I_{\vartheta}+J_{\vartheta})^{-1}(\chi_{\vartheta},\varrho_{\vartheta}). It equals the influence function of \hat{\vartheta}_{S}, which is therefore efficient in the correctly specified model.

Suppose now that the model is misspecified. Let \mathcal{Q} be the set of all transition distributions of the embedded Markov chain, and let \mathcal{R} be the set of all conditional inter-arrival time distributions. For v\in V choose a sequence Q_{nv} in \mathcal{Q} that is Hellinger differentiable (13) at Q. For w\in W choose a sequence R_{nw} in \mathcal{R} that is Hellinger differentiable (12) at R. Then the assumptions of Section 2 hold with \Delta=\mathcal{Q}\times\mathcal{R}, K=V\times W, D_{Q}(v,w)=v, D_{R}(v,w)=w. The functional to be estimated is \varphi(Q,R)=K_{S}(P_{3}). Similarly as in Section 4,

0=P_{3nvw}[\sigma_{K_{S}(P_{3nvw})}]=P_{3nvw}[\sigma_{K_{S}(P_{3})}]+P_{3nvw}[\dot{\sigma}_{K_{S}(P_{3})}](K_{S}(P_{3nvw})-K_{S}(P_{3}))+o_{p}(n^{-1/2}),

K_{S}(P_{3nvw})-K_{S}(P_{3})=-(P_{3}[\dot{\sigma}_{K_{S}(P_{3})}])^{-1}P_{3nvw}[\sigma_{K_{S}(P_{3})}]+o_{p}(n^{-1/2}),

and therefore

n^{1/2}(K_{S}(P_{3nvw})-K_{S}(P_{3}))\to-(P_{3}[\dot{\sigma}_{K_{S}(P_{3})}])^{-1}\Big{(}P_{2}[v(A\chi_{K_{S}(P_{3})}+AR\varrho_{K_{S}(P_{3})})]+P_{3}[w(\varrho_{K_{S}(P_{3})}-R\varrho_{K_{S}(P_{3})})]\Big{)}.

Hence by (6) the canonical gradient of K_{S} is obtained as

-m(P_{3}[\dot{\sigma}_{K_{S}(P_{3})}])^{-1}(A\chi_{K_{S}(P_{3})}+AR\varrho_{K_{S}(P_{3})},\varrho_{K_{S}(P_{3})}-R\varrho_{K_{S}(P_{3})})

and equals the influence function of \hat{\vartheta}_{S}, which is therefore efficient for the misspecified model.

6 Remarks

In this section we comment on examples and possible extensions of our results.

1. If the distribution of the inter-arrival times charges only 1, so that R(x,y,du)=\delta_{1}(du), then the semi-Markov process reduces to a Markov chain with transition distribution Q, and for Model Q we recover the results of Greenwood and Wefelmeyer [15].

2. Our results carry over to observations (X_{0},T_{0}),\dots,(X_{n},T_{n}) of the embedded Markov renewal process. Just replace N by n. In particular, instead of the central limit theorem (3) with random summation index N, use

n^{-1/2}\sum_{j=1}^{n}f(X_{j-1},X_{j},U_{j})\Rightarrow(P_{3}[f^{2}])^{1/2}Y,

and replace mm by 1 everywhere.


In some examples we can describe the KL functional more explicitly.

3. Suppose the embedded Markov chain is a linear autoregressive model of order 1, i.e. X_{j}=\vartheta X_{j-1}+\varepsilon_{j}, where \vartheta\in\mathbb{R} and the innovations \varepsilon_{j} are i.i.d. with mean 0, finite variance, and known density f. Then Model Q holds with Q(x,dy)=f(y-\vartheta x)\,dy, and \chi_{\vartheta}(x,y)=x\ell(y-\vartheta x) with \ell=-f^{\prime}/f. Hence the KL functional solves E[X_{0}\ell(X_{1}-\vartheta X_{0})]=0. If f is the density of \tau Y for some \tau>0, then \ell(x)=\tau^{-2}x and E[X_{0}\ell(X_{1}-\vartheta X_{0})]=\tau^{-2}(E[X_{0}X_{1}]-\vartheta E[X_{0}^{2}]). Hence the KL functional is K_{Q}(P_{2})=E[X_{0}X_{1}]/E[X_{0}^{2}], and the partial maximum likelihood estimator for \vartheta is the least squares estimator

\hat{\vartheta}_{Q}=K_{Q}(\mathbb{P}_{2})=\frac{\sum_{j=1}^{N}X_{j-1}X_{j}}{\sum_{j=1}^{N}X_{j-1}^{2}},

a ratio of two empirical estimators.
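A minimal simulation sketch of this example (ours, with assumed parameter values): even when the Gaussian working density f is misspecified, \hat{\vartheta}_{Q} still targets K_{Q}(P_{2})=E[X_{0}X_{1}]/E[X_{0}^{2}], which for a linear AR(1) chain equals \vartheta whatever the true innovation law.

import numpy as np

rng = np.random.default_rng(1)
theta_true = 0.5
X = np.zeros(20001)
for j in range(1, len(X)):
    # true innovations are Laplace with variance 1, so the Gaussian
    # working density f is misspecified
    X[j] = theta_true * X[j - 1] + rng.laplace(scale=1 / np.sqrt(2))

theta_hat_Q = np.sum(X[:-1] * X[1:]) / np.sum(X[:-1] ** 2)
print(theta_hat_Q)   # close to K_Q(P_2) = E[X0 X1]/E[X0^2] = 0.5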

4. Suppose the inter-arrival time U_{j} given X_{j-1}=x and X_{j}=y is exponentially distributed with mean 1/\lambda(x) not depending on y,

R(x,y,du)=\lambda(x)\exp(-u\lambda(x))\,du.

Then the semi-Markov process is a Markov step process. If the mean is constant, \lambda(x)=\vartheta, \vartheta>0, then Model R holds with R_{\vartheta}(x,y,du)=\vartheta\exp(-\vartheta u)\,du, and \varrho_{\vartheta}(x,y,u)=\vartheta^{-1}-u. Hence the KL functional solves E[\varrho_{\vartheta}(X_{0},X_{1},U_{1})]=\vartheta^{-1}-E[U_{1}]=0, and we obtain K_{R}(P_{3})=1/E[U_{1}]. The partial maximum likelihood estimator for \vartheta is

\hat{\vartheta}_{R}=1\Big{/}\frac{1}{N}\sum_{j=1}^{N}U_{j},

a function of an empirical estimator. Efficiency of empirical estimators in Markov step processes is studied in Greenwood and Wefelmeyer [12].
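A one-line sketch of this estimator (ours, on assumed simulated inter-arrival times): under misspecification it estimates the KL functional 1/E[U_{1}] rather than a "true" rate.

import numpy as np

rng = np.random.default_rng(2)
# assumed data: gamma inter-arrival times, so the exponential model R_theta
# is misspecified; the estimator targets K_R(P_3) = 1/E[U_1] = 1/2
U = rng.gamma(shape=2.0, scale=1.0, size=50000)
theta_hat_R = 1.0 / U.mean()
print(theta_hat_R)   # close to 0.5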


The models Q, R and S are described in terms of the conditional distributions Q(x,dy) and R(x,y,du). It is occasionally reasonable to model instead the marginal distributions P_{1}, P_{2} or P_{3}. The results for these three marginal models differ considerably from each other and from those for Models Q, R and S.

5. Suppose we have a parametric model for the \mu-density p_{1\vartheta} of P_{1}. The marginal maximum likelihood estimator \hat{\vartheta}_{1} maximizes

\mathbb{P}_{1}[\log p_{1\vartheta}]=\frac{1}{N}\sum_{j=1}^{N}\log p_{1\vartheta}(X_{j-1}).

It estimates the KL functional K(P_{1}), the parameter that maximizes P_{1}[\log p_{1\vartheta}]. Note that the marginal maximum likelihood estimator is an empirical version of the KL functional, \hat{\vartheta}_{1}=K(\mathbb{P}_{1}).

However, \hat{\vartheta}_{1} is not efficient for \vartheta when the marginal model is correctly specified. The reason is that the specification p_{1\vartheta} of the marginal density implies a constraint on the conditional distribution Q of the embedded Markov chain, but the marginal maximum likelihood estimator does not use this information. An efficient estimator for \vartheta is difficult to construct. See Kessler, Schick and Wefelmeyer [24] for an efficient estimator of \vartheta in a Markov chain model with a (correctly specified) parametric model for the (one-dimensional) marginal density. On the other hand, \hat{\vartheta}_{1} is efficient for K(P_{1}) in a nonparametric sense when the marginal model is misspecified.

We note that, in this respect, semi-Markov processes and Markov chains are different from the i.i.d. case. Suppose we have i.i.d. observations (X_{j},Y_{j}) with joint distribution p_{1\vartheta}(x)\,dx\,Q(x,dy), where Q is unknown. Then Q is not constrained by the marginal model p_{1\vartheta}, and the marginal maximum likelihood estimator is efficient for \vartheta if the marginal model is correctly specified, and also efficient for K(P_{1}) if the marginal model is misspecified.

6. Suppose we have a parametric model for the \mu^{2}-density p_{2\vartheta} of P_{2}. The marginal maximum likelihood estimator \hat{\vartheta}_{2} maximizes

\mathbb{P}_{2}[\log p_{2\vartheta}]=\frac{1}{N}\sum_{j=1}^{N}\log p_{2\vartheta}(X_{j-1},X_{j}).

It estimates the KL functional K(P_{2}), the parameter that maximizes P_{2}[\log p_{2\vartheta}], and \hat{\vartheta}_{2}=K(\mathbb{P}_{2}). The perturbation expansion (8) suggests that maximizing \mathbb{P}_{2}[\log p_{2\vartheta}] is asymptotically equivalent to solving \mathbb{P}_{2}[A\chi_{\vartheta}]=0, and the martingale approximation (7) suggests that this is asymptotically equivalent to solving \mathbb{P}_{2}[\chi_{\vartheta}]=0. Hence the marginal maximum likelihood estimator \hat{\vartheta}_{2} is asymptotically equivalent to the conditional maximum likelihood estimator \hat{\vartheta}_{Q} and therefore efficient in the correctly specified model p_{2\vartheta}. The reason is that p_{2\vartheta}(x,y)=p_{1\vartheta}(x)q_{\vartheta}(x,y), and q_{\vartheta}(x,y) determines p_{1\vartheta}, which therefore does not contain additional information about \vartheta.

This is again different from the i.i.d. case. Suppose we have i.i.d. observations (X_{j},Y_{j}) with joint density p_{1\vartheta}(x)q_{\vartheta}(x,y). Then p_{1\vartheta} contains, in general, additional information about \vartheta.

7. Suppose we have a parametric model for the \mu^{2}\otimes\nu-density p_{3\vartheta} of P_{3}. The marginal maximum likelihood estimator \hat{\vartheta}_{3} maximizes

\mathbb{P}_{3}[\log p_{3\vartheta}]=\frac{1}{N}\sum_{j=1}^{N}\log p_{3\vartheta}(X_{j-1},X_{j},U_{j}).

It estimates the KL functional K(P_{3}), the parameter that maximizes P_{3}[\log p_{3\vartheta}], and \hat{\vartheta}_{3}=K(\mathbb{P}_{3}). We can write p_{3\vartheta}(x,y,u)=p_{2\vartheta}(x,y)r_{\vartheta}(x,y,u). Now r_{\vartheta}(x,y,u) carries additional information about \vartheta, similarly as in the i.i.d. case.


8. Remark 5 tells us in particular the following, rather obvious, fact. If a parametric estimator is efficient in a nonparametric sense, then the reason is not that it is efficient in a parametric model. Rather, an estimator usually is nonparametrically efficient because it is a smooth function of an empirical estimator. We can illustrate this also with Model S. Suppose we have parametric models q_{\vartheta} and r_{\vartheta} for the densities of Q and R. Let \hat{\vartheta}_{Q}=K_{Q}(\mathbb{P}_{2}) be the conditional maximum likelihood estimator based on the model q_{\vartheta} alone. In general, \hat{\vartheta}_{Q} will not be efficient for \vartheta when Model S is correctly specified, because \hat{\vartheta}_{Q} does not use the information about \vartheta in the model r_{\vartheta}. But if both q_{\vartheta} and r_{\vartheta} are misspecified, \hat{\vartheta}_{Q} will be nonparametrically efficient for K_{Q}(P_{2}), which is the KL functional for Model Q but not for Model S.

References

  • [1] Andrews, D. W. K. and Pollard, D., 1994, An introduction to functional central limit theorems for dependent stochastic processes. Internat. Statist. Rev. 62, 119–132.
  • [2] Bickel, P. J., 1993, Estimation in semiparametric models. In: (C. R. Rao, Ed.) Multivariate Analysis: Future Directions (Amsterdam: North-Holland), pp. 55–73. MR1246354
  • [3] Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A., 1998, Efficient and Adaptive Estimation for Semiparametric Models (New York: Springer). MR1623559
  • [4] Bickel, P. J. and Kwon, J., 2001, Inference for semiparametric models: Some questions and an answer (with discussion). Statist. Sinica 11, 863–960. MR1867326
  • [5] Dahlhaus, R. and Wefelmeyer, W., 1996, Asymptotically optimal estimation in misspecified time series models. Ann. Statist. 24, 952–974. MR1401832
  • [6] Daniels, H. E., 1961, The asymptotic efficiency of a maximum likelihood estimator. Proc. Fourth Berkeley Sympos. Math. Statist. and Probability 1, 151–163. MR0131924
  • [7] Doksum, K., Ozeki, A., Kim, J. and Neto, E. C., 2007, Thinking outside the box: Statistical inference based on Kullback–Leibler empirical projections. Statist. Probab. Lett. 77, 1201–1213.
  • [8] Dürr, D. and Goldstein, S., 1986, Remarks on the central limit theorem for weakly dependent random variables. In: (S. Albeverio, P. Blanchard and L. Streit, Eds.) Stochastic Processes — Mathematics and Physics, Lecture Notes in Mathematics 1158 (Berlin: Springer), pp. 104–118. MR0838560
  • [9] Gordin, M. I., 1969, The central limit theorem for stationary processes. Soviet Math. Dokl. 10, 1174–1176. MR0251785
  • [10] Gordin, M. I. and Lifšic, B. A., 1978, The central limit theorem for stationary Markov processes. Soviet Math. Dokl. 19, 392–394.
  • [11] Greenwood, P. E., Müller, U. U. and Wefelmeyer, W., 2004, Efficient estimation for semiparametric semi-Markov processes. Comm. Statist. Theory Methods 33, 419–435. MR2056947
  • [12] Greenwood, P. E. and Wefelmeyer, W., 1994, Nonparametric estimators for Markov step processes. Stochastic Process. Appl. 52, 1–16. MR1289165
  • [13] Greenwood, P. E. and Wefelmeyer, W., 1995, Efficiency of empirical estimators for Markov chains. Ann. Statist. 23, 132–143. MR1331660
  • [14] Greenwood, P. E. and Wefelmeyer, W., 1996, Empirical estimators for semi-Markov processes. Math. Meth. Statist. 5, 299–315. MR1417674
  • [15] Greenwood, P. E. and Wefelmeyer, W., 1997, Maximum likelihood estimator and Kullback–Leibler information in misspecified Markov chain models. Theory Probab. Appl. 42, 103–111. MR1453336
  • [16] Höpfner, R., 1993a, On statistics of Markov step processes: representation of log-likelihood ratio processes in filtered local models. Probab. Theory Related Fields 94, 375–398. MR1198653
  • [17] Höpfner, R., 1993b, Asymptotic inference for Markov step processes: observation up to a random time. Stochastic Process. Appl. 48, 295–310. MR1244547
  • [18] Höpfner, R., Jacod, J. and Ladelli, L., 1990, Local asymptotic normality and mixed normality for Markov statistical models. Probab. Theory Related Fields 86, 105–129. MR1061951
  • [19] Hosoya, Y., 1989, The bracketing condition for limit theorems on stationary linear processes. Ann. Statist. 17, 401–418. MR0981458
  • [20] Huber, P. J., 1967, The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Sympos. Math. Statist. and Probability 1, 221–233. MR0216620
  • [21] Kartashov, N. V., 1985a, Criteria for uniform ergodicity and strong stability of Markov chains with a common phase space. Theory Probab. Math. Statist. 30, 71–89.
  • [22] Kartashov, N. V., 1985b, Inequalities in theorems of ergodicity and stability for Markov chains with common phase space. I. Theory Probab. Appl. 30, 247–259.
  • [23] Kartashov, N. V., 1996, Strong Stable Markov Chains (Utrecht: VSP). MR1451375
  • [24] Kessler, M., Schick, A. and Wefelmeyer, W., 2001, The information in the marginal law of a Markov chain. Bernoulli 7, 243–266. MR1828505
  • [25] Kutoyants, Yu. A., 1988, On an identification problem for dynamical systems with small noise. Izv. Akad. Nauk Armyan. SSR 23, 270–285. MR0976484
  • [26] Kutoyants, Yu. A., 2004, Statistical Inference for Ergodic Diffusion Processes, Springer Series in Statistics (London: Springer). MR2144185
  • [27] Maigret, N., 1978, Théorème de limite centrale fonctionnel pour une chaîne de Markov récurrente au sens de Harris et positive. Ann. Inst. H. Poincaré Probab. Statist. 14, 425–440. MR0523221
  • [28] McKeague, I. W., 1984, Estimation for diffusion processes under misspecified models. J. Appl. Probab. 21, 511–520. MR0752016
  • [29] Meyn, S. P. and Tweedie, R. L., 1993, Markov Chains and Stochastic Stability (London: Springer). MR1287609
  • [30] Müller, U. U., 2007, Weighted least squares estimators in possibly misspecified nonlinear regression. Metrika 66, 39–59. MR2306376
  • [31] Ogata, Y., 1980, Maximum likelihood estimates of incorrect Markov models for time series and the derivation of AIC. J. Appl. Probab. 17, 59–72. MR0557435
  • [32] Penev, S., 1991, Efficient estimation of the stationary distribution for exponentially ergodic Markov chains. J. Statist. Plann. Inference 27, 105–123. MR1089356
  • [33] Pollard, D., 1985, New ways to prove central limit theorems. Econometric Theory 1, 295–314.
  • [34] Sin, C.-Y. and White, H., 1996, Information criteria for selecting possibly misspecified parametric models. J. Econometrics 71, 207–225. MR1381082
  • [35] White, H., 1982, Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25. MR0640163
  • [36] White, H., 1984, Maximum likelihood estimation of misspecified dynamic models. In: (T. K. Dijkstra, Ed.) Misspecification Analysis, Lecture Notes in Economics and Mathematical Systems 237 (Berlin: Springer), pp. 1–19. MR0791952
  • [37] White, H., 1994, Estimation, Inference and Specification Analysis, Econometric Society Monographs 22 (Cambridge: Cambridge University Press). MR1292251