Density power divergence for general integer-valued time series with exogenous covariates

Mamadou Lamine DIOP [1] and William KENGNE [2]

[1] Supported by the MME-DII center of excellence (ANR-11-LABEX-0023-01).
[2] Developed within the ANR BREAKRISK: ANR-17-CE26-0001-01 and the CY Initiative of Excellence (grant "Investissements d'Avenir" ANR-16-IDEX-0008), Project "EcoDep" PSI-AAP2020-0000000013.

THEMA, CY Cergy Paris Université, 33 Boulevard du Port, 95011 Cergy-Pontoise Cedex, France.
E-mail: mamadou-lamine.diop@u-cergy.fr ; william.kengne@u-cergy.fr

Abstract: In this article, we study a robust estimation method for a general class of integer-valued time series models. The conditional distribution of the process belongs to a broad class of distributions and, unlike in the classical autoregressive framework, the conditional mean of the process also depends on some exogenous covariates. We derive a robust inference procedure based on the minimum density power divergence. Under certain regularity conditions, we establish that the proposed estimator is consistent and asymptotically normal. In the case where the conditional distribution belongs to the exponential family, we provide sufficient conditions for the existence of a stationary and ergodic $\tau$-weakly dependent solution. Simulation experiments are conducted to illustrate the empirical performance of the estimator. An application to the number of transactions per minute for the stock Ericsson B is also provided.

Keywords: Robust estimation, minimum density power divergence, integer-valued time series models, exogenous covariates.

1 Introduction

The analysis of time series of counts has attracted much interest in the literature during the last two decades, as shown by the large number of papers written in this direction. This is due, among other reasons, to applications in various fields: epidemiological surveillance (number of new infections), finance (number of transactions), industrial quality control (number of defects), traffic accidents (number of road casualties), etc. To describe time series of count data, several questions have been addressed with various modeling approaches, which are generally classified into two categories: observation-driven models and parameter-driven models (see Cox (1981)). Some of the first important results on this topic were obtained independently by McKenzie (1985) and Al-Osh and Alzaid (1987), who introduced the INAR model by using the binomial thinning operator. Because of its limitations, numerous extensions have been proposed; see, e.g., Al-Osh and Alzaid (1990). Later, new models with various marginal distributions and dependence structures were studied by several authors; see, among others, Fokianos et al. (2009), Doukhan et al. (2012, 2013), Doukhan and Kengne (2015), Davis and Liu (2016), Ahmad and Francq (2016), Douc et al. (2017), Fokianos et al. (2020) for some recent progress.

For most developments in the literature, parametric inference is commonly based on the conditional maximum likelihood estimator (MLE), which is asymptotically fully efficient among regular estimators. However, it has been recognized that the MLE is very sensitive to small perturbations caused by outliers in the underlying model. Beran (1977) addressed this issue and was one of the first references in the literature to use density-based minimum divergence methods. Numerous other works have been devoted to this topic; see, among others, Tamura and Boos (1986), Simpson (1987), Basu and Lindsay (1994), and Basu et al. (1998). In the context of modelling time series of counts, this question has already been investigated. For instance, Fokianos and Fried (2010, 2012) studied the problem of intervention effects (which generate various types of outliers) in linear and log-linear Poisson autoregressive models. Fried et al. (2015) proposed a Bayesian approach for handling additive outliers in INGARCH processes; they used the Metropolis-Hastings algorithm to estimate the parameters of the model. Recently, Kim and Lee (2017, 2019) adopted the approach of Basu et al. (1998) to construct a robust estimator based on the minimum density power divergence for zero-inflated Poisson autoregressive models, and for a general integer-valued time series whose conditional distribution belongs to the one-parameter exponential family. See also Kang and Lee (2014) for an application of such a procedure to Poisson autoregressive models.

On the other hand, most of the models proposed for analysing time series of counts do not provide a framework within which one can analyze the possible dependence of the observations on relevant exogenous covariates. In this vein, Davis and Wu (2009) studied generalized linear models for time series of counts, where, conditional on covariates, the observed process is modelled by a negative binomial distribution. Agosto et al. (2016) developed a class of linear Poisson autoregressive models with exogenous covariates (PARX), where the parametric inference is based on the maximum likelihood method. Later, Pedersen and Rahbek (2018) proposed a theory for testing the significance of covariates in a class of PARX models. See also the recent work of Fokianos and Truquet (2019), which considered a class of categorical time series models with covariates and addressed stationarity and ergodicity questions. The R package "tscount" (Liboschik et al. (2017)) provides tools for the analysis and modeling of count time series in which the conditional mean of the process can take covariate effects into account.

In this new contribution, we consider a quite general class of observation-driven models for time series of counts with exogenous covariates. The mean of the discrete conditional distribution of $Y_{t}$ depends on the whole past of the observations, some relevant covariates and a finite-dimensional parameter $\theta^{*}$. We study a robust estimator of $\theta^{*}$ by using the minimum density power divergence estimator (MDPDE) proposed by Basu et al. (1998). Compared to that of Kim and Lee (2019), the framework considered here is more general: (i) the class of models has the ability to handle the dependence of the observations on exogenous covariates, (ii) the dependence on the past is of infinite order (which allows a rich dependence structure and covers INGARCH($p,q$)-type models) and (iii) even if in many applications the conditional distribution belongs to the exponential family, the class of conditional distributions considered here goes beyond the exponential family.

Recently, Aknouche and Francq (2020) considered a class of count time series whose conditional distributions are not restricted to the one-parameter exponential family and where the conditional mean depends on exogenous covariates. They provided conditions for stationarity and ergodicity. For both the linear and nonlinear conditional means considered by these authors, the effect of the covariates is linear and enters as an additive term in the conditional mean. We do not restrict ourselves to such a setting and consider a larger dependence structure between the conditional mean and the covariates. In the case where the conditional distribution belongs to the exponential family, we provide sufficient conditions that ensure the existence of a stationary and ergodic $\tau$-weakly dependent solution. In the sense that it allows a broader dependence on the covariates, our result is more general.

The paper is structured as follows. Section 2 contains the model specification and the construction of the robust estimator as well as the main results. Section 3 is devoted to the application of the general results to some examples of dynamic models. Some simulation results are displayed in Section 4, whereas Section 5 focuses on an application to a real data example. The proofs of the main results are provided in Section 6.

2 Model specification and estimation

2.1 Model formulation

Suppose that $\{Y_{t},\,t\in\mathbb{Z}\}$ is a time series of counts and that $X_{t}=(X_{1,t},X_{2,t},\cdots,X_{d_{x},t})\in\mathbb{R}^{d_{x}}$ represents a vector of covariates with $d_{x}\in\mathbb{N}$. Denote by $\mathcal{F}_{t-1}=\sigma\left\{Y_{t-1},\cdots;X_{t-1},\cdots\right\}$ the $\sigma$-field generated by the whole past at time $t-1$. Consider the general dynamic model where $Y_{t}|\mathcal{F}_{t-1}$ follows a discrete distribution whose mean satisfies:

$$\mathbb{E}(Y_{t}|\mathcal{F}_{t-1})=\lambda_{t}(\theta)=f_{\theta}(Y_{t-1},Y_{t-2},\cdots;X_{t-1},X_{t-2},\cdots) \tag{2.1}$$

where $f_{\theta}$ is a measurable non-negative function defined on $\mathbb{N}_{0}^{\mathbb{N}}\times(\mathbb{R}^{d_{x}})^{\mathbb{N}}$ (with $\mathbb{N}_{0}=\mathbb{N}\cup\{0\}$) and assumed to be known up to the parameter $\theta$, which belongs to a compact subset $\Theta\subset\mathbb{R}^{d}$ ($d\in\mathbb{N}$). Note that when $X_{t}\equiv C$ (a constant), the model (2.1) reduces to the classical integer-valued autoregressive model that has already been considered in the literature (see, e.g., Ahmad and Francq (2016)). In the sequel, we assume that the random variables $Y_{t},~t\in\mathbb{Z}$, have the same distribution and denote by $G_{\theta}(\cdot|\mathcal{F}_{t-1})$ the distribution of $Y_{t}|\mathcal{F}_{t-1}$; let $g(\cdot|\eta_{t})$ be the probability density function of this distribution, where $\eta_{t}$ is the natural parameter of $G_{\theta}$ given by $\eta_{t}=\eta(\lambda_{t}(\theta))$. Assume that the density $g(\cdot|\eta_{t}(\theta))$ is known up to the parameter $\theta$ and that its support $\{y,~g(y|\eta_{t}(\theta))>0\}$ is independent of $\theta$.

Throughout the sequel, the following norms will be used:

  • $\left\|x\right\|\coloneqq\sup_{1\leq i\leq p}|x_{i}|$, for any $x\in\mathbb{R}^{p}$, $p\in\mathbb{N}$;

  • $\left\|f\right\|_{\Theta}\coloneqq\sup_{\theta\in\Theta}\left(\left\|f(\theta)\right\|\right)$ for any function $f:\Theta\longrightarrow M_{p,q}(\mathbb{R})$, where $M_{p,q}(\mathbb{R})$ denotes the set of matrices of dimension $p\times q$ with coefficients in $\mathbb{R}$, for $p,q\in\mathbb{N}$;

  • $\left\|Y\right\|_{r}\coloneqq\mathbb{E}\left(\left\|Y\right\|^{r}\right)^{1/r}$, if $Y$ is a random vector with finite $r$-order moments, for $r>0$.

We set the following classical Lipschitz-type condition on the function $f_{\theta}$.

Assumption A$_{i}(\Theta)$ ($i=0,1,2$): For any $(y,x)\in\mathbb{N}_{0}^{\mathbb{N}}\times\mathbb{R}^{\infty}$, the function $\theta\mapsto f_{\theta}(y,x)$ is $i$ times continuously differentiable on $\Theta$ with $\left\|\partial^{i}f_{\theta}(0)/\partial\theta^{i}\right\|_{\Theta}<\infty$; and there exist two sequences of non-negative real numbers $(\alpha^{(i)}_{k,Y})_{k\geq 1}$ and $(\alpha^{(i)}_{k,X})_{k\geq 1}$ satisfying $\sum\limits_{k=1}^{\infty}\alpha^{(0)}_{k,Y}<1$ (or $\sum\limits_{k=1}^{\infty}\alpha^{(i)}_{k,Y}<\infty$ for $i=1,2$) and $\sum\limits_{k=1}^{\infty}\alpha^{(i)}_{k,X}<\infty$ for $i=0,1,2$, such that for any $(y,x),(y^{\prime},x^{\prime})\in\mathbb{N}_{0}^{\mathbb{N}}\times(\mathbb{R}^{d_{x}})^{\mathbb{N}}$,

$$\Big\|\frac{\partial^{i}f_{\theta}(y,x)}{\partial\theta^{i}}-\frac{\partial^{i}f_{\theta}(y^{\prime},x^{\prime})}{\partial\theta^{i}}\Big\|_{\Theta}\leq\sum\limits_{k=1}^{\infty}\alpha^{(i)}_{k,Y}|y_{k}-y^{\prime}_{k}|+\sum\limits_{k=1}^{\infty}\alpha^{(i)}_{k,X}\left\|x_{k}-x^{\prime}_{k}\right\|,$$

where $\|\cdot\|$ denotes any vector or matrix norm.
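To see what Assumption A$_{0}(\Theta)$ asks for in a familiar case, the following sketch works it out for a first-order linear recursion. The recursion and the symbols $\omega$, $a$, $b$, $\kappa$ are illustrative assumptions (they are not taken from the text above), with a non-negative scalar covariate, $\omega>0$, $a,\kappa\geq 0$ and $0\leq b<1$.

```latex
% Hypothetical linear model (illustration only):
\lambda_t = \omega + a\,Y_{t-1} + b\,\lambda_{t-1} + \kappa\,X_{t-1}
% Unfolding the recursion gives the infinite-past form of f_\theta:
f_\theta(y,x) = \frac{\omega}{1-b} + \sum_{k\geq 1} b^{k-1}\,(a\,y_k + \kappa\,x_k)
% Hence, for i = 0:
|f_\theta(y,x)-f_\theta(y',x')| \leq \sum_{k\geq 1} a\,b^{k-1}\,|y_k-y'_k| + \sum_{k\geq 1} \kappa\,b^{k-1}\,|x_k-x'_k|
% so one may take
\alpha^{(0)}_{k,Y} = \sup_{\theta\in\Theta} a\,b^{k-1}, \qquad \alpha^{(0)}_{k,X} = \sup_{\theta\in\Theta} \kappa\,b^{k-1}
```

If $\Theta$ is chosen so that $a\leq\bar{a}$ and $b\leq\bar{b}$ with $\bar{a}+\bar{b}<1$, then $\sum_{k\geq 1}\alpha^{(0)}_{k,Y}\leq\bar{a}/(1-\bar{b})<1$, and both sequences decay geometrically in $k$.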

In the sequel, it is assumed for the general setting that there exists a stationary and ergodic process $Y^{*}_{t}=(Y_{t},\lambda_{t},X_{t})$ solution of (2.1), and

$$\exists\, C,\epsilon>0,\text{ such that }\forall t\in\mathbb{Z},~~\mathbb{E}\left\|Y^{*}_{t}\right\|^{1+\epsilon}<C. \tag{2.2}$$

In the case where the distribution of $Y_{t}|\mathcal{F}_{t-1}$ belongs to the one-parameter exponential family, the existence of such a solution is established (see Proposition 3.1).

2.2 Minimum power divergence estimator

In this subsection, we briefly describe the use of the density power divergence to obtain an estimator of the parameters of the model (2.1). The asymptotic behavior of this estimator is also studied. Assume that the observations $(Y_{1},X_{1}),\cdots,(Y_{n},X_{n})$ are generated from (2.1) according to the true parameter $\theta^{*}\in\Theta$, which is unknown; i.e., $g(\cdot|\eta_{t}(\theta^{*}))$ is the true conditional density of $Y_{t}|\mathcal{F}_{t-1}$. Let $\mathbb{G}=\left\{g(\cdot|\eta_{t}(\theta));~\theta\in\Theta\right\}$ be the parametric family of density functions indexed by $\theta\in\Theta$. To estimate $\theta^{*}$, Basu et al. (1998) proposed a method which consists of choosing the "best approximating distribution" of $Y_{t}|\mathcal{F}_{t-1}$ in the family $\mathbb{G}$ by minimizing a divergence $d_{\alpha}$ between the density functions $g(\cdot|\eta_{t}(\theta))$ and $g(\cdot|\eta_{t}(\theta^{*}))$. The density power divergence $d_{\alpha}$ between two density functions $g$ and $g_{*}$ is defined (in the discrete set-up) by

$$d_{\alpha}(g,g_{*})=\left\{\begin{array}{ll}\sum\limits_{y=0}^{\infty}\left\{g^{1+\alpha}(y)-\big(1+\frac{1}{\alpha}\big)g_{*}(y)g^{\alpha}(y)+\frac{1}{\alpha}g^{1+\alpha}_{*}(y)\right\},&\alpha>0,\\ \\ \sum\limits_{y=0}^{\infty}\big\{g_{*}(y)\big(\log g_{*}(y)-\log g(y)\big)\big\},&\alpha=0.\end{array}\right.$$
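As a numerical illustration of this definition, the sketch below evaluates $d_{\alpha}$ between two Poisson probability mass functions. The Poisson choice, the helper names `poisson_pmf` and `dpd`, and the truncation of the infinite sum at `y_max` are assumptions made for the example only. It also checks numerically that the $\alpha>0$ branch approaches the $\alpha=0$ (Kullback-Leibler) branch as $\alpha$ tends to zero.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability mass at y, computed on the log scale for stability."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def dpd(g, g_star, alpha, y_max=100):
    """Density power divergence d_alpha(g, g_star) for pmfs on {0, 1, ...}.

    g and g_star are callables y -> probability. The infinite sum is
    truncated at y_max (an assumption of this sketch).
    """
    total = 0.0
    for y in range(y_max + 1):
        p, q = g(y), g_star(y)
        if alpha > 0:
            total += (p ** (1 + alpha)
                      - (1 + 1 / alpha) * q * p ** alpha
                      + q ** (1 + alpha) / alpha)
        elif q > 0:
            # alpha = 0 branch: Kullback-Leibler divergence of g from g_star
            total += q * (math.log(q) - math.log(p))
    return total

g = lambda y: poisson_pmf(y, 3.0)        # candidate model density
g_star = lambda y: poisson_pmf(y, 4.0)   # "true" density

d_half = dpd(g, g_star, 0.5)     # a robust choice of alpha
d_small = dpd(g, g_star, 1e-6)   # alpha close to 0
d_zero = dpd(g, g_star, 0.0)     # Kullback-Leibler limit
d_self = dpd(g, g, 0.5)          # divergence of a density from itself
print(d_half, d_small, d_zero, d_self)
```

For two Poisson laws the $\alpha=0$ value has the closed form $\lambda_{*}\log(\lambda_{*}/\lambda)-\lambda_{*}+\lambda$, which gives a simple check of the implementation.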

So, the empirical objective function (based on the divergence between conditional density functions), up to some terms which are independent of $\theta$, is $H_{\alpha,n}(\theta)=\frac{1}{n}\sum_{t=1}^{n}\ell_{\alpha,t}(\theta)$, where

$$\ell_{\alpha,t}(\theta)=\left\{\begin{array}{ll}\sum\limits_{y=0}^{\infty}g(y|\eta_{t}(\theta))^{1+\alpha}-\big(1+\frac{1}{\alpha}\big)g(Y_{t}|\eta_{t}(\theta))^{\alpha},&\alpha>0,\\ \\ -\log g(Y_{t}|\eta_{t}(\theta)),&\alpha=0,\end{array}\right.$$

with $\eta_{t}(\theta)=\eta(\lambda_{t}(\theta))$ and $\lambda_{t}(\theta)=f_{\theta}(Y_{t-1},Y_{t-2},\cdots;X_{t-1},X_{t-2},\cdots)$. Since $(Y_{0},X_{0}),(Y_{-1},X_{-1}),\cdots$ are not observed, $H_{\alpha,n}(\theta)$ is approximated by $\widehat{H}_{\alpha,n}(\theta)=\frac{1}{n}\sum\limits_{t=1}^{n}\widehat{\ell}_{\alpha,t}(\theta)$, where

$$\widehat{\ell}_{\alpha,t}(\theta)=\left\{\begin{array}{ll}\sum\limits_{y=0}^{\infty}g(y|\widehat{\eta}_{t}(\theta))^{1+\alpha}-\big(1+\frac{1}{\alpha}\big)g(Y_{t}|\widehat{\eta}_{t}(\theta))^{\alpha},&\alpha>0,\\ \\ -\log g(Y_{t}|\widehat{\eta}_{t}(\theta)),&\alpha=0,\end{array}\right.$$

with $\widehat{\eta}_{t}(\theta)=\eta(\widehat{\lambda}_{t}(\theta))$ and $\widehat{\lambda}_{t}(\theta)=f_{\theta}(Y_{t-1},\cdots,Y_{1},0,\cdots;X_{t-1},\cdots,X_{1},0,\cdots)$. Therefore, the MDPDE of $\theta^{*}$ is defined by (cf. Basu et al. (1998) and Kim and Lee (2019))

$$\widehat{\theta}_{\alpha,n}=\underset{\theta\in\Theta}{\text{argmin}}\big(\widehat{H}_{\alpha,n}(\theta)\big).$$

Let us recall that when $\alpha=0$, the MDPDE corresponds to the MLE.
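The construction of $\widehat{H}_{\alpha,n}$ can be sketched in a few lines of code. The example below assumes a Poisson conditional distribution and a hypothetical first-order linear conditional mean $\lambda_{t}=\omega+aY_{t-1}+b\lambda_{t-1}+\kappa X_{t-1}$ with a scalar covariate; neither the recursion nor the names `lambda_hat`, `loss_t`, `H_hat` come from the text above, and the infinite sum over $y$ is truncated at `y_max`. Setting the unobserved pre-sample values to zero gives the starting value $\omega/(1-b)$.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability mass at y, computed on the log scale."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def lambda_hat(theta, ys, xs):
    """Truncated conditional means for the hypothetical linear model
    lambda_t = omega + a*Y_{t-1} + b*lambda_{t-1} + kappa*X_{t-1},
    with the unobserved pre-sample values of Y and X set to 0."""
    omega, a, b, kappa = theta
    lams = [omega / (1.0 - b)]  # f_theta evaluated at an all-zero past
    for t in range(1, len(ys)):
        lams.append(omega + a * ys[t - 1] + b * lams[-1] + kappa * xs[t - 1])
    return lams

def loss_t(y_obs, lam, alpha, y_max=200):
    """Per-observation loss for a Poisson conditional density with mean lam."""
    if alpha == 0:
        # negative Poisson log-likelihood (MLE case)
        return lam - y_obs * math.log(lam) + math.lgamma(y_obs + 1)
    model_term = sum(poisson_pmf(y, lam) ** (1 + alpha) for y in range(y_max + 1))
    return model_term - (1 + 1 / alpha) * poisson_pmf(y_obs, lam) ** alpha

def H_hat(theta, ys, xs, alpha):
    """Empirical objective: average of the per-observation losses."""
    lams = lambda_hat(theta, ys, xs)
    return sum(loss_t(y, lam, alpha) for y, lam in zip(ys, lams)) / len(ys)

# Toy inputs (made up for the illustration)
theta = (1.0, 0.3, 0.2, 0.5)
ys = [2, 0, 3, 1, 4]
xs = [1.0, 0.0, 2.0, 0.5, 1.5]
lams = lambda_hat(theta, ys, xs)
h_mle = H_hat(theta, ys, xs, alpha=0.0)
h_dpd = H_hat(theta, ys, xs, alpha=0.5)
print(lams, h_mle, h_dpd)
```

In practice `H_hat` would be handed to a numerical optimizer over a compact parameter set; the choice of optimizer is left open here. Note also that for small $\alpha>0$ the loss behaves like $-1/\alpha-\log g(Y_{t}|\widehat{\eta}_{t}(\theta))$, so it matches the $\alpha=0$ case only up to a term that does not depend on $\theta$.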

We need the following regularity assumptions to study the consistency and the asymptotic normality of the MDPDE.

  1. (A0):

    for all $\theta\in\Theta$, $\big(f_{\theta}(Y_{t-1},Y_{t-2},\cdots;X_{t-1},X_{t-2},\cdots)=f_{\theta^{*}}(Y_{t-1},Y_{t-2},\cdots;X_{t-1},X_{t-2},\cdots)\ \text{a.s. for some }t\in\mathbb{Z}\big)\Rightarrow\theta=\theta^{*}$; moreover, $\exists\,\underline{c}>0$ such that $\displaystyle\inf_{\theta\in\Theta}f_{\theta}(y_{1},\cdots;x_{1},\cdots)\geq\underline{c}$ for all $(y,x)\in\mathbb{N}_{0}^{\mathbb{N}}\times\mathbb{R}^{\infty}$;

  2. (A1):

    $\theta^{*}$ is an interior point of the compact parameter space $\Theta\subset\mathbb{R}^{d}$;

  3. (A2):

    there exists a constant $\underline{\lambda}>0$ such that $\mathbb{E}\left\|\lambda_{t}(\theta)\right\|^{4}_{\Theta}=\underline{\lambda}<\infty$, for all $t\geq 1$;

  4. (A3):

    for all $y\in\mathbb{N}_{0}$, the function $\eta\mapsto g(y|\eta)$ is twice continuously differentiable on $\mathbb{R}$ and, for all $\eta,\eta^{\prime}\in\mathbb{R}$, $\big(g(y|\eta)=g(y|\eta^{\prime})~\forall y\in\mathbb{N}_{0}\big)\Rightarrow\eta=\eta^{\prime}$;

  5. (A4):

    the mapping $\eta\mapsto\varphi(\eta)=\sum_{y=0}^{\infty}\left|\frac{\partial g(y|\eta)}{\partial\eta}\right|$ is well defined on $\mathbb{R}$ and, for all $t\geq 1$, there exists a constant $\overline{\varphi}_{t}>0$ such that $\underset{0\leq\delta\leq 1}{\sup}\left\|\left\|\varphi\left(\delta\eta_{t}(\theta)+(1-\delta)\widehat{\eta}_{t}(\theta)\right)\right\|_{\Theta}\right\|_{2}\leq\overline{\varphi}_{t}<\infty$;

  6. (A5):

    for all $t\geq 1$, the mapping $\eta\mapsto\psi(\eta)=\left|\frac{1}{g(Y_{t}|\eta)}\frac{\partial g(Y_{t}|\eta)}{\partial\eta}\right|$ is well defined on $\mathbb{R}$ and there exists a constant $\overline{\psi}_{t}>0$ such that $\underset{0\leq\delta\leq 1}{\sup}\left\|\left\|\psi\left(\delta\eta_{t}(\theta)+(1-\delta)\widehat{\eta}_{t}(\theta)\right)\right\|_{\Theta}\right\|_{2}\leq\overline{\psi}_{t}<\infty$;

  7. (A6):

    the mapping $\lambda\mapsto\eta(\lambda)$ (defined on $[\underline{c},\,+\infty[$) satisfies the Lipschitz condition: there exists a constant $c_{\eta}>0$ such that, for all $\lambda,\lambda^{\prime}>0$, $|\eta(\lambda)-\eta(\lambda^{\prime})|\leq c_{\eta}|\lambda-\lambda^{\prime}|$; moreover, $\big(\eta(\lambda)=\eta(\lambda^{\prime})\big)\Rightarrow\lambda=\lambda^{\prime}$;

  8. (A7):

    there exist two non-negative constants $\overline{h}_{\alpha}$ and $\overline{m}_{\alpha}$ such that the mappings

    $$\lambda\mapsto h_{\alpha}(\lambda)=(1+\alpha)\eta^{\prime}(\lambda)\bigg[\sum_{y=0}^{\infty}\frac{\partial g(y|\eta(\lambda))}{\partial\eta}g(y|\eta(\lambda))^{\alpha}-\frac{\partial g(Y_{t}|\eta(\lambda))}{\partial\eta}g(Y_{t}|\eta(\lambda))^{\alpha-1}\bigg] \tag{2.3}$$

    and

    $$\lambda\mapsto m_{\alpha}(\lambda)=\frac{\partial h_{\alpha}(\lambda)}{\partial\lambda} \tag{2.4}$$

    satisfy $\big\|\left\|h_{\alpha}(\lambda_{t}(\theta))\right\|^{2}_{\Theta}\big\|_{2}\leq\overline{h}_{\alpha}$ and $\underset{0\leq\delta\leq 1}{\sup}\big\|\|m_{\alpha}\big(\delta\lambda_{t}(\theta)+(1-\delta)\widehat{\lambda}_{t}(\theta)\big)\|_{\Theta}\big\|_{2}\leq\overline{m}_{\alpha}$, for all $t\geq 1$;

  9. (A8):

    for all $c\in\mathbb{R}^{d}$, $c^{T}\frac{\partial\lambda_{t}(\theta^{*})}{\partial\theta}=0$ a.s. $\Longrightarrow~c=0$, where $^{T}$ denotes the transpose.

Assumption (A0) is an identifiability condition. The conditions (A3)-(A7) allow us to unify the theory and the treatment for a class of distributions that may or may not belong to the exponential family. Within the exponential family, these conditions can be written in a simpler form (see Subsection 3.1). The other conditions are standard in this framework and can also be found in many studies; see, for instance, Kim and Lee (2019). As detailed in Section 3, all these conditions are satisfied for many classical models.

The following theorem gives the consistency of the estimator $\widehat{\theta}_{\alpha,n}$.

Theorem 2.1

Assume that (A0)-(A6), (A$_{0}(\Theta)$) and (2.2) (with $\epsilon>2$) hold with

$$\alpha^{(0)}_{k,Y}+\alpha^{(0)}_{k,X}=\mathcal{O}(k^{-\gamma})\text{ for some }\gamma>1;\text{ and }\sum_{k\geq 1}\frac{1}{k^{\gamma}}\max\left(\overline{\varphi}_{k},\overline{\psi}_{k}\right)<\infty. \tag{2.5}$$

Then

$$\widehat{\theta}_{\alpha,n}\ \xrightarrow[n\rightarrow\infty]{a.s.}\ \theta^{*}.$$

The following theorem gives the asymptotic normality of $\widehat{\theta}_{\alpha,n}$.

Theorem 2.2

Assume that (A0)-(A8), (A$_{i}(\Theta)$) (for $i=0,1,2$) and (2.2) (with $\epsilon>3$) hold with

$$\alpha^{(0)}_{k,Y}+\alpha^{(0)}_{k,X}+\alpha^{(1)}_{k,Y}+\alpha^{(1)}_{k,X}=\mathcal{O}(k^{-\gamma})\text{ for some }\gamma>3/2. \tag{2.6}$$

Then

$$\sqrt{n}\left(\widehat{\theta}_{\alpha,n}-\theta^{*}\right)\ \xrightarrow[n\rightarrow\infty]{\mathcal{D}}\ \mathcal{N}_{d}\left(0,\Sigma_{\alpha}\right)\ \text{ with }\ \Sigma_{\alpha}=J^{-1}_{\alpha}I_{\alpha}J^{-1}_{\alpha},$$

where

$$J_{\alpha}=-\mathbb{E}\Big(\frac{\partial^{2}\ell_{\alpha,1}(\theta^{*})}{\partial\theta\partial\theta^{T}}\Big)\text{ and }I_{\alpha}=\mathbb{E}\Big[\Big(\frac{\partial\ell_{\alpha,1}(\theta^{*})}{\partial\theta}\Big)\Big(\frac{\partial\ell_{\alpha,1}(\theta^{*})}{\partial\theta}\Big)^{T}\Big].$$

The trade-off between robustness and efficiency is controlled by the tuning parameter $\alpha$. As pointed out in numerous works (see, for instance, Basu et al. (1998)), estimators with large $\alpha$ have strong robustness properties, while a small value of $\alpha$ is suitable when efficiency is preferred; that is, the procedure typically becomes less efficient as $\alpha$ increases. In the empirical studies, we will consider values of $\alpha$ between zero and one, and models with a conditional distribution belonging to the one-parameter exponential family.
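The robustness side of this trade-off can be seen on a toy example. The sketch below is not one of the paper's experiments: it assumes i.i.d. Poisson data (a constant conditional mean, with no dynamics and no covariates), a made-up sample with one planted outlier, and a simple golden-section search on a bracket where the objective is assumed to be unimodal. The names `dpd_objective` and `golden_section_min` are hypothetical helpers.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability mass at y, computed on the log scale."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def dpd_objective(lam, ys, alpha, y_max=150):
    """Empirical DPD objective for i.i.d. Poisson(lam) data, alpha > 0."""
    model_term = sum(poisson_pmf(y, lam) ** (1 + alpha) for y in range(y_max + 1))
    data_term = sum(poisson_pmf(y, lam) ** alpha for y in ys) / len(ys)
    return model_term - (1 + 1 / alpha) * data_term

def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimise a function assumed unimodal on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2

# Toy sample: ten counts around 3 plus one planted outlier (40)
ys = [2, 3, 4, 3, 2, 5, 3, 1, 4, 3, 40]

mle = sum(ys) / len(ys)  # alpha = 0: the Poisson MLE is the sample mean
mdpde = golden_section_min(lambda lam: dpd_objective(lam, ys, 0.5), 0.5, 15.0)
print(f"MLE = {mle:.3f}, MDPDE(alpha=0.5) = {mdpde:.3f}")
```

The sample mean is pulled well above the bulk of the data by the single outlier, whereas the minimizer for $\alpha=0.5$ stays close to it, because the outlier enters the objective only through $g(40|\lambda)^{\alpha}$, which is negligible for moderate $\lambda$.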

3 Examples

In this section, we give some particular cases of the class of integer-valued time series defined in (2.1). We show that the regularity conditions required for the asymptotic results of the previous section are satisfied for these models. Throughout the sequel, we consider that $X_{t}=(X_{1,t},X_{2,t},\cdots,X_{d_{x},t})\in\mathbb{R}^{d_{x}}$ ($d_{x}\in\mathbb{N}$) represents a vector of covariates, $\theta$ belongs to a compact set $\Theta\subset\mathbb{R}^{d}$ ($d\in\mathbb{N}$) and $C$ denotes a positive constant whose value may differ from one inequality to another. For any $\theta\in\Theta$, we will use the notation $\eta_{t,\delta}(\theta)\coloneqq\delta\eta_{t}(\theta)+(1-\delta)\widehat{\eta}_{t}(\theta)$ (with $0\leq\delta\leq 1$) for a given element between $\widehat{\eta}_{t}(\theta)$ and $\eta_{t}(\theta)$.

3.1 A general model with the exponential family distribution

As a first example, we consider a process $\{Y_{t},\,t\in\mathbb{Z}\}$ satisfying:

$$Y_{t}|\mathcal{F}_{t-1}\sim g(y|\eta_{t})~~\textrm{with}~~\lambda_{t}\coloneqq\mathbb{E}(Y_{t}|\mathcal{F}_{t-1})=f_{\theta}(Y_{t-1},\cdots;X_{t-1},\cdots) \tag{3.1}$$

where $g(\cdot|\cdot)$ is a discrete distribution that belongs to the one-parameter exponential family; that is,

$$g(y|\eta)=\exp\left\{\eta y-A(\eta)\right\}h(y)$$

where $\eta$ is the natural parameter (i.e., $\eta_{t}$ is the natural parameter of the distribution of $Y_{t}|\mathcal{F}_{t-1}$), $A(\eta)$ and $h(y)$ are known functions, and $f_{\theta}(\cdot)$ is a non-negative function defined on $\mathbb{N}^{\infty}_{0}\times\mathbb{R}^{\infty}$, assumed to be known up to $\theta$. Let $B(\eta)=A^{\prime}(\eta)$ denote the derivative of $A(\eta)$ (which is assumed to exist, as well as the second-order derivative); it is known that $\mathbb{E}(Y_{t}|\mathcal{F}_{t-1})=B(\eta_{t})$. Therefore, the model (3.1) can be seen as a particular case of (2.1), where $\eta(\lambda)=B^{-1}(\lambda)$. Similar models without covariates have been studied by Davis and Liu (2016) (with the MLE) and Kim and Lee (2019) (with the MDPDE), where the conditional mean of $Y_{t}$ depends only on $(Y_{t-1},\lambda_{t-1})$. Cui and Zheng (2017) studied the model (3.1) without covariates and proved the existence of a stationary, ergodic and $\tau$-weakly dependent solution; they also considered inference based on the conditional maximum likelihood estimator. The following proposition establishes the existence of a unique solution for the class of models (3.1) with covariates. Let us impose an autoregressive-type structure on the covariates:

$$X_{t}=u(X_{t-1},X_{t-2},\cdots,\varepsilon_{t}), \tag{3.2}$$

where $(\varepsilon_{t})$ is a sequence of independent and identically distributed (i.i.d.) random variables and $u(x;\varepsilon_{t})$ is a function with values in $\mathbb{R}^{d_{x}}$, satisfying

$$\mathbb{E}\|u(0;\varepsilon_{t})\|<\infty~\text{ and }~\mathbb{E}\|u(x;\varepsilon_{t})-u(x^{\prime};\varepsilon_{t})\|\leq\sum_{k\geq 1}\alpha_{k}(u)\|x_{k}-x^{\prime}_{k}\|~\text{ for all }x,x^{\prime}\in(\mathbb{R}^{d_{x}})^{\mathbb{N}}, \tag{3.3}$$

for some non-negative sequence $(\alpha_{k}(u))_{k\geq 1}$ such that $\sum_{k\geq 1}\max\big\{\alpha_{k}(u),\alpha_{k,Y}^{(0)}\big\}<1$.

Proposition 3.1

Assume that A$_{0}(\Theta)$ and (3.3) hold. Then there exists a $\tau$-weakly dependent stationary solution $Y^{*}_{t}=(Y_{t},\lambda_{t},X_{t})$ of (3.1) and (3.2) such that $\mathbb{E}\|Y^{*}_{t}\|<\infty$.
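As a concrete instance of (3.1), the Poisson distribution can be written in the exponential family form above. The short worked example below is added for illustration; it only uses the definitions $B=A^{\prime}$ and $\eta(\lambda)=B^{-1}(\lambda)$ from the text, together with the lower bound $\underline{c}$ on the conditional mean from (A0).

```latex
% Poisson case of g(y|\eta) = \exp\{\eta y - A(\eta)\} h(y):
g(y|\eta) = \frac{\exp\{\eta y - e^{\eta}\}}{y!}, \qquad A(\eta) = e^{\eta}, \qquad h(y) = \frac{1}{y!}
% Mean function and its inverse:
B(\eta) = A'(\eta) = e^{\eta}, \qquad \eta(\lambda) = B^{-1}(\lambda) = \log\lambda
% The conditional variance equals the conditional mean:
B'(\eta_t) = e^{\eta_t} = \lambda_t
% Derivative of \eta, bounded on [\underline{c}, +\infty):
\eta'(\lambda) = \frac{1}{B'(B^{-1}(\lambda))} = \frac{1}{\lambda} \leq \frac{1}{\underline{c}}
```

In this case $\eta$ is Lipschitz on $[\underline{c},+\infty[$ with constant $1/\underline{c}$ and is one-to-one, which is the kind of property required of $\lambda\mapsto\eta(\lambda)$ in (A6).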

For the model (3.1), let us provide some sufficient conditions for the assumptions (A2)-(A7).

  • According to Assumption A$_{0}(\Theta)$, for all $t\geq 1$, we have

    $$\begin{aligned}\|\lambda_{t}(\theta)\|_{\Theta}&\leq\left\|f_{\theta}(0)\right\|_{\Theta}+\left\|f_{\theta}(0)-f_{\theta}(Y_{t-1},\cdots;X_{t-1},\cdots)\right\|_{\Theta}\\ &\leq\left\|f_{\theta}(0)\right\|_{\Theta}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\|.\end{aligned} \tag{3.4}$$

    Thus,

    $$\begin{aligned}\left\|\|\lambda_{t}(\theta)\|_{\Theta}\right\|_{4}&\leq\left\|\left\|f_{\theta}(0)\right\|_{\Theta}\right\|_{4}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}\left\|Y_{t-\ell}\right\|_{4}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|\left\|X_{t-\ell}\right\|\right\|_{4}\\ &\leq C+C\big(\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\big)<\infty~~(\text{from the assumption (2.2) with }\epsilon>3).\end{aligned}$$

    Hence, (A2) is satisfied.

  • Clearly, (A3) is satisfied.

  • According to the above notations, for any $\theta\in\Theta$, $\delta\in[0,1]$, we have

    $$\varphi(\eta_{t,\delta}(\theta))=\sum_{y=0}^{\infty}\left|(y-B(\eta_{t,\delta}(\theta)))g(y|\eta_{t,\delta}(\theta))\right|\leq 2B(\eta_{t,\delta}(\theta)).$$

    Since the function $B$ is strictly increasing (because $\mathrm{Var}(Y_{t}|\mathcal{F}_{t-1})=B^{\prime}(\eta_{t})>0$), we deduce that

    $$\left\|\varphi(\eta_{t,\delta}(\theta))\right\|_{\Theta}\leq 2\left(\left\|B(\eta_{t}(\theta))\right\|_{\Theta}+\left\|B(\widehat{\eta}_{t}(\theta))-B(\eta_{t}(\theta))\right\|_{\Theta}\right)=2\left(\|\lambda_{t}(\theta)\|_{\Theta}+\|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\|_{\Theta}\right). \tag{3.5}$$

    Moreover, Assumption A$_{0}(\Theta)$ implies

    \begin{align}\|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\|_{\Theta}&\leq\left\|f_{\theta}(Y_{t-1},\cdots,Y_{1},0,\cdots;X_{t-1},\cdots,X_{1},0,\cdots)-f_{\theta}(Y_{t-1},\cdots;X_{t-1},\cdots)\right\|_{\Theta}\notag\\ &\leq\sum\limits_{\ell\geq t}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+\sum\limits_{\ell\geq t}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\| \tag{3.6}\\ &\leq\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\|. \tag{3.7}\end{align}

    Thus, from (3.4), (3.5) and (3.7), for $t\geq 1$, we get

    $$\begin{aligned}\left\|\left\|\varphi(\eta_{t,\delta}(\theta))\right\|_{\Theta}\right\|_{2}&\leq 2\Big(\left\|\left\|f_{\theta}(0)\right\|_{\Theta}\right\|_{2}+2\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}\|Y_{t-\ell}\|_{2}+2\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|\|X_{t-\ell}\|\right\|_{2}\Big)\\ &\leq C+C\big(\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\big)<\infty~(\text{from the stationarity assumption}).\end{aligned}$$

    Therefore, (A4) is satisfied with $\overline{\varphi}_{t}=C+C\big(\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\big)$, which is constant in $t$. In addition, remark that

    $$\left\|\psi(\eta_{t,\delta}(\theta))\right\|_{\Theta}=\left\|Y_{t}-B(\eta_{t,\delta}(\theta))\right\|_{\Theta}\leq Y_{t}+\|\lambda_{t}(\theta)\|_{\Theta}+\|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\|_{\Theta}.$$

    Thus, from (3.4) and (3.7), we deduce

    $$\begin{aligned}\left\|\left\|\psi(\eta_{t,\delta}(\theta))\right\|_{\Theta}\right\|_{2}&\leq\|Y_{t}\|_{2}+\left\|\left\|f_{\theta}(0)\right\|_{\Theta}\right\|_{2}+2\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}\|Y_{t-\ell}\|_{2}+2\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|\|X_{t-\ell}\|\right\|_{2}\\ &\leq C+C\big(\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\big)<\infty.\end{aligned}$$

    Hence, (A5) is satisfied with $\overline{\psi}_{t}=C+C\big(\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}+\sum\limits_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\big)$.

  • To verify (A6), remark that for the model (3.1), $\eta^{\prime}(\lambda)=\frac{1}{B^{\prime}(B^{-1}(\lambda))}$. Moreover, since $B$ is strictly increasing in $\eta$, $B^{-1}$ is also strictly increasing in $\lambda$. Then, from (A0), $\left|\eta^{\prime}(\lambda)\right|\leq\frac{1}{B^{\prime}(B^{-1}(\underline{c}))}$, for some $\underline{c}>0$, which implies that $\eta^{\prime}$ is bounded. Thus, the function $\eta$ satisfies the Lipschitz condition, which shows that (A6) holds. Clearly, the second part of (A6) is satisfied.

  • Now, let us show that (A7) is satisfied. From (A0), for all $\theta\in\Theta$, we have

    $$\begin{aligned}
    \left|h_{\alpha}(\lambda_{t}(\theta))\right| &=\frac{(1+\alpha)}{B^{\prime}(B^{-1}(\lambda_{t}(\theta)))}\Big|\sum_{y=0}^{\infty}(y-\lambda_{t}(\theta))g(y|\eta(\lambda_{t}(\theta)))^{\alpha+1}+(Y_{t}-\lambda_{t}(\theta))g(Y_{t}|\eta(\lambda_{t}(\theta)))^{\alpha}\Big|\\
    &\leq\frac{(1+\alpha)}{B^{\prime}(B^{-1}(\underline{c}))}\Big(\sum_{y=0}^{\infty}(y+\lambda_{t}(\theta))g(y|\eta(\lambda_{t}(\theta)))+(Y_{t}+\lambda_{t}(\theta))\Big)\\
    &\leq C\left(Y_{t}+3\lambda_{t}(\theta)\right).
    \end{aligned}$$

    Therefore,

    $$\left\|h_{\alpha}(\lambda_{t}(\theta))\right\|^{2}_{\Theta}\leq C\|Y_{t}+3\lambda_{t}(\theta)\|^{2}_{\Theta}\leq C\big(Y^{2}_{t}+6Y_{t}\|\lambda_{t}(\theta)\|_{\Theta}+9\|\lambda_{t}(\theta)\|^{2}_{\Theta}\big). \tag{3.8}$$

    By applying Hölder's inequality to the second term on the right-hand side of (3.8), we get

    $$\begin{aligned}
    \left\|\left\|h_{\alpha}(\lambda_{t}(\theta))\right\|^{2}_{\Theta}\right\|_{2} &\leq C\left(\left\|Y^{2}_{t}\right\|_{2}+6\big\|Y_{t}\cdot\|\lambda_{t}(\theta)\|_{\Theta}\big\|_{2}+9\left\|\|\lambda_{t}(\theta)\|^{2}_{\Theta}\right\|_{2}\right)\\
    &\leq C\left(\left\|Y_{t}\right\|^{2}_{4}+6\|Y_{t}\|_{4}\cdot\big\|\|\lambda_{t}(\theta)\|_{\Theta}\big\|_{4}+9\left\|\|\lambda_{t}(\theta)\|_{\Theta}\right\|^{2}_{4}\right)\\
    &\leq C\left(C+6C\underline{\lambda}^{1/4}+9\underline{\lambda}^{1/2}\right):=\overline{h}_{\alpha}<\infty\quad(\text{from (2.2) with }\epsilon>3\text{ and (A2)}).
    \end{aligned}$$

    To complete the verification of (A7), we need to impose the following regularity condition on the function $B$ (see also Kim and Lee (2019)):

    (B0): $\sup_{\theta\in\Theta}\sup_{0\leq\delta\leq 1}\Big|\frac{B^{\prime\prime}(\eta_{t,\delta}(\theta))}{B^{\prime}(\eta_{t,\delta}(\theta))^{3}}\Big|\leq K$, for some $K>0$.

    Let $\lambda_{t,\delta}(\theta)=\delta\lambda_{t}(\theta)+(1-\delta)\widehat{\lambda}_{t}(\theta)$ with $\theta\in\Theta$, $0\leq\delta\leq 1$; which implies that $\eta_{t,\delta}(\theta)=\eta(\lambda_{t,\delta}(\theta))$ is between $\eta(\lambda_{t}(\theta))$ and $\eta(\widehat{\lambda}_{t}(\theta))$ since the function $\eta$ is monotone. By using the condition (B0), we can proceed as in Kim and Lee (2019) to get for all $\theta\in\Theta$,

    $$\begin{aligned}
    \left|m_{\alpha}(\lambda_{t,\delta}(\theta))\right| &\leq CY^{2}_{t}+KY_{t}+CB(\eta_{t,\delta}(\theta))^{2}+3KB(\eta_{t,\delta}(\theta))+C\\
    &\leq CY^{2}_{t}+KY_{t}+C\big(B(\eta_{t}(\theta))+\left|B(\widehat{\eta}_{t}(\theta))-B(\eta_{t}(\theta))\right|\big)^{2}+3K\big(B(\eta_{t}(\theta))+\left|B(\widehat{\eta}_{t}(\theta))-B(\eta_{t}(\theta))\right|\big)+C\\
    &=CY^{2}_{t}+KY_{t}+C\big(\lambda_{t}(\theta)+|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)|\big)^{2}+3K\big(\lambda_{t}(\theta)+|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)|\big)+C.
    \end{aligned}$$

    Thus, according to (3.4) and (3.7), for any $t\geq 1$, we have

    $$\begin{aligned}
    \left\|m_{\alpha}(\lambda_{t,\delta}(\theta))\right\|_{\Theta} &\leq CY^{2}_{t}+KY_{t}+C\Big(\left\|f_{\theta}(0)\right\|_{\Theta}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\|\Big)^{2}\\
    &\quad+3K\Big(\left\|f_{\theta}(0)\right\|_{\Theta}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\|\Big)+C.
    \end{aligned}$$

    Hence, from (2.2) with $\epsilon>3$,

    $$\begin{aligned}
    \left\|\left\|m_{\alpha}(\lambda_{t,\delta}(\theta))\right\|_{\Theta}\right\|_{2}
    &\leq C\|Y^{2}_{t}\|_{2}+K\|Y_{t}\|_{2}+C\Big\|\Big(\left\|f_{\theta}(0)\right\|_{\Theta}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\|\Big)^{2}\Big\|_{2}\\
    &\quad+3K\Big\|\left\|f_{\theta}(0)\right\|_{\Theta}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\|\Big\|_{2}+C\\
    &\leq C\|Y_{t}\|^{2}_{4}+K\|Y_{t}\|_{2}+C\Big\|\left\|f_{\theta}(0)\right\|_{\Theta}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\|\Big\|^{2}_{4}\\
    &\quad+3K\Big(\left\|\left\|f_{\theta}(0)\right\|_{\Theta}\right\|_{2}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}\|Y_{t-\ell}\|_{2}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\big\|\left\|X_{t-\ell}\right\|\big\|_{2}\Big)+C\\
    &\leq C+CK+C\Big(\left\|\left\|f_{\theta}(0)\right\|_{\Theta}\right\|_{4}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}\|Y_{t-\ell}\|_{4}+2\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\big\|\left\|X_{t-\ell}\right\|\big\|_{4}\Big)^{2}\\
    &\quad+3K\Big(C+C\Big(\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}+\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\Big)\Big)+C\\
    &\leq C+C\Big(C+C\Big(\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}+\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\Big)\Big)^{2}+C\Big(C+C\Big(\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,Y}+\sum_{\ell\geq 1}\alpha^{(0)}_{\ell,X}\Big)\Big)+C:=\overline{m}_{\alpha}<\infty.
    \end{aligned}$$

    Thus (A7) is satisfied.

3.2 A particular case of linear models: INGARCH-X

As a second example, consider the Poisson-INGARCH-X model defined by

$$Y_{t}|\mathcal{F}_{t-1}\sim\mathcal{P}(\lambda_{t})~~\text{with}~~\lambda_{t}=\alpha^{*}_{0}+\sum_{i=1}^{q^{*}}\alpha^{*}_{i}Y_{t-i}+\sum_{i=1}^{p^{*}}\beta^{*}_{i}\lambda_{t-i}+v(X_{t-1}), \tag{3.9}$$

where $\alpha^{*}_{0}>0$, $\alpha^{*}_{1},\cdots,\alpha^{*}_{q^{*}},\beta^{*}_{1},\cdots,\beta^{*}_{p^{*}}\geq 0$ and $v$ is a non-negative function defined on $\mathbb{R}^{d_{x}}$. This class of models has already been studied within the maximum-likelihood framework; see for instance Agosto et al. (2016) and Pedersen and Rahbek (2018). Similarly, one can define the NB-INGARCH-X and BIN-INGARCH-X models with the negative binomial distribution and the Bernoulli distribution, respectively. Without loss of generality, we assume that the components of the exogenous covariate vector are non-negative and that $v(x)$ is a linear function in $x$. More precisely, there exists $(\gamma^{*}_{1},\cdots,\gamma^{*}_{d_{x}})\in[0,\infty)^{d_{x}}$ such that $v(x)=\sum_{i=1}^{d_{x}}\gamma^{*}_{i}x_{i}$, for any $x=(x_{1},\cdots,x_{d_{x}})\in[0,\infty)^{d_{x}}$. The true parameter of the model is $\theta^{*}=(\alpha^{*}_{0},\alpha^{*}_{1},\cdots,\alpha^{*}_{q^{*}},\beta^{*}_{1},\cdots,\beta^{*}_{p^{*}},\gamma^{*}_{1},\cdots,\gamma^{*}_{d_{x}})$; therefore, $\Theta$ is a compact subset of $(0,\infty)\times[0,\infty)^{p^{*}+q^{*}+d_{x}}$. If $\sum_{i=1}^{p^{*}}\beta^{*}_{i}<1$, then we can find two sequences $(\psi_{k}(\theta^{*}))_{k\geq 0}$ and $(\gamma_{k}(\theta^{*}))_{k\geq 1}$ such that

$$\lambda_{t}=\psi_{0}(\theta^{*})+\sum_{k\geq 1}\psi_{k}(\theta^{*})Y_{t-k}+\sum_{k\geq 1}\gamma^{\prime}_{k}(\theta^{*})X_{t-k};$$

which implies that $f_{\theta^{*}}(y;x)=\psi_{0}(\theta^{*})+\sum_{k\geq 1}\psi_{k}(\theta^{*})y_{k}+\sum_{k\geq 1}\gamma^{\prime}_{k}(\theta^{*})x_{k}$, for any $(y,x)\in\mathbb{N}^{\mathbb{N}}_{0}\times(\mathbb{R}^{d_{x}})^{\mathbb{N}}$. As pointed out in Aknouche and Francq (2020), by using the arguments of Theorem 1 in Francq and Thieu (2019), a sufficient condition for the model (3.9) to be identifiable is that $Y_{t}$ is not a measurable function of $\{X_{s},\ s\in\mathbb{Z}\}$. Let us impose here a Markov structure on the set of covariates. Assume that $X_{t}=g(X_{t-1},\varepsilon_{t})$ for some function $g(x,\varepsilon_{t})$ with values in $[0,\infty)^{d_{x}}$, where $(\varepsilon_{t})$ is a sequence of i.i.d. random variables. To ensure the stability of the exogenous covariates, we make the following assumption (see also Agosto et al. (2016)).

(B1): $\mathbb{E}\left[\left\|g(x,\varepsilon_{t})-g(x^{\prime},\varepsilon_{t})\right\|^{s}\right]\leq\rho\left\|x-x^{\prime}\right\|^{s}$ and $\mathbb{E}\left[\left\|g(0,\varepsilon_{t})\right\|^{s}\right]<\infty$, for some $(s,\rho)\in[1,\infty)\times(0,1)$ and for all $x,x^{\prime}\in[0,\infty)^{d_{x}}$.

To check the conditions of Theorems 2.1 and 2.2, define the compact set $\Theta$ as follows:

$$\begin{aligned}
\Theta=\Big\{&\theta=(\alpha_{0},\alpha_{1},\cdots,\alpha_{q^{*}},\beta_{1},\cdots,\beta_{p^{*}},\gamma_{1},\cdots,\gamma_{d_{x}})\in[0,\infty)^{p^{*}+q^{*}+d_{x}+1}\ \big/\ 0<\alpha_{L}\leq\alpha_{0}\leq\alpha_{U},\\
&\sum_{i=1}^{q^{*}}\alpha_{i}+\sum_{i=1}^{p^{*}}\beta_{i}<1-\epsilon\text{ with }\epsilon=\max\Big\{0,\Big(1-\sum_{i=1}^{p^{*}}\beta_{i}\Big)(\rho-\alpha_{1})\Big\}\text{ and }\max_{1\leq i\leq d_{x}}\{\gamma_{i}\}\leq\alpha_{V}\Big\},
\end{aligned} \tag{3.10}$$

for some $\alpha_{L},\alpha_{U},\alpha_{V},\epsilon>0$.
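As an illustration of (B1) that is not taken from the original argument, consider a scalar covariate ($d_{x}=1$) driven by a hypothetical autoregression with non-negative innovations. The coefficient $\varphi$ and the form of $g$ below are assumptions made only for this example:

```latex
% Hypothetical example: g(x, \varepsilon) = \varphi x + |\varepsilon|, with 0 < \varphi < 1 and x \geq 0.
\begin{aligned}
\mathbb{E}\big[|g(x,\varepsilon_{t})-g(x',\varepsilon_{t})|^{s}\big] &= \varphi^{s}\,|x-x'|^{s}, \\
\mathbb{E}\big[|g(0,\varepsilon_{t})|^{s}\big] &= \mathbb{E}\big[|\varepsilon_{t}|^{s}\big].
\end{aligned}
```

Under these assumptions, (B1) holds with $\rho=\varphi^{s}\in(0,1)$ as soon as $\mathbb{E}|\varepsilon_{t}|^{s}<\infty$, and $g$ takes values in $[0,\infty)$ whenever $x\geq 0$.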

  • Under the assumption (B1), if $\theta^{*}\in\Theta$ (see the definition in (3.10)), A$_{i}(\Theta)$ ($i=0,1,2$) hold; and Proposition 3.1 establishes the existence of a $\tau$-weakly dependent stationary and ergodic solution $Y^{*}_{t}=(Y_{t},\lambda_{t},X_{t})$ of the model (3.9) as well as the NB-INGARCH-X and BIN-INGARCH-X models, satisfying $\mathbb{E}\|Y^{*}_{t}\|<\infty$. In the case of the Poisson-INGARCH-X model, we refer to the second part of the proof of Theorem 1 in Agosto et al. (2016) for the existence of the $s$-order moment with $s>1$. Thus, according to (B1) (with $s>3$), the condition (2.2) (with $\epsilon>3$) holds, and consequently (A2) holds for the Poisson-INGARCH-X model (see Subsection 3.1 for sufficient conditions in a large class of models, including NB-INGARCH-X and BIN-INGARCH-X). In addition, $f_{\theta}(y;x)\geq\alpha_{0}\geq\alpha_{L}:=\underline{c}$ for all $\theta\in\Theta$ and $(y,x)\in\mathbb{N}^{\mathbb{N}}_{0}\times(\mathbb{R}^{d_{x}})^{\mathbb{N}}$, which shows that the second part of (A0) holds. Since the one-parameter exponential family includes the Poisson distribution, by using (2.2), (A0) and A$_{0}(\Theta)$, we can go along similar lines as in the general model (3.1) to show that the assumptions (A3), (A4), (A5) and (A6) hold.

  • Let us check the assumption (A7). If (B0) holds, one can go along similar lines as in (3.1) to get (A7). Let us show that (B0) holds. Remark that model (3.9) is a particular case of model (3.1) where $\lambda_{t}=B(\eta_{t})=\mathrm{e}^{\eta_{t}}$, so that $B^{\prime}(\eta)=B^{\prime\prime}(\eta)=\mathrm{e}^{\eta}$. Thus, for any $\delta\in[0,1]$, $\theta\in\Theta$, we have

    $$\Big|\frac{B^{\prime\prime}(\eta_{t,\delta}(\theta))}{B^{\prime}(\eta_{t,\delta}(\theta))^{3}}\Big|=\frac{1}{\lambda^{2}_{t,\delta}(\theta)}\leq\frac{1}{\alpha^{2}_{L}}=:K,$$

    where $\lambda_{t,\delta}(\theta)$ is between $\widehat{\lambda}_{t}(\theta)$ and $\lambda_{t}(\theta)$. Hence, (B0) is satisfied.
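A similar computation can be sketched for the negative binomial case. The following is an illustrative check rather than part of the original argument; it assumes that $NB(r,p)$ with $r$ fixed is written in exponential family form with natural parameter $\eta=\log(1-p)$ and $A(\eta)=-r\log(1-\mathrm{e}^{\eta})$, so that $B=A^{\prime}$ and $B(\eta)=r(1-p)/p=\lambda$:

```latex
% Assumed parametrization: \eta = \log(1-p), A(\eta) = -r\log(1-e^{\eta}), B = A'.
\begin{aligned}
B(\eta) &= \frac{r e^{\eta}}{1-e^{\eta}} = \lambda, \qquad
B'(\eta) = \frac{r e^{\eta}}{(1-e^{\eta})^{2}} = \frac{\lambda(r+\lambda)}{r}, \qquad
B''(\eta) = \frac{r e^{\eta}(1+e^{\eta})}{(1-e^{\eta})^{3}} = \frac{\lambda(r+\lambda)(r+2\lambda)}{r^{2}}, \\
\Big|\frac{B''(\eta)}{B'(\eta)^{3}}\Big| &= \frac{r(r+2\lambda)}{\lambda^{2}(r+\lambda)^{2}}
\leq \frac{1}{\lambda^{2}} \leq \frac{1}{\alpha_{L}^{2}},
\end{aligned}
```

where the first inequality uses $r(r+2\lambda)\leq(r+\lambda)^{2}$ and the second uses $\lambda\geq\alpha_{L}$. Under these assumptions, the bound in (B0) would again hold with $K=1/\alpha_{L}^{2}$.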

4 Simulation and results

This section presents a simulation study to assess the efficiency and the robustness of the MDPDE. We compare the performance of the MDPDE with that of the MLE ($\alpha=0$) for some dynamic models satisfying (2.1). To this end, the stability of the estimators under contaminated data is studied. For each model considered, the results are based on $100$ Monte Carlo replications with sample sizes $n=500$ and $n=1000$. The sample mean and the mean square error (MSE) of the estimators are used as evaluation criteria.
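To make the comparison between the MDPDE and the MLE concrete, the sketch below evaluates a per-observation density power divergence loss for a Poisson conditional distribution. It assumes the standard form of Basu et al. (1998), which is presumed to match the objective $H_{\alpha,n}$ defined earlier in the paper; the function names and the truncation rule for the infinite sum are hypothetical choices made for illustration.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability mass g(y | lam), computed on the log scale for stability."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def dpd_loss_poisson(y_obs, lam, alpha):
    """Per-observation density power divergence loss for a Poisson(lam) model.

    Assumed form (Basu et al., 1998):
      alpha > 0 : sum_y g(y)^(1+alpha) - (1 + 1/alpha) * g(y_obs)^alpha
      alpha = 0 : -log g(y_obs), i.e. the negative log-likelihood (MLE case)
    """
    if alpha == 0:
        return -math.log(poisson_pmf(y_obs, lam))
    # Truncate the infinite sum far in the right tail (this rule is an assumption).
    upper = int(lam + 12.0 * math.sqrt(lam) + 30)
    power_sum = sum(poisson_pmf(y, lam) ** (1.0 + alpha) for y in range(upper + 1))
    return power_sum - (1.0 + 1.0 / alpha) * poisson_pmf(y_obs, lam) ** alpha
```

For $\alpha>0$ this loss is bounded above by $\sum_{y}g(y)^{1+\alpha}\leq 1$ whatever the observation, whereas the negative log-likelihood grows without bound for an extreme observation; this is the mechanism behind the robustness studied in this section.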

4.1 Poisson-INGARCH-X process

Consider the Poisson-INGARCH-X defined by

$$Y_{t}|\mathcal{F}_{t-1}\sim\mathcal{P}(\lambda_{t})~~\text{with}~~\lambda_{t}=\alpha_{0}+\alpha_{1}Y_{t-1}+\beta\lambda_{t-1}+\gamma|X_{t-1}|, \tag{4.1}$$

where $\theta^{*}=(\alpha_{0},\alpha_{1},\beta,\gamma)\in\Theta\subset(0,\infty)\times[0,\infty)^{3}$ is the true parameter and $X_{t}$ follows an ARCH$(1)$ process given by

$$X_{t}|\mathcal{F}_{t-1}\sim\mathcal{N}(0,\sigma^{2}_{t})~~\text{with}~~\sigma^{2}_{t}=\omega_{0}+\omega_{1}X^{2}_{t-1},~~\omega_{0}>0,\,\omega_{1}\geq 0.$$

For this model, we set $\theta^{*}=(0.10,0.15,0.80,0.03)$. This scenario is close to the real data example (see below). We first consider the case where the data are not contaminated by outliers. We generate a trajectory of model (4.1) with $(\omega_{0},\omega_{1})=(1,0.5)$. For different values of $\alpha$, the parameter estimates and their corresponding MSEs (shown in parentheses) are summarized in Table 1. In the table, the minimal MSE for each component of $\widehat{\theta}_{\alpha,n}$ is indicated by the symbol • (some MSE values have been rounded; e.g., in Table 1 with $n=500$ and $\alpha=0.10$, the value 0.09 shown in parentheses in the last column stands for 0.088). These results show that the MLE has the minimal MSEs for all the parameters, except for $\gamma$ when $n=500$ and $\alpha_{1}$ when $n=1000$. For the parameters $\gamma$ (when $n=500$) and $\alpha_{1}$ (when $n=1000$), the MDPDE with $\alpha=0.1$ has the minimal MSE, but the values are close to those of the MLE. One can also observe that the MSEs of the MDPDEs increase with $\alpha$. These findings confirm the fact that the MLE generally outperforms the MDPDE when no outliers exist. However, as $n$ increases, the performance of the MDPDE improves for each $\alpha$.
Now, we evaluate the robustness of the estimators by considering the case where the data are contaminated by additive outliers. Assume that we observe the contaminated process $Y_{c,t}$ such that $Y_{c,t}=Y_{t}+P_{t}Y_{0,t}$, where $Y_{t}$ is generated from (4.1), $(P_{t})$ is a sequence of i.i.d. Bernoulli random variables with success probability $p$ and $(Y_{0,t})$ is a sequence of i.i.d. Poisson random variables with mean $\mu$. In the sequel, the variables $P_{t}$, $Y_{0,t}$ and $Y_{t}$ are assumed to be independent. For $p=0.02$ and $\mu=10$, the corresponding results are summarized in Table 2. From this table, one can see that the MDPDE has smaller MSEs than the MLE, except for some of the estimates obtained with $\alpha=0.75$ and $\alpha=1$ (see for instance $\widehat{\beta}$); this indicates that the MDPDE is more robust to outliers and overall outperforms the MLE in such cases. We also observe that the selected optimal value of $\alpha$ decreases as $n$ increases for all parameters.
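The data-generating mechanism described above can be sketched as follows, using only the Python standard library. This is a minimal illustration and not the authors' simulation code: the burn-in length, the starting values, the seeds and the function names are assumptions, and the Poisson sampler uses Knuth's multiplication method, which is adequate only for moderate intensities.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_ingarch_x(n, theta, omega=(1.0, 0.5), burn_in=200, seed=0):
    """Simulate (Y_t, X_t) from model (4.1) with an ARCH(1) covariate."""
    a0, a1, b, g = theta
    w0, w1 = omega
    rng = random.Random(seed)
    x_prev, y_prev = 0.0, 0
    lam_prev = a0 / (1.0 - a1 - b)  # arbitrary starting intensity (assumption)
    ys, xs = [], []
    for t in range(n + burn_in):
        # lambda_t depends on Y_{t-1}, lambda_{t-1} and |X_{t-1}|
        lam = a0 + a1 * y_prev + b * lam_prev + g * abs(x_prev)
        y = rpois(lam, rng)
        # X_t | past ~ N(0, w0 + w1 * X_{t-1}^2)
        x = rng.gauss(0.0, math.sqrt(w0 + w1 * x_prev ** 2))
        if t >= burn_in:  # discard the burn-in period
            ys.append(y)
            xs.append(x)
        y_prev, lam_prev, x_prev = y, lam, x
    return ys, xs


def contaminate(ys, p=0.02, mu=10.0, seed=1):
    """Additive outliers: Y_c = Y + P * Y_0, P ~ Bernoulli(p), Y_0 ~ Poisson(mu)."""
    rng = random.Random(seed)
    return [y + (rpois(mu, rng) if rng.random() < p else 0) for y in ys]
```

For example, `ys, xs = simulate_ingarch_x(500, (0.10, 0.15, 0.80, 0.03))` followed by `yc = contaminate(ys)` would produce one clean and one contaminated trajectory of length 500 under the settings used for Tables 1 and 2.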

Table 1: Sample mean and MSE$\times 10^{2}$ of the estimators for the Poisson-INGARCH-X model (4.1) with $\theta^{*}=(0.1,0.15,0.8,0.03)$: the case without outliers in the data.

| $\alpha$ | $\widehat{\alpha}_{0}$ ($n=500$) | $\widehat{\alpha}_{1}$ ($n=500$) | $\widehat{\beta}$ ($n=500$) | $\widehat{\gamma}$ ($n=500$) | $\widehat{\alpha}_{0}$ ($n=1000$) | $\widehat{\alpha}_{1}$ ($n=1000$) | $\widehat{\beta}$ ($n=1000$) | $\widehat{\gamma}$ ($n=1000$) |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.137 (1.86)• | 0.152 (0.10)• | 0.780 (0.60)• | 0.035 (0.09) | 0.120 (0.39)• | 0.148 (0.06) | 0.793 (0.16)• | 0.034 (0.06)• |
| 0.10 | 0.139 (2.05) | 0.152 (0.11) | 0.779 (0.67) | 0.035 (0.09)• | 0.121 (0.39) | 0.148 (0.06)• | 0.793 (0.16) | 0.034 (0.06) |
| 0.20 | 0.140 (2.25) | 0.152 (0.11) | 0.779 (0.74) | 0.034 (0.09) | 0.122 (0.41) | 0.148 (0.07) | 0.792 (0.16) | 0.034 (0.06) |
| 0.30 | 0.143 (2.46) | 0.152 (0.12) | 0.778 (0.81) | 0.034 (0.09) | 0.123 (0.43) | 0.149 (0.07) | 0.791 (0.17) | 0.034 (0.06) |
| 0.40 | 0.144 (2.64) | 0.152 (0.12) | 0.777 (0.87) | 0.034 (0.10) | 0.123 (0.45) | 0.149 (0.07) | 0.791 (0.18) | 0.034 (0.06) |
| 0.50 | 0.146 (2.81) | 0.152 (0.13) | 0.776 (0.93) | 0.034 (0.10) | 0.124 (0.47) | 0.149 (0.07) | 0.790 (0.19) | 0.034 (0.07) |
| 0.75 | 0.150 (3.09) | 0.152 (0.14) | 0.775 (1.03) | 0.033 (0.10) | 0.127 (0.53) | 0.150 (0.09) | 0.788 (0.22) | 0.034 (0.07) |
| 1.00 | 0.153 (3.23) | 0.152 (0.16) | 0.774 (1.10) | 0.032 (0.10) | 0.129 (0.59) | 0.151 (0.10) | 0.787 (0.26) | 0.034 (0.08) |

Table 2: Sample mean and MSE$\times 10^{2}$ of the estimators for the Poisson-INGARCH-X model (4.1) with $\theta^{*}=(0.1,0.15,0.8,0.03)$: the case in which the data are contaminated by outliers.

| $\alpha$ | $\widehat{\alpha}_{0}$ ($n=500$) | $\widehat{\alpha}_{1}$ ($n=500$) | $\widehat{\beta}$ ($n=500$) | $\widehat{\gamma}$ ($n=500$) | $\widehat{\alpha}_{0}$ ($n=1000$) | $\widehat{\alpha}_{1}$ ($n=1000$) | $\widehat{\beta}$ ($n=1000$) | $\widehat{\gamma}$ ($n=1000$) |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.279 (26.90) | 0.105 (0.34) | 0.775 (3.53) | 0.045 (0.30) | 0.140 (0.75) | 0.107 (0.25) | 0.827 (0.27) | 0.044 (0.12) |
| 0.10 | 0.186 (14.81) | 0.105 (0.30) | 0.805 (1.88) | 0.041 (0.18) | 0.109 (0.49)• | 0.109 (0.22) | 0.829 (0.23)• | 0.043 (0.09)• |
| 0.20 | 0.200 (21.91) | 0.105 (0.30) | 0.799 (2.73) | 0.041 (0.16) | 0.104 (0.52) | 0.110 (0.21)• | 0.828 (0.24) | 0.043 (0.09) |
| 0.30 | 0.160 (10.67)• | 0.104 (0.31) | 0.812 (1.58)• | 0.041 (0.16)• | 0.103 (0.56) | 0.110 (0.22) | 0.827 (0.25) | 0.043 (0.10) |
| 0.40 | 0.185 (17.23) | 0.104 (0.31) | 0.802 (2.27) | 0.041 (0.16) | 0.104 (0.58) | 0.110 (0.22) | 0.826 (0.26) | 0.043 (0.10) |
| 0.50 | 0.187 (17.17) | 0.105 (0.31) | 0.801 (2.27) | 0.041 (0.16) | 0.106 (0.63) | 0.110 (0.22) | 0.825 (0.27) | 0.043 (0.10) |
| 0.75 | 0.200 (17.94) | 0.107 (0.30) | 0.792 (2.72) | 0.041 (0.17) | 0.109 (0.69) | 0.111 (0.22) | 0.822 (0.29) | 0.0441 (0.11) |
| 1.00 | 0.238 (26.31) | 0.109 (0.30)• | 0.775 (3.55) | 0.043 (0.19) | 0.112 (0.78) | 0.113 (0.22) | 0.818 (0.32) | 0.045 (0.12) |


4.2 NB-INGARCH-X process

Consider the negative-binomial-INGARCH-X (NB-INGARCH-X) model defined by

$$Y_{t}|\mathcal{F}_{t-1}\sim NB(r,p_{t})~\text{with}~r\frac{(1-p_{t})}{p_{t}}=\lambda_{t}=\alpha_{0}+\alpha_{1}Y_{t-1}+\beta\lambda_{t-1}+\gamma_{1}\exp(X_{1,t-1})+\gamma_{2}\mathbb{1}_{\{X_{2,t-1}<0\}}|X_{2,t-1}|, \tag{4.2}$$

where $\theta^{*}=(\alpha_{0},\alpha_{1},\beta,\gamma_{1},\gamma_{2})\in\Theta\subset(0,\infty)\times[0,\infty)^{4}$ is the true parameter, $(X_{1,t},X_{2,t})$ is the covariate vector, $\{X_{i,t},\,t\geq 1\}$ (for $i=1,2$) is an autoregressive process satisfying

$$X_{i,t}=\varphi_{i}X_{i,t-1}+\varepsilon_{i,t}~(\text{with }0<\varphi_{i}<1\text{ and }\varepsilon_{i,t}\text{ a Gaussian white noise}),$$

$\mathbb{1}_{\{\cdot\}}$ denotes the indicator function and $NB(r,p)$ denotes the negative binomial distribution with parameters $r$ and $p$. For this model, we consider the case $(\varphi_{1},\varphi_{2})=(1/3,1/2)$, $r=8$ and $\theta^{*}=(0.5,0.2,0.4,0.1,0.3)$. We first generate data $Y_{t}$ from model (4.2) (without outliers). To evaluate the robustness of the estimators, we consider the contaminated data $Y_{c,t}$ (presence of outliers) defined by $Y_{c,t}=Y_{t}+P_{t}Y_{0,t}$, where $(P_{t})$ is a sequence of i.i.d. Bernoulli random variables with success probability $p=0.02$ and $(Y_{0,t})$ is a sequence of i.i.d. $NB(5,0.4)$ random variables. The results are presented in Tables 3 and 4. Once again, in the absence of outliers (see Table 3), the MLE outperforms the MDPDE and the efficiency of the MDPDE decreases as $\alpha$ increases. When the data are contaminated by outliers (see Table 4), one can see that the MDPDE has smaller MSEs than the MLE; that is, the MDPDE is more robust than the MLE. As in the model (4.1), when $n$ increases, the symbol • overall tends to move upwards in the table, that is, towards smaller values of $\alpha$.
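The mean parametrization $r(1-p_{t})/p_{t}=\lambda_{t}$ in (4.2) can be illustrated with a small sampler. The sketch below is only an illustration under stated assumptions: it draws $NB(r,p)$ with $p=r/(r+\lambda)$ through the classical Gamma-Poisson mixture, uses Knuth's method for the Poisson step, and the function names are hypothetical.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def rnegbin_mean(r, lam, rng):
    """Draw NB(r, p) with p = r / (r + lam), so that the mean is r(1-p)/p = lam.

    Uses the Gamma-Poisson mixture: G ~ Gamma(shape=r, scale=lam/r), Y | G ~ Poisson(G).
    """
    g = rng.gammavariate(r, lam / r)
    return rpois(g, rng)


def intensity_42(theta, y_prev, lam_prev, x1_prev, x2_prev):
    """One step of the intensity recursion in model (4.2)."""
    a0, a1, b, g1, g2 = theta
    negative_part = abs(x2_prev) if x2_prev < 0 else 0.0  # 1{x2 < 0} * |x2|
    return a0 + a1 * y_prev + b * lam_prev + g1 * math.exp(x1_prev) + g2 * negative_part
```

With this parametrization the conditional variance is $\lambda_{t}+\lambda_{t}^{2}/r$, so the model is overdispersed relative to the Poisson case for any finite $r$.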

Table 3: Sample mean and MSE$\times 10^{2}$ of the estimators for the NB-INGARCH-X model (4.2) with $\theta^{*}=(0.5,0.2,0.4,0.1,0.3)$: the case without outliers in the data.

| $\alpha$ | $\widehat{\alpha}_{0}$ ($n=500$) | $\widehat{\alpha}_{1}$ ($n=500$) | $\widehat{\beta}$ ($n=500$) | $\widehat{\gamma}_{1}$ ($n=500$) | $\widehat{\gamma}_{2}$ ($n=500$) | $\widehat{\alpha}_{0}$ ($n=1000$) | $\widehat{\alpha}_{1}$ ($n=1000$) | $\widehat{\beta}$ ($n=1000$) | $\widehat{\gamma}_{1}$ ($n=1000$) | $\widehat{\gamma}_{2}$ ($n=1000$) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.508 (3.64)• | 0.200 (0.21)• | 0.395 (1.55)• | 0.102 (0.09)• | 0.308 (0.90)• | 0.507 (1.79)• | 0.192 (0.13)• | 0.405 (0.71)• | 0.102 (0.06)• | 0.300 (0.51)• |
| 0.10 | 0.509 (3.75) | 0.200 (0.22) | 0.394 (1.60) | 0.103 (0.09) | 0.309 (0.91) | 0.507 (1.92) | 0.192 (0.14) | 0.405 (0.76) | 0.101 (0.06) | 0.299 (0.52) |
| 0.20 | 0.510 (3.93) | 0.201 (0.23) | 0.393 (1.68) | 0.103 (0.10) | 0.309 (0.94) | 0.507 (2.05) | 0.193 (0.14) | 0.405 (0.82) | 0.101 (0.06) | 0.298 (0.54) |
| 0.30 | 0.511 (4.17) | 0.201 (0.24) | 0.391 (1.78) | 0.104 (0.10) | 0.310 (0.98) | 0.507 (2.19) | 0.193 (0.15) | 0.406 (0.88) | 0.101 (0.06) | 0.298 (0.56) |
| 0.40 | 0.512 (4.45) | 0.201 (0.25) | 0.390 (1.89) | 0.104 (0.11) | 0.310 (1.04) | 0.507 (2.31) | 0.193 (0.15) | 0.406 (0.95) | 0.101 (0.07) | 0.298 (0.59) |
| 0.50 | 0.512 (4.69) | 0.201 (0.26) | 0.389 (1.99) | 0.105 (0.12) | 0.312 (1.11) | 0.507 (2.43) | 0.193 (0.16) | 0.406 (0.99) | 0.100 (0.07) | 0.298 (0.62) |
| 0.75 | 0.512 (5.29) | 0.202 (0.30) | 0.387 (2.24) | 0.106 (0.14) | 0.312 (1.31) | 0.506 (2.67) | 0.192 (0.18) | 0.407 (1.10) | 0.100 (0.08) | 0.298 (0.69) |
| 1.00 | 0.514 (5.97) | 0.203 (0.35) | 0.384 (2.52) | 0.107 (0.16) | 0.315 (1.54) | 0.505 (2.86) | 0.193 (0.19) | 0.408 (1.20) | 0.100 (0.08) | 0.299 (0.77) |

Table 4: Sample mean and MSE$\times 10^{2}$ of the estimators for the NB-INGARCH-X model (4.2) with $\theta^{*}=(0.5,0.2,0.4,0.1,0.3)$: the case in which the data are contaminated by outliers.

| $\alpha$ | $\widehat{\alpha}_{0}$ ($n=500$) | $\widehat{\alpha}_{1}$ ($n=500$) | $\widehat{\beta}$ ($n=500$) | $\widehat{\gamma}_{1}$ ($n=500$) | $\widehat{\gamma}_{2}$ ($n=500$) | $\widehat{\alpha}_{0}$ ($n=1000$) | $\widehat{\alpha}_{1}$ ($n=1000$) | $\widehat{\beta}$ ($n=1000$) | $\widehat{\gamma}_{1}$ ($n=1000$) | $\widehat{\gamma}_{2}$ ($n=1000$) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.608 (10.57) | 0.140 (0.66) | 0.433 (2.91) | 0.107 (0.20) | 0.303 (1.02) | 0.619 (5.35) | 0.153 (0.39) | 0.416 (1.35) | 0.103 (0.09) | 0.312 (0.65) |
| 0.10 | 0.546 (5.97) | 0.146 (0.51) | 0.433 (2.19) | 0.107 (0.16) | 0.310 (0.79) | 0.564 (2.70) | 0.160 (0.29) | 0.414 (0.92) | 0.104 (0.06)• | 0.317 (0.60)• |
| 0.20 | 0.526 (5.25) | 0.148 (0.49) | 0.433 (2.09)• | 0.107 (0.15)• | 0.311 (0.77)• | 0.546 (2.29) | 0.161 (0.28)• | 0.413 (0.89)• | 0.103 (0.06) | 0.318 (0.61) |
| 0.30 | 0.518 (5.22)• | 0.148 (0.48) | 0.432 (2.14) | 0.106 (0.15) | 0.311 (0.81) | 0.540 (2.21)• | 0.162 (0.28) | 0.412 (0.92) | 0.103 (0.06) | 0.317 (0.63) |
| 0.40 | 0.518 (5.37) | 0.149 (0.48)• | 0.429 (2.19) | 0.106 (0.16) | 0.312 (0.86) | 0.537 (2.25) | 0.162 (0.29) | 0.411 (0.97) | 0.103 (0.07) | 0.316 (0.65) |
| 0.50 | 0.518 (5.62) | 0.150 (0.49) | 0.428 (2.31) | 0.106 (0.16) | 0.311 (0.91) | 0.537 (2.35) | 0.163 (0.29) | 0.410 (1.03) | 0.103 (0.07) | 0.316 (0.66) |
| 0.75 | 0.532 (6.63) | 0.153 (0.50) | 0.415 (2.71) | 0.106 (0.19) | 0.315 (1.10) | 0.538 (2.63) | 0.165 (0.29) | 0.406 (1.17) | 0.103 (0.08) | 0.315 (0.71) |
| 1.00 | 0.535 (7.05) | 0.157 (0.51) | 0.409 (2.93) | 0.106 (0.21) | 0.318 (1.26) | 0.540 (2.88) | 0.168 (0.29) | 0.402 (1.29) | 0.104 (0.09) | 0.315 (0.75) |


4.3 A 1-knot dynamic model

We consider the $1$-knot nonlinear dynamic model defined by (see also Davis and Liu (2016))

$$Y_{t}|\mathcal{F}_{t-1}\sim\mathcal{P}(\lambda_{t})~~\text{with}~~\lambda_{t}=\alpha_{0}+\alpha_{1}Y_{t-1}+\alpha_{2}\lambda_{t-1}+\beta(Y_{t-1}-\xi^{*})^{+}, \tag{4.3}$$

where $\alpha_{0}>0$, $\alpha_{1},\alpha_{2},\beta\geq 0$, $\xi^{*}$ is a non-negative integer (the so-called knot) and $x^{+}=\max(x,0)$ is the positive part of $x$. The model (4.3) is a particular case of the models (2.1) and (3.1) with $X_{t}\equiv\text{constant}$. The true parameter is $\theta^{*}=(\alpha_{0},\alpha_{1},\alpha_{2},\beta)$. In this model, we consider the case where $\xi^{*}=4$ and $\theta^{*}=(1,0.3,0.2,0.4)$. We generate data $Y_{t}$ from (4.3) and contaminated data $Y_{c,t}$ such that $Y_{c,t}=Y_{t}+P_{t}Y_{0,t}$, where $(P_{t})$ is a sequence of i.i.d. Bernoulli random variables with success probability $p=0.006$ and $(Y_{0,t})$ is a sequence of i.i.d. Poisson random variables with mean $\mu=11$. For each $\alpha$, the knot $\xi^{*}$ is estimated by minimizing the function $H_{\alpha,n}(\cdot)$ over the set of integer values $\{1,\cdots,\xi_{\max}\}$, where $\xi_{\max}$ is an upper bound of the true knot $\xi^{*}$ given by $\xi_{\max}=\max(Y_{1},\cdots,Y_{n})$. The estimation can be summarized as follows:

  • For each fixed $\xi\in\{1,\cdots,\xi_{\max}\}$, compute the MDPDE of $\theta^{*}$, denoted by $\widehat{\theta}_{\alpha,n,\xi}$.

  • Estimate the knot by the relation: $\widehat{\xi}_{\alpha,n}=\underset{\xi\in\{1,\cdots,\xi_{\max}\}}{\text{argmin}}\,H_{\alpha,n}(\widehat{\theta}_{\alpha,n,\xi}).$
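The two steps above amount to a profile search over the integer grid of candidate knots. The sketch below illustrates this structure only; it is not the authors' implementation. The callable `fit_for_knot` is a hypothetical placeholder for any routine that, for a fixed knot, returns the MDPDE and the attained value of $H_{\alpha,n}$, and the initial intensity `lam0` is an assumption.

```python
def knot_intensities(y, theta, xi, lam0=1.0):
    """Intensities of model (4.3) for a fixed knot xi.

    lam_t = a0 + a1 * Y_{t-1} + a2 * lam_{t-1} + b * (Y_{t-1} - xi)^+,
    started from an arbitrary initial value lam0 (assumption).
    """
    a0, a1, a2, b = theta
    lams = [lam0]
    for t in range(1, len(y)):
        lams.append(a0 + a1 * y[t - 1] + a2 * lams[-1] + b * max(y[t - 1] - xi, 0))
    return lams


def estimate_knot(y, fit_for_knot):
    """Profile search for the knot over {1, ..., max(y)}.

    fit_for_knot(xi) is a user-supplied (hypothetical) routine returning the pair
    (theta_hat, h_value): the estimate for fixed xi and the objective value there.
    Returns (xi_hat, theta_hat, h_value) for the knot with the smallest objective.
    """
    xi_max = max(y)
    best = None
    for xi in range(1, xi_max + 1):
        theta_hat, h_value = fit_for_knot(xi)
        if best is None or h_value < best[2]:
            best = (xi, theta_hat, h_value)
    return best
```

In practice `fit_for_knot` would call a numerical optimizer on the objective built from `knot_intensities`; ties between knots are resolved here in favour of the smallest candidate, which is a design choice of this sketch.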

Some empirical statistics of the estimator $\widehat{\xi}_{\alpha,n}$ are reported in Table 5. These results show that the estimation of the knot is reasonably good in terms of the mean and the quantiles. In addition, the empirical probability of selecting the true knot increases with $n$. From Table 6, we can see that when the data are without outliers, the MLE displays the minimal MSE (except for the estimation of $\beta$); whereas in the presence of outliers (see Table 7), the MDPDE outperforms the MLE. Further, for the parameter $\beta$, the MDPDE with $\alpha=1$ has the minimal MSE; these results reveal that the estimation of $\beta$ is more affected by the outliers than that of the other parameters. This can be explained by the fact that the term $(Y_{t-1}-\xi^{*})^{+}$ (in the relation (4.3)) is very sensitive to outliers.

Table 5: Some elementary statistics of the estimator $\widehat{\xi}_{\alpha,n}$ for the model (4.3) without outliers.

| Sample size | Mean | SD | Min | $Q_{1}$ | Med | $Q_{3}$ | Max | $\mathbb{P}(\widehat{\xi}_{\alpha,n}=\xi^{*})$ |
|---|---|---|---|---|---|---|---|---|
| 500 | 3.71 | 1.19 | 1.00 | 3.00 | 4.00 | 4.00 | 7.00 | 0.45 |
| 1000 | 3.96 | 0.78 | 2.00 | 4.00 | 4.00 | 4.00 | 6.00 | 0.54 |

Table 6: Sample mean and MSE$\times 10^{2}$ of the estimators for the $1$-knot dynamic model (4.3) with $\theta^{*}=(1,0.3,0.2,0.4)$: the case without outliers in the data.

| $\alpha$ | $\widehat{\alpha}_{0}$ ($n=500$) | $\widehat{\alpha}_{1}$ ($n=500$) | $\widehat{\alpha}_{2}$ ($n=500$) | $\widehat{\beta}$ ($n=500$) | $\widehat{\alpha}_{0}$ ($n=1000$) | $\widehat{\alpha}_{1}$ ($n=1000$) | $\widehat{\alpha}_{2}$ ($n=1000$) | $\widehat{\beta}$ ($n=1000$) |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.052 (3.96)• | 0.288 (1.02)• | 0.172 (0.88)• | 0.405 (1.96) | 1.028 (2.38)• | 0.290 (0.29)• | 0.189 (0.60)• | 0.382 (1.33) |
| 0.10 | 1.052 (4.09) | 0.283 (1.09) | 0.174 (0.89) | 0.404 (1.84)• | 1.03 (2.46) | 0.289 (0.30) | 0.189 (0.62) | 0.387 (1.29) |
| 0.20 | 1.051 (4.08) | 0.280 (1.13) | 0.176 (0.92) | 0.409 (1.92) | 1.034 (2.53) | 0.287 (0.33) | 0.188 (0.64) | 0.389 (1.29)• |
| 0.30 | 1.048 (4.16) | 0.281 (1.08) | 0.178 (0.95) | 0.410 (2.04) | 1.036 (2.61) | 0.285 (0.34) | 0.189 (0.67) | 0.388 (1.38) |
| 0.40 | 1.047 (4.35) | 0.280 (1.13) | 0.178 (1.01) | 0.414 (2.09) | 1.037 (2.75) | 0.283 (0.37) | 0.189 (0.69) | 0.386 (1.46) |
| 0.50 | 1.048 (4.59) | 0.276 (1.25) | 0.179 (1.07) | 0.418 (2.11) | 1.037 (2.88) | 0.284 (0.37) | 0.189 (0.73) | 0.390 (1.50) |
| 0.75 | 1.045 (5.08) | 0.271 (1.37) | 0.181 (1.25) | 0.419 (2.33) | 1.038 (3.18) | 0.282 (0.45) | 0.190 (0.83) | 0.397 (1.66) |
| 1.00 | 1.046 (5.88) | 0.264 (1.58) | 0.183 (1.44) | 0.425 (2.42) | 1.040 (3.45) | 0.275 (0.56) | 0.191 (0.94) | 0.393 (1.93) |

Table 7: Sample mean and MSE $\times 10^{2}$ (in parentheses) of the estimators for the $1$-knot dynamic model (4.3) with $\theta^{*}=(1,0.3,0.2,0.4)$: the case in which the data are contaminated by outliers.

          $n=500$                                                                                         $n=1000$
$\alpha$  $\widehat{\alpha}_{0}$  $\widehat{\alpha}_{1}$  $\widehat{\alpha}_{2}$  $\widehat{\beta}$       $\widehat{\alpha}_{0}$  $\widehat{\alpha}_{1}$  $\widehat{\alpha}_{2}$  $\widehat{\beta}$
0         1.018(7.52)             0.258(1.19)             0.233(2.64)             0.134(9.78)             1.020(2.77)             0.259(0.90)             0.233(0.88)             0.123(9.24)
0.10      0.998(5.22)•            0.269(0.93)             0.217(1.95)             0.137(9.64)             1.004(2.39)•            0.261(0.74)             0.219(0.65)•            0.132(8.86)
0.20      1.011(5.32)             0.271(0.92)•            0.205(1.90)•            0.158(8.90)             1.015(2.62)             0.264(0.70)             0.208(0.65)             0.150(8.31)
0.30      1.031(6.07)             0.270(1.01)             0.193(2.03)             0.187(8.28)             1.028(2.84)             0.270(0.52)             0.198(0.70)             0.167(7.80)
0.40      1.048(6.72)             0.269(1.10)             0.183(2.14)             0.217(7.77)             1.046(3.24)             0.269(0.53)             0.188(0.76)             0.190(7.18)
0.50      1.058(7.28)             0.267(1.14)             0.178(2.24)             0.234(7.40)             1.059(3.68)             0.272(0.51)             0.179(0.85)             0.210(6.79)
0.75      1.083(8.44)             0.268(1.22)             0.165(2.43)             0.289(6.53)             1.080(4.17)             0.276(0.50)•            0.166(1.01)             0.262(6.04)
1.00      1.089(9.14)             0.266(1.40)             0.162(2.62)             0.325(5.86)•            1.102(4.98)             0.275(0.54)             0.154(1.19)             0.311(5.10)•
 

5 Real data application

The aim of this section is to apply the model (2.1) to the number of transactions per minute for the stock Ericsson B on July 21, 2002. There are $460$ observations, which represent trading from 09:35 to 17:14. This time series is part of a large dataset that has already been studied in many works; see, for instance, Fokianos et al. (2009), Fokianos and Neumann (2013), Davis and Liu (2016) (the series of July 2, 2002), Doukhan and Kengne (2015) (the series of July 16, 2002), Diop and Kengne (2017) (the series of July 5, 2002) and Brännäs and Quoreshi (2010). The data (the transactions on July 21, 2002) and their autocorrelation function, displayed in Figure 1 (see (a) and (b)), show three stylized facts: (i) a positive temporal dependence; (ii) overdispersion (the empirical mean is $7.28$ while the empirical variance is $28.05$); (iii) the suspected presence of outliers. Our purpose is to fit these data by taking into account a possible relationship between the number of transactions and the volume-volatility. This question has been investigated in several financial studies during the past two decades; see, for example, Takaishi and Chen (2016), Belhaj et al. (2015) and Louhichi (2011). In these works, the volume-volatility is found to have a statistically significant impact on the trading volume (number of transactions or trade size).
Now, we consider the Poisson-INGARCH-X model given by

\[ Y_{t}|\mathcal{F}_{t-1}\sim\mathcal{P}(\lambda_{t})~~\text{with}~~\lambda_{t}=\alpha_{0}+\alpha_{1}Y_{t-1}+\beta\lambda_{t-1}+\gamma|V_{t-1}|, \tag{5.1} \]

where $\alpha_{0}>0$, $\alpha_{1},\beta,\gamma\geq 0$ and $V_{t}$ represents the volume-volatility at time $t$. Observe that if $\gamma=0$, then the model (5.1) reduces to the classical Poisson-INGARCH$(1,1)$ model. We first examine the adequacy of the fitted model with exogenous covariate (based on the MLE) by comparing it with the Poisson-INGARCH$(1,1)$ model (without exogenous covariate). As evaluation criteria, we consider the estimated counterparts of the Pearson residuals defined by $e_{t}=(Y_{t}-\lambda_{t})/\sqrt{\lambda_{t}}$. Under the true model, the process $\{e_{t}\}$ is close to a white noise sequence with constant variance (see, for instance, Kedem and Fokianos (2002)). The comparison of these two models is based on the MSE of the Pearson residuals, which is defined by $\sum_{t=1}^{n}e^{2}_{t}/(n-d)$, where $d$ denotes the number of estimated parameters. The approximated MSE of the Pearson residuals is $2.269$ for the model (5.1) and $2.298$ for the Poisson-INGARCH$(1,1)$, which indicates a preference for the model with covariate. The test of the significance of the exogenous covariate of Pedersen and Rahbek (2018), applied to the series, also confirms these results. Under the model (5.1) (i.e., with $\gamma\neq 0$), the cumulative periodogram plot of the Pearson residuals is displayed in Figure 1(d). From this figure, the associated residuals appear to be uncorrelated over time. This lends substantial support to the choice of the model with exogenous covariate for fitting these data.
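The residual-based comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`ingarchx_intensity`, `pearson_mse`), the toy data and the parameter values are hypothetical, and the recursion is started at the empirical mean of the series.

```python
import math

def ingarchx_intensity(y, v, alpha0, alpha1, beta, gamma, lam_init=None):
    """lambda_t = alpha0 + alpha1*Y_{t-1} + beta*lambda_{t-1} + gamma*|V_{t-1}|."""
    n = len(y)
    lam = [0.0] * n
    # Assumption: start the recursion at the empirical mean unless told otherwise.
    lam[0] = sum(y) / n if lam_init is None else lam_init
    for t in range(1, n):
        lam[t] = alpha0 + alpha1 * y[t - 1] + beta * lam[t - 1] + gamma * abs(v[t - 1])
    return lam

def pearson_residuals(y, lam):
    """e_t = (Y_t - lambda_t) / sqrt(lambda_t)."""
    return [(yt - lt) / math.sqrt(lt) for yt, lt in zip(y, lam)]

def pearson_mse(y, lam, d):
    """Sum of squared Pearson residuals divided by (n - d), d = number of parameters."""
    e = pearson_residuals(y, lam)
    return sum(et * et for et in e) / (len(y) - d)

# Toy data (hypothetical, NOT the Ericsson B series).
y = [2, 4, 3, 5, 6, 4]
v = [0.5, -1.0, 0.2, 0.0, 0.7, -0.3]
lam = ingarchx_intensity(y, v, alpha0=1.0, alpha1=0.2, beta=0.5, gamma=0.1)
print(pearson_mse(y, lam, d=4))
```

Setting `gamma=0` (and `d=3`) gives the corresponding quantity for the model without covariate, so the two values can be compared in the same way as in the text.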

Since outliers are suspected in the data (see Figure 1(a)), we apply the MDPDE to estimate the parameters of the model. To choose the optimal tuning parameter $\alpha$, we adopt the idea of Warwick and Jones (2005). It is based on the minimization of an Asymptotic approximation of the summed Mean Squared Error (AMSE) defined by

\[ \widehat{\text{AMSE}}=(\widehat{\theta}_{\alpha,n}-\widehat{\theta}_{1,n})^{\prime}(\widehat{\theta}_{\alpha,n}-\widehat{\theta}_{1,n})+\frac{1}{n}\text{Trace}\big[\widehat{\Sigma}_{\alpha,n}\big], \]

where $\widehat{\theta}_{1,n}$ is the MDPDE obtained with $\alpha=1$ and $\widehat{\Sigma}_{\alpha,n}$ is a consistent estimator of the covariance matrix $\Sigma_{\alpha}$ given in Theorem 2.2. To compute $\widehat{\theta}_{\alpha,n}$, the initial value $\widehat{\lambda}_{1}$ is set to be the empirical mean of the data and $\partial\widehat{\lambda}_{1}/\partial\theta$ is set to be the null vector. For different values of $\alpha$, the corresponding $\widehat{\text{AMSE}}$ are displayed in Table 8. Based on this table, the optimal tuning parameter chosen is $\alpha=0.2$, which provides the minimum value of the $\widehat{\text{AMSE}}$ (indicated by the symbol $\bullet$). In terms of this criterion, the MDPDE with $\alpha=0.2$ is thus preferred to the MLE ($\alpha=0$) for these data. With $\alpha=0.2$, the MDPDE applied to the model (5.1) yields:

\[ \widehat{\lambda}_{t}=\underset{(0.081)}{0.103}+\underset{(0.024)}{0.144}Y_{t-1}+\underset{(0.027)}{0.833}\widehat{\lambda}_{t-1}+\underset{(0.023)}{0.030}|V_{t-1}|, \tag{5.2} \]

where in parentheses are the standard errors of the estimators obtained from the robust sandwich matrix. Figure 1(c) displays the number of transactions ($Y_{t}$), the fitted values ($\widehat{Y}_{t}\coloneqq\widehat{\lambda}_{t}$) and the $95\%$-prediction interval based on the underlying Poisson distribution. This figure shows that the fitted values capture reasonably well the dynamics of the observed process.
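A Poisson-based prediction interval of this kind can be sketched as follows. The text does not say how the interval is constructed, so the equal-tailed quantile construction below is an assumption, and the helper names (`poisson_quantile`, `prediction_interval`) are hypothetical.

```python
import math

def poisson_quantile(p, lam):
    """Smallest integer k such that P(Y <= k) >= p for Y ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)   # P(Y = 0); fine for moderate lam (underflows for very large lam)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= lam / k     # P(Y = k) from P(Y = k - 1)
        cdf += pmf
    return k

def prediction_interval(lam, level=0.95):
    """Equal-tailed interval (assumption): quantiles at (1-level)/2 and 1-(1-level)/2."""
    tail = (1.0 - level) / 2.0
    return poisson_quantile(tail, lam), poisson_quantile(1.0 - tail, lam)

# Interval around a fitted intensity; 7.28 is the empirical mean quoted in the text.
print(prediction_interval(7.28))
```

Applying `prediction_interval` to each fitted value $\widehat{\lambda}_{t}$ would give one band per time point.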

Table 8: The $\widehat{\text{AMSE}}\times 10^{2}$ corresponding to some values of $\alpha$ in the model (5.1); the symbol • marks the minimum.

$\alpha$                  0      0.05   0.1    0.15   0.2     0.25   0.3    0.35   0.4    0.45   0.5    0.75   1
$\widehat{\text{AMSE}}$   4.181  2.624  2.264  2.153  2.137•  2.156  2.205  2.282  2.374  2.488  2.624  3.628  5.267
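The selection rule behind Table 8 can be sketched as below. The function names `amse` and `best_alpha` are hypothetical; the estimates and covariance matrices would come from a fitting routine that is not shown here, so the example only replays the tabulated values.

```python
def amse(theta_alpha, theta_ref, sigma_alpha, n):
    """Squared distance to the reference estimate (alpha = 1) plus Trace(Sigma)/n."""
    bias_sq = sum((a - b) ** 2 for a, b in zip(theta_alpha, theta_ref))
    trace = sum(sigma_alpha[i][i] for i in range(len(sigma_alpha)))
    return bias_sq + trace / n

def best_alpha(amse_by_alpha):
    """Return the tuning parameter with the smallest AMSE."""
    return min(amse_by_alpha, key=amse_by_alpha.get)

# Values of AMSE x 10^2 copied from Table 8.
table8 = {0: 4.181, 0.05: 2.624, 0.1: 2.264, 0.15: 2.153, 0.2: 2.137,
          0.25: 2.156, 0.3: 2.205, 0.35: 2.282, 0.4: 2.374, 0.45: 2.488,
          0.5: 2.624, 0.75: 3.628, 1: 5.267}
print(best_alpha(table8))  # → 0.2
```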
 
Figure 1: (a) Number of transactions per minute for the stock Ericsson B during July 21, 2002. (b) Autocorrelation function of the transaction data. (c) Predicted number of transactions per minute and the corresponding confidence bands at the $95\%$ nominal level (dotted curves) based on the relation (5.2). (d) Cumulative periodogram plot of the Pearson residuals from the model (5.1).

6 Proofs of the main results

Without loss of generality, we only provide the proofs of Theorems 2.1 and 2.2 for $\alpha>0$. The proofs in the case of the maximum likelihood estimator (i.e., $\alpha=0$) follow from classical methods.
Throughout the sequel, $C$ denotes a positive constant whose value may differ from one inequality to another.

6.1 Proof of Theorem 2.1

We consider the following lemma.

Lemma 6.1

Assume that the conditions of Theorem 2.1 hold. Then

\[ \left\|\widehat{H}_{\alpha,n}(\theta)-H_{\alpha,n}(\theta)\right\|_{\Theta}\xrightarrow[n\rightarrow\infty]{a.s.}0. \tag{6.1} \]

Proof of Lemma 6.1
Remark that

\begin{align*}
\left\|\widehat{H}_{\alpha,n}(\theta)-H_{\alpha,n}(\theta)\right\|_{\Theta} &\leq\frac{1}{n}\sum_{t=1}^{n}\|\widehat{\ell}_{\alpha,t}(\theta)-\ell_{\alpha,t}(\theta)\|_{\Theta}\\
&\leq I_{n,1}+I_{n,2},
\end{align*}

where

\begin{align*}
I_{n,1} &=\frac{1}{n}\sum_{t=1}^{n}\Big\|\sum_{y=0}^{\infty}\left\{g(y|\widehat{\eta}_{t}(\theta))^{1+\alpha}-g(y|\eta_{t}(\theta))^{1+\alpha}\right\}\Big\|_{\Theta},\\
I_{n,2} &=\big(1+\frac{1}{\alpha}\big)\frac{1}{n}\sum_{t=1}^{n}\Big\|g(Y_{t}|\widehat{\eta}_{t}(\theta))^{\alpha}-g(Y_{t}|\eta_{t}(\theta))^{\alpha}\Big\|_{\Theta}.
\end{align*}

It suffices to show that $(i)$ $I_{n,1}\xrightarrow[n\rightarrow\infty]{a.s.}0$ and $(ii)$ $I_{n,2}\xrightarrow[n\rightarrow\infty]{a.s.}0$.

  (i)

    For any $t\geq 1$, we apply the mean value theorem to the function $\eta\mapsto\sum_{y=0}^{\infty}g(y|\eta)^{1+\alpha}$. For any $\theta\in\Theta$, there exists $\tilde{\eta}_{t}(\theta)$ between $\eta_{t}(\theta)$ and $\widehat{\eta}_{t}(\theta)$ such that

    \begin{align*}
    \Big|\sum_{y=0}^{\infty}\left\{g(y|\widehat{\eta}_{t}(\theta))^{1+\alpha}-g(y|\eta_{t}(\theta))^{1+\alpha}\right\}\Big| &\leq(1+\alpha)\left|\widehat{\eta}_{t}(\theta)-\eta_{t}(\theta)\right|\sum_{y=0}^{\infty}\Big|\frac{\partial g(y|\tilde{\eta}_{t}(\theta))}{\partial\eta}\Big|g(y|\tilde{\eta}_{t}(\theta))^{\alpha}\\
    &\leq C\left|\widehat{\eta}_{t}(\theta)-\eta_{t}(\theta)\right|\sum_{y=0}^{\infty}\Big|\frac{\partial g(y|\tilde{\eta}_{t}(\theta))}{\partial\eta}\Big|\\
    &\leq C\big|\eta(\widehat{\lambda}_{t}(\theta))-\eta(\lambda_{t}(\theta))\big|\varphi(\tilde{\eta}_{t}(\theta))\\
    &\leq C\big|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\big|\varphi(\tilde{\eta}_{t}(\theta))~~(\text{by virtue of }(\textbf{A6})).
    \end{align*}

    We deduce that

    \[ I_{n,1}\leq C\frac{1}{n}\sum_{t=1}^{n}\big\|\big(\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\big)\varphi(\tilde{\eta}_{t}(\theta))\big\|_{\Theta}. \]

    By using Kounias and Weng (1969), it suffices to show that

    \[ \sum_{k\geq 1}\frac{1}{k}\mathbb{E}\big\|\big(\widehat{\lambda}_{k}(\theta)-\lambda_{k}(\theta)\big)\varphi(\tilde{\eta}_{k}(\theta))\big\|_{\Theta}<\infty. \tag{6.2} \]

    From (3.6) and Hölder's inequality, we have

    \begin{align*}
    \mathbb{E}\big\|\big(\widehat{\lambda}_{k}(\theta)-\lambda_{k}(\theta)\big)\varphi(\tilde{\eta}_{k}(\theta))\big\|_{\Theta} &\leq\mathbb{E}\Big[\big\|\varphi(\tilde{\eta}_{k}(\theta))\big\|_{\Theta}\Big(\sum_{\ell\geq k}\alpha^{(0)}_{\ell,Y}Y_{k-\ell}+\sum_{\ell\geq k}\alpha^{(0)}_{\ell,X}\left\|X_{k-\ell}\right\|\Big)\Big]\\
    &\leq\big(\mathbb{E}\big\|\varphi(\tilde{\eta}_{k}(\theta))\big\|^{2}_{\Theta}\big)^{1/2}\Big(\mathbb{E}\Big[\big(\sum_{\ell\geq k}\alpha^{(0)}_{\ell,Y}Y_{k-\ell}+\sum_{\ell\geq k}\alpha^{(0)}_{\ell,X}\|X_{k-\ell}\|\big)^{2}\Big]\Big)^{1/2}\\
    &\leq\big\|\|\varphi(\tilde{\eta}_{k}(\theta))\|_{\Theta}\big\|_{2}\Big[\sum_{\ell\geq k}\alpha^{(0)}_{\ell,Y}\|Y_{k-\ell}\|_{2}+\sum_{\ell\geq k}\alpha^{(0)}_{\ell,X}\big\|\|X_{k-\ell}\|\big\|_{2}\Big]\\
    &\leq C\overline{\varphi}_{k}\sum_{\ell\geq k}\big(\alpha^{(0)}_{\ell,Y}+\alpha^{(0)}_{\ell,X}\big)~~(\text{from }(\textbf{A4})\text{ and the stationarity assumptions})\\
    &\leq C\frac{1}{k^{\gamma-1}}\overline{\varphi}_{k}~~(\text{from the Riemannian assumption }(2.5)).
    \end{align*}

    Hence,

    \[ \sum_{k\geq 1}\frac{1}{k}\mathbb{E}\big\|\big(\widehat{\lambda}_{k}(\theta)-\lambda_{k}(\theta)\big)\varphi(\tilde{\eta}_{k}(\theta))\big\|_{\Theta}\leq C\sum_{k\geq 1}\frac{1}{k^{\gamma}}\overline{\varphi}_{k}<\infty~~(\text{from }(2.5)). \]

    Therefore, (6.2) holds and thus, $I_{n,1}\xrightarrow[n\rightarrow\infty]{a.s.}0$.

  (ii)

    By applying the mean value theorem to the function $\eta\mapsto g(Y_{t}|\eta)^{\alpha}$ and from (A6), for all $\theta\in\Theta$, there exists $\tilde{\eta}_{t}(\theta)$ between $\eta_{t}(\theta)$ and $\widehat{\eta}_{t}(\theta)$ such that

    \begin{align*}
    \big|g(Y_{t}|\widehat{\eta}_{t}(\theta))^{\alpha}-g(Y_{t}|\eta_{t}(\theta))^{\alpha}\big| &\leq(1+\alpha)\left|\widehat{\eta}_{t}(\theta)-\eta_{t}(\theta)\right|\Big|\frac{\partial g(Y_{t}|\tilde{\eta}_{t}(\theta))}{\partial\eta}\Big|g(Y_{t}|\tilde{\eta}_{t}(\theta))^{\alpha-1}\\
    &\leq C\big|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\big|\Big|\frac{1}{g(Y_{t}|\tilde{\eta}_{t}(\theta))}\,\frac{\partial g(Y_{t}|\tilde{\eta}_{t}(\theta))}{\partial\eta}\Big|\\
    &\leq C\big|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\big|\psi(\tilde{\eta}_{t}(\theta)).
    \end{align*}

    According to [35], a sufficient condition for $I_{n,2}\xrightarrow[n\rightarrow\infty]{a.s.}0$ is

    \[ \sum_{k\geq 1}\frac{1}{k}\mathbb{E}\Big[\left\|\left(\widehat{\lambda}_{k}(\theta)-\lambda_{k}(\theta)\right)\psi(\tilde{\eta}_{k}(\theta))\right\|_{\Theta}\Big]<\infty. \tag{6.3} \]

    According to (3.6) and by using the same arguments as above, we get

    \begin{align*}
    \mathbb{E}\big\|\big(\widehat{\lambda}_{k}(\theta)-\lambda_{k}(\theta)\big)\psi(\tilde{\eta}_{k}(\theta))\big\|_{\Theta} &\leq\mathbb{E}\Big[\big\|\psi(\tilde{\eta}_{k}(\theta))\big\|_{\Theta}\Big(\sum_{\ell\geq k}\alpha^{(0)}_{\ell,Y}Y_{k-\ell}+\sum_{\ell\geq k}\alpha^{(0)}_{\ell,X}\left\|X_{k-\ell}\right\|\Big)\Big]\\
    &\leq\big(\mathbb{E}\big\|\psi(\tilde{\eta}_{k}(\theta))\big\|^{2}_{\Theta}\big)^{1/2}\Big(\mathbb{E}\Big[\big(\sum_{\ell\geq k}\alpha^{(0)}_{\ell,Y}Y_{k-\ell}+\sum_{\ell\geq k}\alpha^{(0)}_{\ell,X}\|X_{k-\ell}\|\big)^{2}\Big]\Big)^{1/2}\\
    &\leq\big\|\|\psi(\tilde{\eta}_{k}(\theta))\|_{\Theta}\big\|_{2}\Big[\sum_{\ell\geq k}\alpha^{(0)}_{\ell,Y}\|Y_{k-\ell}\|_{2}+\sum_{\ell\geq k}\alpha^{(0)}_{\ell,X}\big\|\|X_{k-\ell}\|\big\|_{2}\Big]\\
    &\leq C\overline{\psi}_{k}\sum_{\ell\geq k}\big(\alpha^{(0)}_{\ell,Y}+\alpha^{(0)}_{\ell,X}\big)\leq C\frac{1}{k^{\gamma-1}}\overline{\psi}_{k}.
    \end{align*}

    We deduce that

    \[ \sum_{k\geq 1}\frac{1}{k}\mathbb{E}\big\|\big(\widehat{\lambda}_{k}(\theta)-\lambda_{k}(\theta)\big)\psi(\tilde{\eta}_{k}(\theta))\big\|_{\Theta}\leq C\sum_{k\geq 1}\frac{1}{k^{\gamma}}\overline{\psi}_{k}<\infty~~(\text{from }(2.5)). \]

    Hence, (6.3) holds and thus, $I_{n,2}\xrightarrow[n\rightarrow\infty]{a.s.}0$. This completes the proof of Lemma 6.1. $\blacksquare$
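Lemma 6.1 rests on the gap $|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)|$ becoming negligible as $t$ grows. A minimal numerical sketch, assuming the special case of a linear INGARCH$(1,1)$ recursion without covariate, is given below: two intensity paths driven by the same observations but started from different initial values differ by exactly $\beta^{t}$ times the initial gap. The function name and the toy series are hypothetical, and this geometric decay is only a particular case of the Riemannian-type condition used in the proof.

```python
def intensity_path(y, alpha0, alpha1, beta, lam_init):
    """lambda_t = alpha0 + alpha1*Y_{t-1} + beta*lambda_{t-1}, started at lam_init."""
    lam = [lam_init]
    for t in range(1, len(y)):
        lam.append(alpha0 + alpha1 * y[t - 1] + beta * lam[-1])
    return lam

# Same (hypothetical) observations, two different starting values.
y = [3, 7, 2, 9, 4, 6, 5, 8, 1, 7]
path_a = intensity_path(y, alpha0=1.0, alpha1=0.3, beta=0.4, lam_init=0.0)
path_b = intensity_path(y, alpha0=1.0, alpha1=0.3, beta=0.4, lam_init=10.0)

# The gap shrinks geometrically: |gap_t| = 10 * 0.4**t.
gaps = [abs(a - b) for a, b in zip(path_a, path_b)]
for t, gap in enumerate(gaps):
    print(t, gap)
```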

To complete the proof of Theorem 2.1, we will show that: (1.) $\mathbb{E}\left[\left\|\ell_{\alpha,t}\right\|_{\Theta}\right]<\infty$ and (2.) the function $\theta\mapsto\mathbb{E}(\ell_{\alpha,1}(\theta))$ has a unique minimum at $\theta^{*}$.

  (1.)

    For any $\theta\in\Theta$, we have

    \[ \left|\ell_{\alpha,t}(\theta)\right|\leq\sum_{y=0}^{\infty}g(y|\eta_{t}(\theta))^{1+\alpha}+\big(1+\frac{1}{\alpha}\big)g(Y_{t}|\eta_{t}(\theta))^{\alpha}\leq\sum_{y=0}^{\infty}g(y|\eta_{t}(\theta))+\big(1+\frac{1}{\alpha}\big)\leq 2+\frac{1}{\alpha}. \]

    Hence, $\mathbb{E}\left[\left\|\ell_{\alpha,t}\right\|_{\Theta}\right]<\infty$.

  (2.)

    Let $\theta\in\Theta$, with $\theta\neq\theta^{*}$. We have

    \begin{align*}
    &\mathbb{E}(\ell_{\alpha,1}(\theta))-\mathbb{E}(\ell_{\alpha,1}(\theta^{*}))=\mathbb{E}\Big\{\mathbb{E}\big[\big(\ell_{\alpha,1}(\theta)-\ell_{\alpha,1}(\theta^{*})\big)\big|\mathcal{F}_{0}\big]\Big\}\\
    &=\mathbb{E}\Bigg\{\sum_{y=0}^{\infty}g(y|\eta_{1}(\theta))^{1+\alpha}-\sum_{y=0}^{\infty}g(y|\eta_{1}(\theta^{*}))^{1+\alpha}-\big(1+\frac{1}{\alpha}\big)\mathbb{E}\bigg[g(Y_{1}|\eta_{1}(\theta))^{\alpha}-g(Y_{1}|\eta_{1}(\theta^{*}))^{\alpha}\Big|\mathcal{F}_{0}\bigg]\Bigg\}\\
    &=\mathbb{E}\Bigg\{\sum_{y=0}^{\infty}g(y|\eta_{1}(\theta))^{1+\alpha}-\sum_{y=0}^{\infty}g(y|\eta_{1}(\theta^{*}))^{1+\alpha}-\big(1+\frac{1}{\alpha}\big)\sum_{y=0}^{\infty}\bigg[\big(g(y|\eta_{1}(\theta))^{\alpha}-g(y|\eta_{1}(\theta^{*}))^{\alpha}\big)g(y|\eta_{1}(\theta^{*}))\bigg]\Bigg\}\\
    &=\mathbb{E}\left\{\sum_{y=0}^{\infty}\left[g(y|\eta_{1}(\theta))^{1+\alpha}-g(y|\eta_{1}(\theta^{*}))^{1+\alpha}-\big(1+\frac{1}{\alpha}\big)\big(g(y|\eta_{1}(\theta))^{\alpha}-g(y|\eta_{1}(\theta^{*}))^{\alpha}\big)g(y|\eta_{1}(\theta^{*}))\right]\right\}\\
    &=\mathbb{E}\left\{d_{\alpha}\big(g(\cdot|\eta_{1}(\theta)),g(\cdot|\eta_{1}(\theta^{*}))\big)\right\}\geq 0,
    \end{align*}

    where the equality holds a.s. if and only if $\theta=\theta^{*}$ according to (A0), (A3), (A6). Thus, the function $\theta\mapsto\mathbb{E}(\ell_{\alpha,1}(\theta))$ has a unique minimum at $\theta^{*}$.
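Both steps can be checked numerically in a simple case. The sketch below assumes a Poisson conditional distribution and truncates the infinite sums at a hypothetical cut-off `y_max`; the names `dpd_loss` and `dpd` are illustrative. It evaluates the summand $\ell_{\alpha,t}$ as it appears in the bound of step (1.), and the divergence $d_{\alpha}$ written out by collecting the terms of the last sum in step (2.).

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability mass, computed on the log scale for stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def dpd_loss(y_obs, lam, alpha, y_max=200):
    """sum_y g(y)^(1+alpha) - (1 + 1/alpha) * g(y_obs)^alpha, sum truncated at y_max."""
    total = sum(poisson_pmf(y, lam) ** (1 + alpha) for y in range(y_max + 1))
    return total - (1 + 1 / alpha) * poisson_pmf(y_obs, lam) ** alpha

def dpd(lam, lam_star, alpha, y_max=200):
    """d_alpha(g, f) = sum_y [g^(1+alpha) - (1+1/alpha) f g^alpha + (1/alpha) f^(1+alpha)]."""
    total = 0.0
    for y in range(y_max + 1):
        g = poisson_pmf(y, lam)
        f = poisson_pmf(y, lam_star)
        total += g ** (1 + alpha) - (1 + 1 / alpha) * f * g ** alpha + f ** (1 + alpha) / alpha
    return total

alpha = 0.2
print(abs(dpd_loss(5, 3.0, alpha)) <= 2 + 1 / alpha)  # bound of step (1.)
print(dpd(3.0, 4.0, alpha), dpd(3.0, 3.0, alpha))      # positive, then (numerically) zero
```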

Recall that since $\{(Y_{t},X_{t}),\,t\in\mathbb{Z}\}$ is stationary and ergodic, the process $\{\ell_{\alpha,t}(\theta),\,t\in\mathbb{Z}\}$ is also a stationary and ergodic sequence. Then, according to (1.), by the uniform strong law of large numbers applied to the process $\{\ell_{\alpha,t}(\theta),\,t\in\mathbb{Z}\}$, it holds that

\[ \left\|H_{\alpha,n}(\theta)-\mathbb{E}(\ell_{\alpha,1}(\theta))\right\|_{\Theta}=\bigg\|\frac{1}{n}\sum_{t=1}^{n}\ell_{\alpha,t}(\theta)-\mathbb{E}(\ell_{\alpha,1}(\theta))\bigg\|_{\Theta}\xrightarrow[n\rightarrow\infty]{a.s.}0. \tag{6.6} \]

Hence, from Lemma 6.1 and (6.6), we have

\[ \left\|\widehat{H}_{\alpha,n}(\theta)-\mathbb{E}(\ell_{\alpha,1}(\theta))\right\|_{\Theta}\leq\big\|\widehat{H}_{\alpha,n}(\theta)-H_{\alpha,n}(\theta)\big\|_{\Theta}+\big\|H_{\alpha,n}(\theta)-\mathbb{E}(\ell_{\alpha,1}(\theta))\big\|_{\Theta}\xrightarrow[n\rightarrow\infty]{a.s.}0. \tag{6.9} \]

The conclusion follows from (2.), (6.9) and standard arguments. $\blacksquare$

6.2 Proof of Theorem 2.2

The following lemma is needed.

Lemma 6.2

Assume that the conditions of Theorem 2.2 hold. Then

\[ \mathbb{E}\Big(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\Big\|\frac{\partial\widehat{\ell}_{\alpha,t}(\theta)}{\partial\theta}-\frac{\partial\ell_{\alpha,t}(\theta)}{\partial\theta}\Big\|_{\Theta}\Big)\xrightarrow[n\rightarrow\infty]{}0. \tag{6.10} \]

Proof of Lemma 6.2
Remark that for all $t\in\mathbb{Z}$,

\begin{align*}
\frac{\partial\ell_{\alpha,t}(\theta)}{\partial\theta} &=(1+\alpha)\bigg[\sum_{y=0}^{\infty}\frac{\partial g(y|\eta_{t}(\theta))}{\partial\theta}g(y|\eta_{t}(\theta))^{\alpha}-\frac{\partial g(Y_{t}|\eta_{t}(\theta))}{\partial\theta}g(Y_{t}|\eta_{t}(\theta))^{\alpha-1}\bigg]\\
&=(1+\alpha)\frac{\partial\eta_{t}(\theta)}{\partial\theta}\bigg[\sum_{y=0}^{\infty}\frac{\partial g(y|\eta_{t}(\theta))}{\partial\eta}g(y|\eta_{t}(\theta))^{\alpha}-\frac{\partial g(Y_{t}|\eta_{t}(\theta))}{\partial\eta}g(Y_{t}|\eta_{t}(\theta))^{\alpha-1}\bigg]\\
&=(1+\alpha)\frac{\partial\lambda_{t}(\theta)}{\partial\theta}\eta^{\prime}(\lambda_{t}(\theta))\bigg[\sum_{y=0}^{\infty}\frac{\partial g(y|\eta_{t}(\theta))}{\partial\eta}g(y|\eta_{t}(\theta))^{\alpha}-\frac{\partial g(Y_{t}|\eta_{t}(\theta))}{\partial\eta}g(Y_{t}|\eta_{t}(\theta))^{\alpha-1}\bigg]\\
&=h_{\alpha}(\lambda_{t}(\theta))\frac{\partial\lambda_{t}(\theta)}{\partial\theta},
\end{align*}

where the function $h_{\alpha}$ is defined in (2.3). The derivative $\frac{\partial\widehat{\ell}_{\alpha,t}(\theta)}{\partial\theta}$ can be computed in the same way; by using the relation $|a_{1}b_{1}-a_{2}b_{2}|\leq|a_{1}-a_{2}||b_{2}|+|b_{1}-b_{2}||a_{1}|,~\forall a_{1},a_{2},b_{1},b_{2}\in\mathbb{R}$, we get

\begin{align*}
\Big\|\frac{\partial\widehat{\ell}_{\alpha,t}(\theta)}{\partial\theta}-\frac{\partial\ell_{\alpha,t}(\theta)}{\partial\theta}\Big\|_{\Theta} &=\Big\|h_{\alpha}(\widehat{\lambda}_{t}(\theta))\frac{\partial\widehat{\lambda}_{t}(\theta)}{\partial\theta}-h_{\alpha}(\lambda_{t}(\theta))\frac{\partial\lambda_{t}(\theta)}{\partial\theta}\Big\|_{\Theta}\\
&\leq\big\|h_{\alpha}(\lambda_{t}(\theta))\big\|_{\Theta}\Big\|\frac{\partial\widehat{\lambda}_{t}(\theta)}{\partial\theta}-\frac{\partial\lambda_{t}(\theta)}{\partial\theta}\Big\|_{\Theta}+\big\|h_{\alpha}(\widehat{\lambda}_{t}(\theta))-h_{\alpha}(\lambda_{t}(\theta))\big\|_{\Theta}\Big\|\frac{\partial\widehat{\lambda}_{t}(\theta)}{\partial\theta}\Big\|_{\Theta}. \tag{6.11}
\end{align*}

The mean value theorem applied to the function $\lambda\mapsto h_{\alpha}(\lambda)$ gives

\begin{align*}
\big|h_{\alpha}(\widehat{\lambda}_{t}(\theta))-h_{\alpha}(\lambda_{t}(\theta))\big| &=\Big|\frac{\partial h_{\alpha}(\tilde{\lambda}_{t}(\theta))}{\partial\lambda}\Big|\big|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\big|\\
&=\big|m_{\alpha}(\tilde{\lambda}_{t}(\theta))\big|\big|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\big|,
\end{align*}

where $\tilde{\lambda}_{t}(\theta)$ is between $\widehat{\lambda}_{t}(\theta)$ and $\lambda_{t}(\theta)$, and the function $m_{\alpha}$ is defined in (2.4). Thus,

\begin{align*}
\mathbb{E}\Big(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\Big\|\frac{\partial\widehat{\ell}_{\alpha,t}(\theta)}{\partial\theta}-\frac{\partial\ell_{\alpha,t}(\theta)}{\partial\theta}\Big\|_{\Theta}\Big) &\leq\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\mathbb{E}\Big[\big\|h_{\alpha}(\lambda_{t}(\theta))\big\|_{\Theta}\Big\|\frac{\partial\widehat{\lambda}_{t}(\theta)}{\partial\theta}-\frac{\partial\lambda_{t}(\theta)}{\partial\theta}\Big\|_{\Theta}\Big]\\
&\quad+\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\mathbb{E}\Big[\big\|m_{\alpha}(\tilde{\lambda}_{t}(\theta))\big\|_{\Theta}\big\|\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\big\|_{\Theta}\Big\|\frac{\partial\widehat{\lambda}_{t}(\theta)}{\partial\theta}\Big\|_{\Theta}\Big]. \tag{6.12}
\end{align*}

From Assumption A$_{1}(\Theta)$, for all $t\geq 1$, we have

\begin{align*}
\Big\|\frac{\partial\widehat{\lambda}_{t}(\theta)}{\partial\theta}-\frac{\partial\lambda_{t}(\theta)}{\partial\theta}\Big\|_{\Theta} &=\Big\|\frac{\partial}{\partial\theta}f_{\theta}(Y_{t-1},\cdots,Y_{1},0,\cdots;X_{t-1},\cdots,X_{1},0,\cdots)-\frac{\partial}{\partial\theta}f_{\theta}(Y_{t-1},\cdots;X_{t-1},\cdots)\Big\|_{\Theta}\\
&\leq\sum_{\ell\geq t}\alpha^{(1)}_{\ell,Y}Y_{t-\ell}+\sum_{\ell\geq t}\alpha^{(1)}_{\ell,X}\left\|X_{t-\ell}\right\|, \tag{6.13}
\end{align*}

and

\begin{align*}
\Big\|\frac{\partial\widehat{\lambda}_{t}(\theta)}{\partial\theta}\Big\|_{\Theta} &\leq\Big\|\frac{\partial}{\partial\theta}f_{\theta}(0)\Big\|_{\Theta}+\Big\|\frac{\partial}{\partial\theta}f_{\theta}(Y_{t-1},\cdots,Y_{1},0,\cdots;X_{t-1},\cdots,X_{1},0,\cdots)-\frac{\partial}{\partial\theta}f_{\theta}(0)\Big\|_{\Theta}\\
&\leq\Big\|\frac{\partial}{\partial\theta}f_{\theta}(0)\Big\|_{\Theta}+\sum_{\ell=1}^{t-1}\alpha^{(1)}_{\ell,Y}Y_{t-\ell}+\sum_{\ell=1}^{t-1}\alpha^{(1)}_{\ell,X}\left\|X_{t-\ell}\right\|\\
&\leq C+\sum_{\ell\geq 1}\alpha^{(1)}_{\ell,Y}Y_{t-\ell}+\sum_{\ell\geq 1}\alpha^{(1)}_{\ell,X}\left\|X_{t-\ell}\right\|. \tag{6.14}
\end{align*}

From (6.13), the Hölder’s inequality and the assumption (A7), we get

1nt=1n𝔼[hα(λt(θ))Θλ^t(θ)θλt(θ)θΘ]\displaystyle\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\mathbb{E}\Big{[}\big{\|}h_{\alpha}(\lambda_{t}(\theta))\big{\|}_{\Theta}\Big{\|}\frac{\partial\widehat{\lambda}_{t}(\theta)}{\partial\theta}-\frac{\partial\lambda_{t}(\theta)}{\partial\theta}\Big{\|}_{\Theta}\Big{]}
1nt=1n𝔼[hα(λt(θ))Θ(tα,Y(1)Yt+tα,X(1)Xt)]\displaystyle\hskip 28.45274pt\leq\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\mathbb{E}\Big{[}\big{\|}h_{\alpha}(\lambda_{t}(\theta))\big{\|}_{\Theta}\big{(}\sum\limits_{\ell\geq t}\alpha^{(1)}_{\ell,Y}Y_{t-\ell}+\sum\limits_{\ell\geq t}\alpha^{(1)}_{\ell,X}\left\|X_{t-\ell}\right\|\big{)}\Big{]}
1nt=1nhα(λt(θ))Θ2[tα,Y(1)Yt2+tα,X(1)Xt2]\displaystyle\hskip 28.45274pt\leq\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\|\|h_{\alpha}(\lambda_{t}(\theta))\|_{\Theta}\|_{2}\big{[}\sum\limits_{\ell\geq t}\alpha^{(1)}_{\ell,Y}\|Y_{t-\ell}\|_{2}+\sum\limits_{\ell\geq t}\alpha^{(1)}_{\ell,X}\|\|X_{t-\ell}\|\|_{2}\big{]}
C1nt=1nt(α,Y(1)+α,X(1))C1nt=1n1tγ1(from the condition (2.6))\displaystyle\hskip 28.45274pt\leq C\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sum\limits_{\ell\geq t}\big{(}\alpha^{(1)}_{\ell,Y}+\alpha^{(1)}_{\ell,X}\big{)}\leq C\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{1}{t^{\gamma-1}}~{}(\text{from the condition }(\ref{eq_th2}))
C1n(1+n2γ)n0,\displaystyle\hskip 28.45274pt\leq C\frac{1}{\sqrt{n}}(1+n^{2-\gamma})\begin{array}[t]{c}\stackrel{{\scriptstyle}}{{\longrightarrow}}\\ {\scriptstyle n\rightarrow\infty}\end{array}0,

where the last convergence holds since γ>3/2\gamma>3/2. Therefore, the first term of the right hand side of (6.12) converges to 0. Moreover, according to (3.6), (6.2), the Hölder’s inequality and the assumption (A7), we have

1nt=1n𝔼[mα(λ~t(θ))Θλ^t(θ)λt(θ)Θλ^t(θ)θΘ]\displaystyle\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\mathbb{E}\Big{[}\|m_{\alpha}(\tilde{\lambda}_{t}(\theta))\|_{\Theta}\big{\|}\widehat{\lambda}_{t}(\theta)-\lambda_{t}(\theta)\big{\|}_{\Theta}\Big{\|}\frac{\partial\widehat{\lambda}_{t}(\theta)}{\partial\theta}\Big{\|}_{\Theta}\Big{]}
1nt=1n𝔼[mα(λ~t(θ))Θ(tα,Y(0)Yt+tα,X(0)Xt)(fθ(0)θΘ+=1α,Y(1)Yt+1α,X(1)Xt)]\displaystyle\leq\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\mathbb{E}\Big{[}\|m_{\alpha}(\tilde{\lambda}_{t}(\theta))\|_{\Theta}\Big{(}\sum\limits_{\ell\geq t}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+\sum\limits_{\ell\geq t}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\|\Big{)}\Big{(}\Big{\|}\frac{\partial f_{\theta}(0)}{\partial\theta}\Big{\|}_{\Theta}+\sum\limits_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,Y}Y_{t-\ell}+\sum\limits_{\ell\geq 1}\alpha^{(1)}_{\ell,X}\left\|X_{t-\ell}\right\|\Big{)}\Big{]}
1nmα(λ~t(θ))Θ2tα,Y(0)Yt+tα,X(0)Xt4fθ(0)θΘ+=1α,Y(1)Yt+1α,X(1)Xt4\displaystyle\leq\frac{1}{\sqrt{n}}\Big{\|}\Big{\|}m_{\alpha}(\tilde{\lambda}_{t}(\theta))\Big{\|}_{\Theta}\Big{\|}_{2}\cdot\Big{\|}\sum\limits_{\ell\geq t}\alpha^{(0)}_{\ell,Y}Y_{t-\ell}+\sum\limits_{\ell\geq t}\alpha^{(0)}_{\ell,X}\left\|X_{t-\ell}\right\|\Big{\|}_{4}\cdot\Big{\|}\Big{\|}\frac{\partial f_{\theta}(0)}{\partial\theta}\Big{\|}_{\Theta}+\sum\limits_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,Y}Y_{t-\ell}+\sum\limits_{\ell\geq 1}\alpha^{(1)}_{\ell,X}\left\|X_{t-\ell}\right\|\Big{\|}_{4}
1nm¯αt=1n(tα,Y(0)Yt4+tα,X(0)Xt4)(fθ(0)θΘ4+=1α,Y(1)Yt4+=1α,X(1)Xt4)\displaystyle\leq\frac{1}{\sqrt{n}}\overline{m}_{\alpha}\sum_{t=1}^{n}\Big{(}\sum\limits_{\ell\geq t}\alpha^{(0)}_{\ell,Y}\|Y_{t-\ell}\|_{4}+\sum\limits_{\ell\geq t}\alpha^{(0)}_{\ell,X}\|\|X_{t-\ell}\|\|_{4}\Big{)}\Big{(}\Big{\|}\Big{\|}\frac{\partial f_{\theta}(0)}{\partial\theta}\Big{\|}_{\Theta}\Big{\|}_{4}+\sum\limits_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,Y}\|Y_{t-\ell}\|_{4}+\sum\limits_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,X}\|\|X_{t-\ell}\|\|_{4}\Big{)}
C1nt=1n(t(α,Y(0)+α,X(0)))(C+C(=1α,Y(1)+=1α,X(1)))\displaystyle\leq C\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\Big{(}\sum\limits_{\ell\geq t}\big{(}\alpha^{(0)}_{\ell,Y}+\alpha^{(0)}_{\ell,X}\big{)}\Big{)}\Big{(}C+C\big{(}\sum\limits_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,Y}+\sum\limits_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,X}\big{)}\Big{)}
C1nt=1nt(α,Y(0)+α,X(0))C1nt=1n1tγ1n0(from the condition (2.6) and see above).\displaystyle\leq C\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sum\limits_{\ell\geq t}\big{(}\alpha^{(0)}_{\ell,Y}+\alpha^{(0)}_{\ell,X}\big{)}\leq C\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\dfrac{1}{t^{\gamma-1}}\begin{array}[t]{c}\stackrel{{\scriptstyle}}{{\longrightarrow}}\\ {\scriptstyle n\rightarrow\infty}\end{array}0~{}(\text{from the condition }(\ref{eq_th2})\text{ and see above}).

Hence, the second term on the right-hand side of (6.12) converges to 0. This completes the proof of the lemma. $\blacksquare$
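As a rough numerical illustration of the last bound in the chain above, the sketch below evaluates $n^{-1/2}\sum_{t=1}^{n}t^{-(\gamma-1)}$ for increasing $n$. The value $\gamma=1.8$ and the function name `tail_bound` are assumptions made only for this illustration; the bound tends to 0 exactly when $\gamma>3/2$.

```python
import math


def tail_bound(n: int, gamma: float) -> float:
    """Return n^{-1/2} * sum_{t=1}^{n} t^{-(gamma - 1)}."""
    total = sum(t ** (-(gamma - 1.0)) for t in range(1, n + 1))
    return total / math.sqrt(n)


gamma = 1.8  # illustrative assumption, not a value taken from the paper
sample_sizes = (10**2, 10**3, 10**4, 10**5)
values = [tail_bound(n, gamma) for n in sample_sizes]

for n, v in zip(sample_sizes, values):
    print(f"n = {n:>6d}  bound = {v:.4f}")
```

With this choice of $\gamma$ the printed values decrease with $n$, at a rate of order $n^{3/2-\gamma}$.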
  

The following lemma is also needed.

Lemma 6.3

Assume that the conditions of Theorem 2.2 hold. Then

\[
\bigg\|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^{2}\ell_{\alpha,t}(\theta)}{\partial\theta\partial\theta^{T}}-\mathbb{E}\Big(\frac{\partial^{2}\ell_{\alpha,1}(\theta)}{\partial\theta\partial\theta^{T}}\Big)\bigg\|_{\Theta}\xrightarrow[n\to\infty]{a.s.}0. \tag{6.15}
\]

Proof of Lemma 6.3

Recall that $\frac{\partial\ell_{\alpha,t}(\theta)}{\partial\theta}=h_{\alpha}(\lambda_{t}(\theta))\frac{\partial\lambda_{t}(\theta)}{\partial\theta}$. Then, for $i,j\in\{1,\cdots,d\}$, we have

\begin{align*}
\frac{\partial^{2}\ell_{\alpha,t}(\theta)}{\partial\theta_{i}\partial\theta_{j}} &=\frac{\partial}{\partial\theta_{j}}\Big(h_{\alpha}(\lambda_{t}(\theta))\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{i}}\Big)\\
&=h_{\alpha}(\lambda_{t}(\theta))\frac{\partial^{2}\lambda_{t}(\theta)}{\partial\theta_{j}\partial\theta_{i}}+\frac{\partial h_{\alpha}(\lambda_{t}(\theta))}{\partial\theta_{j}}\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{i}}\\
&=h_{\alpha}(\lambda_{t}(\theta))\frac{\partial^{2}\lambda_{t}(\theta)}{\partial\theta_{i}\partial\theta_{j}}+\frac{\partial h_{\alpha}(\lambda_{t}(\theta))}{\partial\lambda}\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{i}}\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{j}}\\
&=h_{\alpha}(\lambda_{t}(\theta))\frac{\partial^{2}\lambda_{t}(\theta)}{\partial\theta_{i}\partial\theta_{j}}+m_{\alpha}(\lambda_{t}(\theta))\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{i}}\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{j}}.
\end{align*}
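The chain-rule decomposition above can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: `lam` and `phi` are hypothetical toy functions (not the paper's $\lambda_t$ or $\ell_{\alpha,t}$), with $h=\phi'$ and $m=\phi''$ playing the roles of $h_\alpha$ and $m_\alpha$, and $d=2$. It compares the analytic Hessian $h\,\partial^2\lambda+m\,\partial\lambda\,\partial\lambda^{T}$ with central finite differences.

```python
import math


def lam(th):
    """Toy smooth map theta -> lambda(theta); purely illustrative."""
    return 1.0 + th[0] ** 2 + th[0] * th[1] + math.exp(th[1])


def phi(x):
    """Toy outer function standing in for the per-observation loss."""
    return math.log(1.0 + x)


def analytic_hessian(th):
    """h * d2(lambda) + m * d(lambda) d(lambda)^T with h = phi', m = phi''."""
    x = lam(th)
    h = 1.0 / (1.0 + x)
    m = -1.0 / (1.0 + x) ** 2
    grad = [2.0 * th[0] + th[1], th[0] + math.exp(th[1])]
    hess = [[2.0, 1.0], [1.0, math.exp(th[1])]]
    return [[h * hess[i][j] + m * grad[i] * grad[j] for j in range(2)]
            for i in range(2)]


def numeric_hessian(th, eps=1e-4):
    """Central finite differences of theta -> phi(lambda(theta))."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            def shifted(si, sj):
                t = list(th)
                t[i] += si * eps
                t[j] += sj * eps
                return phi(lam(t))
            out[i][j] = (shifted(1, 1) - shifted(1, -1)
                         - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps ** 2)
    return out


theta0 = [0.3, -0.2]
A = analytic_hessian(theta0)
N = numeric_hessian(theta0)
max_gap = max(abs(A[i][j] - N[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The two Hessians should agree up to finite-difference error, which is the content of the last two equalities above.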

Let us show that $\mathbb{E}\Big(\Big\|\frac{\partial^{2}\ell_{\alpha,t}(\theta)}{\partial\theta_{i}\partial\theta_{j}}\Big\|_{\Theta}\Big)<\infty$, for all $i,j\in\{1,\cdots,d\}$. From Hölder's inequality and the assumption (A7), we have

\begin{align*}
\mathbb{E}\Big(\Big\|\frac{\partial^{2}\ell_{\alpha,t}(\theta)}{\partial\theta_{i}\partial\theta_{j}}\Big\|_{\Theta}\Big) &\leq\big\|\|h_{\alpha}(\lambda_{t}(\theta))\|_{\Theta}\big\|_{2}\Big\|\Big\|\frac{\partial^{2}\lambda_{t}(\theta)}{\partial\theta_{i}\partial\theta_{j}}\Big\|_{\Theta}\Big\|_{2}+\big\|\|m_{\alpha}(\lambda_{t}(\theta))\|_{\Theta}\big\|_{2}\cdot\Big\|\Big\|\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{i}}\Big\|_{\Theta}\Big\|\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{j}}\Big\|_{\Theta}\Big\|_{2}\\
&\leq(\overline{h}_{\alpha}+1)\Big\|\Big\|\frac{\partial^{2}\lambda_{t}(\theta)}{\partial\theta_{i}\partial\theta_{j}}\Big\|_{\Theta}\Big\|_{2}+\overline{m}_{\alpha}\Big\|\Big\|\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{i}}\Big\|_{\Theta}\Big\|_{4}\cdot\Big\|\Big\|\frac{\partial\lambda_{t}(\theta)}{\partial\theta_{j}}\Big\|_{\Theta}\Big\|_{4}. \tag{6.16}
\end{align*}

Moreover, according to the assumption A$_{2}(\Theta)$, for all $t\in\mathbb{Z}$, we get

\begin{align*}
\Big\|\frac{\partial^{2}\lambda_{t}(\theta)}{\partial\theta_{i}\partial\theta_{j}}\Big\|_{\Theta} &\leq\Big\|\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}f_{\theta}(0)\Big\|_{\Theta}+\Big\|\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}f_{\theta}(Y_{t-1},\cdots;X_{t-1},\cdots)-\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}f_{\theta}(0)\Big\|_{\Theta}\\
&\leq\Big\|\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}f_{\theta}(0)\Big\|_{\Theta}+\sum_{\ell=1}^{\infty}\alpha^{(2)}_{\ell,Y}Y_{t-\ell}+\sum_{\ell=1}^{\infty}\alpha^{(2)}_{\ell,X}\|X_{t-\ell}\|.
\end{align*}

Thus, according to (6.2) and similar arguments as in (6.2), we deduce

\begin{align*}
\mathbb{E}\Big(\Big\|\frac{\partial^{2}\ell_{\alpha,t}(\theta)}{\partial\theta_{i}\partial\theta_{j}}\Big\|_{\Theta}\Big) &\leq(\overline{h}_{\alpha}+1)\Big(\Big\|\Big\|\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}f_{\theta}(0)\Big\|_{\Theta}\Big\|_{2}+\sum_{\ell=1}^{\infty}\alpha^{(2)}_{\ell,Y}\|Y_{t-\ell}\|_{2}+\sum_{\ell=1}^{\infty}\alpha^{(2)}_{\ell,X}\big\|\|X_{t-\ell}\|\big\|_{2}\Big)\\
&\qquad+\overline{m}_{\alpha}\Big(\Big\|\Big\|\frac{\partial}{\partial\theta}f_{\theta}(0)\Big\|_{\Theta}\Big\|_{4}+\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,Y}\|Y_{t-\ell}\|_{4}+\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,X}\big\|\|X_{t-\ell}\|\big\|_{4}\Big)^{2}\\
&\leq(\overline{h}_{\alpha}+1)\Big(C+C\Big(\sum_{\ell=1}^{\infty}\alpha^{(2)}_{\ell,Y}+\sum_{\ell=1}^{\infty}\alpha^{(2)}_{\ell,X}\Big)\Big)+\overline{m}_{\alpha}\Big(C+C\Big(\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,Y}+\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,X}\Big)\Big)^{2}<\infty.
\end{align*}

Hence, from the stationarity and ergodicity properties of the sequence $\big\{\frac{\partial^{2}\ell_{\alpha,t}(\theta)}{\partial\theta\partial\theta^{T}},\,t\in\mathbb{Z}\big\}$ and the uniform strong law of large numbers, it holds that

\[
\Big\|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^{2}\ell_{\alpha,t}(\theta)}{\partial\theta\partial\theta^{T}}-\mathbb{E}\Big(\frac{\partial^{2}\ell_{\alpha,1}(\theta)}{\partial\theta\partial\theta^{T}}\Big)\Big\|_{\Theta}\xrightarrow[n\to\infty]{a.s.}0.
\]

Thus, Lemma 6.3 is verified. $\blacksquare$
 

Now, we use the results of Lemmas 6.2 and 6.3 to prove Theorem 2.2. For $i\in\{1,\cdots,d\}$, by applying the Taylor expansion to the function $\theta\mapsto\frac{\partial}{\partial\theta_{i}}H_{\alpha,n}(\theta)$, there exists $\tilde{\theta}_{n,i}$ between $\widehat{\theta}_{\alpha,n}$ and $\theta^{*}$ such that

\[
\frac{\partial}{\partial\theta_{i}}H_{\alpha,n}(\widehat{\theta}_{\alpha,n})=\frac{\partial}{\partial\theta_{i}}H_{\alpha,n}(\theta^{*})+\frac{\partial^{2}}{\partial\theta\partial\theta_{i}}H_{\alpha,n}(\tilde{\theta}_{n,i})(\widehat{\theta}_{\alpha,n}-\theta^{*}).
\]

It follows that

\[
\sqrt{n}J_{n}(\widehat{\theta}_{\alpha,n})(\widehat{\theta}_{\alpha,n}-\theta^{*})=\sqrt{n}\Big(\frac{\partial}{\partial\theta}H_{\alpha,n}(\theta^{*})-\frac{\partial}{\partial\theta}H_{\alpha,n}(\widehat{\theta}_{\alpha,n})\Big), \tag{6.17}
\]

where

\[
J_{n}(\widehat{\theta}_{\alpha,n})=-\bigg(\frac{\partial^{2}}{\partial\theta\partial\theta_{i}}H_{\alpha,n}(\tilde{\theta}_{n,i})\bigg)_{1\leq i\leq d}.
\]

We can rewrite (6.17) as

\[
\sqrt{n}J_{n}(\widehat{\theta}_{\alpha,n})(\widehat{\theta}_{\alpha,n}-\theta^{*})=\sqrt{n}\frac{\partial}{\partial\theta}H_{\alpha,n}(\theta^{*})-\sqrt{n}\frac{\partial}{\partial\theta}\widehat{H}_{\alpha,n}(\widehat{\theta}_{\alpha,n})+\sqrt{n}\Big(\frac{\partial}{\partial\theta}\widehat{H}_{\alpha,n}(\widehat{\theta}_{\alpha,n})-\frac{\partial}{\partial\theta}H_{\alpha,n}(\widehat{\theta}_{\alpha,n})\Big).
\]

According to Lemma 6.2, it holds that

\begin{align*}
\mathbb{E}\Big(\sqrt{n}\Big|\frac{\partial}{\partial\theta}\widehat{H}_{\alpha,n}(\widehat{\theta}_{\alpha,n})-\frac{\partial}{\partial\theta}H_{\alpha,n}(\widehat{\theta}_{\alpha,n})\Big|\Big) &\leq\mathbb{E}\Big(\sqrt{n}\Big\|\frac{\partial}{\partial\theta}\widehat{H}_{\alpha,n}(\theta)-\frac{\partial}{\partial\theta}H_{\alpha,n}(\theta)\Big\|_{\Theta}\Big)\\
&\leq\mathbb{E}\Big(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\bigg\|\frac{\partial\widehat{\ell}_{\alpha,t}(\theta)}{\partial\theta}-\frac{\partial\ell_{\alpha,t}(\theta)}{\partial\theta}\bigg\|_{\Theta}\Big)\xrightarrow[n\to\infty]{}0.
\end{align*}

Moreover, for $n$ large enough, $\frac{\partial}{\partial\theta}\widehat{H}_{\alpha,n}(\widehat{\theta}_{\alpha,n})=0$, since $\widehat{\theta}_{\alpha,n}$ is a local minimum of the function $\theta\mapsto\widehat{H}_{\alpha,n}(\theta)$.
So, for $n$ large enough, we have

\[
\sqrt{n}J_{n}(\widehat{\theta}_{\alpha,n})(\widehat{\theta}_{\alpha,n}-\theta^{*})=\sqrt{n}\frac{\partial}{\partial\theta}H_{\alpha,n}(\theta^{*})+o_{P}(1). \tag{6.18}
\]

To complete the proof of Theorem 2.2, we will show that

  1. (i)

    $\big\{\frac{\partial\ell_{\alpha,t}(\theta^{*})}{\partial\theta}|\mathcal{F}_{t-1},\,t\in\mathbb{Z}\big\}$ is a stationary ergodic martingale difference sequence and $\mathbb{E}\big(\frac{\partial\ell_{\alpha,t}(\theta^{*})}{\partial\theta}\big)^{2}<\infty$.

  2. (ii)

    $J_{n}(\widehat{\theta}_{\alpha,n})\xrightarrow[n\to\infty]{a.s.}J_{\alpha}$.

  3. (iii)

    The matrix $J_{\alpha}$ is invertible.

  1. (i)

    Recall that $\mathcal{F}_{t-1}=\sigma((Y_{s},X_{s}),\,s\leq t-1)$ and $\frac{\partial\ell_{\alpha,t}(\theta^{*})}{\partial\theta}=h_{\alpha}(\lambda_{t}(\theta^{*}))\frac{\partial\lambda_{t}(\theta^{*})}{\partial\theta}$, where

    \[
    h_{\alpha}(\lambda_{t}(\theta^{*}))=(1+\alpha)\eta^{\prime}(\lambda_{t}(\theta^{*}))\bigg[\sum_{y=0}^{\infty}\frac{\partial g(y|\eta_{t}(\theta^{*}))}{\partial\eta}g(y|\eta_{t}(\theta^{*}))^{\alpha}-\frac{\partial g(Y_{t}|\eta_{t}(\theta^{*}))}{\partial\eta}g(Y_{t}|\eta_{t}(\theta^{*}))^{\alpha-1}\bigg].
    \]

    Since the functions $\lambda_{t}(\theta^{*})$ and $\frac{\partial\lambda_{t}(\theta^{*})}{\partial\theta}$ are $\mathcal{F}_{t-1}$-measurable, we have

    \[
    \mathbb{E}\Big(\frac{\partial\ell_{\alpha,t}(\theta^{*})}{\partial\theta}\Big|\mathcal{F}_{t-1}\Big)=\mathbb{E}\big(h_{\alpha}(\lambda_{t}(\theta^{*}))|\mathcal{F}_{t-1}\big)\frac{\partial\lambda_{t}(\theta^{*})}{\partial\theta}~\text{ and }~\mathbb{E}\big(h_{\alpha}(\lambda_{t}(\theta^{*}))|\mathcal{F}_{t-1}\big)=0.
    \]

    Thus, $\mathbb{E}\Big(\frac{\partial\ell_{\alpha,t}(\theta^{*})}{\partial\theta}\Big|\mathcal{F}_{t-1}\Big)=0$. Moreover, since $Y_{t}$, $\|X_{t-\ell}\|$ and $\frac{\partial}{\partial\theta}f_{\theta}$ have finite fourth-order moments, by using (A7) and Hölder's inequality, we get

    \begin{align*}
    \mathbb{E}\Big(\Big|\frac{\partial\ell_{\alpha,t}(\theta^{*})}{\partial\theta}\Big|^{2}\Big) &\leq\mathbb{E}\Big(\big\|\big(h_{\alpha}(\lambda_{t}(\theta))\big)^{2}\big\|_{\Theta}\Big\|\frac{\partial\lambda_{t}(\theta)}{\partial\theta}\Big\|^{2}_{\Theta}\Big)\leq\big\|\|h_{\alpha}(\lambda_{t}(\theta))\|^{2}_{\Theta}\big\|_{2}\Big\|\Big\|\frac{\partial\lambda_{t}(\theta)}{\partial\theta}\Big\|_{\Theta}\Big\|^{2}_{4}\\
    &\leq\overline{h}_{\alpha}\Big\|\Big\|\frac{\partial}{\partial\theta}f_{\theta}(0)\Big\|_{\Theta}+\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,Y}Y_{t-\ell}+\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,X}\|X_{t-\ell}\|\Big\|^{2}_{4}\\
    &\leq\overline{h}_{\alpha}\Big(\Big\|\Big\|\frac{\partial}{\partial\theta}f_{\theta}(0)\Big\|_{\Theta}\Big\|_{4}+\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,Y}\|Y_{t-\ell}\|_{4}+\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,X}\big\|\|X_{t-\ell}\|\big\|_{4}\Big)^{2}\\
    &\leq C\Big(C+C\Big(\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,Y}+\sum_{\ell=1}^{\infty}\alpha^{(1)}_{\ell,X}\Big)\Big)^{2}<\infty.
    \end{align*}
  2. (ii)

    For any $j=1,\cdots,d$, we have

    \begin{align*}
    &\Big|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,t}(\tilde{\theta}_{n,i})-\mathbb{E}\Big(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,1}(\theta^{*})\Big)\Big|\\
    &\leq\Big|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,t}(\tilde{\theta}_{n,i})-\mathbb{E}\Big(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,1}(\tilde{\theta}_{n,i})\Big)\Big|+\Big|\mathbb{E}\Big(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,1}(\tilde{\theta}_{n,i})\Big)-\mathbb{E}\Big(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,1}(\theta^{*})\Big)\Big|\\
    &\leq\Big\|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,t}(\theta)-\mathbb{E}\Big(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,1}(\theta)\Big)\Big\|_{\Theta}+\Big|\mathbb{E}\Big(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,1}(\tilde{\theta}_{n,i})\Big)-\mathbb{E}\Big(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{i}}\ell_{\alpha,1}(\theta^{*})\Big)\Big|\\
    &\xrightarrow[n\to\infty]{}0~~\text{(by virtue of Lemma 6.3 and Theorem 2.1)}.
    \end{align*}

    This holds for any $i,j\in\{1,\cdots,d\}$. Thus,

    \[
    J_{n}(\widehat{\theta}_{\alpha,n})=-\bigg(\frac{\partial^{2}}{\partial\theta\partial\theta_{i}}H_{\alpha,n}(\tilde{\theta}_{n,i})\bigg)_{1\leq i\leq d}=-\bigg(\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^{2}}{\partial\theta\partial\theta_{i}}\ell_{\alpha,t}(\tilde{\theta}_{n,i})\bigg)_{1\leq i\leq d}\xrightarrow[n\to\infty]{a.s.}-\mathbb{E}\Big(\frac{\partial^{2}}{\partial\theta\partial\theta^{T}}\ell_{\alpha,1}(\theta^{*})\Big)=J_{\alpha}.
    \]
  3. (iii)

    Let $U$ be a non-zero vector of $\mathbb{R}^{d}$. We have

    \begin{align*}
    U(-J_{\alpha})U^{T} &=U\mathbb{E}\bigg(\mathbb{E}\Big(\frac{\partial^{2}}{\partial\theta\partial\theta^{T}}\ell_{\alpha,1}(\theta^{*})\Big|\mathcal{F}_{0}\Big)\bigg)U^{T}\\
    &=\mathbb{E}\Big(\mathbb{E}\big(m_{\alpha}(\lambda_{1}(\theta^{*}))|\mathcal{F}_{0}\big)\Big(U\frac{\partial\lambda_{1}(\theta^{*})}{\partial\theta}\Big)\Big(U\frac{\partial\lambda_{1}(\theta^{*})}{\partial\theta}\Big)^{T}\Big),\text{ since }\mathbb{E}\big(h_{\alpha}(\lambda_{1}(\theta^{*}))|\mathcal{F}_{0}\big)=0\\
    &>0
    \end{align*}

    according to

    \[
    \mathbb{E}\big(m_{\alpha}(\lambda_{1}(\theta^{*}))|\mathcal{F}_{0}\big)=(1+\alpha)\sum_{y=0}^{\infty}\bigg(\eta^{\prime}(\lambda_{1}(\theta^{*}))\frac{\partial g(y|\eta(\lambda_{1}(\theta^{*})))}{\partial\eta}\bigg)^{2}g(y|\eta(\lambda_{1}(\theta^{*})))^{\alpha-1}>0,
    \]

    and the assumption (A8). This implies that the matrix $(-J_{\alpha})$ is symmetric and positive definite. Thus, $J_{\alpha}$ is invertible.
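The identity $\mathbb{E}\big(h_{\alpha}(\lambda_{t}(\theta^{*}))|\mathcal{F}_{t-1}\big)=0$, used in (i) and (iii), can be checked in one line. The following is a sketch, assuming (as in the model) that, conditionally on $\mathcal{F}_{t-1}$, $Y_{t}$ has probability mass function $g(\cdot|\eta_{t}(\theta^{*}))$ and that the sums below are finite:

```latex
\begin{align*}
\mathbb{E}\Big(\frac{\partial g(Y_{t}|\eta_{t}(\theta^{*}))}{\partial\eta}\,g(Y_{t}|\eta_{t}(\theta^{*}))^{\alpha-1}\,\Big|\,\mathcal{F}_{t-1}\Big)
&=\sum_{y=0}^{\infty}g(y|\eta_{t}(\theta^{*}))\,\frac{\partial g(y|\eta_{t}(\theta^{*}))}{\partial\eta}\,g(y|\eta_{t}(\theta^{*}))^{\alpha-1}\\
&=\sum_{y=0}^{\infty}\frac{\partial g(y|\eta_{t}(\theta^{*}))}{\partial\eta}\,g(y|\eta_{t}(\theta^{*}))^{\alpha}.
\end{align*}
```

Hence the two terms inside the bracket defining $h_{\alpha}(\lambda_{t}(\theta^{*}))$ cancel in conditional expectation, and since $\eta^{\prime}(\lambda_{t}(\theta^{*}))$ is $\mathcal{F}_{t-1}$-measurable, the conditional mean of $h_{\alpha}(\lambda_{t}(\theta^{*}))$ is zero.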

Now, from (i), we apply the central limit theorem for stationary ergodic martingale difference sequences. It follows that

\[
\sqrt{n}\frac{\partial}{\partial\theta}H_{\alpha,n}(\theta^{*})=\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{\partial}{\partial\theta}\ell_{\alpha,t}(\theta^{*})\xrightarrow[n\to\infty]{\mathcal{D}}\mathcal{N}_{d}\left(0,I_{\alpha}\right),
\]

where

\[
I_{\alpha}=\mathbb{E}\Big[\Big(\frac{\partial}{\partial\theta}\ell_{\alpha,1}(\theta^{*})\Big)\Big(\frac{\partial}{\partial\theta}\ell_{\alpha,1}(\theta^{*})\Big)^{T}\Big].
\]

According to (6.18), for $n$ large enough, (ii) and (iii) imply that

\begin{align*}
\sqrt{n}(\widehat{\theta}_{\alpha,n}-\theta^{*}) &=\big(J_{n}(\widehat{\theta}_{\alpha,n})\big)^{-1}\Big(\sqrt{n}\frac{\partial}{\partial\theta}H_{\alpha,n}(\theta^{*})\Big)+o_{P}(1)\\
&=J^{-1}_{\alpha}\Big(\sqrt{n}\frac{\partial}{\partial\theta}H_{\alpha,n}(\theta^{*})\Big)+o_{P}(1)\xrightarrow[n\to\infty]{\mathcal{D}}\mathcal{N}_{d}(0,J^{-1}_{\alpha}I_{\alpha}J^{-1}_{\alpha}).
\end{align*}

This completes the proof of Theorem 2.2. $\blacksquare$
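To make the sandwich form $J_{\alpha}^{-1}I_{\alpha}J_{\alpha}^{-1}$ concrete, the sketch below computes a plug-in version from per-observation scores and Hessians. This is an illustration under stated assumptions, not a procedure taken from this section: the empirical averages replacing $I_{\alpha}$ and $J_{\alpha}$, the restriction to $d=2$, the helper names and the toy inputs are all hypothetical.

```python
def outer(u, v):
    """Outer product u v^T of two length-2 vectors."""
    return [[ui * vj for vj in v] for ui in u]


def mean_matrix(mats):
    """Entrywise average of a list of 2x2 matrices."""
    n = len(mats)
    return [[sum(m[i][j] for m in mats) / n for j in range(2)]
            for i in range(2)]


def inv2(a):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("singular matrix")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def sandwich_covariance(scores, hessians):
    """Plug-in J^{-1} I J^{-1}, with J = -(average Hessian) as in the text."""
    i_hat = mean_matrix([outer(s, s) for s in scores])
    j_hat = [[-x for x in row] for row in mean_matrix(hessians)]
    j_inv = inv2(j_hat)
    return matmul(matmul(j_inv, i_hat), j_inv)


# Toy inputs (hypothetical): four score vectors and a constant Hessian.
scores = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
hessians = [[[2.0, 0.0], [0.0, 4.0]] for _ in range(4)]
cov = sandwich_covariance(scores, hessians)
print(cov)
```

Note that the result does not depend on the sign convention chosen for $J_{\alpha}$, since $J_{\alpha}^{-1}$ appears twice.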

6.3 Proof of Proposition 3.1

Let $G_{\lambda}(y)$ be the cumulative distribution function of $g(y|\eta)$, with $\lambda=B(\eta)$, and let $G_{\lambda}^{-1}(u):=\inf\{y\geq 0,~G_{\lambda}(y)\geq u\}$, for all $u\in[0,1]$, denote its generalized inverse. Let $(U_{t})$ be a sequence of independent uniform $(0,1)$ random variables such that $\xi_{t}=(U_{t},\varepsilon_{t})$ is independent and identically distributed over time. Let us prove the existence of a $\tau$-weakly dependent stationary solution $(Y_{t},\lambda_{t},X_{t})$ of (3.1) and (3.2) satisfying

\[
Y_{t}=G_{\lambda_{t}}^{-1}(U_{t}).
\]

Note that, by Proposition A.2 of [14], we get for $\lambda_{t}$ and $\lambda^{\prime}_{t}$,

\[
\mathbb{E}\big[|G_{\lambda_{t}}^{-1}(U_{t})-G_{\lambda^{\prime}_{t}}^{-1}(U_{t})|\,\big|\,\lambda_{t},\lambda^{\prime}_{t}\big]=|\lambda_{t}-\lambda^{\prime}_{t}|. \tag{6.19}
\]

For a solution $(Y_{t},\lambda_{t},X_{t})$ that fulfills (3.1), (3.2) and (6.19), set $Z_{t}=(Y_{t},X_{t})$. We get

\[
Z_{t}=(Y_{t},X_{t})=\big(G_{\lambda_{t}}^{-1}(U_{t}),u(X_{t-1},\cdots;\varepsilon_{t})\big):=F(Z_{t-1},\cdots;\xi_{t})~\text{ with }~\lambda_{t}=f_{\theta}(Z_{t-1},\cdots). \tag{6.20}
\]
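The coupling identity (6.19), on which the construction of $F$ rests, can be illustrated numerically. The sketch below assumes a Poisson conditional distribution (one member of the exponential family, chosen here only for illustration) and approximates $\mathbb{E}|G_{\lambda}^{-1}(U)-G_{\lambda^{\prime}}^{-1}(U)|$ with a midpoint rule over $u\in(0,1)$; the function names, the grid size and the values $\lambda=3$, $\lambda^{\prime}=4.5$ are hypothetical.

```python
import math


def poisson_quantile(u: float, lam: float) -> int:
    """Generalized inverse inf{y >= 0 : G_lam(y) >= u} of the Poisson CDF."""
    y = 0
    pmf = math.exp(-lam)  # P(Y = 0)
    cdf = pmf
    # The cap on y only guards against rounding issues for u very close to 1.
    while cdf < u and y < 1000:
        y += 1
        pmf *= lam / y    # P(Y = y) obtained from P(Y = y - 1)
        cdf += pmf
    return y


def mean_abs_coupling_gap(lam1: float, lam2: float, n_grid: int = 20000) -> float:
    """Midpoint-rule approximation of E|G^{-1}_{lam1}(U) - G^{-1}_{lam2}(U)|."""
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) / n_grid
        total += abs(poisson_quantile(u, lam1) - poisson_quantile(u, lam2))
    return total / n_grid


gap = mean_abs_coupling_gap(3.0, 4.5)
print(gap)  # should be close to |3.0 - 4.5|
```

Driving both quantile functions with the same uniform variable is what makes the expected gap equal to the difference of the means; with independent uniforms the gap would in general be larger.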

For a vector $z=(y,x)\in\mathbb{N}_{0}\times\mathbb{R}^{d_{x}}$, define the norm $\|z\|_{w}=|y|+w_{x}\|x\|$ for some $w_{x}>0$. According to Doukhan and Wintenberger (2008), it suffices to show that: (i) $\mathbb{E}\|F(\boldsymbol{z};\xi_{t})\|_{w}<\infty$ for some $\boldsymbol{z}\in\big(\mathbb{N}_{0}\times\mathbb{R}^{d_{x}}\big)^{\mathbb{N}}$ and (ii) there exists a non-negative sequence $(\alpha_{k}(F))_{k\geq 1}$ satisfying $\sum_{k\geq 1}\alpha_{k}(F)<1$ such that $\mathbb{E}\|F(\boldsymbol{z};\xi_{t})-F(\boldsymbol{z}^{\prime};\xi_{t})\|_{w}\leq\sum_{k\geq 1}\alpha_{k}(F)\|z_{k}-z^{\prime}_{k}\|_{w}$ for all $\boldsymbol{z},\boldsymbol{z}^{\prime}\in\big(\mathbb{N}_{0}\times\mathbb{R}^{d_{x}}\big)^{\mathbb{N}}$.

Point (i) follows directly from the assumptions on the functions $f_{\theta}$ and $u$. To prove (ii), for any $\boldsymbol{z}\in\big(\mathbb{N}_{0}\times\mathbb{R}^{d_{x}}\big)^{\mathbb{N}}$, we set $\boldsymbol{z}=(\boldsymbol{y},\boldsymbol{x})$ with $\boldsymbol{y}=(y_{k})$, $\boldsymbol{x}=(x_{k})$ and define $\lambda=f_{\theta}(z_{1},\cdots)$. For any $\boldsymbol{z},\boldsymbol{z}^{\prime}\in\big(\mathbb{N}_{0}\times\mathbb{R}^{d_{x}}\big)^{\mathbb{N}}$, from (6.19), A$_{0}(\Theta)$ and (3.3), we have

\begin{align*}
\mathbb{E}\|F(\boldsymbol{z};\xi_{t})-F(\boldsymbol{z}^{\prime};\xi_{t})\|_{w} &=\mathbb{E}\big[|G_{\lambda}^{-1}(U_{t})-G_{\lambda^{\prime}}^{-1}(U_{t})|+w_{x}|u(\boldsymbol{x};\varepsilon_{t})-u(\boldsymbol{x}^{\prime};\varepsilon_{t})|\big]\\
&\leq\sum_{k\geq 1}\alpha_{k,Y}^{(0)}|y_{k}-y_{k}^{\prime}|+\sum_{k\geq 1}\alpha_{k,X}^{(0)}\|x_{k}-x_{k}^{\prime}\|+w_{x}\sum_{k\geq 1}\alpha_{k}(u)\|x_{k}-x^{\prime}_{k}\|\\
&\leq\sum_{k\geq 1}\alpha_{k,Y}^{(0)}|y_{k}-y_{k}^{\prime}|+w_{x}\sum_{k\geq 1}\Big(\frac{\alpha_{k,X}^{(0)}}{w_{x}}\|x_{k}-x_{k}^{\prime}\|+\alpha_{k}(u)\|x_{k}-x^{\prime}_{k}\|\Big)\\
&\leq\sum_{k\geq 1}\alpha_{k}(F)\|z_{k}-z_{k}^{\prime}\|_{w}
\end{align*}

with $\alpha_{k}(F)=\max\big\{\alpha_{k,Y}^{(0)},\frac{\alpha_{k,X}^{(0)}}{w_{x}}+\alpha_{k}(u)\big\}$. Thus, choose $w_{x}$ such that $w_{x}>\frac{\sum_{k\geq 1}\alpha_{k,X}^{(0)}}{1-\sum_{k\geq 1}\max\{\alpha_{k,Y}^{(0)},\alpha_{k}(u)\}}$ to get $\sum_{k\geq 1}\alpha_{k}(F)<1$. This completes the proof of the proposition. $\blacksquare$
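The choice of the weight $w_{x}$ at the end of the proof can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration: the Lipschitz coefficients are truncated geometric sequences, and the function names and the safety factor `margin` are hypothetical. The code picks $w_{x}$ slightly above the threshold and checks that $\sum_{k}\alpha_{k}(F)<1$.

```python
def contraction_weight(alpha_y, alpha_x, alpha_u, margin=1.05):
    """Pick w_x just above sum(alpha_x) / (1 - sum_k max(alpha_y[k], alpha_u[k]))."""
    slack = 1.0 - sum(max(a, b) for a, b in zip(alpha_y, alpha_u))
    if slack <= 0:
        raise ValueError("need sum_k max(alpha_y[k], alpha_u[k]) < 1")
    return margin * sum(alpha_x) / slack


def lipschitz_sum(alpha_y, alpha_x, alpha_u, w_x):
    """Sum over k of alpha_k(F) = max(alpha_y[k], alpha_x[k] / w_x + alpha_u[k])."""
    return sum(max(ay, ax / w_x + au)
               for ay, ax, au in zip(alpha_y, alpha_x, alpha_u))


# Hypothetical coefficients: geometric decay, truncated at 29 lags.
lags = range(1, 30)
alpha_y = [0.3 * 0.5 ** k for k in lags]  # sums to about 0.3
alpha_x = [0.8 * 0.5 ** k for k in lags]  # sums to about 0.8
alpha_u = [0.2 * 0.5 ** k for k in lags]  # sums to about 0.2

w_x = contraction_weight(alpha_y, alpha_x, alpha_u)
total_good = lipschitz_sum(alpha_y, alpha_x, alpha_u, w_x)
total_bad = lipschitz_sum(alpha_y, alpha_x, alpha_u, 0.5)
print(w_x, total_good, total_bad)
```

In this toy configuration a weight that is too small (here $w_{x}=0.5$) gives a sum above 1, so the contraction condition (ii) fails under the corresponding norm, whereas the weight above the threshold restores it.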

References

  • [1] Agosto, A., Cavaliere, G., Kristensen, D. and Rahbek, A. Modeling corporate defaults: Poisson autoregressions with exogenous covariates (PARX). Journal of Empirical Finance 38(B), (2016), 640-663.
  • [2] Ahmad, A. and Francq, C. Poisson QMLE of count time series models. Journal of Time Series Analysis 37, (2016), 291-314.
  • [3] Al-Osh MA, Alzaid AA. First order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis 8, (1987), 261-275.
  • [4] Al-Osh MA, Alzaid AA. Integer-valued pth-order autoregressive structure. J. Appl. Probab. 27(2), (1990), 314-324.
  • [5] Aknouche A. and Francq C. Count and duration time series with equal conditional stochastic and mean orders. Econometric Theory, (2020), 1-33.
  • [6] Basu, S., Lindsay, B.G. Minimum disparity estimation for continuous models: efficiency, distributions and robustness. Ann. Inst. Statist. Math. 48, (1994), 683-705.
  • [7] Basu, A., Harris, I.R., Hjort, N.L. and Jones, M.C. Robust and efficient estimation by minimizing a density power divergence. Biometrika 85, (1998), 549-559.
  • [8] Belhaj, F., Abaoub, E. and Mahjoubi, M. Number of Transactions, Trade Size and the Volume-Volatility Relationship: An Interday and Intraday Analysis on the Tunisian Stock Market. International Business Research 8(6), (2015).
  • [9] Beran, R. Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5, (1977), 445-463.
  • [10] Brännäs, K. and Shahiduzzaman Quoreshi, A.M.M. Integer-valued moving average modelling of the number of transactions in stocks. Applied Financial Economics 20, (2010), 1429-1440.
  • [11] Cox, D.R. Statistical analysis of time series: Some recent developments. Scandinavian Journal of Statistics 8, (1981), 93-115.
  • [12] Cui, Y. and Zheng, Q. Conditional maximum likelihood estimation for a class of observation-driven time series models for count data. Statistics & Probability Letters 123, (2017), 193-201.
  • [13] Davis, R.A. and Wu, R. A negative binomial model for time series of counts. Biometrika 96, (2009), 735-749.
  • [14] Davis, R.A. and Liu, H. Theory and Inference for a Class of Observation-Driven Models with Application to Time Series of Counts. Statistica Sinica 26, (2016), 1673-1707.
  • [15] Diop, M.L. and Kengne, W. Testing parameter change in general integer-valued time series. J. Time Ser. Anal. 38, (2017), 880-894.
  • [16] Douc, R., Fokianos, K., and Moulines, E. Asymptotic properties of quasi-maximum likelihood estimators in observation-driven time series models. Electronic Journal of Statistics 11, (2017), 2707-2740.
  • [17] Doukhan, P., Fokianos, K., and Tjøstheim, D. On Weak Dependence Conditions for Poisson autoregressions. Statist. and Probab. Letters 82, (2012), 942-948.
  • [18] Doukhan, P., Fokianos, K., and Tjøstheim, D. Correction to "On weak dependence conditions for Poisson autoregressions" [Statist. Probab. Lett. 82 (2012) 942-948]. Statist. and Probab. Letters 83, (2013), 1926-1927.
  • [19] Doukhan, P. and Kengne, W. Inference and testing for structural change in general poisson autoregressive models. Electronic Journal of Statistics 9, (2015), 1267-1314.
  • [20] Doukhan, P. and Wintenberger, O. Weakly dependent chains with infinite memory. Stochastic Process. Appl. 118, (2008) 1997-2013.
  • [21] Ferland, R., Latour, A. and Oraichi, D. Integer-valued GARCH process. J. Time Ser. Anal. 27, (2006), 923-942.
  • [22] Fokianos, K., Rahbek, A. and Tjøstheim, D. Poisson autoregression. Journal of the American Statistical Association 104, (2009), 1430-1439.
  • [23] Fokianos, K. and Fried, R. Interventions in INGARCH processes. J. Time Ser. Anal. 31, (2010), 210-225.
  • [24] Fokianos, K. and Tjøstheim, D. Log-linear Poisson autoregression. Journal of multivariate analysis 102, (2011) 563-578.
  • [25] Fokianos, K. and Fried, R. Interventions in log-linear Poisson autoregression. Statistical Modelling 12, (2012), 1-24.
  • [26] Fokianos, K. and Neumann, M. A goodness-of-fit test for Poisson count processes. Electronic Journal of Statistics 7, (2013), 793-819.
  • [27] Fokianos, K. and Truquet, L. On categorical time series models with covariates. Stochastic processes and their applications 129, (2019), 3446-3462.
  • [28] Fokianos, K., Støve, B., Tjøstheim, D., and Doukhan, P. Multivariate count autoregression. Bernoulli 26, (2020), 471-499.
  • [29] Francq, C. and Thieu, L.Q. QML inference for volatility models with covariates. Econometric Theory 35, (2019), 37-72.
  • [30] Fried, R., Agueusop, I., Bornkamp, B., Fokianos, K., Fruth, J., Ickstadt, K. Retrospective Bayesian outlier detection in INGARCH series. Statistics and Computing 25, (2015), 365-374.
  • [31] Kang, J. and Lee, S. Minimum density power divergence estimator for Poisson autoregressive models. Computational Statistics and Data Analysis 80, (2014), 44-56.
  • [32] Kedem, B. and Fokianos, K. Regression Models for Time Series Analysis. Hoboken, Wiley, NJ, (2002).
  • [33] Kim, B. and Lee, S. Robust estimation for zero-inflated Poisson autoregressive models based on density power divergence. Journal of Statistical Computation and Simulation 87, (2017), 2981-2996.
  • [34] Kim B. and Lee, S. Robust estimation for general integer-valued time series models. Annals of the Institute of Statistical Mathematics, (2019), 1-26.
  • [35] Kounias, E.G. and Weng, T.-S. An inequality and almost sure convergence. Annals of Mathematical Statistics 33, (1969), 1091-1093.
  • [36] Liboschik, T., Fokianos, K. and Fried, R. tscount: An R package for analysis of count time series following generalized linear models. Journal of Statistical Software 82, (2017), 1-51.
  • [37] Louhichi, W. What drives the volume-volatility relationship on Euronext Paris? International Review of Financial Analysis 20, (2011) 200-206.
  • [38] McKenzie, E. Some simple models for discrete variate time series. Water Resour Bull. 21, (1985), 645-650.
  • [39] Pedersen, R.S. and Rahbek, A. Testing Garch-X Type Models. Econometric Theory 35(5), (2018), 1-36.
  • [40] Simpson, D.G. Minimum Hellinger distance estimation for the analysis of count data. J. Amer. Statist. Assoc. 82, (1987), 802-807.
  • [41] Takaishi, T., and Chen, T.T. The relationship between trading volumes, number of transactions, and stock volatility in GARCH models. Journal of Physics: Conference Series 738 (1), (2016).
  • [42] Tamura, R.N. and Boos, D.D. Minimum Hellinger distance estimation for multivariate location and covariance. J. Amer. Statist. Assoc. 89, (1986), 223-239.
  • [43] Warwick, J. and Jones, M.C. Choosing a robustness tuning parameter. J. Statist. Comput. Simul. 75, (2005), 581-588.