Adjacent-category models for ordinal time series and their application to climate-dependent spruce budworm defoliation dynamics

Olaloudé Judicaël Franck Osse Zinsou Max Debaly Philippe Marchand and Miguel Montoro Girona

( )

Abstract

This work proposes an adjacent-category autoregressive model for time series of ordinal variables. We apply this model to dendrochronological records to study the effect of climate on the intensity of spruce budworm defoliation during outbreaks in two sites in eastern Canada. The model’s parameters are estimated using the maximum likelihood approach. We show that this estimator is consistent and asymptotically Gaussian distributed. We also propose a Portemanteau test for goodness-of-fit. Our study shows that the seasonal ranges of maximum daily temperatures in the spring and summer have a significant quadratic effect on defoliation. The study reveals that for both regions, a greater range of summer daily maximum temperatures is associated with lower levels of defoliation up to a threshold estimated at 22.7 ^∘C (CI of 0–39.7 ^∘C at 95%) in Témiscamingue and 21.8 ^∘C (CI of 0–54.2 ^∘C at 95%) for Matawinie. For Matawinie, a greater range in spring daily maximum temperatures increased defoliation, up to a threshold of 32.5 ^∘C (CI of 0–80.0 ^∘C). We also present a statistical test to compare the autoregressive parameter values between different fits of the model, which allows us to detect changes in the defoliation dynamics between the study sites in terms of their respective autoregression structures.

Keywords:

ecological modeling, forest, goodness-of-fit test, multivariate analysis, ordinal categorical time series.

1 Introduction

Ordinal categorical data, i.e., data taking discrete values that can be ranked on a scale but that do not represent precise numerical intervals, are common in many disciplines, including medicine, economics, sociology and psychology, geology, wildlife studies, geography, education, biology and ecology (Leonard,, 1999; Agresti,, 2010). They can be produced when the observed phenomenon occurs inherently on a series of discrete, qualitative states, such as the phenological stages of flowering (Salisbury et al.,, 2013). In other cases, the underlying phenomenon is continuous but is recorded on an ordinal scale to allow for the rapid surveying of a large area. Examples include regional surveys of weed density (Goodsell et al.,, 2021) or aerial surveys of defoliation from forest insect outbreaks (e.g., Ministère des Forêts de la Faune et des Parcs du Québec,). If the ordinal variable being measured is also dependent over successive points in time, then the analysis of these data requires methods adapted to ordinal time series.

Important developments have been made recently for discrete-valued time series (Weiß,, 2018; Davis et al.,, 2016). Many models exist for count data (Fokianos and Tjøstheim,, 2012; Fokianos et al.,, 2020), binary data (Manner et al.,, 2016; Moysiadis and Fokianos,, 2014) categorical data (Fokianos and Kedem,, 2003) and even multivariate mixed data (Debaly and Truquet,, 2023). Unfortunately, ordinal categorical time series do not receive much attention. Indeed, since about 1980, authors have distinguished between ordered- and unordered-scale categorical data. Fokianos and Kedem, (2003) introduced generalized linear-type models and distinguished models for ordinal and nominal time series. Their work was extended by Moysiadis and Fokianos, (2014) for nominal data. The latter proposed a model nested in the broad class of observation-driven (Cox et al.,, 1981) models for time series data (see, for example, Davis et al., (2003)).

In this research, we present a new observation-driven autoregressive model for ordinal categorical time series data, which we name an adjacent-category autoregressive model (hereafter ACAR). For other methods for modeling nomimal categorical data, we refer interested readers to Weiß, (2020) for advances in so-called discrete autoregressive–moving-average model (DARMA), presented in Jacobs and Lewis, (1978), and to Berchtold and Raftery, (2002) for the mixture transition distribution model (MTD). The probability stability properties of the ACAR models are derived by applying the perturbation and coupling results of Truquet, (2020). We show that the conditional maximum likelihood estimator is consistent and asymptotically Gaussian distributed. We propose Portemanteau tests (Akashi et al.,, 2021) to verify the adequacy of the ACAR model. It is shown that under the null hypothesis of adequacy, the sum of squares of the first few autocorrelations of certain model residuals is asymptotically chi-square distribution, if scaled appropriately. We also propose a statistical test for assessing the equality of a pair of ACAR models. Our method is then applied to tree-ring records of spruce budworm (Choristoneura fumiferana; SBW) defoliation in two regions of the eastern Canadian boreal forest. We investigate autoregressive effects and the exogenous effect of climate variables on outbreak development.

2 Materials and methods

Notations and conventions

For any sequence $(u_{n})_{n\in\mathbb{N}},$ we adopt the convention $\sum_{i=0}^{-1}u_{i}=0$ . For a positive real number $x$ , we set $\log^{+}(x)=\log(x)$ if $x\geq 1$ and $0$ otherwise. When $x$ is a vector of $\mathbb{R}^{d},|x|_{1}$ is the standard norm- $1$ of $x$ . For a $(m,n)-$ matrix $A=(a_{i,j})_{1\leq i\leq m,1\leq j\leq n},\|A\|_{1}=\mathrm{max}_{1\leq j\leq n}\sum_{i=1}^{m}|a_{i,j}|,A^{\top}=(a_{j,i})_{1\leq j\leq n,1\leq i\leq m}$ is the $(n,m)-$ matrix transpose of $A$ and $A[i:j,k:\ell]$ represents the submatrix of $A$ with entries $(a_{u,v})_{i\leq u\leq j,k\leq v\leq\ell}.$ For $A_{1},\ldots,A_{p}$ matrices with $r$ rows and $c_{1},\ldots,c_{p}$ columns, $A=(A_{1}|\cdots|A_{p})$ is the column concatenation of $A_{1},\ldots,A_{p}.$ For the two vectors $a=(a_{1},\ldots,a_{p})$ and $b=(b_{1},\ldots,b_{p})$ of $\mathbb{R}^{p},$ $a\odot b=(a_{1}b_{1},\ldots,a_{p}b_{p})$ is the Hadamard product of $a$ and $b.$ For an $m\times m$ matrix $A$ , the spectral radius of $A$ is denoted by $\rho(A).$

2.1 General formulation

We denote $Y_{t}$ as the response variable. In our example in Section 3, $Y_{t}$ corresponds to the defoliation level for each area classified on a scale of 0 to 3, corresponding to no defoliation (approx. $0\%$ ) or a low (approx. $1\%-35\%$ ), moderate ( $36\%-70\%$ ) or high ( $71\%-100\%$ ) fraction of the year’s foliage defoliated by SBW. Our model is given by

	$\displaystyle Y_{k,t}=\mathds{1}\left\{\sum_{k=0}^{j-1}\pi_{k,t}\leq U_{t}<\sum_{k=0}^{j}\pi_{k,t}\right\},\quad$	$\displaystyle 0\leq j\leq K,$		(2.1)
	$\displaystyle\log\frac{\pi_{j,t}}{\pi_{j-1,t}}=:\eta_{j,t}=\omega_{j}+\gamma^{\top}X_{t-1}+\alpha^{\top}\overline{Y}_{t-1}+\beta_{j}\eta_{j,t-1},$	$\displaystyle\quad 1\leq j\leq K,t\in\mathbb{Z},$		(2.2)

where $\omega_{j}\in\mathbb{R},\beta_{j}\in\mathbb{R}^{*},j=1,\ldots,K,\gamma\in\mathbb{R}^{P}\text{ and }\alpha\in\mathbb{R}^{K}$ . The vector $\overline{Y}_{t}=(Y_{1,t},\ldots,Y_{K,t})\in\{0,1\}^{K}$ , where for $1\leq j\leq K,\leavevmode\nobreak\ Y_{j,t}=1$ when $Y_{t}=j.$ The vector $(Y_{0,t},\ldots,Y_{K,t})$ is then the one-hot encoding of the categorical random variable $Y_{t}$ valued in $\{0,\ldots,K\}.$ It is well known that the parameters $\beta_{j},j=1,\ldots,K$ act as feedback parameters; $\beta_{j},j=1,\ldots,K$ allow us to extend the memory of Model (2.1)-(2.2) beyond lag 1 when the effect of lagged values decreases quickly when $j=1,\ldots,K,|\beta_{j}|<1.$

Others specifications for probabilities

At least two other ratios can be considered in place of $\frac{\pi_{j,t}}{\pi_{j-1,t}}$ in Equation (2.2). For example, $\frac{\sum_{i=1}^{j}\pi_{i,t}}{\sum_{i=j+1}^{K}\pi_{i,t}},j=1,\ldots,K$ corresponds to a cumulative-logit model (Agresti,, 2010; Fokianos and Kedem,, 2003), and $\frac{\pi_{j,t}}{\sum_{i=j+1}^{K}\pi_{i,t}},j=1,\ldots,K$ is a continuation-ratio logit model (Agresti,, 2010). The most popular of these three ratios for an ordinal response is the cumulative-logit model. However, the adjacent-category model presents some advantages for interpretability. We detail some of these advantages below.

Let us denote by $X_{p,t}$ the $p-th$ covariate and $X_{-p,t}$ the remaining covariates. Setting

R(X_{p,t},X_{-p,t},\overline{Y}_{t-1})=\frac{\pi_{j,t}}{\pi_{j-1,t}}\text{ and }R_{k}(X_{p,t},X_{-p,t},\overline{Y}_{t-1})=\frac{\pi_{j+k,t}}{\pi_{j-1,t}},

the ratio of probabilities of two successive values of $Y_{t}$ and this ratio when the difference between levels is $k+1$ for two different values $X_{-p,t}$ and $X_{-p,t}^{\prime}$ is

\frac{R(X_{p,t}isX_{-p,t},\overline{Y}_{t-1})}{R(X_{p,t}^{\prime},X_{-p,t},\overline{Y}_{t-1})}=\exp\left(\gamma_{p}(X_{p,t}-X_{p,t}^{\prime})\right)\text{ and }\frac{R_{k}(X_{p,t},X_{-p,t},\overline{Y}_{t-1})}{R_{k}(X_{p,t}^{\prime},X_{-p,t},\overline{Y}_{t-1})}=\left[\frac{R(X_{p,t},X_{-p,t},\overline{Y}_{t-1})}{R(X_{p,t}^{\prime},X_{-p,t},\overline{Y}_{t-1})}\right]^{k+1},

where $\gamma_{p}$ is the regression parameter of the covariate $(X_{p,t})_{t\in\mathbb{Z}}$ . These quantities do not depend on the category $j.$

Moreover, for covariates having $\gamma_{p}<0$ , larger values of $X_{p,t}$ are associated with lower values of $Y_{t}$ , and, in contrast, if $\gamma_{p}>0$ , as $X_{p,t}$ increases $Y_{t}$ is more likely to fall into higher categories.

Furthermore, Model (2.1)-(2.2) is similar to the category baseline logistic model for nominal data because

\lambda_{j,t}:=\log\frac{\pi_{j,t}}{\pi_{0,t}}=\sum_{k=0}^{j-1}\log\frac{\pi_{k+1,t}}{\pi_{k,t}}=\sum_{k=0}^{j-1}\eta_{j+1}.

(2.3)

For $k=1,\ldots,K$ ,

\pi_{k,t}=\frac{\mathrm{e}^{\sum_{j=1}^{k}\eta_{j,t}}}{1+\sum_{i=1}^{K}\mathrm{e}^{\sum_{j=1}^{i}\eta_{j,t}}},\quad\pi_{0,t}=1-\sum_{k=1}^{K}\pi_{k,t}.

(2.4)

Equal slopes model

Equations (2.1)-(2.2) coincide with the equal slopes model in multinomial regression because the parameters $\alpha$ and $\gamma$ in Equation (2.2) do not depend on $j.$ However, one can note that the coefficients of latent processes are indexed with the categories. Nonetheless, it is possible to build a full equal slopes model by replacing $\beta_{j}\eta_{j}$ in Equation (2.2) with $\sum_{k=1}^{K}\beta_{k}\eta_{k}.$ The latter model is not a nested model of the considered model and therefore cannot be compared with this model using nullity coefficients tests (see, for example, Chap. 9 in Heyde, (2013)). Some arguments can be made in favor of models having specific effects, i.e.,

\displaystyle\log\frac{\pi_{j,t}}{\pi_{j-1,t}}=:\eta_{j,t}=\omega_{j}+\gamma_{j}^{\top}X_{t-1}+\alpha_{j}^{\top}\overline{Y}_{t-1}+\beta_{j}\eta_{j,t-1},

\displaystyle\quad 1\leq j\leq K,t\in\mathbb{Z}.

(2.5)

For the application under consideration here, our model allows us to disentangle the effect of climate on different phases of SBW outbreaks. Here, the parameter $\gamma_{1}$ represents the effect of covariates on the initial outbreak phase, and $\gamma_{j},j>1$ represents its outbreak development. Such an approach appears in Bouchard et al., (2006) in which the authors apply Bayesian model averaging (Anderson and Burnham,, 2004; Soti et al.,, 2019).

Model for more lags

For a more general model, Equation (2.2) can be replaced by

\log\frac{\pi_{j,t}}{\pi_{j-1,t}}=:\eta_{j,t}=\omega_{j}+\gamma^{\top}X_{t-1}+\sum_{i=1}^{r}\alpha_{i}^{\top}\overline{Y}_{t-i}+\sum_{i=1}^{r}\beta_{j,i}\eta_{j,t-i},\quad 1\leq j\leq K,t\in\mathbb{Z}

for a fixed $r.$ Nonetheless, a model having one lag is often sufficient to capture the properties of a series. However, it is also possible to consider a long-memory model (Doukhan et al.,, 2016),

\log\frac{\pi_{j,t}}{\pi_{j-1,t}}=:\eta_{j,t}=\omega_{j}+\gamma^{\top}X_{t-1}+\sum_{i=1}^{\infty}\alpha_{i}^{\top}\overline{Y}_{t-i}\quad 1\leq j\leq K,t\in\mathbb{Z},

although this lies beyond the scope of our work.

Other statistical models

Because the raw data are supposed to lie within a $(0,1)$ interval, statistical models for $(0,1)-$ distributed random variables can be used. Such models, known as beta autoregression models, are discussed by Gorgi and Koopman, (2021) and Guolo and Varin, (2014) among others. However, in our case, some remarkable values, such as extrema $0\%$ and $100\%$ and some central values, have important empirical frequencies; therefore, models for continuous data on $(0,1)$ will not perform well in this case. In addition, even if the defoliation is a continuous phenomenon, small changes in its structure will not be indicated within the sample.

The following result stands for the stability properties of Equation (2.1)-(2.2).

Proposition 1.

Consider the Model (2.1)-(2.2) and assume that $(X_{t},U_{t})_{t\in\mathbb{Z}}$ is stationary and ergodic with $\mathbb{E}\log_{+}|X_{0}|_{1}<\infty$ . If for $k=1,\ldots,K,|\beta_{k}|<1$ , then there exists a unique non-anticipative, stationary and ergotic sequence $(Y_{t},X_{t},\eta_{t})_{t\in\mathbb{Z}}$ solution for the Model (2.1)-(2.2). In addition, for $r\geq 1$ , $\mathbb{E}|\eta_{0}|_{1}^{r}<\infty$ if $\mathbb{E}|X_{0}|_{1}^{r}<\infty.$

Other more restrictive conditions for stability can be derived using the Debaly and Truquet, (2023) Theorem 1 based on Debaly and Truquet, (2021). Indeed, these conditions imply the parameters $\alpha$ . A similar comparison can be made between the condition of Theorem 1 in Moysiadis and Fokianos, (2014) and that of Proposition 4 in Truquet, (2020).

2.2 Statistical inference

From Equation (2.2),

\eta_{t}=\omega+\Gamma X_{t-1}+A\overline{Y}_{t-1}+B\eta_{t-1},

where $\omega=(\omega_{j})_{1\leq j\leq K},\Gamma^{\top}=(\gamma|\cdots|\gamma),A^{\top}=(\alpha|\cdots|\alpha)$ and $B=\mathrm{diag}(\beta_{j},1\leq j\leq K)$ . The vector of the parameters of the model is denoted by $\theta=(\omega^{\top},\mathrm{vec}(\Gamma)^{\top},\alpha^{\top},\beta_{1},\ldots,\beta_{K})^{\top}$ , and we assume that $\Theta(\ni\theta)$ is a compact set. The true value of the parameter is $\theta_{0}$ . For what follows, we write $\eta_{t}(\theta)$ (resp. $\eta_{j,t}(\theta),\pi_{j,t}(\theta),1\leq j\leq K)$ to make the latent process depending on the parameter $\theta.$

For $k=1,\ldots,K$ ,

\pi_{k,t}(\theta)=\frac{\mathrm{e}^{\sum_{j=1}^{k}\eta_{j,t}(\theta)}}{1+\sum_{i=1}^{K}\mathrm{e}^{\sum_{j=1}^{i}\eta_{j,t}(\theta)}},\quad\theta\in\Theta,

and the parameter $\theta$ can by estimated by

\hat{\theta}_{n}=\underset{\theta\in\theta}{\mathrm{argmin}}-\sum_{t=1}^{n}\left(\sum_{k=1}^{K}Y_{k,t}\left(\sum_{j=1}^{k}\eta_{j,t}(\theta)\right)-\log\left(1+\sum_{k=1}^{K}\mathrm{e}^{\sum_{j=1}^{k}\eta_{j,t}(\theta)}\right)\right)

(2.6)

with initial values of $\eta_{0}$ (resp. $\overline{Y}_{0}$ ). This estimator corresponds to the conditional maximum likelihood estimator of the Model (2.1)-(2.2). Let us set

\displaystyle s_{t}(\theta)=-\sum_{k=1}^{K}\left(\sum_{j\geq k}Y_{j,t}-\frac{\sum_{j\geq k}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}}{1+\sum_{k=1}^{K}\mathrm{e}^{\sum_{j=1}^{k}\eta_{j,t}(\theta)}}\right)\nabla_{\theta}\eta_{k,t}(\theta)

and

	$\displaystyle h_{t}(\theta)$	$\displaystyle=$	$\displaystyle\sum_{k=1}^{K}\left\{\frac{1}{\left(1+\sum_{k=1}^{K}\mathrm{e}^{\sum_{j=1}^{k}\eta_{j,t}(\theta)}\right)^{2}}\left[\left(\sum_{j\geq k}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}\right)\sum_{\ell=1}^{k-1}\left(1+\sum_{j=1}^{\ell-1}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}\right)\nabla_{\theta}\eta_{\ell,t}(\theta)+\right.\right.$
			$\displaystyle\left.\left.\left(1+\sum_{j=1}^{k-1}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}\right)\sum_{\ell=k}^{K}\left(\sum_{j\geq\ell}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}\right)\nabla_{\theta}\eta_{\ell,t}(\theta)\right]\right\}\nabla_{\theta}^{\top}\eta_{k,t}(\theta)$

with $\nabla_{\theta}^{\top}\eta_{k,t}(\theta)=\left(\frac{\partial{\eta_{k,t}}}{\partial{\theta_{i}}}\right)_{1\leq i\leq 3K+P}$ (see Appendix for the computations).

Proposition 2.

1.

Suppose that (a) the conditions of Proposition 1 hold for any $\theta\in\Theta$ with $\mathbb{E}|X_{0}|_{1}<\infty$ and (b) $\eta_{0}(\theta)=\eta_{0}(\theta_{0})\leavevmode\nobreak\ \mathbb{P}_{\theta_{0}}-a.s\Rightarrow\theta=\theta_{0}.$ Then, the estimator (Equation (2.6)) is consistent, i.e., a.s.

$\lim_{n\to\infty}\hat{\theta}_{n}=\theta.$
2.

If, in addition, $\theta_{0}$ is located in the interior of $\Theta$ , $\mathbb{E}|X_{0}|_{1}^{2}<\infty$ , and the matrix $\mathbb{E}h_{0}(\theta_{0})$ is invertible.

$\lim_{n\to\infty}\sqrt{n}\left(\hat{\theta}_{n}-\theta_{0}\right)=\mathcal{N}\left(0,J^{-1}L{J^{-1}}^{\top}\right),$

where $L=\mathbb{E}s_{0}(\theta_{0})s_{0}^{\top}(\theta_{0})$ and $J=\mathbb{E}h_{0}(\theta_{0}).$

We do not prove the previous result. Interested readers will find some elements of the proof in Moysiadis and Fokianos, (2014), proof of Theorem 2. The same lines of proof can be found in Straumann et al., (2006) for the GARCH model or Debaly and Truquet, (2023) for a multivariate model for time series of mixed data. A condition for the identifiability assumption, $\eta_{0}(\theta)=\eta_{0}(\theta_{0})\leavevmode\nobreak\ \mathbb{P}_{\theta_{0}}-a.s\Rightarrow\theta=\theta_{0},$ based on Debaly and Truquet, (2023), is given in the Appendix. We asume the invertibility of the matrix $J$ . Finally, in the estimation procedure for an observation-driven model, one often requires the stationary condition only for the true value of parameter $\theta_{0}$ and the condition for contraction for the map that defines the latent process (see Chap. 7 in Francq and Zakoian, (2019) or Straumann et al., (2006)). In our case, these coincide.

2.3 Portemanteau-type tests for diagnostic checking

Here, we test the adequation of the Model (2.1)-(2.2) and the set of assumptions of Proposition 2. The null hypothesis is then that Model (2.1)-(2.2) holds with $\theta=\theta_{0}.$ One can look at the residuals:

e_{t}(\theta)=\left(\sum_{j\geq k}Y_{j,t}-\frac{\sum_{j\geq k}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}}{1+\sum_{k=1}^{K}\mathrm{e}^{\sum_{j=1}^{k}\eta_{j,t}(\theta)}}\right)_{1\leq k\leq K},t\in\mathbb{Z},\theta\in\Theta

(2.7)

and the autocorrelations:

\rho_{h}(\theta)=\frac{1}{n}\sum_{t=1}^{n}e_{t}(\theta)\odot e_{t-h}(\theta),h=1,\ldots,n,\theta\in\Theta.

(2.8)

The vector of $q$ successive autocorrelations is $\rho_{1:q}(\theta)=\mathrm{vec}((\rho_{1}(\theta)|\cdots|\rho_{q}(\theta))^{\top})$ , and its evaluation at $\hat{\theta}_{n}$ is $\rho_{1:q}(\hat{\theta}_{n})=\mathrm{vec}((\rho_{1}(\hat{\theta}_{n})|\cdots|\rho_{q}(\hat{\theta}_{n}))^{\top}).$ The vector $\rho_{1:q}(\theta)$ evaluated at $\theta_{0}$ is denoted by $\rho_{1:q}.$ Note that $\rho_{1:q}=n^{-1}\sum_{t=1}^{n}m_{t}$ , and under the null hypothesis $\mathbb{E}(m_{t}|\mathcal{F}_{t-1})=0,$ i.e., $(m_{t})_{t\in\mathbb{Z}}$ , is a martingale difference sequence. Obviously, there are many choices for residuals that make $(e_{t}(\theta_{0})\odot e_{t-h}(\theta_{0}))_{t\in\mathbb{Z}}$ a martingale difference sequence, such as

\left(Y_{k,t}-\frac{\mathrm{e}^{\sum_{j=1}^{k}\eta_{j,t}(\theta)}}{1+\sum_{k=1}^{K}\mathrm{e}^{\sum_{j=1}^{k}\eta_{j,t}(\theta)}}\right)_{1\leq k\leq K},t\in\mathbb{Z},\theta\in\Theta.

(2.9)

The residuals of interest here is relevant because

s_{t}(\theta)=-\sum_{k=1}^{K}e_{k,t}(\theta)\nabla_{\theta}\eta_{k,t}(\theta)

with $e_{k,t}(\theta)$ as the $k$ -th element of $e_{t}(\theta)$ . This allows us to derive the asymptotic distribution of $\sqrt{n}\rho_{1:q}.$

The Jacobian matrix $\xi_{t}$ of $e_{t}$ with respect to $\theta$ is

	$\displaystyle\xi_{t}(\theta)(k,\cdot)^{\top}$	$\displaystyle=$	$\displaystyle-\frac{1}{\left(1+\sum_{k=1}^{K}\mathrm{e}^{\sum_{j=1}^{k}\eta_{j,t}(\theta)}\right)^{2}}\left[\left(\sum_{j\geq k}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}\right)\sum_{\ell=1}^{k-1}\left(1+\sum_{j=1}^{\ell-1}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}\right)\nabla_{\theta}\eta_{\ell,t}(\theta)+\right.$
			$\displaystyle\left.\left(1+\sum_{j=1}^{k-1}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}\right)\sum_{\ell=k}^{K}\left(\sum_{j\geq\ell}\mathrm{e}^{\sum_{i=1}^{j}\eta_{i,t}(\theta)}\right)\nabla_{\theta}\eta_{\ell,t}(\theta)\right].$

For the sake of readability, we write $\hat{\rho}_{1:q}$ (resp. $e_{t}$ and $\xi_{t}$ ) for $\rho_{1:q}(\hat{\theta}_{n})$ (resp. $e_{t}(\theta_{0})$ and $\xi_{t}(\theta_{0})$ ).

Let us set $C_{q}=(c_{1}|\cdots|c_{q})^{\top}$ with $c_{k}=(c_{k,1}|\cdots|c_{k,q})$ . For $h=1,\ldots,q,c_{k,h}^{\top}=\mathbb{E}e_{k,-h}\xi_{0}(k,\cdot),k=1,\ldots,K$ , and $\hat{C}_{K}$ is its empirical counterpart. Let us set

d_{i,j}({u,v})=\mathbb{E}\mathrm{e}_{i+1,0}\mathrm{e}_{i+1,-u}\mathrm{e}_{j+1,0}\mathrm{e}_{j+1,-v},i,j=0,\dots,K-1,u,v=1,\dots.q,

$G=(g_{1}|\cdots|g_{K})^{\top}$ , $g_{k}=(g_{k,1}^{\top}|\cdots|g_{k,q}^{\top})$ with

g_{k,h}=\mathbb{E}e_{k,-h}e_{k,0}\sum_{\ell=1}^{K}e_{\ell,0}\nabla_{\theta_{0}}^{\top}\eta_{\ell,0}(\theta){J^{-1}}^{\top},1\leq h\leq q,

and $D$ is the $Kq\times Kq$ matrix with elements given by

D[(iq+1):(i+1)q,(jq+1):(j+1)q]=[d_{i,j}(u,v)]_{1\leq u,v\leq q},0\leq i,j\leq K-1,

where $\hat{D}$ , $\hat{G}$ and $\hat{S}$ are the empirical counterparts of $D$ , $G$ and $S.$ Let us set $W=D+C_{q}J^{-1}L{J^{-1}}^{\top}C_{q}^{\top}+GC_{q}^{\top}+C_{q}G^{\top}$ and $\hat{W}$ as its empirical counterpart.

We require the following additional assumption, which represents the invertibility of $W.$ For $k=1,\ldots,K$ , let us set

\iota_{k,t}=\left(0_{q}^{\top},\ldots,0_{q}^{\top},\underbrace{e_{k,t-1}(\theta_{0}),\ldots,e_{k,t-q}(\theta_{0})}_{\text{k-th block of q elements}},0_{q}^{\top},\ldots,0_{q}^{\top}\right)^{\top}.

INV-1

For $\lambda\in\mathbb{R}^{Kq},$ if

\lambda^{\top}\sum_{k=1}^{K}e_{k,0}\left(\iota_{k,0}+C_{q}J^{-1}\nabla_{\theta_{0}}\eta_{k,0}(\theta)\right)=0\text{ a.s.},

then $\lambda=0.$

Proposition 3.

Under the assumptions of Proposition 2 with an additional moment condition on $X_{0}$ and INV-1, as $n$ tends to $\infty,$

n\hat{\rho}_{1:q}^{\top}\hat{W}^{-1}\hat{\rho}_{1:q}\Rightarrow\chi_{Kq}^{2}.

Then the adequacy of the ACAR Model (2.1)-(2.2) is rejected at the asymptotic level $\alpha$ if

n\hat{\rho}_{1:q}^{\top}\hat{W}^{-1}\hat{\rho}_{1:q}>\chi_{Kq}^{2}(1-\alpha).

It is well known that Portemanteau tests are omnibus tests (Francq and Zakoian,, 2019). They are less powerful than specific alternative tests that correspond to the nullity of the coefficient of the extra-lag model. Our study focuses only on a model of lag $1.$

2.4 Comparison test of two adjacent-category time series models

When we are interested in two or more sites for the same process, it may happen that we have the same models for certain sites with coefficient values that are a priori different. We propose a two-by-two model comparison test. For example, when we are interested in two ecological sites, we assume that the model on one site is driven by $\theta_{0}^{(1)}$ and that of the second site is driven by $\theta_{0}^{(2)}$ . We want to test whether the $p-th$ covariate has the same effect on both sites, that is $\theta_{p,0}^{(1)}=\theta_{p,0}^{(2)}$ , and also globally $\theta_{0}^{(1)}=\theta_{0}^{(2)}$ . For $j=1,2$ , let us denote $(Y_{t}^{(j)},X_{t}^{(j)})_{t\in\mathbb{Z}}$ as the processes $(Y_{t},X_{t})$ coinciding with the site $j$ . The issue here is testing the equality of the distributions of $(Y_{t}^{(1)},X_{t}^{(1)})_{t\in\mathbb{Z}}$ and $(Y_{t}^{(2)},X_{t}^{(2)})_{t\in\mathbb{Z}}$ . Tests of the laws of two strictly stationary processes have been investigated previously. They consist of tests on: 1. marginal distributions (Doukhan et al.,, 2020, 2015); 2. jointly finite distributions (Dhar et al.,, 2014); and 3. all possible $d-$ dimensional joint distributions of both processes (Pommeret et al.,, 2022). With regards to the parametric form of Equation (2.1)-(2.2), we can perform the test of equality of laws of entire processes without requiring their finite distributions.

For $\theta\in\Theta$ , $(s_{t}^{(j)}(\theta))_{t\in\mathbb{Z}}$ and $(h_{t}^{(j)}(\theta))_{t\in\mathbb{Z}}$ are respectively the score and Hessian processes evaluated on $\theta$ for the process $(Y_{t}^{(j)},X_{t}^{(j)})_{t\in\mathbb{Z}}.$ Under the conditions of Proposition 2 for $(Y_{t}^{(j)},X_{t}^{(j)})_{t\in\mathbb{Z}},$

\lim_{n\to\infty}\frac{1}{\sqrt{n}}\sum_{t=1}^{n}s_{t}^{(j)}(\theta_{0}^{(j)})=\mathcal{N}(0,L_{j})

and a.s.

\lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}h_{t}^{(j)}(\theta_{0}^{(j)})=J_{j},j=1,2.

Let us set $S=\mathbb{E}s_{0}^{(2)}(\theta_{0}^{(2)})s_{0}^{(1)}(\theta_{0}^{(1)})^{\top}$ , $V=J_{1}^{-1}L_{1}{J_{1}^{-1}}^{\top}+J_{2}^{-1}L_{2}{J_{2}^{-1}}^{\top}+J_{2}^{-1}S{J_{1}^{-1}}^{\top}+{J_{1}^{-1}}S^{\top}{J_{2}^{-1}}^{\top}$ and $\hat{V}$ its empirical counterpart. We denote $\hat{\theta}_{n}^{(1)}$ (resp. $\hat{\theta}_{n}^{(2)}$ ) the estimator of $\theta_{0}^{(1)}$ (resp. $\theta_{0}^{(2)}$ ), given by Equation (2.6) , and $\hat{\theta}_{p,n}^{(1)}$ (resp. $\hat{\theta}_{p,n}^{(2)}$ ) as the $p-th$ element of $\theta_{0}^{(1)}$ (resp $\theta_{0}^{(2)}$ ).

We require the following additional assumption for the invertibility of $V$ .

INV-2

For $\lambda\in\mathbb{R}^{3K+P},$ if

\lambda^{\top}\left(J_{1}^{-1}s_{t}^{(1)}(\theta_{0}^{(1)})+J_{1}^{-2}s_{t}^{(2)}(\theta_{0}^{(2)})\right)=0\text{ a.s.},

then $\lambda=0.$

Proposition 4.

Suppose that $\left(s_{t}^{(1)}(\theta_{0}^{(1)}),s_{t}^{(2)}(\theta_{0}^{(2)})\right)_{t\in\mathbb{Z}}$ is stationary and ergodic. Under the assumptions of Proposition 2 and INV-2:

1.

As $n$ tends to $\infty,$

$n^{1/2}\frac{\hat{\theta}_{p,n}^{(1)}-\hat{\theta}_{p,n}^{(2)}}{\sqrt{\hat{V}(p,p)}}\Rightarrow\mathcal{N}(0,1).$

The hypothesis $\theta_{p,0}^{(1)}=\theta_{p,0}^{(2)}$ is then rejected at the level $\alpha$ if

$\left|n^{1/2}\frac{\hat{\theta}_{p,n}^{(1)}-\hat{\theta}_{p,n}^{(2)}}{\sqrt{\hat{V}(p,p)}}\right|>q(1-\alpha/2).$

As $n$ tends to $\infty,$

n\left(\hat{\theta}_{n}^{(1)}-\hat{\theta}_{n}^{(2)}\right)^{\top}\hat{V}^{-1}\left(\hat{\theta}_{n}^{(1)}-\hat{\theta}_{n}^{(2)}\right)\Rightarrow\chi^{2}_{P}.

The hypothesis $\theta_{0}^{(1)}=\theta_{0}^{(2)}$ is then rejected at the level $\alpha$ if

n\left(\hat{\theta}_{n}^{(1)}-\hat{\theta}_{n}^{(2)}\right)^{\top}\hat{V}^{-1}\left(\hat{\theta}_{n}^{(1)}-\hat{\theta}_{n}^{(2)}\right)>\chi^{2}_{P}(1-\alpha).

On the stationary assumption for scores

Note that we require the joint process of scores $\left(s_{t}^{(1)}(\theta_{0}^{(1)}),s_{t}^{(2)}(\theta_{0}^{(2)})\right)_{t\in\mathbb{Z}}$ to be stationary and ergodic. This assumption is satisfied as soon as the process $\left(X_{t}^{(1)},X_{t}^{(2)},U_{t}^{(1)},U_{t}^{(2)}\right)_{t\in\mathbb{Z}}$ is stationary and ergodic, where ${(U_{t}^{(1)})}_{t\in\mathbb{Z}}$ (resp. ${(U_{t}^{(2)})}_{t\in\mathbb{Z}}$ ) is the process of uniform variable that generates $(Y_{t}^{(1)})_{t\in\mathbb{Z}}$ (resp. $(Y_{t}^{(2)})_{t\in\mathbb{Z}}$ . The case for the independency of the processes $\left(X_{t}^{(1)},U_{t}^{(1)}\right)_{t\in\mathbb{Z}}$ and $\left(X_{t}^{(2)},U_{t}^{(2)}\right)_{t\in\mathbb{Z}}$ is particularly interesting because it entails S = 0. It can be also interesting to consider the case where $\left(X_{t}^{(1)}\right)_{t\in\mathbb{Z}}$ and $\left(X_{t}^{(2)}\right)_{t\in\mathbb{Z}}$ are independent, but $\left(U_{0}^{(1)},U_{0}^{(2)}\right)_{t\in\mathbb{Z}}$ is degenerate; for example $U_{0}^{(1)}=U_{0}^{(2)}$ or $U_{0}^{(1)}=1-U_{0}^{(2)}.$ We will investigate the latter examples in the numerical experimentation section.

2.5 Model implementation

The numerical implementation of Model (2.1)-(2.2) and the statistical inference methods are implemented using R (R Core Team,, 2022). We choose $\beta_{j,0}=0.5,j=1,\ldots,K$ for the initial values of the latent processes, whereas the first observation of the paths acts to initialize the sequence $(Y_{t})_{t\in\mathbb{Z}}$ . The initial value of the score sequence is chosen to be equal to $0_{3K+P}.$ The loss function (Equation (2.6)) is optimized for $20$ different random initializations using the L-BFGS-B method (Byrd et al.,, 1995), a derivative-free optimization routine that allows box constraints, run in the R function optim. The estimated $\theta$ values are those corresponding to the minimum loss function value over all initializations. Let us set $\theta_{j}$ as the $j-th$ element of the parameters vector $\theta$ , and the set $\Theta\ni\theta$ is

\Theta=:\Theta_{\varepsilon}=\left\{\theta\in\mathbb{R}^{3K+P}:\theta_{3K+P-j}\in[-1+\varepsilon,1-\varepsilon],j=1,2,3\text{ and }\theta_{j}\in[-1/\varepsilon,1/\varepsilon],j=1,\ldots,2K+P\right\}

for a fixed and known $\varepsilon>0$ . Here we fix $\varepsilon=1\mathrm{e}^{-6}$ . All methods are made available using two top level functions poly_autoregression and comparison_test_dependent. The first takes as an argument the observed paths of sequences $(\overline{Y}_{t})_{t\in\mathbb{Z}}$ and $(X_{t})_{t\in\mathbb{Z}}$ and returns, among others, the estimated values of parameters $\theta$ , its standard values and the result of the Portemanteau test in Proposition 3. The second takes as arguments the returned values of two models fitted with the poly_autoregression function and gives the statistics and the p-values of the test in Proposition 4.

2.6 Numerical experimentation

Recall that the vector of the parameters is denoted by $\theta=(\omega^{\top},\mathrm{vec}(\Gamma)^{\top},\alpha^{\top},\beta_{1},\ldots,\beta_{K})^{\top}$ . For numerical experimentation, we use a simulated adjacent categorical response variable with four levels, as for our real data example, and five simulated covariates. We consider the sets of parameters of Table 1, chosen to meet the condition $|\beta_{j}|<1,j=1,\ldots K$ in Proposition 1.

	$\omega_{1}$	$\omega_{2}$	$\omega_{3}$	$\gamma_{1}$	$\gamma_{2}$	$\gamma_{3}$	$\gamma_{4}$	$\gamma_{5}$	$\alpha_{1}$	$\alpha_{2}$	$\alpha_{3}$	$\beta_{1}$	$\beta_{2}$	$\beta_{3}$
$\theta_{0}^{(1)}$	1.2	0.7	0.5	-0.8	1.5	-1.5	2.0	2.0	0.3	-0.3	0.5	0.8	-0.2	0.3
$\theta_{0}^{(2)}$	1.2	0.7	0.5	0.8	-1.5	1.5	-2.0	-2.0	-0.3	0.3	-0.5	-0.8	0.2	-0.3
$\theta_{0}^{(3)}$	1.2	0.7	1.5	0.8	-1.5	-1.5	2.0	-2.0	0.3	-0.3	-0.5	-0.8	0.2	-0.3

Table 1: The sets of parameters used in the numerical experimentation

Finite sample performance of the maximum likelihood estimator

We investigate the finite sample performance of the estimators (Equation(2.6)) for the set of the parameters $\theta_{0}^{(1)},\theta_{0}^{(2)}$ and $\theta_{0}^{(3)}$ . For each value of $\theta_{0},$ the data generation process consists of drawing the covariates $(X_{t})_{t=1..n}$ from a standard Gaussian random distribution on $\mathbb{R}^{5}$ and $(Y_{t})_{t=1..n}$ , according to Equation (2.1). We consider four sample sizes: $n=70,100,300\text{ and }500.$ We repeat this process $B=499$ times, and for each sample we compute the estimate $\hat{\theta}_{n}$ and the covariance matrix of Estimator 2. Table 3 in the Appendix presents the simulation results. The line CMLE refers to the average estimated values for the parameters, and TSE refers to the mean of the estimated values for theoretical standard error.

\mathrm{CMLE}=B^{-1}\sum_{b=1}^{B}\hat{\theta}_{T}^{(b)}\text{ and }\mathrm{TSE}=B^{-1}\sum_{b=1}^{B}\mathrm{diag}\left\{{\hat{J}^{-1(b)}}{\hat{L}}^{(b)}{\hat{J}^{-\top(b)}}\right\}^{1/2},

where the superscript $b$ represents the index of replication and $\mathrm{diag}M$ for a matrix $M$ is the diagonal elements of $M$ . MAE and MSE refer to the mean absolute and mean squared errors of the estimator. It appears that the model parameters are well estimated even for $n=100.$ The estimates are less accurate for $n=70.$ Nevertheless, the true values of parameters are recovered in many cases by the confidence interval at a $0.95$ level.

Comparison tests

We set four scenarios to access the quality of Comparison test 4.

1.

The processes $\left(X_{t}^{(1)},U_{t}^{(1)}\right)_{t\in\mathbb{Z}}$ and $\left(X_{t}^{(2)},U_{t}^{(2)}\right)_{t\in\mathbb{Z}}$ are independent, and the true values of the parameters are both equal to $\theta_{0}^{(1)}$ .
2.

The processes $\left(X_{t}^{(1)},U_{t}^{(1)}\right)_{t\in\mathbb{Z}}$ and $\left(X_{t}^{(2)},U_{t}^{(2)}\right)_{t\in\mathbb{Z}}$ are independent, and the true values of the parameters are $\theta_{0}^{(1)}$ and $\theta_{0}^{(3)}$ .
3.

The processes $\left(X_{t}^{(1)}\right)_{t\in\mathbb{Z}}$ and $\left(X_{t}^{(2)}\right)_{t\in\mathbb{Z}}$ are independent, $U_{0}^{(2)}=1-U_{0}^{(1)}$ , and the true values of the parameters are both equal to $\theta_{0}^{(1)}$ .
4.

The processes $\left(X_{t}^{(1)}\right)_{t\in\mathbb{Z}}$ and $\left(X_{t}^{(2)}\right)_{t\in\mathbb{Z}}$ are independent, $U_{0}^{(2)}=U_{0}^{(1)}$ and the true values of the parameters are $\theta_{0}^{(1)}$ and $\theta_{0}^{(3)}$ .

For each of theses scenarios, the length of the paths is $n=500$ , and we perform $B=1000$ times the comparison tests. For scenarios 2 and 4, the null hypothesis is rejected for all $B=1000$ tests, showing a test power near 1. For Scenario 1, the null hypothesis is accepted in 65% cases, and this rate drops to 60% for Scenario 3.

3 Application to spruce budworm defoliation data

Natural and anthropogenic disturbance regimes are majors drivers of change in the structure and function of forest ecosystems (De Grandpré et al.,, 2018; Aakala et al.,, 2023). SBW is the most important defoliator of conifer trees in North America (Lavoie et al.,, 2019; Fuentealba et al.,, 2022; Kayes and Mallik,, 2020; Anderegg et al.,, 2015). This defoliation and the subsequent tree mortality result in marked losses in forest productivity (Grondin et al.,, 1996) and reduce the economic and ecosystem and services provided by these trees and forests (Subedi et al.,, 2023; Scott et al.,, 2023; Sato et al.,, 2023; Chagnon et al.,, 2022). These defoliation events also reduce the boreal forest’s carbon sequestration capacity (Liu et al.,, 2019, 2020). Damage occurs when SBW larvae repeatedly feed on the annual foliage of mature balsam fir (Abies balsamea), white spruce (Picea glauca) and black spruce (Picea mariana). This defoliation reduces production, leading to radial growth suppression and eventual tree mortality (Lavoie et al.,, 2021; Liu et al.,, 2022; Debaly et al.,, 2022). Reduced tree-ring growth can therefore be an indirect indicator of defoliation events. Moreover, defoliated trees are more susceptible to fungus and disease, which leads to a reduction in the quality of the wood available for timber-based industries (MacLean,, 2016; Safranyik et al.,, 2010; Chang et al.,, 2012; Fuentealba et al.,, 2022).

The spread of forest pest epidemics is a complex phenomenon that can occur across a wide spatial scale (Bouchard and Auger,, 2014; Montoro Girona et al.,, 2018). In eastern Canada, the boreal forest is an integral part of the environment, economy, culture and history (Kayes and Mallik,, 2020). Given the ecological and economic consequences of SBW outbreaks, SBW-related defoliation is a major issue in Canadian forestry.

The reconstruction of the dynamics of SBW epidemics over the last centuries shows that the frequency, severity and spatial distribution of SBW outbreaks have changed (Simard et al.,, 2006; Navarro et al.,, 2018; Berguet et al.,, 2021; Girona et al.,, 2023). SBW outbreaks are shifting in a north-eastern manner in boreal Canada (Pureswaran et al.,, 2018; Jactel et al.,, 2019; Navarro et al.,, 2018; Seidl et al.,, 2017; Régnière and Nealis,, 2019). These changes can be explained by changes in the spatial distribution of SBW hosts because of climate shifts and climate effects on SBW population dynamics (Régnière and Nealis,, 2019; D’Amato et al.,, 2023; MacLean,, 2016; Schowalter,, 2022). Accurate prediction of SBW spread requires understanding the interactions between SBW defoliation and climate (Pureswaran et al.,, 2015; Overpeck et al.,, 1990; Régnière and Nealis,, 2019).

Understanding the interactions of climate with natural disturbances is crucial for adapting forest sustainable management to climate change (Seidl et al.,, 2023; Gauthier et al.,, 2023). SBW is also integrated within complex food webs (Eveleigh et al.,, 2007). Interactions with higher trophic levels (predators, pathogens and parasitoids) and lower trophic levels (trees) play a major role in controlling SBW population dynamics. Climate change will create extreme conditions that negatively affect the physiological mechanisms of SBW host species (photosynthesis, evapotranspiration, etc.) and the development, reproduction and mortality rates of SBW parasites and parasitoids (Bouchard and Auger,, 2014; Candau and Fleming,, 2011; Jactel et al.,, 2019; Li et al.,, 2020). It should be noted that climate change will not only affect the vulnerability of the forest to defoliation but also the SBW population dynamics responsible for this defoliation (Pureswaran et al.,, 2018; Jactel et al.,, 2019; Boakye et al.,, 2022). New methodological approaches for understanding climate and natural disturbance interactions are thus essential to develop reliable forecasts to guide forest management strategies in the face of climate change (Gauthier et al.,, 2023).

3.1 Existing models of forest insect outbreaks

Assessing the effects of climate on forest insect outbreaks is an ongoing research topic (Candau and Fleming,, 2011; Navarro et al.,, 2018; Zhu et al.,, 2018; Eickenscheidt et al.,, 2019; Berguet et al.,, 2021). Using the dendrochronological series of growth rings of spruce, Navarro et al., (2018) and Berguet et al., (2021) reconstructed the spatiotemporal dynamics of 20th-century SBW outbreaks across the insect’s range in Quebec in forests across several regions. They then enriched their analysis by including the climate characteristics of each forest. Berguet et al., (2021) found three patterns in the time series of defoliation. They applied k-means clustering (Hastie et al.,, 2009; Bishop and Nasrabadi,, 2006; Shalev-Shwartz and Ben-David,, 2014) to group the annual percentage of trees affected by SBW. One of these clusters was associated with lower annual temperatures; the other clusters corresponded to higher amounts of precipitation. The attempt of Navarro et al., (2018) to distinguish patterns of severity and frequency of SBW distribution paths was not very fruitful. The authors mentioned that the statistical method consisting of ordinary least squares linear regression (Hastie et al.,, 2009; Bishop and Nasrabadi,, 2006; Shalev-Shwartz and Ben-David,, 2014) of percentage of affected trees against climate data employed could be improved by other approaches.

Candau and Fleming, (2011) investigated the response of SBW defoliation in neighboring Ontario to climate change from a predictive point of view. They used random forest regression (Hastie et al.,, 2009; Bishop and Nasrabadi,, 2006; Shalev-Shwartz and Ben-David,, 2014) to relate the frequency of moderate-to-severe defoliation to climate conditions. This work measured the predictive strength of each variable via the decrease in prediction accuracy (i.e., the difference in MSE) between the out-of-bag samples and the out-of-bag samples after a random permutation with respect to the variable. They determined that seasonal temperatures in winter and minimum temperatures in summer and spring are relevant when forecasting the frequency of defoliation.

Eickenscheidt et al., (2019) used GAM models (Hastie,, 2017) and collected defoliation data to associate climate factors to defoliation events in Germany between 1989 and 2015. They reported a significant influence of climatic factors (drought stress, the strongest water deficits, etc.) and a drought-related increase in defoliation, relying on the studies of Ferretti et al., (2014); Zierl, (2004); Kammann and Wand, (2003); Seidling, (2007). These earlier papers demonstrated that lagged effects, especially drought in the preceding year and cumulative drought over several preceding years, have a major influence on defoliation in the following year.

Zhu et al., (2018) used Landsat data to analyze the defoliation of pine forests (Pinus tabulaeformis) around Beijing. They applied ordinary least squares linear regression (Hastie et al.,, 2009; Bishop and Nasrabadi,, 2006; Shalev-Shwartz and Ben-David,, 2014) to the remaining foliage against rainfall in the previous October and total annual hours of sunshine. They determined that 1. high active accumulated temperatures suppress the occurrence of the caterpillar and favor the recovery of trees; 2. heavy rainfall also suppresses outbreaks, especially rainfall in the preceding October; and 3. extensive sunshine can promote outbreaks and activate the insect. The influence of these climate variables on caterpillar abundance would be expected given the biological characteristics of this particular caterpillar, confirming the reliability of their predictive model.

Our contribution

We contribute to this debate by applying the ACAR model to defoliation data from eastern Canadian forests. We focus our efforts on methods for interpreting data in contrast with empirical forecasting methods such as random forest (Candau and Fleming,, 2011). Defoliation data are typically analyzed using predictive methods such as random forest (Candau and Fleming,, 2011), parametric statistical models like linear regression (Zhu et al.,, 2018; Germain et al.,, 2021) and semi-parametric methods such as GAM models (Eickenscheidt et al.,, 2019; Perrier et al.,, 2021; Boakye et al.,, 2022). However, all these methods do not account for the defoliation level being an ordinal time series or the process of defoliation being cumulative over time (Régnière and Nealis,, 2007; Chen et al.,, 2017; Sturtevant et al.,, 2015).

Rather than applying the usual method for repeated measures, which consists of fitting a global model for all series with as many parameters as possible being constant across series, we fit one model per series. Thus, we can highlight the dissimilarities between the defoliation processes for each region and the specific effects of covariates. From a practical point of view, the novelty of our methodology lies in its region-centric modeling. Our study therefore represents a beneficial region-adapted tool for intervention policy in forest resource management.

3.2 Study area

The study area is located in the northern temperate zone of northeastern North America in the temperate forests of the province of Québec, Canada. We selected the two study sites (Témiscamingue and Matawinie) on the basis of the availability of long time series of dendrochronological and climate data.

The Abitibi-Témiscamingue region is situated in western Quebec, and the regional climate is cold and dry (Dfc, Köppen–Geiger classification) (Kottek et al.,, 2006) with an average annual rainfall of approximately 900 mm (MDDEP, 2012). The sample site is at 46 ${}^{\circ}34^{\prime}$ $\text{ N and }78$ ${}^{\circ}50^{\prime}$ W within the northern temperate zone, comprising deciduous and mixed deciduous and coniferous forest, as part of the sugar maple–yellow birch bioclimatic zone (Figure 1).

Matawinie lies the administrative region Lanaudière in south-central Québec. Regional climate is cold (Dfb, Köppen–Geiger classification) (Kottek et al.,, 2006) and mean annual rainfall is about 1000 mm (MDDEP, 2012). The sampled forest is at 46 ${}^{\circ}26^{\prime}$ N and 73 ${}^{\circ}30^{\prime}$ W. Similar to the sampled forest in Témiscamingue, this eastern study site also lies in the northern temperate zone, comprising deciduous and mixed forests, and the sugar maple–yellow birch bioclimatic zone (Figure 1)

Refer to caption — Figure 1: Location of study sites in southern Québec and overview of the boreal ecoregions.

3.3 Data description

3.3.1 Defoliation data

For both study sites, we obtained 75-year (1920–1995) records of dendrochronological data from Navarro et al., (2018). The time period selected for this study ensures that tree-ring samples record multiple 20th-century insect outbreaks. This data set incorporates sites used in Jardon et al., (2003). In this latter survey, the authors sampled 400 m² plots and collected wood disks from seven dominant living trees and three dominant living saplings. They used dendrochronology to reconstruct SBW outbreaks in 126 locations within the eastern boreal forest and studied a large sample of trees at different forest sites, using reduced growth is an indirect indicator of defoliation. Thus, Jardon et al., (2003) calculated the average radial growth of the trees in the sample and inferred different levels of defoliation in the forest sites as a function of growth lag. We refer interested readers to Navarro et al., (2018) for details regarding the dendroecological methodology.

We then estimated the proportion of trees affected by defoliation at each site over the 75 years. We considered a tree to be defoliated when we observed a growth reduction for the sample that was more than 1.28 SD from the mean standardized dendrochronological record. Outbreak severity was defined as the proportion of trees at each site showing a pattern of growth reduction at a given time. We defined defoliation severity as: Level $0$ (null defoliation) corresponded to no trees showing a growth reduction greater than 1.28 SD; $1$ (light defoliation) reflected 1%– $35\%$ of trees experiencing a growth reduction; $2$ (moderate defoliation) represented 36%– $70\%$ of trees with reduced growth; and $3$ (severe defoliation) reflected 71%– $100\%$ of trees showing reduced growth. The data sets are available on https://figshare.com/articles/dataset/Untitled_Item/7887746.

3.3.2 Climate data sets

Climate data were obtained from the National Centers for Environmental Information https://www.ncei.noaa.gov/. The weather station in Témiscamingue is located 25 km away from the forest site and the distance between the Matawinie site and the nearest weather station is about 35 km. Both data sets include daily maximum and minimum temperatures, precipitation and snow depth from 1920 to 1995 (75 years of daily data). Daily climate data for 1928 contains much missing data; we therefore excluded 1928 from our analysis. From this raw data, we computed the seasonal covariates for summer and spring given the biological characteristics of SBW and following multiple earlier studies (Debaly et al.,, 2022; Volney and Fleming,, 2007, 2000). We considered April, May and June as being spring and July, August and September as summer. The specific covariates used are described in the following section. Given that the effect of defoliation on growth is delayed, our model includes the effect of climate covariates with a lag of two years.

3.4 Data analysis

From a preliminary descriptive analysis that consists of calculating the serial correlations, we considered two sets of climate covariates. We used the interannual difference (current year minus the previous year) of the mean daily maximum and mean daily minimum temperatures for the spring and summer, the seasonal range (maximum minus minimum across all days) of daily minimum and daily maximum temperatures for spring and summer, and the interannual difference in the log of annual precipitation, the log of annual snowfall, the log of annual precipitation and log of annual snowfall. For all temperature-related covariates, we applied a quadratic term to the model as well as a linear effect. For both sites, we fitted the Model (2.1)-(2.2) with all possible combinations of our covariates. For each fitted model, we performed a Portemanteau test (see Proposition 3) with one lag. Only models for which H₀ was not rejected were considered. We then performed a global comparison of the retained models with the test (Proposition 4 2.). When the null hypothesis was rejected, we carefully determined the variable that drove the deviation from H₀ by using Proposition 4 1. When the Portemanteau test did not reject the adequacy of the models of interest, we selected the best model on the data of the same forest using the nonformal criterion of the number of significance covariates at a 95% level because both models had the same number of covariates. We compared nested models for the same forest with the AIC information criterion. All models with an estimated $\beta_{j},j=1,\ldots,K$ equal to the upper or lower bounds (1- $\epsilon$ or -1+ $\epsilon$ ) were not included in our analysis.

4 Results

The best models for the Témiscamingue and Matawinie sites were those having the seasonal range of daily maximum and minimum temperatures as covariates (see Table 2). The best model selected for our two sites suggested not only that the effect of temperature on defoliation dynamics was nonlinear but that this effect also varied according to season (summer and spring). The estimated coefficients for summer temperature and spring temperature showed a quadratic effect of temperature with a maximum and minimum, respectively, for summer and spring temperatures. Both regions showed that up to a certain temperature threshold (see details in the Appendix)—estimated at 22.7 ^∘C (CI of 0–39.7 ^∘C at 95%) for Témiscamingue and 21.8 ^∘C (CI of 0–54.2 ^∘C at 95%) for Matawinie—a greater range of the summer daily maximum temperatures was associated with lower levels of defoliation. Over the entire period covered by our study, summer daily maximum temperatures used for modeling ranged between 15 ^∘C and 30 ^∘C, with a median of 23.3 ^∘C for Témiscamingue. These values were between 15 ^∘C and 28.3 ^∘C, with a median of 21.1 ^∘C for Matawinie.

The best model for Matawinie showed that a greater range in spring temperature was associated with a higher level of defoliation. We estimated a threshold of 32.5 ^∘C (CI of 0–80.0 ^∘C) under which an increase in temperature promoted defoliation. Spring daily maximum temperatures in Matawinie ranged between 25 and 37.8 ^∘C, with a median of 31.1 ^∘C. Indeed, warmer springs are characterized by the premature emergence of larvae from dormancy (Volney and Fleming,, 2000, 2007).

The comparison of these regions revealed that defoliation processes differed between the sites. This difference related to their autoregressive structure. The model for Témiscamingue was similar to a parameter-driven model (i.e., the latent process does not depend on previous values of the observed process), as the p-value of the test H ${}_{0}:\alpha_{1}=\alpha_{2}=\alpha_{3}=0$ was $0.512$ . This was not the case for Matawinie. We also denoted the statistical significance of the feedback parameters, in particular $\beta_{2}$ for both regions, indicating a link between $Y_{t}$ and its history. This result is consistent with the known cumulative effects of defoliation on tree growth (Chen et al.,, 2017; Houndode et al.,, 2021).

Table 2: Estimate of selected best models for Témiscamingue and Matawinie

$\Delta$ temp. : range of temperature over season
	Number of parameters = 14 n = 72
	Témiscamingue	Matawinie	Comparison of both models
	Parameters	Parameters	t-statistics	Signif. code
	(t-stat)	(t-stat)
$\omega_{1}$	-0.684	17.808	0.144
	(-0.005)	(0.635)
$\omega_{2}$	-35.239	-2.233	0.149
	(-0.160)	(-0.084)
$\omega_{3}$	-35.745	-6.407	0.133
	(-0.163)	(-0.244)
$\Delta$ temp. max. spring	56.351	31.206*	0.254
	(0.564)	(2.299)
$\Delta$ temp. min. summer	-2.454	1.248	0.543
	(-0.366)	(1.061)
$\Delta$ temp. max. summer	-58.714***	-67.179*	0.253
	(-3.820)	(-2.198)
Square of $\Delta$ temp. max. spring	-10.742	-4.794*	0.317
	(-0.568)	(-2.203)
Square of $\Delta$ temp. max. summer	12.927***	15.404*	0.338
	(4.966)	(2.219)
$\alpha_{1}$	34.638	3.201*	0.310
	(0.341)	(2.004)
$\alpha_{2}$	40.594	25.209***	0.136
	(0.357)	(3.264)
$\alpha_{3}$	42.134	27.066***	0.132
	(0.367)	(3.111)
$\beta_{1}$	0.839	0.263	0.824
	(1.348)	(0.669)
$\beta_{2}$	0.811***	-0.985***	9.288	***
	(4.212)	(-52.428)
$\beta_{3}$	0.693***	-0.009	0.460
	(3.614)	(-0.005)
AIC :	48.162	90.400	Comparison stat : 12006.0***
Portemanteau stat :	2.238	0.082	Comparison p-value : < $10^{-5}$
Portemanteau p-value :	0.524	0.994
Signif. code : 0.90 . 0.95 * 0.99 0.999 * 1

5 Discussion and perspectives

Ecological studies that involve time series are generally based on intensive studies of restricted populations or small-scale ecosystem recovery. Ecological variability often occurs in a few places and under relatively static conditions (Goodsell et al.,, 2021). It is therefore advantageous to cover a larger area more quickly, simply by estimating the data on an ordinal scale. Here, we propose a statistical tool for modeling adjacent-categories time series. Using the intensity of defoliation data as a case study, we relate the impact of climate variation on intensity of SBW defoliation in eastern Canadian forests through a categorical-adjacent autoregressive model. Our proposed model provides a useful tool for a more in-depth analysis of the impact of climate on defoliation because each successive defoliation level can be influenced differently by environmental factors during a SBW outbreak (Bouchard and Auger,, 2014; Jepsen et al.,, 2009), and SBW development also depends on defoliation in previous years (Bouchard and Auger,, 2014).

Our paper presents two majors findings. Although several studies show that increased temperatures (caused by climate change) increase in the frequency and severity of SBW outbreaks (Overpeck et al.,, 1990; Régnière and Nealis,, 2007; Pureswaran et al.,, 2015; Bouchard and Auger,, 2014; Bouchard et al.,, 2018; Moise et al.,, 2019; Pureswaran et al.,, 2019), we observe the same effects of warmer summer temperature; however, the effect from a threshold was reversed. In fact, extreme temperatures reduce SBW fecundity, increase mortality and affect the sex ratio via an impact on endosymbionts by favoring the production of male individuals (Hance et al.,, 2007). Moreover, certain temperatures favor the population dynamics of the SBW, but when this temperature preference range is exceeded, this factor negatively affects SBW. Summer temperature has a much greater influence on SBW population recruitment. Summer marks the development of larvae and the reproduction of adult SBW, life cycle steps that are closely linked to a number of climatic factors, including temperature (Bouchard and Auger,, 2014; Pureswaran et al.,, 2015; Volney and Fleming,, 2000; Pureswaran et al.,, 2019). Indeed, summer corresponds to the period when female SBW deposit their eggs (see Figure 2). Our results suggest that high variations in temperature induce a low survival egg rate. SBW density growth is determined by the egg recruitment rate and generation survival rate, e.g., Royama et al., (2005), because the success of SBW in establishing itself on sites depends on reproduction, the initial weight of the eggs and the synchronization of their development with that of the buds of their hosts, which are all strongly influenced by climatic factors (Volney and Fleming,, 2000, 2007).

We also denote a negative effect of warmer annual spring temperatures on defoliation levels. Spring is the critical period for defoliation, as it marks the emergence of SBW larvae from hibernation and the start of needle mining by SBW hosts. Early spring is generally marked by mild temperatures, which favors cohabitation between SBW and its parasitoids (Bouchard et al.,, 2018; Royama et al.,, 2005). This cohabitation could explain the negative effect of temperature in spring. SBW benefits from the negative synergistic effects of temperature pressures on its natural enemies, particularly its parasitoids. For example, the overall performance of some parasitoids is severely reduced at high temperatures (Régnière et al.,, 2020; Hébert and Cloutier,, 1990; Seehausen et al.,, 2017; Smith et al.,, 1986; Royama et al.,, 2005), e.g., the SBW parasitoid Tranosema rostrale, whose survival is negatively correlated with temperatures above 20 ^∘C (Régnière et al.,, 2020; Seehausen et al.,, 2017). Although we observe a high uncertainty for the threshold value of spring temperature, our results clearly illustrate the interaction dynamics between SBW and its parasitoids in relation to temperature and the consequences on defoliation levels.

The ACAR models perform quite well at both sites and reveal differences between the defoliation processes within each. There are almost 120 ecological sites in the reconstruction of Navarro et al., (2018); unfortunately, we consider only two of these because of the unavailability of local climate data in the NOAA (https://www.ncei.noaa.gov/) database for the full time period at the remaining sites. Fitting a classification and regression tree (CART) model (Loh,, 2011) to all these sites could lead to a clustering analysis using the test statistic of Proposition 4 as a dissimilarity measure and therefore provide a deep dive into the SBW defoliation processes in Quebec’s forests. A limitation of our study is that we rely on reconstructions of defoliation intensity. Thus, although we determine interesting insights into SBW-related defoliation, we clearly need direct observations of defoliation to validate our conclusions. For example, some prior distributions on model parameters could be extracted from our study to conduct Bayesian analysis on the shorter-length defoliation data (collected since the late 1960s) of the Ministère des Forêts, de la Faune et des Parcs (Quebec, Canada).

In regard to the statistical aspects, many questions remain about our ACAR model. For example, one can need a test for the condition of Proposition 1 to obtain a unique stationary and ergodic solution. Another statistically interesting question is the optimal selection of the set of covariates with penalized autoregression, like LASSO or Ridge penalties on the parameters $\gamma$ (Tibshirani,, 1996; Poignard,, 2020).

Author contributions

Design and conceptualization: OJFO, ZMD, PM and MMG. R software programming: OJFO and ZMD. Models and data analysis: OJFO, ZMD and PM. Writing first draft: OJFO. Supervision and validation: PM and MMG. All authors provided input for the manuscript, contributed critically to the drafts and gave final approval for publication.

Acknowledgments

We acknowledge all the previous projects and researchers from which we compiled the dendrochronological data. We also thank Professor Paul Doukhan from Paris University for his valuable comments.

Funding

PM and MMG obtained funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) Alliance and the Québec Ministry of Natural Resources and Forests (MRNF) to understand the dynamics of spruce budworm (ALLRP 558267-20). MMG obtained additional funding from an NSERC Discovery grant to reconstruct the regime of natural disturbances in forest ecosystems (RGPIN-2022-05423). ZMD received support from the Fonds de recherche du Québec – Nature et technologies (FRQNT) (PBEEE Scholarship DEBZI-334569) and a CY Initiative of Excellence (grant ”Investissements d’Avenir” ANR-16-IDEX-0008), Project ”EcoDep” PSI-AAP2020-0000000013.

Conflict of interest statement

The authors have no conflict of interest to declare.

Data availability statement

The defoliation data set is available on figshare https://figshare.com/articles/dataset/Untitled_Item/7887746.

The underlying R code can be downloaded at https://github.com/judiprodige/Adjacent-category-time-series-models.git. A R package with ready-to-use functions will be available soon.

References

Aakala et al., (2023) Aakala, T., Remy, C. C., Arseneault, D., Morin, H., Girardin, M. P., Gennaretti, F., Navarro, L., Kuosmanen, N., Ali, A. A., Boucher, É., et al. (2023). Millennial-scale disturbance history of the boreal zone. In Boreal Forests in the face of climate change: sustainable management, pages 53–87. Springer.
Agresti, (2010) Agresti, A. (2010). Analysis of ordinal categorical data, volume 656. John Wiley & Sons.
Akashi et al., (2021) Akashi, F., Taniguchi, M., Monti, A., and Amano, T. (2021). Diagnostic Methods in Time Series. SpringerBriefs in Statistics. Springer Singapore.
Anderegg et al., (2015) Anderegg, W. R., Hicke, J. A., Fisher, R. A., Allen, C. D., Aukema, J., Bentz, B., Hood, S., Lichstein, J. W., Macalady, A. K., McDowell, N., et al. (2015). Tree mortality from drought, insects, and their interactions in a changing climate. New Phytologist, 208(3):674–683.
Anderson and Burnham, (2004) Anderson, D. and Burnham, K. (2004). Model selection and multi-model inference. Second. NY: Springer-Verlag, 63(2020):10.
Berchtold and Raftery, (2002) Berchtold, A. and Raftery, A. (2002). The mixture transition distribution model for high-order Markov chains and non-Gaussian time series. Statistical Science, 17(3):328 – 356.
Berguet et al., (2021) Berguet, C., Martin, M., Arseneault, D., and Morin, H. (2021). Spatiotemporal dynamics of 20th-century spruce budworm outbreaks in eastern canada: Three distinct patterns of outbreak severity. Frontiers in Ecology and Evolution, 8:544088.
Bishop and Nasrabadi, (2006) Bishop, C. M. and Nasrabadi, N. M. (2006). Pattern recognition and machine learning, volume 4. Springer.
Boakye et al., (2022) Boakye, E. A., Houle, D., Bergeron, Y., Girardin, M. P., and Drobyshev, I. (2022). Insect defoliation modulates influence of climate on the growth of tree species in the boreal mixed forests of eastern canada. Ecology and Evolution, 12(3):e8656.
Bouchard and Auger, (2014) Bouchard, M. and Auger, I. (2014). Influence of environmental factors and spatio-temporal covariates during the initial development of a spruce budworm outbreak. Landscape Ecology, 29(1):111–126.
Bouchard et al., (2006) Bouchard, M., Kneeshaw, D., and Bergeron, Y. (2006). Forest dynamics after successive spruce budworm outbreaks in mixedwood forests. Ecology, 87(9):2319–2329.
Bouchard et al., (2018) Bouchard, M., Régnière, J., and Therrien, P. (2018). Bottom-up factors contribute to large-scale synchrony in spruce budworm populations. Canadian Journal of Forest Research, 48(3):277–284.
Byrd et al., (1995) Byrd, R. H., Lu, P., Nocedal, J., and Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208.
Candau and Fleming, (2011) Candau, J.-N. and Fleming, R. A. (2011). Forecasting the response of spruce budworm defoliation to climate change in ontario. Canadian Journal of Forest Research, 41(10):1948–1960.
Chagnon et al., (2022) Chagnon, C., Bouchard, M., and Pothier, D. (2022). Impacts of spruce budworm defoliation on the habitat of woodland caribou, moose, and their main predators. Ecology and Evolution, 12(3):e8695.
Chang et al., (2012) Chang, W.-Y., Lantz, V. A., Hennigar, C. R., and MacLean, D. A. (2012). Economic impacts of forest pests: a case study of spruce budworm outbreaks and control in new brunswick, canada. Canadian Journal of Forest Research, 42(3):490–505.
Chen et al., (2017) Chen, C., Weiskittel, A., Bataineh, M., and MacLean, D. A. (2017). Even low levels of spruce budworm defoliation affect mortality and ingrowth but net growth is more driven by competition. Canadian Journal of Forest Research, 47(11):1546–1556.
Cox et al., (1981) Cox, D. R., Gudmundsson, G., Lindgren, G., Bondesson, L., Harsaae, E., Laake, P., Juselius, K., and Lauritzen, S. L. (1981). Statistical analysis of time series: Some recent developments [with discussion and reply]. Scandinavian Journal of Statistics, pages 93–115.
Davis et al., (2003) Davis, R. A., Dunsmuir, W. T., and Streett, S. B. (2003). Observation-driven models for poisson counts. Biometrika, 90(4):777–790.
Davis et al., (2016) Davis, R. A., Holan, S. H., Lund, R., and Ravishanker, N. (2016). Handbook of discrete-valued time series. CRC Press.
De Grandpré et al., (2018) De Grandpré, L., Waldron, K., Bouchard, M., Gauthier, S., Beaudet, M., Ruel, J.-C., Hébert, C., and Kneeshaw, D. D. (2018). Incorporating insect and wind disturbances in a natural disturbance-based management framework for the boreal forest.
Debaly et al., (2022) Debaly, Z. M., Marchand, P., and Girona, M. M. (2022). Autoregressive models for time series of random sums of positive variables: Application to tree growth as a function of climate and insect outbreak. Ecological Modelling, 471:110053.
Debaly and Truquet, (2021) Debaly, Z. M. and Truquet, L. (2021). Iterations of dependent random maps and exogeneity in nonlinear dynamics. Econometric Theory, pages 1––38.
Debaly and Truquet, (2023) Debaly, Z.-M. and Truquet, L. (2023). Multivariate time series models for mixed data. Bernoulli, 29(1):669–695.
Dhar et al., (2014) Dhar, S. S., Chakraborty, B., and Chaudhuri, P. (2014). Comparison of multivariate distributions using quantile–quantile plots and related tests. Bernoulli, 20(3):1484–1506.
Doukhan et al., (2020) Doukhan, P., Grublytė, I., Pommeret, D., and Reboul, L. (2020). Comparing the marginal densities of two strictly stationary linear processes. Annals of the Institute of Statistical Mathematics, 72(6):1419–1447.
Doukhan et al., (2015) Doukhan, P., Pommeret, D., and Reboul, L. (2015). Data driven smooth test of comparison for dependent sequences. Journal of Multivariate Analysis, 139:147–165.
Doukhan et al., (2016) Doukhan, P., Surgailis, D., et al. (2016). A nonlinear model for long-memory conditional heteroscedasticity. Lithuanian Mathematical Journal, 56(2):164–188.
D’Amato et al., (2023) D’Amato, A. W., Palik, B. J., Raymond, P., Puettmann, K. J., and Girona, M. M. (2023). Building a framework for adaptive silviculture under global change. In Boreal Forests in the Face of Climate Change: Sustainable Management, pages 359–381. Springer.
Eickenscheidt et al., (2019) Eickenscheidt, N., Augustin, N., and Wellbrock, N. (2019). Spatio-temporal modelling of forest monitoring data: modelling german tree defoliation data collected between 1989 and 2015 for trend estimation and survey grid examination using gamms. iForest-Biogeosciences and Forestry, 12(4):338.
Eveleigh et al., (2007) Eveleigh, E. S., McCann, K. S., McCarthy, P. C., Pollock, S. J., Lucarotti, C. J., Morin, B., McDougall, G. A., Strongman, D. B., Huber, J. T., Umbanhowar, J., et al. (2007). Fluctuations in density of an outbreak species drive diversity cascades in food webs. Proceedings of the National Academy of Sciences, 104(43):16976–16981.
Ferretti et al., (2014) Ferretti, M., Nicolas, M., Bacaro, G., Brunialti, G., Calderisi, M., Croisé, L., Frati, L., Lanier, M., Maccherini, S., Santi, E., and Ulrich, E. (2014). Plot-scale modelling to detect size, extent, and correlates of changes in tree defoliation in french high forests. Forest Ecology and Management, 311:56–69. Monitoring European forests: detecting and understanding changes.
Fokianos and Kedem, (2003) Fokianos, K. and Kedem, B. (2003). Regression theory for categorical time series. Statistical Science, 18(3):357–376.
Fokianos et al., (2020) Fokianos, K., Stove, B., Tjostheim, D., and Doukhan, P. (2020). Multivariate count autoregression. Bernoulli.
Fokianos and Tjøstheim, (2012) Fokianos, K. and Tjøstheim, D. (2012). Nonlinear poisson autoregression. Annals of the Institute of Statistical Mathematics, 64(6):1205–1225.
Francq and Zakoian, (2019) Francq, C. and Zakoian, J.-M. (2019). GARCH models: structure, statistical inference and financial applications. Wiley.
Fuentealba et al., (2022) Fuentealba, A., Dupont, A., Quezada-Garcia, R., and Bauce, E. (2022). Efficacy of insecticide aerial spraying programs to reduce tree mortality during a spruce budworm outbreak (1967–1992) in the province of quebec. Agricultural and Forest Entomology, 24(4):586–599.
Gauthier et al., (2023) Gauthier, S., Kuuluvainen, T., Macdonald, S. E., Shorohova, E., Shvidenko, A., Bélisle, A.-C., Vaillancourt, M.-A., Leduc, A., Grosbois, G., Bergeron, Y., et al. (2023). Ecosystem management of the boreal forest in the era of global change. In Boreal forests in the face of climate change: Sustainable management, pages 3–49. Springer.
Germain et al., (2021) Germain, M., Kneeshaw, D., De Grandpré, L., Desrochers, M., James, P., Vepakomma, U., Poulin, J.-F., and Villard, M.-A. (2021). Insectivorous songbirds as early indicators of future defoliation by spruce budworm. Landscape Ecology, 36(10):3013–3027.
Girona et al., (2023) Girona, M. M., Aakala, T., Aquilué, N., Bélisle, A.-C., Chaste, E., Danneyrolles, V., Díaz-Yáñez, O., D’Orangeville, L., Grosbois, G., Hester, A., et al. (2023). Challenges for the sustainable management of the boreal forest under climate change. In Boreal Forests in the face of climate change: sustainable management, pages 773–837. Springer.
Goodsell et al., (2021) Goodsell, R. M., Childs, D. Z., Spencer, M., Coutts, S., Vergnon, R., Swinfield, T., Queenborough, S. A., and Freckleton, R. P. (2021). Developing hierarchical density-structured models to study the national-scale dynamics of an arable weed. Ecological Monographs, 91(3):e01449.
Gorgi and Koopman, (2021) Gorgi, P. and Koopman, S. J. (2021). Beta observation-driven models with exogenous regressors: A joint analysis of realized correlation and leverage effects. Journal of Econometrics.
Grondin et al., (1996) Grondin, P., Ansseau, C., Bélanger, L., Bergeron, J., Bergeron, Y., Bouchard, A., Brisson, J., De Grandpré, L., Gagnon, G., Lavoie, C., et al. (1996). Écologie forestière. Manuel de foresterie, pages 133–279.
Guolo and Varin, (2014) Guolo, A. and Varin, C. (2014). Beta regression for time series analysis of bounded data, with application to canada google® flu trends. The Annals of Applied Statistics, 8(1):74–88.
Hance et al., (2007) Hance, T., van Baaren, J., Vernon, P., and Boivin, G. (2007). Impact of extreme temperatures on parasitoids in a climate change perspective. Annual Review of Entomology, 52:107–126.
Hastie et al., (2009) Hastie, T., Tibshirani, R., and Friedman, J. (2009). The elements of statistical learning. Cited on, 33.
Hastie, (2017) Hastie, T. J. (2017). Generalized additive models. In Statistical models in S, pages 249–307. Routledge.
Hébert and Cloutier, (1990) Hébert, C. and Cloutier, C. (1990). Temperature-dependent development of eggs and larvae of Winthemia fumiferanae toth.(diptera: Tachinidae), a larval–pupal parasitoid of the spruce budworm (lepidoptera: Tortricidae). The Canadian Entomologist, 122(2):329–341.
Heyde, (2013) Heyde, C. (2013). Quasi-Likelihood And Its Application: A General Approach to Optimal Parameter Estimation. Springer Series in Statistics. Springer New York.
Houndode et al., (2021) Houndode, D. J., Krause, C., and Morin, H. (2021). Predicting balsam fir mortality in boreal stands affected by spruce budworm. Forest Ecology and Management, 496:119408.
Jacobs and Lewis, (1978) Jacobs, P. A. and Lewis, P. A. (1978). Discrete time series generated by mixtures. i: Correlational and runs properties. Journal of the Royal Statistical Society: Series B (Methodological), 40(1):94–105.
Jactel et al., (2019) Jactel, H., Koricheva, J., and Castagneyrol, B. (2019). Responses of forest insect pests to climate change: not so simple. Current Opinion in Insect Science, 35:103–108.
Jardon et al., (2003) Jardon, Y., Morin, H., and Dutilleul, P. (2003). Périodicité et synchronisme des épidémies de la tordeuse des bourgeons de l’épinette au québec. Canadian Journal of Forest Research, 33(10):1947–1961.
Jepsen et al., (2009) Jepsen, J. U., Hagen, S. B., Karlsen, S.-R., and Ims, R. A. (2009). Phase-dependent outbreak dynamics of geometrid moth linked to host plant phenology. Proceedings of the Royal Society B: Biological Sciences, 276(1676):4119–4128.
Kammann and Wand, (2003) Kammann, E. E. and Wand, M. P. (2003). Geoadditive models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(1):1–18.
Kayes and Mallik, (2020) Kayes, I. and Mallik, A. (2020). Boreal forests: Distributions, biodiversity, and management.
Kottek et al., (2006) Kottek, M., Grieser, J., Beck, C., Rudolf, B., and Rubel, F. (2006). World map of the köppen-geiger climate classification updated. Meteorologische Zeitschrift, 15(3):259–263.
Lavoie et al., (2021) Lavoie, J., Montoro Girona, M., Grosbois, G., and Morin, H. (2021). Does the type of silvicultural practice influence spruce budworm defoliation of seedlings? Ecosphere, 12(4):e03506.
Lavoie et al., (2019) Lavoie, J., Montoro Girona, M., and Morin, H. (2019). Vulnerability of conifer regeneration to spruce budworm outbreaks in the eastern canadian boreal forest. Forests, 10(10):850.
Leonard, (1999) Leonard, T. (1999). A course in categorical data analysis, volume 45. CRC Press.
Li et al., (2020) Li, M., MacLean, D. A., Hennigar, C. R., and Ogilvie, J. (2020). Previous year outbreak conditions and spring climate predict spruce budworm population changes in the following year. Forest Ecology and Management, 458:117737.
Liu et al., (2019) Liu, Z., Peng, C., De Grandpré, L., Candau, J.-N., Work, T., Huang, C., and Kneeshaw, D. (2019). Simulation and analysis of the effect of a spruce budworm outbreak on carbon dynamics in boreal forests of québec. Ecosystems, 22(8):1838–1851.
Liu et al., (2020) Liu, Z., Peng, C., De Grandpré, L., Candau, J.-N., Work, T., Zhou, X., and Kneeshaw, D. (2020). Aerial spraying of bacterial insecticides to control spruce budworm defoliation leads to reduced carbon losses. Ecosphere, 11(1):e02988.
Liu et al., (2022) Liu, Z., Peng, C., MacLean, D. A., De Grandpré, L., Candau, J.-N., and Kneeshaw, D. (2022). Evaluating and quantifying the effect of various spruce budworm intervention strategies on forest carbon dynamics in atlantic canada. Forest Ecosystems, 9:100052.
Loh, (2011) Loh, W.-Y. (2011). Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):14–23.
MacLean, (2016) MacLean, D. A. (2016). Impacts of insect outbreaks on tree mortality, productivity, and stand development. The Canadian Entomologist, 148(S1):S138–S159.
Manner et al., (2016) Manner, H., Türk, D., and Eichler, M. (2016). Modeling and forecasting multivariate electricity price spikes. Energy Economics, 60:255–265.
Ministère des Forêts de la Faune et des Parcs du Québec, (2023) Ministère des Forêts de la Faune et des Parcs du Québec, M. (2023). Données sur les perturbations naturelles - insecte : Tordeuse des bourgeons de l’épinette. Online; Accessed: 2022-05-19.
Moise et al., (2019) Moise, E. R., Lavigne, M. B., and Johns, R. C. (2019). Density has more influence than drought on spruce budworm (Choristoneura fumiferana) performance under outbreak conditions. Forest Ecology and Management, 433:170–175.
Montoro Girona et al., (2018) Montoro Girona, M., Navarro, L., and Morin, H. (2018). A secret hidden in the sediments: Lepidoptera scales. Frontiers in Ecology and Evolution, 6:2.
Moysiadis and Fokianos, (2014) Moysiadis, T. and Fokianos, K. (2014). On binary and categorical time series models with feedback. Journal of Multivariate Analysis, 131:209–228.
Navarro et al., (2018) Navarro, L., Morin, H., Bergeron, Y., and Girona, M. M. (2018). Changes in spatiotemporal patterns of 20th century spruce budworm outbreaks in eastern canadian boreal forests. Frontiers in Plant Science, 9:1905.
Overpeck et al., (1990) Overpeck, J. T., Rind, D., and Goldberg, R. (1990). Climate-induced changes in forest disturbance and vegetation. Nature, 343(6253):51–53.
Perrier et al., (2021) Perrier, J. M., Kneeshaw, D., St-Laurent, M.-H., Pyle, P., and Villard, M.-A. (2021). Budworm-linked warblers as early indicators of defoliation by spruce budworm: a field study. Ecological Indicators, 125:107543.
Poignard, (2020) Poignard, B. (2020). Asymptotic theory of the adaptive sparse group lasso. Annals of the Institute of Statistical Mathematics, 72(1):297–328.
Pommeret et al., (2022) Pommeret, D., Reboul, L., and Yao, A.-f. (2022). Testing the equality of the laws of two strictly stationary processes. Statistical Inference for Stochastic Processes, pages 1–22.
Pureswaran et al., (2015) Pureswaran, D. S., De Grandpré, L., Paré, D., Taylor, A., Barrette, M., Morin, H., Régnière, J., and Kneeshaw, D. D. (2015). Climate-induced changes in host tree–insect phenology may drive ecological state-shift in boreal forests. Ecology, 96(6):1480–1491.
Pureswaran et al., (2019) Pureswaran, D. S., Neau, M., Marchand, M., De Grandpré, L., and Kneeshaw, D. (2019). Phenological synchrony between eastern spruce budworm and its host trees increases with warmer temperatures in the boreal forest. Ecology and Evolution, 9(1):576–586.
Pureswaran et al., (2018) Pureswaran, D. S., Roques, A., and Battisti, A. (2018). Forest insects and climate change. Current Forestry Reports, 4(2):35–50.
R Core Team, (2022) R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Régnière and Nealis, (2007) Régnière, J. and Nealis, V. (2007). Ecological mechanisms of population change during outbreaks of the spruce budworm. Ecological Entomology, 32(5):461–477.
Régnière and Nealis, (2019) Régnière, J. and Nealis, V. G. (2019). Influence of temperature on historic and future population fitness of the western spruce budworm, choristoneura occidentalis. International Journal of Pest Management, 65(3):228–243.
Régnière et al., (2020) Régnière, J., Seehausen, M. L., and Martel, V. (2020). Modeling climatic influences on three parasitoids of low-density spruce budworm populations. part 1: Tranosema rostrale (hymenoptera: Ichneumonidae). Forests, 11(8):846.
Royama et al., (2005) Royama, T., MacKinnon, W., Kettela, E., Carter, N., and Hartling, L. (2005). Analysis of spruce budworm outbreak cycles in new brunswick, canada, since 1952. Ecology, 86(5):1212–1224.
Safranyik et al., (2010) Safranyik, L., Carroll, A. L., Régnière, J., Langor, D., Riel, W., Shore, T., Peter, B., Cooke, B., Nealis, V., and Taylor, S. (2010). Potential for range expansion of mountain pine beetle into the boreal forest of north america. The Canadian Entomologist, 142(5):415–442.
Salisbury et al., (2013) Salisbury, F., Wareing, P., and Galston, A. (2013). The Flowering Process. Elsevier Science.
Sato et al., (2023) Sato, H., Chaste, E., Girardin, M. P., Kaplan, J. O., Hély, C., Candau, J.-N., and Mayor, S. J. (2023). Dynamically simulating spruce budworm in eastern canada and its interactions with wildfire. Ecological Modelling, 483:110412.
Schowalter, (2022) Schowalter, T. D. (2022). Insect ecology: an ecosystem approach. Academic Press.
Scott et al., (2023) Scott, L. N., Smith, S. M., Gunn, J. S., Petrik, M., Ducey, M. J., Buchholz, T. S., and Belair, E. P. (2023). Salvage decision-making based on carbon following an eastern spruce budworm outbreak. Frontiers in Forests and Global Change, 6:1062176.
Seehausen et al., (2017) Seehausen, M. L., Régnière, J., Martel, V., and Smith, S. M. (2017). Developmental and reproductive responses of the spruce budworm (lepidoptera: Tortricidae) parasitoid Tranosema rostrale (hymenoptera: Ichneumonidae) to temperature. Journal of Insect Physiology, 98:38–46.
Seidl et al., (2023) Seidl, R., Fortin, M.-J., Honkaniemi, J., and Lucash, M. (2023). Modeling natural disturbances in boreal forests. In Boreal Forests in the Face of Climate Change: Sustainable Management, pages 591–612. Springer.
Seidl et al., (2017) Seidl, R., Thom, D., Kautz, M., Martin-Benito, D., Peltoniemi, M., Vacchiano, G., Wild, J., Ascoli, D., Petr, M., Honkaniemi, J., et al. (2017). Forest disturbances under climate change. Nature Climate Change, 7(6):395–402.
Seidling, (2007) Seidling, W. (2007). Signals of summer drought in crown condition data from the german level i network. European Journal of Forest Research, 126(4):529–544.
Shalev-Shwartz and Ben-David, (2014) Shalev-Shwartz, S. and Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge University Press.
Simard et al., (2006) Simard, I., Morin, H., and Lavoie, C. (2006). A millennial-scale reconstruction of spruce budworm abundance in saguenay, quéebec, canada. The Holocene, 16(1):31–37.
Smith et al., (1986) Smith, S., Hubbes, M., and Carrow, J. (1986). Factors affecting inundative releases of Trichogramma minutum ril. against the spruce budworm 1. Journal of Applied Entomology, 101(1-5):29–39.
Soti et al., (2019) Soti, V., Thiaw, I., Debaly, Z. M., Sow, A., Diaw, M., Fofana, S., Diakhate, M., Thiaw, C., and Brévault, T. (2019). Effect of landscape diversity and crop management on the control of the millet head miner, Heliocheilus albipunctella (lepidoptera: Noctuidae) by natural enemies. Biological Control, 129:115–122.
Straumann et al., (2006) Straumann, D., Mikosch, T., et al. (2006). Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: a stochastic recurrence equations approach. The Annals of Statistics, 34(5):2449–2495.
Sturtevant et al., (2015) Sturtevant, B. R., Cooke, B. J., Kneeshaw, D. D., and MacLean, D. A. (2015). Modeling insect disturbance across forested landscapes: insights from the spruce budworm. In Simulation modeling of forest landscape disturbances, pages 93–134. Springer.
Subedi et al., (2023) Subedi, A., Marchand, P., Bergeron, Y., Morin, H., and Girona, M. M. (2023). Climatic conditions modulate the effect of spruce budworm outbreaks on black spruce growth. Agricultural and Forest Meteorology, 339:109548.
Tibshirani, (1996) Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288.
Truquet, (2020) Truquet, L. (2020). Coupling and perturbation techniques for categorical time series. Bernoulli, 26(4):3249 – 3279.
Van der Vaart, (2000) Van der Vaart, A. W. (2000). Asymptotic statistics, volume 3. Cambridge University Press.
Volney and Fleming, (2000) Volney, W. J. A. and Fleming, R. A. (2000). Climate change and impacts of boreal forest insects. Agriculture, Ecosystems & Environment, 82(1-3):283–294.
Volney and Fleming, (2007) Volney, W. J. A. and Fleming, R. A. (2007). Spruce budworm (choristoneura spp.) biotype reactions to forest and climate characteristics. Global Change Biology, 13(8):1630–1643.
Weiß, (2018) Weiß, C. H. (2018). An introduction to discrete-valued time series, chapter 4. John Wiley & Sons.
Weiß, (2020) Weiß, C. H. (2020). Regime-switching discrete arma models for categorical time series. Entropy, 22(4):458.
Zhu et al., (2018) Zhu, C., Zhang, X., Zhang, N., Hassan, M. A., and Zhao, L. (2018). Assessing the defoliation of pine forests in a long time-series and spatiotemporal prediction of the defoliation using landsat data. Remote Sensing, 10(3).
Zierl, (2004) Zierl, B. (2004). A simulation study to analyse the relations between crown condition and drought in switzerland. Forest Ecology and Management, 188(1):25–38.

Appendix

Proof of Proposition 1

Let us set $\lambda_{t}=(\lambda_{1,t},\ldots,\lambda_{K,t})^{\top}.$ From Equation (2.3),

\lambda_{t}=\tilde{\omega}+\tilde{A}\overline{Y}_{t-1}+\tilde{B}\lambda_{t-1},t\in\mathbb{Z},

where $\tilde{\omega}=(\sum_{j=1}^{k}\omega_{j},{k=1,\ldots,K}),\tilde{A}=(\alpha|2\alpha|\cdots|K\alpha)^{\top}$ and

\tilde{B}=\begin{pmatrix}\beta_{1}&0&\cdots&0\\ \beta_{1}&\beta_{2}&0\cdots\\ \vdots&&\ddots&\ddots\\ \beta_{1}&\beta_{2}&\cdots&\beta_{K}\end{pmatrix}.

The result of Proposition 1 follows (Truquet,, 2020) Proposition 4 which provides stability conditions for category baseline multinomial autoregression. Indeed, for $k=1,\ldots,K,|\beta_{k}|<1$ , the roots $1/\beta_{k},k=1,\ldots,K$ of polynomial

\mathcal{P}(z)=\mathrm{det}\left(I-\tilde{B}z\right)=\Pi_{j=1}^{K}(1-\beta_{j}z)

(5.1)

are outside the unit disk. Moreover, the moments of latent process depend on that of $(X_{t})_{t\in Z}$ , as $Y_{k,t}$ ’s take only two values : $0$ and $1.$

Condition for identifiability of

$\theta_{0}.$

I0

For any $\theta\in\Theta,\leavevmode\nobreak\ |\beta_{k}|,k=1,\ldots,K<1$ ;
I1

For any $v\in\mathbb{R}^{P}$ ,

$v^{\prime}X_{1}\in\mathcal{F}_{0}\vee\sigma(U_{1})\Rightarrow v=0;$
I2

The distribution of $U_{0}$ is non-degenerate;
I3

If $v$ is equal to any $\alpha_{0,k}\iota,k=1,\ldots,K$ or $\gamma_{0,p}\iota,p=1,\ldots,P$ with $\iota=(1,\ldots,1)^{\top}$ has $K$ components, the equalities $B^{j}v=B_{0}^{j}v,\quad j\geq 1$ , entail $B=B_{0}$ .

Derivative of latent process

For $k=1,\ldots,K,$

$\displaystyle\frac{\partial{\eta_{k,t}(\theta)}}{\partial{\omega_{\ell}}}$	$\displaystyle=$	$\displaystyle 0,\ell\neq k,$	(5.2)
$\displaystyle\frac{\partial{\eta_{k,t}(\theta)}}{\partial{\omega_{k}}}$	$\displaystyle=$	$\displaystyle 1+\beta_{k}\frac{\partial{\eta_{k,t-1}(\theta)}}{\partial{\omega_{j}}}=1+\sum_{\ell\geq 1}\beta_{k}^{\ell}=\frac{1}{1-\beta_{k}},$	(5.3)
$\displaystyle\frac{\partial{\eta_{k,t}(\theta)}}{\partial{\gamma}}$	$\displaystyle=$	$\displaystyle X_{t-1}+\beta_{k}\frac{\partial{\eta_{k,t-1}(\theta)}}{\partial{\gamma}}=\sum_{\ell\geq 0}\beta_{k}^{\ell}X_{t-\ell-1},$	(5.4)
$\displaystyle\frac{\partial{\eta_{k,t}(\theta)}}{\partial{\alpha}}$	$\displaystyle=$	$\displaystyle\overline{Y}_{t-1}+\beta_{k}\frac{\partial{\eta_{k,t-1}(\theta)}}{\partial{\alpha}}=\sum_{\ell\geq 0}\beta_{k}^{\ell}\overline{Y}_{t-\ell-1},$	(5.5)
$\displaystyle\frac{\partial{\eta_{k,t}(\theta)}}{\partial{\beta_{\ell}}}$	$\displaystyle=$	$\displaystyle 0,\ell\neq k,and$	(5.6)
$\displaystyle\frac{\partial{\eta_{k,t}(\theta)}}{\partial{\beta_{k}}}$	$\displaystyle=$	$\displaystyle\eta_{k,t-1}(\theta)+\beta_{k}\frac{\partial{\eta_{k,t-1}(\theta)}}{\partial{\beta_{k}}}=\sum_{\ell\geq 0}\beta_{k}^{\ell}\eta_{k,t-\ell-1}(\theta).$	(5.7)

Proof of Proposition 3

Let us denote by $\tilde{\rho}_{h}(\theta)$ the autocorrelation functions when the latent process is initialized at $t=0$ and $\tilde{\rho}_{h}$ its value evaluate at $\theta=\theta_{0}$ and define $\rho_{1:q}(\theta)$ and $\tilde{\rho}_{1:q}(\theta)$ (resp. $\rho_{1:q}$ and $\tilde{\rho}_{1:q}$ accordingly. Under the conditions

n^{1/2}|\rho_{1:q}-\tilde{\rho}_{1:q}|_{1}=o_{\mathbb{P}}(1);\quad\sup_{\theta}\|\nabla_{\theta}\rho_{1:q}(\theta)-\nabla_{\theta}\tilde{\rho}_{1:q}(\theta)\|=o_{\mathbb{P}}(1)\text{ and }\lim_{n\to\infty}\mathbb{E}\sup_{\theta}\|\nabla_{\theta}^{2}\rho_{1:q}(\theta)\|<\infty

obtained with $\mathbb{E}|X_{0}|_{2}^{r}<\infty$ for sustainable $r,$

\sqrt{n}\hat{\rho}_{1:q}=\sqrt{n}\rho_{1:q}+C_{q}\sqrt{n}(\hat{\theta}-\theta_{0})+o_{\mathbb{P}}(1)

(5.8)

see Francq and Zakoian, (2019), proof of Theorem 8.2 for the case of GARCH model. From Proposition 2,

\sqrt{n}(\hat{\theta}_{n}-\theta_{0})=-J^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^{n}s_{t}(\theta_{0})+o_{\mathbb{P}}(1).

We also have

\sqrt{n}\rho_{1:q}=n^{-1/2}\sum_{t=1}^{n}\sum_{k=1}^{K}e_{k,t}\iota_{k}.

(5.9)

It follows that

\sqrt{n}\begin{pmatrix}\hat{\theta}_{n}-\theta_{0}\\ \rho_{1:q}\end{pmatrix}=\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sum_{k=1}^{K}e_{k,t}\begin{pmatrix}J^{-1}\nabla_{\theta_{0}}\eta_{k,t}(\theta)\\ \iota_{k,t}\end{pmatrix}+0_{\mathbb{P}}(1).

Because the sequence $\left(\sum_{k=1}^{K}e_{k,t}\begin{pmatrix}\nabla_{\theta_{0}}\eta_{k,t}(\theta)\\ \iota_{k,t}\end{pmatrix}\right)_{t\in\mathbb{Z}}$ is a martingale difference sequence,

\sqrt{n}\begin{pmatrix}\hat{\theta}_{n}-\theta_{0}\\ \rho_{1:q}\end{pmatrix}\Rightarrow\mathcal{N}\left(0_{Kq+Q},\begin{pmatrix}J^{-1}LJ^{-1^{\top}}&G^{\top}\\ G&D\end{pmatrix}\right).

From Equation (5.8),

\sqrt{n}\hat{\rho}_{1:q}\Rightarrow\mathcal{N}(0_{Kq},D+C_{q}J^{-1}L{J^{-1}}^{\top}C_{q}^{\top}+GC_{q}^{\top}+C_{q}G^{\top}).

Under INV-1 ,

D+C_{q}J^{-1}L{J^{-1}}^{\top}C_{q}^{\top}+GC_{q}^{\top}+C_{q}G^{\top}=\mathbb{E}ZZ^{\top}

with

Z=\sum_{k=1}^{K}e_{k,0}\left(\iota_{k}+C_{q}J^{-1}\nabla_{\theta_{0}}\eta_{k,0}(\theta)\right)

is invertible. Then,

n\hat{\rho}_{1:q}^{\top}\left(D+C_{q}J^{-1}L{J^{-1}}^{\top}C_{q}^{\top}+GC_{q}^{\top}+C_{q}G^{\top}\right)^{-1}\hat{\rho}_{1:q}\Rightarrow\chi_{Kq}^{2}.

The result of the proposition follows from continuous map theorem.

Proof of Proposition 4

From Proposition 2, $\begin{pmatrix}s_{t}^{(1)}(\theta_{0}^{(1)})\\ s_{t}^{(2)}(\theta_{0}^{(2)})\end{pmatrix}_{t\in\mathbb{Z}}$ is a square integrable martingale difference sequence. Then,

	$\displaystyle\sqrt{n}\begin{pmatrix}\hat{\theta}_{n}^{(1)}-\theta_{0}^{(1)}\\ \hat{\theta}_{n}^{(2)}-\theta_{0}^{(2)}\end{pmatrix}$	$\displaystyle=$	$\displaystyle-\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\begin{pmatrix}J_{1}^{-1}s_{t}^{(1)}(\theta_{0}^{(1)})\\ J_{2}^{-1}s_{t}^{(2)}(\theta_{0}^{(2)})\end{pmatrix}+0_{\mathbb{P}}(1)$		(5.10)
		$\displaystyle\Rightarrow$	$\displaystyle\mathcal{N}\left(0_{2(3K+P)},\begin{pmatrix}J_{1}^{-1}L_{1}{J_{1}^{-1}}^{\top}&{J_{1}^{-1}}S^{\top}{J_{2}^{-1}}^{\top}\\ J_{2}^{-1}S{J_{1}^{-1}}^{\top}&J_{2}^{-1}L_{2}{J_{2}^{-1}}^{\top}\end{pmatrix}\right).$		(5.11)

It follows that, under the null hypothesis,

\sqrt{n}\left(\theta_{n}^{(1)}-\theta_{n}^{(2)}\right)=\sqrt{n}\left(\hat{\theta}_{n}^{(1)}-\theta_{0}^{(1)}\right)-\sqrt{n}\left(\hat{\theta}_{n}^{(2)}-\theta_{0}^{(2)}\right)\Rightarrow\mathcal{N}(0,V)

with

V=J_{1}^{-1}L_{1}{J_{1}^{-1}}^{\top}+J_{2}^{-1}L_{2}{J_{2}^{-1}}^{\top}+J_{2}^{-1}S{J_{1}^{-1}}^{\top}+{J_{1}^{-1}}S^{\top}{J_{2}^{-1}}^{\top}=\mathbb{E}UU^{\top},

and $U=J_{1}^{-1}s_{t}^{(1)}(\theta_{0}^{(1)})+J_{1}^{-2}s_{t}^{(2)}(\theta_{0}^{(2)}).$ The matrix $V$ is then invertible under INV-2. The result of the proposition follows from the continuous map theorem.

Threshold estimate for temperature effect

The thresholds are derived using the delta method (Van der Vaart,, 2000). Denoting $\theta_{0}(x)$ the regression parameter of covariate $x,$ the threshold of the temperature effect in spring, for example, is $\mathrm{TS}=\frac{\theta_{0}(\Delta\text{temp. max. spring})}{2\times\theta_{0}(\text{Square of}\Delta\text{temp. max. spring})}.$ Then, $\widehat{\mathrm{TS}}$ is obtained by replacing $\theta_{0}(x)$ by $\hat{\theta}(x).$ The variance of $\widehat{\mathrm{TS}}-\mathrm{TS}$ is $\hat{\mu}^{\top}\hat{\Sigma}_{spring}\hat{\mu}$ , where

\hat{\mu}^{\top}=\left(\frac{1}{2\times\hat{\theta}(\text{Square of}\Delta\text{temp. max. spring})},-\frac{\hat{\theta}(\Delta\text{temp. max. spring})}{2\times\hat{\theta}^{2}(\text{Square of}\Delta\text{temp. max. spring})}\right),

and $\hat{\Sigma}_{spring}$ is the estimate covariance matrix of

(\hat{\theta}(\Delta\text{temp. max. spring}),\hat{\theta}(\text{Square of}\Delta\text{temp. max. spring})).

Table 3: Results of the quasi-maximum likelihood estimation

n	$\theta_{0}$	1.2	0.7	0.5	-0.8	1.5	-1.5	2.0	2.0	0.3	-0.3	0.5	0.8	-0.2	0.3
70	CMLE	3.616	3.311	1.687	-1.613	3.144	-3.108	4.252	4.165	-0.583	-1.507	-0.304	0.818	-0.132	0.298
	TSE	6.794	7.424	7.080	0.735	1.246	1.223	1.577	1.594	7.216	7.706	7.425	0.099	0.435	0.345
	MAE	3.482	3.985	3.058	2.154	2.716	3.597	3.773	3.692	3.208	3.912	3.659	0.824	0.970	0.806
	MSE	8.783	10.275	9.006	2.924	4.256	4.895	6.040	5.579	9.017	9.459	10.067	1.032	1.175	1.008
100	CMLE	1.642	0.854	0.625	-1.097	1.908	-1.966	2.581	2.577	0.180	-0.709	0.649	0.809	-0.171	0.273
	TSE	1.011	1.317	1.130	0.400	0.588	0.622	0.784	0.774	1.203	1.346	1.314	0.064	0.337	0.292
	MAE	0.756	0.641	0.646	1.200	0.925	2.122	1.258	1.258	0.903	1.646	0.819	0.736	0.537	0.097
	MSE	1.666	1.809	1.835	0.219	0.313	0.345	0.470	0.451	1.828	1.819	1.863	0.034	0.129	0.130
300	CMLE	1.237	0.636	0.580	-0.805	1.492	-1.531	1.997	2.012	0.250	-0.394	0.460	0.807	-0.207	0.308
	TSE	0.359	0.520	0.404	0.179	0.277	0.284	0.356	0.358	0.432	0.510	0.494	0.035	0.192	0.158
	MAE	0.843	0.191	0.123	0.979	1.223	1.784	1.644	1.642	0.134	0.810	0.103	0.765	0.392	0.143
	MSE	0.122	0.100	0.114	0.051	0.089	0.097	0.123	0.117	0.130	0.122	0.153	0.011	0.049	0.042
500	CMLE	1.182	0.578	0.518	-0.801	1.486	-1.492	1.973	1.966	0.277	-0.292	0.539	0.799	-0.207	0.291
	TSE	0.265	0.392	0.307	0.137	0.216	0.216	0.276	0.275	0.329	0.376	0.373	0.028	0.152	0.127
	MAE	0.935	0.308	0.193	0.937	1.284	1.716	1.724	1.725	0.060	0.676	0.137	0.772	0.352	0.173
	MSE	0.064	0.061	0.067	0.032	0.057	0.053	0.072	0.071	0.075	0.077	0.089	0.007	0.032	0.028
	$\theta_{0}$	1.2	0.7	0.5	0.8	-1.5	1.5	-2.0	-2.0	-0.3	0.3	-0.5	-0.8	0.2	-0.3
70	CMLE	4.484	1.885	-0.373	2.239	-4.037	4.062	-5.408	-5.373	-0.289	1.602	-0.402	-0.804	0.116	-0.316
	TSE	2.937	1.474	1.459	0.761	1.290	1.265	1.638	1.668	1.543	2.348	1.500	0.061	0.376	0.329
	MAE	5.424	2.649	2.531	2.486	3.922	4.215	5.270	5.232	2.800	3.881	2.892	1.111	0.919	0.938
	MSE	11.246	4.721	5.134	4.489	6.677	6.917	8.772	8.434	5.974	9.647	6.247	1.271	1.137	1.121
100	CMLE	2.286	1.077	0.401	1.237	-2.345	2.356	-3.117	-3.186	-0.305	1.209	-0.686	-0.803	0.157	-0.298
	TSE	1.956	0.918	0.986	0.451	0.718	0.738	0.925	0.941	0.897	1.403	0.926	0.050	0.319	0.303
	MAE	1.808	0.889	0.956	0.547	0.932	0.946	1.205	1.274	0.842	1.492	1.111	0.043	0.151	0.127
	MSE	3.910	1.730	1.603	1.373	2.356	2.377	3.276	3.578	1.256	5.799	2.350	0.058	0.198	0.166
300	CMLE	1.171	0.638	0.401	0.849	-1.611	1.600	-2.123	-2.112	-0.198	0.490	-0.352	-0.808	0.189	-0.319
	TSE	0.803	0.378	0.480	0.171	0.273	0.271	0.348	0.347	0.365	0.438	0.360	0.028	0.184	0.167
	MAE	0.651	0.349	0.393	0.124	0.192	0.175	0.234	0.245	0.378	0.455	0.373	0.020	0.072	0.073
	MSE	0.845	0.461	0.513	0.162	0.260	0.252	0.318	0.324	0.472	0.578	0.459	0.026	0.092	0.094
500	CMLE	1.143	0.654	0.461	0.821	-1.538	1.556	-2.053	-2.065	-0.196	0.455	-0.389	-0.806	0.184	-0.310
	TSE	0.607	0.282	0.360	0.127	0.203	0.205	0.262	0.263	0.274	0.320	0.265	0.021	0.144	0.131
	MAE	0.601	0.266	0.280	0.093	0.136	0.155	0.176	0.190	0.268	0.346	0.264	0.018	0.056	0.053
	MSE	0.738	0.328	0.365	0.119	0.180	0.194	0.224	0.247	0.340	0.462	0.326	0.021	0.072	0.065
	$\theta_{0}$	1.2	0.7	1.5	0.8	-1.5	-1.5	2.0	-2.0	0.3	-0.3	-0.5	-0.8	0.2	-0.3
70	CMLE	6.218	2.849	2.043	3.363	-6.405	-6.394	8.474	-8.528	2.311	0.434	0.235	-0.804	0.0742	-0.310
	TSE	3.610	1.817	1.786	0.968	1.551	1.603	2.042	2.103	2.1483	4.074	1.733	0.060	0.383	0.352
	MAE	7.506	4.794	3.945	3.563	6.421	6.409	8.480	8.530	4.885	6.469	4.185	1.179	1.002	1.033
	MSE	14.640	9.501	7.536	6.173	10.750	10.669	13.978	14.160	9.441	16.167	7.893	1.415	1.196	1.234
100	CMLE	1.870	0.861	1.385	1.249	-2.278	-2.265	2.965	-3.053	0.852	0.041	-0.284	-0.805	0.138	-0.319
	TSE	1.834	0.993	1.142	0.457	0.720	0.728	0.919	0.950	1.017	1.517	0.866	0.050	0.359	0.304
	MAE	1.542	0.886	0.922	0.524	0.837	0.828	1.025	1.107	1.101	1.413	0.950	0.043	0.162	0.134
	MSE	2.017	1.142	1.275	0.848	1.421	1.467	1.697	1.971	1.608	2.059	1.329	0.057	0.221	0.171
300	CMLE	1.324	0.795	1.448	0.874	-1.653	-1.650	2.196	-2.218	0.378	-0.247	-0.468	-0.802	0.183	-0.308
	TSE	0.853	0.438	0.545	0.181	0.293	0.292	0.379	0.381	0.378	0.499	0.371	0.028	0.192	0.181
	MAE	0.717	0.336	0.393	0.138	0.239	0.229	0.294	0.313	0.388	0.508	0.336	0.021	0.077	0.065
	MSE	0.861	0.451	0.504	0.188	0.332	0.311	0.406	0.418	0.490	0.647	0.422	0.027	0.096	0.083
500	CMLE	1.199	0.718	1.458	0.842	-1.580	-1.587	2.109	-2.116	0.380	-0.207	-0.431	-0.803	0.183	-0.307
	TSE	0.647	0.325	0.418	0.132	0.214	0.216	0.279	0.280	0.286	0.373	0.274	0.022	0.151	0.140
	MAE	0.537	0.298	0.353	0.103	0.161	0.168	0.207	0.201	0.313	0.364	0.264	0.017	0.060	0.051
	MSE	0.671	0.391	0.442	0.135	0.213	0.218	0.273	0.267	0.405	0.468	0.345	0.021	0.074	0.064