
Wavelet Based Periodic Autoregressive Moving Average Models

Rhea Davis$^{a}$ and N. Balakrishna$^{a,b}$

$^{a}$Department of Statistics, Cochin University of Science and Technology, Kochi, India; $^{b}$Department of Mathematics and Statistics, Indian Institute of Technology, Tirupati, India

CONTACT: Rhea Davis. Email: rheadavisc@gmail.com
Abstract

This paper proposes a wavelet-based method for analysing periodic autoregressive moving average (PARMA) time series. Even though Fourier analysis provides an effective method for analysing periodic time series, it requires the estimation of a large number of Fourier parameters when the PARMA parameters do not vary smoothly. The wavelet-based analysis helps us to obtain a parsimonious model with a reduced number of parameters. We have illustrated this with simulated and actual data sets.

keywords:
Fourier methods; hypothesis testing; parsimony; periodic ARMA; wavelets

1 Introduction

Time series with seasonal or periodic patterns occur in many fields of application, such as economics, finance, demography, astronomy and meteorology. The commonly used seasonal autoregressive integrated moving average (SARIMA) models may not capture some of the important local features of such time series (see [14]). An alternative approach is to model these data by periodically stationary time series. A process $\{\tilde{Y}_{t}\}$ is said to be periodically (weakly) stationary if the mean function $\mu_{t}=E(\tilde{Y}_{t})$ and the autocovariance function $\gamma_{t}(h)=\mathrm{Cov}(\tilde{Y}_{t},\tilde{Y}_{t+h})$, $h\in\mathbb{Z}$, are periodic functions with the same period $\nu$ (see [20]). A particular class of periodically stationary models is the periodic autoregressive moving average model of period $\nu$ with autoregressive order $p$ and moving average order $q$ [$\text{PARMA}_{\nu}(p,q)$], given in [6] as

$Y_{t}-\sum_{j=1}^{p}\phi_{t}(j)Y_{t-j}=\varepsilon_{t}-\sum_{j=1}^{q}\theta_{t}(j)\varepsilon_{t-j},\quad t=1,2,\dotsc,$ (1)

where (i) $Y_{t}=\tilde{Y}_{t}-\mu_{t}$, $\mu_{t}=\mu_{t+k\nu}$, $k\in\mathbb{Z}$, (ii) $\phi_{t}=\phi_{t+k\nu}$, $k\in\mathbb{Z}$, (iii) $\theta_{t}=\theta_{t+k\nu}$, $k\in\mathbb{Z}$, and (iv) $\sigma^{2}_{t}=\sigma^{2}_{t+k\nu}$, $k\in\mathbb{Z}$, where $\sigma^{2}_{t}$ is the variance of the mean-zero random sequence $\{\varepsilon_{t}\}$. Thus, $\{\delta_{t}\}=\{\sigma_{t}^{-1}\varepsilon_{t}\}$ is an i.i.d. sequence of random variables with mean 0 and variance 1. Note that PARMA reduces to ARMA when $\nu=1$. PARMA models have been applied in fields as diverse as economics ([21] and [9]), climatology ([13], [7] and [4]), signal processing [10] and hydrology ([25], [24], [6], [1] and [5]). However, PARMA models contain a large number of parameters. For example, for a monthly $\text{PARMA}_{12}(1,1)$ series, with a mean, an autoregressive coefficient, a moving average coefficient and an innovation variance for each month, the total number of parameters to be estimated is $12\times 4=48$. Often, the inclusion of a large number of parameters leads to overfitting. Hence, it is important to develop methods that result in parsimonious models with fewer parameters.

This article is organised in the following manner: In Section 2, estimators of the PARMA model parameters and several of their properties are reviewed. The Fourier-PARMA model is presented in Section 3. Section 4 gives a brief introduction to wavelet analysis, with a special focus on the discrete wavelet transform. In Section 5, our proposed wavelet-PARMA model is expounded. Section 6 contains a simulation study of the proposed wavelet-PARMA model. The applicability of the proposed model is demonstrated using real data in Section 7. Section 8 contains the conclusion of our study. We have utilised results from various articles and, for easy reference, we have included them in the Appendices.

2 Estimation of PARMA parameters

It is assumed hereafter that model (1) admits a causal representation (see [4])

$Y_{t}=\sum_{j=0}^{\infty}\psi_{t}(j)\varepsilon_{t-j},$

where $\psi_{t}(0)=1$ and $\sum_{j=0}^{\infty}|\psi_{t}(j)|<\infty$ for all $t$. Note that $\psi_{t}(j)=\psi_{t+k\nu}(j)$ for all $j$. Also, $Y_{t}=\tilde{Y}_{t}-\mu_{t}$ and $\varepsilon_{t}=\sigma_{t}\delta_{t}$, where $\{\delta_{t}\}$ is i.i.d.

Given data $\tilde{Y}_{0},\tilde{Y}_{1},\dotsc,\tilde{Y}_{N\nu-1}$, where $N$ is the number of cycles, the vector $\boldsymbol{\mu}=\{\mu_{0},\dotsc,\mu_{\nu-1}\}^{\prime}$ is estimated using

$\hat{\mu}_{i}=\frac{1}{N}\sum_{j=0}^{N-1}\tilde{Y}_{j\nu+i},\quad i=0,1,\dotsc,\nu-1.$ (2)

The sample autocovariance for lag $m$ and season $i$ is estimated using

$\hat{\gamma}_{i}(m)=\frac{1}{N}\sum_{j=0}^{N-1}(\tilde{Y}_{j\nu+i}-\hat{\mu}_{i})(\tilde{Y}_{j\nu+i+m}-\hat{\mu}_{i+m}),\quad i=0,1,\dotsc,\nu-1,$ (3)

where $\hat{\mu}_{i}$ is estimated using (2) and, whenever the index $j\nu+i+m$ exceeds $N\nu-1$, the corresponding term is set to zero. Let

$\hat{Y}_{i+n}=\sum_{j=1}^{n}\theta_{n,j}^{(i)}(Y_{i+n-j}-\hat{Y}_{i+n-j}),\quad n\geq 1,$

denote the one-step predictor of $Y_{i+n}$ that minimises the mean square error $v_{n,i}=E(Y_{i+n}-\hat{Y}_{i+n})^{2}$. The estimates of $\theta_{n,j}^{(i)}$ and $v_{n,i}$ can be obtained by substituting the estimates of the autocovariance function (3) in the recursive relations of the innovations algorithm for periodically stationary processes given in Theorem A1. Then, Theorems A2 and A3 show that $\hat{\theta}_{n,j}^{(\langle i-n\rangle)}$ and $\hat{v}_{n,\langle i-n\rangle}$ are consistent estimators of $\psi_{i}(j)$, for all $j$, and of $\sigma_{i}^{2}$, respectively. Here, $\langle b\rangle=b-\nu\lfloor b/\nu\rfloor$ for $b=0,1,\dotsc$ and $\langle b\rangle=\nu+b-\nu\lfloor b/\nu+1\rfloor$ for $b=-1,-2,\dotsc$, where $\lfloor\cdot\rfloor$ is the greatest integer function.
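To make the recursions concrete, here is a minimal Python sketch of the estimation steps (2)-(3) and (41)-(42); the helper names sample_acvf and innovations are ours, and the code is an illustration under the stated conventions rather than the authors' implementation. Note that $\langle b\rangle$ is simply b % nu in Python.

```python
import numpy as np

def sample_acvf(Y, nu):
    """Seasonal sample means (2) and autocovariances (3).
    Returns mu_hat and a function gamma(i, m); the periodic convention
    gamma_{i+nu}(m) = gamma_i(m) is handled via i % nu."""
    Y = np.asarray(Y, dtype=float)
    N = len(Y) // nu
    Y = Y[:N * nu]
    mu = Y.reshape(N, nu).mean(axis=0)
    Yc = (Y.reshape(N, nu) - mu).ravel()
    def gamma(i, m):
        total = 0.0
        for j in range(N):
            t = j * nu + (i % nu)
            if t + m <= N * nu - 1:   # terms running past the sample are set to zero
                total += Yc[t] * Yc[t + m]
        return total / N
    return mu, gamma

def innovations(gamma, nu, n_iter):
    """Innovations algorithm recursions (41)-(42) for each season i.
    Returns theta[i][n][j] = theta_{n,j}^{(i)} and v[i][n] = v_{n,i}."""
    theta = [[None] + [np.zeros(n + 1) for n in range(1, n_iter + 1)]
             for _ in range(nu)]
    v = [np.zeros(n_iter + 1) for _ in range(nu)]
    for i in range(nu):
        v[i][0] = gamma(i, 0)
        for n in range(1, n_iter + 1):
            for k in range(n):
                acc = gamma(i + k, n - k)
                for j in range(k):
                    acc -= theta[i][k][k - j] * theta[i][n][n - j] * v[i][j]
                theta[i][n][n - k] = acc / v[i][k]
            v[i][n] = gamma(i + n, 0) - sum(theta[i][n][n - j] ** 2 * v[i][j]
                                            for j in range(n))
    return theta, v
```

Theorems A2 and A3 then suggest reading off the estimates as $\hat{\psi}_{i}(j)=$ theta[(i - n) % nu][n][j] and $\hat{\sigma}_{i}^{2}=$ v[(i - n) % nu][n].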

To demonstrate the proposed ideas, we focus on two classes of PARMA models, described in the following subsections.

2.1 Periodic autoregressive model of order 1 (PAR(1))

Consider the $\text{PAR}_{\nu}(1)$ model given by

$Y_{t}=\phi_{t}Y_{t-1}+\varepsilon_{t},$ (4)

where (i) $Y_{t}=\tilde{Y}_{t}-\mu_{t}$, $\mu_{t}=\mu_{t+k\nu}$, $k\in\mathbb{Z}$, (ii) $\phi_{t}=\phi_{t+k\nu}$, $k\in\mathbb{Z}$, and (iii) $\sigma_{t}^{2}=\sigma_{t+k\nu}^{2}$, $k\in\mathbb{Z}$, where $\sigma_{t}^{2}$ is the variance of the mean-zero normal random sequence $\{\varepsilon_{t}\}$. For model (4), Theorem A4 tells us that $\{\psi_{0}(1),\psi_{1}(1),\dotsc,\psi_{\nu-1}(1)\}^{\prime}=\{\phi_{0},\phi_{1},\dotsc,\phi_{\nu-1}\}^{\prime}$. Thus, $\hat{\boldsymbol{\phi}}=\{\hat{\phi}_{0},\hat{\phi}_{1},\dotsc,\hat{\phi}_{\nu-1}\}^{\prime}$ can be obtained from the consistent estimates of $\psi_{i}(1)$, $i=0,\dotsc,\nu-1$. Moreover, Theorem A5 tells us that the estimator of $\boldsymbol{\phi}$ is asymptotically normal. Also, $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\sigma}}^{2}=\{\hat{v}_{n,\langle 0-n\rangle},\hat{v}_{n,\langle 1-n\rangle},\dotsc,\hat{v}_{n,\langle\nu-1-n\rangle}\}^{\prime}$ are asymptotically normal, as stated by Theorems A6 and A7, respectively.
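Continuing the hypothetical helpers sketched above, the PAR(1) estimates can be read off as follows (par1_estimates is our illustrative name):

```python
import numpy as np

def par1_estimates(theta, v, nu, n):
    """PAR(1): phi_i = psi_i(1) (Theorem A4), estimated by theta_{n,1}^{(<i-n>)}
    (Theorem A2); sigma_i^2 is estimated by v_{n,<i-n>} (Theorem A3)."""
    phi_hat = np.array([theta[(i - n) % nu][n][1] for i in range(nu)])
    sigma2_hat = np.array([v[(i - n) % nu][n] for i in range(nu)])
    return phi_hat, sigma2_hat
```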

2.2 PARMA(1,1)

Consider the $\text{PARMA}_{\nu}(1,1)$ model given by

$\tilde{Y}_{t}=\phi_{t}\tilde{Y}_{t-1}+\varepsilon_{t}-\theta_{t}\varepsilon_{t-1},$ (5)

where (i) $\phi_{t}=\phi_{t+k\nu}$, $k\in\mathbb{Z}$, (ii) $\theta_{t}=\theta_{t+k\nu}$, $k\in\mathbb{Z}$, and $\{\varepsilon_{t}\}$ is a sequence of normal random variables with mean 0 and variance 1. For model (5), Theorem A8 tells us that

$\phi_{t}=\frac{\psi_{t}(2)}{\psi_{t-1}(1)},$ (6)
$\theta_{t}=\phi_{t}-\psi_{t}(1).$ (7)

Thus, $\hat{\phi}_{t}$ and $\hat{\theta}_{t}$ are obtained by replacing each $\psi_{t}(\cdot)$ with its consistent estimate. The estimator of $\boldsymbol{\phi}$ is asymptotically normal, as given in Theorem A9. Also, Theorem A10 tells us that $\hat{\boldsymbol{\theta}}=\{\hat{\theta}_{0},\hat{\theta}_{1},\dotsc,\hat{\theta}_{\nu-1}\}^{\prime}$ is asymptotically normal.
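Similarly, a sketch of the PARMA(1,1) estimates via (6)-(7), again building on the hypothetical innovations output above:

```python
import numpy as np

def parma11_estimates(theta, nu, n):
    """PARMA(1,1): phi_t = psi_t(2)/psi_{t-1}(1) and theta_t = phi_t - psi_t(1),
    with each psi replaced by its consistent estimate (Theorem A2)."""
    psi1 = np.array([theta[(i - n) % nu][n][1] for i in range(nu)])
    psi2 = np.array([theta[(i - n) % nu][n][2] for i in range(nu)])
    phi_hat = psi2 / np.roll(psi1, 1)   # np.roll aligns psi_{t-1}(1) with season t
    return phi_hat, phi_hat - psi1
```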

After estimating the PARMA parameters, our objective is to reduce the number of parameters so that the final model still captures the dependency structure adequately. Since the parameter functions are periodic, it is natural to consider Fourier techniques to meet this objective.

3 Fourier-PARMA model

Often, representing a function in terms of basis functions yields new insights about the function that are otherwise obscure. The choice of the set of basis functions depends on the kind of information we are interested in. The analysis of periodic functions usually involves the identification of prominent frequencies present in the signal. For this, the function is represented as an infinite series of sinusoids, known as the Fourier series, which is possible for almost all functions encountered in practice. The magnitude of the coefficients associated with the sines and cosines indicates the strength of the corresponding frequency component present in the process.

From [4], a periodic vector $\mathbf{X}=[X_{t}:0\leq t\leq\nu-1]$ has the representation

$X_{t}=c_{0}+\sum_{r=1}^{\ell}\left\{c_{r}\cos\left(\frac{2\pi rt}{\nu}\right)+s_{r}\sin\left(\frac{2\pi rt}{\nu}\right)\right\},$ (8)

where crc_{r} and srs_{r} are Fourier coefficients and

$\ell=\begin{cases}\nu/2,&\text{if }\nu\text{ even}\\(\nu-1)/2,&\text{if }\nu\text{ odd}\end{cases}.$

For convenience, (8) is written as,

$X_{t}=c_{0}+\sum_{r=1}^{\ell}\left\{c_{r}\cos^{*}(r)+s_{r}\sin^{*}(r)\right\},$ (9)

where $\cos^{*}(r)=\cos\left(\frac{2\pi rt}{\nu}\right)$ and $\sin^{*}(r)=\sin\left(\frac{2\pi rt}{\nu}\right)$.

Let $\textbf{f}=\begin{cases}[c_{0},c_{1},s_{1},\dotsc,c_{(\nu-1)/2},s_{(\nu-1)/2}]^{\prime}&(\nu\ \text{odd})\\ [c_{0},c_{1},s_{1},\dotsc,s_{\nu/2-1},c_{\nu/2}]^{\prime}&(\nu\ \text{even})\end{cases}.$

Then, we have, from [4],

$\textbf{f}=\mathcal{L}\mathcal{P}\mathcal{U}\mathbf{X},$ (10)

where $\mathcal{L},\mathcal{P}$ and $\mathcal{U}$ are given in equations (60), (62) and (61) of Theorem A11, respectively; $\mathcal{U}$ and $\mathcal{P}$ are unitary matrices. Using (10), we have $\hat{\textbf{f}}=\mathcal{L}\mathcal{P}\mathcal{U}\hat{\mathbf{X}}$. Note that $\textbf{X}$ can be retrieved from $\textbf{f}$ by

$\textbf{X}=\tilde{\mathcal{U}}^{\prime}\tilde{\mathcal{P}}^{\prime}\mathcal{L}^{-1}\textbf{f},$

where $\tilde{\mathcal{A}}^{\prime}$ denotes the conjugate transpose of a matrix $\mathcal{A}$. Thus, $\textbf{X}$ and $\textbf{f}$ are representations of the same entity. Hence, to obtain a parsimonious PARMA model it is enough to represent the model using a reduced number of Fourier coefficients. Since PARMA parameters are often observed to vary smoothly over time, [6], [5], [23] and [4] identified significant Fourier coefficients via a hypothesis test based on the asymptotic distributions of the PARMA estimators. The significant Fourier coefficients are retained as they are, while the insignificant coefficients are set to zero, resulting in a parsimonious Fourier-PARMA model.
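For intuition, the coefficients in (8) can also be computed directly from the defining sums, which is equivalent to applying (10); the following is a sketch (fourier_coeffs and fourier_recon are our illustrative names; for even $\nu$ the Nyquist cosine carries the factor $1/\nu$):

```python
import numpy as np

def fourier_coeffs(x):
    """Fourier coefficients c_r, s_r of a periodic vector, matching (8)."""
    nu = len(x)
    t = np.arange(nu)
    ell = nu // 2 if nu % 2 == 0 else (nu - 1) // 2
    c, s = np.zeros(ell + 1), np.zeros(ell + 1)
    c[0] = x.mean()
    for r in range(1, ell + 1):
        scale = 1.0 / nu if (nu % 2 == 0 and r == nu // 2) else 2.0 / nu
        c[r] = scale * np.sum(x * np.cos(2 * np.pi * r * t / nu))
        s[r] = scale * np.sum(x * np.sin(2 * np.pi * r * t / nu))
    return c, s

def fourier_recon(c, s, nu):
    """Reconstruct X_t from the representation (8)."""
    t = np.arange(nu)
    x = np.full(nu, c[0])
    for r in range(1, len(c)):
        x += c[r] * np.cos(2 * np.pi * r * t / nu) \
             + s[r] * np.sin(2 * np.pi * r * t / nu)
    return x
```

For any real vector x, running c, s = fourier_coeffs(x) followed by fourier_recon(c, s, len(x)) recovers x, mirroring the invertibility of (10).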

Although Fourier techniques have found some success, they often fail when bursts and other transient events take place, which is common in fields like economics and climatology. This is because, in the presence of such transient events, all the Fourier coefficients are unduly affected, as every observation is involved in the computation of each coefficient.

4 Wavelet Analysis

Wavelet techniques are an excellent alternative to Fourier methods. Some applications of wavelets in statistics include the estimation of functions such as densities, regression functions and spectra, the analysis of long memory processes, data compression, decorrelation, and the detection of structural breaks. A "wavelet" means a small wave (see [22], [18] and [11]). This essentially means that a wavelet oscillates like a wave but lasts only for a short duration, unlike a sine wave, which extends infinitely on both sides of the time axis. Wavelets are constructed so that they possess certain desirable properties. Many different wavelet functions and transforms are available in the literature. However, here we consider only the orthogonal discrete wavelet transform (DWT), since approximation based on an orthogonal set is best in terms of mean square error, as indicated by the projection theorem (see [8]).

Fourier methods suffice when the frequencies of the signal do not change over time. However, Fourier techniques fail when different frequencies occur at different parts of the time axis. Hence, it is desirable to find techniques that have both time resolution and frequency resolution. But, by Heisenberg's uncertainty principle, it is impossible to identify time and frequency simultaneously with arbitrary precision (see [15] and [26]), as there is a lower bound on the error incurred. Wavelets bypass this difficulty by using short time windows to capture high frequencies, so that we have good time resolution, and wider time windows for low frequencies, so that we have good frequency resolution. Thus, wavelets can pick out characteristics that are local in time and frequency. Wavelets are functions of scale instead of frequency; a scale can be loosely interpreted as a quantity inversely proportional to frequency. This makes the wavelet transform an exceptional tool for studying non-stationary signals. For recent work on the applications of wavelets to time series, one can refer to [16], [27], and [17].

Let $\mathbf{X}$ be a vector of size $N=2^{J}$, $J\in\mathbb{Z}^{+}$, and let $\mathcal{W}$ be the $N\times N$ orthogonal matrix used for performing the discrete wavelet transform (DWT). Then the DWT of $\mathbf{X}$ is given by

$\mathbf{W}=\mathcal{W}\mathbf{X}=\begin{bmatrix}V_{J}\\ \mathbf{W}_{1}\\ \vdots\\ \mathbf{W}_{J}\end{bmatrix}=\begin{bmatrix}W_{0}\\ W_{1}\\ \vdots\\ W_{N-1}\end{bmatrix}$ (11)

Here, $V_{J}=W_{0}$ is the scaling coefficient, which is proportional to the average of all the data, i.e.,

$V_{J}=W_{0}=k_{1}\cdot\frac{1}{N}\sum_{i=1}^{N}X_{i},$ (12)

where the proportionality constant $k_{1}$ depends on the particular wavelet employed. $\mathbf{W}_{j}$, $j\in\{1,2,\dotsc,J\}$, contains $2^{j-1}$ wavelet coefficients associated with changes on a scale $N/2^{j-1}$. For example, the Haar DWT of $\mathbf{X}=[X_{0},\dotsc,X_{15}]^{\prime}$ is given by

$\begin{bmatrix}\frac{1}{4}(X_{15}+\dotsb+X_{0})\\ \frac{1}{4}(-X_{15}-\dotsb-X_{8}+X_{7}+\dotsb+X_{0})\\ \frac{1}{\sqrt{8}}(-X_{7}-\dotsb-X_{4}+X_{3}+\dotsb+X_{0})\\ \frac{1}{\sqrt{8}}(-X_{15}-\dotsb-X_{12}+X_{11}+\dotsb+X_{8})\\ \frac{1}{2}(-X_{3}-X_{2}+X_{1}+X_{0})\\ \vdots\\ \frac{1}{2}(-X_{15}-X_{14}+X_{13}+X_{12})\\ \frac{1}{\sqrt{2}}(-X_{1}+X_{0})\\ \vdots\\ \frac{1}{\sqrt{2}}(-X_{15}+X_{14})\end{bmatrix}.$ (13)

Note that the wavelet coefficients are localised in time; for example, the last wavelet coefficient involves only the observations $X_{14}$ and $X_{15}$. Since $\mathcal{W}$ is an orthogonal matrix, the original vector $\mathbf{X}$ can easily be reconstructed via the inverse DWT

$\mathbf{X}=\mathcal{W}^{T}\mathbf{W},$ (14)

where $\mathcal{W}^{T}$ denotes the transpose of $\mathcal{W}$. From (14), $\textbf{X}$ and $\textbf{W}$ are representations of the same quantity, and hence it is enough to represent the parameters of the PARMA model in terms of the discrete wavelet transform coefficients.
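The Haar case of (11)-(14) is easy to code explicitly; the following minimal sketch (haar_dwt and haar_idwt are our names) follows the sign convention of (13) and assumes the input length is a power of 2:

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar DWT arranged as in (11): [V_J, W_1, ..., W_J],
    where W_j holds 2^(j-1) coefficients (coarsest detail first)."""
    a, details = np.asarray(x, dtype=float), []
    while len(a) > 1:
        details.append((a[::2] - a[1::2]) / np.sqrt(2))  # e.g. (X_0 - X_1)/sqrt(2)
        a = (a[::2] + a[1::2]) / np.sqrt(2)
    return np.concatenate([a] + details[::-1])

def haar_idwt(w):
    """Inverse DWT (14): reconstruct X from [V_J, W_1, ..., W_J]."""
    w = np.asarray(w, dtype=float)
    a, pos = w[:1], 1
    while pos < len(w):
        d = w[pos:pos + len(a)]
        x = np.empty(2 * len(a))
        x[::2], x[1::2] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
        a, pos = x, pos + len(d)
    return a
```

Since the transform is linear, the orthogonal matrix $\mathcal{W}$ itself can be assembled column by column, e.g. W = np.column_stack([haar_dwt(e) for e in np.eye(N)]).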

Another interesting advantage of the DWT, especially pertaining to our objective of achieving parsimonious PARMA models, is that it redistributes the energy or variability contained in a sequence (see [26]). The essential information, spread throughout the original sequence, is thus concentrated in a few wavelet coefficients (see [19] and [26]). We therefore propose a wavelet-PARMA model that captures the dependency structure adequately by retaining only a few significant wavelet coefficients. We identify the significant wavelet coefficients by developing a hypothesis test, following the parallel approach given in [4] for Fourier coefficients and utilising the asymptotic distributions of the estimators of the PARMA parameters.

5 Wavelet-PARMA model

Suppose a random vector $\hat{\mathbf{X}}$ satisfies

$N^{1/2}(\hat{\mathbf{X}}-\mathbf{X})\Rightarrow\mathcal{N}(0,\Sigma).$ (15)

Using Theorem A12, we have,

$N^{1/2}(\hat{\mathbf{W}}-\mathbf{W})\Rightarrow\mathcal{N}(0,R_{\textbf{W}}),$ (16)
where $R_{\textbf{W}}=\mathcal{W}\Sigma\mathcal{W}^{T}.$ (17)

The main idea is to identify the statistically significant wavelet coefficients of the PARMA estimator vectors using an appropriate test procedure, so that the remaining coefficients can be nullified, thereby obtaining a parsimonious PARMA model. The null hypothesis of the test is $H_{0}:\mathbf{W}=(V_{J},0,\dotsc,0)^{\prime}=(W_{0},0,\dotsc,0)^{\prime}$. Under $H_{0}$, every element of $\mathbf{X}$ equals $k_{2}V_{J}=k_{2}k_{1}\frac{1}{N}\sum_{i=1}^{N}X_{i}=k\frac{1}{N}\sum_{i=1}^{N}X_{i}$, where $k$ depends on the type of wavelet used. Since each parameter vector has all its elements equal under $H_{0}$, the PARMA process reduces to a stationary ARMA process with $\phi_{t}=\phi$ for all $t\in\{0,1,\dotsc,\nu-1\}$, and similarly for the other parameter vectors. Thus, the null hypothesis states that the model is stationary. From (16), we have

$N^{1/2}(\hat{W}_{i}-W_{i})\Rightarrow\mathcal{N}(0,[R_{\textbf{W}}]_{ii}),\quad i=1,2,\dotsc,\nu-1.$ (18)

Here, we use Bonferroni's test procedure (see [12]) with $\alpha=0.05$; that is, we test the null hypothesis $H_{0}:W_{i}=0$ versus $H_{a}:W_{i}\neq 0$ for each $i=1,2,\dotsc,\nu-1$. The test statistic is

$Z_{W_{i}}=\frac{\hat{W}_{i}}{\sqrt{[R_{\textbf{W}}]_{i,i}/N}},\quad i=1,2,\dotsc,\nu-1.$ (19)

Let $\alpha^{*}=\frac{\alpha}{\nu-1}$ and let $Z_{\alpha^{*}/2}$ be the upper $\alpha^{*}/2$ quantile, i.e., $P(Z>Z_{\alpha^{*}/2})=\alpha^{*}/2$, where $Z\sim\mathcal{N}(0,1)$. We reject the null hypothesis when $|Z_{W_{i}}|>Z_{\alpha^{*}/2}$. Whenever the null hypothesis is not rejected, the corresponding element $W_{i}$ is set to zero. Ideally, this results in a sparse $\mathbf{W}$ vector in which most of the coefficients are 0, and hence a parsimonious wavelet-PARMA model.
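Under the stated assumptions, the whole testing step (16)-(19) fits in a few lines; this is a sketch (wavelet_parma_test is our name; W_matrix is the orthogonal DWT matrix, e.g. the Haar matrix assembled in Section 4):

```python
import numpy as np
from scipy.stats import norm

def wavelet_parma_test(x_hat, Sigma_hat, N, W_matrix, alpha=0.05):
    """Bonferroni test on the DWT coefficients of a PARMA estimator vector.
    The scaling coefficient W_0 = V_J is always retained; insignificant
    coefficients are set to zero."""
    nu = len(x_hat)
    w_hat = W_matrix @ x_hat                      # DWT of the estimator, as in (16)
    R = W_matrix @ Sigma_hat @ W_matrix.T         # R_W, equation (17)
    z = w_hat[1:] / np.sqrt(np.diag(R)[1:] / N)   # Z-scores, equation (19)
    crit = norm.ppf(1 - (alpha / (nu - 1)) / 2)   # Z_{alpha*/2}
    w_sparse = np.concatenate(([w_hat[0]],
                               np.where(np.abs(z) > crit, w_hat[1:], 0.0)))
    return w_sparse, W_matrix.T @ w_sparse, z     # sparse W and implied parameters (14)
```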

5.1 PAR(1) under $H_{0}$

For the model considered in (4), under the null hypothesis, $\gamma(h)=\nu^{-1}\sum_{m=0}^{\nu-1}\gamma_{m}(h)$ and $\phi(k)=\nu^{-1}\sum_{m=0}^{\nu-1}\phi_{m}(k)$. In this case, the asymptotic variance-covariance matrix of $\hat{\boldsymbol{\mu}}$, $\Sigma_{\boldsymbol{\mu}}$, becomes (see [4])

$(\Sigma_{\boldsymbol{\mu}})_{ii}=\gamma(0)\left[\frac{1+r}{1-r}\right],\quad 0\leq i\leq\nu-1,$ (20)

where $r=\phi^{\nu}$ and, for $j>i$,

$(\Sigma_{\boldsymbol{\mu}})_{ij}=\gamma(0)\left[\frac{\phi^{j-i}+\phi^{\nu+i-j}}{1-r}\right],\quad 0\leq i,j\leq\nu-1.$ (21)

Since $\Sigma_{\boldsymbol{\mu}}$ is a symmetric matrix, the elements $(\Sigma_{\boldsymbol{\mu}})_{ij}$ for $i>j$ follow immediately. For hypothesis testing, the elements of $\Sigma_{\boldsymbol{\mu}}$ are replaced with their estimates.

Under $H_{0}$, the asymptotic variance-covariance matrix of $\hat{\boldsymbol{\sigma}}^{2}$, $\Sigma_{\boldsymbol{\sigma}^{2}}$, is given by (see [4])

$(\Sigma_{\boldsymbol{\sigma}^{2}})_{ii}=2\gamma(0)^{2}\left[\frac{1+\phi^{2\nu}}{1-r^{2}}\right]-2\phi\times 2\gamma(0)^{2}\left[\frac{\phi+\phi^{2\nu-1}}{1-r^{2}}\right]$ (22)
$\qquad\qquad+\phi^{2}\gamma(0)^{2}\left[\phi^{2}+\frac{1+3r^{2}}{1-r^{2}}\right],\quad 0\leq i\leq\nu-1,$ (23)

and, if $j>i$, we have

$(\Sigma_{\boldsymbol{\sigma}^{2}})_{ij}=2\gamma(0)^{2}\left[\frac{\phi^{2(j-i)}+\phi^{2(\nu+i-j)}}{1-r^{2}}\right]$ (24)
$\qquad-2\gamma(0)^{2}\phi\left[\frac{\phi^{2j-2i-1}+\phi^{2\nu-2j+2i+1}}{1-r^{2}}\right]-2\gamma(0)^{2}\phi\left[\frac{\phi^{2j-2i+1}+\phi^{2\nu-2j+2i-1}}{1-r^{2}}\right]$
$\qquad+2\gamma(0)^{2}\phi^{2}\left[\frac{\phi^{2j-2i}+\phi^{2\nu-2j+2i}}{1-r^{2}}\right],\quad 0\leq i,j\leq\nu-1.$ (25)

Since $\Sigma_{\boldsymbol{\sigma}^{2}}$ is a symmetric matrix, we also have the elements $s_{i,j}$, $i>j$. For hypothesis testing, the elements of $\Sigma_{\boldsymbol{\sigma}^{2}}$ are replaced with their estimates.

Under $H_{0}$, $\sigma_{i}^{2}=\sigma^{2}$ for all $i$, and so from Theorem A5 the asymptotic variance-covariance matrix of $\hat{\boldsymbol{\phi}}$, $\Sigma_{\boldsymbol{\phi}}$, is the identity matrix $\mathcal{I}$ of order $\nu$, i.e., $\Sigma_{\boldsymbol{\phi}}=\mathcal{I}$.

5.2 PARMA(1,1) under $H_{0}$

For the model considered in (5), under the null hypothesis, $\psi(j)=\nu^{-1}\sum_{t=0}^{\nu-1}\psi_{t}(j)$, $j=1,2$, $\phi=\nu^{-1}\sum_{t=0}^{\nu-1}\phi_{t}$ and $\theta=\nu^{-1}\sum_{t=0}^{\nu-1}\theta_{t}$. Under $H_{0}$, the asymptotic variance-covariance matrix of $\hat{\boldsymbol{\phi}}$, $\mathcal{Q}$, given in Theorem A9, reduces to

$\mathcal{Q}=\sum_{\ell,k=1}^{2}\mathcal{H}_{\ell}\mathcal{V}_{\ell k}\mathcal{H}_{k}^{\prime},$ (26)

where

$\mathcal{V}_{11}=\mathcal{I},$ (27)
$\mathcal{V}_{22}=[\psi(1)^{2}+1]\mathcal{I},$ (28)
$\mathcal{V}_{12}=\psi(1)\Pi,$ (29)
$\mathcal{V}_{21}=\psi(1)\Pi^{\prime},$ (30)
$\mathcal{H}_{1}=-\mathcal{F}_{2}\Pi^{-1}\mathcal{F}_{1}^{-2},$ (31)
$\mathcal{H}_{2}=\Pi^{-1}\mathcal{F}_{1}^{-1}\Pi,$ (32)
$\mathcal{F}_{j}=\psi(j)\mathcal{I},$ (33)
$\boldsymbol{\psi}(j)=\{\psi_{0}(j),\psi_{1}(j),\dotsc,\psi_{\nu-1}(j)\}^{\prime},$ (34)
and
$\Pi=\begin{bmatrix}0&1&0&0&\dotsc&0\\ 0&0&1&0&\dotsc&0\\ \vdots&\vdots&\vdots&\vdots&&\vdots\\ 0&0&0&0&\dotsc&1\\ 1&0&0&0&\dotsc&0\end{bmatrix}.$ (35)

Also, the asymptotic variance-covariance matrix of $\hat{\boldsymbol{\theta}}$, $\mathcal{S}$, given in Theorem A10, has the following form under $H_{0}$:

$\mathcal{S}=\sum_{\ell,k=1}^{2}\mathcal{M}_{\ell}\mathcal{V}_{\ell k}\mathcal{M}_{k}^{\prime},$ (36)

where $\mathcal{M}_{1}=-\mathcal{I}-\mathcal{F}_{2}\Pi^{-1}\mathcal{F}_{1}^{-2}$, $\mathcal{M}_{2}=\Pi^{-1}\mathcal{F}_{1}^{-1}\Pi$, $\mathcal{V}_{\ell k}$, $1\leq\ell,k\leq 2$, are given in equations (27) to (30), $\mathcal{F}_{j}$ is given in (33) and $\Pi$ is given in (35).

If $\nu$ is not a power of 2, each of the estimator vectors is extended periodically to the nearest power of 2, say $\nu^{\prime}$. The corresponding asymptotic variance-covariance matrices are also extended periodically, resulting in matrices of order $\nu^{\prime}$.
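A sketch of this periodic extension (the helper name is ours):

```python
import numpy as np

def periodic_extend(x_hat, Sigma_hat):
    """Extend an estimator vector of length nu and its asymptotic covariance
    matrix periodically to the next power of 2 (e.g. nu = 12 -> nu' = 16)."""
    nu = len(x_hat)
    nu2 = 1 << (nu - 1).bit_length()   # smallest power of 2 that is >= nu
    idx = np.arange(nu2) % nu          # periodic index pattern 0,1,...,nu-1,0,1,...
    return x_hat[idx], Sigma_hat[np.ix_(idx, idx)]
```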

6 Simulation of wavelet-PARMA

The observations are simulated from the $\text{PARMA}_{12}(1,1)$ model

$\tilde{Y}_{t}=\phi_{t}\tilde{Y}_{t-1}+\varepsilon_{t}-\theta_{t}\varepsilon_{t-1},$ (37)

where (i) $\phi_{t}=\phi_{t+12k}$, $k\in\mathbb{Z}$, (ii) $\theta_{t}=\theta_{t+12k}$, $k\in\mathbb{Z}$, and $\{\varepsilon_{t}\}$ is a sequence of normal random variables with mean 0 and variance 1. The parameter values and their corresponding estimates are given in Table 1.

Table 1: True values of the PARMA parameters and their estimates.

Season | $\phi$ (True) | $\phi$ (Estimate) | $\theta$ (True) | $\theta$ (Estimate)
0 | 0.67 | 0.59 | 0.20 | 0.06
1 | 0.70 | 0.73 | 0.23 | 0.29
2 | 0.69 | 0.81 | 0.22 | 0.30
3 | 0.68 | 0.77 | 0.21 | 0.24
4 | 0.67 | 0.43 | 1.43 | 1.22
5 | 0.68 | 0.72 | 1.44 | 1.46
6 | 0.69 | 0.62 | 0.46 | 0.40
7 | 0.68 | 1.04 | 0.47 | 0.81
8 | 1.83 | 1.86 | 0.23 | 0.28
9 | 1.84 | 1.83 | 0.24 | 0.25
10 | 0.53 | 0.52 | 0.21 | 0.21
11 | 0.52 | 0.68 | 0.23 | 0.37

Since the period 12 is not a power of 2, each of the vectors has been extended periodically to the nearest power of 2, i.e., 16. We have taken the number of cycles, $N$, to be 500.
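A minimal sketch of how such observations can be generated from (37), using the true values in Table 1 (the burn-in length and the convention that the season is $t \bmod 12$ are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
nu, N, burn = 12, 500, 200
phi = np.array([0.67, 0.70, 0.69, 0.68, 0.67, 0.68,
                0.69, 0.68, 1.83, 1.84, 0.53, 0.52])
theta = np.array([0.20, 0.23, 0.22, 0.21, 1.43, 1.44,
                  0.46, 0.47, 0.23, 0.24, 0.21, 0.23])

T = N * nu + burn
eps = rng.standard_normal(T)           # iid N(0, 1) innovations
Y = np.zeros(T)
for t in range(1, T):
    s = t % nu                         # season of time t
    Y[t] = phi[s] * Y[t - 1] + eps[t] - theta[s] * eps[t - 1]
Y = Y[burn:]                           # drop the burn-in; N*nu observations remain
```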

The plot of the generated observations is given in Figure 1.

Figure 1: Plot of simulated observations.

Seven iterations of the innovations algorithm were carried out to calculate the parameter estimates. The residuals are computed using the expression

$\hat{\varepsilon}_{t}=\tilde{Y}_{t}-\hat{\phi}_{t}\tilde{Y}_{t-1}+\hat{\theta}_{t}\hat{\varepsilon}_{t-1},$ (38)

where the initial value $\hat{\varepsilon}_{0}$ is set to zero. The residual diagnostic plots are given in Figure 2.
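The recursion (38) can be coded directly (a sketch, with the same seasonal indexing as above):

```python
import numpy as np

def parma11_residuals(Y, phi_hat, theta_hat):
    """Residuals via (38); the initial value eps_hat_0 is set to zero."""
    nu = len(phi_hat)
    eps_hat = np.zeros(len(Y))
    for t in range(1, len(Y)):
        s = t % nu
        eps_hat[t] = Y[t] - phi_hat[s] * Y[t - 1] + theta_hat[s] * eps_hat[t - 1]
    return eps_hat
```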

Figure 2: PARMA(1,1) residual diagnostics. (a) ACF plot of PARMA(1,1) residuals; (b) histogram of PARMA(1,1) residuals superimposed with the normal PDF.

The Box-Pierce test results are given in Table 2.

Table 2: Box-Pierce test for PARMA residuals.

Lags | 20 | 30 | 40 | 50 | 80 | 100
p-value | 0.8844 | 0.8232 | 0.5431 | 0.706 | 0.8313 | 0.8898

The normality of the residuals was tested using the Kolmogorov-Smirnov test, and it was found that the residuals are normally distributed (p-value = 0.8586). Thus, the residuals are uncorrelated and normally distributed.

The Fourier-PARMA model obtained using the method outlined in [4] is given by

$\phi_{t}=0.88-0.375\sin^{*}(1)-0.27\cos^{*}(2)+0.30\sin^{*}(2)-0.195\sin^{*}(4),$
$\theta_{t}=0.49-0.37\cos^{*}(1)+0.195\sin^{*}(1)-0.22\sin^{*}(2)-0.25\cos^{*}(4).$

The ACF plot of residuals of the Fourier-PARMA model is given in Figure 3.

Figure 3: ACF plot of residuals of the Fourier-PARMA model.

The results of the Box-Pierce test are given in Table 3.

Table 3: Box-Pierce test for the Fourier-PARMA model.

Lags | 20 | 30 | 40 | 50 | 80 | 100
p-value | $1.753\times 10^{-5}$ | $7\times 10^{-5}$ | $4.34\times 10^{-5}$ | 0.0003 | 0.00299 | 0.0084

Thus, the Fourier-PARMA model is not able to capture the dependencies among the observations adequately, as is evident from the ACF plot in Figure 3 and the Box-Pierce test results in Table 3.

Table 4: DWT analysis of the $\phi$ estimates and $\theta$ estimates of $\text{PARMA}_{12}(1,1)$. Statistically significant DWT coefficients ($|Z|>2.95$) are denoted by *. The coefficients retained in the wavelet-PARMA model are the scaling coefficient (index 0) together with the significant coefficients.

Index | $\phi$ DWT coefficient | Z-score | $\theta$ DWT coefficient | Z-score
0 | 3.37 | - | 1.69 | -
1 | -0.52* | -5.30 | 0.69* | 6.12
2 | 0.03 | 0.21 | -1.06* | -6.60
3 | 0.71* | 5.09 | 0.08 | 0.48
4 | -0.13 | -0.94 | -0.10 | -0.63
5 | -0.26 | -1.85 | 0.73* | 4.80
6 | 1.25* | 8.97 | -0.02 | -0.16
7 | -0.13 | -0.94 | -0.10 | -0.63
8 | -0.10 | -0.72 | -0.16 | -1.21
9 | 0.03 | 0.21 | 0.05 | 0.36
10 | -0.21 | -1.49 | -0.17 | -1.27
11 | -0.30 | -2.16 | -0.29 | -2.15
12 | 0.02 | 0.15 | 0.02 | 0.14
13 | -0.11 | -0.80 | -0.11 | -0.85
14 | -0.10 | -0.72 | -0.16 | -1.21
15 | 0.03 | 0.21 | 0.05 | 0.36

The discrete wavelet transform analysis was done using the Haar wavelet, chosen for the simulation because its structure is explicitly known. The results of the Haar DWT analysis are given in Table 4. It is found that only 4 DWT coefficients of $\boldsymbol{\phi}$ and 4 DWT coefficients of $\boldsymbol{\theta}$ need to be incorporated into the wavelet-PARMA model, so the final model contains only 8 parameters, as opposed to 24 parameters in the original PARMA model. The ACF plot of the residuals of the wavelet-PARMA model is given in Figure 4.

Figure 4: ACF plot of wavelet-PARMA residuals.

The results of the Box-Pierce test are given in Table 5.

Table 5: Box-Pierce test for the wavelet-PARMA model.

Lags | 20 | 30 | 40 | 50 | 80 | 100
p-value | 0.0889 | 0.1274 | 0.0931 | 0.1923 | 0.4974 | 0.6524

The residuals of the wavelet-PARMA model are uncorrelated. Hence, the wavelet-PARMA model with just 8 parameters is able to capture the dependency structure adequately.

7 Data Analysis

The UK Meteorological (MET) Office, established in 1854, is the government body commissioned to analyse the weather in the United Kingdom. It uses its findings for forecasting and for issuing warnings, and offers data from 36 stations in the UK. For our analysis, we consider the total sunshine duration in a month, recorded in hours, at the Ballypatrick weather station for the years 1966-1990. The data are available from the Kaggle site https://www.kaggle.com/datasets/josephw20/uk-met-office-weather-data/code, where the total sunshine duration is given under the variable sun. The data are fitted using the $\text{PAR}_{12}(1)$ model given in equation (4). Here, the period is $\nu=12$ and the number of years (cycles) is $N=25$. The plot of the first 10 years is given in Figure 5. It was found that 2 iterations of the innovations algorithm were enough to capture the model dynamics. The residuals of the PAR model were computed using

$\hat{\delta}_{t}=\frac{\hat{Y}_{t}-\hat{\phi}_{t}\hat{Y}_{t-1}}{\hat{\sigma}_{t}},$ (39)

where $\hat{Y}_{t}=\tilde{Y}_{t}-\hat{\mu}_{t}$. The PAR model diagnostic plots are given in Figure 6. The results of the Box-Pierce test are given in Table 6. The normality of the residuals was tested using the Kolmogorov-Smirnov test, and it was found that the residuals are normally distributed (p-value = 0.319). Thus, the residuals are uncorrelated and normally distributed.

Figure 5: Plot of total sunshine duration in a month recorded by the Ballypatrick station from 1966 to 1990.

Figure 6: PAR(1) residual diagnostics. (a) ACF plot of PAR(1) residuals; (b) histogram of PAR(1) residuals superimposed with the normal PDF.
Table 6: Box-Pierce test for PAR residuals.

Lags | 20 | 30
p-value | 0.0528 | 0.0601

The Fourier-PAR model obtained using the method outlined in [4] is given by

$\mu_{t}=106.45-60.67\cos^{*}(1)+35.29\sin^{*}(1)-12.02\cos^{*}(2)+7.75\cos^{*}(3)+6.88\sin^{*}(4),$
$\phi_{t}=0.09,$
$\sigma^{2}_{t}=713.78-729.27\cos^{*}(1)+395.27\sin^{*}(1).$

The ACF plot of Fourier-PAR residuals is given in Figure 7.

Figure 7: ACF plot of Fourier-PAR residuals.

The results of the Box-Pierce test of the Fourier-PAR residuals are given in Table 7.

Table 7: Box-Pierce test of Fourier-PAR residuals.

Lags | 20 | 30
p-value | $1.3\times 10^{-5}$ | $6.76\times 10^{-8}$

Thus, it is clear that the Fourier-PAR model has not been able to capture the dependency structure adequately.

The discrete wavelet transform analysis was done using the Daubechies least asymmetric wavelet with 7 vanishing moments, LA(7). This wavelet was chosen because the resulting wavelet-PAR model could capture the dependency structure adequately. The LA(7) DWT results are given in Table 8. The ACF plot of wavelet-PAR residuals is given in Figure 8, and the Box-Pierce test of wavelet-PAR residuals is given in Table 9. From these results, it is clear that the resulting wavelet-PAR model is able to capture the dependency structure adequately. It is found that only 7 DWT coefficients of $\boldsymbol{\mu}$, 1 DWT coefficient of $\boldsymbol{\phi}$, and 5 DWT coefficients of $\boldsymbol{\sigma}^{2}$ need to be incorporated into the wavelet-PAR model, so the final model contains only 13 parameters, as opposed to 36 parameters in the original PAR model.

Table 8: DWT analysis of the $\mu$ estimates, $\phi$ estimates and $\sigma^{2}$ estimates of $\text{PAR}_{12}(1)$. Statistically significant DWT coefficients ($|Z|>2.95$) are denoted by *. The coefficients retained in the wavelet-PAR model are the scaling coefficient (index 0) together with the significant coefficients.

Index | $\mu$ DWT coefficient | Z-score | $\phi$ DWT coefficient | Z-score | $\sigma^{2}$ DWT coefficient | Z-score
0 | 408.88 | - | 0.39 | - | 2608.33 | -
1 | 112.69* | 16.91 | -0.19 | -0.95 | 1393.30* | 5.47
2 | 80.35* | 16.96 | -0.02 | -0.12 | 848.10* | 4.60
3 | -56.61* | -10.52 | 0.07 | 0.35 | -603.82 | -2.93
4 | 6.75 | 1.15 | -0.24 | -1.19 | 278.60 | 1.19
5 | -69.44* | -10.87 | -0.37 | -1.87 | -639.87 | -2.49
6 | 80.40* | 14.36 | -0.46 | -2.32 | 898.89* | 3.99
7 | -7.24 | -1.24 | 0.05 | 0.26 | -469.41 | -2.01
8 | -2.07 | -0.38 | -0.10 | -0.48 | 202.58 | 0.85
9 | -0.53 | -0.09 | -0.02 | -0.11 | -69.71 | -0.29
10 | 4.87 | 0.84 | -0.12 | -0.62 | 147.40 | 0.59
11 | -75.58* | -13.48 | 0.17 | 0.87 | -737.15* | -3.17
12 | 4.84 | 0.93 | -0.16 | -0.79 | 143.75 | 0.64
13 | -2.82 | -0.51 | 0.21 | 1.03 | 0.37 | 0.00
14 | 1.22 | 0.22 | -0.15 | -0.74 | 308.65 | 1.30
15 | -10.45 | -1.90 | 0.05 | 0.25 | -681.49 | -2.87

Figure 8: ACF plot of wavelet-PAR residuals.
Table 9: Box-Pierce test for wavelet-PAR residuals.

Lags | 20 | 30
p-value | 0.3159 | 0.4438

8 Conclusion

PARMA models are used extensively in various fields such as economics, climatology, signal processing and hydrology. However, the presence of a substantial number of parameters in a PARMA model reduces its efficiency. Fourier methods already exist in the literature for reducing the number of parameters and have found some success when the parameters vary slowly. However, transient events are quite common in real-life applications, and wavelet techniques stand out as the principal analysis tool in these situations. The efficiency of wavelet methods in reducing the number of parameters of PARMA models has been demonstrated using a simulation study, and their applicability in real life has been illustrated using real data. This study opens new avenues for the reduction of parameters in other classes of models containing a large number of parameters.

Acknowledgement

Rhea Davis acknowledges the financial support of the University Grants Commission (UGC), India under the Savitribai Jyotirao Phule Fellowship for Single Girl Child (SJSGC) scheme.

Disclosure statement

The authors report no conflict of interest.

References

  • [1] P.L. Anderson and M.M. Meerschaert, Modeling river flows with heavy tails, Water Resources Research 34 (1998), pp. 2271–2280.
  • [2] P.L. Anderson and M.M. Meerschaert, Parameter estimation for periodically stationary time series, Journal of Time Series Analysis 26 (2005), pp. 489–518.
  • [3] P.L. Anderson, M.M. Meerschaert, and A.V. Vecchia, Innovations algorithm for periodically stationary time series, Stochastic Processes and their Applications 83 (1999), pp. 149–169.
  • [4] P.L. Anderson, F. Sabzikar, and M.M. Meerschaert, Parsimonious time series modeling for high frequency climate data, Journal of Time Series Analysis 42 (2021), pp. 442–470.
  • [5] P.L. Anderson, Y.G. Tesfaye, and M.M. Meerschaert, Fourier-PARMA models and their application to river flows, Journal of Hydrologic Engineering 12 (2007), pp. 462–472.
  • [6] P. Anderson and A. Vecchia, Asymptotic results for periodic autoregressive moving-average processes, Journal of Time Series Analysis 14 (1993), pp. 1–18.
  • [7] P. Bloomfield, H.L. Hurd, and R.B. Lund, Periodic correlation in stratospheric ozone data, Journal of Time Series Analysis 15 (1994), pp. 127–150.
  • [8] P.J. Brockwell and R.A. Davis, Time Series: Theory and Methods, Springer Science & Business Media, 1991.
  • [9] P.H. Franses and R. Paap, Periodic Time Series Models, OUP Oxford, 2004.
  • [10] W. Gardner and L. Franks, Characterization of cyclostationary random signal processes, IEEE Transactions on Information Theory 21 (1975), pp. 4–14.
  • [11] W. Härdle, G. Kerkyacharian, D. Picard, and A. Tsybakov, Wavelets, Approximation, and Statistical Applications, Vol. 129, Springer Science & Business Media, 2012.
  • [12] D.C. Howell, Statistical Methods for Psychology, Cengage Learning, 2012.
  • [13] R.H. Jones and W.M. Brelsford, Time series with periodic structure, Biometrika 54 (1967), pp. 403–408.
  • [14] R. Lund and I. Basawa, Recursive prediction and likelihood evaluation for periodic ARMA models, Journal of Time Series Analysis 21 (2000), pp. 75–93.
  • [15] S. Mallat, A Wavelet Tour of Signal Processing, Elsevier, 1999.
  • [16] E.T. McGonigle, R. Killick, and M.A. Nunes, Trend locally stationary wavelet processes, Journal of Time Series Analysis 43 (2022), pp. 895–917.
  • [17] H.A. Mohammadi, S. Ghofrani, and A. Nikseresht, Using empirical wavelet transform and high-order fuzzy cognitive maps for time series forecasting, Applied Soft Computing 135 (2023), p. 109990.
  • [18] G.P. Nason, Wavelet Methods in Statistics with R, Springer, 2008.
  • [19] R.T. Ogden, Essential Wavelets for Statistical Applications and Data Analysis, Springer, 1997.
  • [20] M. Pagano, On periodic and multiple autoregressions, The Annals of Statistics 6 (1978), pp. 1310–1317.
  • [21] E. Parzen and M. Pagano, An approach to modeling seasonally stationary time series, Journal of Econometrics 9 (1979), pp. 137–153.
  • [22] D.B. Percival and A.T. Walden, Wavelet Methods for Time Series Analysis, Cambridge University Press, 2000.
  • [23] Y.G. Tesfaye, P.L. Anderson, and M.M. Meerschaert, Asymptotic results for Fourier-PARMA time series, Journal of Time Series Analysis 32 (2011), pp. 157–174.
  • [24] A. Vecchia, Maximum likelihood estimation for periodic autoregressive moving average models, Technometrics 27 (1985), pp. 375–384.
  • [25] A. Vecchia, Periodic autoregressive-moving average (PARMA) modeling with applications to water resources, JAWRA Journal of the American Water Resources Association 21 (1985), pp. 721–730.
  • [26] B. Vidakovic, Statistical Modeling by Wavelets, John Wiley & Sons, 1999.
  • [27] J. Wang, S. Shao, Y. Bai, J. Deng, and Y. Lin, Multiscale wavelet graph autoencoder for multivariate time-series anomaly detection, IEEE Transactions on Instrumentation and Measurement 72 (2022), pp. 1–11.

9 Appendices

In this section, we state the results used to analyse our models, which were proved under the following assumptions:

Let $\{Y_{t}\}$ be a process defined by (1). Then,

B1. Finite fourth moment: $E\varepsilon_{t}^{4}=\eta<\infty$.

B2. The model admits a causal representation
$Y_{t}=\sum_{j=0}^{\infty}\psi_{t}(j)\varepsilon_{t-j},$
where $\psi_{t}(0)=1$ and $\sum_{j=0}^{\infty}|\psi_{t}(j)|<\infty$ for all $t$. Note that $\psi_{t}(j)=\psi_{t+k\nu}(j)$ for all $j$. Also, $Y_{t}=\tilde{Y}_{t}-\mu_{t}$ and $\varepsilon_{t}=\sigma_{t}\delta_{t}$, where $\{\delta_{t}\}$ is i.i.d.

B3. The model satisfies an invertibility condition
$\varepsilon_{t}=\sum_{j=0}^{\infty}\pi_{t}(j)Y_{t-j},$
where $\pi_{t}(0)=1$ and $\sum_{j=0}^{\infty}|\pi_{t}(j)|<\infty$ for all $t$. Again, $\pi_{t}(j)=\pi_{t+k\nu}(j)$ for all $j$.

B4. The spectral density matrix $f(\lambda)$ of the equivalent vector ARMA process given by
$U_{t}=\sum_{j=-\infty}^{\infty}\Psi_{j}Z_{t-j},$ (40)
where $U_{t}=(Y_{t\nu},\dotsc,Y_{(t+1)\nu-1})^{\prime}$, $Z_{t}=(\varepsilon_{t\nu},\dotsc,\varepsilon_{(t+1)\nu-1})^{\prime}$ and $\Psi_{t}$ is the $\nu\times\nu$ matrix with $ij$ entry $\psi_{i}(t\nu+i-j)$, satisfies, for some $0<c\leq C<\infty$,
$cz^{\prime}z\leq z^{\prime}f(\lambda)z\leq Cz^{\prime}z,\quad-\pi\leq\lambda\leq\pi,$
for all $z$ in $\mathbb{R}^{\nu}$.

B5. The number of iterations $n$ of the innovations algorithm satisfies $n\leq N\nu-1$, and $n^{2}/N\rightarrow 0$ as $N\rightarrow\infty$ and $n\rightarrow\infty$.

B6. The number of iterations $n$ of the innovations algorithm satisfies $n\leq N\nu-1$, and $n^{3}/N\rightarrow 0$ as $N\rightarrow\infty$ and $n\rightarrow\infty$.

Theorem A1 (Innovations algorithm for periodically stationary processes, see [3]).

If $\{Y_{t}\}$ has zero mean and $E(Y_{\ell}Y_{m})=\gamma_{\ell}(m-\ell)$, where the matrix $\Gamma_{n,i}=[\gamma_{i+n-1-\ell}(\ell-m)]_{\ell,m=0,\dotsc,n-1}$, $i=0,\dotsc,\nu-1$, is nonsingular for each $n\geq 1$, then the one-step predictors $\hat{Y}_{i+n}$, $n\geq 0$, and their mean-square errors $v_{n,i}$, $n\geq 1$, are given by

$\hat{Y}_{i+n}=\begin{cases}0&\text{if }n=0\\ \sum_{j=1}^{n}\theta_{n,j}^{(i)}(Y_{i+n-j}-\hat{Y}_{i+n-j})&\text{if }n\geq 1\end{cases},$

and, for $k=0,1,\dotsc,n-1$,

$v_{0,i}=\gamma_{i}(0),$

$\theta_{n,n-k}^{(i)}=(v_{k,i})^{-1}\left[\gamma_{i+k}(n-k)-\sum_{j=0}^{k-1}\theta_{k,k-j}^{(i)}\theta_{n,n-j}^{(i)}v_{j,i}\right],$ (41)
$v_{n,i}=\gamma_{i+n}(0)-\sum_{j=0}^{n-1}\left(\theta_{n,n-j}^{(i)}\right)^{2}v_{j,i},$ (42)

where (41) is solved recursively in the order $v_{0,i};\ \theta_{1,1}^{(i)},v_{1,i};\ \theta_{2,2}^{(i)},\theta_{2,1}^{(i)},v_{2,i};\ \theta_{3,3}^{(i)},\theta_{3,2}^{(i)},\theta_{3,1}^{(i)},v_{3,i};\ \dotsc$
The estimates $\hat{\theta}_{n,j}^{(i)}$ and $\hat{v}_{n,i}$ of $\theta_{n,j}^{(i)}$ and $v_{n,i}$, respectively, can be obtained by substituting the estimates of the autocovariance function (3).

Theorem A2 (see [3]).

Let $\tilde{Y}_{t}$ be a process as defined in (1). Under assumptions B1-B5, we have

$\hat{\theta}_{n,j}^{(\langle i-n\rangle)}\xrightarrow{p}\psi_{i}(j),\quad\text{as }n\rightarrow\infty,\ \text{for all }j,\ i=0,1,\dotsc,\nu-1.$ (43)

Here, $\langle b\rangle=b-\nu\lfloor b/\nu\rfloor$ for $b=0,1,\dotsc$ and $\langle b\rangle=\nu+b-\nu\lfloor b/\nu+1\rfloor$ for $b=-1,-2,\dotsc$, where $\lfloor\cdot\rfloor$ is the greatest integer function.

Theorem A3 (see [3]).

Let $\tilde{Y}_{t}$ be a process as defined in (1). Under assumptions B1-B5, we have

$\hat{v}_{n,\langle i-n\rangle}\xrightarrow{p}\sigma_{i}^{2},\quad\text{as }n\rightarrow\infty,\ i=0,1,\dotsc,\nu-1.$ (44)

Here, $\langle b\rangle=b-\nu\lfloor b/\nu\rfloor$ for $b=0,1,\dotsc$ and $\langle b\rangle=\nu+b-\nu\lfloor b/\nu+1\rfloor$ for $b=-1,-2,\dotsc$, where $\lfloor\cdot\rfloor$ is the greatest integer function.

Theorem A4 (see [23]).

Let $\{Y_{t}\}$ be a $\text{PAR}_{\nu}(1)$ process as given in (4) having causal representation B2. Then, $\{\phi_{0},\phi_{1},\dotsc,\phi_{\nu-1}\}^{\prime}=\{\psi_{0}(1),\psi_{1}(1),\dotsc,\psi_{\nu-1}(1)\}^{\prime}$.

Theorem A5 (see [2], [23]).

Let $\{Y_{t}\}$ be a $\text{PAR}_{\nu}(1)$ process as given in (4). Under assumptions B1-B4 and B6, and using Theorem A4, we have

$N^{1/2}(\hat{\phi}_{i}-\phi_{i})\Rightarrow\mathcal{N}(0,\sigma_{i-1}^{-2}\sigma_{i}^{2}),\quad i=0,1,\dotsc,\nu-1.$ (45)
Theorem A6 (see [4]).

Let $\tilde{Y}_{t}$ be a process as defined in (1) having causal representation B2. Then,

$N^{1/2}(\hat{\boldsymbol{\mu}}-\boldsymbol{\mu})\Rightarrow\mathcal{N}(0,\Sigma_{\boldsymbol{\mu}}),$ (46)

where $\Sigma_{\boldsymbol{\mu}}=\sum_{n=-\infty}^{\infty}B_{n}\Pi^{n}$, $B_{n}=\text{diag}(\gamma_{0}(n),\gamma_{1}(n),\dotsc,\gamma_{\nu-1}(n))$, and $\Pi$ is as given in (35).

Theorem A7 (see [4]).

Let $\tilde{Y}_{t}$ be a process as defined in (1). Under assumptions B1-B5, and if

$N^{3/4}\sum_{l>n}|\pi_{i}(l)|\rightarrow 0\ \text{as }N\rightarrow\infty\ \text{and }n\rightarrow\infty,$ (47)

we have

$(\hat{v}_{n,\langle 0-n\rangle},\hat{v}_{n,\langle 1-n\rangle},\dotsc,\hat{v}_{n,\langle\nu-1-n\rangle})^{\prime}\ \text{is AN}\left[(\sigma_{0}^{2},\sigma_{1}^{2},\dotsc,\sigma_{\nu-1}^{2})^{\prime},\ N^{-1}\big(s_{i,j}\big)_{i,j=0,\dotsc,\nu-1}\right],$ (48)

where $s_{i,j}$ is given by

$s_{i,j}=\sum_{m_{1}=0}^{\infty}\sum_{m_{2}=0}^{\infty}\pi_{i}(m_{1})\pi_{j}(m_{2})(W_{m_{1}m_{2}})_{i-m_{1},j-m_{2}},$

and $(W_{m_{1}m_{2}})_{i,j}$ is given by

$(W_{m_{1}m_{2}})_{i,j}=(\eta-3)\sum_{j_{1}=-\infty}^{\infty}\psi_{i}(j_{1})\psi_{i+m_{1}}(j_{1}+m_{1})\sum_{n=-\infty}^{\infty}\psi_{j}(j_{1}+n\nu+j-i)\psi_{j+m_{2}}(j_{1}+n\nu+j-i+m_{2})\,\sigma_{i-j_{1}}^{4}$
$\qquad+\sum_{n=-\infty}^{\infty}\left[\gamma_{i}(n\nu+j-i)\gamma_{i+m_{1}}(n\nu+j-i-m_{1}+m_{2})+\gamma_{i}(n\nu+j-i+m_{2})\gamma_{i+m_{1}}(n\nu+j-i-m_{1})\right],$

for all $0\leq i,j\leq\nu-1$, with $\eta=E(\delta_{t}^{4})$ and $\{\delta_{t}\}=\{\sigma_{t}^{-1}\varepsilon_{t}\}$.

Theorem A8 (see [23]).

Let $\tilde{Y}_{t}$ be the PARMA(1,1) process as defined in (5) having causal representation B2. Then,

$\phi_{t}=\frac{\psi_{t}(2)}{\psi_{t-1}(1)},$ (49)
$\theta_{t}=\phi_{t}-\psi_{t}(1).$ (50)
Theorem A9 (see [23]).

Let $\tilde{Y}_{t}$ be the PARMA(1,1) process as defined in (5). Under assumptions B1-B4 and B6, we have

$N^{1/2}(\hat{\boldsymbol{\phi}}-\boldsymbol{\phi})\Rightarrow\mathcal{N}(0,\mathcal{Q}),$ (51)

where $\hat{\boldsymbol{\phi}}=[\hat{\phi}_{0},\dotsc,\hat{\phi}_{\nu-1}]^{\prime}$, $\boldsymbol{\phi}=[\phi_{0},\dotsc,\phi_{\nu-1}]^{\prime}$ and the $\nu\times\nu$ matrix $\mathcal{Q}$ is defined by

$\mathcal{Q}=\sum_{\ell,k=1}^{2}\mathcal{H}_{\ell}\mathcal{V}_{\ell k}\mathcal{H}_{k}^{\prime},$ (52)

where

$\mathcal{V}_{\ell k}=\sum_{j=1}^{x}(\mathcal{F}_{\ell-j}\Pi^{-(\ell-j)})\mathcal{B}_{j}(\mathcal{F}_{k-j}\Pi^{-(k-j)})^{\prime},\quad x=\min(\ell,k),$ (53)
$\mathcal{F}_{j}=\text{diag}\{\psi_{0}(j),\psi_{1}(j),\dotsc,\psi_{\nu-1}(j)\},$ (54)
$\mathcal{B}_{j}=\text{diag}\{\sigma_{0}^{2}\sigma_{0-j}^{2},\sigma_{1}^{2}\sigma_{1-j}^{2},\dotsc,\sigma_{\nu-1}^{2}\sigma_{\nu-1-j}^{2}\},$ (55)

$\mathcal{H}_{1}=-\mathcal{F}_{2}\Pi^{-1}\mathcal{F}_{1}^{-2}$, $\mathcal{H}_{2}=\Pi^{-1}\mathcal{F}_{1}^{-1}\Pi$ and $\Pi$ is as defined in (35).

Theorem A10 (see [23]).

Let $\tilde{Y}_{t}$ be the PARMA(1,1) process as defined in (5). Under assumptions B1-B4 and B6, we have

$N^{1/2}(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})\Rightarrow\mathcal{N}(0,\mathcal{S}),$ (56)

where $\hat{\boldsymbol{\theta}}=[\hat{\theta}_{0},\dotsc,\hat{\theta}_{\nu-1}]^{\prime}$, $\boldsymbol{\theta}=[\theta_{0},\dotsc,\theta_{\nu-1}]^{\prime}$ and the $\nu\times\nu$ matrix $\mathcal{S}$ is defined by

$\mathcal{S}=\sum_{\ell,k=1}^{2}\mathcal{M}_{\ell}\mathcal{V}_{\ell k}\mathcal{M}_{k}^{\prime},$ (57)

where $\mathcal{M}_{1}=-\mathcal{I}-\mathcal{F}_{2}\Pi^{-1}\mathcal{F}_{1}^{-2}$, $\mathcal{M}_{2}=\Pi^{-1}\mathcal{F}_{1}^{-1}\Pi$, $\mathcal{V}_{\ell k}$ is given in (53) and $\Pi$ is as given in (35). Here, $\mathcal{I}$ is the $\nu\times\nu$ identity matrix.

Theorem A11 (see [4]).

For a given vector $\mathbf{X}$, the periodic function $\mathbf{X}=[X_{t}:0\leq t\leq\nu-1]$ has the Fourier representation

$X_{t}=c_{0}+\sum_{r=1}^{\ell}\left\{c_{r}\cos\left(\frac{2\pi rt}{\nu}\right)+s_{r}\sin\left(\frac{2\pi rt}{\nu}\right)\right\},$ (58)

where $c_{r}$ and $s_{r}$ are Fourier coefficients and

$\ell=\begin{cases}\nu/2&\text{if }\nu\text{ even}\\(\nu-1)/2&\text{if }\nu\text{ odd}\end{cases}.$

Let $\textbf{f}=\begin{cases}[c_{0},c_{1},s_{1},\dotsc,c_{(\nu-1)/2},s_{(\nu-1)/2}]^{\prime}&(\nu\ \text{odd})\\ [c_{0},c_{1},s_{1},\dotsc,s_{\nu/2-1},c_{\nu/2}]^{\prime}&(\nu\ \text{even})\end{cases}.$

Then,
$\textbf{f}=\mathcal{L}\mathcal{P}\mathcal{U}\mathbf{X},$ (59)

where $\mathcal{L}$ and $\mathcal{U}$ are respectively given by

$\mathcal{L}=\begin{cases}\text{diag}(\nu^{-1/2},\sqrt{2/\nu},\dotsc,\sqrt{2/\nu})&(\nu\ \text{odd})\\ \text{diag}(\nu^{-1/2},\sqrt{2/\nu},\dotsc,\sqrt{2/\nu},\nu^{-1/2})&(\nu\ \text{even})\end{cases},$ (60)

and

$\mathcal{U}=\nu^{-1/2}\left(e^{-i2\pi rt/\nu}\right).$ (61)

The $(\ell,j)$th element of $\mathcal{P}$ is given by

$[\mathcal{P}]_{\ell j}=\begin{cases}1&\text{if }\ell=j=0;\\ 2^{-1/2}&\text{if }\ell=2r-1\text{ and }j=r\text{ for some }1\leq r\leq\lfloor(\nu-1)/2\rfloor;\\ 2^{-1/2}&\text{if }\ell=2r-1\text{ and }j=\nu-r\text{ for some }1\leq r\leq\lfloor(\nu-1)/2\rfloor;\\ i2^{-1/2}&\text{if }\ell=2r\text{ and }j=r\text{ for some }1\leq r\leq\lfloor(\nu-1)/2\rfloor;\\ -i2^{-1/2}&\text{if }\ell=2r\text{ and }j=\nu-r\text{ for some }1\leq r\leq\lfloor(\nu-1)/2\rfloor;\\ 1&\text{if }\ell=\nu-1\text{ and }j=\nu/2\text{ and }\nu\text{ is even; and}\\ 0&\text{otherwise.}\end{cases}$ (62)
Theorem A12 (see [8]).

If $\mathbf{X}_{n}$ is a $k\times 1$ vector satisfying

$\mathbf{X}_{n}\Rightarrow\mathcal{N}(\boldsymbol{\mu}_{n},\Sigma_{n}),$ (63)

and $\mathcal{B}$ is any non-zero $m\times k$ matrix such that the matrices $\mathcal{B}\Sigma_{n}\mathcal{B}^{\prime}$ have no zero diagonal elements, then

$\mathcal{B}\mathbf{X}_{n}\Rightarrow\mathcal{N}(\mathcal{B}\boldsymbol{\mu}_{n},\mathcal{B}\Sigma_{n}\mathcal{B}^{\prime}).$ (64)