
Empirical Likelihood Inference of Variance Components in Linear Mixed-Effects Models

J. Zhang Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A. W. Guo Genetic Epidemiology Research Branch, National Institute of Mental Health, National Institutes of Health J.S. Carpenter The University of Sydney’s Brain and Mind Centre Andrew Leroux Genetic Epidemiology Research Branch, National Institute of Mental Health, National Institutes of Health Department of Biostatistics & Informatics, University of Colorado K.R. Merikangas Genetic Epidemiology Research Branch, National Institute of Mental Health, National Institutes of Health N.G. Martin QIMR Berghofer Medical Research Institute I.B. Hickie The University of Sydney’s Brain and Mind Centre H. Shou Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A. H. Li (hongzhe@pennmedicine.upenn.edu) Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.
Abstract

Linear mixed-effects models are widely used in analyzing repeated measures data, including clustered and longitudinal data, where inference on both the fixed effects and the variance components is of importance. Unlike inference on the fixed effects, which has been well studied, inference on the variance components is more challenging because the null value may lie on the boundary of the parameter space and the fixed effects act as nuisance parameters. Existing methods often require strong distributional assumptions on the random effects and random errors. In this paper, we develop empirical likelihood-based methods for the inference of the variance components in the presence of fixed effects. A nonparametric version of Wilks' theorem for the proposed empirical likelihood ratio statistics for variance components is derived. We also develop an empirical likelihood test for multiple variance components related to a sequence of correlated outcomes. Simulation studies demonstrate that the proposed methods exhibit better type 1 error control than the commonly used likelihood ratio tests when the Gaussian distributional assumptions on the random effects are violated. We apply the methods to investigate the heritability of physical activity as measured by wearable devices in the Australian Twin study and observe that such activity is heritable only in the quantile range from 0.375 to 0.514.

Keywords  Boundary value; Global test; Heritability; Nonparametric test; Wearable device data.

1 Introduction

Longitudinal and clustered data commonly arise from observational studies or clinical trials, where subjects are measured repeatedly over time or within a cluster. The repeated measures within a subject or a cluster are often correlated. To analyze such data, linear mixed-effects models that incorporate both fixed and random effects are widely used. Many statistical methods have been developed for such linear mixed-effects models, especially methods for inference of the fixed effects. However, inference on the variance components is less studied and often requires strong distributional assumptions on the random effects and the error terms. When the underlying distributions are known, classical inference methods, including the likelihood ratio tests and the score tests, can be applied. However, these parametric methods are often restrictive and not robust if the model assumptions are violated.

The empirical likelihood (el) method, as an alternative to parametric likelihood-based methods, was first proposed by Owen (1988) and has been applied to many statistical inference problems. Without a prespecified distributional assumption on the data, el methods incorporate side information through constraints or prior distributions and have favorable statistical properties, including but not limited to Bartlett correctability, transformation invariance, better coverage accuracy of the corresponding confidence intervals, and greater power. A comprehensive introduction to el methods can be found in Owen (2001). el methods have been applied to inference for mixture models (Zou et al., 2002) and censored survival data (Chang and McKeague, 2016), and have also been considered for longitudinal data modeling. For example, You et al. (2006) proposed a block el method for inference of the regression parameters assuming a working independence covariance, and Xue and Zhu (2007) considered a semiparametric regression model where the repeated within-subject measures are summarized as a function over time in order to address the dependence issue. Wang et al. (2010) proposed a generalized el method that takes into account the within-subject correlations. Li and Pan (2013) defined an empirical likelihood ratio (elr) test by utilizing the extended score from quadratic inference functions for longitudinal data, which does not involve direct estimation of the correlation parameters.

The el methods mentioned above focus only on the inference of fixed effects in linear mixed-effects models. In this paper, we consider a general setting of linear mixed-effects models and develop el methods for the inference of the variance components. Specifically, suppose there are $n$ subjects and denote by $n_i$ the number of repeated measures for the $i$th subject. For the $i$th subject, we observe a response vector $y_i\in R^{n_i}$, an $n_i\times p$ design matrix $X_i$ for the fixed effects $\beta^*\in R^p$, and $d$ positive semi-definite $n_i\times n_i$ design matrices $\Phi_{iq}\ (q=1,\cdots,d)$ for the variance components $\theta^*\in(R_+\cup\{0\})^d$. The general linear mixed-effects model can be written as

y_{i}=X_{i}\beta^{*}+r_{i},\quad i=1,\cdots,n, \qquad (1)

where $r_i\in R^{n_i}$ is a zero-mean random vector with variance-covariance matrix $H_i(\theta^*)$. We assume that $H_i(\theta^*)$ has a linear structure,

H_{i}(\theta^{*})=\sum_{q=1}^{d}\theta^{*}_{q}\Phi_{iq},\quad\theta^{*}=(\theta^{*}_{1},\cdots,\theta^{*}_{d})^{T}=(\theta^{*}_{1},\theta^{*T}_{(1)})^{T},

where $\theta^*=(\theta^*_1,\cdots,\theta^*_d)^T$ is the vector of the variance components. We emphasize that this general setting does not require any assumptions on the distributions of the data or the distributions of the random effects.

In many real applications, we are interested in making statistical inference on the variance components $\theta^*$ in model (1). For example, in the study of heritability based on twin data, each monozygotic or dizygotic twin pair is treated as one cluster, and the linear variance structure can be constructed based on the twin type (see details in Section 6). In the heritability analysis, a key question is whether there exists a genetic effect, which motivates us to study the inference of one of the variance components, say $\theta^*_1$. We propose to develop an el based inference method for $\theta^*_1$ without any distributional assumptions on the random components. The method can effectively account for the nuisance parameters, including the unknown fixed effects $\beta^*$ and the nuisance variance components $\theta^*_{(1)}$. The key difficulty, compared with the el inference of the fixed effects, is the boundary value problem when $\theta^*_1=0$ in the local testing problem $H_0:\theta^*_1=\theta_1^0$. To address these issues, we propose a new empirical likelihood ratio test that utilizes an unbiased estimator of $\beta^*$ under very mild conditions, and prove that the asymptotic distribution of the test statistic is a mixture of $\chi^2$ distributions.

Motivated by the heritability analysis of daily activity distributions measured by wearable devices such as actigraphs, we also consider the setting where linear mixed-effects models are fitted to a sequence of dependent outcomes. Wearable device data have been increasingly collected for continuous activity monitoring in large observational or experimental studies (Burton et al., 2013; Krane-Gartiser et al., 2014). In typical wearable activity tracking data, activity is measured at one-minute resolution over several days for a given subject. Such repeated measures enable us to account for day-to-day variability of the activity. Instead of focusing on the activity counts at any given minute of the day, the daily activity distribution, or the amount of time with activity counts above a given threshold, provides a biologically meaningful measure of activity traits. When the activity counts are summarized as distributions, we can consider the activity quantile profile as a phenotype measure. In the analysis of such wearable device data, we fit a linear mixed-effects model for the activity level or quantile $y_i(t)$ at each $t$. Denote by $\theta^*(t)$ the variance components for the activity profile at level $t$. We are then interested in testing the global null $H_0:\theta^*_1(t)\equiv\theta_1^0,\ t\in[t_1,t_2]$. We develop a max-type statistic for this global testing problem. Since the numerator of the proposed empirical likelihood ratio (elr) tests can be rewritten as a sum of approximately independent random variables over subjects, a random perturbation method is developed to approximate the $p$-values of the proposed global test.

We first introduce some notation. Denote by $(A)_{-1}$ the submatrix of $A$ without the first column of $A$. For two vectors or matrices $A$ and $B$ of compatible dimensions, define the inner product $\langle A,B\rangle=\text{tr}(A^TB)$. For a matrix $D_{m\times n}=(D_1,\cdots,D_n)$, where $D_i$ is the $i$th column of $D$, the vectorized $D$ is defined by $(D_1^T,\cdots,D_n^T)^T$. Let $E(x)$ and $\text{var}(x)$ be the expectation and variance of a random vector $x$, and let $\text{cov}(x,y)$ be the covariance of random vectors $x$ and $y$. When $x$ is a random matrix, $E(x)$ and $\text{var}(x)$ represent the expectation and variance of the vectorized $x$. When $x$ and $y$ are random matrices, $\text{cov}(x,y)$ denotes the covariance of the vectorized $x$ and the vectorized $y$. We use $a=O(b)$ to denote that $a$ and $b$ are of the same order, and $a=o(b)$ to denote that $a$ is of a smaller order than $b$. We use $x=O_p(y)$ to denote that $x$ and $y$ are of the same order in probability, and $x=o_p(y)$ to denote that $x$ is of a smaller order than $y$ in probability.

2 elr test for the fixed effects $\beta^*$

Statistical tests for the fixed effects in the linear mixed-effects model (1), $H_0:\beta^*=\beta_0$, have been well studied. We first briefly review the subject-wise el method proposed by Wang et al. (2010), where the covariance structure for each subject is considered.

Let $\hat{H}_{in}$ be an estimator of $H_i$, and assume that $\hat{H}_{in}$ converges to some $H_i^*$ in probability, uniformly over $i=1,\cdots,n$. One such nonparametric sample covariance estimator $\hat{H}_{in}$ can be obtained by a simple two-step procedure: first estimate the residuals $\hat{r}_i=y_i-X_i\hat{\beta}$, where $\hat{\beta}$ is the least-squares estimator under a working independence correlation structure, and then solve the constrained optimization problem $\min_{\theta\geq 0}\sum_{i=1}^{n}\|H_i(\theta)-\hat{r}_i\hat{r}_i^T\|_F^2$. Let

\phi_{i}(\beta)=X_{i}^{T}\hat{H}_{in}^{-1}(y_{i}-X_{i}\beta),

which satisfies $E\{\phi_i(\beta)\}=0$ when $\beta$ is the true value. Denote by $p_i$ the point mass at the $i$th subject. The nonparametric empirical likelihood is defined as

L_{0}(\beta)=\sup_{p_{i}}\Big\{\prod_{i=1}^{n}p_{i}:p_{i}\geq 0,\sum_{i=1}^{n}p_{i}=1,\sum_{i=1}^{n}p_{i}\phi_{i}(\beta)=0\Big\}.

Since it can be proved that $\max_\beta L_0(\beta)=1/n^n$ (Owen, 2001), Wang et al. (2010) proposed the elr statistic

\textsc{elr}_{0}(\beta_{0})=\frac{L_{0}(\beta_{0})}{\max_{\beta}L_{0}(\beta)}=n^{n}L_{0}(\beta_{0}).
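The inner optimization in $L_0(\beta)$ has the standard el dual representation: with $p_i=1/[n\{1+\lambda^T\phi_i(\beta_0)\}]$, one has $-2\log\textsc{elr}_0(\beta_0)=2\sum_{i=1}^{n}\log\{1+\lambda^T\phi_i(\beta_0)\}$, where $\lambda$ solves $\sum_{i=1}^{n}\phi_i(\beta_0)/\{1+\lambda^T\phi_i(\beta_0)\}=0$ (see the Appendix for the scalar case). A minimal computational sketch of this dual follows, assuming the $\phi_i(\beta_0)$ are stacked as rows of a NumPy array; the function name and the damped Newton scheme are our own illustrative choices, not part of the paper.

```python
import numpy as np

def el_log_ratio(g):
    """-2 log ELR for mean-zero estimating functions g (an n x p array),
    computed through the concave dual 2 * max_lam sum_i log(1 + lam'g_i),
    using damped Newton steps that keep all implied weights positive."""
    n, p = g.shape
    lam = np.zeros(p)
    for _ in range(100):
        w = 1.0 + g @ lam                       # must stay > 0
        grad = (g / w[:, None]).sum(axis=0)     # dual gradient
        if np.linalg.norm(grad) < 1e-10:
            break
        gw = g / w[:, None]
        hess = -gw.T @ gw                       # negative definite
        step = np.linalg.solve(hess, -grad)     # Newton direction
        t = 1.0
        while np.any(1.0 + g @ (lam + t * step) <= 0):
            t /= 2.0                            # backtrack into the domain
        lam += t * step
    return 2.0 * np.sum(np.log1p(g @ lam))
```

The returned value is then compared with the $\chi_p^2$ critical value given by Theorem 1 below.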

To obtain the asymptotic distribution of the elr statistic, the following regularity conditions are needed.

Condition 1.

As $n\rightarrow\infty$, $P(0\in ch\{\phi_1(\beta_0),\cdots,\phi_n(\beta_0)\})\rightarrow 1$, where $ch\{\cdot\}$ denotes the convex hull.

Condition 2.

The limit $\lim_{n\rightarrow\infty}n^{-1}\sum_{i=1}^{n}X_i^TH_i^{*-1}H_iH_i^{*-1}X_i$ exists and is positive definite.

Condition 3.

The expectations $E\|\phi_i(\beta_0)\|_2^{2+\gamma_1}$ are uniformly bounded for some $\gamma_1>0$.

Condition 4.

Let $\hat{G}_{in}=\hat{H}_{in}^{-1}$ with elements $\hat{g}_{ijk}$, let $x_{ij}^T$ be the $j$th row of $X_i$, and let $r_{ik}$ be the $k$th element of $r_i$. For each pair $i$ and $i'$ with $i,i'=1,\cdots,n$ and $i\neq i'$, $\hat{g}_{ijk}-\hat{g}_{-(i,i')jk}=O_p(n^{-1})$, and sufficient moment conditions are satisfied so that $E(\hat{B}_{ii'})=O(n^{-1})$ and $E(\hat{B}_{ii'}\hat{B}_{ii'}^T)=O(n^{-2})$, where $\hat{g}_{-(i,i')jk}$ is $\hat{g}_{ijk}$ computed with all the data except for subjects $i$ and $i'$, and $\hat{B}_{ii'}=\sum_{j=1}^{n_i}\sum_{k=1}^{n_i}(\hat{g}_{ijk}-\hat{g}_{-(i,i')jk})x_{ij}r_{ik}$.

Conditions 1–3 are common conditions for empirical likelihood methods (Owen, 1991). Condition 4 imposes mild constraints on $\hat{H}_{in}^{-1}$ to ensure that the difference between the statistic $\textsc{elr}_0(\beta_0)$ defined with $\hat{H}_{in}$ and the one using $H_i^*$ vanishes as $n\rightarrow\infty$. Under these regularity conditions, the following theorem (Wang et al., 2010) provides the asymptotic distribution of the elr test $\textsc{elr}_0(\beta_0)$ under the null.

Theorem 1.

Under Conditions 1–4, as $n\rightarrow\infty$, $-2\log\textsc{elr}_0(\beta_0)\rightarrow\chi_p^2$ in distribution under the null hypothesis $H_0:\beta^*=\beta_0$.

The asymptotic result only requires that $\hat{H}_{in}$ converge uniformly to some $H_i^*$, which need not be the true $H_i$ (Wang et al., 2010). When the correlation structure is correctly specified, $\hat{H}_{in}$ is a consistent estimator of $H_i^*=H_i$. The statistic defined with the true $H_i$ is asymptotically locally most powerful among all choices of the weight matrices.

3 elr test for the variance component $\theta_1^*$

We consider the local test $H_0:\theta^*_1=\theta_1^0$ in the framework of empirical likelihood, including the null $H_0:\theta_1^*=0$, which is of the most interest. We define $r_i=y_i-X_i\beta^*$ and $R_i=r_ir_i^T$. Since $E(r_i)=0$ and $\text{var}(r_i)=H_i(\theta^*)$, we have

R_{i}=H_{i}(\theta^{*})+\delta_{i}=\sum_{q=1}^{d}\theta^{*}_{q}\Phi_{iq}+\delta_{i},

where $E(\delta_i)=0$ and $\text{var}(\delta_i)$ exists. Since $\beta^*$ is unknown, we first need an estimator of $\beta^*$, denoted by $\hat{\beta}$. One simple choice is the least-squares estimator using all the data. Specifically, we stack the data from all subjects by writing $X=(X_1^T,\cdots,X_n^T)^T$, $y=(y_1^T,\cdots,y_n^T)^T$, and $r=(r_1^T,\cdots,r_n^T)^T$. Model (1) can be rewritten as

y=X\beta^{*}+r.

Then the least-squares estimator is $\hat{\beta}=(X^TX)^{-1}X^Ty$. For $i=1,\cdots,n$, let $\hat{r}_i=y_i-X_i\hat{\beta}=r_i+X_i(\beta^*-\hat{\beta})$. We have

\hat{R}_{i}=\hat{r}_{i}\hat{r}_{i}^{T}=r_{i}r_{i}^{T}+r_{i}(\beta^{*}-\hat{\beta})^{T}X_{i}^{T}+X_{i}(\beta^{*}-\hat{\beta})r_{i}^{T}+X_{i}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{i}^{T}=R_{i}+\hat{\epsilon}_{i}=H_{i}(\theta^{*})+\delta_{i}+\hat{\epsilon}_{i},

where $\hat{\epsilon}_i=r_i(\beta^*-\hat{\beta})^TX_i^T+X_i(\beta^*-\hat{\beta})r_i^T+X_i(\beta^*-\hat{\beta})(\beta^*-\hat{\beta})^TX_i^T$.

To control the rates of $E(\hat{\epsilon}_i)$, $\text{cov}(r_ir_i^T,\hat{\epsilon}_j)$, and $\text{cov}(\hat{\epsilon}_i,\hat{\epsilon}_j)$, we need the following condition, which is also commonly used for empirical likelihood methods.

Condition 5.

The expectations $E\|r_i\|_2^{4+\gamma_1}$ are uniformly bounded for some $\gamma_1>0$.

Under Condition 5, the following proposition shows that the least-squares estimator $\hat{\beta}$ is sufficiently accurate.

Proposition 1.

Assume that $n^{-1}X^TX\rightarrow\Sigma$ and $n^{-1/2}X^Tr\xrightarrow{d}\eta$ as $n\rightarrow\infty$, where $0<\|\Sigma\|_2,\|\Sigma^{-1}\|_2<\infty$, $E\eta=0$, and $E\|\eta\|_2^4=O(1)$. When Condition 5 holds and $\hat{\beta}=(X^TX)^{-1}X^Ty$, we have $E(\hat{\epsilon}_i)=O(n^{-1})$ for $i=1,\cdots,n$, and $\text{cov}(r_ir_i^T,\hat{\epsilon}_j),\ \text{cov}(\hat{\epsilon}_i,\hat{\epsilon}_j)=O(n^{-2})$ for $i,j=1,\cdots,n$, $i\neq j$.

Let $\Xi=(\Xi_{kl})_{d\times d}$ with $\Xi_{kl}=\sum_{i=1}^{n}\text{tr}(\Phi_{ik}\Phi_{il})$, and $\hat{\Upsilon}=(\hat{\Upsilon}_1,\cdots,\hat{\Upsilon}_d)^T$ with $\hat{\Upsilon}_k=\sum_{i=1}^{n}\text{tr}(\Phi_{ik}\hat{R}_i)$. We define

\hat{Z}_{i}(\theta_{1})=\text{tr}\Big\{\Phi_{i1}\Big(\hat{R}_{i}-\Phi_{i1}\theta_{1}-\sum_{q=2}^{d}\hat{\theta}_{q}\Phi_{iq}\Big)\Big\},\quad i=1,\cdots,n,

where

\hat{\theta}_{(1)}=(\hat{\theta}_{2},\cdots,\hat{\theta}_{d})^{T}=(\Xi^{-1})_{-1}^{T}\hat{\Upsilon}. \qquad (2)
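As a computational illustration, $\Xi$, $\hat{\Upsilon}$, the nuisance estimator (2), and the $\hat{Z}_i(\theta_1)$ are all simple functions of traces of products of the design matrices and the residual outer products. The sketch below assumes `Phi` is a length-$n$ list whose $i$th entry holds $\Phi_{i1},\cdots,\Phi_{id}$ and `R_hat[i]` equals $\hat{r}_i\hat{r}_i^T$; the function name is hypothetical.

```python
import numpy as np

def z_stats(Phi, R_hat, theta1):
    """Compute Zhat_i(theta1) for a candidate value of the first variance
    component, estimating the nuisance components by equation (2)."""
    n, d = len(Phi), len(Phi[0])
    Xi = np.array([[sum(np.trace(Phi[i][k] @ Phi[i][l]) for i in range(n))
                    for l in range(d)] for k in range(d)])
    Ups = np.array([sum(np.trace(Phi[i][k] @ R_hat[i]) for i in range(n))
                    for k in range(d)])
    theta_nuis = np.linalg.inv(Xi)[:, 1:].T @ Ups      # equation (2)
    return np.array([np.trace(Phi[i][0] @ (R_hat[i] - theta1 * Phi[i][0]
                     - sum(theta_nuis[q - 1] * Phi[i][q] for q in range(1, d))))
                     for i in range(n)])
```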

Since Proposition 1 implies $E\hat{Z}_i(\theta_1)=O(n^{-1})$ if $\theta_1$ is the true value (see (A18) in the Appendix), we define the nonparametric likelihood as

L(\theta_{1})=\max_{p_{i}}\Big\{\prod_{i=1}^{n}p_{i}:p_{i}\geq 0,\sum_{i=1}^{n}p_{i}=1,\sum_{i=1}^{n}p_{i}\hat{Z}_{i}(\theta_{1})=0\Big\}

and the corresponding elr statistic as

\textsc{elr}(\theta_{1}^{0})=\frac{L(\theta_{1}^{0})}{\max_{\theta_{1}\geq 0}L(\theta_{1})}. \qquad (3)

If the true value $\theta_1^*=0$ (i.e., the null hypothesis with $\theta_1^0=0$), the denominator in (3) is no longer $1/n^n$ owing to the boundary value issue, and the existing results are therefore inapplicable. To derive the asymptotic distribution of the proposed test $\textsc{elr}(\theta_1^0)$, we assume the following condition, analogous to Condition 1.

Condition 6.

As $n\rightarrow\infty$, $P(0\in ch\{Z_1(\theta_1^0),\cdots,Z_n(\theta_1^0)\})\rightarrow 1$, where $Z_i(\theta_1^0)$ is defined as $\hat{Z}_i(\theta_1^0)$ with $\hat{R}_i$ replaced by $R_i$.

Under Conditions 5 and 6, we have the following theorem on the asymptotic distribution of the elr test under the null.

Theorem 2.

Let $\hat{c}_n(\theta_1^0)=\hat{\nu}_{2n}^2(\theta_1^0)/\hat{\nu}_{1n}^2(\theta_1^0)$, where $\hat{\nu}_{1n}^2(\theta_1^0)$ is a consistent estimator of the asymptotic variance of $n^{-1/2}\sum_{i=1}^{n}\hat{Z}_i(\theta_1^0)$ and $\hat{\nu}_{2n}^2(\theta_1^0)=n^{-1}\sum_{i=1}^{n}\hat{Z}_i^2(\theta_1^0)$. If $\theta^*_{(1)}\in R_+^{d-1}$ and Conditions 5 and 6 hold, then as $n\rightarrow\infty$, $\hat{c}_n(\theta_1^0)\{-2\log\textsc{elr}(\theta_1^0)\}\rightarrow\chi_1^2$ in distribution when $\theta_1^0>0$, and $\hat{c}_n(0)\{-2\log\textsc{elr}(0)\}\rightarrow U_+^2$ in distribution, where $U\sim N(0,1)$ and $U_+=\max(U,0)$.

Although the elr statistic $\hat{c}_n(\theta_1^0)\{-2\log\textsc{elr}(\theta_1^0)\}$ in Theorem 2 involves optimizations in both its numerator and denominator, the following lemma shows that the statistic has an asymptotically equivalent expression that can be computed efficiently.

Lemma 3.

If $\theta^*_{(1)}\in R_+^{d-1}$, then under Conditions 5 and 6,

\hat{c}_{n}(\theta_{1}^{0})\{-2\log\textsc{elr}(\theta_{1}^{0})\}=\begin{cases}\{n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0})\}^{2}/\hat{\nu}_{1n}^{2}(\theta_{1}^{0})+o_{p}(1),&\text{if }\theta_{1}^{0}>0,\\ \{n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0})\}^{2}/\hat{\nu}_{1n}^{2}(\theta_{1}^{0})\,I\big(\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0})\geq 0\big)+o_{p}(1),&\text{if }\theta_{1}^{0}=0.\end{cases}

We next provide an estimator of the asymptotic variance of $n^{-1/2}\sum_{i=1}^{n}\hat{Z}_i(\theta_1^0)$. We rewrite $\Xi$ as

\Xi=\begin{pmatrix}E_{11}&E_{12}\\ E_{21}&E_{22}\end{pmatrix}

with $E_{11}$ being a scalar. Let $F=E_{22}^{-1}E_{21}=(F_1,\cdots,F_{d-1})^T$ and $\alpha=1-E_{12}F/E_{11}\in(0,1]$. It can be verified that

\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0})=\sum_{i=1}^{n}\hat{D}_{i}(\theta_{1}^{0})=\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0}), \qquad (4)

where

\hat{D}_{i}(\theta_{1}^{0})=\alpha^{-1}\Big\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1},\,\hat{R}_{i}-\theta_{1}^{0}\Phi_{i1}\Big\rangle,\qquad\hat{M}_{i}(\theta_{1}^{0})=\alpha^{-1}\Big\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1},\,\hat{R}_{i}-H_{i}((\theta_{1}^{0},\hat{\theta}_{(1)}^{T})^{T})\Big\rangle.

In addition, for $i\neq j$,

\text{cov}(\hat{R}_{i},\hat{R}_{j})=\text{cov}(\delta_{i}+\hat{\epsilon}_{i},\delta_{j}+\hat{\epsilon}_{j})=\text{cov}(\delta_{i},\hat{\epsilon}_{j})+\text{cov}(\hat{\epsilon}_{i},\delta_{j})+\text{cov}(\hat{\epsilon}_{i},\hat{\epsilon}_{j})=O(n^{-2})

by Proposition 1. Therefore, the $\hat{D}_i(\theta_1^0)\ (i=1,\cdots,n)$ are asymptotically independent with expectation $E\{\hat{D}_i(\theta_1^0)\}=\alpha^{-1}\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_q\Phi_{iq+1},\sum_{q=2}^{d}\theta_q^*\Phi_{iq}\rangle+O(n^{-1})$, while the expectations of $\hat{Z}_i(\theta_1^0)$ and $\hat{M}_i(\theta_1^0)$ are $O(n^{-1})$ (see (A18) in the Appendix). We have

\hat{D}_{i}(\theta_{1}^{0})-E\{\hat{D}_{i}(\theta_{1}^{0})\}=\alpha^{-1}\Big\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1},\,\hat{R}_{i}-H_{i}((\theta_{1}^{0},\theta_{(1)}^{*T})^{T})\Big\rangle+O(n^{-1})=\hat{M}_{i}(\theta_{1}^{0})+o_{p}(1).

Therefore,

\text{var}\Big\{n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0})\Big\}=\frac{1}{n}\sum_{i=1}^{n}\big[\hat{D}_{i}(\theta_{1}^{0})-E\{\hat{D}_{i}(\theta_{1}^{0})\}\big]^{2}+o_{p}(1)=\frac{1}{n}\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0})^{2}+o_{p}(1), \qquad (5)

which leads to a consistent estimator of the variance of $n^{-1/2}\sum_{i=1}^{n}\hat{Z}_i(\theta_1^0)$:

\hat{\nu}_{1n}^{2}(\theta_{1}^{0})=n^{-1}\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0})^{2}.
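Combining Lemma 3 with this variance estimator, the local test reduces to a few matrix traces. Below is a minimal sketch under the setting of this section with $d\geq 2$; for the boundary case $\theta_1^0=0$, the $p$-value uses the $U_+^2$ null distribution, whose upper tail is $P(U_+^2>c)=1-\Phi(\sqrt{c})$ for $c>0$. The function and variable names are illustrative, not the authors' software.

```python
import numpy as np
from scipy.stats import norm, chi2

def vc_test(Phi, R_hat, theta1_0):
    """elr test of H0: theta_1 = theta1_0 via the asymptotically equivalent
    statistic of Lemma 3, with nuhat_1n^2 built from the Mhat_i in (4)."""
    n, d = len(Phi), len(Phi[0])
    Xi = np.array([[sum(np.trace(Phi[i][k] @ Phi[i][l]) for i in range(n))
                    for l in range(d)] for k in range(d)])
    F = np.linalg.solve(Xi[1:, 1:], Xi[1:, 0])         # F = E22^{-1} E21
    alpha = 1.0 - Xi[0, 1:] @ F / Xi[0, 0]
    Ups = np.array([sum(np.trace(Phi[i][k] @ R_hat[i]) for i in range(n))
                    for k in range(d)])
    theta_nuis = np.linalg.inv(Xi)[:, 1:].T @ Ups      # equation (2)
    M = np.empty(n)
    for i in range(n):
        A = Phi[i][0] - sum(F[q] * Phi[i][q + 1] for q in range(d - 1))
        H = theta1_0 * Phi[i][0] + sum(theta_nuis[q] * Phi[i][q + 1]
                                       for q in range(d - 1))
        M[i] = np.trace(A.T @ (R_hat[i] - H)) / alpha  # Mhat_i(theta1_0)
    stat = (M.sum() / np.sqrt(n)) ** 2 / np.mean(M ** 2)
    if theta1_0 == 0:                                  # boundary case
        stat *= float(M.sum() >= 0)
        pval = norm.sf(np.sqrt(stat)) if stat > 0 else 1.0
    else:
        pval = chi2.sf(stat, df=1)
    return stat, pval
```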

4 Variance Component Analysis Over a Sequence of Responses

In some applications, we are interested in testing whether the variance components are all zero over a set of possibly correlated outcomes. One example is the analysis of activity distributions from wearable device data, where we wish to test the variance component at each quantile $t$ of the activity distribution. Extending model (1), we assume the following outcome model at level $t$:

y_{i}(t)=X_{i}\beta^{*}(t)+r_{i}(t),\quad i=1,\cdots,n, \qquad (6)

where $r_i(t)\in R^{n_i}$ is a zero-mean random vector with variance $H_i(\theta^*(t))$. We assume that $H_i(\theta^*(t))$ has the same linear structure for each $t$,

H_{i}(\theta^{*}(t))=\sum_{q=1}^{d}\theta^{*}_{q}(t)\Phi_{iq},\quad\theta^{*}(t)=\{\theta^{*}_{1}(t),\cdots,\theta^{*}_{d}(t)\}^{T}=\{\theta^{*}_{1}(t),\theta^{*T}_{(1)}(t)\}^{T}.

We are interested in testing the null $H_0:\theta^*_1(t)\equiv\theta_1^0,\ t\in[t_1,t_2]$, where $[t_1,t_2]$ is a pre-specified interval. We propose the following maximally selected empirical likelihood ratio statistic (gelr),

\Gamma=\sup_{t\in[t_{1},t_{2}]}\hat{c}_{n}(\theta_{1}^{0},t)\left\{-2\log\textsc{elr}(\theta_{1}^{0},t)\right\}, \qquad (7)

where $\hat{c}_n(\theta_1^0,t)\{-2\log\textsc{elr}(\theta_1^0,t)\}$ is the elr statistic for the outcome at $t$. It can be shown that $\Gamma=\sup_{t\in[t_1,t_2]}S(t)+o_p(1)$, with

S(t)=\begin{cases}\{n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0},t)\}^{2}/\hat{\nu}_{1n}^{2}(\theta_{1}^{0},t),&\text{if }\theta_{1}^{0}>0,\\ \{n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0},t)\}^{2}/\hat{\nu}_{1n}^{2}(\theta_{1}^{0},t)\,I\{\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0},t)\geq 0\},&\text{if }\theta_{1}^{0}=0,\end{cases}

where

\hat{Z}_{i}(\theta_{1}^{0},t)=\text{tr}\Big\{\Phi_{i1}\Big(\hat{R}_{i}(t)-\Phi_{i1}\theta_{1}^{0}-\sum_{q=2}^{d}\hat{\theta}_{q}(t)\Phi_{iq}\Big)\Big\},\qquad\hat{\nu}_{1n}^{2}(\theta_{1}^{0},t)=n^{-1}\alpha^{-2}\sum_{i=1}^{n}\Big\langle\hat{R}_{i}(t)-H_{i}((\theta_{1}^{0},\hat{\theta}_{(1)}(t)^{T})^{T}),\,\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1}\Big\rangle^{2}.

Assessing the statistical significance of the statistic $\Gamma$ defined in (7) is challenging because of the dependence of the $\hat{Z}_i(\theta_1^0,t)$ across $t$. We propose a simple way of evaluating its significance by perturbing the el statistic. Specifically, we apply (4) to rewrite $\sum_{i=1}^{n}\hat{Z}_i(\theta_1^0,t)$ as $\sum_{i=1}^{n}\hat{M}_i(\theta_1^0,t)$, where $\hat{M}_i(\theta_1^0,t)=\alpha^{-1}\big\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_q\Phi_{iq+1},\hat{R}_i(t)-H_i((\theta_1^0,\hat{\theta}_{(1)}^T(t))^T)\big\rangle$. We then generate the null distribution of $\Gamma$ by perturbing the test statistic. Specifically, for each $g\ (g=1,\cdots,G)$, we generate $\xi_i^{(g)}$ as i.i.d. standard normal random variables and define

S^{(g)}(t)=\begin{cases}\{n^{-1/2}\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0},t)\xi_{i}^{(g)}\}^{2}/\hat{\nu}_{1n}^{2}(\theta_{1}^{0},t),&\text{if }\theta_{1}^{0}>0,\\ \{n^{-1/2}\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0},t)\xi_{i}^{(g)}\}^{2}/\hat{\nu}_{1n}^{2}(\theta_{1}^{0},t)\,I\{\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0},t)\xi_{i}^{(g)}\geq 0\},&\text{if }\theta_{1}^{0}=0.\end{cases}

Define the corresponding perturbed test statistic as $\Gamma^{(g)}=\sup_{t\in[t_1,t_2]}S^{(g)}(t)$. The following Proposition 2 shows that the perturbed test statistics have the same asymptotic distribution as the original test statistic under the null. Therefore, the $p$-value of $\Gamma$ can be approximated by $\sum_{g=1}^{G}I(\Gamma^{(g)}>\Gamma)/G$; a computational sketch is given after Proposition 2.

Proposition 2.

$\hat{M}_i(\theta_1^0,t)$ satisfies

(i) $E\{n^{-1/2}\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0},t)\xi_{i}^{(g)}\}-E\{n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0},t)\}=o(1)$;

(ii) $\text{var}\{n^{-1/2}\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0},t)\xi_{i}^{(g)}\}-\text{var}\{n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0},t)\}=o(1)$;

(iii) $\text{cov}\{n^{-1/2}\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0},s)\xi_{i}^{(g)},\,n^{-1/2}\sum_{j=1}^{n}\hat{M}_{j}(\theta_{1}^{0},t)\xi_{j}^{(g)}\}-\text{cov}\{n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0},s),\,n^{-1/2}\sum_{j=1}^{n}\hat{Z}_{j}(\theta_{1}^{0},t)\}=o(1)$.

5 Simulation studies

5.1 Data generation

We examine the performance of the proposed empirical likelihood ratio tests for variance components and compare the results with the standard likelihood ratio (lr) test assuming Gaussian random effects and Gaussian errors. To mimic the twin design in the heritability analysis of wearable device data that we analyze next, we simulate data on monozygotic and dizygotic twin pairs. For the $i$th twin pair, let $n_i=n_{i1}+n_{i2}$, where $n_{i1}$ and $n_{i2}$ are the numbers of repeated measures for the two twins. In wearable device data, $y_i(t)$ represents the $t$th quantile of the activity distributions over $n_i$ days. The data are generated from the model:

y_{i}(t)=X_{i}\beta(t)+T_{i}a_{i}(t)+\tau_{i}(t),\quad i=1,\cdots,n,\quad t=s_{1},s_{2},\cdots,s_{m}, \qquad (8)

where $T_i=\text{blkdiag}\{\mathbf{1}_{n_{i1}},\mathbf{1}_{n_{i2}}\}$, $a_i(t)$ is a random intercept, and $\tau_i(t)$ denotes zero-mean noise with variance $\sigma_M^2(t)\mathbf{I}_{n_i}$. Here, $a_i(t)$ is the sum of an additive genetic effect $g_i(t)$, a common environmental effect $c_i(t)$, and a unique subject-specific environmental effect $e_i(t)$, i.e.,

a_{i}(t)=g_{i}(t)+c_{i}(t)+e_{i}(t),

where $g_i(t),c_i(t),e_i(t)$ are independent zero-mean random variables with variance-covariance matrices $\sigma_A^2(t)K_i$, $\sigma_C^2(t)\Lambda_i$, and $\sigma_E^2(t)\mathbf{I}_2$, respectively. The variance components $\sigma_A^2(t),\sigma_C^2(t)$, and $\sigma_E^2(t)$ represent the additive genetic variance, common environmental variance, and unique environmental variance, respectively. For the $i$th twin pair, $K_i$ is a genetic similarity matrix with

K_{i}=\begin{pmatrix}1&1\\ 1&1\end{pmatrix}\text{ for monozygotic twins},\qquad K_{i}=\begin{pmatrix}1&0.5\\ 0.5&1\end{pmatrix}\text{ for dizygotic twins},

and $\Lambda_i$ quantifies the shared environment between the twin pair with

\Lambda_{i}=\begin{pmatrix}1&1\\ 1&1\end{pmatrix}.

Under this model, we have

H_{i}(\theta^{*}(t))=\sigma_{A}^{2}(t)T_{i}K_{i}T_{i}^{T}+\sigma_{C}^{2}(t)T_{i}\Lambda_{i}T_{i}^{T}+\sigma_{E}^{2}(t)T_{i}T_{i}^{T}+\sigma_{M}^{2}(t)\mathbf{I}_{n_{i}},\qquad\theta^{*}(t)=(\sigma_{A}^{2}(t),\sigma_{C}^{2}(t),\sigma_{E}^{2}(t),\sigma_{M}^{2}(t))^{T}.

We sample $n_{ik}\ (i=1,\cdots,n;\ k=1,2)$ from $\{3,4,5,6,7\}$ with equal probability $1/5$. We set $n=100$, comprising 50 monozygotic and 50 dizygotic twin families, and $t=0.01,0.03,0.05,\cdots,0.99$. Denote by $x_{ij}^T$ the $j$th row of $X_i$. Let $x_{ij}=(1,x_{ij1},x_{ij2})^T$ with $x_{ij1}\sim N(0,1)$ and $x_{ij2}\sim N(2,1)$. Moreover, we set $\beta(t)=(\beta_1(t),\beta_2(t),\beta_3(t))^T$, where $\beta_1(t)$ and $\beta_3(t)$ are the quantile functions of $N(1,6)$ and $N(1,9)$, respectively, and $\beta_2(t)=0$. Let

\sigma_{A}^{2}(t)=\sum_{l=1}^{N_{a}}\lambda_{l}^{a}(t)\{\psi_{l}^{a}(t)\}^{2},\quad\sigma_{C}^{2}(t)=0,\quad\sigma_{E}^{2}(t)=\sum_{l=1}^{N_{e}}\lambda_{l}^{e}(t)\{\psi_{l}^{e}(t)\}^{2},

where $N_a=N_e=2$, $(\lambda_1^a(t),\lambda_2^a(t))=C_a(t)(0.5,1)$, $(\lambda_1^e(t),\lambda_2^e(t))=C_e(t)(0.6,0.9)$, $\psi_1^a(t)=\psi_2^e(t)=\sqrt{2}\sin(2\pi t)$, and $\psi_2^a(t)=\psi_1^e(t)=\sqrt{2}\cos(2\pi t)$.
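A minimal data-generation sketch for the Gaussian setting of Section 5.2 at a single level $t$ follows; we treat the second parameters of $N(1,6)$ and $N(1,9)$ as variances, which is our reading rather than stated explicitly, and the function name is our own.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def simulate_twins(n=100, t=0.49, Ca=0.1, Ce=0.1, sig_m2=0.3):
    """One simulated dataset from model (8) at level t with Gaussian
    random effects; sigma_C^2(t) = 0, so c_i(t) is omitted."""
    s2, c2 = 2 * np.sin(2 * np.pi * t) ** 2, 2 * np.cos(2 * np.pi * t) ** 2
    sig_a2 = Ca * (0.5 * s2 + 1.0 * c2)        # sigma_A^2(t) eigen-expansion
    sig_e2 = Ce * (0.6 * c2 + 0.9 * s2)        # sigma_E^2(t) eigen-expansion
    beta = np.array([norm.ppf(t, 1, np.sqrt(6)), 0.0, norm.ppf(t, 1, np.sqrt(9))])
    data = []
    for i in range(n):
        K = np.ones((2, 2)) if i < n // 2 else np.array([[1., .5], [.5, 1.]])
        n1, n2 = rng.integers(3, 8, size=2)    # repeated measures per twin
        m = n1 + n2
        T = np.zeros((m, 2)); T[:n1, 0] = 1; T[n1:, 1] = 1
        a = (rng.multivariate_normal(np.zeros(2), sig_a2 * K)            # g_i(t)
             + rng.multivariate_normal(np.zeros(2), sig_e2 * np.eye(2)))  # e_i(t)
        X = np.column_stack([np.ones(m), rng.normal(0, 1, m), rng.normal(2, 1, m)])
        y = X @ beta + T @ a + rng.normal(0, np.sqrt(sig_m2), m)
        data.append((y, X, T, K))
    return data
```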

5.2 elr test for single variance component

For each $t\in\{0.01,0.03,0.05,\cdots,0.99\}$, we test the null hypothesis $H_0:\sigma_A^2(t)=0$. We consider the following two types of distributions for $g_i(t),c_i(t),e_i(t)$, and $\tau_i(t)$:

(i) multivariate normal distribution: $g_i(t)\stackrel{iid}{\sim}\mathcal{N}(\mathbf{0},\sigma_A^2(t)K_i)$, $c_i(t)\stackrel{iid}{\sim}\mathcal{N}(\mathbf{0},\sigma_C^2(t)\Lambda_i)$, $e_i(t)\stackrel{iid}{\sim}\mathcal{N}(\mathbf{0},\sigma_E^2(t)\mathbf{I}_2)$, $\tau_i(t)\stackrel{iid}{\sim}\mathcal{N}(\mathbf{0},0.3\mathbf{I}_{n_i})$;

(ii) multivariate $t$ distribution: $g_i(t)\stackrel{iid}{\sim}t_3(\mathbf{0},\sigma_A^2(t)K_i/3)$, $c_i(t)\stackrel{iid}{\sim}t_3(\mathbf{0},\sigma_C^2(t)\Lambda_i/3)$, $e_i(t)\stackrel{iid}{\sim}t_3(\mathbf{0},\sigma_E^2(t)\mathbf{I}_2/3)$, $\tau_i(t)\stackrel{iid}{\sim}t_3(\mathbf{0},0.1\mathbf{I}_{n_i})$.

Figure 1: Type 1 error for each given value of $t$. Top (a): normal random effects and errors; bottom (b): $t$-distributed random effects and errors. elr: el ratio test with the least-squares estimate of $\beta^*$; lr: likelihood ratio test under the normal random-effects assumption.

Denote by elr the proposed empirical likelihood ratio test with unknown $\beta^*(t)$. We first consider the type 1 error for each given value of $t$, under the setting $C_a(t)=0$ and $C_e(t)=0.1$, with random errors generated from both a normal and a $t$ distribution. We repeat the simulations 500 times. Figure 1 gives the type 1 errors for different values of $t$. All methods perform well under normal errors; however, lr shows inflated type 1 errors when the errors follow a heavy-tailed $t$ distribution.

To evaluate the power of the proposed tests, we consider the model with $C_a(t)=0.1$ and $C_e(t)=0.1$. We calculate the empirical power of the proposed test at the 0.05 level for different values of $t$ and present the results in Figure 2. The proposed method exhibits power similar to that of the lr test under normal errors. When the errors follow a $t$ distribution, we do not report the results of the lr test because of its inflated type 1 error shown in Figure 1(b); the elr test does not lose much power in this setting.

Figure 2: Empirical power for each given value of $t$. Top (a): normal random effects and errors; bottom (b): $t$-distributed random effects and errors. elr: el ratio test with the least-squares estimate of $\beta^*$; lr: likelihood ratio test under the normal random-effects assumption.

5.3 elr test for variance components over an interval

To evaluate the proposed tests for variance components in the case of correlated outcomes over an interval of $t$, we generate data as follows. Let $g_i(t)=\sigma_A(t)\zeta_{ai}$, $c_i(t)=\sigma_C(t)\zeta_{ci}$, and $e_i(t)=\sigma_E(t)\zeta_{ei}$. Let $\tau_i(t)=\sigma_M(t)\zeta_{\tau i}$, where $\sigma_M^2(t)=\sum_{l=1}^{N_m}\lambda_l^m(t)\{\psi_l^m(t)\}^2$ with $N_m=2$, $(\lambda_1^m(t),\lambda_2^m(t))=C_m(t)(0.5,1)$, $\psi_1^m(t)=\sqrt{2}\cos(2\pi t)$, and $\psi_2^m(t)=\sqrt{2}\sin(2\pi t)$. We consider two types of distributions for $\zeta_{ai},\zeta_{ci},\zeta_{ei},\zeta_{\tau i}$:

(i) multivariate normal distribution: $\zeta_{ai}\stackrel{iid}{\sim}\mathcal{N}(\mathbf{0},K_i)$, $\zeta_{ci}\stackrel{iid}{\sim}\mathcal{N}(\mathbf{0},\Lambda_i)$, $\zeta_{ei}\stackrel{iid}{\sim}\mathcal{N}(\mathbf{0},\mathbf{I}_2)$, $\zeta_{\tau i}\stackrel{iid}{\sim}\mathcal{N}(\mathbf{0},\mathbf{I}_2)$;

(ii) multivariate $t$ distribution: $\zeta_{ai}\stackrel{iid}{\sim}t_3(\mathbf{0},K_i/3)$, $\zeta_{ci}\stackrel{iid}{\sim}t_3(\mathbf{0},\Lambda_i/3)$, $\zeta_{ei}\stackrel{iid}{\sim}t_3(\mathbf{0},\mathbf{I}_2/3)$, $\zeta_{\tau i}\stackrel{iid}{\sim}t_3(\mathbf{0},\mathbf{I}_2/3)$.

Figure 3: Empirical power for the global test $H_0:\sigma_A^2(t)\equiv 0,\ t\in[0,1]$, at different values of $c_0$.

Denote by gelr the proposed global empirical likelihood ratio test with unknown $\beta^*(t)$. We first consider the global test $H_0:\sigma_A^2(t)\equiv 0,\ t\in[0,1]$. Let $C_a(t)=c_0I(t=0.49)$, $C_e(t)=0.1$, and $C_m(t)=0.08$, where $I(\cdot)$ is an indicator function. We consider different signal sizes $c_0=0,0.02,0.04,0.06,0.08,0.1$ and generate 500 datasets for each setting. Figure 3 presents the empirical power of gelr at the 0.05 significance level under the different error distributions and values of $c_0$. As expected, the empirical power of rejecting the null hypothesis increases with the signal size. Compared with the results under the multivariate $t$ distribution, the proposed gelr test has higher power when the data are normally distributed.

To further evaluate the type 1 error and the power, we consider models with $C_a(t)=0.08I(t\in\{0.47,0.49,0.51,0.53\})$, $C_e(t)=0.1$, and $C_m(t)=0.08$. We test each of the candidate intervals of lengths $\{3,4,5,6\}$, denoted by scan3, scan4, scan5, and scan6, respectively. Let $\mathcal{J}_k$ be the set of candidate intervals under scanning length $k$ ($k=3,4,5,6$), and let $\mathcal{J}=\cup_{k=3}^{6}\mathcal{J}_k$ be the set of all candidate intervals. For each candidate interval $L\in\mathcal{J}$, we test the null hypothesis $H_0:\sigma_A^2(t)\equiv 0,\ t\in L$. The signal in the interval $L$ is declared significant if

h(\Gamma_{L})=\frac{\Gamma_{L}-\bar{\Gamma}_{L}}{\sqrt{\sum_{g=1}^{G}(\Gamma_{L}^{(g)}-\bar{\Gamma}_{L})^{2}/(G-1)}}>\sqrt{2\log|\mathcal{J}|},

where $\bar{\Gamma}_L=(\sum_{g=1}^{G}\Gamma_L^{(g)})/G$ with $G=1000$. The threshold $\sqrt{2\log|\mathcal{J}|}$ is chosen based on the extreme value behavior of the maximum of $|\mathcal{J}|$ standard normal random variables.
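The scan decision is a one-line standardization of $\Gamma_L$ by its perturbation replicates; a sketch with hypothetical names:

```python
import numpy as np

def scan_decision(gamma_L, gamma_perm):
    """Flag significant intervals. gamma_L: length-|J| vector of observed
    Gamma_L; gamma_perm: |J| x G array of perturbed replicates."""
    h = (gamma_L - gamma_perm.mean(axis=1)) / gamma_perm.std(axis=1, ddof=1)
    return h > np.sqrt(2 * np.log(len(gamma_L)))   # extreme-value threshold
```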

Under each type of error distribution, 500 datasets are generated. For the global test on the candidate interval $L=\{t_1,t_1+0.02,\cdots,t_2\}$, we mark its empirical power at $(t_1+t_2)/2$. The results are shown in Figure 4. The proposed global test gelr exhibits high power when the interval contains at least one nonzero variance-component point, and shows almost no power otherwise.

Figure 4: Empirical power of testing for a zero variance component in a given interval of length 3, 4, 5, or 6, where the true nonzero variance components are at $\{0.47, 0.49, 0.51, 0.53\}$. Left: normal random effects and errors; right: $t$-distributed random effects and errors.

6 Application to genetic heritability analysis of physical activity distribution

6.1 Description of the data

We apply the methods to the data set of the Australian Twin study, which includes 366 healthy twins, of whom 151 are monozygotic and 215 are dizygotic. The participants wore actigraphy devices to track their physical activity for up to 14 days. The minute-by-minute activity counts derived from actigraphy were collected as a 1440-dimensional vector per day. Since we are interested in inference on the heritability of the activity distributions, we obtain the empirical quantiles of activity counts at levels $t=1/144,\ 2/144,\cdots,144/144$. Specifically, for the $j$th measurement (day) from the $k$th person in the $i$th twin family, the raw data $\xi_{ikj}=(\xi_{ikj1},\cdots,\xi_{ikj1440})^T$ from the wearable device are transformed by using

\tilde{\xi}_{ikj}=\log(9250\,\xi_{ikj}+1),\quad i=1,\cdots,n;\ k=1,2;\ j=1,\cdots,n_{ik}.

For the $k$th person in the $i$th twin family, the $j$th repeated measure of the $t$-quantile of activity counts is obtained as

y_{ikj}(t)=\tilde{\xi}_{ikj}^{[1440t]},\quad t=1/144,\ 2/144,\cdots,144/144,

where $\tilde{\xi}_{ikj}^{[s]}$ denotes the $s$th order statistic of $\tilde{\xi}_{ikj}$.
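Computationally, the quantile phenotype is a log transform followed by reading off order statistics; note that $[1440t]=10s$ for $t=s/144$. A minimal sketch:

```python
import numpy as np

def activity_quantiles(xi, n_q=144):
    """Map one day's 1440 minute-level counts to the n_q quantile
    measurements y_ikj(t), t = 1/n_q, ..., n_q/n_q."""
    xi_log = np.sort(np.log(9250 * np.asarray(xi, dtype=float) + 1))
    idx = 1440 * np.arange(1, n_q + 1) // n_q - 1   # 0-based order statistics
    return xi_log[idx]
```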

The covariate vector $x_{ikj}$ includes gender, age, BMI, and an indicator of weekend, i.e., $x_{ikj}=(1,\text{Gender},\text{Age},\text{BMI},\text{Weekend})^T$. Let $y_i(t)=(y_{i11}(t),\cdots,y_{i1n_{i1}}(t),y_{i21}(t),\cdots,y_{i2n_{i2}}(t))^T$ and $X_i=(x_{i11},\cdots,x_{i1n_{i1}},x_{i21},\cdots,x_{i2n_{i2}})^T$. Let $y(t)=(y_1^T(t),\cdots,y_n^T(t))^T$. For each $t$, we remove the outliers of $y(t)$, defined as values more than 1.5 times the interquartile range above the upper quartile or below the lower quartile. After removing all outliers and missing data, we have $n=149$ twin families, including 63 monozygotic and 86 dizygotic twin families, with a total of 3,489 observations.

6.2 Effects of Gender, Age, BMI, and Weekend on Activity Profiles

We first examine the associations between the covariates (gender, age, BMI, and weekend vs. weekday) and the overall activity distribution. For each of the four covariates and each value of $t$, we obtain the el estimator by solving the estimating equations $\sum_{i=1}^{n}\phi_i(\beta)=0$ and apply Theorem 1 to construct the confidence interval $\{\beta_0:-2\log\textsc{elr}_0(\beta_0)\leq\chi_1^2(1-\alpha)\}$, where $\chi_1^2(1-\alpha)$ is the $(1-\alpha)$ quantile of the $\chi_1^2$ distribution. The first column of Figure 5 shows the estimated regression coefficient for each value of $t$ together with its point-wise 95% confidence interval from the el method.
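The point-wise confidence interval is obtained by test inversion; a sketch, assuming $-2\log\textsc{elr}_0(\beta_0)$ has already been evaluated on a grid of candidate values (e.g., with the dual solver sketched in Section 2) wide enough to contain the el estimate:

```python
import numpy as np
from scipy.stats import chi2

def el_ci(beta_grid, neg2logelr, alpha=0.05):
    """Keep every beta_0 whose -2 log elr_0(beta_0) is below the
    chi^2_1 critical value (Theorem 1 with p = 1)."""
    keep = neg2logelr <= chi2.ppf(1 - alpha, df=1)
    return beta_grid[keep].min(), beta_grid[keep].max()
```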

Figure 5: Estimates of $\beta^*(t)$ with 95% confidence intervals (left panels) and $-\log_{10}(p\text{-value})$ (right panels) for gender, age, BMI, and weekend at each quantile $t$. The black dashed horizontal line is the nominal threshold $-\log_{10}(0.05)$, and the black dotted horizontal line is the Bonferroni-corrected threshold $-\log_{10}(0.05/144)$.

We then test whether there is any difference in activity profiles between individuals of different gender, age, or BMI, and whether the activity profiles differ between weekdays and weekends. Specifically, we consider testing such differences at each quantile $t$. To test $H_0:\beta_l(t)=0,\ l\in\{\text{Gender, Age, BMI, Weekend}\}$, we apply the empirical likelihood ratio test of Section 2 and the standard likelihood ratio (lr) test assuming normal random effects, and we obtain the $p$-value for each value of $t$. The second column of Figure 5 shows the $p$-values for each $t$ and each of these four covariates. At the nominal $p$-value of 0.05, the elr test shows a gender effect when the activity counts are small (i.e., small $t$). In contrast, the standard lr test shows such significance only in a smaller interval from 0.23 to 0.42. For age, the elr test shows a significant effect in the region of large activity counts (i.e., large $t$). Neither the elr nor the lr test rejects the null hypothesis of no BMI effect, while the weekend effect is statistically significant over almost the whole range of $t$.

6.3 Analysis of heritability of the activity distribution

We then address the question of whether the activity distribution, summarized by its quantiles, is heritable. This is equivalent to testing the null hypothesis $H_0:\sigma_A^2(t)=0,\ t\in[0,1]$. For each quantile $t$, we first estimate the fixed effects using the least-squares estimator and then apply the proposed elr test of the null hypothesis $H_0:\sigma_A^2(t)=0$, comparing the results with the lr method. Figure 6(a) gives the $p$-values at different quantiles $t$. The test $H_0:\sigma_A^2(t)=0$ is rejected for $t\in[0.375,0.958]$ by the elr test and for $t\in[0.472,0.931]$ by lr at the nominal 0.05 significance level. With the Bonferroni correction for multiple testing, however, only the proposed elr test identifies significant heritability, for quantiles between 0.375 and 0.514. The $p$-value of the global test $H_0:\sigma_A^2(t)\equiv 0,\ t\in[0,1]$ is 0 when applying the proposed elr test with 1000 perturbations. Overall, our analysis shows that the activity distribution is heritable, especially in the quantile range from 0.375 to 0.514.

Figure 6: The $-\log_{10}(p\text{-value})$ of testing heritability $H_0:\sigma_A^2(t)=0$ for different quantile values $t$. The black dashed horizontal line is the nominal threshold $-\log_{10}(0.05)$, and the black dotted horizontal line is the Bonferroni-corrected threshold $-\log_{10}(0.05/144)$. Left (a): results under preprocessing 1; right (b): results under preprocessing 2.

6.4 Sensitivity analysis of heritability of the activity distribution

To examine whether our preprocessing steps affect the heritability analysis, we consider an alternative approach to removing outliers. For each $t$, we remove the outliers of $y(t)$, defined as values that are more than 3 standard deviations from the median. After removing all outliers and missing data, we have $n=152$ twin families, including 64 monozygotic and 88 dizygotic twin families, with a total of 4,190 observations.

Under this preprocessing method, the $-\log_{10}(p\text{-value})$ of testing heritability is shown in Figure 6(b), with results similar to those in Figure 6(a). The test $H_0:\sigma_A^2(t)=0$ is rejected for $t\in[0.285,0.979]$ by the elr test and for $t\in[0.514,1]$ by the lr test at the nominal 0.05 significance level. With the Bonferroni correction for multiple testing, only the proposed elr test identifies significant heritability, for quantiles between 0.396 and 0.576. The $p$-value of the global test $H_0:\sigma_A^2(t)\equiv 0,\ t\in[0,1]$ is 0 when applying the proposed elr test with 1000 perturbations.

7 Discussion

In this paper, we have developed empirical likelihood methods for making inference on the variance components in general linear mixed-effects models. The proposed empirical likelihood ratio test statistics can be applied to a large set of related outcomes, such as different quantiles of the activity distribution in the analysis of wearable device data. Simulation studies show that the proposed methods control the type 1 error much better than the likelihood ratio method when the normality assumptions do not hold.

To address the unknown nuisance variance components, we assume that the true value $\theta^*_{(1)}$ is positive, so that as $n\rightarrow\infty$, (2) provides unbiased positive estimates with probability tending to one. When applying the proposed methods to the real data, we note that (2) may yield negative estimates at some quantiles $t$. To solve this problem, we first test whether the nuisance variance component (for example, $\sigma_C^2(t)$ or $\sigma_E^2(t)$) is zero at these quantile points. If the null hypothesis is not rejected, we omit the nuisance variance component from the model and then apply the proposed elr test for the component of interest.

ACKNOWLEDGMENT

This research was supported by the Intramural Research Program of the National Institute of Mental Health through grant ZIA MH002954-04 [Motor Activity Research Consortium for Health (mMARCH)]. We thank Dr. Hickie and Dr. Martin for sharing the Australian twin study data as part of the mMARCH network, and the Genetic Epidemiology Research Branch at the National Institute of Mental Health for processing the accelerometry data.

APPENDIX

Appendix A Proofs and complements

A.1 Proof of Theorem 2

To prove Theorem 2, we first consider the setting with known $\beta^*$. We define $Z_i(\theta_1)$, $L_1(\theta_1)$, $\textsc{elr}_1(\theta_1^0)$, and $\tilde{\nu}_{1n}^2(\theta_1^0)$ in the same way as $\hat{Z}_i(\theta_1)$, $L(\theta_1)$, $\textsc{elr}(\theta_1^0)$, and $\hat{\nu}_{1n}^2(\theta_1^0)$, respectively, with $\hat{R}_i$ replaced by $R_i$. Under Conditions 5 and 6, we derive the asymptotic distribution of $\textsc{elr}_1(\theta_1^0)$ in the following theorem.

Theorem A1.

Let $\tilde{c}_n(\theta_1^0)=\tilde{\nu}_{2n}^2(\theta_1^0)/\tilde{\nu}_{1n}^2(\theta_1^0)$, where $\tilde{\nu}_{1n}^2(\theta_1^0)$ is a consistent estimator of the asymptotic variance of $n^{-1/2}\sum_{i=1}^{n}Z_i(\theta_1^0)$ and $\tilde{\nu}_{2n}^2(\theta_1^0)=n^{-1}\sum_{i=1}^{n}Z_i^2(\theta_1^0)$. If $\theta^*_{(1)}\in R_+^{d-1}$, then under Conditions 5 and 6, as $n\rightarrow\infty$, $\tilde{c}_n(\theta_1^0)\{-2\log\textsc{elr}_1(\theta_1^0)\}\rightarrow\chi_1^2$ in distribution when $\theta_1^0>0$, and $\tilde{c}_n(0)\{-2\log\textsc{elr}_1(0)\}\rightarrow U_+^2$ in distribution, where $U\sim N(0,1)$ and $U_+=\max(U,0)$.

Proof.

For simplicity, we sometimes use $Z_i$ to denote $Z_i(\theta_1)$ when there is no confusion. Using the method of Lagrange multipliers, let

\mathcal{L}=-\sum_{i=1}^{n}\log p_{i}+\kappa\Big(\sum_{i=1}^{n}p_{i}-1\Big)+\lambda_{0}\sum_{i=1}^{n}p_{i}Z_{i}.

Since

\frac{\partial\mathcal{L}}{\partial p_{i}}=-\frac{1}{p_{i}}+\kappa+\lambda_{0}Z_{i}=0,

we have

p_{i}=\frac{1}{\kappa+\lambda_{0}Z_{i}}\quad\text{and}\quad\kappa=n. \qquad (A9)

Plugging $\lambda_0=n\lambda$ into (A9), we obtain

p_{i}=\frac{1}{n(1+\lambda Z_{i})}. \qquad (A10)

Since

0=\sum_{i=1}^{n}p_{i}Z_{i}=\sum_{i=1}^{n}\frac{Z_{i}}{n(1+\lambda Z_{i})}, \qquad (A11)

under Condition 5, one can show that

\lambda=\Big(\sum_{i=1}^{n}Z_{i}^{2}\Big)^{-1}\sum_{i=1}^{n}Z_{i}+o_{p}(n^{-1/2})\quad\text{by Taylor expansion}.

Let $W_1(\theta_1)=n^nL_1(\theta_1)$. We have

-2\log W_{1}(\theta_{1})=-2\sum_{i=1}^{n}(\log p_{i}+\log n)=2\sum_{i=1}^{n}\log(1+\lambda Z_{i})=2\sum_{i=1}^{n}\Big(\lambda Z_{i}-\frac{1}{2}(\lambda Z_{i})^{2}\Big)+o_{p}(1)=\Big(\sum_{i=1}^{n}Z_{i}\Big)^{2}\Big(\sum_{i=1}^{n}Z_{i}^{2}\Big)^{-1}+o_{p}(1)\quad\text{(by Taylor expansion)}. \qquad (A12)
(1) If $\theta_1^0>0$, then Lemma A2 implies

\tilde{c}_{n}(\theta_{1}^{0})\Big\{-2\log\frac{L_{1}(\theta_{1}^{0})}{\max_{\theta_{1}\geq 0}L_{1}(\theta_{1})}\Big\}=\tilde{c}_{n}(\theta_{1}^{0})\{-2\log W_{1}(\theta_{1}^{0})\}=\Big(n^{-1/2}\sum_{i=1}^{n}Z_{i}(\theta_{1}^{0})\Big)^{2}\big/\tilde{\nu}_{1n}^{2}(\theta_{1}^{0})+o_{p}(1)=\frac{\big(n^{-1/2}\alpha^{-1}\sum_{i=1}^{n}\big\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1},R_{i}-\theta_{1}^{0}\Phi_{i1}\big\rangle\big)^{2}}{n^{-1}\alpha^{-2}\sum_{i=1}^{n}\big\langle R_{i}-H_{i}((\theta_{1}^{0},\tilde{\theta}_{(1)}^{T})^{T}),\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1}\big\rangle^{2}}+o_{p}(1).

Since

n^{-1}\alpha^{-2}\sum_{i=1}^{n}\big\langle R_{i}-H_{i}((\theta_{1}^{0},\tilde{\theta}_{(1)}^{T})^{T}),\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1}\big\rangle^{2}=\text{var}\Big(n^{-1/2}\alpha^{-1}\sum_{i=1}^{n}\big\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1},R_{i}-\theta_{1}^{0}\Phi_{i1}\big\rangle\Big)+o_{p}(1)

under Condition 5, it follows that $\tilde{c}_n(\theta_1^0)\{-2\log\textsc{elr}_1(\theta_1^0)\}\rightarrow\chi^2_1$ in distribution when $\theta_1^0>0$.

(2) When $\theta_1^0=0$, Lemma A2 implies

\tilde{c}_{n}(0)\Big\{-2\log\frac{L_{1}(0)}{\max_{\theta_{1}\geq 0}L_{1}(\theta_{1})}\Big\}=\tilde{c}_{n}(0)\{-2\log W_{1}(0)\}\,I\Big(\sum_{i=1}^{n}Z_{i}(0)\geq 0\Big)

for $n$ large enough. Therefore, $\tilde{c}_n(0)\{-2\log\textsc{elr}_1(0)\}\rightarrow U_+^2$ in distribution, where $U\sim N(0,1)$ and $U_+=\max(U,0)$.

Lemma A2.

Let $W_1(\theta_1)=n^nL_1(\theta_1)$ and $\tilde{\theta}_1=\arg\max_{\theta_1\geq 0}W_1(\theta_1)$. If the true value $\theta^*_1>0$, then $W_1(\tilde{\theta}_1)=1$ for $n$ large enough. If the true value $\theta^*_1=0$, then $W_1(\tilde{\theta}_1)=I(\sum_{i=1}^{n}Z_i(0)\geq 0)+W_1(0)I(\sum_{i=1}^{n}Z_i(0)<0)$ for $n$ large enough.

Proof.

Let \check{\theta}_{1}=\arg\max_{\theta_{1}}W_{1}(\theta_{1}). We use an argument similar to that of Qin and Lawless (1994). Since

\check{\theta}_{1}=\arg\min_{\theta_{1}}\,-2\log W_{1}(\theta_{1}),

0=\frac{\partial(-2\log W_{1}(\theta_{1}))}{\partial\theta_{1}}\Big|_{\theta_{1}=\check{\theta}_{1}} &=2\sum_{i=1}^{n}\frac{\frac{\partial\lambda}{\partial\theta_{1}}Z_{i}+\lambda\frac{\partial Z_{i}}{\partial\theta_{1}}}{1+\lambda Z_{i}}\Big|_{\theta_{1}=\check{\theta}_{1}}\\
&=2\lambda\sum_{i=1}^{n}\frac{1}{1+\lambda Z_{i}}\frac{\partial Z_{i}}{\partial\theta_{1}}\Big|_{\theta_{1}=\check{\theta}_{1}}\text{ (by (A11))}. (A13)

Let \check{\lambda}=\lambda(\check{\theta}_{1}). We note that \check{\theta}_{1} and \check{\lambda} satisfy

Q_{1n}(\check{\theta}_{1},\check{\lambda})=0,\quad Q_{2n}(\check{\theta}_{1},\check{\lambda})=0,

where

Q_{1n}(\theta_{1},\lambda) &=\frac{1}{n}\sum_{i=1}^{n}\frac{Z_{i}(\theta_{1})}{1+\lambda Z_{i}(\theta_{1})}\text{ (by (A11))},\\
Q_{2n}(\theta_{1},\lambda) &=\frac{\lambda}{n}\sum_{i=1}^{n}\frac{1}{1+\lambda Z_{i}(\theta_{1})}\frac{\partial Z_{i}(\theta_{1})}{\partial\theta_{1}}\text{ (by (A13))}.

Taking derivatives with respect to \theta_{1} and \lambda, we have

\frac{\partial Q_{1n}(\theta_{1},0)}{\partial\theta_{1}}=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial Z_{i}(\theta_{1})}{\partial\theta_{1}},\quad\frac{\partial Q_{1n}(\theta_{1},0)}{\partial\lambda}=-\frac{1}{n}\sum_{i=1}^{n}Z_{i}(\theta_{1})^{2},\\
\frac{\partial Q_{2n}(\theta_{1},0)}{\partial\theta_{1}}=0,\quad\frac{\partial Q_{2n}(\theta_{1},0)}{\partial\lambda}=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial Z_{i}(\theta_{1})}{\partial\theta_{1}}.

Expanding Q_{1n} and Q_{2n} at (\theta_{1}=\theta^{*}_{1},\lambda=0), we have

0=Q_{1n}(\check{\theta}_{1},\check{\lambda})=Q_{1n}(\theta^{*}_{1},0)+\frac{\partial Q_{1n}(\theta^{*}_{1},0)}{\partial\theta_{1}}(\check{\theta}_{1}-\theta^{*}_{1})+\frac{\partial Q_{1n}(\theta^{*}_{1},0)}{\partial\lambda}\check{\lambda}+o_{p}(n^{-1/2}), (A14)
0=Q_{2n}(\check{\theta}_{1},\check{\lambda})=Q_{2n}(\theta^{*}_{1},0)+\frac{\partial Q_{2n}(\theta^{*}_{1},0)}{\partial\theta_{1}}(\check{\theta}_{1}-\theta^{*}_{1})+\frac{\partial Q_{2n}(\theta^{*}_{1},0)}{\partial\lambda}\check{\lambda}+o_{p}(n^{-1/2}). (A15)

(A14) and (A15) give

\begin{pmatrix}\check{\theta}_{1}-\theta^{*}_{1}\\ \check{\lambda}\end{pmatrix}=\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}\frac{\partial Z_{i}(\theta^{*}_{1})}{\partial\theta_{1}}&-\frac{1}{n}\sum_{i=1}^{n}Z_{i}(\theta^{*}_{1})^{2}\\ 0&\frac{1}{n}\sum_{i=1}^{n}\frac{\partial Z_{i}(\theta^{*}_{1})}{\partial\theta_{1}}\end{pmatrix}^{-1}\begin{pmatrix}-\frac{1}{n}\sum_{i=1}^{n}Z_{i}(\theta^{*}_{1})+o_{p}(n^{-1/2})\\ o_{p}(n^{-1/2})\end{pmatrix}.

Hence,

\check{\theta}_{1}-\theta^{*}_{1}=-\frac{n^{-1}\sum_{i=1}^{n}Z_{i}(\theta^{*}_{1})}{n^{-1}\sum_{i=1}^{n}\frac{\partial Z_{i}(\theta^{*}_{1})}{\partial\theta_{1}}}+o_{p}(n^{-1/2}), (A16)

where \big(\sum_{i=1}^{n}Z_{i}(\theta^{*}_{1})\big)/\big(\sum_{i=1}^{n}\partial Z_{i}(\theta^{*}_{1})/\partial\theta_{1}\big)=O_{p}(n^{-1/2}).
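Here the inverse of the block upper-triangular matrix in the previous display is explicit: with a=n^{-1}\sum_{i=1}^{n}\partial Z_{i}(\theta^{*}_{1})/\partial\theta_{1} and b=n^{-1}\sum_{i=1}^{n}Z_{i}(\theta^{*}_{1})^{2},

\begin{pmatrix}a&-b\\ 0&a\end{pmatrix}^{-1}=\frac{1}{a^{2}}\begin{pmatrix}a&b\\ 0&a\end{pmatrix},

and the first row of the resulting product yields (A16).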

When \theta^{*}_{1}>0, (A16) implies \check{\theta}_{1}>0 for sufficiently large n. Thus \tilde{\theta}_{1}=\check{\theta}_{1} for sufficiently large n, and \tilde{\theta}_{1} satisfies (A13), i.e.,

2\lambda\sum_{i=1}^{n}\frac{1}{1+\lambda Z_{i}}(-\|\Phi_{i1}\|_{F}^{2})=0. (A17)

Plugging (A10) into (A17), we have \lambda=0. Then p_{i}=n^{-1} and W_{1}(\tilde{\theta}_{1})=1.

When \theta^{*}_{1}=0, \tilde{\theta}_{1}=\check{\theta}_{1}I(\check{\theta}_{1}\geq 0) for sufficiently large n. Since \sum_{i=1}^{n}\partial Z_{i}(0)/\partial\theta_{1}=-\sum_{i=1}^{n}\|\Phi_{i1}\|_{F}^{2}<0, we have \tilde{\theta}_{1}=\check{\theta}_{1}I(\sum_{i=1}^{n}Z_{i}(0)\geq 0) for sufficiently large n. So W_{1}(\tilde{\theta}_{1})=I(\sum_{i=1}^{n}Z_{i}(0)\geq 0)+W_{1}(0)I(\sum_{i=1}^{n}Z_{i}(0)<0). ∎

Proof of Theorem 2.

Let \Delta be a d-dimensional vector with kth element \Delta_{k}=\sum_{i=1}^{n}\text{tr}(\Phi_{ik}E(\hat{\epsilon}_{i})), and let \varsigma_{i}=(\text{tr}(\Phi_{i1}\Phi_{i2}),\cdots,\text{tr}(\Phi_{i1}\Phi_{id}))^{T}. For i=1,\cdots,n,

E(\hat{R}_{i})=H_{i}(\theta^{*})+E(\hat{\epsilon}_{i}),\\
E(\hat{\theta}_{(1)})=\theta^{*}_{(1)}+\big(\Xi^{-1}\big)_{-1}^{T}\Delta,

so we have

E(\hat{Z}_{i}(\theta_{1}^{0}))=\text{tr}(\Phi_{i1}E(\hat{\epsilon}_{i}))-\varsigma_{i}^{T}(\Xi^{-1})_{-1}^{T}\Delta=O(n^{-1}), (A18)\\
E(n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0}))=o(1)

by Proposition 1. Theorem 2 then follows from the same arguments as in the proof of Theorem A1. ∎

A.2 Proof of Proposition 1

Proof of Proposition 1.

Since

n^{1/2}(\hat{\beta}-\beta^{*})=n^{1/2}(X^{T}X)^{-1}X^{T}r=(n^{-1}X^{T}X)^{-1}n^{-1/2}X^{T}r,

we have n^{1/2}(\hat{\beta}-\beta^{*})\xrightarrow{d}\Sigma^{-1}\eta.

Under Condition 5, E\|r_{i}\|_{2}^{2} and E\|r_{i}\|_{2}^{4} are uniformly bounded. Since

nE(r_{i}(\beta^{*}-\hat{\beta})^{T}X_{i}^{T}) &=-E\big(r_{i}\sum_{k=1}^{n}r_{k}^{T}X_{k}(n^{-1}X^{T}X)^{-1}X_{i}^{T}\big)\\
&=-E\big(r_{i}r_{i}^{T}X_{i}(n^{-1}X^{T}X)^{-1}X_{i}^{T}\big)=O(1),\\
nE(X_{i}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{i}^{T}) &=O(1),

we have E(\hat{\epsilon}_{i})=O(n^{-1}).

Note that for i\neq j,

\text{cov}(r_{i}r_{i}^{T},\hat{\epsilon}_{j}) &=\text{cov}(r_{i}r_{i}^{T},r_{j}(\beta^{*}-\hat{\beta})^{T}X_{j}^{T})+\text{cov}(r_{i}r_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})r_{j}^{T})+\text{cov}(r_{i}r_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{j}^{T})\\
&=\text{cov}(r_{i}r_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}).

Since

n^{2}\text{cov}(r_{i}r_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}) &=\text{cov}\big(r_{i}r_{i}^{T},X_{j}(n^{-1}X^{T}X)^{-1}\sum_{l=1}^{n}X_{l}^{T}r_{l}\sum_{k=1}^{n}r_{k}^{T}X_{k}(n^{-1}X^{T}X)^{-1}X_{j}^{T}\big)\\
&\rightarrow\text{cov}(r_{i}r_{i}^{T},X_{j}\Sigma^{-1}X_{i}^{T}r_{i}r_{i}^{T}X_{i}\Sigma^{-1}X_{j}^{T})=O(1),

we have \text{cov}(r_{i}r_{i}^{T},\hat{\epsilon}_{j})=O(n^{-2}).

For \text{cov}(\hat{\epsilon}_{i},\hat{\epsilon}_{j}), i\neq j, it suffices to analyze \text{cov}(r_{i}(\beta^{*}-\hat{\beta})^{T}X_{i}^{T},r_{j}(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}), \text{cov}(r_{i}(\beta^{*}-\hat{\beta})^{T}X_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}), and \text{cov}(X_{i}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}). Since

n^{2}\text{cov}(r_{i}(\beta^{*}-\hat{\beta})^{T}X_{i}^{T},r_{j}(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}) &=\text{cov}\big(r_{i}\sum_{l=1}^{n}r_{l}^{T}X_{l}(n^{-1}X^{T}X)^{-1}X_{i}^{T},r_{j}\sum_{k=1}^{n}r_{k}^{T}X_{k}(n^{-1}X^{T}X)^{-1}X_{j}^{T}\big)\\
&\rightarrow\text{cov}(r_{i}r_{j}^{T}X_{j}\Sigma^{-1}X_{i}^{T},r_{j}r_{i}^{T}X_{i}\Sigma^{-1}X_{j}^{T})=O(1),\\
n^{2}\text{cov}(X_{i}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}) &=O(1),\\
\text{cov}(r_{i}(\beta^{*}-\hat{\beta})^{T}X_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}) &=-n^{-3}\text{cov}\big(r_{i}\sum_{k=1}^{n}r_{k}^{T}X_{k}(n^{-1}X^{T}X)^{-1}X_{i}^{T},X_{j}(n^{-1}X^{T}X)^{-1}\sum_{s=1}^{n}\sum_{t=1}^{n}X_{s}^{T}r_{s}r_{t}^{T}X_{t}(n^{-1}X^{T}X)^{-1}X_{j}^{T}\big)\\
&=-n^{-3}\text{cov}\big(r_{i}r_{i}^{T}X_{i}(n^{-1}X^{T}X)^{-1}X_{i}^{T},X_{j}(n^{-1}X^{T}X)^{-1}X_{i}^{T}r_{i}r_{i}^{T}X_{i}(n^{-1}X^{T}X)^{-1}X_{j}^{T}\big)\\
&\quad-n^{-3}\sum_{k\neq i}\text{cov}\big(r_{i}r_{k}^{T}X_{k}(n^{-1}X^{T}X)^{-1}X_{i}^{T},X_{j}(n^{-1}X^{T}X)^{-1}(X_{i}^{T}r_{i}r_{k}^{T}X_{k}+X_{k}^{T}r_{k}r_{i}^{T}X_{i})(n^{-1}X^{T}X)^{-1}X_{j}^{T}\big),

we have \text{cov}(r_{i}(\beta^{*}-\hat{\beta})^{T}X_{i}^{T},r_{j}(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}), \text{cov}(r_{i}(\beta^{*}-\hat{\beta})^{T}X_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}), and \text{cov}(X_{i}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{i}^{T},X_{j}(\beta^{*}-\hat{\beta})(\beta^{*}-\hat{\beta})^{T}X_{j}^{T}) all of order O(n^{-2}). Hence, \text{cov}(\hat{\epsilon}_{i},\hat{\epsilon}_{j})=O(n^{-2}). ∎
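As an informal numerical check of the O(n^{-1}) rate for E(\hat{\epsilon}_{i}), the sketch below simulates a toy design and averages \hat{r}_{i}\hat{r}_{i}^{T}-r_{i}r_{i}^{T}, which collects exactly the three correction terms analyzed above (our reading of \hat{\epsilon}_{i}); the design, dimensions, and error law are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, reps = 4, 2, 2000            # measurements per subject, covariates, replications
beta = np.ones(p)                  # true fixed effects

for n in (50, 100, 200, 400):
    bias = np.zeros((m, m))
    for _ in range(reps):
        X = rng.normal(size=(n, m, p))
        r = rng.standard_t(df=6, size=(n, m))   # non-Gaussian errors are allowed
        Y = X @ beta + r
        beta_hat = np.linalg.lstsq(X.reshape(n * m, p), Y.reshape(n * m), rcond=None)[0]
        rhat = Y - X @ beta_hat                 # OLS residuals
        bias += np.outer(rhat[0], rhat[0]) - np.outer(r[0], r[0])  # eps_hat for subject 1
    print(n, np.abs(bias / reps).max())         # max entry should shrink roughly like 1/n
```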

A.3 Proof of Equation (4)

Proof of equation (4).

Rewrite \Xi as \Xi=\left(\begin{smallmatrix}E_{11}&E_{12}\\ E_{21}&E_{22}\end{smallmatrix}\right) with E_{11} being a scalar, and rewrite \hat{\Upsilon} as \hat{\Upsilon}=(\hat{\Upsilon}_{1},\hat{\Upsilon}_{(1)})^{T}. Then

\hat{\theta}_{(1)}=(\Xi^{-1})_{-1}^{T}\hat{\Upsilon}=-E_{22}^{-1}E_{21}q^{-1}\hat{\Upsilon}_{1}+E_{22}^{-1}\hat{\Upsilon}_{(1)}+E_{22}^{-1}E_{21}q^{-1}E_{12}E_{22}^{-1}\hat{\Upsilon}_{(1)},

where q=E_{11}-E_{12}E_{22}^{-1}E_{21}. Let F=E_{22}^{-1}E_{21}. We obtain

\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0}) &=\hat{\Upsilon}_{1}-F^{T}\big(-q^{-1}E_{21}\hat{\Upsilon}_{1}+\hat{\Upsilon}_{(1)}+E_{21}q^{-1}E_{12}E_{22}^{-1}\hat{\Upsilon}_{(1)}\big)-E_{11}\theta_{1}^{0}\\
&=(1+q^{-1}F^{T}E_{21})\hat{\Upsilon}_{1}-(1+F^{T}E_{21}q^{-1})F^{T}\hat{\Upsilon}_{(1)}-E_{11}\theta_{1}^{0}\\
&=(1+q^{-1}F^{T}E_{21})\sum_{i=1}^{n}\big\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1},\hat{R}_{i}\big\rangle-E_{11}\theta_{1}^{0}\\
&=\sum_{i=1}^{n}\hat{D}_{i}(\theta_{1}^{0}),

where \hat{D}_{i}(\theta_{1}^{0})=\alpha^{-1}\big\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1},\hat{R}_{i}-\theta_{1}^{0}\Phi_{i1}\big\rangle. Note that for any b=(b_{1},\cdots,b_{d-1})^{T},

\sum_{i=1}^{n}\big\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1},\sum_{j=1}^{d-1}b_{j}\Phi_{ij+1}\big\rangle=\sum_{j=1}^{d-1}b_{j}\Xi_{1,j+1}-\sum_{j=1}^{d-1}b_{j}\sum_{q=1}^{d-1}F_{q}\Xi_{q+1,j+1}=0,

so we have

\sum_{i=1}^{n}\hat{D}_{i}(\theta_{1}^{0})=\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0}),

where \hat{M}_{i}(\theta_{1}^{0})=\alpha^{-1}\langle\Phi_{i1}-\sum_{q=1}^{d-1}F_{q}\Phi_{iq+1},\hat{R}_{i}-H_{i}((\theta_{1}^{0},\hat{\theta}_{(1)}^{T})^{T})\rangle. ∎
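The first display of this proof rests on the standard partitioned-inverse (Schur complement) identity, recorded here for convenience: with q=E_{11}-E_{12}E_{22}^{-1}E_{21},

\Xi^{-1}=\begin{pmatrix}q^{-1}&-q^{-1}E_{12}E_{22}^{-1}\\ -E_{22}^{-1}E_{21}q^{-1}&E_{22}^{-1}+E_{22}^{-1}E_{21}q^{-1}E_{12}E_{22}^{-1}\end{pmatrix},

and (\Xi^{-1})_{-1}^{T}\hat{\Upsilon} applies the second block row to (\hat{\Upsilon}_{1},\hat{\Upsilon}_{(1)})^{T}, producing the three terms displayed there.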

A.4 Proof of Proposition 2

Proof of Proposition 2.

Since \xi_{i}^{(g)}\sim N(0,1) and (A18) holds, property (i) follows.

To prove property (ii), we have

\text{var}\big(n^{-1/2}\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0},t)\xi_{i}^{(g)}\big)=n^{-1}\sum_{i=1}^{n}\text{var}(\hat{M}_{i}(\theta_{1}^{0},t)\xi_{i}^{(g)})=n^{-1}\sum_{i=1}^{n}E\hat{M}_{i}(\theta_{1}^{0},t)^{2},

and

\text{var}\big(n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0},t)\big) &=\text{var}\big(n^{-1/2}\sum_{i=1}^{n}\hat{D}_{i}(\theta_{1}^{0},t)\big)=n^{-1}\sum_{i=1}^{n}\text{var}(\hat{D}_{i}(\theta_{1}^{0},t))+o(1)\\
&=n^{-1}\sum_{i=1}^{n}E\hat{M}_{i}(\theta_{1}^{0},t)^{2}+o(1).

Thus, we have property (ii).

Since

\text{cov}\big(n^{-1/2}\sum_{i=1}^{n}\hat{M}_{i}(\theta_{1}^{0},s)\xi_{i}^{(g)},n^{-1/2}\sum_{j=1}^{n}\hat{M}_{j}(\theta_{1}^{0},t)\xi_{j}^{(g)}\big) &=n^{-1}\sum_{i=1}^{n}\text{cov}(\hat{M}_{i}(\theta_{1}^{0},s)\xi_{i}^{(g)},\hat{M}_{i}(\theta_{1}^{0},t)\xi_{i}^{(g)})\\
&=n^{-1}\sum_{i=1}^{n}E(\hat{M}_{i}(\theta_{1}^{0},s)\hat{M}_{i}(\theta_{1}^{0},t)),

and

\text{cov}\big(n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0},s),n^{-1/2}\sum_{j=1}^{n}\hat{Z}_{j}(\theta_{1}^{0},t)\big) &=\text{cov}\big(n^{-1/2}\sum_{i=1}^{n}\hat{D}_{i}(\theta_{1}^{0},s),n^{-1/2}\sum_{j=1}^{n}\hat{D}_{j}(\theta_{1}^{0},t)\big)\\
&=n^{-1}\sum_{i=1}^{n}\text{cov}(\hat{D}_{i}(\theta_{1}^{0},s),\hat{D}_{i}(\theta_{1}^{0},t))+o(1)=n^{-1}\sum_{i=1}^{n}E(\hat{M}_{i}(\theta_{1}^{0},s)\hat{M}_{i}(\theta_{1}^{0},t))+o(1),

we see property (iii) holds. ∎
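Proposition 2 justifies a Gaussian-multiplier approximation to the process n^{-1/2}\sum_{i=1}^{n}\hat{Z}_{i}(\theta_{1}^{0},t). A minimal sketch, assuming Mhat is an n\times T array holding \hat{M}_{i}(\theta_{1}^{0},t_{j}) on a grid t_{1},\ldots,t_{T}; the function name and the sup-norm calibration are illustrative, not the paper's implementation:

```python
import numpy as np

def multiplier_draws(Mhat, G=1000, seed=None):
    """Gaussian-multiplier copies of the process n^{-1/2} sum_i Mhat_i(., t).

    Mhat : (n, T) array of Mhat_i(theta_1^0, t_j) on a grid of t values.
    Returns a (G, T) array; row g is the g-th multiplier process.
    """
    rng = np.random.default_rng(seed)
    n = Mhat.shape[0]
    xi = rng.normal(size=(G, n))        # xi_i^(g) ~ N(0, 1), matching property (i)
    return xi @ Mhat / np.sqrt(n)       # variance/covariance match properties (ii)-(iii)

# e.g. a sup-norm critical value at level alpha = 0.05:
# crit = np.quantile(np.abs(multiplier_draws(Mhat)).max(axis=1), 0.95)
```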

References

  • Burton, C., McKinstry, B., Tătar, A. S., Serrano-Blanco, A., Pagliari, C., and Wolters, M. (2013). Activity monitoring in patients with depression: a systematic review. Journal of Affective Disorders, 145(1):21–28.
  • Chang, H.-w. and McKeague, I. W. (2016). Empirical likelihood based tests for stochastic ordering under right censorship. Electronic Journal of Statistics, 10(2):2511.
  • Krane-Gartiser, K., Henriksen, T. E. G., Morken, G., Vaaler, A., and Fasmer, O. B. (2014). Actigraphic assessment of motor activity in acutely admitted inpatients with bipolar disorder. PLoS One, 9(2):e89574.
  • Li, D. and Pan, J. (2013). Empirical likelihood for generalized linear models with longitudinal data. Journal of Multivariate Analysis, 114:63–73.
  • Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2):237–249.
  • Owen, A. B. (1991). Empirical likelihood for linear models. The Annals of Statistics, 19(4):1725–1747.
  • Owen, A. B. (2001). Empirical Likelihood. CRC Press.
  • Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22(1):300–325.
  • Wang, S., Qian, L., and Carroll, R. J. (2010). Generalized empirical likelihood methods for analyzing longitudinal data. Biometrika, 97(1):79–93.
  • Xue, L. and Zhu, L. (2007). Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika, 94(4):921–937.
  • You, J., Chen, G., and Zhou, Y. (2006). Block empirical likelihood for longitudinal partially linear regression models. Canadian Journal of Statistics, 34(1):79–96.
  • Zou, F., Fine, J., and Yandell, B. (2002). On empirical likelihood for a semiparametric mixture model. Biometrika, 89(1):61–75.