Empirical Likelihood Inference of Variance Components in Linear Mixed-Effects Models
Abstract
Linear mixed-effects models are widely used in analyzing repeated measures data, including clustered and longitudinal data, where inferences of both fixed effects and variance components are of importance. Unlike the fixed effect inference that has been well studied, inference on the variance components is more challenging due to null value being on the boundary and the nuisance parameters of the fixed effects. Existing methods often require strong distributional assumptions on the random effects and random errors. In this paper, we develop empirical likelihood-based methods for the inference of the variance components in the presence of fixed effects. A nonparametric version of the Wilks’ theorem for the proposed empirical likelihood ratio statistics for variance components is derived. We also develop an empirical likelihood test for multiple variance components related to a sequence of correlated outcomes. Simulation studies demonstrate that the proposed methods exhibit better type 1 error control than the commonly used likelihood ratio tests when the Gaussian distributional assumptions of the random effects are violated. We apply the methods to investigate the heritability of physical activity as measured by wearable device in the Australian Twin study and observe that such activity is heritable only in the quantile range from 0.375 to 0.514.
Keywords Boundary value; Global test; Heritability; Nonparametric test; Wearable device data.
1 Introduction
Longitudinal and clustered data commonly arise from observational studies or clinical trials, where subjects are measured repeatedly over time or within a cluster. The repeated measures within a subject or a cluster are often correlated. To analyze such data, linear mixed-effects models that incorporate both fixed and random effects are widely used. Many statistical methods have been developed for such linear mixed-effects models, especially methods for inference of the fixed effects. However, inference on the variance components is less studied and often requires strong distributional assumptions on the random effects and the error terms. When the underlying distributions are known, classical inference methods, including the likelihood ratio tests and the score tests, can be applied. However, these parametric methods are often restrictive and not robust if the model assumptions are violated.
Empirical likelihood (el) method, as an alternative to parametric likelihood-based methods, was first proposed by Owen, (1988) and has been applied to many statistical inference problems. Without a prespecified distributional assumption on the data, el methods incorporate side information through constraints or prior distributions and have favorable statistical properties, including but not limited to Bartlett correctability, transformation invariance, better coverage accuracy of the corresponding confidence internals and greater power. A comprehensive introduction to el methods can be found in Owen, (2001). el methods have been applied to inferences of mixture models (Zou et al.,, 2002) and censored survival data (Chang and McKeague,, 2016), and have also been considered for longitudinal data modeling. For example, You et al., (2006) proposed a block el method for inference of the regression parameters assuming a working independence covariance, and Xue and Zhu, (2007) considered a semiparametric regression model, where the repeated within-subject measures are summarized as a function over time in order to address the dependence issue. Wang et al., (2010) proposed a generalized el method that takes into account the within-subject correlations. Li and Pan, (2013) defined an empirical likelihood ratio (elr) test by utilizing the extended score from quadratic inference functions for longitudinal data, which does not involve direct estimation of the correlation parameters.
The el methods mentioned above only focus on the inference of fixed effects in linear mixed effect models. In this paper, we consider a general setting of linear mixed-effects models and develop el methods for the inference of the variance components. Specifically, suppose there are subjects and denote by the number of repeated measures for the th subject. For the th subject, we observe a response vector , an design matrix for the fixed effects , and semi-positive design matrices for the variance components . The general linear mixed-effects model can be written as
(1) |
where is a zero-mean random variable with variance-covariance . We assume that has a linear structure,
where is the vector of the variance components. We emphasize that this general setting does not require any assumptions on the distributions of the data or the distributions of the random effects.
In many real applications, we are interested in making statistical inference on the variance components in model (1). For example, in the study of heritability based on twin data, each monozygotic twin or dizygotic twin is treated as one cluster, and the linear variance structure can be constructed based on the twin type (see details in Section 6). In the heritability analysis, a key question is whether there exists an genetic effect, which motivates us to study the inference of one of the variance components, say, . We propose to develop an el based inference method for without any assumptions on the random components. The method can effectively account for the nuisance parameters, including the unknown fixed effects and the variance components . The key difficulty when compared to the el inference of the fixed effects is to deal with the boundary value problem when in local testing problem . To solve the issues, we propose a new empirical likelihood ratio test by utilizing an unbiased estimator of under very mild conditions, and prove that the asymptotic distribution of the test statistic is a mixture of distribution.
Motivated by heritability analysis of daily activity distribution as measured by wearable device such as actigraphy, we also consider the setting when linear mixed-effects models are fitted to a sequence of dependent outcomes. The wearable device data have been increasingly collected for continuous activity monitoring in large observational or experimental studies (Burton et al.,, 2013; Krane-Gartiser et al.,, 2014). In typical wearable activity tracking data, the activity is measured at one-minute resolution over several days for a given subject. Such wearable device data with repeated measures enable us to account for day-to-day variability of the activity. Instead of focusing on the activity counts at any minute of the day, daily activity distribution or the amount of time with the activity count above a given threshold provides a biologically meaningful measure of the activity traits. When the activity counts are summarized as distributions, we can consider the activity quantile profile as a phenotype measure. In analysis of such wearable device data, we fit a linear mixed-effects model for each of the activity level or quantile at . Denote by the variance components for activity profile at level . We are then interested in testing the global null . We develop a max-type statistic for this global testing problem. Since the numerator of the proposed empirical likelihood ratio (elr) tests can be rewritten as the sum of approximately independent random variables over different subjects, a random perturbation method is developed to approximate the -values of the proposed global test.
We first introduce some notation. Denote by the submatrix of without the first column of . For two vectors or matrices and of compatible dimension, define the inner product . For a matrix , where is the th column of , the vectorized is defined by . Let and be the expectation and variance of a random vector , and let be the covariance of random vectors and . When is a random matrix, and represent the expectation and variance of the vectorized . When and are random matrices, denotes the covariance of the vectorized and vectorized . We use to denote that and are of the same order, and to denote that is of a smaller order than . We use to denote that and are of the same order in probability, and to denote that is of a smaller order than y in probability.
2 elr test for the fixed effects
Statistical tests for the fixed effects in the linear mixed-effects model (1), , has been well studied. We first briefly review the subject-wise el method proposed in Wang et al., (2010), where the covariance structure for each subject is considered.
Let be an estimator of , and assume that converges to some in probability uniformly over all . One such a nonparametric sample covariance matrix can be obtained using a simple two-step procedure, including estimating the residuals , where is the least-squares estimator using working independence correlation matrices, and solving the constrained optimization problem . Let
which satisfies when is the true value. Denote by the point mass at the th subject. The nonparametric empirical likelihood is defined as
Since it can be proved that (Owen,, 2001), Wang et al., (2010) proposed the elr statistic
To obtain the asymptotic distribution of the elr statistic, the following regularity conditions are needed.
Condition 1.
As , , where is the convex hull.
Condition 2.
The limit exists and is positive definite.
Condition 3.
The expectation are bounded uniformly for some .
Condition 4.
Let with element , be the th row of , and be the th element of . For each pair and with and , and sufficient moment conditions are satisfied so that and , where is but computed with all the data except for subjects and and .
Conditions 1–3 are common conditions for the empirical likelihood methods (Owen,, 1991). Condition 4 assumes mild constraints on to ensure that the difference between the statistic defined with and the one using vanishes as . Under these regularity conditions, the following theorem provides the asymptotic distribution of the elr test (Wang et al.,, 2010) under the null.
Theorem 1.
The asymptotic result only requires that the converge uniformly to some , which may not be the true (Wang et al.,, 2010). When the correlation structure is correctly specified, the estimator is a consistent estimator of . The statistic defined with the true is asymptotically locally most powerful among all the choices of the weight matrices.
3 elr test for the variance component
We consider the local test in the framework of the empirical likelihood, including the null , which is of the most interest. We define and . Since and , we have
where and exists. Since is unknown, we first need an estimator of , denoted by . One simple choice is the least-squares estimator using the all data. Specifically, we stack the data from all subjects by denoting , , and . Model (1) can be rewritten as
Then the least-squares estimator is . For , let We have
where .
To control the rates of , , and , we need the following condition, which is also commonly used for empirical likelihood methods.
Condition 5.
The expectation are bounded uniformly for some .
Under Condition 5, we see that the least-squares estimator is good enough.
Proposition 1.
Assume that and as , where , and . When Condition 5 holds and , we have , and , .
Let with , and with . We define
where
(2) |
Since Proposition 1 implies if is the true value (see (A18) in the appendix), we define the nonparametric likelihood as
and the corresponding elr statistic as
(3) |
If the true value (i.e., the null hypothesis under the case ), the denominator in (3) would not be as usual owing to the boundary value issue, and thus the existing results are inapplicable. To derive the asymptotic distribution of the proposed test , we assume the following condition similar to Condition 1.
Condition 6.
As , , where is defined as with replaced by .
Under Conditions 5 and 6, we have the following theorem on the asymptotic distribution of the elr test under the null.
Theorem 2.
Although the elr statistic in Theorem 2 involves optimizations in the numerator and denominator, the following lemma shows that the statistic has an asymptotically equivalent expression that can be used to calculate the statistic efficiently.
We next provide an estimator of the asymptotic variance of . We rewrite as
with being a scalar. Let and . It can be verified that
(4) |
where
In addition, for ,
based on proposition 1. Therefore, are asymptotically independent with expectation , while the expectations of and are (see (A18) in the appendix). We have
Therefore,
(5) |
which leads a consistent estimator of the variance of as
4 Variance Component Analysis Over a Sequence of Responses
In some applications, we are interested in testing whether the variance components are all zero over a set of possibly correlated outcomes. One example of such applications is to test the variance components for the activity distribution based on wearable device data where we are interested in testing the variance component at each of the quantiles of the activity distribution. Extending model (1), we assume the following outcome model at level ,
(6) |
where is a zero-mean random variable with variance . We assume that has the same linear structure for each ,
We are interested in testing the null , where is a pre-defined interval. We propose the following maximally selected empirical likelihood ratio statistic (gelr),
(7) |
where is the elrstatistic for the outcome at . It can be shown that with
where
Assessment of the statistical significance of the statistic defined in (7) is challenging because of the dependence of . We propose a simple way of evaluating its significance by perturbing the el statistic. Specifically, we apply (4) to rewrite as , where . We can generate the null distribution of by perturbing the test statistic . Specifically, for each , we generate from i.i.d. standard normal distribution and define
Define the corresponding perturbed test statistic as . The following Proportion 2 shows that the perturbed test statistics have the same distribution as the original test statistic under the null. Therefore, the -value of can be approximated by .
Proposition 2.
satisfies
-
(i)
;
-
(ii)
;
-
(iii)
.
5 Simulation studies
5.1 Data generation
We examine the performance of the proposed empirical likelihood ratio tests for variance components and compare the results with the standard likelihood ratio (lr) test assuming Gaussian random effects and Gaussian errors. To mimic the twin design in the heritability analysis of wearable device data that we analyze next, we simulate data on a monozygotic or dizygotic twin pair. For the th twin, let , where and are the numbers of repeated measures for the twin. In wearable device data, represents the th quantile of the activity distributions over days. The data are generated from the model:
(8) |
where , is a random intercept, and denotes zero-mean noise with variance . Here, is assumed as the sum of additive genetic effect , common environment , and unique subject-specific environment , i.e.,
where are independent zero-mean random variables with variance-covariance , , and , respectively. The variance components , and represent the additive genetic variance, common environmental variance, and unique environmental variance, respectively. For the th twin, is a genetic similarity matrix with
and quantifies shared environment between the twin pair with
Under this model, we have
We sample from with equal probability . We set , among which there are 50 monozygotic twin families and 50 dizygotic twin families, and . Denote by the th row of . Let with and . Moreover, we set where and are the quantile functions of and , respectively, and . Let
where , , , , .
5.2 elr test for single variance component
For each , we test the null hypothesis . We consider the following two types of distributions for and :
-
(i)
multivariate normal distribution, , , , ;
-
(ii)
multivariate distribution, , ,
, .
![]() |
(a) normal random effects and errors. |
![]() |
(b) -distributed random effects and errors. |
Denote by elr the proposed empirical likelihood ratio test with unknown . We first consider the type 1 error for each given value of . We consider the setting and with random errors generated both from a normal and a distribution. We repeat the simulations 500 times. Figure 1 gives the results of type 1 errors for different values of . We see all the methods perform well under the normal errors. However, lr shows inflated type 1 errors when the error follows a long-tailed distribution.
To evaluate the power of the proposed tests, we consider the model with and . We calculate the empirical power of the proposed test at 0.05 level for different values of and present the results in Figure 2. The proposed method exhibits similar power as the lr test under the normal error. When the error follows a distribution, we do not report the result of the lr test because of its inflated type 1 error as shown in Figure 1 (b), and the elr test does not lose much power.

(a) normal random effects and errors.

(b) -distributed random effects and errors.
5.3 elr test for variance components over an interval
To evaluate the proposed tests for variance components in the case of correlated outcomes over an interval of , we generate data as follows. Let , and . Let , where with , , , and . We consider two types of distributions for :
-
(i)
multivariate normal distribution, , , , ;
-
(ii)
multivariate distribution, , , , .

Denote by gELR the proposed global empirical likelihood ratio test with unknown . We first consider the global test . Let , , and , where is an indicator function. We consider different choices of the signal size by setting , and generate 500 datasets for each setting. Figure 3 presents the empirical power of gELR at 0.05 significance level under different distributions of random errors and different . As expected, the empirical power of rejecting the null hypothesis increases with the signal size. Compared to the results under the multivariate distribution, the proposed test gELR has higher power when data are normally distributed.
To further evaluate the type 1 error and the power, we consider models with , , and . We consider to test each of the candidate intervals of lengths and denote them by scan3, scan4, scan5, and scan6, respectively. Let be the set of candidate intervals under the scanning length () and let be the set of all candidate intervals. For each candidate interval , we test the null hypothesis . The signal in the interval is significant if
where with . The threshold is selected based on the extreme value distribution of normal random variables.
Under each type of error distributions, 500 datasets are generated. For the global test under the candidate interval , we mark its empirical power at . The results are shown in Figure 4. The proposed global test gELR exhibits high power if the interval involves at least one nonzero time points and show almost no power otherwise.
![]() ![]() |
6 Application to genetic heritability analysis of physical activity distribution
6.1 Description of the data
We apply the methods to data set of the Australian Twin study, which includes 366 healthy twins, 151 of them are monozygotic twins, and 215 are dizygotic twins. The participants wore actigraphy to track their physical activities for no more than 14 days. The minute-to-minute activity counts derived from actigraphy were collected in a 1440-dimensional vector per day. Since we are interested in inference of the heritability of the activity distributions, we obtain the empirical quantiles of activity counts at different quantiles, . Specifically, for the th measurement (day) from the th person in the th twin family, the raw data from the wearable device are transformed by using
For the th person in the th twin family, the th repeated measure of -quantile of activity counts is obtained as
where denotes the th order statistic of .
The covariate includes gender, age, BMI, and indicator of weekend, i.e., . Let and . Let . For each , we remove the outliers of defined as values that are more than 1.5 times the inter-quantile range above the upper quantile or below the lower quantile. After removing all outliers and removing missing data, we have twin families including 63 monozygotic twin families and 86 dizygotic twin families, and the total number of observations is 3,489.
6.2 Effects of Gender, Age, BMI, Weekend on Activity Profiles
We first examine the associations between the covariates including gender, age, BMI, and weekend vs weekday and the overall activity distribution. For each of the four covariates and each of the values, we obtain the el estimator by solving the estimating equations , and apply Theorem 1 to construct the confidence interval , where is the quantile of the distribution. The first column of Figure 5 shows the estimated regression coefficient for each of the values and its point-wise 95% confidence intervals using the el method.

We then test whether there is any difference in activity profiles between individuals of different gender, age, and BMI and whether the activity profiles are different between weekdays and weekends. Specifically, we consider testing such differences at each of the quantile . To test , we apply the empirical likelihood ratio test in Section 2 and the standard likelihood ratio (lr) test assuming normal random effects, and we obtain the -value for each of the values. The second column of Figure 5 shows the -values for each and for each of these four covariates. At the nominal -value of 0.05, the elr test shows that there is an gender effects when the activity counts are small (i.e., small ). In contrast, the standard lr test only shows such significance in a smaller interval from 0.23 to 0.42. For age, the elr shows a significant effect for the large activity counts region (i.e., large ). Both the elr and lr tests do not reject the null hypothesis that there is no effect of BMI, while the effects of weekend are statistically significant under almost the whole region of .
6.3 Analysis of heritability of the activity distribution
We then address the question whether the activity distribution is heritable, where the distribution is summarized as the quantiles. This is equivalent to test the null hypothesis . For each quantile , we first estimate the fixed effects using the least-square estimate and then apply the proposed elr to test the null hypothesis and to compare the results with the lr method. Figure 6 (a) gives the -values at different quantiles . It shows that the test is rejected for based on the elr test and using lr at the nominal 0.05 significance level. However, if we use the Bonferroni correction for multiple testing, only the proposed elr test identifies significant heritability for the quantiles between 0.375 and 0.514. The -value of global test is 0 when applying the proposed elr with 1000 permutations. Overall, our analysis shows that the activity distribution is heritable, especially in the quantile range from 0.375 to 0.514.


6.4 Sensitivity analysis of heritability of the activity distribution
To examine whether our previous preprocessing steps affect the analysis of heritability, we consider an alternative approach to remove the outliers. For each , we remove the outliers of defined as the values that are greater than 3 standard deviations from its median. After removing all the outliers and missing data, we have twin families including 64 monozygotic twin families and 88 dizygotic twin families, and the total number of observations is 4,190.
Under this preprocessing method, the -value) of testing heritability is provided in Figure 6 (b), which shows similar results as Figure 6 (a). The test is rejected for when using the elrtest and using the LR test at the nominal 0.05 significance level. If we adopt the Bonferroni correction for multiple testing, only the proposed elrtest identified significant heritability under the quantiles between 0.396 and 0.576. The -value of the global test is 0 when applying the proposed elrwith 1000 permutations.
7 Discussion
In this paper, we have developed an empirical likelihood method for making inference of the variance components in general linear mixed-effects models. The proposed empirical likelihood ratio test statistic can be applied to a large set of related outcomes such as different quantiles of the activity distribution when we analyze the wearable device data sets. Simulation studies show that the proposed methods control type 1 error much better than the likelihood ratio method when the normality assumptions do not hold.
To address the unknown nuisance variance components, we assume its true value being positive and thus as , (2) provides unbiased positive estimates with probability 1. When applying the proposed methods to the real data, we note that (2) may provide negative estimators at some quantile . To solve this problem, we first test whether the nuisance variance component (for example, or ) is zero at these quantile points. If the null hypothesis is not rejected, we omit the nuisance variance components in the model and then apply the proposed elrtest for the components of interest.
ACKNOWLEDGMENT
This research was supported by the Intramural Research Program of the National Institute of Mental Health through grant ZIA MH002954-04 [Motor Activity Research Consortium for Health (mMARCH)]. We thank Dr. Hickie and Dr. Martin for sharing the Australian twin study data as part of the mMARCH network and the Genetic Epidemiology Research Branch at National Institute of Mental Health for processing the accelerometry data.
APPENDIX
Appendix A Proofs and complements
A.1 Proof of Theorem 2
To prove Theorem 2, we first consider the setting with known . We define , , , and in the same way as , , , and , respectively, with replaced by . Under Conditions 5 and 6, we derive the asymptotic distribution of in the following theorem.
Theorem A1.
Proof.
Lemma A2.
Let and . If the true value , then as is large enough. If the true value , then as is large enough.
Proof.
Let . We use a similar method in Qin and Lawless, (1994). Since
(A13) |
Let . We note and satisfy
where
Taking derivatives about and , we have
Expanding and at , we have
(A14) | ||||
(A15) |
Hence,
(A16) |
where .
When , (A16) implies as is large enough. Thus, as is large enough. Then satisfies (A.1), i.e.,
(A17) |
When , as is large enough. Since , we have as is large enough. So . ∎
A.2 Proof of of Proposition 1
A.3 proof of equation (4)
of equation (4).
Rewrite as with being a scalar. Rewrite as . So
where . Let . We obtain
where . Note that for any ,
so we have
where . ∎
A.4 Proof of Proposition 2
References
- Burton et al., (2013) Burton, C., McKinstry, B., Tătar, A. S., Serrano-Blanco, A., Pagliari, C., and Wolters, M. (2013). Activity monitoring in patients with depression: a systematic review. Journal of Affective Disorders, 145(1):21–28.
- Chang and McKeague, (2016) Chang, H.-w. and McKeague, I. W. (2016). Empirical likelihood based tests for stochastic ordering under right censorship. Electronic Journal of Statistics, 10(2):2511.
- Krane-Gartiser et al., (2014) Krane-Gartiser, K., Henriksen, T. E. G., Morken, G., Vaaler, A., and Fasmer, O. B. (2014). Actigraphic assessment of motor activity in acutely admitted inpatients with bipolar disorder. PloS One, 9(2):e89574.
- Li and Pan, (2013) Li, D. and Pan, J. (2013). Empirical likelihood for generalized linear models with longitudinal data. Journal of Multivariate Analysis, 114:63–73.
- Owen, (1991) Owen, A. (1991). Empirical likelihood for linear models. The Annals of Statistics, pages 1725–1747.
- Owen, (1988) Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2):237–249.
- Owen, (2001) Owen, A. B. (2001). Empirical likelihood. CRC press.
- Qin and Lawless, (1994) Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. the Annals of Statistics, 22(1):300–325.
- Wang et al., (2010) Wang, S., Qian, L., and Carroll, R. J. (2010). Generalized empirical likelihood methods for analyzing longitudinal data. Biometrika, 97(1):79–93.
- Xue and Zhu, (2007) Xue, L. and Zhu, L. (2007). Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika, 94(4):921–937.
- You et al., (2006) You, J., Chen, G., and Zhou, Y. (2006). Block empirical likelihood for longitudinal partially linear regression models. Canadian Journal of Statistics, 34(1):79–96.
- Zou et al., (2002) Zou, F., Fine, J., and Yandell, B. (2002). On empirical likelihood for a semiparametric mixture model. Biometrika, 89(1):61–75.